Production-Ready Kubernetes Workloads: The Day-2 Readiness Checklist

Getting a container to run on Kubernetes is the easy part. A kubectl apply and a kubectl get pods showing Running looks like success — and in a demo it is. But the gap between running and production-ready is where most teams quietly accumulate outages: the pod that takes traffic before its database connection is open, the workload with no memory limit that gets OOMKilled at 3 a.m., the Deployment that drops requests during every rollout, the StatefulSet that all lands on one node and disappears when that node is drained for patching.

This lesson is the Day-2 readiness checklist — the set of properties a workload needs before it carries real traffic, and the reasoning behind each one so you can defend your choices in a design review or an interview. We will work through the controls that separate a demo from production: probes (liveness, readiness, startup), resource requests and limits with the QoS classes they produce, PodDisruptionBudgets, topology spread constraints and anti-affinity, the HorizontalPodAutoscaler, graceful shutdown, the rolling-update strategy, ConfigMap and Secret hygiene, securityContext and Pod Security, NetworkPolicy, and observability. We finish with a copy-paste checklist you can put in a pull-request template and a single hardened Deployment manifest that wires almost all of it together.

The voice here is deliberately practical. Every setting below has a cost you can feel in production if you get it wrong, and an interviewer will ask you why, not just what.

Learning objectives

By the end of this lesson you can:

Distinguish liveness, readiness and startup probes, choose the right probe type and handler, and explain the failure mode each one prevents.
Set resource requests and limits deliberately, predict the resulting QoS class (Guaranteed, Burstable, BestEffort), and explain how QoS drives eviction order.
Protect availability during voluntary disruptions with a PodDisruptionBudget, and spread replicas across failure domains with topology spread constraints and pod anti-affinity.
Configure a safe rolling update (maxSurge/maxUnavailable) and implement graceful shutdown with preStop hooks and terminationGracePeriodSeconds.
Add a HorizontalPodAutoscaler, manage configuration with ConfigMaps and Secrets, and harden the pod with securityContext, Pod Security Admission and NetworkPolicy.
Apply a single production-readiness checklist to any workload and read a hardened Deployment manifest line by line.

Prerequisites & where this fits

You need to be comfortable with the core workload objects — Pods, ReplicaSets, Deployments and Services — and able to run kubectl apply, kubectl get, kubectl describe and kubectl logs. If those are not yet second nature, work through Pods, ReplicaSets, Deployments & Services: The Core Objects and Your First Cluster: kubectl and a Real Deploy first. You will need a cluster for the lab; a free local one from kind, minikube or k3d is enough.

This is the production-readiness checkpoint of the Kubernetes Zero-to-Hero course. It sits after the fundamentals and before you provision and operate your own clusters in Provisioning Production Kubernetes: kubeadm, HA Control Plane, etcd Backup & Upgrades. Everything here is squarely in the CKAD wheelhouse (designing resilient application deployments) and overlaps heavily with CKA (workload operations).

Core concepts: what “production-ready” actually means

Kubernetes is a declarative reconciliation engine: you describe the desired state and controllers work continuously to make actual state match. “Production-ready” means you have given those controllers enough information to make good decisions on your behalf — and protected the workload against the four things that routinely break it:

Threat to availability	What it looks like	The control that addresses it
Bad rollouts	A new image crashes or serves errors, but the old version is already gone	Readiness probes + rolling-update strategy + (later) progressive delivery
Resource contention	A noisy neighbour starves your pod of CPU/memory; OOMKills	Requests, limits, QoS classes
Voluntary disruptions	A node drain (upgrade, autoscaler) takes down too many replicas at once	PodDisruptionBudget + multiple replicas
Involuntary disruptions	A node, rack or zone fails	Topology spread / anti-affinity across failure domains

Two distinctions underpin the whole lesson. The first is voluntary vs involuntary disruption. Involuntary disruptions are things you do not initiate — a kernel panic, a hardware failure, a node running out of memory. Voluntary disruptions are deliberate operator actions: draining a node to patch it, scaling down a node pool, deleting a pod. You cannot prevent involuntary disruptions, only spread your blast radius; you can rate-limit voluntary disruptions with a PodDisruptionBudget. The second is desired vs actual state — the readiness signal you expose is how a pod tells Kubernetes “actual is not ready yet, do not send me traffic,” and almost every control below is ultimately about making that signal accurate.

Health probes: liveness, readiness and startup

Kubernetes cannot read your application’s mind. It knows a container’s process is alive, but not whether the app inside is healthy or ready to serve. Probes are how you tell it.

Probe	Question it answers	On failure	Use it for
Liveness	“Is this container wedged and beyond recovery?”	The container is restarted (per `restartPolicy`)	Deadlocks, stuck event loops — states a restart fixes
Readiness	“Should this pod receive traffic right now?”	The pod is removed from Service endpoints (not restarted)	Warm-up, lost dependency, overload, draining
Startup	“Has this slow-starting app finished booting yet?”	The container is restarted; gates liveness/readiness until it passes	Legacy/JVM apps with long, variable startup

Three rules save you from the classic self-inflicted outages:

Readiness is the one that protects users. It controls whether the pod is in the Service’s endpoint list. A readiness probe that also checks a critical downstream dependency lets a pod gracefully stop taking traffic when that dependency is gone — but be careful: if every replica checks a shared dependency and that dependency blips, you can take the entire Service out of rotation at once. Probe what this pod needs to serve, not the health of the whole world.
Liveness must be cheap and local. If your liveness probe calls the database and the database is slow, Kubernetes will conclude the container is dead and restart it — turning a dependency hiccup into a restart storm that makes recovery harder. Liveness should answer “is this process wedged,” nothing more.
Startup probes exist so the other two do not have to compensate. Without a startup probe, a slow app forces you to set a long initialDelaySeconds on liveness, which then makes liveness slow to detect real hangs for the container’s whole life. A startup probe gives the app a generous boot budget once, then hands over to a tight liveness probe.

Probe handlers come in four flavours: httpGet (a 2xx/3xx response means pass — the most common for web services), tcpSocket (the port accepts a connection — fine for non-HTTP servers), exec (a command exits 0 — flexible but the most expensive, as it forks a process each run), and grpc (native gRPC health checking, stable since v1.27). The tunables are the same for all three lifecycle probes:

Field	Meaning	Sensible default
`initialDelaySeconds`	Wait before the first probe	Prefer a startup probe over a large value here
`periodSeconds`	How often to probe	`10` (readiness can be tighter, e.g. `5`)
`timeoutSeconds`	How long to wait for a response	`1`–`2` (the default `1` is often too tight for HTTP)
`failureThreshold`	Consecutive failures before acting	`3`
`successThreshold`	Consecutive successes to recover	`1` (must be `1` for liveness/startup)

A startup probe’s total budget is failureThreshold × periodSeconds — set that to comfortably exceed your worst-case boot time. Expose a lightweight /healthz (liveness) and a /readyz (readiness) in your app rather than reusing one endpoint for both; they answer different questions.
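The httpGet handler is the one most web services use. As a rough sketch of the other three handlers, the Pod below wires exec, grpc and tcpSocket into one container. The Pod name, the image, the /app/healthcheck binary and port 9090 are all hypothetical, and the grpc probe assumes the app implements the standard gRPC health-checking service.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-handlers-demo                       # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/grpc-app:1.0   # hypothetical image
      ports:
        - name: grpc
          containerPort: 9090
      startupProbe:
        exec:                                     # passes when the command exits 0
          command: ["/app/healthcheck", "--boot"] # hypothetical binary and flag
        periodSeconds: 5
        failureThreshold: 24                      # 24 x 5s = 120s boot budget
      readinessProbe:
        grpc:
          port: 9090                              # grpc takes a numeric port, not a port name
        periodSeconds: 5
      livenessProbe:
        tcpSocket:
          port: grpc                              # passes when the port accepts a connection
        periodSeconds: 10
        failureThreshold: 3
```

The exec handler is placed on the startup probe on purpose: it forks a process per run, so it is cheapest where it only runs until the app has booted.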

Resource requests, limits and QoS classes

Requests and limits are the single most consequential — and most often skipped — production setting.

A request is what the scheduler reserves for the pod. It is the basis for bin-packing: the scheduler only places a pod on a node that has the requested CPU and memory free. Requests are also what the HorizontalPodAutoscaler measures utilisation against.
A limit is the hard ceiling the kubelet/runtime enforces. The two resources behave very differently at the limit:

Resource	Over the limit, what happens	Implication
CPU	The container is throttled (CFS quota) — slowed, never killed	Tail-latency spikes; the pod survives
Memory	The container is OOMKilled when it exceeds its limit	The container dies and restarts

Because CPU throttles but memory kills, the standard guidance is: always set memory requests and limits equal for predictable workloads, set a CPU request, and be cautious with CPU limits — aggressive CPU limits cause throttling that hurts latency without any safety benefit. Many mature platforms set CPU requests but omit CPU limits for latency-sensitive services, relying on requests for fair scheduling.
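A minimal sketch of the "CPU request, no CPU limit" pattern described above, written as a container-level fragment. The values are illustrative only and should come from your own measurements.

```yaml
# Fragment of a container spec (spec.template.spec.containers[*])
resources:
  requests:
    cpu: "250m"        # illustrative; used for scheduling and as the HPA baseline
    memory: "256Mi"
  limits:
    memory: "256Mi"    # memory limit == memory request
    # no cpu limit on purpose: the container may burst into idle CPU
```

The trade-off is that a pod configured this way is classed Burstable rather than Guaranteed, because Guaranteed requires a CPU limit equal to the CPU request as well.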

The combination of requests and limits determines the pod’s Quality of Service (QoS) class, which decides eviction order when a node runs out of memory (the kubelet evicts to reclaim resources):

QoS class	Condition	Eviction order under node pressure
Guaranteed	Every container has requests equal to limits for both CPU and memory	Evicted last — most protected
Burstable	At least one container has a request or limit, but not Guaranteed	Evicted after BestEffort, ordered by usage above requests
BestEffort	No requests or limits set anywhere	Evicted first — never run critical workloads this way

For production: give every container at least requests, and target Guaranteed for anything stateful or latency-critical. A BestEffort pod is a pod the kubelet will sacrifice without hesitation — acceptable only for throwaway batch work. You can constrain a namespace with a LimitRange (defaults and min/max per pod) and cap total consumption with a ResourceQuota; both are how platform teams stop a single team’s workloads from starving a shared cluster.
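As an illustration of the namespace guard rails mentioned above, here is one possible LimitRange and ResourceQuota pair. The object names, the shop namespace and every number are assumptions for the sketch, not recommendations.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults      # hypothetical name
  namespace: shop               # assumed namespace
spec:
  limits:
    - type: Container
      defaultRequest:           # applied when a container sets no request
        cpu: "100m"
        memory: "128Mi"
      default:                  # applied when a container sets no limit
        memory: "256Mi"
      max:
        memory: "1Gi"           # containers asking for more are rejected
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota              # hypothetical name
  namespace: shop
spec:
  hard:
    requests.cpu: "8"           # sum of CPU requests across the namespace
    requests.memory: 16Gi
    limits.memory: 32Gi
    pods: "50"
```

The two work as a pair: once a quota covers requests or limits, pods that omit those values are rejected unless a LimitRange supplies defaults for them.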

PodDisruptionBudgets: surviving voluntary disruption

A PodDisruptionBudget (PDB) caps how many of a workload’s pods can be voluntarily disrupted at once. It does not stop a node failing — it stops kubectl drain (and the cluster autoscaler, and node-pool upgrades) from evicting too many replicas simultaneously.

You express it one of two ways, never both:

Field	Meaning	Example
`minAvailable`	Minimum pods that must stay up during disruption	`2` or `50%`
`maxUnavailable`	Maximum pods that may be down during disruption	`1` or `25%`

A PDB only has teeth if you run more than one replica. minAvailable: 1 on a single-replica Deployment means the drain blocks forever and you cannot patch the node, which is a common foot-gun. For a 3-replica web service, maxUnavailable: 1 (or minAvailable: 2) lets node maintenance proceed one pod at a time while keeping a quorum serving. Percentages are evaluated against the number of pods at disruption time and are rounded up to the next whole pod.
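A percentage-based budget might look like the sketch below. The object name is hypothetical, and it is meant as an alternative to an integer-based PDB for the same pods, not an addition: if more than one PDB selects a pod, evictions of that pod fail.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: orders-api-percent      # hypothetical name
spec:
  # Percentages round up: with 7 matching pods, 50% is 3.5,
  # so 4 pods must stay available and at most 3 can be evicted at once.
  minAvailable: "50%"
  selector:
    matchLabels:
      app: orders-api
```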

Spreading replicas: topology spread and anti-affinity

Three replicas mean nothing if all three land on the same node and that node is drained. You need them spread across failure domains — nodes, then availability zones.

Topology spread constraints are the modern, preferred tool. They tell the scheduler to keep pods evenly distributed across a topology key:

Field	What it controls
`topologyKey`	The domain to spread across — `kubernetes.io/hostname` (node) or `topology.kubernetes.io/zone` (zone)
`maxSkew`	The maximum allowed difference in pod count between the most and least populated domains
`whenUnsatisfiable`	`DoNotSchedule` (hard — pod stays Pending if it would breach skew) or `ScheduleAnyway` (soft — best effort)
`labelSelector`	Which pods are counted when computing the spread

A typical production pattern spreads across zones softly (ScheduleAnyway) and across nodes more firmly, so the scheduler never piles two replicas on one node when another is free. Pod anti-affinity is the older mechanism that achieves similar goals (preferredDuringScheduling... keeps replicas apart on a best-effort basis); prefer topology spread constraints for new work, because they are cheaper for the scheduler and express intent more directly. Use the hard variant (DoNotSchedule / requiredDuringScheduling) only when you genuinely prefer a Pending pod to a co-located one.
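For comparison, the older anti-affinity form of "keep replicas on different nodes, best effort" could be sketched as the pod-template fragment below. It assumes the pods carry the label app: orders-api.

```yaml
# Fragment of a Deployment's pod template (spec.template.spec)
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:   # soft rule
      - weight: 100                                    # 1-100; higher means stronger preference
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname          # "different node"
          labelSelector:
            matchLabels:
              app: orders-api                          # avoid nodes already running these pods
```

Unlike maxSkew, this only expresses "prefer not to co-locate"; it has no notion of keeping counts even across domains, which is one reason topology spread tends to read more directly.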

Rolling updates and graceful shutdown

A Deployment’s default update strategy is RollingUpdate, governed by two knobs that, combined with readiness probes, give you zero-downtime deploys:

Field	Meaning	Effect
`maxSurge`	Extra pods allowed above the desired count during a rollout	Higher = faster rollout, more peak capacity used
`maxUnavailable`	Pods allowed to be unavailable during a rollout	`0` = never drop below desired count (safest); requires headroom

The safest production setting for a capacity-constrained service is maxUnavailable: 0 with maxSurge: 1. A new pod must become Ready before an old one is removed, so capacity never dips. This only works if your readiness probe is honest: if it reports ready before the app can serve, the rollout will happily replace healthy pods with broken ones. The other strategy, Recreate, kills all old pods before creating new ones (a downtime window); use it only when two versions cannot coexist, e.g. an exclusive lock or an incompatible schema.

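Graceful shutdown is the other half of zero-downtime. When a pod is deleted (a rollout, a scale-down, a drain), Kubernetes does the following, roughly in this order (the endpoint removal in the first step proceeds concurrently with the steps after it):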

The pod is marked Terminating and removed from Service endpoints (it stops being a traffic target).
The preStop hook runs (if defined).
SIGTERM is sent to PID 1 in each container.
After terminationGracePeriodSeconds (default 30), any remaining processes get SIGKILL.

The subtle race: endpoint removal propagates asynchronously through kube-proxy and ingress controllers, so for a brief moment a Terminating pod may still receive new connections. The standard fix is a preStop sleep (sleep 5–15) that delays SIGTERM long enough for the endpoint removal to propagate, then a graceful in-app handler that drains in-flight requests before exiting. Set terminationGracePeriodSeconds longer than your longest in-flight request plus the preStop sleep. Your app must trap SIGTERM and exit cleanly — if it ignores SIGTERM (common when the process is wrapped in a shell), every shutdown becomes a hard 30-second kill that drops requests.

Configuration and secrets

Hard-coding configuration into an image is the anti-pattern; externalise it:

Mechanism	For	Inject as	Notes
ConfigMap	Non-sensitive config (flags, URLs, files)	Env vars or mounted files	Changing it does not restart pods — roll the Deployment or use a config-reloader
Secret	Sensitive data (tokens, passwords, keys)	Env vars or mounted files	Base64-encoded not encrypted by default; mount as files, not env, where possible

Two production rules: prefer mounting ConfigMaps/Secrets as files over environment variables (mounted files can update live without a restart and do not leak into kubectl describe or crash dumps), and enable encryption at rest for Secrets in etcd (or use an external store via the Secrets Store CSI driver). To force a rollout when config changes, hash the config into a pod-template annotation (e.g. a checksum/config annotation) so the Deployment’s pod template changes and triggers a rolling update.
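A sketch of the file-mount approach, as a pod-template fragment. The mount paths are hypothetical; the ConfigMap and Secret names are assumed to be orders-api-config and orders-api-secrets.

```yaml
# Fragment of a Deployment's pod template (spec.template.spec)
containers:
  - name: orders-api
    # image, ports, probes and resources omitted for brevity
    volumeMounts:
      - name: db-credentials
        mountPath: /etc/secrets/db      # hypothetical path; each Secret key becomes a file here
        readOnly: true
      - name: app-config
        mountPath: /etc/orders-api      # hypothetical path; each ConfigMap key becomes a file here
        readOnly: true
volumes:
  - name: db-credentials
    secret:
      secretName: orders-api-secrets    # assumed Secret name
  - name: app-config
    configMap:
      name: orders-api-config           # assumed ConfigMap name
```

With this layout a Secret key named db-password would appear as the file /etc/secrets/db/db-password. Live updates reach whole-volume mounts like these but not subPath mounts, and the app still has to re-read the file to notice a change.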

securityContext and Pod Security

A hardened pod runs as an unprivileged user, with a read-only root filesystem, no extra Linux capabilities, and no privilege escalation. The fields live at pod and container level:

Field	Set to	Why
`runAsNonRoot: true`	always	Refuses to start a container running as UID 0
`runAsUser` / `runAsGroup`	a high non-zero UID (e.g. `10001`)	Drops root explicitly
`allowPrivilegeEscalation: false`	always	Blocks `setuid`/`setgid` gaining more privilege than the parent
`readOnlyRootFilesystem: true`	where feasible	Immutable container FS; mount `emptyDir` for writable paths
`capabilities.drop: ["ALL"]`	always	Start from zero Linux capabilities, add back only what is needed
`seccompProfile.type: RuntimeDefault`	always	Restricts the syscalls the container can make

These are enforced cluster-side by Pod Security Admission (PSA), the built-in replacement for the removed PodSecurityPolicy. PSA applies one of three Pod Security Standards per namespace via labels:

Standard	What it allows	Use for
privileged	Unrestricted	System/infra namespaces only
baseline	Blocks known privilege escalations	A sane minimum for most apps
restricted	Enforces the hardening above (non-root, drop ALL, seccomp, etc.)	The target for production workloads

You set it with namespace labels — pod-security.kubernetes.io/enforce: restricted (plus warn and audit variants to surface violations without blocking during migration). Aim every production namespace at restricted and make the workload comply, rather than weakening the namespace to fit a lax workload.
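During a migration, one plausible labelling is to enforce baseline while warning and auditing against restricted, then raise enforce once the warnings stop. The namespace name below is an assumption.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: shop                                        # assumed namespace name
  labels:
    pod-security.kubernetes.io/enforce: baseline    # blocks pods that violate baseline
    pod-security.kubernetes.io/warn: restricted     # kubectl prints a warning, pod is still admitted
    pod-security.kubernetes.io/audit: restricted    # violation is recorded in the audit log
```

Changing the enforce label to restricted later does not evict pods that are already running; it only applies to pods created or updated afterwards.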

NetworkPolicy: default-deny networking

By default, every pod can talk to every other pod in the cluster — a flat network with no segmentation. A NetworkPolicy restricts ingress and egress at the pod level (enforced by your CNI — Calico, Cilium, etc.; note that some CNIs do not enforce NetworkPolicy at all, so verify yours does).

The production baseline is default-deny, then allow what is needed: apply a policy that selects all pods in a namespace and denies all ingress (and ideally egress), then add narrow allow-policies for the specific flows your app needs — e.g. “allow ingress to the API on port 8080 from pods labelled role=frontend,” and “allow egress to the database namespace on 5432 and to kube-dns on 53.” This turns a single compromised pod from a cluster-wide pivot point into a contained incident. Remember to allow DNS egress (UDP/TCP 53 to kube-system) or name resolution breaks in subtle ways.
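The allow-rules described above could be sketched roughly as follows. The policy name and the database namespace name are hypothetical, and the DNS rule assumes the cluster DNS pods run in kube-system with the label k8s-app: kube-dns, which is common but worth checking on your cluster.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: orders-api-allow              # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: orders-api
  policyTypes: ["Ingress", "Egress"]
  ingress:
    - from:
        - podSelector:                # same-namespace pods labelled role=frontend
            matchLabels:
              role: frontend
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: database   # hypothetical namespace name
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - namespaceSelector:          # namespaceSelector + podSelector in ONE entry = both must match
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns       # assumed DNS pod label
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

NetworkPolicies are additive, so this policy and a default-deny policy coexist: traffic is allowed if any policy selecting the pod allows it.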

Observability: metrics, logs and traces

You cannot operate what you cannot see. Production-ready means the three pillars are wired in from day one, not bolted on after the first incident:

Pillar	What it gives you	Common stack
Metrics	Aggregate health, alerting, autoscaling signals	Prometheus + Grafana; expose `/metrics`, set `prometheus.io/scrape` or a `ServiceMonitor`
Logs	Per-request detail, debugging	Write structured logs to stdout/stderr; collect with Fluent Bit/Loki/ELK
Traces	Latency across service hops	OpenTelemetry → Tempo/Jaeger

Three minimums: log to stdout/stderr (never to a file inside the container — the platform collects stdout), emit structured (JSON) logs so they are queryable, and expose application metrics including the RED signals (Rate, Errors, Duration) so you can define SLOs and drive the HPA on a meaningful signal. Wire metrics to your readiness/SLO story so alerts fire on user-visible symptoms, not just on pod restarts.
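If the cluster runs the Prometheus Operator, a ServiceMonitor is one way to get /metrics scraped. This is a sketch under several assumptions: the Operator's CRDs are installed, a Service labelled app: orders-api exists with a port named http, and the Prometheus instance is configured to pick up ServiceMonitors in this namespace (some installs also require an extra label on the ServiceMonitor itself).

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: orders-api
spec:
  selector:
    matchLabels:
      app: orders-api       # selects the Service, not the pods
  endpoints:
    - port: http            # NAME of the Service port (assumed to be "http")
      path: /metrics
      interval: 30s         # illustrative scrape interval
```

Without the Operator, the prometheus.io/scrape annotations serve the same purpose, but they are a convention that only works if the Prometheus scrape configuration has been written to honour them.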

Autoscaling: the HorizontalPodAutoscaler

The HorizontalPodAutoscaler (HPA) adds and removes pod replicas to track a target metric — most commonly CPU utilisation as a percentage of the pod’s CPU request (which is exactly why requests are non-negotiable: with no request, the HPA has nothing to compute a percentage against). It needs the metrics-server installed.

Key knobs: minReplicas/maxReplicas (the bounds), the target (e.g. averageUtilization: 70), and behavior (scale-up/down stabilisation windows and rate limits, to damp flapping). For metrics beyond CPU/memory — queue depth, requests-per-second, external signals — you graduate to KEDA, covered in Kubernetes Autoscaling: HPA, KEDA & Karpenter. Pair the HPA with a PDB and topology spread so scaling events keep replicas well distributed and respect disruption limits.
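To make the behavior knob concrete, here is a sketch of an HPA for orders-api with explicit scale-up and scale-down policies. The policy values are illustrative, not tuned.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orders-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orders-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # desired = ceil(current replicas x current utilisation / target)
          # e.g. 3 replicas at 105% with a 70% target: ceil(3 x 105 / 70) = 5
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0     # react to load immediately
      policies:
        - type: Percent
          value: 100                    # at most double the replica count per minute
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300   # act on the highest recommendation of the last 5 minutes
      policies:
        - type: Pods
          value: 1                      # remove at most one pod per minute
          periodSeconds: 60
```

The asymmetry is deliberate: scaling up quickly protects users, while scaling down slowly avoids flapping when load is bursty.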

Kubernetes production-readiness checklist

The diagram groups every control above into the four readiness pillars — health & lifecycle, resources & scaling, resilience & disruption, and security & networking — so you can see at a glance which knob defends against which failure mode.

The copy-paste production-readiness checklist

Paste this into your pull-request template or a READINESS.md and tick each box before a workload carries real traffic.

PRODUCTION-READINESS CHECKLIST  (tick every box before go-live)

HEALTH & LIFECYCLE
[ ] Readiness probe defined; reflects "can serve traffic now" (warm-up + critical deps)
[ ] Liveness probe defined; cheap, local, no external dependency calls
[ ] Startup probe for slow-starting apps (so liveness can stay tight)
[ ] App traps SIGTERM and drains in-flight work before exit
[ ] preStop hook (sleep 5-15s) to cover async endpoint removal
[ ] terminationGracePeriodSeconds > preStop sleep + longest in-flight request

RESOURCES & SCALING
[ ] CPU + memory requests set on every container
[ ] Memory limit == memory request (predictable; avoid OOM surprises)
[ ] QoS class is Guaranteed or Burstable (never BestEffort for prod)
[ ] HPA configured with min/max and a meaningful target (CPU% or custom)
[ ] metrics-server (and Prometheus adapter / KEDA if custom metrics) installed
[ ] Namespace LimitRange + ResourceQuota in place (shared clusters)

RESILIENCE & DISRUPTION
[ ] replicas >= 2 (>=3 for quorum/HA services)
[ ] PodDisruptionBudget set (maxUnavailable or minAvailable) and not blocking drains
[ ] Topology spread across nodes (and zones) configured
[ ] RollingUpdate: maxUnavailable: 0 / maxSurge: 1 (capacity never dips), or justified
[ ] No single points of failure pinned to one node/zone

CONFIG & SECRETS
[ ] Config externalised to ConfigMap (no config baked into the image)
[ ] Secrets in Secret objects; encryption-at-rest enabled (or external store/CSI)
[ ] Secrets mounted as files where possible (not env); checksum annotation to roll on change

SECURITY
[ ] runAsNonRoot: true, runAsUser a high non-zero UID
[ ] allowPrivilegeEscalation: false; capabilities drop ALL
[ ] readOnlyRootFilesystem: true (+ emptyDir for writable paths)
[ ] seccompProfile: RuntimeDefault
[ ] Namespace at Pod Security 'restricted' (enforce)
[ ] Image pinned by digest; scanned; pulled from a trusted registry

NETWORKING
[ ] Default-deny NetworkPolicy in the namespace
[ ] Explicit allow rules for required ingress/egress (incl. DNS egress to kube-dns)

OBSERVABILITY
[ ] Logs to stdout/stderr, structured (JSON)
[ ] App metrics exposed (/metrics) incl. Rate/Errors/Duration
[ ] Dashboards + alerts on user-visible SLOs; tracing wired (OpenTelemetry)
[ ] Labels/annotations: app, version, owner, runbook link

A hardened Deployment manifest

This single manifest wires together almost every control above — probes, resources for a Guaranteed pod, graceful shutdown, a safe rolling update, externalised config, a full securityContext, and topology spread. Read it top to bottom; the inline comments map each block back to the checklist.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
  labels:
    app: orders-api
    version: "1.4.2"          # observability: every object carries app + version
spec:
  replicas: 3                  # resilience: >=3 so a PDB + spread are meaningful
  revisionHistoryLimit: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0        # capacity never dips below desired during a rollout
      maxSurge: 1              # one new (Ready) pod created before an old one goes
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
        version: "1.4.2"
      annotations:
        checksum/config: "REPLACED_BY_CI_WITH_HASH"  # roll pods when ConfigMap changes
    spec:
      terminationGracePeriodSeconds: 45   # > preStop sleep + longest in-flight request
      securityContext:                    # pod-level: applies to all containers
        runAsNonRoot: true
        runAsUser: 10001
        runAsGroup: 10001
        fsGroup: 10001
        seccompProfile:
          type: RuntimeDefault
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway      # spread across zones, best effort
          labelSelector:
            matchLabels:
              app: orders-api
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule       # hard: per-node replica counts may differ by at most 1
          labelSelector:
            matchLabels:
              app: orders-api
      containers:
        - name: orders-api
          image: registry.example.com/orders-api@sha256:<digest>  # pin by digest
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 8080
          envFrom:
            - configMapRef:
                name: orders-api-config        # externalised, non-sensitive config
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: orders-api-secrets     # sensitive value from a Secret
                  key: db-password
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "500m"                       # requests == limits => Guaranteed QoS
              memory: "512Mi"                   # memory limit == request avoids OOM surprises
          startupProbe:                         # generous one-time boot budget
            httpGet: { path: /healthz, port: http }
            periodSeconds: 5
            failureThreshold: 30                # up to 150s to start, then hand over
          readinessProbe:                       # gates Service endpoints
            httpGet: { path: /readyz, port: http }
            periodSeconds: 5
            timeoutSeconds: 2
            failureThreshold: 3
          livenessProbe:                        # cheap, local; restarts a wedged process
            httpGet: { path: /healthz, port: http }
            periodSeconds: 10
            timeoutSeconds: 2
            failureThreshold: 3
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]  # cover async endpoint removal
          securityContext:                      # container-level hardening
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - name: tmp
              mountPath: /tmp                   # writable path despite read-only root FS
      volumes:
        - name: tmp
          emptyDir: {}

Pair it with the three companion objects the checklist demands — a PDB, an HPA, and a default-deny NetworkPolicy:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: orders-api
spec:
  maxUnavailable: 1            # node drains take at most one replica at a time
  selector:
    matchLabels:
      app: orders-api
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orders-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orders-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # 70% of the pod's CPU request
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}               # selects every pod in the namespace
  policyTypes: ["Ingress"]      # deny all ingress; add explicit allow-policies next

Hands-on lab

You will harden a workload on a free local cluster, then prove each control works — watching a rollout stay up, a PDB block a drain, and a missing-request pod fail to autoscale. Roughly 25 minutes.

1. Create a cluster and a namespace

# kind (or: minikube start  /  k3d cluster create ready)
kind create cluster --name ready
kubectl create namespace shop
kubectl label namespace shop \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/warn=restricted

Labelling the namespace restricted means Pod Security Admission will reject any pod that is not hardened — a fast way to verify your manifest actually complies.
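The default kind cluster has a single node, which is enough for most steps but cannot show replicas spreading across nodes or give a drained pod somewhere else to land. If you want to see those, one option is to create the cluster from a multi-node config instead. The file name below is an assumption; four workers for three replicas is chosen so that a surge pod or a drain replacement always has an empty node available.

```yaml
# kind-multinode.yaml (hypothetical file name)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
  - role: worker
  - role: worker
```

You would pass it at creation time with kind create cluster --name ready --config kind-multinode.yaml, then create and label the namespace as above.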

2. Try an unhardened pod (and watch it get rejected)

kubectl -n shop run nginx --image=nginx:1.27

Expected: the request is denied with a message listing violations (allowPrivilegeEscalation != false, unrestricted capabilities, runAsNonRoot != true, seccompProfile). This is Pod Security doing its job — proof that “restricted” is enforced.

3. Deploy the hardened workload

Save the hardened Deployment above as orders-api.yaml, plus the PDB and HPA, swapping the image for a runnable hardened one (ghcr.io/nginxinc/nginx-unprivileged:1.27 listens on 8080 and runs as non-root; point all three probes at /), then apply:

kubectl -n shop apply -f orders-api.yaml
kubectl -n shop rollout status deploy/orders-api
kubectl -n shop get pods -o wide        # confirm spread across nodes

Expected: three pods reach Running and READY 1/1. On a multi-node cluster the -o wide output shows them on different nodes (topology spread). Confirm the QoS class is Guaranteed:

kubectl -n shop get pod -l app=orders-api \
  -o jsonpath='{.items[0].status.qosClass}{"\n"}'
# -> Guaranteed

4. Watch a zero-downtime rollout

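# In terminal 1, expose the Deployment, then hammer the Service.
# The curl pod runs in the default namespace: an unhardened pod in shop
# would be rejected by the restricted Pod Security level set in step 1.
kubectl -n shop expose deploy/orders-api --port=80 --target-port=8080
kubectl run curl --image=curlimages/curl --restart=Never -it --rm -- \
  sh -c 'while true; do curl -s -o /dev/null -w "%{http_code}\n" orders-api.shop; sleep 0.5; done'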

# In terminal 2, trigger a rollout:
kubectl -n shop set image deploy/orders-api orders-api=ghcr.io/nginxinc/nginx-unprivileged:1.26

Expected: the curl loop keeps printing 200 throughout — maxUnavailable: 0 plus a working readiness probe means no request is dropped.

5. Prove the PodDisruptionBudget protects you

NODE=$(kubectl -n shop get pod -l app=orders-api \
  -o jsonpath='{.items[0].spec.nodeName}')
kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data

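Expected: the drain evicts pods one at a time, waiting for replacements to become Ready, because maxUnavailable: 1 forbids taking down more than one at once. This needs other schedulable nodes for the replacements to land on; on a single-node cluster the drain stalls after the first eviction because the replacement stays Pending. With a single replica and minAvailable: 1, this command would block, which is the foot-gun to avoid. Uncordon when done: kubectl uncordon "$NODE".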

6. See why requests matter for autoscaling

kubectl -n shop describe hpa orders-api | grep -A3 Metrics

If metrics-server is installed you will see a CPU percentage; if you had omitted CPU requests, the HPA would report <unknown> and refuse to scale — the concrete reason requests are non-negotiable. (On kind, install metrics-server with --kubelet-insecure-tls to see live numbers.)

Cleanup

kubectl delete namespace shop
kind delete cluster --name ready     # or: minikube delete / k3d cluster delete ready

Cost note

Everything here runs on a free local cluster (kind/minikube/k3d) on your laptop — zero cloud spend. The only cost is the few hundred MB of RAM the control plane and three small pods use.

Common mistakes & troubleshooting

Symptom	Likely cause	Fix
Requests dropped during every rollout	No readiness probe, or it reports ready too early	Add an honest readiness probe gating real serving capability; set `maxUnavailable: 0`
Restart storm during a dependency outage	Liveness probe calls the slow/down dependency	Make liveness cheap and local; check dependencies in readiness, not liveness
Pod `OOMKilled`, restarts repeatedly	Memory limit set below the real working set, or a memory leak	Set memory request == limit to the observed working set; right-size with VPA recommendations
`kubectl drain` hangs forever	PDB cannot be satisfied (e.g. single replica, `minAvailable: 1`)	Run >=2 replicas; relax PDB; or `--disable-eviction` only as a last resort
All replicas on one node; node drain caused an outage	No topology spread / anti-affinity	Add topology spread on `kubernetes.io/hostname` (and zone)
Pod rejected at apply with policy violations	Namespace enforces `restricted`; manifest not hardened	Add the full `securityContext` (non-root, drop ALL, seccomp, no priv-esc)
Intermittent connection errors during or just after each deploy	Traffic still routed to `Terminating` pods while endpoint removal propagates asynchronously	Add a `preStop` sleep; ensure the app traps SIGTERM and drains
HPA shows `<unknown>` targets, never scales	No CPU/memory request, or metrics-server missing	Set requests; install metrics-server
Config change not picked up	ConfigMap updated but pods not restarted	Add a `checksum/config` annotation to the pod template to force a rollout
DNS resolution fails after adding NetworkPolicy	Default-deny egress blocks port 53 to kube-dns	Add an egress allow rule to kube-system DNS on UDP/TCP 53
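The last row is the most common NetworkPolicy surprise, so a sketch of the fix may help. It assumes the cluster DNS pods run in kube-system with the conventional k8s-app: kube-dns label (true for stock CoreDNS installs, but check yours); both policy names are hypothetical.

```yaml
# 1) Default-deny: selects every pod in the namespace and allows nothing.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all       # hypothetical name
  namespace: shop
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# 2) Allow DNS egress so name resolution keeps working under default-deny.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress       # hypothetical name
  namespace: shop
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        # namespaceSelector and podSelector in ONE list item are ANDed:
        # only kube-dns pods inside kube-system match.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns   # assumed label; verify on your cluster
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

NetworkPolicies are additive, so the second policy opens only port 53 to the DNS pods and everything else stays denied. Clusters that use a node-local DNS cache may need a different destination rule.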

Best practices

Make readiness honest and liveness cheap. Readiness gates user traffic; liveness only restarts a wedged process. Never let liveness depend on an external system.
Always set requests; set memory limits equal to memory requests. Be deliberate (and often sparing) with CPU limits — throttling hurts latency without preventing failure.
Run at least two (ideally three) replicas for anything that serves traffic, and back them with a PDB plus topology spread so neither maintenance nor a node failure can take you down.
Roll out with maxUnavailable: 0 / maxSurge: 1 for capacity-sensitive services, and pair it with graceful shutdown (preStop + SIGTERM handling + a sufficient grace period).
Externalise config and secrets, mount secrets as files, enable encryption at rest, and force rollouts on config change with a checksum annotation.
Pin images by digest, scan them, and standardise labels (app, version, owner, runbook link) so observability and incident response have something to key on.
Treat the checklist as a gate, not a wishlist — enforce it with Pod Security Admission and policy-as-code (Kyverno/OPA Gatekeeper) so non-compliant workloads cannot reach production.
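Several of these practices meet in the Deployment spec. The sketch below is a compact illustration, not the lesson's reference manifest: the probe paths, resource sizes and timings are assumptions, and the image tag is used only because the lab uses it (pin a digest in production).

```yaml
# Sketch only: sizes, timings and probe paths are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
  namespace: shop
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0        # never drop below desired capacity
      maxSurge: 1              # add one new pod, wait for Ready, then remove an old one
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      terminationGracePeriodSeconds: 30   # must exceed preStop sleep plus drain time
      containers:
        - name: orders-api
          image: ghcr.io/nginxinc/nginx-unprivileged:1.26
          ports:
            - containerPort: 8080
          readinessProbe:      # gates traffic; may check real serving capability
            httpGet:
              path: /
              port: 8080
            periodSeconds: 5
          livenessProbe:       # cheap and local; no external dependencies
            httpGet:
              path: /
              port: 8080
            periodSeconds: 10
            failureThreshold: 3
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              memory: 128Mi    # memory limit == request; CPU limit deliberately omitted
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "10"]   # delay SIGTERM while endpoint removal propagates
```

Because the CPU limit is omitted, this pod lands in Burstable QoS rather than Guaranteed. That is the trade-off the second bullet describes: no CPU throttling, at the cost of slightly weaker eviction protection. Add a CPU limit equal to the request if you need Guaranteed.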

Security notes

Production-readiness is security here. Three points deserve emphasis. First, Secrets are base64, not encrypted, by default — anyone with get secret RBAC or etcd access can read them; enable encryption at rest, prefer mounting over env vars, and consider an external store via the Secrets Store CSI driver. Second, the default flat network is a lateral-movement highway — a default-deny NetworkPolicy turns a single compromised pod into a contained incident instead of a cluster-wide pivot; just remember to allow DNS egress. Third, restricted Pod Security is the floor, not the ceiling — a non-root, read-only, capability-stripped pod with RuntimeDefault seccomp removes the most common container-escape and privilege-escalation paths; layer on Pod Security Admission to enforce it cluster-side. Combine least-privilege RBAC, image provenance (signed, scanned, digest-pinned), and these pod-level controls for defence in depth.
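As a minimal sketch of the third point, the manifests below label a namespace for restricted Pod Security Admission and show a pod that should pass it. The pod name is hypothetical, and the /tmp emptyDir assumes the image only needs that path to be writable.

```yaml
# Namespace-level enforcement via Pod Security Admission labels.
apiVersion: v1
kind: Namespace
metadata:
  name: shop
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
---
# A pod hardened to the restricted profile (name is hypothetical).
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example
  namespace: shop
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: web
      image: ghcr.io/nginxinc/nginx-unprivileged:1.26
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true   # stricter than restricted requires
        capabilities:
          drop: ["ALL"]
      volumeMounts:
        - name: tmp
          mountPath: /tmp              # assumed to be the only writable path needed
  volumes:
    - name: tmp
      emptyDir: {}
```

Starting with only the warn and audit labels and adding enforce once the warnings are clean is a lower-risk way to migrate an existing namespace.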

Interview & exam questions

What is the difference between a liveness and a readiness probe, and what happens when each fails? Liveness answers “is this container wedged?” — on failure the container is restarted. Readiness answers “should this pod get traffic?” — on failure the pod is removed from Service endpoints but not restarted. Liveness fixes hangs; readiness controls traffic during warm-up, overload or dependency loss.
When and why would you add a startup probe? For slow-starting apps (JVM, legacy). It gives a generous one-time boot budget and gates liveness/readiness until it passes, so you can keep the liveness probe tight for the rest of the container’s life instead of inflating initialDelaySeconds.
Why should a liveness probe never call an external dependency? If the dependency is slow or down, the probe fails, Kubernetes restarts the container, and you get a restart storm that makes recovery harder — turning a dependency blip into a self-inflicted outage. Liveness must be cheap and local.
What determines a pod’s QoS class, and why does it matter? The relationship between requests and limits. Guaranteed = every container has CPU and memory requests equal to its limits; Burstable = at least one request or limit is set, but the pod does not meet the Guaranteed criteria; BestEffort = no requests or limits at all. QoS sets eviction order under node memory pressure: BestEffort is evicted first, Guaranteed last.
What happens when a container exceeds its CPU limit versus its memory limit? Over the CPU limit it is throttled (slowed, never killed). Over the memory limit it is OOMKilled and restarted. Hence: be cautious with CPU limits (throttling hurts latency); set memory limit equal to request for predictability.
What does a PodDisruptionBudget protect against, and what does it not? It limits voluntary disruptions (node drains, cluster-autoscaler node scale-down, node-pool upgrades) so too many replicas are not evicted at once. It does not protect against involuntary disruptions (node/hardware failure); spreading replicas (topology spread/anti-affinity) handles those. It is also only useful with more than one replica: on a single replica, a PDB such as minAvailable: 1 simply blocks the drain.
How do you achieve a zero-downtime rolling update? Run multiple replicas, set maxUnavailable: 0 and maxSurge: 1 (a new Ready pod before removing an old one), back it with an honest readiness probe, and implement graceful shutdown (preStop sleep + SIGTERM handling + adequate terminationGracePeriodSeconds).
Why might requests still reach a pod after it enters Terminating? Endpoint removal propagates asynchronously through kube-proxy and ingress controllers, so for a short window a terminating pod can still be a target. Mitigate with a preStop sleep that delays SIGTERM until the removal has propagated, plus in-app connection draining.
Prefer topology spread constraints or pod anti-affinity, and why? Topology spread constraints for new work — they express “spread evenly across this domain” directly with maxSkew, are cheaper for the scheduler, and support soft/hard via whenUnsatisfiable. Anti-affinity is the older, more expensive mechanism for keeping pods apart.
How does the HorizontalPodAutoscaler use resource requests? CPU utilisation is computed as a percentage of the pod’s CPU request, so without a request the HPA has no denominator and reports <unknown>, refusing to scale. This is a key reason requests are mandatory. The HPA also needs metrics-server.
What replaced PodSecurityPolicy, and how do you enforce hardening cluster-side? Pod Security Admission (PSA), applied per namespace via labels (pod-security.kubernetes.io/enforce: restricted, with warn/audit for migration). It enforces the Pod Security Standards (privileged / baseline / restricted); restricted requires non-root, dropped capabilities, seccomp RuntimeDefault, no privilege escalation, etc.
What is the default pod-to-pod network behaviour, and how do you secure it? By default every pod can reach every other pod. Apply a default-deny NetworkPolicy (select all pods, deny ingress/egress), then add narrow allow-rules per required flow — remembering to allow DNS egress to kube-dns on port 53. Enforcement depends on a CNI that supports NetworkPolicy.
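For the topology-spread question above, it helps to be able to write the constraint from memory. The fragment below is a sketch of one common shape, a hard spread across nodes plus a soft spread across zones; the label value is taken from the lab and the hard/soft choice is an assumption you should adapt.

```yaml
# Fragment of a Deployment's spec.template.spec (not a complete manifest).
topologySpreadConstraints:
  - maxSkew: 1                             # pod counts per node may differ by at most 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule       # hard: leave the pod Pending rather than stack nodes
    labelSelector:
      matchLabels:
        app: orders-api
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway      # soft: prefer zone balance, but never block scheduling
    labelSelector:
      matchLabels:
        app: orders-api
```

On a single-zone or local cluster the zone constraint has nothing to balance; keeping it soft means the same manifest still schedules there.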

Quick check

Which probe controls whether a pod appears in a Service’s endpoint list?
A pod has CPU/memory requests equal to its limits. What QoS class is it, and where does it sit in eviction order?
You set minAvailable: 1 on a single-replica Deployment and then run kubectl drain. What happens?
What two rolling-update fields give you “never drop below desired capacity,” and what value does each take?
Name the three minimum observability practices for a production workload.

Answers

The readiness probe — on failure the pod is removed from Service endpoints (it is not restarted).
Guaranteed, and it is evicted last under node memory pressure (most protected).
The drain blocks indefinitely — evicting the only replica would breach minAvailable: 1, so the node cannot be drained. Run at least two replicas.
maxUnavailable: 0 (no pod may be unavailable) and maxSurge: 1 (one extra pod is created and must become Ready before an old one is removed).
Log to stdout/stderr, emit structured (JSON) logs, and expose application metrics (Rate/Errors/Duration) for SLOs and autoscaling.

Exercise

Take an unhardened Deployment of your choice (or the bare nginx from the lab) and bring it to production-readiness against the checklist, proving each control:

Add liveness, readiness and startup probes pointing at real endpoints; demonstrate that failing readiness drops the pod from Service endpoints (kubectl get endpoints) without a restart.
Set requests and limits to land the pod in Guaranteed QoS; verify with kubectl get pod -o jsonpath='{.status.qosClass}'.
Scale to three replicas, add a PDB (maxUnavailable: 1) and topology spread across nodes; drain a node and show eviction proceeds one pod at a time.
Configure maxUnavailable: 0/maxSurge: 1, add a preStop sleep and a sensible grace period, and show a rollout that keeps a curl loop returning 200 throughout.
Move all config to a ConfigMap and any secret to a Secret; apply the full restricted securityContext and confirm the pod is admitted into a restricted namespace.
Add a default-deny NetworkPolicy plus the minimum allow-rules (including DNS egress) and confirm the app still works.

Write a short READINESS.md recording which checklist items you completed and the command that proves each one — exactly what a reviewer would ask for.

Certification mapping

Exam	Where this lesson maps
CKAD	Application Design and Build (multi-container patterns), Application Deployment (rolling updates, deployment strategies), Application Observability and Maintenance (probes, logging, monitoring), Application Environment, Configuration and Security (requests and limits, ConfigMaps, Secrets, securityContext), Services & Networking (NetworkPolicy); this is core CKAD territory
CKA	Workloads & Scheduling (deployments, rolling updates, resource limits, PDBs, topology), Services & Networking (NetworkPolicy), Troubleshooting (probe and resource failures)
CKS	Minimize Microservice Vulnerabilities (securityContext, Pod Security Standards), System Hardening and Cluster Hardening (NetworkPolicy default-deny, least privilege)
KCNA	Conceptual coverage of probes, resources, scaling and observability for the entry-level exam

Glossary

Liveness probe — a check that, on failure, restarts the container; for detecting wedged processes.
Readiness probe — a check that, on failure, removes the pod from Service endpoints; for controlling traffic.
Startup probe — a one-time boot-budget check that gates liveness/readiness for slow-starting apps.
Request — the CPU/memory the scheduler reserves for a container; the basis for bin-packing and HPA percentages.
Limit — the hard ceiling enforced by the kubelet/runtime; CPU is throttled, memory is OOMKilled.
QoS class — Guaranteed / Burstable / BestEffort, derived from requests vs limits; sets eviction order.
PodDisruptionBudget (PDB) — caps how many pods may be voluntarily disrupted at once.
Topology spread constraint — scheduler rule to distribute pods evenly across a topology domain (node, zone).
Voluntary vs involuntary disruption — operator-initiated (drain, scale-down) vs unplanned (node failure).
Graceful shutdown — endpoint removal → preStop → SIGTERM → (grace period) → SIGKILL; lets a pod drain cleanly.
securityContext — pod/container security settings (non-root, capabilities, read-only FS, seccomp).
Pod Security Admission (PSA) — built-in admission controller enforcing the Pod Security Standards per namespace.
NetworkPolicy — pod-level ingress/egress firewall rules enforced by the CNI.
HorizontalPodAutoscaler (HPA) — scales replica count to track a target metric (often CPU% of request).

Next steps

You can now take any workload from “it runs” to “it is production-ready.” Next, learn to build and operate the cluster itself — HA control planes, etcd backup and safe upgrades — in Provisioning Production Kubernetes: kubeadm, HA Control Plane, etcd Backup & Upgrades. To go deeper on individual controls, see Kubernetes Autoscaling: HPA, KEDA & Karpenter, Right-Sizing with the Vertical Pod Autoscaler, Default-Deny Network Policies with Cilium, and Pod Security Admission: Baseline to Restricted.