Production-Ready Kubernetes Workloads: The Day-2 Readiness Checklist

Getting a container to run on Kubernetes is the easy part. A kubectl apply and a kubectl get pods showing Running looks like success — and in a demo it is. But the gap between running and production-ready is where most teams quietly accumulate outages: the pod that takes traffic before its database connection is open, the workload with no memory limit that gets OOMKilled at 3 a.m., the Deployment that drops requests during every rollout, the StatefulSet that all lands on one node and disappears when that node is drained for patching.

This lesson is the Day-2 readiness checklist — the set of properties a workload needs before it carries real traffic, and the reasoning behind each one so you can defend your choices in a design review or an interview. We will work through the controls that separate a demo from production: probes (liveness, readiness, startup), resource requests and limits with the QoS classes they produce, PodDisruptionBudgets, topology spread constraints and anti-affinity, the HorizontalPodAutoscaler, graceful shutdown, the rolling-update strategy, ConfigMap and Secret hygiene, securityContext and Pod Security, NetworkPolicy, and observability. We finish with a copy-paste checklist you can put in a pull-request template and a single hardened Deployment manifest that wires almost all of it together.

The voice here is deliberately practical. Every setting below has cost you can feel in production if you get it wrong, and an interviewer will ask you why, not just what.

Learning objectives

By the end of this lesson you can:

Prerequisites & where this fits

You need to be comfortable with the core workload objects — Pods, ReplicaSets, Deployments and Services — and able to run kubectl apply, kubectl get, kubectl describe and kubectl logs. If those are not yet second nature, work through Pods, ReplicaSets, Deployments & Services: The Core Objects and Your First Cluster: kubectl and a Real Deploy first. You will need a cluster for the lab; a free local one from kind, minikube or k3d is enough.

This is the production-readiness checkpoint of the Kubernetes Zero-to-Hero course. It sits after the fundamentals and before you provision and operate your own clusters in Provisioning Production Kubernetes: kubeadm, HA Control Plane, etcd Backup & Upgrades. Everything here is squarely in the CKAD wheelhouse (designing resilient application deployments) and overlaps heavily with CKA (workload operations).

Core concepts: what “production-ready” actually means

Kubernetes is a declarative reconciliation engine: you describe the desired state and controllers work continuously to make actual state match. “Production-ready” means you have given those controllers enough information to make good decisions on your behalf — and protected the workload against the four things that routinely break it:

Threat to availability | What it looks like | The control that addresses it
Bad rollouts | A new image crashes or serves errors, but the old version is already gone | Readiness probes + rolling-update strategy + (later) progressive delivery
Resource contention | A noisy neighbour starves your pod of CPU/memory; OOMKills | Requests, limits, QoS classes
Voluntary disruptions | A node drain (upgrade, autoscaler) takes down too many replicas at once | PodDisruptionBudget + multiple replicas
Involuntary disruptions | A node, rack or zone fails | Topology spread / anti-affinity across failure domains

Two distinctions underpin the whole lesson. The first is voluntary vs involuntary disruption. Involuntary disruptions are things you do not initiate — a kernel panic, a hardware failure, a node running out of memory. Voluntary disruptions are deliberate operator actions: draining a node to patch it, scaling down a node pool, deleting a pod. You cannot prevent involuntary disruptions, only spread your blast radius; you can rate-limit voluntary disruptions with a PodDisruptionBudget. The second is desired vs actual state — the readiness signal you expose is how a pod tells Kubernetes “actual is not ready yet, do not send me traffic,” and almost every control below is ultimately about making that signal accurate.

Health probes: liveness, readiness and startup

Kubernetes cannot read your application’s mind. It knows a container’s process is alive, but not whether the app inside is healthy or ready to serve. Probes are how you tell it.

Probe | Question it answers | On failure | Use it for
Liveness | “Is this container wedged and beyond recovery?” | The container is restarted (per restartPolicy) | Deadlocks and stuck event loops: states a restart fixes
Readiness | “Should this pod receive traffic right now?” | The pod is removed from Service endpoints (not restarted) | Warm-up, lost dependency, overload, draining
Startup | “Has this slow-starting app finished booting yet?” | The container is restarted; gates liveness/readiness until it passes | Legacy/JVM apps with long, variable startup

Three rules save you from the classic self-inflicted outages:

  1. Readiness is the one that protects users. It controls whether the pod is in the Service’s endpoint list. A readiness probe that also checks a critical downstream dependency lets a pod gracefully stop taking traffic when that dependency is gone — but be careful: if every replica checks a shared dependency and that dependency blips, you can take the entire Service out of rotation at once. Probe what this pod needs to serve, not the health of the whole world.
  2. Liveness must be cheap and local. If your liveness probe calls the database and the database is slow, Kubernetes will conclude the container is dead and restart it — turning a dependency hiccup into a restart storm that makes recovery harder. Liveness should answer “is this process wedged,” nothing more.
  3. Startup probes exist so the other two do not have to compensate. Without a startup probe, a slow app forces you to set a long initialDelaySeconds on liveness, which then makes liveness slow to detect real hangs for the container’s whole life. A startup probe gives the app a generous boot budget once, then hands over to a tight liveness probe.

Probe handlers come in four flavours: httpGet (a 2xx/3xx response means pass — the most common for web services), tcpSocket (the port accepts a connection — fine for non-HTTP servers), exec (a command exits 0 — flexible but the most expensive, as it forks a process each run), and grpc (native gRPC health checking, stable since v1.27). The tunables are the same for all three lifecycle probes:

Field | Meaning | Sensible default
initialDelaySeconds | Wait before the first probe | Prefer a startup probe over a large value here
periodSeconds | How often to probe | 10 (readiness can be tighter, e.g. 5)
timeoutSeconds | How long to wait for a response | 1-2 (the default 1 is often too tight for HTTP)
failureThreshold | Consecutive failures before acting | 3
successThreshold | Consecutive successes to recover | 1 (must be 1 for liveness/startup)

A startup probe’s total budget is failureThreshold × periodSeconds — set that to comfortably exceed your worst-case boot time. Expose a lightweight /healthz (liveness) and a /readyz (readiness) in your app rather than reusing one endpoint for both; they answer different questions.
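The handler types other than httpGet can be sketched as follows. This is an illustrative fragment of a pod template, not a complete manifest: the container names, port numbers and the pg_isready command are assumptions chosen for the example, and image fields are omitted.

```yaml
# Fragment of a pod template's containers list (images and other fields omitted).
containers:
  - name: cache                        # hypothetical non-HTTP server
    readinessProbe:
      tcpSocket:
        port: 6379                     # passes when the port accepts a connection
      periodSeconds: 5
  - name: db                           # hypothetical PostgreSQL container
    readinessProbe:
      exec:
        command: ["pg_isready", "-q"]  # passes when the command exits 0
      periodSeconds: 10                # exec forks a process on every run, so probe less often
      timeoutSeconds: 2
  - name: rpc                          # hypothetical gRPC server
    livenessProbe:
      grpc:
        port: 9090                     # the server must implement the gRPC health checking protocol
      periodSeconds: 10
```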

Resource requests, limits and QoS classes

Requests and limits are the single most consequential — and most often skipped — production setting.

Resource | Over the limit, what happens | Implication
CPU | The container is throttled (CFS quota): slowed, never killed | Tail-latency spikes; the pod survives
Memory | The container is OOMKilled when it exceeds its limit | The container dies and restarts

Because CPU throttles but memory kills, the standard guidance is: always set memory requests and limits equal for predictable workloads, set a CPU request, and be cautious with CPU limits — aggressive CPU limits cause throttling that hurts latency without any safety benefit. Many mature platforms set CPU requests but omit CPU limits for latency-sensitive services, relying on requests for fair scheduling.
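As a sketch of that guidance for a latency-sensitive service, the container-level fragment below sets a CPU request, leaves the CPU limit out, and pins memory request and limit to the same value. The numbers are placeholders rather than recommendations; size them from observed usage.

```yaml
# Fragment of a container spec; values are illustrative only.
resources:
  requests:
    cpu: "250m"        # used for scheduling and fair CPU sharing under contention
    memory: "512Mi"
  limits:
    memory: "512Mi"    # equal to the request, so memory behaviour is predictable
    # no cpu limit: the container may burst into idle CPU instead of being throttled
```

Because the CPU limit is missing, a pod built from this fragment is Burstable rather than Guaranteed; that is the trade-off being made for lower tail latency.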

The combination of requests and limits determines the pod’s Quality of Service (QoS) class, which decides eviction order when a node runs out of memory (the kubelet evicts to reclaim resources):

QoS class | Condition | Eviction order under node pressure
Guaranteed | Every container has requests equal to limits for both CPU and memory | Evicted last (most protected)
Burstable | At least one container has a request or limit, but not Guaranteed | Evicted after BestEffort, ordered by usage above requests
BestEffort | No requests or limits set anywhere | Evicted first; never run critical workloads this way

For production: give every container at least requests, and target Guaranteed for anything stateful or latency-critical. A BestEffort pod is a pod the kubelet will sacrifice without hesitation, acceptable only for throwaway batch work. You can constrain a namespace with a LimitRange (defaults and min/max per container or pod) and cap total consumption with a ResourceQuota; both are how platform teams stop a single team’s workloads from starving a shared cluster.
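A minimal sketch of the two namespace guardrails mentioned above. The namespace name team-a, the object names and every number are hypothetical; the point is the shape of the objects, not the values.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults       # hypothetical name
  namespace: team-a              # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:            # applied when a container sets no request
        cpu: "100m"
        memory: "128Mi"
      default:                   # applied when a container sets no limit
        memory: "256Mi"
      max:                       # upper bound any single container may ask for
        memory: "2Gi"
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota               # hypothetical name
  namespace: team-a
spec:
  hard:
    requests.cpu: "8"            # sum of CPU requests across the namespace
    requests.memory: 16Gi
    limits.memory: 16Gi
    pods: "50"
```

With the defaults in place, a pod that omits requests no longer lands in BestEffort, and the quota caps what the namespace as a whole can reserve.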

PodDisruptionBudgets: surviving voluntary disruption

A PodDisruptionBudget (PDB) caps how many of a workload’s pods can be voluntarily disrupted at once. It does not stop a node failing — it stops kubectl drain (and the cluster autoscaler, and node-pool upgrades) from evicting too many replicas simultaneously.

You express it one of two ways, never both:

Field | Meaning | Example
minAvailable | Minimum pods that must stay up during disruption | 2 or 50%
maxUnavailable | Maximum pods that may be down during disruption | 1 or 25%

A PDB only has teeth if you run more than one replica. minAvailable: 1 on a single-replica Deployment means the drain blocks forever and you cannot patch the node, a common foot-gun. For a 3-replica web service, maxUnavailable: 1 (or minAvailable: 2) lets node maintenance proceed one pod at a time while keeping a quorum serving. Percentages are resolved against the workload's expected pod count and rounded up to a whole pod, which errs on the side of availability for minAvailable.
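To make the percentage form concrete, here is a sketch of a PDB using minAvailable as a percentage. The name and label are hypothetical, and the comment works through the rounding for an assumed workload of 7 pods.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb                # hypothetical name
spec:
  minAvailable: "50%"          # with 7 matching pods: 3.5 rounds up, so 4 must stay available
  selector:
    matchLabels:
      app: web                 # hypothetical label; must match the workload's pod labels
```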

Spreading replicas: topology spread and anti-affinity

Three replicas mean nothing if all three land on the same node and that node is drained. You need them spread across failure domains — nodes, then availability zones.

Topology spread constraints are the modern, preferred tool. They tell the scheduler to keep pods evenly distributed across a topology key:

Field | What it controls
topologyKey | The domain to spread across: kubernetes.io/hostname (node) or topology.kubernetes.io/zone (zone)
maxSkew | The maximum allowed difference in pod count between the most and least populated domains
whenUnsatisfiable | DoNotSchedule (hard: the pod stays Pending if it would breach skew) or ScheduleAnyway (soft: best effort)
labelSelector | Which pods are counted when computing the spread

A typical production pattern spreads across zones softly (ScheduleAnyway) and across nodes more firmly, so the scheduler never piles two replicas on one node when another is free. Pod anti-affinity is the older mechanism that achieves similar goals (preferredDuringScheduling... keeps replicas apart on a best-effort basis); prefer topology spread constraints for new work, as they are cheaper for the scheduler and express intent more directly. Use the hard variant (DoNotSchedule / requiredDuringScheduling) only when you genuinely prefer a Pending pod to a co-located one.
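For comparison with topology spread, the older anti-affinity mechanism in its soft form looks roughly like this. It is a pod-spec fragment; the app: orders-api label is borrowed from the manifest later in this lesson and the weight is an arbitrary choice.

```yaml
# Fragment of a pod template spec: soft (best-effort) anti-affinity per node.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100                              # 1-100; higher means a stronger preference
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname    # keep replicas on different nodes if possible
          labelSelector:
            matchLabels:
              app: orders-api
```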

Rolling updates and graceful shutdown

A Deployment’s default update strategy is RollingUpdate, governed by two knobs that, combined with readiness probes, give you zero-downtime deploys:

Field | Meaning | Effect
maxSurge | Extra pods allowed above the desired count during a rollout | Higher = faster rollout, more peak capacity used
maxUnavailable | Pods allowed to be unavailable during a rollout | 0 = never drop below desired count (safest); requires headroom

The safest production setting for a capacity-constrained service is maxUnavailable: 0 with maxSurge: 1: a new pod must become Ready before an old one is removed, so capacity never dips. This only works if your readiness probe is honest: if it reports ready before the app can serve, the rollout will happily replace healthy pods with broken ones. The other strategy, Recreate, kills all old pods before creating new ones (a downtime window); use it only when two versions cannot coexist, e.g. an exclusive lock or an incompatible schema.

Graceful shutdown is the other half of zero-downtime. When a pod is deleted (a rollout, a scale-down, a drain), Kubernetes does the following. Step 1 happens in parallel with steps 2 to 4, which run in order:

  1. The pod is marked Terminating and removed from Service endpoints (it stops being a traffic target).
  2. The preStop hook runs (if defined).
  3. SIGTERM is sent to PID 1 in each container.
  4. After terminationGracePeriodSeconds (default 30), counted from the moment the pod was marked Terminating and therefore including the preStop hook, any remaining processes get SIGKILL.

The subtle race: endpoint removal propagates asynchronously through kube-proxy and ingress controllers, so for a brief moment a Terminating pod may still receive new connections. The standard fix is a preStop sleep of 5 to 15 seconds that delays SIGTERM long enough for the endpoint removal to propagate, followed by a graceful in-app handler that drains in-flight requests before exiting. Set terminationGracePeriodSeconds longer than your longest in-flight request plus the preStop sleep. Your app must trap SIGTERM and exit cleanly. If it ignores SIGTERM (common when the process is wrapped in a shell), every shutdown becomes a hard kill at the end of the grace period (30 seconds by default) that drops requests.
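The shell-wrapping problem can be sketched in a container fragment. The binary path /app/server and its flag are hypothetical; the pattern is what matters. When a shell stays as PID 1 it may not forward SIGTERM to the child process, whereas exec replaces the shell with the application so the signal reaches it directly.

```yaml
# Fragment of a container spec; /app/server is a hypothetical binary.
containers:
  - name: app
    # Risky: the shell can remain PID 1 and SIGTERM may never reach the app.
    # command: ["/bin/sh", "-c", "/app/server --port=8080"]
    #
    # Safer: exec replaces the shell, so the app itself is PID 1 and receives SIGTERM.
    command: ["/bin/sh", "-c", "exec /app/server --port=8080"]
```

If no shell features are needed at all, running the binary directly in exec form (command: ["/app/server", "--port=8080"]) avoids the question entirely.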

Configuration and secrets

Hard-coding configuration into an image is the anti-pattern; externalise it:

Mechanism | For | Inject as | Notes
ConfigMap | Non-sensitive config (flags, URLs, files) | Env vars or mounted files | Changing it does not restart pods; roll the Deployment or use a config-reloader
Secret | Sensitive data (tokens, passwords, keys) | Env vars or mounted files | Base64-encoded, not encrypted, by default; mount as files, not env, where possible

Two production rules. First, prefer mounting ConfigMaps/Secrets as files over environment variables: mounted files are refreshed in place when the object changes (except subPath mounts, which never update), and their contents do not leak through the process environment into child processes or crash dumps. Second, enable encryption at rest for Secrets in etcd (or use an external store via the Secrets Store CSI driver). To force a rollout when config changes, hash the config into a pod-template annotation (e.g. a checksum/config annotation) so the Deployment’s pod template changes and triggers a rolling update.
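A sketch of the file-mount approach, using the object names that the hardened manifest later in this lesson refers to (orders-api-config, orders-api-secrets, db-password). The LOG_LEVEL key, the placeholder password and the mount paths are assumptions for illustration. The last document is a pod-template fragment, not an object to apply on its own.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: orders-api-config
data:
  LOG_LEVEL: "info"                # hypothetical key
---
apiVersion: v1
kind: Secret
metadata:
  name: orders-api-secrets
type: Opaque
stringData:                        # plain text here; stored base64-encoded in data
  db-password: "change-me"         # placeholder, never commit a real value
---
# Pod template fragment: each key becomes a file under the mount path.
spec:
  containers:
    - name: orders-api
      volumeMounts:
        - name: config
          mountPath: /etc/orders-api/config     # hypothetical path
          readOnly: true
        - name: secrets
          mountPath: /etc/orders-api/secrets    # hypothetical path
          readOnly: true
  volumes:
    - name: config
      configMap:
        name: orders-api-config
    - name: secrets
      secret:
        secretName: orders-api-secrets
```

The application then reads /etc/orders-api/secrets/db-password at startup (or on change) instead of an environment variable.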

securityContext and Pod Security

A hardened pod runs as an unprivileged user, with a read-only root filesystem, no extra Linux capabilities, and no privilege escalation. The fields live at pod and container level:

Field | Set to | Why
runAsNonRoot: true | always | Refuses to start a container running as UID 0
runAsUser / runAsGroup | a high non-zero UID (e.g. 10001) | Drops root explicitly
allowPrivilegeEscalation: false | always | Blocks setuid/setgid gaining more privilege than the parent
readOnlyRootFilesystem: true | where feasible | Immutable container FS; mount emptyDir for writable paths
capabilities.drop: ["ALL"] | always | Start from zero Linux capabilities, add back only what is needed
seccompProfile.type: RuntimeDefault | always | Restricts the syscalls the container can make

These are enforced cluster-side by Pod Security Admission (PSA), the built-in replacement for the removed PodSecurityPolicy. PSA applies one of three Pod Security Standards per namespace via labels:

Standard | What it allows | Use for
privileged | Unrestricted | System/infra namespaces only
baseline | Blocks known privilege escalations | A sane minimum for most apps
restricted | Enforces the hardening above (non-root, drop ALL, seccomp, etc.) | The target for production workloads

You set it with namespace labels — pod-security.kubernetes.io/enforce: restricted (plus warn and audit variants to surface violations without blocking during migration). Aim every production namespace at restricted and make the workload comply, rather than weakening the namespace to fit a lax workload.
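Declaratively, the labels sit on the Namespace object itself. The sketch below shows one possible migration stage, assuming a hypothetical namespace called payments: baseline is enforced while restricted violations are only surfaced, so you can fix workloads before tightening enforce to restricted.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments                                      # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline      # rejects pods that violate baseline
    pod-security.kubernetes.io/warn: restricted       # warns the client about restricted violations
    pod-security.kubernetes.io/audit: restricted      # records restricted violations in the audit log
```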

NetworkPolicy: default-deny networking

By default, every pod can talk to every other pod in the cluster — a flat network with no segmentation. A NetworkPolicy restricts ingress and egress at the pod level (enforced by your CNI — Calico, Cilium, etc.; note that some CNIs do not enforce NetworkPolicy at all, so verify yours does).

The production baseline is default-deny, then allow what is needed: apply a policy that selects all pods in a namespace and denies all ingress (and ideally egress), then add narrow allow-policies for the specific flows your app needs — e.g. “allow ingress to the API on port 8080 from pods labelled role=frontend,” and “allow egress to the database namespace on 5432 and to kube-dns on 53.” This turns a single compromised pod from a cluster-wide pivot point into a contained incident. Remember to allow DNS egress (UDP/TCP 53 to kube-system) or name resolution breaks in subtle ways.
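The baseline described above might look like the following three policies, all applied in the application's namespace. The policy names are hypothetical, app: orders-api is borrowed from the manifest later in this lesson, and the k8s-app: kube-dns label is the one commonly carried by the cluster DNS pods; verify it against your own cluster.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}                        # every pod in the namespace
  policyTypes: ["Ingress", "Egress"]     # no rules listed, so both directions are denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
spec:
  podSelector:
    matchLabels:
      app: orders-api
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: frontend             # only pods with this label, same namespace
      ports:
        - protocol: TCP
          port: 8080
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns          # assumed label of the cluster DNS pods
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

NetworkPolicies are additive: a connection is allowed if any policy selecting the pod permits it, so the two allow-policies punch narrow holes in the default-deny.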

Observability: metrics, logs and traces

You cannot operate what you cannot see. Production-ready means the three pillars are wired in from day one, not bolted on after the first incident:

Pillar | What it gives you | Common stack
Metrics | Aggregate health, alerting, autoscaling signals | Prometheus + Grafana; expose /metrics, set prometheus.io/scrape or a ServiceMonitor
Logs | Per-request detail, debugging | Write structured logs to stdout/stderr; collect with Fluent Bit/Loki/ELK
Traces | Latency across service hops | OpenTelemetry → Tempo/Jaeger

Three minimums: log to stdout/stderr (never to a file inside the container — the platform collects stdout), emit structured (JSON) logs so they are queryable, and expose application metrics including the RED signals (Rate, Errors, Duration) so you can define SLOs and drive the HPA on a meaningful signal. Wire metrics to your readiness/SLO story so alerts fire on user-visible symptoms, not just on pod restarts.
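If your cluster runs the Prometheus Operator, scraping /metrics is typically declared with a ServiceMonitor. The sketch below assumes the operator's CRDs are installed, that a Service labelled app: orders-api exposes a port named http, and that your Prometheus instance selects ServiceMonitors by a release: prometheus label; all three are assumptions to adapt.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: orders-api
  labels:
    release: prometheus        # hypothetical; must match your Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: orders-api          # selects the Service (not the pods) to scrape
  endpoints:
    - port: http               # name of the Service port
      path: /metrics
      interval: 30s
```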

Autoscaling: the HorizontalPodAutoscaler

The HorizontalPodAutoscaler (HPA) adds and removes pod replicas to track a target metric — most commonly CPU utilisation as a percentage of the pod’s CPU request (which is exactly why requests are non-negotiable: with no request, the HPA has nothing to compute a percentage against). It needs the metrics-server installed.

Key knobs: minReplicas/maxReplicas (the bounds), the target (e.g. averageUtilization: 70), and behavior (scale-up/down stabilisation windows and rate limits, to damp flapping). For metrics beyond CPU/memory — queue depth, requests-per-second, external signals — you graduate to KEDA, covered in Kubernetes Autoscaling: HPA, KEDA & Karpenter. Pair the HPA with a PDB and topology spread so scaling events keep replicas well distributed and respect disruption limits.
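The behavior knob is the least familiar of these, so here is a sketch of it as a fragment of an autoscaling/v2 HPA spec. The windows and rates are illustrative starting points, not tuned values: scale up quickly, scale down slowly.

```yaml
# Fragment of an autoscaling/v2 HorizontalPodAutoscaler spec.
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0      # react to load immediately
      policies:
        - type: Percent
          value: 100                     # at most double the replica count...
          periodSeconds: 60              # ...per 60-second window
    scaleDown:
      stabilizationWindowSeconds: 300    # act on the highest recommendation of the last 5 minutes
      policies:
        - type: Pods
          value: 1                       # remove at most one pod...
          periodSeconds: 60              # ...per 60-second window
```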

Figure: Kubernetes production-readiness checklist

The diagram groups every control above into the four readiness pillars — health & lifecycle, resources & scaling, resilience & disruption, and security & networking — so you can see at a glance which knob defends against which failure mode.

The copy-paste production-readiness checklist

Paste this into your pull-request template or a READINESS.md and tick each box before a workload carries real traffic.

PRODUCTION-READINESS CHECKLIST  (tick every box before go-live)

HEALTH & LIFECYCLE
[ ] Readiness probe defined; reflects "can serve traffic now" (warm-up + critical deps)
[ ] Liveness probe defined; cheap, local, no external dependency calls
[ ] Startup probe for slow-starting apps (so liveness can stay tight)
[ ] App traps SIGTERM and drains in-flight work before exit
[ ] preStop hook (sleep 5-15s) to cover async endpoint removal
[ ] terminationGracePeriodSeconds > preStop sleep + longest in-flight request

RESOURCES & SCALING
[ ] CPU + memory requests set on every container
[ ] Memory limit == memory request (predictable; avoid OOM surprises)
[ ] QoS class is Guaranteed or Burstable (never BestEffort for prod)
[ ] HPA configured with min/max and a meaningful target (CPU% or custom)
[ ] metrics-server (and Prometheus adapter / KEDA if custom metrics) installed
[ ] Namespace LimitRange + ResourceQuota in place (shared clusters)

RESILIENCE & DISRUPTION
[ ] replicas >= 2 (>=3 for quorum/HA services)
[ ] PodDisruptionBudget set (maxUnavailable or minAvailable) and not blocking drains
[ ] Topology spread across nodes (and zones) configured
[ ] RollingUpdate: maxUnavailable: 0 / maxSurge: 1 (capacity never dips), or justified
[ ] No single points of failure pinned to one node/zone

CONFIG & SECRETS
[ ] Config externalised to ConfigMap (no config baked into the image)
[ ] Secrets in Secret objects; encryption-at-rest enabled (or external store/CSI)
[ ] Secrets mounted as files where possible (not env); checksum annotation to roll on change

SECURITY
[ ] runAsNonRoot: true, runAsUser a high non-zero UID
[ ] allowPrivilegeEscalation: false; capabilities drop ALL
[ ] readOnlyRootFilesystem: true (+ emptyDir for writable paths)
[ ] seccompProfile: RuntimeDefault
[ ] Namespace at Pod Security 'restricted' (enforce)
[ ] Image pinned by digest; scanned; pulled from a trusted registry

NETWORKING
[ ] Default-deny NetworkPolicy in the namespace
[ ] Explicit allow rules for required ingress/egress (incl. DNS egress to kube-dns)

OBSERVABILITY
[ ] Logs to stdout/stderr, structured (JSON)
[ ] App metrics exposed (/metrics) incl. Rate/Errors/Duration
[ ] Dashboards + alerts on user-visible SLOs; tracing wired (OpenTelemetry)
[ ] Labels/annotations: app, version, owner, runbook link

A hardened Deployment manifest

This single manifest wires together almost every control above — probes, resources for a Guaranteed pod, graceful shutdown, a safe rolling update, externalised config, a full securityContext, and topology spread. Read it top to bottom; the inline comments map each block back to the checklist.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
  labels:
    app: orders-api
    version: "1.4.2"          # observability: every object carries app + version
spec:
  replicas: 3                  # resilience: >=3 so a PDB + spread are meaningful
  revisionHistoryLimit: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0        # capacity never dips below desired during a rollout
      maxSurge: 1              # one new (Ready) pod created before an old one goes
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
        version: "1.4.2"
      annotations:
        checksum/config: "REPLACED_BY_CI_WITH_HASH"  # roll pods when ConfigMap changes
    spec:
      terminationGracePeriodSeconds: 45   # > preStop sleep + longest in-flight request
      securityContext:                    # pod-level: applies to all containers
        runAsNonRoot: true
        runAsUser: 10001
        runAsGroup: 10001
        fsGroup: 10001
        seccompProfile:
          type: RuntimeDefault
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway      # spread across zones, best effort
          labelSelector:
            matchLabels:
              app: orders-api
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule       # hard: per-node replica counts may differ by at most 1
          labelSelector:
            matchLabels:
              app: orders-api
      containers:
        - name: orders-api
          image: registry.example.com/orders-api@sha256:<digest>  # pin by digest
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 8080
          envFrom:
            - configMapRef:
                name: orders-api-config        # externalised, non-sensitive config
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: orders-api-secrets     # sensitive value from a Secret
                  key: db-password
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "500m"                       # requests == limits => Guaranteed QoS
              memory: "512Mi"                   # memory limit == request avoids OOM surprises
          startupProbe:                         # generous one-time boot budget
            httpGet: { path: /healthz, port: http }
            periodSeconds: 5
            failureThreshold: 30                # up to 150s to start, then hand over
          readinessProbe:                       # gates Service endpoints
            httpGet: { path: /readyz, port: http }
            periodSeconds: 5
            timeoutSeconds: 2
            failureThreshold: 3
          livenessProbe:                        # cheap, local; restarts a wedged process
            httpGet: { path: /healthz, port: http }
            periodSeconds: 10
            timeoutSeconds: 2
            failureThreshold: 3
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]  # cover async endpoint removal
          securityContext:                      # container-level hardening
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - name: tmp
              mountPath: /tmp                   # writable path despite read-only root FS
      volumes:
        - name: tmp
          emptyDir: {}

Pair it with the three companion objects the checklist demands — a PDB, an HPA, and a default-deny NetworkPolicy:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: orders-api
spec:
  maxUnavailable: 1            # node drains take at most one replica at a time
  selector:
    matchLabels:
      app: orders-api
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orders-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orders-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # 70% of the pod's CPU request
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}               # selects every pod in the namespace
  policyTypes: ["Ingress"]      # deny all ingress; add explicit allow-policies next

Hands-on lab

You will harden a workload on a free local cluster, then prove each control works — watching a rollout stay up, a PDB block a drain, and a missing-request pod fail to autoscale. Roughly 25 minutes.

1. Create a cluster and a namespace

# kind (or: minikube start  /  k3d cluster create ready)
kind create cluster --name ready
kubectl create namespace shop
kubectl label namespace shop \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/warn=restricted

Labelling the namespace restricted means Pod Security Admission will reject any pod that is not hardened — a fast way to verify your manifest actually complies.
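A default kind cluster has a single node, which cannot show replicas spreading across nodes or a drain moving pods elsewhere. If you want to see those steps behave as described, one option is to create the cluster from a config file instead of the bare command above. The sketch below assumes a file named kind-multinode.yaml (a hypothetical name) and uses four workers, one more than the replica count, so that a surge or replacement pod has an empty node to land on under the hard per-node spread constraint.

```yaml
# kind-multinode.yaml (hypothetical file name)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
  - role: worker
  - role: worker
```

Create the cluster with kind create cluster --name ready --config kind-multinode.yaml, then create and label the namespace as before.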

2. Try an unhardened pod (and watch it get rejected)

kubectl -n shop run nginx --image=nginx:1.27

Expected: the request is denied with a message listing violations (allowPrivilegeEscalation != false, unrestricted capabilities, runAsNonRoot != true, seccompProfile). This is Pod Security doing its job — proof that “restricted” is enforced.

3. Deploy the hardened workload

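Save the hardened Deployment above as orders-api.yaml, together with the PDB and HPA. Swap the image for a runnable hardened one: ghcr.io/nginxinc/nginx-unprivileged:1.27 listens on 8080 and runs as non-root. Point all three probes at /, since that image serves neither /healthz nor /readyz. The manifest also references a ConfigMap named orders-api-config and a Secret named orders-api-secrets with a db-password key; create both in the shop namespace first (for example with kubectl create configmap and kubectl create secret generic), otherwise the pods stall in CreateContainerConfigError. Then apply: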

kubectl -n shop apply -f orders-api.yaml
kubectl -n shop rollout status deploy/orders-api
kubectl -n shop get pods -o wide        # confirm spread across nodes

Expected: three pods reach Running and READY 1/1. On a multi-node cluster the -o wide output shows them on different nodes (topology spread). Confirm the QoS class is Guaranteed:

kubectl -n shop get pod -l app=orders-api \
  -o jsonpath='{.items[0].status.qosClass}{"\n"}'
# -> Guaranteed

4. Watch a zero-downtime rollout

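# In terminal 1, expose the Deployment, then hammer the Service.
# The curl pod runs in the default namespace: shop enforces "restricted" and
# would reject a pod created by kubectl run without a securityContext (see step 2).
kubectl -n shop expose deploy/orders-api --port=80 --target-port=8080
kubectl -n default run curl --image=curlimages/curl --restart=Never -it --rm -- \
  sh -c 'while true; do curl -s -o /dev/null -w "%{http_code}\n" http://orders-api.shop; sleep 0.5; done'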

# In terminal 2, trigger a rollout:
kubectl -n shop set image deploy/orders-api orders-api=ghcr.io/nginxinc/nginx-unprivileged:1.26

Expected: the curl loop keeps printing 200 throughout — maxUnavailable: 0 plus a working readiness probe means no request is dropped.

5. Prove the PodDisruptionBudget protects you

NODE=$(kubectl -n shop get pod -l app=orders-api \
  -o jsonpath='{.items[0].spec.nodeName}')
kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data

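Expected: the drain evicts orders-api pods one at a time, waiting for replacements to become Ready, because maxUnavailable: 1 forbids taking down more than one at once. This needs a multi-node cluster: on a single-node cluster the replacements have nowhere to run, so the drain stalls after the first eviction. With a single replica and minAvailable: 1, this command would also block; that is the foot-gun to avoid. Uncordon when done: kubectl uncordon "$NODE".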

6. See why requests matter for autoscaling

kubectl -n shop describe hpa orders-api | grep -A3 Metrics

If metrics-server is installed you will see a CPU percentage; if you had omitted CPU requests, the HPA would report <unknown> and refuse to scale — the concrete reason requests are non-negotiable. (On kind, install metrics-server with --kubelet-insecure-tls to see live numbers.)

Cleanup

kubectl delete namespace shop
kind delete cluster --name ready     # or: minikube delete / k3d cluster delete ready

Cost note

Everything here runs on a free local cluster (kind/minikube/k3d) on your laptop — zero cloud spend. The only cost is the few hundred MB of RAM the control plane and three small pods use.

Common mistakes & troubleshooting

Symptom | Likely cause | Fix
Requests dropped during every rollout | No readiness probe, or it reports ready too early | Add an honest readiness probe gating real serving capability; set maxUnavailable: 0
Restart storm during a dependency outage | Liveness probe calls the slow/down dependency | Make liveness cheap and local; check dependencies in readiness, not liveness
Pod OOMKilled, restarts repeatedly | Memory limit set below the real working set, or a leak that grows past it | Set memory request == limit to the observed working set; right-size with VPA recommendations
kubectl drain hangs forever | PDB cannot be satisfied (e.g. single replica, minAvailable: 1) | Run >=2 replicas; relax PDB; or --disable-eviction only as a last resort
All replicas on one node; node drain caused an outage | No topology spread / anti-affinity | Add topology spread on kubernetes.io/hostname (and zone)
Pod rejected at apply with policy violations | Namespace enforces restricted; manifest not hardened | Add the full securityContext (non-root, drop ALL, seccomp, no priv-esc)
Requests succeed in steady state but a few fail around each deploy or scale-down | Connections to Terminating pods during async endpoint removal | Add a preStop sleep; ensure the app traps SIGTERM and drains
HPA shows <unknown> targets, never scales | No CPU/memory request, or metrics-server missing | Set requests; install metrics-server
Config change not picked up | ConfigMap updated but pods not restarted | Add a checksum/config annotation to the pod template to force a rollout
DNS resolution fails after adding NetworkPolicy | Default-deny egress blocks port 53 to kube-dns | Add an egress allow rule to kube-system DNS on UDP/TCP 53

Best practices

Security notes

Production readiness and security overlap heavily. Three points deserve emphasis. First, Secrets are base64-encoded, not encrypted, by default: anyone with get secret RBAC or etcd access can read them, so enable encryption at rest, prefer mounting over env vars, and consider an external store via the Secrets Store CSI driver. Second, the default flat network is a lateral-movement highway: a default-deny NetworkPolicy turns a single compromised pod into a contained incident instead of a cluster-wide pivot; just remember to allow DNS egress. Third, restricted Pod Security is the floor, not the ceiling: a non-root, read-only, capability-stripped pod with RuntimeDefault seccomp removes the most common container-escape and privilege-escalation paths, and Pod Security Admission enforces it cluster-side. Combine least-privilege RBAC, image provenance (signed, scanned, digest-pinned), and these pod-level controls for defence in depth.

Interview & exam questions

  1. What is the difference between a liveness and a readiness probe, and what happens when each fails? Liveness answers “is this container wedged?” — on failure the container is restarted. Readiness answers “should this pod get traffic?” — on failure the pod is removed from Service endpoints but not restarted. Liveness fixes hangs; readiness controls traffic during warm-up, overload or dependency loss.

  2. When and why would you add a startup probe? For slow-starting apps (JVM, legacy). It gives a generous one-time boot budget and gates liveness/readiness until it passes, so you can keep the liveness probe tight for the rest of the container’s life instead of inflating initialDelaySeconds.

  3. Why should a liveness probe never call an external dependency? If the dependency is slow or down, the probe fails, Kubernetes restarts the container, and you get a restart storm that makes recovery harder — turning a dependency blip into a self-inflicted outage. Liveness must be cheap and local.

  4. What determines a pod’s QoS class, and why does it matter? The relationship between requests and limits. Guaranteed = every container has requests equal to limits for both CPU and memory; Burstable = at least one request or limit is set, but the pod does not meet the Guaranteed criteria; BestEffort = no requests or limits set on any container. QoS sets eviction order under node memory pressure: BestEffort is evicted first, Guaranteed last.

  5. What happens when a container exceeds its CPU limit versus its memory limit? Over the CPU limit it is throttled (slowed, never killed). Over the memory limit it is OOMKilled and restarted. Hence: be cautious with CPU limits (throttling hurts latency); set memory limit equal to request for predictability.

  6. What does a PodDisruptionBudget protect against, and what does it not? It limits voluntary disruptions (drains, autoscaler scale-down, node-pool upgrades) so that too many replicas are not evicted at once. It does not protect against involuntary disruptions (node or hardware failure); spreading replicas with topology constraints or anti-affinity handles those. It is also only useful with more than one replica: with a single replica and minAvailable: 1, it simply blocks the drain.

  7. How do you achieve a zero-downtime rolling update? Run multiple replicas, set maxUnavailable: 0 and maxSurge: 1 (a new Ready pod before removing an old one), back it with an honest readiness probe, and implement graceful shutdown (preStop sleep + SIGTERM handling + adequate terminationGracePeriodSeconds).

  8. Why might requests still reach a pod after it enters Terminating? Endpoint removal propagates asynchronously through kube-proxy and ingress controllers, so for a short window a terminating pod can still be a target. Mitigate with a preStop sleep that delays SIGTERM until the removal has propagated, plus in-app connection draining.

  9. Prefer topology spread constraints or pod anti-affinity, and why? Topology spread constraints for new work — they express “spread evenly across this domain” directly with maxSkew, are cheaper for the scheduler, and support soft/hard via whenUnsatisfiable. Anti-affinity is the older, more expensive mechanism for keeping pods apart.

  10. How does the HorizontalPodAutoscaler use resource requests? CPU utilisation is computed as a percentage of the pod’s CPU request, so without a request the HPA has no denominator and reports <unknown>, refusing to scale. This is a key reason requests are mandatory. The HPA also needs metrics-server.

  11. What replaced PodSecurityPolicy, and how do you enforce hardening cluster-side? Pod Security Admission (PSA), applied per namespace via labels (pod-security.kubernetes.io/enforce: restricted, with warn/audit for migration). It enforces the Pod Security Standards (privileged / baseline / restricted); restricted requires non-root, dropped capabilities, seccomp RuntimeDefault, no privilege escalation, etc.

  12. What is the default pod-to-pod network behaviour, and how do you secure it? By default every pod can reach every other pod. Apply a default-deny NetworkPolicy (select all pods, deny ingress/egress), then add narrow allow-rules per required flow — remembering to allow DNS egress to kube-dns on port 53. Enforcement depends on a CNI that supports NetworkPolicy.
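
To make question 10 concrete, here is a minimal HorizontalPodAutoscaler sketch. The names `web` and `demo`, the replica bounds and the 70% target are illustrative assumptions; the target Deployment is assumed to set a CPU request on every container, and metrics-server is assumed to be installed.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web                      # hypothetical name
  namespace: demo                # hypothetical namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                    # the Deployment being scaled
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # percent of the CPU *request*, not of the limit or the node
```

If the containers in `web` have no CPU request, `kubectl get hpa` shows `<unknown>` for this target and no scaling happens.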

Quick check

  1. Which probe controls whether a pod appears in a Service’s endpoint list?
  2. A pod has CPU/memory requests equal to its limits. What QoS class is it, and where does it sit in eviction order?
  3. You set minAvailable: 1 on a single-replica Deployment and then run kubectl drain. What happens?
  4. What two rolling-update fields give you “never drop below desired capacity,” and what value does each take?
  5. Name the three minimum observability practices for a production workload.

Answers

  1. The readiness probe — on failure the pod is removed from Service endpoints (it is not restarted).
  2. Guaranteed, and it is evicted last under node memory pressure (most protected).
  3. The drain blocks indefinitely — evicting the only replica would breach minAvailable: 1, so the node cannot be drained. Run at least two replicas.
  4. maxUnavailable: 0 (no pod may be unavailable) and maxSurge: 1 (one extra Ready pod is created before an old one is removed).
  5. Log to stdout/stderr, emit structured (JSON) logs, and expose application metrics (Rate/Errors/Duration) for SLOs and autoscaling.
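
Answers 1 and 4 can be seen together in one manifest. The sketch below is a minimal example under stated assumptions: the image, the `/ready` path, port 8080 and the timing values are hypothetical, and the image is assumed to contain a `sleep` binary for the preStop hook.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                        # hypothetical name
  namespace: demo                  # hypothetical namespace
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0            # never drop below desired capacity
      maxSurge: 1                  # create one extra pod, wait for Ready, then remove an old one
  template:
    metadata:
      labels:
        app: web
    spec:
      # Must cover the preStop sleep plus the app's own drain time.
      terminationGracePeriodSeconds: 30
      containers:
        - name: web
          image: registry.example.com/web:1.0.0   # placeholder image
          ports:
            - containerPort: 8080
          readinessProbe:          # failing this removes the pod from endpoints, no restart
            httpGet:
              path: /ready         # assumed endpoint
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "10"]   # delay SIGTERM while endpoint removal propagates
```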

Exercise

Take an unhardened Deployment of your choice (or the bare nginx from the lab) and bring it to production-readiness against the checklist, proving each control:

  1. Add liveness, readiness and startup probes pointing at real endpoints; demonstrate that failing readiness drops the pod from Service endpoints (kubectl get endpoints) without a restart.
  2. Set requests and limits to land the pod in Guaranteed QoS; verify with kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'.
  3. Scale to three replicas, add a PDB (maxUnavailable: 1) and topology spread across nodes; drain a node and show eviction proceeds one pod at a time.
  4. Configure maxUnavailable: 0/maxSurge: 1, add a preStop sleep and a sensible grace period, and show a rollout that keeps a curl loop returning 200 throughout.
  5. Move all config to a ConfigMap and any secret to a Secret; apply the full restricted securityContext and confirm the pod is admitted into a restricted namespace.
  6. Add a default-deny NetworkPolicy plus the minimum allow-rules (including DNS egress) and confirm the app still works.
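
For step 3, a starting point might look like the sketch below. It assumes a Deployment whose pods carry the label `app: web` in a namespace called `demo`; both names are hypothetical. The second document is a fragment to merge into the Deployment's pod template, not a complete object.

```yaml
# PodDisruptionBudget: at most one matching pod may be voluntarily evicted at a time.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web                        # hypothetical name
  namespace: demo                  # hypothetical namespace
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: web
---
# Fragment: merge under the Deployment's spec.template.spec.
spec:
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                           # per-node pod counts may differ by at most 1
          topologyKey: kubernetes.io/hostname  # spread across nodes
          whenUnsatisfiable: DoNotSchedule     # hard constraint; ScheduleAnyway makes it soft
          labelSelector:
            matchLabels:
              app: web
```

On a small lab cluster with fewer nodes than replicas, `ScheduleAnyway` may be the more forgiving choice, since a hard constraint can leave pods Pending.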

Write a short READINESS.md recording which checklist items you completed and the command that proves each one — exactly what a reviewer would ask for.

Certification mapping

| Exam | Where this lesson maps |
| --- | --- |
| CKAD | Application Design and Build (probes, multi-container patterns, config), Application Deployment (rolling updates, deployment strategies), Application Observability and Maintenance (probes, logging, monitoring), Services & Networking (NetworkPolicy); this is core CKAD territory |
| CKA | Workloads & Scheduling (deployments, rolling updates, resource limits, PDBs, topology), Services & Networking (NetworkPolicy), Troubleshooting (probe and resource failures) |
| CKS | Minimize Microservice Vulnerabilities (securityContext, Pod Security Standards), System Hardening and Cluster Hardening (NetworkPolicy default-deny, least privilege) |
| KCNA | Conceptual coverage of probes, resources, scaling and observability for the entry-level exam |

Glossary

Next steps

You can now take any workload from “it runs” to “it is production-ready.” Next, learn to build and operate the cluster itself — HA control planes, etcd backup and safe upgrades — in Provisioning Production Kubernetes: kubeadm, HA Control Plane, etcd Backup & Upgrades. To go deeper on individual controls, see Kubernetes Autoscaling: HPA, KEDA & Karpenter, Right-Sizing with the Vertical Pod Autoscaler, Default-Deny Network Policies with Cilium, and Pod Security Admission: Baseline to Restricted.


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
