Kubernetes Interview & Certification Prep: KCNA / CKAD / CKA / CKS Roadmap

You have worked through the course, deployed apps to a local cluster, broken things, and fixed them. Now you want two outcomes: pass a Kubernetes interview without freezing on the “a pod is stuck, what do you do?” question, and turn that knowledge into a certification that hiring managers actually recognise. This lesson is the bridge. It maps the four CNCF certifications to where you are, points each exam topic back to a lesson you have already done, and then drills the questions interviewers really ask — with model answers you can say out loud.

The most important thing to internalise up front: the CKAD, CKA, and CKS are hands-on, terminal exams. There are no multiple-choice questions. You are dropped into a live cluster and told to make something work, fast. So is a good technical interview, really — the strongest answers describe a procedure, not a definition. We will practise both.

Learning objectives

By the end of this lesson you will be able to:

Prerequisites & where this fits

This is the final lesson of the Kubernetes Zero-to-Hero course. It assumes you have done the fundamentals — containers and images, the control-plane/node architecture, the core objects (Pods, Deployments, Services), and the kubectl apply workflow — and ideally the capstone, where you shipped a small multi-service app with autoscaling, network policy, and GitOps. You do not need to memorise anything new here. The goal is to consolidate what you know into interview- and exam-ready form. Everything in the labs uses free, local tooling: Docker (or Podman) plus a local cluster with kind, minikube, or k3d. Nothing to pay for.

The CNCF certification ladder

The Cloud Native Computing Foundation (CNCF) and the Linux Foundation run a coherent ladder of Kubernetes certifications. Think of it as one entry-level knowledge check, two role-based practical exams, and one advanced security specialisation.

The Kubernetes certification path: KCNA entry, then CKAD/CKA, then CKS

The diagram above shows the progression and the gating: KCNA is the optional on-ramp, CKAD and CKA are the two parallel role-based exams most people target, and CKS sits on top — and crucially, you must hold a current CKA before you are allowed to sit the CKS.

| Cert | Full name | Format | Length | Passing | Who it’s for |
|---|---|---|---|---|---|
| KCNA | Kubernetes and Cloud Native Associate | Multiple choice (proctored, online) | 90 min | ~75% | Newcomers, managers, career-switchers wanting a credible foundation |
| CKAD | Certified Kubernetes Application Developer | Hands-on, live cluster terminal | 2 hours | 66% | Developers who deploy to and run on Kubernetes |
| CKA | Certified Kubernetes Administrator | Hands-on, live cluster terminal | 2 hours | 66% | Operators / platform / SRE who run clusters |
| CKS | Certified Kubernetes Security Specialist | Hands-on, live cluster terminal | 2 hours | 67% | Security-focused engineers hardening clusters (requires active CKA) |

A few details that matter for planning:

Which one should you take first?

For most engineers coming out of this course, the honest answer is: CKAD or CKA — skip straight to a hands-on exam, because that is what proves you can do the job and it is what this course trained you for. KCNA is worthwhile if you want a low-stakes confidence builder, you are non-technical-but-adjacent (a manager or PM), or your employer reimburses it. Pick CKAD if you spend your day writing app manifests, Helm charts, and debugging your own workloads; pick CKA if you operate clusters — nodes, etcd, upgrades, RBAC, networking. Then, if security is your path, do CKS last.

Topic-to-lesson map

Here is the payoff for finishing the course: nearly every exam domain is something you have already practised. Use this table to find your weak spots and re-read the matching lesson before exam day.

| Exam domain | Certs | Where you learned it in this course |
|---|---|---|
| Containers, images, layers, registries | KCNA, CKAD | Containers & Docker Basics |
| Control plane, nodes, etcd, kubelet, the reconciliation loop | KCNA, CKA | What Is Kubernetes? Architecture |
| Pods, ReplicaSets, Deployments, rolling updates & rollback | KCNA, CKAD, CKA | Pods, Deployments & Services |
| Services, ClusterIP/NodePort/LoadBalancer, label selectors | CKAD, CKA | Pods, Deployments & Services |
| ConfigMaps, Secrets, namespaces | CKAD, CKA | Pods, Deployments & Services |
| kubectl, kubeconfig/contexts, imperative vs declarative, logs/exec/port-forward | CKAD, CKA | kubectl First Steps |
| Health probes, resource requests/limits, autoscaling (HPA) | CKAD, CKA | Capstone + Autoscaling: HPA, KEDA, Karpenter |
| Ingress / Gateway API, traffic routing | CKAD, CKA | Gateway API: HTTPRoute & traffic splitting |
| RBAC, least privilege, ServiceAccounts | CKA, CKS | Least-Privilege RBAC design |
| NetworkPolicy, default-deny, zero-trust pod networking | CKS | Default-Deny NetworkPolicies & Cilium |
| Pod Security Admission, supply-chain, image signing, policy-as-code | CKS | Kyverno policy-as-code + Pod Security Admission |
| GitOps deployment workflow | (job skill) | GitOps with Argo CD |

The two CKA-only gaps the course touches only lightly are cluster lifecycle (kubeadm upgrades, etcd backup/restore) and node troubleshooting (a down kubelet, full disk). Those are pure exam-prep topics — practise them directly against the official docs, because they rarely come up day-to-day on a managed cluster like AKS or EKS.

How to think in a Kubernetes interview

Before the question bank, internalise the meta-skill. Almost every Kubernetes troubleshooting question — in an interview and on the exam — yields to the same loop:

  1. Describe the object, top-down: kubectl get to see status, then kubectl describe to read the Events at the bottom. Events are where Kubernetes tells you, in plain English, why it is unhappy.
  2. Read the logs if the container actually started: kubectl logs <pod>, and kubectl logs <pod> --previous for the crashed instance.
  3. Follow the chain of ownership: Deployment → ReplicaSet → Pod → Node, and Service → Endpoints → Pod. Most “it doesn’t work” problems are a broken link in one of those chains.

Say that loop out loud in interviews. Interviewers are not grading whether you memorised a flag — they are grading whether you have a method that does not panic. “First I’d describe the pod and read the events, then check logs with --previous…” beats any amount of trivia.
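As a rough sketch, the loop above can be wrapped in one shell helper so you run it the same way every time. The function name ktriage and its arguments are hypothetical; it uses only standard kubectl subcommands and assumes you care about the pod's first owner reference.

```shell
# ktriage: hypothetical helper that walks the describe -> logs -> owner loop.
# Usage: ktriage <pod> [namespace]
ktriage() {
  local pod="$1" ns="${2:-default}"

  echo "== 1. status =="
  kubectl get pod "$pod" -n "$ns" -o wide

  echo "== 2. events (the bottom of describe) =="
  kubectl describe pod "$pod" -n "$ns" | sed -n '/^Events:/,$p'

  echo "== 3. logs from the previous (crashed) instance =="
  # --previous fails if the container never restarted; fall back to current logs.
  kubectl logs "$pod" -n "$ns" --previous --tail=50 2>/dev/null \
    || kubectl logs "$pod" -n "$ns" --tail=50

  echo "== 4. owner and node =="
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name} on {.spec.nodeName}{"\n"}'
}
```

Calling ktriage web-abc123 team-payments prints the four stages in order. In an interview, narrate the same four stages rather than the helper itself.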

Interview questions (with model answers)

These five scenarios cover the overwhelming majority of “debug this” interview rounds and CKA/CKAD troubleshooting tasks. For each: the symptom, how to diagnose, and the likely fixes.

1. “A pod keeps restarting — CrashLoopBackOff. Walk me through it.”

Model answer. CrashLoopBackOff means the container starts, exits, and Kubernetes keeps restarting it with an increasing back-off delay — so the problem is the container process, not scheduling. My procedure:

kubectl describe pod <pod>          # check Events + Last State + exit code
kubectl logs <pod> --previous       # logs from the crashed instance, not the restarting one

I read the exit code first. Exit Code 1 (or any other non-zero application error) means the application itself crashed: bad config, a missing env var or Secret, a database it can’t reach, or an unhandled exception on startup; the logs will say which. Exit Code 137 means the process received SIGKILL (128 + 9); the usual cause is an OOM kill after exceeding the memory limit, which describe confirms with Reason: OOMKilled under Last State, and then I’d raise the memory limit or fix the leak. Exit Code 127 means “command not found”: a bad command/args or an entrypoint that isn’t in the image. A subtle one: if a liveness probe is failing, the kubelet kills and restarts the container even though the app is fine, so I always check whether the probe’s path, port, and initialDelaySeconds are realistic. The fix follows the cause: correct the config/Secret, fix the image entrypoint, bump memory, or loosen the probe.
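To make the exit-code reading concrete, here is a small sketch. Both function names are hypothetical, the hints are first guesses rather than a diagnosis, and last_exit_code assumes the pod's first container is the one that crashed.

```shell
# explain_exit_code: hypothetical helper mapping common exit codes to a first guess.
explain_exit_code() {
  local code="$1"
  case "$code" in
    0)   echo "process exited cleanly; under restartPolicy Always it must keep running" ;;
    1)   echo "application error; read the logs with --previous" ;;
    126) echo "command found but not executable; check file permissions" ;;
    127) echo "command not found; check command/args and the image entrypoint" ;;
    137) echo "SIGKILL (128+9); often OOMKilled, confirm the Last State reason" ;;
    143) echo "SIGTERM (128+15); the container was asked to stop" ;;
    *)
      # Codes above 128 conventionally mean "killed by signal (code - 128)".
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "killed by signal $((code - 128))"
      else
        echo "non-zero application exit; read the logs with --previous"
      fi ;;
  esac
}

# last_exit_code: exit code of the previous instance of the first container.
# Usage: last_exit_code <pod> [namespace]
last_exit_code() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
}
```

The two compose as explain_exit_code "$(last_exit_code <pod>)", which is a quick way to decide whether to open the logs or the resource limits first.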

2. “I created a Service but nothing reaches it. The Service has no endpoints.”

Model answer. “No endpoints” almost always means the Service’s selector doesn’t match any running, ready pods. The Service is just a label query; if the query returns nothing, there is nowhere to route. I check the chain:

kubectl get endpoints <svc>                 # empty (or <none>) confirms it
kubectl describe svc <svc>                   # note the Selector
kubectl get pods --show-labels               # do any pod labels match that selector?

The usual causes, in order of frequency: (1) selector/label mismatch, e.g. the Service selects app=web but the Deployment’s pod template labels them app=frontend; (2) the pods exist but are not Ready because a readiness probe is failing; an unready pod is deliberately removed from endpoints, which is the system working as designed. A related failure looks similar from the client side: (3) targetPort mismatch, where the Service forwards to a port the app isn’t actually listening on, so endpoints populate but connections still fail. Fix: align the labels (or the selector), get the readiness probe passing, and confirm targetPort is the port the container really listens on.
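The selector-versus-labels comparison can be done in one pass. The sketch below is a hypothetical helper (svc_check is not a kubectl command); it renders the Service's selector as a label query and asks which pods match it.

```shell
# svc_check: hypothetical helper; compares a Service's selector with the pods it should match.
# Usage: svc_check <service> [namespace]
svc_check() {
  local svc="$1" ns="${2:-default}" selector

  # Render the selector map as "k1=v1,k2=v2" so it can be passed to -l.
  selector="$(kubectl get svc "$svc" -n "$ns" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')"
  selector="${selector%,}"   # drop the trailing comma

  if [ -z "$selector" ]; then
    echo "service $svc has no selector; endpoints must be managed manually"
    return 1
  fi

  echo "selector: $selector"
  echo "-- pods matching the selector (READY column shows readiness) --"
  kubectl get pods -n "$ns" -l "$selector" -o wide
  echo "-- endpoints the Service currently has --"
  kubectl get endpoints "$svc" -n "$ns"
}
```

If the pod list is empty, the labels are wrong. If pods are listed but show 0/1 READY, the readiness probe is the problem.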

3. “A pod is stuck in Pending and never schedules.”

Model answer. Pending means the scheduler hasn’t been able to place it on a node, so this is a scheduling problem, not an application one. The events tell you exactly why:

kubectl describe pod <pod>     # the Events line says e.g. "0/3 nodes are available: ..."

The common reasons the scheduler reports: (1) insufficient resources — “Insufficient cpu/memory”; the pod’s requests exceed what any node has free, so I’d lower the requests or add/scale nodes; (2) taints with no matching toleration — “node(s) had untolerated taint”; add the right toleration or target a different node pool; (3) node affinity / nodeSelector matches nothing — the constraint is too strict; (4) an unbound PersistentVolumeClaim — the pod is waiting on storage that can’t be provisioned (no matching PV or StorageClass); (5) all nodes cordoned/unschedulable. The fix is dictated by the message — and the headline lesson for the interviewer is that you let describe tell you, rather than guessing.
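A sketch of how to gather the evidence for each of those reasons at once. The helper name why_pending is hypothetical; the kubectl flags are standard.

```shell
# why_pending: hypothetical helper for a pod stuck in Pending.
# Usage: why_pending <pod> [namespace]
why_pending() {
  local pod="$1" ns="${2:-default}"

  echo "-- scheduler verdict --"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$pod,reason=FailedScheduling" \
    --sort-by=.lastTimestamp

  echo "-- what the pod asks for --"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.resources.requests}{"\n"}{end}'

  echo "-- what the nodes offer (allocatable, taints, cordon state) --"
  kubectl get nodes -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEM:.status.allocatable.memory,TAINTS:.spec.taints[*].key,UNSCHEDULABLE:.spec.unschedulable'

  echo "-- claims that may still be unbound --"
  kubectl get pvc -n "$ns"
}
```

Note that allocatable is a node's total schedulable capacity, not what is currently free. For the remaining headroom, read the Allocated resources block in kubectl describe node.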

4. “A user (or a CI pipeline) gets Error from server (Forbidden). Diagnose the RBAC.”

Model answer. RBAC in Kubernetes is purely additive with no deny rules, so Forbidden simply means no RoleBinding or ClusterRoleBinding grants this subject that verb on that resource in that namespace. I don’t guess — I ask the API server with kubectl auth can-i:

kubectl auth can-i create deployments -n team-payments \
  --as system:serviceaccount:team-payments:ci-deployer

That impersonation (--as, and --as-group for groups) reproduces the exact decision. Then I locate the gap: confirm which Role/ClusterRole grants the missing apiGroup/resource/verb, and check that a binding actually ties it to this subject in the right namespace — a frequent bug is a Role and RoleBinding created in default when the workload runs in another namespace. The fix is least-privilege: add the specific verb to a scoped Role and bind it with a RoleBinding, rather than reaching for cluster-admin. (I’d reference the Least-Privilege RBAC approach here.) The trap to call out: a RoleBinding referencing a ClusterRole grants those permissions only in the binding’s namespace, which surprises people.
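A minimal sketch of the least-privilege fix for the example above, using kubectl's imperative generators. The Role name deployment-editor and the verb list are assumptions; grant only what your pipeline actually needs.

```shell
# grant_ci_deployer: sketch of a scoped fix for the Forbidden error.
# Namespace and ServiceAccount come from the example; the Role name is hypothetical.
grant_ci_deployer() {
  local ns="team-payments" sa="ci-deployer" role="deployment-editor"

  # A namespaced Role with only the verbs the pipeline is assumed to need.
  kubectl create role "$role" -n "$ns" \
    --verb=get,list,create,update,patch \
    --resource=deployments.apps

  # Bind it to the ServiceAccount in the same namespace.
  kubectl create rolebinding "${role}-binding" -n "$ns" \
    --role="$role" \
    --serviceaccount="$ns:$sa"

  # Re-ask the API server; this should now answer "yes".
  kubectl auth can-i create deployments -n "$ns" \
    --as "system:serviceaccount:$ns:$sa"
}
```

In a GitOps setup you would append --dry-run=client -o yaml to the two create commands and commit the output instead of applying it directly.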

5. “A Deployment rollout is stuck — kubectl rollout status never completes.”

Model answer. A stuck rollout means the new ReplicaSet’s pods aren’t becoming Ready, so the Deployment won’t finish swapping the old ones out. I look at it from both ends:

kubectl rollout status deployment/<name>     # confirms it's wedged
kubectl get rs -l app=<name>                  # old vs new ReplicaSet, desired/current/ready
kubectl describe deployment <name>            # conditions: Progressing / Available
kubectl get pods -l app=<name>                # what state are the NEW pods in?

The new pods being stuck points at the real failure, which is usually one of the scenarios above: a bad image tag (ImagePullBackOff), a crash on startup (CrashLoopBackOff), a failing readiness probe (pods never go Ready so the rollout waits forever), or Pending due to resources. There’s also a quota angle: with the default RollingUpdate strategy and maxSurge, the new pods need headroom — if a ResourceQuota or node capacity blocks the surge, the rollout stalls. I diagnose the new pods, fix the underlying cause, and if I need to stop the bleeding immediately I roll back: kubectl rollout undo deployment/<name>. Mentioning kubectl rollout undo unprompted signals you’ve operated this in anger.
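One way to script that inspection is sketched below. The helper name rollout_triage is hypothetical, and it deliberately prints the rollback command instead of running it, so the decision stays with you.

```shell
# rollout_triage: hypothetical helper for a wedged Deployment rollout.
# Usage: rollout_triage <deployment> [namespace]
rollout_triage() {
  local deploy="$1" ns="${2:-default}"

  # Bounded wait so the command returns instead of hanging.
  if kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=30s; then
    echo "rollout healthy"
    return 0
  fi

  echo "-- ReplicaSets (old vs new) --"
  kubectl get rs -n "$ns" -o wide
  echo "-- pods (check READY, STATUS and RESTARTS on the newest ones) --"
  kubectl get pods -n "$ns" -o wide
  echo "-- revision history --"
  kubectl rollout history "deployment/$deploy" -n "$ns"
  echo "to roll back: kubectl rollout undo deployment/$deploy -n $ns"
  return 1
}
```

The pod listing is left unfiltered on purpose: a pod in CrashLoopBackOff usually still reports phase Running, so filtering on phase would hide it.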

A sixth question shows up constantly: “imperative vs declarative — which and why?” Strong answer: declarative (kubectl apply -f against version-controlled YAML, ideally via GitOps) is the production default because it’s reproducible, reviewable, and self-documenting. Imperative kubectl create/run/expose is for speed — scaffolding, quick debugging, and especially the exam, where you generate YAML fast with --dry-run=client -o yaml and then edit it.

Exam-day tips

The CKAD/CKA/CKS are won on speed and accuracy under time pressure. The knowledge is necessary but not sufficient — these mechanics are what separate a pass from a near-miss.

Set up your aliases in the first 60 seconds. Every cluster gives you a fresh shell. Type this once and save minutes across 17 questions:

alias k=kubectl
export do="--dry-run=client -o yaml"   # "do" = dry-run output
export now="--force --grace-period=0"  # delete pods instantly
source <(kubectl completion bash)      # tab-completion
complete -o default -F __start_kubectl k

Generate, don’t type, YAML. Hand-writing manifests is slow and typo-prone. Scaffold with the imperative generators plus $do, then edit:

k run nginx --image=nginx $do > pod.yaml
k create deployment web --image=nginx --replicas=3 $do > deploy.yaml
k create svc clusterip web --tcp=80:8080 $do > svc.yaml
k create cronjob hello --image=busybox --schedule="*/1 * * * *" $do -- echo hi > cj.yaml

Use kubectl explain instead of guessing field names. It needs no browser tab: it reads the schema from the cluster’s API server, so it works inside the exam terminal. k explain pod.spec.containers.resources or k explain deployment.spec.strategy --recursive gives you the exact schema, which is usually faster than hunting through the docs tab.

Manage time ruthlessly. There are ~15–20 weighted tasks in 2 hours, which is roughly 6–8 minutes each, but the weights differ. Read the weight on each question. Triage: skip anything you can’t crack in ~2 minutes and flag it; bank the easy points first. A 2% question and a 13% question can cost the same time, so prioritise the 13% ones. Always run kubectl config use-context <ctx> from the question prompt before you touch anything, and after a change, verify it (k get, k rollout status); a task that “looks done” but didn’t apply earns zero.

Bookmark the docs you’ll actually open: the YAML examples for Pods, Deployments, Services, Ingress, PV/PVC, NetworkPolicy, and (for CKS) Pod Security and the Trivy/Falco pages. In the exam you copy-paste-and-edit from these constantly. Knowing where a snippet lives is a graded skill in disguise.

Common mistakes & troubleshooting

These trip people up both in interviews and on the exam itself.

| Symptom / mistake | Cause | Fix |
|---|---|---|
| Edited a manifest but nothing changed | Forgot to kubectl apply it, or applied in the wrong context | Re-apply; run kubectl config current-context first, every time |
| “Why no endpoints?” panic | Reading the Service in isolation | Always check kubectl get endpoints + pod labels + readiness together |
| Burned 15 min hand-typing YAML | Not using generators | <cmd> --dry-run=client -o yaml > f.yaml, then edit |
| logs shows nothing useful on a crash | Reading the current (restarting) container | Use kubectl logs <pod> --previous |
| Widened RBAC to cluster-admin to “make it work” | Treating Forbidden as a blocker, not a diagnosis | kubectl auth can-i ... --as ..., then grant the specific verb |
| Rollout “stuck” but you only looked at the Deployment | The new pods are the real problem | kubectl get pods -l app=<name> and diagnose those |

Best practices

Security notes

The CKS deserves a specific mention because its mindset differs from CKA/CKAD: it assumes the cluster is already compromised and asks how you’d limit the blast radius. The recurring themes are least-privilege RBAC (no standing cluster-admin, scoped Roles, auth can-i audits), default-deny NetworkPolicies so a popped pod can’t talk laterally, Pod Security Admission at restricted to stop privileged/host-mounting pods, supply-chain controls (image signing, scanning with Trivy, admission policy with Kyverno/Gatekeeper), and runtime detection with Falco. If you’re heading for CKS, treat the security lessons in this course as core, not optional — start with default-deny networking and Kyverno policy-as-code.
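Two of those themes fit in a few lines. The sketch below writes a default-deny NetworkPolicy and defines a hypothetical helper, harden_namespace, that applies it and sets the Pod Security Admission enforce label. It assumes the cluster's CNI actually enforces NetworkPolicy.

```shell
# Write a default-deny NetworkPolicy; the namespace is supplied at apply time.
cat > default-deny.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}          # empty selector = every pod in the namespace
  policyTypes:
    - Ingress
    - Egress               # no rules listed, so both directions are denied
EOF

# harden_namespace: hypothetical helper; applies the policy and enforces
# the "restricted" Pod Security Standard on one namespace.
# Usage: harden_namespace <namespace>
harden_namespace() {
  local ns="$1"
  kubectl apply -n "$ns" -f default-deny.yaml
  kubectl label namespace "$ns" \
    pod-security.kubernetes.io/enforce=restricted --overwrite
}
```

Denying all egress also blocks DNS lookups, so in practice the next policy you write is usually an allow rule for the cluster DNS service.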

Quick check

  1. Which Kubernetes certifications are hands-on terminal exams, and which is multiple choice?
  2. Your Service has no endpoints. What is the single most likely cause, and what one command confirms it?
  3. A pod is Pending. Which single kubectl command tells you why, and where in its output do you look?
  4. What does --dry-run=client -o yaml do, and why is it the most important flag on exam day?
  5. RBAC has no deny rules. Given that, what does an Error from server (Forbidden) actually mean, and how do you reproduce the decision for a specific ServiceAccount?

Answers

  1. CKAD, CKA, and CKS are hands-on, live-cluster terminal exams. KCNA is multiple choice. (CKS additionally requires an active CKA before you may sit it.)
  2. The selector doesn’t match any ready pods (label mismatch, or pods not Ready). Confirm with kubectl get endpoints <svc> (empty), then compare kubectl describe svc <svc> Selector against kubectl get pods --show-labels.
  3. kubectl describe pod <pod> — read the Events section at the bottom; it states the scheduling reason verbatim (e.g. “Insufficient cpu”, “untolerated taint”, unbound PVC).
  4. It renders a manifest locally without contacting the API server and prints it as YAML — so you scaffold a correct object instantly (> file.yaml) and edit, instead of hand-typing. On the exam it’s the fastest path to nearly any “create an X” task.
  5. It means no binding grants that subject the verb/resource/namespace combination — access is purely additive, so the permission was simply never granted. Reproduce it with kubectl auth can-i <verb> <resource> -n <ns> --as system:serviceaccount:<ns>:<sa>.

Exercise

Mock troubleshooting drill (timed, free, local). Recreate four of the failure modes from the interview scenarios on a local cluster and fix each one against the clock. This is the single best exam rehearsal.

  1. Create a local cluster (free / local):

    kind create cluster --name cka-drill
    kubectl config use-context kind-cka-drill
    
  2. Break things on purpose. Apply a small manifest that contains four planted faults: a Deployment whose container uses a non-existent image tag (nginx:doesnotexist → ImagePullBackOff); a Service whose selector is app=web while the pod template labels are app=frontend (→ no endpoints); a pod requesting cpu: "64" (→ Pending, insufficient resources); and a Deployment with a liveness probe pointing at the wrong port (→ CrashLoopBackOff-style restarts).

  3. Diagnose each, narrating the loop: kubectl get → kubectl describe (read Events) → kubectl logs --previous. Write down the root cause for each before you touch the fix. Time yourself: aim for under 6 minutes per fault.

  4. Fix and verify: correct the image tag, align the Service selector to the pod labels, lower the CPU request, fix the probe port. Verify with kubectl get endpoints, kubectl rollout status, and kubectl get pods (all Running/Ready).

  5. Self-assess against this rubric:

    | Criterion | Target |
    |---|---|
    | Found root cause from describe/logs (not by guessing) | All 4 |
    | Fixed each via an edited manifest, then re-applied | All 4 |
    | Verified Ready/endpoints after each fix | All 4 |
    | Whole drill completed | Under 25 minutes |
  6. Cleanup (so you pay nothing and leave no clusters running):

    kind delete cluster --name cka-drill
    

Cost note: free / local. kind runs the whole cluster in Docker on your laptop — no cloud account, no charges.
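For step 2 of the drill, one possible manifest with the four planted faults is sketched below. Every object name (bad-image, frontend, web, greedy, bad-probe) is a placeholder chosen for the exercise, and the probe timings are arbitrary small values so the restarts show up quickly.

```shell
# Write the drill manifest; all names are placeholders for the exercise.
cat > broken.yaml <<'EOF'
# Fault 1: image tag that does not exist -> ImagePullBackOff
apiVersion: apps/v1
kind: Deployment
metadata: {name: bad-image}
spec:
  replicas: 1
  selector: {matchLabels: {app: bad-image}}
  template:
    metadata: {labels: {app: bad-image}}
    spec:
      containers:
        - {name: nginx, image: "nginx:doesnotexist"}
---
# Fault 2: Service selects app=web, pods are labelled app=frontend -> no endpoints
apiVersion: apps/v1
kind: Deployment
metadata: {name: frontend}
spec:
  replicas: 1
  selector: {matchLabels: {app: frontend}}
  template:
    metadata: {labels: {app: frontend}}
    spec:
      containers:
        - {name: nginx, image: nginx}
---
apiVersion: v1
kind: Service
metadata: {name: web}
spec:
  selector: {app: web}
  ports:
    - {port: 80, targetPort: 80}
---
# Fault 3: CPU request no laptop node can satisfy -> Pending
apiVersion: v1
kind: Pod
metadata: {name: greedy}
spec:
  containers:
    - name: nginx
      image: nginx
      resources: {requests: {cpu: "64"}}
---
# Fault 4: liveness probe on a port nothing listens on -> repeated restarts
apiVersion: apps/v1
kind: Deployment
metadata: {name: bad-probe}
spec:
  replicas: 1
  selector: {matchLabels: {app: bad-probe}}
  template:
    metadata: {labels: {app: bad-probe}}
    spec:
      containers:
        - name: nginx
          image: nginx
          livenessProbe:
            httpGet: {path: /, port: 8080}
            initialDelaySeconds: 3
            periodSeconds: 3
EOF

# Then, against the drill cluster:
#   kubectl apply -f broken.yaml
```

Start the clock only after the apply succeeds, and resist reading the comments in the file while you diagnose; the point is to find each fault from describe and logs.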

Certification mapping

This lesson is the meta-lesson for the whole certification ladder, so the mapping is the ladder itself:

Glossary

Next steps

You’ve finished the Kubernetes Zero-to-Hero course. To keep going beyond the certifications and into production-grade, real-world Kubernetes, read these next:
