Capstone: Ship a Production-Grade App on Kubernetes (GitOps + Autoscaling + Observability)

You have built images, learned the control plane, met Pods and Deployments and Services, and run kubectl apply against a local cluster. This is where it all comes together. The capstone is one project: take a small but realistic multi-service application and ship it to a production-shaped cluster — not a toy, but a cluster wearing the same clothes a real one wears. Namespaces with quotas. A front door (Ingress or Gateway API). Config and secrets kept out of images. Autoscaling that reacts to load. A default-deny network. Policy guardrails that reject sloppy manifests. Git as the source of truth, reconciled by Argo CD. And enough observability that you can answer “is it healthy?” without guessing.

You will do all of this free and local on a kind cluster. The point is not to operate a cluster forever — it is to prove, end to end, that you can take an app from a Dockerfile to a self-healing, autoscaling, policy-governed, GitOps-managed deployment, and to leave you with a project you can talk about in an interview and adapt to a real managed cluster (EKS/AKS/GKE) on day one.

Learning objectives

By the end of this capstone you can:

Decompose a multi-service app into a sound Kubernetes design — namespaces, Deployments/Services, a front door, config/secrets, autoscaling, network policy, and policy guardrails.
Author and apply production-grade YAML/Helm: health probes, resource requests/limits, an HPA, a default-deny NetworkPolicy, and a Kyverno policy.
Stand up GitOps with Argo CD so the cluster reconciles from Git, and drive it with argocd and kubectl.
Add observability (metrics + a dashboard) and use it, plus probes and events, to judge health.
Verify your work against explicit acceptance criteria, score yourself on a rubric, and produce a presentable capstone deliverable.
Map every piece to the deeper KloudVin articles and to CKA/CKAD exam objectives so you know where to go next.

Prerequisites & where this fits

This is the final lesson of the Kubernetes Zero-to-Hero course and assumes you’ve done the four fundamentals lessons (or have equivalent hands-on time): containers and Docker; Kubernetes architecture; Pods, Deployments and Services; and kubectl and your first cluster. You need Docker Desktop (or Podman) and kubectl installed; we install kind, helm, and argocd in the lab. Everything runs on your laptop. Each production concept here has a dedicated deep-dive article. This lesson is the integration: it shows how the pieces fit, then points you at the article that goes ten levels deeper on each one.

The brief: what you’re shipping

The application is deliberately small but multi-service, because single-Deployment demos hide every interesting problem (service discovery, network policy, per-service scaling). Three services:

Service	Role	Talks to	Scales on
`web`	Stateless HTTP frontend / API	`api`	request load (CPU)
`api`	Stateless backend, business logic	`cache`	CPU + custom metric
`cache`	In-memory store (Redis-style)	—	fixed (1–2 replicas)

Traffic enters through a single front door and is routed to web. web calls api over a ClusterIP Service; api calls cache. Config (feature flags, the cache address) comes from a ConfigMap; the cache password comes from a Secret. The whole thing lives in an application namespace with a resource quota, behind a default-deny network where only the intended flows are allowed, and every manifest must pass Kyverno admission checks before it lands. Git holds the desired state; Argo CD makes the cluster match Git.

You can use any three small images you like (a static-content image for web, a tiny HTTP echo for api, redis:7-alpine for cache). The patterns are what matter and what transfer to real workloads.

The design

Here is the target architecture. Read it as the contract you’re building toward.

[Figure: Production-shaped Kubernetes capstone. An Ingress/Gateway front door routes to the web, api and cache Deployments in an app namespace with HPA, default-deny NetworkPolicy and a ConfigMap/Secret, governed by Kyverno admission and reconciled from Git by Argo CD, with Prometheus and Grafana observing the cluster.]

The diagram shows the request path (front door → web → api → cache) sitting inside an application namespace, wrapped by the cross-cutting concerns: a default-deny NetworkPolicy, an HPA on the scalable Deployments, ConfigMap/Secret feeding config, Kyverno gating admission, Argo CD pulling from Git, and Prometheus/Grafana observing. Each labelled piece below is a design decision with a reason.

Namespaces. Put the app in its own namespace (shop) and the platform tooling in theirs (argocd, kyverno, monitoring). Namespaces are the unit of quota, default network policy, and RBAC. Attach a ResourceQuota and a LimitRange to shop so a runaway Deployment can’t starve the node — this is also what makes resource requests mandatory in practice.

Deployments and Services. Each service is a Deployment (stateless, declarative replica count, rolling updates, free rollback) fronted by a ClusterIP Service for stable in-cluster discovery. Only web is reachable from outside, and even then only through the front door — api and cache stay ClusterIP-internal. Every Pod gets liveness, readiness, and startup probes and resource requests + limits; without requests the scheduler is guessing and the HPA has no denominator. (Refresher: Pods, ReplicaSets, Deployments & Services.)

The front door: Ingress or Gateway API. You need one HTTP entry point. The classic choice is an Ingress + an ingress controller (ingress-nginx). The modern, role-oriented successor is the Gateway API: a Gateway (owned by the platform team) plus HTTPRoutes (owned by app teams), which cleanly separates “who runs the load balancer” from “who routes my paths” and supports traffic splitting natively. We use Ingress in the lab for minimum moving parts, and show the Gateway API equivalent so you’ve seen both. Deep dive: Gateway API: HTTPRoute, traffic splitting & migration.

ConfigMaps & Secrets. Non-secret config (cache host, feature flags) goes in a ConfigMap; the cache password goes in a Secret. Both are mounted as files or injected as env vars, so the same image runs in any environment with different config. Secret values are only base64-encoded, and etcd does not encrypt them at rest by default; in production you’d layer Sealed Secrets / External Secrets so that what lands in Git is an encrypted value or a reference, never plaintext.

Autoscaling (HPA). The web and api Deployments get a HorizontalPodAutoscaler that adds/removes replicas to hold a target CPU utilisation. This is the first of the three autoscaling layers (pods → custom/event metrics → nodes); the capstone uses pod-level CPU HPA, and the deep dive covers KEDA event-driven scaling and node autoscaling. Deep dive: Kubernetes autoscaling: HPA, KEDA & Karpenter.
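
To make "hold a target CPU utilisation" concrete: the replica calculation documented for the HPA is desired = ceil(current × currentUtilisation / targetUtilisation), where utilisation is CPU usage as a percentage of the Pods' CPU requests. The sketch below is only that arithmetic in POSIX shell, clamped to min/max. hpa_desired is a hypothetical helper name, and the sketch deliberately ignores the controller's tolerance band and stabilization windows.

```shell
# hpa_desired CURRENT_REPLICAS CURRENT_UTIL_PCT TARGET_UTIL_PCT MIN MAX
# Integer ceil(current * util / target), then clamp to [MIN, MAX].
hpa_desired() {
  current=$1; util=$2; target=$3; min=$4; max=$5
  desired=$(( (current * util + target - 1) / target ))
  [ "$desired" -lt "$min" ] && desired=$min
  [ "$desired" -gt "$max" ] && desired=$max
  echo "$desired"
}

hpa_desired 2 90 60 2 10    # 2 Pods at 90% against a 60% target -> 3
hpa_desired 4 15 60 2 10    # load gone: ceil(1.0) = 1, clamped up to minReplicas -> 2
```

Because the percentage is measured against requests, a Pod requesting 100m CPU and using 90m counts as 90% utilised, which is why a missing request leaves the HPA with nothing to compute.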

NetworkPolicy: default-deny. Out of the box every Pod can reach every other Pod and the cloud metadata endpoint — one compromised container away from lateral movement. We flip that: a default-deny policy in shop, then explicit allows for the only intended flows (web→api, api→cache, everything→DNS, web from the ingress controller). Deep dive: Default-deny NetworkPolicies & Cilium L7 rules.

Policy guardrails: Kyverno. “Images from our registry, no :latest, every Pod has resource limits, every namespace gets a default-deny” should be enforced at admission, not in a wiki. Kyverno policies are plain YAML evaluated on every kubectl apply and every Argo sync — they reject non-compliant manifests before they persist. Deep dive: Kyverno policies: image signing, limits & Pod Security and Policy-as-code with Kyverno.

GitOps: Argo CD. Instead of kubectl apply from your laptop, you commit manifests to Git and Argo CD continuously reconciles the cluster toward that desired state — detecting drift, showing sync status, and (optionally) auto-healing. We use a single Application here; the deep dive scales it to app-of-apps, ApplicationSets, and progressive delivery. Deep dive: GitOps at scale with Argo CD.

Observability. You can’t operate what you can’t see. We install kube-prometheus-stack (Prometheus + Grafana) so you have cluster and Pod metrics and a dashboard — enough to watch the HPA react and confirm health. In a real cluster this extends to logs and traces.

For how this exact shape looks on a managed cloud cluster — VPC CNI, IRSA, load balancer controllers, managed node groups — read Enterprise architecture: AWS EKS microservices alongside this lab. The local design maps almost one-to-one.

The staged build plan

Build in stages and validate after each one. Each stage maps to a course lesson (for the basics) or a deep-dive article (for the production concern), so when a stage fights you, you know exactly where to read.

Stage	You add	Validate	Where to read
0	A `kind` cluster + namespaces, quota, LimitRange	`kubectl get ns`, quota shows	kubectl first cluster
1	The three Deployments + ClusterIP Services + probes + requests/limits	all Pods `Ready`, `web→api→cache` curl works	Pods/Deployments/Services
2	ConfigMap + Secret wired into `api`	env vars present in Pod	Pods/Deployments/Services
3	Front door: Ingress (and the Gateway API equivalent)	curl the hostname returns `web`	Gateway API
4	HPA on `web` + `api` (metrics-server)	load test scales replicas up, then down	Autoscaling: HPA/KEDA/Karpenter
5	Default-deny NetworkPolicy + explicit allows	blocked flow fails, allowed flow works	Default-deny NetworkPolicies
6	Kyverno + a “require requests/limits, block :latest” policy	a bad manifest is rejected	Kyverno policies
7	Argo CD reconciling the whole app from Git	edit Git → cluster converges; drift heals	GitOps with Argo CD
8	Observability (Prometheus + Grafana)	dashboard shows Pods + HPA reacting	(this lesson)

You don’t have to do all nine stages (0–8) in one sitting. A sensible split is stages 0–4 in one session (the app + autoscaling), 5–6 in another (security guardrails), 7–8 in a third (GitOps + observability).

Representative manifests

These are condensed but real. The full set is what you commit to your capstone Git repo. Treat them as patterns to adapt, not copy-paste-and-pray.

A Deployment + Service for api, with the three probes and resource requests/limits that production demands:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: shop
  labels: { app: api, tier: backend }
spec:
  replicas: 2
  selector:
    matchLabels: { app: api }
  template:
    metadata:
      labels: { app: api, tier: backend }
    spec:
      containers:
        - name: api
          image: ghcr.io/yourorg/api:1.4.0   # pinned tag, never :latest
          ports: [{ containerPort: 8080 }]
          envFrom:
            - configMapRef: { name: api-config }
          env:
            - name: CACHE_PASSWORD
              valueFrom:
                secretKeyRef: { name: cache-auth, key: password }
          resources:
            requests: { cpu: "100m", memory: "128Mi" }
            limits:   { cpu: "500m", memory: "256Mi" }
          startupProbe:                       # gates liveness until app is up
            httpGet: { path: /healthz, port: 8080 }
            failureThreshold: 30
            periodSeconds: 2
          readinessProbe:                     # gates Service traffic
            httpGet: { path: /readyz, port: 8080 }
            periodSeconds: 5
          livenessProbe:                      # restarts a wedged container
            httpGet: { path: /healthz, port: 8080 }
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: shop
spec:
  selector: { app: api }   # routes to Pods carrying app=api
  ports: [{ port: 80, targetPort: 8080 }]
  # type defaults to ClusterIP — internal only, which is what we want
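
The Deployment above reads its environment from a ConfigMap named api-config and a Secret named cache-auth (key password), neither of which is shown. A minimal sketch of both follows, written to a local file in the same heredoc style the lab uses. The keys CACHE_HOST and FEATURE_NEW_CHECKOUT, the file name, and the placeholder password are illustrative assumptions, not values from the course repo.

```shell
# Write the two objects the api Deployment expects (names match the manifest above).
# Keys and values are illustrative assumptions; adapt them to your app.
cat > api-config-and-secret.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata: { name: api-config, namespace: shop }
data:
  CACHE_HOST: "cache.shop.svc.cluster.local:6379"   # hypothetical key; envFrom exposes it as $CACHE_HOST
  FEATURE_NEW_CHECKOUT: "false"                      # hypothetical feature flag
---
apiVersion: v1
kind: Secret
metadata: { name: cache-auth, namespace: shop }
type: Opaque
stringData:                      # plain text here; the API server stores it base64-encoded
  password: "change-me-locally"  # placeholder only; never commit a real value
EOF

# Apply it once the shop namespace exists:
#   kubectl apply -f api-config-and-secret.yaml
```

Creating these before the Deployment matters: a container whose secretKeyRef or configMapRef cannot be resolved will not start until the referenced object exists.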

The HPA for api (CPU-target; metrics-server must be running):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
  namespace: shop
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: api }
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 60 }
  behavior:                       # scale-up is immediate by default; scale-down waits out this window (default 300s)
    scaleDown:
      stabilizationWindowSeconds: 120

The front door — Ingress first, then the Gateway API equivalent so you’ve seen the modern shape:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shop
  namespace: shop
spec:
  ingressClassName: nginx
  rules:
    - host: shop.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: { name: web, port: { number: 80 } }
---
# Gateway API equivalent (Gateway owned by platform; HTTPRoute owned by app team)
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: web
  namespace: shop
spec:
  parentRefs: [{ name: shop-gateway, namespace: gateway-system }]
  hostnames: ["shop.local"]
  rules:
    - matches: [{ path: { type: PathPrefix, value: / } }]
      backendRefs: [{ name: web, port: 80 }]
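
The HTTPRoute above attaches to a Gateway named shop-gateway in the gateway-system namespace, which is not shown. A minimal sketch of what the platform team's side could look like follows. The gatewayClassName value and the file name are assumptions (use whatever class your Gateway controller installs); the allowedRoutes selector is what lets a route from the shop namespace attach.

```shell
# Platform-owned Gateway that the app team's HTTPRoute can reference.
cat > shop-gateway.yaml <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shop-gateway
  namespace: gateway-system
spec:
  gatewayClassName: example-gateway-class   # assumption: replace with your controller's class
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: Selector                      # the default, Same, would ignore routes in shop
          selector:
            matchLabels: { kubernetes.io/metadata.name: shop }
EOF

# Requires the Gateway API CRDs and a controller; then:
#   kubectl apply -f shop-gateway.yaml
```

This split is the ownership model in practice: the listener and who may attach to it live with the platform team, while paths and backends stay in the app team's HTTPRoute.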

The default-deny + allow NetworkPolicies for shop (apply default-deny first, then the explicit allows — and pair egress-deny with the DNS allow so you don’t break name resolution):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: default-deny, namespace: shop }
spec:
  podSelector: {}                 # selects every Pod in the namespace
  policyTypes: [Ingress, Egress]  # ...and permits nothing
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: allow-api-from-web, namespace: shop }
spec:
  podSelector: { matchLabels: { app: api } }
  policyTypes: [Ingress]
  ingress:
    - from: [{ podSelector: { matchLabels: { app: web } } }]
      ports: [{ port: 8080 }]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: allow-dns-egress, namespace: shop }
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: kube-system } }
      ports:
        - { port: 53, protocol: UDP }
        - { port: 53, protocol: TCP }
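
The condensed set above opens only the ingress side of web→api, plus DNS. Because default-deny also covers egress, each intended flow needs an egress allow on the source Pods as well as an ingress allow on the destination. A sketch of the remaining in-namespace allows follows, assuming the labels and ports used in this section (api on 8080, cache on 6379); the policy names and file name are illustrative.

```shell
# Remaining allows for the web -> api -> cache chain under default-deny.
cat > shop-allow-flows.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: allow-web-egress-to-api, namespace: shop }
spec:
  podSelector: { matchLabels: { app: web } }
  policyTypes: [Egress]
  egress:
    - to: [{ podSelector: { matchLabels: { app: api } } }]
      ports: [{ port: 8080 }]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: allow-api-egress-to-cache, namespace: shop }
spec:
  podSelector: { matchLabels: { app: api } }
  policyTypes: [Egress]
  egress:
    - to: [{ podSelector: { matchLabels: { app: cache } } }]
      ports: [{ port: 6379 }]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: allow-cache-from-api, namespace: shop }
spec:
  podSelector: { matchLabels: { app: cache } }
  policyTypes: [Ingress]
  ingress:
    - from: [{ podSelector: { matchLabels: { app: api } } }]
      ports: [{ port: 6379 }]
EOF

# kubectl apply -f shop-allow-flows.yaml
```

An ingress allow for web from the ingress controller's namespace follows the same shape, with a namespaceSelector in place of the podSelector.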

A Kyverno policy that rejects Pods without resource limits and bans the :latest tag — the two rules that catch the most mistakes:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata: { name: require-limits-no-latest }
spec:
  validationFailureAction: Enforce   # reject, don't just warn
  rules:
    - name: require-resource-limits
      match: { any: [{ resources: { kinds: [Pod] } }] }
      validate:
        message: "CPU and memory limits are required."
        pattern:
          spec:
            containers:
              - resources:
                  limits: { memory: "?*", cpu: "?*" }
    - name: disallow-latest-tag
      match: { any: [{ resources: { kinds: [Pod] } }] }
      validate:
        # This pattern rejects only an explicit :latest; an untagged image (implicit latest) still passes.
        message: "Using :latest is not allowed; pin a version."
        pattern:
          spec:
            containers:
              - image: "!*:latest"

The Argo CD Application that ties your Git repo to the cluster — this is the object that makes GitOps real:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: shop
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/yourorg/k8s-capstone.git
    targetRevision: main
    path: deploy/shop          # Helm chart or raw manifests live here
  destination:
    server: https://kubernetes.default.svc
    namespace: shop
  syncPolicy:
    automated: { prune: true, selfHeal: true }   # converge + heal drift
    syncOptions: [CreateNamespace=true]

If you’d rather package the app as a Helm chart (recommended once you have three services and per-environment values), keep the same objects but template them, and point Argo CD’s source.path at the chart. The mechanics of writing that chart well — values.schema.json, helper templates, library charts — are in Authoring production-grade Helm charts.

Acceptance criteria

You’re done when all of these are true. This is also your demo script.

kubectl get pods -n shop shows web, api, cache all Running and Ready (probes green).
curl -H 'Host: shop.local' http://localhost:8080/ reaches web, which reaches api, which reaches cache (the full chain works through the front door).
api reads at least one value from the ConfigMap and the Secret (verify in the Pod’s env).
Under load the HPA scales api/web up past minReplicas, then back down after load stops.
The default-deny holds: a disallowed flow (e.g. a debug Pod → api) is blocked, while the allowed web→api flow works.
Applying a deliberately bad manifest (no limits, or :latest) is rejected by Kyverno with a clear message.
Argo CD shows the app Synced and Healthy; editing Git triggers a sync, and deleting a live object gets self-healed back.
Grafana shows Pod metrics for shop and you can watch replica count change during the load test.

Self-assessment rubric

Score each row 0–2 (0 = not done, 1 = works but rough, 2 = production-shaped). 12+/16 = capstone passed.

Criterion	0	1	2
Workloads	Pods crash/no probes	Run, basic probes	All 3 probes + requests/limits, rolling update verified
Networking	NodePort/hacks	Service chain works	ClusterIP discovery + working Ingress/Gateway
Config/Secrets	Hardcoded in image	ConfigMap used	ConfigMap and Secret, image is env-agnostic
Autoscaling	None	HPA exists	HPA scales up and down under a real load test
NetworkPolicy	Flat network	Default-deny only	Default-deny + least-privilege allows, drop proven
Policy	None	Kyverno audits	Kyverno enforces, bad manifest rejected
GitOps	`kubectl apply` only	Argo syncs once	Auto-sync + prune + self-heal, drift heals
Observability	None	Metrics installed	Dashboard used to prove health + scaling

Hands-on lab

We’ll build the spine of the capstone on a free local kind cluster: cluster + namespace, the service chain, an HPA that visibly scales, a default-deny network, a Kyverno guardrail, and Argo CD reconciling from Git. This is the path that demonstrates every concept with the fewest moving parts; the remaining stages (Gateway API, full Helm chart, Grafana dashboards) are yours to add as the exercise.

0. Tools and a cluster with an ingress-ready node

# Install the CLIs (macOS/Homebrew shown; Linux: use the official install scripts)
brew install kind kubectl helm
# Argo CD CLI
brew install argocd

Create a kind cluster whose control-plane node can receive ingress traffic on localhost:

cat > kind-capstone.yaml <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
    extraPortMappings:
      - { containerPort: 80, hostPort: 8080, protocol: TCP }
EOF

kind create cluster --name capstone --config kind-capstone.yaml
kubectl cluster-info --context kind-capstone
kubectl get nodes

Expected: one node, Ready.

1. Namespace, quota, and the metrics-server (for the HPA)

kubectl create namespace shop

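# A quota so a runaway Deployment can't eat the node, plus a LimitRange so that
# Pods created without requests/limits (the probe, load, and debug Pods below)
# get defaults. Without defaults, this quota makes the API server reject them.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ResourceQuota
metadata: { name: shop-quota, namespace: shop }
spec:
  hard: { requests.cpu: "2", requests.memory: 2Gi, limits.cpu: "4", limits.memory: 4Gi, pods: "30" }
---
apiVersion: v1
kind: LimitRange
metadata: { name: shop-defaults, namespace: shop }
spec:
  limits:
    - type: Container
      defaultRequest: { cpu: 50m, memory: 32Mi }    # applied when requests are omitted
      default:        { cpu: 250m, memory: 64Mi }   # applied when limits are omitted
EOF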

# metrics-server provides the CPU metric the HPA reads.
# --kubelet-insecure-tls is REQUIRED on kind (self-signed kubelet certs) — local only.
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/ >/dev/null
helm repo update >/dev/null
helm upgrade --install metrics-server metrics-server/metrics-server \
  -n kube-system --set 'args={--kubelet-insecure-tls}'

kubectl -n kube-system rollout status deploy/metrics-server
kubectl top nodes      # proves metrics are flowing (give it ~30s)

2. The service chain: cache, api, web

We use redis:7-alpine for cache and a tiny echo image for web/api so the lab needs no custom build. Each gets probes and requests/limits (the quota on shop counts them, and a require-limits Kyverno policy would demand them).

cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata: { name: cache, namespace: shop, labels: { app: cache } }
spec:
  replicas: 1
  selector: { matchLabels: { app: cache } }
  template:
    metadata: { labels: { app: cache } }
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          ports: [{ containerPort: 6379 }]
          readinessProbe: { tcpSocket: { port: 6379 }, periodSeconds: 5 }
          livenessProbe:  { tcpSocket: { port: 6379 }, periodSeconds: 10 }
          resources: { requests: { cpu: 50m, memory: 64Mi }, limits: { cpu: 200m, memory: 128Mi } }
---
apiVersion: v1
kind: Service
metadata: { name: cache, namespace: shop }
spec: { selector: { app: cache }, ports: [{ port: 6379, targetPort: 6379 }] }
---
apiVersion: apps/v1
kind: Deployment
metadata: { name: api, namespace: shop, labels: { app: api } }
spec:
  replicas: 2
  selector: { matchLabels: { app: api } }
  template:
    metadata: { labels: { app: api } }
    spec:
      containers:
        - name: api
          image: hashicorp/http-echo:1.0
          args: ["-text=api-ok", "-listen=:5678"]
          ports: [{ containerPort: 5678 }]
          readinessProbe: { httpGet: { path: /, port: 5678 }, periodSeconds: 5 }
          livenessProbe:  { httpGet: { path: /, port: 5678 }, periodSeconds: 10 }
          resources: { requests: { cpu: 50m, memory: 32Mi }, limits: { cpu: 250m, memory: 64Mi } }
---
apiVersion: v1
kind: Service
metadata: { name: api, namespace: shop }
spec: { selector: { app: api }, ports: [{ port: 80, targetPort: 5678 }] }
---
apiVersion: apps/v1
kind: Deployment
metadata: { name: web, namespace: shop, labels: { app: web } }
spec:
  replicas: 2
  selector: { matchLabels: { app: web } }
  template:
    metadata: { labels: { app: web } }
    spec:
      containers:
        - name: web
          image: hashicorp/http-echo:1.0
          args: ["-text=web-ok", "-listen=:5678"]
          ports: [{ containerPort: 5678 }]
          readinessProbe: { httpGet: { path: /, port: 5678 }, periodSeconds: 5 }
          livenessProbe:  { httpGet: { path: /, port: 5678 }, periodSeconds: 10 }
          resources: { requests: { cpu: 50m, memory: 32Mi }, limits: { cpu: 250m, memory: 64Mi } }
---
apiVersion: v1
kind: Service
metadata: { name: web, namespace: shop }
spec: { selector: { app: web }, ports: [{ port: 80, targetPort: 5678 }] }
EOF

kubectl -n shop rollout status deploy/api
kubectl get pods,svc -n shop

Expected: five Pods Running/Ready (1 cache, 2 api, 2 web), three Services. Prove in-cluster discovery works:

kubectl -n shop run probe --rm -it --image=busybox:1.36 --restart=Never -- \
  wget -qO- http://api.shop.svc.cluster.local
# -> api-ok

3. The front door (ingress-nginx + Ingress)

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
kubectl -n ingress-nginx rollout status deploy/ingress-nginx-controller --timeout=120s

cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata: { name: shop, namespace: shop }
spec:
  ingressClassName: nginx
  rules:
    - host: shop.local
      http:
        paths:
          - { path: /, pathType: Prefix, backend: { service: { name: web, port: { number: 80 } } } }
EOF

# kind maps hostPort 8080 -> node :80 -> ingress controller
curl -s -H 'Host: shop.local' http://localhost:8080/
# -> web-ok

That web-ok through localhost:8080 with the shop.local Host header is acceptance criterion #2: the front door routes to web.

4. Autoscaling that you can watch react

Add an HPA to web, then hammer it and watch replicas climb.

cat <<'EOF' | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: web, namespace: shop }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: web }
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource: { name: cpu, target: { type: Utilization, averageUtilization: 50 } }
EOF

# Watch in one terminal:
kubectl get hpa web -n shop -w

In a second terminal, generate load from inside the cluster:

kubectl -n shop run load --image=busybox:1.36 --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://web.shop.svc.cluster.local >/dev/null; done"

Within a minute or two the HPA’s TARGETS column rises above 50% and REPLICAS climbs toward 10. Stop the load and the replicas fall back to 2 once the scale-down stabilization window (300 seconds by default) has passed; that up-and-down behaviour is acceptance criterion #4.

kubectl -n shop delete pod load

5. Default-deny network, then prove a drop

Important: kind’s default CNI (kindnet) does not enforce NetworkPolicy. To make this stage real, create the cluster with Calico (or install Cilium). The quickest path is to recreate kind with the default CNI disabled and apply Calico — or, if you want to keep this cluster, treat this stage as “author + apply” and verify enforcement on a Calico/Cilium cluster. Enforcement detail and the Cilium L7 model are in the default-deny NetworkPolicies deep dive.
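
Recreating the cluster without kindnet comes down to one extra stanza in the kind config. The sketch below reuses the node settings from step 0 and adds networking.disableDefaultCNI. The file name is an assumption, and the podSubnet shown is the pool Calico's manifests commonly default to, so check it against the Calico version you install.

```shell
# kind config with the default CNI disabled, ready for Calico or Cilium.
cat > kind-capstone-calico.yaml <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true        # skip kindnet so a policy-enforcing CNI can be installed
  podSubnet: 192.168.0.0/16      # assumption: matches Calico's usual default IP pool
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
    extraPortMappings:
      - { containerPort: 80, hostPort: 8080, protocol: TCP }
EOF

# Then, roughly:
#   kind delete cluster --name capstone
#   kind create cluster --name capstone --config kind-capstone-calico.yaml
#   (install the CNI from its official manifests; nodes stay NotReady until one is running)
```

Recreating the cluster means repeating stages 1 to 4, which is a good argument for having the manifests in Git before you reach this point.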

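# Default-deny everything in shop, then allow only DNS, ingress controller -> web, and web -> api.
# With egress denied by default, a flow needs an egress allow on the source
# AND an ingress allow on the destination.
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: default-deny, namespace: shop }
spec: { podSelector: {}, policyTypes: [Ingress, Egress] }
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: allow-dns, namespace: shop }
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
    - to: [{ namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: kube-system } } }]
      ports: [{ port: 53, protocol: UDP }, { port: 53, protocol: TCP }]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: allow-web-to-api, namespace: shop }
spec:
  podSelector: { matchLabels: { app: api } }
  policyTypes: [Ingress]
  ingress:
    - from: [{ podSelector: { matchLabels: { app: web } } }]
      ports: [{ port: 5678 }]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: allow-web-egress-to-api, namespace: shop }
spec:
  podSelector: { matchLabels: { app: web } }
  policyTypes: [Egress]
  egress:
    - to: [{ podSelector: { matchLabels: { app: api } } }]
      ports: [{ port: 5678 }]
---
# Keeps the front door (acceptance criterion #2) working under default-deny
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: allow-ingress-controller-to-web, namespace: shop }
spec:
  podSelector: { matchLabels: { app: web } }
  policyTypes: [Ingress]
  ingress:
    - from: [{ namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: ingress-nginx } } }]
      ports: [{ port: 5678 }]
EOF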

On an enforcing CNI, a debug Pod (no app=web label) should be blocked from api, proving the default-deny — acceptance criterion #5:

# Expect this to TIME OUT / fail on an enforcing CNI:
kubectl -n shop run debug --rm -it --image=busybox:1.36 --restart=Never -- \
  wget -T 5 -qO- http://api.shop.svc.cluster.local || echo "BLOCKED (expected)"

6. A Kyverno guardrail that rejects a bad manifest

helm repo add kyverno https://kyverno.github.io/kyverno/ >/dev/null
helm repo update >/dev/null
helm upgrade --install kyverno kyverno/kyverno -n kyverno --create-namespace
kubectl -n kyverno rollout status deploy/kyverno-admission-controller --timeout=120s

cat <<'EOF' | kubectl apply -f -
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata: { name: disallow-latest }
spec:
  validationFailureAction: Enforce
  rules:
    - name: no-latest-tag
      match: { any: [{ resources: { kinds: [Pod] } }] }
      validate:
        message: "Using :latest is not allowed; pin a version."
        pattern: { spec: { containers: [{ image: "!*:latest" }] } }
EOF

# This SHOULD be rejected at admission:
kubectl -n shop run bad --image=nginx:latest --restart=Never
# Error from server: ... policy disallow-latest/no-latest-tag fail: Using :latest is not allowed...

That rejection — the manifest never reaches etcd — is acceptance criterion #6. The full guardrail set (require limits, registry allow-lists, cosign image verification) is in the Kyverno deep dive.

7. GitOps: Argo CD reconciles the app from Git

Install Argo CD, log in, and point an Application at a Git repo containing your manifests (commit the YAML from stage 2 into deploy/shop/ in your own repo first).

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
kubectl -n argocd rollout status deploy/argocd-server --timeout=180s

# Initial admin password + login via port-forward
kubectl -n argocd port-forward svc/argocd-server 8081:443 >/tmp/argo-pf.log 2>&1 &
sleep 3   # give the background port-forward a moment to start listening
PW=$(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath='{.data.password}' | base64 -d)
argocd login localhost:8081 --username admin --password "$PW" --insecure

# Register the app (replace repoURL/path with YOUR repo)
argocd app create shop \
  --repo https://github.com/yourorg/k8s-capstone.git \
  --path deploy/shop --revision main \
  --dest-server https://kubernetes.default.svc --dest-namespace shop \
  --sync-policy automated --auto-prune --self-heal --sync-option CreateNamespace=true

argocd app get shop          # STATUS should be Synced / Healthy

Now prove GitOps both ways — acceptance criterion #7:

# Drift heals: delete a live object, Argo recreates it
kubectl -n shop delete deploy web
argocd app wait shop --health    # Argo self-heals it back
kubectl -n shop get deploy web   # it's back

# Change in Git -> cluster converges: bump api replicas in your repo (not web, whose
# replica count now belongs to the HPA), commit, push, then:
argocd app sync shop

Validation (run the acceptance checklist)

kubectl get pods -n shop                                   # all Ready (#1)
curl -s -H 'Host: shop.local' http://localhost:8080/       # web-ok (#2)
kubectl get hpa -n shop                                     # HPA present (#4)
kubectl get netpol -n shop                                  # default-deny + allows (#5)
kubectl get clusterpolicy                                   # Kyverno enforcing (#6)
argocd app get shop                                         # Synced/Healthy (#7)

Cleanup

# Stop the Argo port-forward, then delete the whole cluster in one shot
kill %1 2>/dev/null || true
kind delete cluster --name capstone
rm -f kind-capstone.yaml

Cost note

Free / local. Everything runs in Docker on your laptop on a kind cluster: no cloud account, no managed control-plane fee, no load balancer charges. kind, kubectl, helm, metrics-server, ingress-nginx, Kyverno, and Argo CD are all open source. kind delete cluster reclaims every resource; the only lasting footprint is the container images cached by Docker (docker image prune -a removes the unused ones; plain docker image prune removes only dangling images).

Common mistakes & troubleshooting

Symptom	Likely cause	Fix
HPA shows `TARGETS: <unknown>/50%`	No `metrics-server`, or Pods missing CPU requests	Install metrics-server (with `--kubelet-insecure-tls` on kind); ensure every container has `resources.requests.cpu`. HPA % is computed against requests.
`curl` to the Ingress hangs / 404	Wrong `Host` header, controller not ready, or no `ingress-ready` node label	Send `-H 'Host: shop.local'`; wait for the controller rollout; confirm the kind config set `ingress-ready=true` + the `extraPortMappings`.
Service has no endpoints (`kubectl get endpoints`)	Service `selector` doesn’t match Pod labels	Make the selector exactly match the Pod template labels; Services route by label, not by name.
NetworkPolicy seems ignored	kind’s default CNI doesn’t enforce policy	Use Calico/Cilium; with the stock CNI you can author policy but not enforce it.
Default-deny breaks everything (DNS fails)	Egress deny applied without a DNS allow	Always pair `default-deny` egress with an allow to `kube-system`/CoreDNS on port 53 (UDP+TCP).
Argo app stuck `OutOfSync`/`Progressing`	A Pod isn’t Ready, or a synced object is rejected by Kyverno	`argocd app get shop` and `kubectl describe` the failing object; fix the manifest so it passes admission, then re-sync.
Pod `CrashLoopBackOff`	Bad command/args, or liveness probe failing before app is ready	`kubectl logs` + `kubectl describe pod`; add/loosen a startupProbe so liveness doesn’t kill a slow starter.

Best practices

Pin image tags, never :latest. :latest makes rollouts non-deterministic and rollback meaningless. Pin a version (or digest) and let Kyverno enforce it.
Always set requests and limits. Requests drive scheduling and HPA math; limits cap blast radius. A LimitRange gives sane defaults so no Pod slips through bare.
Probe all three. Readiness gates traffic, liveness restarts wedged containers, startup protects slow starters from premature liveness kills.
Default-deny, then allow. Start from zero connectivity and add named flows; never run a flat pod network in anything resembling production.
Let Git be the source of truth. Once Argo CD is in place, stop kubectl apply-ing changes — edit Git and let reconciliation converge, so the cluster and the repo never diverge.
Enforce policy at admission, not in review. A Kyverno Enforce rule catches the mistake the code reviewer missed, on every apply and every sync.
Validate per stage. Don’t stack stage 5 on a broken stage 2. Each rollout status / curl / argocd app get is a checkpoint.

Security notes

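Secrets aren’t encrypted by default. A Secret’s values are only base64-encoded, so anyone with get on Secrets in the namespace can decode them, and they sit unencrypted in etcd unless encryption at rest is configured. Lock access down with RBAC, and for GitOps use Sealed Secrets / External Secrets so that what is committed to Git is an encrypted value or a reference, never plaintext.
Block the metadata endpoint. Your default-deny egress should keep workloads away from 169.254.169.254 so that a compromised Pod, or an SSRF bug in one, can’t fetch node credentials from the cloud metadata service. On a laptop kind cluster there is nothing at that address, but on a managed cluster this rule is a key defence against credential theft.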
Apply Pod Security / restricted defaults. Drop capabilities, run as non-root, readOnlyRootFilesystem where possible. Kyverno (or Pod Security Admission) can enforce the restricted profile cluster-wide. See Kyverno policies & Pod Security.
Least-privilege RBAC. Argo CD and your CI hold powerful credentials — scope them to the namespaces and verbs they actually need, not cluster-admin.
Verify image provenance. In a hardened pipeline, Kyverno’s verifyImages checks a cosign signature at admission so only signed images from your registry run.
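
To make the first note above concrete: base64 is an encoding, not encryption. The following is a minimal Python sketch using a made-up password (the value is an assumption for illustration, not something from the capstone). It shows that whoever can read a Secret's data field recovers the plaintext with one standard-library call and no key.

```python
import base64

# Hypothetical secret value, for illustration only.
plaintext = "s3cr3t-password"

# What ends up in the Secret's `data` field: base64 of the raw bytes.
stored = base64.b64encode(plaintext.encode("utf-8")).decode("ascii")

# What anyone with read access to the Secret can do: decode it, no key needed.
recovered = base64.b64decode(stored).decode("utf-8")

print(stored)
print(recovered)
```

This is why RBAC on Secrets and an encrypted-in-Git workflow matter: the encoding itself protects nothing.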

Quick check

Why does an HPA need resource requests on the Pods it targets?
You apply a default-deny egress policy and suddenly every Pod fails to resolve DNS. What did you forget?
What is the difference between an Ingress and the Gateway API’s Gateway + HTTPRoute split, in terms of ownership?
With Argo CD selfHeal: true, what happens if someone runs kubectl delete deploy web directly on the cluster?
A teammate’s manifest uses image: app:latest. Where in the capstone is this caught, and before or after it’s stored in etcd?

Answers

The HPA computes utilisation as actual CPU ÷ requested CPU. With no request there’s no denominator, so the target metric reads <unknown> and the HPA can’t scale. Requests are also what the scheduler uses to place Pods.
The DNS allow. default-deny with policyTypes: [Egress] blocks all egress, including the Pod’s lookups to CoreDNS. Pair it with an egress allow to kube-system/CoreDNS on port 53 (UDP and TCP).
An Ingress is a single object that carries both routing rules and, through controller-specific annotations, load-balancer behaviour, so it is unclear whether the platform team or the app team owns it. The Gateway API splits the two: the Gateway (listeners, the actual LB) is owned by the platform team, while HTTPRoutes (path/host routing) are owned by app teams and reference the Gateway. The result is cleaner multi-team boundaries and native traffic splitting.
Argo CD detects the drift (live state no longer matches Git) and recreates the Deployment to match the desired state in the repo. Self-heal makes the cluster converge back automatically.
At admission, by the Kyverno disallow-latest policy — the API server calls Kyverno’s webhook before persisting the object, so a rejected manifest never reaches etcd.
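
The check behind the last answer can be sketched in a few lines. This is not Kyverno's implementation (Kyverno policies are written as YAML patterns); it is a hypothetical Python helper, uses_mutable_tag, that illustrates which image references a "no :latest" rule is meant to reject, including the case where the tag is omitted and the runtime defaults to latest.

```python
def uses_mutable_tag(image: str) -> bool:
    """Return True if the image reference resolves to the mutable `latest` tag."""
    # A digest pin (name@sha256:...) is immutable, whatever the tag says.
    if "@" in image:
        return False
    # Only the last path segment can carry a tag; a colon earlier in the
    # reference is a registry port, e.g. localhost:5000/app.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return True  # no tag at all defaults to `latest`
    tag = last_segment.rsplit(":", 1)[1]
    return tag == "latest"


# Example references (all names here are illustrative).
for ref in ["app:latest", "app", "nginx:1.27.0", "localhost:5000/app", "app@sha256:abc123"]:
    print(ref, "->", "reject" if uses_mutable_tag(ref) else "allow")
```

A real admission policy applies the same idea to every container, init container, and ephemeral container in the Pod spec.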

Exercise

Extend the capstone with three additions, committing each to your Git repo so Argo CD applies it:

Gateway API front door. Install a Gateway API implementation, replace the Ingress with a Gateway + HTTPRoute, and add a second HTTPRoute rule that splits 10% of traffic to a web-v2 Deployment (a canary). Use Gateway API: traffic splitting.
Package as Helm + a values schema. Convert the raw manifests into a Helm chart with a values.yaml (replica counts, image tags, the cache host) and a values.schema.json that fails fast on bad input. Point Argo CD at the chart. Follow Authoring production-grade Helm charts.
Add a second guardrail + a dashboard. Add a Kyverno rule requiring resource limits (not just banning :latest), prove it rejects a bad Pod, then install kube-prometheus-stack and open a Grafana dashboard showing shop Pod CPU and the HPA replica count during a load test.

Write up a one-page “capstone deliverable”: the repo link, a screenshot of argocd app get shop Synced/Healthy, your acceptance-criteria checklist ticked off, and your rubric score. That artefact is what you show in an interview.

Interview questions

Q: Walk me through what happens, end to end, when you push a manifest change to the Git repo Argo CD watches. Argo CD polls (or is webhook-notified of) the repo, renders the desired manifests, and diffs them against live cluster state. On a difference it syncs — applying the changed objects through the API server, which runs them past admission (Kyverno) before persisting to etcd; controllers then reconcile actual state (scheduler places new Pods, kubelet starts them). With selfHeal, any out-of-band drift is corrected back to Git too. Net effect: Git is the single source of truth and the cluster continuously converges to it.
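
The diff step in that answer can be illustrated with a toy model. The sketch below is not how Argo CD is implemented; it is a hypothetical plan_sync function that treats Git and the cluster as two dictionaries keyed by "Kind/name" and works out what a sync would have to create, update, or prune. The object names and fields are assumptions for illustration.

```python
def plan_sync(desired: dict, live: dict, prune: bool = False) -> dict:
    """Compare desired (Git) and live (cluster) objects keyed by 'Kind/name'."""
    return {
        # In Git but missing from the cluster.
        "create": sorted(k for k in desired if k not in live),
        # Present in both, but the live spec has drifted.
        "update": sorted(k for k in desired if k in live and desired[k] != live[k]),
        # Tracked objects no longer in Git; removed only when pruning is enabled.
        "prune": sorted(k for k in live if k not in desired) if prune else [],
    }


desired = {
    "Deployment/web": {"image": "web:1.4.2", "replicas": 2},
    "Service/web": {"port": 80},
}
# Live state after someone deleted the Deployment and hand-edited the Service.
# ConfigMap/old-flags was managed by the app earlier but has since left Git.
live = {
    "Service/web": {"port": 8080},
    "ConfigMap/old-flags": {"beta": "true"},
}

plan = plan_sync(desired, live, prune=True)
print(plan)
```

With self-heal enabled, the controller would act on the create and update entries without anyone pressing sync; that is the drift correction the answer describes.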

Q: How does an HPA decide how many replicas to run, and what are its failure modes? For a CPU target it computes desiredReplicas = ceil(currentReplicas × currentUtilisation / targetUtilisation), clamped to min/max. Failure modes: no metrics-server (metric <unknown>), no resource requests (no denominator), and flapping when load is spiky — tamed with behavior.scaleDown.stabilizationWindowSeconds. It scales pods only; node capacity is a separate layer (Cluster Autoscaler/Karpenter).
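
The formula in that answer is easy to check by hand with a short Python sketch. The function name and arguments are hypothetical; the tolerance parameter models the controller's default behaviour of skipping a scale when the ratio is within 10% of the target, which the answer above does not mention. Stabilisation windows and per-Pod readiness handling are deliberately left out.

```python
import math


def desired_replicas(current_replicas, current_utilisation, target_utilisation,
                     min_replicas, max_replicas, tolerance=0.1):
    """Sketch of the HPA replica calculation for a CPU utilisation target."""
    ratio = current_utilisation / target_utilisation
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: leave the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the HPA's minReplicas/maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


# 2 replicas at 90% of requested CPU against a 50% target.
print(desired_replicas(2, 90, 50, min_replicas=2, max_replicas=10))
# 4 replicas at 52%: inside the tolerance band, so no change.
print(desired_replicas(4, 52, 50, min_replicas=2, max_replicas=10))
# Load drops to 10%: the raw result is 1, clamped up to minReplicas.
print(desired_replicas(4, 10, 50, min_replicas=2, max_replicas=10))
```

Note that current_utilisation is itself usage divided by requests, which is why a Pod with no CPU request leaves the HPA with nothing to compute.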

Q: A Service has no endpoints and traffic fails. How do you debug it? kubectl get endpoints <svc> — empty means the Service’s label selector matches no Ready Pods. Check the selector against the Pod template labels (Services route by label, not name) and check the Pods are actually Ready (a failing readiness probe pulls a Pod out of endpoints). Then kubectl describe svc and kubectl get pods --show-labels.
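
The "by label, not name" rule can be modelled in a few lines of Python. This is a simplified sketch with hypothetical Pod names and labels: it covers equality-based selectors only and treats a Pod as an endpoint when it is both selected and Ready.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """Equality-based match: every selector key/value must appear in the Pod's labels."""
    return bool(selector) and all(labels.get(k) == v for k, v in selector.items())


def ready_endpoints(selector: dict, pods: list) -> list:
    """Names of Pods that the selector matches and that are passing readiness."""
    return [p["name"] for p in pods
            if p["ready"] and selector_matches(selector, p["labels"])]


pods = [
    {"name": "web-abc", "labels": {"app": "web", "tier": "frontend"}, "ready": True},
    {"name": "web-def", "labels": {"app": "web", "tier": "frontend"}, "ready": False},
    {"name": "cache-xyz", "labels": {"app": "cache"}, "ready": True},
]

# Correct selector: only the Ready web Pod becomes an endpoint.
print(ready_endpoints({"app": "web"}, pods))
# Typo in the selector: nothing matches, so the Service has no endpoints.
print(ready_endpoints({"app": "webapp"}, pods))
```

Both failure paths from the answer show up here: a selector that matches nothing, and a matching Pod that is dropped because it is not Ready.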

Q: Why “default-deny then allow,” and what’s the one rule people always forget? Default-allow means one compromised Pod can reach everything — lateral movement. Default-deny inverts the posture: nothing is permitted until you write an explicit allow, so the network encodes intent. The forgotten rule is DNS egress — deny-all egress silently breaks CoreDNS lookups, so you must allow port 53 to kube-system.
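
The additive behaviour behind that answer can be sketched as a small Python model. It is a deliberate simplification and not how any CNI evaluates policy: it assumes every policy in the list selects the Pod and covers egress, and it matches destinations by namespace name only. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EgressRule:
    to_namespace: str
    ports: frozenset  # set of (protocol, port) pairs


@dataclass
class EgressPolicy:
    name: str
    rules: list = field(default_factory=list)  # empty: selects the Pod, allows nothing


def egress_allowed(policies, namespace, protocol, port):
    """Allowed if no policy selects the Pod, or if any rule of any policy matches."""
    if not policies:
        return True  # unselected Pods are default-allow
    return any(
        rule.to_namespace == namespace and (protocol, port) in rule.ports
        for policy in policies
        for rule in policy.rules
    )


default_deny = EgressPolicy("default-deny")
allow_dns = EgressPolicy(
    "allow-dns",
    [EgressRule("kube-system", frozenset({("UDP", 53), ("TCP", 53)}))],
)

# No policy at all: DNS works.
print(egress_allowed([], "kube-system", "UDP", 53))
# Default-deny alone: DNS is blocked along with everything else.
print(egress_allowed([default_deny], "kube-system", "UDP", 53))
# Add the DNS allow: lookups work again, other egress stays closed.
print(egress_allowed([default_deny, allow_dns], "kube-system", "UDP", 53))
print(egress_allowed([default_deny, allow_dns], "shop", "TCP", 6379))
```

Because policies only ever add permissions, the deny never needs editing; each new flow is one more allow rule next to it.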

Q: How is Kyverno different from “just review the YAML in the PR”? PR review is best-effort and human; Kyverno runs in the admission path on every apply and every Argo sync, so the check is deterministic and, in Enforce mode, can’t be skipped by a hurried reviewer. It can validate (reject), mutate (add defaults like dropped capabilities), and generate (e.g. a default-deny NetworkPolicy per new namespace), which are guardrails code review can’t provide.

Q: Your app works on kind but you’re moving it to EKS. What changes, and what doesn’t? The workloads, Services, HPA, NetworkPolicies, Kyverno policies, and Argo CD app are largely portable. What changes is the substrate: a managed control plane, the VPC CNI for pod networking, IRSA/Pod Identity for cloud IAM, an AWS Load Balancer Controller backing your Ingress/Gateway, real LoadBalancer Services, and managed node groups or Karpenter for capacity. See Enterprise architecture: AWS EKS microservices.

Certification mapping

This capstone is a hands-on rehearsal for the practical exams.

CKAD (Certified Kubernetes Application Developer): Deployments, Services, ConfigMaps/Secrets, probes, resource requests/limits, rolling updates/rollback, and Ingress are core CKAD domains — the entire stage 1–3 build is CKAD muscle memory. Practising fast, correct manifests under time pressure is exactly the exam.
CKA (Certified Kubernetes Administrator): Cluster bootstrap, namespaces/quotas, networking (Services, NetworkPolicy, DNS), troubleshooting (no-endpoints, CrashLoop, stuck rollouts), and workload scheduling map to CKA. The troubleshooting table above mirrors CKA scenario tasks.
CKS (Certified Kubernetes Security Specialist): Default-deny NetworkPolicies, admission control with Kyverno, Pod Security/restricted, secret handling, and supply-chain (image provenance) are CKS territory — stages 5–6 plus the security notes.
KCNA (Kubernetes and Cloud Native Associate): The conceptual map (control plane, objects, GitOps, observability) underpins this whole lesson and is tested at the KCNA level.

The next lesson is the full exam roadmap: Kubernetes Interview & Certification Prep.

Glossary

Capstone — an integrative final project that exercises every concept from the course at once, end to end.
Production-shaped — a cluster/app carrying the same structural concerns as production (quotas, probes, autoscaling, network policy, policy, GitOps, observability) even when it runs locally.
HPA (HorizontalPodAutoscaler) — controller that adds/removes Pod replicas to hold a target metric (here, CPU utilisation).
NetworkPolicy — namespaced object defining allowed Pod traffic; additive allow-lists, so you restrict by selecting Pods and permitting nothing (default-deny).
Default-deny — a baseline policy that blocks all ingress/egress for a namespace, after which only explicit allows let traffic through.
Kyverno — a policy-as-code engine that validates, mutates, and generates Kubernetes resources at admission time using YAML policies.
GitOps — operating model where Git is the source of truth and a controller (Argo CD) continuously reconciles the cluster to match it.
Argo CD — a GitOps controller that syncs cluster state from a Git repo, with drift detection, sync status, and self-heal.
Ingress / Gateway API — the cluster’s HTTP front door; Ingress is the classic single object, Gateway API splits Gateway (platform) from HTTPRoute (app team).
metrics-server — lightweight aggregator that supplies CPU/memory metrics for kubectl top and the HPA.

Next steps

You shipped it. The natural next move is to turn this hands-on confidence into interview answers and a certification plan: continue to Kubernetes Interview & Certification Prep: KCNA / CKAD / CKA / CKS Roadmap.

Then go deeper on each pillar you just touched:

GitOps at Scale with Argo CD: App-of-Apps, ApplicationSets & Progressive Delivery — grow the single Application into a fleet.
Kubernetes Autoscaling in Depth: HPA, KEDA & Karpenter — add event-driven scaling and node autoscaling.
Designing Zero-Trust Pod Networking: Default-Deny & Cilium L7 Rules — enforce policy with an L7-aware CNI.
Policy-as-Code with Kyverno: image signing, limits & Pod Security — extend the guardrails to full supply-chain enforcement.
Enterprise Architecture: AWS EKS Microservices — see this exact shape on a real managed cloud cluster.