Kubernetes CRDs, Controllers & the Operator Pattern, In Depth (Fundamentals)

Kubernetes ships with dozens of built-in object kinds: Pods, Deployments, Services, ConfigMaps and so on. For most of what you deploy, that vocabulary is enough. But sooner or later you hit a wall: you want to express “a three-node PostgreSQL cluster with automated failover and nightly backups” as a single object, not as a hand-assembled pile of StatefulSets, Services, Secrets and CronJobs that you then have to babysit. The remarkable thing about Kubernetes is that you can teach it that vocabulary. You add a new kind with a CustomResourceDefinition (CRD), and you write a controller that knows how to make that kind real and keep it real. A CRD plus a purpose-built controller is what the industry calls an operator.

This lesson is the conceptual foundation for that whole world. We will go deep on three load-bearing ideas — the CRD (how the API server learns a new type, with versions, schemas, validation, subresources and conversion), the reconciliation loop (the control-theory pattern every controller obeys), and the operator pattern (encoding human operational knowledge as software) — then survey operator capability levels and the build options (Kubebuilder, Operator SDK, controller-runtime) at a level that lets you choose well. By the end you will be able to read any operator’s source and CRDs and know exactly what each piece is doing, and decide when a CRD or operator is the right tool versus overkill. Writing the Go code itself is the subject of the companion build-it-yourself guide; here we build the mental model that makes that code obvious.

Learning objectives

By the end of this lesson you will be able to:

Prerequisites & where this fits

You should be comfortable with the Kubernetes object model — Pods, Deployments, Services, labels and kubectl apply — and have used kubectl get, describe and -o yaml. It helps to understand the control-plane architecture (API server, etcd, controllers) and the RBAC and ServiceAccount model, because a controller authenticates as a ServiceAccount and needs precise permissions. This lesson sits in the Architecture track of the Kubernetes Zero-to-Hero course, after the workload and admission-control lessons and before the networking internals. It is the bridge from using Kubernetes to extending it: everything from cert-manager to Prometheus Operator to the cloud providers’ database services is built on the patterns here.

Core concepts: declarative APIs and active reconciliation

Two ideas underpin everything in this lesson.

Kubernetes is a declarative API with active controllers. You do not tell Kubernetes how to create a Deployment’s Pods step by step; you apply an object describing the desired state (spec), the API server validates and persists it to etcd, and a controller running in a loop notices the object and works to make reality match. The API server itself is, deliberately, mostly a sophisticated CRUD store with validation, authentication, authorisation and admission — it stores objects and notifies watchers. The intelligence lives in controllers. This separation is the single most important architectural fact about Kubernetes, and it is exactly what makes extension possible: if you can add an object kind and add a controller, you have added a first-class feature.

Custom resources extend the data model; controllers extend the behaviour. A custom resource (CR) is an instance of a kind you defined — say, a Cache or a PostgresCluster. By itself a CR is inert data: creating one just stores YAML in etcd and gives you a typed, validated, RBAC-controlled, kubectl-native object. Nothing happens until a controller watches that kind and acts. So the two halves are orthogonal and you can use them independently:

| You have… | You get… | Typical use |
| --- | --- | --- |
| CRD only (no controller) | A validated, versioned, RBAC-able object you can kubectl get/apply and watch | Config you consume elsewhere; data other tools read (e.g. cert-manager’s Certificate before its controller acts) |
| Controller only (on built-in types) | Active automation over existing kinds | A controller that labels every new Namespace, or syncs Secrets |
| CRD + controller = operator | A new declarative kind that does something | Databases, message queues, certificate management, backups |

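Some vocabulary, because it is easy to muddle: a CustomResourceDefinition (CRD) is the registration and schema of a new kind, one per kind; a custom resource (CR) is an instance of that kind, many per CRD; a controller is the loop that watches objects and reconciles them; and an operator is a controller packaged together with the CRDs it manages.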

CustomResourceDefinitions: teaching the API server a new kind

A CRD is how you register a new API. Here is a minimal but realistic one for a fictional Cache kind; we will dissect every field.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: caches.cache.example.com          # MUST be <plural>.<group>
spec:
  group: cache.example.com                # the API group
  scope: Namespaced                        # or Cluster
  names:
    kind: Cache                            # PascalCase, used in YAML "kind:"
    plural: caches                         # lowercase, used in the REST path and name
    singular: cache
    shortNames: ["ca"]                     # kubectl get ca
    categories: ["all"]                    # kubectl get all includes it
  versions:
    - name: v1alpha1
      served: true                          # reachable over the API right now
      storage: true                         # THE version persisted to etcd (exactly one)
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required: ["size"]
              properties:
                size:
                  type: integer
                  minimum: 1
                  maximum: 9
                  default: 3
                engine:
                  type: string
                  enum: ["redis", "valkey"]
                  default: "redis"
            status:
              type: object
              properties:
                readyReplicas:
                  type: integer
                conditions:
                  type: array
                  items:
                    type: object
                    properties:
                      type: { type: string }
                      status: { type: string }
                      reason: { type: string }
                      message: { type: string }
      subresources:
        status: {}                          # carve out /status
        scale:                              # make `kubectl scale` work
          specReplicasPath: .spec.size
          statusReplicasPath: .status.readyReplicas
      additionalPrinterColumns:
        - name: Engine
          type: string
          jsonPath: .spec.engine
        - name: Desired
          type: integer
          jsonPath: .spec.size
        - name: Ready
          type: integer
          jsonPath: .status.readyReplicas
        - name: Age
          type: date
          jsonPath: .metadata.creationTimestamp

Apply that and within a second or two the API server serves a brand-new REST endpoint at /apis/cache.example.com/v1alpha1/namespaces/<ns>/caches, kubectl knows the Cache kind, and kubectl get caches works. No restart, no recompiling the API server. That is the magic of CRDs.

Group, version, kind and scope

Every Kubernetes object is addressed by a GVK (Group, Version, Kind) and stored under a GVR (Group, Version, Resource, the resource being the lowercase plural). Choosing these well matters because they are effectively permanent once people depend on them.

| Field | What it is | Guidance |
| --- | --- | --- |
| group | A DNS-style namespace for your APIs, e.g. cache.example.com. Keeps your kinds from clashing with core (""/v1) or anyone else’s. | Use a domain you control. One group can hold many kinds. |
| versions[].name | API version: v1alpha1 → v1beta1 → v1, following the Kubernetes deprecation policy. | Start at v1alpha1 (no stability promise), graduate as the schema settles. |
| names.kind | PascalCase type used in kind: in YAML. | Singular noun: Cache, PostgresCluster. |
| names.plural | Lowercase, used in the URL and as the first part of metadata.name. | caches. |
| scope | Namespaced (object lives in a namespace; the common case) or Cluster (global, like Nodes or StorageClasses). | Default to Namespaced unless the concept is genuinely cluster-wide. You cannot change scope after creation. |

The CRD’s own metadata.name is not free-form: it must be exactly <plural>.<group> (caches.cache.example.com). Get this wrong and the CRD is rejected.
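
To see how these fields surface when someone actually uses the kind, here is a sketch of a Cache instance annotated with where each part comes from. The object name web-cache and the default namespace are illustrative choices, not anything the CRD requires.

```yaml
apiVersion: cache.example.com/v1alpha1   # <group>/<version>: the G and V of the GVK
kind: Cache                              # names.kind: the K of the GVK
metadata:
  name: web-cache                        # illustrative name
  namespace: default                     # present because scope is Namespaced
spec:
  size: 3
# REST path (the GVR uses names.plural as the resource):
#   /apis/cache.example.com/v1alpha1/namespaces/default/caches/web-cache
```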

The OpenAPI v3 schema, structural schemas and validation

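The schema.openAPIV3Schema is where a CRD goes from “a bag of JSON” to “a real, validated type”. Since apiextensions.k8s.io/v1 (Kubernetes 1.16) a schema is required, and it must be structural: a well-formedness contract that pruning, defaulting and conversion in the API server all rely on. A structural schema means, roughly, that every node in the schema (the root, each property, each array’s items) declares a type; that any field constrained inside allOf, anyOf, oneOf or not is also declared in the ordinary properties tree outside them; and that type, default, description and nullable are never set inside those combinators.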

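Why “structural” matters in practice: it is the prerequisite for pruning (unknown fields are stripped on write, so a typo like repicas: is never stored), for defaulting, and for conversion. Be aware that recent kubectl versions request strict server-side field validation by default, so the typo is usually rejected with an “unknown field” error before you ever see pruning; clients that do not ask for that validation see the field silently dropped. With x-kubernetes-preserve-unknown-fields: true you can opt a subtree out of pruning to store arbitrary JSON, but you lose validation there, so use it sparingly.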
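
As an illustration of that opt-out, the schema fragment below keeps size validated and its unknown siblings pruned, while letting one subtree hold arbitrary JSON. The engineConfig field is hypothetical and is not part of the Cache CRD shown earlier.

```yaml
spec:
  type: object
  properties:
    size:
      type: integer          # still validated; unknown siblings of size are pruned
    engineConfig:            # hypothetical free-form field
      type: object
      x-kubernetes-preserve-unknown-fields: true   # contents stored as-is, unvalidated
```

Scoping the opt-out to a single named field, rather than to the whole spec, keeps the rest of the object under schema control.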

The schema doubles as your validation layer. Common building blocks:

| Construct | Effect | Example |
| --- | --- | --- |
| required: [...] | Field must be present | required: ["size"] |
| minimum/maximum, minLength/maxLength | Numeric / string bounds | minimum: 1 |
| enum: [...] | Closed set of allowed values | enum: ["redis", "valkey"] |
| pattern | Regex constraint on a string | pattern: '^[a-z0-9-]+$' |
| format | Semantic format hint (date-time, email, …) | format: date-time |
| default | Value injected when the field is omitted (defaulting) | default: 3 |
| x-kubernetes-validations | CEL expression rules (1.25+, GA 1.29) for cross-field logic | see below |

The last one deserves emphasis. CEL validation rules let you express constraints that plain OpenAPI cannot — relationships between fields, immutability, list semantics — inside the CRD, evaluated by the API server with no webhook to run:

type: object
x-kubernetes-validations:
  - rule: "self.maxSize >= self.size"
    message: "maxSize must be >= size"
  - rule: "self.engine != 'redis' || self.size <= 6"
    message: "redis engine supports at most 6 nodes"
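properties:
  # Defaults keep every field present at validation time. Reading an absent
  # optional field in a CEL rule is an error unless guarded with has(self.x).
  size:    { type: integer, default: 3 }
  maxSize: { type: integer, default: 9 }
  engine:  { type: string, default: "redis" }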

You can also enforce transition rules (compare new value to old with oldSelf) for immutability — e.g. rule: "self == oldSelf" with x-kubernetes-validations on a field that must never change after creation. CEL has largely removed the need for a validating webhook for ordinary structural and cross-field checks, which is a big simplification — webhooks are an extra deployment, an extra failure mode, and latency on every write.
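
A transition rule attached directly to a field might look like the sketch below. It reuses the engine field from the Cache schema; the message text is an illustrative choice.

```yaml
engine:
  type: string
  enum: ["redis", "valkey"]
  default: "redis"
  x-kubernetes-validations:
    # A rule that mentions oldSelf is a transition rule: it is evaluated only
    # on updates, where both an old and a new value exist.
    - rule: "self == oldSelf"
      message: "engine is immutable after creation"
```

Because the rule is skipped on create, the first write is free to pick any allowed engine; only later changes are refused.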

Defaulting happens server-side from default: values in the structural schema: omit engine and the stored object comes back with engine: redis. Defaults apply on read for previously-stored objects too, which is how you can safely add a new optional field with a default to an existing CRD.

Subresources: the spec/status split, and scale

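By default a custom resource is one document, and whoever can update the object can write every field, including status. That is the wrong model: spec is the user’s desired state; status is the controller’s report of observed reality. They have different writers and should have different permissions. The status subresource enforces that split. When subresources.status: {} is set, three things change: the object gains a dedicated /status endpoint that can be granted separately in RBAC; writes to the main resource ignore changes to status, and writes to /status ignore everything except status, so users and the controller cannot clobber each other; and metadata.generation increments only on spec changes, which is what makes the observedGeneration pattern work.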
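
To make the different-writers point concrete, the split can be mirrored in RBAC once the status subresource exists. The following is a minimal sketch: the Role names and the default namespace are hypothetical, bindings are omitted, and a real controller usually needs further rules for the child objects it manages.

```yaml
# Humans and CI: may edit Cache objects, but have no access to caches/status.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cache-editor            # hypothetical name
  namespace: default
rules:
  - apiGroups: ["cache.example.com"]
    resources: ["caches"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# Controller: reads Cache objects and writes only their status.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cache-controller        # hypothetical name
  namespace: default
rules:
  - apiGroups: ["cache.example.com"]
    resources: ["caches"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["cache.example.com"]
    resources: ["caches/status"]
    verbs: ["get", "update", "patch"]
```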

The scale subresource wires your CRD into the generic scaling machinery. By mapping specReplicasPath and statusReplicasPath you make kubectl scale cache/foo --replicas=5 work. Add labelSelectorPath as well and your CRD becomes a valid target for a HorizontalPodAutoscaler: the HPA scales anything that implements /scale, but it needs the selector to find the Pods whose metrics it reads. A CRD with a complete scale subresource can therefore be autoscaled like a Deployment, a powerful and often-overlooked feature.
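
An autoscaler pointed at a custom resource could look like the sketch below. It rests on several assumptions that the Cache CRD in this lesson does not yet satisfy: the scale subresource also sets labelSelectorPath (required for HPA use), a controller is creating Pods that match that selector, and a metrics source such as metrics-server is installed. The 70% CPU target is an arbitrary illustrative value.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-cache
  namespace: default
spec:
  scaleTargetRef:                 # any kind that serves /scale can be named here
    apiVersion: cache.example.com/v1alpha1
    kind: Cache
    name: web-cache
  minReplicas: 1
  maxReplicas: 9                  # matches the schema's maximum for spec.size
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # illustrative threshold
```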

additionalPrinterColumns is pure quality of life but matters for adoption: it controls what kubectl get caches shows in its table (beyond NAME/AGE). Surface the fields an operator actually cares about — desired size, ready replicas, phase — and the resource feels native.

Multiple versions and conversion webhooks

APIs evolve. You might rename a field, restructure spec, or promote v1alpha1 to v1. The hard constraint is etcd: exactly one version has storage: true, and that is the form every object is persisted as. You may serve several versions simultaneously so clients on different versions all work. The question is what happens when a client reads a stored object in a different version than it was written.
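
In the CRD this shows up as two entries under versions with different flags. The fragment below is a sketch only: a v1 version of Cache is hypothetical, and the per-version schemas are left out for brevity.

```yaml
versions:
  - name: v1alpha1
    served: true       # old clients keep working
    storage: false     # no longer the persisted form
    # schema: per-version schema omitted for brevity
  - name: v1           # hypothetical newer version
    served: true
    storage: true      # exactly one version carries this flag
    # schema: per-version schema omitted for brevity
```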

There are two conversion strategies:

| Strategy | How it converts between versions | When to use |
| --- | --- | --- |
| None (default) | No transformation: the same object is returned with only the apiVersion string swapped. | Versions are structurally identical (e.g. you only added optional, defaulted fields). |
| Webhook | The API server calls your conversion webhook to translate between versions on every read/write that crosses versions. | Any time fields were renamed, moved, split or merged between versions. |

With None, all served versions must be schema-compatible, because there is no code to reconcile differences — you are effectively just relabelling. The moment a field changes shape between v1alpha1 and v1, you need a conversion webhook: a service the API server calls, handing it a list of objects and the desired version, expecting them back converted. This makes the multi-version contract real — a v1 client and a v1alpha1 client can both read the same underlying object, each seeing it correctly in their version, with your webhook doing the translation in the middle.
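
Switching a CRD to webhook conversion is itself declarative. Here is a sketch of the relevant stanza, assuming a conversion service you have deployed yourself; the namespace, service name and path are hypothetical, and the CA bundle is a placeholder.

```yaml
spec:
  conversion:
    strategy: Webhook
    webhook:
      conversionReviewVersions: ["v1"]     # ConversionReview versions the webhook understands
      clientConfig:
        service:
          namespace: cache-system          # hypothetical
          name: cache-conversion-webhook   # hypothetical
          path: /convert                   # hypothetical
          port: 443
        caBundle: <base64-encoded CA certificate>   # placeholder
```

The webhook must be reachable whenever objects are read or written across versions, so its availability becomes part of the API's availability.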

A subtle but important operational point: changing which version has storage: true does not rewrite anything. Objects already in etcd stay in the old version until something writes them again, so you cannot simply delete the old version from the CRD. To retire a stored version you must first run a storage-version migration, re-writing all existing objects in the new storage version (e.g. with the storage-version-migrator, or a kubectl get ... -o yaml | kubectl apply sweep), and only then drop the old version. The CRD tracks status.storedVersions, the list of every version that has been used as the storage version, so you know which versions may still exist on disk; the API server refuses to remove a version that is still listed there. Versioning, conversion webhooks and storage migration in full are the subject of the aggregated API server and conversion deep dive; the model above is what you need to design a CRD that can evolve.
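
For orientation, this is roughly what status.storedVersions looks like part-way through such a migration. The excerpt is a sketch: it assumes a hypothetical v1 has just become the storage version of the Cache CRD.

```yaml
# Excerpt of: kubectl get crd caches.cache.example.com -o yaml
status:
  storedVersions:
    - v1alpha1   # objects persisted in this version may still be in etcd
    - v1         # hypothetical current storage version
```

Once every object has been re-written, v1alpha1 is removed from this list by updating the CRD's status, and only after that can the version be deleted from spec.versions.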

CRDs vs the aggregation layer (the other extension path)

CRDs are the easy, declarative way to add a kind: the API server stores and serves it for you. There is a second mechanism, the aggregation layer (an extension API server), where you run your own API server binary that the kube-apiserver proxies to. You reach for aggregation when you need custom storage (not etcd), protobuf, arbitrary subresources, or special admission/validation that CRDs cannot express. For the vast majority of operators, a CRD is the right and far simpler choice; know the aggregation layer exists so you can recognise when you have outgrown CRDs.

The controller and the reconciliation loop

A CRD gives you a noun. A controller gives it a verb. Every Kubernetes controller — built-in or custom — runs the same loop, and internalising it is the single most valuable thing in this lesson.

Watch, diff, act — the control loop

A controller is a closed control loop, exactly like a thermostat. The thermostat has a desired temperature (the dial) and an actual temperature (the sensor); it acts (heat/cool) to close the gap, forever. A controller has:

  1. Desired state — the spec of the objects it watches (your Cache’s size: 3).
  2. Actual state — what really exists in the cluster (how many Pods are actually running).
  3. Reconcile — observe both, compute the diff, take the minimum action to close it, then report what it observed into status.
         +-----------------------------------------+
         |              Reconcile(req)             |
 watch   |  1. GET the object (desired state)      |
 event ->|  2. LIST/GET owned resources (actual)   |--> create/update/delete
         |  3. diff desired vs actual              |    children to converge
         |  4. act to converge                     |
         |  5. write observed state -> .status     |
         +-----------------------------------------+
                          ^        |
                          |        v  (requeue: after error, or on a timer)
                          +--------+
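
Step 5 of the loop is easiest to picture as the document a controller leaves behind. The status below is a sketch of what a Cache controller might write after converging on three replicas. The reason and message strings are illustrative, and observedGeneration is an assumption: it is not declared in the schema shown earlier, so it would have to be added there or it would be pruned.

```yaml
status:
  observedGeneration: 2        # the metadata.generation this report was computed from
  readyReplicas: 3
  conditions:
    - type: Ready
      status: "True"
      reason: AllReplicasReady            # illustrative
      message: "3 of 3 replicas are ready"
```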

The reconcile function is deliberately given almost no input — typically just the namespace/name of the object that needs attention (a reconcile.Request). It is not told what changed or why. That design choice is the heart of the next idea.

Level-triggered, not edge-triggered

This is the concept interviewers probe and beginners get wrong. Edge-triggered thinking is “on create, do X; on update, do Y; on delete, do Z” — reacting to transitions (the edges). It is a trap in distributed systems: events get coalesced (two quick edits may surface as one), dropped (the controller was down when it happened), delivered out of order, or replayed (informer resyncs). If your logic depends on seeing every transition exactly once, it will eventually corrupt state.

Level-triggered logic ignores the transition entirely and reacts to the current level — the present desired and actual state. The reconcile function fetches the object fresh every time and asks “given how the world is right now, what should I do?” It produces the same correct result whether it is the first invocation or the thousandth, whether it missed ten events or none. The event that triggered the reconcile is just a hint that says “go look at this object”; the controller never trusts the content of the event. This is why reconcile takes only a name: it forces you to write level-triggered code.

A correct controller therefore tolerates being called when nothing changed (it diffs, finds no gap, does nothing) and tolerates being called after missing events (it diffs against reality and catches up). That robustness is the entire payoff.

Idempotency

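Because reconcile runs an unpredictable number of times (on every relevant event, on periodic resyncs, on requeues after errors) it must be idempotent: running it N times must leave the world in the same state as running it once. Practical consequences: use create-or-update (or server-side apply) rather than a blind Create, which fails with AlreadyExists on the second pass; drive towards a target (“there should be three replicas”) rather than applying deltas (“add one replica”); recompute status from what you observe on each pass rather than accumulating it; and set ownerReferences on children so that cleanup is declarative.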

A useful litmus test: if you ran your reconcile in a tight loop forever with no spec changes, the cluster should reach a fixed point and stop changing. If it keeps churning (creating/deleting, flapping status), it is not idempotent.
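
Part of keeping reconcile idempotent is letting the garbage collector handle cleanup instead of hand-written delete logic. The sketch below shows the metadata a controller might stamp on a child object so that deleting the Cache cascades to it. Using a StatefulSet as the child is an assumption for illustration, and the uid is a placeholder.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web-cache
  namespace: default            # a namespaced owner must be in the same namespace
  ownerReferences:
    - apiVersion: cache.example.com/v1alpha1
      kind: Cache
      name: web-cache
      uid: 00000000-0000-0000-0000-000000000000   # placeholder; copy the owner's metadata.uid
      controller: true          # marks the Cache as the managing controller
      blockOwnerDeletion: true
# spec omitted
```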

Informers, the cache, and work queues

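Naïvely, “watch the API and react” sounds like every controller polling the API server constantly, which would melt the control plane at scale. The real machinery, provided by client-go and wrapped by controller-runtime, is built for efficiency. An informer opens one long-lived watch per kind and keeps a local in-memory cache up to date from it, so reads during reconcile are served from the cache rather than hitting the API server. Event handlers do no work themselves; they only put the object’s namespace/name key on a rate-limited work queue, which de-duplicates keys, so a burst of events for one object collapses into a single pending reconcile. Workers pop keys from the queue and call reconcile, and the queue ensures the same key is never processed by two workers at once.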

The reconcile contract closes the loop: you return either success (forget the key), an error (the queue re-enqueues it with backoff), or a requeue request (re-enqueue now or after a delay). Returning a RequeueAfter duration is how a controller polls something external on a timer without busy-looping. This watch → cache → queue → reconcile pipeline is identical whether you are writing the Deployment controller in Kubernetes itself or your own Cache operator.

The operator pattern

Now assemble the pieces. An operator is a CRD (the desired-state API) plus a custom controller (the reconciliation logic), packaged together to encode the operational knowledge of running a specific application or piece of infrastructure. The name is the whole idea: it automates what a skilled human operator would do.

Think about what a database administrator actually knows: how to bootstrap a cluster, elect a primary, add a replica, take a consistent backup, restore to a point in time, perform a rolling minor-version upgrade without downtime, fail over when the primary dies, and resize storage safely. None of that is expressible in a Deployment. An operator captures that runbook as code behind a single declarative resource:

apiVersion: postgres.example.com/v1
kind: PostgresCluster
metadata: { name: orders-db }
spec:
  instances: 3
  version: "16"
  storage: { size: 100Gi, class: fast-ssd }
  backup: { schedule: "0 2 * * *", retention: 14 }

A human reads that and understands the intent. The operator’s controller reads it and executes the runbook continuously: it provisions the StatefulSet, primes replication, configures backups, watches for a failed primary and promotes a replica, and reports health in status. The operator is the DBA who never sleeps and never forgets a step. This is the difference between a package (a Helm chart that installs PostgreSQL once and then walks away) and an operator (which operates PostgreSQL forever). The stateful-PostgreSQL lesson shows a production operator doing exactly this.

Why this pattern won: it reuses everything you already know. Operators get the declarative API, RBAC, audit logging, kubectl, GitOps compatibility, watch semantics and reconciliation for free, because a CRD is a Kubernetes object and a controller is a Kubernetes controller. You are not building a sidecar control system; you are extending the one that is already running.

Operator capability levels

Not all operators are equal. The community Operator Capability Levels model (popularised by OperatorHub/Operator SDK) describes a five-rung maturity ladder. It is the standard vocabulary for “how much does this operator actually do?” and a good design checklist.

| Level | Name | What the operator can do |
| --- | --- | --- |
| 1 | Basic Install | Provision the application and its components from the CR; expose configuration through the spec. The “Helm-chart-equivalent” baseline. |
| 2 | Seamless Upgrades | Upgrade the managed app (and itself) gracefully: minor/patch version bumps, rolling and orchestrated, without manual steps. |
| 3 | Full Lifecycle | Day-2 operations: backups, restores, scaling, failure recovery, complex reconfiguration; the runbook automated. |
| 4 | Deep Insights | Expose metrics, alerts, logs and workload analysis; surface health into status and to monitoring. |
| 5 | Auto Pilot | Autonomous behaviour: auto-scaling, auto-tuning, auto-healing, anomaly detection, capacity right-sizing, with minimal human input. |

Many production operators sit at around level 3; reaching level 5 is rare and usually unnecessary. The ladder is useful in two ways: when evaluating a third-party operator (a level-1 operator that “installs Kafka” but cannot upgrade or back it up may be worse than a good Helm chart), and when building one (ship level 1 first, then climb deliberately).

When a CRD or operator is the right tool — and when it isn’t

Operators are powerful and seductive, and the most common mistake is reaching for one too early. Use this decision guide.

A CRD (with or without a controller) is a good fit when:

A full operator (CRD + controller) is justified when:

Prefer a simpler tool when:

The honest trade-off: an operator is software you now own and run — it has its own bugs, RBAC, upgrade path, and a blast radius that can span every instance it manages. A buggy reconcile loop can fight itself or thrash a fleet. Add that operational cost to one side of the scale before you build.

Build options: Kubebuilder, Operator SDK and controller-runtime

You do not write the informer/work-queue plumbing by hand. Three layers of tooling sit on top of client-go, and they are complementary rather than competing.

| Tool | What it is | Best for |
| --- | --- | --- |
| controller-runtime | The Go library under everything else: Manager, Reconciler interface, clients with built-in caching, builder API for watches/owns, leader election, webhook server, metrics. | The foundation; you import it regardless of scaffolder. |
| Kubebuilder | A scaffolding CLI + project layout that generates CRD types, controllers, RBAC markers, webhooks, Makefile, CRD/manifest generation (controller-gen) and envtest integration tests, all on top of controller-runtime. | Go-native operators; the de facto standard for hand-written controllers. |
| Operator SDK | A superset that wraps Kubebuilder for Go and adds Helm-based and Ansible-based operators (no Go required), plus Operator Lifecycle Manager (OLM) bundle packaging and scorecard testing. | Teams wanting Helm/Ansible operators, or targeting OperatorHub/OLM distribution. |

How to choose, briefly:

A note on the declarative escape hatch: not everything needs Go. Helm-based and Ansible-based operators (via Operator SDK) cover a large slice of level-1/level-2 needs with zero controller code, by reconciling a CR into a chart or playbook on a timer. They cannot express subtle level-3 logic (a custom failover algorithm), but for “manage this app’s install and upgrades declaratively,” they are dramatically less effort than hand-written Go.
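
For a sense of how little a Helm-based operator needs, its central piece of configuration is a watches file that maps a kind to a chart. The sketch below assumes the Cache kind from this lesson and a hypothetical chart path.

```yaml
# watches.yaml: tells the Helm-based operator which kind to reconcile with which chart
- group: cache.example.com
  version: v1alpha1
  kind: Cache
  chart: helm-charts/cache     # hypothetical chart path inside the operator image
```

The CR's spec is passed to the chart as its values, so the chart's values.yaml effectively becomes the API surface.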

Figure: Kubernetes CRDs, controllers & the operator pattern

The diagram traces the full path: a user applies a custom resource; the API server validates it against the CRD’s structural schema, defaults and CEL rules, and persists the storage version to etcd; the controller’s informer receives the watch event and enqueues the key on the work queue; a worker runs reconcile, which diffs desired (spec) against actual (the owned Deployment/Service/Secret), creates or updates children to converge, and writes observed reality back to the status subresource, looping forever.

Hands-on lab

Free and local. Use kind, minikube or k3d — any cluster works. We will create a CRD with a structural schema, validation, defaulting, subresources and printer columns, prove the API server enforces and defaults it, exercise the status and scale subresources, then clean up. No operator code is required to see the CRD machinery work.

# Create a local cluster (pick one)
kind create cluster --name crd-lab          # or: minikube start  /  k3d cluster create crd-lab
kubectl get nodes

1. Install a CRD

cat <<'EOF' | kubectl apply -f -
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: caches.cache.example.com
spec:
  group: cache.example.com
  scope: Namespaced
  names:
    kind: Cache
    plural: caches
    singular: cache
    shortNames: ["ca"]
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required: ["size"]
              x-kubernetes-validations:
                - rule: "self.maxSize >= self.size"
                  message: "maxSize must be >= size"
              properties:
                size:    { type: integer, minimum: 1, maximum: 9, default: 3 }
                maxSize: { type: integer, minimum: 1, maximum: 9, default: 9 }
                engine:  { type: string, enum: ["redis", "valkey"], default: "redis" }
            status:
              type: object
              properties:
                readyReplicas: { type: integer }
      subresources:
        status: {}
        scale:
          specReplicasPath: .spec.size
          statusReplicasPath: .status.readyReplicas
      additionalPrinterColumns:
        - { name: Engine,  type: string,  jsonPath: .spec.engine }
        - { name: Desired, type: integer, jsonPath: .spec.size }
        - { name: Ready,   type: integer, jsonPath: .status.readyReplicas }
        - { name: Age,     type: date,    jsonPath: .metadata.creationTimestamp }
EOF

kubectl get crd caches.cache.example.com
kubectl api-resources | grep caches            # the API server now knows "caches"

Expected: the CRD shows up, and api-resources lists caches ca cache.example.com/v1alpha1 true Cache.

2. See validation, defaulting and pruning in action

# (a) A valid, minimal CR: watch defaulting fill in engine and maxSize
kubectl apply -f - <<'EOF'
apiVersion: cache.example.com/v1alpha1
kind: Cache
metadata: { name: web-cache }
spec: { size: 3 }
EOF
kubectl get cache web-cache -o jsonpath='{.spec}{"\n"}'
# -> {"engine":"redis","maxSize":9,"size":3}   (engine & maxSize were defaulted)

# (b) Violate the schema bound -> rejected by the API server, no webhook involved
kubectl apply -f - <<'EOF'
apiVersion: cache.example.com/v1alpha1
kind: Cache
metadata: { name: too-big }
spec: { size: 99 }
EOF
# -> error: spec.size in body should be less than or equal to 9

# (c) Violate the CEL cross-field rule
kubectl apply -f - <<'EOF'
apiVersion: cache.example.com/v1alpha1
kind: Cache
metadata: { name: bad-bounds }
spec: { size: 5, maxSize: 2 }
EOF
# -> error: maxSize must be >= size

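# (d) Pruning: an unknown field is dropped on write.
# Recent kubectl requests strict field validation by default and would reject
# 'repicas' outright, so switch that validation off to observe pruning itself.
kubectl apply --validate=ignore -f - <<'EOF'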
apiVersion: cache.example.com/v1alpha1
kind: Cache
metadata: { name: typo-cache }
spec: { size: 2, repicas: 4 }   # 'repicas' is a typo, not in the schema
EOF
kubectl get cache typo-cache -o jsonpath='{.spec}{"\n"}'
# -> {"engine":"redis","maxSize":9,"size":2}   ('repicas' was pruned away)

This is the whole value of a structural schema: bad input is rejected, omitted fields are defaulted, and unknown fields are pruned — all by the API server, no controller running yet.

3. Exercise the status and scale subresources

# The status subresource: write status WITHOUT touching spec (controllers do this)
kubectl patch cache web-cache --subresource=status --type=merge \
  -p '{"status":{"readyReplicas":3}}'
kubectl get caches            # printer columns show Engine/Desired/Ready/Age in a native table

# The scale subresource: kubectl scale works on a CRD!
kubectl scale cache/web-cache --replicas=5
kubectl get cache web-cache -o jsonpath='{.spec.size}{"\n"}'   # -> 5

# generation vs observedGeneration intuition: spec change bumps generation
kubectl get cache web-cache -o jsonpath='gen={.metadata.generation}{"\n"}'

Expected: status is set without altering spec; kubectl scale changes .spec.size through the scale subresource; and kubectl get caches renders a tidy table from your printer columns — the resource behaves exactly like a built-in.

Cleanup

kubectl delete cache --all
kubectl delete crd caches.cache.example.com      # deleting the CRD removes ALL its CRs
kind delete cluster --name crd-lab               # or: minikube delete / k3d cluster delete crd-lab

Cost note

Entirely free: a local single-node cluster on your laptop, no cloud resources. The only “cost” is a few hundred MB of RAM for the kind/minikube node while the lab runs.

Common mistakes & troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| metadata.name: must be spec.names.plural+"."+spec.group | CRD metadata.name isn’t <plural>.<group> | Rename to e.g. caches.cache.example.com. |
| CR applies but my typo’d field “disappears” (or kubectl rejects it as an unknown field) | Pruning strips fields not in the structural schema; strict field validation rejects them up front | Add the field to the schema or fix the typo; dropping unknown fields is working as intended. |
| Controller’s status writes vanish or fight user edits | Not using the status subresource; status written via the main endpoint | Add subresources.status: {} and write status via /status; split RBAC accordingly. |
| observedGeneration never catches up; generation bumps on every status write | No status subresource, so any change outside metadata (including status) increments generation | Enable the status subresource; only then does generation increment on spec changes alone. |
| Reconcile errors with AlreadyExists on the second pass | Non-idempotent Create instead of create-or-update | Get-then-create-or-update, or server-side apply; drive to target, don’t apply deltas. |
| Child objects aren’t cleaned up when the CR is deleted | Missing ownerReferences on children | Set the CR as owner of each child so the garbage collector cascades the delete. |
| Reads in v1 of a v1alpha1-stored object return wrong/empty fields | conversion: None but the versions differ structurally | Implement a conversion webhook; None only works when versions are schema-compatible. |
| Can’t change scope / a stored version won’t drop | Scope is immutable; old version still in status.storedVersions | Recreate the CRD for scope; run a storage-version migration before removing a served/stored version. |

Best practices

Security notes

Interview & exam questions

  1. What is the difference between a CRD and a CR? A CustomResourceDefinition is the registration/schema of a new kind (one per kind, itself an apiextensions.k8s.io/v1 object). A custom resource is an instance of that kind (many per CRD). The CRD defines the type; CRs are the data.

  2. Define the operator pattern in one sentence. An operator is a custom controller plus the CRDs it manages, packaged to encode the operational knowledge of running a specific application — automating what a skilled human operator would do (install, upgrade, back up, fail over) behind a declarative API.

  3. Explain level-triggered vs edge-triggered reconciliation. Why does Kubernetes choose level-triggered? Edge-triggered reacts to transitions (on-create/on-update events); level-triggered reacts to the current state. Kubernetes chooses level-triggered because events get coalesced, dropped, reordered or replayed in distributed systems, so logic that depends on seeing each transition will corrupt state. Reconcile fetches fresh state and converges, giving the same result regardless of how many events it saw — which is why reconcile receives only a name, not a diff.

  4. Why must a reconcile function be idempotent, and how do you achieve it? Because it runs an unpredictable number of times (events, periodic resyncs, error requeues). Achieve it with create-or-update (or server-side apply) instead of blind Create, by driving to a target rather than applying deltas, by recomputing status convergently, and by using ownerReferences so cleanup is declarative. Litmus test: looping reconcile with no spec change should reach a fixed point and stop.

  5. What problem does the status subresource solve, and what does enabling it change? It separates user-owned spec from controller-owned status. Enabling it gives a dedicated /status endpoint (so RBAC and writes don’t cross), makes main-resource updates ignore status and vice-versa (no clobbering), and makes metadata.generation increment only on spec changes — enabling the observedGeneration pattern.

  6. What does the scale subresource enable beyond kubectl scale? By mapping specReplicasPath/statusReplicasPath, the CRD implements the generic /scale interface, so a HorizontalPodAutoscaler can target and autoscale your custom resource exactly as it would a Deployment.

  7. When do you need a conversion webhook? Whenever you serve multiple versions that are not structurally compatible — fields renamed, moved, split or merged. With conversion: None the API server only swaps the apiVersion string, so it requires schema-compatible versions. A webhook translates objects between versions on every cross-version read/write.

  8. There can be only one storage: true version. How do you safely retire an old version? Make the new version the storage version, then run a storage-version migration (rewrite every stored object so it is persisted in the new version). Once nothing remains on disk in the old form, remove the old version from status.storedVersions, and only then stop serving it and delete it from the CRD. Removing it prematurely orphans stored objects.

  9. What are informers and work queues, and why not just poll the API server? An informer keeps a single watch and a local cache of a kind, so reconciles read from memory and the API server isn’t hammered. A work queue deduplicates and rate-limits keys: handlers enqueue object keys, workers dequeue and reconcile, giving coalescing, exponential backoff and per-key single-worker concurrency. Polling would not scale and would lose the dedup/backoff guarantees.

  10. Walk through the operator capability levels. 1) Basic Install, 2) Seamless Upgrades, 3) Full Lifecycle (backups/restore/scaling/failure-recovery), 4) Deep Insights (metrics/alerts/health), 5) Auto Pilot (auto-scale/tune/heal). Most production operators sit at level 3; level 5 is rare.

  11. When would you choose a Helm chart over an operator? When you only need to install and template an app with configurable values and there are no non-trivial day-2 operations. A chart has no controller to run, secure and maintain; an operator is justified only when ongoing lifecycle automation (failover, upgrades, backups) outweighs that operational cost.

  12. Compare Kubebuilder, Operator SDK and controller-runtime. controller-runtime is the underlying Go library (Manager, Reconciler, cached clients, webhook/leader-election). Kubebuilder scaffolds a Go project on top of it (typed APIs, controller-gen, RBAC markers, envtest). Operator SDK is a superset that wraps Kubebuilder for Go and adds Helm- and Ansible-based operators (no Go) plus OLM bundle/scorecard tooling. Choose Kubebuilder for Go-native operators, Operator SDK for Helm/Ansible or OperatorHub distribution, controller-runtime directly for maximum control.
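
The answers to questions 3 and 9 fit together, and a toy model may make that concrete. This is a simplified sketch under stated assumptions: Queue, Cluster and Drain are hypothetical names, and the queue omits the rate limiting, backoff and in-flight tracking that a real client-go work queue provides. It shows only two things: event handlers enqueue a key rather than a diff, and reconcile reads current state, so a burst of edits collapses into one converging pass.

```go
package main

import "fmt"

// Queue is a toy deduplicating work queue: a key that is already pending
// is not added twice, so bursts of events collapse into one reconcile.
type Queue struct {
	order   []string
	pending map[string]bool
}

func NewQueue() *Queue { return &Queue{pending: map[string]bool{}} }

// Add enqueues a key unless it is already waiting to be processed.
func (q *Queue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get pops the oldest pending key; ok is false when the queue is empty.
func (q *Queue) Get() (key string, ok bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// Cluster holds desired and actual replica counts per object key.
type Cluster struct {
	Desired map[string]int
	Actual  map[string]int
}

// Reconcile receives only a key, reads the current desired state,
// and converges actual to it (level-triggered: no event payload, no diff).
func (c *Cluster) Reconcile(key string) {
	c.Actual[key] = c.Desired[key]
}

// Drain processes every queued key and returns how many reconciles ran.
func Drain(q *Queue, c *Cluster) int {
	n := 0
	for {
		key, ok := q.Get()
		if !ok {
			return n
		}
		c.Reconcile(key)
		n++
	}
}

func main() {
	c := &Cluster{Desired: map[string]int{}, Actual: map[string]int{}}
	q := NewQueue()
	// Three rapid spec edits: the handler only enqueues the key each time.
	for _, replicas := range []int{2, 5, 3} {
		c.Desired["default/cache-a"] = replicas
		q.Add("default/cache-a")
	}
	runs := Drain(q, c)
	fmt.Println(runs, c.Actual["default/cache-a"]) // 1 3
}
```

Three edits produce one reconcile, and the result matches the latest desired state. An edge-triggered handler that applied each intermediate value would depend on seeing all three events in order.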

Quick check

  1. What must a CRD’s metadata.name be, exactly?
  2. A user applies a CR with a field that isn’t in the structural schema. What happens to that field, and why?
  3. Which subresource makes a CRD a valid target for a HorizontalPodAutoscaler?
  4. Your controller is invoked but nothing about the object changed (a periodic resync). What must a correct reconcile do?
  5. You serve v1 and v1alpha1 and they have differently-shaped specs. What conversion strategy do you need?

Answers: 1) <plural>.<group>, e.g. caches.cache.example.com. 2) It is silently pruned (dropped on write), because a structural schema enables pruning of unknown fields. 3) The scale subresource (it implements the generic /scale interface). 4) Diff desired vs actual, find no gap, and do nothing — reconcile is level-triggered and idempotent, so a no-op event is normal. 5) A conversion webhook (conversion: Webhook); None only works for schema-compatible versions.
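
Answer 2 can be illustrated with a small model of pruning. This is a hedged sketch, not the API server's implementation: Schema and Prune are hypothetical names, the schema is reduced to a tree of known field names, and the special handling the real API server gives to apiVersion, kind and metadata is left out.

```go
package main

import "fmt"

// Schema is a toy structural schema: each key is a known field, and a
// non-nil value describes the known fields of a nested object.
type Schema map[string]Schema

// Prune removes every field not declared in the schema, recursing into
// nested objects, and returns the paths of the fields it dropped.
func Prune(obj map[string]any, schema Schema, prefix string) []string {
	var dropped []string
	for name, value := range obj {
		path := prefix + name
		child, known := schema[name]
		if !known {
			delete(obj, name) // deleting during range is safe in Go
			dropped = append(dropped, path)
			continue
		}
		if nested, ok := value.(map[string]any); ok && child != nil {
			dropped = append(dropped, Prune(nested, child, path+".")...)
		}
	}
	return dropped
}

func main() {
	schema := Schema{"spec": Schema{"replicas": nil, "size": nil}}
	cr := map[string]any{
		// "siez" is a deliberate typo for "size".
		"spec": map[string]any{"replicas": 3, "siez": "1Gi"},
	}
	fmt.Println(Prune(cr, schema, "")) // [spec.siez]
	fmt.Println(cr)                    // map[spec:map[replicas:3]]
}
```

The write succeeds and the misspelled field is simply gone from what is stored, which is why a typo tends to surface as a missing value rather than as a validation error.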

Exercise

Design and install a CRD for a WebApp kind in group apps.example.com, namespaced, version v1alpha1, that:

Apply it, then: (a) create a minimal WebApp and confirm defaulting filled replicas/port; (b) prove that replicas: 99 is rejected by the API server; (c) run kubectl scale webapp/<name> --replicas=4 and confirm .spec.replicas changed; (d) use kubectl patch --subresource=status to set availableReplicas and confirm it appears in kubectl get wa. Success: all four behaviours work with no controller running, which proves you understand exactly what the CRD machinery gives you before any reconcile code exists.

Certification mapping

Glossary

Next steps
