Kubernetes CRDs, Controllers & the Operator Pattern, In Depth (Fundamentals)

Kubernetes ships with several dozen built-in object kinds: Pods, Deployments, Services, ConfigMaps and so on. For most of what you deploy, that vocabulary is enough. But sooner or later you hit a wall: you want to express “a three-node PostgreSQL cluster with automated failover and nightly backups” as a single object, not as a hand-assembled pile of StatefulSets, Services, Secrets and CronJobs that you then babysit by hand. The remarkable thing about Kubernetes is that you can teach it that vocabulary. You add a new kind with a CustomResourceDefinition (CRD), and you write a controller that knows how to make that kind real and keep it real. A CRD plus a purpose-built controller is exactly what the industry calls an operator.

This lesson is the conceptual foundation for that whole world. We will go deep on three load-bearing ideas — the CRD (how the API server learns a new type, with versions, schemas, validation, subresources and conversion), the reconciliation loop (the control-theory pattern every controller obeys), and the operator pattern (encoding human operational knowledge as software) — then survey operator capability levels and the build options (Kubebuilder, Operator SDK, controller-runtime) at a level that lets you choose well. By the end you will be able to read any operator’s source and CRDs and know exactly what each piece is doing, and decide when a CRD or operator is the right tool versus overkill. Writing the Go code itself is the subject of the companion build-it-yourself guide; here we build the mental model that makes that code obvious.

Learning objectives

By the end of this lesson you will be able to:

Explain how a CustomResourceDefinition teaches the API server a new kind, and read its group, version, kind, scope and schema.
Write a CRD with an OpenAPI v3 structural schema, including validation rules, defaulting, and kubectl-friendly printer columns.
Describe the status and scale subresources and why the status/spec split matters.
Explain how multiple versions and conversion webhooks let an API evolve without breaking stored objects.
Articulate the reconciliation loop — watch, diff desired vs actual, act — and why it is level-triggered, idempotent and built on informers and work queues.
Define the operator pattern, place an operator on the capability-level maturity scale, and pick between Kubebuilder, Operator SDK and raw controller-runtime.
Judge when a CRD or operator is the right tool — and when a Helm chart or a plain controller is enough.

Prerequisites & where this fits

You should be comfortable with the Kubernetes object model — Pods, Deployments, Services, labels and kubectl apply — and have used kubectl get, describe and -o yaml. It helps to understand the control-plane architecture (API server, etcd, controllers) and the RBAC and ServiceAccount model, because a controller authenticates as a ServiceAccount and needs precise permissions. This lesson sits in the Architecture track of the Kubernetes Zero-to-Hero course, after the workload and admission-control lessons and before the networking internals. It is the bridge from using Kubernetes to extending it: everything from cert-manager to Prometheus Operator to the cloud providers’ database services is built on the patterns here.

Core concepts: declarative APIs and active reconciliation

Two ideas underpin everything in this lesson.

Kubernetes is a declarative API with active controllers. You do not tell Kubernetes how to create a Deployment’s Pods step by step; you apply an object describing the desired state (spec), the API server validates and persists it to etcd, and a controller running in a loop notices the object and works to make reality match. The API server itself is, deliberately, mostly a sophisticated CRUD store with validation, authentication, authorisation and admission — it stores objects and notifies watchers. The intelligence lives in controllers. This separation is the single most important architectural fact about Kubernetes, and it is exactly what makes extension possible: if you can add an object kind and add a controller, you have added a first-class feature.

Custom resources extend the data model; controllers extend the behaviour. A custom resource (CR) is an instance of a kind you defined — say, a Cache or a PostgresCluster. By itself a CR is inert data: creating one just stores YAML in etcd and gives you a typed, validated, RBAC-controlled, kubectl-native object. Nothing happens until a controller watches that kind and acts. So the two halves are orthogonal and you can use them independently:

You have…	You get…	Typical use
CRD only (no controller)	A validated, versioned, RBAC-able object you can `kubectl get`/`apply` and watch	Config you consume elsewhere; data other tools read (e.g. a `FeatureFlag` object that applications read through the API)
Controller only (on built-in types)	Active automation over existing kinds	A controller that labels every new Namespace, or syncs Secrets
CRD + controller = operator	A new declarative kind that does something	Databases, message queues, certificate management, backups

Some vocabulary, because it is easy to muddle:

CRD (CustomResourceDefinition) — the definition (the schema/registration) of a new kind. There is one CRD per kind. It is itself a built-in Kubernetes object (apiextensions.k8s.io/v1).
CR (custom resource) — an instance of that kind. There are many CRs per CRD.
Controller — a program that watches some kinds and drives the world toward their spec.
Operator — a controller (or set of controllers) plus the CRDs it manages, packaged to encode the operational knowledge of running a specific application.

CustomResourceDefinitions: teaching the API server a new kind

A CRD is how you register a new API. Here is a minimal but realistic one for a fictional Cache kind; we will dissect every field.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: caches.cache.example.com          # MUST be <plural>.<group>
spec:
  group: cache.example.com                # the API group
  scope: Namespaced                        # or Cluster
  names:
    kind: Cache                            # PascalCase, used in YAML "kind:"
    plural: caches                         # lowercase, used in the REST path and name
    singular: cache
    shortNames: ["ca"]                     # kubectl get ca
    categories: ["all"]                    # kubectl get all includes it
  versions:
    - name: v1alpha1
      served: true                          # reachable over the API right now
      storage: true                         # THE version persisted to etcd (exactly one)
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required: ["size"]
              properties:
                size:
                  type: integer
                  minimum: 1
                  maximum: 9
                  default: 3
                engine:
                  type: string
                  enum: ["redis", "valkey"]
                  default: "redis"
            status:
              type: object
              properties:
                readyReplicas:
                  type: integer
                conditions:
                  type: array
                  items:
                    type: object
                    properties:
                      type: { type: string }
                      status: { type: string }
                      reason: { type: string }
                      message: { type: string }
      subresources:
        status: {}                          # carve out /status
        scale:                              # make `kubectl scale` work
          specReplicasPath: .spec.size
          statusReplicasPath: .status.readyReplicas
      additionalPrinterColumns:
        - name: Engine
          type: string
          jsonPath: .spec.engine
        - name: Desired
          type: integer
          jsonPath: .spec.size
        - name: Ready
          type: integer
          jsonPath: .status.readyReplicas
        - name: Age
          type: date
          jsonPath: .metadata.creationTimestamp

Apply that and within a second or two the API server serves a brand-new REST endpoint at /apis/cache.example.com/v1alpha1/namespaces/<ns>/caches, kubectl knows the Cache kind, and kubectl get caches works. No restart, no recompiling the API server. That is the magic of CRDs.

Group, version, kind and scope

Every Kubernetes object is addressed by a GVK — Group, Version, Kind — and stored under a GVR — Group, Version, Resource (the resource being the lowercase plural). Choosing these well matters because they are effectively permanent once people depend on them.

Field	What it is	Guidance
`group`	A DNS-style namespace for your APIs, e.g. `cache.example.com`. Keeps your kinds from clashing with core (`""`/`v1`) or anyone else’s.	Use a domain you control. One group can hold many kinds.
`versions[].name`	API version: `v1alpha1` → `v1beta1` → `v1`, following the Kubernetes deprecation policy.	Start at `v1alpha1` (no stability promise), graduate as the schema settles.
`names.kind`	PascalCase type used in `kind:` in YAML.	Singular noun: `Cache`, `PostgresCluster`.
`names.plural`	Lowercase, used in the REST path and as the first part of the CRD’s own `metadata.name`.	`caches`.
`scope`	`Namespaced` (object lives in a namespace; the common case) or `Cluster` (global, like Nodes or StorageClasses).	Default to `Namespaced` unless the concept is genuinely cluster-wide. You cannot change scope after creation.

The CRD’s own metadata.name is not free-form: it must be exactly <plural>.<group> (caches.cache.example.com). Get this wrong and the CRD is rejected.
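As a quick check on the naming rules above, here is a minimal Python sketch that derives the CRD’s required metadata.name and the REST collection path its custom resources are served under. The helper `crd_identity` is hypothetical and not part of any Kubernetes client; it only encodes the <plural>.<group> rule and the /apis/<group>/<version>/... layout described in this section.

```python
def crd_identity(group: str, version: str, plural: str, namespaced: bool = True) -> dict:
    """Derive a CRD's required name and the REST collection path of its CRs."""
    if plural != plural.lower():
        raise ValueError("names.plural must be lowercase")
    crd_name = f"{plural}.{group}"  # the CRD's metadata.name MUST be <plural>.<group>
    if namespaced:
        # "{namespace}" is left as a literal placeholder for the caller to fill in.
        path = f"/apis/{group}/{version}/namespaces/{{namespace}}/{plural}"
    else:
        path = f"/apis/{group}/{version}/{plural}"  # cluster-scoped: no namespace segment
    return {"crd_name": crd_name, "collection_path": path}


ids = crd_identity("cache.example.com", "v1alpha1", "caches")
print(ids["crd_name"])         # caches.cache.example.com
print(ids["collection_path"])  # /apis/cache.example.com/v1alpha1/namespaces/{namespace}/caches
print(ids["collection_path"].format(namespace="default"))
# /apis/cache.example.com/v1alpha1/namespaces/default/caches
```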

The OpenAPI v3 schema, structural schemas and validation

The schema.openAPIV3Schema is where a CRD goes from “a bag of JSON” to “a real, validated type”. Since apiextensions.k8s.io/v1 (Kubernetes 1.16) a schema is required, and it must be structural: a well-formedness contract that the API server, pruning, defaulting and conversion all rely on. A structural schema means, roughly:

Every field has a declared type (object, array, string, integer, number, boolean), specified at the right level (so a property’s type lives on the property, not buried in anyOf/oneOf).
The logical combinators (allOf, anyOf, oneOf, not) only add constraints: every field they mention must also be declared, with its type, outside them, and they may not set type, default or additionalProperties of their own.
At the root, metadata may only constrain metadata.name and metadata.generateName.

Why “structural” matters in practice: it is the prerequisite for pruning, defaulting and conversion. Pruning strips unknown fields on write, so a typo like repicas: is never stored. Clients that request strict field validation get an error instead of silent pruning (recent kubectl versions do this by default with --validate=strict); other clients simply see the field vanish. With x-kubernetes-preserve-unknown-fields: true you can opt a subtree out of pruning to store arbitrary JSON, but you lose validation there, so use it sparingly.
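To make pruning concrete, the following Python sketch is a toy model of what the API server does with a structural schema: it walks the submitted object, keeps only declared fields, and keeps unknown fields only where x-kubernetes-preserve-unknown-fields is set. The `prune` function and the trimmed `SPEC_SCHEMA` are illustrative assumptions; the real implementation also handles additionalProperties maps and always keeps apiVersion, kind and metadata at the root.

```python
import copy


def prune(value, schema: dict):
    """Toy model of pruning: drop fields the structural schema does not declare."""
    keep_unknown = schema.get("x-kubernetes-preserve-unknown-fields", False)
    if isinstance(value, dict) and (schema.get("type") == "object" or keep_unknown):
        props = schema.get("properties", {})
        out = {}
        for key, child in value.items():
            if key in props:
                out[key] = prune(child, props[key])  # declared field: recurse
            elif keep_unknown:
                out[key] = copy.deepcopy(child)      # opted-out subtree: kept, unvalidated
            # anything else is an unknown field and is silently dropped
        return out
    if isinstance(value, list) and schema.get("type") == "array":
        return [prune(item, schema.get("items", {})) for item in value]
    return value


SPEC_SCHEMA = {
    "type": "object",
    "properties": {
        "size": {"type": "integer"},
        "engine": {"type": "string"},
        "extra": {"type": "object", "x-kubernetes-preserve-unknown-fields": True},
    },
}

submitted = {"size": 3, "repicas": 5, "extra": {"anything": {"goes": 1}}}
print(prune(submitted, SPEC_SCHEMA))
# {'size': 3, 'extra': {'anything': {'goes': 1}}}
```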

The schema doubles as your validation layer. Common building blocks:

Construct	Effect	Example
`required: [...]`	Field must be present	`required: ["size"]`
`minimum`/`maximum`, `minLength`/`maxLength`	Numeric / string bounds	`minimum: 1`
`enum: [...]`	Closed set of allowed values	`enum: ["redis", "valkey"]`
`pattern`	Regex constraint on a string	`pattern: '^[a-z0-9-]+$'`
`format`	Semantic format hint (`date-time`, `email`, …)	`format: date-time`
`default`	Value injected when the field is omitted (defaulting)	`default: 3`
`x-kubernetes-validations`	CEL expression rules (1.25+, GA 1.29) for cross-field logic	see below

The last one deserves emphasis. CEL validation rules let you express constraints that plain OpenAPI cannot — relationships between fields, immutability, list semantics — inside the CRD, evaluated by the API server with no webhook to run:

type: object                               # schema of .spec
required: ["size", "maxSize", "engine"]    # the rules read all three; a missing field makes CEL error
x-kubernetes-validations:
  - rule: "self.maxSize >= self.size"
    message: "maxSize must be >= size"
  - rule: "self.engine != 'redis' || self.size <= 6"
    message: "redis engine supports at most 6 nodes"
properties:
  size:    { type: integer }
  maxSize: { type: integer }
  engine:  { type: string }

You can also enforce transition rules (compare new value to old with oldSelf) for immutability — e.g. rule: "self == oldSelf" with x-kubernetes-validations on a field that must never change after creation. CEL has largely removed the need for a validating webhook for ordinary structural and cross-field checks, which is a big simplification — webhooks are an extra deployment, an extra failure mode, and latency on every write.
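The difference between ordinary rules and transition rules can be modelled in a few lines. This Python sketch is an assumption-laden stand-in: real rules are CEL strings compiled and evaluated inside the API server, and the lambdas here merely mirror the two rules from the snippet above plus a hypothetical `self.engine == oldSelf.engine` immutability rule. What it shows is that transition rules (anything mentioning oldSelf) are only evaluated on updates, when an old object exists.

```python
from typing import Callable, Optional

# Python predicates mirroring the CEL rules above (hypothetical stand-ins only).
RULES: list[tuple[str, Callable[[dict], bool], str]] = [
    ("self.maxSize >= self.size",
     lambda s: s["maxSize"] >= s["size"],
     "maxSize must be >= size"),
    ("self.engine != 'redis' || self.size <= 6",
     lambda s: s["engine"] != "redis" or s["size"] <= 6,
     "redis engine supports at most 6 nodes"),
]

# A transition rule mentions oldSelf, so its predicate also receives the old object.
TRANSITION_RULES: list[tuple[str, Callable[[dict, dict], bool], str]] = [
    ("self.engine == oldSelf.engine",
     lambda s, old: s["engine"] == old["engine"],
     "engine is immutable"),
]


def validate(new: dict, old: Optional[dict] = None) -> list[str]:
    """Return the message of every failing rule (an empty list means admitted)."""
    errors = [msg for _, ok, msg in RULES if not ok(new)]
    if old is not None:  # transition rules only run on update, when oldSelf exists
        errors += [msg for _, ok, msg in TRANSITION_RULES if not ok(new, old)]
    return errors


spec = {"size": 3, "maxSize": 9, "engine": "redis"}
print(validate(spec))                                    # [] (create: no oldSelf)
print(validate({**spec, "size": 7}))                     # ['redis engine supports at most 6 nodes']
print(validate({**spec, "engine": "valkey"}, old=spec))  # ['engine is immutable']
```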

Defaulting happens server-side from default: values in the structural schema: omit engine and the stored object comes back with engine: redis. Defaults apply on read for previously-stored objects too, which is how you can safely add a new optional field with a default to an existing CRD.
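A rough Python model of this defaulting behaviour follows. `apply_defaults` is a hypothetical helper, not the API server’s code, and it ignores the special handling of explicit null values; it does capture the two properties described above: a default is applied only when a field is absent and its parent object exists, and the same logic fills in newly added fields when old objects are read.

```python
import copy


def apply_defaults(value, schema: dict):
    """Toy model of structural-schema defaulting.

    A default is applied only when a field is absent and its parent object
    exists; the function then descends into every present field, so defaults
    nested inside a defaulted value are filled in too.
    """
    kind = schema.get("type")
    if kind == "object" and isinstance(value, dict):
        out = dict(value)  # never mutate the caller's object
        for name, sub in schema.get("properties", {}).items():
            if name not in out and "default" in sub:
                out[name] = copy.deepcopy(sub["default"])
            if name in out:
                out[name] = apply_defaults(out[name], sub)
        return out
    if kind == "array" and isinstance(value, list):
        return [apply_defaults(v, schema.get("items", {})) for v in value]
    return value


SPEC_SCHEMA = {
    "type": "object",
    "properties": {
        "size": {"type": "integer", "default": 3},
        "maxSize": {"type": "integer", "default": 9},
        "engine": {"type": "string", "default": "redis"},
    },
}

# On write: the user omitted maxSize and engine.
print(apply_defaults({"size": 5}, SPEC_SCHEMA))
# {'size': 5, 'maxSize': 9, 'engine': 'redis'}

# On read: an object stored before `engine` was added to the schema.
stored_long_ago = {"size": 4, "maxSize": 6}
print(apply_defaults(stored_long_ago, SPEC_SCHEMA))
# {'size': 4, 'maxSize': 6, 'engine': 'redis'}
```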

Subresources: the spec/status split, and scale

By default a custom resource is one document, and whoever can update the object can write every field — including status. That is wrong: spec is the user’s desired state; status is the controller’s report of observed reality. They have different writers and should have different permissions. The status subresource enforces that split. When subresources.status: {} is set:

The object exposes a separate /status endpoint. The controller updates status via that endpoint; users update spec via the main endpoint. RBAC can grant update on caches/status to the controller and update on caches to users, cleanly separating the two.
Updates to the main resource ignore changes to status, and updates to /status ignore changes to spec. No more accidental clobbering.
metadata.generation increments only when spec changes. Controllers compare status.observedGeneration to metadata.generation to answer “have I reconciled the current spec yet?” — this only works correctly with the status subresource.
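The observedGeneration handshake is small enough to sketch directly. The Python helpers below are hypothetical, and they assume status.observedGeneration is declared in the CRD’s status schema; the example Cache CRD above does not declare it, so pruning would drop the field until you add something like `observedGeneration: { type: integer }`.

```python
def needs_reconcile(obj: dict) -> bool:
    """True while the controller has not yet reported on the current spec."""
    generation = obj["metadata"]["generation"]
    observed = obj.get("status", {}).get("observedGeneration")
    return observed != generation


def record_observation(obj: dict, ready_replicas: int) -> None:
    """What a controller writes via /status at the end of a successful reconcile."""
    obj.setdefault("status", {}).update(
        observedGeneration=obj["metadata"]["generation"],
        readyReplicas=ready_replicas,
    )


cache = {"metadata": {"generation": 1}, "spec": {"size": 3}}
print(needs_reconcile(cache))  # True: never reconciled
record_observation(cache, ready_replicas=3)
print(needs_reconcile(cache))  # False: status reflects generation 1

cache["spec"]["size"] = 5             # a spec edit...
cache["metadata"]["generation"] += 1  # ...bumps generation (the API server does this part)
print(needs_reconcile(cache))  # True: generation 2 not yet observed
```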

The scale subresource wires your CRD into the generic scaling machinery. Mapping specReplicasPath and statusReplicasPath is enough to make kubectl scale cache/foo --replicas=5 work. To make your CRD a valid target for a HorizontalPodAutoscaler you must also set labelSelectorPath, pointing at a string field (typically under status) where your controller publishes the serialized label selector of the Pods it manages; the HPA uses it to find those Pods and read their metrics. The HPA scales anything that implements /scale, so a CRD with a complete scale subresource can be autoscaled like a Deployment, a powerful, often-overlooked feature.
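The mapping the scale subresource performs can be approximated in Python. This is a simplified sketch: `scale_view`, `get_path` and `set_path` are hypothetical helpers, the real Scale object also carries metadata, and the `.status.selector` field is an assumption (it is not in the example CRD and would need to be declared in its status schema). It shows the Scale shape read through the three paths, and how a scale request lands back in the CR at specReplicasPath.

```python
def get_path(obj: dict, path: str):
    """Resolve a simple dotted path like '.spec.size' (no array syntax); None if absent."""
    cur = obj
    for part in path.lstrip(".").split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None
        cur = cur[part]
    return cur


def set_path(obj: dict, path: str, value) -> None:
    """Write a value at a simple dotted path, creating parent objects as needed."""
    *parents, leaf = path.lstrip(".").split(".")
    cur = obj
    for part in parents:
        cur = cur.setdefault(part, {})
    cur[leaf] = value


def scale_view(cr: dict, spec_path: str, status_path: str, selector_path=None) -> dict:
    """Approximate the autoscaling/v1 Scale object that /scale returns for a CR."""
    status_replicas = get_path(cr, status_path)
    scale = {
        "apiVersion": "autoscaling/v1",
        "kind": "Scale",
        "spec": {"replicas": get_path(cr, spec_path)},
        # A missing status value is reported as 0 replicas.
        "status": {"replicas": 0 if status_replicas is None else status_replicas},
    }
    if selector_path is not None:
        # The HPA reads this string selector to find the Pods to measure.
        scale["status"]["selector"] = get_path(cr, selector_path) or ""
    return scale


cache = {"spec": {"size": 3}, "status": {"selector": "app=foo"}}  # readyReplicas not yet set
paths = (".spec.size", ".status.readyReplicas", ".status.selector")
print(scale_view(cache, *paths)["status"])  # {'replicas': 0, 'selector': 'app=foo'}

# A scale request writes Scale.spec.replicas, which lands at specReplicasPath.
set_path(cache, ".spec.size", 5)
print(scale_view(cache, *paths)["spec"])    # {'replicas': 5}
```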

additionalPrinterColumns is pure quality of life but matters for adoption: it controls the table kubectl get caches prints. NAME is always shown; once you define your own columns the default AGE column is no longer added, which is why the example declares Age explicitly. Surface the fields a human operator actually cares about (desired size, ready replicas, phase) and the resource feels native.

Multiple versions and conversion webhooks

APIs evolve. You might rename a field, restructure spec, or promote v1alpha1 to v1. The hard constraint is etcd: exactly one version has storage: true, and that is the form every object is persisted as. You may serve several versions simultaneously so clients on different versions all work. The question is what happens when a client reads a stored object in a different version than it was written.

There are two conversion strategies:

Strategy	How it converts between versions	When to use
`None` (default)	No transformation — the same object is returned with only the `apiVersion` string swapped.	Versions are structurally identical (e.g. you only added optional, defaulted fields).
`Webhook`	The API server calls your conversion webhook to translate between versions on every read/write that crosses versions.	Any time fields were renamed, moved, split or merged between versions.

With None, all served versions must be schema-compatible, because there is no code to reconcile differences — you are effectively just relabelling. The moment a field changes shape between v1alpha1 and v1, you need a conversion webhook: a service the API server calls, handing it a list of objects and the desired version, expecting them back converted. This makes the multi-version contract real — a v1 client and a v1alpha1 client can both read the same underlying object, each seeing it correctly in their version, with your webhook doing the translation in the middle.
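The request/response contract of a conversion webhook can be sketched without the HTTP layer. In this Python sketch the field rename is a hypothetical example (v1alpha1 `spec.size` becomes v1 `spec.replicas`), and `convert_cache` and `handle_conversion_review` are illustrative names; the envelope follows the apiextensions.k8s.io/v1 ConversionReview shape (request.uid, desiredAPIVersion and objects in; response.uid, convertedObjects and result out). A production webhook also needs TLS and must leave metadata other than labels and annotations unchanged.

```python
import copy

GROUP = "cache.example.com"


def convert_cache(obj: dict, desired: str) -> dict:
    """Convert one Cache between hypothetical v1alpha1 (spec.size) and v1 (spec.replicas)."""
    out = copy.deepcopy(obj)
    current = out["apiVersion"]
    if current == desired:
        return out
    spec = out.setdefault("spec", {})
    if (current, desired) == (f"{GROUP}/v1alpha1", f"{GROUP}/v1"):
        if "size" in spec:
            spec["replicas"] = spec.pop("size")
    elif (current, desired) == (f"{GROUP}/v1", f"{GROUP}/v1alpha1"):
        if "replicas" in spec:
            spec["size"] = spec.pop("replicas")
    else:
        raise ValueError(f"unsupported conversion {current} -> {desired}")
    out["apiVersion"] = desired  # metadata is left untouched
    return out


def handle_conversion_review(review: dict) -> dict:
    """Build the response to a ConversionReview request (no HTTP server here)."""
    request = review["request"]
    response = {"uid": request["uid"]}  # the uid must be echoed back
    try:
        response["convertedObjects"] = [
            convert_cache(o, request["desiredAPIVersion"]) for o in request["objects"]
        ]
        response["result"] = {"status": "Success"}
    except ValueError as err:
        response["result"] = {"status": "Failed", "message": str(err)}
    return {"apiVersion": "apiextensions.k8s.io/v1", "kind": "ConversionReview", "response": response}


review = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "ConversionReview",
    "request": {
        "uid": "example-uid",
        "desiredAPIVersion": f"{GROUP}/v1",
        "objects": [{"apiVersion": f"{GROUP}/v1alpha1", "kind": "Cache",
                     "metadata": {"name": "foo"}, "spec": {"size": 3}}],
    },
}
converted = handle_conversion_review(review)["response"]["convertedObjects"][0]
print(converted["apiVersion"], converted["spec"])  # cache.example.com/v1 {'replicas': 3}

# Round-tripping back must be lossless, or objects change depending on who reads them.
print(convert_cache(converted, f"{GROUP}/v1alpha1")["spec"])  # {'size': 3}
```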

A subtle but important operational point: switching storage: true to a new version only affects objects written from then on. Objects already in etcd stay encoded in the old version until something rewrites them, so you cannot simply delete the old version (or the conversion code that reads it). To retire a stored version you must run a storage-version migration: rewrite every existing object in the new storage version (e.g. with the storage-version-migrator, or a kubectl get ... -o yaml | kubectl replace -f - sweep) before removing the old version from the CRD. The CRD also tracks status.storedVersions so you know which versions may still exist on disk; once the migration is done, you remove the retired version from that list. Versioning, conversion webhooks and storage migration in full are the subject of the aggregated API server and conversion deep dive; the model above is what you need to design a CRD that can evolve.

CRDs vs the aggregation layer (the other extension path)

CRDs are the easy, declarative way to add a kind: the API server stores and serves it for you. There is a second mechanism, the aggregation layer / extension API server, where you run your own API server binary that the kube-apiserver proxies to. You reach for aggregation when you need custom storage (not etcd), protobuf, arbitrary subresources, or special admission/validation that CRDs cannot express. For the vast majority of operators, a CRD is the right and far simpler choice; know the aggregation layer exists so you can recognise when you have outgrown CRDs.

The controller and the reconciliation loop

A CRD gives you a noun. A controller gives it a verb. Every Kubernetes controller — built-in or custom — runs the same loop, and internalising it is the single most valuable thing in this lesson.

Watch, diff, act — the control loop

A controller is a closed control loop, exactly like a thermostat. The thermostat has a desired temperature (the dial) and an actual temperature (the sensor); it acts (heat/cool) to close the gap, forever. A controller has:

Desired state — the spec of the objects it watches (your Cache’s size: 3).
Actual state — what really exists in the cluster (how many Pods are actually running).
Reconcile — observe both, compute the diff, take the minimum action to close it, then report what it observed into status.

         +-----------------------------------------+
         |              Reconcile(req)             |
 watch   |  1. GET the object (desired state)      |
 event ->|  2. LIST/GET owned resources (actual)   |--> create/update/delete
         |  3. diff desired vs actual              |    children to converge
         |  4. act to converge                     |
         |  5. write observed state -> .status     |
         +-----------------------------------------+
                          ^        |
                          |        v  (requeue: after error, or on a timer)
                          +--------+

The reconcile function is deliberately given almost no input — typically just the namespace/name of the object that needs attention (a reconcile.Request). It is not told what changed or why. That design choice is the heart of the next idea.

Level-triggered, not edge-triggered

This is the concept interviewers probe and beginners get wrong. Edge-triggered thinking is “on create, do X; on update, do Y; on delete, do Z” — reacting to transitions (the edges). It is a trap in distributed systems: events get coalesced (two quick edits may surface as one), dropped (the controller was down when it happened), delivered out of order, or replayed (informer resyncs). If your logic depends on seeing every transition exactly once, it will eventually corrupt state.

Level-triggered logic ignores the transition entirely and reacts to the current level — the present desired and actual state. The reconcile function fetches the object fresh every time and asks “given how the world is right now, what should I do?” It produces the same correct result whether it is the first invocation or the thousandth, whether it missed ten events or none. The event that triggered the reconcile is just a hint that says “go look at this object”; the controller never trusts the content of the event. This is why reconcile takes only a name: it forces you to write level-triggered code.

A correct controller therefore tolerates being called when nothing changed (it diffs, finds no gap, does nothing) and tolerates being called after missing events (it diffs against reality and catches up). That robustness is the entire payoff.
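The difference is easy to demonstrate with a toy cluster. In this Python sketch (all names hypothetical; no Kubernetes client involved) a user resizes a Cache from 3 to 5 and then to 4 while the controller misses the first event. The edge-triggered handler only applies the transition it was told about and drifts; the level-triggered reconcile compares spec with the Pods that actually exist and converges.

```python
def pods_of(cluster: dict, name: str) -> set:
    """Actual state: the Pods that belong to this Cache (named <name>-<index>)."""
    return {p for p in cluster["pods"] if p.rsplit("-", 1)[0] == name}


def reconcile(cluster: dict, name: str) -> None:
    """Level-triggered: ignore what changed; compare desired vs actual right now."""
    desired = cluster["caches"][name]["spec"]["size"]
    want = {f"{name}-{i}" for i in range(desired)}
    have = pods_of(cluster, name)
    cluster["pods"] |= want - have  # create missing Pods
    cluster["pods"] -= have - want  # delete surplus Pods


def on_resize_event(cluster: dict, name: str, old: int, new: int) -> None:
    """Edge-triggered anti-pattern: act only on the transition the event reports."""
    for i in range(old, new):
        cluster["pods"].add(f"{name}-{i}")
    for i in range(new, old):
        cluster["pods"].discard(f"{name}-{i}")


def run(level_triggered: bool) -> int:
    cluster = {"caches": {"foo": {"spec": {"size": 3}}}, "pods": {"foo-0", "foo-1", "foo-2"}}
    events = [(3, 5), (5, 4)]  # the user resizes 3 -> 5, then 5 -> 4
    for n, (old, new) in enumerate(events):
        cluster["caches"]["foo"]["spec"]["size"] = new
        if n == 0:
            continue  # the controller was down: the first event is never seen
        if level_triggered:
            reconcile(cluster, "foo")
        else:
            on_resize_event(cluster, "foo", old, new)
    return len(pods_of(cluster, "foo"))


print(run(level_triggered=False))  # 3: drifted, although spec.size is 4
print(run(level_triggered=True))   # 4: converged
```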

Idempotency

Because reconcile runs an unpredictable number of times — on every relevant event, on periodic resyncs, on requeues after errors — it must be idempotent: running it N times must leave the world in the same state as running it once. Practical consequences:

Never blindly create. Use create-or-update semantics (a Get; if NotFound, create; else update to match) or server-side apply. A naive Create on the second pass errors with AlreadyExists.
Drive to a target, don’t apply deltas. “Ensure exactly 3 replicas,” not “add one replica.” The former is idempotent; the latter compounds on repeat.
Make status updates convergent. Recompute conditions from observed reality each pass rather than appending.
Use ownerReferences + garbage collection so cleanup is declarative: set the parent CR as the owner of every child object (Deployment, Service), and Kubernetes’ garbage collector deletes the children automatically when the CR is deleted. You don’t write deletion logic for owned resources; you express ownership and let GC be idempotent for you.

A useful litmus test: if you ran your reconcile in a tight loop forever with no spec changes, the cluster should reach a fixed point and stop changing. If it keeps churning (creating/deleting, flapping status), it is not idempotent.
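The litmus test can be run literally against a fake API. In this Python sketch `FakeAPI`, `AlreadyExists`, `create_or_update` and `reconcile` are hypothetical stand-ins (no real client library is used, and the ownerReference is simplified). The reconcile computes a target Deployment from spec and uses create-or-update, so calling it 100 times produces exactly one write, while a naive Create fails on the second pass.

```python
class AlreadyExists(Exception):
    """Raised by FakeAPI.create, like the API server's AlreadyExists error."""


class FakeAPI:
    """Hypothetical in-memory stand-in for the API server."""

    def __init__(self) -> None:
        self.objects: dict = {}
        self.writes = 0

    def get(self, key):
        return self.objects.get(key)  # None plays the role of NotFound

    def create(self, key, obj) -> None:
        if key in self.objects:
            raise AlreadyExists(key)
        self.objects[key] = obj
        self.writes += 1

    def update(self, key, obj) -> None:
        self.objects[key] = obj
        self.writes += 1


def create_or_update(api: FakeAPI, key, desired: dict) -> None:
    """Get; if NotFound, create; else update, but only when there is a real diff."""
    current = api.get(key)
    if current is None:
        api.create(key, desired)
    elif current != desired:
        api.update(key, desired)


def reconcile(api: FakeAPI, cache: dict) -> None:
    """Drive the child Deployment to a target computed from spec (no deltas)."""
    name = cache["metadata"]["name"]
    desired = {
        "replicas": cache["spec"]["size"],
        # Simplified: real ownerReferences also carry apiVersion and uid.
        "ownerReferences": [{"kind": "Cache", "name": name}],
    }
    create_or_update(api, ("Deployment", name), desired)


api = FakeAPI()
cache = {"metadata": {"name": "foo"}, "spec": {"size": 3}}
for _ in range(100):  # the litmus test: call reconcile over and over
    reconcile(api, cache)
print(api.writes)  # 1: after the first pass there is nothing left to do

cache["spec"]["size"] = 5
reconcile(api, cache)
reconcile(api, cache)
print(api.writes)  # 2: one update, then a fixed point again

try:
    api.create(("Deployment", "foo"), {})  # what a naive "always Create" would do
except AlreadyExists:
    print("naive Create fails on the second pass")
```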

Informers, the cache, and work queues

Naïvely, “watch the API and react” sounds like every controller polling the API server constantly — which would melt the control plane at scale. The real machinery, provided by client-go and wrapped by controller-runtime, is built for efficiency:

Informer — establishes a single LIST + WATCH against the API server for a kind, and maintains a local in-memory cache (the store/indexer) of those objects. Your reconcile reads from this cache, not the API server, so reads are essentially free and the API server sees one watch per kind per controller, not one request per reconcile. Informers also do a periodic resync, re-delivering everything from the cache — which is precisely why your logic must be level-triggered and idempotent: it will be called for objects that did not change.
SharedInformer — multiple controllers in one process share one informer (one watch) per kind, rather than each opening its own.
Work queue — a rate-limited, deduplicating queue of object keys (namespace/name). Event handlers don’t run your logic directly; they merely enqueue the key of the affected object. Workers pull keys and call reconcile. This gives you, for free: deduplication (ten edits to one object while it sits in the queue collapse to one reconcile), rate limiting with exponential backoff on failure, and concurrency control (a fixed worker pool, with the guarantee that the same key is never processed by two workers at once — so you never race against yourself for a single object).
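The deduplication and “never two workers on one key” guarantees fall out of a surprisingly small data structure. This Python sketch is a toy model loosely based on the dirty-set/processing-set design of client-go’s work queue, paired with a minimal informer whose event handlers only update the local cache and enqueue keys. The class names are hypothetical; rate limiting, shutdown and threading are omitted.

```python
from collections import deque


class WorkQueue:
    """Toy deduplicating work queue (single-threaded model)."""

    def __init__(self) -> None:
        self.queue = deque()
        self.dirty = set()       # keys waiting to be processed
        self.processing = set()  # keys currently held by a worker

    def add(self, key: str) -> None:
        if key in self.dirty:
            return               # already pending: coalesce
        self.dirty.add(key)
        if key in self.processing:
            return               # re-queued by done() once the worker finishes
        self.queue.append(key)

    def get(self) -> str:
        key = self.queue.popleft()
        self.processing.add(key)
        self.dirty.discard(key)
        return key

    def done(self, key: str) -> None:
        self.processing.discard(key)
        if key in self.dirty:
            self.queue.append(key)

    def __len__(self) -> int:
        return len(self.queue)


class Informer:
    """Toy informer: a local cache fed by watch events; handlers only enqueue keys."""

    def __init__(self, queue: WorkQueue) -> None:
        self.store: dict = {}
        self.queue = queue

    def on_event(self, event_type: str, obj: dict) -> None:
        key = f'{obj["metadata"]["namespace"]}/{obj["metadata"]["name"]}'
        if event_type == "DELETED":
            self.store.pop(key, None)
        else:  # ADDED or MODIFIED
            self.store[key] = obj
        self.queue.add(key)

    def resync(self) -> None:
        for key in self.store:  # re-deliver everything, changed or not
            self.queue.add(key)


q = WorkQueue()
inf = Informer(q)
for size in (1, 2, 3):  # three quick edits to the same object
    inf.on_event("MODIFIED", {"metadata": {"namespace": "default", "name": "foo"}, "spec": {"size": size}})
print(len(q))  # 1: the edits collapse into one pending reconcile
key = q.get()
print(inf.store[key]["spec"]["size"])  # 3: reconcile reads the latest state from the cache
q.add(key)     # an event arrives while a worker holds the key...
print(len(q))  # 0: ...so it is not handed to a second worker
q.done(key)
print(len(q))  # 1: it is re-queued once the first worker finishes
```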

The reconcile contract closes the loop: you return either success (forget the key), an error (the queue re-enqueues it with backoff), or a requeue request (re-enqueue now or after a delay). Returning a requeueAfter is how a controller polls something external on a timer without busy-looping. This watch → cache → queue → reconcile pipeline is identical whether you are writing the Deployment controller in Kubernetes itself or your own Cache operator.
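Those three outcomes can be modelled as a scheduling decision per key. In this Python sketch `Result`, `ItemBackoff` and `schedule` are hypothetical names, not controller-runtime’s Go types: an error schedules a retry with per-key exponential backoff (base times 2 to the number of prior failures, capped), success forgets the key and resets its failure count, and a requeue_after value schedules a timed poll. The default base and cap are illustrative, chosen to resemble client-go’s per-item exponential limiter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    """Hypothetical reconcile outcome; requeue_after is in seconds, None means done."""
    requeue_after: Optional[float] = None


class ItemBackoff:
    """Per-key exponential backoff: base * 2**failures, capped (toy model)."""

    def __init__(self, base: float = 0.005, cap: float = 1000.0) -> None:
        self.base, self.cap = base, cap
        self.failures: dict = {}

    def when(self, key: str) -> float:
        n = self.failures.get(key, 0)
        self.failures[key] = n + 1
        return min(self.base * 2 ** n, self.cap)

    def forget(self, key: str) -> None:
        self.failures.pop(key, None)


def schedule(key: str, outcome, backoff: ItemBackoff) -> Optional[float]:
    """Return the delay before key's next reconcile, or None to drop it."""
    if isinstance(outcome, Exception):
        return backoff.when(key)     # error: retry with a growing delay
    backoff.forget(key)              # success resets the failure count
    return outcome.requeue_after     # None = forget; a number = poll later


b = ItemBackoff(base=1.0, cap=8.0)
key = "default/foo"
delays = [schedule(key, RuntimeError("db not ready"), b) for _ in range(5)]
print(delays)                                        # [1.0, 2.0, 4.0, 8.0, 8.0]
print(schedule(key, Result(), b))                    # None: success, key forgotten
print(schedule(key, Result(requeue_after=30.0), b))  # 30.0: poll again in 30 s
print(schedule(key, RuntimeError("again"), b))       # 1.0: backoff starts over
```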

The operator pattern

Now assemble the pieces. An operator is a CRD (the desired-state API) plus a custom controller (the reconciliation logic), packaged together to encode the operational knowledge of running a specific application or piece of infrastructure. The name is the whole idea: it automates what a skilled human operator would do.

Think about what a database administrator actually knows: how to bootstrap a cluster, elect a primary, add a replica, take a consistent backup, restore to a point in time, perform a rolling minor-version upgrade without downtime, fail over when the primary dies, and resize storage safely. None of that is expressible in a Deployment. An operator captures that runbook as code behind a single declarative resource:

apiVersion: postgres.example.com/v1
kind: PostgresCluster
metadata: { name: orders-db }
spec:
  instances: 3
  version: "16"
  storage: { size: 100Gi, class: fast-ssd }
  backup: { schedule: "0 2 * * *", retention: 14 }

A human reads that and understands the intent. The operator’s controller reads it and executes the runbook continuously: it provisions the StatefulSet, primes replication, configures backups, watches for a failed primary and promotes a replica, and reports health in status. The operator is the DBA who never sleeps and never forgets a step. This is the difference between a package (a Helm chart that installs PostgreSQL once and then walks away) and an operator (which operates PostgreSQL forever). The stateful-PostgreSQL lesson shows a production operator doing exactly this.

Why this pattern won: it reuses everything you already know. Operators get the declarative API, RBAC, audit logging, kubectl, GitOps compatibility, watch semantics and reconciliation for free, because a CRD is a Kubernetes object and a controller is a Kubernetes controller. You are not building a sidecar control system; you are extending the one that is already running.

Operator capability levels

Not all operators are equal. The community Operator Capability Levels model (popularised by OperatorHub/Operator SDK) describes a five-rung maturity ladder. It is the standard vocabulary for “how much does this operator actually do?” and a good design checklist.

Level	Name	What the operator can do
1	Basic Install	Provision the application and its components from the CR; expose configuration through the spec. The “Helm-chart-equivalent” baseline.
2	Seamless Upgrades	Upgrade the managed app (and itself) gracefully — minor/patch version bumps, rolling and orchestrated, without manual steps.
3	Full Lifecycle	Day-2 operations: backups, restores, scaling, failure recovery, complex reconfiguration — the runbook automated.
4	Deep Insights	Expose metrics, alerts, logs and workload analysis; surface health into `status` and to monitoring.
5	Auto Pilot	Autonomous behaviour: auto-scaling, auto-tuning, auto-healing, anomaly detection, capacity right-sizing — minimal human input.

Many mature production operators sit at around level 3; reaching level 5 is rare and usually unnecessary. The ladder is useful in two ways: when evaluating a third-party operator (a level-1 operator that “installs Kafka” but cannot upgrade or back it up may be worse than a good Helm chart), and when building one (ship level 1 first, then climb deliberately).

When a CRD or operator is the right tool — and when it isn’t

Operators are powerful and seductive, and the most common mistake is reaching for one too early. Use this decision guide.

A CRD (with or without a controller) is a good fit when:

You need a first-class, validated, RBAC-controlled API for a domain concept that your team or platform owns (Environment, Tenant, FeatureFlag).
Other tools or controllers will watch and react to the object.
You want users to express intent declaratively and have it stored, versioned and audited like any native object.

A full operator (CRD + controller) is justified when:

The thing has non-trivial, ongoing day-2 operations — failover, backups, upgrades, topology changes — that a static manifest cannot capture and that you would otherwise do by hand at 3 a.m.
The application is stateful or stateful-clustered (databases, queues, search), where lifecycle correctness is genuinely hard.
You are managing many instances and the automation amortises across all of them.

Prefer a simpler tool when:

You just need to install and template an app with knobs → a Helm chart or Kustomize is simpler, with no controller to run, secure and maintain.
The lifecycle is stateless and trivial → a Deployment plus an HPA already reconciles for you.
You only need to react to built-in objects (label new namespaces, copy a Secret) → write a plain controller on existing kinds; you may not need a CRD at all.

The honest trade-off: an operator is software you now own and run — it has its own bugs, RBAC, upgrade path, and a blast radius that can span every instance it manages. A buggy reconcile loop can fight itself or thrash a fleet. Add that operational cost to one side of the scale before you build.

Build options: Kubebuilder, Operator SDK and controller-runtime

You do not write the informer/work-queue plumbing by hand. Three layers of tooling sit on top of client-go, and they are complementary rather than competing.

Tool	What it is	Best for
controller-runtime	The Go library under everything else: `Manager`, `Reconciler` interface, clients with built-in caching, builder API for watches/owns, leader election, webhook server, metrics.	The foundation; you import it regardless of scaffolder.
Kubebuilder	A scaffolding CLI + project layout that generates CRD types, controllers, RBAC markers, webhooks, `Makefile`, CRD/manifest generation (`controller-gen`) and `envtest` integration tests — all on top of controller-runtime.	Go-native operators; the de facto standard for hand-written controllers.
Operator SDK	A superset that wraps Kubebuilder for Go and adds Helm-based and Ansible-based operators (no Go required), plus Operator Lifecycle Manager (OLM) bundle packaging and scorecard testing.	Teams wanting Helm/Ansible operators, or targeting OperatorHub/OLM distribution.

How to choose, briefly:

Go, full control, standard path → Kubebuilder. It is what most CNCF operators use. You get strongly-typed APIs, generated deepcopy/CRD manifests, and envtest for fast tests against a real API server. This is the toolchain the build-your-own-operator guide uses end to end.
No Go, mostly “install + minor lifecycle” → Operator SDK (Helm or Ansible). A Helm-based operator turns an existing chart into a level-1/2 operator (the CR’s spec becomes Helm values, reconciled on a loop) with no code. An Ansible-based operator maps reconcile to a playbook — good when your runbook already exists as Ansible.
Distributing on OperatorHub/OpenShift → Operator SDK for its OLM bundle tooling.
Maximum control or embedding in an existing codebase → controller-runtime directly, skipping the scaffolder.

A note on the declarative escape hatch: not everything needs Go. Helm-based and Ansible-based operators (via Operator SDK) cover a large slice of level-1/level-2 needs with zero controller code, by reconciling a CR into a chart or playbook on a timer. They cannot express subtle level-3 logic (a custom failover algorithm), but for “manage this app’s install and upgrades declaratively,” they are dramatically less effort than hand-written Go.

[Diagram: Kubernetes CRDs, controllers & the operator pattern]

The diagram traces the full path: a user applies a custom resource; the API server defaults it, validates it against the CRD’s structural schema and CEL rules, and persists it to etcd in the storage version; the controller’s informer receives the watch event and enqueues the key on the work queue; a worker runs reconcile, which diffs desired (spec) against actual (the owned Deployment/Service/Secret), creates or updates children to converge, and writes observed reality back to the status subresource, looping forever.

Hands-on lab

Free and local. Use kind, minikube or k3d — any cluster works. We will create a CRD with a structural schema, validation, defaulting, subresources and printer columns, prove the API server enforces and defaults it, exercise the status and scale subresources, then clean up. No operator code is required to see the CRD machinery work.

# Create a local cluster (pick one)
kind create cluster --name crd-lab          # or: minikube start  /  k3d cluster create crd-lab
kubectl get nodes

1. Install a CRD

cat <<'EOF' | kubectl apply -f -
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: caches.cache.example.com
spec:
  group: cache.example.com
  scope: Namespaced
  names:
    kind: Cache
    plural: caches
    singular: cache
    shortNames: ["ca"]
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required: ["size"]
              x-kubernetes-validations:
                - rule: "self.maxSize >= self.size"
                  message: "maxSize must be >= size"
              properties:
                size:    { type: integer, minimum: 1, maximum: 9, default: 3 }
                maxSize: { type: integer, minimum: 1, maximum: 9, default: 9 }
                engine:  { type: string, enum: ["redis", "valkey"], default: "redis" }
            status:
              type: object
              properties:
                readyReplicas: { type: integer }
      subresources:
        status: {}
        scale:
          specReplicasPath: .spec.size
          statusReplicasPath: .status.readyReplicas
      additionalPrinterColumns:
        - { name: Engine,  type: string,  jsonPath: .spec.engine }
        - { name: Desired, type: integer, jsonPath: .spec.size }
        - { name: Ready,   type: integer, jsonPath: .status.readyReplicas }
        - { name: Age,     type: date,    jsonPath: .metadata.creationTimestamp }
EOF

kubectl get crd caches.cache.example.com
kubectl api-resources | grep caches            # the API server now knows "caches"

Expected: the CRD shows up, and api-resources lists a row reading: caches  ca  cache.example.com/v1alpha1  true  Cache.
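The naming rules behind that output are mechanical, and a short Go sketch makes them explicit:

- the CRD’s metadata.name is <plural>.<group>;
- a CR’s apiVersion is <group>/<version> (the GVK side);
- the REST path is built from the plural resource name (the GVR side).

The helper names are hypothetical. The path layout is the standard one for namespaced resources under /apis.

```go
package main

import (
	"fmt"
	"strings"
)

// GVK types an object: it is what apiVersion + kind in a manifest express.
type GVK struct{ Group, Version, Kind string }

// GVR addresses a collection in the REST API: it uses the plural resource name.
type GVR struct{ Group, Version, Resource string }

// crdName is the only metadata.name the API server accepts for a CRD.
func crdName(plural, group string) string { return plural + "." + group }

// apiVersion is the value a custom resource puts in its apiVersion field.
func apiVersion(k GVK) string { return k.Group + "/" + k.Version }

// objectPath is the REST path of one namespaced custom resource.
func objectPath(r GVR, namespace, name string) string {
	return strings.Join([]string{"/apis", r.Group, r.Version, "namespaces", namespace, r.Resource, name}, "/")
}

func main() {
	kind := GVK{"cache.example.com", "v1alpha1", "Cache"}
	res := GVR{"cache.example.com", "v1alpha1", "caches"}
	fmt.Println(crdName(res.Resource, res.Group))        // caches.cache.example.com
	fmt.Println(apiVersion(kind))                        // cache.example.com/v1alpha1
	fmt.Println(objectPath(res, "default", "web-cache")) // /apis/cache.example.com/v1alpha1/namespaces/default/caches/web-cache
}
```

Once step 2 creates web-cache, `kubectl get --raw /apis/cache.example.com/v1alpha1/namespaces/default/caches/web-cache` should return the object as JSON from exactly that path.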

2. See validation, defaulting and pruning in action

# (a) A valid, minimal CR: watch defaulting fill in engine and maxSize (size is given)
kubectl apply -f - <<'EOF'
apiVersion: cache.example.com/v1alpha1
kind: Cache
metadata: { name: web-cache }
spec: { size: 3 }
EOF
kubectl get cache web-cache -o jsonpath='{.spec}{"\n"}'
# -> {"engine":"redis","maxSize":9,"size":3}   (engine & maxSize were defaulted)

# (b) Violate the schema bound -> rejected by the API server, no webhook involved
kubectl apply -f - <<'EOF'
apiVersion: cache.example.com/v1alpha1
kind: Cache
metadata: { name: too-big }
spec: { size: 99 }
EOF
# -> error: spec.size in body should be less than or equal to 9

# (c) Violate the CEL cross-field rule
kubectl apply -f - <<'EOF'
apiVersion: cache.example.com/v1alpha1
kind: Cache
metadata: { name: bad-bounds }
spec: { size: 5, maxSize: 2 }
EOF
# -> error: maxSize must be >= size

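# (d) Pruning: an unknown field is silently dropped on write. Recent kubectl defaults to
#     --validate=strict, which would reject the field instead, so turn that off here
kubectl apply --validate=false -f - <<'EOF'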
apiVersion: cache.example.com/v1alpha1
kind: Cache
metadata: { name: typo-cache }
spec: { size: 2, repicas: 4 }   # 'repicas' is a typo, not in the schema
EOF
kubectl get cache typo-cache -o jsonpath='{.spec}{"\n"}'
# -> {"engine":"redis","maxSize":9,"size":2}   ('repicas' was pruned away)

This is the whole value of a structural schema: bad input is rejected, omitted fields are defaulted, and unknown fields are pruned — all by the API server, no controller running yet.
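A toy Go model makes the three steps and their results concrete. It is a sketch, not the apiextensions-apiserver code. It hard-codes the step-1 Cache schema and runs its checks in this order:

1. pruning;
2. defaulting;
3. the schema’s type, bound and enum checks;
4. the CEL rule.

Its error strings only imitate the API server’s.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// declared lists the spec fields in the step-1 Cache schema (hard-coded here).
var declared = map[string]bool{"size": true, "maxSize": true, "engine": true}

// defaults mirrors the schema's default: values (JSON numbers decode as float64).
var defaults = map[string]any{"size": 3.0, "maxSize": 9.0, "engine": "redis"}

// admit prunes, defaults and validates a raw JSON spec the way the lab shows.
func admit(raw string) (map[string]any, error) {
	var spec map[string]any
	if err := json.Unmarshal([]byte(raw), &spec); err != nil {
		return nil, err
	}
	if spec == nil {
		spec = map[string]any{}
	}
	for field := range spec { // pruning: silently drop undeclared fields
		if !declared[field] {
			delete(spec, field)
		}
	}
	for field, v := range defaults { // defaulting: fill only omitted fields
		if _, set := spec[field]; !set {
			spec[field] = v
		}
	}
	for _, field := range []string{"size", "maxSize"} { // type, minimum, maximum
		n, ok := spec[field].(float64)
		if !ok || n != float64(int64(n)) {
			return nil, fmt.Errorf("spec.%s must be an integer", field)
		}
		if n < 1 {
			return nil, fmt.Errorf("spec.%s in body should be greater than or equal to 1", field)
		}
		if n > 9 {
			return nil, fmt.Errorf("spec.%s in body should be less than or equal to 9", field)
		}
	}
	if e, ok := spec["engine"].(string); !ok || (e != "redis" && e != "valkey") { // enum
		return nil, fmt.Errorf("spec.engine: Unsupported value: %v", spec["engine"])
	}
	if spec["maxSize"].(float64) < spec["size"].(float64) { // CEL: self.maxSize >= self.size
		return nil, fmt.Errorf("maxSize must be >= size")
	}
	return spec, nil
}

func main() {
	inputs := []string{`{"size":3}`, `{"size":99}`, `{"size":5,"maxSize":2}`, `{"size":2,"repicas":4}`}
	for _, raw := range inputs {
		spec, err := admit(raw)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		out, _ := json.Marshal(spec)
		fmt.Println("stored:", string(out))
	}
}
```

The four inputs are the four lab cases. Inputs (a) and (d) come back as defaulted, pruned specs, matching the jsonpath output above. Inputs (b) and (c) are rejected, just as the API server rejects them without any controller or webhook.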

3. Exercise the status and scale subresources

# The status subresource: write status WITHOUT touching spec (controllers do this; --subresource needs kubectl 1.24+)
kubectl patch cache web-cache --subresource=status --type=merge \
  -p '{"status":{"readyReplicas":3}}'
kubectl get caches            # printer columns show Engine/Desired/Ready/Age in a native table

# The scale subresource: kubectl scale works on a CRD!
kubectl scale cache/web-cache --replicas=5
kubectl get cache web-cache -o jsonpath='{.spec.size}{"\n"}'   # -> 5

# generation vs observedGeneration intuition: the scale (a spec change) bumped generation,
# while the status patch did not
kubectl get cache web-cache -o jsonpath='gen={.metadata.generation}{"\n"}'   # -> gen=2

Expected: status is set without altering spec; kubectl scale changes .spec.size through the scale subresource; and kubectl get caches renders a tidy table from your printer columns — the resource behaves exactly like a built-in.
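The generation bookkeeping is easy to get wrong, so here is a small Go model of it. It is a sketch based on the documented behaviour, which differs depending on whether the status subresource is enabled:

- With the status subresource, the main endpoint ignores status, and metadata.generation moves only on changes outside metadata and status.
- Without it, any spec or status write bumps generation.

The store type and its method names are hypothetical. The model compares a controller that publishes status.observedGeneration through /status with one forced to write status through the main endpoint.

```go
package main

import "fmt"

type Spec struct{ Size int }

type Status struct {
	ReadyReplicas      int
	ObservedGeneration int64
}

type Cache struct {
	Generation int64
	Spec       Spec
	Status     Status
}

// store is a hypothetical stand-in for how the API server persists one CR.
type store struct {
	obj               Cache
	statusSubresource bool
}

// Update models a write to the main endpoint (kubectl apply, or a /scale write).
func (s *store) Update(in Cache) {
	if s.statusSubresource {
		in.Status = s.obj.Status // status changes are ignored on the main endpoint
	}
	if in.Spec != s.obj.Spec || in.Status != s.obj.Status {
		s.obj.Generation++
	}
	s.obj.Spec, s.obj.Status = in.Spec, in.Status
}

// UpdateStatus models a write to /status: spec is untouched, generation stays.
func (s *store) UpdateStatus(st Status) { s.obj.Status = st }

// publishStatus records the generation the controller acted on.
func publishStatus(s *store, ready int) {
	st := Status{ReadyReplicas: ready, ObservedGeneration: s.obj.Generation}
	if s.statusSubresource {
		s.UpdateStatus(st)
		return
	}
	in := s.obj
	in.Status = st
	s.Update(in) // no /status endpoint, so status rides on the main endpoint
}

// scaleThenReconcile runs one user spec change followed by one controller pass.
func scaleThenReconcile(statusSubresource bool) *store {
	s := &store{obj: Cache{Generation: 1, Spec: Spec{Size: 3}}, statusSubresource: statusSubresource}
	in := s.obj
	in.Spec.Size = 5
	s.Update(in)        // user scales: the spec change bumps generation to 2
	publishStatus(s, 5) // controller reports observedGeneration=2
	return s
}

func main() {
	for _, sub := range []bool{true, false} {
		s := scaleThenReconcile(sub)
		fmt.Printf("status subresource=%v generation=%d observedGeneration=%d\n",
			sub, s.obj.Generation, s.obj.Status.ObservedGeneration)
	}
}
```

With the subresource, observedGeneration equals generation after one pass. Without it, the controller’s own status write bumps generation again, so observedGeneration stays one behind forever. This also explains why the lab’s status patch left generation alone while kubectl scale bumped it.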

Cleanup

kubectl delete cache --all
kubectl delete crd caches.cache.example.com      # deleting the CRD removes ALL its CRs
kind delete cluster --name crd-lab               # or: minikube delete / k3d cluster delete crd-lab

Cost note

Entirely free: a local single-node cluster on your laptop, no cloud resources. The only “cost” is a few hundred MB of RAM for the kind/minikube node while the lab runs.

Common mistakes & troubleshooting

Symptom	Likely cause	Fix
`metadata.name must be spec.names.plural+"."+spec.group`	CRD `metadata.name` isn’t `<plural>.<group>`	Rename to e.g. `caches.cache.example.com`.
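CR applies but my typo’d field “disappears”	Pruning strips fields not in the structural schema	Add the field to the schema. Until then, unknown fields are silently dropped (this is working as intended). Requesting strict field validation (`kubectl --validate=strict`) turns the silent drop into an error.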
Controller’s `status` writes vanish or fight user edits	Not using the status subresource; status written via the main endpoint	Add `subresources.status: {}` and write status via `/status`; split RBAC accordingly.
`observedGeneration` never catches up with `generation`	No status subresource, so the controller’s own status writes (sent through the main endpoint) also bump `generation`	Enable the status subresource and write status via `/status`; `generation` then increments only on changes outside `.metadata` and `.status`.
Reconcile errors with `AlreadyExists` on the second pass	Non-idempotent `Create` instead of create-or-update	Get-then-create-or-update, or server-side apply; drive to target, don’t apply deltas.
Child objects aren’t cleaned up when the CR is deleted	Missing `ownerReferences` on children	Set the CR as owner of each child so the garbage collector cascades the delete.
Reads in `v1` of a `v1alpha1`-stored object return wrong/empty fields	`conversion: None` but the versions differ structurally	Implement a conversion webhook; `None` only works when versions are schema-compatible.
Can’t change `scope` / a stored version won’t drop	Scope is immutable; old version still in `status.storedVersions`	Recreate the CRD to change scope. Deleting a CRD deletes all its CRs, so back them up first. Run a storage-version migration before removing a served/stored version.

Best practices

Start the API at v1alpha1. You will get the schema wrong the first time; an alpha version sets that expectation and frees you to iterate. Graduate to v1beta1/v1 only once the shape is stable.
Make the schema structural and strict. Type every field, set sensible defaults, add enum/bounds, and push cross-field logic into CEL (x-kubernetes-validations) so you avoid a validating webhook entirely where possible.
Always use the status subresource for anything with a controller. Model status with conditions (type/status/reason/message/observedGeneration) — it is the idiomatic, tool-friendly health contract.
Write reconcile to be idempotent and level-triggered. Fetch fresh, diff against reality, drive to target, never depend on event content. Test the “called repeatedly with no change → no churn” property.
Use ownerReferences for declarative cleanup instead of writing deletion code for owned resources; add a finalizer only when you must clean up something Kubernetes’ GC cannot (external systems).
Surface intent with additionalPrinterColumns so kubectl get feels native, and add shortNames/categories for ergonomics.
Scope RBAC tightly. A controller should have only the verbs/kinds it touches (and update on <plural>/status separately) — see the RBAC fundamentals.
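Run a single active controller via leader election so two replicas don’t reconcile the same object and fight. controller-runtime’s Manager supports this out of the box (the LeaderElection option), and Kubebuilder’s scaffold exposes it as a --leader-elect flag.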
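The finalizer advice above follows a fixed protocol. While the object is live, add the finalizer. When deletion is requested, clean up the external resource first and remove the finalizer last, because the API server keeps the object until its finalizer list is empty.

Here is a minimal Go sketch of that protocol. It assumes a hypothetical finalizer name and an in-memory map standing in for an external system (say, a cloud bucket). A boolean stands in for metadata.deletionTimestamp.

```go
package main

import "fmt"

const cleanupFinalizer = "cache.example.com/cleanup" // hypothetical finalizer name

// object is a hypothetical CR reduced to the fields deletion depends on.
type object struct {
	Name              string
	Finalizers        []string
	DeletionRequested bool // stands in for metadata.deletionTimestamp being set
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

func without(list []string, s string) []string {
	out := make([]string, 0, len(list))
	for _, v := range list {
		if v != s {
			out = append(out, v)
		}
	}
	return out
}

// deleted reports whether the API server would now remove the object for good.
func deleted(o *object) bool { return o.DeletionRequested && len(o.Finalizers) == 0 }

// reconcile ensures the finalizer while live and runs cleanup on deletion.
func reconcile(o *object, external map[string]bool) {
	if !o.DeletionRequested {
		if !contains(o.Finalizers, cleanupFinalizer) {
			o.Finalizers = append(o.Finalizers, cleanupFinalizer) // before creating external state
		}
		external[o.Name] = true // create or keep the external resource
		return
	}
	if contains(o.Finalizers, cleanupFinalizer) {
		delete(external, o.Name)                               // 1) clean up outside the cluster
		o.Finalizers = without(o.Finalizers, cleanupFinalizer) // 2) then release the object
	}
}

func main() {
	external := map[string]bool{}
	o := &object{Name: "web-cache"}
	reconcile(o, external)
	o.DeletionRequested = true // kubectl delete only marks the object
	fmt.Println("deleted before cleanup:", deleted(o), "bucket exists:", external["web-cache"])
	reconcile(o, external)
	fmt.Println("deleted after cleanup:", deleted(o), "bucket exists:", external["web-cache"])
}
```

Children that carry ownerReferences need none of this, since the garbage collector deletes them. The finalizer exists only for state Kubernetes cannot see.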

Security notes

The controller’s ServiceAccount is a privilege boundary. An operator that manages workloads cluster-wide often needs broad RBAC; treat its ServiceAccount token as sensitive and grant the least set of verbs/kinds it needs. A compromised operator with * on * is a cluster takeover.
Validate untrusted input in the schema, not just the controller. CEL rules and structural validation run in the API server before anything is stored, so they protect every consumer, including other controllers. Don’t rely on your reconcile loop as the only gate.
Conversion and admission webhooks are attack surface and availability risk. They sit in the write path; an outage (or a failurePolicy: Fail webhook that is down) can block writes to the whole kind. Run them HA, secure their TLS, and prefer CEL over webhooks where it suffices to shrink this surface.
CRDs are cluster-scoped definitions. Anyone who can create/modify a CRD can change validation for all instances cluster-wide — restrict create/update/delete on customresourcedefinitions to platform admins.
Beware x-kubernetes-preserve-unknown-fields. Opting a subtree out of pruning stores arbitrary user JSON unvalidated; only do it where you genuinely need free-form data, and never on a path you later trust blindly.

Interview & exam questions

What is the difference between a CRD and a CR? A CustomResourceDefinition is the registration/schema of a new kind (one per kind, itself an apiextensions.k8s.io/v1 object). A custom resource is an instance of that kind (many per CRD). The CRD defines the type; CRs are the data.
Define the operator pattern in one sentence. An operator is a custom controller plus the CRDs it manages, packaged to encode the operational knowledge of running a specific application — automating what a skilled human operator would do (install, upgrade, back up, fail over) behind a declarative API.
Explain level-triggered vs edge-triggered reconciliation. Why does Kubernetes choose level-triggered? Edge-triggered reacts to transitions (on-create/on-update events); level-triggered reacts to the current state. Kubernetes chooses level-triggered because events get coalesced, dropped, reordered or replayed in distributed systems, so logic that depends on seeing each transition will corrupt state. Reconcile fetches fresh state and converges, giving the same result regardless of how many events it saw — which is why reconcile receives only a name, not a diff.
Why must a reconcile function be idempotent, and how do you achieve it? Because it runs an unpredictable number of times (events, periodic resyncs, error requeues). Achieve it with create-or-update (or server-side apply) instead of blind Create, by driving to a target rather than applying deltas, by recomputing status convergently, and by using ownerReferences so cleanup is declarative. Litmus test: looping reconcile with no spec change should reach a fixed point and stop.
What problem does the status subresource solve, and what does enabling it change? It separates user-owned spec from controller-owned status. Enabling it gives a dedicated /status endpoint (so RBAC and writes don’t cross), makes main-resource updates ignore status and vice-versa (no clobbering), and makes metadata.generation increment only on spec changes — enabling the observedGeneration pattern.
What does the scale subresource enable beyond kubectl scale? By mapping specReplicasPath/statusReplicasPath, the CRD implements the generic /scale interface, so a HorizontalPodAutoscaler can target and autoscale your custom resource exactly as it would a Deployment.
When do you need a conversion webhook? Whenever you serve multiple versions that are not structurally compatible — fields renamed, moved, split or merged. With conversion: None the API server only swaps the apiVersion string, so it requires schema-compatible versions. A webhook translates objects between versions on every cross-version read/write.
There can be only one storage: true version. How do you safely retire an old version? Retire it in this order:
1) Make the new version the storage version, so all new writes land in it.
2) Run a storage-version migration, which re-writes every stored object so it is persisted in the new version.
3) Remove the old version from status.storedVersions.
4) Only then stop serving the old version and drop it from the CRD.
Removing it prematurely orphans stored objects.
What are informers and work queues, and why not just poll the API server? An informer keeps a single watch and a local cache of a kind, so reconciles read from memory and the API server isn’t hammered. A work queue deduplicates and rate-limits keys: handlers enqueue object keys, workers dequeue and reconcile, giving coalescing, exponential backoff and per-key single-worker concurrency. Polling would not scale and would lose the dedup/backoff guarantees.
Walk through the operator capability levels. 1) Basic Install, 2) Seamless Upgrades, 3) Full Lifecycle (backups/restore/scaling/failure-recovery), 4) Deep Insights (metrics/alerts/health), 5) Auto Pilot (auto-scale/tune/heal). Most production operators sit at level 3; level 5 is rare.
When would you choose a Helm chart over an operator? When you only need to install and template an app with configurable values and there are no non-trivial day-2 operations. A chart has no controller to run, secure and maintain; an operator is justified only when ongoing lifecycle automation (failover, upgrades, backups) outweighs that operational cost.
Compare Kubebuilder, Operator SDK and controller-runtime. controller-runtime is the underlying Go library (Manager, Reconciler, cached clients, webhook/leader-election). Kubebuilder scaffolds a Go project on top of it (typed APIs, controller-gen, RBAC markers, envtest). Operator SDK is a superset that wraps Kubebuilder for Go and adds Helm- and Ansible-based operators (no Go) plus OLM bundle/scorecard tooling. Choose Kubebuilder for Go-native operators, Operator SDK for Helm/Ansible or OperatorHub distribution, controller-runtime directly for maximum control.

Quick check

What must a CRD’s metadata.name be, exactly?
A user applies a CR with a field that isn’t in the structural schema. What happens to that field, and why?
Which subresource makes a CRD a valid target for a HorizontalPodAutoscaler?
Your controller is invoked but nothing about the object changed (a periodic resync). What must a correct reconcile do?
You serve v1 and v1alpha1 and they have differently-shaped specs. What conversion strategy do you need?

Answers: 1) <plural>.<group>, e.g. caches.cache.example.com. 2) It is silently pruned (dropped on write), because a structural schema enables pruning of unknown fields. 3) The scale subresource (it implements the generic /scale interface). 4) Diff desired vs actual, find no gap, and do nothing — reconcile is level-triggered and idempotent, so a no-op event is normal. 5) A conversion webhook (conversion: Webhook); None only works for schema-compatible versions.
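Answer 5 is easier to believe once you see what conversion: None does to a renamed field. The Go sketch below assumes a hypothetical v1 that renames size to replicas and moves maxSize under limits. The two conversion functions are the kind of pure translation a conversion webhook performs; the ConversionReview request/response wrapping is omitted. asIfNone shows that merely reinterpreting the stored JSON under the v1 shape loses the replica count.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SpecV1alpha1 matches the lab's stored shape.
type SpecV1alpha1 struct {
	Size    int    `json:"size"`
	MaxSize int    `json:"maxSize"`
	Engine  string `json:"engine"`
}

// SpecV1 is a hypothetical, structurally different v1 shape.
type SpecV1 struct {
	Replicas int    `json:"replicas"`
	Engine   string `json:"engine"`
	Limits   Limits `json:"limits"`
}

type Limits struct {
	MaxReplicas int `json:"maxReplicas"`
}

// toV1 and toV1alpha1 are the translations a conversion webhook would perform.
func toV1(in SpecV1alpha1) SpecV1 {
	return SpecV1{Replicas: in.Size, Engine: in.Engine, Limits: Limits{MaxReplicas: in.MaxSize}}
}

func toV1alpha1(in SpecV1) SpecV1alpha1 {
	return SpecV1alpha1{Size: in.Replicas, MaxSize: in.Limits.MaxReplicas, Engine: in.Engine}
}

// asIfNone keeps the stored JSON and only reinterprets it under the v1 shape,
// roughly what conversion: None amounts to when a field was renamed.
func asIfNone(in SpecV1alpha1) (SpecV1, error) {
	raw, err := json.Marshal(in)
	if err != nil {
		return SpecV1{}, err
	}
	var out SpecV1
	err = json.Unmarshal(raw, &out) // "size" and "maxSize" have no v1 field and are dropped
	return out, err
}

func main() {
	stored := SpecV1alpha1{Size: 3, MaxSize: 9, Engine: "redis"}
	fmt.Printf("webhook: %+v\n", toV1(stored))
	naive, _ := asIfNone(stored)
	fmt.Printf("None:    %+v\n", naive)
	fmt.Println("round trip lossless:", toV1alpha1(toV1(stored)) == stored)
}
```

A useful property to test in a real webhook is the round trip: converting to the other version and back must return the original object unchanged.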

Exercise

Design and install a CRD for a WebApp kind in group apps.example.com, namespaced, version v1alpha1, that:

Has a structural schema with spec.image (string, required), spec.replicas (integer, minimum: 1, maximum: 10, default: 2) and spec.port (integer, default: 80).
Enforces a CEL rule that spec.port is in 1..65535. As an optional stretch, add an immutability rule so the registry part of spec.image cannot change after creation (a transition rule comparing self.image with oldSelf.image).
Has a status subresource with status.availableReplicas and a status.conditions array, and a scale subresource mapping .spec.replicas ↔ .status.availableReplicas.
Shows printer columns for Image, Replicas, Available and Age, plus a shortName of wa.

Apply it, then:
(a) create a minimal WebApp and confirm defaulting filled replicas/port;
(b) prove that replicas: 99 is rejected by the API server;
(c) run kubectl scale webapp/<name> --replicas=4 and confirm .spec.replicas changed;
(d) use kubectl patch --subresource=status to set availableReplicas and confirm it appears in kubectl get wa.
Success: all four behaviours work with no controller running. That proves you understand exactly what the CRD machinery gives you before any reconcile code exists.

Certification mapping

CKA (Certified Kubernetes Administrator): the Cluster Architecture, Installation & Configuration domain expects you to understand extension points — CRDs, the controller/operator pattern, and how custom controllers fit the control-plane reconciliation model. You should be able to install a CRD, read its schema/versions/subresources, and explain how a controller reconciles desired vs actual.
KCNA (Kubernetes and Cloud Native Associate): CRDs, custom controllers and the operator pattern appear conceptually in the Kubernetes Fundamentals and Cloud Native Architecture domains — know what an operator is and why the pattern exists.
CKAD (Certified Kubernetes Application Developer): while CKAD centres on built-in workloads, recognising and using custom resources (applying a CR, reading its status) is increasingly relevant as platforms ship CRDs developers consume.

Glossary

CustomResourceDefinition (CRD) — the object that registers a new kind (schema, versions, scope) with the API server; one per kind.
Custom resource (CR) — an instance of a kind defined by a CRD.
GVK / GVR — Group-Version-Kind (how an object is typed) and Group-Version-Resource (how it is addressed in the REST path).
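Structural schema — an OpenAPI v3 schema in which every node (root, object properties, array items) declares a non-empty type, with narrow exceptions such as x-kubernetes-int-or-string; mandatory for apiextensions.k8s.io/v1 CRDs and the basis for pruning and defaulting.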
Pruning — automatic removal, on write, of fields not present in the structural schema.
Defaulting — injection of default: values from the schema for omitted fields, on write and on read.
CEL validation (x-kubernetes-validations) — Common Expression Language rules in the CRD for cross-field and transition (immutability) checks, evaluated by the API server without a webhook.
Status subresource — a separate /status endpoint splitting user-owned spec from controller-owned status; status writes no longer bump generation, so generation tracks spec changes.
Scale subresource — maps replica fields so kubectl scale and the HPA work against a custom resource.
additionalPrinterColumns — fields surfaced in the kubectl get table for a kind.
Conversion webhook — a service the API server calls to translate objects between served versions when they aren’t schema-compatible.
Storage version — the single version (storage: true) every object is persisted as in etcd; changing it requires a storage-version migration.
Controller — a program that watches kinds and drives the world toward their desired state via a reconciliation loop.
Reconciliation loop — the watch → diff desired vs actual → act → report-status cycle every controller runs.
Level-triggered — reacting to current state rather than to transitions; the basis of robust reconciliation.
Idempotent — running an operation N times leaves the same result as running it once.
Informer — client-go machinery maintaining a cached LIST+WATCH of a kind for cheap reads and periodic resyncs.
Work queue — a rate-limited, deduplicating queue of object keys that feeds reconcile workers.
ownerReferences / garbage collection — declaring a parent CR as owner of child objects so Kubernetes cascades their deletion.
Finalizer — a key on an object that blocks deletion until a controller performs external cleanup, then removes the key.
Operator — a CRD plus a custom controller that encodes the operational knowledge of running a specific application.
Capability levels — the 1–5 operator maturity scale (Basic Install → Auto Pilot).
controller-runtime / Kubebuilder / Operator SDK — the Go library, its Go scaffolder, and the Helm/Ansible/OLM superset for building operators.

Next steps

Build one for real, in Go, end to end: Building a Kubernetes Operator with Kubebuilder: CRDs, Reconciliation & Production Hardening.
Go deeper on API evolution: Extending the Kubernetes API: Aggregated API Servers, CRD Conversion Webhooks, and Versioning Strategy.
See a production operator manage state: Running Stateful PostgreSQL on Kubernetes: StatefulSets, Operators, Automated Failover, and Point-in-Time Recovery.
Understand the control plane your controller talks to: Kubernetes Cluster Architecture & the Control Plane, In Depth.
Continue the Architecture track with the networking internals: Kubernetes Networking Internals, In Depth: The Network Model, CNI, IPAM & the Datapath.