Extending the Kubernetes API: Aggregated API Servers, CRD Conversion Webhooks, and Versioning Strategy

Most teams stop at CustomResourceDefinitions, and for good reason: a CRD plus a controller covers the overwhelming majority of operator use cases. But there is a second extension mechanism — the aggregation layer — that serves API groups from your own binary, with your own storage, validation, and admission. Knowing which to reach for, and how to evolve either one across versions without breaking stored objects, is the line between a platform team that ships an API once and one that maintains it for years.

This guide covers the hard parts: CRD multi-version lifecycle, writing a conversion webhook that round-trips between v1alpha1 and v1, structural schemas and the status subresource, and standing up a real aggregated API server behind the aggregation layer. Throughout, the failure mode to keep in mind is data corruption during an upgrade — etcd holds objects in exactly one encoding, and getting versioning wrong is silent until a get fails to decode.

1. CRDs vs aggregated API servers: choosing the mechanism

Both mechanisms add new REST paths under /apis/<group>/<version>. The difference is who serves them.

A CRD is declarative. You kubectl apply a CustomResourceDefinition, and the kube-apiserver itself stores your objects in its own etcd, validates them against your OpenAPI v3 schema, and serves CRUD + watch. You write no API server code.

An aggregated API server (an “extension API server”) is a separate binary you build, deploy as a Deployment + Service, and register via an APIService object. The kube-apiserver proxies matching requests to it over TLS. You own storage, validation, admission, and the conversion of every request.

| Concern | CRD | Aggregated API server |
| --- | --- | --- |
| Code to write | None (schema only) | A full apiserver binary (Go) |
| Storage | kube-apiserver’s etcd | Your choice: etcd, a DB, or computed |
| Custom business logic on read/write | Webhooks only | Arbitrary, in-process |
| Non-etcd backing (proxy to external system) | No | Yes |
| Custom subresources beyond status/scale | No | Yes (e.g. /exec, /logs, arbitrary verbs) |
| Operational cost | Trivial | You run an HA, certificate-managed service |
| Protobuf serialization | No (JSON only) | Yes |

Rule of thumb: reach for a CRD unless you have a concrete requirement a CRD cannot meet: a non-etcd backing store, computed/virtual resources, custom subresources or verbs, response sizes that demand protobuf, or imperative semantics. metrics-server, which serves the metrics.k8s.io group through an APIService, is the canonical aggregated API precisely because its data is computed, not stored.

The two mechanisms are not mutually exclusive. A common pattern is CRDs for declarative config plus a tiny aggregated API for a virtual subresource (think a custom /scale-like endpoint or a token-minting verb).

2. CRD versioning: storage version, served versions, lifecycle

A CRD declares a list of versions. Each version has two independent booleans that people constantly conflate:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.platform.acme.io
spec:
  group: platform.acme.io
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
    listKind: WidgetList
  versions:
    - name: v1alpha1
      served: true
      storage: false          # still served for old clients, no longer stored
      schema:
        openAPIV3Schema: { ... }
    - name: v1
      served: true
      storage: true           # everything new is persisted as v1
      schema:
        openAPIV3Schema: { ... }
      subresources:
        status: {}

The mental model: a client may GET a v1alpha1/widgets/foo even though it is stored as v1. The apiserver decodes the stored v1 object and converts it to v1alpha1 to satisfy the request. With more than one served version whose schemas differ, that conversion is where a webhook becomes mandatory.

The upgrade lifecycle for adding v1 to an existing v1alpha1 CRD is strict:

  1. Add v1 as served: true, storage: false; keep v1alpha1 as the storage version. Ship the conversion webhook (Section 3) in the same change.
  2. Roll out clients that understand v1.
  3. Flip storage: true to v1 (and false on v1alpha1). New writes now persist as v1.
  4. Migrate stored objects so nothing remains physically encoded as v1alpha1 (Section 8).
  5. Only after migration completes: set served: false on v1alpha1, then remove it from the versions list entirely.

Skipping step 4 is the classic foot-gun: you can never remove v1alpha1 from the CRD while a single object in etcd is still stored in it, because removing the version removes the schema needed to decode it.

3. Writing a conversion webhook (v1alpha1 <-> v1)

When served versions diverge, set spec.conversion.strategy: Webhook. The apiserver POSTs a ConversionReview to your endpoint whenever it must translate between versions — on reads, on storage-version writes, and during migration.

spec:
  conversion:
    strategy: Webhook
    webhook:
      conversionReviewVersions: ["v1"]
      clientConfig:
        service:
          namespace: widget-system
          name: widget-conversion-webhook
          path: /convert
          port: 443
        caBundle: <base64 PEM of the serving CA>

The contract: the request carries a list of objects (not necessarily all in the same version) plus a desiredAPIVersion, and you return the same objects, in the same order, converted to that version. Inside metadata the webhook may change only labels and annotations; attempts to change name, namespace, or UID fail the request, and other metadata changes are ignored. Conversion must be lossless and round-trippable. A classic approach is to stash fields with no home in the target version inside an annotation so the reverse conversion can restore them.

Here is the core handler in Go. Note the uid must be echoed and the response APIVersion must match the request.

// handleConvert answers one ConversionReview. It assumes apix aliases
// k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1 and that codecs
// and writeJSON are defined elsewhere in the package.
func handleConvert(w http.ResponseWriter, r *http.Request) {
    body, err := io.ReadAll(r.Body)
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    review := &apix.ConversionReview{}
    if _, _, err := codecs.UniversalDeserializer().Decode(body, nil, review); err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    if review.Request == nil {
        http.Error(w, "ConversionReview has no request", http.StatusBadRequest)
        return
    }
    req := review.Request
    // Echo the UID; assume success until an object fails.
    resp := &apix.ConversionResponse{UID: req.UID, Result: metav1.Status{Status: metav1.StatusSuccess}}

    for _, raw := range req.Objects {
        cr := &unstructured.Unstructured{}
        err := cr.UnmarshalJSON(raw.Raw)
        if err == nil {
            err = convert(cr, req.DesiredAPIVersion)
        }
        var out []byte
        if err == nil {
            out, err = cr.MarshalJSON()
        }
        if err != nil {
            // One bad object fails the whole batch; never return a partial list.
            resp.ConvertedObjects = nil
            resp.Result = metav1.Status{Status: metav1.StatusFailure, Message: err.Error()}
            break
        }
        resp.ConvertedObjects = append(resp.ConvertedObjects, runtime.RawExtension{Raw: out})
    }
    review.Response = resp
    review.Request = nil
    writeJSON(w, review)
}

The convert function does the field surgery. Suppose v1alpha1 had a single spec.size string (“small”/“large”) and v1 replaced it with a structured spec.resources.replicas integer:

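// convert rewrites cr in place to the target apiVersion (needs fmt for the default case).
func convert(cr *unstructured.Unstructured, target string) error {
    if cr.GetAPIVersion() == target {
        return nil // already in the requested version; leave the fields alone
    }
    switch target {
    case "platform.acme.io/v1":
        size, _, _ := unstructured.NestedString(cr.Object, "spec", "size")
        // Unknown or empty sizes fall back to 1 so the conversion stays total
        // and never emits replicas: 0.
        replicas := int64(1)
        if size == "large" {
            replicas = 5
        }
        unstructured.RemoveNestedField(cr.Object, "spec", "size")
        _ = unstructured.SetNestedField(cr.Object, replicas, "spec", "resources", "replicas")
    case "platform.acme.io/v1alpha1":
        replicas, _, _ := unstructured.NestedInt64(cr.Object, "spec", "resources", "replicas")
        // Lossy as written: 3 replicas becomes "small" and comes back as 1.
        // Stash the exact count in an annotation if round trips must preserve it.
        size := "small"
        if replicas >= 5 {
            size = "large"
        }
        unstructured.RemoveNestedField(cr.Object, "spec", "resources")
        _ = unstructured.SetNestedField(cr.Object, size, "spec", "size")
    default:
        return fmt.Errorf("unsupported target version %q", target)
    }
    cr.SetAPIVersion(target)
    return nil
}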

Two hard rules: never mutate anything outside spec/status and your own fields, and make the conversion total — every object the apiserver hands you must convert or the whole batch fails. Conversion runs on the hot path of every cross-version read, so keep it allocation-light and never call back into the apiserver.
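The annotation-stash idea mentioned earlier can be sketched with nothing but the standard library. This is an illustration under assumptions, not the apiserver's or any library's implementation: objects are plain map[string]any values, the annotation key platform.acme.io/v1-replicas is hypothetical, and the helper names (sizeFor, annotations, toV1alpha1, toV1) are invented for the example.

```go
package main

import (
	"fmt"
	"strconv"
)

// stashKey is a hypothetical annotation name chosen for this sketch.
const stashKey = "platform.acme.io/v1-replicas"

// sizeFor is the coarse replicas -> size mapping used by the down-conversion.
func sizeFor(replicas int64) string {
	if replicas >= 5 {
		return "large"
	}
	return "small"
}

// annotations returns metadata.annotations, creating the maps if absent.
func annotations(obj map[string]any) map[string]any {
	meta, ok := obj["metadata"].(map[string]any)
	if !ok {
		meta = map[string]any{}
		obj["metadata"] = meta
	}
	ann, ok := meta["annotations"].(map[string]any)
	if !ok {
		ann = map[string]any{}
		meta["annotations"] = ann
	}
	return ann
}

// toV1alpha1 collapses replicas into a size and stashes the exact count.
func toV1alpha1(obj map[string]any) error {
	spec, _ := obj["spec"].(map[string]any)
	res, _ := spec["resources"].(map[string]any)
	replicas, ok := res["replicas"].(int64)
	if !ok {
		return fmt.Errorf("spec.resources.replicas missing or not an int64")
	}
	delete(spec, "resources")
	spec["size"] = sizeFor(replicas)
	annotations(obj)[stashKey] = strconv.FormatInt(replicas, 10)
	obj["apiVersion"] = "platform.acme.io/v1alpha1"
	return nil
}

// toV1 restores the stashed count when it still agrees with spec.size and
// otherwise falls back to the coarse mapping (a v1alpha1 client may have
// edited size after the stash was written).
func toV1(obj map[string]any) error {
	spec, ok := obj["spec"].(map[string]any)
	if !ok {
		return fmt.Errorf("spec missing")
	}
	size, _ := spec["size"].(string)
	replicas := int64(1) // "small" and anything unrecognized
	if size == "large" {
		replicas = 5
	}
	ann := annotations(obj)
	if s, ok := ann[stashKey].(string); ok {
		if n, err := strconv.ParseInt(s, 10, 64); err == nil && sizeFor(n) == size {
			replicas = n
		}
		delete(ann, stashKey)
	}
	delete(spec, "size")
	spec["resources"] = map[string]any{"replicas": replicas}
	obj["apiVersion"] = "platform.acme.io/v1"
	return nil
}

func main() {
	obj := map[string]any{
		"apiVersion": "platform.acme.io/v1",
		"spec":       map[string]any{"resources": map[string]any{"replicas": int64(3)}},
	}
	if err := toV1alpha1(obj); err != nil {
		panic(err)
	}
	fmt.Println(obj["spec"].(map[string]any)["size"]) // small
	if err := toV1(obj); err != nil {
		panic(err)
	}
	fmt.Println(obj["spec"].(map[string]any)["resources"]) // map[replicas:3]
}
```

The stash is only trusted while it still maps to the current size, so an edit made through v1alpha1 wins over a stale annotation. Annotations are one of the few metadata fields a conversion webhook is allowed to change, which is why the stash lives there.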

4. Structural schemas, validation, defaulting, status

Since apiextensions.k8s.io/v1, every CRD schema must be structural: the root, every object property, and every array item declares a non-empty type; the logical junctors allOf/anyOf/oneOf/not may only add value validations to fields that are also declared outside them; and type, default, and additionalProperties may not appear inside those junctors. Structural schemas are the precondition for server-side defaulting, pruning, and the conversion machinery.

openAPIV3Schema:
  type: object
  properties:
    spec:
      type: object
      required: ["resources"]
      properties:
        resources:
          type: object
          properties:
            replicas:
              type: integer
              minimum: 1
              maximum: 50
              default: 3              # server-side defaulting
        tier:
          type: string
          enum: ["bronze", "silver", "gold"]
          default: "bronze"
    status:
      type: object
      properties:
        readyReplicas: { type: integer }
  required: ["spec"]

Pruning is automatic with structural schemas: any field a client sends that is not in the schema is dropped, not stored. If you genuinely need to keep arbitrary keys (rare), opt in with x-kubernetes-preserve-unknown-fields: true on that node only.
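To make the pruning rule concrete, here is a minimal sketch of the idea in Go. It is a simplified model, not the apiserver's pruning code: the schema type is a hypothetical, heavily reduced stand-in for a structural schema node, arrays are not handled, and the real apiserver also exempts apiVersion, kind, and metadata at the root.

```go
package main

import "fmt"

// schema is a hypothetical, reduced stand-in for one structural schema node.
type schema struct {
	Properties      map[string]*schema
	PreserveUnknown bool // models x-kubernetes-preserve-unknown-fields: true
}

// prune deletes, in place, every key of obj that the schema does not declare.
// Unknown keys survive only on nodes that opted in with PreserveUnknown;
// declared children are still pruned against their own schema.
func prune(obj map[string]any, s *schema) {
	if s == nil {
		return
	}
	for key, val := range obj {
		child, known := s.Properties[key]
		if !known {
			if !s.PreserveUnknown {
				delete(obj, key)
			}
			continue
		}
		if nested, ok := val.(map[string]any); ok {
			prune(nested, child)
		}
	}
}

// widgetSpec mirrors the spec portion of the Widget schema: resources.replicas and tier.
func widgetSpec() *schema {
	return &schema{Properties: map[string]*schema{
		"resources": {Properties: map[string]*schema{"replicas": {}}},
		"tier":      {},
	}}
}

func main() {
	spec := map[string]any{
		"resources": map[string]any{"replicas": int64(3), "bogus": true},
		"tier":      "gold",
		"typo":      "x",
	}
	prune(spec, widgetSpec())
	fmt.Println(spec) // map[resources:map[replicas:3] tier:gold]
}
```

The misspelled top-level key and the nested bogus key are both dropped silently, which is why a typo in a manifest shows up as a missing field rather than a validation error.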

Declaring the status subresource changes semantics: writes to /widgets/foo ignore the status stanza, and status is only mutated via /widgets/foo/status. This is what lets your controller update status without fighting the user’s spec edits, and it bumps metadata.generation only on spec changes — the signal your reconcile loop watches.
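The split between the main endpoint and /status can be modeled in a few lines. This is a toy model of the semantics described above, not apiserver code; the widget struct and the updateMain/updateStatus helpers are invented for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// widget keeps only the parts of an object that matter for this model.
type widget struct {
	Generation int64 // metadata.generation
	Spec       map[string]any
	Status     map[string]any
}

// updateMain models a write to /widgets/foo: the incoming status is ignored,
// and generation is bumped only when the spec actually changed.
func updateMain(stored, incoming widget) widget {
	out := widget{Generation: stored.Generation, Spec: incoming.Spec, Status: stored.Status}
	if !reflect.DeepEqual(stored.Spec, incoming.Spec) {
		out.Generation++
	}
	return out
}

// updateStatus models a write to /widgets/foo/status: only status changes.
func updateStatus(stored, incoming widget) widget {
	return widget{Generation: stored.Generation, Spec: stored.Spec, Status: incoming.Status}
}

func main() {
	stored := widget{
		Generation: 1,
		Spec:       map[string]any{"tier": "bronze"},
		Status:     map[string]any{"readyReplicas": 3},
	}
	// A user edit that also tries to overwrite status.
	edit := widget{
		Spec:   map[string]any{"tier": "gold"},
		Status: map[string]any{"readyReplicas": 99},
	}
	stored = updateMain(stored, edit)
	fmt.Println(stored.Generation, stored.Status["readyReplicas"]) // 2 3

	// The controller reports progress through the status endpoint.
	stored = updateStatus(stored, widget{Status: map[string]any{"readyReplicas": 2}})
	fmt.Println(stored.Generation, stored.Spec["tier"], stored.Status["readyReplicas"]) // 2 gold 2
}
```

Because status writes never move generation, a controller can compare generation against the last value it acted on without being woken by its own status updates.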

For validation beyond OpenAPI, prefer CEL validation rules (x-kubernetes-validations, GA since 1.29) over a validating webhook — they run in-process, need no certificates, and survive apiserver restarts:

replicas:
  type: integer
  x-kubernetes-validations:
    - rule: "self <= 10 || self <= oldSelf"
      message: "replicas above 10 can only be decreased, not increased"

That oldSelf transition rule enforces an invariant a static schema cannot — exactly the kind of policy that otherwise pushes people toward webhooks unnecessarily.
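The invariant stated in the rule's message (values above 10 may be decreased but never increased) is easy to get subtly wrong, so it can help to check it as a plain predicate before encoding it in CEL. The sketch below is only a truth-table check in Go; transitionAllowed is a hypothetical helper, and it models updates only, since rules that reference oldSelf are not evaluated on create.

```go
package main

import "fmt"

// transitionAllowed models the update-time invariant: any value up to 10 is
// fine, and a value above 10 is accepted only if it does not exceed the old one.
func transitionAllowed(oldSelf, self int64) bool {
	return self <= 10 || self <= oldSelf
}

func main() {
	cases := [][2]int64{{5, 8}, {5, 20}, {20, 15}, {20, 25}}
	for _, c := range cases {
		fmt.Printf("%d -> %d allowed=%v\n", c[0], c[1], transitionAllowed(c[0], c[1]))
	}
	// 5 -> 8 allowed=true
	// 5 -> 20 allowed=false
	// 20 -> 15 allowed=true
	// 20 -> 25 allowed=false
}
```

The last case is the one worth testing explicitly: an object already above the threshold must not be allowed to grow further.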

5. Standing up an extension API server (aggregation layer)

When a CRD is not enough, you build an aggregated API server. The fastest correct path is to follow the structure of k8s.io/sample-apiserver, which wires k8s.io/apiserver’s GenericAPIServer to your scheme, or to scaffold with apiserver-builder (apiserver-boot), which generates that wiring plus storage and Makefiles.

# apiserver-builder approach
apiserver-boot init repo --domain acme.io
apiserver-boot create group version resource \
  --group platform --version v1 --kind Fleet
apiserver-boot build executables

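The aggregation layer must be enabled on the cluster (it is by default on managed providers). The kube-apiserver needs these flags so it can authenticate to and trust your extension server; the last one, --enable-aggregator-routing, is only required when kube-proxy is not running on the host that runs the kube-apiserver: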

--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
--requestheader-allowed-names=front-proxy-client
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
--enable-aggregator-routing=true

You then register the API group with an APIService. The kube-apiserver proxies /apis/platform.acme.io/v1 to your Service and verifies its serving cert against caBundle:

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1.platform.acme.io
spec:
  group: platform.acme.io
  version: v1
  groupPriorityMinimum: 1000
  versionPriority: 15
  service:
    name: fleet-apiserver
    namespace: fleet-system
    port: 443
  caBundle: <base64 PEM>            # omit and use cert-manager CA injection

Your extension server, on startup, must do delegated authn/authz: it does not re-implement auth. Using the genericapiserver recommended options, it calls TokenReview and SubjectAccessReview back against the kube-apiserver, and reads the kube-system/extension-apiserver-authentication ConfigMap to learn the request-header CA. This is what makes RBAC on your aggregated resources behave identically to built-in resources.

// In your server's options wiring (sample-apiserver style)
o.RecommendedOptions.Authentication.RemoteKubeConfigFileOptional = true
o.RecommendedOptions.Authorization.RemoteKubeConfigFileOptional = true
serverConfig := genericapiserver.NewRecommendedConfig(codecs)
// ApplyTo wires delegated authn, authz, audit, openapi, and etcd into the config.
if err := o.RecommendedOptions.ApplyTo(serverConfig); err != nil {
    return err
}
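For intuition about what the request-header half of delegated authentication does, here is a stdlib-only sketch of reading the identity that the kube-apiserver forwards. It is not the k8s.io/apiserver implementation, and the userInfo type and fromRequestHeaders helper are invented for this example. It also assumes the caller has already verified the proxy's client certificate against the request-header CA and checked its CN against the allowed names; without that check these headers must not be trusted.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// userInfo is a hypothetical, minimal identity record for this sketch.
type userInfo struct {
	Name   string
	Groups []string
	Extra  map[string][]string
}

// extraPrefix matches --requestheader-extra-headers-prefix in canonical header form.
const extraPrefix = "X-Remote-Extra-"

// fromRequestHeaders reads the forwarded identity. Call it only after the
// front-proxy client certificate has been verified.
func fromRequestHeaders(h http.Header) (userInfo, error) {
	name := h.Get("X-Remote-User")
	if name == "" {
		return userInfo{}, errors.New("request carries no X-Remote-User header")
	}
	u := userInfo{Name: name, Groups: h.Values("X-Remote-Group"), Extra: map[string][]string{}}
	for key, vals := range h {
		if strings.HasPrefix(key, extraPrefix) {
			u.Extra[strings.ToLower(strings.TrimPrefix(key, extraPrefix))] = vals
		}
	}
	return u, nil
}

func main() {
	h := http.Header{}
	h.Set("X-Remote-User", "alice")
	h.Add("X-Remote-Group", "dev")
	h.Add("X-Remote-Group", "ops")
	h.Add("X-Remote-Extra-Scopes", "read")
	u, err := fromRequestHeaders(h)
	fmt.Println(u.Name, u.Groups, u.Extra, err) // alice [dev ops] map[scopes:[read]] <nil>
}
```

In a real extension server the recommended options install the equivalent logic for you; the sketch only shows why the header names in the kube-apiserver flags and the CA in the ConfigMap have to line up.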

6. Custom storage, RBAC, and admission for aggregated resources

The aggregation layer’s real power is custom storage. k8s.io/apiserver/pkg/registry/rest defines small interfaces — Getter, Lister, Creater, Updater, GracefulDeleter, Watcher — and whichever you implement determines which verbs your resource supports. Back them with the generic etcd Store for normal cases, or implement them by hand to project an external system as Kubernetes objects (the metrics-server pattern: Get/List only, computed on the fly, no storage).

// Minimal read-only REST storage backed by a live computation
type fleetREST struct{ source ExternalInventory }

func (r *fleetREST) New() runtime.Object          { return &v1.Fleet{} }
func (r *fleetREST) NewList() runtime.Object       { return &v1.FleetList{} }
func (r *fleetREST) NamespaceScoped() bool         { return true }
func (r *fleetREST) Get(ctx context.Context, name string, _ *metav1.GetOptions) (runtime.Object, error) {
    return r.source.Lookup(genericapirequest.NamespaceValue(ctx), name)
}
func (r *fleetREST) Destroy() {}

RBAC is automatic and unified: because your server delegates authorization via SubjectAccessReview, a normal Role/ClusterRole granting verbs on your group works without any special handling.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fleet-reader
rules:
  - apiGroups: ["platform.acme.io"]
    resources: ["fleets"]
    verbs: ["get", "list", "watch"]
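How a rule like the one above matches a request can be sketched as a small predicate. This is a simplified model rather than the RBAC authorizer itself: the policyRule type and its allows method are invented for the example, and it ignores resourceNames, subresources, and non-resource URLs.

```go
package main

import "fmt"

// policyRule is a reduced stand-in for one entry under a ClusterRole's rules.
type policyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// has reports whether list contains want or the "*" wildcard.
func has(list []string, want string) bool {
	for _, v := range list {
		if v == "*" || v == want {
			return true
		}
	}
	return false
}

// allows reports whether the rule grants verb on resource in group.
func (r policyRule) allows(group, resource, verb string) bool {
	return has(r.APIGroups, group) && has(r.Resources, resource) && has(r.Verbs, verb)
}

func main() {
	fleetReader := policyRule{
		APIGroups: []string{"platform.acme.io"},
		Resources: []string{"fleets"},
		Verbs:     []string{"get", "list", "watch"},
	}
	fmt.Println(fleetReader.allows("platform.acme.io", "fleets", "list"))   // true
	fmt.Println(fleetReader.allows("platform.acme.io", "fleets", "delete")) // false
	fmt.Println(fleetReader.allows("apps", "fleets", "list"))               // false
}
```

Nothing in the match depends on whether the group is built in, a CRD, or aggregated, which is the point: the extension server only has to ask the question through SubjectAccessReview.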

Admission inside an aggregated server is in-process: register admission plugins on the GenericAPIServer chain rather than deploying external admission webhooks. Plugins implement admission.MutationInterface (Admit) and admission.ValidationInterface (Validate) from k8s.io/apiserver/pkg/admission and work with full type information: no AdmissionReview round trip, no certificate to rotate.

7. Discovery, OpenAPI publishing, and client codegen

For your API to be first-class — kubectl explain fleet, kubectl get fleet, typed clients — three things must publish correctly.

Discovery. The kube-apiserver aggregates your /apis/platform.acme.io/v1 discovery document under the cluster’s discovery. Verify it surfaced:

kubectl get apiservice v1.platform.acme.io
# NAME                     SERVICE                      AVAILABLE
# v1.platform.acme.io      fleet-system/fleet-apiserver  True

kubectl api-resources --api-group=platform.acme.io

AVAILABLE: False with a FailedDiscoveryCheck message almost always means a TLS or networking problem reaching your Service, not a code bug.

OpenAPI. Serve OpenAPI v2 and v3 from the extension server (RecommendedConfig wires the endpoints; you supply generated definitions). This is what powers kubectl explain and server-side apply field management. For CRDs you get OpenAPI v3 for free from the structural schema.

Client codegen. Generate typed clients, listers, and informers with code-generator’s kube_codegen.sh (the modern entry point replacing the per-tool *-gen invocations). Tag your types so the generators know what to produce:

// +genclient
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// +kubebuilder:object:root=true
type Fleet struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec   FleetSpec   `json:"spec,omitempty"`
    Status FleetStatus `json:"status,omitempty"`
}

# vendor code-generator, then:
./hack/update-codegen.sh   # wraps kube_codegen.sh: deepcopy, client, lister, informer

For CRDs specifically, controller-gen does the equivalent: controller-gen object crd paths=./... emits deepcopy code and the CRD manifest with embedded schema, so the same Go types feed both your controller and the published API.

8. Rollout, deprecation, and migrating stored objects

The single most dangerous moment is the storage-version flip. After you set v1 as storage, existing objects are still physically encoded as v1alpha1 in etcd until something rewrites them. They are readable (conversion handles it) but they block ever deleting v1alpha1.

Force a rewrite by reading and writing every object back through the new storage version:

# Crude but effective: touch every object so it re-persists as the storage version
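# -o name drops the namespace, so list namespace and name explicitly
kubectl get widgets -A \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
while read -r ns name; do
  kubectl -n "$ns" annotate widget "$name" platform.acme.io/migrated="$(date -u +%FT%TZ)" --overwrite
done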

For production scale and correctness, use the official storage-version-migrator (kube-storage-version-migrator), which drives the migration declaratively and tracks progress, instead of hand-rolling annotations:

apiVersion: migration.k8s.io/v1alpha1
kind: StorageVersionMigration
metadata:
  name: widgets-to-v1
spec:
  resource:
    group: platform.acme.io
    resource: widgets
    version: v1

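Then mark old versions deprecated so that clients still calling them get a warning on every request, which kubectl shows to users before you remove the version. Note that status.storedVersions on the CRD tracks which versions may still be persisted in etcd, not which versions clients read: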

versions:
  - name: v1alpha1
    served: true
    storage: false
    deprecated: true
    deprecationWarning: "platform.acme.io/v1alpha1 Widget is deprecated; use v1. Removal in operator 2.0."

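Only when (a) migration reports complete and (b) status.storedVersions on the CRD no longer lists v1alpha1 should you set it served: false, and in a later release remove it entirely. storedVersions is not pruned just because objects were rewritten: if your migration tooling does not update it, remove v1alpha1 from the list yourself by patching the CRD's status subresource once every object has been re-persisted. The apiserver actively prevents you from dropping a version that still appears in storedVersions.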
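The removal gate can be turned into a small pre-flight check that runs in CI or a release script. The sketch below is an assumption-laden illustration: it parses the JSON you would get from kubectl get crd ... -o json using only the fields it needs, and removalBlockers is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// crd decodes only the fields this check needs from a CustomResourceDefinition.
type crd struct {
	Spec struct {
		Versions []struct {
			Name    string `json:"name"`
			Served  bool   `json:"served"`
			Storage bool   `json:"storage"`
		} `json:"versions"`
	} `json:"spec"`
	Status struct {
		StoredVersions []string `json:"storedVersions"`
	} `json:"status"`
}

// removalBlockers lists the reasons a version cannot yet be deleted from
// spec.versions. An empty result means the version is safe to remove.
func removalBlockers(raw []byte, version string) ([]string, error) {
	var c crd
	if err := json.Unmarshal(raw, &c); err != nil {
		return nil, err
	}
	var blockers []string
	for _, v := range c.Spec.Versions {
		if v.Name != version {
			continue
		}
		if v.Storage {
			blockers = append(blockers, "still the storage version")
		}
		if v.Served {
			blockers = append(blockers, "still served to clients")
		}
	}
	for _, sv := range c.Status.StoredVersions {
		if sv == version {
			blockers = append(blockers, "still listed in status.storedVersions")
		}
	}
	return blockers, nil
}

func main() {
	raw := []byte(`{
	  "spec": {"versions": [
	    {"name": "v1alpha1", "served": true, "storage": false},
	    {"name": "v1", "served": true, "storage": true}]},
	  "status": {"storedVersions": ["v1alpha1", "v1"]}}`)
	blockers, err := removalBlockers(raw, "v1alpha1")
	fmt.Println(blockers, err)
}
```

With the sample input the check reports that v1alpha1 is still served and still listed in storedVersions, which matches the state right after the storage flip and before migration.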

Enterprise scenario

A fintech platform team ran a multi-tenant Tenant CRD across 40 clusters, originally shipped as v1alpha1 with a flat spec.quota string like "cpu=8,mem=32Gi". Two years of adoption meant ~6,000 stored Tenant objects and dozens of teams’ GitOps repos pinned to v1alpha1. They needed a real, validated spec.quota object for v1, but a hard cutover was impossible: flipping the storage version while old v1alpha1 objects sat in etcd, with no conversion path, would have made those objects undecodable the moment they touched v1.

The constraint that bit them: they initially shipped v1 as storage version without a conversion webhook, assuming “the schemas are close enough.” The first reconcile that read an old object failed with a decode error, because the apiserver had no way to turn the stored flat string into the structured v1 shape. Reads of unmigrated tenants started 500-ing in two clusters before they rolled back.

The fix was to follow the lifecycle in order. They shipped a conversion webhook first (with v1alpha1 still storage), proving round-trip correctness in CI by converting a corpus of real objects v1alpha1 -> v1 -> v1alpha1 and asserting equality. Then they flipped storage to v1, ran kube-storage-version-migrator cluster by cluster during change windows, and watched status.storedVersions drain to ["v1"] before touching served. The webhook’s reverse path kept every pinned GitOps repo working untouched throughout. The CI round-trip gate is the part they wish they had built first:

func TestConversionRoundTrips(t *testing.T) {
    for _, obj := range loadCorpus(t, "testdata/tenants_v1alpha1") {
        orig := obj.DeepCopy()
        if err := convert(obj, "platform.acme.io/v1"); err != nil { t.Fatal(err) }
        if err := convert(obj, "platform.acme.io/v1alpha1"); err != nil { t.Fatal(err) }
        if diff := cmp.Diff(orig.Object, obj.Object); diff != "" {
            t.Errorf("lossy conversion:\n%s", diff)
        }
    }
}

That single test would have caught the lossy assumption before any cluster did.

Verify

Run these to confirm an extension is healthy end to end.

# 1. CRD multi-version: confirm served vs storage and stored encodings
kubectl get crd widgets.platform.acme.io \
  -o jsonpath='{range .spec.versions[*]}{.name}{" served="}{.served}{" storage="}{.storage}{"\n"}{end}'
kubectl get crd widgets.platform.acme.io -o jsonpath='{.status.storedVersions}'

# 2. Conversion webhook actually fires: read the same object at both versions
kubectl get widgets.v1alpha1.platform.acme.io demo -n default -o yaml \
  | grep -E 'apiVersion|size|replicas'
kubectl get widgets.v1.platform.acme.io demo -n default -o yaml \
  | grep -E 'apiVersion|size|replicas'

# 3. Aggregated API server reachable and registered
kubectl get apiservice v1.platform.acme.io \
  -o jsonpath='{.status.conditions[?(@.type=="Available")].status} {.status.conditions[?(@.type=="Available")].message}{"\n"}'

# 4. Discovery, explain, and RBAC all resolve
kubectl api-resources --api-group=platform.acme.io
kubectl explain fleet.spec
kubectl auth can-i list fleets.platform.acme.io --as=system:serviceaccount:default:reader

# 5. Defaulting and pruning behave (apply minimal object, read back)
kubectl apply -f - <<'EOF'
apiVersion: platform.acme.io/v1
kind: Widget
metadata: { name: defaults-check }
spec: { resources: {} }
EOF
kubectl get widget defaults-check -o jsonpath='{.spec.resources.replicas} {.spec.tier}{"\n"}'  # expect: 3 bronze

A correct setup shows distinct fields per version in step 2 (proving conversion ran), Available: True in step 3, populated discovery in step 4, and defaulted values in step 5.
