
Linkerd in Production: Automatic mTLS, Retry/Timeout Budgets, and Multicluster Failover

Linkerd’s pitch is restraint: a Rust micro-proxy that does mTLS, load balancing, retries, and metrics in roughly 10-20 MiB of memory per pod, with no Lua, no WASM, and a CLI that tells you the truth when something is misconfigured. You give up the kitchen-sink extensibility of an Envoy mesh; you get a data plane you can reason about at 3am. This guide takes a workload from unmeshed to zero-trust, layers on retry and timeout budgets that prevent retry storms, then links two clusters and wires automatic failover.

Everything here targets the stable 2.x line (validated against 2.15/2.16 behavior). Where a feature has a sharp edge — ServiceProfiles vs Gateway API, the 64KiB retry ceiling, gateway-mode latency — I call it out rather than paper over it.

1. Architecture: why the micro-proxy is different

Linkerd has two layers. The control plane (destination, identity, proxy-injector) runs in the linkerd namespace. The data plane is linkerd2-proxy, a purpose-built Rust proxy injected as a sidecar next to each application container.

The design choices that matter operationally:

Concern          Linkerd                                   Typical Envoy mesh
Data plane       Rust micro-proxy, ~10-20Mi/pod            Envoy, ~50-100Mi/pod
Config surface   Opinionated, few knobs                    xDS, near-infinite knobs
mTLS             On by default, zero config                Strict mode opt-in via PeerAuthentication
Extensibility    Intentionally limited                     Lua / WASM / ext_authz
Identity         Per-workload cert from identity service   Per-workload SPIFFE

The identity component is a CA that issues short-lived (24h default) leaf certs to each proxy, keyed to the pod’s Kubernetes ServiceAccount; proxies rotate them automatically. Because identity is bootstrapped from the ServiceAccount token, mTLS is on the moment a pod is meshed — there is no separate “turn on mTLS” step like Istio’s PeerAuthentication. The secure default is the only default, and that is the single biggest reason teams reach for Linkerd.

Mental model: the trust anchor (root CA) is long-lived and you guard it like a crown jewel. The issuer (intermediate CA) is what actually signs proxy certs, lives in a Kubernetes Secret, and you rotate it on a schedule. Leaf certs are ephemeral and you never touch them.

2. Install with a custom trust anchor

Never run a production mesh on the auto-generated certs from linkerd install with no arguments. They expire in a year, and because the CLI does not keep the root’s private key, you cannot rotate the issuer independently of the anchor. Generate your own with the step CLI.

# Long-lived root (trust anchor) — 10 years, kept OFFLINE after this
step certificate create root.linkerd.cluster.local ca.crt ca.key \
  --profile root-ca --no-password --insecure --not-after=87600h

# Issuer (intermediate) — 1 year, signed by the root
step certificate create identity.linkerd.cluster.local issuer.crt issuer.key \
  --profile intermediate-ca --not-after=8760h --no-password --insecure \
  --ca ca.crt --ca-key ca.key

Install the CRDs first, then the control plane wired to those certs:

linkerd install --crds | kubectl apply -f -

linkerd install \
  --identity-trust-anchors-file ca.crt \
  --identity-issuer-certificate-file issuer.crt \
  --identity-issuer-key-file issuer.key \
  | kubectl apply -f -

linkerd check

For GitOps, prefer Helm so the chart owns the lifecycle:

helm repo add linkerd https://helm.linkerd.io/stable
helm install linkerd-crds linkerd/linkerd-crds -n linkerd --create-namespace

helm install linkerd-control-plane -n linkerd \
  --set-file identityTrustAnchorsPEM=ca.crt \
  --set-file identity.issuer.tls.crtPEM=issuer.crt \
  --set-file identity.issuer.tls.keyPEM=issuer.key \
  linkerd/linkerd-control-plane

Rotating the issuer

The issuer is the cert you rotate routinely. Because every proxy already trusts the root, swapping the intermediate is non-disruptive — you do not need to bundle anything.

# New intermediate, same root
step certificate create identity.linkerd.cluster.local issuer-new.crt issuer-new.key \
  --profile intermediate-ca --not-after 8760h --no-password --insecure \
  --ca ca.crt --ca-key ca.key

linkerd upgrade \
  --identity-issuer-certificate-file=./issuer-new.crt \
  --identity-issuer-key-file=./issuer-new.key \
  | kubectl apply -f -

# New leaf certs are signed by the new issuer as proxies renew; restart to force re-issuance now, then verify
kubectl -n emojivoto rollout restart deploy
linkerd check --proxy

Rotating the trust anchor is the harder case: you bundle old + new so proxies trust both during the transition, roll everything, re-issue the intermediate from the new root, roll again, then drop the old anchor. That multi-step dance is why you give the root a 10-year life and rotate the issuer instead.
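Issuer rotation is painless and anchor rotation is not because of one check: a proxy accepts a peer only if the peer’s chain ends at an anchor in its own trust bundle. Here is a toy model of that check, with strings standing in for certificates. The `Proxy` type and the anchor names are hypothetical. Real validation also verifies the issuer’s signature and validity dates, which this ignores.

```python
from __future__ import annotations

from dataclasses import dataclass, replace

OLD, NEW = "old-root", "new-root"  # hypothetical trust anchor names


@dataclass(frozen=True)
class Proxy:
    issuer: str               # intermediate that signed this proxy's leaf cert
    chain_root: str           # trust anchor at the top of that chain
    trust_bundle: frozenset   # anchors this proxy accepts from peers


def can_talk(a: Proxy, b: Proxy) -> bool:
    """Toy mTLS check: each side's chain must end at an anchor the other trusts.
    The issuer only matters through the root that signed it."""
    return a.chain_root in b.trust_bundle and b.chain_root in a.trust_bundle


start = Proxy("issuer-1", OLD, frozenset({OLD}))

# Issuer rotation: new intermediate, same root, so restarted pods still interoperate.
rotated_issuer = replace(start, issuer="issuer-2")
print(can_talk(start, rotated_issuer))  # True

# Swapping the anchor in one step splits the fleet.
naive = Proxy("issuer-3", NEW, frozenset({NEW}))
print(can_talk(start, naive))           # False

# Safe order: bundle both anchors and roll; re-issue from the new root and roll;
# only then drop the old anchor.
bundled = replace(start, trust_bundle=frozenset({OLD, NEW}))
reissued = Proxy("issuer-3", NEW, frozenset({OLD, NEW}))
print(can_talk(bundled, reissued))      # True: mixed fleet mid-rotation
final = Proxy("issuer-3", NEW, frozenset({NEW}))
print(can_talk(final, final))           # True: old anchor dropped cleanly
print(can_talk(final, bundled))         # False: dropped before every pod re-rolled
```

The last line is the failure mode to avoid. Dropping the old anchor while any pod still presents an old-root chain cuts that pod off from the rest of the mesh.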

3. Mesh workloads and prove mTLS

Meshing is injection: the proxy-injector webhook adds the sidecar when it sees the linkerd.io/inject: enabled annotation. Annotate the namespace so every new pod is meshed:

kubectl annotate namespace emojivoto linkerd.io/inject=enabled
kubectl -n emojivoto rollout restart deploy   # existing pods need a restart to get the sidecar
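The webhook’s precedence rule can be written as a small function. This is a simplified model, not the injector’s code. It assumes only the `linkerd.io/inject` annotation matters, that a pod-level value overrides the namespace value, and that `enabled` and `ingress` are the values that trigger injection. The decision is made only when a pod is created, which is why the rollout restart above is needed.

```python
from __future__ import annotations

INJECT = "linkerd.io/inject"


def should_inject(namespace_annotations: dict[str, str],
                  pod_annotations: dict[str, str]) -> bool:
    """Simplified model of the injection decision made at pod creation:
    a pod-level value wins; otherwise the namespace value applies."""
    value = pod_annotations.get(INJECT, namespace_annotations.get(INJECT))
    return value in ("enabled", "ingress")


ns = {INJECT: "enabled"}
print(should_inject(ns, {}))                    # True: inherits the namespace
print(should_inject(ns, {INJECT: "disabled"}))  # False: pod opts out
print(should_inject({}, {INJECT: "enabled"}))   # True: pod opts in on its own
print(should_inject({}, {}))                    # False: nothing asks for the proxy
```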

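For one-off or pipeline use, pipe manifests through linkerd inject instead. Here the emojivoto demo is deployed unmeshed first, then its Deployments are injected in place: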

kubectl apply -k github.com/BuoyantIO/emojivoto/kustomize/deployment
kubectl -n emojivoto get deploy -o yaml | linkerd inject - | kubectl apply -f -

Verify with Viz

Install the observability extension, then prove mTLS is actually happening rather than assuming it.

linkerd viz install | kubectl apply -f -
linkerd viz check

edges shows you which traffic is secured. The SECURED column is the source of truth:

linkerd viz -n emojivoto edges deployment
# SRC          DST          SRC_NS     DST_NS     SECURED
# web          emoji        emojivoto  emojivoto  √
# web          voting       emojivoto  emojivoto  √
# vote-bot     web          emojivoto  emojivoto  √

tap streams live requests with a tls field. tls=true means the connection was mTLS’d between two meshed identities; tls=no_tls_from_remote is normal for kubelet health probes, which have no mesh identity:

linkerd viz -n emojivoto tap deploy/web
# req id=0:1 proxy=in  src=10.1.0.5:48000 dst=10.1.0.9:8080 tls=true :method=GET :path=/api/list

If SECURED is blank or tls=disabled for app-to-app traffic, the destination pod is not meshed — check the injector annotation and that you restarted the deployment.
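Both checks are easy to script, so proving mTLS becomes a repeatable step instead of a manual one. The Python sketch below parses text in the shapes shown above. It assumes the edges table has five whitespace-separated columns, with SECURED either `√` or empty, and that tap lines carry a `tls=` field. Both are screen-scraped formats, so confirm them against your CLI version. The edges sample marks web to voting as unsecured to show a failure, and the probe tap line is illustrative, not captured output.

```python
from __future__ import annotations

import re

TLS_FIELD = re.compile(r"\btls=(\S+)")


def unsecured_edges(edges_output: str) -> list[tuple[str, str]]:
    """(src, dst) pairs from `linkerd viz edges` text whose SECURED cell is not √."""
    offenders = []
    for line in edges_output.strip().splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) < 4:
            continue  # blank or malformed line
        if len(cols) < 5 or cols[4] != "√":
            offenders.append((cols[0], cols[1]))
    return offenders


def tls_tally(tap_lines: list[str]) -> dict[str, int]:
    """Count tap events per tls= value, e.g. true vs no_tls_from_remote."""
    counts: dict[str, int] = {}
    for line in tap_lines:
        match = TLS_FIELD.search(line)
        if match:
            counts[match.group(1)] = counts.get(match.group(1), 0) + 1
    return counts


EDGES = """\
SRC          DST          SRC_NS     DST_NS     SECURED
web          emoji        emojivoto  emojivoto  √
web          voting       emojivoto  emojivoto
vote-bot     web          emojivoto  emojivoto  √
"""
print(unsecured_edges(EDGES))  # [('web', 'voting')]

TAP = [
    "req id=0:1 proxy=in  src=10.1.0.5:48000 dst=10.1.0.9:8080 tls=true :method=GET :path=/api/list",
    # illustrative kubelet probe line, not captured output
    "req id=0:2 proxy=in  src=10.1.0.1:51000 dst=10.1.0.9:8080 tls=no_tls_from_remote :method=GET :path=/ping",
]
print(tls_tally(TAP))  # {'true': 1, 'no_tls_from_remote': 1}
```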

4. Retry budgets and timeouts

This is where teams get reliability right or badly wrong. A naive “retry on 5xx, 3 attempts” turns a brief dependency blip into a self-inflicted DDoS: every failure triples load exactly when the system can least absorb it. Linkerd’s answer is a retry budget — retries are capped as a fraction of live traffic, not a fixed per-request count, so they can never become a runaway multiplier.
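The arithmetic behind that claim fits in a few lines. The sketch compares a fixed "retry failures, up to N attempts" policy with the same policy under a ratio-plus-floor cap. It is a steady-state simplification that assumes every attempt fails independently at the same rate. Linkerd tracks its budget over a rolling window rather than computing it like this.

```python
def fixed_retry_load(rps: float, failure_rate: float, attempts: int) -> float:
    """Requests/s hitting the backend when each failed attempt is retried,
    up to `attempts` total attempts per request, with no budget."""
    load, wave = 0.0, float(rps)
    for _ in range(attempts):
        load += wave
        wave *= failure_rate  # only the failed share goes around again
    return load


def budgeted_retry_load(rps: float, failure_rate: float, attempts: int,
                        retry_ratio: float, min_retries_per_second: float) -> float:
    """Same policy, but total retry traffic is capped at
    retry_ratio * rps + min_retries_per_second (steady-state simplification)."""
    wanted_retries = fixed_retry_load(rps, failure_rate, attempts) - rps
    cap = retry_ratio * rps + min_retries_per_second
    return rps + min(wanted_retries, cap)


# A dependency blip: 100 RPS and every attempt fails.
print(fixed_retry_load(100, 1.0, 3))             # 300.0: load triples
print(budgeted_retry_load(100, 1.0, 3, 0.1, 5))  # 115.0: at most 15 extra req/s
# Partial failure still adds 75% without a budget.
print(fixed_retry_load(100, 0.5, 3))             # 175.0
# Three tiers that each make 3 attempts multiply at the bottom tier.
print(3 ** 3)                                    # 27
```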

There are two APIs. Gateway API (HTTPRoute annotations) is the current path for new clusters; as of 2.16 it supplants ServiceProfile for routing, timeouts, and retries. ServiceProfile is the legacy API that remains supported and is still the way to express an explicit retryBudget object. I cover both because brownfield clusters run a mix.

Retries and timeouts via HTTPRoute (current)

Retries and timeouts are plain annotations on an HTTPRoute. Note: route annotations are incompatible with a ServiceProfile for the same service — if a ServiceProfile exists, it wins and these are ignored.

apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: web-default
  namespace: emojivoto
  annotations:
    retry.linkerd.io/http: 5xx        # also: gateway-error (502-504), or a range like 500-504
    retry.linkerd.io/limit: "2"       # max retries per request (default 1), so up to 3 attempts
    retry.linkerd.io/timeout: 300ms   # cancel+retry a slow attempt after this
    timeout.linkerd.io/request: 2s    # total budget across all attempts
    timeout.linkerd.io/response: 1s   # single backend response ceiling
spec:
  parentRefs:
    - name: web
      kind: Service
      group: core
      port: 80
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: "/api"

Two ceilings are worth memorizing because they bite people. First, requests larger than 64KiB are not retried, because the proxy will not buffer an unbounded body. Second, retry.linkerd.io/limit counts retries, not attempts, so limit: "2" allows up to three attempts per request. That per-request cap is not the ratio-based budget; the tunable retryBudget belongs to ServiceProfile, covered next.
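The retry decision those annotations describe can be sketched as a pure function. This is a model under stated assumptions, not linkerd2-proxy’s logic. It assumes `retry.linkerd.io/http` takes a comma-separated mix of `5xx`, `gateway-error`, ranges like `500-504`, and single codes. It also assumes `limit` counts retries rather than attempts, and it ignores any backoff between attempts.

```python
from __future__ import annotations

MAX_RETRYABLE_BODY = 64 * 1024  # bodies larger than 64KiB are never retried


def parse_retry_statuses(spec: str) -> set[int]:
    """Expand a retry.linkerd.io/http-style value ("5xx", "gateway-error",
    "500-504", or a comma-separated mix) into concrete status codes."""
    codes: set[int] = set()
    for part in (p.strip() for p in spec.split(",")):
        if not part:
            continue
        if part == "5xx":
            codes.update(range(500, 600))
        elif part == "gateway-error":
            codes.update({502, 503, 504})
        elif "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            codes.update(range(low, high + 1))
        else:
            codes.add(int(part))
    return codes


def should_retry(status: int, body_bytes: int, retries_so_far: int,
                 retry_statuses: set[int], limit: int) -> bool:
    """limit counts retries, so a request gets at most limit + 1 attempts."""
    return (status in retry_statuses
            and body_bytes <= MAX_RETRYABLE_BODY
            and retries_so_far < limit)


def worst_case_ms(limit: int, retry_timeout_ms: int, request_timeout_ms: int) -> int:
    """Upper bound ignoring backoff: every attempt hits the retry timeout,
    but the overall request timeout caps the total."""
    return min((limit + 1) * retry_timeout_ms, request_timeout_ms)


statuses = parse_retry_statuses("5xx")
print(should_retry(503, 1_024, 0, statuses, limit=2))       # True
print(should_retry(503, 1_024, 2, statuses, limit=2))       # False: retries used up
print(should_retry(503, 70 * 1_024, 0, statuses, limit=2))  # False: body over 64KiB
print(worst_case_ms(limit=2, retry_timeout_ms=300, request_timeout_ms=2_000))  # 900
```

With the HTTPRoute values above, three 300ms attempts fit inside the 2s request timeout. If you raise the limit far enough, the request timeout becomes the binding ceiling instead.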

Explicit retry budget via ServiceProfile

When you need to tune the budget itself — the ratio, the free-retry floor, the accounting window — that lives in a ServiceProfile. The default budget is generous: 20% extra load plus 10 free retries/sec. Tighten it for sensitive backends:

apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: web-svc.emojivoto.svc.cluster.local
  namespace: emojivoto
spec:
  routes:
    - name: GET /api/list
      condition:
        method: GET
        pathRegex: /api/list
      isRetryable: true        # GET is idempotent — safe to retry
      timeout: 600ms
    - name: POST /api/vote
      condition:
        method: POST
        pathRegex: /api/vote
      isRetryable: false       # never blind-retry a non-idempotent write
  retryBudget:
    retryRatio: 0.1            # retries may add at most 10% to live traffic
    minRetriesPerSecond: 5     # plus this free floor for low-RPS routes
    ttl: 10s                   # rolling window for computing the ratio

The discipline that prevents outages: only mark idempotent routes isRetryable: true. A retried POST /vote double-counts; a retried POST /charge double-bills. Idempotency is a correctness property of the endpoint, not a mesh setting — the mesh only enforces what you assert.
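The sketch below shows what retryRatio: 0.1, minRetriesPerSecond: 5, ttl: 10s permit during a total outage by replaying one-second ticks through a sliding-window budget. It is a simplified model in the spirit of a token-style retry budget, not linkerd2-proxy’s implementation. It assumes that retries in any ttl-second window may not exceed retryRatio × requests in that window plus minRetriesPerSecond × ttl.

```python
from collections import deque
from fractions import Fraction
from math import floor


class RetryBudget:
    """Sliding-window retry budget (a simplified model, not the proxy's code).

    Within any `ttl`-second window, retries may not exceed
    retry_ratio * requests_in_window + min_retries_per_second * ttl.
    """

    def __init__(self, retry_ratio: str, min_retries_per_second: int, ttl: int):
        self.ratio = Fraction(retry_ratio)            # exact decimal, no float drift
        self.reserve = min_retries_per_second * ttl   # free floor for the window
        self.window = deque(maxlen=ttl)               # (requests, retries) per second

    def tick(self, requests: int, wanted_retries: int) -> int:
        """Advance one second; return how many of `wanted_retries` are allowed."""
        self.window.append((requests, 0))  # deposit this second's requests first
        total_requests = sum(r for r, _ in self.window)
        spent = sum(t for _, t in self.window)
        allowance = floor(self.ratio * total_requests + self.reserve) - spent
        allowed = max(0, min(wanted_retries, allowance))
        self.window[-1] = (requests, allowed)  # record what this second spent
        return allowed


# 100 RPS against a backend that fails everything: each request wants one retry.
tight = RetryBudget("0.1", 5, 10)
per_second = [tight.tick(100, 100) for _ in range(10)]
print(per_second)       # [60, 10, 10, 10, 10, 10, 10, 10, 10, 10]
print(sum(per_second))  # 150 over 10 s, i.e. ~15 retries/s instead of 100/s

default = RetryBudget("0.2", 10, 10)  # the default budget: 20% + 10/s
print(sum(default.tick(100, 100) for _ in range(10)))  # 300, i.e. ~30 retries/s
```

The front-loaded 60 is the free floor being spent in the first second; after that, retries track the ratio. In this model, tightening from the default halves the retry traffic a failing backend has to absorb.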

5. Traffic splitting for canary by golden metrics

Linkerd does weighted splits with the SMI TrafficSplit resource (provided by the linkerd-smi extension on 2.12+), and the data plane already emits the golden metrics (success rate, RPS, p50/p95/p99) you need to gate a canary. Run web stable at 90% and web-canary at 10%:

apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: web-split
  namespace: emojivoto
spec:
  service: web                 # the apex Service clients address
  backends:
    - service: web
      weight: 90
    - service: web-canary
      weight: 10
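TrafficSplit weights are relative, not percentages. That matters when you read splits like the failover example’s 1 and 0. A small helper makes the conversion explicit; it assumes nothing beyond that proportional rule.

```python
from __future__ import annotations

from fractions import Fraction


def traffic_shares(backends: dict[str, int]) -> dict[str, Fraction]:
    """Fraction of requests each backend receives from relative weights."""
    total = sum(backends.values())
    if total == 0:
        raise ValueError("at least one backend needs a non-zero weight")
    return {name: Fraction(weight, total) for name, weight in backends.items()}


print(traffic_shares({"web": 90, "web-canary": 10}))
# {'web': Fraction(9, 10), 'web-canary': Fraction(1, 10)}
print(traffic_shares({"web": 9, "web-canary": 1}) == traffic_shares({"web": 90, "web-canary": 10}))
# True
```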

Watch the canary’s golden metrics live before you shift more weight:

linkerd viz -n emojivoto stat deploy/web-canary --from deploy/vote-bot
# NAME        MESHED   SUCCESS   RPS   LATENCY_P95   LATENCY_P99
# web-canary     1/1   100.00%  4.2   12ms          28ms

The promotion loop is mechanical: hold weight, watch success rate and p99 for a few minutes, step the weight up, repeat. Flagger automates exactly this loop against Linkerd if you want it controller-driven — the metric source is the same proxy data.
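The promotion loop is simple enough to write down as a gate function. This is an illustrative sketch, not Flagger’s algorithm. The weight schedule and the 99% success / 100ms p99 thresholds are placeholders, and in practice the metrics would come from `linkerd viz stat` or Prometheus.

```python
from __future__ import annotations

from dataclasses import dataclass

STEPS = (10, 25, 50, 75, 100)  # hypothetical weight schedule for the canary, in percent


@dataclass(frozen=True)
class Window:
    """Golden metrics for the canary over one hold period."""
    success_rate: float  # 0.0-1.0
    p99_ms: float


def next_weight(current: int, window: Window,
                min_success: float = 0.99, max_p99_ms: float = 100.0) -> int:
    """Step up one notch on a healthy window; drop to 0 (roll back) otherwise.
    The default thresholds are placeholders, not Linkerd or Flagger defaults."""
    if window.success_rate < min_success or window.p99_ms > max_p99_ms:
        return 0
    later = [step for step in STEPS if step > current]
    return later[0] if later else current


weight = 10
# First window mirrors the stat sample above; the other two are made up.
for window in (Window(1.0, 28), Window(0.998, 31), Window(0.93, 30)):
    weight = next_weight(weight, window)
    print(weight, "->", {"web": 100 - weight, "web-canary": weight})
# 25 -> {'web': 75, 'web-canary': 25}
# 50 -> {'web': 50, 'web-canary': 50}
# 0 -> {'web': 100, 'web-canary': 0}
```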

6. Multicluster: gateway and service mirroring

Linkerd connects clusters with a gateway plus a service-mirror controller. The mirror watches a target cluster for exported services and creates local mirror Services (named <svc>-<cluster>) that resolve through the remote gateway. Cross-cluster traffic stays mTLS’d end to end, with the gateway as the only exposed ingress.

Both clusters must share the same trust anchor; that is what lets a proxy in west validate a proxy in east. Install both control planes with the identical ca.crt, then add the multicluster extension to each.

# Install the gateway + service-mirror on each cluster
for ctx in west east; do
  linkerd --context=${ctx} multicluster install | \
    kubectl --context=${ctx} apply -f -
done

linkerd --context=west multicluster check
linkerd --context=east multicluster check

Link west so it mirrors from east. The link command, run against the target, emits a Link CR (gateway address, gateway identity, mirror credentials) you apply on the source:

linkerd --context=east multicluster link --cluster-name east | \
  kubectl --context=west apply -f -

linkerd --context=west multicluster gateways
# CLUSTER  ALIVE  NUM_SVC  LATENCY
# east     True         2     31ms

Nothing mirrors until you explicitly export a service — this is opt-in by design, so you never accidentally expose an internal service cross-cluster:

# On east: expose podinfo to linked clusters
kubectl --context=east -n test label svc podinfo mirror.linkerd.io/exported=true

A podinfo-east Service now appears in west and resolves to the east gateway:

kubectl --context=west -n test get svc
# NAME           TYPE        CLUSTER-IP      PORT(S)
# podinfo-east   ClusterIP   10.43.81.12     9898/TCP

Call it like any local Service: http://podinfo-east.test.svc.cluster.local:9898. Be honest about the cost: gateway mode adds a network hop and the gateway’s latency to every cross-cluster call — fine for failover, not where you want chatty east-west traffic by default.
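The opt-in export rule plus the naming convention is the whole contract, so it is easy to model. The sketch below simplifies what the service mirror does with a target cluster’s Services, ignoring namespaces, ports, and endpoints. `internal-db` is a hypothetical unexported service.

```python
from __future__ import annotations

EXPORT_LABEL = "mirror.linkerd.io/exported"


def mirrored_services(target_services: dict[str, dict[str, str]],
                      cluster_name: str) -> list[str]:
    """Simplified service-mirror model: only Services labelled exported=true on
    the target get a local twin, named <svc>-<cluster-name>."""
    return sorted(
        f"{name}-{cluster_name}"
        for name, labels in target_services.items()
        if labels.get(EXPORT_LABEL) == "true"
    )


east = {
    "podinfo": {EXPORT_LABEL: "true"},
    "internal-db": {},  # hypothetical service that was never labelled
}
print(mirrored_services(east, "east"))  # ['podinfo-east']
```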

7. Automatic failover when a cluster degrades

The linkerd-failover operator turns a TrafficSplit into an active/standby controller. You declare a primary backend. When the primary becomes unavailable, the operator shifts weight to the standby backends, which can include a mirrored service on another cluster, and shifts it back when the primary recovers. It detects unavailability by watching the readiness of the primary Service’s endpoints.

Install the operator (requires linkerd-smi on 2.12+ since SMI is no longer bundled):

helm repo add linkerd https://helm.linkerd.io/stable
helm repo update
helm install linkerd-failover -n linkerd-failover --create-namespace \
  linkerd/linkerd-failover

Declare local podinfo as primary, with the mirrored podinfo-east as standby (weight 0 until needed). An annotation and a label drive it: the failover.linkerd.io/primary-service annotation names the primary, and the failover.linkerd.io/controlled-by label tells the operator to manage this split.

apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: podinfo
  namespace: test
  annotations:
    failover.linkerd.io/primary-service: podinfo
  labels:
    failover.linkerd.io/controlled-by: linkerd-failover
spec:
  service: podinfo
  backends:
    - service: podinfo           # local primary
      weight: 1
    - service: podinfo-east      # mirrored standby on east
      weight: 0

When local podinfo has no ready endpoints, the operator moves the weight off the primary and onto podinfo-east; once a primary endpoint is ready again, weight returns. Because the trigger is endpoint readiness rather than success rate, readiness probes have to reflect real health. Otherwise a primary that is up but failing requests never fails over. Plan for the standby to absorb the full load: size the surviving cluster for it and account for the added gateway hop in your latency SLO.
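The operator’s reconcile step can be modelled as a function from endpoint readiness to backend weights. This is a simplified sketch, not linkerd-failover’s code. It assumes the trigger is the primary having zero ready endpoints and that recovery of any primary endpoint hands traffic back. As a choice made only for the model, it leaves the split on the primary when no standby is ready either.

```python
from __future__ import annotations


def failover_weights(primary: str, ready_endpoints: dict[str, int]) -> dict[str, int]:
    """Simplified reconcile step: map ready-endpoint counts to split weights."""
    primary_only = {svc: int(svc == primary) for svc in ready_endpoints}
    if ready_endpoints.get(primary, 0) > 0:
        return primary_only  # primary healthy: it keeps all traffic
    standbys = {svc for svc, n in ready_endpoints.items() if svc != primary and n > 0}
    if not standbys:
        return primary_only  # model's choice: nowhere better to send traffic
    return {svc: int(svc in standbys) for svc in ready_endpoints}


print(failover_weights("podinfo", {"podinfo": 3, "podinfo-east": 2}))
# {'podinfo': 1, 'podinfo-east': 0}
print(failover_weights("podinfo", {"podinfo": 0, "podinfo-east": 2}))
# {'podinfo': 0, 'podinfo-east': 1}
```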

Verify

Run this top to bottom after any mesh change; treat a non-green check as a release blocker.

# Control plane, data plane, and extension health
linkerd check
linkerd check --proxy
linkerd viz check

# mTLS is live for app-to-app traffic (SECURED = √)
linkerd viz -n emojivoto edges deployment
linkerd viz -n emojivoto tap deploy/web | grep -m1 'tls=true'

# Golden metrics for vote-bot -> web: success rate should hold while retries absorb blips
linkerd viz -n emojivoto stat deploy/web --from deploy/vote-bot

# Multicluster link is alive and gateways reachable
linkerd --context=west multicluster check
linkerd --context=west multicluster gateways

# Failover split is being reconciled
kubectl -n test get trafficsplit podinfo -o jsonpath='{.spec.backends}'

Wire linkerd check --proxy into CI as a gate so a cert nearing expiry or a half-meshed deployment fails the pipeline instead of paging you.
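Here is a sketch of that gate as a Python CI step. It runs every check and reports all failures in one log instead of stopping at the first. It assumes the commands signal failure through a non-zero exit status and that `linkerd` is on the CI runner’s PATH. `failed_gates` takes the runner as a parameter, so it can be exercised without a cluster.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Sequence

GATES = [
    ["linkerd", "check"],
    ["linkerd", "check", "--proxy"],
    ["linkerd", "viz", "check"],
]


def failed_gates(gates: Sequence[Sequence[str]],
                 run: Callable[[Sequence[str]], int]) -> list[str]:
    """Run every gate and return the ones that exited non-zero."""
    return [" ".join(cmd) for cmd in gates if run(cmd) != 0]


def run_cli(cmd: Sequence[str]) -> int:
    """Execute a command and return its exit status (127 if it is not installed)."""
    try:
        return subprocess.run(list(cmd), check=False).returncode
    except FileNotFoundError:
        return 127


def main() -> int:
    """Entry point for the pipeline: call raise SystemExit(main())."""
    failures = failed_gates(GATES, run_cli)
    for gate in failures:
        print(f"FAILED: {gate}")
    return 1 if failures else 0


# Without a cluster, a fake runner shows the behavior:
print(failed_gates(GATES, lambda cmd: 1 if "--proxy" in cmd else 0))
# ['linkerd check --proxy']
```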

Enterprise scenario

A payments platform ran two regional EKS clusters, us-east-1 and us-west-2, each with a full copy of the authorization service. The constraint was a regulatory hard line: cardholder-data traffic had to be encrypted in transit and the auth path had to survive the loss of an entire region without manual intervention or a human-in-the-loop DNS change. Their prior setup leaned on Route 53 health-check failover, which took 90+ seconds to flip and occasionally black-holed in-flight requests during cutover.

They moved auth onto Linkerd with a shared offline-rooted trust anchor across both clusters and the issuer rotated quarterly via GitOps. mTLS came for free with meshing — nothing extra to enforce or audit. They exported the west auth service into east and put a linkerd-failover-controlled TrafficSplit in front of it, with the local auth service as primary and the mirrored west service as standby.

apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: auth
  namespace: payments
  annotations:
    failover.linkerd.io/primary-service: auth
  labels:
    failover.linkerd.io/controlled-by: linkerd-failover
spec:
  service: auth
  backends:
    - service: auth                # local us-east primary
      weight: 1
    - service: auth-uswest2        # mirrored standby
      weight: 0

The hard-won lesson was on the retry side. Their first cut marked the auth verification route isRetryable: true with a 3-attempt limit and no budget tuning. During a partial degradation, retries amplified load on the already-struggling primary and delayed the failover trip. The fix was a retryBudget of retryRatio: 0.1 with minRetriesPerSecond: 5, plus marking only the idempotent verify (GET) retryable and the capture (POST) explicitly not. Result: regional failover completed in single-digit seconds with no dropped transactions, and retries could no longer push a wobbling region over the edge.
