A payments platform team has run Istio in sidecar mode for two years and is paying for it: every pod carries an Envoy sidecar that adds ~120 MB of memory and 30–50 ms of cold-start latency, and a mesh-wide upgrade means restarting 3,000 pods across forty teams in a coordinated, weekend-long change window that the on-call rotation has come to dread. Their actual security requirement is narrower than the cost they pay for it — they need mTLS everywhere plus L7 authorization on exactly the dozen services that handle card data (only the accounts service may call POST /ledger/v1/debit, and only with a valid Entra-issued JWT carrying the right scope). Istio ambient mode is built for precisely this asymmetry: it gives every workload mTLS and L4 policy through a per-node ztunnel with zero sidecars, and lets you bolt on a waypoint proxy — a standalone Envoy — only for the namespaces or services that genuinely need L7 rules. This guide deploys ambient on an existing cluster, stands up waypoints for the sensitive namespace, and enforces real L7 AuthorizationPolicy resources, end to end.
Prerequisites
- A Kubernetes cluster on v1.28+ (AKS, EKS, or GKE) with at least three nodes, and a kubectl context pointed at it. Ambient needs the Istio CNI, so use a managed cluster where you can run a privileged DaemonSet.
- Helm 3.15+ and istioctl 1.24+ on your workstation (istioctl version --remote=false to confirm). Ambient went GA for L7/waypoints in the 1.24 line; do not attempt this on 1.22 or earlier.
- Cluster-admin RBAC for the install (CRDs + a CNI DaemonSet), and the Kubernetes Gateway API CRDs (v1.2.0) installed: waypoints are provisioned through the Gateway API, not Istio’s legacy Gateway.
- An OIDC issuer you already trust for service and user identity. Here that is Microsoft Entra ID (workforce SSO is brokered from Okta to Entra, so the JWTs your services validate are first-class Entra tokens). Have the issuer URL and JWKS URI handy.
- HashiCorp Vault reachable from the cluster for any application secrets the waypoint’d services need. It is not in the mTLS path (Istio’s own CA handles workload certs), but it holds the third-party API tokens those services use.
- A demo namespace payments with at least two deployments (accounts, ledger) and a curl-capable client pod, so the policies have something real to gate.
Target topology
The mesh splits into two planes that ambient deliberately keeps separate. The secure overlay (L4) is delivered by ztunnel, a Rust DaemonSet running one instance per node; it transparently captures pod traffic and gives every workload in an ambient namespace mTLS and identity (SPIFFE) without anything injected into the pod. Above it sits the L7 plane: a waypoint proxy — a normal Envoy deployment you scale and place yourself — that ztunnel routes through only for namespaces or services you have opted in. L4 authorization (who may connect to whom, on which port) lives in ztunnel; L7 authorization (which HTTP method, path, and JWT claim) lives in the waypoint. Traffic from a client pod flows: client → its node’s ztunnel → (if the destination is waypoint-enabled) the payments waypoint Envoy, where the AuthorizationPolicy and RequestAuthentication rules run → destination’s node ztunnel → destination pod. Everything is observed by Dynatrace (OneAgent + the Istio/Envoy integration scraping waypoint and ztunnel metrics) and Datadog as the second pane for the mesh dashboards; CrowdStrike Falcon sensors run on every node for runtime threat detection on the ztunnel and waypoint pods themselves.
1. Install the Gateway API and Istio in ambient mode
Waypoints are Gateway API objects, so those CRDs must exist before Istio. Install them, then install Istio with the ambient profile via Helm (the profile that wires up ztunnel and the CNI; the legacy istioctl install works too, but Helm is what your GitHub Actions / Argo CD pipeline will template).
# 1a. Gateway API CRDs (pinned, not 'latest')
kubectl apply -f \
"https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.0/standard-install.yaml"
# 1b. Istio Helm repo
helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update
# 1c. Base CRDs into istio-system
kubectl create namespace istio-system
helm install istio-base istio/base -n istio-system --version 1.24.2 --wait
# 1d. The CNI in ambient mode (handles traffic redirection, replaces init-container hacks)
helm install istio-cni istio/cni -n istio-system --version 1.24.2 \
--set profile=ambient --wait
# 1e. The istiod control plane, ambient profile
helm install istiod istio/istiod -n istio-system --version 1.24.2 \
--set profile=ambient --wait
# 1f. ztunnel — the per-node L4 secure overlay DaemonSet
helm install ztunnel istio/ztunnel -n istio-system --version 1.24.2 --wait
Verify the data plane came up. You want istiod, the istio-cni-node DaemonSet, and the ztunnel DaemonSet all ready, one ztunnel pod per node:
kubectl get pods -n istio-system
kubectl get daemonset -n istio-system # istio-cni-node and ztunnel: DESIRED == READY
istioctl version # control plane + data plane on 1.24.2
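Eyeballing DESIRED == READY is easy to misread on a large cluster, so the one-ztunnel-per-node check can be made explicit. The helper below is a minimal sketch: it assumes the DaemonSet is named ztunnel in istio-system (as installed above) and that every node is expected to run it. The function name check_ztunnel_coverage is hypothetical.

```shell
# Compare the number of nodes with the number of ready ztunnel pods.
check_ztunnel_coverage() {
  # Count nodes (one line per node); strip padding some wc builds add.
  nodes=$(kubectl get nodes --no-headers | wc -l | tr -d ' ')
  # numberReady is the DaemonSet status field for ready pods.
  ready=$(kubectl get daemonset ztunnel -n istio-system \
    -o jsonpath='{.status.numberReady}')
  if [ "$nodes" = "$ready" ]; then
    echo "ok: ztunnel ready on all $nodes nodes"
  else
    echo "mismatch: ztunnel ready on ${ready:-0} of $nodes nodes"
    return 1
  fi
}
```

Calling check_ztunnel_coverage after the Helm installs returns non-zero on a mismatch, which makes it usable as a pipeline gate.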
If you manage clusters as code (you should), this same release is expressed as a Helm release in Terraform (helm_release resources) or an Argo CD Application so the mesh version is GitOps-pinned and an upgrade is a reviewed pull request — not the hand-run, 3,000-pod restart that sidecar mode forced. Ansible handles any node-level prerequisites (kernel modules, the privileged-container policy) on self-managed nodes before the chart lands.
2. Enroll a namespace into the ambient data plane
Adding a workload to ambient is a single label on its namespace — no pod restart, no sidecar injection, no redeploy. This is the headline operational win: existing pods join the secure overlay in place.
# Opt the payments namespace into ambient (L4 mTLS via ztunnel)
kubectl label namespace payments istio.io/dataplane-mode=ambient
# Confirm — running pods are now in the mesh WITHOUT having restarted
kubectl get pods -n payments -o wide
istioctl ztunnel-config workloads --workload-namespace payments
That last command lists every workload ztunnel now sees, each with a SPIFFE identity like spiffe://cluster.local/ns/payments/sa/accounts. At this point you already have mTLS between every pod in payments and L4 identity — but no L7 rules yet, and no waypoint. Sidecar mode could not give you this without injecting into and restarting all of them.
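A quick check that the overlay is live before any policy: exec into a client and call ledger. A 200 confirms that enrolling the namespace did not break connectivity; the connection now travels through ztunnel, encrypted and identity-bearing at L4, even though nothing changed in the pod spec.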
kubectl exec -n payments deploy/accounts -- \
curl -s -o /dev/null -w "%{http_code}\n" http://ledger:8080/healthz
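The HTTP status alone does not show that the hop was identity-bearing. One way to see it is to look for connection records in the ztunnel logs that carry a SPIFFE identity on both ends. The filter below is a sketch: it assumes ztunnel access-log lines contain src.identity="spiffe://..." and dst.identity="spiffe://..." fields and that the ztunnel pods are labelled app=ztunnel; verify both against your version. The function name mtls_connections is hypothetical.

```shell
# Keep only log lines that name a SPIFFE identity for source and destination
# and that involve the given namespace ($1).
mtls_connections() {
  grep 'src.identity="spiffe://' \
    | grep 'dst.identity="spiffe://' \
    | grep "/ns/$1/"
}

# Usage (run after the curl above):
#   kubectl logs -n istio-system -l app=ztunnel --tail=500 | mtls_connections payments
```

If the filter prints nothing after a successful call, check that the namespace label from this step actually landed.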
3. Lock down L4 with a default-deny ztunnel policy
Before adding L7, establish a zero-trust L4 baseline: deny all traffic into payments, then explicitly allow only the identities that should connect. These AuthorizationPolicy resources use only L4 attributes (source principals and destination ports, no HTTP methods or paths), so ztunnel enforces them: cheap, sidecar-free, and in force whether or not a waypoint exists.
# default-deny everything entering the payments namespace (L4)
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: payments-default-deny
  namespace: payments
spec: {} # empty spec == deny-all for the namespace
---
# allow only the 'accounts' service identity to reach 'ledger' on 8080
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: ledger-allow-accounts-l4
  namespace: payments
spec:
  selector:
    matchLabels:
      app: ledger
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - "cluster.local/ns/payments/sa/accounts"
        # Once a waypoint fronts ledger (step 4), the destination ztunnel sees
        # the waypoint's identity as the source, so it must be admitted too.
        - "cluster.local/ns/payments/sa/payments-waypoint"
    to:
    - operation:
        ports: ["8080"]
kubectl apply -f l4-policies.yaml
# A pod with a different service account is now refused at L4 by ztunnel:
kubectl run probe -n payments --rm -it --image=curlimages/curl --restart=Never -- \
curl -s -o /dev/null -w "%{http_code}\n" http://ledger:8080/healthz # expect connection reset / 000
L4 policy is necessary but blunt — it cannot say “only POST /ledger/v1/debit.” For that you need L7, and for L7 you need a waypoint.
4. Deploy a waypoint proxy for the namespace
istioctl waypoint generates a Gateway API Gateway resource of class istio-waypoint; istiod sees it and provisions a dedicated Envoy deployment. Bind it to the whole payments namespace so all services in it can carry L7 policy. Crucially, a waypoint is just a Deployment — you size and scale it like any service, the antithesis of one sidecar per pod.
# Generate and apply a namespace-scoped waypoint named 'payments-waypoint'
istioctl waypoint apply -n payments \
--name payments-waypoint \
--for service \
--enroll-namespace # label the namespace to route its services via this waypoint
# Inspect what was created (a Gateway + an Envoy Deployment/Service)
kubectl get gateway -n payments
kubectl get pods -n payments -l gateway.networking.k8s.io/gateway-name=payments-waypoint
istioctl waypoint list -n payments
--enroll-namespace stamps the namespace with istio.io/use-waypoint: payments-waypoint, so ztunnel now routes traffic destined for services in payments through this Envoy before delivery. To scope a waypoint to a single service instead of the whole namespace, label just that service:
# Alternative: route ONLY the 'ledger' service through the waypoint
kubectl label service ledger -n payments istio.io/use-waypoint=payments-waypoint
Scale and pin the waypoint for production — it is in the request path for the sensitive services, so give it an HPA and a PodDisruptionBudget:
kubectl -n payments scale deploy/payments-waypoint --replicas=3
kubectl -n payments autoscale deploy/payments-waypoint --min=3 --max=10 --cpu-percent=70
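The two commands above cover replica count and autoscaling; the PodDisruptionBudget is a separate object. A minimal sketch follows, assuming the waypoint pods carry the gateway.networking.k8s.io/gateway-name label used in the kubectl get pods command earlier. The minAvailable value of 2 is an illustrative floor, not a figure from this guide; pick one that matches your replica minimum.

```shell
# Keep at least two waypoint replicas up during voluntary disruptions
# (node drains, cluster upgrades). minAvailable: 2 is an assumed value.
kubectl apply -f - <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-waypoint
  namespace: payments
spec:
  minAvailable: 2
  selector:
    matchLabels:
      gateway.networking.k8s.io/gateway-name: payments-waypoint
EOF
```

With the HPA minimum at three replicas, this budget lets one pod be evicted at a time.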
5. Validate JWTs at the waypoint with RequestAuthentication
L7 authorization on a token requires Istio to first authenticate the JWT. RequestAuthentication tells the waypoint which issuer and JWKS to trust — here Microsoft Entra ID, the IdP that workforce logins from Okta are federated into, so a token minted for a user or a service principal validates natively. This resource only parses and verifies the token; it does not deny anything on its own.
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: payments-entra-jwt
  namespace: payments
spec:
  targetRefs:
  - kind: Service
    group: ""
    name: ledger
  jwtRules:
  - issuer: "https://login.microsoftonline.com/<TENANT_ID>/v2.0"
    jwksUri: "https://login.microsoftonline.com/<TENANT_ID>/discovery/v2.0/keys"
    audiences:
    - "api://payments-ledger"
    forwardOriginalToken: true # pass the JWT on to the app for its own audit log
kubectl apply -f request-auth.yaml
A subtle but critical point: on its own, RequestAuthentication rejects requests that carry an invalid token (401) but still admits requests that carry no token at all, treating them as unauthenticated. The deny for missing tokens happens in the next step.
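When a token is rejected, it helps to see which iss, aud, and scp values it actually carries before suspecting the mesh. The helper below is a sketch that only decodes the payload segment of a JWT; it does not verify the signature, and it assumes a base64 binary that accepts -d (GNU coreutils and recent macOS). The function name jwt_payload is hypothetical.

```shell
# Print the JSON payload (claims) of a JWT passed as $1. Decoding only:
# this does NOT validate the signature, issuer, or expiry.
jwt_payload() {
  # Second dot-separated segment, converted from base64url to base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url encoding strips.
  case $(( ${#seg} % 4 )) in
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Usage: jwt_payload "$GOOD"
```

Compare the printed iss and aud with the issuer and audiences in the RequestAuthentication above; a mismatch in either is enough for the waypoint to reject the token.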
6. Enforce the L7 AuthorizationPolicy
Now the payoff — an AuthorizationPolicy enforced by the waypoint (because it has HTTP to rules and JWT when conditions) that says: only the accounts workload identity, presenting a valid Entra JWT carrying scope ledger.debit, may call POST /ledger/v1/debit. Everything else is denied.
# 6a. Deny any request that does not carry a valid JWT (no request principal)
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: ledger-require-jwt
  namespace: payments
spec:
  targetRefs:
  - kind: Service
    group: ""
    name: ledger
  action: DENY
  rules:
  - from:
    - source:
        notRequestPrincipals: ["*"] # deny anything WITHOUT a valid JWT
---
# 6b. Allow ONLY accounts -> POST /ledger/v1/debit with the right scope
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: ledger-debit-allow
  namespace: payments
spec:
  targetRefs:
  - kind: Service
    group: ""
    name: ledger
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/payments/sa/accounts"]
    to:
    - operation:
        methods: ["POST"]
        paths: ["/ledger/v1/debit"]
    when:
    - key: request.auth.claims[scp]
      values: ["ledger.debit"]
kubectl apply -f l7-authz.yaml
istioctl waypoint status -n payments # policies programmed into the waypoint
You now have method-, path-, identity-, and claim-scoped authorization running in a standalone Envoy that touches only the services you opted in — and not one sidecar anywhere in the cluster.
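How the two policies combine is worth spelling out: Istio evaluates DENY policies before ALLOW policies, and once any ALLOW policy targets a destination, a request that matches none of its rules is denied. The function below is a toy model of that ordering for the two policies in this step only. It is not how Istio is implemented, and the name decide and its argument layout are hypothetical.

```shell
# decide <principal> <jwt: valid|invalid|none> <scp claim> <method> <path>
decide() {
  # RequestAuthentication (step 5) rejects a token that fails verification.
  [ "$2" = "invalid" ] && { echo "401"; return; }
  # 6a (DENY, evaluated first): no valid JWT means no request principal.
  [ "$2" = "valid" ] || { echo "403"; return; }
  # 6b (ALLOW): every field of its single rule must match, else deny.
  if [ "$1" = "cluster.local/ns/payments/sa/accounts" ] &&
     [ "$3" = "ledger.debit" ] &&
     [ "$4" = "POST" ] && [ "$5" = "/ledger/v1/debit" ]; then
    echo "allow"
  else
    echo "403"
  fi
}

# Usage: decide cluster.local/ns/payments/sa/accounts valid ledger.debit POST /ledger/v1/debit
```

The model makes one property visible: the ALLOW rule in 6b is a conjunction, so changing any single field (identity, scope, method, or path) flips the outcome to 403.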
Validation
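Prove each rule does what you claimed. Mint two test tokens from Entra (one with the right scope, one with the wrong scope) and save them in $GOOD and $BAD; in a pipeline this is typically a client-credentials grant. One caveat: Entra puts delegated permissions in the scp claim, while app-only tokens from a client-credentials grant carry roles instead, so confirm which claim your tokens actually contain and match the when key in step 6 to it.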
# From the accounts pod (correct identity), WITH a valid scoped JWT -> 200
kubectl exec -n payments deploy/accounts -- sh -c \
'curl -s -o /dev/null -w "%{http_code}\n" -X POST \
-H "Authorization: Bearer '"$GOOD"'" http://ledger:8080/ledger/v1/debit' # 200
# Same identity, NO token -> 403 (DENY from 6a)
kubectl exec -n payments deploy/accounts -- \
curl -s -o /dev/null -w "%{http_code}\n" -X POST http://ledger:8080/ledger/v1/debit # 403
# Valid token but WRONG scope -> 403 (fails the 'when' claim check)
kubectl exec -n payments deploy/accounts -- sh -c \
'curl -s -o /dev/null -w "%{http_code}\n" -X POST \
-H "Authorization: Bearer '"$BAD"'" http://ledger:8080/ledger/v1/debit' # 403
# Correct identity + token but a method/path NOT allowed -> 403
kubectl exec -n payments deploy/accounts -- sh -c \
'curl -s -o /dev/null -w "%{http_code}\n" -X DELETE \
-H "Authorization: Bearer '"$GOOD"'" http://ledger:8080/ledger/v1/debit' # 403
Watch the decisions live in the waypoint’s Envoy logs, and confirm the metrics are flowing to your observability stack:
# RBAC allow/deny decisions in the waypoint
kubectl logs -n payments deploy/payments-waypoint -f | grep -i "rbac"
# Envoy/waypoint metrics that Dynatrace and Datadog scrape
kubectl exec -n payments deploy/payments-waypoint -- \
pilot-agent request GET stats/prometheus | grep -E "istio_requests_total|rbac"
In Dynatrace you should see the payments-waypoint service with per-route request counts and a denied-request rate; Datadog’s Istio integration shows the same istio_requests_total{response_code="403"} series, which you alert on. Wiz (and Wiz Code scanning the manifests in the repo before merge) flags any namespace labelled dataplane-mode=ambient that has no default-deny AuthorizationPolicy, or a waypoint exposed without a RequestAuthentication — posture gaps the YAML review should never let through.
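Before wiring the alert, the same 403 signal can be spot-checked by hand from the Prometheus text the waypoint exposes. The filter below is a sketch: it assumes the standard exposition format where the sample value is the last field on the line, and it sums every istio_requests_total series labelled response_code="403". The function name count_403 is hypothetical.

```shell
# Sum all istio_requests_total samples with response_code="403" from
# Prometheus text exposition read on stdin. Prints 0 when none are present.
count_403() {
  awk '/^istio_requests_total[{]/ && /response_code="403"/ { sum += $NF }
       END { print sum + 0 }'
}

# Usage:
#   kubectl exec -n payments deploy/payments-waypoint -- \
#     pilot-agent request GET stats/prometheus | count_403
```

Running it before and after the validation calls should show the counter rise by the number of denied requests.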
Rollback / teardown
Ambient is reversible at every layer, in order of blast radius — peel off L7 first, then the namespace, then the mesh. Removing a waypoint instantly drops L7 enforcement but leaves L4 mTLS intact (ztunnel is untouched), which is exactly the graceful-degradation path you want during an incident.
# 1. Remove L7 policy + waypoint (L4 mTLS via ztunnel stays on)
kubectl delete authorizationpolicy ledger-debit-allow ledger-require-jwt -n payments
kubectl delete requestauthentication payments-entra-jwt -n payments
kubectl label namespace payments istio.io/use-waypoint- # stop routing via waypoint
istioctl waypoint delete payments-waypoint -n payments
# 2. Remove the namespace from ambient entirely (back to plain pods, no restart)
kubectl delete authorizationpolicy --all -n payments
kubectl label namespace payments istio.io/dataplane-mode-
# 3. Full mesh uninstall (only if abandoning Istio)
helm uninstall ztunnel istiod istio-cni istio-base -n istio-system
kubectl delete namespace istio-system
Roll these back through the same Argo CD / GitHub Actions path you rolled them out with, so a teardown is an auditable revert and ServiceNow carries the change record — the mesh team raises a normal change ticket, and a guardrail trip (a spike in waypoint 403s, or Wiz finding an ambient namespace with no deny policy) auto-opens a ServiceNow incident rather than living only in a log line.
Common pitfalls
- Forgetting the Gateway API CRDs. istioctl waypoint apply fails or the Gateway stays Unprogrammed if v1.2.0 of the Gateway API CRDs is not installed first. This is the single most common ambient mistake.
- Expecting L7 policy without a waypoint. An AuthorizationPolicy with HTTP methods/paths is silently a no-op against a destination that has no waypoint; ztunnel only enforces L4. If a path-scoped rule “isn’t working,” check that the service is actually routed through a waypoint (istioctl waypoint status -n payments).
- The missing-vs-invalid JWT gap. RequestAuthentication rejects bad tokens but lets no-token requests through as unauthenticated. You must add the explicit notRequestPrincipals: ["*"] DENY (step 6a) or anonymous callers sail past.
- Waypoint as an unscaled singleton. It defaults to one replica and sits in the request path; under load it becomes a bottleneck or a SPOF. Give it an HPA and a PodDisruptionBudget (step 4).
- Wrong targetRef granularity. Binding a policy to the namespace when you meant a service (or vice versa) changes what it covers. Match the targetRefs kind/name to the scope you enrolled the waypoint for.
- audiences mismatch. If the Entra app registration’s exposed API URI does not equal the audiences value, every token is rejected with a confusing 401 at the waypoint. Pin them to the same string (api://payments-ledger).
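For the "L7 policy without a waypoint" case, the binding can be read straight from the istio.io/use-waypoint labels. The helper below is a sketch that assumes a label on the Service takes precedence over the one on its namespace; confirm that precedence for your Istio version. The function name waypoint_for and the "unbound" marker are hypothetical.

```shell
# Print the waypoint a Service is bound to, or "unbound" if neither the
# Service nor its namespace carries the istio.io/use-waypoint label.
waypoint_for() {
  ns=$1; svc=$2
  # Dots inside a label key must be escaped in kubectl jsonpath.
  jp='{.metadata.labels.istio\.io/use-waypoint}'
  wp=$(kubectl get service "$svc" -n "$ns" -o jsonpath="$jp")
  # Fall back to the namespace-level label when the Service has none.
  [ -n "$wp" ] || wp=$(kubectl get namespace "$ns" -o jsonpath="$jp")
  echo "${wp:-unbound}"
}

# Usage: waypoint_for payments ledger
```

An "unbound" result for a service that has path-scoped policies means those policies are not being enforced anywhere.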
Security notes
Ambient is zero-trust by construction: mTLS and SPIFFE identity for every enrolled workload via ztunnel, default-deny L4, and JWT-gated L7 only where it matters — all without sidecars to exploit or restart. Keep the trust boundary honest: validate tokens against Entra ID (federated from Okta for human callers) at the waypoint, never trust an unauthenticated request, and stamp policies to the narrowest identity + method + path + claim that works. Istio’s own CA issues and rotates the workload certificates, so HashiCorp Vault stays out of the mesh-cert path and is used only for the application secrets (third-party API tokens) the gated services consume. Run CrowdStrike Falcon sensors on every node so the ztunnel and waypoint pods themselves are under runtime threat detection, and let Wiz / Wiz Code continuously verify that no ambient namespace is missing its default-deny policy and no waypoint is missing a RequestAuthentication — the posture backstop behind the in-cluster controls. For any north-south traffic, Akamai terminates TLS and applies WAF/bot protection at the edge before requests reach the cluster’s ingress, with the waypoints enforcing east-west L7 authorization once inside.
Cost notes
The economic case is the whole point. Sidecar mode costs one Envoy per pod: at 3,000 pods and ~120 MB each, that is ~360 GB of memory and 3,000 proxy restarts per upgrade. Ambient costs one ztunnel per node (a few dozen, lightweight Rust) plus one waypoint Deployment per opted-in scope (here, three replicas for payments). On a forty-node cluster that is roughly 40 ztunnels + 3 waypoint pods versus 3,000 sidecars, about 70 times fewer proxy instances to run and upgrade, and upgrades that no longer restart application pods at all. You pay for L7 Envoys only where you enforce L7, so a cluster where ten of forty namespaces need HTTP policy runs ten small waypoint Deployments instead of meshing everything. Right-size each waypoint to its real RPS with the HPA above rather than over-provisioning, keep ztunnel on every node (it is the cheap, mandatory L4 layer), and treat any namespace that doesn’t need L7 as waypoint-free. That is the single biggest lever ambient gives you over the old sidecar bill.
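The arithmetic behind those figures is simple enough to rerun with your own numbers. The functions below are a sketch using only the inputs stated in this guide (3,000 pods, ~120 MB per sidecar, forty nodes, three waypoint replicas); the function names are hypothetical, and actual ztunnel and waypoint memory is not modelled because the guide gives no figures for it.

```shell
# Total sidecar memory in GB: <pods> <MB per sidecar> (decimal GB, 1000 MB).
sidecar_memory_gb() { echo $(( $1 * $2 / 1000 )); }

# Proxy instances under ambient: <nodes> <waypoint replicas>.
ambient_proxies() { echo $(( $1 + $2 )); }

# How many times fewer proxies ambient runs: <pods> <nodes> <waypoint replicas>.
# Integer division, so 3000 / 43 (about 69.8) prints as 69.
proxy_ratio() { echo $(( $1 / ($2 + $3) )); }

echo "sidecar memory: $(sidecar_memory_gb 3000 120) GB"
echo "ambient proxies: $(ambient_proxies 40 3)"
echo "ratio: about $(proxy_ratio 3000 40 3)x fewer proxy instances"
```

Swapping in your own pod and node counts shows quickly whether the ratio holds for a smaller or denser cluster.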