If you have run AGIC (the Application Gateway Ingress Controller) at any scale, you already know its failure mode: a single pod reconciling the entire Application Gateway config, mutating an ARM resource on every Ingress change, and grinding through multi-minute control-plane updates while one noisy namespace starves everyone else. Application Gateway for Containers (AGC) is Microsoft’s clean break from that design. The data plane is a managed, regional, near-real-time proxy fleet; the control plane is the ALB Controller running inside your cluster; and — the change that reorganises everything — it speaks the Kubernetes Gateway API, not the legacy Ingress API. The result is routing that propagates in seconds, a blast radius scoped per Gateway instead of per gateway resource, and first-class weighted splitting, backend re-encryption, mTLS and header/path/query routing expressed as portable Kubernetes objects.
This guide walks the full production path and treats AGC as an operable system, not a demo. You will install the ALB Controller against workload identity, provision a managed AGC from an ApplicationLoadBalancer CRD, expose it with Gateway and HTTPRoute, then layer on weighted traffic splitting, BackendTLSPolicy re-encryption and mTLS, custom HealthCheckPolicy probes, and header/path/query routing — every command in the Gateway API variant. Because AGC has a precise mapping from Gateway API objects to AGC constructs, a precise set of RBAC roles, and a precise set of status conditions that tell you exactly where a request stalls, this is also a reference you keep open mid-incident: every CRD field, every role, every status.conditions value, every error string and every limit is laid out as a scannable table beside the prose and the YAML.
By the end you will stop treating AGC as a black box. When a route stays Accepted=False, a split lands 50/50 instead of 90/10, a re-encrypt leg throws 502, or the GatewayClass never goes Accepted, you will know which CRD field, which controller log line, or which federated-credential subject is the cause — and the exact kubectl or az command that confirms it. AGC does still support a legacy Ingress path, but if you are adopting AGC in 2026 you adopt Gateway API: that is where all the routing capability lives, and it is the only path Microsoft is investing behind.
What problem this solves
AGIC’s architecture made a category of pain inevitable. Every Ingress change anywhere in the cluster triggered a full Application Gateway configuration push to ARM, so a single team’s frequent deploys produced 4–7 minute propagation windows during which unrelated services saw stale routing. The controller was a single point of reconciliation; a malformed Ingress could wedge the whole pipeline; and because the gateway was a Standard_v2 ARM resource, every change paid ARM’s control-plane latency and throttling. Worse, sophisticated L7 routing (header matches, weighted canaries, per-service backend mTLS) was either impossible or expressed through a sprawl of annotations that no two engineers spelled the same way.
What breaks without AGC’s model: canary releases that should be a one-line weight edit become deploy-time gymnastics; a PCI requirement to re-encrypt to the cardholder workload gets quietly skipped because AGIC’s backend-mTLS story was awkward, leaving an audit finding waiting to happen; and the noisy-neighbour reconcile storms turn one team’s churn into everyone’s latency. Teams paper over it by sharding gateways (one AGIC per namespace, multiplying cost and operational surface) or by freezing deploys during business hours.
Who hits this: any platform team running ingress for more than a handful of namespaces on AKS, anyone who needs progressive delivery (Argo Rollouts / Flagger) wired to a real traffic-splitting data plane, and any regulated estate that must prove encryption all the way to the pod. AGC fixes the architecture: near-real-time programming from an in-cluster controller, per-Gateway blast radius, native weighted splits, and BackendTLSPolicy-driven re-encryption/mTLS — all in Gateway API manifests that travel with the workload.
To frame the whole field before the deep dive, here is what AGC changes versus AGIC, the symptom each change removes, and where in this article you act on it:
| Dimension | AGIC (the old way) | AGC (this article) | Symptom it removes | Where you act |
|---|---|---|---|---|
| Data plane | Standard_v2 Application Gateway (ARM) | Managed regional proxy fleet | ARM throttling on every change | §AGC architecture |
| Control plane | One pod mutating ARM per Ingress | ALB Controller writing a config plane | Single-point reconcile wedge | §ALB Controller install |
| Propagation | 4–7 min full config push | Single-digit seconds | Stale routing during peer deploys | §First HTTPRoute |
| API | Ingress + annotation sprawl | Gateway API CRDs | Inconsistent annotation routing | §Gateway + HTTPRoute |
| Blast radius | Whole gateway per change | Per-Gateway object | Noisy-neighbour reconcile storms | §Enterprise scenario |
| Traffic split | Awkward / annotation-driven | Native backendRefs weights | Canary as deploy gymnastics | §Weighted splitting |
| Backend mTLS | Limited / awkward | BackendTLSPolicy + clientCert | PCI re-encrypt gap | §Backend TLS & mTLS |
Learning objectives
By the end of this article you can:
- Explain how AGC differs from AGIC at the data-plane and control-plane level, and choose managed vs bring-your-own (BYO) deployment for greenfield versus governed estates.
- Install the ALB Controller against OIDC + workload identity, federating a user-assigned managed identity to the azure-alb-system:alb-controller-sa service account and granting the purpose-built AppGw for Containers Configuration Manager role.
- Provision a managed AGC from the ApplicationLoadBalancer CRD against a delegated /24 subnet, and read its provisioning status.conditions to confirm success.
- Expose workloads with the Gateway API: a Gateway with an HTTPS listener terminating a cert from a Secret, and HTTPRoute objects bound to it, reading Programmed, Accepted and ResolvedRefs to localise failures.
- Ship weighted traffic splitting and canary ramps with backendRefs weights (including weight: 0 drain), driven by Argo Rollouts/Flagger through the Gateway API provider.
- Enforce backend re-encryption and mTLS with BackendTLSPolicy (sni, verify, caCertificateRef, subjectAltName, clientCertificateRef) and tune health with HealthCheckPolicy.
- Compose header, path and query routing with Gateway API matches and filters (URLRewrite, RequestHeaderModifier), and reason about match precedence.
- Diagnose every common AGC failure (GatewayClass not Accepted, listener cert errors, skewed splits, re-encrypt 502s, cross-namespace ResolvedRefs=False) from the exact kubectl/az command that confirms each.
Prerequisites & where this fits
You need an AKS cluster you can administer, with OIDC issuer and workload identity enabled (we turn them on idempotently below), and a dedicated subnet — minimum /24, delegated to Microsoft.ServiceNetworking/trafficControllers — that AGC injects its data plane into. You should be comfortable with kubectl, Helm, and reading Kubernetes object status.conditions, and have az configured with rights to create identities and role assignments in the cluster’s resource groups. Familiarity with the Gateway API object model (GatewayClass / Gateway / HTTPRoute) is assumed at a conceptual level; if it is new, read Kubernetes Gateway API: HTTPRoute, Traffic Splitting & Ingress Migration first — this article is the Azure-managed implementation of exactly those primitives.
This sits in the AKS networking & ingress track. Upstream of it is the managed-Kubernetes decision in Understanding Managed Kubernetes: AKS vs EKS vs GKE Compared and the broader cluster-networking picture in Production AKS: Networking & Observability. It is the containers-native cousin of the classic data-plane covered in Application Gateway v2 with WAF, L7 Routing & TLS in Production and the end-to-end TLS patterns in Application Gateway with WAF, mTLS & End-to-End TLS. The identity mechanism it depends on is detailed in Azure Key Vault & Workload Identity for Secrets, and it pairs naturally with a mesh — compare with AKS Istio Service Mesh Add-on: mTLS, Ingress & Egress when you need pod-to-pod mTLS behind the gateway.
A quick map of who owns what during an AGC incident, so you escalate to the right team fast:
| Layer | What lives here | Who usually owns it | Failure classes it can cause |
|---|---|---|---|
| DNS / client | CNAME to AGC FQDN, TLS | Frontend / SRE | No resolution; cert name mismatch |
| AGC data plane | Managed proxy, listeners, routing rules | Microsoft (managed) | 502/503 if backend unhealthy; rule eval |
| ALB Controller | Reconciles CRDs → config plane | Platform / cluster team | Nothing programmed; Accepted=False |
| Workload identity | Federated cred, role assignment | Platform + identity | Controller 401; provisioning stalls |
| Delegated subnet | /24, trafficControllers delegation | Network team | AGC won’t inject; association fails |
| Gateway API CRDs | Gateway, HTTPRoute, policies | App + platform | Routing, splits, mTLS misconfig |
| Backend pods / Services | Workloads, TLS, health paths | App / dev team | Re-encrypt 502; probe eviction |
Core concepts
Five mental-model shifts make every later step obvious.
The proxy is not an Application Gateway v2. AGC is a separate product backed by Microsoft.ServiceNetworking/trafficControllers, not Microsoft.Network/applicationGateways. There is no Standard_v2 SKU, no per-Ingress ARM mutation, and — critically — no WAF policy built in. Routing changes propagate in seconds because the controller writes to a managed config plane rather than re-deploying a gateway resource. If you need a WAF in front of AGC today, you place it upstream (Front Door, or a classic Application Gateway v2 fronting the AGC FQDN), not on the AGC itself.
The control plane lives in your cluster. The ALB Controller is a Helm-installed deployment in the azure-alb-system namespace. It watches Gateway API objects (and its own AGC CRDs), and programs the managed data plane. It authenticates to Azure as a user-assigned managed identity federated to its Kubernetes service account — no secrets, no service-principal passwords. The GatewayClass named azure-alb-external is what the chart registers; Accepted=True on it is your green light that the controller is alive and authorised.
Gateway API objects map cleanly onto AGC constructs. This mapping is worth memorising because every diagnosis traces back to it:
| Gateway API object | AGC construct it becomes | Carries | Status to watch |
|---|---|---|---|
| GatewayClass (azure-alb-external) | The AGC integration itself | Controller binding | Accepted=True |
| Gateway | An AGC frontend + its listeners | Hostnames, ports, TLS | Programmed=True, an address |
| HTTPRoute | AGC routing rules | path/header/query matches | Accepted, ResolvedRefs |
| backendRefs with weight | A weighted traffic split | Relative weights | (reflected in ResolvedRefs) |
| BackendTLSPolicy | Backend re-encryption / mTLS | SAN, CA, client cert | Accepted on the policy |
| HealthCheckPolicy | Per-backend health probe | path, interval, codes | (reflected in backend health) |
Two deployment flavours, two ownership models. AGC can be created and lifecycle-managed by the controller from an in-cluster CRD (managed mode), or provisioned by you via ARM/Bicep/Terraform with the controller only referencing it (BYO mode). Managed mode is faster and keeps everything in cluster manifests; BYO mode fits enterprises where a platform-networking team must own the AGC, its subnet delegation and its Private Link surface independently of any cluster. We deploy managed mode end to end, then show the BYO association, because most regulated estates land there.
| Mode | Who creates the AGC + association | Lifecycle owner | Use when |
|---|---|---|---|
| Managed by ALB Controller | The controller, from an ApplicationLoadBalancer CRD | Kubernetes manifests | Greenfield, GitOps-driven, lifecycle in-cluster |
| Bring your own (BYO) | You, via ARM/Bicep/Terraform | The network team’s IaC | Central team owns subnet/RBAC/Private Link governance |
Weights are relative, not percentages. A traffic split is multiple backendRefs under one HTTPRoute rule, each with a weight. AGC sends each backend a share of requests equal to its weight divided by the sum of all weights in the rule, so 90/10 and 9/1 behave identically, and weight: 0 drains a backend to zero without deleting the ref (keeping rollback one edit away). Because propagation is near-real-time, a canary ramp is just a sequence of kubectl apply runs, which is exactly why Argo Rollouts and Flagger drive AGC through the Gateway API provider.
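To make the "relative, not percentages" rule concrete, here is a small arithmetic sketch. The helper name weight_shares is hypothetical, and it only computes weight divided by the sum of weights; it does not query AGC or measure real traffic.

```shell
#!/bin/sh
# weight_shares (hypothetical helper): print each backend's expected share
# of requests, given the relative weights of one rule as arguments.
weight_shares() (
  total=0
  for w in "$@"; do
    total=$((total + w))
  done
  if [ "$total" -eq 0 ]; then
    echo "all weights are zero: nothing to divide traffic between" >&2
    exit 1
  fi
  for w in "$@"; do
    # awk supplies the fractional division that POSIX sh arithmetic lacks
    awk -v w="$w" -v t="$total" 'BEGIN { printf "%.1f%%\n", 100 * w / t }'
  done
)

weight_shares 90 10   # prints 90.0% then 10.0%
weight_shares 9 1     # prints the same split
weight_shares 100 0   # prints 100.0% then 0.0% (the second backend is drained)
```

The point of the sketch is only that the absolute numbers carry no meaning on their own; what matters is each weight's fraction of the rule's total.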
The vocabulary in one table
Before the deep sections, pin down every moving part. The glossary repeats these for lookup; this is the mental model side by side:
| Term | One-line definition | Where it lives | Why it matters |
|---|---|---|---|
| AGC | Managed L7 proxy fleet (trafficControllers) | Azure (regional) | The data plane; replaces AGIC’s gateway |
| ALB Controller | In-cluster reconciler that programs AGC | azure-alb-system ns | No controller → nothing routes |
| GatewayClass | The azure-alb-external binding | Cluster-scoped | Accepted=True = controller live |
| Gateway | Frontend + listeners (hostnames, TLS) | App namespace | Emits the AGC FQDN address |
| HTTPRoute | Routing rules + weighted backends | App namespace | Where splits and matches live |
| ApplicationLoadBalancer | CRD that creates a managed AGC | Infra namespace | Managed-mode provisioning |
| BackendTLSPolicy | Re-encrypt / mTLS to pods | App namespace | End-to-end encryption |
| HealthCheckPolicy | Per-Service probe override | App namespace | Replaces default GET / |
| Workload identity | Federated SA → managed identity | Azure + cluster | How the controller authenticates |
| Delegated subnet | /24 for trafficControllers | The VNet | Where AGC injects its data plane |
| ReferenceGrant | Cross-namespace ref permission | Target namespace | Lets a route reach a foreign Secret/Service |
| Weight | Relative share of a backend | HTTPRoute rule | 0 = drain; relative not % |
AGC architecture, and how it differs from AGIC
Two architectural facts drive everything operational. First, the data plane is managed and regional: you never patch it, scale it, or pay ARM latency to change it. The controller writes a desired-state config and the fleet converges in seconds. Second, the control plane is in your cluster and identity-bound: the ALB Controller is the only thing with rights to program the AGC, and it earns those rights through a federated managed identity, not a stored secret.
The practical consequence is a different operational posture than AGIC. With AGIC you debugged ARM deployments and Application Gateway config; with AGC you debug Kubernetes objects and a controller. Here is the side-by-side that matters when you are deciding whether to migrate and what to expect:
| Property | AGIC | AGC | Operational consequence |
|---|---|---|---|
| Backing resource | Microsoft.Network/applicationGateways | Microsoft.ServiceNetworking/trafficControllers | Different ARM API, different RBAC |
| Reconcile target | ARM gateway config | Managed config plane | Seconds vs minutes |
| API surface | Ingress + annotations | Gateway API CRDs | Portable, typed routing |
| WAF | Built-in WAF_v2 policy | None on AGC (put upstream) | WAF moves to Front Door / AppGW v2 |
| Subnet | AppGW subnet | /24 delegated to trafficControllers | New delegation requirement |
| Identity | AAD pod identity / MSI | Workload identity (federated) | Secretless, OIDC-based |
| Private exposure | Private frontend IP | Private Link to frontend | Private endpoint + Private DNS |
| Multi-tenancy | One gateway, shared config | Per-Gateway, scoped blast radius | One namespace can’t stall another |
| Splitting | Annotation / awkward | Native backendRefs weights | First-class canary |
What you lose moving off AGIC is the integrated WAF and the familiarity of Ingress; what you gain is propagation speed, blast-radius isolation, and the full Gateway API routing vocabulary. The WAF gap is the one to plan for deliberately — most teams front AGC with Front Door (Premium, with managed WAF) or keep a thin Application Gateway v2 + WAF_v2 hop for the public edge, then let AGC own all the L7 routing and backend TLS inside. The capability decision in one grid:
| If you need… | AGIC could… | AGC does… | Recommendation |
|---|---|---|---|
| Built-in WAF at the same hop | Yes | No | Front AGC with Front Door / AppGW v2 WAF |
| Sub-10s routing changes | No (4–7 min) | Yes | AGC, native |
| Per-namespace blast radius | No | Yes | AGC, one Gateway per team |
| Weighted canary, edit-to-shift | Awkward | Yes | AGC backendRefs weights |
| Backend mTLS to pods | Limited | Yes | AGC BackendTLSPolicy |
| Header/path/query routing | Annotation soup | Typed matches | AGC Gateway API |
| Central-team-owned data plane | Possible | Yes (BYO) | AGC BYO mode |
Prerequisites and the ALB Controller install
You need OIDC issuer and workload identity on the cluster, plus the delegated subnet. Turn the cluster features on idempotently:
RG=rg-agc-prod
AKS=aks-agc-prod
LOCATION=eastus2
# Ensure OIDC + workload identity are on (idempotent on an existing cluster)
az aks update -g "$RG" -n "$AKS" \
--enable-oidc-issuer \
--enable-workload-identity
OIDC_ISSUER=$(az aks show -g "$RG" -n "$AKS" \
--query "oidcIssuerProfile.issuerUrl" -o tsv)
The infrastructure prerequisites are unforgiving in specific ways — a /25 subnet or a missing delegation fails provisioning with errors that don’t always name the real cause. Confirm each against this checklist before you install anything:
| Prerequisite | Exact requirement | How to verify | Failure if wrong |
|---|---|---|---|
| OIDC issuer | Enabled on the cluster | az aks show --query oidcIssuerProfile.enabled | Federation has no issuer to trust |
| Workload identity | Add-on enabled | az aks show --query securityProfile.workloadIdentity | SA token not projected; controller 401 |
| Subnet size | /24 minimum | az network vnet subnet show --query addressPrefix | AGC injection fails (too few IPs) |
| Subnet delegation | Microsoft.ServiceNetworking/trafficControllers | ... --query delegations | Association cannot bind the subnet |
| Subnet emptiness | No conflicting resources | Subnet has free address space | Injection / association errors |
| Helm | v3.8+ for OCI charts | helm version | OCI oci:// pull unsupported |
| kubelet identity / RBAC | az rights to assign roles | az role assignment create succeeds | Controller cannot be granted Config Manager |
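The subnet-size row is easy to check mechanically before anything is installed. The following is a minimal sketch, assuming the prefix is read with the az network vnet subnet show query from the table; the function name prefix_ok is hypothetical and it checks only the prefix length, not delegation or free address space.

```shell
#!/bin/sh
# prefix_ok (hypothetical helper): succeed when a CIDR such as 10.10.4.0/24
# has a prefix length of 24 or shorter, i.e. at least 256 addresses.
prefix_ok() (
  cidr=$1
  case $cidr in
    */*) len=${cidr##*/} ;;                       # text after the last slash
    *) echo "not a CIDR: $cidr" >&2; exit 2 ;;
  esac
  case $len in
    ''|*[!0-9]*) echo "bad prefix length in: $cidr" >&2; exit 2 ;;
  esac
  [ "$len" -le 24 ]
)

# Usage against a real subnet (names taken from this article's examples):
#   PREFIX=$(az network vnet subnet show -g "$RG" --vnet-name vnet-agc \
#     --name subnet-alb --query addressPrefix -o tsv)
#   prefix_ok "$PREFIX" || echo "subnet $PREFIX is too small for AGC"
```

A non-zero exit status of 1 means the subnet is too small; 2 means the input was not a CIDR at all, which usually points at an empty query result.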
The controller authenticates as a user-assigned managed identity federated to its service account. Create the identity, grant it the purpose-built role on the node resource group (where the controller manages the AGC), and Network Contributor on the subnet’s resource group so it can join the delegated subnet:
IDENTITY=alb-controller-identity
az identity create -g "$RG" -n "$IDENTITY" -l "$LOCATION"
PRINCIPAL_ID=$(az identity show -g "$RG" -n "$IDENTITY" --query principalId -o tsv)
CLIENT_ID=$(az identity show -g "$RG" -n "$IDENTITY" --query clientId -o tsv)
MC_RG=$(az aks show -g "$RG" -n "$AKS" --query nodeResourceGroup -o tsv)
MC_RG_ID=$(az group show -n "$MC_RG" --query id -o tsv)
# The controller manages AGC inside the node resource group
az role assignment create \
--assignee-object-id "$PRINCIPAL_ID" \
--assignee-principal-type ServicePrincipal \
--scope "$MC_RG_ID" \
--role "AppGw for Containers Configuration Manager"
# Network Contributor on the subnet's resource group so the controller can join the delegated subnet
az role assignment create \
--assignee-object-id "$PRINCIPAL_ID" \
--assignee-principal-type ServicePrincipal \
--scope "$(az group show -n "$RG" --query id -o tsv)" \
--role "Network Contributor"
The AppGw for Containers Configuration Manager role is purpose-built for AGC. Do not substitute Contributor: least privilege here is auditable, and Microsoft scopes the built-in role exactly to the trafficControllers and association operations the controller needs. The exact roles, their scope, and why each is required:
| Role | Scope | Why the controller needs it | Substitute? |
|---|---|---|---|
| AppGw for Containers Configuration Manager | Node resource group (or AGC scope in BYO) | Create/update AGC, frontends, associations, routing config | No: purpose-built, least privilege |
| Network Contributor | Subnet’s resource group | Join/associate the delegated subnet | Narrow to the subnet if your policy demands |
| Reader (implicit via above) | Same | Read VNet/subnet to validate delegation | Covered by Network Contributor |
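Once the assignments exist, it helps to confirm that both roles actually landed on the identity. This is a rough sketch: missing_roles is a hypothetical helper that reads one role name per line and prints whichever of the two required roles is absent. It does not check the scope of each assignment, so treat an empty result as necessary rather than sufficient.

```shell
#!/bin/sh
# missing_roles (hypothetical helper): read assigned role names on stdin,
# one per line, and print each required role that is not among them.
missing_roles() (
  assigned=$(cat)
  for role in "AppGw for Containers Configuration Manager" "Network Contributor"; do
    # -x matches whole lines, -F treats the role name as a literal string
    printf '%s\n' "$assigned" | grep -qxF "$role" || printf '%s\n' "$role"
  done
)

# Usage, feeding it the identity's current assignments:
#   az role assignment list --assignee "$PRINCIPAL_ID" --all \
#     --query "[].roleDefinitionName" -o tsv | missing_roles
```

No output means both role names are present somewhere for that principal; any printed line names a role still to be granted.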
Federate the identity to the controller’s service account (namespace azure-alb-system, service account alb-controller-sa). The subject string must match exactly — a typo here is the single most common “controller starts but gets 401” cause:
az identity federated-credential create \
--name alb-controller-fedcred \
--identity-name "$IDENTITY" \
-g "$RG" \
--issuer "$OIDC_ISSUER" \
--subject "system:serviceaccount:azure-alb-system:alb-controller-sa" \
--audience api://AzureADTokenExchange
The federated-credential fields and the exact value each must take:
| Field | Required value | Consequence if wrong |
|---|---|---|
| --issuer | The cluster’s OIDC issuer URL | Token issuer not trusted → 401 |
| --subject | system:serviceaccount:azure-alb-system:alb-controller-sa | Subject mismatch → 401 (most common) |
| --audience | api://AzureADTokenExchange | Audience rejected → token exchange fails |
| --identity-name | The UAMI you created | Credential federated to wrong identity |
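Because the subject must match character for character, comparing it by eye is unreliable. The sketch below builds the expected string from the namespace and service account and compares it with whatever Azure has stored. Both function names are hypothetical; the az query in the usage comment reads the subject back from the federated credential created above.

```shell
#!/bin/sh
# expected_subject (hypothetical helper): build the subject string that
# workload identity federation expects for a namespace / service account.
expected_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# subject_matches (hypothetical helper): succeed only when the actual
# subject ($1) equals the expected one for namespace $2 and account $3.
subject_matches() {
  [ "$1" = "$(expected_subject "$2" "$3")" ]
}

# Usage, reading the stored subject back from Azure:
#   ACTUAL=$(az identity federated-credential list --identity-name "$IDENTITY" \
#     -g "$RG" --query "[0].subject" -o tsv)
#   subject_matches "$ACTUAL" azure-alb-system alb-controller-sa \
#     || echo "subject mismatch: $ACTUAL"
```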
Install the controller via Helm, passing the identity client ID. Pin the chart version explicitly so a helm upgrade is deliberate, not whatever floats at the tag:
az aks get-credentials -g "$RG" -n "$AKS" --overwrite-existing
helm upgrade --install alb-controller \
oci://mcr.microsoft.com/application-lb/charts/alb-controller \
--version 1.7.9 \
--namespace azure-alb-system --create-namespace \
--set albController.namespace=azure-alb-system \
--set albController.podIdentity.clientID="$CLIENT_ID"
Confirm both the controller and its webhook are healthy, and that the GatewayClass is accepted, before going further:
kubectl get pods -n azure-alb-system
kubectl get gatewayclass azure-alb-external -o yaml | grep -A5 status:
azure-alb-external is the GatewayClass the chart registers; Accepted=True on it is your green light. The Helm values you actually touch, and what each controls:
| Helm value | Purpose | Default / typical | When to change |
|---|---|---|---|
| albController.podIdentity.clientID | The UAMI client ID for workload identity | (required) | Always set |
| albController.namespace | Namespace the controller runs in | azure-alb-system | Rarely; keep the default |
| --version | Chart/controller version | pin explicitly | Deliberate upgrades only |
| albController.replicaCount | Controller replicas (HA) | 2 | Raise for resilience, not throughput |
| albController.logLevel | Controller verbosity | info | debug while diagnosing |
Provision the AGC (managed mode)
In managed mode you declare the AGC and its association as CRDs and let the controller build them. Create an infra namespace and the ApplicationLoadBalancer object, pointing at the delegated subnet:
SUBNET_ID=$(az network vnet subnet show \
-g "$RG" --vnet-name vnet-agc --name subnet-alb \
--query id -o tsv)
kubectl create namespace alb-infra
# alb.yaml
apiVersion: alb.networking.azure.io/v1
kind: ApplicationLoadBalancer
metadata:
  name: alb-prod
  namespace: alb-infra
spec:
  associations:
  - /subscriptions/<SUB_ID>/resourceGroups/rg-agc-prod/providers/Microsoft.Network/virtualNetworks/vnet-agc/subnets/subnet-alb
kubectl apply -f alb.yaml
# Watch provisioning; Deployment.Succeeded means the managed AGC + association exist
kubectl get applicationloadbalancer alb-prod -n alb-infra -o yaml | grep -A10 conditions
This step creates the actual Microsoft.ServiceNetworking/trafficControllers resource and a frontend in the node resource group. Provisioning takes a few minutes the first time. The ApplicationLoadBalancer spec fields and what each does:
| Field | Meaning | Required | Notes |
|---|---|---|---|
| spec.associations[] | Full resource ID of the delegated subnet | Yes | The subnet must be /24 + delegated |
| metadata.namespace | Infra namespace holding the CRD | Yes | Convention: alb-infra |
| metadata.name | Logical AGC name referenced by Gateway annotations | Yes | Used in alb-name annotation |
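Since SUBNET_ID is already captured in a shell variable, the manifest can be rendered from it instead of pasting the resource ID by hand. This is one possible sketch: render_alb is a hypothetical helper, and the three fields it fills are exactly the ones in the table above.

```shell
#!/bin/sh
# render_alb (hypothetical helper): print an ApplicationLoadBalancer manifest.
#   $1 = metadata.name, $2 = metadata.namespace, $3 = delegated subnet ID
render_alb() {
  cat <<EOF
apiVersion: alb.networking.azure.io/v1
kind: ApplicationLoadBalancer
metadata:
  name: $1
  namespace: $2
spec:
  associations:
  - $3
EOF
}

# Usage: render from the variable and apply in one step.
#   render_alb alb-prod alb-infra "$SUBNET_ID" | kubectl apply -f -
```

Rendering from the variable keeps the subscription ID out of the committed file, at the cost of the manifest no longer being a static artifact; GitOps setups may prefer the static YAML.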
The provisioning status.conditions you read to know where you are — this is the managed-mode equivalent of watching an ARM deployment:
| Condition type | status you want | Meaning | If not |
|---|---|---|---|
| Deployment | Succeeded / True | AGC + association created | Check subnet delegation + Config Manager role |
| Available | True | Data plane reachable | Wait; then check controller logs |
| (any) Reason | *Succeeded | No error reason attached | A Reason like SubnetDelegationMissing names the fix |
Expose a Gateway and the first HTTPRoute
The Gateway references the GatewayClass and ties to your ApplicationLoadBalancer via annotation. Here is an HTTPS listener terminating a cert from a Kubernetes Secret:
# gateway.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gw-prod
  namespace: app
  annotations:
    alb.networking.azure.io/alb-namespace: alb-infra
    alb.networking.azure.io/alb-name: alb-prod
spec:
  gatewayClassName: azure-alb-external
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    hostname: "app.kloudvin.com"
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: app-tls
    allowedRoutes:
      namespaces:
        from: Same
kubectl apply -f gateway.yaml
# AGC publishes a generated FQDN; read it back from the Gateway address
kubectl get gateway gw-prod -n app \
-o jsonpath='{.status.addresses[0].value}{"\n"}'
Point your DNS CNAME for app.kloudvin.com at that generated FQDN (the *.fzXX.alb.azure.com name). The Gateway listener fields you set, and the choices behind each:
| Listener field | Values | Default / typical | When to change | Gotcha |
|---|---|---|---|---|
| protocol | HTTP, HTTPS | HTTPS in prod | HTTP only for redirect listeners | HTTP listener serves cleartext |
| port | any TCP port | 443 (HTTPS), 80 (HTTP) | Match your edge | Must align with DNS/clients |
| hostname | FQDN or wildcard | the app host | SNI-based routing | Empty = match all (loosens routing) |
| tls.mode | Terminate, Passthrough | Terminate | Passthrough for end-to-end at pod | Passthrough skips L7 routing |
| tls.certificateRefs | Secret ref(s) | app-tls Secret | Rotate by replacing the Secret | Secret must be in the listener ns |
| allowedRoutes.namespaces.from | Same, All, Selector | Same | Multi-team gateways | All widens attach surface |
Now bind a route. The minimal HTTPRoute sends all traffic for the host to one Service:
# route-basic.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: rt-app
  namespace: app
spec:
  parentRefs:
  - name: gw-prod
  hostnames:
  - "app.kloudvin.com"
  rules:
  - backendRefs:
    - name: app-svc
      port: 80
Run kubectl apply -f route-basic.yaml. Once status.parents[].conditions shows Accepted=True and ResolvedRefs=True, traffic flows. The reconcile takes seconds, not the multi-minute ARM churn AGIC inflicted. The status conditions across the three objects follow; this is your single most-used diagnostic table:
| Object | Condition | True means | Common False cause |
|---|---|---|---|
| GatewayClass | Accepted | Controller bound + authorised | Controller down / RBAC missing |
| Gateway | Accepted | Spec valid, class matched | Bad gatewayClassName |
| Gateway | Programmed | Data plane configured; has an address | Cert Secret missing; AGC not ready |
| HTTPRoute | Accepted | Route is valid + attached to parent | parentRefs wrong; not allowed by Gateway |
| HTTPRoute | ResolvedRefs | All backendRefs/Secret refs resolve | Service/Secret missing or cross-ns w/o ReferenceGrant |
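Reading these conditions out of full YAML gets tedious mid-incident. A small filter can reduce each object to the conditions that are not True. The sketch below assumes the conditions are first flattened to Type=Status lines with a kubectl jsonpath expression (shown in the usage comments); failing_conditions is a hypothetical helper, not part of kubectl or the ALB Controller.

```shell
#!/bin/sh
# failing_conditions (hypothetical helper): read "Type=Status" lines on stdin
# and print every condition whose status is not True.
# Exit status is 0 only when at least one condition exists and all are True.
failing_conditions() (
  all=$(cat)
  if [ -z "$all" ]; then
    echo "no conditions reported (object not reconciled yet?)"
    exit 1
  fi
  bad=$(printf '%s\n' "$all" | grep -v '=True$' || true)
  [ -z "$bad" ] && exit 0
  printf '%s\n' "$bad"
  exit 1
)

# Usage for a Gateway (conditions sit directly under .status):
#   kubectl get gateway gw-prod -n app \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | failing_conditions
# Usage for an HTTPRoute (conditions sit under each parent):
#   kubectl get httproute rt-app -n app \
#     -o jsonpath='{range .status.parents[*].conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | failing_conditions
```

An empty status is treated as a failure on purpose: a route with no parents in its status was never picked up by any controller, which is a different problem from Accepted=False.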
The HTTPRoute backendRefs fields you’ll set on every route:
| backendRef field | Meaning | Required | Notes |
|---|---|---|---|
| name | Target Service name | Yes | Must exist in the route’s namespace (or grant cross-ns) |
| port | Service port | Yes | The Service’s exposed port, not the container’s |
| weight | Relative split share | No (default 1) | 0 drains; relative, not percent |
| kind | Service (default) | No | AGC routes to Services |
| namespace | Cross-namespace target | No | Requires a ReferenceGrant in the target ns |
Weighted traffic splitting and canary
This is where Gateway API earns its keep. Splitting is native: multiple backendRefs under one rule, each with a weight. AGC distributes requests proportionally. A 90/10 canary:
# canary.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: rt-canary
  namespace: app
spec:
  parentRefs:
  - name: gw-prod
  hostnames:
  - "app.kloudvin.com"
  rules:
  - backendRefs:
    - name: app-svc-stable
      port: 80
      weight: 90
    - name: app-svc-canary
      port: 80
      weight: 10
Weights are relative, not percentages — 90/10 and 9/1 behave identically. Progress a release by editing weights and re-applying; because propagation is near-real-time, a canary ramp is just a sequence of applies. Set a weight to 0 to drain a backend without deleting the ref, which keeps the rollback path one edit away. A typical ramp, what each step means, and the rollback at every stage:
| Stage | stable / canary weights | Effective canary share | Watch before advancing | Rollback |
|---|---|---|---|---|
| Baseline | 100 / 0 | 0% | Canary deployed, healthy, drained | (already safe) |
| Smoke | 95 / 5 | ~5% | Error rate, p95 latency on canary | Set canary → 0 |
| Early | 80 / 20 | ~20% | Business metrics, saturation | Set canary → 0 |
| Half | 50 / 50 | ~50% | Full load parity | Set canary → 0 |
| Cutover | 0 / 100 | 100% | Soak, then retire stable | Swap weights back |
| Drain | 100 / 0 | 0% | Decommission canary deployment | n/a |
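Each stage in that ramp differs only in two integers, so the route can be rendered from them rather than hand-edited. This is a minimal sketch for manual or pipeline-driven ramps: render_canary_route is a hypothetical helper that reuses the names from canary.yaml above and rejects anything that is not a non-negative integer.

```shell
#!/bin/sh
# render_canary_route (hypothetical helper): print the rt-canary HTTPRoute
# with the given stable ($1) and canary ($2) weights.
render_canary_route() (
  stable=$1
  canary=$2
  for w in "$stable" "$canary"; do
    case $w in
      ''|*[!0-9]*) echo "weights must be non-negative integers" >&2; exit 2 ;;
    esac
  done
  cat <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: rt-canary
  namespace: app
spec:
  parentRefs:
  - name: gw-prod
  hostnames:
  - "app.kloudvin.com"
  rules:
  - backendRefs:
    - name: app-svc-stable
      port: 80
      weight: $stable
    - name: app-svc-canary
      port: 80
      weight: $canary
EOF
)

# Usage: the Smoke stage, then an immediate rollback.
#   render_canary_route 95 5  | kubectl apply -f -
#   render_canary_route 100 0 | kubectl apply -f -
```

Both refs always carry an explicit weight here, which sidesteps the defaulting behaviour that skews a split when a weight is left out.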
Because each step is one kubectl apply with sub-10s propagation, a controller (Argo Rollouts or Flagger) using the Gateway API provider can drive the whole ramp from metric analysis. The integration shape:
| Tool | How it drives AGC | Weight mechanism | Promotion trigger |
|---|---|---|---|
| Argo Rollouts | Gateway API plugin edits the HTTPRoute | backendRefs weights | AnalysisTemplate metric checks pass |
| Flagger | Gateway API provider patches the route | backendRefs weights | Prometheus metric thresholds |
| Manual / GitOps | kubectl apply of weight edits | backendRefs weights | Human / pipeline gate |
For the broader pattern and the metric-analysis side, see Progressive Delivery with Argo Rollouts: Canary Metrics. The split-specific failure modes you’ll actually hit:
| Symptom | Likely cause | Confirm | Fix |
|---|---|---|---|
| Canary takes ~50% not 10% | backendRefs missing weight (each defaults to 1, so two unweighted refs split evenly) | kubectl get httproute -o yaml shows no weight, or weight: 1, on both refs | Set explicit integer weights on every ref |
| Split ignored entirely | Two rules instead of one (more-specific match wins) | Inspect rules[]; weights must share a rule | Put weighted refs under one rule |
| Drain not draining | weight: 0 not applied / typo | ResolvedRefs + the live YAML | Re-apply; confirm propagation |
| Sticky to one backend | Client/session affinity upstream | Sample many requests, not one | Don’t conclude from a single curl |
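"Sample many requests" is easier with a tally than with eyeballing. The sketch below counts backend identifiers and reports each one's share. tally_backends is a hypothetical helper, and the x-backend response header in the usage comment is an assumption: your workloads need to expose some identifier (a header, a body field, a version string) for this to work.

```shell
#!/bin/sh
# tally_backends (hypothetical helper): read one backend identifier per line
# on stdin and print "<name> <count> <percent>" for each distinct value.
tally_backends() {
  sort | uniq -c | awk '
    { name[NR] = $2; count[NR] = $1; total += $1 }
    END {
      for (i = 1; i <= NR; i++)
        printf "%s %d %.1f%%\n", name[i], count[i], 100 * count[i] / total
    }'
}

# Usage, assuming each backend sets a hypothetical "x-backend" response header:
#   for i in $(seq 1 200); do
#     curl -s -o /dev/null -D - https://app.kloudvin.com/ \
#       | awk 'tolower($1) == "x-backend:" { print $2 }' | tr -d '\r'
#   done | tally_backends
```

With a 90/10 split, a sample of a few hundred requests should land near those shares; a handful of requests can easily show 100/0 by chance.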
Backend TLS, mTLS, and health probes
By default AGC speaks HTTP to your pods. For end-to-end encryption — re-encrypt to the backend — and for mTLS where AGC presents a client cert, you use BackendTLSPolicy targeting the Service. First, server-side re-encryption with hostname validation against a CA you trust:
# backend-tls.yaml
apiVersion: alb.networking.azure.io/v1
kind: BackendTLSPolicy
metadata:
  name: btls-app
  namespace: app
spec:
  targetRef:
    group: ""
    kind: Service
    name: app-svc
  default:
    sni: backend.app.svc.cluster.local
    ports:
    - port: 443
    clientCertificateRef:
      name: alb-client-cert # omit for one-way TLS; include for mTLS
    verify:
      caCertificateRef:
        name: backend-ca
      subjectAltName: backend.app.svc.cluster.local
The clientCertificateRef is what turns this into mutual TLS: AGC presents that certificate to the backend, and a backend (an Istio sidecar, an NGINX terminating mTLS, etc.) validates it. Drop that field and you get standard one-way re-encryption. The verify block makes AGC validate the backend’s certificate against backend-ca and pin the SAN — skip it only in non-production. Every BackendTLSPolicy field, what it does, and the trade-off:
| Field | What it does | Required | Omit when | Gotcha if wrong |
|---|---|---|---|---|
| targetRef (Service) | Attaches the policy to a Service | Yes | never | Wrong kind/name → policy never applies |
| sni | SNI sent to the backend | Yes for TLS | never | Mismatch with cert → handshake fail |
| ports[].port | Backend TLS port | Yes | never | Wrong port → connection refused |
| verify.caCertificateRef | CA that signs the backend cert | Prod: yes | non-prod only | Missing CA → cannot validate → 502 |
| verify.subjectAltName | SAN to pin on the backend cert | Prod: yes | non-prod only | SAN mismatch → re-encrypt 502 |
| clientCertificateRef | Client cert AGC presents (mTLS) | Only for mTLS | one-way TLS | Rotated out → backend rejects |
The three backend-encryption postures, side by side, so you pick deliberately:
| Posture | verify | clientCertificateRef | Use when | Security |
|---|---|---|---|---|
| Cleartext (default) | n/a | n/a | Internal, low-trust, non-regulated | None on the backend leg |
| One-way re-encrypt | present | absent | Most production; encrypt to pod | Server authenticated, encrypted |
| Mutual TLS (mTLS) | present | present | PCI/Zero-Trust; backend verifies AGC | Both ends authenticated |
Health probes default to GET / on the backend port. Override per-Service with a HealthCheckPolicy:
# health.yaml
apiVersion: alb.networking.azure.io/v1
kind: HealthCheckPolicy
metadata:
  name: hc-app
  namespace: app
spec:
  targetRef:
    group: ""
    kind: Service
    name: app-svc
  default:
    interval: 5s
    timeout: 3s
    healthyThreshold: 1
    unhealthyThreshold: 3
    http:
      host: app.kloudvin.com
      path: /healthz
      match:
        statusCodes:
        - start: 200
          end: 299
Both policies attach by targetRef to the Service, so they travel with the workload, not the gateway — exactly the separation of concerns you want when app and platform teams own different manifests. The HealthCheckPolicy knobs and how to reason about each:
| Field | What it does | Default | Typical | When to change |
|---|---|---|---|---|
| interval | Probe frequency | (managed) | 5s | Faster detect vs more probe load |
| timeout | Per-probe timeout | (managed) | 3s | Slow backends need headroom |
| healthyThreshold | Successes to mark healthy | 1 | 1 | Raise to debounce flapping |
| unhealthyThreshold | Failures to mark unhealthy | 3 | 3 | Lower for fast eviction |
| http.path | Probe path | / | /healthz | Always a shallow readiness path |
| http.host | Host header on the probe | (none) | the app host | Backends that route by host |
| http.match.statusCodes | Codes that count as healthy | 200–399 | 200–299 | Tighten to real success codes |
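When tuning interval, timeout, and unhealthyThreshold together, it helps to estimate how long a dead backend can keep receiving traffic. The sketch below is a rough model under stated assumptions, not AGC's documented probe scheduling: it assumes the backend fails just after a passing probe, that a probe starts every interval seconds, and that each failed probe waits out the full timeout. The function names are hypothetical.

```shell
# Approximate worst-case seconds from backend failure to "unhealthy".
# Arguments are whole seconds: interval, timeout, unhealthyThreshold.
eviction_seconds() {
  local interval=$1 timeout=$2 unhealthy_threshold=$3
  echo $(( interval * unhealthy_threshold + timeout ))
}

# Approximate seconds for a recovered backend to be marked healthy again:
# one successful probe per interval, healthyThreshold times in a row.
recovery_seconds() {
  local interval=$1 healthy_threshold=$2
  echo $(( interval * healthy_threshold ))
}

eviction_seconds 5 3 3   # the manifest above: interval 5s, timeout 3s, threshold 3
eviction_seconds 5 3 1   # same probe, threshold lowered to 1
recovery_seconds 5 1
```

Under this model the example policy evicts in about 18 seconds, and lowering unhealthyThreshold to 1 brings that down to about 8 at the cost of reacting to a single failed probe.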
The re-encryption/mTLS failure modes — this is where 502s hide:
| Symptom | Root cause | Confirm | Fix |
|---|---|---|---|
| 502 on the backend leg only | SAN/host mismatch vs backend cert | Controller events; BackendTLSPolicy verify | Pin subjectAltName to the cert SAN |
| 502 “untrusted” | caCertificateRef wrong/missing | The referenced CA Secret content | Upload the correct backend root CA |
| Backend rejects AGC | clientCertificateRef rotated/invalid | Backend (sidecar) TLS logs | Re-issue the client cert Secret |
| Policy never applies | targetRef wrong kind/name | kubectl describe backendtlspolicy | Match kind: Service + exact name |
| All backends unhealthy | Probe path 5xx / wrong codes | HealthCheckPolicy + app /healthz | Shallow path; correct statusCodes |
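Most re-encrypt 502s come down to comparing the name pinned in subjectAltName with the names the backend certificate actually carries. A minimal sketch of that comparison follows; san_contains is a hypothetical helper, it only does exact DNS-name matching (no wildcard handling), and it assumes you have already captured the SAN text, for example from openssl x509 -noout -ext subjectAltName on OpenSSL 1.1.1 or later.

```shell
# Succeed if `expected` appears as a DNS entry in a SAN listing such as
# "DNS:payments.internal.kloudvin.com, DNS:payments".
# Exact match only: wildcard SANs are not expanded.
san_contains() {
  local expected=$1 san_list=$2 entry entries
  san_list=${san_list//$'\n'/,}          # tolerate multi-line openssl output
  IFS=',' read -ra entries <<<"$san_list"
  for entry in "${entries[@]}"; do
    entry=${entry//[[:space:]]/}         # drop padding around each entry
    if [[ $entry == "DNS:$expected" ]]; then
      return 0
    fi
  done
  return 1
}

# Example values only: the SAN on the cert vs. the name pinned in the policy.
cert_sans="DNS:payments.internal.kloudvin.com"
if san_contains payments-svc.payments.svc.cluster.local "$cert_sans"; then
  echo "SAN matches the pinned name"
else
  echo "SAN mismatch: fix subjectAltName or reissue the cert"
fi
```

Running the check against the policy's subjectAltName before applying it catches the mismatch without waiting for a 502 on live traffic.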
Header, path, and query routing
Gateway API matches give you composable L7 rules. Match types combine with AND semantics within a single match block; AGC evaluates more specific matches first, so ordering behaves intuitively. A few production patterns in one route:
# routing-advanced.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: rt-advanced
  namespace: app
spec:
  parentRefs:
  - name: gw-prod
  hostnames:
  - "app.kloudvin.com"
  rules:
  # Beta cohort: header-based dark launch
  - matches:
    - headers:
      - name: x-cohort
        value: beta
        type: Exact
    backendRefs:
    - name: app-svc-beta
      port: 80
  # API v2 by path prefix, with the prefix rewritten off
  - matches:
    - path:
        type: PathPrefix
        value: /api/v2
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /
    backendRefs:
    - name: api-v2-svc
      port: 8080
  # Query-param routing for a debug build
  - matches:
    - queryParams:
      - name: debug
        value: "true"
        type: Exact
    backendRefs:
    - name: app-svc-debug
      port: 80
  # Default
  - backendRefs:
    - name: app-svc-stable
      port: 80
The header and query rules win over the catch-all because they are more specific. The URLRewrite filter strips /api/v2 before the request reaches api-v2-svc. You can also inject or strip headers with RequestHeaderModifier/ResponseHeaderModifier filters — the clean way to add X-Forwarded-* or correlation headers without touching app code. The match types and their type values:
| Match type | type values | Matches on | Notes |
|---|---|---|---|
| path | PathPrefix, Exact, RegularExpression | URL path | PathPrefix is the common case |
| headers | Exact, RegularExpression | Request header value | Multiple headers AND together |
| queryParams | Exact, RegularExpression | Query string param | Useful for debug/feature toggles |
| method | GET/POST/… | HTTP method | Split read vs write paths |
The filters you compose with matches, and what each does:
| Filter | Purpose | Key fields | Typical use |
|---|---|---|---|
| URLRewrite | Rewrite path/host upstream | path.ReplacePrefixMatch, hostname | Strip an API prefix |
| RequestHeaderModifier | Add/set/remove request headers | add, set, remove | Inject correlation/forwarded headers |
| ResponseHeaderModifier | Add/set/remove response headers | add, set, remove | Security headers (HSTS) |
| RequestRedirect | Issue an HTTP redirect | scheme, statusCode, hostname | HTTP→HTTPS, host moves |
| RequestMirror | Mirror traffic to a second backend | backendRef | Shadow a new build (no client impact) |
Match precedence, made explicit so route ordering never surprises you:
| Rule shape | Wins over | Why |
|---|---|---|
| Exact path | PathPrefix path | Exact is more specific |
| Longer PathPrefix | Shorter PathPrefix | Longest prefix wins |
| Match with header + path | Match with path only | More match criteria = more specific |
| Any explicit match | Catch-all (no matches) | Catch-all is least specific |
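The precedence rows above can be turned into a small ranking exercise. The sketch below assumes AGC follows the upstream Gateway API ordering (exact path first, then longest prefix, then method match, then number of header matches, then number of query-param matches, with list order as the final tie-break). pick_rule is a hypothetical helper, and it expects you to pass only the rules that already match a given request.

```shell
# Pick the winning rule among candidates that ALL match the same request.
# Each argument: name|pathType|pathValue|methodMatches|headerMatches|queryMatches
# A rule with no path match is written as PathPrefix "/", the Gateway API default.
pick_rule() {
  local rule name ptype pvalue methods headers queries exact plen
  for rule in "$@"; do
    IFS='|' read -r name ptype pvalue methods headers queries <<<"$rule"
    exact=0
    plen=0
    if [[ $ptype == Exact ]]; then exact=1; fi
    if [[ $ptype == PathPrefix ]]; then plen=${#pvalue}; fi
    printf '%d %d %d %d %d %s\n' \
      "$exact" "$plen" "$methods" "$headers" "$queries" "$name"
  done |
    # -s keeps list order for full ties (first rule in the route wins)
    sort -s -k1,1nr -k2,2nr -k3,3nr -k4,4nr -k5,5nr |
    head -n 1 |
    awk '{ print $6 }'
}

# GET / with "x-cohort: beta": header rule vs. catch-all
pick_rule 'default|PathPrefix|/|0|0|0' 'beta|PathPrefix|/|0|1|0'

# GET /api/v2/items with "x-cohort: beta": header rule vs. the /api/v2 prefix rule
pick_rule 'beta|PathPrefix|/|0|1|0' 'api-v2|PathPrefix|/api/v2|0|0|0'
```

The second call is the case worth remembering: under this ordering a longer path prefix outranks an extra header match, so a beta-cohort request to /api/v2 lands on the API rule unless the header match is also added there.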
Bring-your-own AGC and Private Link
When a central network team owns the AGC, you provision it with IaC and the cluster only references it. Create the AGC, a frontend, and a subnet association with Bicep:
param location string = resourceGroup().location
param subnetId string // resource ID of the delegated subnet

resource agc 'Microsoft.ServiceNetworking/trafficControllers@2023-11-01' = {
  name: 'agc-shared'
  location: location
}

resource frontend 'Microsoft.ServiceNetworking/trafficControllers/frontends@2023-11-01' = {
  parent: agc
  name: 'fe-prod'
  location: location
}

resource assoc 'Microsoft.ServiceNetworking/trafficControllers/associations@2023-11-01' = {
  parent: agc
  name: 'assoc-prod'
  location: location
  properties: {
    associationType: 'subnets'
    subnet: { id: subnetId }
  }
}
In BYO mode you skip the ApplicationLoadBalancer CRD and instead point the Gateway at the existing AGC: an alb.networking.azure.io/alb-id annotation carries the AGC’s resource ID, and spec.addresses names the frontend to bind. The controller identity needs the Configuration Manager role on that AGC’s scope. For private exposure, the AGC frontend is reachable over Azure Private Link: create a private endpoint against the frontend and resolve its FQDN through a Private DNS zone, so the generated *.fzXX.alb.azure.com name resolves to a private IP inside the spoke. That keeps north-south traffic off the public internet while preserving the same Gateway API manifests. Managed vs BYO across the dimensions that decide it:
| Dimension | Managed mode | BYO mode |
|---|---|---|
| AGC created by | ALB Controller (CRD) | Your IaC (ARM/Bicep/Terraform) |
| Lifecycle in | Kubernetes manifests | Network team’s pipeline |
| Gateway references it via | alb-namespace + alb-name annotations | alb-id annotation (AGC resource ID) + frontend name in spec.addresses |
| Config Manager role scope | Node resource group | The AGC’s resource scope |
| Subnet ownership | Convenient, cluster-adjacent | Central, governed |
| Private Link | Possible | First-class (team-owned) |
| Best for | Greenfield, GitOps | Regulated, segregated duties |
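For the BYO column, the Gateway manifest differs from managed mode only in how it points at the AGC. The sketch below renders one from shell so the resource ID can come from your IaC outputs. Treat the details as assumptions to verify against the current ALB Controller documentation: the alb.networking.azure.io/alb-id annotation, the alb.networking.azure.io/alb-frontend address type, and the render_byo_gateway helper name, which is purely illustrative.

```shell
# Render a BYO-mode Gateway manifest to stdout.
#   $1 = AGC resource ID   $2 = frontend name
#   $3 = Gateway name (default gw-prod)   $4 = namespace (default app)
render_byo_gateway() {
  local alb_id=$1 frontend=$2 name=${3:-gw-prod} ns=${4:-app}
  cat <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ${name}
  namespace: ${ns}
  annotations:
    alb.networking.azure.io/alb-id: ${alb_id}
spec:
  gatewayClassName: azure-alb-external
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    allowedRoutes:
      namespaces:
        from: Same
  addresses:
  - type: alb.networking.azure.io/alb-frontend
    value: ${frontend}
EOF
}

# Placeholder resource ID; substitute the real one from your deployment outputs.
render_byo_gateway \
  "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.ServiceNetworking/trafficControllers/agc-shared" \
  fe-prod
```

Piping the output to kubectl apply -f - keeps the cluster side declarative while the AGC itself stays in the network team's pipeline.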
The Azure resources behind an AGC, regardless of mode, so you can read them in the portal/ARM:
| Resource type | Role | Created in |
|---|---|---|
| Microsoft.ServiceNetworking/trafficControllers | The AGC itself | Node RG (managed) / chosen RG (BYO) |
| .../trafficControllers/frontends | Listener entry point (the FQDN) | Same |
| .../trafficControllers/associations | Binds the delegated subnet | Same |
| Delegated subnet | /24 for data-plane injection | Your VNet |
| Private endpoint + Private DNS zone | Private exposure of the frontend | Spoke VNet (optional) |
Architecture at a glance
Read the diagram left to right as a request actually travels, with the control plane feeding in from the side. A client resolves app.kloudvin.com to the AGC-generated FQDN (a *.fzXX.alb.azure.com CNAME) and opens HTTPS on 443 to the AGC data plane — the managed, regional proxy fleet. The fleet’s frontend listener terminates TLS using the cert from the app-tls Secret, then a routing rule evaluates the HTTPRoute matches (path, header, query — more-specific wins) and lands the request on the weighted split: backendRefs to the stable Service (weight 90) and the canary Service (weight 10), where weight: 0 is the drain lever. From the split, traffic is re-encrypted on a fresh TLS leg to the backend pods on 443, with the BackendTLSPolicy pinning the SAN and, when clientCertificateRef is set, presenting a client cert for mTLS that the pod’s sidecar verifies. Off to the side, the control plane — the ALB Controller in azure-alb-system, authorised by a workload identity (a federated service account holding the Configuration Manager role) — watches the Gateway API CRDs and programs the data plane in seconds.
Notice how every numbered failure point maps to one CRD or identity object, which is the whole operational story: if badge 1 (the GatewayClass not Accepted) is red, nothing downstream is programmed; badge 2 is the listener/cert leg (Programmed=False, no address); badge 3 is a skewed split (a missing weight); badge 4 is the re-encrypt 502 (SAN/CA mismatch, or a cross-namespace Service without a ReferenceGrant); and badge 5 is mTLS/identity drift (a wrong federated subject, or a rotated client cert). The diagnostic method is to walk the path left to right, find the first object whose status.conditions isn’t True, and read the legend for that badge’s confirm-and-fix.
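The left-to-right walk can be scripted. The sketch below is one way to do it, with hypothetical helper names: collect_conditions gathers type/status pairs with kubectl's jsonpath output in path order (the object names gw-prod and rt-app are the examples used in this guide, and only the first parent of the route is read), and first_failing prints the first condition that is not True. It assumes every condition listed is one where True means healthy.

```shell
# Emit "Object ConditionType Status" lines in request-path order.
# Requires kubectl and cluster access; defined here but not run by the demo below.
collect_conditions() {
  kubectl get gatewayclass azure-alb-external \
    -o jsonpath='{range .status.conditions[*]}GatewayClass {.type} {.status}{"\n"}{end}'
  kubectl get gateway gw-prod -n app \
    -o jsonpath='{range .status.conditions[*]}Gateway {.type} {.status}{"\n"}{end}'
  kubectl get httproute rt-app -n app \
    -o jsonpath='{range .status.parents[0].conditions[*]}HTTPRoute {.type} {.status}{"\n"}{end}'
}

# Read those lines on stdin and report the first one whose status is not True.
first_failing() {
  awk '$3 != "True" { print $1 " " $2 "=" $3; found = 1; exit }
       END { if (!found) print "all conditions True" }'
}

# Offline demo with sample data; on a live cluster: collect_conditions | first_failing
printf '%s\n' \
  'GatewayClass Accepted True' \
  'Gateway Accepted True' \
  'Gateway Programmed False' \
  'HTTPRoute ResolvedRefs False' | first_failing
```

Because the lines arrive in path order, the first hit is the object to fix; anything reported further downstream is usually a consequence of it.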
Real-world scenario
A fintech platform team I worked with ran AGIC fronting roughly 40 namespaces on one shared AKS cluster. Their pain was concrete and recurring: every Ingress change anywhere triggered a full Application Gateway config push, and a single team’s frequent deploys produced 4–7 minute propagation windows during which unrelated services saw stale routing. During those windows, a customer-facing payments microservice would intermittently route to a just-decommissioned pod set because the gateway hadn’t caught up — a Sev-2 they could not reliably reproduce. Worse, their PCI scope required re-encryption to the payment pods, but AGIC’s backend-mTLS story was awkward enough that they had quietly settled for TLS terminating at the edge and cleartext to the pod — an audit finding waiting to happen, and one the next QSA assessment would certainly flag.
They migrated to AGC in BYO mode so the network team kept ownership of the AGC, its delegated /24 subnet, and a Private Link frontend, all in Terraform. Each app namespace got its own Gateway bound to the shared AGC, which decoupled the reconcile blast radius — a deploy in one namespace no longer touched another’s routing, because the controller programs per-Gateway, not per gateway resource. The PCI gap closed with a BackendTLSPolicy carrying a clientCertificateRef, giving genuine mTLS from AGC to the payment service:
apiVersion: alb.networking.azure.io/v1
kind: BackendTLSPolicy
metadata:
  name: btls-payments
  namespace: payments
spec:
  targetRef:
    group: ""
    kind: Service
    name: payments-svc
  default:
    sni: payments.internal.kloudvin.com
    clientCertificateRef:
      name: agc-payments-client
    verify:
      caCertificateRef:
        name: payments-ca
      subjectAltName: payments.internal.kloudvin.com
The migration ran with both ingress paths live — AGC on a parallel hostname — and DNS weight-shifted over a week, so there was no cutover big bang. One real snag surfaced on day two: a shared app-tls Secret lived in a platform namespace while several Gateway listeners lived in team namespaces, so those listeners came up Programmed=False until they added a ReferenceGrant permitting the cross-namespace Secret reference. They also briefly chased a 502 on the payments leg that turned out to be a SAN mismatch — the cert’s SAN was payments.internal.kloudvin.com but an early BackendTLSPolicy pinned payments-svc.payments.svc.cluster.local; aligning subjectAltName to the cert fixed it in one apply.
The measurable outcome: routing propagation dropped from minutes to single-digit seconds, the noisy-neighbour reconcile storms disappeared, the intermittent payments mis-route Sev-2 stopped recurring, and the next QSA assessment recorded encryption all the way to the cardholder-data workload. Canary releases that used to be a deploy-time ritual became a weight edit driven by Argo Rollouts. The lesson on the wall: “AGC moves the gateway out of ARM and into Kubernetes objects — so every failure is now a status.conditions you can read, not an ARM deployment you wait on.”
Advantages and disadvantages
AGC’s managed-data-plane-plus-in-cluster-controller model is a clear win for ingress at scale, but it is not free of trade-offs — most notably the missing WAF. Weigh it honestly:
| Advantages (why this model helps you) | Disadvantages (why it bites) |
|---|---|
| Near-real-time programming (seconds), no ARM throttling on every change | No built-in WAF — you must add Front Door / AppGW v2 upstream for L7 protection |
| Per-Gateway blast radius: one namespace can’t stall another’s routing | More moving parts (controller, federated identity, delegated subnet) to stand up correctly |
| Native weighted splitting: canary is a one-line weight edit | Gateway API + AGC CRDs are a learning curve vs familiar Ingress |
| First-class backend re-encryption and mTLS via BackendTLSPolicy | Cross-namespace refs need ReferenceGrant, an easy first-day trip-up |
| Secretless auth via workload identity (no SP passwords to rotate) | Federated-credential subject typos fail as opaque 401s |
| Policies (BackendTLSPolicy, HealthCheckPolicy) travel with the Service | Newer product with a smaller community corpus than NGINX/AGIC |
| Portable, typed Gateway API manifests (multi-implementation) | Region availability and feature parity still maturing in places |
The model is right for any AKS estate doing ingress for more than a few namespaces, anyone needing progressive delivery wired to a real splitting data plane, and regulated workloads that must prove encryption to the pod. It is less compelling if you need a WAF at the same hop with zero extra components (then a classic Application Gateway v2 + WAF_v2, possibly via AGIC, is simpler), or if your cluster runs a single app and the AGIC/Ingress familiarity outweighs AGC’s gains. The disadvantages are all manageable — but only if you stand the prerequisites up precisely, which is the point of the install section.
Hands-on lab
Stand up AGC end to end on an existing AKS cluster, ship a 90/10 split, verify it lands, then add header routing and tear down. Run in Cloud Shell (Bash) with kubectl pointed at a test cluster you can administer. This uses a small managed AGC and a couple of single-replica deployments — minutes of runtime, deleted at the end.
Step 1 — Variables and cluster features.
RG=rg-agc-lab
AKS=aks-agc-lab
LOC=eastus2
az aks update -g "$RG" -n "$AKS" --enable-oidc-issuer --enable-workload-identity -o table
OIDC=$(az aks show -g "$RG" -n "$AKS" --query oidcIssuerProfile.issuerUrl -o tsv)
az aks get-credentials -g "$RG" -n "$AKS" --overwrite-existing
Expected: the cluster shows oidcIssuerProfile.enabled = true and a securityProfile.workloadIdentity block.
Step 2 — Identity, role, federation, and the Helm install.
ID=alb-lab-id
az identity create -g "$RG" -n "$ID" -l "$LOC" -o table
PID=$(az identity show -g "$RG" -n "$ID" --query principalId -o tsv)
CID=$(az identity show -g "$RG" -n "$ID" --query clientId -o tsv)
MCID=$(az group show -n "$(az aks show -g "$RG" -n "$AKS" --query nodeResourceGroup -o tsv)" --query id -o tsv)
az role assignment create --assignee-object-id "$PID" --assignee-principal-type ServicePrincipal \
--scope "$MCID" --role "AppGw for Containers Configuration Manager"
az identity federated-credential create --name alb-lab-fc --identity-name "$ID" -g "$RG" \
--issuer "$OIDC" --subject "system:serviceaccount:azure-alb-system:alb-controller-sa" \
--audience api://AzureADTokenExchange
helm upgrade --install alb-controller \
oci://mcr.microsoft.com/application-lb/charts/alb-controller --version 1.7.9 \
--namespace azure-alb-system --create-namespace \
--set albController.podIdentity.clientID="$CID"
Step 3 — Confirm the controller and GatewayClass are healthy.
kubectl get pods -n azure-alb-system
kubectl get gatewayclass azure-alb-external \
-o jsonpath='{.status.conditions[?(@.type=="Accepted")].status}{"\n"}'
Expected: controller pods Running, and Accepted prints True.
Step 4 — Provision a managed AGC against the delegated subnet.
kubectl create namespace alb-infra
SUBNET=$(az network vnet subnet show -g "$RG" --vnet-name vnet-agc-lab --name subnet-alb --query id -o tsv)
cat <<EOF | kubectl apply -f -
apiVersion: alb.networking.azure.io/v1
kind: ApplicationLoadBalancer
metadata: { name: alb-lab, namespace: alb-infra }
spec: { associations: [ "$SUBNET" ] }
EOF
kubectl get applicationloadbalancer alb-lab -n alb-infra \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.reason}){"\n"}{end}'
Expected (after a few minutes): the Deployment condition reports status True with reason Ready.
Step 5 — Deploy two versioned backends and a Gateway, then split 90/10.
kubectl create namespace app
# stable + canary echo deployments that return their version on /version
kubectl create deployment app-stable -n app --image=mcr.microsoft.com/azuredocs/aks-helloworld:v1
kubectl create deployment app-canary -n app --image=mcr.microsoft.com/azuredocs/aks-helloworld:v2
kubectl expose deployment app-stable -n app --name=app-svc-stable --port=80 --target-port=80
kubectl expose deployment app-canary -n app --name=app-svc-canary --port=80 --target-port=80
cat <<'EOF' | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gw-lab
  namespace: app
  annotations:
    alb.networking.azure.io/alb-namespace: alb-infra
    alb.networking.azure.io/alb-name: alb-lab
spec:
  gatewayClassName: azure-alb-external
  listeners:
  - { name: http, protocol: HTTP, port: 80, allowedRoutes: { namespaces: { from: Same } } }
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata: { name: rt-lab, namespace: app }
spec:
  parentRefs: [ { name: gw-lab } ]
  rules:
  - backendRefs:
    - { name: app-svc-stable, port: 80, weight: 90 }
    - { name: app-svc-canary, port: 80, weight: 10 }
EOF
Step 6 — Read the FQDN, confirm Programmed, and sample the split.
kubectl get gateway gw-lab -n app \
-o jsonpath='{.status.addresses[0].value} {.status.conditions[?(@.type=="Programmed")].status}{"\n"}'
FQDN=$(kubectl get gateway gw-lab -n app -o jsonpath='{.status.addresses[0].value}')
for i in $(seq 1 50); do curl -s "http://${FQDN}/"; echo; done | sort | uniq -c
Expected: an FQDN plus True, and the 50-sample count lands roughly 45 stable / 5 canary (relative weights, so exact counts vary).
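How far from 45/5 a healthy split can land is easy to underestimate at 50 samples. A minimal sketch, assuming each request is routed independently with probability equal to the weight share (a plain binomial model, not a statement about AGC's load-balancing internals), gives the expected count and a roughly two-standard-deviation band. split_band is a hypothetical helper name.

```shell
# Expected hits and an approximate 2-sigma band for a backend that should
# receive weight_pct percent of n sampled requests.
split_band() {
  local n=$1 weight_pct=$2
  awk -v n="$n" -v pct="$weight_pct" 'BEGIN {
    p = pct / 100
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    low = mean - 2 * sd;  if (low < 0) low = 0
    high = mean + 2 * sd; if (high > n) high = n
    printf "expected=%.1f low=%.1f high=%.1f\n", mean, low, high
  }'
}

split_band 50 10     # canary at 10% over 50 requests
split_band 500 10    # same weight, ten times the sample
```

At 50 requests a canary count anywhere from about 1 to 9 is consistent with a correct 90/10 split, so raise the sample size before concluding the weights are wrong.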
Step 7 — Add header routing for a beta cohort and confirm it.
kubectl patch httproute rt-lab -n app --type=json -p='[
{"op":"add","path":"/spec/rules/0","value":{
"matches":[{"headers":[{"name":"x-cohort","value":"beta","type":"Exact"}]}],
"backendRefs":[{"name":"app-svc-canary","port":80}]}}]'
curl -s "http://${FQDN}/" -H "x-cohort: beta" # should always hit canary (v2)
Expected: with the x-cohort: beta header you consistently reach the canary (v2) backend; without it you get the 90/10 split.
Step 8 — Teardown.
kubectl delete namespace app
kubectl delete applicationloadbalancer alb-lab -n alb-infra
kubectl delete namespace alb-infra
helm uninstall alb-controller -n azure-alb-system
az identity delete -g "$RG" -n "$ID"
The lab steps mapped to what each proves:
| Step | What you did | What it proves |
|---|---|---|
| 2 | Federate identity + Helm install | Secretless control-plane auth |
| 3 | GatewayClass Accepted=True | The controller is live and authorised |
| 4 | ApplicationLoadBalancer CRD | Managed-mode AGC provisioning |
| 5–6 | Gateway + split, sample FQDN | Native weighted traffic split works |
| 7 | Header match patch | Composable L7 routing, seconds to apply |
| 8 | Delete CRDs + identity | Clean lifecycle in Kubernetes |
Cost note. A managed AGC plus two single-replica pods for an hour is well under ₹100; deleting the namespaces, the ApplicationLoadBalancer, and the identity stops all AGC charges. The AKS cluster itself is the larger cost — reuse an existing test cluster rather than creating one for the lab.
Common mistakes & troubleshooting
This is the playbook — the part you bookmark. First as a scannable table for mid-incident, then the entries that bite hardest expanded with the full reasoning.
| # | Symptom | Root cause | Confirm (exact cmd) | Fix |
|---|---|---|---|---|
| 1 | GatewayClass azure-alb-external never Accepted | Controller down, or identity lacks Config Manager role | kubectl get pods -n azure-alb-system; kubectl describe gatewayclass azure-alb-external | Fix Helm install; grant Config Manager on the AGC scope |
| 2 | Controller pod runs but logs 401 / token errors | Federated-credential subject wrong, or workload identity off | kubectl logs deploy/alb-controller -n azure-alb-system; az identity federated-credential show | Re-create fedcred with exact system:serviceaccount:azure-alb-system:alb-controller-sa |
| 3 | ApplicationLoadBalancer stuck, no AGC created | Subnet smaller than /24 or not delegated; missing role | kubectl get applicationloadbalancer -o yaml (Reason); az network vnet subnet show --query delegations | Delegate subnet to trafficControllers; size at least /24; grant role |
| 4 | Gateway has no address, Programmed=False | TLS Secret missing, or AGC not ready | kubectl describe gateway gw-prod -n app | Create the app-tls Secret in the listener ns; wait for AGC |
| 5 | HTTPRoute Accepted=False | parentRefs wrong, or Gateway allowedRoutes disallows it | kubectl get httproute -o yaml (parents conditions) | Fix parentRefs; widen allowedRoutes.namespaces |
| 6 | HTTPRoute ResolvedRefs=False | Service/Secret missing, or cross-namespace without grant | kubectl describe httproute rt-app -n app | Create the Service; add a ReferenceGrant in the target ns |
| 7 | Split lands ~50/50 not 90/10 | backendRefs missing weight (each defaults to 1) | kubectl get httproute -o yaml | Set explicit integer weights on every ref |
| 8 | 502 only on the re-encrypt leg | BackendTLSPolicy SAN/CA mismatch | controller events; the cert’s actual SAN | Pin subjectAltName to the cert SAN; correct caCertificateRef |
| 9 | Backend rejects AGC (mTLS) | clientCertificateRef rotated/invalid | backend (sidecar) TLS logs | Re-issue the client cert Secret |
| 10 | All backends marked unhealthy → 503 | Probe path 5xx or wrong statusCodes | HealthCheckPolicy; curl the app /healthz | Shallow path; correct match.statusCodes |
| 11 | Header/query route never matches | A more specific rule matches first, or wrong match type | kubectl get httproute -o yaml rules order | Use Exact/RegularExpression; rely on specificity |
| 12 | Routing changes don’t propagate | Watching the wrong object; controller wedged | kubectl logs deploy/alb-controller -n azure-alb-system | Restart controller; check it reconciles the object |
| 13 | Private FQDN resolves publicly | Private DNS zone not linked, no private endpoint | nslookup the AGC FQDN from the spoke | Create PE on the frontend; link Private DNS zone |
| 14 | Cert rotation didn’t take effect | Replaced cert content but not the referenced Secret | kubectl get secret app-tls -o yaml; Gateway events | Update the exact Secret the listener references |
The expanded form, for the entries that cost the most time:
1. GatewayClass azure-alb-external never reaches Accepted.
Root cause: the ALB Controller isn’t running, or its identity lacks the AppGw for Containers Configuration Manager role, so it can’t bind the class.
Confirm: kubectl get pods -n azure-alb-system (are they Running?), then kubectl describe gatewayclass azure-alb-external for the condition reason.
Fix: re-check the Helm install (--set albController.podIdentity.clientID correct) and that the role assignment landed on the right scope (node RG for managed, AGC scope for BYO).
2. The controller pod runs but its logs show 401 / token-exchange errors.
Root cause: the federated-credential subject is wrong (the single most common cause), or workload identity isn’t actually enabled so the SA token isn’t projected.
Confirm: kubectl logs deploy/alb-controller -n azure-alb-system shows AAD token failures; az identity federated-credential show and compare the subject to system:serviceaccount:azure-alb-system:alb-controller-sa exactly.
Fix: re-create the federated credential with the exact subject, issuer, and api://AzureADTokenExchange audience; confirm --enable-workload-identity on the cluster.
3. The ApplicationLoadBalancer CRD applies but no AGC is created.
Root cause: the subnet is smaller than /24 or not delegated to Microsoft.ServiceNetworking/trafficControllers, or the controller lacks rights on the subnet’s RG.
Confirm: kubectl get applicationloadbalancer alb-prod -n alb-infra -o yaml and read the condition Reason (e.g. SubnetDelegationMissing); az network vnet subnet show --query "{prefix:addressPrefix, deleg:delegations}".
Fix: delegate the subnet, size it at /24 or larger (a prefix length of 24 or shorter), and ensure Network Contributor on the subnet’s RG.
6. HTTPRoute shows ResolvedRefs=False.
Root cause: a referenced Service or TLS Secret doesn’t exist, or it lives in another namespace without a ReferenceGrant permitting the reference.
Confirm: kubectl describe httproute rt-app -n app names the unresolved ref; check the Service/Secret exists in the expected namespace.
Fix: create the missing object, or add a ReferenceGrant in the target namespace allowing the route/listener’s namespace to reference it:
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-app-to-platform-tls
  namespace: platform # the namespace that OWNS the Secret/Service
spec:
  from:
  - group: gateway.networking.k8s.io
    kind: Gateway
    namespace: app # the namespace that REFERENCES it
  to:
  - group: ""
    kind: Secret
    name: app-tls
7. The split lands ~50/50 when you configured 90/10.
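Root cause: weights are missing from the backendRefs, so each defaults to 1 and two unweighted refs split 50/50. A single missing weight skews the other way: 90 against a defaulted 1 is roughly 99/1, not 90/10. If the two backends ended up in separate rules there is no split at all, because the most specific matching rule takes every request.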
Confirm: kubectl get httproute rt-canary -n app -o yaml and verify both refs share one rules[] entry and both carry explicit integer weights.
Fix: put both weighted refs under a single rule with explicit weights; remember weights are relative.
8. A 502 appears only on the re-encrypt leg, not before the gateway.
Root cause: BackendTLSPolicy verify doesn’t match the backend cert — wrong subjectAltName, or a caCertificateRef that doesn’t sign the backend’s cert.
Confirm: controller events on the BackendTLSPolicy; inspect the backend cert’s real SAN (e.g. openssl s_client against the pod) and compare.
Fix: set subjectAltName to the cert’s actual SAN and caCertificateRef to the CA that signed it.
12. Routing changes stop propagating.
Root cause: the controller is wedged (a bad object earlier in the watch, or it lost its lease), or you’re editing an object the route doesn’t actually parent to.
Confirm: kubectl logs deploy/alb-controller -n azure-alb-system for reconcile errors; confirm the HTTPRoute parentRefs points at the live Gateway.
Fix: fix or remove the offending object; if needed restart the controller deployment to force a clean reconcile.
Best practices
- Adopt Gateway API, not the legacy Ingress path. All AGC routing capability (splits, header matches, backend mTLS) lives in Gateway API. Starting on Ingress means migrating later.
- One Gateway per team/namespace. This is what buys you the per-Gateway blast radius; sharing one Gateway across teams re-creates AGIC’s noisy-neighbour coupling.
- Pin the chart and controller version. helm upgrade --version <x> so upgrades are deliberate; never track a floating tag in production.
- Use the purpose-built role, scoped tight. AppGw for Containers Configuration Manager on the AGC scope, not Contributor; least privilege here is auditable.
- Always set explicit integer weights on every backendRef in a split, even the stable one; a defaulted weight is the classic skewed-canary bug.
- Keep a weight: 0 drain in the manifest for the previous version during a rollout, so rollback is one edit, not a redeploy.
- Re-encrypt to the pod with verify on in production. caCertificateRef + subjectAltName pinned; cleartext-to-pod is a finding in any regulated estate.
- Put the WAF upstream deliberately. AGC has none, so front it with Front Door (managed WAF) or a thin AppGW v2 + WAF_v2 hop; decide this at design time, not after a pen test.
- Set a shallow HealthCheckPolicy path (/healthz, tight statusCodes) per Service; the default GET / can mark healthy backends unhealthy.
- Pre-empt cross-namespace refs with ReferenceGrant. If a shared TLS Secret or Service lives elsewhere, the grant must exist or listeners come up Programmed=False.
- Drive canaries from metrics, not a human. Wire Argo Rollouts/Flagger via the Gateway API provider so weight ramps gate on real error/latency analysis.
- Read status.conditions first, always. GatewayClass.Accepted → Gateway.Programmed → HTTPRoute.Accepted/ResolvedRefs localises 90% of failures before you touch a log.
The defaults to override on every new AGC, and what each prevents:
| Default | Override to | Prevents |
|---|---|---|
| HTTP to backend (cleartext) | BackendTLSPolicy re-encrypt + verify | Plaintext-to-pod audit finding |
| GET / health probe | HealthCheckPolicy /healthz | Healthy backends marked unhealthy |
| Defaulted backendRef weight | Explicit integer weights | Skewed canary splits |
| allowedRoutes: All (if set) | Same / a Selector | Unintended route attachment |
| Floating chart tag | Pinned --version | Surprise controller upgrades |
| No ReferenceGrant | Pre-created grants | Programmed=False on shared Secrets |
Security notes
- Secretless control-plane auth. The ALB Controller authenticates via workload identity (a federated service account), so there are no service-principal passwords to store or rotate. Keep the federation subject exact and the identity least-privileged.
- Least-privilege role, scoped to the AGC. Grant `AppGw for Containers Configuration Manager` on the AGC’s resource scope only, not subscription-wide `Contributor`. In BYO mode this lets the network team hand the controller exactly the rights it needs and no more.
- Encrypt to the pod, and verify the backend. Use `BackendTLSPolicy` with `verify` (`caCertificateRef` + `subjectAltName`) so AGC authenticates the backend cert; add `clientCertificateRef` for true mTLS where the backend must also authenticate AGC (PCI/Zero-Trust).
- Private exposure via Private Link. For internal-only services, put a private endpoint on the AGC frontend and resolve its FQDN through a Private DNS zone, keeping north-south traffic off the public internet.
- No WAF on AGC: plan the edge. Because AGC has no built-in WAF, front it with Front Door Premium (managed WAF) or a classic Application Gateway v2 + WAF_v2 to inspect L7 before traffic reaches AGC; never assume AGC is filtering OWASP-class attacks.
- Guard the TLS Secrets. Listener and client-cert Secrets are sensitive; scope them with RBAC, prefer syncing from Key Vault (see AKS Secrets Store CSI: Key Vault Sync & Rotation), and use `ReferenceGrant` rather than copying Secrets across namespaces.
- Constrain route attachment. Use `allowedRoutes.namespaces` (`Same` or a label `Selector`) so only intended namespaces can bind routes to a `Gateway`, preventing a foreign namespace from attaching an unwanted route.
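The re-encrypt-and-verify note above can be sketched as a single `BackendTLSPolicy`. This is an illustrative manifest under stated assumptions, not a drop-in: the Service `payments`, the namespace `shop`, the Secret names `backend-ca` and `agc-client-cert`, and the hostname `payments.shop.internal` are hypothetical, and the schema should be confirmed against the ALB Controller release in use.

```yaml
# Sketch only: every name and hostname below is hypothetical.
apiVersion: alb.networking.azure.io/v1
kind: BackendTLSPolicy
metadata:
  name: payments-backend-tls
  namespace: shop
spec:
  targetRef:                       # the Service AGC re-encrypts to
    group: ""
    kind: Service
    name: payments
    namespace: shop
  default:
    sni: payments.shop.internal    # SNI sent on the backend leg
    ports:
    - port: 443
    clientCertificateRef:          # cert AGC presents (mTLS); omit for one-way TLS
      group: ""
      kind: Secret
      name: agc-client-cert
    verify:                        # AGC validates the backend certificate
      caCertificateRef:
        group: ""
        kind: Secret
        name: backend-ca
      subjectAltName: payments.shop.internal
```

Dropping `clientCertificateRef` leaves one-way re-encryption with backend verification still in place; the `subjectAltName` has to match a SAN on the certificate the pod actually serves.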
The security controls that also harden routing, mapped to what each defends and prevents:
| Control | Mechanism | Secures against | Also prevents |
|---|---|---|---|
| Workload identity | Federated SA → UAMI | Stored SP secrets | Credential-rotation breakage |
| Scoped Config Manager role | Built-in role on AGC scope | Over-privileged controller | Accidental cross-resource changes |
| `BackendTLSPolicy` + `verify` | Re-encrypt + CA/SAN pin | Cleartext-to-pod, MITM | Backend cert drift going unnoticed |
| Client cert (mTLS) | `clientCertificateRef` | Unauthenticated upstream to backend | Spoofed gateway traffic |
| Private Link frontend | PE + Private DNS | Public exposure | DNS leakage of internal services |
| `allowedRoutes` scoping | `Same` / `Selector` | Foreign route attachment | Route hijack across teams |
| Upstream WAF | Front Door / AppGW v2 | OWASP-class L7 attacks | (AGC has no WAF of its own) |
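As an illustration of the `allowedRoutes` scoping control, the sketch below shows a `Gateway` whose listener only accepts routes from namespaces carrying a specific label. The names (`shared-gateway`, `gateway-infra`, `listener-tls`), the label `gateway-access: "true"`, and the managed-mode annotation values (`alb-infra`, `alb-main`) are assumptions for the example.

```yaml
# Sketch only: names, label and annotation values are hypothetical.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shared-gateway
  namespace: gateway-infra
  annotations:
    # Managed-mode binding to the ApplicationLoadBalancer resource (assumed names)
    alb.networking.azure.io/alb-namespace: alb-infra
    alb.networking.azure.io/alb-name: alb-main
spec:
  gatewayClassName: azure-alb-external
  listeners:
  - name: https
    port: 443
    protocol: HTTPS
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: listener-tls         # Secret in the Gateway's own namespace
    allowedRoutes:
      namespaces:
        from: Selector             # only labelled namespaces may attach routes
        selector:
          matchLabels:
            gateway-access: "true"
```

Using `from: Same` instead removes the selector entirely and restricts attachment to the Gateway’s own namespace, which suits single-team Gateways.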
Cost & sizing
The cost model is fundamentally different from AGIC’s per-gateway-hour-plus-capacity-units billing, and far simpler to reason about once you separate the data plane from what surrounds it.
- AGC data plane bills on a managed-resource basis (an hourly component plus usage), independent of how many `Gateway`/`HTTPRoute` objects you create, so one AGC serving 40 namespaces is dramatically cheaper than 40 sharded AGICs, which was a real cost driver for teams that sharded gateways to dodge AGIC’s blast radius.
- The AKS cluster is the larger line item and is unchanged by AGC; the ALB Controller is two small pods (negligible CPU/RAM).
- Upstream WAF, if you add Front Door Premium or an Application Gateway v2 + WAF_v2 for L7 protection, is usually the biggest added cost of an AGC design — budget it deliberately, because it’s the price of the WAF AGC doesn’t include.
- Private Link (a private endpoint on the frontend) adds a small hourly + per-GB charge when you expose AGC privately.
- Data processing / egress scales with traffic as on any L7 proxy; re-encryption and mTLS add negligible cost (CPU on the managed fleet, which you don’t pay per-cycle).
The cost drivers and what each one buys you:
| Cost driver | What you pay for | Rough INR / month | What it buys | Watch-out |
|---|---|---|---|---|
| AGC data plane | Managed proxy (hourly + usage) | ~₹3,000–8,000 (traffic-dependent) | The whole L7 ingress fleet | Usage scales with traffic |
| ALB Controller | 2 small pods on AKS | negligible | The control plane | Counts against node capacity |
| Front Door Premium (WAF) | Edge + managed WAF | ~₹25,000+ | OWASP protection AGC lacks | Often the biggest added cost |
| AppGW v2 + WAF_v2 (alt) | Gateway-hour + capacity units | ~₹15,000–30,000 | WAF at a thin edge hop | Reintroduces an ARM gateway |
| Private Link | PE hourly + per-GB | ~₹1,500–3,000 | Private-only exposure | Per-endpoint, per-spoke |
| Data processing / egress | Per-GB through the proxy | traffic-dependent | (the traffic itself) | Spikes during incidents/sales |
Sizing rule of thumb: one AGC per cluster (or per environment) serves many teams via per-namespace Gateways — you almost never need multiple AGCs for capacity, only for hard isolation or BYO governance. Consolidating off sharded AGICs onto a single AGC was, for the fintech team above, a net cost reduction even after adding a Front Door WAF hop, because they collapsed dozens of gateway resources into one managed data plane. For broader AKS cost levers, see Kubernetes Cost Allocation & Rightsizing with Kubecost.
Interview & exam questions
1. How does Application Gateway for Containers differ architecturally from AGIC? AGIC ran a single in-cluster pod that mutated a Standard_v2 Application Gateway ARM resource on every Ingress change, producing multi-minute propagation. AGC has a managed, regional proxy data plane and an in-cluster ALB Controller that programs it via a config plane in seconds, speaks the Gateway API instead of Ingress, scopes blast radius per-Gateway, and has no built-in WAF.
2. Why does AGC use the Gateway API rather than the Ingress API? Because all of AGC’s routing capability — weighted traffic splitting via backendRefs weights, header/path/query matches, BackendTLSPolicy re-encryption and mTLS, HealthCheckPolicy — maps onto typed Gateway API objects, avoiding the annotation sprawl Ingress required. Gateway API is also portable across implementations and is where Microsoft is investing.
3. How does the ALB Controller authenticate to Azure? Via workload identity: a user-assigned managed identity is federated to the controller’s Kubernetes service account (azure-alb-system:alb-controller-sa), and the controller exchanges the projected SA token for an Azure token — no service-principal secret. The identity holds the AppGw for Containers Configuration Manager role on the AGC scope.
4. What are managed mode and BYO mode, and when do you pick each? In managed mode the controller creates the AGC and its subnet association from an ApplicationLoadBalancer CRD — best for greenfield, GitOps-driven estates. In BYO mode a central team provisions the AGC via IaC and the cluster only references it — best when a platform-networking team must own the AGC, subnet delegation, and Private Link independently of any cluster.
5. How do you ship a 90/10 canary on AGC, and what does weight: 0 do? Put two backendRefs (stable and canary) under one HTTPRoute rule with weight: 90 and weight: 10; AGC distributes proportionally. Weights are relative, not percentages. Setting a backend’s weight to 0 drains it to zero traffic without deleting the ref, keeping rollback one edit away.
6. A route shows ResolvedRefs=False. What are the two most likely causes? Either a referenced Service or TLS Secret doesn’t exist in the route’s namespace, or it lives in another namespace without a ReferenceGrant permitting the cross-namespace reference. Confirm with kubectl describe httproute; fix by creating the object or adding a ReferenceGrant in the target namespace.
7. How do you enforce mTLS from AGC to a backend pod? Apply a BackendTLSPolicy targeting the Service with a clientCertificateRef (the cert AGC presents) plus a verify block (caCertificateRef + subjectAltName) so AGC also validates the backend. Drop clientCertificateRef for one-way re-encryption; keep verify on in production either way.
8. The GatewayClass azure-alb-external never reaches Accepted. What do you check? Whether the ALB Controller pods are Running in azure-alb-system, and whether its identity holds AppGw for Containers Configuration Manager on the AGC scope. A controller that’s down or unauthorised can’t bind the class. Then check kubectl describe gatewayclass for the condition reason.
9. Does AGC include a WAF? If not, how do you protect L7? No — AGC has no built-in WAF. You place protection upstream: Front Door Premium (managed WAF) or a classic Application Gateway v2 + WAF_v2 hop in front of the AGC FQDN, letting AGC own routing and backend TLS while the edge does OWASP-class inspection.
10. A split you set to 90/10 is landing ~50/50. What’s wrong? Most likely the weights are not taking effect: when weight is omitted, a backendRef defaults to 1, so two unweighted refs split evenly (a single missing weight next to a 90 would skew to roughly 99/1, not 50/50). Also confirm both backends sit in one rule; in separate rules[] match precedence picks a single rule and there’s no split at all. Put both weighted refs under a single rule with explicit integer weights.
11. How does AGC achieve near-real-time routing changes when AGIC took minutes? AGC’s controller writes desired state to a managed config plane and the regional proxy fleet converges in seconds. AGIC instead re-deployed an ARM Application Gateway resource on every change, paying ARM control-plane latency and throttling each time.
12. What subnet requirements does AGC impose? A dedicated subnet of at least /24, delegated to Microsoft.ServiceNetworking/trafficControllers, into which AGC injects its data plane. A smaller or undelegated subnet fails provisioning (often surfaced as an ApplicationLoadBalancer condition reason).
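For questions 5 and 10, a 90/10 canary can be sketched as one `HTTPRoute` rule carrying both weighted `backendRefs`. The route, Gateway and Service names, the namespaces, and port 8080 are assumptions for illustration.

```yaml
# Sketch only: route, gateway and service names are hypothetical.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: checkout-canary
  namespace: shop
spec:
  parentRefs:
  - name: shared-gateway           # hypothetical Gateway
    namespace: gateway-infra
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:                   # both refs in ONE rule, or there is no split
    - name: checkout-stable
      port: 8080
      weight: 90                   # share = 90 / (90 + 10)
    - name: checkout-canary
      port: 8080
      weight: 10                   # set to 0 to drain without deleting the ref
```

Because weights are relative, `9` and `1` would produce the same split as `90` and `10`; stating both explicitly avoids relying on the default of 1.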
These questions map primarily to AZ-700 (Designing and Implementing Azure Networking), specifically load balancing and application delivery, and to the CKA/CKAD Gateway API and services/networking domains, with the workload-identity mechanics touching AZ-500. A compact cert-mapping for revision:
| Question theme | Primary cert | Objective area |
|---|---|---|
| AGC vs AGIC, AGC architecture | AZ-700 | Design & implement application delivery |
| Gateway API objects, routing, splits | CKA / CKAD | Services & networking; Gateway API |
| Workload identity, federation, roles | AZ-500 / AZ-700 | Secure identity; secretless access |
| BackendTLSPolicy, mTLS, re-encrypt | AZ-700 / AZ-500 | Secure connectivity; encryption in transit |
| BYO mode, Private Link, subnet delegation | AZ-700 | Hybrid/private connectivity |
Quick check
- AGC has no built-in WAF. Name two ways to add L7 protection in front of it.
- You set a 90/10 split but traffic lands ~50/50. What is the single most likely misconfiguration?
- A `Gateway` listener’s TLS Secret lives in a different namespace and the listener is `Programmed=False`. What object fixes it?
- How does the ALB Controller authenticate to Azure, and what is the one string that most often breaks it?
- What does setting a backend’s `weight` to `0` accomplish, and why is it useful during a rollout?
Answers
- Front AGC with Front Door Premium (managed WAF) or a thin Application Gateway v2 + WAF_v2 hop pointed at the AGC FQDN. AGC owns routing/backend TLS; the upstream hop does OWASP-class inspection.
- The `weight` fields are missing or not applied, so both `backendRef`s fall back to the default of 1 and split evenly. (If only one weight were missing, 90 against a default of 1 would land near 99/1 rather than 50/50.) Also check that both refs sit under one rule: backends in separate `rules[]` are not split at all. Put both weighted refs under one rule with explicit integer weights.
- A `ReferenceGrant` in the target namespace (the one that owns the Secret), permitting the listener’s namespace and `Gateway` kind to reference that Secret. Without it, cross-namespace refs are denied and the listener stays `Programmed=False`.
- Via workload identity: a user-assigned managed identity federated to the `azure-alb-system:alb-controller-sa` service account. The string that most often breaks it is the federated-credential subject, which must be exactly `system:serviceaccount:azure-alb-system:alb-controller-sa`.
- It drains the backend to zero traffic without deleting the `backendRef`, so the route and the backend stay in the manifest and rollback is a one-line weight edit rather than a redeploy.
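The `ReferenceGrant` answer can be illustrated with a short manifest. It is a sketch under assumed names: `certs` stands for the namespace that owns the Secret, `gateway-infra` for the namespace of the `Gateway`, and `listener-tls` for the Secret itself.

```yaml
# Sketch only: namespaces and Secret name are hypothetical.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-gateway-tls
  namespace: certs                 # lives in the namespace that OWNS the Secret
spec:
  from:                            # who is allowed to reference
  - group: gateway.networking.k8s.io
    kind: Gateway
    namespace: gateway-infra
  to:                              # what they may reference
  - group: ""
    kind: Secret
    name: listener-tls             # optional; omit to allow any Secret in this namespace
```

The listener’s `certificateRefs` entry then needs an explicit `namespace: certs` for the cross-namespace reference to resolve.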
Glossary
- Application Gateway for Containers (AGC): Azure’s managed, regional L7 proxy fleet for Kubernetes ingress, backed by `Microsoft.ServiceNetworking/trafficControllers`; the successor to AGIC.
- AGIC: Application Gateway Ingress Controller; the older model that mutated a `Standard_v2` Application Gateway ARM resource per `Ingress`.
- ALB Controller: the in-cluster Helm-installed controller (namespace `azure-alb-system`) that programs the AGC data plane from Gateway API and AGC CRDs.
- Gateway API: the Kubernetes API (`GatewayClass`/`Gateway`/`HTTPRoute` + policies) that AGC implements; replaces `Ingress` for AGC routing.
- `GatewayClass` (`azure-alb-external`): the class the ALB Controller registers; `Accepted=True` means the controller is live and authorised.
- `Gateway`: a Gateway API object defining listeners (hostnames, ports, TLS); becomes an AGC frontend and emits the AGC FQDN address.
- `HTTPRoute`: a Gateway API object defining routing rules (path/header/query matches), weighted `backendRefs`, and filters; becomes AGC routing rules.
- `ApplicationLoadBalancer` (CRD): the AGC-specific CRD that, in managed mode, makes the controller create the AGC and its subnet association.
- `BackendTLSPolicy`: an AGC policy attaching to a `Service` to re-encrypt to the backend (`verify` with `caCertificateRef`/`subjectAltName`) and optionally present a client cert for mTLS (`clientCertificateRef`).
- `HealthCheckPolicy`: an AGC policy overriding the default `GET /` backend probe with a path, interval, timeout, and accepted status codes.
- `ReferenceGrant`: a Gateway API object in a target namespace that permits objects in another namespace to reference its Secrets/Services (required for cross-namespace refs).
- Workload identity: federated authentication where a user-assigned managed identity is bound to a Kubernetes service account, exchanging the SA token for an Azure token (no secrets).
- Federated credential subject: the `system:serviceaccount:<ns>:<sa>` string that ties the managed identity to the controller’s service account; must match exactly.
- `AppGw for Containers Configuration Manager`: the purpose-built Azure role the controller identity needs on the AGC scope to program it.
- Managed mode / BYO mode: whether the controller creates the AGC from a CRD (managed) or references an IaC-provisioned AGC (BYO).
- Weight (relative): a `backendRef`’s share of traffic, proportional to the sum of weights in the rule; `0` drains a backend without removing it.
- Delegated subnet: the `/24`+ subnet delegated to `Microsoft.ServiceNetworking/trafficControllers` that AGC injects its data plane into.
- Private Link frontend: a private endpoint on the AGC frontend, resolved via a Private DNS zone, exposing AGC on a private IP inside a spoke.
Next steps
You can now stand up AGC on AKS, drive ingress through Gateway API, and ship splits, mTLS and header routing in production. Build outward:
- Next: Kubernetes Gateway API: HTTPRoute, Traffic Splitting & Ingress Migration. The portable API model AGC implements, and how to migrate off `Ingress`.
- Related: Application Gateway v2 with WAF, L7 Routing & TLS in Production. The classic data plane and the WAF you’ll often front AGC with.
- Related: Progressive Delivery with Argo Rollouts: Canary Metrics. Wire weighted splits to metric-gated automated promotion.
- Related: Azure Key Vault & Workload Identity for Secrets. The secretless identity pattern the ALB Controller depends on.
- Related: AKS Istio Service Mesh Add-on: mTLS, Ingress & Egress. Pod-to-pod mTLS behind the gateway, complementing AGC’s backend mTLS.
- Related: Production AKS: Networking & Observability. The cluster-networking foundation AGC plugs into.