Ingress on GKE always felt like a compromise. A single object had to express routing, TLS, health checks, Cloud Armor, CDN, and backend timeouts. It did so through annotation strings, some of which merely pointed at separate BackendConfig and FrontendConfig objects, and those strings were undiscoverable, unvalidated, and impossible to delegate safely. The Gateway API replaces that one overloaded resource with a role-oriented set of typed objects: infrastructure teams own Gateway, application teams own HTTPRoute, and Google-specific behavior moves into first-class policy resources that attach to a target instead of hiding in an annotation string. On GKE this is not a thin shim over Ingress. The GKE Gateway controller provisions real Google Cloud load balancers (the same global and regional Envoy-based ALBs), and the multi-cluster variant programs a single load balancer across a Fleet. This walkthrough builds single-cluster and multi-cluster Gateways end to end, attaches the policies a platform team needs, and covers the migration and debugging realities.
Step 1: Gateway API vs Ingress — roles, resources, and GatewayClasses
The Gateway API splits the old monolith along ownership boundaries. Three resource kinds matter:
| Resource | Owner | Role | Analogue in Ingress world |
|---|---|---|---|
| GatewayClass | Cloud provider | Defines an implementation (the controller + LB type) | IngressClass |
| Gateway | Platform / infra team | Listeners: ports, protocols, TLS, allowed routes | The frontend half of an Ingress |
| HTTPRoute | Application team | Host/path/header matching, splitting, filters | The rules half of an Ingress |
The key design property is route delegation: a Gateway in an infra-owned namespace can permit HTTPRoute attachment only from labelled namespaces, so application teams attach routes without ever editing the shared frontend. That is the capability Ingress never had.
GKE ships several managed GatewayClasses. You pick one; you never create a GatewayClass yourself. The important ones:
| GatewayClass | Scope | Load balancer provisioned |
|---|---|---|
| gke-l7-global-external-managed | Single cluster | Global external Application LB |
| gke-l7-regional-external-managed | Single cluster | Regional external Application LB |
| gke-l7-rilb | Single cluster | Regional internal Application LB |
| gke-l7-gxlb-mc | Fleet (multi-cluster) | Classic Application LB (global external) |
| gke-l7-global-external-managed-mc | Fleet (multi-cluster) | Global external Application LB |
| gke-l7-rilb-mc | Fleet (multi-cluster) | Regional internal Application LB |
The -mc suffix is the multi-cluster signal. Those classes are owned by the MultiClusterGateway controller running against a Fleet host project, not by an individual cluster.
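GKE manages the Gateway API CRDs and the GKE Gateway controller, so you do not install them yourself. Autopilot clusters on recent versions have them enabled by default. Standard clusters usually need them enabled explicitly. Serving the GA gateway.networking.k8s.io/v1 API also requires a reasonably recent GKE version, so check the release notes for your channel. Confirm the CRDs and controller exist before doing anything else: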
```bash
kubectl get gatewayclass
# Expect: gke-l7-global-external-managed, gke-l7-rilb, gke-l7-regional-external-managed, ...
kubectl api-resources --api-group=gateway.networking.k8s.io
# gateways, httproutes, grpcroutes, referencegrants, ...
```
If kubectl get gatewayclass returns nothing, enable the Gateway controller explicitly:
```bash
gcloud container clusters update CLUSTER_NAME \
    --location=us-central1 \
    --gateway-api=standard
```
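If you want the GatewayClass check in a script or CI step, a minimal Python sketch can parse the JSON form of kubectl get gatewayclass instead of eyeballing the table output. The REQUIRED set and the function names are hypothetical choices for this walkthrough. Apart from kubectl's standard List JSON shape, the only fact the sketch relies on is the -mc naming convention from the table above.

```python
from __future__ import annotations

import json
import subprocess

# Single-cluster classes this walkthrough uses (from the table above).
REQUIRED = {
    "gke-l7-global-external-managed",
    "gke-l7-regional-external-managed",
    "gke-l7-rilb",
}


def fetch_gatewayclasses_json() -> str:
    """Run kubectl against the current context and return its raw JSON output."""
    result = subprocess.run(
        ["kubectl", "get", "gatewayclass", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def class_names(kubectl_json: str) -> set[str]:
    """Names of all GatewayClasses in a kubectl List document."""
    doc = json.loads(kubectl_json)
    return {item["metadata"]["name"] for item in doc.get("items", [])}


def missing_gateway_classes(kubectl_json: str, required: set[str] = REQUIRED) -> set[str]:
    """Required classes that the cluster does not report."""
    return set(required) - class_names(kubectl_json)


def multi_cluster_classes(kubectl_json: str) -> list[str]:
    """Classes carrying the -mc suffix, i.e. the fleet-level multi-cluster ones."""
    return sorted(name for name in class_names(kubectl_json) if name.endswith("-mc"))
```

For example, if missing_gateway_classes(fetch_gatewayclasses_json()) returns a non-empty set, that is the cue to run the gcloud update above.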
Step 2: Provision a single-cluster Gateway
Start with a global external Gateway. The Gateway declares listeners; it does not know about your apps. Reserve a static anycast IP first so the address survives Gateway recreation:
```bash
gcloud compute addresses create web-gw-ip \
    --global \
    --ip-version=IPV4
```
Now the Gateway. Bind it to the reserved address through spec.addresses, where a NamedAddress refers to the reserved IP by its name. Then open an HTTP listener that permits routes from any namespace (tighten this later):

```yaml
kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: external-http
  namespace: infra-gateways
spec:
  gatewayClassName: gke-l7-global-external-managed
  addresses:
  - type: NamedAddress
    value: web-gw-ip        # the address reserved above
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    allowedRoutes:
      namespaces:
        from: All
```
allowedRoutes.namespaces.from is the delegation control. All is permissive; the production pattern is Selector with a namespace label, so only sanctioned namespaces can bind:
```yaml
allowedRoutes:
  namespaces:
    from: Selector
    selector:
      matchLabels:
        gateway-access: "true"
```
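To make the three from values concrete, here is a small Python model of how a listener decides whether a route's namespace may bind. It sketches the upstream semantics as I understand them: Same is the default when allowedRoutes is omitted, and an empty label selector matches every namespace. route_may_attach is a hypothetical helper, and matchExpressions are intentionally not modelled.

```python
from __future__ import annotations


def route_may_attach(
    allowed_from: str,
    gateway_ns: str,
    route_ns: str,
    route_ns_labels: dict[str, str],
    match_labels: dict[str, str] | None = None,
) -> bool:
    """Decide whether an HTTPRoute in route_ns may bind to a listener.

    Models allowedRoutes.namespaces.from; only selector.matchLabels is
    handled (matchExpressions are left out of this sketch).
    """
    if allowed_from == "All":
        return True
    if allowed_from == "Same":  # the API default when allowedRoutes is omitted
        return route_ns == gateway_ns
    if allowed_from == "Selector":
        # Every selector label must be on the namespace with the same value;
        # an empty selector matches every namespace.
        wanted = match_labels or {}
        return all(route_ns_labels.get(k) == v for k, v in wanted.items())
    raise ValueError(f"unknown allowedRoutes.namespaces.from: {allowed_from!r}")


selector = {"gateway-access": "true"}
print(route_may_attach("Selector", "infra-gateways", "store",
                       {"gateway-access": "true"}, selector))
print(route_may_attach("Selector", "infra-gateways", "scratch", {}, selector))
```

In the real cluster the namespace side is just a label, for example kubectl label namespace store gateway-access=true.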
Apply it and the controller begins programming a Google Cloud load balancer. This takes a few minutes the first time — the controller is creating forwarding rules, a target proxy, a URL map, and backend services behind the scenes.
The Gateway is intentionally inert without routes. A Gateway with zero attached HTTPRoutes provisions the LB frontend but has no backends, so it returns 404 from the default backend. That is correct behavior, not a failure.
Step 3: HTTPRoute — header matching, traffic splitting, and request mirroring
The HTTPRoute is where application teams live. It references a parent Gateway and routes to Kubernetes Service backends. GKE uses container-native load balancing. On VPC-native clusters, which includes every Autopilot cluster, the controller exposes the Service's port through standalone NEGs and wires the Service's pods directly as backends, so there is no NodePort hop.
A route doing host matching, weighted splitting between two backends, and a header match for a canary cohort:
```yaml
kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: store
  namespace: store
spec:
  parentRefs:
  - name: external-http
    namespace: infra-gateways
  hostnames:
  - "store.example.com"
  rules:
  # 1. Internal cohort header -> canary, full weight
  - matches:
    - headers:
      - name: x-canary
        value: "true"
    backendRefs:
    - name: store-canary
      port: 8080
  # 2. Everyone else -> 95/5 weighted split
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: store-stable
      port: 8080
      weight: 95
    - name: store-canary
      port: 8080
      weight: 5
```
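Rule order is not what decides this. The Gateway API ranks competing matches by specificity: an Exact path beats a PathPrefix, a longer prefix beats a shorter one, and among equal paths the match with more header conditions wins. Both rules here match the / prefix, because a match with no path defaults to PathPrefix /. The rule carrying the x-canary header condition is therefore more specific, so internal traffic sending x-canary: true always lands on the canary regardless of the weighted split. Weights are relative, not percentages. Here weight: 95 and weight: 5 happen to sum to 100, but 30/10 would mean 75%/25%.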
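The precedence and weighting rules are easy to get wrong, so here is a small Python model of them. It follows my reading of the upstream HTTPRoute matching rules and is not GKE's implementation. It covers path, method, header, and query matches. It ignores hostname precedence and the cross-route tie-breakers (route age and name), and falls back to rule order only for exact ties. Match, pick_rule, and split_percentages are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fractions import Fraction
from typing import Optional


@dataclass
class Match:
    path_type: str = "PathPrefix"  # API default when a match omits path
    path_value: str = "/"
    method: Optional[str] = None
    headers: dict[str, str] = field(default_factory=dict)
    query: dict[str, str] = field(default_factory=dict)


def prefix_match(prefix: str, path: str) -> bool:
    """PathPrefix matches whole "/"-separated elements: /api matches /api/v1, not /apiv1."""
    prefix = prefix.rstrip("/")
    return prefix == "" or path == prefix or path.startswith(prefix + "/")


def matches(m: Match, path: str, method: str,
            headers: dict[str, str], query: dict[str, str]) -> bool:
    if m.path_type == "Exact" and path != m.path_value:
        return False
    if m.path_type == "PathPrefix" and not prefix_match(m.path_value, path):
        return False
    if m.method is not None and m.method != method:
        return False
    # Header names compare case-insensitively (headers arrive lowercased).
    if any(headers.get(k.lower()) != v for k, v in m.headers.items()):
        return False
    return all(query.get(k) == v for k, v in m.query.items())


def specificity(m: Match) -> tuple:
    """Bigger tuple = more specific, in the order the spec lists the criteria."""
    return (m.path_type == "Exact", len(m.path_value), m.method is not None,
            len(m.headers), len(m.query))


def pick_rule(rules: list[tuple[str, list[Match]]], path: str, method: str = "GET",
              headers: Optional[dict[str, str]] = None,
              query: Optional[dict[str, str]] = None) -> Optional[str]:
    """Name of the rule whose most specific matching entry wins; ties go to list order."""
    lowered = {k.lower(): v for k, v in (headers or {}).items()}
    best = None
    for idx, (name, rule_matches) in enumerate(rules):
        for m in rule_matches:
            if matches(m, path, method, lowered, query or {}):
                key = (specificity(m), -idx)
                if best is None or key > best[0]:
                    best = (key, name)
    return best[1] if best else None


def split_percentages(weights: dict[str, int]) -> dict[str, Fraction]:
    """Weights are relative: each backend gets weight / sum(weights) of the traffic."""
    total = sum(weights.values())
    if total == 0:
        return {name: Fraction(0) for name in weights}
    return {name: Fraction(100 * w, total) for name, w in weights.items()}


# The two rules from the store HTTPRoute, deliberately listed split-first.
rules = [
    ("weighted-split", [Match(path_type="PathPrefix", path_value="/")]),
    ("canary-header", [Match(headers={"x-canary": "true"})]),
]
print(pick_rule(rules, "/cart", headers={"X-Canary": "true"}))
print(pick_rule(rules, "/cart"))
print(split_percentages({"store-stable": 30, "store-canary": 10}))
```

Listing the header rule second on purpose shows that it still wins on specificity, and the 30/10 call reproduces the 75%/25% arithmetic from the text.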
Request mirroring (shadow traffic) is a filter, not a backend, so the mirror target receives a copy and its response is discarded. This is how you load-test a new version against real production traffic with zero user impact:
```yaml
rules:
- matches:
  - path:
      type: PathPrefix
      value: /api
  filters:
  - type: RequestMirror
    requestMirror:
      backendRef:              # receives a copy of each request; its response is discarded
        name: api-v2-shadow
        port: 8080
  backendRefs:
  - name: api-v1
    port: 8080
```
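The URLRewrite and RequestHeaderModifier filters cover path rewriting and header injection with validated fields. They replace controller-specific annotations such as ingress-nginx's rewrite-target, along with the custom request headers that GKE Ingress configured through BackendConfig.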
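Of those filters, ReplacePrefixMatch (urlRewrite.path.type: ReplacePrefixMatch) has the least obvious semantics, because it swaps only the part of the path that the PathPrefix matched. The Python sketch below reproduces that behavior as I read the upstream examples, with trailing slashes on either side normalized away. replace_prefix_match is a hypothetical helper, not part of any library.

```python
def replace_prefix_match(path: str, matched_prefix: str, replacement: str) -> str:
    """Rewrite path the way URLRewrite with ReplacePrefixMatch does.

    Assumes path already matched matched_prefix as a PathPrefix, i.e. on a
    "/" element boundary. Trailing slashes on the prefix and the replacement
    are ignored, so "/foo" and "/foo/" behave the same.
    """
    prefix = matched_prefix.rstrip("/")
    remainder = path[len(prefix):]           # either "" or something starting with "/"
    rewritten = replacement.rstrip("/") + remainder
    return rewritten or "/"                  # never emit an empty path


# A rule matching PathPrefix /store with replacePrefixMatch: /
for original in ("/store", "/store/", "/store/cart/42"):
    print(original, "->", replace_prefix_match(original, "/store", "/"))
```

Stripping a public prefix like /store before the request reaches a service that expects to live at / is the typical use.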
Step 4: Policy attachment — HealthCheckPolicy, GCPBackendPolicy, timeouts
This is where GKE’s Google-specific behavior lives, and it is the single biggest improvement over Ingress. Instead of stuffing health check and backend config into annotations, you attach policy objects to a target via a targetRef. Two policies carry most of the weight.
HealthCheckPolicy controls the load balancer health check. Without it, the controller falls back to a default check instead of deriving one from your readiness probe the way GKE Ingress could, and that default is often wrong for your app (wrong port, wrong path). Pin it explicitly:
```yaml
kind: HealthCheckPolicy
apiVersion: networking.gke.io/v1
metadata:
  name: store-stable-hc
  namespace: store
spec:
  default:
    config:
      type: HTTP
      httpHealthCheck:
        port: 8080
        requestPath: /healthz
    checkIntervalSec: 5
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 3
  targetRef:
    group: ""
    kind: Service
    name: store-stable
```
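The four timing fields interact, so it helps to do the arithmetic before picking values. The sketch below is a rough, single-prober Python model. Google Cloud actually probes from several probers, so real timing varies. In the model, an endpoint is marked unhealthy after unhealthyThreshold consecutive failed probes and healthy again after healthyThreshold successes. It also enforces the Compute Engine rule that a probe timeout may not exceed the probe interval. HealthCheckTiming and its methods are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class HealthCheckTiming:
    check_interval_sec: int
    timeout_sec: int
    healthy_threshold: int
    unhealthy_threshold: int

    def __post_init__(self) -> None:
        # Compute Engine rejects a probe timeout longer than the probe interval.
        if self.timeout_sec > self.check_interval_sec:
            raise ValueError("timeoutSec must not exceed checkIntervalSec")

    def worst_case_mark_unhealthy_sec(self) -> int:
        """Hung endpoint: failure starts right after a good probe, and each of the
        following probes only counts as failed once its timeout expires."""
        return self.unhealthy_threshold * self.check_interval_sec + self.timeout_sec

    def approx_mark_healthy_sec(self) -> int:
        """Recovered endpoint answering quickly: about one interval per success needed."""
        return self.healthy_threshold * self.check_interval_sec


# Values from store-stable-hc above.
hc = HealthCheckTiming(check_interval_sec=5, timeout_sec=5,
                       healthy_threshold=1, unhealthy_threshold=3)
print(hc.worst_case_mark_unhealthy_sec(), hc.approx_mark_healthy_sec())
```

With the policy's values, that is roughly 20 seconds to pull a hung pod out of rotation and about 5 seconds to readmit a recovered one.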
GCPBackendPolicy configures the backend service itself: timeouts, connection draining, session affinity, Cloud Armor, IAP, and Cloud CDN. Set a backend timeout and connection draining here — note this is the backend service timeout (how long the LB waits for a response), distinct from the route-level request timeout:
```yaml
kind: GCPBackendPolicy
apiVersion: networking.gke.io/v1
metadata:
  name: store-stable-backend
  namespace: store
spec:
  default:
    timeoutSec: 30
    connectionDraining:
      drainingTimeoutSec: 60
    sessionAffinity:
      type: CLIENT_IP
  targetRef:
    group: ""
    kind: Service
    name: store-stable
```
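Request-level timeouts live on the HTTPRoute rule itself and use the upstream Gateway API timeouts field (confirm your GKE version supports it). timeouts.request is the budget for the whole client request, including any retries the implementation performs, and timeouts.backendRequest bounds each individual call to a backend: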
```yaml
rules:
- matches:
  - path:
      type: PathPrefix
      value: /
  timeouts:
    request: "10s"
    backendRequest: "2s"
  backendRefs:
  - name: store-stable
    port: 8080
```
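Two details about those values are worth checking before you apply them. First, the strings use the Gateway API duration format. As I understand GEP-2257, that means one to four groups of up to five digits, each followed by h, m, s, or ms, so 1.5s is invalid but 1s500ms is fine. Second, backendRequest may not be longer than a non-zero request timeout. This Python sketch performs both checks offline. parse_duration_ms and check_route_timeouts are hypothetical helpers, and the API server's validation remains the authority.

```python
import re
from typing import Optional

_DURATION = re.compile(r"(?:\d{1,5}(?:ms|h|m|s)){1,4}")
_PART = re.compile(r"(\d{1,5})(ms|h|m|s)")  # "ms" listed first so it wins over "m"
_UNIT_MS = {"h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}


def parse_duration_ms(value: str) -> int:
    """Convert a Gateway API duration string such as "1m30s" to milliseconds."""
    if not _DURATION.fullmatch(value):
        raise ValueError(f"not a Gateway API duration: {value!r}")
    return sum(int(n) * _UNIT_MS[unit] for n, unit in _PART.findall(value))


def check_route_timeouts(request: Optional[str], backend_request: Optional[str]) -> None:
    """Raise if backendRequest exceeds request; "0s" for request disables that timeout."""
    req_ms = parse_duration_ms(request) if request is not None else None
    be_ms = parse_duration_ms(backend_request) if backend_request is not None else None
    if req_ms and be_ms is not None and be_ms > req_ms:
        raise ValueError("backendRequest timeout cannot be longer than request timeout")


check_route_timeouts("10s", "2s")   # the values from the rule above pass
print(parse_duration_ms("1m30s"))   # 90000
```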
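Both GCPBackendPolicy and HealthCheckPolicy attach to a Service. GCPBackendPolicy can also target a ServiceImport for multi-cluster backends, as Step 7 shows. GCPGatewayPolicy covers frontend-level concerns such as the SSL policy and attaches to a Gateway instead. Keep the target kinds straight. Attaching a backend policy to a Gateway is a common mistake and easy to miss, because the API server typically accepts the object and only the policy's status reveals that nothing took effect.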
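Because a wrongly targeted policy is easy to miss, a pre-apply lint can help. The Python sketch below checks policy manifests, given as dicts (for example json.loads of kubectl get -o json output, or the result of any YAML loader), against the target kinds described above. EXPECTED_TARGETS encodes this walkthrough's guidance and is an assumption to keep in sync with the GKE documentation. lint_policy is a hypothetical name.

```python
from __future__ import annotations

# Allowed (group, kind) targets per policy kind, following the text above.
EXPECTED_TARGETS = {
    "GCPBackendPolicy": {("", "Service"), ("net.gke.io", "ServiceImport")},
    "HealthCheckPolicy": {("", "Service")},
    "GCPGatewayPolicy": {("gateway.networking.k8s.io", "Gateway")},
}


def lint_policy(manifest: dict) -> list[str]:
    """Return human-readable problems; an empty list means the targetRef looks right."""
    kind = manifest.get("kind")
    if kind not in EXPECTED_TARGETS:
        return []  # not a policy kind this sketch knows about
    ref = manifest.get("spec", {}).get("targetRef", {})
    target = (ref.get("group", ""), ref.get("kind"))
    if target in EXPECTED_TARGETS[kind]:
        return []
    name = manifest.get("metadata", {}).get("name", "<unnamed>")
    allowed = ", ".join(sorted(k for _, k in EXPECTED_TARGETS[kind]))
    return [f"{kind}/{name}: targets {target[1]} (group {target[0]!r}); expected {allowed}"]


mistake = {
    "kind": "GCPBackendPolicy",
    "metadata": {"name": "store-stable-backend"},
    "spec": {"targetRef": {"group": "gateway.networking.k8s.io",
                           "kind": "Gateway", "name": "external-http"}},
}
print(lint_policy(mistake))
```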
Step 5: Securing Gateways with Cloud Armor, TLS, and certificate maps
TLS. Terminate TLS by adding an HTTPS listener and referencing a certificate. The cleanest approach on GKE is a Google-managed certificate via Certificate Manager certificate maps, referenced by annotation, which lets one Gateway serve many domains and decouples cert lifecycle from the Gateway:
```bash
# Certificate Manager: managed cert + map + map entry
gcloud certificate-manager certificates create store-cert \
    --domains="store.example.com"
gcloud certificate-manager maps create store-cert-map
gcloud certificate-manager maps entries create store-entry \
    --map=store-cert-map \
    --certificates=store-cert \
    --hostname=store.example.com
```
```yaml
kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: external-https
  namespace: infra-gateways
  annotations:
    networking.gke.io/certmap: store-cert-map
spec:
  gatewayClassName: gke-l7-global-external-managed
  addresses:
  - type: NamedAddress
    value: web-gw-ip
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    allowedRoutes:
      namespaces:
        from: Selector
        selector:
          matchLabels:
            gateway-access: "true"
```
When a certificate map is referenced via networking.gke.io/certmap, the load balancer sources its certificates from the map, so the listener carries no tls block or certificateRefs. For Secret-based certificates instead, drop the annotation and set tls.mode: Terminate with a certificateRefs entry pointing at a Kubernetes Secret.
Cloud Armor. A Cloud Armor security policy attaches through GCPBackendPolicy, putting WAF and rate limiting per backend service:
```yaml
kind: GCPBackendPolicy
apiVersion: networking.gke.io/v1
metadata:
  name: store-armor
  namespace: store
spec:
  default:
    securityPolicy: store-edge-policy  # name of an existing Cloud Armor policy
  targetRef:
    group: ""
    kind: Service
    name: store-stable
```
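Create the policy and its rules with gcloud compute security-policies as usual; the Gateway controller only references it by name. Because the policy attaches to the backend, different backends behind the same Gateway can carry different WAF postures. A public marketing backend and a partner API backend need not share a rate-limit rule. One caveat: store-armor targets the same Service as store-stable-backend, and it is safer to add securityPolicy to that existing policy than to rely on GKE merging two GCPBackendPolicy objects aimed at one Service.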
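The policies above include two GCPBackendPolicy objects, store-stable-backend and store-armor, that point at the same store-stable Service. In my understanding, policies of one kind aimed at one target are resolved as a conflict rather than merged field by field, so it is worth flagging duplicates before they reach the cluster. Here is a minimal Python sketch over parsed manifests. conflicting_policies and backend_policy are hypothetical helpers, and the second exists only to build test data.

```python
from __future__ import annotations

from collections import defaultdict


def conflicting_policies(policies: list[dict]) -> dict[tuple, list[str]]:
    """Group policies by (kind, namespace, target group, target kind, target name)
    and return only the groups holding more than one policy."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for p in policies:
        meta = p.get("metadata", {})
        ref = p.get("spec", {}).get("targetRef", {})
        key = (p.get("kind"), meta.get("namespace", "default"),
               ref.get("group", ""), ref.get("kind"), ref.get("name"))
        groups[key].append(meta.get("name", "<unnamed>"))
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


def backend_policy(name: str, service: str) -> dict:
    """Tiny constructor for the example below (test data only)."""
    return {"kind": "GCPBackendPolicy",
            "metadata": {"name": name, "namespace": "store"},
            "spec": {"targetRef": {"group": "", "kind": "Service", "name": service}}}


found = conflicting_policies([
    backend_policy("store-stable-backend", "store-stable"),
    backend_policy("store-armor", "store-stable"),
    backend_policy("store-canary-backend", "store-canary"),
])
for target, names in found.items():
    print(target, "->", names)
```

Folding securityPolicy into store-stable-backend leaves one policy per Service and clears the warning.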
Step 6: Multi-cluster Gateways with Fleet and the MC Gateway controller
A multi-cluster Gateway programs one Google Cloud load balancer whose backends span clusters in different regions, all registered to a Fleet. This is the native way to do active-active, geo-distributed serving on GKE without stitching together per-cluster Ingresses behind an external traffic manager.
The model has three pieces:
- A Fleet (GKE Hub) with member clusters registered.
- A designated config cluster that hosts the Gateway and HTTPRoute resources for the Fleet.
- The MultiClusterGateway and MultiClusterService controllers, enabled as Fleet features.
Enable the features and nominate a config cluster:
```bash
# Enable multi-cluster Services (the prerequisite) and the MC Gateway controller
gcloud container fleet multi-cluster-services enable
gcloud container fleet ingress enable \
    --config-membership=projects/PROJECT_ID/locations/us-central1/memberships/cluster-west
```
cluster-west is now the config cluster. Apply Gateway and HTTPRoute objects there only. The multi-cluster Gateway uses an -mc GatewayClass:
```yaml
kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: external-mc
  namespace: infra-gateways
spec:
  gatewayClassName: gke-l7-global-external-managed-mc
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    allowedRoutes:
      namespaces:
        from: All
```
Backends in a multi-cluster Gateway are not plain Services but ServiceImports. Each member cluster exports its Service with a ServiceExport, and the MCS controller synthesizes a Fleet-wide ServiceImport of the same name that the HTTPRoute targets. Export the same Service from every cluster:
```yaml
# Apply in EACH member cluster, in the workload's namespace
kind: ServiceExport
apiVersion: net.gke.io/v1
metadata:
  name: store
  namespace: store
```
The HTTPRoute on the config cluster references the derived ServiceImport as its backend group:
```yaml
kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: store-mc
  namespace: store
spec:
  parentRefs:
  - name: external-mc
    namespace: infra-gateways
  hostnames:
  - "store.example.com"
  rules:
  - backendRefs:
    - group: net.gke.io
      kind: ServiceImport
      name: store
      port: 8080
```
The single global LB now load-balances across pods in both clusters, with Google’s anycast steering each client to the nearest healthy region.
Step 7: Cross-cluster failover and capacity-based routing
The reason to use a multi-cluster Gateway over DNS-based failover is that the data plane behaves like one global ALB with proximity routing and automatic failover built in. Two behaviors do the heavy lifting:
Proximity-based routing and overflow. The global LB routes a client to the closest region with healthy capacity. If the nearest region is saturated or unhealthy, traffic overflows to the next region automatically. There is no DNS TTL to wait out, because failover happens inside the load balancer at the connection level, typically within seconds.
Capacity-based routing. Overflow is governed by backend capacity, which you set with GCPBackendPolicy using maxRatePerEndpoint (RATE balancing mode). Once a region’s endpoints hit their configured rate, the LB spills surplus to other regions before the local region degrades:
```yaml
kind: GCPBackendPolicy
apiVersion: networking.gke.io/v1
metadata:
  name: store-capacity
  namespace: store
spec:
  default:
    maxRatePerEndpoint: 100  # requests/sec per endpoint before overflow
  targetRef:
    group: net.gke.io
    kind: ServiceImport
    name: store
```
To drain a region for maintenance, scale its workload to zero or kubectl delete serviceexport store in that cluster; the MCS controller removes those endpoints from the global LB and all traffic shifts to the surviving cluster. Re-applying the ServiceExport brings it back into rotation.
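You can reason about the overflow behavior with simple arithmetic. A region's capacity is its ready endpoint count times maxRatePerEndpoint, and demand beyond the nearest region's capacity spills to the next-nearest. The Python sketch below is a deliberately simplified model of that idea, not Google's balancing algorithm, which also weighs utilization and health. The region sizes are hypothetical, and place_traffic is a hypothetical name.

```python
from __future__ import annotations


def place_traffic(demand_rps: float,
                  regions_by_proximity: list[tuple[str, int]],
                  max_rate_per_endpoint: float) -> tuple[dict[str, float], float]:
    """Fill the closest region up to capacity, then overflow to the next one.

    regions_by_proximity lists (region, ready endpoints), nearest first.
    Returns the rate sent to each region and the rate left over once every
    region is at its configured capacity.
    """
    placed: dict[str, float] = {}
    remaining = demand_rps
    for region, endpoints in regions_by_proximity:
        capacity = endpoints * max_rate_per_endpoint
        placed[region] = min(remaining, capacity)
        remaining -= placed[region]
    return placed, remaining


regions = [("us-central1", 4), ("us-east4", 4)]   # hypothetical sizes
print(place_traffic(600, regions, 100))            # nearest region full, rest overflows
drained = [("us-central1", 0), ("us-east4", 4)]    # ServiceExport removed in us-central1
print(place_traffic(600, drained, 100))
```

The second call is the drain case. With no endpoints in us-central1, everything lands on us-east4, and whatever exceeds its configured capacity shows up as the leftover. That leftover is the number to size against before a maintenance window.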
Verify
Programming an LB is asynchronous, so trust status, not the apply. Walk the chain from Gateway to route to policy.
```bash
# 1. Gateway: PROGRAMMED=True and an assigned address
kubectl get gateway external-http -n infra-gateways -o wide
kubectl describe gateway external-http -n infra-gateways
# Look in Status.Conditions for: Accepted=True, Programmed=True
# Status.Addresses holds the VIP once the LB is live

# 2. HTTPRoute: Accepted=True and ResolvedRefs=True per parent
kubectl describe httproute store -n store
# ResolvedRefs=False means a backendRef could not be resolved
# (missing Service, wrong port, or a cross-namespace ref without a ReferenceGrant)

# 3. Policies attached and accepted
kubectl describe healthcheckpolicy store-stable-hc -n store
kubectl describe gcpbackendpolicy store-stable-backend -n store

# 4. Multi-cluster: the derived ServiceImport exists on the config cluster
kubectl get serviceimport store -n store
```
There are two common stuck states. Programmed=False lingering past ~10 minutes points at an IAM, quota, or NEG problem in the resource graph; check the controller events in kubectl describe gateway. ResolvedRefs=False on the route means a backendRef could not be resolved: the Service is missing, the port does not match a Service port, or a cross-namespace reference lacks a ReferenceGrant. The condition's reason (for example BackendNotFound or RefNotPermitted) names the case. If references resolve but the backends never turn healthy, confirm the cluster is VPC-native so the Service's NEGs can be created. Send a live request once Programmed=True:
```bash
ADDR=$(kubectl get gateway external-http -n infra-gateways \
    -o jsonpath='{.status.addresses[0].value}')
curl -s -o /dev/null -w "%{http_code}\n" \
    -H "Host: store.example.com" "http://${ADDR}/"
# Header-matched canary path
curl -s -H "Host: store.example.com" -H "x-canary: true" "http://${ADDR}/"
```
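When the walk above needs to run unattended, for example as a CI gate after apply, the same checks can be scripted. Here is a minimal Python sketch that reads kubectl get -o json output and inspects the upstream status fields: status.conditions and status.addresses on the Gateway, and the per-parent status.parents[].conditions on the HTTPRoute. The function names and the polling defaults are hypothetical choices.

```python
from __future__ import annotations

import json
import subprocess
import time
from typing import Optional


def kubectl_json(*args: str) -> dict:
    """Run kubectl with -o json appended and parse the result."""
    out = subprocess.run(["kubectl", *args, "-o", "json"],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)


def condition(obj: dict, cond_type: str) -> Optional[str]:
    for c in obj.get("status", {}).get("conditions", []):
        if c.get("type") == cond_type:
            return c.get("status")
    return None


def gateway_ready(gateway: dict) -> tuple[bool, Optional[str]]:
    """(Accepted and Programmed are both "True", first assigned address or None)."""
    ok = all(condition(gateway, t) == "True" for t in ("Accepted", "Programmed"))
    addresses = gateway.get("status", {}).get("addresses", [])
    return ok, (addresses[0].get("value") if addresses else None)


def route_problems(route: dict) -> list[str]:
    """Accepted/ResolvedRefs conditions that are not "True", per parent Gateway."""
    problems = []
    for parent in route.get("status", {}).get("parents", []):
        gw = parent.get("parentRef", {}).get("name", "?")
        for c in parent.get("conditions", []):
            if c.get("type") in ("Accepted", "ResolvedRefs") and c.get("status") != "True":
                problems.append(f"{gw}: {c['type']}={c.get('status')} ({c.get('reason', '')})")
    return problems


def wait_for_gateway(name: str, namespace: str,
                     timeout_s: int = 900, interval_s: int = 15) -> str:
    """Poll until the Gateway is Programmed and has an address; return that address."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        ok, address = gateway_ready(kubectl_json("get", "gateway", name, "-n", namespace))
        if ok and address:
            return address
        time.sleep(interval_s)
    raise TimeoutError(f"gateway {namespace}/{name} not Programmed after {timeout_s}s")
```

For example, wait_for_gateway("external-http", "infra-gateways") returns the VIP to feed into the curl checks. The 15-minute default is an arbitrary allowance for first-time programming.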
Enterprise scenario
A payments platform team ran an active-passive setup: primary GKE cluster in us-central1, a warm standby in us-east4, failover orchestrated by flipping a Cloud DNS record. During a regional control-plane event they measured real failover at just under nine minutes — DNS TTL plus resolver caching plus client connection pools holding the dead IP. For a payments SLO that was a quarter’s worth of error budget burned in one incident.
The constraint that ruled out a naive fix: the standby cluster sat idle, doubling cost, and they could not simply run active-active because their existing two Ingresses produced two independent VIPs with no shared capacity awareness. They moved to a multi-cluster Gateway on a Fleet. Both clusters now export the payments Service, and a single gke-l7-global-external-managed-mc Gateway fronts both with one anycast VIP. They set per-endpoint capacity so each region carries steady-state load but absorbs the other's traffic on failure, and the LB overflows at the connection level rather than waiting on DNS.
```yaml
kind: GCPBackendPolicy
apiVersion: networking.gke.io/v1
metadata:
  name: payments-capacity
  namespace: payments
spec:
  default:
    maxRatePerEndpoint: 80
    timeoutSec: 15
    connectionDraining:
      drainingTimeoutSec: 60
  targetRef:
    group: net.gke.io
    kind: ServiceImport
    name: payments
```
The result: measured failover dropped from ~9 minutes to under 30 seconds in their next game day, the standby capacity now serves live traffic instead of sitting idle, and DNS was removed from the failover path entirely. The single behavioral change they had to socialize widely was that “the standby region” no longer existed as a concept — both regions were always live, which simplified on-call reasoning more than any runbook.
Migration playbook and debugging programming status
Migrating from Ingress is not a flag flip; run both in parallel and cut over by DNS. A pragmatic sequence:
- Inventory each Ingress and map its annotations to the new model:
  - kubernetes.io/ingress.global-static-ip-name -> spec.addresses with type: NamedAddress on the Gateway
  - BackendConfig (timeouts, Cloud Armor, IAP, CDN) -> GCPBackendPolicy
  - BackendConfig health check -> HealthCheckPolicy
  - FrontendConfig SSL policy / redirects -> GCPGatewayPolicy and a redirect filter
  - managed-cert annotation -> Certificate Manager cert map
- Stand up the Gateway and HTTPRoutes on a new reserved IP, alongside the live Ingress. Nothing is cut over yet.
- Validate against the new VIP directly with Host headers and synthetic checks. Confirm Programmed=True and exercise every route, including TLS and Cloud Armor.
- Shift DNS to the new VIP gradually (weighted records), monitor, then retire the Ingress and its old IP.
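The inventory step is mechanical enough to script. The Python sketch below turns a set of annotations into a migration checklist. The keys in INGRESS_TO_GATEWAY are the GKE Ingress and Service annotations I expect to find (kubernetes.io/ingress.class, kubernetes.io/ingress.global-static-ip-name, networking.gke.io/managed-certificates, networking.gke.io/v1beta1.FrontendConfig, and cloud.google.com/backend-config on Services). Treat the table as an assumption to extend for your estate. The static IP maps onto the Gateway's spec.addresses as a NamedAddress. Anything else under a GKE-looking prefix is flagged for manual review. migration_checklist is a hypothetical name.

```python
from __future__ import annotations

# Ingress/Service annotation -> Gateway API replacement (assumed; extend as needed).
INGRESS_TO_GATEWAY = {
    "kubernetes.io/ingress.class": "Gateway spec.gatewayClassName",
    "kubernetes.io/ingress.global-static-ip-name": "Gateway spec.addresses (type: NamedAddress)",
    "networking.gke.io/managed-certificates": "Certificate Manager cert map (networking.gke.io/certmap)",
    "networking.gke.io/v1beta1.FrontendConfig": "GCPGatewayPolicy (SSL policy) + RequestRedirect filter",
    "cloud.google.com/backend-config": "GCPBackendPolicy + HealthCheckPolicy targeting the Service",
}
REVIEW_PREFIXES = ("kubernetes.io/ingress", "ingress.gcp.kubernetes.io",
                   "networking.gke.io", "cloud.google.com")


def migration_checklist(annotations: dict[str, str]) -> list[str]:
    """One line per annotation that needs a Gateway API counterpart."""
    items = []
    for key, value in sorted(annotations.items()):
        target = INGRESS_TO_GATEWAY.get(key)
        if target:
            items.append(f"{key}={value} -> {target}")
        elif key.startswith(REVIEW_PREFIXES):
            items.append(f"{key}={value} -> no mapping in this table, review by hand")
    return items


ingress_annotations = {
    "kubernetes.io/ingress.class": "gce",
    "kubernetes.io/ingress.global-static-ip-name": "store-ip",
    "kubectl.kubernetes.io/last-applied-configuration": "{}",
}
for line in migration_checklist(ingress_annotations):
    print(line)
```

Unrelated bookkeeping annotations such as kubectl's last-applied-configuration are skipped, so the output stays a to-do list rather than a dump.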
For debugging, the conditions are the contract. Accepted means the controller understood the spec; Programmed means the Google Cloud LB is actually configured and serving. A Gateway stuck at Accepted=True, Programmed=False is almost always one of: insufficient quota (forwarding rules, backend services), missing IAM on the GKE service account, a NEG that never materialized because the cluster is not VPC-native, or a referenced Cloud Armor / cert-map resource that does not exist. Read the events:
```bash
kubectl get events -n infra-gateways --sort-by=.lastTimestamp | tail -20
kubectl describe gateway external-https -n infra-gateways
```
Cross-reference what the controller built against the Cloud Console load-balancing view — every Gateway maps to a forwarding rule, target proxy, URL map, and backend services you can inspect directly. When the Kubernetes status and the GCP resource graph disagree, the controller events name the missing piece.