GKE Gateway API: Single and Multi-Cluster Traffic Management

Ingress on GKE always felt like a compromise. A single object had to express routing, TLS, health checks, Cloud Armor, CDN, and backend timeouts, and it did so through kubernetes.io/ingress annotations that were undiscoverable, unvalidated, and impossible to delegate safely. The Gateway API replaces that one overloaded resource with a role-oriented set of typed objects: infrastructure teams own Gateway, application teams own HTTPRoute, and Google-specific behavior moves into first-class policy resources that attach to a target instead of hiding in an annotation string. On GKE this is not a thin shim over Ingress — the GKE Gateway controller provisions real Google Cloud load balancers (the same global and regional Envoy-based ALBs), and the multi-cluster variant programs a single load balancer across a Fleet. This walkthrough builds single-cluster and multi-cluster Gateways end to end, attaches the policies a platform team needs, and covers the migration and debugging realities.

Step 1: Gateway API vs Ingress — roles, resources, and GatewayClasses

The Gateway API splits the old monolith along ownership boundaries. Three resource kinds matter:

Resource	Owner	Role	Analogue in Ingress world
GatewayClass	Cloud provider	Defines an implementation (the controller + LB type)	`IngressClass`
Gateway	Platform / infra team	Listeners: ports, protocols, TLS, allowed routes	The frontend half of an Ingress
HTTPRoute	Application team	Host/path/header matching, splitting, filters	The rules half of an Ingress

The key design property is route delegation: a Gateway in an infra-owned namespace can permit HTTPRoute attachment only from labelled namespaces, so application teams attach routes without ever editing the shared frontend. That is the capability Ingress never had.

GKE ships several managed GatewayClasses. You pick one; you never create a GatewayClass yourself. The important ones:

GatewayClass	Scope	Load balancer provisioned
`gke-l7-global-external-managed`	Single cluster	Global external Application LB
`gke-l7-regional-external-managed`	Single cluster	Regional external Application LB
`gke-l7-rilb`	Single cluster	Regional internal Application LB
`gke-l7-gxlb-mc`	Fleet (multi-cluster)	Classic Application LB (global external)
`gke-l7-global-external-managed-mc`	Fleet (multi-cluster)	Global external Application LB
`gke-l7-rilb-mc`	Fleet (multi-cluster)	Regional internal Application LB

The -mc suffix is the multi-cluster signal. Those classes are owned by the MultiClusterGateway controller running against a Fleet host project, not by an individual cluster.

The Gateway API CRDs are bundled with GKE, but whether they are enabled by default depends on the cluster mode: Autopilot clusters on GKE 1.26 and later have the Gateway API enabled automatically, while Standard clusters generally need it turned on explicitly. Confirm the GatewayClasses and CRDs exist before doing anything else:

kubectl get gatewayclass
# Expect: gke-l7-global-external-managed, gke-l7-rilb, gke-l7-regional-external-managed, ...

kubectl api-resources --api-group=gateway.networking.k8s.io
# gateways, httproutes, grpcroutes, referencegrants, ...

If kubectl get gatewayclass returns nothing, enable the Gateway controller explicitly:

gcloud container clusters update CLUSTER_NAME \
  --location=us-central1 \
  --gateway-api=standard

Step 2: Provision a single-cluster Gateway

Start with a global external Gateway. The Gateway declares listeners; it does not know about your apps. Reserve a static anycast IP first so the address survives Gateway recreation:

gcloud compute addresses create web-gw-ip \
  --global \
  --ip-version=IPV4

Now the Gateway. Bind it to the reserved address through spec.addresses with a NamedAddress entry, and open an HTTP listener that permits routes from any namespace (tighten this later):

kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: external-http
  namespace: infra-gateways
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    allowedRoutes:
      namespaces:
        from: All
  addresses:
  - type: NamedAddress
    value: web-gw-ip

allowedRoutes.namespaces.from is the delegation control. All is permissive; the production pattern is Selector with a namespace label, so only sanctioned namespaces can bind:

    allowedRoutes:
      namespaces:
        from: Selector
        selector:
          matchLabels:
            gateway-access: "true"
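
For that selector to match, the application namespace has to carry the label. A minimal sketch, assuming the store namespace used later in this walkthrough is one of the sanctioned tenants:

```yaml
# Namespace label that satisfies the Gateway's allowedRoutes selector.
kind: Namespace
apiVersion: v1
metadata:
  name: store
  labels:
    gateway-access: "true"   # must match selector.matchLabels on the listener
```

Because the label lives on the Namespace object, whoever controls namespace creation and labelling effectively controls who may attach routes.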

Apply it and the controller begins programming a Google Cloud load balancer. This takes a few minutes the first time — the controller is creating forwarding rules, a target proxy, a URL map, and backend services behind the scenes.

The Gateway is intentionally inert without routes. A Gateway with zero attached HTTPRoutes provisions the LB frontend but has no backends, so it returns 404 from the default backend. That is correct behavior, not a failure.

Step 3: HTTPRoute — header matching, traffic splitting, and request mirroring

The HTTPRoute is where application teams live. It references a parent Gateway and routes to Kubernetes Service backends. On VPC-native clusters GKE creates standalone NEGs for the referenced Service (tracked through the Service’s cloud.google.com/neg annotation) and wires the Service’s pods directly into the load balancer as backends, so there is no NodePort hop.

A route doing host matching, weighted splitting between two backends, and a header match for a canary cohort:

kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: store
  namespace: store
spec:
  parentRefs:
  - name: external-http
    namespace: infra-gateways
  hostnames:
  - "store.example.com"
  rules:
  # 1. Internal cohort header -> canary, full weight
  - matches:
    - headers:
      - name: x-canary
        value: "true"
    backendRefs:
    - name: store-canary
      port: 8080
  # 2. Everyone else -> 95/5 weighted split
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: store-stable
      port: 8080
      weight: 95
    - name: store-canary
      port: 8080
      weight: 5

Precedence here comes from match specificity, not from the order in which the rules are written. Both rules match the same path (a rule with no path match defaults to the / prefix), so the rule that also matches a header is the more specific one, and internal traffic carrying x-canary: true always lands on the canary regardless of the weighted split. Weights are relative, not percentages: weight: 95 and weight: 5 happen to sum to 100 here, but 30/10 would mean 75%/25%.
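
As an illustration of relative weights, the fragment below uses 3 and 1 in place of 30 and 10. The split is the same 75%/25%, because only the ratio matters. The backend names are reused from the route above:

```yaml
    backendRefs:
    - name: store-stable
      port: 8080
      weight: 3      # 3 / (3 + 1) = 75% of matching requests
    - name: store-canary
      port: 8080
      weight: 1      # 1 / (3 + 1) = 25% of matching requests
```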

Request mirroring (shadow traffic) is a filter, not a backend, so the mirror target receives a copy and its response is discarded. This is how you exercise a new version against real production traffic without changing what users receive; the mirror still adds load to any dependencies it shares with production:

  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api
    filters:
    - type: RequestMirror
      requestMirror:
        backend:
          name: api-v2-shadow
          port: 8080
    backendRefs:
    - name: api-v1
      port: 8080

The URLRewrite and RequestHeaderModifier filters cover path rewriting and header injection, replacing the old rewrite-target and custom-header annotations with validated fields.
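
A sketch of both filters on one rule, using the upstream Gateway API field names. The /legacy prefix, the x-routed-by header, and the reuse of api-v1 as the backend are illustrative assumptions, and filter support differs between GatewayClasses, so confirm the route reports Accepted=True after applying it:

```yaml
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /legacy
    filters:
    # Strip the matched /legacy prefix before forwarding: /legacy/orders -> /orders
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /
    # Add or overwrite a request header on the way to the backend
    - type: RequestHeaderModifier
      requestHeaderModifier:
        set:
        - name: x-routed-by      # illustrative header name
          value: gateway
    backendRefs:
    - name: api-v1
      port: 8080
```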

Step 4: Policy attachment — HealthCheckPolicy, GCPBackendPolicy, timeouts

This is where GKE’s Google-specific behavior lives, and it is the single biggest improvement over Ingress. Instead of stuffing health check and backend config into annotations, you attach policy objects to a target via a targetRef. Two policies carry most of the weight.

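HealthCheckPolicy controls the load balancer health check. Unlike Ingress, the Gateway controller does not infer a check from your pod’s readiness probe; without a policy the load balancer falls back to a default check that often does not match the application (wrong port, wrong path). Pin it explicitly: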

kind: HealthCheckPolicy
apiVersion: networking.gke.io/v1
metadata:
  name: store-stable-hc
  namespace: store
spec:
  default:
    config:
      type: HTTP
      httpHealthCheck:
        port: 8080
        requestPath: /healthz
    checkIntervalSec: 5
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 3
  targetRef:
    group: ""
    kind: Service
    name: store-stable

GCPBackendPolicy configures the backend service itself: timeouts, connection draining, session affinity, Cloud Armor, IAP, and Cloud CDN. Set a backend timeout and connection draining here — note this is the backend service timeout (how long the LB waits for a response), distinct from the route-level request timeout:

kind: GCPBackendPolicy
apiVersion: networking.gke.io/v1
metadata:
  name: store-stable-backend
  namespace: store
spec:
  default:
    timeoutSec: 30
    connectionDraining:
      drainingTimeoutSec: 60
    sessionAffinity:
      type: CLIENT_IP
  targetRef:
    group: ""
    kind: Service
    name: store-stable

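Request-level timeouts live on the HTTPRoute rule directly, using the upstream Gateway API timeouts field. request is the time budget for the whole request, including any retries; backendRequest bounds a single attempt against the backend and must not exceed request. Support for this field depends on the implementation and GatewayClass, so confirm the route is still Accepted after adding it: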

  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    timeouts:
      request: "10s"
      backendRequest: "2s"
    backendRefs:
    - name: store-stable
      port: 8080

Both GCPBackendPolicy and HealthCheckPolicy attach to a Service (or to a ServiceImport in the multi-cluster case). There is also GCPGatewayPolicy for frontend-level concerns (like SSL policy), which attaches to a Gateway. Keep the target kinds straight: attaching a backend policy to a Gateway is a common and silent mistake.
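
For contrast, a frontend policy targets the Gateway itself. A minimal sketch, assuming a Compute Engine SSL policy named store-ssl-policy (a hypothetical name) already exists in the project and that the target is the external-https Gateway defined in Step 5:

```yaml
kind: GCPGatewayPolicy
apiVersion: networking.gke.io/v1
metadata:
  name: external-https-ssl
  namespace: infra-gateways      # same namespace as the target Gateway
spec:
  default:
    sslPolicy: store-ssl-policy  # hypothetical; name of an existing SSL policy
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: external-https
```

Note the targetRef group: Gateway lives in gateway.networking.k8s.io, whereas a core Service uses the empty group.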

Step 5: Securing Gateways with Cloud Armor, TLS, and certificate maps

TLS. Terminate TLS by adding an HTTPS listener and referencing a certificate. The cleanest approach on GKE is a Google-managed certificate via Certificate Manager certificate maps, referenced by annotation, which lets one Gateway serve many domains and decouples cert lifecycle from the Gateway:

# Certificate Manager: managed cert + map + map entry
gcloud certificate-manager certificates create store-cert \
  --domains="store.example.com"

gcloud certificate-manager maps create store-cert-map

gcloud certificate-manager maps entries create store-entry \
  --map=store-cert-map \
  --certificates=store-cert \
  --hostname=store.example.com

kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: external-https
  namespace: infra-gateways
  annotations:
    networking.gke.io/certmap: store-cert-map
spec:
  gatewayClassName: gke-l7-global-external-managed
  addresses:
  - type: NamedAddress
    value: web-gw-ip
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    allowedRoutes:
      namespaces:
        from: Selector
        selector:
          matchLabels:
            gateway-access: "true"

When a certificate map is referenced via networking.gke.io/certmap, the LB sources its certs from the map, so you omit tls.certificateRefs from the listener — the annotation wins. For Secret-based certs instead, drop the annotation and set tls.mode: Terminate with a certificateRefs entry pointing at a Kubernetes Secret.
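
The Secret-based variant might look like the listener below. The Secret name store-example-com-tls is a placeholder for a kubernetes.io/tls Secret in the same namespace as the Gateway:

```yaml
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: store-example-com-tls   # placeholder; a kubernetes.io/tls Secret in infra-gateways
    allowedRoutes:
      namespaces:
        from: Selector
        selector:
          matchLabels:
            gateway-access: "true"
```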

Cloud Armor. A Cloud Armor security policy attaches through GCPBackendPolicy, putting WAF and rate limiting per backend service:

kind: GCPBackendPolicy
apiVersion: networking.gke.io/v1
metadata:
  name: store-armor
  namespace: store
spec:
  default:
    securityPolicy: store-edge-policy   # name of an existing Cloud Armor policy
  targetRef:
    group: ""
    kind: Service
    name: store-stable

Create the policy and its rules with gcloud compute security-policies as usual; the Gateway controller only references it by name. Because it attaches to the backend, different backends behind the same Gateway can carry different WAF postures — a public marketing backend and a partner API backend need not share a rate-limit rule.

Step 6: Multi-cluster Gateways with Fleet and the MC Gateway controller

A multi-cluster Gateway programs one Google Cloud load balancer whose backends span clusters in different regions, all registered to a Fleet. This is the native way to do active-active, geo-distributed serving on GKE without stitching together per-cluster Ingresses behind an external traffic manager.

The model has three pieces:

A Fleet (GKE Hub) with member clusters registered.
A designated config cluster that hosts the Gateway and HTTPRoute resources for the Fleet.
The MultiClusterGateway and MultiClusterService controllers, enabled as Fleet features.

Enable the features and nominate a config cluster:

# Enable multi-cluster Services (the prerequisite) and the MC Gateway controller
gcloud container fleet multi-cluster-services enable

gcloud container fleet ingress enable \
  --config-membership=projects/PROJECT_ID/locations/us-central1/memberships/cluster-west

cluster-west is now the config cluster. Apply Gateway and HTTPRoute objects there only. The multi-cluster Gateway uses an -mc GatewayClass:

kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: external-mc
  namespace: infra-gateways
spec:
  gatewayClassName: gke-l7-global-external-managed-mc
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    allowedRoutes:
      namespaces:
        from: All

Backends in a multi-cluster Gateway are not plain Services. The HTTPRoute targets a ServiceImport, which is derived from ServiceExport objects: each member cluster exports its Service, and the MCS controller synthesizes a Fleet-wide ServiceImport from those exports. Export the same Service from every cluster:

# Apply in EACH member cluster, in the workload's namespace
kind: ServiceExport
apiVersion: net.gke.io/v1
metadata:
  name: store
  namespace: store
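
A ServiceExport carries no spec of its own; it refers to the Service with the same name in the same namespace. A sketch of the Service each member cluster would need alongside the export, assuming the pods are labelled app: store (the label is an assumption, not something defined earlier):

```yaml
# Apply in EACH member cluster, next to the ServiceExport
kind: Service
apiVersion: v1
metadata:
  name: store          # must match the ServiceExport name
  namespace: store     # must match the ServiceExport namespace
spec:
  selector:
    app: store         # assumed pod label
  ports:
  - port: 8080         # the port the HTTPRoute backendRef targets
    targetPort: 8080
```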

The HTTPRoute on the config cluster references the derived ServiceImport as its backend group:

kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: store-mc
  namespace: store
spec:
  parentRefs:
  - name: external-mc
    namespace: infra-gateways
  hostnames:
  - "store.example.com"
  rules:
  - backendRefs:
    - group: net.gke.io
      kind: ServiceImport
      name: store
      port: 8080

The single global LB now load-balances across pods in every exporting cluster, with Google’s anycast steering each client to the nearest healthy region.

Step 7: Cross-cluster failover and capacity-based routing

The reason to use a multi-cluster Gateway over DNS-based failover is that the data plane behaves like one global ALB, with proximity routing and automatic failover built in. Two behaviors do the heavy lifting:

Proximity-based routing and overflow. The global LB routes a client to the closest region with healthy capacity. If the nearest region is saturated or unhealthy, traffic overflows to the next region automatically. There is no DNS TTL to wait out; failover happens at the connection level, within seconds.

Capacity-based routing. Overflow is governed by backend capacity, which you set with GCPBackendPolicy using maxRatePerEndpoint (RATE balancing mode). Once a region’s endpoints hit their configured rate, the LB spills surplus to other regions before the local region degrades:

kind: GCPBackendPolicy
apiVersion: networking.gke.io/v1
metadata:
  name: store-capacity
  namespace: store
spec:
  default:
    maxRatePerEndpoint: 100    # requests/sec per endpoint before overflow
  targetRef:
    group: net.gke.io
    kind: ServiceImport
    name: store

To drain a region for maintenance, scale its workload to zero or run kubectl delete serviceexport store -n store in that cluster; the MCS controller removes those endpoints from the global LB and traffic shifts to the remaining clusters. Re-applying the ServiceExport brings the region back into rotation.

Verify

Programming an LB is asynchronous, so trust status, not the apply. Walk the chain from Gateway to route to policy.

# 1. Gateway: PROGRAMMED=True and an assigned address
kubectl get gateway external-http -n infra-gateways -o wide
kubectl describe gateway external-http -n infra-gateways
# Look in Status.Conditions for: Accepted=True, Programmed=True
# Status.Addresses holds the VIP once the LB is live

# 2. HTTPRoute: Accepted=True and ResolvedRefs=True per parent
kubectl describe httproute store -n store
# ResolvedRefs=False usually means a backend Service or its NEG is missing

# 3. Policies attached and accepted
kubectl describe healthcheckpolicy store-stable-hc -n store
kubectl describe gcpbackendpolicy store-stable-backend -n store

# 4. Multi-cluster: the derived ServiceImport exists on the config cluster
kubectl get serviceimport store -n store

The two most common stuck states: Programmed=False lingering past ~10 minutes points at an IAM, quota, or NEG problem in the resource graph (check the controller events in kubectl describe gateway), and ResolvedRefs=False on the route means a backendRef could not be resolved. Check that the Service name, namespace, and port in the backendRef are correct and that any cross-namespace reference is covered by a ReferenceGrant; if the Service exists, also confirm the cluster is VPC-native so its NEG can be created. Send a live request once Programmed=True:

ADDR=$(kubectl get gateway external-http -n infra-gateways \
  -o jsonpath='{.status.addresses[0].value}')

curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Host: store.example.com" "http://${ADDR}/"

# Header-matched canary path
curl -s -H "Host: store.example.com" -H "x-canary: true" "http://${ADDR}/"

Enterprise scenario

A payments platform team ran an active-passive setup: primary GKE cluster in us-central1, a warm standby in us-east4, failover orchestrated by flipping a Cloud DNS record. During a regional control-plane event they measured real failover at just under nine minutes — DNS TTL plus resolver caching plus client connection pools holding the dead IP. For a payments SLO that was a quarter’s worth of error budget burned in one incident.

The constraint that ruled out a naive fix: the standby cluster sat idle, doubling cost, and they could not simply run active-active because their existing two Ingresses produced two independent VIPs with no shared capacity awareness. They moved to a multi-cluster Gateway on a Fleet. Both clusters now export the payments Service; a single gke-l7-global-external-managed-mc Gateway fronts both with one anycast VIP. They set per-endpoint capacity so each region carries steady-state load but absorbs the other’s traffic on failure, and the LB overflows at the connection level rather than waiting on DNS.

kind: GCPBackendPolicy
apiVersion: networking.gke.io/v1
metadata:
  name: payments-capacity
  namespace: payments
spec:
  default:
    maxRatePerEndpoint: 80
    timeoutSec: 15
    connectionDraining:
      drainingTimeoutSec: 60
  targetRef:
    group: net.gke.io
    kind: ServiceImport
    name: payments

The result: measured failover dropped from ~9 minutes to under 30 seconds in their next game day, the standby capacity now serves live traffic instead of sitting idle, and DNS was removed from the failover path entirely. The single behavioral change they had to socialize widely was that “the standby region” no longer existed as a concept — both regions were always live, which simplified on-call reasoning more than any runbook.

Migration playbook and debugging programming status

Migrating from Ingress is not a flag flip; run both in parallel and cut over by DNS. A pragmatic sequence:

Inventory each Ingress and map its annotations to the new model: kubernetes.io/ingress.global-static-ip-name -> spec.addresses on the Gateway with a NamedAddress entry; BackendConfig (timeouts, Cloud Armor, IAP, CDN) -> GCPBackendPolicy; BackendConfig health check -> HealthCheckPolicy; FrontendConfig SSL policy/redirects -> GCPGatewayPolicy and a redirect filter; managed-cert annotation -> Certificate Manager cert map.
Stand up the Gateway and HTTPRoutes on a new reserved IP, alongside the live Ingress. Nothing is cut over yet.
Validate against the new VIP directly with Host headers and synthetic checks; confirm Programmed=True and exercise every route including TLS and Cloud Armor.
Shift DNS to the new VIP gradually (weighted records), monitor, then retire the Ingress and its old IP.
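
The redirect filter mentioned in the first step replaces the FrontendConfig HTTP-to-HTTPS redirect. A sketch using the upstream RequestRedirect filter, attached to the HTTP listener of the external-http Gateway from Step 2; the route name is illustrative:

```yaml
kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: http-to-https-redirect   # illustrative name
  namespace: infra-gateways
spec:
  parentRefs:
  - name: external-http
    sectionName: http            # bind only to the port-80 listener
  rules:
  - filters:
    - type: RequestRedirect
      requestRedirect:
        scheme: https
        statusCode: 301
```

If the listener restricts attachment with a namespace Selector, the infra-gateways namespace needs the matching label for this route to be accepted.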

For debugging, the conditions are the contract. Accepted means the controller understood the spec; Programmed means the Google Cloud LB is actually configured and serving. A Gateway stuck at Accepted=True, Programmed=False is almost always one of: insufficient quota (forwarding rules, backend services), missing IAM on the GKE service account, a NEG that never materialized because the cluster is not VPC-native, or a referenced Cloud Armor / cert-map resource that does not exist. Read the events:

kubectl get events -n infra-gateways --sort-by=.lastTimestamp | tail -20
kubectl describe gateway external-https -n infra-gateways

Cross-reference what the controller built against the Cloud Console load-balancing view — every Gateway maps to a forwarding rule, target proxy, URL map, and backend services you can inspect directly. When the Kubernetes status and the GCP resource graph disagree, the controller events name the missing piece.

Written by Vinod
