Engineering the Global External Application Load Balancer on GCP

The global external Application Load Balancer (ALB) is the front door for most internet-facing workloads on GCP, and it is deceptively deep. It is not one resource but a graph: a global anycast IP fronts a target proxy, which references a URL map, which routes to backend services, which point at backends spread across regions. Each layer has tuning knobs that quietly determine your tail latency, your blast radius during a regional outage, and your bill. This walkthrough builds one end to end against the Envoy-based global ALB, then layers on the controls a platform team needs in production: balancing modes, header-based routing, hybrid NEGs, Cloud CDN, Cloud Armor at the edge, mTLS, and the observability to debug it at 2 a.m.

Throughout, "ALB" means the global external Application Load Balancer, the modern successor to the classic HTTP(S) load balancer (now called the classic Application Load Balancer): global anycast, Envoy data plane, advanced traffic management. Its load balancing components are global resources, which is why --global appears on almost every command below.

Step 1: Understand the resource graph

Before any gcloud, internalize the chain. A request flows through five distinct resources:

Resource	Role	Scope
Forwarding rule	Binds the global anycast IP + port to a target proxy	Global
Target proxy	Terminates TLS (HTTPS proxy) or plain HTTP; references the URL map	Global
URL map	Routes by host/path/header/query to backend services	Global
Backend service	Health checks, balancing mode, CDN, Cloud Armor, timeouts	Global
Backend (NEG/MIG)	The actual endpoints: instance groups, serverless, hybrid, internet	Regional or zonal

Two rules save hours of confusion. First, the global ALB uses global backend services and forwarding rules; create a regional forwarding rule by accident and you get the regional ALB, a different product with no global anycast. Second, Cloud CDN and Cloud Armor attach to the backend service, not the proxy, so caching and WAF policy are per-backend, not per-frontend.

# Reserve the global anycast IP first; everything else references it.
gcloud compute addresses create web-ip \
  --ip-version=IPV4 \
  --network-tier=PREMIUM \
  --global

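Premium tier is mandatory for the global ALB; Standard tier only supports regional load balancing. On the load balancing resources below (address, health check, backend service, URL map, proxy, forwarding rule), the --global flag is the tell that you are on the right product. Zonal hybrid NEGs and Certificate Manager resources are the exceptions.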

Step 2: Build and tune backend services

The backend service is where most production decisions live. Start with a health check and a service, then tune the balancing behavior.

gcloud compute health-checks create http web-hc \
  --port=8080 \
  --request-path="/healthz" \
  --check-interval=5s \
  --timeout=5s \
  --healthy-threshold=2 \
  --unhealthy-threshold=3 \
  --global

gcloud compute backend-services create web-bes \
  --protocol=HTTP \
  --port-name=http \
  --health-checks=web-hc \
  --global \
  --load-balancing-scheme=EXTERNAL_MANAGED \
  --timeout=30s

EXTERNAL_MANAGED is the load balancing scheme for the Envoy-based global ALB. The older EXTERNAL scheme selects the classic Application Load Balancer, which lacks much of the advanced traffic management used later in this walkthrough, such as weighted traffic splitting and outlier detection. Get the scheme wrong and those features will be rejected or unavailable, and you will not understand why.
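A missed scheme or an accidentally regional resource fails in confusing ways, so it can pay to lint the resource chain before applying it. This is a minimal sketch, not a gcloud feature: it assumes you have gathered the resources into a list of dicts (for example from describe --format=json output) and reads only simplified kind, name, region, and loadBalancingScheme keys. The inventory itself is hypothetical.

```python
"""Lint a hypothetical LB inventory for global ALB consistency (sketch)."""
from __future__ import annotations

from typing import Any

# Resource kinds that must be global for the global external ALB.
GLOBAL_KINDS = {"forwardingRule", "targetHttpsProxy", "urlMap", "backendService"}
# Resource kinds that carry a load balancing scheme.
SCHEMED_KINDS = {"forwardingRule", "backendService"}


def lint_inventory(resources: list[dict[str, Any]]) -> list[str]:
    """Return human-readable problems; an empty list means the chain looks consistent."""
    problems: list[str] = []
    for res in resources:
        kind, name = res["kind"], res["name"]
        # Regional resources carry a "region" field; global ones do not.
        if kind in GLOBAL_KINDS and res.get("region"):
            problems.append(f"{kind} {name} is regional; the global ALB needs global resources")
        scheme = res.get("loadBalancingScheme")
        if kind in SCHEMED_KINDS and scheme != "EXTERNAL_MANAGED":
            problems.append(f"{kind} {name} uses scheme {scheme!r}, not EXTERNAL_MANAGED")
    return problems


# Hypothetical inventory with two deliberate mistakes.
inventory = [
    {"kind": "forwardingRule", "name": "web-https-fr", "loadBalancingScheme": "EXTERNAL_MANAGED"},
    {"kind": "backendService", "name": "web-bes", "loadBalancingScheme": "EXTERNAL"},
    {"kind": "urlMap", "name": "web-urlmap", "region": "us-central1"},
]
for problem in lint_inventory(inventory):
    print(problem)
```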

Balancing mode and capacity scaler

When you add a backend, the balancing mode decides how the ALB measures whether a backend is “full.” For an ALB the choice is usually RATE (requests per second) or UTILIZATION (backend CPU). RATE is more predictable for stateless web tiers because it does not depend on noisy CPU signals.

gcloud compute backend-services add-backend web-bes \
  --instance-group=web-mig-usc1 \
  --instance-group-region=us-central1 \
  --balancing-mode=RATE \
  --max-rate-per-instance=200 \
  --capacity-scaler=1.0 \
  --global

# Repeat add-backend per region (e.g. europe-west1) for global spread.

The global ALB prefers the region closest to the user, then spills to the next-closest region once a region hits its configured capacity. The capacity scaler (0.0, or anywhere from 0.1 to 1.0) is your pressure-relief valve: it scales the effective max-rate down without redeploying. Setting a backend’s scaler to 0.0 drains it gracefully, so new connections stop arriving while existing ones finish; that is how you cordon a region for maintenance. Leave the others at 1.0 and the ALB rebalances automatically.

Capacity scaler is the single most useful operational knob on the ALB. Wire update-backend --capacity-scaler=0.0 into your regional-drain runbook; it is far gentler than deleting a backend and faster than scaling a MIG to zero.
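To see how spillover and the scaler interact, here is a deliberately simplified model: it fills the nearest region up to max-rate x instances x scaler and spills the rest onward, whereas the real ALB also accounts for health, zones, and continuously measured load. The region sizes and demand figures are hypothetical.

```python
"""Simplified model of RATE-mode spillover with capacity scalers (sketch)."""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RegionBackend:
    region: str
    instances: int
    max_rate_per_instance: float  # mirrors --max-rate-per-instance
    capacity_scaler: float = 1.0  # mirrors --capacity-scaler; 0.0 drains the backend

    @property
    def capacity_rps(self) -> float:
        # Effective capacity = per-instance target rate x instance count x scaler.
        return self.instances * self.max_rate_per_instance * self.capacity_scaler


def place_traffic(demand_rps: float, by_proximity: list[RegionBackend]) -> dict[str, float]:
    """Fill the closest region up to capacity, then spill the remainder to the next one."""
    placed: dict[str, float] = {}
    remaining = demand_rps
    for backend in by_proximity:
        take = min(remaining, backend.capacity_rps)
        placed[backend.region] = take
        remaining -= take
    # Demand beyond the total configured capacity; this model only reports it.
    placed["overflow"] = remaining
    return placed


usc1 = RegionBackend("us-central1", instances=10, max_rate_per_instance=200)
euw1 = RegionBackend("europe-west1", instances=20, max_rate_per_instance=200)

# 3,000 rps from users nearest us-central1: 2,000 stay local, 1,000 spill to Europe.
print(place_traffic(3000, [usc1, euw1]))

# Cordon us-central1 for maintenance: everything now lands in europe-west1.
usc1.capacity_scaler = 0.0
print(place_traffic(3000, [usc1, euw1]))
```

The second call is the runbook drain in miniature: one number changes and traffic moves, with no redeploy.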

Outlier detection and connection draining

Health checks catch a dead instance; outlier detection catches a sick one — an endpoint returning 5xx or gateway errors while still passing health checks. It ejects the bad endpoint from the load balancing pool for a cooldown, much like a circuit breaker.

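# Draining and outlier detection both live on the backend service resource.
gcloud compute backend-services export web-bes --global --destination=web-bes.yaml
# Edit web-bes.yaml to add or update:
#   connectionDraining: {drainingTimeoutSec: 60}
#   outlierDetection:
#     consecutiveErrors: 5
#     enforcingConsecutiveErrors: 100   # percent chance of ejecting once tripped
#     interval: {seconds: 10}
#     baseEjectionTime: {seconds: 30}
#     maxEjectionPercent: 50
gcloud compute backend-services import web-bes --global --source=web-bes.yaml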

Connection draining gives in-flight requests up to the timeout to finish when an endpoint is removed or a MIG scales in, so rolling deploys do not sever live requests. Outlier detection plus draining is the difference between one bad pod causing a brief blip and one causing a sustained error rate.
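The mechanics of consecutive-error ejection can be sketched as follows. This is a simplified model under stated assumptions, not GCP’s or Envoy’s implementation: it ejects as soon as an endpoint reaches the error threshold, multiplies the ejection time by the number of times that endpoint has been ejected, treats max-ejection-percent as a strict cap, and runs on a logical clock. The defaults mirror the values configured above (5 errors, 30s, 50%).

```python
"""Simplified consecutive-errors outlier detection (sketch, not GCP's implementation)."""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    consecutive_errors: int = 0
    times_ejected: int = 0
    ejected_until: float = 0.0  # seconds on a logical clock

    def is_ejected(self, now: float) -> bool:
        return now < self.ejected_until


class OutlierDetector:
    def __init__(self, endpoints: list[Endpoint], consecutive_errors: int = 5,
                 base_ejection_time: float = 30.0, max_ejection_percent: int = 50):
        self.endpoints = {e.name: e for e in endpoints}
        self.threshold = consecutive_errors
        self.base_ejection_time = base_ejection_time
        self.max_ejection_percent = max_ejection_percent

    def record(self, name: str, status: int, now: float) -> None:
        """Feed one response status for an endpoint into the detector."""
        ep = self.endpoints[name]
        if status < 500:
            ep.consecutive_errors = 0  # any non-5xx response resets the streak
            return
        ep.consecutive_errors += 1
        if ep.consecutive_errors >= self.threshold and self._can_eject(now):
            ep.times_ejected += 1
            # Each repeat ejection of the same endpoint lasts longer.
            ep.ejected_until = now + self.base_ejection_time * ep.times_ejected
            ep.consecutive_errors = 0

    def _can_eject(self, now: float) -> bool:
        # Strict cap: never leave more than max_ejection_percent of the pool ejected.
        ejected = sum(e.is_ejected(now) for e in self.endpoints.values())
        return (ejected + 1) * 100 <= self.max_ejection_percent * len(self.endpoints)

    def healthy(self, now: float) -> list[str]:
        return [n for n, e in self.endpoints.items() if not e.is_ejected(now)]


pool = OutlierDetector([Endpoint("pod-a"), Endpoint("pod-b")])
for t in range(5):
    pool.record("pod-a", 503, now=float(t))  # pod-a fails while passing health checks
    pool.record("pod-b", 200, now=float(t))
print(pool.healthy(now=5.0))   # ['pod-b']: pod-a is ejected for 30s
print(pool.healthy(now=40.0))  # ['pod-a', 'pod-b']: cooldown over
```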

Step 3: Route with URL maps

The URL map is the routing brain. The simplest form sends everything to one backend; the interesting form routes by host, path, header, and query, and splits traffic for canaries.

gcloud compute url-maps create web-urlmap --default-service=web-bes --global

For anything beyond a default service, export the URL map to YAML, edit, and re-import. This is the only sane way to manage host rules, path matchers, and header routing, and it is reviewable in git.

gcloud compute url-maps export web-urlmap --global --destination=urlmap.yaml

A URL map that routes /api to an API backend, header-routes a beta cohort, and canaries 5% of root traffic to a new backend looks like this:

defaultService: global/backendServices/web-bes
hostRules:
  - hosts: ["app.example.com"]
    pathMatcher: app-matcher
pathMatchers:
  - name: app-matcher
    defaultService: global/backendServices/web-bes
    routeRules:
      # Header-based routing: beta cohort to a separate backend.
      - priority: 10
        matchRules:
          - prefixMatch: "/"
            headerMatches:
              - headerName: "X-Canary"
                exactMatch: "beta"
        service: global/backendServices/web-beta-bes
      # Path-based routing for the API tier.
      - priority: 20
        matchRules:
          - prefixMatch: "/api/"
        service: global/backendServices/api-bes
      # Weighted traffic split: 5% canary on the root path.
      - priority: 30
        matchRules:
          - prefixMatch: "/"
        routeAction:
          weightedBackendServices:
            - backendService: global/backendServices/web-bes
              weight: 95
            - backendService: global/backendServices/web-canary-bes
              weight: 5

routeRules are evaluated by ascending priority, and the first match wins — order is semantic, not cosmetic. Put specific matches (headers, exact paths) at low priority numbers and catch-alls last. weightedBackendServices is true L7 traffic splitting at the edge: no client changes, no DNS tricks, just shift the weights to ramp a canary and watch your SLOs. You can match on queryParameterMatches the same way, and add routeAction.urlRewrite to rewrite host or path before the request reaches the backend.
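Because first-match-by-priority is easy to misread, a small model of the evaluation order helps when reviewing a URL map change. This sketch handles only prefixMatch, exact headerMatches, service, and weightedBackendServices; everything else in the real matcher (full path matches, regex, query parameters, rewrites) is left out, and the weighted pick uses Python’s random.choices as a stand-in for the ALB’s own selection.

```python
"""Model of first-match routeRules evaluation plus weighted splitting (sketch)."""
from __future__ import annotations

import random
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RouteRule:
    priority: int
    prefix: str  # matchRules[].prefixMatch
    header_exact: dict[str, str] = field(default_factory=dict)  # headerMatches exactMatch
    service: str | None = None  # service
    weighted: list[tuple[str, int]] = field(default_factory=list)  # weightedBackendServices


def route(rules: list[RouteRule], path: str, headers: dict[str, str],
          default_service: str, rng=random) -> str:
    """Return the backend service a request would reach under this simplified model."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    # Lowest priority number is evaluated first; the first full match wins.
    for rule in sorted(rules, key=lambda r: r.priority):
        if not path.startswith(rule.prefix):
            continue
        if any(lowered.get(k.lower()) != v for k, v in rule.header_exact.items()):
            continue
        if rule.weighted:
            services, weights = zip(*rule.weighted)
            return rng.choices(services, weights=weights, k=1)[0]
        return rule.service
    return default_service


# The three route rules from the YAML above.
RULES = [
    RouteRule(10, "/", {"X-Canary": "beta"}, service="web-beta-bes"),
    RouteRule(20, "/api/", service="api-bes"),
    RouteRule(30, "/", weighted=[("web-bes", 95), ("web-canary-bes", 5)]),
]

rng = random.Random(7)  # seeded so the split is reproducible
print(route(RULES, "/api/orders", {}, "web-bes", rng))                    # api-bes
print(route(RULES, "/api/orders", {"x-canary": "beta"}, "web-bes", rng))  # web-beta-bes
print(Counter(route(RULES, "/", {}, "web-bes", rng) for _ in range(10_000)))
```

The second call surfaces a real property of the YAML above: the beta header rule has the lowest priority number and a "/" prefix, so beta-cohort API calls reach web-beta-bes rather than api-bes. If that is not what you want, give the API rule a lower number than the header rule.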

gcloud compute url-maps import web-urlmap --global \
  --source=urlmap.yaml --quiet

Step 4: Hybrid and internet NEGs

Backends are not limited to GCP instance groups. Network Endpoint Groups (NEGs) let the same ALB front serverless services, on-prem systems, and arbitrary internet endpoints — one consistent edge (CDN + Cloud Armor + TLS) in front of a heterogeneous estate during a migration.

Hybrid NEGs (NON_GCP_PRIVATE_IP_PORT) point at on-prem or other-cloud backends reachable over Cloud VPN or Interconnect — the canonical strangler-fig pattern: new paths to GCP, legacy paths to the data center, all behind one IP.

gcloud compute network-endpoint-groups create onprem-neg \
  --network-endpoint-type=NON_GCP_PRIVATE_IP_PORT \
  --zone=us-central1-a \
  --network=hub-vpc

gcloud compute network-endpoint-groups update onprem-neg \
  --zone=us-central1-a \
  --add-endpoint="ip=10.50.0.10,port=8080"

Internet NEGs (INTERNET_FQDN_PORT or INTERNET_IP_PORT) reference an external endpoint by FQDN or IP — useful for fronting a third-party API or an external origin with Cloud CDN and Cloud Armor in front of it.

gcloud compute network-endpoint-groups create ext-origin-neg \
  --network-endpoint-type=INTERNET_FQDN_PORT \
  --global

gcloud compute network-endpoint-groups update ext-origin-neg --global \
  --add-endpoint="fqdn=origin.partner.example.com,port=443"

Plan hybrid NEG health checking up front: the probes must be able to reach the on-prem endpoints across your hybrid connectivity, and distributed Envoy health checks additionally need a proxy-only subnet in the NEG’s region. Internet NEGs are global and cannot be health-checked by the ALB, which simply trusts the endpoint’s availability, so pair them with outlier detection to eject a failing origin.

Step 5: Cloud CDN

Cloud CDN is a flag on the backend service plus a cache policy. The cache mode is the first decision and the easiest to get wrong.

gcloud compute backend-services update web-bes --global \
  --enable-cdn \
  --cache-mode=CACHE_ALL_STATIC \
  --default-ttl=3600 \
  --max-ttl=86400 \
  --client-ttl=3600 \
  --negative-caching \
  --serve-while-stale=86400

Cache mode	Behavior
`USE_ORIGIN_HEADERS`	Cache only what the origin marks cacheable via `Cache-Control`. Safest; origin is authoritative.
`CACHE_ALL_STATIC`	Cache static content types automatically; honor origin headers for the rest. Good default for web.
`FORCE_CACHE_ALL`	Cache every response regardless of headers. Dangerous near auth-gated content — you can cache a logged-in user’s page and serve it to others.

negative-caching caches selected error responses (such as 404 and 410) for a short TTL, so a thundering herd for a missing object does not hammer the origin. serve-while-stale keeps serving expired content, for up to the configured window (a day here), while revalidating in the background, protecting you during an origin blip.
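How --default-ttl and --max-ttl combine with origin headers is easier to reason about in a small model. This is a simplified reading of CACHE_ALL_STATIC, assuming that no-store, private, and no-cache opt a response out, that s-maxage beats max-age, that origin TTLs are capped at max-ttl, and that static responses without a TTL get default-ttl. It ignores Expires, Vary, negative caching, and client-ttl; the defaults mirror the command above.

```python
"""Simplified model of edge TTL under CACHE_ALL_STATIC (sketch)."""
from __future__ import annotations


def parse_cache_control(value: str) -> dict[str, str | None]:
    """Split a Cache-Control header into {directive: argument-or-None}."""
    directives: dict[str, str | None] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def edge_ttl(cache_control: str | None, is_static: bool,
             default_ttl: int = 3600, max_ttl: int = 86400) -> int:
    """Seconds a response would stay cached at the edge (0 = not cached)."""
    cc = parse_cache_control(cache_control or "")
    if cc.keys() & {"no-store", "private", "no-cache"}:
        return 0  # assumed opt-out directives under CACHE_ALL_STATIC
    origin = cc.get("s-maxage") or cc.get("max-age")  # shared caches prefer s-maxage
    if origin is not None:
        return min(int(origin), max_ttl)  # origin TTL, capped by --max-ttl
    # No TTL from the origin: static types get --default-ttl, others are not cached.
    return default_ttl if is_static else 0


print(edge_ttl("public, max-age=31536000", is_static=True))  # 86400: capped by max-ttl
print(edge_ttl(None, is_static=True))                       # 3600 from --default-ttl
print(edge_ttl(None, is_static=False))                      # 0: dynamic, no headers
```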

Cache keys

By default the cache key includes the protocol, host, path, and full query string. If your URLs carry per-user or tracking query params, every variation becomes a distinct cache entry and your hit rate collapses. Strip what does not change the response.

gcloud compute backend-services update web-bes --global \
  --cache-key-include-protocol \
  --cache-key-include-host \
  --no-cache-key-include-query-string

To key on only specific params (for example a v cache-buster) use --cache-key-query-string-whitelist=v. Getting the cache key right is usually the single biggest lever on hit ratio.
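The hit-ratio collapse is easy to demonstrate with a toy cache. This sketch assumes a single cache that never evicts or expires (real Cloud CDN caches are per location and TTL-bound) and builds keys from protocol, host, path, and an optional query-parameter allowlist, loosely mirroring the flags above. The URLs are hypothetical.

```python
"""Model of how the cache key drives hit ratio (sketch; ignores eviction and TTLs)."""
from __future__ import annotations

from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url: str, include_query: bool = True,
              query_allowlist: set[str] | None = None) -> str:
    """Build a cache key from protocol, host, path and (optionally) query params."""
    parts = urlsplit(url)
    key = f"{parts.scheme}://{parts.netloc}{parts.path}"
    if not include_query:
        return key
    params = parse_qsl(parts.query, keep_blank_values=True)
    if query_allowlist is not None:
        params = [(k, v) for k, v in params if k in query_allowlist]
    return f"{key}?{urlencode(params)}" if params else key


def hit_ratio(urls: list[str], **key_options) -> float:
    """First request for each key is a miss; repeats of the same key are hits."""
    seen: set[str] = set()
    hits = 0
    for url in urls:
        key = cache_key(url, **key_options)
        if key in seen:
            hits += 1
        else:
            seen.add(key)
    return hits / len(urls)


# Same asset, one cache-buster, but a different tracking parameter on every request.
requests = [f"https://app.example.com/static/app.js?v=42&utm_source=ad{i}" for i in range(100)]
print(hit_ratio(requests))                         # 0.0: every URL is a distinct key
print(hit_ratio(requests, query_allowlist={"v"}))  # 0.99: one miss, then hits
```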

Signed URLs

For paywalled or time-limited assets, signed URLs (or signed cookies) let the CDN serve private content without an origin round-trip per request. Attach a signing key with gcloud compute backend-services add-signed-url-key web-bes --global --key-name=key1 --key-file=cdn-key.b64; requests must then carry a valid Expires, KeyName, and Signature query string, and the CDN rejects anything expired or tampered before it reaches your origin.
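Signing happens in your application, not in the CDN. The sketch below follows the scheme Google documents for Cloud CDN signed URLs (HMAC-SHA1 over the URL plus Expires and KeyName, base64url-encoded) and is modeled on Google’s published Python sample. The key name key1 matches the command above; the random key and asset URL are stand-ins, and verify_locally is a hypothetical self-check for testing, not how Cloud CDN validates requests.

```python
"""Sign a Cloud CDN URL with HMAC-SHA1 (sketch modeled on Google's published sample)."""
import base64
import hashlib
import hmac
import os
import time
from urllib.parse import urlsplit


def sign_url(url: str, key_name: str, base64_key: str, expires_at: int) -> str:
    """Append Expires, KeyName and Signature query parameters to url."""
    separator = "&" if urlsplit(url).query else "?"
    url_to_sign = f"{url}{separator}Expires={expires_at}&KeyName={key_name}"
    key = base64.urlsafe_b64decode(base64_key)
    digest = hmac.new(key, url_to_sign.encode("utf-8"), hashlib.sha1).digest()
    signature = base64.urlsafe_b64encode(digest).decode("utf-8")
    return f"{url_to_sign}&Signature={signature}"


def verify_locally(signed_url: str, base64_key: str, now: int) -> bool:
    """Local self-check of the same HMAC and expiry; not Cloud CDN's validator."""
    url_to_sign, _, signature = signed_url.rpartition("&Signature=")
    key = base64.urlsafe_b64decode(base64_key)
    digest = hmac.new(key, url_to_sign.encode("utf-8"), hashlib.sha1).digest()
    expected = base64.urlsafe_b64encode(digest).decode("utf-8")
    expires = int(url_to_sign.rsplit("Expires=", 1)[1].split("&", 1)[0])
    return hmac.compare_digest(expected, signature) and now < expires


# Stand-in for the contents of cdn-key.b64: 16 random bytes, base64url-encoded.
key_b64 = base64.urlsafe_b64encode(os.urandom(16)).decode("utf-8")
signed = sign_url("https://app.example.com/premium/video.mp4", "key1", key_b64,
                  int(time.time()) + 600)  # valid for ten minutes
print(signed)
```

Keep expiry windows short and generate URLs per request: anyone holding a signed URL can use it until it expires.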

Step 6: Cloud Armor at the edge

Cloud Armor is a security policy attached to the backend service, enforced at Google’s edge before traffic reaches your backends. Build the policy, add rules, then bind it.

gcloud compute security-policies create web-armor \
  --description="Edge WAF + rate limiting for web tier"

# Pre-configured WAF rule: SQL injection at sensitivity 1, in preview first.
gcloud compute security-policies rules create 1000 \
  --security-policy=web-armor \
  --expression="evaluatePreconfiguredWaf('sqli-v33-stable', {'sensitivity': 1})" \
  --action=deny-403 \
  --preview

# Per-client rate limiting: throttle abusive IPs.
gcloud compute security-policies rules create 2000 \
  --security-policy=web-armor \
  --src-ip-ranges="*" \
  --action=throttle \
  --rate-limit-threshold-count=100 \
  --rate-limit-threshold-interval-sec=60 \
  --conform-action=allow \
  --exceed-action=deny-429 \
  --enforce-on-key=IP

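The preconfigured WAF rules are based on the OWASP ModSecurity Core Rule Set (sqli, xss, lfi, rfi, rce, and more). Sensitivity is the false-positive dial: lower levels enable fewer, higher-confidence signatures, and 1 is the least aggressive. Soak every new rule in preview mode (--preview) before letting its deny-403 take effect, then promote it with gcloud compute security-policies rules update 1000 --security-policy=web-armor --no-preview. A too-aggressive WAF blocks legitimate traffic that happens to look like an attack.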

Rate limiting supports throttle (cap the rate) and rate-based-ban (block an offender for a ban period once they exceed a threshold). --enforce-on-key controls the bucket: IP and XFF-IP count per client address, while HTTP-HEADER and HTTP-COOKIE (named with --enforce-on-key-name) count per header or cookie value, which lets you limit per authenticated user rather than per source IP.
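Why the key choice matters shows up clearly with users behind a shared NAT. This is a fixed-window model for illustration only: Cloud Armor’s internal counting is not reproduced here, and the handling of requests without the cookie is an assumption of the sketch. The threshold and interval mirror the throttle rule above.

```python
"""Fixed-window model of a Cloud Armor throttle rule (sketch, not the real algorithm)."""
from __future__ import annotations


class ThrottleRule:
    def __init__(self, count: int = 100, interval_sec: float = 60):
        self.count = count            # --rate-limit-threshold-count
        self.interval = interval_sec  # --rate-limit-threshold-interval-sec
        self.windows: dict[str, tuple[float, int]] = {}  # key -> (window start, requests)

    def evaluate(self, key: str, now: float) -> str:
        start, seen = self.windows.get(key, (now, 0))
        if now - start >= self.interval:
            start, seen = now, 0  # this key's window has elapsed; start a new one
        seen += 1
        self.windows[key] = (start, seen)
        # conform-action=allow, exceed-action=deny-429
        return "allow" if seen <= self.count else "deny-429"


def bucket_key(req: dict, enforce_on_key: str, enforce_on_key_name: str | None = None) -> str:
    """Pick the rate-limit bucket for a request, mirroring --enforce-on-key."""
    if enforce_on_key == "IP":
        return req["ip"]
    if enforce_on_key == "HTTP-COOKIE":
        # Assumption: requests without the cookie share one bucket in this sketch.
        return req.get("cookies", {}).get(enforce_on_key_name, "<no-cookie>")
    raise ValueError(f"key type not modeled here: {enforce_on_key}")


# Three users behind one office NAT IP, 50 requests each inside one minute.
reqs = [{"ip": "203.0.113.7", "cookies": {"session": f"user{i % 3}"}} for i in range(150)]
for mode, name in [("IP", None), ("HTTP-COOKIE", "session")]:
    rule = ThrottleRule(count=100, interval_sec=60)
    verdicts = [rule.evaluate(bucket_key(r, mode, name), now=1.0) for r in reqs]
    print(mode, "denied:", verdicts.count("deny-429"))  # IP: 50, HTTP-COOKIE: 0
```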

For bot management, layer in reCAPTCHA-based rules and Google’s threat intelligence feeds (threat intelligence requires a Cloud Armor Enterprise subscription), then bind the policy to the backend service:

# Block known malicious IPs via Google threat intelligence.
gcloud compute security-policies rules create 500 \
  --security-policy=web-armor \
  --expression="evaluateThreatIntelligence('iplist-known-malicious-ips')" \
  --action=deny-403

gcloud compute backend-services update web-bes --global \
  --security-policy=web-armor

Always deploy a new WAF rule with --preview first, then read the load balancer log entries whose jsonPayload.previewSecurityPolicy.outcome is DENY for a few days. Promoting straight to deny is how you take down checkout during a sale because someone’s coupon code matched an SQLi signature.
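Reviewing preview hits is easier with a quick summary by rule and path. This sketch assumes entries exported as JSON (for example with gcloud logging read and --format=json) and reads jsonPayload.previewSecurityPolicy.outcome and .priority; confirm those field names against your own log entries. The sample entries are hypothetical.

```python
"""Summarize Cloud Armor preview denials from exported LB log entries (sketch)."""
from __future__ import annotations

import json
from collections import Counter
from urllib.parse import urlsplit


def preview_denies(entries: list[dict]) -> Counter:
    """Count would-be denials per (rule priority, request path)."""
    hits: Counter = Counter()
    for entry in entries:
        preview = entry.get("jsonPayload", {}).get("previewSecurityPolicy", {})
        if preview.get("outcome") == "DENY":
            path = urlsplit(entry.get("httpRequest", {}).get("requestUrl", "")).path
            hits[(preview.get("priority"), path)] += 1
    return hits


# Hypothetical entries shaped like exported load balancer logs.
sample = json.loads("""[
  {"httpRequest": {"requestUrl": "https://app.example.com/checkout?coupon=SAVE'10"},
   "jsonPayload": {"previewSecurityPolicy":
       {"name": "web-armor", "priority": 1000, "outcome": "DENY"}}},
  {"httpRequest": {"requestUrl": "https://app.example.com/"}, "jsonPayload": {}}
]""")
for (priority, path), n in preview_denies(sample).most_common():
    print(f"rule {priority} would deny {n} request(s) to {path}")
```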

Step 7: TLS termination, certificate maps, and mTLS

The HTTPS target proxy terminates TLS. Use Google-managed certificates so you never rotate manually. The modern path uses the Certificate Manager API and a certificate map, which lets one proxy serve many domains and handles wildcard plus SAN combinations cleanly.

gcloud certificate-manager certificates create web-cert \
  --domains="app.example.com,www.example.com"

gcloud certificate-manager maps create web-cert-map

gcloud certificate-manager maps entries create web-primary \
  --map=web-cert-map \
  --certificates=web-cert \
  --hostname="app.example.com"

gcloud compute target-https-proxies create web-https-proxy \
  --url-map=web-urlmap \
  --certificate-map=web-cert-map \
  --global

gcloud compute forwarding-rules create web-https-fr \
  --address=web-ip \
  --target-https-proxy=web-https-proxy \
  --ports=443 \
  --load-balancing-scheme=EXTERNAL_MANAGED \
  --global

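With load balancer authorization (what the certificate command above uses), managed certificates require the domain to resolve to the ALB IP for validation, so create the A record before expecting ACTIVE status. The map above has an entry only for app.example.com; add one for www.example.com too (or a primary entry with --set-primary as the fallback) so both names on the certificate are served. Pin a modern TLS policy to disable TLS 1.0/1.1 and weak ciphers, because the default profile is permissive: for example, gcloud compute ssl-policies create web-tls --profile=MODERN --min-tls-version=1.2, then gcloud compute target-https-proxies update web-https-proxy --ssl-policy=web-tls --global.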
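Which certificate a handshake gets depends on how the SNI hostname matches map entries. This sketch assumes the order exact hostname, then a wildcard one label up, then the primary entry; it is a model for reasoning, not Certificate Manager’s code. The entries dict mirrors the single map entry created above.

```python
"""Model of certificate map entry selection by SNI hostname (sketch)."""
from __future__ import annotations


def select_certificate(sni: str | None, entries: dict[str, str],
                       primary: str | None = None) -> str | None:
    """Exact hostname entry first, then a single-label wildcard, then the primary entry."""
    host = (sni or "").lower().rstrip(".")
    if host in entries:
        return entries[host]
    _, _, parent = host.partition(".")
    if parent and f"*.{parent}" in entries:
        # *.example.com covers one extra label: www.example.com, not a.b.example.com.
        return entries[f"*.{parent}"]
    return primary  # None means no certificate in the map serves this name


entries = {"app.example.com": "web-cert"}  # the single entry created above
print(select_certificate("app.example.com", entries))  # web-cert
print(select_certificate("www.example.com", entries))  # None: www needs its own entry
```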

mTLS with Trust Config

For mutual TLS (verifying client certificates at the edge), the global ALB uses a Certificate Manager trust config (a store of root and intermediate CAs) plus a ServerTlsPolicy that references it. Validation happens before the request hits your backend, and the result can be forwarded to the backend in custom request headers.

# Trust config holds the CA bundle that signs valid client certs.
gcloud certificate-manager trust-configs import client-trust \
  --source=trust-config.yaml

# ServerTlsPolicy ties the trust config to a validation mode.
gcloud network-security server-tls-policies import mtls-policy \
  --source=server-tls-policy.yaml \
  --location=global

A minimal server-tls-policy.yaml enforcing client certs looks like:

name: mtls-policy
mtlsPolicy:
  clientValidationMode: REJECT_INVALID
  clientValidationTrustConfig: "projects/PROJECT/locations/global/trustConfigs/client-trust"

REJECT_INVALID drops connections without a valid client cert at the edge. ALLOW_INVALID_OR_MISSING_CLIENT_CERT lets them through and leaves the decision to the backend, which sees the validation outcome only if you add custom request headers (for example X-Client-Cert-Present and X-Client-Cert-Chain-Verified) populated from the load balancer’s mTLS variables. Attach the ServerTlsPolicy to the target HTTPS proxy, and client-cert verification stays off your application servers entirely.
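With ALLOW_INVALID_OR_MISSING_CLIENT_CERT, the backend makes the final call from the forwarded headers. This sketch assumes custom request headers named X-Client-Cert-Present, X-Client-Cert-Chain-Verified, and X-Client-Cert-Error carrying "true"/"false" or an error string; those names are illustrative choices, not fixed by the ALB. Before trusting them, confirm how your configuration treats client-supplied headers with the same names.

```python
"""Backend-side decision from forwarded mTLS headers (sketch; header names are assumptions)."""
from __future__ import annotations


def client_cert_decision(headers: dict[str, str]) -> str:
    """Allow only requests whose client certificate was present and chain-verified."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    present = h.get("x-client-cert-present") == "true"
    verified = h.get("x-client-cert-chain-verified") == "true"
    if present and verified:
        return "allow"
    if present:
        return "deny: " + h.get("x-client-cert-error", "chain not verified")
    return "deny: no client certificate"


print(client_cert_decision({"X-Client-Cert-Present": "true",
                            "X-Client-Cert-Chain-Verified": "true"}))  # allow
print(client_cert_decision({"X-Client-Cert-Present": "false"}))        # deny: no client certificate
```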

Verify

Prove each layer independently before declaring victory.

# Frontend chain resolves and the cert is ACTIVE.
gcloud compute target-https-proxies describe web-https-proxy --global \
  --format="value(sslCertificates,certificateMap)"
gcloud certificate-manager certificates describe web-cert \
  --format="value(managed.state)"

# Backends are HEALTHY in every region.
gcloud compute backend-services get-health web-bes --global

# End-to-end request; inspect status, CDN cache, and timing.
curl -sS -o /dev/null -w "code=%{http_code} ttfb=%{time_starttransfer}s\n" \
  https://app.example.com/

# Confirm CDN is serving from cache: Cloud CDN adds an Age header to cache hits.
curl -sSI https://app.example.com/static/app.js | grep -iE '^(age|cache-control):'

# Header routing: Via/Server alone do not identify the backend; confirm with the
# backend_service_name label on the matching load balancer log entries.
curl -sSI -H "X-Canary: beta" https://app.example.com/ | grep -iE '^(via|server):'

For any failure, the load balancer log’s jsonPayload.statusDetails gives the verdict, and httpRequest.cacheHit tells you CDN hit versus miss (covered next).

Observability and 5xx triage

Enable logging on the backend service with a sample rate; 100% is fine to start, then dial down for cost.

gcloud compute backend-services update web-bes --global \
  --enable-logging \
  --logging-sample-rate=1.0

The ALB log’s statusDetails field is the fastest 5xx triage tool on GCP because it distinguishes who failed. A 502 with failed_to_connect_to_backend is a backend or firewall problem; 502 with backend_timeout means your --timeout is shorter than the backend’s real latency; 503 with no_healthy_upstream means health checks are failing across the board. The latency breakdown separates frontend RTT from backend latency, so you can tell a slow client from a slow service.
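Grouping 5xx entries by verdict turns the paragraph above into a first-pass triage report. This sketch assumes exported entries with httpRequest.status and jsonPayload.statusDetails, and its owner map covers only the three verdicts discussed here; the sample entries are hypothetical.

```python
"""Group ALB 5xx log entries by statusDetails verdict (sketch)."""
from __future__ import annotations

from collections import Counter

# Verdicts and likely owners as described in the text; extend as you meet others.
OWNERS = {
    "failed_to_connect_to_backend": "backend or firewall",
    "backend_timeout": "latency vs. --timeout",
    "no_healthy_upstream": "health checks",
}


def triage(entries: list[dict]) -> Counter:
    """Count 5xx responses per (status, statusDetails, likely owner)."""
    summary: Counter = Counter()
    for entry in entries:
        status = entry.get("httpRequest", {}).get("status", 0)
        if status < 500:
            continue
        details = entry.get("jsonPayload", {}).get("statusDetails", "<missing>")
        summary[(status, details, OWNERS.get(details, "unclassified"))] += 1
    return summary


# Hypothetical entries shaped like exported load balancer logs.
sample = [
    {"httpRequest": {"status": 502}, "jsonPayload": {"statusDetails": "backend_timeout"}},
    {"httpRequest": {"status": 502}, "jsonPayload": {"statusDetails": "backend_timeout"}},
    {"httpRequest": {"status": 503}, "jsonPayload": {"statusDetails": "no_healthy_upstream"}},
    {"httpRequest": {"status": 200}, "jsonPayload": {"statusDetails": "response_sent_by_backend"}},
]
for (status, details, owner), n in triage(sample).most_common():
    print(f"{n:>3} x {status} {details:<30} -> {owner}")
```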

-- Logs Explorer (Logging query language): every 5xx served by the ALB.
resource.type="http_load_balancer"
httpRequest.status>=500
-- Add jsonPayload.statusDetails, httpRequest.requestUrl, and
-- httpRequest.latency as summary fields to read each verdict at a glance.

Enterprise scenario

A retail platform team ran a global ALB fronting a regional GKE service in us-central1 and europe-west1, with Cloud CDN on the static backend and Cloud Armor in front. Black Friday traffic tripled and they began seeing sporadic 502s — but only on the API path, never on static assets, and only under load. Their first instinct was “the ALB is overwhelmed,” which sent them chasing capacity that was not the problem.

The log told the real story. Every failing request carried jsonPayload.statusDetails = "backend_timeout". The backend timeout was the default 30s, but the API’s p99 under peak load had drifted to about 34s because a downstream dependency was slow. The ALB did exactly what it was told: cut the connection at the timeout and return 502. Static assets never tripped it because they were served from the CDN edge and never reached the origin.

Raising the timeout blindly would have masked the latency regression, so they did two things. They bumped the timeout to a deliberate 45s to stop severing nearly-complete requests, and they enabled the outlier detection from Step 2 so a single slow pod got ejected instead of dragging down the pool:

gcloud compute backend-services update api-bes --global --timeout=45s

They also wired a Cloud Monitoring alert on backend 5xx rate broken down by statusDetails, so the next incident would name its own root cause. The lesson: on the global ALB, a 502 is not one failure mode — statusDetails tells you whether it is connectivity, timeout, or health, and the backend timeout is a latency contract you set on purpose, not inherit.
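Treating the timeout as a contract means deriving it from measured tail latency. This sketch computes a nearest-rank p99 and compares the timeout against it plus a safety margin; the 1.2 headroom factor and the latency sample are hypothetical, chosen to echo the numbers in this incident.

```python
"""Check a backend timeout against observed latency (sketch; headroom is an assumption)."""
from __future__ import annotations


def nearest_rank(samples: list[float], percentile: int) -> float:
    """Nearest-rank percentile: smallest sample with >= percentile% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, -(-percentile * len(ordered) // 100))  # integer ceiling
    return ordered[rank - 1]


def check_timeout(samples: list[float], timeout_s: float,
                  percentile: int = 99, headroom: float = 1.2) -> dict:
    """Flag a timeout that leaves less than `headroom` x the observed tail latency."""
    tail = nearest_rank(samples, percentile)
    needed = tail * headroom
    return {"p": tail, "needed": needed, "ok": timeout_s >= needed}


# Hypothetical peak-load sample: most calls are fast, the tail sits at ~34s.
peak = [2.0] * 98 + [34.0] * 2
print(check_timeout(peak, timeout_s=30))  # ok=False: 30s is below the p99 itself
print(check_timeout(peak, timeout_s=45))  # ok=True: 45s clears 34s x 1.2
```

The factor matters less than the habit: the timeout becomes a number you derive and review, and a p99 creeping toward it becomes an alert rather than a surprise.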


Closing notes

Treat the global ALB as a configurable distributed system, not a black-box appliance. Three settings deserve attention from day one: the backend timeout as an explicit latency contract, the cache key as your hit-ratio lever, and Cloud Armor preview mode as the seatbelt before you arm a WAF rule. Manage the URL map and security policy as code, alert on statusDetails rather than raw 5xx counts, and use capacity scaler for graceful regional drains.