Google Cloud Load Balancing, In Depth: Global vs Regional, the LB Types & Backends

Sooner or later every workload on Google Cloud needs a front door. One virtual machine is a single point of failure; the moment you run two, something has to spread traffic across them, notice when one dies, and keep users from ever seeing the failure. That something is a load balancer. On most clouds a load balancer is a single box you put in a region. On Google Cloud it is something stranger and more powerful: for the flagship product, the load balancer is the network itself — a single anycast IP address announced from over a hundred Google edge locations worldwide, with no instance to size, patch, or scale.

That power comes with a price: choice. Google Cloud does not have one load balancer, it has a family, and picking the wrong member is the single most common networking mistake new architects make. Reach for a global product when you only serve one region and you over-pay and over-engineer; reach for a regional one when you have a global audience and you lose the anycast front door that makes GCP special; confuse a proxy load balancer with a passthrough one and you spend an afternoon wondering why the client IP your application logs is wrong.

This lesson is the map. By the end you will be able to look at any workload — a global web app, an internal microservice, a TCP game server, a Cloud Run container — and name the exact load balancer it needs and why. We will walk the whole family with a decision table, then take the flagship apart screw by screw: the chain of resources from forwarding rule to backend that every Google Cloud load balancer is built from, the health checks that keep it honest, session affinity, balancing modes, Cloud Armor at the edge, and the serverless network endpoint groups that let a load balancer point at Cloud Run and Cloud Functions. It maps to the Associate Cloud Engineer (ACE) and Professional Cloud Network Engineer (PCNE) exams.

Learning objectives

By the end of this lesson you can:

Explain the two axes that define every Google Cloud load balancer — traffic type (Application/L7 vs Network/L4) and deployment scope (global vs regional, external vs internal) — and use them to choose the right one.
Name and place each member of the family: global and regional external Application LB, internal Application LB, external and internal passthrough Network LB, and the proxy Network LB.
Distinguish a proxy load balancer from a passthrough one, and explain what that means for the client IP, TLS termination, and protocols.
Assemble the building blocks every load balancer shares — forwarding rule → target proxy → URL map → backend service → backend (instance group or NEG) — and explain what each layer does.
Configure health checks, balancing modes and capacity, and session affinity, and reason about their trade-offs.
Attach Cloud Armor for WAF and DDoS protection, and use a serverless NEG to load-balance Cloud Run, Cloud Functions and App Engine.

Prerequisites & where this fits

You should be comfortable with a virtual private cloud (VPC), subnets and firewall rules — load balancers live inside a VPC and forward to backends in subnets, and a missing firewall rule for health-check probes is the classic reason a brand-new load balancer reports every backend as unhealthy. If those terms are hazy, read Google Cloud VPC, In Depth (gcp-vpc-deep-dive-subnets-routes-firewall-nat) first. A working knowledge of managed instance groups helps but is not required. This lesson sits in the Networking module of the Google Cloud Zero-to-Hero course, after VPC and before Google Kubernetes Engine. It is the conceptual companion to the hands-on build in Engineering the Global External Application Load Balancer on GCP (gcp-global-external-application-load-balancer-deep-dive): this lesson teaches you which load balancer to choose and how the pieces fit; that one walks you through wiring the flagship end to end with every tuning knob.

Core concepts: the two axes that define every load balancer

Before any product names, internalise the two questions that identify a Google Cloud load balancer. Every member of the family is a point on the grid they define: traffic type (L7 proxy, L4 proxy, or L4 passthrough) on one axis, scope and exposure on the other.

Axis 1 — what kind of traffic? (the OSI layer). A load balancer either understands your application protocol or it does not.

An Application Load Balancer (ALB) operates at Layer 7 (HTTP, HTTPS, HTTP/2, gRPC). It terminates the connection, reads the request (host, path, headers, cookies) and makes routing decisions from it. Because it terminates the connection and sees the decrypted request, it can do content-based routing, caching (Cloud CDN), rewriting, and WAF inspection. It is a proxy.
A Network Load Balancer (NLB) operates at Layer 4 (TCP/UDP and other IP protocols). It does not read your application data. Within the NLB family there are two flavours: a passthrough NLB that routes packets without terminating the connection (the backend sees the original client IP and answers the client directly), and a proxy NLB that terminates the TCP/TLS connection and opens a new one to the backend (used for TCP/SSL offload without L7 routing).

The single most important consequence: a proxy load balancer hides the client IP (the backend sees Google’s IP unless you read the X-Forwarded-For header or enable the PROXY protocol), while a passthrough load balancer preserves it (the backend sees the real client). Architects who log the wrong field and see Google IPs everywhere have invariably forgotten this.
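To make the X-Forwarded-For point concrete, here is a minimal shell sketch for pulling the client address out of that header behind a global external ALB. It assumes the header arrives as <supplied-value>,<client-ip>,<load-balancer-ip>, so the client is the second-to-last entry; other load balancer types may lay the header out differently, so check before relying on it. The function name and the sample addresses (documentation ranges) are hypothetical.

```shell
# Hypothetical helper: extract the client IP from an X-Forwarded-For value,
# assuming the layout "<supplied-value>,<client-ip>,<load-balancer-ip>".
# The client is the second-to-last entry; earlier entries are client-supplied
# and must not be trusted.
client_ip_from_xff() {
  printf '%s\n' "$1" | awk -F',' 'NF >= 2 { ip = $(NF-1); gsub(/ /, "", ip); print ip }'
}

# Sample addresses are from documentation ranges, not real traffic.
client_ip_from_xff "203.0.113.7, 198.51.100.10"
client_ip_from_xff "10.9.9.9, 203.0.113.7, 198.51.100.10"
```

Both calls print 203.0.113.7: in the second, the leading 10.9.9.9 was supplied by the client and is ignored.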

Axis 2 — where does it live and who can reach it? (scope and exposure).

Global vs regional. A global load balancer has one anycast IP served from Google’s worldwide edge; a user in Mumbai and a user in São Paulo hit the same IP but are served from the nearest healthy backend. A regional load balancer lives in one region and its IP is anchored there. Global products require Premium Network Tier; regional products can run on Standard Tier (cheaper egress, regional reach).
External vs internal. An external load balancer has a public-facing front end for internet clients. An internal load balancer has a private IP reachable only from inside your VPC (and connected networks) — it is how microservices call each other without traversing the internet.

Two terms you will meet throughout:

A forwarding rule is the front end — the IP-address-plus-port the load balancer answers on. It is the entry point of the resource chain.
A backend service is the brain — it groups your backends, owns the health check, and holds the policy (balancing mode, session affinity, timeouts, Cloud CDN, Cloud Armor). Everything interesting is configured here.

One more idea worth fixing early: Google Cloud load balancers are software-defined, not appliances. There is no instance to provision, no throughput SKU to pick for the flagship, and the global ALB scales to millions of queries per second without any pre-warming. You configure a graph of resources and Google’s edge fabric runs it.

The load balancer family: a decision table

Here is the whole family on one page. Read the traffic type and scope columns first; they determine the product, and everything else is detail.

Load balancer	Layer / proxy	Scope	Exposure	Protocols	Frontend IP	Network Tier	Primary use case
Global external Application LB	L7 proxy	Global	External	HTTP, HTTPS, HTTP/2, gRPC	Global anycast	Premium	Internet-facing web apps & APIs with a global audience; Cloud CDN, advanced routing
Regional external Application LB	L7 proxy	Regional	External	HTTP, HTTPS, HTTP/2	Regional	Standard or Premium	Internet-facing web app pinned to one region; data-residency or Standard-Tier cost
Internal Application LB	L7 proxy	Regional (or cross-region)	Internal	HTTP, HTTPS, HTTP/2, gRPC	Private (VPC)	n/a	L7 routing between internal microservices
External passthrough Network LB	L4 passthrough	Regional	External	TCP, UDP, ESP, ICMP, L3_DEFAULT	Regional	Standard or Premium	Internet-facing non-HTTP (game servers, custom TCP/UDP), source-IP preservation, very low overhead
Internal passthrough Network LB	L4 passthrough	Regional	Internal	TCP, UDP, ICMP, L3_DEFAULT	Private (VPC)	n/a	Internal L4 distribution; the only LB usable as a next-hop route; source-IP preserved
External proxy Network LB	L4 proxy	Global or regional	External	TCP, SSL (TLS)	Anycast (global) / regional	Premium / Standard	Internet-facing TCP with TLS offload but no L7 routing
Internal proxy Network LB	L4 proxy	Regional (or cross-region)	Internal	TCP, SSL	Private (VPC)	n/a	Internal TCP proxying / TLS offload between services

How to read this in practice — the decision tree in words:

Is your traffic HTTP/HTTPS/gRPC and do you want path/host routing, TLS termination, caching, or a WAF? Use an Application Load Balancer. Then: internet-facing and global audience → global external ALB; internet-facing but single region (or you need Standard Tier / data residency) → regional external ALB; service-to-service inside the VPC → internal ALB.
Is it raw TCP/UDP (a game server, a database protocol, SMTP, syslog), or do you need the backend to see the real client IP, or do you need the lowest possible overhead? Use a passthrough Network LB — external for the internet, internal for inside the VPC. The internal passthrough NLB is also special: it is the only load balancer you can name as the next hop in a custom route, which is how you build network virtual appliance (firewall) chains.
Is it TCP and you want TLS offload or a global anycast TCP front end but you do not need to inspect the application layer? Use a proxy Network LB.
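The decision tree above can also be written down as a small lookup. This is only a sketch: the function name and the argument values (http, tcp-proxy, l4-passthrough) are made up for illustration and are not gcloud values.

```shell
# Hypothetical helper: encodes the decision tree above as a shell function.
#   $1 traffic  : http | tcp-proxy | l4-passthrough
#   $2 exposure : external | internal
#   $3 scope    : global | regional (defaults to regional)
choose_lb() {
  traffic="$1"; exposure="$2"; scope="${3:-regional}"
  case "$traffic:$exposure:$scope" in
    http:external:global)      echo "global external Application LB" ;;
    http:external:regional)    echo "regional external Application LB" ;;
    http:internal:*)           echo "internal Application LB" ;;
    l4-passthrough:external:*) echo "external passthrough Network LB" ;;
    l4-passthrough:internal:*) echo "internal passthrough Network LB" ;;
    tcp-proxy:external:*)      echo "external proxy Network LB ($scope)" ;;
    tcp-proxy:internal:*)      echo "internal proxy Network LB" ;;
    *) echo "unknown combination: $traffic $exposure $scope" >&2; return 1 ;;
  esac
}

choose_lb http external global
choose_lb l4-passthrough internal
choose_lb tcp-proxy external regional
```

The traffic type is matched first and scope last, mirroring the advice to read the traffic type and scope columns before anything else.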

A few clarifying notes that trip people up. The external Application LB is the successor to the legacy “HTTP(S) Load Balancer”; you may still see the old name in documentation. With a global frontend it comes in two modes: global, which adds Envoy-based advanced traffic management, and classic, the older implementation. New builds should use the global (non-classic) mode for the full feature set. The regional Application and proxy LBs and the internal ALB all run on the same open-source Envoy data plane, which is why they share advanced traffic-management features. The passthrough Network LBs are not proxies: the external one is built on Google’s Maglev and the internal one on the Andromeda network virtualization stack, which is why they preserve source IP and add almost no latency.

The building blocks: from forwarding rule to backend

Every Google Cloud load balancer — whatever its layer or scope — is assembled from the same chain of resources. Learn the chain once and you understand all of them; the only differences are which pieces are global vs regional and whether a URL map exists (L7 only). This is also exactly what an exam will ask you to put in order.

#	Resource	What it does	Scope	L7 only?
1	Forwarding rule	The front end: binds an IP address + port + protocol and points at a target proxy (L7/proxy) or backend service (passthrough). This is what clients connect to.	Global or regional	No
2	Target proxy	Terminates the connection. `target-http(s)-proxy` for ALB, `target-tcp/ssl-proxy` for proxy NLB. Holds the SSL certificate and SSL policy for HTTPS/SSL. References the URL map (L7) or backend service (proxy NLB).	Global or regional	Proxy LBs only
3	URL map	The router: matches host, path, header and query parameters and sends each request to the right backend service. Also does redirects and header/path rewrites.	Global or regional	Yes (ALB)
4	Backend service	The brain: groups backends, owns the health check, and holds policy — protocol, balancing mode, session affinity, timeouts, connection draining, Cloud CDN, Cloud Armor, logging.	Global or regional	No
5	Backend	The actual endpoints behind the service: a managed (MIG) or unmanaged instance group, a network endpoint group (NEG; zonal, serverless, internet, hybrid, or Private Service Connect), or a Cloud Storage bucket (CDN origin).	Zonal/regional	No
—	Health check	Probes each backend and removes unhealthy ones from rotation. Attached to the backend service.	Global or regional	No

Read the chain top to bottom as a request’s journey: a packet hits the forwarding rule (the IP:port), which hands it to the target proxy (which terminates TLS), which consults the URL map (which inspects the path and chooses a route), which points at a backend service (which applies policy and load-balances), which selects a healthy backend endpoint. For a passthrough Network LB the chain is shorter — forwarding rule → backend service → backend — because there is no proxy and no URL map; packets flow straight through.

Here is the chain built in gcloud for a global external Application LB in front of a managed instance group, so the abstractions become concrete. (The companion lesson, gcp-global-external-application-load-balancer-deep-dive, expands every flag below.)

# 5 + health check: a backend MIG already exists as "web-mig" in us-central1.
gcloud compute health-checks create http web-hc \
  --port=8080 --request-path=/healthz \
  --check-interval=5s --timeout=5s \
  --healthy-threshold=2 --unhealthy-threshold=3 \
  --global

# 4: backend service (global) — the brain.
gcloud compute backend-services create web-bes \
  --protocol=HTTP --port-name=http \
  --health-checks=web-hc \
  --global

# attach the MIG as a backend, with a balancing mode (see below).
gcloud compute backend-services add-backend web-bes \
  --instance-group=web-mig \
  --instance-group-region=us-central1 \
  --balancing-mode=UTILIZATION --max-utilization=0.8 \
  --global

# 3: URL map — send everything to web-bes for now.
gcloud compute url-maps create web-map --default-service=web-bes

# 2: target proxy (HTTP here; HTTPS would attach a certificate).
gcloud compute target-http-proxies create web-proxy --url-map=web-map

# 1: forwarding rule — reserve a global anycast IP, then bind :80.
gcloud compute addresses create web-ip --ip-version=IPV4 --global
gcloud compute forwarding-rules create web-fr \
  --address=web-ip --target-http-proxy=web-proxy \
  --ports=80 --global

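Notice that --global appears on the health check, backend service, address and forwarding rule; the URL map and target proxy carry no scope flag because gcloud treats them as global when no --region is given. Consistency of scope is everything: swap regional resources into this chain and you have built a regional ALB, a different product with no anycast. If gcloud complains that a resource “cannot be used” by another, a scope mismatch is the first thing to check. Note also that these commands omit --load-balancing-scheme, so the backend service and forwarding rule get gcloud’s default EXTERNAL scheme, which is the classic Application LB; to build the global (non-classic) one, pass --load-balancing-scheme=EXTERNAL_MANAGED on both.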
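For contrast, the shorter passthrough chain described above (forwarding rule → backend service → backend) can be sketched the same way for an internal passthrough Network LB. This is a minimal sketch, not a hardened build: the names ilb-hc, ilb-bes and ilb-fr and the wrapper function are illustrative, and it assumes web-mig and the default network and subnet already exist in us-central1.

```shell
# Sketch: internal passthrough Network LB in us-central1.
# Resource names (ilb-hc, ilb-bes, ilb-fr) are illustrative; "web-mig" and the
# "default" network/subnet are assumed to exist already.
create_internal_passthrough_nlb() {
  region="us-central1"

  # Health check (regional, TCP).
  gcloud compute health-checks create tcp ilb-hc \
    --port=80 --region="$region"

  # Backend service: scheme INTERNAL, protocol TCP; no proxy or URL map above it.
  gcloud compute backend-services create ilb-bes \
    --load-balancing-scheme=INTERNAL --protocol=TCP \
    --health-checks=ilb-hc --health-checks-region="$region" \
    --region="$region"

  gcloud compute backend-services add-backend ilb-bes \
    --instance-group=web-mig --instance-group-region="$region" \
    --region="$region"

  # Forwarding rule points straight at the backend service.
  gcloud compute forwarding-rules create ilb-fr \
    --load-balancing-scheme=INTERNAL \
    --network=default --subnet=default \
    --ip-protocol=TCP --ports=80 \
    --backend-service=ilb-bes --backend-service-region="$region" \
    --region="$region"
}
# Usage: create_internal_passthrough_nlb
```

Every resource here is regional, and the forwarding rule names a backend service directly rather than a target proxy.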

Health checks: how the load balancer knows what is alive

A load balancer is only as good as its ability to stop sending traffic to a dead backend. That is the health check — a probe Google sends to each endpoint on an interval; pass enough times in a row and the backend is healthy and receives traffic, fail enough times and it is unhealthy and is pulled from rotation until it recovers.

Setting	What it is	Choices / default	Notes
Protocol	How the probe is made	HTTP, HTTPS, HTTP/2, TCP, SSL, gRPC	Match it to your app. HTTP(S) checks can assert a path and an expected response.
Port	Where to probe	A fixed port, or use serving port, or a named port	A dedicated health-check port/path that checks dependencies (DB, cache) gives a truer signal than a static page.
Request path	The URL to hit (HTTP[S])	default `/`	Use a real `/healthz` that returns 200 only when the instance can actually serve.
Check interval	Seconds between probes	default 5s	Lower = faster detection, more probe traffic.
Timeout	How long to wait for a reply	default 5s	Must be ≤ interval.
Healthy threshold	Consecutive passes to mark healthy	default 2
Unhealthy threshold	Consecutive fails to mark unhealthy	default 2	Higher avoids flapping on a transient blip.
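A quick way to reason about these settings is to estimate how long a dead backend can keep receiving traffic: roughly the unhealthy threshold multiplied by the check interval. The sketch below is a deliberate simplification (it ignores the probe timeout and how probes are scheduled), and the function name is hypothetical.

```shell
# Rough estimate of detection time for a failed backend:
# consecutive failed probes needed x seconds between probes.
# Simplification: ignores the probe timeout and prober scheduling.
detection_seconds() {
  interval="$1"; unhealthy_threshold="$2"
  echo $(( interval * unhealthy_threshold ))
}

detection_seconds 5 2   # table defaults: prints 10
detection_seconds 5 3   # interval 5s, threshold 3: prints 15
```

Raising the unhealthy threshold from 2 to 3 buys tolerance for one more transient failure at the cost of about one more interval of detection delay.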

Two operational facts cause almost every “all my backends are unhealthy” support ticket:

You must allow the probe source IPs in your firewall. Health-check probes come from fixed Google ranges, not from your clients. For the Application, proxy Network and internal passthrough load balancers the probes originate from 130.211.0.0/22 and 35.191.0.0/16; for the global external proxy load balancers the proxied client traffic arrives from those same ranges, while the regional and internal Envoy-based load balancers send proxied traffic from the region’s proxy-only subnet. Add an ingress allow rule for those sources to your backend port or the load balancer reports everything down even though the app is fine. (Health checks for the external passthrough NLB come from 35.191.0.0/16, 209.85.152.0/22 and 209.85.204.0/22.)
A load-balancing health check is not the same thing as a MIG autohealing health check. The load-balancing one removes a sick backend from traffic; an autohealing health check on the managed instance group recreates the VM. You usually want both, and you usually want the autohealing one to be more lenient so a brief load-balancer blip does not trigger a full VM rebuild.
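As a sketch of the “more lenient autohealing check” idea, the helper below creates a second health check with a longer interval and higher unhealthy threshold and attaches it to the managed instance group. The name web-autoheal-hc, the wrapper function, and the specific numbers are illustrative assumptions, not recommendations.

```shell
# Sketch: a separate, more lenient health check used only for MIG autohealing.
# "web-autoheal-hc" and the chosen numbers are illustrative assumptions.
configure_autohealing() {
  mig="$1"; region="$2"

  gcloud compute health-checks create http web-autoheal-hc \
    --port=80 --request-path=/healthz \
    --check-interval=10s --timeout=5s \
    --healthy-threshold=2 --unhealthy-threshold=5 \
    --global

  # --initial-delay gives new VMs time to boot before autohealing judges them.
  gcloud compute instance-groups managed update "$mig" \
    --region="$region" \
    --health-check=web-autoheal-hc \
    --initial-delay=300
}
# Usage: configure_autohealing web-mig us-central1
```

With these values a VM must fail for roughly 50 seconds before it is recreated, well after a load-balancing check with the default thresholds would have pulled it from rotation.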

There is also a centralised vs distributed distinction for internal/regional Envoy-based load balancers: traditional health checks probe from Google’s central infrastructure, while distributed Envoy health checks probe from the Envoy proxies themselves — relevant at very large scale, but the central model is the default and is correct for most workloads.

Balancing mode and capacity: how traffic is spread

When a backend service has more than one backend, the balancing mode decides how a new request is assigned and, crucially, defines when a backend is considered “full” so traffic spills to the next region (for global LBs) or the next backend.

Balancing mode	“Full” is measured by	Available on	Typical use
UTILIZATION	Average CPU utilisation of the instance group	Instance-group backends	General compute backends; cap with `--max-utilization` (e.g. 0.8).
RATE	Requests per second, per instance or per group	Instance groups & some NEGs	When you know the QPS a backend can take; cap with `--max-rate` / `--max-rate-per-instance`.
CONNECTION	Number of concurrent connections	TCP/SSL & passthrough backends	L4 load balancers where connections, not requests, are the unit.

The companion levers:

--capacity-scaler (0.0–1.0) is a multiplier on the configured capacity, letting you drain a backend gradually (set it toward 0 to bleed traffic away before maintenance) without removing it.
--max-utilization / --max-rate* / --max-connections* set the ceiling that, once reached, makes the global load balancer overflow to the next-closest region — this is how a global ALB does graceful regional overflow and failover.
Connection draining (--connection-draining-timeout) lets in-flight requests finish when a backend is removed or scaled down, instead of being cut off.

For a global external ALB the practical pattern is: backends in two or more regions, each with a balancing mode and a sensible ceiling, so that normal traffic is served from the nearest region and a regional failure (or saturation) automatically overflows to the next — no DNS changes, no manual failover.
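The overflow behaviour can be pictured with a little arithmetic. The sketch below is a simplified model under RATE mode, not Google’s actual algorithm: the nearest region takes traffic up to its effective ceiling (max rate times capacity scaler) and the remainder spills to the next region. The function name is hypothetical, and the scaler is passed as a percentage because POSIX shell arithmetic is integer-only.

```shell
# Simplified model of regional overflow under RATE mode (not Google's algorithm):
# the nearest region takes traffic up to its effective ceiling, the rest spills.
# Effective ceiling = max rate x capacity scaler (scaler given in percent).
split_traffic() {
  demand_rps="$1"; max_rate="$2"; scaler_pct="${3:-100}"
  ceiling=$(( max_rate * scaler_pct / 100 ))
  if [ "$demand_rps" -le "$ceiling" ]; then
    echo "nearest=$demand_rps overflow=0"
  else
    echo "nearest=$ceiling overflow=$(( demand_rps - ceiling ))"
  fi
}

split_traffic 800 1000       # nearest=800 overflow=0
split_traffic 1500 1000      # nearest=1000 overflow=500
split_traffic 800 1000 50    # nearest=500 overflow=300
```

The third call shows why lowering the capacity scaler drains a backend: the same demand now exceeds the halved ceiling, so part of it moves elsewhere without the backend being removed.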

Session affinity: pinning a client to a backend

By default a load balancer treats every request independently and may send consecutive requests from the same user to different backends. Session affinity (“sticky sessions”) instead pins a client to the same backend, which matters for applications that keep per-user state in memory. It is configured on the backend service.

Affinity type	Pins on	Layer	Notes
NONE	nothing (default)	any	Best distribution; use stateless backends + external session store.
CLIENT_IP	client IP (and protocol/port variants)	L4 / L7	Coarse: clients behind one NAT share a backend; breaks if client IP changes.
GENERATED_COOKIE	a cookie the LB issues (`GCLB`)	L7 only	Most precise for web apps; survives client-IP changes.
HEADER_FIELD	a named HTTP header	L7 only	Affinity keyed on, e.g., a tenant header.
HTTP_COOKIE	a cookie you name	L7 only	Like generated cookie but you control the name/TTL/path.

The architectural caveat worth saying out loud: session affinity is a performance optimisation, not a correctness guarantee. Affinity can break when a backend becomes unhealthy, when capacity is exceeded, or when the backend set changes — so a robust design keeps session state in Memorystore or a database and treats stickiness as a nice-to-have, not a load-bearing assumption. Also note affinity and balancing mode can pull against each other: strong affinity can leave some backends hotter than others, undermining even distribution.
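Since affinity lives on the backend service, turning it on is a single update. A minimal sketch, assuming a global backend service such as web-bes; the wrapper function and the 1800-second default TTL are illustrative choices.

```shell
# Sketch: enable cookie-based affinity on a global backend service.
# The function name and the default TTL are illustrative assumptions.
enable_cookie_affinity() {
  bes="$1"; ttl_seconds="${2:-1800}"
  gcloud compute backend-services update "$bes" \
    --session-affinity=GENERATED_COOKIE \
    --affinity-cookie-ttl="$ttl_seconds" \
    --global
}
# Usage: enable_cookie_affinity web-bes 1800
```

Affinity remains best-effort after this change, so session state should still live outside the instance.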

Cloud Armor: WAF and DDoS at the edge

For external Application and external proxy Network load balancers you can attach Cloud Armor, Google’s web application firewall and DDoS service, as a security policy on the backend service. Because the global external ALB terminates connections at Google’s edge, Cloud Armor inspects and filters traffic at the edge — before it ever reaches your backends or even your region.

What Cloud Armor gives you:

Always-on volumetric DDoS protection for L3/L4 attacks against the load balancer’s anycast IP (this baseline is automatic for global external LBs).
WAF rules including pre-configured rule sets based on the OWASP ModSecurity Core Rule Set (SQL injection, XSS, LFI/RFI, etc.), tunable by sensitivity.
Custom rules in CEL-based rules language matching on IP/CIDR, geography (country), headers, cookies, paths, and more — to allow, deny (with a status code), throttle, or rate-limit.
Rate limiting & throttling (e.g. N requests per minute per client) and ban actions for abusive clients.
Adaptive Protection, which uses machine learning to detect and propose mitigations for L7 DDoS attacks automatically.
Edge security policies and bot management (reCAPTCHA integration) on the global external ALB.

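# A minimal Cloud Armor policy: deny one country, rate-limit the rest, attach it.
gcloud compute security-policies create web-armor --description="edge WAF"

# 'XX' is a placeholder for an ISO 3166-1 alpha-2 country code.
gcloud compute security-policies rules create 1000 \
  --security-policy=web-armor \
  --expression="origin.region_code == 'XX'" \
  --action=deny-403

# Throttle everyone else (illustrative limit: 100 requests per 60 s per client IP).
gcloud compute security-policies rules create 2000 \
  --security-policy=web-armor --src-ip-ranges="*" \
  --action=throttle --enforce-on-key=IP \
  --rate-limit-threshold-count=100 --rate-limit-threshold-interval-sec=60 \
  --conform-action=allow --exceed-action=deny-429

gcloud compute backend-services update web-bes \
  --security-policy=web-armor --global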

The mental model: Cloud Armor attaches to the backend service, like Cloud CDN does, so policy is per-backend, not per-frontend — you can apply a strict WAF to your /admin backend and a looser one to static content. It is only available where there is an edge proxy to enforce it, i.e. external ALBs and external proxy NLBs, not the passthrough NLBs.
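The per-backend model can be sketched as a second, stricter policy attached only to an admin backend. Everything named here is hypothetical: admin-bes is assumed to be an existing global backend service behind the /admin route, admin-armor is an illustrative policy name, and the preconfigured rule name and sensitivity level are assumptions to verify against the current Cloud Armor rule list.

```shell
# Sketch: a stricter policy for a hypothetical "admin-bes" backend service.
# "admin-armor" and "admin-bes" are illustrative names; the preconfigured rule
# name and sensitivity are assumptions to check against the current rule list.
protect_admin_backend() {
  gcloud compute security-policies create admin-armor \
    --description="strict WAF for /admin"

  gcloud compute security-policies rules create 1000 \
    --security-policy=admin-armor \
    --expression="evaluatePreconfiguredWaf('sqli-v33-stable', {'sensitivity': 1})" \
    --action=deny-403

  gcloud compute backend-services update admin-bes \
    --security-policy=admin-armor --global
}
# Usage: protect_admin_backend
```

Other backend services keep whatever policy they already have, because the attachment is made on admin-bes alone.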

Serverless NEGs: load-balancing Cloud Run, Functions and App Engine

A load balancer does not only point at VMs. A network endpoint group (NEG) is a backend that is a set of endpoints rather than an instance group, and one of its most useful forms is the serverless NEG, which points the load balancer at a Cloud Run service, a Cloud Functions function, or an App Engine app. This is the supported way to put a custom domain, Cloud CDN, Cloud Armor, or path-based routing in front of serverless — capabilities the bare *.run.app URL does not give you.

NEG types worth knowing (this is exam fodder):

NEG type	Endpoints are	Used by	Example
Zonal NEG (`GCE_VM_IP_PORT`)	IP:port of VMs/containers in a zone	ALB / proxy NLB	Fine-grained backends, GKE container-native LB
Serverless NEG	a Cloud Run / Functions / App Engine service	external & internal ALB	Custom domain + Cloud Armor in front of Cloud Run
Internet NEG (`INTERNET_FQDN_PORT` / `INTERNET_IP_PORT`)	an external FQDN or IP	global external ALB	Front an on-prem or third-party origin behind GCP CDN/Armor
Hybrid connectivity NEG (`NON_GCP_PRIVATE_IP_PORT`)	private IP:port reachable via VPN/Interconnect	ALB	Route to on-prem or another cloud over hybrid links
Private Service Connect NEG	a published PSC service	ALB	Reach a Google or partner service via PSC

# Serverless NEG → Cloud Run service "api", wired into a global external ALB.
gcloud compute network-endpoint-groups create api-neg \
  --region=us-central1 \
  --network-endpoint-type=serverless \
  --cloud-run-service=api

gcloud compute backend-services create api-bes --global   # no health check needed for serverless
gcloud compute backend-services add-backend api-bes \
  --global --network-endpoint-group=api-neg \
  --network-endpoint-group-region=us-central1
# then reference api-bes from the URL map as the route for /api/*

Two gotchas: serverless NEGs do not use health checks (the serverless platform manages availability), and a serverless NEG is regional — to serve a Cloud Run service globally you add a serverless NEG per region to one global backend service. The internet and hybrid NEGs are how the same global front door — with its anycast IP, CDN, and Cloud Armor — can sit in front of workloads that are not even on GCP.
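The last step mentioned in the snippet, routing /api/* to the serverless backend, might look like the sketch below. It assumes the web-map URL map and web-bes backend service from the earlier chain exist and that web-map has no host rules yet; the path-matcher name api-paths and the wrapper function are illustrative.

```shell
# Sketch: send /api/* to the serverless backend service, everything else to web-bes.
# "api-paths" is an illustrative name; web-map, web-bes and api-bes are assumed
# to exist, and web-map is assumed to have no host rules yet.
route_api_to_serverless() {
  gcloud compute url-maps add-path-matcher web-map \
    --path-matcher-name=api-paths \
    --default-service=web-bes \
    --path-rules="/api/*=api-bes" \
    --new-hosts="*"
}
# Usage: route_api_to_serverless
```

Both backend services must use the same load-balancing scheme to be referenced from one URL map.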

Figure: Google Cloud Load Balancing family. The diagram lays out the family along the two axes: Application (L7) versus Network (L4), external versus internal, and global versus regional. It also shows the shared resource chain (forwarding rule → target proxy → URL map → backend service → backend/NEG) that every member is assembled from.

Hands-on lab: build a global external Application LB over a managed instance group

This lab builds the flagship — a global external ALB serving a simple web app from a managed instance group — using only the GCP Free Tier and $300 credit. You will create the backend, the full resource chain, validate that traffic flows, and tear it all down.

Prerequisites: a project with billing enabled, the Compute Engine API enabled, and Cloud Shell (which has gcloud pre-installed and authenticated). Set defaults:

gcloud config set project YOUR_PROJECT_ID
gcloud config set compute/region us-central1
gcloud config set compute/zone us-central1-a
gcloud services enable compute.googleapis.com

Step 1 — a backend that serves something. Create an instance template whose VMs run a tiny web server on port 80 and identify themselves, then a managed instance group of two.

gcloud compute instance-templates create web-tmpl \
  --machine-type=e2-small \
  --image-family=debian-12 --image-project=debian-cloud \
  --tags=lb-backend \
  --metadata=startup-script='#! /bin/bash
    apt-get update && apt-get install -y nginx
    HOST=$(hostname)
    echo "Served by ${HOST}" > /var/www/html/index.html
    echo OK > /var/www/html/healthz'

gcloud compute instance-groups managed create web-mig \
  --template=web-tmpl --size=2 --region=us-central1
gcloud compute instance-groups set-named-ports web-mig \
  --named-ports=http:80 --region=us-central1

Step 2 — allow health-check and proxy traffic. Without this, every backend shows UNHEALTHY.

gcloud compute firewall-rules create allow-lb-health \
  --network=default --direction=INGRESS --action=ALLOW \
  --rules=tcp:80 \
  --source-ranges=130.211.0.0/22,35.191.0.0/16 \
  --target-tags=lb-backend

Step 3 — the resource chain. Health check → backend service → URL map → proxy → forwarding rule, all --global.

gcloud compute health-checks create http web-hc \
  --port=80 --request-path=/healthz --global

gcloud compute backend-services create web-bes \
  --protocol=HTTP --port-name=http --health-checks=web-hc --global
gcloud compute backend-services add-backend web-bes \
  --instance-group=web-mig --instance-group-region=us-central1 \
  --balancing-mode=UTILIZATION --max-utilization=0.8 --global

gcloud compute url-maps create web-map --default-service=web-bes
gcloud compute target-http-proxies create web-proxy --url-map=web-map

gcloud compute addresses create web-ip --ip-version=IPV4 --global
gcloud compute forwarding-rules create web-fr \
  --address=web-ip --target-http-proxy=web-proxy --ports=80 --global

Step 4 — validate. Find the IP, wait for health, then curl it a few times.

gcloud compute addresses describe web-ip --global --format='value(address)'
# Backend health (wait until HEALTHY — can take a few minutes):
gcloud compute backend-services get-health web-bes --global

IP=$(gcloud compute addresses describe web-ip --global --format='value(address)')
for i in 1 2 3 4; do curl -s http://$IP/; done

Expected output: get-health eventually shows both instances HEALTHY. The curl loop returns Served by web-... and, across repeated calls, you should see both instance hostnames — proof the load balancer is distributing. (The first request after the IP goes live may take a minute or two to propagate across the edge; a 404/502 immediately after creation is normal — retry.)

Cleanup — delete in reverse order of creation (front to back), or the dependencies block deletion:

gcloud compute forwarding-rules delete web-fr --global -q
gcloud compute target-http-proxies delete web-proxy -q
gcloud compute url-maps delete web-map -q
gcloud compute backend-services delete web-bes --global -q
gcloud compute health-checks delete web-hc --global -q
gcloud compute addresses delete web-ip --global -q
gcloud compute firewall-rules delete allow-lb-health -q
gcloud compute instance-groups managed delete web-mig --region=us-central1 -q
gcloud compute instance-templates delete web-tmpl -q

Cost note: the global external ALB has a small hourly charge for the forwarding rule plus a per-GB data-processing charge, and the two e2-small VMs cost a few cents per hour. Running this lab for an hour costs well under a dollar and fits comfortably inside the $300 free credit — but the forwarding rule and the VMs bill while they exist, so do the cleanup. A reserved global IP that is not attached to a forwarding rule also incurs a small charge, which the cleanup releases.

Common mistakes & troubleshooting

Symptom	Likely cause	Fix
All backends report `UNHEALTHY`, app works when you SSH in	Firewall does not allow the health-check ranges	Add ingress allow for `130.211.0.0/22` and `35.191.0.0/16` (or `35.191.0.0/16`, `209.85.152.0/22` and `209.85.204.0/22` for the external passthrough NLB) to the backend port
Accidentally built a regional LB; no global anycast	A regional forwarding rule / backend service slipped into the chain	Keep `--global` consistent across every resource; recreate the mismatched ones globally
Backend logs show Google IPs, not real client IPs	It is a proxy LB (ALB / proxy NLB) — client IP is hidden	Read `X-Forwarded-For` (ALB) or enable PROXY protocol (proxy NLB); or use a passthrough NLB if you must have the raw source IP
`502 Bad Gateway` from a healthy-looking app	Backend timeout exceeded, or app closed the keepalive before the LB’s timeout	Tune the backend-service `--timeout`; set the app’s keepalive timeout longer than the LB’s backend keepalive (600 seconds for the external ALB)
HTTPS certificate “PROVISIONING” forever (managed cert)	DNS for the domain does not yet point at the LB IP	Point the A/AAAA record at the forwarding-rule IP; Google-managed certs validate via DNS and only go ACTIVE once it resolves
Sticky sessions sometimes break	Affinity is best-effort; backend went unhealthy / over capacity / set changed	Treat affinity as an optimisation; store session state in Memorystore or a DB
Serverless backend won’t attach / asks for a health check	Wrong NEG type, or expecting health checks on serverless	Use `--network-endpoint-type=serverless`; serverless backends need no health check
Traffic not overflowing to a second region on overload	No capacity ceiling set, so the LB never considers a region “full”	Set `--max-utilization` / `--max-rate*` so saturation triggers graceful overflow

Best practices

Choose the load balancer deliberately from the two axes (L7 vs L4, global vs regional, external vs internal) before you touch the console — the decision table above is your checklist. Most internet web apps want the global external ALB; most service-to-service calls want the internal ALB or passthrough NLB.
Default to global + Premium Tier for internet-facing web workloads. The anycast front door, edge termination, Cloud CDN and Cloud Armor are the reasons to be on GCP; only step down to regional/Standard for deliberate cost or data-residency reasons.
Put serverless behind a load balancer when you need a custom domain, CDN, WAF, or path routing — do not expose the bare *.run.app URL for production.
Always provision a real health-check endpoint (/healthz) that reflects whether the instance can truly serve (dependencies included), and allow the probe ranges in your firewall as part of the same change.
Set balancing-mode ceilings so global LBs overflow gracefully across regions instead of overloading the nearest one.
Terminate TLS at the load balancer with Google-managed certificates where possible (auto-renewing), and front it with Cloud Armor including the OWASP CRS and rate limiting.
Keep application state out of the instance; rely on affinity only as an optimisation so the loss of a backend never loses a user’s session.
Enable load-balancer logging (and Cloud Armor logging) so you can debug 502s, latency and blocked requests after the fact.
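
As an illustration of the health-check practice above, here is a minimal sketch of a /healthz endpoint using only the Python standard library. The names `dependencies_ok`, `HealthHandler` and `serve`, and the port 8080, are hypothetical; the dependency check is a placeholder you would replace with real probes of whatever the instance needs in order to serve.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def dependencies_ok() -> bool:
    """Hypothetical placeholder: replace with real checks (database, cache, ...)."""
    return True


class HealthHandler(BaseHTTPRequestHandler):
    # The check is a class attribute so it can be swapped without editing the handler.
    dependency_check = staticmethod(dependencies_ok)

    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        healthy = self.dependency_check()
        body = b"ok\n" if healthy else b"unavailable\n"
        # 200 tells the probe the instance can serve; 503 tells it to stop sending traffic.
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep frequent probe requests out of stderr


def serve(port: int = 8080) -> None:
    """Run the health endpoint until interrupted."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the endpoint. The point of the design is that the status code follows the dependency check rather than merely proving the process is alive.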

Security notes

Cloud Armor is the edge of your security perimeter for external L7/proxy LBs. Attach a security policy with the pre-configured OWASP rule set, geo/IP rules, and rate limiting; turn on Adaptive Protection for automated L7 DDoS defence.
Use the load balancer, not public VM IPs. Give backends private IPs only and let the load balancer be the single public entry point: a smaller attack surface, with central WAF and logging. Scope the firewall rule that admits health checks to the probe source ranges so that it does not also admit the whole internet to the backend port.
Terminate TLS centrally and enforce a modern SSL policy (minimum TLS 1.2, strong cipher profile) on the target proxy; redirect HTTP→HTTPS at the URL map.
Internal load balancers keep east-west traffic private; combined with VPC firewall rules and (for sensitive data) VPC Service Controls, microservice traffic never touches the internet.
The internal passthrough NLB as a next hop lets you steer traffic through a fleet of network virtual appliances (next-gen firewalls/IDS) for inspection — the building block of a hub-and-spoke security architecture.
Identity-Aware Proxy (IAP) can sit on the external ALB to require Google authentication before a request reaches the backend — application-level access control without a VPN.
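
One way to audit the firewall point above is to check that an ingress rule's source ranges all fall inside the health-check/proxy ranges this lesson quotes (130.211.0.0/22 and 35.191.0.0/16). The sketch below is an assumption-laden helper, not a Google tool: the function name `admits_only_probes` is hypothetical, and it expects the rule's source ranges as a plain list of CIDR strings that you have already extracted.

```python
import ipaddress

# Health-check / proxy source ranges as quoted in this lesson.
PROBE_RANGES = [
    ipaddress.ip_network("130.211.0.0/22"),
    ipaddress.ip_network("35.191.0.0/16"),
]


def admits_only_probes(source_ranges: list[str]) -> bool:
    """Return True if every source range sits inside one of the probe ranges."""
    if not source_ranges:
        # Conservative choice for this sketch: an empty list counts as unrestricted.
        return False
    for cidr in source_ranges:
        net = ipaddress.ip_network(cidr)
        inside = any(
            net.version == probe.version and net.subnet_of(probe)
            for probe in PROBE_RANGES
        )
        if not inside:
            return False
    return True


print(admits_only_probes(["130.211.0.0/22", "35.191.0.0/16"]))  # rule scoped to probes
print(admits_only_probes(["0.0.0.0/0"]))                        # rule open to the internet
```

A rule that fails this check on the backend port is admitting more than the load balancer needs.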

Interview & exam questions

What are the two questions that determine which Google Cloud load balancer to use? (a) Traffic type — Application/L7 (HTTP/S/gRPC, proxy) vs Network/L4 (TCP/UDP, passthrough or proxy); (b) scope/exposure — global vs regional, external vs internal. Those two axes uniquely identify the product.
Explain the difference between a proxy and a passthrough load balancer, and why it matters. A proxy LB terminates the client connection and opens a new one to the backend, so the backend sees Google’s IP (real client in X-Forwarded-For or via PROXY protocol) and the LB can do TLS termination, L7 routing, CDN and WAF. A passthrough LB forwards packets without terminating, so the backend sees the original client IP and replies directly — lowest overhead, no L7 features. It matters for client-IP logging, TLS handling, and which features are available.
Put the resource chain of an Application Load Balancer in order. Forwarding rule → target (HTTP/S) proxy → URL map → backend service → backend (instance group or NEG); the health check attaches to the backend service. A passthrough NLB omits the proxy and URL map.
A new global ALB shows all backends UNHEALTHY but the app responds over SSH. Why? The VPC firewall is not allowing the health-check/proxy source ranges 130.211.0.0/22 and 35.191.0.0/16 to the backend port. Add an ingress allow rule for them.
When would you choose a regional external ALB over the global one? When the audience is in one region, when you need Standard Network Tier to cut egress cost, or when data-residency rules require traffic to stay in a region — at the cost of losing the global anycast front door.
You need to load-balance a UDP game server and the backend must see the real player IP. Which LB? An external passthrough Network LB — L4, connectionless, preserves source IP, supports UDP. An ALB or proxy NLB would hide the client IP and not handle raw UDP.
What is the only load balancer that can be a next hop in a route, and why does that matter? The internal passthrough Network LB. It enables steering traffic through network virtual appliances (firewalls/IDS), the basis of hub-and-spoke inspection architectures.
How do you put a custom domain, Cloud CDN and Cloud Armor in front of a Cloud Run service? Create a serverless NEG pointing at the Cloud Run service, attach it to a backend service on a (global) external ALB, and route to it from the URL map. Serverless NEGs need no health check and are regional, so add one per region for global serving.
What does the balancing mode do, and name the three modes. It defines how requests are assigned and when a backend is “full” (triggering overflow). Modes: UTILIZATION (CPU), RATE (requests/sec), CONNECTION (concurrent connections). Pair with --max-* ceilings and --capacity-scaler.
Why is session affinity not a substitute for external session storage? Affinity is best-effort and can break when a backend becomes unhealthy, exceeds capacity, or the backend set changes — so per-user state must live in a shared store (Memorystore/DB); affinity is only an optimisation.
Which load balancers can use Cloud Armor, and where does the policy attach? External Application LBs and external proxy Network LBs (there must be an edge proxy to enforce it); the security policy attaches to the backend service, so it is per-backend like Cloud CDN.
What network tier do global load balancers require, and why? Premium Tier — global anycast and edge serving ride Google’s premium backbone; Standard Tier only supports regional load balancing.
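
The proxy-versus-passthrough answer has a practical consequence for application logging: behind a proxy LB the real client address has to be read from X-Forwarded-For. The sketch below assumes the external ALB header format, in which the load balancer appends `<client-ip>,<load-balancer-ip>` to whatever value the client sent; verify that against the current documentation for your load balancer type. The function name `client_ip_from_xff` and the sample addresses are hypothetical.

```python
from typing import Optional


def client_ip_from_xff(header_value: str) -> Optional[str]:
    """Return the client IP recorded by the load balancer, or None.

    Assumes the load balancer appended "<client-ip>,<load-balancer-ip>",
    so the trustworthy client address is the second-to-last entry.
    Anything earlier in the list was supplied by the client and may be spoofed.
    """
    parts = [p.strip() for p in header_value.split(",") if p.strip()]
    if len(parts) < 2:
        return None
    return parts[-2]


# Plain request: client address followed by the load balancer address.
print(client_ip_from_xff("203.0.113.7, 198.51.100.10"))
# Client sent its own X-Forwarded-For; the spoofed first entry is ignored.
print(client_ip_from_xff("10.1.1.1, 203.0.113.7, 198.51.100.10"))
```

Reading from the right-hand end is the design choice that matters: only the entries the load balancer appended can be trusted.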

Quick check

Which two products are proxy Network Load Balancers, and what do they do that a passthrough NLB cannot?
In the ALB resource chain, which resource owns the health check and the Cloud Armor policy?
Your backend service has backends in two regions but never overflows when one is overloaded. What did you forget to configure?
What NEG type fronts an on-prem origin behind GCP’s CDN and Cloud Armor?
True or false: a Google-managed SSL certificate becomes ACTIVE before you point DNS at the load balancer IP.

Answers

The external and internal proxy Network LBs. They terminate the TCP/SSL connection (enabling TLS offload and, for the external one, a global anycast TCP front end), whereas a passthrough NLB never terminates and so preserves the client IP but offers no offload or L7 features.
The backend service owns both the health check and the Cloud Armor security policy (as well as balancing mode, session affinity, timeouts, and Cloud CDN).
A balancing-mode capacity ceiling (--max-utilization, --max-rate*, or --max-connections*). Without a ceiling the LB never marks a region “full”, so it never overflows to the other region.
An internet NEG (INTERNET_FQDN_PORT or INTERNET_IP_PORT) attached to a global external ALB.
False. A Google-managed cert stays in PROVISIONING until the domain’s DNS resolves to the forwarding-rule IP; only then does it validate and go ACTIVE.

Exercise

Take a two-tier application: a public web front end and a private internal API the front end calls. Using the decision table, write down (a) which load balancer fronts the public web tier and why, including the network tier; (b) which load balancer the front end uses to reach the internal API and why; (c) the full resource chain you would create for the public LB; (d) where you would attach Cloud Armor and one rule you would add; and (e) if the API were re-platformed onto Cloud Run, exactly what changes in the internal LB’s backend (name the NEG type). Then sketch the gcloud commands for part (c) from memory and check them against the lab above.

Certification mapping

Associate Cloud Engineer (ACE): “Set up load balancing” — choosing and configuring the right load balancer, backend services, instance-group and serverless backends, and health checks; understanding global vs regional and external vs internal.
Professional Cloud Network Engineer (PCNE): the load-balancing domain in full — the entire LB family and selection criteria, the forwarding-rule-to-backend architecture, balancing modes and capacity, session affinity, Cloud Armor, hybrid/internet/PSC/serverless NEGs, and network tiers. This lesson plus its companion (gcp-global-external-application-load-balancer-deep-dive) cover the core of that domain.
Also relevant to Professional Cloud Architect (PCA) for designing resilient, globally distributed front ends.

Glossary

Application Load Balancer (ALB): an L7, HTTP(S)/gRPC, proxy load balancer that terminates connections and routes on host/path/headers.
Network Load Balancer (NLB): an L4 (TCP/UDP) load balancer; either passthrough (preserves client IP, no termination) or proxy (terminates TCP/SSL).
Proxy vs passthrough: a proxy terminates the connection (client IP hidden, L7/TLS features); a passthrough forwards packets (client IP preserved, lowest overhead).
Forwarding rule: the load balancer’s front end — the IP:port:protocol clients connect to; points at a target proxy or backend service.
Target proxy: terminates the connection for proxy LBs; holds the SSL certificate and SSL policy; references the URL map (ALB) or backend service (proxy NLB).
URL map: the L7 router that matches host/path/header/query and selects a backend service; also handles redirects and rewrites.
Backend service: the policy hub — health check, balancing mode, session affinity, timeouts, Cloud CDN, Cloud Armor, logging; groups the backends.
Backend: the actual endpoints, i.e. a managed instance group (MIG), a network endpoint group (NEG), or a Cloud Storage bucket (attached as a backend bucket).
Network endpoint group (NEG): a backend made of endpoints rather than instances — zonal, serverless, internet, hybrid, or PSC.
Health check: the probe that marks backends healthy/unhealthy; attached to the backend service; must be allowed through the firewall.
Balancing mode: how traffic is assigned and when a backend is “full” — UTILIZATION, RATE, or CONNECTION.
Session affinity: best-effort pinning of a client to a backend (CLIENT_IP, GENERATED_COOKIE, HEADER_FIELD, HTTP_COOKIE, or NONE).
Anycast IP: one IP address announced from many edge locations so users hit the nearest one; the global external LB’s front door.
Cloud Armor: Google’s WAF/DDoS service, attached as a security policy on the backend service of external L7/proxy LBs.
Network Tier: Premium (global, Google backbone — required for global LBs) vs Standard (regional, cheaper egress).
Maglev / Envoy: Google’s data planes — Maglev powers passthrough NLBs (connectionless, source-IP preserving); Envoy powers the regional/internal Application and proxy LBs.

Next steps

You can now name and assemble any Google Cloud load balancer and know which one each workload needs. To turn the flagship into a production front end — every forwarding-rule, URL-map, balancing-mode, hybrid-NEG, Cloud CDN, Cloud Armor and mTLS knob, wired end to end — read Engineering the Global External Application Load Balancer on GCP (gcp-global-external-application-load-balancer-deep-dive). After that, the course moves into containers with Google Kubernetes Engine, In Depth: Autopilot vs Standard, Node Pools, Networking & Security (gke-deep-dive-autopilot-standard-node-pools-networking), where the GKE Gateway and Ingress controllers provision the very load balancers you have just learned, driven by Kubernetes manifests.