A global web application looks deceptively simple from the browser: a URL, a fast page, a checkout button that works. Behind that calm surface is one of the harder distributed-systems problems in cloud architecture — serving users on five continents from a single logical application, keeping the data consistent enough to take money, and doing it without a global outage every time a region wobbles. This article is a complete, reusable GCP reference architecture for exactly that, built on the Global External Application Load Balancer, Cloud Run (with a GKE variant), Cloud Spanner, and Cloud CDN. It is written to scale down to a Series-B startup and up to a publicly listed enterprise without changing shape.
The business scenario
Picture a company that sells something online to a worldwide audience — a SaaS dashboard, a media subscription, a retail storefront, a booking platform. The specifics vary; the pressures do not. Three forces consistently push teams toward this exact architecture.
The first is latency tied directly to revenue. E-commerce and media teams that measure it consistently find that page latency and conversion move together: a customer in Singapore or São Paulo who waits 1.8 seconds for first contentful paint converts measurably worse than one served in 400 ms. When your application servers and database live in us-central1 and a third of your revenue comes from outside North America, you are quietly taxing your best growth markets with the speed of light. A single network round trip from Sydney to Iowa typically takes well over 150 ms before your code does anything, and a cold page load needs several of them.
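The physical floor on that Sydney to Iowa trip is easy to estimate. The sketch below is a minimal version that assumes a great-circle path and light in fiber travelling at roughly two-thirds of c, which is a common rule of thumb. Real cable routes are longer and add routing and queuing delay. The coordinates for Sydney and for Council Bluffs, Iowa (the location of us-central1) are approximations chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0
C_KM_PER_S = 299_792.458
FIBER_FRACTION_OF_C = 2 / 3  # rule-of-thumb propagation speed in optical fiber

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def min_rtt_ms(distance_km):
    """Lower bound on round-trip time: there and back at fiber speed, zero processing."""
    one_way_s = distance_km / (C_KM_PER_S * FIBER_FRACTION_OF_C)
    return 2 * one_way_s * 1000

sydney = (-33.87, 151.21)
council_bluffs = (41.26, -95.86)  # approximate location of us-central1
d = great_circle_km(*sydney, *council_bluffs)
print(f"{d:,.0f} km, best-case RTT {min_rtt_ms(d):.0f} ms")
```

This best case comes out somewhat under 150 ms. Measured round trips run higher, and each extra round trip during connection setup pays the cost again. That is the argument for terminating connections at a nearby edge.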
The second is the operational cost of regional sharding. The traditional answer is to stand up a full stack per region and shard customers geographically. That works until the day a European customer travels to the US, or a B2B account has offices on three continents, or compliance asks “show me this user’s data” and the answer is “which shard were they in last March?” From then on, cross-shard transactions, dual writes, and bespoke reconciliation jobs become a permanent tax on every feature the team ships, and most engineering orgs badly underestimate how large that tax grows.
The third is the demand for a single global brand experience that never fully goes down. Modern customers do not accept “the site is down for maintenance in your region.” A payments page that returns 500s during a regional GCP incident becomes an incident for the company’s revenue and its reputation. Boards now ask about RTO and RPO for the customer-facing tier the way they used to ask about it only for the back-office ERP.
This architecture solves all three at once. A single anycast IP fronts the application worldwide, so the closest Google edge terminates the connection. Stateless services run in multiple regions and scale to zero when idle, so a small company is not paying for a global footprint it is not using. And Cloud Spanner provides one logical, strongly consistent, horizontally scalable database spanning regions — eliminating sharding logic entirely while surviving the loss of an entire region with zero data loss (RPO = 0). The same blueprint serves a 5,000-user beta and a 50-million-user platform; only the instance sizing and region count change.
Architecture overview
Follow a single request from a user in Frankfurt and the design explains itself.
The browser resolves the application’s hostname to a single global anycast IP address, which is advertised from every Google edge point of presence. Because the address is anycast, the network routes the user to the nearest Google edge (Frankfurt, in this case), not to wherever the servers happen to live. TLS is terminated at that edge by the Global External Application Load Balancer. This is a single global resource (not one per region) that uses Google-managed certificates for the apex and wildcard hostnames. Wildcard coverage requires a Google-managed certificate issued through Certificate Manager with DNS authorization.
At the edge, the request meets two gatekeepers before it ever touches compute: the URL map and Cloud Armor. The URL map routes by path. Static and cacheable paths (/_next/static/*, images, JS/CSS bundles, public marketing pages) are served by Cloud CDN directly from the edge cache, so for a typical storefront most of the byte volume never travels to a region at all. A cache hit in Frankfurt is answered in Frankfurt. Cloud Armor screens traffic with WAF rules (the OWASP-tuned preconfigured rule sets), per-IP and per-token rate limits, and optional geo or bot-management policies. Its edge security policies filter requests before the cache lookup. Its backend security policies, which hold the WAF rules and rate limits, apply to requests that miss the cache and continue to the origin.
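Edge hit rates depend heavily on cache keys. If every `?utm_source=...` variant of the same asset is cached separately, each variant is a miss. The sketch below is a minimal version of the normalization described in the component table, and the list of tracking parameters is a hypothetical assumption. In Cloud CDN itself you would express this rule through the backend's cache key policy (query string include or exclude lists) rather than in application code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking parameters that never change the response body.
TRACKING_PARAMS = {"gclid", "fbclid"}
TRACKING_PREFIXES = ("utm_",)

def cache_key(url: str) -> str:
    """Drop tracking params and sort the rest so equivalent URLs share one key."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING_PARAMS and not k.startswith(TRACKING_PREFIXES)
    ]
    kept.sort()
    # Lowercase scheme and host; fragments never reach the server, so drop them.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, urlencode(kept), ""))
```

With this rule, `https://Shop.example.com/img/hero.jpg?w=800&utm_source=mail` and `https://shop.example.com/img/hero.jpg?w=800` share a single cache entry.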
Dynamic paths (/api/*, the authenticated app shell, the checkout flow) are routed by the URL map to a backend service. That backend service is the load balancer’s most important decision. It is backed by one serverless network endpoint group (NEG) per region, and each NEG points at the Cloud Run service deployed in that region: say, europe-west1, us-central1, and asia-southeast1. The global load balancer steers each request to the closest region with capacity. Serverless NEGs do not support health checks, so outlier detection is what ejects a region that starts failing and shifts its traffic to the next-nearest one. The Frankfurt user lands on Cloud Run in europe-west1: same continent, single-digit-millisecond regional hops.
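The steering behavior can be approximated in a few lines. The sketch below is deliberately simplified. It picks the lowest-latency region that is healthy and not saturated, and otherwise falls through to the next region. The real load balancer also weighs backend capacity and utilization. The region names come from the text, but the RTT figures are illustrative assumptions, not measurements.

```python
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    rtt_ms: float          # illustrative latency from the user's edge PoP
    healthy: bool = True   # False once outlier detection has ejected the region
    saturated: bool = False

def pick_region(regions):
    """Nearest region that is healthy and has headroom; None if every region is out."""
    candidates = [r for r in regions if r.healthy and not r.saturated]
    return min(candidates, key=lambda r: r.rtt_ms).name if candidates else None

# Hypothetical view from a Frankfurt edge.
regions = [
    Region("europe-west1", 8),
    Region("us-central1", 105),
    Region("asia-southeast1", 160),
]
```

This choice is made per request inside one global backend service, so failover never waits on a DNS TTL.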
The Cloud Run service runs the application’s stateless business logic. It reads configuration and secrets from Secret Manager and calls downstream services over the VPC via Direct VPC egress (or the older Serverless VPC Access connector). For the request that matters, it reads and writes the Cloud Spanner database. Spanner is the architectural keystone. A multi-region instance (for example, eur6 or nam-eur-asia1) presents itself as one database with one schema and one connection endpoint, while replicating synchronously across regions using Paxos. Writes are coordinated by leader replicas in the instance’s default leader region, so place that region near your heaviest write traffic. A write counts as committed once a majority of voting replicas acknowledge it, and only then does the API return success. The Frankfurt user’s order is durable against the loss of an entire region the instant they see the confirmation.
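Why a committed write survives a region loss comes down to majority arithmetic over voting replicas. The sketch below is minimal and assumes a hypothetical layout of five voters across three regions: two voting replicas in each of two regions, plus a witness in a third. Check the actual replica layout of the instance configuration you choose, because layouts differ between configurations.

```python
# Hypothetical voting-replica layout: region -> number of voting replicas.
VOTERS = {"region-a": 2, "region-b": 2, "region-c-witness": 1}

def has_quorum(voters_by_region, available_regions):
    """True if the reachable voting replicas form a strict majority of all voters."""
    total = sum(voters_by_region.values())
    reachable = sum(n for r, n in voters_by_region.items() if r in available_regions)
    return reachable * 2 > total

def survives_any_single_region_loss(voters_by_region):
    """Commits need a majority; check that one remains after losing any one region."""
    regions = set(voters_by_region)
    return all(has_quorum(voters_by_region, regions - {lost}) for lost in regions)
```

Every acknowledged commit already sits on a majority, and any two majorities overlap. Any surviving majority therefore includes at least one replica that holds the commit, which is where RPO = 0 comes from.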
So the end-to-end flow is: DNS lookup → anycast IP → nearest Google edge → TLS termination → URL map and Cloud Armor → (Cloud CDN cache hit) OR (regional Cloud Run via serverless NEG) → Spanner multi-region. Asynchronous work (emails, search indexing, analytics, fan-out) is published to Pub/Sub and processed by separate Cloud Run worker services or jobs, which keeps the synchronous request path short. Telemetry from every hop flows into Cloud Logging, Cloud Monitoring, and Cloud Trace.
The diagram, described in words: at the top, users spread across the globe; below them, a single anycast VIP feeding one global front end (Cloud Armor + Cloud CDN + URL map). From the URL map, one arrow goes left to the CDN/edge cache for static content, and one goes down to a backend service that fans out to three regional Cloud Run boxes. All three regional boxes point downward to a single Spanner cylinder, drawn straddling all three regions to signify one logical database. Off to the side, Pub/Sub and worker jobs hang off the Cloud Run tier, and a monitoring plane underlies everything.
Component breakdown
| Component | Role in this architecture | Key configuration choices |
|---|---|---|
| Global External Application Load Balancer | The single global entry point: anycast IP, L7 routing, TLS termination, cross-region failover. | One global backend service (not regional). Google-managed cert. HTTP/3 (QUIC) enabled. EXTERNAL_MANAGED scheme. Outlier detection for automatic regional drain (serverless NEGs do not support health checks; GKE NEG backends use health checks). |
| Cloud CDN | Edge caching of static and cacheable dynamic content; absorbs the bulk of traffic at the PoP. | Enabled on the static backend; CACHE_ALL_STATIC or custom cache keys. Negative caching, stale-while-revalidate, and cache-key normalization to strip tracking query params. Signed URLs/cookies for private media. |
| Cloud Armor | WAF and L7 DDoS defense at the edge, before compute. | Preconfigured OWASP rules (SQLi, XSS, LFI/RFI), rolled out in preview mode first and then enforced. Per-IP rate-based bans. Adaptive Protection for volumetric anomalies. Optional bot management and geo rules. |
| Cloud Run | Stateless, autoscaling, scale-to-zero compute for the app and API tier; deployed in multiple regions. | One service per region behind the global LB via serverless NEGs. min-instances for the latency-sensitive region(s), max-instances as a cost ceiling. Concurrency tuned (e.g. 80) to the workload. Direct VPC egress. |
| GKE (Autopilot) — variant | Drop-in replacement for Cloud Run when you need sidecars, gRPC streaming, stateful workloads, or a service mesh. | Regional Autopilot clusters as container-native (NEG) backends of the same global LB. Anthos Service Mesh for mTLS. Multi Cluster Ingress / gateway for unified routing. |
| Cloud Spanner | The single global, strongly consistent, horizontally scalable relational database; eliminates sharding. | Multi-region config (e.g. eur6, nam-eur-asia1). Autoscaler on processing units. Interleaved tables + well-distributed PKs (UUID/hash prefix) to avoid hotspots. Stale reads (exact or bounded staleness) for read-heavy paths. |
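| Secret Manager | Central store for DB credentials, API keys, signing keys. | Automatic replication, or user-managed replication to pin secrets to chosen regions. Accessed via the service’s runtime service account; rotation enabled. No plaintext secrets in env var definitions or images; Cloud Run can reference Secret Manager versions directly. |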
| Pub/Sub | Decouples async work (email, indexing, webhooks, analytics) from the request path. | Global by default; push to Cloud Run workers or pull. Dead-letter topics, plus exactly-once delivery (pull subscriptions only) where needed. |
| Cloud DNS | Authoritative DNS publishing the single anycast A/AAAA record. | A/AAAA → global LB IP. DNSSEC on. Short TTLs only if you need fast failover to a secondary front end. |
| Operations suite | Logging, Monitoring, Trace, Error Reporting, Profiler — the observability plane. | SLO-based alerting on the LB and Spanner. Trace context propagated edge→Run→Spanner. Log-based metrics for business KPIs. |
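The per-IP rate-based ban in the Cloud Armor row is configured declaratively, but a simplified model makes its behavior easy to reason about. The sketch below assumes a fixed counting window per key and a fixed ban duration. It illustrates the semantics under those assumptions and is not Cloud Armor's internal algorithm. The 600-per-minute threshold and the 300-second ban are example values.

```python
from dataclasses import dataclass, field

@dataclass
class RateBasedBan:
    """Simplified per-key rate-based ban: fixed counting window plus a ban period."""
    threshold: int   # requests allowed per window (e.g. 600)
    window_s: float  # window length in seconds (e.g. 60)
    ban_s: float     # ban length once the threshold is exceeded
    _windows: dict = field(default_factory=dict, init=False, repr=False)
    _banned_until: dict = field(default_factory=dict, init=False, repr=False)

    def allow(self, key: str, now: float) -> bool:
        if self._banned_until.get(key, float("-inf")) > now:
            return False  # key is still serving its ban
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0  # window expired: start counting afresh
        count += 1
        self._windows[key] = (start, count)
        if count > self.threshold:
            self._banned_until[key] = now + self.ban_s
            return False
        return True
```

In this model, a client that sends its 601st request inside one minute is refused for the next five minutes, while other IPs are unaffected.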
Two component choices deserve emphasis because they are where teams most often go wrong.
The backend service is global, singular, and the seam of the whole design. It is tempting to create a load balancer per region and stitch them with DNS. Do not. A single global backend service with regional serverless NEGs is what gives you anycast entry, automatic capacity-aware steering, and instant cross-region failover with no DNS TTL to wait out. The intelligence lives in one object.
Spanner’s schema is a performance decision, not just a data-modeling one. Spanner scales by splitting data across servers on primary-key ranges. A monotonically increasing key (a timestamp or auto-increment ID) funnels all new writes to one split — the classic hotspot that makes a benchmark look terrible and gets blamed on “Spanner being slow.” Use UUIDv4, a hashed prefix, or bit-reversed sequences for high-write tables, and interleave child rows (order line items under an order) so related data is co-located and joins stay local. Get this right on day one; it is painful to change once you have data.
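To see why monotonic keys create hotspots, model the key space as four equal ranges, each standing in for a split, and count where 1,000 consecutive IDs land. The bit reversal below illustrates the idea behind Spanner's bit-reversed sequences but is not Spanner's exact implementation. It reverses the low 63 bits so that the results stay positive INT64 values.

```python
def bit_reverse_63(n: int) -> int:
    """Reverse the low 63 bits of n, so consecutive inputs land far apart."""
    if not 0 <= n < 1 << 63:
        raise ValueError("n must fit in 63 bits")
    out = 0
    for _ in range(63):
        out = (out << 1) | (n & 1)
        n >>= 1
    return out

def split_histogram(keys, splits: int = 4):
    """Count keys per equal-width range of [0, 2**63), a stand-in for Spanner splits."""
    counts = [0] * splits
    width = (1 << 63) // splits
    for k in keys:
        counts[min(k // width, splits - 1)] += 1
    return counts

sequential = range(1, 1001)
print(split_histogram(sequential))                             # [1000, 0, 0, 0]
print(split_histogram(bit_reverse_63(n) for n in sequential))  # [250, 250, 250, 250]
```

With monotonic keys, every new write lands in the first range. After bit reversal, the same 1,000 IDs spread evenly across all four ranges.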
Implementation guidance
Infrastructure as code. Provision everything with Terraform using the Google provider; nothing here should be click-ops. The dependency order that keeps terraform apply clean is:
- Foundation: project, VPC, subnets per region, Cloud NAT, firewall rules, and the org/IAM scaffolding. Many teams use the Cloud Foundation Toolkit (`terraform-google-modules`) for this layer so it matches Google’s security baseline.
- Data: the Spanner instance and database. Apply schema with the native Terraform `google_spanner_database` `ddl` list, or keep DDL under a migration tool (Liquibase has a Spanner extension; `wrench` is the lightweight Spanner-native option). Treat schema changes as versioned migrations in CI.
- Compute: Cloud Run services per region (`google_cloud_run_v2_service`), each with its runtime service account, Direct VPC egress, and `min`/`max` instances. For the GKE variant, regional Autopilot clusters and their NEG-backed services.
- Edge: the serverless NEGs (`google_compute_region_network_endpoint_group`), the global backend service binding all regional NEGs, the URL map, the CDN-enabled static backend bucket/service, the Cloud Armor security policy, the managed certificate, the target HTTPS proxy, and the global forwarding rule.
- DNS & secrets: the Cloud DNS records pointing at the global IP, and Secret Manager entries (values injected out-of-band, never committed).
Keep state remote in a GCS backend with object versioning and state locking, and split the layers into separate state files (or Terragrunt stacks) so an edge change cannot accidentally destroy the Spanner instance. The reverse of the create order is your safe destroy order. (If your org standardizes elsewhere, the same topology is expressible in Pulumi or Config Connector; Deployment Manager is legacy and not recommended for new builds.)
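The layering rule can be checked mechanically. The sketch below is minimal and assumes that each of the five layers lives in its own state and depends only on the layers listed; the dependency of compute on data is an assumption. It uses Python's standard `graphlib` module (3.9+) to derive the apply order and reverses that order for teardown.

```python
from graphlib import TopologicalSorter

# Layer -> layers it depends on (assumed to mirror the order described in the text).
LAYER_DEPS = {
    "foundation": set(),
    "data": {"foundation"},
    "compute": {"foundation", "data"},
    "edge": {"compute"},
    "dns_secrets": {"edge"},
}

def apply_order(deps):
    """Every layer appears after all of its dependencies."""
    return list(TopologicalSorter(deps).static_order())

def destroy_order(deps):
    """Tear down dependents first: the reverse of a valid apply order."""
    return list(reversed(apply_order(deps)))
```

If someone introduces a circular dependency between layers, `static_order()` raises `graphlib.CycleError`. That gives CI a cheap way to fail before any `terraform apply` runs.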
Networking wiring. Cloud Run reaches private resources through Direct VPC egress, which is preferred over the older Serverless VPC Access connector because it has lower latency and no connector instances to size. Egress to the internet for third-party APIs goes through Cloud NAT with reserved static addresses, so you present stable, allowlistable IPs. For Cloud Run, this means setting Direct VPC egress to route all traffic through the VPC, not only private ranges. Spanner is reached over Google’s private network via its API endpoint. Lock it down with VPC Service Controls so the database cannot be exfiltrated to a project outside your perimeter even if a credential leaks. Keep the entire backend free of public ingress. The only public surface is the global load balancer’s VIP, and Cloud Run services are set to allow ingress from “Internal and Cloud Load Balancing” only.
Identity wiring. Every Cloud Run service and GKE workload runs as a dedicated, least-privilege service account: one per service, never the default compute SA. Grant roles/spanner.databaseUser (not databaseAdmin) on the specific database, and roles/secretmanager.secretAccessor on the specific secrets. For the GKE variant, bind Kubernetes service accounts to Google service accounts with Workload Identity Federation for GKE so that no JSON keys ever exist. End-user authentication is handled in the app tier, typically by Identity Platform (the productized Firebase Auth) or your existing OIDC IdP. The front door can additionally enforce Identity-Aware Proxy (IAP) on internal/admin paths for a Zero-Trust, identity-aware perimeter. Deployment pipelines in your CI (GitHub Actions/GitLab) authenticate with Workload Identity Federation, and humans deploy with their own Google identities. Either way, no long-lived keys exist.
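Least privilege is easiest to maintain when CI rejects violations automatically. The sketch below is a minimal policy-as-code check over a hypothetical, simplified list of (member, role) bindings, such as a list extracted from a Terraform plan. The rule set is an assumption drawn from the guidance above, not an official Google check. The default Compute Engine service account has the form `PROJECT_NUMBER-compute@developer.gserviceaccount.com`.

```python
# Roles the guidance rules out for application workloads (assumed rule set).
FORBIDDEN_ROLES = {"roles/owner", "roles/editor", "roles/spanner.databaseAdmin"}
DEFAULT_COMPUTE_SA_SUFFIX = "-compute@developer.gserviceaccount.com"

def iam_violations(bindings):
    """Return human-readable problems for a list of (member, role) pairs."""
    problems = []
    for member, role in bindings:
        if member.startswith("serviceAccount:") and member.endswith(DEFAULT_COMPUTE_SA_SUFFIX):
            problems.append(f"{member}: default compute service account in use")
        if role in FORBIDDEN_ROLES:
            problems.append(f"{member}: over-broad role {role}")
    return problems
```

A CI step would fail the pull request whenever `iam_violations(...)` returns a non-empty list.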
Deploy and release. Build images in Cloud Build or your CI, push to Artifact Registry, and roll out with Cloud Deploy across the regions. Cloud Run’s revision-based traffic splitting gives you canary and blue-green for free: send 5% to the new revision, watch SLOs, then ramp. Because the database is a single Spanner instance shared by all regions and revisions, schema migrations must be backward-compatible (expand-then-contract): add columns/tables, deploy code that tolerates both shapes, backfill, then remove the old shape in a later release. Never ship a breaking DDL in lockstep with code across regions.
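During the expand phase, application code has to read both shapes. The sketch below is minimal and assumes a hypothetical migration that splits a `FullName` column into `GivenName` and `FamilyName`. Rows are plain dicts here, so the example runs without a database.

```python
def display_name(row: dict) -> str:
    """Prefer the new columns once backfilled; fall back to the old column otherwise."""
    given, family = row.get("GivenName"), row.get("FamilyName")
    if given or family:
        return " ".join(part for part in (given, family) if part)
    return row.get("FullName") or ""

def to_write(given: str, family: str) -> dict:
    """Dual-write both shapes until the contract release stops writing FullName."""
    return {"GivenName": given, "FamilyName": family, "FullName": f"{given} {family}".strip()}
```

The contract release then removes the `FullName` fallback and the dual-write. Only after that release ships everywhere does the DDL that drops the column go out.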
Enterprise considerations
Security and Zero Trust. The design is defense-in-depth with a single hardened ingress. Cloud Armor provides WAF and DDoS at the edge; the global LB terminates TLS with modern ciphers and HTTP/3; Cloud Run/GKE accept traffic only from the load balancer. Inside the perimeter, VPC Service Controls draw a boundary around Spanner, Secret Manager, and GCS so data cannot egress to an untrusted project. Service-to-service calls are authenticated with service-account identity (and mTLS via Anthos Service Mesh in the GKE variant). Customer data in Spanner is encrypted at rest (optionally with CMEK in Cloud KMS for regulated workloads), and IAP plus context-aware access enforce identity- and device-based access on administrative surfaces — the core Zero-Trust posture: never trust the network, always verify identity.
Cost optimization. This architecture is economical because it is demand-shaped. Cloud Run scales to zero, so non-production environments and low-traffic regions cost almost nothing when idle. With request-based billing, you pay per request plus CPU and memory time, rounded up to the nearest 100 ms. Cloud CDN offloads the majority of bytes to the edge, cutting both egress and compute. The two line items to watch are Spanner and inter-region egress. Multi-region Spanner is the floor cost of the design. Start at a small node/PU count, turn on the Spanner autoscaler to track load, and use committed-use discounts once your baseline is known. Trim egress with aggressive CDN caching and by keeping chatty service-to-service traffic in-region. A practical rule: start with regional Spanner plus multi-region Cloud Run if the business can tolerate the database being unavailable during a full-region outage. Graduate to multi-region Spanner when zero-RPO survival of a region loss justifies the premium.
Scalability. Each tier scales independently and horizontally. The global LB is effectively unbounded. Cloud Run scales out per region to its max-instances ceiling; raise the ceiling and add regions to grow. Spanner scales by adding processing units with no downtime and no re-sharding — this is its signature property and the reason it anchors the design. Push read-heavy workloads onto stale reads (e.g. 10–15 s staleness) to serve from the nearest replica without a leader round-trip, dramatically increasing read throughput and lowering latency for feeds, catalogs, and dashboards.
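With the Python client library (`google-cloud-spanner`), the choice between a strong read and a stale read is an argument to `Database.snapshot()`. The sketch below is minimal. It assumes a `database` handle obtained from `spanner.Client().instance(...).database(...)` and a hypothetical `Products` table. The classification of which paths tolerate staleness is also an assumption, and the 15-second value follows the range above.

```python
import datetime

STALE_OK_PATHS = {"catalog", "bestsellers"}  # hypothetical read-heavy paths
CATALOG_STALENESS = datetime.timedelta(seconds=15)

def open_snapshot(database, path_kind: str):
    """Strong read by default; exact-staleness read for paths that tolerate it."""
    if path_kind in STALE_OK_PATHS:
        # Any replica at least this current can serve the read,
        # so it does not need a round trip to the leader region.
        return database.snapshot(exact_staleness=CATALOG_STALENESS)
    return database.snapshot()  # no staleness bound: a strong read

def list_products(database):
    with open_snapshot(database, "catalog") as snapshot:
        return list(snapshot.execute_sql("SELECT ProductId, Name FROM Products"))
```

Cart and checkout paths fall through to the strong branch, so only the catalog trades freshness for latency.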
Reliability and DR (RTO/RPO). Multi-region Spanner offers a 99.999% availability SLA and, critically, RPO = 0. Synchronous Paxos replication means a committed write survives the total loss of any single region with no data loss. RTO is near zero for a regional failure. Spanner transparently moves leadership to a surviving region, and the global LB’s outlier detection ejects the failed region’s Cloud Run backends, typically within seconds, so traffic reroutes automatically without human action. There is no failover runbook to execute for the common case. For correctness and corruption recovery, enable Spanner backups plus point-in-time recovery (PITR, with a version retention period of up to 7 days) so you can recover from a bad deploy or logical error. Run regular DR game-days that kill a region in staging to validate that the automatic behavior is real.
Observability. Instrument the full path. The LB exports request/latency/error metrics per backend and per region; Spanner exposes CPU utilization, latency percentiles, and lock/abort stats; Cloud Run exposes request, instance-count, and cold-start metrics. Define SLOs in Cloud Monitoring (e.g. 99.9% of API requests < 300 ms, served from the user’s region) and alert on burn rate, not raw thresholds. Propagate trace context from edge through Cloud Run to Spanner with Cloud Trace / OpenTelemetry so a slow checkout can be attributed to a specific span — network, cold start, or a hot Spanner key. Use Error Reporting and Profiler to close the loop in production.
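Burn rate is the observed bad-event ratio divided by the error budget, which is 1 minus the SLO target. The sketch below is a minimal version of the multiwindow check described in Google's SRE Workbook. In that scheme, a burn rate of 14.4 sustained over one hour, confirmed by a short window such as five minutes, means 2% of a 30-day budget is gone. The windows and threshold are example values to tune. Cloud Monitoring SLO alerting policies can compute burn rate for you; this code just makes the arithmetic explicit.

```python
SLO_TARGET = 0.999  # e.g. 99.9% of API requests under 300 ms

def burn_rate(bad: int, total: int, slo: float = SLO_TARGET) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if total == 0:
        return 0.0
    return (bad / total) / (1 - slo)

def should_page(long_window, short_window, threshold: float = 14.4) -> bool:
    """Page only if both the long and the short (bad, total) windows burn fast."""
    return burn_rate(*long_window) >= threshold and burn_rate(*short_window) >= threshold
```

Requiring the short window to agree lets the alert reset quickly once the problem stops, rather than paging on a spike that has already passed.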
Governance. Enforce the perimeter with Organization Policy constraints: block public IPs on VMs, restrict resource locations to approved regions for data residency, and require CMEK where mandated. Use folders and projects to separate prod/non-prod and to scope IAM blast radius. Centralize findings in Security Command Center. Tag resources with labels for cost allocation and FinOps reporting. Gate every infra change through PR review on the Terraform repo, with policy-as-code (OPA, or gcloud beta terraform vet) in CI.
Reference enterprise example
Meridian Threads is a fictional direct-to-consumer apparel brand. Founded in Bengaluru, it now sells in 38 countries with three demand centers: India/SE Asia, Western Europe, and North America. At Series C, the engineering team of 22 hit a wall: their single-region stack in asia-south1 gave a great experience in Mumbai and a sluggish one everywhere else, and a 40-minute regional networking incident during a flash sale cost them an estimated ₹1.1 crore (~$130k) in abandoned carts. The board mandated a “no global-blast-radius” customer tier with a defined RTO/RPO. They adopted this architecture.
What they built. Cloud Run services in asia-southeast1, europe-west1, and us-central1, all behind one Global External Application Load Balancer on a single anycast IP. Cloud CDN fronts the storefront — product images, the Next.js static bundle, and category pages — and absorbs ~88% of total bytes at the edge. Cloud Armor runs OWASP rules plus a 600-requests-per-minute per-IP rate-ban that quietly defeats the credential-stuffing they used to fight manually. The catalog, cart, orders, and inventory live in a single Spanner multi-region instance (nam-eur-asia1), starting at a modest processing-unit count with the autoscaler enabled. Order-confirmation emails, search re-indexing, and the data-warehouse feed go through Pub/Sub to Cloud Run workers, keeping checkout latency clean.
Decisions and trade-offs they made.
- They debated regional sharding to save money and rejected it: a meaningful slice of orders involve customers who travel or have addresses in multiple regions, and the sharding logic would have slowed every future feature. One global Spanner database removed the question entirely.
- They set `min-instances=2` only in `asia-southeast1` and `europe-west1` (their traffic centers) to kill cold starts where it matters, and left `us-central1` scaling from zero off-peak: a deliberate latency-vs-cost trade.
- They used 15-second stale reads for the product catalog and bestseller feeds (read-heavy, tolerant of slight staleness) while keeping cart and checkout on strong reads. This roughly tripled catalog read throughput per processing unit and cut catalog latency for distant users.
- They put CMEK on Spanner because they store customer addresses and partial order history, satisfying their EU customers’ data-handling expectations, and pinned non-EU PII processing to approved regions via Org Policy.
The outcome. Median first-contentful-paint for European and SE-Asian shoppers dropped from ~1.9 s to ~520 ms, and checkout p95 latency fell by more than half outside India. Six weeks after cutover, GCP suffered a real outage in one of their three regions during business hours. The global LB ejected that region within seconds, and Spanner never lost a write. Customer impact: none. Pages served: every one. Orders lost: zero. The on-call engineer found out from the monitoring channel, not from customers. Conversion in the EU and US markets rose enough in the following quarter that the multi-region Spanner premium, their largest new line item, paid for itself several times over. The 22-person team operates this global footprint without a dedicated DBA, because there is no sharding to babysit and no failover runbook to rehearse for the common case.
When to use it
Use this architecture when you have a genuinely global (or fast-globalizing) user base, your data needs strong consistency (anything touching money, inventory, or identity), and the business has set an aggressive RTO/RPO for the customer-facing tier. It shines for e-commerce, global SaaS, media/subscription, marketplaces, and any “single brand, every continent” product. The scale-to-zero compute and pay-per-use edge mean a small company can adopt the shape early and grow into it, which is the whole point of a reference architecture.
Be honest about the trade-offs. The dominant one is Spanner’s cost and model. Multi-region Spanner has a real monthly floor that a hobby project cannot justify. Spanner is also a relational database with its own idioms: interleaving, key design, and the absence of some PostgreSQL niceties, although the PostgreSQL interface narrows this gap considerably. If your data is naturally a single-region workload, or you can tolerate an RPO measured in seconds, a regional Cloud SQL (with cross-region read replicas and a documented failover) is far cheaper and simpler. Use it, and revisit Spanner when global write-survivability actually becomes a requirement. Likewise, if your application is read-mostly with no transactional writes, you may not need Spanner at all. A CDN over a regional database, or AlloyDB for Postgres-heavy analytical-transactional workloads, can be the better fit.
Anti-patterns to avoid. Do not build a load balancer per region and glue them with DNS — you lose anycast entry and instant failover, and you inherit DNS TTL as your RTO floor. Do not put a monotonic primary key on a high-write Spanner table; you will hotspot a single split and conclude, wrongly, that the database is slow. Do not route static assets through Cloud Run — that is what Cloud CDN is for, and skipping it inflates both cost and latency. Do not ship breaking schema migrations in lockstep with multi-region code rollouts; always expand-then-contract. And do not grant services the default compute service account or databaseAdmin — least privilege per service is the baseline, not an enhancement.
Alternatives at the edges. For workloads that need rich service-mesh features, sidecars, or long-lived gRPC streams, swap Cloud Run for the GKE Autopilot variant described above: same global LB, same Spanner, different compute substrate. For a simpler, smaller global app whose data fits a document model and does not need relational SQL, Firestore in a multi-region location behind the same front end is a lighter-weight data tier. It is strongly consistent, but it trades away Spanner’s relational schema and SQL. If you are multi-cloud or portability-constrained, the compute and edge tiers map cleanly to other providers. Spanner’s globally consistent, horizontally scalable write capability, however, is the piece that is genuinely hard to replicate, and it is usually the reason teams choose GCP for this pattern in the first place.