Architecture Design Patterns

The Google Cloud Architecting Ladder: From a Static Site to Multi-Region Global

The most expensive mistake I see in Google Cloud architecture reviews is not under-engineering. It is over-engineering — a four-person team running a global, active-active estate on Cloud Spanner and multi-region Cloud Run to serve an internal tool that a few hundred staff touch during office hours, haemorrhaging cost and operational toil it has no capacity to sustain. The second most common mistake is its mirror image: a single Compute Engine VM in one zone quietly running the checkout flow for a business that loses serious money per hour when it falls over. Both teams skipped the only question that matters in architecture: what do the requirements actually demand, and what is the cheapest design that meets them with margin to spare?

Architecture on Google Cloud is not a catalogue of impressive services. It is the disciplined act of climbing exactly as high as the requirements force you, and not one rung higher. So this lesson teaches architecture as a ladder — six designs for the same application, starting from a static site that costs almost nothing and ending at a global active-active system engineered to transact through the loss of an entire Google Cloud region. Each rung adds a specific, named capability (durable global edge delivery, a contractual SLA, event-driven elasticity, independent team scaling, regional disaster recovery, regional failure tolerance with zero data loss) and each addition has a price in money, complexity, and operational burden. The skill this builds is reading a set of requirements — RTO, RPO, scale, budget, team shape, compliance — and landing on the right rung.

We will use the lens of the Google Cloud Architecture Framework’s Reliability pillar throughout, because every step up this ladder is a deliberate trade between the framework’s pillars: you spend Cost-Optimisation and Operational-Excellence currency to buy Reliability and Performance. The two geographic rungs at the top land precisely on the territory of the multi-region DR & resilience reference architecture and the global web application reference architecture — so by the time you arrive there, every concept (cross-region replica, dual-region storage, the global load balancer’s single anycast front door, Spanner’s external consistency, composite SLA) will already be familiar, because you watched the requirements force it into existence.

Learning objectives

By the end of this lesson you will be able to:

Prerequisites & where this fits

This is an Advanced lesson in the Architecture & Design Mastery module. You will get the most from it if you have already met the Google Cloud Architecture Framework: Reliability pillar (SLIs/SLOs, error budgets, redundancy across failure domains, graceful degradation, capacity and quota planning), have a working mental model of high availability versus disaster recovery and RTO/RPO, and understand the three-tier web application on GCP (VPC subnets, a load balancer in front of a compute tier, Cloud SQL behind it). A passing familiarity with the core GCP compute and data services (Compute Engine, Cloud Storage, Cloud Run, GKE, Cloud SQL, Firestore, Spanner, Pub/Sub) helps, but each is reintroduced in context.

Where it fits in the ladder of learning: the Architecture-Framework pillars taught you the value system, the troubleshooting lessons taught you to keep a running system alive, and this lesson shows you how to design one — requirements in, the right rung out, across a realistic progression. It is the bridge from operating Google Cloud to architecting on it.

A note on the running example: every rung serves the same application — “ShopKart”, a product catalogue with a shopping basket and checkout. Keeping the app constant is the whole point. It isolates the one variable that actually changes between these designs: how much failure the business can tolerate, and what it will pay to tolerate less.

A word on the costs below. The figures are deliberately rough, order-of-magnitude monthly estimates for a small-to-moderate workload (a few hundred GB of data, low-millions of requests per month), at indicative pricing, to teach the shape of the cost curve — they are not quotes. Real numbers depend on region, machine type, committed-use and sustained-use discounts, and network egress. Always model your own in the Google Cloud Pricing Calculator. The lesson is in the ratios between rungs, not the absolute figures — and the biggest hidden line item as you climb is almost always inter-region and internet egress, which most first estimates forget.

How to read the requirement axes

Every rung is described against the same set of axes. Internalise these — they are the vocabulary of an architecture decision, and they are exactly what a Professional Cloud Architect case study hands you before asking for a design.

| Axis | What it asks | Why it drives design |
| --- | --- | --- |
| RTO (Recovery Time Objective) | After a failure, how long until service is restored? | Drives redundancy: hours allows backup & restore; minutes forces a warm standby; near-zero forces active-active. |
| RPO (Recovery Point Objective) | How much data can you afford to lose? | Drives replication: hours allows scheduled backups; near-zero forces continuous/synchronous replication or multi-region writes. |
| Scale | Peak concurrent load and its variability (steady vs spiky)? | Drives elasticity and the compute model (serverless vs managed VMs vs orchestrated containers). |
| Availability target (SLA) | What uptime % must you promise, and is it composite? | Drives zonal/regional/global redundancy; each “nine” is roughly an order of magnitude harder and dearer. |
| Budget | Capital and run-rate ceiling? | The hard constraint. Caps how high you can climb regardless of desire. |
| Team topology | One team or many? What is their operational maturity? | A microservices rung needs many autonomous teams; a small team should stay on managed/serverless. |
| Compliance / data residency | Regulatory constraints on where data lives and how DR is proven? | Can force a region choice or a provable multi-region DR design irrespective of pure availability maths. |

Keep this table in your head as you read each rung. The design is always a response to a specific movement in these axes — never an aesthetic preference and never a CV-building exercise.
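As a rough illustration of how the RTO and RPO rows turn into a design choice, the sketch below maps a recovery target onto the strategies named in the table. The function names and the cut-offs (one hour, one minute, exactly zero) are assumptions chosen for illustration, not values taken from any Google Cloud document; the middle RPO band reflects the asynchronous replication discussed later at Rung 5.

```python
from datetime import timedelta


def redundancy_for_rto(rto: timedelta) -> str:
    """Map a Recovery Time Objective to the redundancy style it forces.

    The one-hour and one-minute cut-offs are illustrative assumptions.
    """
    if rto >= timedelta(hours=1):
        return "backup and restore"
    if rto >= timedelta(minutes=1):
        return "warm standby"
    return "active-active"


def replication_for_rpo(rpo: timedelta) -> str:
    """Map a Recovery Point Objective to the replication style it forces.

    The one-hour cut-off is an illustrative assumption.
    """
    if rpo >= timedelta(hours=1):
        return "scheduled backups"
    if rpo > timedelta(0):
        return "continuous asynchronous replication"
    return "synchronous replication or multi-region writes"


if __name__ == "__main__":
    print(redundancy_for_rto(timedelta(minutes=15)))
    print(replication_for_rpo(timedelta(0)))
```

The point of writing it down is that the answer falls out of the numbers: change the RTO from four hours to fifteen minutes and the design changes, whatever anyone's preferences are.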

A quick word on Google Cloud’s geography, because the whole ladder turns on it. A region is a physical location (e.g. europe-west1) containing multiple, isolated zones — each one or more discrete data centres with independent power, cooling and networking, interconnected by low-latency links. Spreading across zones buys you data-centre-fault tolerance within a region (Rungs 2–4). Spreading across regions buys you tolerance to the loss of an entire region (Rungs 5–6). They are different failures at different price points, and confusing the two is the single most common architecture error this lesson exists to prevent.
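To see why spreading across failure domains pays off, a back-of-the-envelope calculation helps. The sketch below uses the standard formula for redundant copies where any one surviving copy is enough. It assumes failures are independent, which real zones only approximate, and the 99% figure in the example is made up for illustration rather than a Google Cloud SLA.

```python
def availability_across_domains(single: float, copies: int) -> float:
    """Availability of `copies` redundant replicas when any one suffices.

    Assumes independent failures: the service is down only when every
    copy is down at the same time.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be a probability between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    # Hypothetical component that is up 99% of the time in one zone.
    for n in (1, 2, 3):
        print(n, round(availability_across_domains(0.99, n), 6))
```

The same arithmetic applies at zone and at region granularity. What differs is which failure each copy protects against, and what each extra copy costs.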

Google Cloud has one more property that shapes this ladder, and it is more pronounced here than on most other platforms: a genuinely global control and data plane for several key services. The Global External Application Load Balancer exposes a single anycast IP served from Google’s edge in every region at once, so there is no per-region load balancer to fail over. Cloud Storage offers multi-region and dual-region buckets that replicate for you. And Cloud Spanner offers a horizontally-scalable relational database with synchronous multi-region replication and external consistency. These global primitives are why the top of the GCP ladder can reach zero-RPO active-active more cleanly than most platforms, and why “global by default” is a temptation you must still resist when the requirement does not demand it.

Rung 1 — Static site (Cloud Storage + Cloud CDN + global HTTPS load balancer)

Scenario & requirements. ShopKart begins as an MVP: a catalogue and a marketing front, with the dynamic basket bolted on later. A two-person startup needs it online to validate the idea. Traffic is a trickle and unpredictable: it could be 10 visitors, or 10,000 if a post goes viral. RTO: hours is fine (a hobbyist-grade promise). RPO: the content lives in version control, so “data loss” barely applies. Availability: best-effort, though as we will see this rung quietly delivers far better. Budget: as close to zero as possible. Team: two generalist developers, no operations function.

The design. A single-page application (React/Vue) or a static-site-generator build is compiled to static files and stored in a Cloud Storage bucket. A Global External Application Load Balancer sits in front with a backend bucket, and Cloud CDN is enabled on that backend so assets are cached at Google’s edge in hundreds of locations worldwide. The load balancer terminates TLS with a free Google-managed SSL certificate and serves everything over HTTPS on a single global anycast IP. Cloud DNS hosts the zone and points the apex domain at that IP. The bucket stays private, reachable only through the load balancer’s backend bucket. Deploys are a gcloud storage rsync (or gsutil rsync) plus a CDN cache invalidation, trivially wired into CI/CD with Cloud Build.
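The deploy step described above can be sketched as a small helper that builds the two command lines a CI job would run. This is a minimal illustration, not a definitive pipeline: the build directory, bucket name and URL map name are hypothetical placeholders, and the helper only constructs the commands rather than executing them.

```python
import shlex


def build_deploy_commands(build_dir: str, bucket: str, url_map: str) -> list[list[str]]:
    """Return the argv lists for a sync to the bucket and a CDN invalidation.

    All three arguments are placeholders supplied by the caller.
    """
    sync = [
        "gcloud", "storage", "rsync",
        build_dir, f"gs://{bucket}",
        "--recursive",
    ]
    invalidate = [
        "gcloud", "compute", "url-maps", "invalidate-cdn-cache",
        url_map,
        "--path", "/*",
    ]
    return [sync, invalidate]


if __name__ == "__main__":
    # Hypothetical names; substitute your own build output, bucket and URL map.
    for argv in build_deploy_commands("dist", "shopkart-site", "shopkart-url-map"):
        print(shlex.join(argv))
```

Invalidating every path is the bluntest option. Fingerprinted asset filenames with long cache lifetimes usually reduce how much needs invalidating on each release.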

This is the static content hosting pattern in its purest form — serve static assets straight from object storage through a CDN, with no web server to run, patch, or scale.

Services.

| Component | Service | Role |
| --- | --- | --- |
| Origin store | Cloud Storage (private bucket) | Durable object storage for the built site (eleven nines of annual durability) |
| Edge / CDN | Cloud CDN on the load balancer | Global caching, low-latency delivery from Google’s edge |
| Front door | Global External Application Load Balancer (backend bucket) | Single anycast IP, global HTTPS ingress, TLS termination |
| DNS | Cloud DNS | Managed zone + record to the global IP |
| Certificate | Google-managed SSL certificate | Free, auto-renewing TLS certificate |

Key design decisions & Architecture-Framework tradeoffs. The defining decision is no servers at all, and the principle behind it is “use managed services and let the platform absorb scale”. You pay essentially nothing when idle and Cloud CDN soaks up traffic spikes automatically, which is superb Cost Optimisation and Performance for a spiky, low-baseline, read-heavy workload. The quiet bonus is Reliability: Cloud Storage offers eleven nines of durability and a regional/multi-region availability SLA, and the global load balancer plus Cloud CDN spread delivery across Google’s edge, so this “best-effort” rung is in practice extraordinarily robust, far more available than a single server ever is. The tradeoff is that it is static: there is no server-side logic. The moment you need a real “place order” endpoint, a database write, or anything dynamic, you must add it, which is exactly Rungs 2 and 3. From the Reliability pillar’s “define your reliability based on user-experience goals” principle, this is the right call: the requirement here is cheap and good-enough, and this rung over-delivers on reliability almost for free.

Rough cost. With low traffic the standing cost is dominated by the load balancer’s forwarding rule, which is billed hourly whether or not anyone visits, plus a few cents of Cloud Storage and CDN cache fills. Realistically that is on the order of $20 a month at indicative pricing until you have meaningful traffic; the main variable as you grow is CDN/internet egress. This is the cheapest functional rung on the ladder by roughly an order of magnitude. (If you want it even cheaper for a pure-static site, Firebase Hosting is a one-command alternative that bundles the CDN; the load-balancer route shown here is the path that scales cleanly into the later rungs.)

When this is enough. Marketing sites, MVPs, documentation, JAMstack content, single-page apps that call a separate API, and any read-heavy front end. If your app is genuinely static (or static plus a third-party API), you may never need to climb off this rung — it scales globally and costs almost nothing. Stop here unless you need server-side business logic, your own database, authenticated write operations, or a backend you control.

Rung 2 — Single-region 3-tier (global LB + managed instance group + Cloud SQL HA)

Scenario & requirements. ShopKart found product-market fit. It is now a real business with paying customers, a relational data model (orders, inventory, customers with foreign-key integrity), server-side logic, and the need for consistent, predictable performance. RTO: minutes, automatic for a component failure. RPO: near-zero for committed transactions (a database failover must not lose acknowledged orders). Availability: a solid ~99.9%+ that survives the loss of one zone. Budget: modest but real — the business can fund a few hundred dollars a month. Team: a small product team, light on dedicated operations.

The design. The canonical three-tier web application, deployed across at least two zones inside one region’s VPC. A Global External Application Load Balancer distributes traffic to the app tier. The app tier runs as a regional managed instance group (MIG) of Compute Engine VMs spanning the zones, with autoscaling and autohealing driven by health checks — or, more simply, the same workload on Cloud Run (covered properly at Rung 3). The data tier is Cloud SQL (PostgreSQL/MySQL) in a high-availability configuration: a regional primary in one zone with a synchronously-replicated standby in another, and automatic failover. Cloud Storage + Cloud CDN still serve static assets; secrets live in Secret Manager; the internet-facing path is protected by Cloud Armor (the WAF/DDoS layer) attached to the load balancer. The VMs sit in private subnets and reach Google APIs through Private Google Access, with Cloud NAT for outbound internet.

This is the N-tier architecture style — layered, well-understood, the natural home for a lift-and-shift or a straightforward new build with a relational core. Crucially, multi-zone from the outset is what makes it production-grade rather than a single point of failure.

Services.

| Tier | Service | Role |
| --- | --- | --- |
| Edge / WAF | Cloud Armor + Cloud CDN on the global LB | DDoS/OWASP protection, caching, global ingress |
| Load balancing | Global External Application Load Balancer | Layer-7 routing, health checks, single anycast IP |
| App tier | Regional managed instance group (2+ zones) or Cloud Run | Run the application; autoscale and autoheal |
| Data tier | Cloud SQL HA (regional) | Relational store with synchronous standby + auto-failover |
| Secrets/config | Secret Manager | Externalised credentials and settings |
| Static | Cloud Storage + Cloud CDN | Images, assets |
| Networking | VPC, Private Google Access, Cloud NAT | Private subnets, private API access, outbound egress |

Key design decisions & Architecture-Framework tradeoffs. Two decisions define this rung. First, multi-zone everything: the global LB is inherently zone-spanning, the MIG runs across two or more zones, and Cloud SQL runs in HA (regional) mode. This is the “build redundancy across failure domains” principle applied at zone granularity, and it is the single highest-ROI reliability step in all of Google Cloud: it removes the most common real-world failure (a data-centre/zone fault) for a modest premium rather than a re-architecture. Second, warm provisioned capacity trades scale-to-zero for predictability: a MIG keeps VMs running whether or not anyone visits, in exchange for no cold starts and consistent latency on checkout. The chief tradeoff is Cost and a little Operational Excellence (you now own scaling policies, instance templates or images, and a database to tune) spent in exchange for Reliability and Performance. Note the Cloud SQL subtlety: HA is for availability, not for scaling reads. The standby is not a readable replica; if you need read scale you add read replicas, a different feature. (If the workload is request-driven rather than long-running, choosing Cloud Run for the app tier here gives you scale-to-zero and removes VM patching, a sensible default that blurs Rungs 2 and 3.)

Rough cost. Global LB + a small regional MIG (two e2-class VMs, or Cloud Run) + Cloud SQL HA (a small-to-medium instance, which roughly doubles the single-zone database cost for the standby) + Cloud CDN: ballpark $150–$500/month depending on machine sizes and traffic. An order of magnitude above Rung 1 — the price of warm capacity, a managed relational engine, and zone-fault tolerance.

When this is enough. The overwhelming majority of business web applications. SaaS products in early-to-mid growth, internal line-of-business apps, e-commerce that can tolerate a rare, very short blip during a Cloud SQL failover, anything where ~99.9–99.95% and an RTO of minutes is contractually fine. Stop here unless a whole-region outage would be unacceptable, you have an event-driven workload that scale-to-zero would serve far more cheaply, or your organisation (many teams) has outgrown a single deployable.
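It helps to turn an availability percentage into minutes before deciding whether it is “contractually fine”. The sketch below does that, and also multiplies the availabilities of components that sit in series (every one must be up for a request to succeed), which is the usual way to estimate a composite figure. The component values in the example are hypothetical, not published SLAs, and the 30-day month is a simplifying assumption.

```python
from math import prod


def allowed_downtime_minutes(availability: float, days: int = 30) -> float:
    """Minutes of downtime permitted over `days` at the given availability."""
    return (1.0 - availability) * days * 24 * 60


def composite_availability(components: list[float]) -> float:
    """Availability of a chain in which every component must be up.

    Assumes component failures are independent.
    """
    return prod(components)


if __name__ == "__main__":
    # 99.9% allows about 43.2 minutes a month; 99.95% about 21.6.
    for target in (0.999, 0.9995):
        print(target, round(allowed_downtime_minutes(target), 1))

    # Hypothetical load balancer, app tier and database figures.
    chain = [0.9999, 0.9995, 0.9995]
    print(round(composite_availability(chain), 5))
```

Note that the composite of a chain is always lower than its weakest link, which is why a stack built from 99.95% components cannot honestly promise 99.95% end to end.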

Rung 3 — Serverless event-driven (Cloud Run + Firestore/Cloud SQL + Pub/Sub + Eventarc)

Scenario & requirements. ShopKart’s load is now genuinely spiky and event-shaped: flat for hours, then a flash-sale or a marketing push drives a 50× burst for twenty minutes. The team is small and wants to stop managing servers, patching, and scaling policies entirely, and to stop paying for idle capacity between bursts. RTO: minutes (the managed services self-heal). RPO: near-zero (Firestore replicates writes across zones by default; Cloud SQL matches that only in its HA configuration). Availability: high, ~99.95%+, inherited from regional managed services that are themselves multi-zone. Budget: pay strictly for what you use, ideally near-zero when idle. Team: a small team that wants to ship features, not operate infrastructure.

The design. A fully serverless, event-driven architecture, all of it regional and multi-zone by default. Cloud Run is the front door and the compute tier: a fully-managed container runtime that terminates traffic (behind the global LB or directly), scales from zero to many instances automatically, and bills per request and per 100 ms of CPU. State lives in Firestore, a fully-managed, serverless document database that is multi-zone (or multi-region) and scales automatically, or in Cloud SQL when you need a relational core. The event spine is Pub/Sub (Google’s global, fully-managed messaging) for asynchronous, buffered work between producers and consumers, with Cloud Run services draining subscriptions as competing consumers. Eventarc routes events (from Cloud Storage object writes, Audit Logs, Firestore changes, or custom sources) to Cloud Run targets, decoupling “something happened” from “do this in response”. Durable multi-step workflows use Cloud Workflows, and scheduled jobs use Cloud Scheduler. Static assets stay on Cloud Storage + Cloud CDN.

This is the event-driven architecture style composed with serverless compute, and it leans on the publisher-subscriber, queue-based load levelling, competing consumers and async request-reply patterns to decouple producers from consumers and absorb spikes.
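Queue-based load levelling is easiest to appreciate with numbers. The toy simulation below tracks the backlog in a queue when arrivals spike but consumers drain at a steady rate. It is a deliberately simplified model (fixed ticks, a fixed drain rate, no autoscaling), and the traffic figures are invented for illustration.

```python
def simulate_backlog(arrivals_per_tick: list[int], drain_per_tick: int) -> list[int]:
    """Return the queue backlog after each tick.

    Each tick, `arrived` messages are published and up to
    `drain_per_tick` are consumed; the backlog never goes below zero.
    """
    backlog = 0
    history = []
    for arrived in arrivals_per_tick:
        backlog = max(0, backlog + arrived - drain_per_tick)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # A quiet period, a two-tick flash-sale burst, then quiet again.
    burst = [10, 500, 500, 10, 10, 10]
    print(simulate_backlog(burst, drain_per_tick=100))
    # prints [0, 400, 800, 710, 620, 530]
```

The producers are never refused; the burst becomes latency in the queue instead of errors at the front door. In a real deployment the consumers would also scale out, so the backlog would drain faster than this fixed-rate model suggests.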

Services.

| Component | Service | Role |
| --- | --- | --- |
| Compute / API front door | Cloud Run | Managed containers; scale to zero, scale to many; per-request billing |
| Data (document) | Firestore | Serverless, multi-zone document store; auto-scaling |
| Data (relational) | Cloud SQL | Relational store when ACID/joins are needed |
| Messaging | Pub/Sub (+ dead-letter topic) | Global async buffering; competing consumers; poison-message handling |
| Eventing | Eventarc | Route events from GCP sources to Cloud Run |
| Orchestration | Cloud Workflows / Cloud Scheduler | Durable multi-step workflows; cron-style jobs |

Key design decisions & Architecture-Framework tradeoffs. The defining decision is serverless and event-driven, and the principle is “use managed services” taken to its conclusion, plus “design to scale horizontally”. You get extraordinary Cost Optimisation for a spiky workload (you pay per request and per unit of compute time, nothing when idle) and effortless Performance at the burst (the platform scales out for you). High availability is inherited: Cloud Run, Firestore and Pub/Sub are all regional, multi-zone managed services, so you get strong reliability without designing it. The tradeoffs are real and exam-relevant. Cold starts add tail latency after idle (mitigated with minimum instances on Cloud Run, at a cost). The model is eventually consistent and asynchronous by nature, so you must take the fallacies of distributed computing seriously and design for idempotency, retries and out-of-order delivery (Pub/Sub is at-least-once by default, so consumers must be idempotent). If you choose Firestore you trade away relational joins and cross-document transactions at scale, which rewards access-pattern-first modelling, a genuine skill shift; if you keep Cloud SQL you keep ACID but reintroduce a connection-pooling and scaling concern (connect through the Cloud SQL Auth Proxy or a language connector, and cap connections per instance with a pool). You also accept quotas and limits (per-service concurrency, Pub/Sub throughput) as first-class design constraints. The Operational-Excellence story is excellent for a small team, but observability is harder (many short-lived instances and async hops), so distributed tracing with Cloud Trace plus Cloud Logging/Monitoring becomes essential, not optional.
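Because delivery is at-least-once, the same message can arrive twice, and the consumer has to make the second arrival harmless. The sketch below shows one common approach: remember which message IDs have already been handled and skip repeats. The class and field names are hypothetical, and the in-memory set is an assumption made to keep the example self-contained; a real consumer would record processed IDs in a durable store, ideally in the same transaction as the side effect.

```python
from dataclasses import dataclass, field


@dataclass
class OrderConsumer:
    """Toy idempotent consumer that de-duplicates on message ID."""

    processed_ids: set = field(default_factory=set)
    orders_placed: list = field(default_factory=list)

    def handle(self, message_id: str, order_id: str) -> bool:
        """Process a message once; return False for a redelivery."""
        if message_id in self.processed_ids:
            return False  # duplicate delivery: acknowledge and do nothing
        self.orders_placed.append(order_id)
        self.processed_ids.add(message_id)
        return True


if __name__ == "__main__":
    consumer = OrderConsumer()
    print(consumer.handle("msg-1", "order-42"))  # True: first delivery
    print(consumer.handle("msg-1", "order-42"))  # False: redelivery ignored
    print(consumer.orders_placed)                # ['order-42']
```

The customer is charged once however many times the message shows up, which is the property that makes retries safe to use freely elsewhere in the design.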

Rough cost. With bursty, low-average traffic this rung can be dramatically cheaper than Rung 2 — frequently $20–$200/month for a low-millions-of-requests workload, and often inside the free tier at MVP volumes, because there is no idle compute or standby database to pay for. The cost scales cleanly with usage, which is precisely its appeal; the watch-outs are high-throughput sustained traffic (where always-on VMs or committed use can become cheaper than per-request) and chatty designs that explode the request and message counts.
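The crossover between per-request billing and always-on capacity can be estimated with one division. The sketch below is a rough model only: both prices in the example are hypothetical placeholders, not Google Cloud list prices, and it folds CPU, memory and request charges into a single blended cost per million requests.

```python
def break_even_requests(always_on_monthly: float, cost_per_million_requests: float) -> float:
    """Monthly request volume at which per-request billing matches a fixed bill.

    Below this volume, paying per request is cheaper; above it, the
    always-on option is.
    """
    if cost_per_million_requests <= 0:
        raise ValueError("cost_per_million_requests must be positive")
    return always_on_monthly / cost_per_million_requests * 1_000_000


if __name__ == "__main__":
    # Hypothetical: $150/month of always-on capacity versus a blended
    # $5 per million requests on a pay-per-use platform.
    print(break_even_requests(150.0, 5.0))  # 30000000.0
```

Run it with your own measured cost per request. If your sustained traffic sits well above the break-even point, the idle-capacity argument for serverless no longer applies to that workload.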

When this is enough. Event-driven and spiky workloads: APIs with uneven traffic, webhooks, ingestion and processing pipelines, scheduled jobs, glue between systems, and product backends a small team wants to run with minimal operational surface. It is also a superb complement to the other rungs — even a Rung 2 or 4 estate uses Pub/Sub, Eventarc and Cloud Run for its asynchronous edges. Stop here unless you need long-running or specialised compute that Cloud Run’s request model doesn’t suit, you have a strongly relational/transactional core that a document store fights (keep Cloud SQL, or look ahead to Spanner), or — the big one — your organisation has grown to many teams who each need to own and deploy a service independently, which is the driver for Rung 4.

Rung 4 — Containerised microservices (GKE + Gateway API + Cloud Service Mesh)

Scenario & requirements. ShopKart is now a large product with many engineering teams. The catalogue team, the basket team, the payments team and the fulfilment team each want to own, deploy, and scale their slice independently, on their own cadence, without a coordinated big-bang release. The domain has grown complex enough that a single deployable is a bottleneck — every change requires whole-app regression and a shared release train. RTO/RPO: similar to Rung 2/3 (still single-region, multi-zone; the move here is organisational, not a reliability upgrade). Availability: ~99.95%+ within the region. Budget: the business will fund the additional platform overhead because team velocity is the constraint. Team: many autonomous teams with real platform/DevOps maturity and on-call.

The design. A microservices architecture: the application is decomposed by business capability into independently-deployable services, each containerised, running on Google Kubernetes Engine (GKE) — ideally GKE Autopilot, where Google manages the nodes, bin-packing and security posture so the team operates workloads, not a cluster. Each service has its own data store (“database per service” — Firestore here, Cloud SQL there, Spanner for the service that needs it — the right data store for the job). North-south traffic enters through the Gateway API (the modern, role-oriented successor to Ingress, implemented on GKE by the global load balancer). East-west service-to-service traffic runs through Cloud Service Mesh (Google’s managed Istio-based mesh) for mutual TLS, traffic shaping, retries and rich telemetry without touching application code. Workloads authenticate to Google APIs with Workload Identity Federation (no service-account keys on the cluster). Asynchronous integration between services uses Pub/Sub / Eventarc (the same event spine as Rung 3). Everything is still spread across multiple zones (a regional cluster).

This is the microservices architecture style, and it draws heavily on the sidecar, ambassador, anti-corruption layer, backends for frontends, gateway routing/aggregation/offloading and strangler fig patterns — the last being how you usually get here, by carving services off a monolith incrementally rather than rewriting.

Services.

| Concern | Service | Role |
| --- | --- | --- |
| Orchestration | GKE (Autopilot preferred) | Run containers; Google-managed nodes and scaling |
| North-south ingress | Gateway API (on the global LB) | Role-oriented routing, TLS, health checks |
| East-west / mesh | Cloud Service Mesh (managed Istio) | mTLS, traffic shaping, retries, telemetry (sidecar pattern) |
| Identity | Workload Identity Federation | Keyless service-to-Google-API authentication |
| Per-service data | Firestore / Cloud SQL / Spanner (best fit) | Database per service |
| Async integration | Pub/Sub / Eventarc | Decoupled, event-driven service integration |

Key design decisions & Architecture-Framework tradeoffs. The single most important thing to understand about this rung, and the most common exam and review trap, is that microservices are an answer to organisational and scaling pressure, not an availability upgrade. A zone-redundant Rung 2 or serverless Rung 3 is just as available within a region, and far simpler. You climb to Rung 4 when many teams need to deploy independently and the domain complexity justifies the split, not to chase nines. The principles in play are “minimise coordination” (autonomous teams shipping without a shared release train) and “design around limits”. What you pay for it is substantial Operational-Excellence and Cost currency. A distributed system brings everything the fallacies of distributed computing warn about: network partitions, partial failures, eventual consistency, and distributed transactions (handled with the saga pattern over compensating transactions, never a cross-service two-phase commit). On top of that comes the platform burden of a mesh, ingress, distributed tracing, and per-service pipelines. GKE Autopilot vs Standard is itself a tradeoff: Autopilot minimises operational surface and is the right default; Standard buys node-level control (custom machine types, GPUs, DaemonSets, particular kernels) at the price of running the node pool yourself. Choosing Autopilot unless you have a concrete reason for node control is the “keep it simple” call. And a GCP-specific honesty check: for many “we want containers” requirements, Cloud Run already is containers. Reach for GKE when you genuinely need Kubernetes (complex networking, stateful workloads, a large multi-team platform), not by default.

Rough cost. Compute cost is broadly comparable to running the same workload as a monolith on VMs or Cloud Run — you are paying for the same aggregate vCPU/memory — but the platform overhead (a service mesh, more ingress, more observability, the GKE cluster-management fee, and a larger DevOps investment) adds a real premium. Ballpark $500–$3,000+/month for a modest multi-service estate, dominated less by raw compute than by the surrounding platform and the engineering time to run it.

When this is enough. Large applications with many teams, complex domains, and independent scaling/deployment needs — the textbook fit. It is the right rung when the organisation, not the availability target, is the binding constraint. Stop here (do not climb to multi-region) unless a whole-region outage is genuinely unacceptable or a regulator demands a provable, geographically separate DR capability. And critically: do not climb to here for availability — if your pain is “we need to survive a zone failure”, Rungs 2 and 3 already solve that for a fraction of the cost and toil.

Rung 5 — Multi-region active-passive (disaster recovery)

Scenario & requirements. ShopKart now underpins revenue the business feels acutely, and a whole-region event — rare, but real — must not take the service down for long or lose committed orders. A regulator (or the business’s own risk appetite) requires a demonstrable, geographically separate disaster-recovery capability. RTO: tens of minutes, via a controlled failover. RPO: small — seconds to a few minutes — the last sliver of un-replicated data may be lost in a sudden regional loss, and that is acceptable. Availability: a higher composite, surviving the loss of the primary region. Budget: the business will pay an insurance premium for geographic redundancy, but not the full cost of a second live estate. Team: a mature platform team that will own and test a runbook.

The design. A second region holds a standby copy of the stack, kept at one of two warmth levels chosen by the RTO. Pilot light: the data layer is continuously replicated to the second region and minimal core infrastructure exists, but compute is scaled to (near) zero and is scaled up only on failover — cheapest, with the longest RTO. Warm standby: a scaled-down-but-running copy of the full stack is always live in the second region, ready to scale up fast — more expensive, faster RTO. Data replication is the heart of it: a Cloud SQL cross-region read replica (promoted to primary on failover) for relational data; Firestore in a multi-region location (or scheduled exports) for document data; and dual-region or multi-region Cloud Storage for objects, which replicates automatically. The traffic switch exploits Google’s global front door: the Global External Application Load Balancer already has a single anycast IP with backends in both regions, so failover is health-check-driven backend draining and capacity-based failover rather than a DNS flip — when the primary region’s backend turns unhealthy, the global LB shifts traffic to the standby region. Infrastructure is defined as code (Terraform) so the standby is a faithful, redeployable twin — and the failover runbook is automated and regularly rehearsed.
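The traffic switch can be pictured as a simple preference list: send traffic to the first region whose backend is passing health checks. The sketch below is a toy model of that behaviour only. It is not how the global load balancer is implemented (which also weighs client proximity and backend capacity), and the region names are illustrative.

```python
from typing import Dict, List, Optional


def route(region_health: Dict[str, bool], preference: List[str]) -> Optional[str]:
    """Return the first healthy region in preference order, or None.

    A region missing from `region_health` is treated as unhealthy.
    """
    for region in preference:
        if region_health.get(region, False):
            return region
    return None


if __name__ == "__main__":
    preference = ["europe-west1", "europe-west4"]  # primary, then standby

    # Normal operation: the primary serves.
    print(route({"europe-west1": True, "europe-west4": True}, preference))

    # Primary fails its health checks: traffic shifts to the standby.
    print(route({"europe-west1": False, "europe-west4": True}, preference))
```

Because the client-facing IP never changes, there is no DNS TTL to wait out. The RTO is then governed by how quickly the standby region can actually absorb the load, which is the warmth question.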

This rung instantiates the pilot light and warm standby disaster-recovery strategies — covered in depth in the multi-region DR & resilience reference architecture. (The simplest strategy, backup & restore, is essentially “Rung 2 plus cross-region backups” with an RTO of hours; the most aggressive, active-active, is Rung 6.)

Services & the warmth dial.

| Concern | Pilot light | Warm standby |
| --- | --- | --- |
| Compute in standby | Scaled to ~zero; scaled up on failover | Always running, scaled down |
| Relational data | Cloud SQL cross-region read replica (async) | Cloud SQL cross-region read replica (async) |
| Document data | Firestore multi-region / scheduled export | Firestore multi-region / scheduled export |
| Object data | Dual-region / multi-region Cloud Storage | Dual-region / multi-region Cloud Storage |
| Traffic switch | Global LB health-check failover (anycast IP) | Global LB health-check failover (anycast IP) |
| RTO band | Tens of minutes to ~1 hour | Minutes to tens of minutes |
| Relative cost | Lower (no idle compute) | Higher (always-on, scaled-down stack) |

Key design decisions & Architecture-Framework tradeoffs. Two decisions dominate. First, how warm is the standby? This is a pure RTO-versus-cost dial. Pay only for the warmth the RTO actually requires: a pilot light that must scale up may or may not hit a tight RTO (so test it), while a fully warm standby wastes money if a slower failover is acceptable. Second, and the one architects most often get wrong: active-passive relational replication is asynchronous, so the RPO is never zero. At the instant the primary region fails, any transactions committed in the last few seconds to minutes that had not yet reached the Cloud SQL replica are lost. If the business truly needs zero data loss across regions, active-passive with Cloud SQL cannot deliver it, and that requirement alone is what justifies Rung 6’s Spanner. (One GCP nuance worth knowing: dual-region Cloud Storage with turbo replication offers an RPO target measured in minutes for objects, and Spanner can be dropped into a Rung-5 design specifically for the transactional slice that needs zero loss while everything else stays active-passive.) The Architecture-Framework trade is Cost and significant Operational-Excellence investment (a second estate, replication, and a tested failover runbook) spent to buy Reliability against a regional disaster. The most dangerous failure mode is an untested runbook: a DR capability you have never exercised is a hope, not a control. Rehearse the failover (and failback) on a schedule, ideally as a game-day.
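The “RPO is never zero” point becomes concrete once you multiply replication lag by write rate. The sketch below estimates how many committed writes would be missing from the replica if the primary region vanished right now, and checks the observed lag against an RPO target. The function names and the example figures are assumptions for illustration; in practice you would read the lag from your own replica monitoring.

```python
def writes_at_risk(replication_lag_seconds: float, writes_per_second: float) -> float:
    """Approximate number of committed writes not yet on the replica."""
    return replication_lag_seconds * writes_per_second


def meets_rpo(replication_lag_seconds: float, rpo_seconds: float) -> bool:
    """True if the current lag is within the Recovery Point Objective."""
    return replication_lag_seconds <= rpo_seconds


if __name__ == "__main__":
    # Hypothetical: 5 seconds of lag at 20 order writes per second.
    print(writes_at_risk(5, 20))           # 100
    print(meets_rpo(5, rpo_seconds=60))    # True: inside a one-minute RPO
    print(meets_rpo(5, rpo_seconds=0))     # False: zero RPO is unreachable
```

Any positive lag fails a zero RPO, however small. That is the arithmetic behind reaching for synchronous multi-region writes only when the business has really asked for zero.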

Rough cost. Pilot light adds the cost of cross-region data replication, standby data stores, and the inter-region egress bill — but little idle compute — so it might add 40–80% over the single-region baseline. Warm standby adds an always-on (scaled-down) second stack on top, pushing the total toward 1.5–2× the single-region cost. The frequently-forgotten line item is inter-region egress, which on a chatty replication workload can rival the compute savings.
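
The rule-of-thumb ranges above can be turned into a quick budgeting helper. This is only a sketch of that arithmetic: the multipliers are the rough bands quoted in this lesson (pilot light at roughly 140–180% of baseline, warm standby at roughly 150–200%), the baseline figure is hypothetical, and the result is no substitute for a real pricing estimate.

```python
# Rough cost bands from this lesson, as percentages of the single-region baseline.
COST_BANDS_PCT = {
    "pilot_light": (140, 180),
    "warm_standby": (150, 200),
}


def dr_cost_range(baseline_monthly, strategy):
    """Return (low, high) estimated monthly cost for the given DR strategy."""
    if strategy not in COST_BANDS_PCT:
        raise ValueError(f"unknown strategy: {strategy}")
    low_pct, high_pct = COST_BANDS_PCT[strategy]
    return (baseline_monthly * low_pct / 100, baseline_monthly * high_pct / 100)


# Hypothetical single-region baseline of 10,000 per month (any currency).
for name in COST_BANDS_PCT:
    print(name, dr_cost_range(10_000, name))
```

Inter-region egress is deliberately not modelled separately here; on a chatty replication workload it deserves its own line in the estimate.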

When this is enough. Revenue-critical and regulated workloads that must survive a regional outage and can tolerate an RTO of minutes-to-tens-of-minutes and a small, non-zero RPO. This is the right rung for most “serious DR” requirements — it delivers geographic resilience without the cost and consistency complexity of running two live regions with multi-region writes. Stop here unless the business has quantified a catastrophic cost for any downtime, demands an RTO/RPO approaching zero on the transactional data, or needs to serve users in multiple geographies with low latency from the nearest region — all of which push you to Rung 6.

Rung 6 — Global active-active (mission-critical, with Spanner)

Scenario & requirements. ShopKart is now a system whose downtime cost is catastrophic and quantified — every minute offline is a number the board can recite — and it serves a global user base that expects low latency from the nearest region. There can be no failover step: traffic must already be flowing to multiple regions, so the loss of one is absorbed, not recovered from. RTO: effectively zero — the surviving regions are already serving. RPO: effectively zero on the transactional data, requiring synchronous multi-region writes, not async replication. Availability: the highest tier, surviving the complete loss of a region with no human in the loop. Budget: justified only by the cost-of-downtime maths — this is the most expensive rung by a wide margin. Team: a mature engineering organisation with deep operational practice (chaos testing, game-days, continuous validation).

The design. Two or more regions all serving live production traffic simultaneously, fronted by a single Global External Application Load Balancer whose one anycast IP routes each user to the nearest healthy region automatically. Google’s edge does the geo-proximity routing and drains an unhealthy region with no DNS change and no human in the loop. Cloud CDN caches at the edge globally. The defining challenge is data: writes happen in every region, so you need a store built for synchronous multi-region write. Cloud Spanner is the linchpin: a horizontally-scalable relational database with synchronous multi-region replication and external consistency (TrueTime-backed), giving you strongly-consistent reads and writes across regions with RPO = 0 and no application-level conflict resolution. Document and object data use Firestore multi-region and multi-region Cloud Storage. The compute tier is replicated per region as an independently-deployable scale unit / deployment stamp (regional MIGs, Cloud Run, or GKE in each region), so blast radius is contained and whole regional stacks can be deployed blue-green. Idempotency and graceful degradation are designed in from the first line of code; chaos testing and continuous validation prove the design survives region loss.
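
To see why losing a region is "absorbed, not recovered from", it helps to model the routing decision as a toy function. The sketch below is not the load balancer's actual algorithm; it simply assumes a per-user latency table and a set of regions currently passing health checks, both of which are hypothetical inputs.

```python
def pick_region(latency_ms_by_region, healthy_regions):
    """Return the lowest-latency region among those currently healthy."""
    candidates = {
        region: ms
        for region, ms in latency_ms_by_region.items()
        if region in healthy_regions
    }
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)


# Hypothetical latencies for one user, in milliseconds.
latencies = {"europe-west1": 18, "us-east1": 95, "asia-southeast1": 210}

# All regions healthy: the user lands on the nearest one.
print(pick_region(latencies, {"europe-west1", "us-east1", "asia-southeast1"}))

# The nearest region fails its health checks: the same call simply returns the next nearest.
print(pick_region(latencies, {"us-east1", "asia-southeast1"}))
```

Nothing in the second call is a failover procedure: the unhealthy region just stops being a candidate, which is the behaviour the single anycast front door gives you without a DNS change.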

This rung lands exactly on the global web application reference architecture — the global load balancer, Cloud Run/GKE per region, Cloud Spanner and Cloud CDN — and represents the active-active apex of the multi-region DR spectrum. It draws on the geode pattern (geographically distributed nodes serving any request) and deployment stamps (the per-region scale unit).

Services.

Concern | Service | Role
Global routing | Global External Application Load Balancer (anycast IP) | Send users to the nearest healthy region; drain unhealthy ones, with no failover step
Edge | Cloud CDN | Global caching at Google’s edge
Relational data | Cloud Spanner (multi-region) | Synchronous multi-region writes, external consistency, RPO = 0
Document data | Firestore (multi-region) | Multi-region document store
Object data | Multi-region Cloud Storage | Auto-replicated objects across the multi-region
Compute (per region) | Regional MIG / Cloud Run / GKE per region | Independently-deployable scale unit / deployment stamp
Validation | Fault injection / game-days | Continuously prove the design survives region loss

Key design decisions & Architecture-Framework tradeoffs. The decision that makes this rung is synchronous multi-region writes, and Google Cloud’s distinctive answer is Cloud Spanner: because it provides external consistency across regions, you largely avoid the application-level conflict resolution (last-writer-wins, vector clocks, CRDTs) that other platforms force on you at this tier — which is Spanner’s whole reason to exist, and the cleanest path to genuine global active-active in the industry. That power has a price: Spanner is the most expensive data tier on the ladder, it rewards interleaved, well-distributed schema design (avoid monotonic/hotspot primary keys), and synchronous cross-region commits add write latency (the unavoidable physics of a quorum spanning continents). The Architecture-Framework trade is stark: you spend the maximum of Cost (multiple full live estates plus a multi-region Spanner instance and inter-region traffic) and Operational-Excellence currency (multi-region deployments, continuous validation, chaos engineering, sophisticated observability) to buy the maximum Reliability (zero-RTO, zero-RPO, region-loss tolerance). A subtle but vital point: the composite SLA of two regions each at, say, 99.95% combines as 1 − (1 − A)² ≈ 99.999975% for that redundant tier in isolation — but the real number is capped by the least-available serial dependency in front of them (a single global LB misconfiguration, a single Spanner instance that runs hot, a single project-level quota). Adding regions buys nothing if a serial choke point remains. And honesty matters: active-active is not automatically better than active-passive — it is far more complex and costly to operate, and complexity avoidance is itself a mission-critical principle. You climb here only when the cost-of-downtime maths forces it.
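
The composite-SLA point is easiest to trust once you run the numbers. The sketch below assumes independent failures, uses the 99.95% per-region figure from the paragraph above, and adds a hypothetical 99.99% serial dependency in front of the regions purely to show the capping effect; that second figure is an assumption for illustration, not a quoted SLA.

```python
def parallel(*availabilities):
    """Redundant components: the tier is down only when every one is down."""
    unavailability = 1.0
    for a in availabilities:
        unavailability *= 1.0 - a
    return 1.0 - unavailability


def serial(*availabilities):
    """Chained components: the path is up only when every one is up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


# Two redundant regions at 99.95% each.
regions = parallel(0.9995, 0.9995)

# Hypothetical serial dependency at 99.99% in front of both regions.
end_to_end = serial(0.9999, regions)

print(f"redundant tier: {regions:.8f}")
print(f"end to end:     {end_to_end:.8f}")
```

The redundant tier comes out at roughly 99.999975%, but the end-to-end figure falls just below the 99.99% of the serial component, which is the cap described above.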

Rough cost. You are now running N full live estates plus a multi-region Cloud Spanner instance and continuous inter-region traffic, so cost scales roughly with the number of regions: commonly 2–3× the single-region baseline or more, and frequently higher still once Spanner (which carries a meaningful node floor) and the operational investment (tooling, chaos programmes, a larger SRE function) are counted. This is the most expensive rung on the ladder by a wide margin, and the only justification for it is a downtime cost that exceeds the spend.

When this is enough. This is the apex. It is right for systems where any downtime is catastrophic and quantified, where a global audience demands low-latency local serving, and where the organisation has the maturity to operate a continuously-validated multi-region estate. For the overwhelming majority of workloads, this rung is over-engineering — the discipline is recognising that and climbing back down. There is no rung above this; the work from here is operational excellence (chaos testing, game-days, tightening the health model and deployment automation), not a more elaborate topology.

How to choose a rung from requirements

You never pick a rung by taste. You read the axes and let them point. Here is the decision distilled into a single table — read it top to bottom and stop at the first row whose requirement you genuinely have.

If the requirement is… | …the rung is | Why
Static/JAMstack front end, cheap, spiky, best-effort (over-delivers anyway) | 1: Static (Cloud Storage + Cloud CDN + global LB) | No servers; global edge; near-zero cost; eleven nines of durability
Server-side logic, relational data, ~99.9%+, survive a zone failure | 2: Single-region 3-tier (global LB + MIG/Cloud Run + Cloud SQL HA) | Warm capacity + ACID + multi-zone; the simplest real production design
Spiky/event-shaped load, want zero idle cost and no servers to run | 3: Serverless event-driven (Cloud Run + Firestore/Cloud SQL + Pub/Sub + Eventarc) | Pay-per-use, scale-to-zero, HA inherited; cold starts/eventual consistency accepted
Many autonomous teams + complex domain + independent deploy/scale | 4: Containerised microservices (GKE + Gateway API + Cloud Service Mesh) | Organisational/scaling driver, not an availability driver
Must survive a whole-region outage; DR provable; RTO tens of min, small RPO | 5: Multi-region active-passive (pilot light / warm standby) | Geographic insurance; composite SLA up; async ⇒ non-zero RPO on Cloud SQL
Catastrophic, quantified cost of any downtime; RTO/RPO ≈ 0; global low latency | 6: Global active-active (Spanner) | The apex; Spanner gives synchronous multi-region writes and RPO = 0

Four rules govern the whole climb:

  1. Requirements drive the rung — not fashion, not CV-building. The single best question in any review is “what requirement forces us off the rung below?” If you cannot answer it crisply, you have over-engineered.
  2. Availability and organisation are different axes. Rungs 1→2→3 (and 5→6) climb reliability/geography; Rung 4 climbs organisational structure. Do not reach for GKE to get availability — Rung 2’s multi-zone MIG or Rung 3’s managed services do that more cheaply.
  3. Multi-zone (inside Rungs 2–4) is the highest-ROI reliability step on the ladder. It removes the most common real failure for a modest premium. Most teams under-buy zone redundancy and over-buy multi-region.
  4. Every step up spends Cost and Operational-Excellence currency to buy Reliability and Performance. That is the Architecture-Framework trade in one sentence. Make it deliberately, write down what you bought and what you paid, and you will rarely be wrong.
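
The table and rules can be sketched as a small decision function. This is one possible encoding rather than a definitive algorithm: the field names are hypothetical, each flag stands for a requirement the business has genuinely stated, and the function assumes that when several rows apply, the most demanding one decides the rung.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical requirement flags; all default to 'not required'."""
    needs_server_side_logic: bool = False
    spiky_event_driven_load: bool = False
    many_autonomous_teams: bool = False
    must_survive_region_outage: bool = False
    needs_near_zero_rto_rpo: bool = False


def choose_rung(req):
    """Return the lowest rung that satisfies the most demanding requirement."""
    if req.needs_near_zero_rto_rpo:
        return 6  # global active-active (Spanner)
    if req.must_survive_region_outage:
        return 5  # multi-region active-passive
    if req.many_autonomous_teams:
        return 4  # containerised microservices: an organisational driver
    if req.spiky_event_driven_load:
        return 3  # serverless event-driven
    if req.needs_server_side_logic:
        return 2  # single-region, multi-zone 3-tier
    return 1      # static site


# A two-engineer startup with server-side logic and no regional-DR requirement.
print(choose_rung(Requirements(needs_server_side_logic=True)))

# A regulated workload that must survive a regional outage with a small, non-zero RPO.
print(choose_rung(Requirements(needs_server_side_logic=True,
                               must_survive_region_outage=True)))
```

Note what the function does not take as input: fashion, CV-building, or the services a team would like to use. If no flag forces a climb, the answer stays on the lower rung.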

The honest summary: most production workloads belong on Rung 2 or 3 — multi-zone, single-region, on managed or serverless compute. Rung 4 is for organisations whose team structure forces it. Rung 5 is for the regulated and the genuinely revenue-critical. Rung 6 is for the few systems with a catastrophic, quantified downtime cost, and is over-engineering everywhere else. Climbing the ladder is easy; the discipline — and the seniority — is in knowing when to stop.

The Google Cloud architecting ladder

The diagram above stacks the six rungs as a single climb — static site, single-region 3-tier, serverless event-driven, containerised microservices, multi-region active-passive DR, and global active-active — showing for each the headline Google Cloud services and the capability it adds, so you can see at a glance how cost, complexity and resilience all rise together with every step up.

Real-world application

In a real Google Cloud design engagement this ladder is the backbone of the first conversation, before a single resource is drawn. You sit with the business owner and pin the axes: “What does an hour of downtime actually cost? Can you lose a minute of data? How many teams will touch this? What does compliance demand? Is your traffic flat or spiky?” Their answers land you on a rung, and from there the service list almost writes itself.

It also reframes migration and modernisation. A lift-and-shift typically lands a workload on Rung 2 (rehost onto a Compute Engine MIG + Cloud SQL HA), and the modernisation roadmap is literally “which rung, and when?” — often 2→3 for spiky workloads (replatform to Cloud Run to kill idle cost), or 2→4 only when team structure demands it, and 2/3→5 only if a regional-DR requirement appears. It anchors cost conversations with FinOps: each rung has a recognisable cost shape, and “we are paying for Rung 6 but only need Rung 2” — a global, Spanner-backed active-active estate fronting an internal app — is one of the most common and expensive findings in a Google Cloud cost review. And in the Professional Cloud Architect exam (with its Mountkirk Games, EHR Healthcare and Helicopter Racing League case studies), the questions are this ladder in disguise: a scenario hands you RTO/RPO/scale/budget and asks for the design that meets it — you are being tested on whether you can land on the right rung without over- or under-shooting.

Common mistakes & anti-patterns

Interview & exam questions

  1. Walk me through how you would choose between a single-region multi-zone design and a multi-region active-passive design on GCP. (Look for: RTO/RPO and regional-outage tolerance as the deciding axis; multi-zone survives a zone fault but not a region; multi-region is justified by a regional-outage or compliance-DR requirement; cost roughly 1.5–2× and a non-zero RPO from async Cloud SQL replication.)
  2. A startup with two engineers wants to build “microservices on GKE” for their MVP. What’s your advice? (Look for: that’s an over-engineering anti-pattern; microservices answer organisational complexity they don’t have; recommend Rung 1, 2 or serverless Rung 3 — and note Cloud Run already gives them containers; “keep it simple”; revisit GKE when many teams and domain complexity force it, and prefer Autopilot over Standard.)
  3. Calculate the approximate composite availability of two regions, each 99.95%, behind the global load balancer, and explain what caps it. (Look for: redundant paths combine as 1 − (1 − A)² ≈ 99.999975% for that tier in isolation; then multiply by the SLA of any serial dependency (the global LB, a shared Spanner instance, a project quota); the least-available serial component caps the composite.)
  4. What’s the minimum change to make a single-VM web app survive a data-centre failure, and why is it the highest-ROI step? (Look for: run a regional managed instance group across two+ zones behind the global LB with autohealing, and make Cloud SQL HA (regional); it removes the most common real failure — a zone fault — for a modest premium rather than a re-architecture; that’s the core of Rung 2.)
  5. A workload needs RPO ≈ 0 across regions. Which rung does that force on GCP, and why can’t a cheaper one deliver it? (Look for: active-passive uses async Cloud SQL cross-region replication → non-zero RPO; near-zero cross-region RPO requires synchronous multi-region writes → Cloud Spanner (external consistency), i.e. Rung 6 active-active. For zonal zero-loss within a region, Cloud SQL HA’s synchronous standby suffices.)
  6. Explain the DR strategies on GCP and the RTO/RPO band of each. (Look for: backup & restore — hours, cheapest; pilot light — tens of minutes, data replicated/compute off; warm standby — minutes, scaled-down stack live; active-active — near-zero, all regions serving. Map them to rungs: backup & restore ≈ Rung 2 + cross-region backups, pilot light & warm standby = Rung 5, active-active = Rung 6.)
  7. When would you choose Cloud SQL read replicas versus Cloud SQL HA? (Look for: HA (regional) is for availability — a synchronous standby with auto-failover, not readable for scaling; read replicas are for read scaling/offloading and are asynchronous; a cross-region read replica also serves as a DR building block. They solve different problems and are often used together.)
  8. Cloud Run vs GKE for a containerised estate — how do you decide? (Look for: Cloud Run is the sensible default — fully-managed, scale-to-zero, no cluster to run; GKE (Autopilot) is for when you genuinely need Kubernetes — complex networking, stateful workloads, a large multi-team platform, a service mesh; “keep it simple” — don’t run a cluster you don’t need.)
  9. Where does a static-site design break down, and what’s the next rung? (Look for: the moment you need server-side logic, your own database, or authenticated write operations; climb to Rung 2 (3-tier multi-zone) for a relational/transactional core, or Rung 3 (serverless event-driven) for a spiky, event-shaped workload.)
  10. Why might you choose active-passive over active-active even when you can afford active-active? (Look for: complexity and cost — even with Spanner removing conflict resolution, active-active means N live estates, a multi-region Spanner instance, and continuous validation; if an RTO of minutes-to-tens-of-minutes is acceptable, active-passive is far simpler and cheaper; complexity avoidance is itself a reliability principle.)
  11. What makes Cloud Spanner the linchpin of GCP global active-active, and what does it cost you? (Look for: synchronous multi-region replication + external consistency (TrueTime) → RPO = 0 and no app-level conflict resolution; the price is cost (a node floor), write latency from cross-region quorum, and schema discipline to avoid hotspots.)
  12. Is the ladder strictly linear? Defend your answer. (Look for: no — Rung 4 (microservices) is an organisational axis orthogonal to the geographic axis of 5–6; a monolith can be global active-active and a microservices estate can be single-region; Rungs 1→2→3 and 5→6 climb reliability/geography, 4 climbs structure. Also: serverless (3) composes with multi-region (5/6).)

Quick check

  1. Which step is the highest-ROI reliability upgrade on GCP, and what failure does it remove?
  2. What is the composite-availability formula for two independent redundant regions each at availability A, and what caps the real number?
  3. Name the axis that drives a move to containerised microservices on GKE — and the axis it does not improve.
  4. Why does multi-region active-passive always have a non-zero RPO on GCP’s Cloud SQL?
  5. Which GCP data service makes global active-active relational writes possible with RPO = 0, and what does it cost you in return?

Answers.

  1. Going multi-zone (a regional managed instance group across zones behind the global LB + Cloud SQL HA) — the heart of Rung 2. It removes a single-zone/data-centre failure for a modest premium rather than a re-architecture.
  2. 1 − (1 − A)² for that redundant tier in isolation (the system is down only when both regions are down) — then multiply by the SLA of any serial dependency in front (the global load balancer, a shared Spanner instance, a project quota). The least-available serial component caps the composite.
  3. It is driven by organisational scale and domain complexity (many autonomous teams + independent deploy/scale needs). It does not improve availability — Rung 2’s multi-zone MIG or Rung 3’s managed services deliver HA more cheaply (and Cloud Run already gives you containers).
  4. Because Cloud SQL cross-region replication is asynchronous — at the instant the primary region fails, the last few seconds-to-minutes of un-replicated data are lost. Near-zero cross-region RPO requires synchronous multi-region writes.
  5. Cloud Spanner — synchronous multi-region replication with external consistency (TrueTime), giving RPO = 0 with no application-level conflict resolution. In return you pay a higher cost (a node floor), accept cross-region write latency, and must design the schema to avoid hotspots.

Exercise

The brief. You are the architect for “MediShip”, a pharmacy fulfilment platform on Google Cloud. Requirements as stated by the business: it handles prescription orders across one country; a regional Google Cloud outage must not lose orders and must not take the system down for more than ~15 minutes; losing more than a minute or two of order data in a disaster is unacceptable for audit reasons; traffic is steady with predictable evening peaks; there is one moderately-sized engineering team; a regulator requires a demonstrable, geographically separate DR capability; the budget is real but not unlimited. Choose a rung, name the key GCP services, and state the one decision you would push back on.

Write your answer before reading on.

Model answer. Read the axes. “Regional outage must not take it down for >15 min” + “regulator requires a geographically separate, demonstrable DR” → this crosses the regional boundary, so a single-region multi-zone design (Rung 2/3) is not sufficient. An RTO of ~15 minutes is achievable with a controlled failover, so you do not need full global active-active (Rung 6) for the whole estate — its cost and the Spanner node floor aren’t justified by a 15-minute RTO alone. One moderately-sized team → microservices on GKE (Rung 4) is an over-engineering trap; stay on managed/serverless compute. The right rung is 5 — multi-region active-passive, specifically a warm standby (a pilot light may struggle to hit 15 minutes — but test it before deciding), with each region’s stack already multi-zone (so you also get Rung 2’s HA inside it). Services: the Global External Application Load Balancer with backends in both regions and health-check-driven failover (no DNS flip needed); a regional MIG or Cloud Run (multi-zone) in the primary and a scaled-down warm copy in the second region; a Cloud SQL cross-region read replica (promote on failover) for relational data; dual-region or multi-region Cloud Storage for objects; Firestore multi-region if any document data is involved; Cloud Armor + Cloud CDN in front; everything in Terraform so the standby is a redeployable twin; and a tested, automated failover (and failback) runbook, rehearsed as a game-day.

The decision to push back on: the stated RPO of “a minute or two” sits in tension with async Cloud SQL cross-region replication, which under a sudden regional loss can lose more than that. Surface it explicitly: either (a) move only the order-transaction slice onto Cloud Spanner (multi-region, RPO = 0) while the rest of the estate stays active-passive — a precise, cost-controlled way to meet the audit requirement on exactly the data that needs it — or (b) accept and validate the Cloud SQL replica’s real replication lag against the audit requirement and document the residual risk. Naming that tension — rather than silently designing past it — is the senior move. Also flag: tune the standby’s warmth to the 15-minute RTO and prove it with a real failover test, and budget the inter-region egress cost explicitly, because it is the line item that most often surprises on a Rung-5 design.

Certification mapping

This lesson is squarely Professional Cloud Architect (PCA) territory — the exam is, in essence, a series of “given these requirements, choose the design that meets them” questions (often inside the Mountkirk Games, EHR Healthcare and Helicopter Racing League case studies), which is precisely this ladder.

Cert | Relevance
Professional Cloud Architect | Primary. Design and plan a solution that is scalable, reliable and cost-effective: multi-zone vs multi-region; global LB + MIG/Cloud Run + Cloud SQL HA; serverless (Cloud Run/Firestore/Pub/Sub/Eventarc); Cloud Storage/Cloud CDN/Cloud DNS; DR strategies and RTO/RPO; Spanner for global active-active. Every rung maps to exam objectives and the case studies.
Associate Cloud Engineer (ACE) | The build-and-operate view of the lower rungs: deploying MIGs, Cloud Run and Cloud SQL, configuring the load balancer and health checks, and basic IAM/networking.
Professional Cloud DevOps Engineer | The operational view: SLIs/SLOs/error budgets, monitoring, executing and testing DR runbooks, and the safe-deployment/blue-green practices the upper rungs depend on.
Professional Cloud Database Engineer | The data-tier depth: choosing between Cloud SQL, Firestore and Spanner, HA vs read replicas, cross-region replication, and Spanner’s consistency/scaling model behind Rung 6.

For the PCA specifically, drill the availability and DR fundamentals: per-service SLAs, the down-minutes-per-month each “nine” implies, composite SLA for serial chains, the DR strategies mapped to RTO/RPO bands, and how zones, regions and Google’s global services raise the achievable number — and remember the exam’s recurring tell: it usually wants the design that meets the requirement at sensible cost, which is this ladder’s whole thesis.
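
The down-minutes each "nine" implies is simple arithmetic worth being able to reproduce. The sketch below assumes a 30-day month for round numbers; the function name is illustrative.

```python
def down_minutes_per_month(availability_pct, days=30):
    """Allowed downtime in minutes per month at the given availability percentage."""
    minutes_in_month = days * 24 * 60
    return minutes_in_month * (100.0 - availability_pct) / 100.0


for pct in (99.0, 99.9, 99.95, 99.99):
    print(f"{pct}% -> {down_minutes_per_month(pct):.1f} min/month")
```

Over a 30-day month (43,200 minutes), 99.9% allows about 43.2 minutes of downtime, 99.95% about 21.6 minutes, and 99.99% about 4.3 minutes, which is why each extra nine is so much harder to buy than the last.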

Glossary

Next steps

You now have the spine of Google Cloud architectural judgement: requirements in, the right rung out. The natural next lesson turns this design discipline into a hiring portfolio — Real-World Google Cloud Portfolio Projects: From a Static Site to a Landing Zone — which builds exactly these rungs as shippable GitHub projects with quantified résumé bullets, so you can demonstrate the judgement this lesson teaches.
