
GCP Enterprise Architecture: Serverless API

Every serverless API on Google Cloud eventually forces two decisions that most “hello world” tutorials skip, and getting them wrong is what turns a clean weekend prototype into an 18-month rewrite. The first is the edge: do you put API Gateway in front, or Apigee, or both — and the honest answer for a growing enterprise is both, for different audiences, which only works if you understand exactly what each one is for. The second is the compute: Cloud Functions or Cloud Run? Google has quietly merged these two until the line is blurry (Cloud Functions 2nd gen literally runs on Cloud Run), but the right default for an enterprise API is not the one most people pick. This article is the reference architecture that answers both, built on the real services in Google’s serverless stack — API Gateway and Apigee at the front door, Cloud Run (with Cloud Functions where it fits) for compute, Firestore for data, and Identity Platform for identity — and assembled into something a five-person startup and a regulated enterprise can both deploy without redrawing the diagram.

The running domain is deliberately a request/response API, not an event firehose. There is a sibling pattern for event-driven, telemetry-heavy systems; this one is about the boring, universal thing almost every company needs first: a governed HTTP/JSON (and gRPC) API that backs web, mobile, and partner clients, scales to zero when nobody is using it, scales to thousands of concurrent requests on launch day, and bills per call instead of per provisioned VM. The interesting engineering is not “can serverless serve HTTP” — obviously it can — but how to give that API one identity, a layered edge, a data model that fits a document store, and a cost curve that tracks usage, without a single server to patch.

The business scenario

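Cedarline Health (fictional, used throughout) builds a patient-engagement platform (appointment booking, secure messaging, lab-result delivery, and a clinician portal) sold to mid-sized clinics and hospital groups. They are 14 engineers. Their API has four distinct consumers, and that multiplicity is the whole story:

  1. The patient app, on mobile and web, where consumers book appointments, message their clinic, and receive lab results.
  2. The clinician portal, used by hospital staff who sign in with their own organization’s credentials.
  3. Partner integrations (labs, EHRs, pharmacies), calling server to server, with the lab integrations billed per call.
  4. Internal and back-office tools.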

The traffic is spiky in two different ways at once. Within a day: quiet overnight, a booking surge 8–10 a.m. as clinics open, a lab-results wave each afternoon. Across the calendar: flu season and open-enrollment windows triple the baseline for weeks. They tried a fixed Compute Engine + managed-instance-group tier and lived the usual misery — sized for the flu-season peak and idle two-thirds of every day, or sized for the median and paged at 8:05 a.m.

The mandate from the new VP of Platform was specific and is what this architecture has to satisfy:

  1. One identity system spanning consumer patients, federated hospital staff, and machine partners, not three bolted-together auth stacks. This is the requirement that eliminates most naive designs.
  2. A real partner program — a self-service developer portal, keys, quotas, analytics, and per-call monetization for the lab integrations — without standing up and running an API-management cluster by hand.
  3. Scale to zero overnight and absorb the flu-season ramp with no capacity meeting and no pre-warming spreadsheet.
  4. HIPAA-grade controls — encryption, least privilege, audit trails, data-exfiltration boundaries — because this is patient data, with a signed BAA and a security team that audits.
  5. A genuine DR story with a defined RTO/RPO, because a clinic that can’t pull a lab result is a clinical-safety and contract problem, not a “we’ll fix it Monday” problem.

This is the serverless sweet spot: variable, multi-audience, request-driven traffic where per-request economics beat steady-state utilization and the scarcest resource is the 14 engineers’ attention. And it scales down cleanly — a three-person startup with one clinic deploys the identical shape (API Gateway only, Cloud Run scaling to zero, single-region Firestore, Identity Platform on the free tier) for a few thousand rupees a month, and adds Apigee, multi-region, and VPC Service Controls when the partner program and the compliance auditor actually arrive. That down-scalability is what makes it a reference architecture rather than a big-company special case.

Architecture overview

The defining idea is a two-tier edge over a shared serverless core. The two front doors — API Gateway and Apigee — serve different audiences with different needs, but both terminate on the same identity, the same Cloud Run services, and the same Firestore data. Nothing about “which front door” leaks below the edge. Above it, each tier does only what it is good at.

GCP enterprise serverless API reference architecture: a two-tier edge (Cloud DNS, Global External ALB with Cloud Armor, fronting API Gateway for first-party traffic and Apigee for the monetized partner program) over a shared Cloud Run compute core, with Firestore and Cloud Storage data, Identity Platform as the single JWT issuer, and an Eventarc-to-Cloud Run async notification lane; the numbered patient request path runs 1 to 7.

The request path (patient mobile app — the high-volume consumer case):

  1. The app authenticates the user through the Identity Platform client SDK (email/password, Google, Apple, or SMS OTP). Identity Platform returns a signed OIDC ID token / JWT carrying the user’s sub, verified email, and any custom claims (tenant/clinic ID, role).
  2. The app calls https://api.cedarline.example/v1/... over TLS 1.3. DNS resolves through Cloud DNS to a Global External Application Load Balancer with Cloud Armor in front (WAF/OWASP rules, IP and geo rules, an adaptive-protection L7 DDoS layer, and a per-IP rate-based rule).
  3. The load balancer routes the patient/first-party paths to API Gateway, a fully managed gateway purpose-built to front serverless backends. API Gateway validates the Identity Platform JWT against its issuer/JWKS, enforces per-key quotas and method-level config from the OpenAPI spec, and forwards to the backend.
  4. API Gateway invokes a Cloud Run service over an authenticated call: it mints a Google-signed ID token as its own service account, with the service’s URL as the audience, and the Cloud Run service is --no-allow-unauthenticated and grants run.invoker only to that gateway identity. The service runs the business logic, reads/writes Firestore scoped to the caller’s verified sub/tenant, and returns JSON. Cloud CDN on the load balancer can cache a hot, idempotent GET, but only when the response is not user-specific (a clinic’s public directory, say); patient data should never land in a shared cache.
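
On the Cloud Run side, the handler can read the claims the gateway already verified instead of re-validating the JWT. To our understanding, API Gateway forwards the token’s payload base64url-encoded in an X-Apigateway-Api-Userinfo header (and moves the original Authorization header to X-Forwarded-Authorization); confirm that against your own gateway before relying on it. The following is a minimal, stdlib-only Python sketch under that assumption. The Caller type, the Unauthorized exception, and the tenantId/role claim names are hypothetical, and it deliberately skips signature verification, which is only acceptable because the service accepts invocations solely from the gateway’s service account.

```python
import base64
import binascii
import json
from dataclasses import dataclass
from typing import Mapping

# Assumed header name: API Gateway forwards the verified JWT payload here,
# base64url-encoded. Verify against your own gateway before relying on it.
USERINFO_HEADER = "x-apigateway-api-userinfo"


class Unauthorized(Exception):
    """Raised when a request lacks a usable, gateway-verified identity."""


@dataclass(frozen=True)
class Caller:
    sub: str
    tenant_id: str
    role: str


def _b64url_decode(segment: str) -> bytes:
    # JWT-style base64url strings usually omit '=' padding; restore it.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def caller_from_headers(headers: Mapping[str, str], expected_issuer: str) -> Caller:
    """Rebuild the caller from claims the gateway already verified.

    No signature check happens here: this is only safe because the service
    accepts invocations solely from the gateway's service account.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get(USERINFO_HEADER)
    if not raw:
        raise Unauthorized("missing gateway userinfo header")
    try:
        claims = json.loads(_b64url_decode(raw))
    except (binascii.Error, ValueError) as exc:
        raise Unauthorized("malformed userinfo header") from exc
    if not isinstance(claims, dict):
        raise Unauthorized("userinfo payload is not a JSON object")
    # Identity Platform ID tokens use https://securetoken.google.com/<PROJECT_ID>.
    if claims.get("iss") != expected_issuer:
        raise Unauthorized("unexpected issuer")
    sub, tenant_id = claims.get("sub"), claims.get("tenantId")
    if not sub or not tenant_id:
        raise Unauthorized("token lacks sub or tenantId claim")
    # Assumption: a user without a role claim gets the least-privileged role.
    return Caller(sub=sub, tenant_id=tenant_id, role=claims.get("role", "patient"))
```

The handler then passes Caller, never a client-supplied tenant or user ID, into every data-access function.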

The request path (partner lab integration — the monetized machine case):

  1. The lab’s server obtains an OAuth2 access token (client-credentials) and calls the Apigee endpoint (its own hostname/base path, also fronted by the global LB + Cloud Armor).
  2. Apigee is the full API-management plane for the partner program. On the request it runs a policy pipeline: verify the API key / OAuth token, enforce the partner’s quota and spike-arrest (rate limiting), check the request against the OpenAPI contract, capture analytics, and — for the per-call-billed lab product — record the transaction for monetization. It then routes to the same Cloud Run service the patient path uses.
  3. The Cloud Run service is identity-agnostic at this layer: it trusts a verified caller identity and a tenant claim handed to it by whichever edge terminated the request, and it serves the same Firestore-backed logic. The partner never sees, and never needs, the patient app’s front door — and vice versa.
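
In Apigee the quota and monetization steps are configured as policies (Quota, SpikeArrest, rate plans), not hand-written, but the bookkeeping they perform is easy to picture. The following is an in-memory Python sketch of a fixed-window, per-partner quota that also tallies the calls a per-call rate plan would bill. It is not Apigee’s implementation; FixedWindowQuota and billable are illustrative names, and the injectable clock exists only to make the behaviour testable.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, Tuple


class FixedWindowQuota:
    """Per-partner call quota over fixed time windows, with a billable-call tally.

    In-memory illustration only: counters live in one process and old windows
    are never evicted. Real enforcement across instances needs a shared,
    durable counter store, which is what Apigee provides.
    """

    def __init__(self, limit: int, window_seconds: int,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)
        # Calls that passed the quota, i.e. the ones a per-call plan bills.
        self.billable: DefaultDict[str, int] = defaultdict(int)

    def allow(self, partner_id: str) -> bool:
        # Windows align to multiples of window_seconds since the epoch.
        window = int(self._clock() // self.window_seconds)
        key = (partner_id, window)
        if self._counts[key] >= self.limit:
            return False  # rejected calls are neither forwarded nor billed
        self._counts[key] += 1
        self.billable[partner_id] += 1
        return True
```

Each partner gets its own counter, so one lab exhausting its quota never throttles another.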

The request path (clinician portal — the federated enterprise case):

  1. A hospital staff member signs in to the clinician web app, which uses Identity Platform’s multi-tenancy and SAML/OIDC federation: each customer clinic is a tenant, and that tenant is configured to federate to the clinic’s own IdP (Entra ID, Okta). The staff member logs in with their hospital credentials; Identity Platform brokers the federation and issues a Cedarline JWT carrying the tenant ID and role.
  2. From there the path is identical to the patient path — through API Gateway to Cloud Run — except the tenant claim scopes every Firestore query to that clinic’s data, and role claims gate clinician-only operations.

The data path. Firestore in Native mode is the operational source of truth: documents for patients, appointments, messages, lab results, and clinic configuration, with composite indexes for the query patterns and transactions for idempotent writes. Tenant scoping is enforced in the Cloud Run handlers, because the backend reaches Firestore through server client libraries that authenticate with IAM and bypass security rules; the rules still act as a second authorization layer on any path where a client reads Firestore directly through a client SDK. Large binary artifacts (lab-result PDFs, message attachments) live in Cloud Storage, referenced by object name from Firestore and served to clients via short-lived signed URLs minted by the backend. The light async touch (a newly written lab result means a patient needs a push notification) rides Firestore’s change events via an Eventarc trigger to a tiny notification Cloud Run service, so the synchronous write path returns immediately and the notification happens out of band. (This architecture deliberately keeps the event surface small; the heavy fan-out, saga, and telemetry-firehose patterns belong to the event-driven reference architecture, not here.)

The whole thing is stateless at the compute tier and regional-with-failover at the edge: every Cloud Run service is horizontally scalable and idempotent, both front doors are managed services that scale without our involvement, and the only durable state is Firestore (multi-region) and Cloud Storage (multi-region/dual-region). Drawn as a diagram it is three layers stacked: edge (Cloud DNS → Global LB + Cloud Armor → {API Gateway | Apigee}) on top; compute (a pool of Cloud Run services, fronted identically by either gateway, each running as its own least-privilege service account) in the middle; data (Firestore Native multi-region + Cloud Storage, with a thin Eventarc→Cloud Run notification side-channel) at the bottom. Identity Platform sits to the side as the single issuer every front door validates against, and Cloud Logging/Trace/Monitoring plus a VPC Service Controls perimeter wrap the whole stack.

Component breakdown

| Component | GCP service | Role here | Key configuration choices |
|---|---|---|---|
| Edge / DDoS / WAF | Global External ALB + Cloud Armor | Global anycast TLS ingress, L7 filtering, host/path routing to the two gateways | Preconfigured WAF (OWASP) rules; per-IP rate-based rules; adaptive protection for L7 DDoS; one cert and one edge in front of both gateways |
| First-party / internal edge | API Gateway | Lightweight managed gateway for patient/clinician/internal APIs | OpenAPI-defined config; JWT validation against Identity Platform issuer/JWKS; per-key quotas; authenticated invocation of Cloud Run backends |
| Partner / monetized edge | Apigee | Full API-management plane: dev portal, keys, quotas, spike-arrest, analytics, monetization | Policy pipeline (VerifyAPIKey/OAuthV2, Quota, SpikeArrest, OAS validation); developer portal + API products; rate plans for per-call billing |
| Identity | Identity Platform | One issuer for consumers + federated staff + machines | Email/Google/Apple/SMS providers; multi-tenancy (a tenant per clinic) with SAML/OIDC federation to customer IdPs; custom claims (tenant, role) set via Admin SDK; MFA |
| Compute | Cloud Run (primary) + Cloud Functions (where it fits) | Stateless business logic, request-driven, scale-to-zero | Concurrency 80 for I/O-bound handlers; min-instances only on latency-critical services; --no-allow-unauthenticated; per-service service account; startup CPU boost for cold starts |
| Data | Firestore (Native mode) | Operational source of truth + per-tenant document model | Multi-region (nam5/eur3) for HA; tenant scoping enforced in handlers, with security rules guarding any direct client-SDK access (server libraries bypass them); composite indexes; transactions for idempotency; TTL for ephemeral docs; PITR enabled |
| Blobs | Cloud Storage | Lab PDFs, attachments, exports | Referenced by object name from Firestore; short-lived signed URLs minted by the backend; CMEK; dual/multi-region buckets; lifecycle to Nearline/Coldline |
| Light async | Eventarc (Firestore trigger) → Cloud Run | Push notification on a new lab result, out of band | Document-write trigger on results/{id}; tiny single-purpose reactor; not a general fan-out bus |
| Secrets / config | Secret Manager | Partner credentials, third-party API keys, signing material | Reached via service-account identity, never embedded in images or config; rotation; CMEK |
| Observability | Cloud Logging + Trace + Monitoring + Error Reporting | Structured logs, distributed traces, SLO alerting | Trace context propagated edge→Run→Firestore; log-based business metrics; SLO burn-rate alerts; per-channel dashboards |
| Governance / boundary | VPC Service Controls + Org Policy | Data-exfiltration perimeter and org-wide guardrails | VPC-SC perimeter around Firestore/Storage/Secret Manager; org policies (disable SA key creation, restrict regions, domain-restricted sharing) |

Four of these choices carry the design and deserve the why, because they are where this architecture diverges from a naive serverless app.

Why two front doors — API Gateway and Apigee — instead of one. The instinct is to pick one and standardize. But they sit at different points on a price/capability curve, and an enterprise API genuinely needs both points. API Gateway is a lightweight, inexpensive, fully managed gateway designed specifically to front serverless backends (Cloud Run, Functions, App Engine). It does JWT validation, API keys, quotas, and OpenAPI-driven routing — exactly what the patient, clinician, and internal surfaces need — at a fraction of Apigee’s cost and operational weight. Apigee is a full API-management platform: a developer portal, fine-grained policy pipelines, deep analytics, traffic-management primitives (spike arrest, concurrent-rate limits), API product/rate-plan modeling, and monetization — the machinery a partner program with paying lab integrations requires and that you should not hand-build. The architectural rule: API Gateway for first-party and internal APIs; Apigee for the externalized, monetized partner program. Both terminate on the same Cloud Run services, so this is two edges over one backend, not two backends. (For a team with no partner program yet, you start with API Gateway alone and add Apigee the quarter the program is funded — without touching the compute or data layers.)

Why Cloud Run is the default, and where Cloud Functions still wins. This is the decision most teams get backwards. Cloud Functions 2nd gen and Cloud Run are now built on the same substrate, but Cloud Run is the better default for an API for concrete reasons: it serves any container (so any language, runtime, or framework: Express, FastAPI, Go net/http, gRPC), handles many requests per instance (80 concurrent requests by default, configurable up to 1,000, which slashes cost and cold-start frequency compared with 1st-gen functions that serve one request per instance), and gives full control over the listening process. Cloud Functions earns its place for the small, single-purpose glue pieces where you want the absolute minimum deploy unit and event wiring is the point: the Eventarc-triggered “new lab result → notify” reactor is a perfect Cloud Function (or a tiny Cloud Run service; the line is genuinely thin). The rule: Cloud Run for the API surface and anything with real business logic or non-trivial dependencies; Cloud Functions for narrow event glue where one function per concern and a zero-boilerplate deploy matter more than concurrency. Defaulting your whole API to one-request-per-instance functions is the classic cost-and-latency mistake (2nd-gen functions can be configured for concurrent requests too, since they run on Cloud Run).
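
The concurrency argument is Little’s law: requests in flight equal arrival rate times latency, and instances needed equal in-flight requests divided by per-instance concurrency. A back-of-envelope Python sketch with hypothetical numbers (400 requests/s at 150 ms each) shows the gap. It ignores Cloud Run’s CPU-based scaling and utilization headroom, so treat the result as a lower bound rather than a capacity plan.

```python
import math


def instances_needed(requests_per_second: float, latency_ms: int, concurrency: int) -> int:
    """Lower-bound instance count from Little's law (in-flight = rate x latency)."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    in_flight = requests_per_second * latency_ms / 1000
    return math.ceil(in_flight / concurrency)


# Hypothetical load: 400 requests/s at 150 ms each means 60 requests in flight.
one_per_instance = instances_needed(400, 150, concurrency=1)    # 60 instances
cloud_run_default = instances_needed(400, 150, concurrency=80)  # 1 instance
```

Zero traffic yields zero instances, which is the scale-to-zero case; every additional instance is also another potential cold start, which is why high concurrency reduces both cost and cold-start frequency.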

Why Firestore, and what it is not good for. Firestore in Native mode is an excellent fit here because the access patterns are document- and tenant-shaped: “get this patient,” “list this patient’s appointments,” “list this clinic’s unread messages.” These are all single-collection, single-tenant-partition queries whose latency stays low and predictable as data grows (Firestore query cost tracks the size of the result set, not the size of the collection), and they scale horizontally with no sharding. It also brings two things that matter for this domain specifically: security rules (a declarative, server-enforced authorization layer that restricts client-SDK reads and writes to documents the authenticated user owns; the Cloud Run backend’s own server-library access bypasses rules, so there the handler enforces the same scope in code), and change events that turn a committed write into an Eventarc event with no outbox table. Where Firestore is the wrong tool: heavy ad-hoc relational queries, multi-entity JOINs, strong cross-entity transactional reporting, or analytics aggregations. Those are not this API’s job; if a slice of the domain needs them, that slice sits behind the same Cloud Run/gateway/identity front door on Cloud SQL or AlloyDB instead, and the architecture absorbs it without changing shape.

Why Identity Platform rather than rolling auth or using plain Firebase Auth. The three-audience requirement (consumers + federated staff + machines) is what forces this. Identity Platform is the enterprise-grade evolution of Firebase Authentication: it adds multi-tenancy (a separate identity tenant per clinic, with isolated users and per-tenant federation config), SAML and OIDC federation to bring a hospital’s existing Entra ID/Okta in without provisioning passwords, MFA, and the SLA and support an enterprise needs — while still issuing standard OIDC JWTs that API Gateway and Apigee both validate natively. Custom claims (tenant ID, role) are stamped onto the token via the Admin SDK so the backend gets a verified tenant scope on every request rather than trusting a client-supplied ID. One issuer, validated by two front doors, covering patients, federated clinicians, and machine partners — that is the requirement that defines this architecture, and Identity Platform is the single component that satisfies it.
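
Setting claims server-side also means they can be validated before they are stamped. A minimal Python sketch, assuming a hypothetical three-role model, builds the claims set from provisioning data and checks it against the 1,000-byte limit Firebase Authentication and Identity Platform place on custom-claim payloads. The resulting dict would then go to the Admin SDK; with the Python SDK and a tenant-scoped user that is, as far as we know, roughly tenant_mgt.auth_for_tenant(tenant_id).set_custom_user_claims(uid, claims), but check the SDK reference for your version.

```python
import json
from typing import Dict

# Hypothetical role model for illustration.
ALLOWED_ROLES = frozenset({"patient", "clinician", "clinic_admin"})
# Identity Platform / Firebase Auth rejects custom-claim payloads over 1000 bytes.
MAX_CLAIMS_BYTES = 1000


def build_custom_claims(tenant_id: str, role: str) -> Dict[str, str]:
    """Build custom claims from server-side provisioning data, never client input."""
    if not tenant_id:
        raise ValueError("tenant_id is required")
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    claims = {"tenantId": tenant_id, "role": role}
    # Measure the serialized JSON, which is what the size limit applies to.
    size = len(json.dumps(claims).encode("utf-8"))
    if size > MAX_CLAIMS_BYTES:
        raise ValueError(f"claims payload is {size} bytes; limit is {MAX_CLAIMS_BYTES}")
    return claims
```

Keeping claims small and coarse (a tenant and a role, not a permission list) leaves headroom under the limit and keeps every ID token compact.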

Implementation guidance

Provision with Terraform (Cedarline’s house standard) in a layered state layout so blast radius is contained and teams can move independently: an edge layer (Cloud DNS, certs, the Global LB, Cloud Armor policies, the API Gateway config, the Apigee org/environment/products), an identity layer (Identity Platform config, tenants, providers, custom-claim setters), a data layer (Firestore database + indexes + security rules, Cloud Storage buckets, CMEK keys), and an app layer (Cloud Run services, service accounts, IAM bindings, the Eventarc trigger). Each layer keeps its own remote state in a GCS backend with state locking, wired together by terraform_remote_state data sources. Keep container build out of the IaC critical path: CI (Cloud Build, or GitHub Actions over Workload Identity Federation) builds and pushes images to Artifact Registry, and Terraform points Cloud Run at an immutable image digest.

Concretely:

Networking: the deliberate choices. Cloud Run, Firestore, Cloud Storage, and Secret Manager are all managed services reachable over Google’s network and governed by IAM, not by network reachability, so you do not put Cloud Run in a VPC merely to talk to Firestore. The default here is that every service is --no-allow-unauthenticated and trusts only its caller’s invoker identity, so a direct, anonymous hit on the run.app URL fails authorization before any code runs. Ingress settings tighten this where the topology allows: internal-and-cloud-load-balancing removes the public run.app path for services reached only through the Global LB (via a serverless NEG) or from inside the VPC. A service fronted by API Gateway is the exception, because the gateway calls the backend’s public URL from outside your VPC, so that service generally keeps ingress at all and relies on IAM as the gate. A Serverless VPC Access connector (or Direct VPC egress) is added only when a service must reach a private resource: a Cloud SQL instance over private IP, an on-prem system over Interconnect, or third-party egress through Cloud NAT for a stable, allow-listable IP, which the lab/SMS integrations need. Otherwise, isolation here is an IAM-and-resource-policy problem, not a subnet problem.

Identity wiring (the part that prevents the most incidents). One Identity Platform configuration, with a tenant per customer clinic so users, federation config, and (optionally) data partitions are isolated per customer. Consumer providers (email/Google/Apple/SMS) live on the default/patient tenant; each clinic tenant is wired to that clinic’s SAML or OIDC IdP. Custom claims (tenantId, role) are set server-side via the Admin SDK at user-provisioning time, so the backend reads a verified tenant scope from the token and never trusts a client-supplied tenant or user ID. Authorization is two-tiered: a coarse check at the edge (API Gateway or Apigee rejects an unauthenticated, expired, or wrong-audience token before any compute runs) and a fine check at the data boundary, where the Cloud Run handler constrains every query to the token’s tenantId and the caller’s own documents, so a patient can read only their records and a clinician only their clinic’s. Firestore security rules keyed on request.auth.token.tenantId enforce the same scope on any direct client-SDK access as defense in depth; they are not evaluated for the backend’s own server-library calls. Machine partners authenticate via OAuth2 client-credentials through Apigee; their tokens carry a partner/product identity, not a patient one.
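
The fine check is easiest to audit when every query gets its scope from one function that derives filters from the verified caller rather than from request parameters. A hypothetical Python sketch: scope_filters returns (field, operator, value) triples that a handler would translate into Firestore where() clauses (in the google-cloud-firestore client, FieldFilter objects). The tenantId and patientUid field names and the two roles are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Filter = Tuple[str, str, str]  # (field, operator, value)


class Forbidden(Exception):
    """Raised when the caller's role grants no access to this data."""


@dataclass(frozen=True)
class Caller:
    sub: str        # verified user ID from the token
    tenant_id: str  # verified tenantId custom claim
    role: str       # verified role custom claim


def scope_filters(caller: Caller) -> List[Filter]:
    """Filters every query must carry, derived only from verified claims."""
    filters: List[Filter] = [("tenantId", "==", caller.tenant_id)]
    if caller.role == "clinician":
        return filters  # the whole clinic, nothing beyond it
    if caller.role == "patient":
        return filters + [("patientUid", "==", caller.sub)]  # own records only
    raise Forbidden(f"role {caller.role!r} has no data access")
```

Because unknown roles raise instead of falling through to a default, a new role added to Identity Platform gets no data access until someone decides what it should see.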

Enterprise considerations

Security and Zero Trust. The design is Zero-Trust by construction: every request is authenticated (Identity Platform JWT or partner OAuth/key) and authorized at the edge and re-checked at the data boundary, with no implicit trust from “being inside the project.” Every Cloud Run service is --no-allow-unauthenticated and grants run.invoker only to the specific gateway/Eventarc identity, so there is no anonymous east-west call. Cloud Armor filters L7 attacks and absorbs DDoS at the edge. Every service-to-data hop is least-privilege IAM: each service runs as its own service account with only the roles it needs (the messaging service can use Firestore and nothing else; because Firestore IAM is granted per database rather than per collection, confining it to the messages collection is the handler’s job or a separate database’s; the notification service holds only the Secret Manager accessor role for the push credential). Data is encrypted at rest with CMEK (Firestore, Cloud Storage, Secret Manager) and in transit with TLS 1.2+ everywhere. Secrets live only in Secret Manager, reached via service-account identity: no keys in code, configs, or pipeline logs (the team has been burned by leaked database credentials before, so this failure mode is designed out, not patched). A VPC Service Controls perimeter around Firestore/Storage/Secret Manager blocks data exfiltration to a project outside the boundary even if a credential leaks; it is the control that makes the HIPAA/BAA story credible. The biggest Zero-Trust win over a server-based design: there is no long-lived host to compromise, patch, or pivot from, because compute is ephemeral and request-driven.

Cost optimization. Serverless flips the model from “pay for capacity” to “pay for use,” which is exactly right for a flu-season swing. Levers, roughly in order of impact:

Scalability. Each tier scales independently and natively — both gateways and Cloud Run are managed and elastic. The governors to set deliberately: Cloud Run max-instances per service (to protect downstreams and cap spend), concurrency tuned per handler (high for I/O-bound, low only for memory-heavy/non-thread-safe ones), and Apigee spike-arrest / quota plus API Gateway quotas at the edge. The classic serverless scaling trap is a downstream that does not scale: if a service calls a fixed-size Cloud SQL instance, Cloud Run will happily open thousands of connections and melt it — which is precisely why the hot path here is on Firestore (horizontally scalable, connectionless), and any relational dependency sits behind a connection pooler with a capped max-instances.
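
For a relational dependency, the cap falls out of simple arithmetic: max-instances times the per-instance connection pool must fit inside the database’s connection limit, minus whatever is reserved for admins and migrations. A Python sketch with hypothetical numbers (a 500-connection instance, 50 reserved, a pool of 10 per instance):

```python
def max_instances_for_pool(db_max_connections: int, pool_size_per_instance: int,
                           reserved_connections: int = 0) -> int:
    """Largest max-instances value whose combined pools fit the database limit."""
    if pool_size_per_instance < 1:
        raise ValueError("pool_size_per_instance must be at least 1")
    usable = db_max_connections - reserved_connections
    if usable < pool_size_per_instance:
        raise ValueError("database cannot serve even one instance's pool")
    return usable // pool_size_per_instance


# Hypothetical: 500 connections, 50 held back for admin and migrations, 10 per instance.
cap = max_instances_for_pool(500, 10, reserved_connections=50)  # 45
```

The result would then be applied with gcloud run services update SERVICE --max-instances=45. If several services share the database, their caps have to be budgeted together against the same limit.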

Reliability and DR (RTO/RPO). Within a region every component is multi-zone by default (managed services), so zonal failure is invisible. For regional DR the design uses Firestore in multi-region mode (nam5/eur3) for automatic synchronous replication across regions — RPO ≈ 0 with automatic failover handled by the service, no application change — plus dual/multi-region Cloud Storage for blobs, and infrastructure-as-code redeployable into the second region in minutes. Front-door resilience comes from the Global External ALB, which is already a global anycast service routing to the nearest healthy backend; Cloud Run services are deployed in two regions behind it, and API Gateway/Apigee front them. Because Cloud Run and the gateways are deploy-from-IaC and hold no state, the “standby region” is a genuinely warm stack rather than a cold rebuild. Targets: RPO ≈ 0 (Firestore multi-region) and RTO of minutes (health-checked LB failover + already-warm managed services in region two). Idempotency — client request IDs guarded by Firestore transactions, conditional writes — makes retries and any replay safe; Eventarc dead-lettering on the notification reactor parks a poison event instead of crash-looping.
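
A cheap idempotency pattern on Firestore is to use the client’s request ID as the document ID and write with create(), which fails if the document already exists: the first attempt wins and a retry reads back the original. So the sketch runs anywhere, a lock-guarded in-memory dict stands in for the Firestore collection here; InMemoryDocs and book_idempotently are illustrative names, not a Cedarline or Google API.

```python
import threading
from typing import Any, Dict, Optional

Doc = Dict[str, Any]


class InMemoryDocs:
    """Stand-in for a Firestore collection with create-if-absent writes."""

    def __init__(self) -> None:
        self._docs: Dict[str, Doc] = {}
        self._lock = threading.Lock()

    def create(self, doc_id: str, data: Doc) -> bool:
        """Store data only if doc_id is new, mirroring create() on an existing doc."""
        with self._lock:
            if doc_id in self._docs:
                return False
            self._docs[doc_id] = dict(data)
            return True

    def get(self, doc_id: str) -> Optional[Doc]:
        with self._lock:
            doc = self._docs.get(doc_id)
            return dict(doc) if doc is not None else None


def book_idempotently(store: InMemoryDocs, request_id: str, booking: Doc) -> Doc:
    """Write the booking under the client's request ID; a retry gets the original back."""
    if store.create(request_id, booking):
        return dict(booking)
    existing = store.get(request_id)
    if existing is None:  # cannot happen here, since this store never deletes
        raise RuntimeError("create() refused but no document found")
    return existing
```

Because the booking itself is the idempotency record, there is no window in which the side effect has happened but the marker has not been written.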

Observability. Propagate a trace/correlation ID edge-to-data so Cloud Trace stitches “gateway → Cloud Run → Firestore” into one timeline and you can see exactly where a slow request spent its milliseconds. Cloud Run emits structured JSON logs to Cloud Logging; build log-based metrics for business KPIs (bookings/min, lab results delivered/hour) and feed Cloud Monitoring dashboards; Error Reporting groups exceptions across services. Track the serverless-specific signals: cold-start rate and duration, instance-count vs max, Firestore latency and contention, and Apigee/API Gateway 4xx/5xx and quota-rejection rates. A dashboard per channel (patient app, clinician portal, partner API, internal) keeps a problem in one surface from being masked by health in the others. Define SLOs (read p99, booking-write p99, auth success rate) and alert on burn rate, not raw error counts.
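
Two of these mechanics fit in a few lines. Cloud Run requests carry an X-Cloud-Trace-Context header (TRACE_ID/SPAN_ID;o=OPTIONS), and a JSON log line with a logging.googleapis.com/trace field of the form projects/PROJECT_ID/traces/TRACE_ID is correlated with that trace in Cloud Logging. Burn rate is the observed error ratio divided by the error budget (1 minus the SLO target); the 14.4 threshold on a 1-hour window, confirmed by a 5-minute window, is the fast-burn pairing the Google SRE workbook suggests for a 30-day SLO. A minimal Python sketch; the function names and window tuples are illustrative.

```python
import json
from typing import Any, Mapping, Tuple


def log_line(message: str, headers: Mapping[str, str], project_id: str,
             severity: str = "INFO", **fields: Any) -> str:
    """One structured JSON log line, correlated to the request's Cloud Trace trace."""
    entry = {"severity": severity, "message": message, **fields}
    lowered = {name.lower(): value for name, value in headers.items()}
    trace_header = lowered.get("x-cloud-trace-context")  # "TRACE_ID/SPAN_ID;o=1"
    if trace_header:
        trace_id = trace_header.split("/", 1)[0]
        entry["logging.googleapis.com/trace"] = f"projects/{project_id}/traces/{trace_id}"
    return json.dumps(entry)


def burn_rate(errors: int, total: int, slo_target: float) -> float:
    """Observed error ratio divided by the error budget (1 - SLO target)."""
    if total == 0:
        return 0.0
    return (errors / total) / (1.0 - slo_target)


def should_page(long_window: Tuple[int, int], short_window: Tuple[int, int],
                slo_target: float, threshold: float = 14.4) -> bool:
    """Page only if both windows burn fast: the long window shows the problem is
    real, the short window shows it is still happening."""
    return (burn_rate(*long_window, slo_target) >= threshold
            and burn_rate(*short_window, slo_target) >= threshold)
```

Each window is an (errors, total) pair, for example from a log-based metric; printing log_line(...) to stdout is enough on Cloud Run, where stdout is collected by Cloud Logging.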

Governance. A clear resource hierarchy (org → folders for environments → a project per environment, optionally per domain), with Organization Policy constraints enforced top-down: iam.disableServiceAccountKeyCreation (no exported keys), gcp.resourceLocations (allowed regions), and iam.allowedPolicyMemberDomains (domain-restricted sharing). Apigee is itself a governance asset: the API product catalog, versioned proxy bundles, and the developer portal make the partner contract an explicitly managed artifact, and its analytics are the per-partner audit trail. Cloud Audit Logs (Admin Activity always on; Data Access enabled on Firestore/Storage given the PHI) flow to a logs bucket / BigQuery sink for retention and SIEM. Assured Workloads can pin the whole stack to a compliance regime where required. IAM is least-privilege and reviewed; cost and ownership are attributed per service via labels.

Reference enterprise example

Cedarline Health, flu-season readiness review. Baseline (summer): ~4.2 million API operations/day. Flu-season/open-enrollment peak: ~13 million/day, concentrated 8 a.m.–6 p.m. with the sharpest spike in the 8–10 a.m. booking window. Mix: ~70% patient app/web, ~18% clinician portal, ~9% partner API (lab/EHR/pharmacy), ~3% internal/back-office.
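
Turning daily volumes into request rates is worth doing before setting any max-instances or quota value. A Python sketch of the arithmetic, assuming (as a simplification) that all peak-day traffic falls evenly inside the 8 a.m.–6 p.m. window; the 8–10 a.m. booking spike will run hotter than this average.

```python
SECONDS_PER_HOUR = 3600


def average_rate(operations_per_day: int, active_hours: float) -> float:
    """Average operations per second if a day's traffic spreads evenly over active_hours."""
    return operations_per_day / (active_hours * SECONDS_PER_HOUR)


baseline_rps = average_rate(4_200_000, 24)  # summer, whole day: ~48.6 ops/s
peak_rps = average_rate(13_000_000, 10)     # flu season, 8 a.m.-6 p.m.: ~361.1 ops/s
seasonal_multiple = 13_000_000 / 4_200_000  # daily volume, peak vs baseline: ~3.1

# Channel mix from the review, applied to the peak-window average.
mix = {"patient": 0.70, "clinician": 0.18, "partner": 0.09, "internal": 0.03}
peak_rps_by_channel = {channel: peak_rps * share for channel, share in mix.items()}
```

The roughly 3.1x daily multiple matches the “triple the baseline” seasonal swing described earlier, and the patient channel alone averages about 253 ops/s across the peak window.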

Decisions they made and why:

Cost outcome. The retired Compute Engine + MIG + self-managed-gateway tier had cost a flat ~₹7.0 lakh/month — sized for a flu-season peak that lasts a few weeks a year. The serverless platform billed ~₹2.1 lakh in a quiet month and ~₹6.4 lakh in a peak month, averaging ~₹3.4 lakh/month across the year — roughly a 50% reduction — while the peak was handled with no engineer paged for capacity and overnight hours cost almost nothing. The partner-API tier on Apigee, billed to the labs per call, turned the most expensive piece of the edge into a net contributor. And an entire class of work disappeared: no gateway cluster to run, no federation service to operate, no auth system to patch.
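
The headline figures can be checked with a few lines of arithmetic. A Python sketch using the numbers above (all in ₹ lakh); the peak-month estimate assumes, as a simplification, that every month bills at either the quiet or the peak rate.

```python
old_monthly = 7.0      # flat, Compute Engine + MIG + self-managed gateway
new_quiet_month = 2.1  # serverless, quiet month
new_peak_month = 6.4   # serverless, peak month
new_average = 3.4      # serverless, averaged across the year

reduction = 1 - new_average / old_monthly          # ~0.51, i.e. roughly half
annual_savings = (old_monthly - new_average) * 12  # ~43.2 lakh per year

# Solve p * peak + (12 - p) * quiet = 12 * average for p, the implied peak months.
peak_month_equivalents = (
    12 * (new_average - new_quiet_month) / (new_peak_month - new_quiet_month)
)  # ~3.6
```

About 3.6 peak-month equivalents fits a flu-season plus open-enrollment period measured in weeks to a few months, which is why pay-per-use beats provisioning for the peak all year.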

Where they spent the savings. Two engineers’ worth of reclaimed operational time went into the things serverless does not hand you free: the OpenAPI-and-Apigee-proxy contract discipline, the shared idempotency/observability library, the Firestore security-rules test suite, and the cross-region GameDay automation.

When to use it

Use this architecture when:

Trade-offs and anti-patterns to avoid:

Alternatives worth naming: a GKE-hosted API (with the same gateway/identity/Firestore core) when you need long-lived connections, large in-memory state, a sidecar-heavy service mesh, or constant high throughput; Cloud SQL/AlloyDB in place of (or beside) Firestore when the domain is relational; Cloud Endpoints as a lighter ESPv2-based alternative to API Gateway for gRPC-heavy internal services; and Firebase as the rapid-start bundle when a small team wants Identity Platform + Firestore + Functions scaffolded end to end (Firebase packages much of this same stack with a faster on-ramp). The front-door pattern (Identity Platform identity, Global LB + Cloud Armor edge, a Cloud Run core, Firestore data) survives every one of these swaps, which is the real reason to start here: you change the compute host or the database for a slice, not the shape of the whole platform.


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
