GCP Enterprise Architecture: Serverless API

Every serverless API on Google Cloud eventually forces two decisions that most “hello world” tutorials skip, and getting them wrong is what turns a clean weekend prototype into an 18-month rewrite. The first is the edge: do you put API Gateway in front, or Apigee, or both — and the honest answer for a growing enterprise is both, for different audiences, which only works if you understand exactly what each one is for. The second is the compute: Cloud Functions or Cloud Run? Google has quietly merged these two until the line is blurry (Cloud Functions 2nd gen literally runs on Cloud Run), but the right default for an enterprise API is not the one most people pick. This article is the reference architecture that answers both, built on the real services in Google’s serverless stack — API Gateway and Apigee at the front door, Cloud Run (with Cloud Functions where it fits) for compute, Firestore for data, and Identity Platform for identity — and assembled into something a five-person startup and a regulated enterprise can both deploy without redrawing the diagram.

The running domain is deliberately a request/response API, not an event firehose. There is a sibling pattern for event-driven, telemetry-heavy systems; this one is about the boring, universal thing almost every company needs first: a governed HTTP/JSON (and gRPC) API that backs web, mobile, and partner clients, scales to zero when nobody is using it, scales to thousands of concurrent requests on launch day, and bills per call instead of per provisioned VM. The interesting engineering is not “can serverless serve HTTP” — obviously it can — but how to give that API one identity, a layered edge, a data model that fits a document store, and a cost curve that tracks usage, without a single server to patch.

The business scenario

Cedarline Health (fictional, used throughout) builds a patient-engagement platform — appointment booking, secure messaging, lab-result delivery, and a clinician portal — sold to mid-sized clinics and hospital groups. They are 14 engineers. Their API has four distinct consumers, and that multiplicity is the whole story:

A patient mobile app (iOS/Android) and a patient web app — high request volume, consumer-grade sign-in (email/password, Google, Apple, SMS OTP), strict per-user data isolation.
A clinician web portal used by staff at customer clinics — these are enterprise identities that must federate to each clinic’s own identity provider (a hospital’s Microsoft Entra ID, an Okta tenant) via SAML/OIDC, because no hospital will let staff create yet another password.
A partner API program: lab networks, EHR vendors, and pharmacy systems that integrate machine-to-machine. These partners want a developer portal, API keys, published OpenAPI docs, usage analytics, quota tiers, and — for the labs Cedarline charges per call — monetization and billing.
An internal/back-office surface: admin tools, batch jobs, and a couple of trusted first-party services that call the same business logic without needing the full partner-program machinery.

The traffic is spiky in two different ways at once. Within a day: quiet overnight, a booking surge 8–10 a.m. as clinics open, a lab-results wave each afternoon. Across the calendar: flu season and open-enrollment windows triple the baseline for weeks. They tried a fixed Compute Engine + managed-instance-group tier and lived the usual misery — sized for the flu-season peak and idle two-thirds of every day, or sized for the median and paged at 8:05 a.m.

The mandate from the new VP of Platform was specific and is what this architecture has to satisfy:

One identity system spanning consumer patients and federated hospital staff and machine partners — not three bolted-together auth stacks. This is the requirement that eliminates most naive designs.
A real partner program — a self-service developer portal, keys, quotas, analytics, and per-call monetization for the lab integrations — without standing up and running an API-management cluster by hand.
Scale to zero overnight and absorb the flu-season ramp with no capacity meeting and no pre-warming spreadsheet.
HIPAA-grade controls — encryption, least privilege, audit trails, data-exfiltration boundaries — because this is patient data, with a signed BAA and a security team that audits.
A genuine DR story with a defined RTO/RPO, because a clinic that can’t pull a lab result is a clinical-safety and contract problem, not a “we’ll fix it Monday” problem.

This is the serverless sweet spot: variable, multi-audience, request-driven traffic where per-request economics beat steady-state utilization and the scarcest resource is the 14 engineers’ attention. And it scales down cleanly — a three-person startup with one clinic deploys the identical shape (API Gateway only, Cloud Run scaling to zero, single-region Firestore, Identity Platform on the free tier) for a few thousand rupees a month, and adds Apigee, multi-region, and VPC Service Controls when the partner program and the compliance auditor actually arrive. That down-scalability is what makes it a reference architecture rather than a big-company special case.

Architecture overview

The defining idea is a two-tier edge over a shared serverless core. The two front doors — API Gateway and Apigee — serve different audiences with different needs, but both terminate on the same identity, the same Cloud Run services, and the same Firestore data. Nothing about “which front door” leaks below the edge. Above it, each tier does only what it is good at.

The request path (patient mobile app — the high-volume consumer case):

The app authenticates the user through the Identity Platform client SDK (email/password, Google, Apple, or SMS OTP). Identity Platform returns a signed OIDC ID token / JWT carrying the user’s sub, verified email, and any custom claims (tenant/clinic ID, role).
The app calls https://api.cedarline.example/v1/... over TLS 1.3. DNS resolves through Cloud DNS to a Global External Application Load Balancer with Cloud Armor in front (WAF/OWASP rules, IP and geo rules, an adaptive-protection L7 DDoS layer, and a per-IP rate-based rule).
The load balancer routes the patient/first-party paths to API Gateway, a fully managed gateway purpose-built to front serverless backends. API Gateway validates the Identity Platform JWT against its issuer/JWKS, enforces per-key quotas and method-level config from the OpenAPI spec, and forwards to the backend.
API Gateway invokes a Cloud Run service over an authenticated call (it mints an ID token for the backend’s service account; the Cloud Run service is --no-allow-unauthenticated and only trusts the gateway’s identity). The service runs the business logic, reads/writes Firestore scoped to the caller’s verified sub/tenant, and returns JSON. For a hot, idempotent GET the load balancer/CDN can cache the response.
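
The last step of the path above, where the backend scopes work to the caller’s verified identity, can be sketched as follows. This is a minimal sketch under stated assumptions: API Gateway is assumed to forward the validated token’s claims to the backend as base64url-encoded JSON in the X-Apigateway-Api-Userinfo header (check the header name against current documentation), and the claim names tenantId and role follow the custom claims described in this article. The Caller type and caller_from_userinfo function are hypothetical names for illustration.

```python
import base64
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    sub: str
    tenant_id: str
    role: str


def caller_from_userinfo(header_value: str) -> Caller:
    """Decode the base64url JSON claims forwarded by the gateway (assumed header format)."""
    # base64url values often arrive without padding; restore it before decoding.
    padded = header_value + "=" * (-len(header_value) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    # Fail closed: a token without a subject, tenant, or role is not usable here.
    for required in ("sub", "tenantId", "role"):
        if not claims.get(required):
            raise PermissionError(f"missing claim: {required}")
    return Caller(sub=claims["sub"], tenant_id=claims["tenantId"], role=claims["role"])


# Usage: simulate the header value the gateway would attach (hypothetical claims).
sample = base64.urlsafe_b64encode(
    json.dumps({"sub": "u-123", "tenantId": "clinic-42", "role": "patient"}).encode()
).decode().rstrip("=")
print(caller_from_userinfo(sample))
```

The handler never reads a tenant or user ID from the request body or query string; it only uses what the edge has already verified.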

The request path (partner lab integration — the monetized machine case):

The lab’s server obtains an OAuth2 access token (client-credentials) and calls the Apigee endpoint (its own hostname/base path, also fronted by the global LB + Cloud Armor).
Apigee is the full API-management plane for the partner program. On the request it runs a policy pipeline: verify the API key / OAuth token, enforce the partner’s quota and spike-arrest (rate limiting), check the request against the OpenAPI contract, capture analytics, and — for the per-call-billed lab product — record the transaction for monetization. It then routes to the same Cloud Run service the patient path uses.
The Cloud Run service is identity-agnostic at this layer: it trusts a verified caller identity and a tenant claim handed to it by whichever edge terminated the request, and it serves the same Firestore-backed logic. The partner never sees, and never needs, the patient app’s front door — and vice versa.

The request path (clinician portal — the federated enterprise case):

A hospital staff member signs in to the clinician web app, which uses Identity Platform’s multi-tenancy and SAML/OIDC federation: each customer clinic is a tenant, and that tenant is configured to federate to the clinic’s own IdP (Entra ID, Okta). The staff member logs in with their hospital credentials; Identity Platform brokers the federation and issues a Cedarline JWT carrying the tenant ID and role.
From there the path is identical to the patient path — through API Gateway to Cloud Run — except the tenant claim scopes every Firestore query to that clinic’s data, and role claims gate clinician-only operations.

The data path. Firestore in Native mode is the operational source of truth: documents for patients, appointments, messages, lab results, and clinic configuration, with security rules as a second authorization layer, composite indexes for the query patterns, and transactions for idempotent writes. Large binary artifacts (lab-result PDFs, message attachments) live in Cloud Storage, referenced by object name from Firestore and served to clients via short-lived signed URLs minted by the backend. The light async touch — when a lab result is written, a patient needs a push notification — rides Firestore’s change stream via an Eventarc trigger to a tiny notification Cloud Run service, so the synchronous write path returns immediately and the notification happens out of band. (This architecture deliberately keeps the event surface small; the heavy fan-out, saga, telemetry-firehose patterns belong to the event-driven reference architecture, not here.)

The whole thing is stateless at the compute tier and regional-with-failover at the edge: every Cloud Run service is horizontally scalable and idempotent, both front doors are managed services that scale without our involvement, and the only durable state is Firestore (multi-region) and Cloud Storage (multi-region/dual-region). Drawn as a diagram it is three layers stacked: edge (Cloud DNS → Global LB + Cloud Armor → {API Gateway | Apigee}) on top; compute (a pool of Cloud Run services, fronted identically by either gateway, each running as its own least-privilege service account) in the middle; data (Firestore Native multi-region + Cloud Storage, with a thin Eventarc→Cloud Run notification side-channel) at the bottom. Identity Platform sits to the side as the single issuer every front door validates against, and Cloud Logging/Trace/Monitoring plus a VPC Service Controls perimeter wrap the whole stack.

Component breakdown

Component	GCP service	Role here	Key configuration choices
Edge / DDoS / WAF	Global External ALB + Cloud Armor	Global anycast TLS ingress, L7 filtering, host/path routing to the two gateways	Preconfigured WAF (OWASP) rules; per-IP rate-based rules; adaptive protection for L7 DDoS; one cert/one edge in front of both gateways
First-party / internal edge	API Gateway	Lightweight managed gateway for patient/clinician/internal APIs	OpenAPI-defined config; JWT validation against Identity Platform issuer/JWKS; per-key quotas; authenticated invocation of Cloud Run backends
Partner / monetized edge	Apigee	Full API-management plane: dev portal, keys, quotas, spike-arrest, analytics, monetization	Policy pipeline (VerifyAPIKey/OAuthV2, Quota, SpikeArrest, OAS validation); developer portal + API products; rate plans for per-call billing
Identity	Identity Platform	One issuer for consumers + federated staff + machines	Email/Google/Apple/SMS providers; multi-tenancy (a tenant per clinic) with SAML/OIDC federation to customer IdPs; custom claims (tenant, role) set via Admin SDK; MFA
Compute	Cloud Run (primary) + Cloud Functions (where it fits)	Stateless business logic, request-driven, scale-to-zero	Concurrency 80 for I/O-bound handlers; `min-instances` only on latency-critical services; `--no-allow-unauthenticated`; per-service service account; CPU-boost on cold start
Data	Firestore (Native mode)	Operational source of truth + per-tenant document model	Multi-region (`nam5`/`eur3`) for HA; security rules as a second authz layer; composite indexes; transactions for idempotency; TTL for ephemeral docs; PITR enabled
Blobs	Cloud Storage	Lab PDFs, attachments, exports	Referenced by object name from Firestore; short-lived signed URLs minted by the backend; CMEK; dual/multi-region buckets; lifecycle to Nearline/Coldline
Light async	Eventarc (Firestore trigger) → Cloud Run	Push notification on a new lab result, out of band	Document-write trigger on `results/{id}`; tiny single-purpose reactor; not a general fan-out bus
Secrets / config	Secret Manager	Partner credentials, third-party API keys, signing material	Reached via service-account identity, never embedded in images or config; rotation; CMEK
Observability	Cloud Logging + Trace + Monitoring + Error Reporting	Structured logs, distributed traces, SLO alerting	Trace context propagated edge→Run→Firestore; log-based business metrics; SLO burn-rate alerts; per-channel dashboards
Governance / boundary	VPC Service Controls + Org Policy	Data-exfiltration perimeter and org-wide guardrails	VPC-SC perimeter around Firestore/Storage/Secret Manager; org policies (disable SA key creation, restrict regions, domain-restricted sharing)

Four of these choices carry the design and deserve the why, because they are where this architecture diverges from a naive serverless app.

Why two front doors — API Gateway and Apigee — instead of one. The instinct is to pick one and standardize. But they sit at different points on a price/capability curve, and an enterprise API genuinely needs both points. API Gateway is a lightweight, inexpensive, fully managed gateway designed specifically to front serverless backends (Cloud Run, Functions, App Engine). It does JWT validation, API keys, quotas, and OpenAPI-driven routing — exactly what the patient, clinician, and internal surfaces need — at a fraction of Apigee’s cost and operational weight. Apigee is a full API-management platform: a developer portal, fine-grained policy pipelines, deep analytics, traffic-management primitives (spike arrest, concurrent-rate limits), API product/rate-plan modeling, and monetization — the machinery a partner program with paying lab integrations requires and that you should not hand-build. The architectural rule: API Gateway for first-party and internal APIs; Apigee for the externalized, monetized partner program. Both terminate on the same Cloud Run services, so this is two edges over one backend, not two backends. (For a team with no partner program yet, you start with API Gateway alone and add Apigee the quarter the program is funded — without touching the compute or data layers.)

Why Cloud Run is the default, and where Cloud Functions still wins. This is the decision most teams get backwards. Cloud Functions 2nd gen and Cloud Run are now built on the same substrate, but Cloud Run is the better default for an API for concrete reasons: it serves any container (so any language/runtime and any framework: Express, FastAPI, Go net/http, gRPC), supports request concurrency >1 out of the box (one instance handles 80 simultaneous requests by default, which slashes cost and cold-start frequency versus a function that handles one request per instance, as 1st gen functions always do and 2nd gen functions do unless their concurrency is raised), and gives full control over the listening process. Cloud Functions earns its place for the small, single-purpose glue pieces where you want the absolute minimum deploy unit and event wiring is the point: the Eventarc-triggered “new lab result → notify” reactor is a perfect Cloud Function (or a tiny Cloud Run service; the line is genuinely thin). The rule: Cloud Run for the API surface and anything with real business logic or non-trivial dependencies; Cloud Functions for narrow event-glue where one-function-per-concern and zero-boilerplate deploy matter more than concurrency. Defaulting your whole API to one-request-per-instance Functions is the classic cost-and-latency mistake.
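
The concurrency argument is simple arithmetic, sketched here with Little’s law (requests in flight = arrival rate × latency). The load figures are hypothetical, and the estimate ignores autoscaler headroom and cold-start overlap, so treat it as a lower bound rather than a sizing tool.

```python
import math


def instances_needed(requests_per_second: int, avg_latency_ms: int, concurrency: int) -> int:
    """Estimate steady-state instances: in-flight requests divided by per-instance concurrency."""
    in_flight = requests_per_second * avg_latency_ms / 1000
    return math.ceil(in_flight / concurrency)


# Hypothetical launch-day load: 2,000 requests/second at 100 ms average latency.
one_per_instance = instances_needed(2000, 100, concurrency=1)
with_concurrency = instances_needed(2000, 100, concurrency=80)
print(one_per_instance, with_concurrency)
```

With these assumed numbers the same traffic needs hundreds of single-request instances but only a handful at concurrency 80, which is where both the instance-hour and the cold-start savings come from.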

Why Firestore, and what it is not good for. Firestore in Native mode is an excellent fit here because the access patterns are document- and tenant-shaped: “get this patient,” “list this patient’s appointments,” “list this clinic’s unread messages” — all single-collection, single-tenant-partition queries that return in single-digit milliseconds and scale horizontally with no sharding. It also brings two things that matter for this domain specifically: security rules (a declarative, server-enforced authorization layer that can restrict a query to documents the authenticated user owns — defense in depth behind the gateway’s coarse check), and a change stream that turns a committed write into an event via Eventarc with no outbox table. Where Firestore is the wrong tool: heavy ad-hoc relational queries, multi-entity JOINs, strong cross-entity transactional reporting, or analytics aggregations. Those are not this API’s job — but if a slice of the domain needs them, that slice sits behind the same Cloud Run/gateway/identity front door on Cloud SQL or AlloyDB instead, and the architecture absorbs it without changing shape.

Why Identity Platform rather than rolling auth or using plain Firebase Auth. The three-audience requirement (consumers + federated staff + machines) is what forces this. Identity Platform is the enterprise-grade evolution of Firebase Authentication: it adds multi-tenancy (a separate identity tenant per clinic, with isolated users and per-tenant federation config), SAML and OIDC federation to bring a hospital’s existing Entra ID/Okta in without provisioning passwords, MFA, and the SLA and support an enterprise needs — while still issuing standard OIDC JWTs that API Gateway and Apigee both validate natively. Custom claims (tenant ID, role) are stamped onto the token via the Admin SDK so the backend gets a verified tenant scope on every request rather than trusting a client-supplied ID. One issuer, validated by two front doors, covering patients, federated clinicians, and machine partners — that is the requirement that defines this architecture, and Identity Platform is the single component that satisfies it.

Implementation guidance

Provision with Terraform (the house standard here) in a layered state layout so blast radius is contained and teams can move independently: an edge layer (Cloud DNS, certs, the Global LB, Cloud Armor policies, the API Gateway config, the Apigee org/environment/products), an identity layer (Identity Platform config, tenants, providers, custom-claim setters), a data layer (Firestore database + indexes + security rules, Cloud Storage buckets, CMEK keys), and an app layer (Cloud Run services, service accounts, IAM bindings, the Eventarc trigger). Each layer keeps its own remote state in a GCS backend with state locking, wired together by terraform_remote_state data sources. Keep container build out of the IaC critical path: CI (Cloud Build, or GitHub Actions over Workload Identity Federation) builds and pushes images to Artifact Registry, and Terraform points Cloud Run at an immutable image digest.

Concretely:

API Gateway from OpenAPI. The gateway config is an OpenAPI 2/3 document annotated with x-google-backend (the Cloud Run target) and a security definition carrying the Identity Platform issuer + JWKS URI (x-google-issuer and x-google-jwks_uri in an OpenAPI 2.0 spec). The spec is the single source of truth: routing, JWT validation, and method config all derive from it, so the client’s generated SDK and the gateway’s enforcement can never drift. In Terraform, google_api_gateway_api → google_api_gateway_api_config (with the spec as the document body) → google_api_gateway_gateway.
Apigee as code. Model the org/environment, API proxies (the policy bundle), API products (the bundle of operations + quota a partner subscribes to), and rate plans (the monetization tiers) in Terraform / Apigee config-as-code, and publish the developer portal. Proxy policies — VerifyAPIKey or OAuthV2, Quota, SpikeArrest, OASValidation, and the analytics/monetization hooks — live in versioned proxy bundles deployed through CI, never click-ops’d in the console.
Cloud Run packaging. A slim container (distroless or minimal base), the listening process honoring $PORT, concurrency 80 for I/O-bound API handlers, CPU boost on startup to cut cold-start latency, min-instances ≥ 1 only on the latency-critical patient/clinician services, and startup/liveness probes. Each service runs as its own least-privilege service account.
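
To make the “API Gateway from OpenAPI” item concrete, here is a minimal sketch that builds an OpenAPI 2.0 gateway config for one authenticated route. The project ID, backend URL, route, and the identity_platform definition name are hypothetical; the issuer and JWKS URI follow the securetoken.google.com convention used for Firebase/Identity Platform ID tokens, which is worth confirming against your own tokens before relying on it.

```python
import json


def gateway_spec(project_id: str, backend_url: str) -> dict:
    """Build a minimal OpenAPI 2.0 document with one JWT-protected GET route."""
    return {
        "swagger": "2.0",
        "info": {"title": "cedarline-api", "version": "1.0.0"},
        "schemes": ["https"],
        "produces": ["application/json"],
        "securityDefinitions": {
            "identity_platform": {
                "type": "oauth2",
                "flow": "implicit",
                "authorizationUrl": "",
                # Issuer and keys the gateway validates incoming JWTs against.
                "x-google-issuer": f"https://securetoken.google.com/{project_id}",
                "x-google-jwks_uri": (
                    "https://www.googleapis.com/service_accounts/v1/metadata/x509/"
                    "securetoken@system.gserviceaccount.com"
                ),
                "x-google-audiences": project_id,
            }
        },
        "paths": {
            "/v1/appointments": {
                "get": {
                    "operationId": "listAppointments",
                    "security": [{"identity_platform": []}],
                    # Forward to the Cloud Run service, keeping the request path.
                    "x-google-backend": {
                        "address": backend_url,
                        "path_translation": "APPEND_PATH_TO_ADDRESS",
                    },
                    "responses": {"200": {"description": "OK"}},
                }
            }
        },
    }


# Hypothetical project and Cloud Run URL, for illustration only.
spec = gateway_spec("cedarline-prod", "https://appointments-abc123-uc.a.run.app")
print(json.dumps(spec, indent=2))
```

Generating the document from code is only one option; the same structure can be kept as a hand-written YAML file and passed to google_api_gateway_api_config as the document body.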

Networking — the deliberate choices. Cloud Run, Firestore, Cloud Storage, and Secret Manager are all managed services reachable over Google’s network and governed by IAM, not by network reachability — so you do not put Cloud Run in a VPC merely to talk to Firestore. The default here is: Cloud Run ingress set to internal-and-cloud-load-balancing (so the only public path is through the Global LB → gateway, never the run.app URL directly), and the service trusting only the gateway’s invoker identity. A Serverless VPC Access connector is added only when a service must reach a private resource (a Cloud SQL instance over private IP, an on-prem system over Interconnect, or to send third-party egress through Cloud NAT for a stable, allow-listable IP — which the lab/SMS integrations need). Otherwise, isolation here is an IAM-and-resource-policy problem, not a subnet problem.

Identity wiring (the part that prevents the most incidents). One Identity Platform configuration, with a tenant per customer clinic so users, federation config, and (optionally) data partitions are isolated per customer. Consumer providers (email/Google/Apple/SMS) live on the default/patient tenant; each clinic tenant is wired to that clinic’s SAML or OIDC IdP. Custom claims (tenantId, role) are set server-side via the Admin SDK at user-provisioning time, so the backend reads a verified tenant scope from the token — it never trusts a client-supplied tenant or user ID. Authorization is two-tiered: a coarse check at the edge (API Gateway / Apigee rejects an unauthenticated, expired, or wrong-audience token before any compute runs) and a fine check at the data boundary (the Cloud Run handler — and Firestore security rules as defense in depth — constrain every query to request.auth.token.tenantId and the caller’s own documents, so a patient can read only their records and a clinician only their clinic’s). Machine partners authenticate via OAuth2 client-credentials through Apigee; their tokens carry a partner/product identity, not a patient one.
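
The fine check at the data boundary can be sketched as a pure function the handler calls before any read. This is an illustrative sketch, not the article’s implementation: the Caller and Record types, the two role names, and can_read are hypothetical, and a real service would enforce the same rule again in Firestore security rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    sub: str
    tenant_id: str
    role: str  # "patient" or "clinician" in this sketch


@dataclass(frozen=True)
class Record:
    tenant_id: str
    owner_sub: str


def can_read(caller: Caller, record: Record) -> bool:
    """Fine-grained check, run after the edge has already validated the token."""
    # Tenant isolation comes first: nothing crosses a clinic boundary.
    if caller.tenant_id != record.tenant_id:
        return False
    # Clinicians may read any record inside their own clinic.
    if caller.role == "clinician":
        return True
    # Patients may read only documents they own.
    return caller.role == "patient" and caller.sub == record.owner_sub


record = Record(tenant_id="clinic-42", owner_sub="u-123")
print(can_read(Caller("u-123", "clinic-42", "patient"), record))
print(can_read(Caller("u-999", "clinic-42", "patient"), record))
print(can_read(Caller("dr-7", "clinic-99", "clinician"), record))
```

Keeping the rule in one small function makes it easy to unit-test and to mirror line by line in the security-rules test suite.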

Enterprise considerations

Security and Zero Trust. The design is Zero-Trust by construction: every request is authenticated (Identity Platform JWT or partner OAuth/key) and authorized at the edge and re-checked at the data boundary, with no implicit trust from “being inside the project.” Every Cloud Run service is --no-allow-unauthenticated and grants run.invoker only to the specific gateway/Eventarc identity, so there is no anonymous east-west call. Cloud Armor filters L7 attacks and absorbs DDoS at the edge. Every service-to-data hop is least-privilege IAM: the messaging service can read/write the messages collection and nothing else; the notification service holds only the Secret Manager accessor role for the push credential. Data is encrypted at rest with CMEK (Firestore, Cloud Storage, Secret Manager) and in transit with TLS 1.2+ everywhere. Secrets live only in Secret Manager, reached via service-account identity, with no keys in code, configs, or pipeline logs (leaked database credentials are a failure mode to design out, not to patch after the fact). A VPC Service Controls perimeter around Firestore/Storage/Secret Manager blocks data exfiltration to a project outside the boundary even if a credential leaks; this is the control that makes the HIPAA/BAA story credible. The biggest Zero-Trust win over a server-based design: there is no long-lived host to compromise, patch, or pivot from, because compute is ephemeral and per-request.

Cost optimization. Serverless flips the model from “pay for capacity” to “pay for use,” which is exactly right for a flu-season swing. Levers, roughly in order of impact:

Scale to zero overnight — Cloud Run, API Gateway, and the event reactors cost ~nothing idle, so the quiet 12 hours are nearly free (Firestore storage and a trickle of reads aside).
Cloud Run concurrency, not one-request-per-instance — serving 80 requests per instance is the single biggest lever; it cuts instance-hours and cold starts dramatically versus defaulting the API to Cloud Functions.
Right-size CPU/memory and min-instances — hold a warm floor only on the latency-critical patient/clinician services; let everything else scale from zero.
CDN/edge cache for idempotent GETs — caching cacheable reads at the LB cuts both Cloud Run invocations and Firestore reads.
API Gateway for the bulk, Apigee only where it pays — Apigee is materially more expensive than API Gateway; routing only the monetized partner program through it (and everything first-party through API Gateway) keeps the edge bill proportional to the value each tier provides. Apigee’s monetization, in turn, bills the labs per call, so that tier funds itself.
Firestore read discipline — model for point reads/single-partition queries, cache hot config, and avoid fan-out reads; Firestore bills per operation.
Budgets and alerts, watching Cloud Run billable instance time and Firestore operations as the two leading cost indicators.

Scalability. Each tier scales independently and natively — both gateways and Cloud Run are managed and elastic. The governors to set deliberately: Cloud Run max-instances per service (to protect downstreams and cap spend), concurrency tuned per handler (high for I/O-bound, low only for memory-heavy/non-thread-safe ones), and Apigee spike-arrest / quota plus API Gateway quotas at the edge. The classic serverless scaling trap is a downstream that does not scale: if a service calls a fixed-size Cloud SQL instance, Cloud Run will happily open thousands of connections and melt it — which is precisely why the hot path here is on Firestore (horizontally scalable, connectionless), and any relational dependency sits behind a connection pooler with a capped max-instances.
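
The downstream-protection rule reduces to one inequality: max-instances × pool size per instance must stay below the database’s connection limit. A minimal sketch, with hypothetical numbers and a hypothetical max_safe_instances helper:

```python
def max_safe_instances(db_max_connections: int, pool_size_per_instance: int, reserved: int = 10) -> int:
    """Largest max-instances value that cannot exhaust the database's connection limit."""
    usable = db_max_connections - reserved  # keep headroom for admin sessions and migrations
    if usable < pool_size_per_instance:
        raise ValueError("database cannot serve even one instance's pool")
    return usable // pool_size_per_instance


# Hypothetical: a database capped at 200 connections, 5 pooled connections per container.
print(max_safe_instances(db_max_connections=200, pool_size_per_instance=5))
```

Whatever value this yields becomes the max-instances setting for every Cloud Run service that shares the relational dependency; services that only touch Firestore do not need the cap for this reason.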

Reliability and DR (RTO/RPO). Within a region every component is multi-zone by default (managed services), so zonal failure is invisible. For regional DR the design uses Firestore in multi-region mode (nam5/eur3) for automatic synchronous replication across regions — RPO ≈ 0 with automatic failover handled by the service, no application change — plus dual/multi-region Cloud Storage for blobs, and infrastructure-as-code redeployable into the second region in minutes. Front-door resilience comes from the Global External ALB, which is already a global anycast service routing to the nearest healthy backend; Cloud Run services are deployed in two regions behind it, and API Gateway/Apigee front them. Because Cloud Run and the gateways are deploy-from-IaC and hold no state, the “standby region” is a genuinely warm stack rather than a cold rebuild. Targets: RPO ≈ 0 (Firestore multi-region) and RTO of minutes (health-checked LB failover + already-warm managed services in region two). Idempotency — client request IDs guarded by Firestore transactions, conditional writes — makes retries and any replay safe; Eventarc dead-lettering on the notification reactor parks a poison event instead of crash-looping.
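
The idempotency guard mentioned above can be illustrated without any cloud dependency. The sketch below uses an in-memory dictionary and a lock as a stand-in for a Firestore transaction that creates a request-ID document only if it is absent; IdempotentStore, run_once, and the booking payload are hypothetical names, and a production version would persist the stored outcome in the database itself.

```python
import threading
from typing import Callable, Dict, List


class IdempotentStore:
    """In-memory stand-in for a transactional 'create if absent' keyed by request ID."""

    def __init__(self) -> None:
        self._results: Dict[str, dict] = {}
        self._lock = threading.Lock()

    def run_once(self, request_id: str, action: Callable[[], dict]) -> dict:
        # The lock plays the role of the database transaction in this sketch.
        with self._lock:
            if request_id in self._results:
                return self._results[request_id]  # replay: return the stored outcome
            result = action()
            self._results[request_id] = result
            return result


bookings: List[str] = []


def book() -> dict:
    bookings.append("slot-0900")
    return {"status": "booked", "slot": "slot-0900"}


store = IdempotentStore()
first = store.run_once("req-7f3a", book)
retry = store.run_once("req-7f3a", book)  # client retried after a timeout
print(first == retry, len(bookings))
```

The retry returns the original outcome and the side effect runs once, which is the property that makes LB failover and client retries safe.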

Observability. Propagate a trace/correlation ID edge-to-data so Cloud Trace stitches “gateway → Cloud Run → Firestore” into one timeline and you can see exactly where a slow request spent its milliseconds. Cloud Run emits structured JSON logs to Cloud Logging; build log-based metrics for business KPIs (bookings/min, lab results delivered/hour) and feed Cloud Monitoring dashboards; Error Reporting groups exceptions across services. Track the serverless-specific signals: cold-start rate and duration, instance-count vs max, Firestore latency and contention, and Apigee/API Gateway 4xx/5xx and quota-rejection rates. A dashboard per channel (patient app, clinician portal, partner API, internal) keeps a problem in one surface from being masked by health in the others. Define SLOs (read p99, booking-write p99, auth success rate) and alert on burn rate, not raw error counts.
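
Alerting on burn rate rather than raw error counts comes down to a small calculation: the observed error ratio divided by the error budget (1 minus the SLO target). The sketch below is illustrative; the 14.4 threshold is the commonly cited fast-burn value for a 30-day window (2% of the budget consumed in one hour), and the event counts are hypothetical.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """How fast the error budget is being consumed; 1.0 means exactly on budget."""
    if total_events == 0:
        return 0.0
    error_budget = 1.0 - slo_target
    return (bad_events / total_events) / error_budget


def should_page(short_window: float, long_window: float, threshold: float = 14.4) -> bool:
    """Page only when both windows burn fast, which filters out brief blips."""
    return short_window >= threshold and long_window >= threshold


# Hypothetical counts for a 99.9% SLO over a 5-minute and a 1-hour window.
short = burn_rate(bad_events=20, total_events=1_000, slo_target=0.999)
long = burn_rate(bad_events=900, total_events=60_000, slo_target=0.999)
print(should_page(short, long))
```

Requiring both windows to exceed the threshold is a design choice: the long window proves the problem is sustained, and the short window lets the alert clear quickly once the problem is fixed.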

Governance. A clear resource hierarchy (org → folders for environments → a project per environment, optionally per domain), with Organization Policy constraints enforced top-down: iam.disableServiceAccountKeyCreation (no exported keys), gcp.resourceLocations (allowed regions), and iam.allowedPolicyMemberDomains (domain-restricted sharing). Apigee is itself a governance asset: the API product catalog, versioned proxy bundles, and the developer portal make the partner contract an explicitly managed artifact, and its analytics are the per-partner audit trail. Cloud Audit Logs (Admin Activity always on; Data Access enabled on Firestore/Storage given the PHI) flow to a logs bucket / BigQuery sink for retention and SIEM. Assured Workloads can pin the whole stack to a compliance regime where required. IAM is least-privilege and reviewed; cost and ownership are attributed per service via labels.

Reference enterprise example

Cedarline Health, flu-season readiness review. Baseline (summer): ~4.2 million API operations/day. Flu-season/open-enrollment peak: ~13 million/day, concentrated 8 a.m.–6 p.m. with the sharpest spike in the 8–10 a.m. booking window. Mix: ~70% patient app/web, ~18% clinician portal, ~9% partner API (lab/EHR/pharmacy), ~3% internal/back-office.

Decisions they made and why:

API Gateway for first-party, Apigee only for the partner program. Patient, clinician, and internal traffic — 91% of calls — go through API Gateway, which is cheap and exactly sufficient. The 9% partner traffic goes through Apigee, which gives the labs a self-service portal, keyed quota tiers, analytics, and per-call monetization. Routing only the monetized 9% through Apigee kept the edge bill proportional, and the lab rate plans made that tier revenue-positive rather than a cost.
Cloud Run for the API, Cloud Functions for the one reactor. All API services run on Cloud Run at concurrency 80; moving off an early “everything is a Cloud Function” prototype (one request per instance) cut instance-hours roughly in half at peak and dropped p50 booking latency from ~140 ms to ~70 ms. The single Eventarc-triggered “new lab result → push notification” stayed a tiny Cloud Function because one-function-per-concern and zero-boilerplate deploy genuinely fit it.
One Identity Platform, three audiences. Consumer providers (email/Google/Apple/SMS) on the patient tenant; a tenant per hospital customer federated via SAML to that hospital’s Entra ID/Okta so staff use their existing credentials; partners on OAuth2 client-credentials through Apigee. Custom claims (tenantId, role) stamped server-side scope every Firestore query — a patient reads only their records, a clinician only their clinic’s, enforced both in the handler and in Firestore security rules. They never wrote a line of password-management or federation-brokering code.
Firestore multi-region, single document model. One Firestore Native database in nam5, collections for patients/appointments/messages/results/clinic-config, composite indexes for the list queries, transactions guarding idempotent booking writes on a client request ID. Multi-region gave RPO ≈ 0 with no app changes. Lab PDFs in dual-region Cloud Storage, served by short-lived signed URLs.
DR drill. Cloud Run deployed in asia-south1 (Mumbai) and asia-southeast1 (Singapore) behind the Global LB, with the multi-region Firestore database sitting outside either compute region’s failure domain. A GameDay (fail the Mumbai backends) saw the Global LB route to Singapore automatically and Firestore continue uninterrupted; no data lost (Firestore multi-region, RPO ≈ 0) and full request service restored in ~2 minutes as the LB health checks flipped. Measured RTO ≈ 2 min, RPO ≈ 0.

Cost outcome. The retired Compute Engine + MIG + self-managed-gateway tier had cost a flat ~₹7.0 lakh/month — sized for a flu-season peak that lasts a few weeks a year. The serverless platform billed ~₹2.1 lakh in a quiet month and ~₹6.4 lakh in a peak month, averaging ~₹3.4 lakh/month across the year — roughly a 50% reduction — while the peak was handled with no engineer paged for capacity and overnight hours cost almost nothing. The partner-API tier on Apigee, billed to the labs per call, turned the most expensive piece of the edge into a net contributor. And an entire class of work disappeared: no gateway cluster to run, no federation service to operate, no auth system to patch.
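
The figures above can be cross-checked with a few lines of arithmetic. This is a back-of-envelope sketch using only the numbers quoted in the paragraph; the “peak-equivalent months” value is an inference from a simple two-level model (every month is either quiet or peak), not a figure from the review.

```python
FLAT_TIER = 7.0    # lakh/month, retired Compute Engine tier (from the text)
QUIET_MONTH = 2.1  # lakh/month, serverless bill in a quiet month
PEAK_MONTH = 6.4   # lakh/month, serverless bill in a peak month
YEARLY_AVG = 3.4   # lakh/month, reported annual average

reduction = (FLAT_TIER - YEARLY_AVG) / FLAT_TIER
# Solve quiet*(12 - p) + peak*p = 12*avg for p, the peak-equivalent months per year.
peak_months = 12 * (YEARLY_AVG - QUIET_MONTH) / (PEAK_MONTH - QUIET_MONTH)
annual_saving = 12 * (FLAT_TIER - YEARLY_AVG)

print(f"reduction: {reduction:.0%}")
print(f"peak-equivalent months: {peak_months:.1f}")
print(f"annual saving: {annual_saving:.1f} lakh")
```

The reduction works out to about 51%, consistent with the “roughly 50%” claim, and the average implies between three and four peak-equivalent months a year under this simplified model.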

Where they spent the savings. Two engineers’ worth of reclaimed operational time went into the things serverless does not hand you free: the OpenAPI-and-Apigee-proxy contract discipline, the shared idempotency/observability library, the Firestore security-rules test suite, and the cross-region GameDay automation.

When to use it

Use this architecture when:

Traffic is variable, spiky, or seasonal, and per-request economics beat steady-state utilization — the flu-season case.
You serve multiple distinct audiences — consumers, federated enterprise/staff users, and machine partners — and need one identity system across all of them. This is the requirement that most clearly points here.
You want or will want a partner API program (portal, keys, quotas, analytics, monetization) without running an API-management cluster by hand — Apigee over the same backend.
The data model fits document / per-tenant access patterns that Firestore serves natively (most CRUD, messaging, booking, and content workloads do).
The team is small relative to the surface area and operational attention is the binding constraint — managed services trade money for not running servers, gateways, or auth.
You need a genuinely warm multi-region DR story with RPO ≈ 0 (Firestore multi-region) without paying for active VMs around the clock.
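One way to test the first criterion, per-request economics versus steady-state utilization, is to compute the monthly volume at which the two bills meet. The sketch below is illustrative only: breakeven_requests is a hypothetical helper, and the rupee figures in the usage lines are made-up placeholders to be replaced with numbers from your own billing export.

```python
def breakeven_requests(flat_monthly_cost: float, cost_per_million: float) -> float:
    """Monthly request volume at which pay-per-request equals the flat bill."""
    if cost_per_million <= 0:
        raise ValueError("cost_per_million must be positive")
    return flat_monthly_cost / cost_per_million * 1_000_000


# Hypothetical inputs: a flat tier at 700,000 per month versus an all-in
# serverless cost of 3,500 per million requests (same currency).
volume = breakeven_requests(flat_monthly_cost=700_000, cost_per_million=3_500)
print(f"break-even at {volume:,.0f} requests/month")
```

With these placeholder inputs the break-even is 200 million requests a month. If your sustained volume sits well below the break-even for most of the year, per-request billing wins; if it sits above it around the clock, the flat tier deserves a second look.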

Trade-offs and anti-patterns to avoid:

Defaulting the whole API to one-request-per-instance Cloud Functions. The most common cost/latency mistake on GCP serverless. Use Cloud Run with concurrency for the API surface; keep Cloud Functions for narrow event-glue.
Forcing everything through Apigee (or refusing to adopt it). Apigee on first-party traffic burns money for capability you don’t need there; no Apigee leaves a real partner program hand-built and fragile. Use API Gateway for first-party, Apigee for the monetized partner edge — two edges, one backend.
Putting Cloud Run in a VPC by reflex. Adds a connector and Cloud NAT bill for zero benefit when you’re only talking to Firestore/Storage/Secret Manager — those are IAM-secured. Add a connector only to reach a genuinely private resource.
Forcing a relational, JOIN-heavy, reporting-heavy domain onto Firestore. If a slice is genuinely relational, put it on Cloud SQL/AlloyDB behind the same gateway/Cloud Run/identity front door (with a connection pooler and capped concurrency), rather than fighting the document model.
Trusting client-supplied tenant/user IDs. Always scope queries to the verified sub/tenantId claims, never to IDs taken from the request body, and back that scoping with Firestore security rules.
Ignoring cold starts on a latency-critical synchronous path. Budget min-instances + CPU boost for the patient/clinician hot paths; don’t discover the cold-start tail in production.
Very high, flat, predictable volume billed per-request 24/7 at full tilt. At extreme constant scale, GKE / Cloud Run committed-use behind the same Firestore/Identity-Platform core can be cheaper — measure the crossover rather than assuming serverless is always cheapest.
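The first anti-pattern, one request per instance, is easy to quantify with Little's law (requests in flight = arrival rate × average latency). The sketch below is a rough estimate under assumed traffic: instances_needed is a hypothetical helper, and the 2,000 requests/second and 0.25 s latency are illustrative inputs, not Cedarline measurements.

```python
import math


def instances_needed(requests_per_second: float,
                     avg_latency_seconds: float,
                     concurrency: int) -> int:
    """Estimate instance count via Little's law: in-flight = rate x latency."""
    if concurrency < 1:
        raise ValueError("concurrency must be >= 1")
    in_flight = requests_per_second * avg_latency_seconds
    return math.ceil(in_flight / concurrency)


# Illustrative launch-day load: 2,000 req/s at 0.25 s average latency.
one_per_instance = instances_needed(2000, 0.25, concurrency=1)
with_concurrency = instances_needed(2000, 0.25, concurrency=80)
print(one_per_instance, with_concurrency)
```

Under these assumptions the same load needs 500 instances at one request each but only 7 at a concurrency of 80. Each of those extra instances is also a potential cold start, which is why the concurrency setting affects tail latency as well as cost. Real handlers that are CPU-bound will not sustain high concurrency, so treat the result as an upper bound on the savings and load-test the setting.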

Alternatives worth naming: a GKE-hosted API (with the same gateway/identity/Firestore core) when you need long-lived connections, large in-memory state, sidecar-heavy service mesh, or constant high throughput; Cloud SQL/AlloyDB in place of (or beside) Firestore when the domain is relational; Cloud Endpoints as a lighter ESPv2-based alternative to API Gateway for gRPC-heavy internal services; and Firebase as the rapid-start bundle when a small team wants Identity Platform + Firestore + Functions scaffolded end to end (Firebase is much of exactly this stack with a faster on-ramp). The front-door pattern — Identity Platform identity, Global LB + Cloud Armor edge, a Cloud Run core, Firestore data — survives every one of these swaps, which is the real reason to start here: you change the compute host or the database for a slice, not the shape of the whole platform.

Written by Vinod
