A serverless API backend is the one pattern almost every enterprise reaches for eventually: you want an HTTP surface that scales from a handful of requests at 3 a.m. to a launch-day spike of tens of thousands per second, you don’t want to babysit VMs or right-size a cluster for traffic that isn’t there yet, and you want the bill to track usage rather than provisioned capacity. This article is the reference architecture I hand to teams who have outgrown a single App Service but aren’t ready to run Kubernetes — and who want a design that holds up from a 5-person startup to a regulated mid-market business with a platform team.
The four named services — Azure Functions, API Management (APIM), Cosmos DB, and Event Grid — are not an arbitrary bundle. They map cleanly onto the four hard problems of an API backend: execute logic on demand (Functions), govern the front door (APIM), store state that scales horizontally without sharding pain (Cosmos DB), and decouple the write path from everything downstream (Event Grid). Get those four right and most of the rest is wiring.
The business scenario
Picture a B2B SaaS company with a public API and a partner API. Customers and partner systems call your endpoints to create orders, upload documents, query account state, and subscribe to webhooks. The traffic profile is the classic serverless fit:
- Spiky and unpredictable. A single large partner running a nightly batch can push 50x your median load for twenty minutes, then go quiet. Provisioning for peak means paying for idle 23 hours a day.
- Bursty write fan-out. One POST /orders is never just a database insert. It needs to trigger fulfilment, update search, notify the customer, emit an analytics event, and call a partner webhook, none of which the caller should wait on.
- Multi-tenant with isolation and quota requirements. Tenant A must not be able to exhaust capacity that tenant B paid for, and your enterprise customers want per-key rate limits, IP allow-lists, and an audit trail.
- Lean ops. The team is 3–8 engineers. Nobody wants to be on-call for node pressure or patch a base image at 2 a.m.
The problem this solves: deliver a governed, multi-tenant HTTP API that scales to zero when idle and to thousands of concurrent executions under load, decouples slow downstream work from the request path, and does all of it without a server to manage or a cluster to right-size. The same design serves a startup’s first paying customer and a 500-employee firm’s partner integration program — you change SKUs and tighten the network, not the shape.
A worked-cost anchor we’ll return to: roughly 8 million API calls/month, ~30% of which are writes that fan out to 4 downstream actions each, with a p99 latency budget of 300 ms for reads and an async SLA of under 5 seconds for the fan-out side effects.
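To make the anchor concrete, here is a back-of-envelope sketch of the volumes it implies. The 30-day month and the function name are my assumptions; the call count, write fraction, and fan-out factor come from the anchor itself.

```python
# Back-of-envelope volumes for the worked-cost anchor.
CALLS_PER_MONTH = 8_000_000          # from the anchor
WRITE_FRACTION = 0.30                # ~30% of calls are writes
FANOUT_PER_WRITE = 4                 # downstream actions per write
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumption: 30-day month


def anchor_volumes(calls=CALLS_PER_MONTH,
                   write_fraction=WRITE_FRACTION,
                   fanout=FANOUT_PER_WRITE):
    """Return monthly reads, writes, fan-out deliveries, and average req/s."""
    writes = round(calls * write_fraction)
    reads = calls - writes
    return {
        "reads": reads,
        "writes": writes,
        "event_deliveries": writes * fanout,
        "avg_rps": calls / SECONDS_PER_MONTH,
    }


if __name__ == "__main__":
    for name, value in anchor_volumes().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the monthly average works out to only about 3 requests per second, while the fan-out side handles roughly 9.6 million deliveries a month. The gap between that average and a burst peak is the reason provisioning for peak is so wasteful here.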
Architecture overview
The end-to-end flow is two clearly separated paths: a synchronous read/command path that the caller waits on, and an asynchronous fan-out path that runs after the response is returned. Keeping these separate is the single most important decision in the whole design.
Synchronous path (request → response):
- A client (browser SPA, mobile app, or partner server) sends an HTTPS request to a custom domain fronted by Azure Front Door (for global TLS termination, WAF, and caching of cacheable GETs).
- Front Door forwards to API Management, the policy enforcement point. APIM validates the JWT (from Microsoft Entra ID for first-party callers, or a subscription key for partners), applies per-product rate limits and quotas, strips/injects headers, and routes to the backend. This is where Zero Trust starts: nothing reaches compute without passing policy.
- APIM calls the Azure Functions app over Private Link / VNet integration, so the function never has a public ingress. A Function (HTTP trigger) runs the actual logic — validation, authorization checks against tenant claims, and the business operation.
- For reads, the Function queries Cosmos DB using a partition-key-scoped point read or a single-partition query (cheap, single-digit-ms). For writes, the Function does one thing on the hot path: it writes the record to Cosmos DB, then returns 201/202 to the caller. It does not call fulfilment, search, email, or partner webhooks inline.
- The response flows back APIM → Front Door → client. Total budget for a read: tens of milliseconds of Cosmos plus function and network overhead, comfortably under the 300 ms p99.
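The write step of the synchronous path can be sketched as follows. This is a minimal illustration rather than a real Functions handler: `InMemoryContainer` is a hypothetical stand-in for a Cosmos DB container, and `create_order` and the `status` field are names I made up for the example.

```python
import uuid


class InMemoryContainer:
    """Hypothetical stand-in for a Cosmos DB container, keyed by (partition key, id)."""

    def __init__(self):
        self.items = {}

    def create_item(self, partition_key, item):
        key = (partition_key, item["id"])
        if key in self.items:
            raise ValueError(f"item already exists: {key}")
        self.items[key] = item


def create_order(container, tenant_id, payload):
    """Hot path: persist one document, then acknowledge. Nothing else."""
    # Server-assigned fields come last so the caller cannot override them.
    order = {**payload, "id": str(uuid.uuid4()),
             "tenantId": tenant_id, "status": "accepted"}
    container.create_item(tenant_id, order)
    # Deliberately no fulfilment, search, email, or webhook calls here:
    # those react later to the committed write.
    return 201, {"id": order["id"], "status": order["status"]}


if __name__ == "__main__":
    container = InMemoryContainer()
    print(create_order(container, "tenant-a", {"sku": "PALLET-1"}))
```

The caller's latency is bounded by the single `create_item` call, which is the property the rest of the design depends on.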
Asynchronous path (the fan-out):
- The write to Cosmos DB is observed by the Cosmos DB change feed. A change-feed-triggered Function (or, for cross-service events, an explicit publish to Event Grid) turns each committed change into one or more events: OrderCreated, DocumentUploaded, etc.
- Event Grid is the routing fabric. Each event type is published to a topic; subscribers (fulfilment Function, search-indexer Function, notification Function, partner-webhook Function) receive only the event types they subscribed to, with Event Grid handling retries, dead-lettering to Blob Storage, and at-least-once delivery.
- Each subscriber Function does its slice of work independently and idempotently. If the partner webhook is down, Event Grid retries with backoff for up to 24 hours and dead-letters the rest — without touching the order that was already safely persisted and acknowledged.
The mental model: Cosmos DB is the source of truth and the synchronization point; the change feed is the trigger; Event Grid is the nervous system that fans a single committed fact out to many independent reactions. The caller’s latency is bounded by “write one document,” not by “do everything that a new order implies.”
A note on why both the change feed and Event Grid appear: the change feed gives you guaranteed, ordered-per-partition reaction to every data mutation (great for internal projections and outbox-style reliability), while Event Grid gives you fan-out to many heterogeneous, independently-scaling, independently-failing consumers including external ones. Using the change feed as the bridge that publishes to Event Grid gives you a transactional outbox: the event is derived from the committed write, so you never emit an event for a write that didn’t persist, and never lose an event for one that did.
Component breakdown
| Component | Role in this architecture | Key configuration choices |
|---|---|---|
| Azure Front Door (Standard/Premium) | Global entry, TLS, WAF, edge caching of cacheable GETs, anycast routing | Premium for managed WAF rules + Private Link origin to APIM; cache only idempotent GETs; lock APIM to accept traffic only via the Front Door X-Azure-FDID header |
| API Management | Policy enforcement point: authN/authZ, rate limit, quota, transformation, routing, the product/subscription model | Premium (VNet + multi-region) for regulated/large; Standard v2 for cost-sensitive teams that still need VNet integration. JWT validate-jwt, rate-limit-by-key, quota-by-key, IP filters per product |
| Azure Functions | On-demand compute: HTTP-trigger APIs + event-trigger workers | Flex Consumption plan for scale-to-zero with always-ready instances + VNet; .NET isolated / Node / Python; managed identity for all downstream auth; idempotent handlers |
| Cosmos DB (NoSQL API) | Source-of-truth datastore, horizontally scalable, change feed as the event origin | Partition key chosen for tenant + access pattern; autoscale RU/s; session consistency default; change feed enabled; TTL where appropriate |
| Event Grid | Event routing fabric for the async fan-out; retries, dead-letter, at-least-once | Custom topic (or namespace topic for MQTT/pull); Event Grid schema or CloudEvents 1.0; dead-letter to Blob; per-subscriber filters and retry policy |
| Supporting: Key Vault, Managed Identity, App Insights, Log Analytics, Blob (dead-letter) | Secrets, identity, telemetry, DLQ storage | All compute uses user-assigned managed identity; secrets referenced (never inlined); distributed tracing via Application Insights |
Azure Functions — the workhorse. Two function apps, separated by concern and scaling profile: one for the synchronous HTTP API and one for the async event workers. Splitting them means a flood of background work (a partner retry storm) cannot starve the HTTP app’s instances, and you can tune each independently. The Flex Consumption plan is the current sweet spot for this pattern: it gives true scale-to-zero billing, configurable per-instance concurrency, always-ready instances to remove cold-start from the hot path, and native VNet integration that the older Consumption plan lacked. Handlers must be idempotent because Event Grid is at-least-once: use the event ID or a natural key plus a Cosmos conditional write to dedupe.
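The dedupe rule for at-least-once delivery can be sketched like this. `DedupeStore` is a hypothetical in-memory stand-in for a Cosmos container used with create-if-absent semantics, and `handle_once` is an illustrative name, not part of any SDK.

```python
class AlreadyProcessed(Exception):
    """Raised when a marker for this event ID already exists."""


class DedupeStore:
    """Hypothetical stand-in for a container with create-if-absent semantics."""

    def __init__(self):
        self._seen = set()

    def create_if_absent(self, event_id):
        if event_id in self._seen:
            raise AlreadyProcessed(event_id)
        self._seen.add(event_id)

    def remove(self, event_id):
        self._seen.discard(event_id)


def handle_once(store, event, side_effect):
    """Run side_effect at most once per event ID; return True if it ran."""
    try:
        store.create_if_absent(event["id"])
    except AlreadyProcessed:
        return False  # duplicate delivery: acknowledge and do nothing
    try:
        side_effect(event)
    except Exception:
        # Release the marker so a later retry can attempt the work again.
        store.remove(event["id"])
        raise
    return True
```

One design caveat: a crash between writing the marker and finishing the side effect would leave the event marked but unprocessed. In a real system a TTL on the marker, or a side effect that is itself idempotent, closes that gap.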
API Management — the governed front door. APIM is what turns “a bunch of functions” into “an API product.” The product/subscription model is the multi-tenant lever: define a Partner product with a 100 req/s rate limit and 1M/day quota, a Free product capped lower, and assign subscription keys per customer. validate-jwt enforces Entra tokens for first-party SPAs; rate-limit-by-key and quota-by-key enforce fairness so one tenant can’t exhaust shared capacity. APIM also gives you the developer portal, OpenAPI-driven definitions, and a clean seam to version (v1/v2) and to mock or revision without redeploying functions. Choose Standard v2 when you need VNet integration and elastic scale without Premium’s price; choose Premium for multi-region active-active, availability zones, and the full networking story.
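To show what a per-key rate limit means in practice, here is a small fixed-window counter. It illustrates the concept only; it is not how APIM implements rate-limit-by-key, whose internal algorithm may differ, and the class and parameter names are my own.

```python
class FixedWindowLimiter:
    """Allow at most `calls` requests per key within each renewal period."""

    def __init__(self, calls, renewal_period_s):
        self.calls = calls
        self.renewal_period_s = renewal_period_s
        self._windows = {}  # key -> (window_start, count)

    def allow(self, key, now):
        """Return True if the request for `key` at time `now` (seconds) is allowed."""
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.renewal_period_s:
            start, count = now, 0  # previous window expired: start a new one
        if count >= self.calls:
            self._windows[key] = (start, count)
            return False  # a gateway would answer 429 Too Many Requests here
        self._windows[key] = (start, count + 1)
        return True


if __name__ == "__main__":
    limiter = FixedWindowLimiter(calls=100, renewal_period_s=1.0)
    allowed = sum(limiter.allow("partner-key-1", now=0.5) for _ in range(150))
    print(allowed)
```

Because counters are kept per key, one tenant hitting its limit has no effect on another tenant's budget, which is the fairness property the product/subscription model relies on.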
Cosmos DB — state that scales sideways. The make-or-break decision is the partition key. For a multi-tenant order system, a plain key like tenantId works only if tenants are evenly sized; a hierarchical or synthetic key (e.g. tenantId + a hash bucket, or /customerId) avoids hot partitions when one tenant dwarfs the rest. Use point reads (id + partition key) wherever possible: they cost about 1 RU for a 1 KB item and are the cheapest, fastest operation. Default to session consistency (read-your-writes per session, cheaper and lower-latency than strong) and reserve strong for the rare globally-linearizable case. Autoscale RU/s handles the spiky profile: set a max, pay for the floor (10% of max) when idle, burst automatically under load. The change feed is enabled implicitly and is the reliable origin of every downstream event.
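A "tenantId + hash bucket" key can be derived as in the sketch below. The bucket count of 16, the key format, and the function name are assumptions for illustration; the right bucket count depends on how large your biggest tenant is.

```python
import hashlib


def bucketed_partition_key(tenant_id, item_id, buckets=16):
    """Spread one tenant's items across `buckets` logical partitions.

    The bucket is derived from the item ID, so a point read can recompute
    the partition key from the two values it already has.
    """
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{tenant_id}-{bucket}"


if __name__ == "__main__":
    print(bucketed_partition_key("tenant-a", "order-123"))
```

The trade-off is that point reads stay cheap, but a query across all of a tenant's items now fans out over every bucket. That is acceptable when reads by ID dominate and worth reconsidering when tenant-wide scans are common.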
Event Grid — the decoupler. Event Grid is push-based, serverless, and priced per operation — there’s no broker to size. Each subscriber declares an event-type filter so the search indexer never sees PaymentFailed it doesn’t care about. The retry policy (exponential backoff, configurable max attempts and TTL up to 24h) plus dead-lettering to Blob Storage means a downstream outage degrades gracefully instead of dropping data or blocking writes. Use CloudEvents 1.0 schema for portability. Where you need pull delivery, MQTT, or higher-scale queuing semantics, Event Grid namespace topics add queue-style consumption; for ordered, high-throughput streaming you’d reach past Event Grid to Service Bus or Event Hubs — but for “fan one fact out to many reactors,” Event Grid is the lowest-friction fit.
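The effect of per-subscriber event-type filters can be sketched as a simple lookup. The subscriber names echo the ones used in this article, but which event types each one subscribes to is my assumption, as is the function name.

```python
# Hypothetical subscriptions: subscriber name -> event types it wants.
SUBSCRIPTIONS = {
    "fulfilment": {"OrderCreated"},
    "search-indexer": {"OrderCreated", "DocumentUploaded"},
    "notification": {"OrderCreated", "PaymentFailed"},
    "partner-webhook": {"OrderCreated"},
}


def matching_subscribers(event, subscriptions=SUBSCRIPTIONS):
    """Return the subscribers whose filter includes this event's type."""
    return sorted(name for name, types in subscriptions.items()
                  if event["type"] in types)


if __name__ == "__main__":
    print(matching_subscribers({"type": "PaymentFailed"}))
    print(matching_subscribers({"type": "OrderCreated"}))
```

In the real service the filter lives on the event subscription, so the search indexer is never invoked, and never billed, for events it would only discard.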
Implementation guidance
Provision with IaC — Bicep or Terraform. This stack is almost entirely declarative; keep it in source control and deploy via CI/CD. A pragmatic module layout:
- network.bicep: VNet, subnets (one delegated to Functions, one for private endpoints), NSGs, Private DNS zones (privatelink.documents.azure.com, privatelink.azure-api.net, privatelink.vaultcore.azure.net).
- data.bicep: Cosmos account (NoSQL), database, containers with partition keys and autoscale maxThroughput, plus a private endpoint so no public connectivity.
- compute.bicep: two Function apps on a Flex Consumption plan, VNet integration, user-assigned managed identity, App Insights wiring.
- gateway.bicep: APIM (Standard v2 or Premium), products, named values referencing Key Vault, API definitions imported from OpenAPI, policies as XML fragments.
- events.bicep: Event Grid topic(s), subscriptions with filters and dead-letter destination, and the change-feed-to-Event-Grid bridge function’s binding.
In Terraform the equivalents are azurerm_cosmosdb_account / _sql_container, azurerm_function_app_flex_consumption, azurerm_api_management with azurerm_api_management_product / _subscription, and azurerm_eventgrid_topic / _event_subscription. Pin provider versions; both AzureRM and Bicep ship Flex Consumption and APIM v2 resources today.
Identity wiring — managed identity end to end, zero stored secrets. This is the part teams most often shortcut and most regret. Give each Function app a user-assigned managed identity and grant it:
- Cosmos DB data-plane RBAC (the Cosmos DB Built-in Data Contributor role assignment, not the account key). Disable key-based auth on the account once RBAC is verified.
- EventGrid Data Sender to publish; the subscriber functions receive via system topic/webhook with the function’s own auth.
- Key Vault Secrets User for any third-party API secrets (partner credentials, signing keys).
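APIM uses its own managed identity to pull TLS certs and named values from Key Vault. Validating Entra tokens needs no stored secret at all: validate-jwt checks signatures against the public keys published in the tenant's OpenID Connect metadata. The result: no connection strings in app settings, no keys in pipeline variables, full credential rotation handled by the platform.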
Networking — private by default. Functions integrate into the VNet and reach Cosmos, Key Vault, and Storage over private endpoints with Private DNS. APIM (v2 or Premium) is VNet-integrated so it calls the Functions over Private Link, and Functions are configured to reject public traffic (access restriction allowing only APIM’s subnet / Private Link). Front Door fronts APIM and APIM accepts only Front Door (X-Azure-FDID check), closing the loop so the only public surface is the global edge. This is the concrete topology behind the Zero Trust claims below.
Cosmos modeling specifics. Co-locate data that’s read together to enable single-partition queries; denormalize aggressively (NoSQL, not 3NF). Use the integrated cache (via the dedicated gateway) for hot read-heavy keys to cut RU cost. Set TTL on ephemeral containers (e.g., idempotency keys, webhook delivery logs) so storage and RU don’t grow unbounded.
Async bridge. Implement the change-feed processor as a Cosmos-trigger Function that maps each changed document to a CloudEvent and publishes to Event Grid. Carry the document’s _etag/version into the event so consumers can detect and ignore stale reprocessing. This is your transactional outbox without a separate outbox table.
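The mapping step of the bridge might look like the sketch below. It builds a CloudEvents 1.0 envelope as a plain dict and leaves the actual publish call out. The `source` value, the shape of `data`, and the function name are assumptions; `_etag` and `_ts` are the system properties Cosmos DB adds to every document.

```python
from datetime import datetime, timezone


def to_cloud_event(doc, event_type, source="/example/orders"):
    """Map one changed Cosmos document to a CloudEvents 1.0 envelope."""
    return {
        "specversion": "1.0",
        # Same document version -> same event ID, so consumers can dedupe
        # a replayed change-feed batch.
        "id": f"{doc['id']}:{doc['_etag']}",
        "source": source,            # hypothetical source URI
        "type": event_type,
        "subject": doc["id"],
        # _ts is the last-modified time in epoch seconds.
        "time": datetime.fromtimestamp(doc["_ts"], tz=timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": {
            "documentId": doc["id"],
            "tenantId": doc.get("tenantId"),
            "etag": doc["_etag"],
        },
    }
```

Deriving the event ID from the document ID and `_etag` is a design choice rather than a requirement. It means that reprocessing the same change yields an identical ID, which pairs naturally with event-ID deduplication in the subscribers.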
Enterprise considerations
Security & Zero Trust. Every hop authenticates and is least-privilege: client→Front Door (WAF + TLS), Front Door→APIM (FDID-locked), APIM→Functions (Private Link + token/key validation), Functions→Cosmos/Event Grid/Key Vault (managed identity + RBAC, no keys). Public network access is disabled on Cosmos, Key Vault, and Storage; only Front Door is internet-facing. Enable WAF managed rules (OWASP) and bot protection at the edge, diagnostic logging to an immutable Log Analytics workspace, and Microsoft Defender for Cloud plans for APIM, Cosmos, Storage, and Key Vault. Validate JWTs at the gateway and re-check tenant/role claims inside the Function — defense in depth, never trust the network.
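The in-function re-check of tenant and role claims can be as small as the sketch below. The claim names `tenantId` and `roles` are assumptions about how your tokens are shaped, and the function assumes the gateway has already verified the token's signature and expiry.

```python
def authorize(claims, resource_tenant_id, required_role):
    """Allow only callers from the resource's tenant who hold the required role.

    `claims` is the already-validated token payload as a dict.
    """
    if claims.get("tenantId") != resource_tenant_id:
        return False  # cross-tenant access is never allowed
    return required_role in claims.get("roles", [])


if __name__ == "__main__":
    claims = {"tenantId": "tenant-a", "roles": ["orders.read"]}
    print(authorize(claims, "tenant-a", "orders.read"))   # same tenant, has role
    print(authorize(claims, "tenant-b", "orders.read"))   # different tenant
```

Running this check inside the Function, rather than trusting that APIM already did, is what keeps a misconfigured gateway policy from becoming a cross-tenant data leak.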
Cost optimization. This is where serverless earns its keep. The dominant levers:
- Functions Flex Consumption bills per-execution + GB-seconds with scale-to-zero; you pay for the always-ready instances you explicitly configure to dodge cold-start on the HTTP app, and nothing for the idle event app between bursts.
- Cosmos autoscale pays the 10% floor when idle and scales RU/s with load; combine with point reads and the integrated cache to keep RU consumption low. Reserved capacity (1- or 3-year) cuts the floor cost further once usage is predictable.
- APIM Standard v2 is dramatically cheaper than Premium and now includes VNet integration — use it unless you specifically need multi-region active-active or zone redundancy.
- Event Grid is billed per operation, at well under a dollar per million at list price; effectively a rounding error at this scale.
For our 8M-call anchor, the spend concentrates in Cosmos RU/s and APIM, not compute — which is exactly the inversion you want versus an always-on cluster.
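The autoscale floor among the cost levers above can be expressed as a one-line rule. This sketch assumes each hour is billed at the highest RU/s the container scaled to during that hour, never below 10% of the configured maximum. It deliberately leaves prices out, since those vary by region and agreement.

```python
def billed_ru_per_second(max_ru, highest_used_ru):
    """RU/s billed for one hour under autoscale, per the assumption above."""
    floor = max_ru // 10                       # the 10% floor
    return min(max_ru, max(floor, highest_used_ru))


if __name__ == "__main__":
    # An idle overnight hour versus a busy hour, with a 20,000 RU/s maximum.
    print(billed_ru_per_second(20_000, 0))
    print(billed_ru_per_second(20_000, 18_000))
```

With a 20,000 RU/s maximum, an idle hour bills at the 2,000 RU/s floor and an hour that touches 18,000 RU/s bills at 18,000. Setting the maximum far above real peaks therefore raises the idle floor for no benefit.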
Scalability. Each tier scales independently and elastically: Front Door is global edge; APIM v2 scales out on load; Functions scale per-trigger to hundreds of instances; Cosmos scales RU/s via autoscale and storage transparently; Event Grid handles millions of events/sec. The bottleneck to watch is Cosmos partition design (a hot partition caps at the per-partition RU ceiling regardless of account throughput) and APIM v2 instance limits — both addressable by design, neither by accident.
Reliability & DR (RTO/RPO). For a regulated/large deployment: Cosmos DB multi-region writes (or single-write multi-read) gives an effective RPO near zero within a region and seconds across regions; failover is automatic or manual with RTO in minutes. Deploy APIM Premium multi-region and Functions to two regions behind Front Door’s health-probed routing for an active-passive (or active-active) posture; Front Door fails over automatically. Event Grid is regional with its own retry/dead-letter durability. For a cost-sensitive single-region deployment, rely on availability zones (zone-redundant Cosmos, zonal Functions/APIM v2) for an in-region RTO/RPO that satisfies most mid-market SLAs, and treat cross-region as a documented manual runbook.
Observability. Application Insights on both Function apps with distributed tracing so a single traceparent correlates client→APIM→Function→Cosmos→Event Grid→subscriber. APIM emits request analytics and can sample bodies; Cosmos surfaces RU consumption and throttling (HTTP 429) metrics — alert on 429s and on Event Grid dead-letter count, the two early-warning signals that something is mis-sized or downstream is failing. Dashboards: p99 latency per operation, RU/s vs. provisioned, function execution units, and Event Grid delivery success rate.
Governance. Enforce with Azure Policy: deny public network access on data services, require diagnostic settings, require managed identity, restrict regions and SKUs. Use management groups + landing zones so the API workload lands in a subscription with guardrails pre-applied. APIM products and the developer portal are your API governance surface — versioning, deprecation, and consumer onboarding flow through them rather than tribal knowledge.
Reference enterprise example
Northwind Freight Exchange is a fictional logistics SaaS: 140 employees, ~600 carrier and shipper customers, and a partner API that brokers freight matches. Their old monolith on three always-on App Service P2v3 instances buckled every Monday morning when carriers pulled the weekend’s loads, and a single broken partner integration once blocked order creation for everyone because fulfilment calls ran inline.
What they built (single-region, zone-redundant, cost-tier):
- Front Door Standard + WAF on api.northwindfreight.com.
- APIM Standard v2, VNet-integrated, with three products: Carrier (200 req/s, 2M/day), Shipper (50 req/s), and Internal (Entra-token, no key).
- Two Function apps on Flex Consumption: nwf-api (HTTP) with 2 always-ready instances, and nwf-workers (events) scaling from zero.
- Cosmos DB NoSQL, container loads partitioned on /lane (origin-destination corridor, naturally high-cardinality, and a match for their dominant query), autoscale max 20,000 RU/s, zone-redundant, session consistency.
- Event Grid topic with four subscriptions: matching engine, search indexer, carrier notification, and a partner-webhook dispatcher (dead-lettering to Blob).
The numbers. Steady-state ~8M calls/month, 31% writes. Monday peaks hit ~6,000 req/s for ~25 minutes. Cosmos autoscale floats between 2,000 RU/s overnight and ~18,000 at peak. A POST /loads now returns in 41 ms p99 (one Cosmos write), while matching, indexing, notification, and the partner webhook all run async within ~2.3 s p95 end-to-end via Event Grid.
Decisions that mattered. (1) Choosing /lane over /carrierId as the partition key avoided a hot partition from their three mega-carriers. (2) Standard v2 over Premium saved roughly ₹1.6–2.0 lakh/month with no functional loss for a single-region footprint. (3) Moving fulfilment off the request path means a partner outage now dead-letters a few hundred webhook events for later replay instead of taking down order creation — the exact failure that bit them before.
Outcome. Monthly Azure spend landed around ₹2.7 lakh (dominated by Cosmos RU/s and APIM, with Functions and Event Grid a small fraction), versus ~₹4.4 lakh for the over-provisioned always-on estate that still fell over on Mondays. Incident frequency from downstream coupling dropped to zero, and onboarding a new partner went from a deploy to issuing an APIM subscription key.
When to use it
Use this architecture when traffic is spiky or unpredictable, your write path naturally fans out to multiple downstream actions, you want to pay per use rather than per provisioned hour, you need multi-tenant API governance (quotas, keys, per-product limits), and the team is too small to want a cluster. It scales down to one paying customer and up to a regulated partner program by swapping SKUs and tightening the network — the shape doesn’t change.
Trade-offs and anti-patterns:
- Don’t fight cold-start by accident. On pure Consumption, cold-starts hurt user-facing latency. Use Flex Consumption with always-ready instances on the HTTP app, or accept that a low-traffic internal API may not need it.
- Don’t make the request path synchronous through the fan-out. If you call fulfilment, search, email, and a partner webhook inline, you’ve rebuilt the monolith’s coupling with extra network hops. The whole point is: write, ack, fan out.
- Don’t ignore the partition key. A bad Cosmos partition key (low cardinality, or a single fat tenant) creates a hot partition that no amount of account-level RU/s can fix. Design it for your dominant access pattern up front.
- Don’t expect strong global consistency for free. Default to session consistency; reach for strong only where you truly need linearizability, and pay the latency/cost.
- Don’t put non-idempotent logic behind at-least-once delivery. Event Grid can deliver twice; every subscriber must dedupe (event ID + conditional Cosmos write).
Alternatives:
- Container Apps / AKS when you need long-running processes, gRPC streaming, custom networking, or to lift-and-shift existing containers — at the cost of more ops and a floor of always-on capacity.
- Service Bus or Event Hubs instead of (or alongside) Event Grid when you need strict ordering, transactions, sessions, or high-throughput event streaming rather than discrete fan-out.
- Azure SQL / PostgreSQL instead of Cosmos when your access patterns are relational, you need multi-row ACID transactions and joins, and your scale fits a single (large) instance — you give up effortless horizontal scale and the change feed for relational power and lower per-query cost.
- Logic Apps for low-code orchestration of the fan-out when the workflow is more “integration glue” than “custom code.”
For the broad middle — a governed, multi-tenant HTTP API with bursty writes and decoupled downstream work — Functions + APIM + Cosmos DB + Event Grid remains the highest-leverage, lowest-ops serverless backend Azure offers.