AWS Enterprise Architecture: Serverless REST/GraphQL API

A REST endpoint and a GraphQL endpoint look identical to the business — both return JSON, both sit behind a domain name, both need auth. But the two protocols pull an architecture in opposite directions: REST is request/response and resource-shaped, GraphQL is schema-shaped and loves to fan out into N child fetches per query. The interesting engineering question for an enterprise serverless API is not “which one” — mature platforms ship both — but how to put them on the same identity, the same data, and the same operational substrate without building two parallel stacks. This article is that blueprint, built entirely on managed AWS services: Amazon API Gateway and AWS AppSync at the front door, AWS Lambda for business logic, Amazon DynamoDB for data, and Amazon Cognito for identity.

The business scenario

Northwind Lockers (fictional, used throughout) runs smart parcel lockers for apartment complexes and offices. They started with a single REST API that a courier mobile app called to drop and pick up parcels. Three years in, the surface area has exploded:

The courier mobile app wants chatty, low-bandwidth calls — give me exactly the 4 fields I need to render a delivery card, nothing more. That is a GraphQL workload.
The building-manager web dashboard wants live updates — when a locker door opens, the dashboard tile should flip to “occupied” within a second, with no polling. That is GraphQL subscriptions.
A partner billing integration (the property-management SaaS that pays Northwind) wants a boring, versioned, OpenAPI-documented REST contract it can generate a client from. That is classic REST.
An IoT ingestion path fires events when a locker’s sensor changes state — high volume, fire-and-forget, no human waiting on the response.

The team is 9 engineers. They have no appetite to run Kubernetes, patch EC2 fleets, or babysit a self-managed GraphQL server at 2 a.m. Traffic is spiky and seasonal: quiet overnight, a delivery surge 11 a.m.–2 p.m., and a brutal December peak that is 8x the July baseline. They tried a fixed EC2 + ALB tier and spent the year either over-provisioned (paying for December in March) or falling over (paying for March in December).

The mandate from the new VP of Engineering is specific:

One identity across every channel — courier, building manager, partner, machine. No second auth system.
Both REST and GraphQL, because forcing the partner onto GraphQL or the mobile team onto REST would each waste a quarter.
Scale to zero overnight and absorb the December peak without a capacity meeting.
A real DR story — a region can fail and parcels keep moving, because a locker that won’t open at 9 p.m. is a support call and a churned building.

This is the sweet spot for serverless: variable, event-driven, multi-protocol traffic where the per-request cost matters more than steady-state utilization, and where the team’s scarcest resource is operational attention.

Architecture overview

The end-to-end picture is a two-front-door, shared-core design. Two managed API layers — REST via API Gateway, GraphQL via AppSync — terminate the public edge, validate identity against a single Cognito user pool, and then converge on a shared pool of Lambda functions and a shared DynamoDB data layer. Nothing about “REST vs GraphQL” leaks below the front door.

Figure: AWS serverless API reference architecture. Route 53, CloudFront, and WAF at the edge and one Cognito identity front two managed doors, API Gateway (REST) and AppSync (GraphQL), over a shared Lambda compute layer and a single-table DynamoDB core, with an IoT/EventBridge event path and a DynamoDB Streams to AppSync real-time subscription fan-out.

The request path (REST, partner billing call):

The partner’s generated client calls https://api.northwind.example/v1/invoices/{id} over TLS 1.3. DNS resolves through Amazon Route 53 to a CloudFront distribution; an AWS WAF web ACL on CloudFront strips the obvious garbage (SQLi/XSS signatures, IP reputation lists, a rate-based rule).
CloudFront forwards to a regional API Gateway REST API. A Cognito authorizer on the route validates the partner’s JWT (machine-to-machine, OAuth2 client-credentials grant from a Cognito app client). API Gateway also enforces request validation against the JSON Schema in the OpenAPI definition and applies a usage plan (API key + throttle + quota) so one partner cannot exhaust the account.
API Gateway proxy-integrates to a Lambda function (invoices-api). The function reads from DynamoDB and returns JSON. Idempotent reads are cached in API Gateway’s response cache for hot invoice IDs.

The request path (GraphQL, courier mobile query + subscription):

The mobile app calls the AppSync GraphQL endpoint (also fronted by Route 53 + custom domain). AppSync authorizes the user’s Cognito ID token directly — no separate Lambda authorizer needed for the common case.
The GraphQL query resolves field-by-field. A getDelivery query hits a DynamoDB resolver via a VTL or JavaScript (APPSYNC_JS) resolver with no Lambda in the path — AppSync talks to DynamoDB directly, which is the cheapest and lowest-latency option. A delivery.signature field that needs an S3 pre-signed URL is resolved by a Lambda data source (a pipeline resolver chains the DynamoDB read and the Lambda step).
When a courier marks a parcel delivered (a markDelivered mutation), AppSync writes to DynamoDB and publishes a subscription event. Every building-manager dashboard subscribed to that locker receives the update over a managed WebSocket — AppSync owns the connection fan-out, so no one runs a socket server.

The event path (IoT sensor ingestion):

Locker sensors publish state changes to AWS IoT Core, which routes via an IoT rule onto an Amazon EventBridge bus (or directly to an SQS queue). A Lambda consumer (sensor-projector) writes the new door state into the same DynamoDB table.
That same write emits a record on the table’s DynamoDB stream. A Lambda (stream-fanout) reads the stream and calls an AppSync mutation as the system, which causes the live subscription push to dashboards. This is the trick that makes a backend-originated change appear almost immediately on a client subscription: write → stream → AppSync mutation → subscription fan-out.
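
The projection step inside stream-fanout can be sketched as below. This is a minimal sketch under stated assumptions, not Northwind's actual code: the mutation name publishLockerState, the LOCKER# key prefix, and the doorState attribute are all hypothetical, and the stream is assumed to be configured with a view type that includes the new image. The SigV4-signed POST to the AppSync endpoint is deliberately left out.

```python
# Hypothetical mutation: the name and fields are illustrative, not taken from a real schema.
PUBLISH_LOCKER_STATE = """
mutation PublishLockerState($lockerId: ID!, $doorState: String!) {
  publishLockerState(lockerId: $lockerId, doorState: $doorState) {
    lockerId
    doorState
  }
}
"""

LOCKER_PREFIX = "LOCKER#"  # assumed partition-key prefix for locker items


def to_mutation_payloads(event: dict) -> list:
    """Turn DynamoDB stream records into GraphQL request bodies for AppSync."""
    payloads = []
    for record in event.get("Records", []):
        # REMOVE records carry no NewImage, so only inserts and updates are projected.
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue
        image = record.get("dynamodb", {}).get("NewImage", {})
        # Stream images use DynamoDB's typed JSON, e.g. {"S": "LOCKER#17"}.
        pk = image.get("PK", {}).get("S", "")
        door_state = image.get("doorState", {}).get("S")
        if not pk.startswith(LOCKER_PREFIX) or door_state is None:
            continue  # some other item type sharing the single table
        payloads.append({
            "query": PUBLISH_LOCKER_STATE,
            "variables": {
                "lockerId": pk[len(LOCKER_PREFIX):],
                "doorState": door_state,
            },
        })
    return payloads
```

Each returned body is what the function would POST to the GraphQL endpoint under IAM auth. Keeping the projection pure makes it easy to unit-test without a stream or an AppSync API.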

The data layer is a single DynamoDB table in single-table design, with Global Secondary Indexes for the access patterns, DynamoDB Streams feeding the real-time fan-out, point-in-time recovery (PITR) on, and a global table replica in a second region for DR. Cold/large artifacts (parcel photos, signature images) live in S3, referenced from DynamoDB by key.

The whole thing is regional and stateless at the compute tier — every Lambda is horizontally scalable and idempotent, every front door is a managed service that scales without our involvement, and the only durable state is in DynamoDB (multi-region) and S3 (cross-region replicated).

Component breakdown

Component	Service	Role	Key configuration choices
Edge / CDN / DDoS	CloudFront + AWS WAF + Shield Standard	TLS termination, caching, L7 filtering	WAF managed rule groups (Core, Known-Bad-Inputs, IP reputation) + a rate-based rule; CloudFront in front of both API Gateway and AppSync for one edge and one WAF
REST front door	API Gateway (REST, regional)	REST contract, request validation, throttle/quota, response cache	Cognito authorizer; request validators from OpenAPI JSON Schema; usage plans per partner; per-method throttling; access logs to CloudWatch
GraphQL front door	AWS AppSync	Schema, resolvers, managed subscriptions	Direct DynamoDB resolvers where possible (no Lambda); pipeline resolvers for multi-step; APPSYNC_JS resolvers; enhanced subscription filtering; caching tier for hot queries
Identity	Amazon Cognito user pool	One identity for humans + machines	Groups → roles (courier, manager, partner, admin); app clients per channel; client-credentials for M2M; advanced security (compromised-credential + adaptive MFA); token validity tuned (short access tokens, refresh rotation)
Compute	AWS Lambda	Business logic, integrations, stream/event consumers	ARM64 (Graviton) for ~20% better price/perf; Lambda SnapStart or provisioned concurrency on latency-critical functions; reserved concurrency to cap scale-out into downstreams; tight per-function IAM; Powertools for logging/tracing/metrics; idempotency layer
Data	Amazon DynamoDB	Primary store, single-table	On-demand capacity (matches spiky/seasonal load); single table + GSIs; Streams enabled; PITR on; TTL for ephemeral items; global table for DR
Real-time fan-out	DynamoDB Streams + Lambda → AppSync mutation	Backend changes pushed to subscribed clients	Stream batches → idempotent projector → AppSync mutation with IAM auth as the system principal
Async / events	EventBridge + SQS + IoT Core	Decoupled ingestion, retries, buffering	SQS as a shock absorber in front of Lambda; DLQs everywhere; EventBridge for routing and future fan-out; partial batch response on SQS/streams
Blobs	Amazon S3	Photos, signatures, exports	Referenced by key from DynamoDB; pre-signed URLs minted by Lambda/AppSync; SSE-KMS; cross-region replication for DR; lifecycle to Intelligent-Tiering
Secrets / config	Secrets Manager + SSM Parameter Store	Partner credentials, feature flags	Rotation on secrets; Parameter Store for non-secret config; fetched via Lambda extension/cache, never baked into images
Observability	CloudWatch + X-Ray + Powertools	Logs, metrics, traces, alarms	Structured JSON logs; X-Ray traces spanning API GW/AppSync → Lambda → DynamoDB; CloudWatch dashboards + composite alarms; embedded metrics for business KPIs

A few choices deserve the “why,” because they are where this architecture differs from a naive serverless app.

Why AppSync resolves straight to DynamoDB, not “AppSync → Lambda → DynamoDB.” The reflex is to route every GraphQL field through a Lambda. For simple CRUD that adds latency, cost, and a cold-start surface for no benefit. AppSync’s native DynamoDB resolver (now writable in JavaScript via APPSYNC_JS, not just VTL) handles get/query/put/update directly. Lambda earns its place only when a field needs logic AppSync can’t express cleanly — calling a third party, minting a pre-signed URL, complex authorization. The rule: Lambda is a data source you reach for, not the default path.

Why a single DynamoDB table. Both REST and GraphQL serve the same nouns (deliveries, lockers, invoices, users). Modeling each as its own table would force the GraphQL resolvers and the REST Lambdas to join across tables in application code — slow and bug-prone. A single-table design with a deliberate partition/sort-key scheme and a handful of GSIs lets one item collection answer “get this delivery,” “list a courier’s deliveries today,” and “list a locker’s history” with single-digit-millisecond queries and no joins.
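
To make the item-collection idea concrete, here is a minimal sketch of one possible key scheme. The key formats, index names (GSI1, GSI2), and table name are assumptions for illustration; the article does not specify Northwind's actual scheme. The query parameters follow the shape of the low-level DynamoDB Query API.

```python
from dataclasses import dataclass

TABLE_NAME = "northwind-core"  # hypothetical table name


@dataclass(frozen=True)
class Delivery:
    delivery_id: str
    courier_id: str
    locker_id: str
    delivered_on: str  # ISO date, e.g. "2024-12-22"


def delivery_item(d: Delivery) -> dict:
    """Build one item whose keys serve all three access patterns (illustrative scheme)."""
    sort_suffix = f"DATE#{d.delivered_on}#DELIVERY#{d.delivery_id}"
    return {
        # Base table: "get this delivery"
        "PK": f"DELIVERY#{d.delivery_id}",
        "SK": "META",
        # GSI1: "list a courier's deliveries today"
        "GSI1PK": f"COURIER#{d.courier_id}",
        "GSI1SK": sort_suffix,
        # GSI2: "list a locker's history"
        "GSI2PK": f"LOCKER#{d.locker_id}",
        "GSI2SK": sort_suffix,
    }


def courier_day_query(courier_id: str, day: str) -> dict:
    """Query parameters for one courier's deliveries on one day, with no joins."""
    return {
        "TableName": TABLE_NAME,
        "IndexName": "GSI1",
        "KeyConditionExpression": "GSI1PK = :pk AND begins_with(GSI1SK, :day)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"COURIER#{courier_id}"},
            ":day": {"S": f"DATE#{day}#"},
        },
    }
```

Because the date leads the index sort key, "today" is a begins_with prefix and a date range is a BETWEEN on the same key, so neither pattern needs a scan or an application-side join.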

Why on-demand capacity, not provisioned + autoscaling. Northwind’s load is the textbook on-demand case: spiky within the day, seasonal across the year, with an 8x December peak. Provisioned capacity with autoscaling lags sudden spikes (the scaling alarm fires after throttling starts) and you pay for headroom. On-demand absorbs the spike instantly and bills per request. If a workload later becomes large and predictable, provisioned-with-autoscaling (or a reserved-capacity commit) becomes cheaper — but that is an optimization to earn with data, not a starting assumption.

Why Cognito is the single identity even for machines. The partner integration is machine-to-machine, which tempts teams to bolt on a separate API-key or homegrown JWT scheme. Cognito app clients support the OAuth2 client-credentials grant, so the partner gets a real OAuth token from the same issuer humans use. One JWKS endpoint, one set of authorizers, one audit trail. API Gateway and AppSync both natively validate Cognito tokens, so “one identity, two front doors” is literally one user pool referenced twice.
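
From the partner's side, the client-credentials exchange is a single form-encoded POST to the user pool's /oauth2/token endpoint with HTTP Basic client authentication. The sketch below only builds the request; the domain, client ID, and secret are placeholders, and the function name is hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(domain: str, client_id: str, client_secret: str,
                        scopes: list) -> urllib.request.Request:
    """Build (but do not send) an OAuth2 client-credentials token request."""
    # Client authentication: HTTP Basic with base64("client_id:client_secret").
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "scope": " ".join(scopes),  # space-separated, as OAuth2 expects
    }).encode()
    return urllib.request.Request(
        url=f"https://{domain}/oauth2/token",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


# Usage (placeholder values; nothing is sent until urlopen is called):
# req = build_token_request("auth.northwind.example", "my-client-id", "my-secret",
#                           ["invoices:read", "invoices:write"])
# with urllib.request.urlopen(req) as resp:
#     access_token = json.loads(resp.read())["access_token"]
```

The returned access token then goes in the Authorization header of the REST call, where the same Cognito authorizer that serves human users validates it.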

Implementation guidance

Provision with Terraform (Northwind’s house standard) using a layered state layout so blast radius is contained: a network-edge layer (Route 53 zones, ACM certs, CloudFront, WAF), an identity layer (Cognito pool, app clients, groups, IAM roles), a data layer (DynamoDB table, GSIs, streams, S3 buckets, global-table replica), and an app layer (Lambdas, API Gateway, AppSync, EventBridge, SQS). Each layer has its own remote state in S3 with a DynamoDB lock table, and the layers are wired together with terraform_remote_state data sources or SSM parameters. The serverless functions themselves are best authored with the AWS Serverless Application Model (SAM) or Serverless Framework and consumed by Terraform, or kept fully in Terraform if the team prefers one tool. Either way, keep the function handler code out of the IaC repo’s critical path: build artifacts in CI, publish a versioned Lambda, and let IaC point at the version/alias.

Concretely:

API Gateway from OpenAPI. Author the REST contract as an OpenAPI 3 document and import it (body = file("openapi.yaml") in aws_api_gateway_rest_api, or the v2 equivalent). The spec is the source of truth: models come from its schemas, and request validators and the Cognito authorizer come from x-amazon-apigateway-* extensions, so the partner’s generated client and the gateway’s enforcement are built from the same file and cannot silently drift.
AppSync schema + resolvers as code. Keep schema.graphql and each resolver (.js for APPSYNC_JS, or .vtl) in the repo; reference them from aws_appsync_resolver. Direct DynamoDB resolvers need only the table as a data source; pipeline resolvers list their functions in order.
Lambda packaging. ARM64 architecture, a slim runtime (Node 20 / Python 3.13), dependencies in a Lambda layer shared across functions, and Lambda Powertools wired for structured logging, X-Ray tracing, custom metrics, and idempotency. Latency-critical functions get SnapStart (on runtimes that support it, such as Java) or provisioned concurrency; everything else runs on-demand. Reserved concurrency is a separate control: it caps how far a function can scale and does nothing to keep instances warm.

Networking, and the deliberate choice to stay out of the VPC. This is a point teams get wrong. Lambda, DynamoDB, S3, AppSync, and API Gateway are all “VPC-optional” managed services that reach each other over the AWS network without a VPC. Putting Lambda in a VPC just to talk to DynamoDB adds subnet and IP planning, a NAT or VPC-endpoint bill, and one more thing to misconfigure, for no security benefit: DynamoDB access is governed by IAM, not network reachability. (The long ENI-attach cold starts of early VPC Lambdas were largely removed in 2019, so the remaining cost is money and complexity rather than latency.) So the default here is no VPC: functions reach DynamoDB/S3/AppSync over their service endpoints, secured by IAM. A Lambda is attached to a VPC only if it must reach something private, such as an RDS instance, an internal service, or a partner over a private link. When that happens, the function gets a VPC config with VPC (Gateway/Interface) endpoints for DynamoDB and S3 so traffic never traverses a NAT, and the NAT is reserved for genuine internet egress. Network isolation here is an IAM-and-resource-policy problem, not a subnet problem.

Identity wiring. One Cognito user pool. Groups (courier, manager, partner, admin) map to IAM roles via the pool’s identity-pool/role mapping for any AWS-resource access, and to scopes/claims for app-level authorization. Separate app clients per channel (mobile, web, partner-M2M) so you can set different token lifetimes, revoke one channel without touching others, and enable the client-credentials grant only on the partner client. Fine-grained authorization lives in two tiers: coarse checks at the front door (the Cognito authorizer rejects an unauthenticated or wrong-audience token before any compute runs), and fine checks in resolvers/Lambdas (a courier can only read their own deliveries — enforced by constraining the DynamoDB query to the caller’s partition, derived from the verified sub claim, never from a client-supplied user ID). For complex policies, AppSync + Amazon Verified Permissions (Cedar) externalizes authorization as policy rather than scattered if statements.
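
The "derived from the verified sub claim, never from a client-supplied user ID" rule can be sketched as follows for a Lambda behind an API Gateway REST API with a Cognito authorizer, where the validated claims arrive under requestContext.authorizer.claims. The table name, index name, and key format are assumptions for illustration.

```python
TABLE_NAME = "northwind-core"  # hypothetical table name


def caller_sub(event: dict) -> str:
    """Return the verified Cognito `sub` that the authorizer attached to the event."""
    ctx = event.get("requestContext") or {}
    claims = (ctx.get("authorizer") or {}).get("claims") or {}
    sub = claims.get("sub")
    if not sub:
        raise PermissionError("request carries no verified sub claim")
    return sub


def my_deliveries_query(event: dict, day: str) -> dict:
    """Build a Query scoped to the caller's own partition.

    Any courier ID in the path, query string, or body is ignored on purpose:
    the partition key comes only from the token the gateway already verified.
    """
    sub = caller_sub(event)
    return {
        "TableName": TABLE_NAME,
        "IndexName": "GSI1",  # assumed courier+date index
        "KeyConditionExpression": "GSI1PK = :pk AND begins_with(GSI1SK, :day)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"COURIER#{sub}"},
            ":day": {"S": f"DATE#{day}#"},
        },
    }
```

A request that smuggles in another courier's ID still produces a query over the caller's own partition, so the check cannot be forgotten in a handler because there is no separate check to forget.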

Enterprise considerations

Security and Zero Trust. The architecture is Zero-Trust by construction: every request is authenticated (Cognito JWT) and authorized at the edge and re-checked at the data boundary, with no implicit trust from “being inside the network”, because there largely is no network perimeter to be inside. WAF on CloudFront filters L7 attacks; AWS Shield Standard absorbs common DDoS for free (Shield Advanced if the contractual SLA demands it). Every service-to-service hop is least-privilege IAM: each Lambda’s execution role grants only the specific table actions and item-level conditions it needs (for example, dynamodb:LeadingKeys conditions to scope a function to a tenant’s partition). Data is encrypted at rest with KMS (DynamoDB, S3, Secrets Manager) and in transit with TLS 1.2+ everywhere. Partner secrets rotate in Secrets Manager. The single biggest Zero-Trust win over a server-based design: there is no long-lived host to compromise, patch, or pivot from, because compute is ephemeral and per-request.
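
As an illustration of the dynamodb:LeadingKeys condition, the sketch below builds an IAM policy document that only allows item access when the partition key equals one tenant's key. The TENANT# key format, the action list, and the function name are assumptions; a real role would also need the index ARNs if the function queries GSIs.

```python
import json


def tenant_scoped_policy(table_arn: str, tenant_id: str) -> dict:
    """IAM policy document limiting a role to one tenant's partition (illustrative)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "dynamodb:GetItem",
                "dynamodb:Query",
                "dynamodb:PutItem",
                "dynamodb:UpdateItem",
            ],
            "Resource": [table_arn],
            "Condition": {
                # Every partition key touched by the request must match this value.
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": [f"TENANT#{tenant_id}"],
                },
            },
        }],
    }


if __name__ == "__main__":
    arn = "arn:aws:dynamodb:ap-south-1:123456789012:table/northwind-core"  # placeholder
    print(json.dumps(tenant_scoped_policy(arn, "acme"), indent=2))
```

With a policy like this, a bug in the function that builds the wrong key results in an AccessDenied error from DynamoDB rather than a cross-tenant read.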

Cost optimization. Serverless flips the cost model from “pay for capacity” to “pay for use,” which is exactly right for an 8x seasonal swing. Levers, roughly in order of impact:

Scale to zero overnight: Lambda, API Gateway, and AppSync cost ~nothing when idle, so the quiet 12 hours are nearly free (DynamoDB storage, a little stream activity, and any hourly-billed extras such as provisioned concurrency or an API Gateway/AppSync cache aside).
ARM64/Graviton Lambda — ~20% better price/performance for a config change.
Right-size memory — Lambda CPU scales with memory; use AWS Lambda Power Tuning to find the cost/latency sweet spot rather than defaulting to 128 MB or 1024 MB.
AppSync direct resolvers — skipping Lambda for simple GraphQL fields removes both invocation cost and duration cost on the hottest paths.
DynamoDB on-demand now, reserved/provisioned later — start on-demand for unpredictability; revisit with usage data and buy reserved capacity only for the proven steady-state floor.
API Gateway response cache + AppSync caching for hot, idempotent reads cuts both Lambda invocations and DynamoDB RCUs.
Watch the gateway, not just the compute — at high volume, API Gateway per-request charges and CloudFront egress can exceed Lambda cost; an HTTP API (cheaper than REST API) is worth considering when you don’t need REST-API-only features like usage plans or request validation.

Scalability. Each tier scales independently and natively. The real governors to set deliberately: Lambda reserved concurrency to protect downstream systems (and a per-account concurrency budget so one runaway function can’t starve the others), DynamoDB on-demand’s automatic scaling (with adaptive capacity smoothing hot partitions — which a good key design avoids in the first place), and AppSync/API Gateway throttles. The classic serverless scaling trap is a downstream that does not scale — if a Lambda calls a fixed-size relational database, Lambda will happily open 10,000 connections and melt it. Northwind avoids this by keeping the hot path on DynamoDB; any relational dependency would sit behind RDS Proxy to pool connections.

Reliability and DR (RTO/RPO). Within a region, every component is multi-AZ by default (managed services), so single-AZ failure is invisible. For regional DR the design uses DynamoDB global tables (active-active, multi-region, typically sub-second replication, so RPO is in seconds), S3 cross-region replication for blobs, and infrastructure-as-code redeployable into the second region in minutes. Front-door failover is a Route 53 health-checked failover (or latency) routing policy that shifts traffic to the standby region’s API Gateway and AppSync endpoints; CloudFront itself is a global service, so what fails over is the regional origin behind it, not the distribution. Targets: RPO of seconds (global-table replication lag) and RTO of minutes (DNS failover + already-warm managed services in region 2). Because Lambda/API Gateway/AppSync are deploy-from-IaC and hold no state, “standby region” can be a genuinely warm stack rather than a cold rebuild. One caveat to plan for: a Cognito user pool is a regional resource, so the standby stack needs its own answer for validating and issuing tokens during a failover. Idempotency (Powertools idempotency keys, conditional DynamoDB writes) makes retries and dual-region replays safe. DLQs on every async consumer mean a poison message parks instead of blocking the queue, and partial batch responses on SQS/stream sources mean one bad record doesn’t fail a whole batch.
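
The partial-batch-response behavior mentioned above looks roughly like this for an SQS-triggered function. It is a sketch that assumes ReportBatchItemFailures is enabled on the event source mapping; the process callable stands in for the real business logic and is hypothetical.

```python
import json


def handle_batch(event: dict, process) -> dict:
    """Process each SQS record independently and report only the failures.

    Lambda deletes the messages that are not listed in batchItemFailures and
    returns the listed ones to the queue for retry (and eventually the DLQ).
    """
    failures = []
    for record in event.get("Records", []):
        try:
            process(json.loads(record["body"]))
        except Exception:
            # One poison record must not force a retry of its healthy neighbours.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Because successfully processed messages may still be redelivered in edge cases, process should itself be idempotent, which is where the idempotency keys and conditional writes come in.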

Observability. Structured JSON logs from every Lambda (Powertools), X-Ray distributed tracing stitching API Gateway/AppSync → Lambda → DynamoDB into one trace so you can see exactly where a slow request spent its milliseconds, CloudWatch metrics (including embedded-metric-format business KPIs like “deliveries completed per minute”), and composite alarms that page only on genuine, correlated problems (error-rate and latency, not a single noisy metric). Track the serverless-specific signals: cold-start rate and duration, concurrency utilization vs. limit, DynamoDB throttles, async DLQ depth, and AppSync subscription connection counts. A dashboard per channel (REST partner, GraphQL mobile, dashboard subscriptions, IoT ingest) keeps a problem in one channel from being masked by health in the others.
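
A business KPI such as "deliveries completed" can be emitted as a CloudWatch embedded-metric-format log line with nothing but a JSON print. The sketch below hand-builds the record to show the structure; the namespace, metric name, and dimension are illustrative assumptions, and in practice Powertools' metrics utility produces this format for you.

```python
import json
import time


def emf_record(channel: str, deliveries_completed: int, now_ms: int = None) -> dict:
    """Build one embedded-metric-format record with a single Channel dimension."""
    return {
        "_aws": {
            "Timestamp": now_ms if now_ms is not None else int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "Northwind/Lockers",  # hypothetical namespace
                "Dimensions": [["Channel"]],
                "Metrics": [{"Name": "DeliveriesCompleted", "Unit": "Count"}],
            }],
        },
        # Dimension and metric values live at the top level of the record.
        "Channel": channel,
        "DeliveriesCompleted": deliveries_completed,
    }


def emit(record: dict) -> None:
    """Write the record to stdout, which Lambda forwards to CloudWatch Logs."""
    print(json.dumps(record))
```

Using the channel as a dimension is what lets the per-channel dashboards chart the same KPI side by side.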

Governance. Multi-account via AWS Organizations / Control Tower (separate dev/stage/prod accounts), Service Control Policies to enforce guardrails (deny public S3, require encryption, pin allowed regions), tagging standards for cost allocation per channel and per environment, AWS Config rules for drift and compliance, and CloudTrail for an immutable audit log. Cognito’s audit events and API Gateway/AppSync access logs give a per-request, per-identity trail end to end.

Reference enterprise example

Northwind Lockers, December peak readiness review. Baseline (July): ~3.5 million API/GraphQL operations/day. December peak: ~28 million/day, concentrated 11 a.m.–8 p.m., with a hard spike on the three days before Christmas. Roughly 60% of operations are GraphQL (mobile couriers), 30% REST (partner billing + dashboard CRUD), 10% IoT sensor ingest.

Decisions they made and why:

GraphQL hot path went Lambda-free. The two highest-volume operations — listMyDeliveries and the markDelivered mutation — were originally AppSync → Lambda → DynamoDB. Moving them to direct DynamoDB resolvers (APPSYNC_JS) cut p50 latency from ~90 ms to ~22 ms and removed ~14 million Lambda invocations/day at peak. Lambda stayed only on the ~6 fields that genuinely needed it (pre-signed signature URLs, partner-facing enrichment).
DynamoDB single table, on-demand. One table, partition key PK / sort key SK, three GSIs (by courier+date, by locker, by partner-invoice-period). On-demand handled the 8x ramp with zero capacity meetings and zero throttling; they confirmed from CloudWatch that adaptive capacity never needed to kick in because no single locker or courier created a hot partition.
Subscriptions for the dashboard, fed by streams. Building managers’ dashboards subscribe per building. When a sensor reports a door event → IoT Core → Lambda projector → DynamoDB write → DynamoDB Stream → stream-fanout Lambda → AppSync mutation → subscription push. End-to-end “door opens to tile flips” measured ~700 ms at p95 during peak, with zero polling load.
One Cognito pool, four app clients. Mobile (auth-code + PKCE, 1-hour access token, refresh rotation), web dashboard (same), partner (client-credentials grant, scoped to invoices:read invoices:write), admin (auth-code + mandatory MFA via Cognito advanced security). The partner generated its REST client from Northwind’s published OpenAPI doc; Northwind never wrote partner-specific auth code.
DR: global table + Route 53 failover. Primary ap-south-1 (Mumbai), standby ap-southeast-1 (Singapore). DynamoDB global table keeps both regions current; IaC deploys the full app stack in both. A GameDay drill — kill the Mumbai front door — failed DNS over to Singapore in ~3 minutes with no parcel transactions lost (the few in-flight writes replicated within ~1 second and idempotency keys made the client retries safe). Measured RTO ≈ 3 min, RPO ≈ 1 sec.

Cost outcome. The retired EC2 + ALB + self-managed-GraphQL tier had cost a flat ~$4,100/month — sized for a peak that occurred a few days a year. The serverless platform billed ~$1,250 in a quiet month and ~$6,800 in the December peak month, averaging ~$2,300/month across the year — a ~44% reduction — while the December peak was handled with no engineer paged for capacity and the overnight hours cost almost nothing. The team also deleted an entire class of work: no GraphQL servers to patch, no socket fleet to scale, no auth service to run.
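
The headline numbers can be checked with a few lines of arithmetic. The inputs are the figures quoted above; the only derived quantity that is not in the review is the implied average for the eleven non-December months.

```python
LEGACY_MONTHLY = 4100          # flat EC2 + ALB tier, USD/month
SERVERLESS_AVG_MONTHLY = 2300  # serverless average across the year
SERVERLESS_QUIET_MONTH = 1250  # a quiet month
SERVERLESS_DECEMBER = 6800     # the December peak month

# Relative reduction in the average monthly bill.
reduction = (LEGACY_MONTHLY - SERVERLESS_AVG_MONTHLY) / LEGACY_MONTHLY

# Absolute savings over twelve months.
annual_savings = (LEGACY_MONTHLY - SERVERLESS_AVG_MONTHLY) * 12

# What the other eleven months must average for the yearly mean to hold.
other_months_avg = (SERVERLESS_AVG_MONTHLY * 12 - SERVERLESS_DECEMBER) / 11

print(f"reduction: {reduction:.1%}")
print(f"annual savings: ${annual_savings:,}")
print(f"average of the 11 non-December months: ${other_months_avg:,.0f}")
```

The reduction works out to about 43.9%, which rounds to the quoted ~44%, for roughly $21,600 saved per year. The implied average of about $1,891 for the other months sits above the $1,250 quiet-month bill, which is consistent with shoulder months ramping toward the peak rather than every non-December month being quiet.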

Where they spent the savings. Two engineers’ worth of reclaimed operational time went into the things serverless doesn’t give you for free: a shared idempotency/observability Lambda layer, the OpenAPI-and-schema contract discipline, and the cross-region GameDay automation.

When to use it

Use this architecture when:

Traffic is variable, spiky, or seasonal, and per-request economics beat steady-state utilization — the December-peak case.
You need both REST and GraphQL (or expect to), and want one identity and one data layer under them.
The team is small relative to the surface area and operational attention is the binding constraint — managed services trade money for not running servers, sockets, or auth.
The data model fits key-value / item-collection access patterns that DynamoDB serves natively (most CRUD and event-projection workloads do).
You want a genuinely warm multi-region DR story without running active infrastructure you pay for around the clock.

Trade-offs and anti-patterns to avoid:

Routing every GraphQL field through Lambda. The single most common cost/latency mistake. Use AppSync’s direct DynamoDB (and EventBridge, HTTP, RDS-Data) resolvers; reach for Lambda only when a field needs real logic.
Putting Lambda in a VPC by reflex. Adds subnet/IP planning and a NAT or VPC-endpoint bill for zero security benefit when you’re only talking to DynamoDB/S3/AppSync, which are IAM-secured. (The ENI cold-start penalty that used to top this list was largely removed in 2019.) Stay out of the VPC unless you must reach a private resource.
Forcing a relational, join-heavy, transaction-heavy domain onto DynamoDB. If your access patterns are genuinely ad-hoc relational, single-table DynamoDB becomes a fight. Use Aurora Serverless v2 behind the same Lambda/API/AppSync front door instead (with RDS Proxy so Lambda concurrency doesn’t exhaust connections), or split the relational slice out.
Ignoring cold starts on a latency-critical synchronous path. For a sub-100 ms p99 requirement on a Java/large function, budget provisioned concurrency or SnapStart; don’t discover it in production.
Letting Lambda scale into a downstream that can’t. Cap with reserved concurrency and pool connections; unbounded Lambda concurrency is a foot-gun against fixed-size dependencies.
Very high, flat, predictable volume where you’re paying API Gateway/Lambda per-request 24/7 at full tilt. At extreme constant scale a container platform (ECS/EKS Fargate, or even reserved EC2) behind the same DynamoDB/Cognito core can be cheaper — measure the crossover rather than assuming serverless is always cheapest.

Alternatives worth naming: a container-based API (ALB + ECS/EKS Fargate) when you need long-lived connections, large in-memory state, non-HTTP protocols, or constant high throughput; Aurora Serverless v2 when the domain is relational; AWS Amplify when a small team wants AppSync + Cognito + DynamoDB scaffolded end to end (Amplify generates much of exactly this stack); and AWS Step Functions layered in when the business logic is a long, multi-step, stateful workflow rather than a request/response API. The front-door pattern — Cognito identity, CloudFront/WAF edge, DynamoDB core — survives most of these swaps, which is the real reason to start here.

Written by Vinod