A national pharmacy chain (1,400 stores, a mobile app, a pickup kiosk in every branch, and a partner program that lets insurers check prescription status) has a problem that started small and got loud. It began as one tidy orders service. Then someone added inventory, then loyalty, then prescriptions, then a pricing service, and within two years there were nineteen backend services, each with its own URL, its own idea of authentication, its own rate limits (or none), and its own team. The mobile app now hardcodes nineteen hostnames. Every team reinvented JWT validation, and three did it wrong. When the loyalty service melted under a Black Friday promotion, it took the checkout path down with it, because nothing stood between the public internet and a service that was never built to be public. The security team cannot answer a simple regulator question, “show me every external call to the prescriptions API last Tuesday,” because no single place holds that log.
Every one of those pains has the same fix, and it has a name: an API gateway. This article explains, from the ground up, what a gateway is, the handful of jobs it actually does, and the question that trips up most people new to this — how it differs from the load balancer and the WAF sitting right next to it. We will use AWS API Gateway, Azure API Management (APIM), and Google Apigee as concrete reference points, because the concept is identical across all three even when the buttons are named differently.
Architecture overview
Here is the whole picture before we zoom into the parts. A request from the pharmacy’s mobile app, kiosk, or an insurer’s partner system travels through a fixed sequence of layers, and the gateway is one specific box in that chain — not the only thing in front of your services, but the one that understands your APIs. From the outside in: traffic first hits the edge (CDN + WAF) for TLS and attack filtering, then a load balancer that picks a healthy gateway instance, then the API gateway itself, which authenticates the caller, enforces rate limits, validates the request, and routes it to the correct backend — one of nineteen microservices that the client never addresses directly.
     [ Mobile app ]      [ Kiosk ]      [ Insurer partner ]
              \              |              /
               \             |             /
        +-------------------------------------+
        |  Edge: Akamai (CDN + WAF + bots)    |  TLS, cache, block attacks
        +-------------------------------------+
                           |
        +-------------------------------------+
        |  Load Balancer (L4/L7)              |  pick a healthy gateway node
        +-------------------------------------+
                           |
+-------------------------------------------------------+
|              API GATEWAY (APIM / Apigee)              |
|        authn (Okta/Entra JWT) -> rate limit ->        |
|        request validation -> routing -> logging       |
+-------------------------------------------------------+
      |          |           |          |          |
   orders    inventory    loyalty    pricing   rx-service
(each its own service / team / language, never public directly)
The control plane wrapping this — identity from Okta and Entra ID, secrets from HashiCorp Vault, observability into Datadog/Dynatrace, config delivered as code via GitHub Actions / Jenkins / Argo CD and Terraform / Ansible, posture from Wiz and runtime defense from CrowdStrike Falcon — is described in the sections below as each piece becomes relevant. Keep this one diagram in mind: every later section is just a close-up of one box in it.
What an API gateway actually is
An API gateway is a single front door for all your APIs. Instead of clients talking to nineteen services directly, they talk to one endpoint — api.pharmacychain.com — and the gateway decides where each request really goes. It is a reverse proxy with opinions: it sits in front of your backends, inspects every request, applies a set of rules, and forwards the survivors on.
The mental model that makes everything else click: the gateway is a policy enforcement point on the request path. A request walks in the front door, and before it is allowed near a backend it must pass through a series of checkpoints — is this caller who they claim to be, are they within their rate limit, is this request even well-formed, where does it route — and only then does it reach the service. The response walks back out through the same checkpoints. That is the whole idea. Everything below is just the specific checkpoints.
Crucially, the gateway lets backend teams stop solving the same cross-cutting problems over and over. Authentication, rate limiting, and request validation get offloaded to one shared layer, so the prescriptions team writes prescription logic and nothing else. That single move is what untangles the pharmacy’s nineteen-team sprawl.
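The checkpoint model can be made concrete in a few lines. What follows is a minimal sketch, not how any of the three products is built: the `Request`, `Response`, and `handle` names are invented for illustration, and the auth check is a placeholder that only looks for a header rather than validating a token.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Request:
    method: str
    path: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


@dataclass
class Response:
    status: int
    body: str = ""


# A checkpoint returns None to let the request continue,
# or a Response to reject it on the spot.
Checkpoint = Callable[[Request], Optional[Response]]


def require_auth_header(req: Request) -> Optional[Response]:
    # Placeholder only: a real gateway validates the token, not just its presence.
    if "Authorization" not in req.headers:
        return Response(401, "missing credentials")
    return None


def limit_body_size(req: Request, max_bytes: int = 1_000_000) -> Optional[Response]:
    if len(req.body) > max_bytes:
        return Response(413, "payload too large")
    return None


def handle(req: Request, checkpoints: List[Checkpoint],
           backend: Callable[[Request], Response]) -> Response:
    # Run every checkpoint in order; the first rejection short-circuits,
    # so the backend only ever sees requests that passed all of them.
    for check in checkpoints:
        rejection = check(req)
        if rejection is not None:
            return rejection
    return backend(req)
```

The design point is the ordering: cheap checks run first, and a backend function is only called once the whole list has passed.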
The five jobs of a gateway
Strip away the marketing and a gateway does roughly five things. Understanding these five is understanding gateways.
1. Routing
The most basic job: take an incoming request and send it to the right backend. The gateway matches on the path (and sometimes the host, method, or headers) and maps it to an upstream service.
GET /v1/orders/{id} -> orders-service.internal:8080
GET /v1/inventory/{sku} -> inventory-service.internal:8081
POST /v1/loyalty/points -> loyalty-service.internal:8082
GET /v1/prescriptions/{id} -> rx-service.internal:8443
To the mobile app there is one host. Behind the gateway, those four paths fan out to four entirely different services — possibly in different clusters, written in different languages, owned by different teams. Routing is also where you do API versioning (/v1 vs /v2 pointing at different deployments), path rewriting (strip the public /v1 prefix the backend doesn’t expect), and traffic splitting for canary releases (send 5% of /v1/pricing to the new build). This is the layer that lets you reorganize, rename, and re-platform services behind a stable public contract.
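To make path matching and prefix stripping tangible, here is a small sketch of a route table lookup. It assumes the four example routes above and assumes the backends expect paths without the public /v1 prefix; the `resolve` function and its return shape are illustrative, not any product's API.

```python
import re
from typing import Optional, Tuple

# Route table mirroring the four example routes in the text.
ROUTES = [
    ("GET", "/v1/orders/{id}", "orders-service.internal:8080"),
    ("GET", "/v1/inventory/{sku}", "inventory-service.internal:8081"),
    ("POST", "/v1/loyalty/points", "loyalty-service.internal:8082"),
    ("GET", "/v1/prescriptions/{id}", "rx-service.internal:8443"),
]


def compile_template(template: str) -> "re.Pattern[str]":
    # re.split with a capture group alternates literal text and placeholder names:
    # "/v1/orders/{id}" -> ["/v1/orders/", "id", ""]
    pieces = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        re.escape(p) if i % 2 == 0 else f"(?P<{p}>[^/]+)"
        for i, p in enumerate(pieces)
    )
    return re.compile(pattern)


COMPILED = [(method, compile_template(tpl), upstream) for method, tpl, upstream in ROUTES]


def resolve(method: str, path: str, strip_prefix: str = "/v1") -> Optional[Tuple[str, str]]:
    """Return (upstream, backend_path) for the first matching route, else None."""
    for route_method, pattern, upstream in COMPILED:
        if method == route_method and pattern.fullmatch(path):
            # Path rewriting: drop the public version prefix (an assumption here).
            if path.startswith(strip_prefix):
                return upstream, path[len(strip_prefix):]
            return upstream, path
    return None
```

For example, `resolve("GET", "/v1/orders/42")` yields the orders upstream with the rewritten path `/orders/42`, while an unknown path yields `None`, which a gateway would turn into a 404.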
2. Authentication and authorization offload
This is the job that pays for the gateway on its own. Instead of every service validating tokens, the gateway terminates authentication once, at the edge, and passes a verified identity to the backend.
In the pharmacy’s world, a customer logs into the app via Okta (the consumer-facing identity provider), and staff and partner systems authenticate via Microsoft Entra ID (the workforce IdP). Both issue OAuth 2.0 / OIDC JWTs. The gateway’s job is to validate that token — check the signature against the IdP’s public keys, confirm it hasn’t expired, verify the audience and scopes — and reject anything that fails before it ever reaches a service.
# Conceptually, what the gateway enforces per route:
- route: /v1/prescriptions/*
  auth:
    type: jwt
    issuer: https://pharmacychain.okta.com/oauth2/default
    audience: api://prescriptions
    required_scopes: [ "rx.read" ]
A few things the gateway does here that matter:
- JWT validation: signature, expiry, issuer, audience, scopes. One implementation, done right, instead of nineteen.
- Identity passthrough: after validation, the gateway injects trusted headers (e.g. X-User-Id, X-Scopes) so the backend doesn’t re-parse the token; it just trusts the gateway sitting inside the network boundary.
- API keys for partners: the insurer checking prescription status gets an API key tied to a partner plan, separate from end-user OAuth.
- mTLS for the highest-trust partner integrations, terminated at the gateway.
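The claim checks in the JWT bullet can be sketched as follows. This is deliberately partial: it assumes the token's signature has already been verified against the IdP's public keys by a proper JWT library, and it only shows the expiry, issuer, audience, and scope checks. The `check_claims` name and the policy dictionary are hypothetical, and the scope claim name is an assumption too (both `scp` and `scope` are handled because providers differ).

```python
import time
from typing import Optional

# Hypothetical per-route policy, shaped like the conceptual config in the text.
RX_POLICY = {
    "issuer": "https://pharmacychain.okta.com/oauth2/default",
    "audience": "api://prescriptions",
    "required_scopes": {"rx.read"},
}


def granted_scopes(claims: dict) -> set:
    # Assumption: scopes arrive either as a list or as a space-delimited string.
    raw = claims.get("scp", claims.get("scope", ""))
    return set(raw.split()) if isinstance(raw, str) else set(raw)


def check_claims(claims: dict, policy: dict, now: Optional[float] = None) -> Optional[str]:
    """Return None if the claims satisfy the policy, else a short rejection reason.

    Signature verification is assumed to have happened before this is called.
    """
    now = time.time() if now is None else now
    if claims.get("iss") != policy["issuer"]:
        return "wrong issuer"
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return "expired or missing exp"
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if policy["audience"] not in audiences:
        return "wrong audience"
    if not policy["required_scopes"] <= granted_scopes(claims):
        return "missing scope"
    return None
```

A gateway would map any non-None reason to a 401 or 403 and never forward the request; the value of centralizing this is that the order and strictness of the checks are decided once.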
The secrets behind all this (the signing keys the gateway uses for its own tokens, partner client secrets, mTLS private keys) should not live in the gateway’s config. They belong in HashiCorp Vault, leased dynamically and rotated, with the gateway pulling them at startup and again whenever a lease is renewed or a secret rotates. A leaked credential in a config file is the kind of mistake you only make once.
3. Rate limiting and throttling
The Black Friday outage was a throttling failure. With no rate limit, a promotion-driven spike on loyalty consumed shared resources and took checkout with it. A gateway enforces rate limits — “this API key gets 100 requests/second, this user tier gets 10” — and throttling (smoothing bursts), so one noisy client or one hot endpoint cannot starve everyone else.
| Control | What it limits | Example |
|---|---|---|
| Rate limit | Requests per unit time, per key/user/IP | 1,000 req/min per partner key |
| Burst / throttle | Short-term spikes above the steady rate | Allow 50 in a burst, drain at 10/s |
| Quota | Total over a long window | 1,000,000 calls/month per plan |
| Concurrency | Simultaneous in-flight requests | Max 200 concurrent to rx-service |
This is also monetization and fairness: the free tier of the partner API gets 1,000 calls/day, the paid tier gets 100,000, and the gateway counts. When a client exceeds its limit the gateway returns 429 Too Many Requests with a Retry-After header — a clean, honest answer — instead of letting the backend fall over. Rate limiting is the single feature that would have kept checkout alive during the promotion.
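The "allow 50 in a burst, drain at 10/s" row in the table is the classic token bucket. Here is a minimal single-process sketch of it, including the 429 and Retry-After response. The class and function names are illustrative, and a real gateway would keep this state in a store shared by all gateway instances rather than in local memory.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class TokenBucket:
    capacity: float        # largest burst allowed
    refill_per_sec: float  # steady-state rate
    tokens: float          # tokens currently available
    last: float            # timestamp of the last refill, in seconds

    def allow(self, now: float) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds)."""
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0
        # Whole seconds until one full token is available again.
        wait = math.ceil((1.0 - self.tokens) / self.refill_per_sec)
        return False, wait


def admit(bucket: TokenBucket, now: float) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers); 200 here stands in for 'forward to the backend'."""
    allowed, retry_after = bucket.allow(now)
    if allowed:
        return 200, {}
    return 429, {"Retry-After": str(retry_after)}
```

With `TokenBucket(capacity=50, refill_per_sec=10, tokens=50, last=0.0)`, fifty back-to-back calls pass, the fifty-first is rejected with a one-second Retry-After, and one second later ten more tokens are available.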
4. Request and response validation
The gateway can reject malformed requests at the edge so backends never waste a cycle on garbage. Give the gateway an OpenAPI schema and it will validate that POST /v1/loyalty/points actually has the required customerId and a numeric points field, with the right content type, before forwarding. Bad requests get a 400 from the gateway; the loyalty service only ever sees well-formed traffic.
The same layer also handles request/response transformation (stripping internal headers from responses, converting between formats, injecting correlation IDs) and enforces payload size limits that stop a 50 MB body from reaching a service that should never receive one. It is a cheap, centralized layer of input hygiene.
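As a rough illustration of what schema validation amounts to for POST /v1/loyalty/points, here is a hand-written check in place of a real OpenAPI validator. Several details are assumptions: that customerId is a non-empty string, the 64 KiB size limit, and the use of 413 and 415 alongside the 400 mentioned above.

```python
import json
from typing import Tuple

MAX_BODY_BYTES = 64 * 1024  # hypothetical limit for this route


def validate_loyalty_points(content_type: str, body: bytes) -> Tuple[int, str]:
    """Return (status, message); 200 means 'safe to forward to the loyalty service'."""
    if len(body) > MAX_BODY_BYTES:
        return 413, "payload too large"
    # Ignore parameters such as "; charset=utf-8" when comparing media types.
    if content_type.split(";")[0].strip().lower() != "application/json":
        return 415, "unsupported media type"
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return 400, "body is not valid JSON"
    if not isinstance(payload, dict):
        return 400, "body must be a JSON object"
    customer_id = payload.get("customerId")
    if not isinstance(customer_id, str) or not customer_id:
        return 400, "customerId is required"
    points = payload.get("points")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(points, bool) or not isinstance(points, (int, float)):
        return 400, "points must be numeric"
    return 200, "ok"
```

In a real deployment these rules would be derived from the OpenAPI document rather than written by hand, which keeps the contract and the enforcement from drifting apart.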
5. Observability
Because every external request passes through the gateway, it is the one place that can answer “show me every call to the prescriptions API last Tuesday.” The gateway emits structured access logs, metrics (latency, error rate, throughput per route), and traces, with a correlation ID stamped on each request and propagated to backends so a single user action can be followed across services.
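To show why one structured log makes the regulator's question easy, here is a sketch of an access log record and the query over it. The field names, the `partner:acme` caller label, and both function names are invented for illustration; real gateways emit their own log schema.

```python
import json
import uuid
from datetime import date, datetime, timezone
from typing import Iterable, List, Optional


def access_log_line(method: str, path: str, status: int, caller: str,
                    latency_ms: float, correlation_id: Optional[str] = None,
                    ts: Optional[datetime] = None) -> str:
    """Serialize one request as a single JSON log line."""
    record = {
        "ts": (ts or datetime.now(timezone.utc)).isoformat(),
        # Stamped once at the gateway and propagated to backends.
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "method": method,
        "path": path,
        "status": status,
        "caller": caller,
        "latency_ms": latency_ms,
    }
    return json.dumps(record)


def calls_on_day(log_lines: Iterable[str], path_prefix: str, day: date) -> List[dict]:
    """Return every logged call under path_prefix made on the given date."""
    hits = []
    for line in log_lines:
        record = json.loads(line)
        when = datetime.fromisoformat(record["ts"])
        if when.date() == day and record["path"].startswith(path_prefix):
            hits.append(record)
    return hits
```

Calling `calls_on_day(lines, "/v1/prescriptions", date(2024, 3, 5))` is the whole audit answer in this toy version; in practice the same filter would be a query in the log platform.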
In practice you ship those signals to an observability platform — Datadog or Dynatrace — which turns gateway logs and metrics into dashboards (p95 latency per API, 4xx/5xx rates, top consumers by volume) and alerts. When latency on /v1/pricing crosses a threshold, Datadog pages the on-call. And when the gateway’s auth layer blocks a flood of forged tokens, that event can auto-raise a ServiceNow incident so the security team has a ticket, not just a log line. The gateway is what finally gives the pharmacy’s compliance team the audit answer they could never produce before.
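The "p95 latency per API" figure on such a dashboard is simple to compute. The sketch below uses the nearest-rank definition of a percentile, which is one common convention and not necessarily the one Datadog or Dynatrace applies; the function names and the threshold are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def p95(samples: List[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    if not samples:
        raise ValueError("p95 of an empty sample set is undefined")
    ordered = sorted(samples)
    # ceil(0.95 * n) in integer arithmetic, as a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def routes_over_threshold(latencies: Iterable[Tuple[str, float]],
                          threshold_ms: float) -> Dict[str, float]:
    """Group (route, latency_ms) pairs by route and return routes whose p95 exceeds the threshold."""
    by_route: Dict[str, List[float]] = defaultdict(list)
    for route, ms in latencies:
        by_route[route].append(ms)
    result = {}
    for route, values in by_route.items():
        value = p95(values)
        if value > threshold_ms:
            result[route] = value
    return result
```

An alerting rule is then just a check that the returned dictionary is empty for the threshold the team has agreed on.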
Where the gateway sits: gateway vs load balancer vs WAF
This is the question that confuses almost everyone new to the topic, because all three sit “in front of” your application and all three inspect traffic. They are not the same thing, and in a real architecture you usually have all three, in a specific order. The trick is to think about what layer each one reasons about.
| Layer | Reasons about | Primary job | Example |
|---|---|---|---|
| Load balancer | TCP/IP + connections (L4), sometimes HTTP (L7) | Spread traffic across healthy instances; keep the lights on | AWS ALB/NLB, Azure Load Balancer, GCP Cloud LB |
| WAF | HTTP request content, for attacks | Block malicious payloads (SQLi, XSS, bots) | AWS WAF, Azure WAF, Akamai, Cloudflare |
| API gateway | APIs, identity, and business policy | Auth, rate limit, route, validate, meter | AWS API Gateway, Azure APIM, Apigee |
Here is the order a request travels, front to back, in the pharmacy’s stack:
Client
-> [ Edge / CDN + WAF: Akamai ] # TLS, caching, block attacks & bots
-> [ Load Balancer ] # pick a healthy gateway instance
-> [ API Gateway: APIM/Apigee ] # authn, rate limit, validate, route
-> [ Load Balancer (internal)] # spread across service replicas
-> [ Microservice ] # the actual business logic
Walk it slowly, because each box exists for a reason:
- Akamai (CDN + WAF) at the edge. First contact. It terminates TLS close to the user, serves cached content, and runs WAF rules that block generic web attacks (SQL injection, cross-site scripting) and bot mitigation that filters scraping and credential-stuffing traffic. The WAF asks “is this HTTP request malicious?” It does not know or care what an “order” is or who the user is. It is a content-level security filter.
- Load balancer. Its one job is distribution and availability: take the surviving traffic and spread it across healthy instances, running health checks so a dead instance stops receiving requests. An L4 load balancer reasons about connections and IPs; it does not understand JWTs, rate plans, or /v1/orders. It keeps capacity online. That’s it.
- API gateway. Now the request is decrypted, non-malicious, and landed on a live gateway instance, and this is where API-aware logic happens: validate the Okta/Entra JWT, check the partner’s rate limit, validate the body against the OpenAPI schema, and route /v1/prescriptions to the right service. The gateway asks “is this caller allowed to do this specific API operation, and where does it go?” Those are questions only something that understands your APIs can answer.
The clarifying one-liner: a WAF blocks bad requests, a load balancer picks a healthy server, and an API gateway enforces who can call which API and how often. They overlap a little (many gateways can do basic rate limiting that a WAF also offers; many load balancers do L7 routing a gateway also does), but their centers of gravity are distinct, and a mature setup layers them rather than picking one. You do not replace your WAF with a gateway; you put the WAF in front of it.
A subtle point worth internalizing: the gateway typically lives inside your trust boundary relative to backends but at the edge relative to clients. That position is exactly why it can validate identity once and have backends trust a simple injected header. The backends are reachable only through the gateway, never directly, and the gateway discards any X-User-Id a client sends before setting its own, so a request bearing X-User-Id could only have come from a checkpoint that already verified it.
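The trust in an injected header only holds if the gateway removes any copy a client tried to send. A minimal sketch of that step, with a hypothetical function name and the two header names used earlier:

```python
from typing import Dict, List

# Headers the backend trusts; only the gateway may set them.
TRUSTED_HEADERS = {"x-user-id", "x-scopes"}


def forward_headers(client_headers: Dict[str, str], verified_user_id: str,
                    verified_scopes: List[str]) -> Dict[str, str]:
    """Build the header set sent to the backend after authentication succeeded."""
    # Drop client-supplied copies first (header names are case-insensitive),
    # so a caller cannot smuggle in "X-User-Id: admin".
    cleaned = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in TRUSTED_HEADERS
    }
    # Then inject the values taken from the verified token.
    cleaned["X-User-Id"] = verified_user_id
    cleaned["X-Scopes"] = " ".join(verified_scopes)
    return cleaned
```

The same guarantee also depends on network policy: if a backend can be reached without passing through the gateway, no amount of header hygiene makes the header trustworthy.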
The three reference gateways
The concept is universal; the products differ in emphasis. A quick orientation.
AWS API Gateway is the AWS-native choice, deeply integrated with the AWS world. For HTTP workloads it comes in two flavors: HTTP APIs (cheaper, faster, fewer features) and REST APIs (full feature set: request validation, API keys, usage plans); a separate WebSocket API type also exists. It shines when your backends are Lambda functions or other AWS services, and it pairs naturally with AWS WAF and Cognito. It is pay-per-request, which is cheap at low volume and worth watching at high volume.
Azure API Management (APIM) is the richest of the three as a full API management platform, not just a runtime. Beyond routing and policy it ships a developer portal (where the pharmacy’s insurer partners self-serve docs and keys), a powerful XML-based policy engine (validate-jwt, rate-limit-by-key, transformation policies), and tight Entra ID integration. It can run in an internal VNet mode so it is reachable only privately. It is the natural fit when the estate is Azure-centric and you need partner-facing API products with subscriptions and tiers.
Google Apigee is the cloud-agnostic, enterprise API-management heavyweight, strong on API products, monetization, and analytics, and happy fronting backends that live anywhere — GCP, another cloud, or on-prem. It is the common choice for organizations that sell APIs as a product and want deep analytics and a polished developer experience across a multi-cloud or hybrid estate.
A rough guide: AWS API Gateway if you live in AWS and front Lambda; APIM if you’re Azure-centric and need a partner portal with Entra; Apigee if APIs are a product and your backends sprawl across clouds. All three do the same five jobs — they differ in how much management, monetization, and portal tooling wraps the runtime.
A note on “gateway per team” — and when not to
Once a team sees the value, the temptation is to put a gateway in front of everything, including internal service-to-service calls. Be careful. The pattern above is an edge gateway (sometimes “north-south” — traffic entering from outside). For internal service-to-service traffic (“east-west”), a full API gateway on every hop adds latency and a single point of failure; that problem is usually better served by a service mesh (Istio, Linkerd) doing mTLS, retries, and traffic policy between services. A clean rule of thumb: gateway at the edge, mesh inside. Don’t make the gateway a chokepoint for traffic that never leaves your network.
There is also the Backend-for-Frontend (BFF) wrinkle: the mobile app and the partner API often want differently shaped responses. Rather than one gateway trying to please both, many teams run a thin BFF per client type behind the gateway — but that is a refinement, not a starting point. Start with one edge gateway and earn the complexity.
Failure modes and how to think about them
A gateway is, by design, in the path of everything, so its failure is everyone’s failure. Plan for it.
- The gateway is a single point of failure. If it’s down, every API is down. Mitigation: run it as a managed, multi-AZ/region service (all three clouds do this), front it with the load balancer for instance-level resilience, and health-check aggressively.
- It adds a network hop and latency. Every request now traverses one more component — typically single-digit milliseconds, but real. Mitigation: keep gateway policies lean, cache where you can, and measure p95 added latency in Datadog as a first-class SLO.
- Misconfiguration is the big risk. A wrong route, an over-permissive auth rule, or a missing rate limit can expose or break a service. Mitigation: manage gateway config as code (Terraform for the gateway resources, and the APIM/Apigee policy bundles in Git), review changes, and deploy through a pipeline — GitHub Actions or Jenkins building and validating the config, with Argo CD syncing the declarative state to the cluster for a mesh-adjacent setup. The gateway’s posture and any public-exposure drift get watched by Wiz / Wiz Code (the latter catching insecure gateway-as-code definitions before they merge), and the hosts running self-managed gateway components carry CrowdStrike Falcon for runtime threat detection. Ansible handles configuration drift on any self-managed gateway nodes or virtual appliances.
- Throttling the wrong thing. Set limits too tight and you reject legitimate traffic; too loose and you don’t protect the backend. Mitigation: base limits on real measured capacity, and start permissive-with-alerts before enforcing.
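One cheap guard against the misconfiguration risk is a lint step in the pipeline that fails the build when a route lacks authentication or a rate limit. The sketch below assumes a made-up route format loosely modeled on the conceptual config shown earlier; it is not the schema of APIM, Apigee, or AWS API Gateway, and `lint_routes` is a hypothetical name.

```python
from typing import Dict, List

ALLOWED_AUTH_TYPES = {"jwt", "api_key", "mtls"}  # hypothetical set of auth types


def lint_routes(routes: List[Dict]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the config passes."""
    problems = []
    for route in routes:
        name = route.get("route", "<unnamed>")
        auth = route.get("auth") or {}
        if auth.get("type") not in ALLOWED_AUTH_TYPES:
            problems.append(f"{name}: no authentication configured")
        elif auth["type"] == "jwt" and not auth.get("required_scopes"):
            # A JWT rule with no scopes accepts any valid token from the issuer.
            problems.append(f"{name}: jwt auth without required_scopes")
        if not route.get("rate_limit"):
            problems.append(f"{name}: no rate limit configured")
    return problems
```

Run from GitHub Actions or Jenkins against the parsed config, a non-empty result would block the merge, which catches the over-permissive rule before it reaches production.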
What you actually get
Bring it back to the pharmacy. Before the gateway: nineteen hostnames in the app, nineteen homegrown auth implementations, no rate limiting, a loyalty spike that takes down checkout, and a compliance team that can’t answer a basic audit question. After a single edge API gateway — say APIM, given their Azure footprint and the insurer partner portal they need:
- The app talks to one host; services get reorganized behind it without an app release.
- Authentication is validated once at the edge against Okta and Entra; the nineteen reinventions are deleted; secrets live in Vault.
- Rate limits per partner and per user tier mean a loyalty promotion can never again starve checkout; the offender just gets 429s.
- Malformed requests die at the door against the OpenAPI schema; backends see only clean traffic.
- Every external call flows through one point that logs it, so Datadog dashboards show latency and error rates per API and the compliance team gets their Tuesday audit answer in one query.
That is the whole pitch for a gateway, and it is why the question “why do I need one” answers itself the moment you have more than a couple of services facing the outside world. It does not replace your load balancer, which keeps servers healthy, or your WAF, which blocks attacks — it sits between them and your services and owns the one thing neither of those can: knowing your APIs, your callers, and your rules, and enforcing them in exactly one place. Start with a single edge gateway, manage it as code, watch its latency, and add the portal, the monetization, and the BFFs only when you’ve earned them. The pharmacy’s next nineteen services will thank you.