
API Gateways Explained: Why You Need One

A national pharmacy chain (1,400 stores, a mobile app, a pickup kiosk in every branch, and a partner program that lets insurers check prescription status) has a problem that started small and got loud. It began as one tidy orders service. Then someone added inventory, then loyalty, then prescriptions, then a pricing service, and within two years there were nineteen backend services, each with its own URL, its own idea of authentication, its own rate limits (or none), and its own team. The mobile app now hardcodes nineteen hostnames. Every team reinvented JWT validation, and three did it wrong. When the loyalty service melted under a Black Friday promotion, it took the checkout path down with it, because nothing stood between the public internet and a service that was never built to be public. The security team cannot answer a simple regulator question, “show me every external call to the prescriptions API last Tuesday,” because no single place holds that log.

Every one of those pains has the same fix, and it has a name: an API gateway. This article explains, from the ground up, what a gateway is, the handful of jobs it actually does, and the question that trips up most people new to this — how it differs from the load balancer and the WAF sitting right next to it. We will use AWS API Gateway, Azure API Management (APIM), and Google Apigee as concrete reference points, because the concept is identical across all three even when the buttons are named differently.

Architecture overview


Here is the whole picture before we zoom into the parts. A request from the pharmacy’s mobile app, kiosk, or an insurer’s partner system travels through a fixed sequence of layers, and the gateway is one specific box in that chain — not the only thing in front of your services, but the one that understands your APIs. From the outside in: traffic first hits the edge (CDN + WAF) for TLS and attack filtering, then a load balancer that picks a healthy gateway instance, then the API gateway itself, which authenticates the caller, enforces rate limits, validates the request, and routes it to the correct backend — one of nineteen microservices that the client never addresses directly.

        [ Mobile app ]   [ Kiosk ]   [ Insurer partner ]
                 \           |            /
                  \          |           /
            +-------------------------------------+
            |   Edge: Akamai (CDN + WAF + bots)   |   TLS, cache, block attacks
            +-------------------------------------+
                            |
            +-------------------------------------+
            |        Load Balancer (L4/L7)        |   pick a healthy gateway node
            +-------------------------------------+
                            |
   +-------------------------------------------------------+
   |              API GATEWAY  (APIM / Apigee)             |
   |   authn (Okta/Entra JWT)  ->  rate limit  ->          |
   |   request validation  ->  routing  ->  logging        |
   +-------------------------------------------------------+
        |          |            |            |         |
     orders    inventory    loyalty     pricing     rx-service
   (each its own service / team / language, never public directly)

The control plane wrapping this — identity from Okta and Entra ID, secrets from HashiCorp Vault, observability into Datadog/Dynatrace, config delivered as code via GitHub Actions / Jenkins / Argo CD and Terraform / Ansible, posture from Wiz and runtime defense from CrowdStrike Falcon — is described in the sections below as each piece becomes relevant. Keep this one diagram in mind: every later section is just a close-up of one box in it.

What an API gateway actually is

An API gateway is a single front door for all your APIs. Instead of clients talking to nineteen services directly, they talk to one endpoint — api.pharmacychain.com — and the gateway decides where each request really goes. It is a reverse proxy with opinions: it sits in front of your backends, inspects every request, applies a set of rules, and forwards the survivors on.

The mental model that makes everything else click: the gateway is a policy enforcement point on the request path. A request walks in the front door, and before it is allowed near a backend it must pass through a series of checkpoints — is this caller who they claim to be, are they within their rate limit, is this request even well-formed, where does it route — and only then does it reach the service. The response walks back out through the same checkpoints. That is the whole idea. Everything below is just the specific checkpoints.
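The checkpoint model can be sketched in a few lines. This is a minimal illustration, not how any particular gateway product is implemented: the `Request` type, the two example checkpoints, and the 1 MB body limit are all assumptions made up for the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Request:
    method: str
    path: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


# A checkpoint returns None to let the request continue,
# or a (status, reason) tuple to reject it on the spot.
Checkpoint = Callable[[Request], Optional[tuple]]


def require_bearer_token(req: Request) -> Optional[tuple]:
    # Hypothetical stand-in for real token validation.
    if not req.headers.get("Authorization", "").startswith("Bearer "):
        return (401, "missing bearer token")
    return None


def limit_body_size(req: Request, max_bytes: int = 1_000_000) -> Optional[tuple]:
    # The 1 MB default is an assumed value for illustration.
    if len(req.body) > max_bytes:
        return (413, "payload too large")
    return None


def run_checkpoints(req: Request, checkpoints: list) -> tuple:
    """Run checkpoints in order; the first rejection wins."""
    for check in checkpoints:
        rejection = check(req)
        if rejection is not None:
            return rejection
    # Only requests that survive every checkpoint reach a backend.
    return (200, "forwarded to backend")
```

The order of the list is the policy: putting the cheap checks first means a request with no token is rejected before the gateway spends any effort on its body.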

Crucially, the gateway lets backend teams stop solving the same cross-cutting problems over and over. Authentication, rate limiting, and request validation get offloaded to one shared layer, so the prescriptions team writes prescription logic and nothing else. That single move is what untangles the pharmacy’s nineteen-team sprawl.

The five jobs of a gateway

Strip away the marketing and a gateway does roughly five things. Understanding these five is understanding gateways.

1. Routing

The most basic job: take an incoming request and send it to the right backend. The gateway matches on the path (and sometimes the host, method, or headers) and maps it to an upstream service.

GET  /v1/orders/{id}        ->  orders-service.internal:8080
GET  /v1/inventory/{sku}    ->  inventory-service.internal:8081
POST /v1/loyalty/points     ->  loyalty-service.internal:8082
GET  /v1/prescriptions/{id} ->  rx-service.internal:8443

To the mobile app there is one host. Behind the gateway, those four paths fan out to four entirely different services — possibly in different clusters, written in different languages, owned by different teams. Routing is also where you do API versioning (/v1 vs /v2 pointing at different deployments), path rewriting (strip the public /v1 prefix the backend doesn’t expect), and traffic splitting for canary releases (send 5% of /v1/pricing to the new build). This is the layer that lets you reorganize, rename, and re-platform services behind a stable public contract.
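As a rough sketch of the routing job, the route table above can be turned into a matcher that also strips the public /v1 prefix and makes a canary decision. The function names and the canary helper are hypothetical; real gateways express the same ideas as configuration rather than code.

```python
import random
import re

# The route table from the article: (method, public path template, upstream).
ROUTES = [
    ("GET", "/v1/orders/{id}", "orders-service.internal:8080"),
    ("GET", "/v1/inventory/{sku}", "inventory-service.internal:8081"),
    ("POST", "/v1/loyalty/points", "loyalty-service.internal:8082"),
    ("GET", "/v1/prescriptions/{id}", "rx-service.internal:8443"),
]


def compile_template(template: str) -> re.Pattern:
    # "{id}" becomes a named group that matches exactly one path segment.
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile("^" + pattern + "$")


COMPILED = [(method, compile_template(tpl), upstream) for method, tpl, upstream in ROUTES]


def route(method: str, path: str):
    """Return (upstream, backend_path, params), or None when nothing matches."""
    for route_method, pattern, upstream in COMPILED:
        match = pattern.match(path)
        if route_method == method and match:
            # Path rewriting: strip the public /v1 prefix the backend does not expect.
            backend_path = path[len("/v1"):]
            return upstream, backend_path, match.groupdict()
    return None


def pick_canary(stable: str, canary: str, canary_percent: float, rng=random.random) -> str:
    """Traffic splitting: send roughly canary_percent of requests to the new build."""
    return canary if rng() * 100 < canary_percent else stable
```

For example, `route("GET", "/v1/orders/42")` resolves to the orders upstream with the backend path `/orders/42`, while an unmapped method such as `DELETE` on the same path returns `None`, which a gateway would turn into a 404 or 405.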

2. Authentication and authorization offload

This is the job that pays for the gateway on its own. Instead of every service validating tokens, the gateway terminates authentication once, at the edge, and passes a verified identity to the backend.

In the pharmacy’s world, a customer logs into the app via Okta (the consumer-facing identity provider), and staff and partner systems authenticate via Microsoft Entra ID (the workforce IdP). Both issue OAuth 2.0 / OIDC JWTs. The gateway’s job is to validate that token — check the signature against the IdP’s public keys, confirm it hasn’t expired, verify the audience and scopes — and reject anything that fails before it ever reaches a service.

# Conceptually, what the gateway enforces per route:
- route: /v1/prescriptions/*
  auth:
    type: jwt
    issuer: https://pharmacychain.okta.com/oauth2/default
    audience: api://prescriptions
    required_scopes: [ "rx.read" ]
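To make the per-route policy above concrete, here is a minimal sketch of the checks it implies: signature, expiry, issuer, audience, and scopes. Two loud assumptions: it uses HS256 with a shared secret so it can run on the standard library alone, whereas an IdP such as Okta or Entra ID typically signs with an asymmetric key that the gateway verifies against the IdP's published public keys; and it reads scopes from a space-delimited `scope` claim, which varies by IdP. The helper names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Test helper standing in for the IdP: mint an HS256-signed token."""
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(sig)}"


def validate(token, secret, issuer, audience, required_scopes, now=None) -> dict:
    """Return the claims if every check passes; raise ValueError otherwise."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
        claims = json.loads(b64url_decode(payload_b64))
        signature = b64url_decode(sig_b64)
    except ValueError as exc:  # bad segment count, base64, UTF-8, or JSON
        raise ValueError("malformed token") from exc
    if not (isinstance(header, dict) and isinstance(claims, dict)):
        raise ValueError("malformed token")
    # Pin the algorithm; never trust the token to choose it.
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    now = time.time() if now is None else now
    if claims.get("exp", 0) <= now:
        raise ValueError("token expired")
    if claims.get("iss") != issuer:
        raise ValueError("wrong issuer")
    aud = claims.get("aud")
    if audience not in (aud if isinstance(aud, list) else [aud]):
        raise ValueError("wrong audience")
    granted = set(str(claims.get("scope", "")).split())
    if not set(required_scopes) <= granted:
        raise ValueError("missing required scope")
    return claims
```

Any `ValueError` here maps to a rejection at the gateway, typically a 401 for a bad or expired token and a 403 for a valid token that lacks the required scope, so the prescriptions service never sees the request.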

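Two things the gateway does here matter most: it rejects unauthenticated requests before any backend sees them, and it passes the verified identity downstream (for example as an X-User-Id header) so services never have to re-validate tokens themselves.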

The secrets behind all this (the signing keys the gateway uses for its own tokens, partner client secrets, mTLS private keys) should not live in the gateway’s config. They belong in HashiCorp Vault, leased dynamically and rotated, with the gateway pulling them at startup and fetching them again when a lease expires or a secret rotates. A leaked credential in a config file is the kind of mistake you only make once.

3. Rate limiting and throttling

The Black Friday outage was a throttling failure. With no rate limit, a promotion-driven spike on loyalty consumed shared resources and took checkout with it. A gateway enforces rate limits — “this API key gets 100 requests/second, this user tier gets 10” — and throttling (smoothing bursts), so one noisy client or one hot endpoint cannot starve everyone else.

| Control          | What it limits                           | Example                            |
|------------------|------------------------------------------|------------------------------------|
| Rate limit       | Requests per unit time, per key/user/IP  | 1,000 req/min per partner key      |
| Burst / throttle | Short-term spikes above the steady rate  | Allow 50 in a burst, drain at 10/s |
| Quota            | Total over a long window                 | 1,000,000 calls/month per plan     |
| Concurrency      | Simultaneous in-flight requests          | Max 200 concurrent to rx-service   |

This is also monetization and fairness: the free tier of the partner API gets 1,000 calls/day, the paid tier gets 100,000, and the gateway counts. When a client exceeds its limit the gateway returns 429 Too Many Requests with a Retry-After header — a clean, honest answer — instead of letting the backend fall over. Rate limiting is the single feature that would have kept checkout alive during the promotion.
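One common way to implement the burst-plus-steady-rate behavior is a token bucket. The sketch below is an assumption about technique, not a description of how APIM, Apigee, or AWS API Gateway count internally; the class name and the in-memory state are for illustration only (a real gateway shares counters across instances).

```python
import math


class TokenBucket:
    """Allow `burst` requests at once, refilling at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, burst: int, now: float = 0.0):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.updated = now

    def allow(self, now: float):
        """Return (status, headers) for one request arriving at time `now`."""
        elapsed = max(0.0, now - self.updated)
        # Refill for the time that has passed, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return 200, {}
        # Tell the client how long until one full token is available again.
        wait_seconds = (1 - self.tokens) / self.rate
        return 429, {"Retry-After": str(math.ceil(wait_seconds))}


# One bucket per caller, keyed by API key (the key value here is hypothetical).
buckets = {"partner-key-123": TokenBucket(rate_per_sec=10, burst=50)}
```

With these numbers, a partner can fire 50 requests in the same instant, the 51st gets a 429 with a Retry-After header, and one second later ten more requests are allowed. The backend never sees the rejected calls.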

4. Request and response validation

The gateway can reject malformed requests at the edge so backends never waste a cycle on garbage. Give the gateway an OpenAPI schema and it will validate that POST /v1/loyalty/points actually has the required customerId and a numeric points field, with the right content type, before forwarding. Bad requests get a 400 from the gateway; the loyalty service only ever sees well-formed traffic.

The same layer also handles request/response transformation (stripping internal headers from responses, converting between formats, injecting correlation IDs) and enforces payload size limits that stop a 50 MB body from reaching a service that should never receive one. It is a cheap, centralized layer of input hygiene.
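Here is a hand-rolled sketch of the checks a schema would drive for POST /v1/loyalty/points. In practice the gateway derives these from the OpenAPI document rather than from code; the 1 MB limit, the assumption that customerId is a non-empty string, and the use of 413 and 415 for the size and content-type cases are choices made for this illustration.

```python
import json

MAX_BODY_BYTES = 1_000_000  # assumed limit for this route


def validate_loyalty_points(content_type: str, body: bytes):
    """Return (status, message); 200 means the request may be forwarded."""
    if len(body) > MAX_BODY_BYTES:
        return 413, "payload too large"
    # Ignore parameters such as "; charset=utf-8" when comparing media types.
    if content_type.split(";")[0].strip().lower() != "application/json":
        return 415, "content type must be application/json"
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return 400, "body is not valid JSON"
    if not isinstance(payload, dict):
        return 400, "body must be a JSON object"
    customer_id = payload.get("customerId")
    if not isinstance(customer_id, str) or not customer_id:
        return 400, "customerId is required"
    points = payload.get("points")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(points, bool) or not isinstance(points, (int, float)):
        return 400, "points must be numeric"
    return 200, "ok"
```

Every non-200 result is answered by the gateway itself, so the loyalty service can assume that any body it receives already has a customerId and a numeric points value.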

5. Observability

Because every external request passes through the gateway, it is the one place that can answer “show me every call to the prescriptions API last Tuesday.” The gateway emits structured access logs, metrics (latency, error rate, throughput per route), and traces, with a correlation ID stamped on each request and propagated to backends so a single user action can be followed across services.

In practice you ship those signals to an observability platform — Datadog or Dynatrace — which turns gateway logs and metrics into dashboards (p95 latency per API, 4xx/5xx rates, top consumers by volume) and alerts. When latency on /v1/pricing crosses a threshold, Datadog pages the on-call. And when the gateway’s auth layer blocks a flood of forged tokens, that event can auto-raise a ServiceNow incident so the security team has a ticket, not just a log line. The gateway is what finally gives the pharmacy’s compliance team the audit answer they could never produce before.
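The two mechanics that make this work, a correlation ID on every request and one structured log line per call, can be sketched as follows. The header name `X-Correlation-Id` and the log field names are assumptions for illustration; use whatever your observability platform expects. The sketch also treats header names as case-sensitive, which real HTTP handling must not do.

```python
import json
import uuid


def ensure_correlation_id(headers: dict) -> dict:
    """Keep an incoming correlation ID, or mint one if the client sent none."""
    out = dict(headers)
    out.setdefault("X-Correlation-Id", str(uuid.uuid4()))
    return out


def access_log_line(method, route, status, started, finished, headers, caller) -> str:
    """Build one structured (JSON) access log entry for a completed request."""
    return json.dumps(
        {
            "method": method,
            "route": route,  # the route template, so metrics group per API
            "status": status,
            "latency_ms": round((finished - started) * 1000, 1),
            "correlation_id": headers["X-Correlation-Id"],
            "caller": caller,  # the verified identity, not a client-supplied value
        },
        sort_keys=True,
    )
```

Because the same correlation ID is forwarded to the backend, a query for one ID returns the gateway entry plus every service log line the request touched, and a query on route and date answers the regulator's "last Tuesday" question.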

Where the gateway sits: gateway vs load balancer vs WAF

This is the question that confuses almost everyone new to the topic, because all three sit “in front of” your application and all three inspect traffic. They are not the same thing, and in a real architecture you usually have all three, in a specific order. The trick is to think about what layer each one reasons about.

| Layer         | Reasons about                                  | Primary job                                                 | Example                                         |
|---------------|------------------------------------------------|-------------------------------------------------------------|-------------------------------------------------|
| Load balancer | TCP/IP + connections (L4), sometimes HTTP (L7) | Spread traffic across healthy instances; keep the lights on | AWS ALB/NLB, Azure Load Balancer, GCP Cloud LB  |
| WAF           | HTTP request content, for attacks              | Block malicious payloads (SQLi, XSS, bots)                  | AWS WAF, Azure WAF, Akamai, Cloudflare          |
| API gateway   | APIs, identity, and business policy            | Auth, rate limit, route, validate, meter                    | AWS API Gateway, Azure APIM, Apigee             |

Here is the order a request travels, front to back, in the pharmacy’s stack:

Client
  -> [ Edge / CDN + WAF: Akamai ]      # TLS, caching, block attacks & bots
    -> [ Load Balancer ]               # pick a healthy gateway instance
      -> [ API Gateway: APIM/Apigee ]  # authn, rate limit, validate, route
        -> [ Load Balancer (internal)] # spread across service replicas
          -> [ Microservice ]          # the actual business logic

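Walk it slowly, because each box exists for a reason. The edge terminates TLS, serves cached content, and blocks attacks and bots before they get any further. The first load balancer only picks a healthy gateway instance. The gateway authenticates the caller, applies rate limits, validates the request, and routes it. The internal load balancer spreads the request across replicas of the target service, and the microservice runs the actual business logic.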

The clarifying one-liner: a WAF blocks bad requests, a load balancer picks a healthy server, and an API gateway enforces who can call which API and how often. They overlap a little (many gateways can do basic rate limiting that a WAF also offers; many load balancers do L7 routing a gateway also does), but their centers of gravity are distinct, and a mature setup layers them rather than picking one. You do not replace your WAF with a gateway; you put the WAF in front of it.

A subtle point worth internalizing: the gateway typically lives inside your trust boundary relative to backends but at the edge relative to clients. That position is exactly why it can validate identity once and have backends trust a simple injected header — the backends are reachable only through the gateway, never directly, so a request bearing X-User-Id could only have come from a checkpoint that already verified it.
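That trust only holds if the gateway overwrites whatever the client sent. A minimal sketch, assuming the identity comes from the `sub` claim of an already-validated token (the function name is hypothetical):

```python
IDENTITY_HEADER = "X-User-Id"


def headers_for_backend(client_headers: dict, verified_claims: dict) -> dict:
    """Drop any client-supplied identity header, then inject the verified one."""
    forwarded = {
        name: value
        for name, value in client_headers.items()
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() != IDENTITY_HEADER.lower()
    }
    # Only the gateway sets this header, and only from a validated token.
    forwarded[IDENTITY_HEADER] = verified_claims["sub"]
    return forwarded
```

A client that sends its own `x-user-id: admin` gains nothing: the value is discarded and replaced with the subject from the token the gateway actually verified. The other half of the guarantee is network policy, since the header is only trustworthy when backends accept traffic from the gateway and nowhere else.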

The three reference gateways

The concept is universal; the products differ in emphasis. A quick orientation.

AWS API Gateway is the AWS-native choice, deeply integrated with the AWS world. For request/response traffic it comes in two flavors: HTTP APIs (cheaper, faster, fewer features) and REST APIs (full feature set: request validation, API keys, usage plans). A separate WebSocket API type covers persistent connections. It shines when your backends are Lambda functions or other AWS services, and it pairs naturally with AWS WAF and Cognito. It is pay-per-request, which is cheap at low volume and worth watching at high volume.

Azure API Management (APIM) is the richest of the three as a full API management platform, not just a runtime. Beyond routing and policy it ships a developer portal (where the pharmacy’s insurer partners self-serve docs and keys), a powerful XML-based policy engine (validate-jwt, rate-limit-by-key, transformation policies), and tight Entra ID integration. It can run in an internal VNet mode so it is reachable only privately. It is the natural fit when the estate is Azure-centric and you need partner-facing API products with subscriptions and tiers.

Google Apigee is the cloud-agnostic, enterprise API-management heavyweight, strong on API products, monetization, and analytics, and happy fronting backends that live anywhere — GCP, another cloud, or on-prem. It is the common choice for organizations that sell APIs as a product and want deep analytics and a polished developer experience across a multi-cloud or hybrid estate.

A rough guide: AWS API Gateway if you live in AWS and front Lambda; APIM if you’re Azure-centric and need a partner portal with Entra; Apigee if APIs are a product and your backends sprawl across clouds. All three do the same five jobs — they differ in how much management, monetization, and portal tooling wraps the runtime.

A note on “gateway per team” — and when not to

Once a team sees the value, the temptation is to put a gateway in front of everything, including internal service-to-service calls. Be careful. The pattern above is an edge gateway (sometimes “north-south” — traffic entering from outside). For internal service-to-service traffic (“east-west”), a full API gateway on every hop adds latency and a single point of failure; that problem is usually better served by a service mesh (Istio, Linkerd) doing mTLS, retries, and traffic policy between services. A clean rule of thumb: gateway at the edge, mesh inside. Don’t make the gateway a chokepoint for traffic that never leaves your network.

There is also the Backend-for-Frontend (BFF) wrinkle: the mobile app and the partner API often want differently shaped responses. Rather than one gateway trying to please both, many teams run a thin BFF per client type behind the gateway — but that is a refinement, not a starting point. Start with one edge gateway and earn the complexity.

Failure modes and how to think about them

A gateway is, by design, in the path of everything, so its failure is everyone’s failure. Plan for it.

What you actually get

Bring it back to the pharmacy. Before the gateway: nineteen hostnames in the app, nineteen homegrown auth implementations, no rate limiting, a loyalty spike that takes down checkout, and a compliance team that can’t answer a basic audit question. After a single edge API gateway (say APIM, given their Azure footprint and the insurer partner portal they need): one hostname in the app, token validation done once in one place, rate limits that keep a loyalty spike away from checkout, and one access log that answers the audit question.

That is the whole pitch for a gateway, and it is why the question “why do I need one” answers itself the moment you have more than a couple of services facing the outside world. It does not replace your load balancer, which keeps servers healthy, or your WAF, which blocks attacks — it sits between them and your services and owns the one thing neither of those can: knowing your APIs, your callers, and your rules, and enforcing them in exactly one place. Start with a single edge gateway, manage it as code, watch its latency, and add the portal, the monetization, and the BFFs only when you’ve earned them. The pharmacy’s next nineteen services will thank you.


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
