AWS Load Balancers and API Gateway: ALB, NLB and API Gateway Compared

Quick take: ALB routes HTTP/HTTPS with host, path and header rules at layer 7. NLB forwards raw TCP and UDP packets fast at layer 4, with static IPs and source-IP preservation. API Gateway is a managed API front door that adds authorization, throttling, caching and a developer portal. Pick the OSI layer and the feature set first; the service falls out of that choice. Forcing one front door to do another’s job is the single most common — and most expensive — mistake in this corner of AWS.

A platform team building a multiplayer gaming backend put everything behind one Application Load Balancer because “it worked for the website.” Voice chat, which rides UDP for low latency, performed terribly — ALB speaks only HTTP/HTTPS, so the team had bolted UDP onto a sidecar that round-tripped through the wrong layer. The WebSocket player-state channel technically worked but was awkward to route and kept getting cut at the idle timeout. The partner leaderboard REST API had no rate limiting, so one misbehaving integrator could saturate the backend. None of these were code bugs. They were layer bugs — the wrong front door chosen for the protocol and the feature. The fix was to stop asking one service to be all three: NLB for UDP voice with static IPs, ALB for HTTP game services with path-based routing, and API Gateway for partner-facing REST with usage plans and throttling. The moment the team split traffic by layer, the firefighting stopped.

This article is the decision guide and the failure playbook for those three entry points: Application Load Balancer (ALB), Network Load Balancer (NLB) and Amazon API Gateway. It also covers the legacy Classic Load Balancer (CLB) and the L3 cousin Gateway Load Balancer (GWLB), so you know what to avoid and what’s adjacent. You will learn the mental model that makes the choice obvious (which OSI layer does the work, and what each service can and cannot do there), the option-by-option differences (TLS termination, routing, health checks, source-IP behaviour, latency, limits), how to read the error and quota tables you will actually hit in production, and a structured symptom → root cause → how to confirm (exact CLI/console path) → fix playbook for the failures each one throws. Every configuration gets both an aws CLI snippet and a Terraform snippet. Because this is a reference you will return to mid-incident, the differences, the limits and the playbook are all laid out as scannable tables: read the prose once, keep the tables open at 02:14.

By the end you will stop guessing. When a new workload arrives you will name the layer in ten seconds, pick the right entry point, and know its limits and its failure modes before they bite. When an existing one throws a 502, a 429 or a silent connection reset, you will localise it to the exact hop and fix it — not by switching services in a panic, but because you understand what each front door is for.

What problem this solves

Different traffic needs different entry points, and AWS gives you several because no single one is right for every protocol, latency target and feature need. Choosing the wrong one does not usually fail loudly on day one — it fails expensively over time: latency you can’t explain, features you have to rebuild by hand, throttling you can’t apply, costs that balloon at scale, and connections that drop for reasons buried three layers down.

What breaks without this knowledge: a team puts a high-throughput TCP service (a database proxy, a game server, an MQTT broker) behind an ALB and pays an HTTP-parsing tax plus per-LCU costs it doesn’t need, when an NLB would move the same packets at a fraction of the latency and cost. Or it fronts a partner REST API with a bare ALB and has no throttling, no API keys, no usage plans — so the first noisy integrator takes down the backend for everyone. Or it reaches for API Gateway for a chatty internal microservice that does a million low-value calls a second, and the per-request pricing turns a ₹5,000 bill into a ₹2,00,000 one. Each is the wrong layer, not a bug.

Who hits this: nearly every team running anything public-facing on AWS — web apps, APIs, containers, game servers, IoT ingestion, partner integrations. It bites hardest on teams that learned one tool (usually ALB, because it’s the web default) and reach for it reflexively, and on teams scaling a service whose traffic profile has outgrown the front door they started with. The fix is almost never “switch everything” — it’s “match each traffic class to the layer that serves it, and know that layer’s limits.”

To frame the whole field before the deep dive, here is every entry point this article covers, the OSI layer it works at, what it is fundamentally for, and the one thing that most often sends people to the wrong one:

| Entry point | OSI layer | Fundamentally for | Killer feature | Most common misuse |
|---|---|---|---|---|
| Application Load Balancer (ALB) | L7 (HTTP/HTTPS) | Routing web traffic by content (host/path/header) | Rich L7 routing + WAF + native container/Lambda targets | Used for raw TCP/UDP or extreme throughput it can’t serve |
| Network Load Balancer (NLB) | L4 (TCP/UDP/TLS) | Moving packets fast with static IPs | Millions of flows, ~ms latency, static EIPs, source-IP preserved | Used where you actually needed L7 routing or WAF |
| API Gateway (REST/HTTP/WS) | L7 (managed API) | Publishing managed APIs with auth/throttle/cache | Full API lifecycle: keys, usage plans, authorizers, caching | Used for chatty internal traffic where per-request cost explodes |
| Classic Load Balancer (CLB) | L4/L7 (legacy) | Nothing new; backwards compatibility only | (none worth choosing) | Chosen for new builds; it should never be |
| Gateway Load Balancer (GWLB) | L3 (transparent) | Inserting inline appliances (firewalls/IDS) | Transparent traffic steering to a fleet of virtual appliances | Confused with NLB; it’s for inspection, not app traffic |

Learning objectives

By the end of this article you can:

Prerequisites & where this fits

You should understand AWS networking basics: a VPC, subnets (public vs private), security groups, Availability Zones, and that Elastic Load Balancing (ELB) is the umbrella service that includes ALB, NLB, GWLB and the legacy CLB. You should know what a target group is conceptually (the pool of backends a load balancer forwards to), what a listener is (the port/protocol the front door accepts on), and the difference between layer 4 (TCP/UDP packets, no awareness of HTTP) and layer 7 (HTTP requests, headers, paths). Familiarity with running aws CLI and reading JSON, plus basic Terraform, will let you run every snippet here.

This sits in the Networking & traffic-management track. It assumes the VPC fundamentals from AWS VPC, Subnets and Security Groups Explained (security groups and subnet placement decide whether your targets are even reachable) and the AZ model from AWS Regions and Availability Zones: Resiliency from the Ground Up (cross-zone load balancing and per-AZ static IPs only make sense once you understand zones). It is upstream of the compute choices in AWS Compute: EC2, Lambda, ECS and EKS — Which One to Choose? and the container path in AWS ECS vs EKS vs Fargate: Choose Your Container Path, because what you put behind the load balancer shapes which load balancer you pick. If your backend is event-driven, AWS Lambda Patterns: Event-Driven Functions That Scale to Zero pairs with API Gateway’s Lambda integration.

A quick map of who owns what during an incident, so you page the right person:

| Layer | What lives here | Who usually owns it | Failure classes it can cause |
|---|---|---|---|
| Client / DNS | TLS, name resolution, retries | Frontend / SRE | Misrouting, cert errors; often red herrings |
| CloudFront / edge (optional) | CDN, edge TLS, WAF | Network / platform | 502/504 at edge, cache surprises |
| ALB (L7) | Host/path routing, target health | Platform / app | 502/503 (unhealthy targets), 504 (slow backend) |
| NLB (L4) | TCP/UDP forwarding, static IPs | Network team | Idle-timeout resets, source-IP surprises |
| API Gateway (managed) | Auth, throttle, cache, mapping | API / platform | 429 (throttle), 403 (authorizer), 413 (payload) |
| Target group / backend | EC2/ECS/EKS/Lambda, health checks | App team | 5xx from the app itself, failed health checks |

Core concepts

Five mental models make every later decision obvious.

The layer does the work, and the layer constrains the features. An L7 front door (ALB, API Gateway) understands HTTP — it can read the path, host and headers, route on them, terminate TLS, inject headers, and have AWS WAF inspect the request. An L4 front door (NLB) sees only TCP/UDP segments — it forwards packets blind to a target, which makes it faster and cheaper but means it cannot route by URL, cannot run WAF, and cannot do HTTP things. The first question for any workload is therefore “does the entry point need to read HTTP?” If yes, you’re at L7 (ALB or API Gateway). If you just need to move TCP/UDP fast, you’re at L4 (NLB).

ALB and API Gateway are both L7, but they solve different problems. ALB is a load balancer: it spreads HTTP traffic across targets with content-based rules and is the natural front for web apps and container fleets. API Gateway is an API-management product: it adds the things a load balancer doesn’t — API keys, usage plans and throttling, authorizers (Cognito/IAM/Lambda/JWT), request/response mapping, caching, a developer portal, and per-stage versioning. If you’re publishing an API to customers or partners and need to govern it, that’s API Gateway. If you’re spreading web traffic across servers, that’s ALB. Many architectures use both — API Gateway out front for governance, an ALB behind it for the fleet.

A listener accepts; a target group receives; a health check decides. Every ELB load balancer has one or more listeners (e.g. HTTPS on 443) that match incoming traffic to a target group, the pool of backends (EC2 instances, IP addresses, Lambda functions, or for NLB an ALB). The load balancer continuously runs a health check against each target; only healthy targets receive traffic. A huge fraction of “the load balancer is broken” incidents are simply a failing health check (wrong path, wrong port, wrong success matcher, or a security group blocking the probe), so the LB correctly refuses to send traffic and returns 503. Knowing the health check is the arbiter localises most failures instantly.

Source IP, TLS and timeouts behave differently at each layer, and that’s where surprises live. At L7 the load balancer terminates the TCP connection and opens a new one to the target, so the target sees the load balancer’s IP, and you recover the real client IP from the X-Forwarded-For header. At L4, NLB can preserve the client source IP (on by default for instance targets; for IP targets it is on by default for UDP/TCP_UDP and off for TCP/TLS), so the target sees the real client. That is great for allow-lists and logging but means your security groups must allow the client CIDRs, not the LB. TLS can be terminated at the front door (decrypt there, then plaintext or re-encrypt to the backend) or passed through (NLB TCP passthrough hands the encrypted bytes straight to the target). And each has its own idle timeout: ALB’s is configurable (default 60 s), NLB’s TCP idle timeout defaults to 350 seconds (current NLBs let you set it per TCP listener, from 60 to 6000 seconds), and long-lived connections that go quiet get reset, which is the classic NLB gRPC/DB-connection mystery.
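To make the L7 half of that concrete, here is a minimal sketch of recovering the client IP from X-Forwarded-For behind an ALB. The function name and the `trusted_proxy_hops` parameter are illustrative, not part of any AWS SDK. It assumes each trusted proxy appends the address it received the connection from, so the entry written by your own load balancer is counted from the right and anything further left may be client-supplied.

```python
from ipaddress import ip_address


def client_ip_from_xff(xff_header: str, trusted_proxy_hops: int = 1) -> str:
    """Return the client IP recorded by the outermost trusted proxy.

    trusted_proxy_hops=1 means only the load balancer sits in front of the
    app; use 2 if a CDN sits in front of the load balancer, and so on.
    """
    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    if len(hops) < trusted_proxy_hops:
        raise ValueError("X-Forwarded-For has fewer entries than trusted hops")
    candidate = hops[-trusted_proxy_hops]
    ip_address(candidate)  # raises ValueError if the entry is not a valid IP
    return candidate


# The left-most entry was sent by the client and cannot be trusted;
# the right-most entry was appended by the load balancer.
print(client_ip_from_xff("10.9.9.9, 203.0.113.50"))
```

Taking the left-most entry is the common mistake: a client can put anything there before the request reaches the load balancer.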

Managed means limits and per-request economics. API Gateway is fully managed — no capacity to provision — but that convenience comes with account-level throttling (a default 10,000 requests/second with a 5,000 burst per Region), payload size limits (10 MB for REST, 6 MB for a synchronous Lambda integration), an integration timeout (29 seconds maximum), and per-million-request pricing that is wonderful at low/medium volume and brutal at extreme volume. ALB and NLB price on capacity units (LCU/NLCU) instead, which is far cheaper for steady high-throughput traffic but gives you none of API Gateway’s governance. The economics flip depending on traffic shape, and choosing the wrong one for your shape is a real money mistake.
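The flip in economics is easy to sanity-check with arithmetic. The sketch below compares a per-request price against an hourly-plus-capacity-unit price at a steady request rate. Every price in the example call is a placeholder chosen for round numbers, not a current AWS rate, and the function names are illustrative; plug in the figures from the pricing pages for your Region.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600
HOURS_PER_MONTH = 30 * 24


def per_request_monthly_cost(requests_per_second: float, price_per_million: float) -> float:
    """Monthly cost of a per-request-priced front door at a steady rate."""
    requests = requests_per_second * SECONDS_PER_MONTH
    return requests / 1_000_000 * price_per_million


def capacity_monthly_cost(hourly_price: float, capacity_units: float,
                          price_per_unit_hour: float) -> float:
    """Monthly cost of an hourly + capacity-unit (LCU/NLCU style) front door."""
    return HOURS_PER_MONTH * (hourly_price + capacity_units * price_per_unit_hour)


def break_even_rps(lb_monthly_cost: float, price_per_million: float) -> float:
    """Steady request rate at which per-request pricing equals the LB bill."""
    return lb_monthly_cost * 1_000_000 / (price_per_million * SECONDS_PER_MONTH)


# Placeholder prices, for illustration only.
lb_cost = capacity_monthly_cost(hourly_price=0.02, capacity_units=10, price_per_unit_hour=0.008)
api_cost = per_request_monthly_cost(requests_per_second=1000, price_per_million=1.0)
print(round(lb_cost, 2), round(api_cost, 2), round(break_even_rps(lb_cost, 1.0), 1))
```

Below the break-even rate the per-request model is cheaper and you get governance for free; above it, you are paying for governance on every call whether you need it or not.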

The vocabulary in one table

Before the deep sections, pin down every moving part:

| Concept | One-line definition | Where it lives | Why it matters to the choice |
|---|---|---|---|
| Listener | Port + protocol the LB accepts on | On the load balancer | Decides which protocols the front door speaks |
| Target group | Pool of backends the LB forwards to | Per LB (ALB/NLB) | The thing health checks and routing point at |
| Health check | Probe deciding if a target is healthy | Per target group | Failing it → 503; arbiter of most outages |
| L7 (application) | HTTP-aware (path/host/headers) | ALB, API Gateway | Enables content routing, WAF, header injection |
| L4 (network) | TCP/UDP packet forwarding | NLB | Fast, cheap, no HTTP awareness |
| TLS termination | Decrypt at the front door | ALB / NLB(TLS) / API GW | Where certs live; vs passthrough |
| Source-IP preservation | Target sees real client IP | NLB (default) | Drives SG rules and allow-lists |
| X-Forwarded-For | Header carrying the real client IP | ALB/API GW | How L7 backends recover client IP |
| LCU / NLCU | Capacity-unit billing metric | ALB / NLB | Cost model for the load balancers |
| Usage plan | Throttle + quota + API-key tier | API Gateway | How you govern callers; ALB has no equivalent |
| Authorizer | Auth check before the request runs | API Gateway | Cognito/IAM/Lambda/JWT gate |
| Stage | A deployed snapshot of an API | API Gateway | Versioning (dev/test/prod) of the API |
| Cross-zone LB | Spread across AZs evenly | ALB (on) / NLB (opt-in) | Even distribution vs per-AZ data cost |

ALB, NLB and API Gateway, head to head

This is the table you came for. The full side-by-side across every dimension that decides the choice. Read your requirement down the left, read which service satisfies it across the columns.

| Dimension | Application Load Balancer (ALB) | Network Load Balancer (NLB) | API Gateway |
|---|---|---|---|
| OSI layer | L7 (HTTP/HTTPS, gRPC) | L4 (TCP/UDP/TLS) | L7 (managed API) |
| Protocols | HTTP, HTTPS, HTTP/2, gRPC, WebSocket | TCP, UDP, TCP_UDP, TLS | HTTPS (REST/HTTP), WSS (WebSocket) |
| Routing | Host, path, header, query, method, source-IP rules | None (flow hash to target) | Resource/method, stage variables, mappings |
| TLS | Terminate (+ optional re-encrypt) | Terminate (TLS listener) or passthrough (TCP) | Terminate (managed certs / ACM) |
| WAF | Yes | No | Yes (REST APIs) |
| Source client IP | In X-Forwarded-For (LB IP at TCP) | Preserved by default | In X-Forwarded-For |
| Static IP | No (use NLB or Global Accelerator in front) | Yes, one EIP per AZ | N/A (managed endpoint) |
| Targets | EC2, IP, Lambda, ECS/EKS | EC2, IP, ALB (as target) | Lambda, HTTP(S), AWS services, VPC link |
| Latency added | Single-digit ms | ~sub-ms (very low) | Tens of ms (managed overhead) |
| Sticky sessions | Yes (duration/app cookie) | Yes (source-IP based) | N/A (stateless) |
| Auth built in | No (use OIDC/Cognito action on rules) | No | Yes (Cognito/IAM/Lambda/JWT) |
| Throttling / quotas | No | No | Yes (usage plans, per-method) |
| Caching | No | No | Yes (per stage) |
| Idle timeout | Configurable (default 60 s) | 350 s default (TCP; configurable 60–6000 s per listener) | 29 s integration max |
| Pricing model | Per hour + LCU | Per hour + NLCU | Per million requests (+ cache/data) |
| Best for | Web apps, microservices, containers | TCP/UDP, gaming, IoT, high throughput | Customer/partner managed APIs |
| Avoid for | Non-HTTP, extreme packet rates | Anything needing L7 routing/WAF | Chatty high-volume internal calls |

The reverse lookup — start from the requirement, land on the service:

| If your requirement is… | Choose | Why |
|---|---|---|
| Route by URL path / hostname | ALB | Only L7 LB with content routing |
| Raw TCP or UDP forwarding | NLB | Only one that speaks L4 / UDP |
| A fixed, static IP to allow-list | NLB | One EIP per AZ; ALB has none |
| Lowest possible latency | NLB | Sub-ms vs ALB’s single-digit ms |
| Millions of concurrent connections | NLB | Scales to tens of millions of flows |
| WAF inspection on requests | ALB or API Gateway | WAF associates with L7 only |
| API keys, usage plans, throttling | API Gateway | The only one with governance |
| Per-caller rate limiting | API Gateway | Usage plans; LBs can’t |
| Response caching at the edge of the API | API Gateway | Per-stage cache |
| WebSocket player/chat channel | ALB or API Gateway (WS) | Both speak WebSocket |
| gRPC service | ALB | Native gRPC target support |
| Preserve the real client IP cheaply | NLB | Default source-IP preservation |
| Front a Lambda with full request control | API Gateway | Mappings, auth, throttling |
| Spread HTTP across a container fleet | ALB | Native ECS/EKS integration |
| Insert a firewall/IDS appliance inline | GWLB | Transparent appliance steering |
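The reverse lookup can be written down as a small decision function, which is a handy way to make the reasoning explicit in an architecture review. This is a deliberately simplified sketch: the `Requirement` fields and the `choose_entry_point` name are invented for illustration, and real workloads mix requirements in ways a handful of booleans cannot capture.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    protocol: str                       # "http", "tcp" or "udp"
    needs_static_ip: bool = False       # clients allow-list a fixed IP
    needs_api_governance: bool = False  # API keys, usage plans, per-caller throttling
    inline_appliance: bool = False      # traffic must pass a firewall/IDS fleet


def choose_entry_point(req: Requirement) -> str:
    """Name the layer first, then the feature set, as the tables above do."""
    if req.inline_appliance:
        return "GWLB"
    if req.protocol in ("tcp", "udp"):
        return "NLB"                    # the only option that forwards raw L4 traffic
    if req.protocol != "http":
        raise ValueError(f"unknown protocol: {req.protocol}")
    if req.needs_api_governance:
        return "API Gateway"
    if req.needs_static_ip:
        return "NLB -> ALB"             # static EIP in front, L7 routing behind
    return "ALB"


print(choose_entry_point(Requirement(protocol="udp")))
print(choose_entry_point(Requirement(protocol="http", needs_api_governance=True)))
```

Applied to the gaming backend from the introduction, UDP voice lands on NLB, the HTTP game services on ALB, and the partner REST API on API Gateway.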

TLS is where each front door behaves subtly differently — terminate, re-encrypt, or pass the encrypted bytes straight through. The full matrix:

| TLS need | ALB | NLB | API Gateway |
|---|---|---|---|
| Terminate TLS at the front door | Yes (HTTPS listener, ACM) | Yes (TLS listener, ACM) | Yes (managed / ACM) |
| Re-encrypt to the backend | Yes (HTTPS target group) | Yes (TLS target group behind a TLS listener) | Via HTTPS integration |
| End-to-end passthrough (no decrypt at LB) | No | Yes (TCP listener) | No |
| Where the cert lives | ACM on the listener | ACM on the TLS listener | ACM / API GW custom domain |
| mTLS (client cert auth) | Yes (mutual TLS on listener) | No (passthrough to backend does it) | Yes (mutual TLS on custom domain) |
| SNI / multiple certs | Yes (up to 25) | Yes (TLS listener) | Per custom domain |
| Min TLS policy control | ssl_policy (TLS 1.2/1.3) | ssl_policy on TLS listener | Security policy (TLS 1.2) |

And the reverse question every architecture review asks — which front door can front which compute target:

| Backend target | ALB | NLB | API Gateway |
|---|---|---|---|
| EC2 instances | Yes (instance/ip) | Yes (instance/ip) | Via HTTP_PROXY / VPC link |
| ECS / Fargate | Yes (ip target) | Yes (ip target) | Via VPC link |
| EKS pods | Yes (ip/ALB controller) | Yes (ip/NLB controller) | Via VPC link |
| Lambda | Yes (lambda target) | No (not directly) | Yes (Lambda proxy) |
| Another ALB | No | Yes (ALB as target) | No |
| AWS service (DynamoDB/SQS) | No | No | Yes (service integration) |
| On-prem / external HTTP | Via ip + Direct Connect | Via ip | Yes (HTTP_PROXY) |

Deep dive — Application Load Balancer (L7)

ALB is the HTTP workhorse. It terminates the client TCP/TLS connection, parses the HTTP request, evaluates listener rules in priority order, and forwards to the matching target group. Because it understands HTTP, it can do everything content-based: route /api/* to one fleet and /static/* to another, send shop.example.com and admin.example.com to different targets on the same listener, weight traffic for blue/green, and redirect or return fixed responses without ever touching a backend.

Create one with the CLI — a load balancer, a target group, a health check, and an HTTPS listener:

# 1) Create the ALB across two public subnets, attach a security group
aws elbv2 create-load-balancer \
  --name alb-shop-prod --type application \
  --subnets subnet-aaa subnet-bbb --security-groups sg-alb \
  --scheme internet-facing

# 2) Create a target group with an HTTP health check on /healthz
aws elbv2 create-target-group \
  --name tg-shop-web --protocol HTTP --port 8080 --vpc-id vpc-123 \
  --health-check-path /healthz --health-check-protocol HTTP \
  --matcher HttpCode=200 --healthy-threshold-count 3 --unhealthy-threshold-count 3

# 3) HTTPS listener with an ACM cert, default action forwards to the TG
aws elbv2 create-listener \
  --load-balancer-arn <alb-arn> --protocol HTTPS --port 443 \
  --certificates CertificateArn=<acm-arn> --ssl-policy ELBSecurityPolicy-TLS13-1-2-2021-06 \
  --default-actions Type=forward,TargetGroupArn=<tg-arn>

The same in Terraform — declarative, reviewable, the production default:

resource "aws_lb" "shop" {
  name               = "alb-shop-prod"
  load_balancer_type = "application"
  subnets            = [aws_subnet.public_a.id, aws_subnet.public_b.id]
  security_groups    = [aws_security_group.alb.id]
}

resource "aws_lb_target_group" "web" {
  name        = "tg-shop-web"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = aws_vpc.main.id
  target_type = "ip"
  health_check {
    path                = "/healthz"
    matcher             = "200"
    healthy_threshold   = 3
    unhealthy_threshold = 3
    interval            = 15
  }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.shop.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = aws_acm_certificate.shop.arn
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.web.arn
  }
}

ALB routing rules — what you can match on

Listener rules are evaluated by priority (lowest number first); the first match wins, and a default action catches everything else. Each rule’s condition can combine multiple match types. The full menu:

| Condition type | Matches on | Example | Common use |
|---|---|---|---|
| path-pattern | URL path | /api/* | Split API vs static vs UI |
| host-header | Host header | admin.example.com | Multi-tenant / subdomain routing |
| http-header | Any request header | X-Channel: mobile | Channel / client-type routing |
| http-request-method | HTTP verb | POST | Send writes to a different fleet |
| query-string | Query key/value | version=beta | Feature/beta routing |
| source-ip | Client CIDR | 203.0.113.0/24 | Partner/office-only paths |
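The priority-ordered, first-match-wins evaluation described above can be modelled in a few lines. This is a sketch of the semantics only, not of how ALB is implemented: the `Rule` class and `route` function are invented names, only two condition types are modelled, and `fnmatchcase` is used as a stand-in for ALB wildcard matching (it also treats `[` specially, which ALB patterns do not).

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Rule:
    priority: int
    target: str
    path_patterns: list = field(default_factory=list)
    host_headers: list = field(default_factory=list)

    def matches(self, host: str, path: str) -> bool:
        # An empty condition list means "no constraint of this type".
        path_ok = not self.path_patterns or any(
            fnmatchcase(path, p) for p in self.path_patterns)
        host_ok = not self.host_headers or any(
            fnmatchcase(host.lower(), h.lower()) for h in self.host_headers)
        return path_ok and host_ok  # all condition types on a rule must match


def route(rules: list, host: str, path: str, default_target: str) -> str:
    """Evaluate rules lowest priority number first; the first match wins."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(host, path):
            return rule.target
    return default_target


rules = [
    Rule(priority=20, target="tg-admin", host_headers=["admin.example.com"]),
    Rule(priority=10, target="tg-api", path_patterns=["/api/*"]),
]
# Both rules match this request; priority 10 is evaluated first and wins.
print(route(rules, "admin.example.com", "/api/v1/users", "tg-web"))
```

The practical lesson is that rule order is part of the routing logic: a broad rule with a low priority number silently shadows every narrower rule behind it.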

And the actions a rule can take — not every action is “forward to a target”:

| Action | What it does | Needs a target? | Use |
|---|---|---|---|
| forward | Send to one or more target groups (weighted) | Yes | Normal routing; blue/green weights |
| redirect | 301/302 to another URL/scheme | No | HTTP→HTTPS; domain moves |
| fixed-response | Return a status + body directly | No | Maintenance page; block path |
| authenticate-oidc | Run an OIDC auth flow first | Yes (then forward) | Gate an app behind SSO |
| authenticate-cognito | Cognito user-pool auth first | Yes (then forward) | Login wall via Cognito |

A redirect-everything-to-HTTPS listener, which every public ALB should have on port 80:

aws elbv2 create-listener --load-balancer-arn <alb-arn> \
  --protocol HTTP --port 80 \
  --default-actions '[{"Type":"redirect","RedirectConfig":{"Protocol":"HTTPS","Port":"443","StatusCode":"HTTP_301"}}]'

ALB target-group settings — the knobs that decide health and stickiness

The target group is where health and session behaviour live. Get these wrong and the ALB either won’t send traffic (failing health checks) or sends it unevenly (bad stickiness). Every setting that matters:

| Setting | What it does | Default | When to change | Gotcha |
|---|---|---|---|---|
| target-type | instance / ip / lambda / alb | instance | ip for Fargate/ENI, lambda for serverless | ip needs SG to allow the LB subnet CIDRs |
| health-check-path | Path probed for health | / | Always point at a fast, cheap /healthz | / is often slow or auth-walled → false unhealthy |
| health-check-protocol | HTTP / HTTPS | HTTP | HTTPS if the target only speaks TLS | Cert/SNI must match or probe fails |
| matcher (HttpCode) | Success status range | 200 | Widen to 200-299 if app returns 204/206 | A 301 on / reads as unhealthy |
| healthy-threshold-count | Consecutive passes to mark healthy | 5 (ALB) / 3 | Lower for faster recovery | Too low flaps on transient blips |
| unhealthy-threshold-count | Consecutive fails to mark unhealthy | 2 | Higher to ride out brief blips | Too low evicts during GC pauses |
| interval | Seconds between probes | 30 | 10–15 for faster detection | Lower = more probe load on targets |
| timeout | Per-probe timeout | 5 s | Raise for slow health endpoints | Must be < interval |
| deregistration_delay | Connection-drain seconds | 300 | Lower (30–60) for fast deploys | Too low cuts in-flight requests |
| stickiness (duration cookie) | Pin client to a target | off | Legacy stateful apps only | Concentrates load; defeats even spread |
| slow_start | Ramp traffic to new targets | 0 (off) | 30–120 s for JIT-warming apps | Delays full use of new capacity |
| load_balancing.algorithm | round_robin / least_outstanding | round_robin | least_outstanding_requests for uneven request costs | Round-robin can overload slow targets |
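The two threshold settings are easiest to reason about as a small state machine: a target flips state only after enough consecutive probe results disagree with its current state. The simulation below is a sketch under that assumption, with invented function names; it ignores probe timeouts and the initial registration state, and the detection-time helper is a rough estimate rather than a guarantee.

```python
def simulate_health(probe_results, healthy_threshold=5, unhealthy_threshold=2,
                    initially_healthy=False):
    """Return the target's health state after each probe (True = healthy)."""
    healthy = initially_healthy
    streak = 0  # consecutive results that disagree with the current state
    states = []
    for passed in probe_results:
        if passed != healthy:
            streak += 1
            needed = healthy_threshold if passed else unhealthy_threshold
            if streak >= needed:
                healthy = passed
                streak = 0
        else:
            streak = 0  # an agreeing result resets the count
        states.append(healthy)
    return states


def approx_detection_seconds(interval, unhealthy_threshold):
    """Rough time for a dead target to be marked unhealthy."""
    return interval * unhealthy_threshold


# One failed probe is ridden out; two in a row evict the target.
print(simulate_health([False, True, False, False],
                      healthy_threshold=3, unhealthy_threshold=2,
                      initially_healthy=True))
print(approx_detection_seconds(interval=30, unhealthy_threshold=2))
```

With the defaults in the table (interval 30, unhealthy threshold 2) a dead target keeps receiving traffic for roughly a minute, which is why the table suggests a 10 to 15 second interval when you need faster detection.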

ALB and WebSocket / gRPC / HTTP2

ALB speaks HTTP/2 to clients and supports WebSocket (the Upgrade handshake passes through and the connection stays open) and native gRPC (set the target-group protocol version to GRPC and the matcher to gRPC status codes). The one thing to watch: WebSocket and other long-lived connections die at the idle timeout if they go quiet — raise it or send keepalives.

# Raise the ALB idle timeout to 4000s for long-lived WebSocket connections
aws elbv2 modify-load-balancer-attributes --load-balancer-arn <alb-arn> \
  --attributes Key=idle_timeout.timeout_seconds,Value=4000

| Protocol need | ALB support | What to set | Watch-out |
|---|---|---|---|
| HTTP/1.1 | Native | Nothing | |
| HTTP/2 (client) | Native | On by default to client | Backend leg is HTTP/1.1 |
| WebSocket | Yes | Raise idle_timeout | Quiet sockets cut at timeout |
| gRPC | Yes | TG protocol version GRPC | Matcher uses gRPC status codes |
| Server-Sent Events | Yes | Raise idle_timeout | Same idle-cut risk as WebSocket |

Deep dive — Network Load Balancer (L4)

NLB operates at layer 4: it picks a target by a flow hash (source IP/port, dest IP/port, protocol) and forwards the TCP/UDP segments without parsing anything above L4. That makes it astonishingly fast (sub-millisecond added latency), able to handle tens of millions of flows, and the only ELB that speaks UDP. It also gives you a static IP per AZ (attach an Elastic IP), which is the reason allow-list-driven and DNS-pinned clients use it.
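The flow-hash idea can be illustrated with a toy selector: hash the connection tuple and use the result to index into the list of healthy targets, so every packet of one flow lands on the same target. This is an illustration of the concept only. The function name is invented, SHA-256 is an arbitrary choice, and it makes no claim about the hash NLB actually uses.

```python
import hashlib


def pick_target(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                protocol: str, targets: list) -> str:
    """Map a flow's 5-tuple to one target, deterministically."""
    if not targets:
        raise ValueError("no healthy targets to choose from")
    key = f"{protocol}|{src_ip}|{src_port}|{dst_ip}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(targets)
    return targets[index]


targets = ["10.0.1.10", "10.0.2.10", "10.0.3.10"]
first = pick_target("203.0.113.50", 51000, "198.51.100.5", 443, "tcp", targets)
again = pick_target("203.0.113.50", 51000, "198.51.100.5", 443, "tcp", targets)
print(first == again)  # the same flow always maps to the same target
```

Because the target is chosen per flow, not per request, a single long-lived connection never rebalances: one heavy client stays pinned to one target for the life of the connection.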

# NLB with a TCP listener on 443, forwarding to an IP target group
aws elbv2 create-load-balancer --name nlb-game-prod --type network \
  --subnets subnet-aaa subnet-bbb --scheme internet-facing

aws elbv2 create-target-group --name tg-game-tcp \
  --protocol TCP --port 7777 --vpc-id vpc-123 \
  --health-check-protocol TCP --healthy-threshold-count 3

aws elbv2 create-listener --load-balancer-arn <nlb-arn> \
  --protocol TCP --port 443 \
  --default-actions Type=forward,TargetGroupArn=<tg-arn>

The same in Terraform:

resource "aws_lb" "game" {
  name                             = "nlb-game-prod"
  load_balancer_type               = "network"
  subnets                          = [aws_subnet.public_a.id, aws_subnet.public_b.id]
  enable_cross_zone_load_balancing = true   # off by default on NLB — and AZ data charges apply
}

resource "aws_lb_target_group" "game" {
  name        = "tg-game-tcp"
  port        = 7777
  protocol    = "TCP"
  vpc_id      = aws_vpc.main.id
  target_type = "ip"
  health_check {
    protocol          = "TCP"
    healthy_threshold = 3
    interval          = 10
  }
}

NLB protocols and listeners

NLB listeners speak more than TCP — and the protocol you pick decides health-check options and TLS behaviour:

| Listener protocol | Carries | TLS handling | Health-check options | Use |
|---|---|---|---|---|
| TCP | Any TCP stream | Passthrough (encrypted to target) | TCP, HTTP, HTTPS | Databases, game servers, MQTT, SSH |
| UDP | UDP datagrams | N/A | TCP/HTTP on a side port | Voice, DNS, syslog, game telemetry |
| TCP_UDP | Both on one port | N/A / passthrough | TCP/HTTP | Protocols using both (e.g. DNS) |
| TLS | TLS-terminated TCP | Terminated at NLB (ACM cert) | TCP, HTTP, HTTPS | Offload TLS at L4 but keep static IP |

NLB attributes — source IP, cross-zone, and the timeout that bites

NLB’s defaults differ from ALB’s in ways that catch people. The three that matter most: source-IP preservation (on by default for instance targets and for UDP/TCP_UDP IP targets, off by default for TCP/TLS IP targets; when it is on, your SGs must allow the client, not the LB), cross-zone load balancing (OFF by default, unlike ALB where it’s on, and turning it on incurs inter-AZ data charges), and the 350-second default TCP idle timeout (adjustable per TCP listener between 60 and 6000 seconds on current NLBs; long-lived quiet connections reset once it expires). The full attribute set:

| Attribute / behaviour | Default | What it controls | When to change | Gotcha |
|---|---|---|---|---|
| Client IP preservation | On (instance targets; UDP/TCP_UDP IP targets); off for TCP/TLS IP targets | Target sees real client IP | Off only if targets can’t handle it | SGs must allow client CIDRs, not LB |
| Cross-zone load balancing | Off | Spread across all AZ targets | On for even distribution | Inter-AZ data transfer cost when on |
| TCP idle timeout | 350 s | Reset idle TCP flows | Raise (up to 6000 s on a TCP listener) for long-lived quiet connections, or send keepalives | gRPC/DB/SSH drop if quiet longer than the timeout |
| deregistration_delay | 300 s | Connection drain on deregister | Lower for fast deploys | Cuts in-flight if too low |
| proxy_protocol_v2 | Off | Prepend client info header | On when behind another proxy | Target must parse PROXY v2 |
| TLS ssl_policy (TLS listener) | Recent default | Cipher/protocol set | Tighten to TLS 1.2+/1.3 | Old clients may fail handshake |
| Health-check interval | 10 s (TCP) / 30 (HTTP) | Probe frequency | Lower for faster failover | More probe load |
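One way to design around the 350-second idle timeout is to have the client (or the target) send TCP keepalive probes well before the timer expires, since keepalives reset it. The sketch below enables keepalives on a socket. The helper name and the 120/30/3 values are illustrative choices, and the `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` options are platform-dependent (present on Linux), so each is guarded with `hasattr`.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_seconds: int = 120,
                     interval_seconds: int = 30, probes: int = 3) -> None:
    """Turn on TCP keepalives so a quiet connection still produces traffic."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Fine-grained timers are not available on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_seconds)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)  # first probe after 120 s of silence, well under 350 s
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

Most database drivers and gRPC libraries expose their own keepalive settings; prefer those where they exist, and treat the raw socket options as the fallback.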

A worked source-IP example: with preservation on, a game client at 203.0.113.50 connecting through the NLB makes the EC2 target see 203.0.113.50 directly — so the instance security group must Allow TCP 7777 from 0.0.0.0/0 (or the client ranges), not from the NLB. People allow the NLB’s ENI and then wonder why every connection is refused. That’s the source-IP model working exactly as designed.
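The refused-connection surprise is easy to reproduce on paper. The sketch below checks a client address against a list of allowed CIDRs the way a security-group ingress rule would. The `is_allowed` helper and the 10.0.1.0/24 subnet standing in for the NLB's subnet are assumptions for illustration; real security groups also match on port and protocol.

```python
from ipaddress import ip_address, ip_network


def is_allowed(client_ip: str, allowed_cidrs: list) -> bool:
    """True if any ingress CIDR contains the address the target actually sees."""
    addr = ip_address(client_ip)
    return any(addr in ip_network(cidr) for cidr in allowed_cidrs)


client = "203.0.113.50"  # with preservation on, this is what the target sees

# Rule written for the NLB's subnet (hypothetical 10.0.1.0/24): refused.
print(is_allowed(client, ["10.0.1.0/24"]))

# Rule written for the client range: accepted.
print(is_allowed(client, ["203.0.113.0/24"]))
```

The same check explains the opposite failure: turn preservation off and the target sees the NLB's private addresses, so a rule written only for client ranges stops matching.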

# Enable cross-zone LB (off by default on NLB) and check it
aws elbv2 modify-load-balancer-attributes --load-balancer-arn <nlb-arn> \
  --attributes Key=load_balancing.cross_zone.enabled,Value=true

ALB as an NLB target: the “best of both” pattern

A genuinely useful trick: register an ALB as a target of an NLB. You get the NLB’s static IP and the ALB’s L7 routing together — common when a client (or a third party) demands a fixed IP to allow-list but you still need path-based routing and WAF behind it. The NLB forwards TCP 443 to the ALB; the ALB does the HTTP work.

| Goal | Pattern | What each layer gives |
|---|---|---|
| Static IP + L7 routing | NLB → ALB target | NLB: fixed EIP; ALB: host/path/WAF |
| Static IP + raw TCP | NLB → instances | NLB: fixed EIP + low latency |
| Global static anycast IP | Global Accelerator → ALB/NLB | GA: 2 anycast IPs, edge entry |
| L7 routing only | ALB → instances | ALB: full content routing |

Deep dive — API Gateway (managed API front door)

API Gateway is not a load balancer — it’s an API-management product. It accepts requests, authorizes them (Cognito, IAM, Lambda authorizer, or JWT), throttles them per usage plan, optionally caches responses, maps request/response shapes, and integrates with a backend (Lambda, an HTTP endpoint, an AWS service directly, or a private VPC resource via a VPC link). It comes in three flavours — REST, HTTP and WebSocket — and choosing the right flavour is itself a decision.

REST vs HTTP vs WebSocket APIs

The newer HTTP API is cheaper and lower-latency but has fewer features than the older REST API. The full comparison:

| Capability | REST API | HTTP API | WebSocket API |
|---|---|---|---|
| Primary use | Full-feature managed REST | Lean, cheap proxy to Lambda/HTTP | Bidirectional realtime |
| Price (per million) | Higher (~3.5× HTTP) | Lowest | Per message + connection-minute |
| Latency overhead | Higher | Lower | Per-message |
| Authorizers | IAM, Cognito, Lambda | JWT, Lambda, IAM | Lambda (on connect) |
| API keys + usage plans | Yes | No (limited) | No |
| Request/response mapping | Full (VTL) | Minimal | Route-based |
| Caching | Yes (per stage) | No | No |
| WAF | Yes | No | No |
| Private (VPC) integration | VPC link (NLB) | VPC link (ALB/NLB) | |
| Edge/Regional/Private endpoint | All three | Regional | Regional |
| Choose when | You need keys, caching, WAF, mapping | Simple, high-volume, cost-sensitive | Chat, notifications, live data |

Create a simple HTTP API fronting a Lambda with the CLI:

# HTTP API with a Lambda proxy integration and a default stage with auto-deploy
aws apigatewayv2 create-api --name api-orders \
  --protocol-type HTTP --target arn:aws:lambda:ap-south-1:111122223333:function:orders

A REST API with a usage plan and throttling, in Terraform:

resource "aws_api_gateway_rest_api" "orders" {
  name = "orders-api"
}

resource "aws_api_gateway_usage_plan" "partners" {
  name = "partners"
  api_stages {
    api_id = aws_api_gateway_rest_api.orders.id
    stage  = aws_api_gateway_stage.prod.stage_name
  }
  throttle_settings {
    rate_limit  = 200   # steady-state requests/second for this plan
    burst_limit = 400   # bucket size for spikes
  }
  quota_settings {
    limit  = 1000000    # requests
    period = "MONTH"
  }
}

resource "aws_api_gateway_api_key" "partner_acme" {
  name = "acme"
}

resource "aws_api_gateway_usage_plan_key" "acme" {
  key_id        = aws_api_gateway_api_key.partner_acme.id
  key_type      = "API_KEY"
  usage_plan_id = aws_api_gateway_usage_plan.partners.id
}

API Gateway authorizers — gating who gets in

Authorization is the headline reason to choose API Gateway over an ALB. The options, what they check, and when each fits:

| Authorizer type | Checks | Best for | Note |
|---|---|---|---|
| NONE | Nothing (open) | Public read endpoints | Combine with API key + throttle |
| API key | A key in x-api-key | Partner identification + usage plans | Not auth; identity for metering only |
| IAM | SigV4-signed requests | Service-to-service, internal | Caller needs AWS creds |
| Cognito | Cognito user-pool JWT | End-user web/mobile auth | Native user pools |
| Lambda (token) | Custom logic on a bearer token | Bespoke / third-party IdP | You write the verify logic |
| Lambda (request) | Custom logic on full request | Header/query/context-based auth | Most flexible; cache the result |
| JWT (HTTP API) | OAuth2/OIDC JWT claims | Standard OIDC providers | HTTP API only; no Lambda needed |

API Gateway throttling and the 429

Throttling is layered, and a 429 can come from any layer. From most-specific to least: per-method limits, the usage-plan rate/burst for the caller’s key, the stage default, and finally the account-level ceiling (default 10,000 rps + 5,000 burst per Region). The first ceiling hit wins.

| Throttle scope | Default | Configurable? | Returns | How to confirm |
| --- | --- | --- | --- | --- |
| Account (Region) | 10,000 rps / 5,000 burst | Via quota increase | 429 | Service Quotas console |
| Stage default | Inherits account | Yes | 429 | Stage → Default Route Throttling |
| Per-method | Inherits stage | Yes | 429 | Method throttling settings |
| Usage plan (per key) | Plan rate/burst | Yes | 429 | Usage plan → throttle |
| Per-client quota | Plan quota (e.g. /month) | Yes | 429 (quota) | Usage plan → quota |
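The rate/burst pair in every row above follows token-bucket semantics, which is easier to see in a simulation than in a table. The sketch below is a simplified model, not API Gateway's implementation: the function name token_bucket and the sample arrival times are illustrative assumptions, and a real gateway tracks buckets per scope.

```shell
# Simplified token bucket: refills RATE tokens per second, capped at BURST.
# Reads request arrival times (seconds, ascending, one per line) on stdin and
# prints the status each request would get: 200 if a token is available, else 429.
token_bucket() {
  awk -v rate="$1" -v burst="$2" '
    BEGIN { tokens = burst; last = 0 }
    {
      tokens += ($1 - last) * rate          # refill for the time elapsed
      if (tokens > burst) tokens = burst    # the bucket never holds more than BURST
      last = $1
      if (tokens >= 1) { tokens -= 1; print 200 } else { print 429 }
    }'
}

# Three simultaneous requests against rate=4, burst=2, then two later ones
printf '%s\n' 0 0 0 0.25 1 | token_bucket 4 2 | paste -s -d ' ' -
```

The third simultaneous request is rejected because the bucket holds only two tokens; a quarter of a second later one token has refilled, which is why a short pause is often enough to clear a burst-limit 429.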

API Gateway caching, mapping and integration types

REST APIs can cache responses per stage (sized 0.5 GB–237 GB) with a TTL, cutting backend load and latency for read-heavy endpoints. Mapping templates (VTL) reshape requests/responses. And the integration type decides what sits behind the gateway:

| Integration type | Backend | Use | Limit to know |
| --- | --- | --- | --- |
| AWS_PROXY (Lambda proxy) | Lambda | Most serverless APIs | 6 MB sync payload; 29 s timeout |
| AWS (Lambda non-proxy) | Lambda | When you need request mapping | Same limits + VTL effort |
| HTTP_PROXY | Any HTTP endpoint | Front an existing service | 29 s timeout |
| HTTP | HTTP endpoint + mapping | Reshape to a legacy API | 29 s timeout |
| MOCK | None | Stubs, CORS preflight | Returns canned response |
| AWS (service integration) | DynamoDB/SQS/etc directly | Skip Lambda for simple ops | Per-service quotas |
| Private (VPC link) | NLB (REST) / ALB or NLB (HTTP) | Reach private VPC backends | Needs the link + target |

The error & limit reference

The lookup table you scan first during an incident: the status codes and the hard limits you realistically hit across all three front doors, what each means on AWS specifically, how to confirm, and the fix.

Status / error codes

| Code | Source | Meaning on AWS | Likely cause | How to confirm | First fix |
| --- | --- | --- | --- | --- | --- |
| 502 Bad Gateway | ALB / API GW | Bad/no answer from target | Target crashed, wrong port, Lambda error, bad response format | ALB access logs elb_status_code=502; target health | Fix target; align port/health; check Lambda |
| 503 Service Unavailable | ALB | No healthy target to send to | All targets unhealthy; no target registered in the AZ | TargetGroup → Targets unhealthy; HealthyHostCount=0 | Fix health check; register targets per AZ |
| 504 Gateway Timeout | ALB / API GW | Backend too slow | Target slower than idle timeout; API GW 29 s integration cap | ALB target_processing_time; APIGW IntegrationLatency | Speed up backend; raise ALB idle timeout |
| 460 | ALB | Client closed connection before response | Client timeout/abort | ALB access log code 460 | Client-side; usually benign |
| 463 | ALB | X-Forwarded-For had too many IPs | Malformed XFF chain | ALB access log code 463 | Fix upstream proxy XFF handling |
| 429 Too Many Requests | API GW | Throttled | Account/stage/method/usage-plan limit hit | CloudWatch 4XXError plus access logs filtered on status 429; Service Quotas | Raise throttle/quota; cache; request increase |
| 403 Forbidden | API GW / WAF | Authorizer denied or WAF blocked | Bad token, missing key, WAF rule | APIGW execution logs; WAF sampled requests | Fix token/key; tune WAF rule |
| 413 Payload Too Large | API GW | Request body over limit | > 10 MB (REST) / 6 MB (Lambda sync) | Request size vs limit | Use multipart/S3 presigned upload |
| 401 Unauthorized | API GW | Auth required / failed | Missing/expired credentials | Authorizer logs | Present valid credentials |
| 500 Internal Server Error | API GW | Gateway/integration error | Mapping template error; integration failure | APIGW execution logs (/aws/apigateway) | Fix mapping/integration |
| Connection reset | NLB | TCP flow reset | Idle timeout exceeded (350 s by default) | Target sees no FIN; flow idle longer than the timeout | TCP keepalives below the idle timeout |
| Connection refused / timed out | NLB | SG blocked the client | SG allows LB instead of client (source-IP preserved) | VPC Flow Logs REJECT on target ENI | Allow client CIDRs on target SG |

Hard limits & quotas

The numbers that shape designs — and that you cannot wish away:

| Limit | ALB | NLB | API Gateway | Note |
| --- | --- | --- | --- | --- |
| Idle / connection timeout | 60 s (configurable) | 350 s TCP default (60–6000 s on TCP listeners) | 29 s integration default | Design for keepalives rather than relying on a longer timeout |
| Max request/payload | No fixed body cap (streaming) | N/A (L4) | 10 MB REST / 6 MB Lambda sync | Use S3 for big uploads |
| Targets per target group | 1,000 (default) | 1,000 (default) | N/A | Soft quota; raise via Support |
| Rules per ALB | 100 (default) | N/A | N/A | Soft quota |
| Certificates per ALB | 25 (default) | 25 (TLS) | per-domain | SNI multi-cert |
| Default request rate | (LCU-bound) | (NLCU-bound) | 10,000 rps + 5,000 burst | API GW account-level |
| Static IPs | None | 1 EIP per AZ | None | Use NLB/Global Accelerator |
| Cross-zone LB default | On | Off | N/A | NLB opt-in costs inter-AZ data |
| WAF support | Yes | No | REST only | L7 only |
| Max APIs / resources | N/A | N/A | 600 APIs; 300 resources/API | Soft quotas |
| Lambda integration timeout | N/A | N/A | 29 s default (Regional/private REST APIs can request more) | Long jobs → async pattern |

Three reading notes that save the most time:

| Distinction | The trap | How to tell them apart |
| --- | --- | --- |
| ALB 502 vs 503 | Both look like “LB broken” | 502 = a target answered badly; 503 = no healthy target to answer |
| API GW 429 (account) vs (usage plan) | Hours tuning the wrong throttle | If only one key 429s → usage plan; if all callers 429 → account/stage |
| NLB “refused” vs “reset” | Different root causes | Refused at connect = SG/source-IP; reset mid-flow = 350 s idle timeout |

Architecture at a glance

The diagram traces a single request from the client and shows the three front doors side by side as the decision tier, then the shared compute and observability behind them. Read it left to right. A client (web, mobile or IoT) arrives at the edge, optionally through CloudFront for CDN and edge TLS, with AWS WAF available. Note immediately that WAF attaches only to the L7 paths (ALB, API Gateway, CloudFront), never to NLB. From the edge the request lands on exactly one of three entry points chosen by protocol and feature need: the ALB path terminates HTTP/HTTPS and routes by host/path/header into a target group with HTTP health checks; the NLB path forwards raw TCP/UDP with a static EIP per AZ, preserves the source IP, and carries a 350-second TCP idle timeout by default; the API Gateway path is the managed front door adding an authorizer, usage-plan throttling and caching in front of the same backends. All three converge on shared compute (EC2, ECS, EKS or Lambda, including IP targets) and emit their own access logs and CloudWatch metrics (5xx rate, target response time, throttle count).

Notice what each numbered badge marks: it is the decision or failure point that bites on that path. Badge 1 sits on the ALB target group — the unhealthy-target 502/503 that is the single most common ALB incident. Badge 2 sits on the NLB flow — the 350-second idle reset that silently kills long-lived gRPC and database connections. Badge 3 sits on API Gateway — the 429 when a throttle ceiling is hit. Badge 4 is the architecture-level trap: the wrong front door for the protocol (UDP on ALB, no rate-limit on a partner API, an oversized payload). Badge 5 is WAF on the wrong layer. The first question on any new workload is the one this diagram is built around: which layer does the entry point need to work at? — and the column you land in tells you which service, which limits, and which failures to expect.

AWS entry-point architecture comparing ALB, NLB and API Gateway: a client at the edge (optionally via CloudFront with AWS WAF, which attaches only to L7) routing to one of three front doors — an Application Load Balancer terminating HTTP/HTTPS and routing by host/path/header to a target group with HTTP health checks (badge 1: unhealthy-target 502/503), a Network Load Balancer forwarding raw TCP/UDP with a static EIP per AZ, source-IP preservation and a fixed 350-second idle timeout (badge 2: idle-reset of long-lived flows), and API Gateway as a managed REST/HTTP/WebSocket front door with an authorizer, usage-plan throttling and caching (badge 3: 429 throttling) — all three forwarding to shared compute (EC2, ECS, EKS, Lambda, IP targets) and emitting access logs plus CloudWatch metrics, with badge 4 marking the wrong-front-door-for-the-protocol trap and badge 5 marking WAF attached to the wrong layer

Real-world scenario

Vyana Games runs a multiplayer mobile title out of the Mumbai (ap-south-1) region: a Unity client, a fleet of authoritative game servers on EC2 (UDP), a set of stateless HTTP microservices on ECS Fargate (matchmaking, profile, store), a realtime chat channel, and a partner leaderboard API consumed by three esports websites. The platform team is five engineers; the monthly AWS spend across these front doors started at about ₹95,000 and was rising fast for reasons nobody could pin down.

The original design was the classic anti-pattern: one ALB for everything. UDP voice and game traffic were tunnelled over a WebSocket shim through the ALB because “the ALB was already there,” which added 30–60 ms of jitter and made the game feel laggy in 5v5 matches. The chat channel rode WebSocket on the same ALB and kept dropping connections — players had to reconnect every few minutes. The partner leaderboard API was a plain ALB target group with no throttling; when one esports site deployed a buggy poller that hammered the endpoint 50× normal, it starved the Fargate fleet and matchmaking timed out for everyone. And the bill: the ALB’s LCU charges were climbing because the WebSocket-tunnelled UDP traffic generated enormous connection churn.

The breakthrough was a whiteboard session that asked one question per workload: what layer does this actually need? Voice and game traffic are UDP and latency-critical: that’s L4, NLB, full stop, with source-IP preservation so the game servers see real client IPs for anti-cheat. The HTTP microservices need host/path routing and WAF: that’s L7, ALB. Chat is realtime and bidirectional, so they moved it off the ALB to API Gateway WebSocket for managed scale and connection handling. The partner leaderboard needs governance (API keys, per-partner throttling, a usage plan, and caching for the read-heavy leaderboard): that’s API Gateway REST, no question.

The migration ran over three weeks. They stood up an NLB with Elastic IPs per AZ for the game/voice servers (the esports partners could now allow-list a stable IP, a bonus they hadn’t planned for), preserving the client source IP so the anti-cheat allow-lists worked. They moved the HTTP services behind a dedicated ALB with path rules (/match/*, /store/*, /profile/*) and put WAF in front to block the credential-stuffing they’d been seeing on login. They built the partner leaderboard as an API Gateway REST API with a usage plan per partner (200 rps steady, 400 burst, 1M/month quota), API keys so they could identify and rate-limit each integrator independently, and a 0.5 GB stage cache with a 30-second TTL on the leaderboard GET — which cut backend calls by ~80% and dropped p95 latency from 180 ms to 22 ms.

The results were unambiguous. Voice jitter fell from 30–60 ms to under 5 ms once UDP rode the NLB at L4 instead of being tunnelled through L7. The buggy-partner incident became a non-event: the offending key simply hit its 429 ceiling and got throttled in isolation, while every other caller and the game itself were untouched. Login credential-stuffing dropped off once WAF was in the path. And the bill fell to about ₹71,000 — the NLB is far cheaper than the ALB was for that connection-churny traffic, and the API Gateway cache slashed Lambda/Fargate invocations. The lesson the team wrote on the wall: “One load balancer for everything is one bug for everything. Pick the layer per workload.”

The migration as a before/after, because the mapping is the lesson:

| Workload | Before (one ALB) | After (right layer) | Why it’s better |
| --- | --- | --- | --- |
| Voice / game (UDP) | Tunnelled over WebSocket on ALB | NLB TCP/UDP, EIP per AZ | L4 latency; source IP for anti-cheat |
| HTTP microservices | Same ALB, mixed in | Dedicated ALB + path rules + WAF | Clean routing; WAF on logins |
| Realtime chat | WebSocket on ALB (dropping) | API Gateway WebSocket | Managed connections at scale |
| Partner leaderboard | ALB TG, no governance | API Gateway REST + usage plans + cache | Per-partner throttle; 80% fewer backend calls |
| Cost | ~₹95,000 rising | ~₹71,000 | NLB cheaper for churn; cache cuts invocations |

Advantages and disadvantages

No single front door wins on every axis — that’s the whole point. Weigh them honestly:

| Advantages | Disadvantages |
| --- | --- |
| ALB: richest L7 routing (host/path/header/method), native ECS/EKS/Lambda targets, WAF, OIDC/Cognito auth actions, gRPC/WebSocket | ALB: HTTP/HTTPS only, so useless for UDP or raw TCP; higher latency than NLB; LCU cost can climb with connection churn |
| NLB: sub-millisecond latency, tens of millions of flows, static EIP per AZ, UDP support, source-IP preservation, cheap for steady throughput | NLB: no L7 routing, no WAF, no header awareness; 350 s default TCP idle timeout that silently reaps quiet flows; cross-zone off by default (and costs data when on) |
| API Gateway: full API governance (keys, usage plans, throttling, authorizers, caching, mappings, stages, developer portal) with almost no code | API Gateway: per-request pricing punishes high volume; tens-of-ms latency overhead; 29 s integration timeout; 10 MB/6 MB payload caps; more moving parts |
| All three scale automatically and integrate with CloudWatch access logs and metrics | All three add a hop you must health-check, log and reason about; the wrong choice silently taxes latency, features or cost |
| ALB + NLB price on capacity units, which is very cheap for steady high-throughput traffic | API Gateway can be 10–40× the cost of an ALB for the same chatty internal traffic |
| NLB → ALB pattern gives static IP and L7 routing together | CLB (legacy) and GWLB (appliance-only) are easy to pick by mistake for the wrong reason |

When each matters: choose ALB when the entry point must read HTTP and route on it (almost all web/microservice traffic), and you want WAF and container-native targets. Choose NLB when latency, raw throughput, UDP, or a static allow-listable IP dominate — gaming, IoT, financial feeds, database proxies. Choose API Gateway when you’re publishing an API to others and need to govern it (auth, per-caller throttling, keys, caching) more than you need raw throughput, and the volume is low-to-medium enough that per-request pricing stays sane. The disadvantages are all predictable — UDP-on-ALB latency, NLB idle resets, API-Gateway cost-at-scale — which is exactly why naming the layer first prevents every one of them.

Hands-on lab

Stand up all three front doors in front of a trivial backend, observe how each behaves, and tear it all down — free-tier-friendly where possible (the load balancers and API Gateway have small hourly/request costs; we delete everything at the end). Run in a shell with the aws CLI configured for a sandbox account in ap-south-1. Replace the placeholder IDs with your VPC/subnet IDs.

Step 1 — Variables and a target instance. Use an existing VPC with two public subnets, or create them first.

export VPC=vpc-0123456789abcdef0
export SUB_A=subnet-0aaa SUB_B=subnet-0bbb
export REGION=ap-south-1
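# A tiny instance running a web server on :8080 (or reuse one you have)
# Pass user data as plain text: the CLI base64-encodes it for you, so a
# pre-encoded string would be encoded twice and never run as a script
printf '#!/bin/bash\npython3 -m http.server 8080\n' > userdata.sh
aws ec2 run-instances --image-id ami-0xxxx --instance-type t3.micro \
  --subnet-id $SUB_A --associate-public-ip-address \
  --user-data file://userdata.sh \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=lab-backend}]'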

Expected: an instance ID; note it and its private IP.

Step 2 — Create an ALB with an HTTP health check and watch the target go healthy.

ALB_ARN=$(aws elbv2 create-load-balancer --name lab-alb --type application \
  --subnets $SUB_A $SUB_B --query 'LoadBalancers[0].LoadBalancerArn' --output text)
TG_ARN=$(aws elbv2 create-target-group --name lab-tg --protocol HTTP --port 8080 \
  --vpc-id $VPC --health-check-path / --query 'TargetGroups[0].TargetGroupArn' --output text)
aws elbv2 register-targets --target-group-arn $TG_ARN --targets Id=<instance-id>
aws elbv2 create-listener --load-balancer-arn $ALB_ARN --protocol HTTP --port 80 \
  --default-actions Type=forward,TargetGroupArn=$TG_ARN
# Watch health flip from 'initial' to 'healthy'
aws elbv2 describe-target-health --target-group-arn $TG_ARN \
  --query 'TargetHealthDescriptions[].TargetHealth.State' --output text

Expected: the state moves initial → healthy within a couple of minutes. Curl the ALB DNS name and you get the Python server’s directory listing.

Step 3 — Break the health check on purpose, watch the 503. Point the health check at a path that 404s:

aws elbv2 modify-target-group --target-group-arn $TG_ARN --health-check-path /nope --matcher HttpCode=200
sleep 60
aws elbv2 describe-target-health --target-group-arn $TG_ARN \
  --query 'TargetHealthDescriptions[].TargetHealth.{state:State,reason:Reason}' --output json
# Now curl the ALB — you get a 503 because no target is healthy
curl -s -o /dev/null -w "%{http_code}\n" http://<alb-dns-name>/

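Expected: state unhealthy, reason Target.ResponseCodeMismatch, and the curl returns 503. If the curl still returns 200, that is the ALB failing open: when every registered target is unhealthy it may keep forwarding to them, so deregister the target (aws elbv2 deregister-targets --target-group-arn $TG_ARN --targets Id=<instance-id>) to see the 503 from an empty target group, then register it again. This is the single most common ALB incident, reproduced in one command. Restore it: aws elbv2 modify-target-group --target-group-arn $TG_ARN --health-check-path /.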

Step 4 — Create an NLB and observe the static-IP / source-IP behaviour.

NLB_ARN=$(aws elbv2 create-load-balancer --name lab-nlb --type network \
  --subnets $SUB_A $SUB_B --query 'LoadBalancers[0].LoadBalancerArn' --output text)
NTG_ARN=$(aws elbv2 create-target-group --name lab-ntg --protocol TCP --port 8080 \
  --vpc-id $VPC --query 'TargetGroups[0].TargetGroupArn' --output text)
aws elbv2 register-targets --target-group-arn $NTG_ARN --targets Id=<instance-id>
aws elbv2 create-listener --load-balancer-arn $NLB_ARN --protocol TCP --port 80 \
  --default-actions Type=forward,TargetGroupArn=$NTG_ARN
# The NLB preserves your source IP — the instance SG must allow YOUR client, not the NLB

Curl the NLB DNS name; if the connection is refused or simply hangs until it times out (security groups drop packets silently), that’s the source-IP preservation lesson: open the instance security group to your client CIDR on 8080, not the NLB. Fix it and the curl succeeds.
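The security-group change for this step could look like the sketch below. The helper names to_cidr32 and allow_my_ip are hypothetical, the security group ID is a placeholder you supply, and the lookup assumes your client reaches AWS through the public IP that checkip.amazonaws.com reports.

```shell
# Hypothetical helper: check that the argument is a dotted-quad IPv4 address
# and print it as a /32 CIDR; returns non-zero for anything else.
to_cidr32() {
  printf '%s\n' "$1" | awk -F. '
    NF == 4 { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1
              print $0 "/32"; ok = 1 }
    END { exit !ok }'
}

# Allow only the caller's public IP on the backend port.
# $1 is the instance's security group ID (placeholder, not created by this lab).
allow_my_ip() {
  my_ip=$(curl -s https://checkip.amazonaws.com) || return 1
  cidr=$(to_cidr32 "$my_ip") || return 1
  aws ec2 authorize-security-group-ingress --group-id "$1" \
    --protocol tcp --port 8080 --cidr "$cidr"
}

# Usage: allow_my_ip sg-0123456789abcdef0
```

Scoping the rule to a /32 keeps the lab backend closed to everyone else; a public service would allow its real client ranges instead.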

Step 5 — Create an HTTP API Gateway in front of the ALB (HTTP_PROXY) and throttle it.

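API_ID=$(aws apigatewayv2 create-api --name lab-api --protocol-type HTTP \
  --query 'ApiId' --output text)
# Proxy every request to the ALB, then attach the integration to the default route
INTEG_ID=$(aws apigatewayv2 create-integration --api-id $API_ID \
  --integration-type HTTP_PROXY --integration-method ANY \
  --integration-uri http://<alb-dns-name>/ --payload-format-version 1.0 \
  --query 'IntegrationId' --output text)
aws apigatewayv2 create-route --api-id $API_ID --route-key '$default' \
  --target integrations/$INTEG_ID
aws apigatewayv2 create-stage --api-id $API_ID --stage-name '$default' --auto-deploy \
  --default-route-settings 'ThrottlingRateLimit=5,ThrottlingBurstLimit=2'
# Hammer it past 5 rps and watch some requests return 429
for i in $(seq 1 30); do curl -s -o /dev/null -w "%{http_code} " https://$API_ID.execute-api.$REGION.amazonaws.com/; done; echo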

Expected: a burst of 200s followed by several 429s once you exceed the 5 rps / 2-burst ceiling — the throttling lesson, reproduced.
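Thirty status codes on one line are hard to eyeball, so a small tally helps. The helper below is an illustrative assumption (tally_codes is not part of the lab); it only counts whatever whitespace-separated codes it is given.

```shell
# Count how many times each status code appears in whitespace-separated input
# and print "code count" pairs sorted by code.
tally_codes() {
  awk '{ for (i = 1; i <= NF; i++) n[$i]++ }
       END { for (c in n) print c, n[c] }' | sort
}

# Sample input shaped like the curl loop's output
printf '200 200 200 429 200 429 ' | tally_codes
```

Piping the curl loop from this step into tally_codes gives the 200-to-429 split directly; the exact ratio depends on how fast your client issues requests.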

Validation checklist. You created all three front doors, saw an ALB return 503 from a failed health check, learned the NLB source-IP model the hard way (connection refused until the SG allowed the client), and watched API Gateway return 429 when a throttle ceiling was crossed. Each maps to a real production incident:

| Step | What you did | What it proves | Real-world analogue |
| --- | --- | --- | --- |
| 2 | ALB + healthy target | The listener→TG→health-check chain | Standing up any web service |
| 3 | Break health check → 503 | Health check is the arbiter | The #1 ALB incident |
| 4 | NLB + source-IP refusal | NLB preserves the client IP | SG-vs-client confusion in prod |
| 5 | API GW throttle → 429 | Throttling is layered and real | Partner API rate-limit incidents |

Cleanup (stop the hourly/request charges).

aws elbv2 delete-listener --listener-arn <alb-listener-arn>
aws elbv2 delete-load-balancer --load-balancer-arn $ALB_ARN
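aws elbv2 delete-load-balancer --load-balancer-arn $NLB_ARN
# Deletion is asynchronous; target groups cannot be removed while a load balancer still references them
aws elbv2 wait load-balancers-deleted --load-balancer-arns $ALB_ARN $NLB_ARN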
aws elbv2 delete-target-group --target-group-arn $TG_ARN
aws elbv2 delete-target-group --target-group-arn $NTG_ARN
aws apigatewayv2 delete-api --api-id $API_ID
aws ec2 terminate-instances --instance-ids <instance-id>

Cost note. Two load balancers and an HTTP API for an hour, plus a t3.micro, is a few tens of rupees total; deleting everything stops it. ALB/NLB bill per hour + capacity unit even when idle, so don’t leave them running.

Common mistakes & troubleshooting

This is the playbook — the part you bookmark. First as a scannable table you can read mid-incident, then the entries that bite hardest with full confirm-command detail.

| # | Symptom | Root cause | Confirm (exact cmd / console path) | Fix |
| --- | --- | --- | --- | --- |
| 1 | ALB returns 503 though instances are running | All targets failing the health check | aws elbv2 describe-target-health → unhealthy; reason ResponseCodeMismatch/Timeout | Fix health path/port/matcher; open SG from ALB to target |
| 2 | ALB returns 502 intermittently | Target returns a malformed/empty HTTP response, or Lambda errored | ALB access logs elb_status_code=502, target_status_code=-; Lambda logs | Fix target response; for Lambda check the function error |
| 3 | New ECS/Fargate targets never go healthy | target-type=instance used for awsvpc/IP targets, or SG blocks LB subnet | TG shows targets stuck initial/unhealthy; check target_type | Use target-type=ip; allow LB subnet CIDRs on the task SG |
| 4 | NLB connections “refused” at connect | Source-IP preserved; SG allows the NLB, not the client | VPC Flow Logs on target ENI show REJECT from client IP | Allow the client CIDRs (not the NLB) on the target SG |
| 5 | gRPC/DB/SSH over NLB drops after ~6 min | TCP idle timeout exceeded (350 s by default) | Flow idle > 350 s; target never saw a FIN | TCP keepalives < 350 s on client and target |
| 6 | NLB traffic lands on only one AZ’s targets | Cross-zone LB off (NLB default) | describe-load-balancer-attributes shows cross-zone false | Enable cross-zone (accept inter-AZ data cost) |
| 7 | API Gateway callers get 429 | A throttle ceiling hit (account/stage/method/usage-plan) | CloudWatch 4XXError and access logs filtered on status 429; Service Quotas; usage-plan limits | Raise the relevant throttle/quota; cache; request account increase |
| 8 | API Gateway returns 504 / “Endpoint timed out” | Backend slower than the 29 s integration limit | APIGW IntegrationLatency near 29,000 ms | Speed up backend; make the call async (SQS/Step Functions) |
| 9 | API Gateway returns 413 | Payload over 10 MB (REST) / 6 MB (Lambda sync) | Request body size vs limit | Upload via S3 presigned URL; stream; chunk |
| 10 | API Gateway returns 403 to valid callers | Authorizer denied, missing API key, or WAF blocked | APIGW execution logs; WAF sampled requests | Fix token/key mapping; tune the WAF rule |
| 11 | WAF rules “do nothing” on an NLB | WAF can’t attach to NLB (L4) | WebACL association list has no NLB | Put WAF on an ALB/API GW/CloudFront in front |
| 12 | Backend sees the LB’s IP, not the client | L7 terminates TCP; real IP is in X-Forwarded-For | App logs show 10.x LB IP | Read X-Forwarded-For (ALB/APIGW) or use NLB (preserves IP) |
| 13 | HTTPS works but HTTP just hangs/404s | No HTTP→HTTPS redirect listener on :80 | Only a :443 listener exists | Add a :80 listener with a redirect action to 443 |
| 14 | Sticky app overloads a few instances | ALB duration-cookie stickiness pinning clients | TG attribute stickiness.enabled=true | Disable stickiness for stateless apps; spread load |
| 15 | API Gateway cost spikes unexpectedly | Per-request pricing on chatty/high-volume traffic | Cost Explorer by API GW usage type | Move internal chatty traffic to ALB; add stage caching |

The expanded form, with the full reasoning for the ones that bite hardest:

1. ALB returns 503 though the instances are running fine. Root cause: All targets are failing the health check, so the ALB has nothing healthy to forward to. Usually a wrong health-check path (pointing at / which is slow/auth-walled/redirects), a wrong port, a too-strict matcher (expecting 200 when the app returns 301/302/204), or a security group that doesn’t allow the ALB to reach the target on the health-check port. Strictly, an ALB fails open when every registered target is unhealthy and still tries them, so the 503 typically shows up once ECS or Auto Scaling has deregistered or replaced the unhealthy targets and the group is empty. Confirm: aws elbv2 describe-target-health --target-group-arn <arn> returns State: unhealthy with Reason: Target.ResponseCodeMismatch (matcher) or Target.Timeout (SG/port/slow path). The console TargetGroup → Targets tab shows the same. Fix: Point the health check at a fast, cheap /healthz that returns 200; widen the matcher if the app legitimately returns 2xx other than 200; open the target’s security group to the ALB’s security group on the traffic/health port.

2. ALB returns 502 Bad Gateway intermittently. Root cause: A target answered, but badly — an empty response, a malformed HTTP response, a connection reset mid-response, or (for a Lambda target group) the function threw or returned a non-conforming payload. Confirm: ALB access logs (enable them to S3) show elb_status_code=502 with target_status_code="-" and a target_processing_time indicating the target was reached. For Lambda targets, the function’s CloudWatch logs show the error. Fix: Fix the target’s response (don’t return empty bodies / reset connections under load); for Lambda, fix the function or its response format; ensure keep-alive settings on the target don’t close connections the ALB is reusing.
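Once access logs are landing in S3, a quick tally of status pairs shows whether the 502s come with a target status or with "-". The sketch below assumes the standard space-delimited ALB access log layout, where elb_status_code and target_status_code are the 9th and 10th fields; the function name and the file name in the usage comment are placeholders.

```shell
# Tally (elb_status_code, target_status_code) pairs from ALB access log lines
# read on stdin. Prints "count elb_status target_status", most frequent first.
alb_status_pairs() {
  awk '{ pair[$9 " " $10]++ }
       END { for (p in pair) print pair[p], p }' | sort -rn
}

# Usage, after downloading a log object from the bucket (file name is a placeholder):
#   gunzip -c alb-log-sample.log.gz | alb_status_pairs
```

A pile of "502 -" lines points at targets that were reached but produced no usable response; "502 502" means the target itself answered with a 502.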

4. NLB connections are “refused” at connect time. Root cause: NLB preserves the client source IP by default, so the target sees the real client, and the target’s security group must allow the client CIDRs, not the NLB. People reflexively allow the NLB’s ENI (as they would for an ALB) and every connection is blocked; because security groups drop packets silently, the client often sees a connect timeout rather than an explicit refusal. Confirm: Enable VPC Flow Logs on the target ENI; you’ll see REJECT entries from the client IP on the service port. An NLB created without security groups has none of its own, so the target SG is the only L4 filter. Fix: Allow the client CIDRs (or 0.0.0.0/0 for a public service) on the target security group for the service port. If you genuinely can’t, disable client-IP preservation on the target group (then the target sees the NLB and you allow the NLB subnet CIDRs instead).

5. Long-lived connections over NLB drop after about six minutes. Root cause: The NLB TCP idle timeout defaults to 350 seconds. A gRPC stream, a database connection in a pool, or an SSH session that goes quiet for longer than that is silently reset; the target often never sees a FIN, so it thinks the connection is still open. Confirm: The drop correlates with ~350 s of inactivity; the application sees resets, not graceful closes; raising application-level activity prevents it. Fix: Enable TCP keepalives below the idle timeout on both the client and the target so the flow never goes idle long enough to be reaped. For connection pools, set a max-idle below it. TCP listeners now expose the timeout as a listener attribute (tcp.idle_timeout.seconds, 60 to 6000 s), so raising it is an option there, but keepalives remain the safer design because other devices in the path keep idle timers of their own.
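A quick sanity check makes the keepalive advice concrete. The sketch below assumes a Linux host; keepalive_beats_nlb is a hypothetical helper, and 350 is the default NLB timeout, so pass your own value as the second argument if the listener uses a different one.

```shell
# Returns 0 if a keepalive idle time (seconds) fires before the NLB idle
# timeout (second argument, 350 s if omitted), non-zero otherwise.
keepalive_beats_nlb() {
  idle=$1
  nlb_timeout=${2:-350}
  [ "$idle" -lt "$nlb_timeout" ]
}

# On Linux the kernel default is readable here and is typically 7200 s:
#   cat /proc/sys/net/ipv4/tcp_keepalive_time
# Lowering it needs root and only affects sockets that enable SO_KEEPALIVE:
#   sysctl -w net.ipv4.tcp_keepalive_time=300
keepalive_beats_nlb 7200 || echo "7200 s keepalive: the flow would be reaped first"
keepalive_beats_nlb 300 && echo "300 s keepalive: the flow stays alive"
```

The kernel default is the usual culprit: keepalives are "on" but the first probe would not be sent until long after the load balancer has forgotten the flow.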

7. API Gateway callers get 429 Too Many Requests. Root cause: A throttle ceiling was crossed. From most-specific to least: a per-method limit, the caller’s usage-plan rate/burst, the stage default, or the account-level 10,000 rps + 5,000 burst per Region. The first one hit returns 429. Confirm: CloudWatch 4XXError for the API/stage, plus access logs filtered on status 429; if only one API key throttles, it’s that key’s usage plan; if everyone throttles at the same rps, it’s the stage/account ceiling. Service Quotas shows the account limit. Fix: Raise the relevant throttle (usage-plan rate/burst, stage/method settings) or request an account-level quota increase via Service Quotas; add stage caching to cut backend calls; for legitimately huge volume, reconsider whether API Gateway (per-request priced) is the right front door at all.
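The one-key-versus-everyone test is easy to automate over access logs. The sketch below is a heuristic, not an AWS tool: it assumes you have already reduced the logs to two columns, a caller identifier and a status code (for example by logging $context.identity.apiKeyId and $context.status), and throttle_scope is a hypothetical name.

```shell
# Read "caller status" lines on stdin and guess which throttle layer is firing:
#   none              no 429s seen
#   usage-plan        some callers are throttled while others are not
#   account-or-stage  every caller seen is being throttled
throttle_scope() {
  awk '
    { seen[$1] = 1; if ($2 == 429) hit[$1] = 1 }
    END {
      for (k in seen) total++
      for (k in hit) throttled++
      if (throttled == 0)          print "none"
      else if (throttled < total)  print "usage-plan"
      else                         print "account-or-stage"
    }'
}

printf 'acme 200\nacme 429\nbeta 200\ngamma 200\n' | throttle_scope
```

With a single caller in the sample the result is ambiguous by construction, so feed it a window that covers several keys before trusting the answer.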

8. API Gateway returns 504 / “Endpoint request timed out.” Root cause: The backend integration took longer than API Gateway’s 29-second integration timeout. A slow Lambda, a slow HTTP backend, or a synchronous call doing too much work. Confirm: CloudWatch IntegrationLatency for the method climbs toward/over 29,000 ms. Fix: Speed up the backend; or convert a long operation to asynchronous: return 202 immediately and process via SQS/Step Functions, polling or webhook for the result. For Regional and private REST APIs the 29 s default can be raised through a Service Quotas request, possibly at the cost of account-level throttle quota; for other API types treat the cap as fixed.

11. WAF rules appear to do nothing in front of an NLB. Root cause: AWS WAF associates with L7 resources such as ALB, API Gateway (REST), CloudFront and AppSync. It cannot attach to an NLB, which operates at L4 and never sees the HTTP request WAF needs to inspect. Confirm: The WebACL’s associated-resources list contains no NLB (it can’t). Fix: Put the WAF on an ALB, API Gateway or CloudFront in front of the workload. If you must use an NLB for L4 reasons, use the NLB → ALB pattern (the ALB registered as the NLB’s target) and attach WAF to the ALB, or put CloudFront with WAF in front.

12. The backend logs the load balancer’s IP, not the real client. Root cause: At L7, the ALB (and API Gateway) terminate the client TCP connection and open a new one to the target, so the target’s socket peer is the load balancer. The real client IP is carried in the X-Forwarded-For header. Confirm: App access logs show a 10.x/LB subnet IP as the remote address. Fix: Read the client IP from X-Forwarded-For in the app or proxy, taking the right-most entry that was added by a proxy you trust: the ALB appends the address it saw to the end of the header, while anything further left is supplied by the client and can be spoofed. If you need the real client IP at the socket level (e.g. for L4 allow-lists), use an NLB with source-IP preservation instead.
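Picking the right entry out of X-Forwarded-For is mostly a matter of counting from the right. The sketch below assumes each trusted proxy appends exactly one address (the default append behaviour); xff_client is a hypothetical helper and the addresses are documentation examples.

```shell
# Print the Nth entry from the right of an X-Forwarded-For value, where N is
# the number of trusted proxies that appended to it (1 for a lone ALB).
# Returns non-zero if the header has fewer than N entries.
xff_client() {
  printf '%s\n' "$1" | awk -v n="${2:-1}" -F',' '
    { i = NF - n + 1
      if (i < 1) exit 1
      gsub(/^[ \t]+|[ \t]+$/, "", $i)   # trim spaces around the entry
      print $i }'
}

# A client that forged a left-most value, seen behind a single ALB
xff_client "10.9.9.9, 203.0.113.7" 1
```

With CloudFront in front of the ALB there are two trusted hops, so the client is the second entry from the right: xff_client "$header" 2.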

Best practices

The alerts worth wiring before the next incident — the leading indicators, not just “site down”:

| Alert on | Signal | Threshold (starting point) | Why it’s leading |
| --- | --- | --- | --- |
| Unhealthy targets | UnHealthyHostCount (ALB/NLB) | ≥ 1 for 3 min | Catches eviction before 503s hit users |
| Healthy host floor | HealthyHostCount | < desired count | Predicts capacity loss |
| Backend latency | TargetResponseTime p95 | > your SLO | Slow backend creeping toward timeout |
| ALB 5xx rate | HTTPCode_ELB_5XX_Count | > 1% of requests | The symptom; confirm, don’t wait |
| API GW throttling | 429 count from access logs (4XXError as a coarse proxy) | > 0 sustained | First sign a ceiling is being hit |
| API GW integration latency | IntegrationLatency p95 | > 20,000 ms | Approaching the 29 s cap |
| NLB flow reset | TCP_Target_Reset_Count | spike | Idle-timeout or target-side resets |

Security notes

The security controls that also prevent outages — secure and resilient pull the same direction:

| Control | Mechanism | Secures against | Also prevents |
| --- | --- | --- | --- |
| TLS termination + modern policy | ACM cert + ssl_policy | Downgrade / cleartext | Handshake failures from stale ciphers |
| WAF on L7 | WebACL on ALB/APIGW/CloudFront | SQLi, XSS, cred-stuffing, floods | Backend overload from abusive traffic |
| Usage plans + API keys | API Gateway throttling | Abuse, scraping, runaway callers | One caller starving the shared backend |
| Authorizers | Cognito/IAM/Lambda/JWT | Unauthenticated access | Bad requests reaching/crashing the backend |
| Private subnets for targets | Subnet placement + SG | Direct internet hits bypassing the LB/WAF | Accidental public exposure of a backend |
| Tight target SGs (client side for NLB) | Security groups | Unauthorized L4 access | Misconfig-driven “refused” confusion |
| Private API endpoints + resource policy | API GW private + VPCE | Internet reachability of internal APIs | Data exfiltration via a public API |

Cost & sizing

The bill drivers, and how they interact with the choice:

A rough monthly picture (ap-south-1, indicative — confirm current pricing):

| Front door | What you pay | Rough INR / month (moderate load) | Cheapest when | Most expensive when |
| --- | --- | --- | --- | --- |
| ALB | Hourly + LCU | ~₹1,800 base + LCU | Steady HTTP web/microservice traffic | Huge connection churn |
| NLB | Hourly + NLCU | ~₹1,800 base + NLCU | Steady high-throughput TCP/UDP | Cross-zone on with big inter-AZ data |
| API Gateway (HTTP) | Per million requests | ~₹85 / million + data | Low/medium volume, spiky baseline | Billions of calls/month |
| API Gateway (REST) | Per million (~3.5× HTTP) | ~₹300 / million + cache | Need keys/cache/WAF, modest volume | Very high volume |
| API GW stage cache | $/GB-hour | ~₹1,500 (0.5 GB) | Read-heavy APIs (pays back via fewer backend calls) | Write-heavy / low cache hit |
| Global Accelerator (optional) | Hourly + data | ~₹2,500 + data | Need global static anycast IPs | Single-region simple workloads |

Sizing rule of thumb: steady, high-throughput, latency-sensitive → NLB; HTTP web/microservices needing routing + WAF → ALB; governed APIs at low-to-medium volume → API Gateway (HTTP API unless you need REST’s keys/cache/WAF/mapping). The cost mistake that recurs most is fronting chatty, high-volume internal traffic with per-request-priced API Gateway when an ALB would cost a tenth as much — name the traffic shape, then pick.
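
To make "name the traffic shape, then pick" concrete, here is a rough Python sketch of the break-even arithmetic. It uses only the indicative ₹ figures from the table above. The `lcu_inr` parameter is an assumption: LCU charges grow with traffic and are not quoted in this article, so you must supply your own estimate. Data-transfer and cache charges are ignored.

```python
# Indicative monthly figures taken from the table above (confirm current pricing).
ALB_BASE_INR = 1800.0
HTTP_API_INR_PER_MILLION = 85.0
REST_API_INR_PER_MILLION = 300.0


def api_gateway_cost(requests_millions, rate_per_million):
    """Per-request pricing: the bill scales linearly with request volume."""
    return requests_millions * rate_per_million


def alb_cost(lcu_inr=0.0):
    """Hourly base plus an LCU estimate (lcu_inr is a caller-supplied assumption)."""
    return ALB_BASE_INR + lcu_inr


def break_even_millions(rate_per_million, lcu_inr=0.0):
    """Monthly requests (in millions) at which API Gateway matches the ALB bill."""
    return alb_cost(lcu_inr) / rate_per_million


print(break_even_millions(REST_API_INR_PER_MILLION))            # 6.0
print(round(break_even_millions(HTTP_API_INR_PER_MILLION), 1))  # 21.2
```

With LCU charges left at zero, the per-request model overtakes the ALB base charge at about 6 million REST calls or about 21 million HTTP API calls a month. Any real LCU estimate pushes those break-even points higher, so treat them as a floor rather than a forecast.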

Interview & exam questions

1. What’s the fundamental difference between ALB and NLB, and how do you choose? ALB is a layer-7 load balancer — it parses HTTP, routes by host/path/header, terminates TLS, and supports WAF, gRPC and WebSocket. NLB is a layer-4 load balancer — it forwards raw TCP/UDP by flow hash with sub-millisecond latency, static IPs and source-IP preservation, but no HTTP awareness. Choose ALB when the entry point must read HTTP and route on it; choose NLB for raw TCP/UDP, extreme throughput, static IPs, or lowest latency.

2. When do you use API Gateway instead of an ALB, given both are L7? API Gateway is an API-management product, not just a load balancer. Use it when you need governance — API keys, usage plans, per-caller throttling, authorizers (Cognito/IAM/Lambda/JWT), response caching, request/response mapping, stages and a developer portal. Use an ALB when you just need to spread HTTP traffic across targets with content routing. Many designs use both: API Gateway out front for governance, an ALB behind for the fleet.

3. An ALB returns 503 but the EC2 instances are clearly running. What’s wrong and how do you confirm? The targets are failing the health check, so the ALB has nothing healthy to forward to. Confirm with aws elbv2 describe-target-health — you’ll see unhealthy with a reason like Target.ResponseCodeMismatch (matcher too strict) or Target.Timeout (SG/port/slow path). Fix the health-check path/port/matcher and open the target’s security group to the ALB.

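4. Why do long-lived connections over an NLB drop after about six minutes, and how do you prevent it? The NLB’s TCP idle timeout defaults to 350 seconds. Any flow (gRPC, database pool, SSH) that stays idle longer than that is dropped from the NLB’s flow table without notice, and the next packet sent on it is answered with a TCP RST. Prevent it with TCP keepalives below the idle timeout on both ends so the connection never goes idle long enough to be reaped. TCP listeners now also accept a configurable idle timeout (60 to 6,000 seconds), but keepalives remain the portable fix.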

5. You front your TCP service with an NLB and every connection is refused. What’s the most likely cause? With client IP preservation on (the default for instance targets), the target sees the client’s address rather than the NLB’s, so the target’s security group must allow the client CIDRs. A rule that allows only the load balancer, as you would write for an ALB, blocks every connection. Confirm with VPC Flow Logs on the target ENI (REJECT from the client IP) and fix by allowing the client ranges on the target SG.

6. API Gateway callers start getting 429. Walk through how you’d diagnose it. A throttle ceiling was hit. Check from most-specific to least: a per-method limit, the caller’s usage-plan rate/burst, the stage default, or the account-level 10,000 rps + 5,000 burst per Region. CloudWatch ThrottleCount and which keys are affected tell you the layer — one key throttling means its usage plan; everyone throttling at the same rate means the stage/account ceiling. Raise the relevant throttle/quota or add caching.

7. Where can AWS WAF be attached, and what does that mean for NLB designs? WAF attaches to ALB, API Gateway (REST), CloudFront and AppSync, all of which operate at L7. It cannot attach to an NLB (L4), because the NLB never parses the HTTP request that WAF inspects. If you need both L4 features and WAF, put CloudFront in front of the NLB, or register a WAF-protected ALB as the NLB’s target.

8. How does each front door expose the real client IP to the backend? At L7 (ALB, API Gateway), the LB terminates the client connection, so the backend’s socket peer is the LB; the real client IP is in the X-Forwarded-For header. At L4, the NLB preserves the source IP by default, so the target sees the real client at the socket level — which is why NLB is preferred for L4 allow-lists.

9. What are API Gateway’s key hard limits, and how do they shape design? A 29-second default integration timeout (long operations must go asynchronous via SQS/Step Functions; Regional and private REST APIs can request a higher timeout through Service Quotas, possibly in exchange for a lower account throttle), a 10 MB REST / 6 MB synchronous-Lambda payload limit (big uploads go through S3 presigned URLs), and an account-level 10,000 rps + 5,000 burst default throttle (raise via Service Quotas). These push you toward async patterns, S3-based uploads, and caching for high-read APIs.

10. Why is choosing the wrong front door a cost problem, not just a feature problem? ALB/NLB price on capacity units (cheap for steady high-throughput traffic), while API Gateway prices per request (great at low/medium volume, brutal at extreme volume). Fronting chatty, high-volume internal traffic with API Gateway can cost 10–40× what an ALB would; conversely, paying for an idle ALB on a spiky low-baseline API wastes the hourly charge that API Gateway (no idle cost) avoids. Match the pricing model to the traffic shape.

11. What’s the NLB-in-front-of-ALB pattern for, and why use it? Registering an ALB as a target of an NLB gives you the NLB’s static IP per AZ and the ALB’s L7 routing and WAF together. It’s the answer when a client or partner demands a fixed IP to allow-list but you still need path-based routing and WAF behind it.

12. Should you ever pick a Classic Load Balancer for a new design? No. CLB is legacy, predates ALB/NLB, and offers nothing they don’t do better. New L7 work goes to ALB, new L4 work to NLB. CLB exists only for backwards compatibility with old setups.
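
For question 8, the L7 case can be sketched in Python. This assumes the ALB is in its default mode, where it appends the address of the connecting peer to the right-hand end of X-Forwarded-For. Entries further left can be supplied by the caller and should not be trusted. The function name `client_ip_from_xff` and the `trusted_proxies` parameter are hypothetical names for illustration.

```python
import ipaddress


def client_ip_from_xff(header_value, trusted_proxies=()):
    """Return the right-most X-Forwarded-For entry that is not a trusted proxy.

    trusted_proxies is a list of CIDR strings for hops you operate yourself
    (for example an internal proxy tier); it is an assumption of this sketch.
    """
    trusted = [ipaddress.ip_network(cidr) for cidr in trusted_proxies]
    hops = [h.strip() for h in header_value.split(",") if h.strip()]
    for hop in reversed(hops):
        try:
            addr = ipaddress.ip_address(hop)
        except ValueError:
            return None  # malformed entry: stop rather than trust anything left of it
        if not any(addr in net for net in trusted):
            return str(addr)
    return None


# A caller-supplied left-most value does not win over the entry the LB appended.
print(client_ip_from_xff("1.2.3.4, 203.0.113.7"))                    # 203.0.113.7
print(client_ip_from_xff("198.51.100.9, 10.0.1.5", ["10.0.0.0/8"]))  # 198.51.100.9
```

Behind an NLB with client IP preservation, none of this is needed: the socket peer address is already the client.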

The questions above map to the AWS Certified Solutions Architect – Associate (SAA-C03), which covers designing resilient, high-performing architectures, ELB/load-balancer selection, and API Gateway, and to the Advanced Networking – Specialty (ANS-C01), which covers hybrid and edge networking, NLB source-IP and static-IP behaviour, Global Accelerator, and PrivateLink. The serverless/API angle also touches the Developer – Associate (DVA-C02). A compact cert mapping for revision:

| Question theme | Primary cert | Exam objective area |
| --- | --- | --- |
| ALB vs NLB vs API GW selection | SAA-C03 | Design high-performing / resilient architectures |
| Health checks, target groups, 503/502 | SAA-C03 | Design resilient architectures |
| NLB source-IP, static IP, 350 s timeout | ANS-C01 | Design and implement network connectivity |
| API Gateway auth, throttling, usage plans | DVA-C02 | Develop / secure serverless apps |
| WAF placement, TLS termination | SAA-C03 / Security | Design secure architectures |
| Cost models (LCU vs per-request) | SAA-C03 | Design cost-optimized architectures |

Quick check

  1. A workload needs to forward UDP with the lowest possible latency and a static IP partners can allow-list. Which front door, and why is ALB wrong?
  2. An ALB returns 503 while your instances are healthy in the OS. What is the single most likely cause and the exact command to confirm it?
  3. True or false: you can attach AWS WAF to a Network Load Balancer to filter malicious requests.
  4. Connections through your NLB to a gRPC service drop silently after a few minutes of inactivity. Why, and what’s the fix?
  5. You’re publishing a partner REST API and need per-partner rate limiting and API keys. Which service, and what construct gives you the per-partner throttle?

Answers

  1. NLB. It’s the only ELB that speaks UDP, adds sub-millisecond latency, and gives a static EIP per AZ for allow-listing. ALB is layer 7, HTTP/HTTPS only — it cannot forward UDP at all, and has no static IP.
  2. All targets are failing the health check, so the ALB has no healthy target to forward to. Confirm with aws elbv2 describe-target-health --target-group-arn <arn> — it returns State: unhealthy with a reason (Target.ResponseCodeMismatch for a too-strict matcher, Target.Timeout for SG/port/slow path). Fix the health-check path/port/matcher and the target’s security group.
  3. False. WAF attaches only to L7 resources such as ALB, API Gateway (REST), CloudFront and AppSync. NLB operates at L4 and never sees the HTTP request that WAF inspects. Put CloudFront in front of the NLB, or register a WAF-protected ALB as the NLB’s target.
  4. The NLB’s TCP idle timeout defaults to 350 seconds; an idle gRPC stream is dropped once it passes that. Fix by enabling TCP keepalives below 350 seconds on the client and target so the flow never goes idle long enough to be reaped. TCP listeners also accept a configurable idle timeout of 60 to 6,000 seconds.
  5. API Gateway, with a usage plan per partner (rate + burst + quota) bound to a per-partner API key. The usage plan’s throttle settings give each partner an independent rate limit, so one noisy integrator hits its own 429 without affecting the others.
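
For answer 2, the JSON that `aws elbv2 describe-target-health` prints can be summarised in a few lines of Python. This is a minimal sketch: the helper name `unhealthy_reasons` is hypothetical, and the sample payload is made up to match the command's documented output shape (`TargetHealthDescriptions`, `TargetHealth.State`, `TargetHealth.Reason`).

```python
import json
from collections import Counter


def unhealthy_reasons(describe_output):
    """Count unhealthy targets by reason code from describe-target-health JSON."""
    data = json.loads(describe_output)
    reasons = Counter()
    for desc in data.get("TargetHealthDescriptions", []):
        health = desc.get("TargetHealth", {})
        if health.get("State") == "unhealthy":
            reasons[health.get("Reason", "unknown")] += 1
    return dict(reasons)


# Made-up sample in the shape the CLI returns; instance IDs are placeholders.
sample = json.dumps({
    "TargetHealthDescriptions": [
        {"Target": {"Id": "i-aaaa", "Port": 80},
         "TargetHealth": {"State": "unhealthy", "Reason": "Target.ResponseCodeMismatch"}},
        {"Target": {"Id": "i-bbbb", "Port": 80},
         "TargetHealth": {"State": "unhealthy", "Reason": "Target.Timeout"}},
        {"Target": {"Id": "i-cccc", "Port": 80},
         "TargetHealth": {"State": "healthy"}},
    ]
})
print(unhealthy_reasons(sample))
```

If every unhealthy target shares one reason, that reason points at the fix: a matcher problem for `Target.ResponseCodeMismatch`, a security group, port or slow-path problem for `Target.Timeout`.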

Glossary

Next steps

You can now pick the right front door for any workload and fix the failures each one throws. Build outward:


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
