
API Gateway and Backend-for-Frontend Patterns: Aggregation, Composition, and Versioning

The API edge is where the most expensive architecture mistakes hide, because they are the hardest to undo. Put too much logic in the gateway and you have rebuilt the monolith on the wrong tier. Put a backend-for-frontend in front of every client and you have N teams reimplementing auth and pagination N different ways. This article walks through the decisions that actually matter at the edge: what belongs in the gateway versus a service, when a backend-for-frontend (BFF) earns its keep, how to aggregate and compose across many microservices without making the gateway chatty, and how to evolve versions without a coordinated big-bang release. The running example is an e-commerce product page assembled from product, pricing, inventory, and reviews services.

1 – Draw the line: gateway responsibilities versus service responsibilities

The single most useful rule at the edge is to keep the gateway dumb about business and smart about traffic. The gateway owns concerns that are uniform across every route and that you never want duplicated in each service. A service owns anything that requires understanding the domain.

| Concern | Belongs in the gateway | Belongs in the service / BFF |
|---|---|---|
| TLS termination, HTTP/2, mTLS to upstreams | Yes | No |
| AuthN: validate JWT signature, issuer, audience, expiry | Yes | No |
| AuthZ: coarse scope check (has scope orders:read) | Yes | |
| AuthZ: fine-grained (can user X see order Y) | No | Yes |
| Rate limiting, quotas, IP allow/deny | Yes | No |
| Routing, path rewrite, header injection | Yes | No |
| Request/response body shaping for one client | No | Yes (BFF) |
| Aggregating multiple services into one payload | No (anti-pattern) | Yes (BFF) |
| Business validation, calculations, workflows | No | Yes |

The trap is “the gateway can do request transformation, so let’s do all transformation there.” Coarse, declarative transforms (strip an internal header, rewrite a path) are fine. Body-level aggregation and per-client shaping are not – they turn a config artifact into untested business code living in YAML, owned by no team. Keep that logic in a service where it has a test suite and an owner.

Heuristic I apply in reviews: if a change to the rule requires understanding a domain invariant, it does not belong in the gateway. If it would be identical for every API in the company, it does.
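The two AuthZ rows are easiest to tell apart in code. A minimal TypeScript sketch, with hypothetical names (`hasScope`, `canViewOrder`, `Order`): the coarse check needs nothing but the verified token, so the gateway can run it on every route, while the ownership check needs the order record and can therefore only run inside the order service.

```typescript
// Hypothetical domain type owned by the order service.
interface Order {
  id: string;
  ownerSub: string; // "sub" claim of the customer who placed the order
}

// Gateway-level, coarse: uniform across routes, needs only the token.
// The OAuth 2.0 "scope" claim is a single space-delimited string.
function hasScope(scopeClaim: string | undefined, required: string): boolean {
  if (!scopeClaim) return false;
  return scopeClaim.split(" ").includes(required);
}

// Service-level, fine-grained: needs the order record, i.e. domain data the
// gateway does not have. userSub arrives via the verified x-user-sub header.
function canViewOrder(userSub: string, order: Order): boolean {
  return order.ownerSub === userSub;
}
```

Note that `hasScope` compares whole scope tokens, so `orders:readonly` does not satisfy `orders:read`.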

A minimal, declarative gateway route (Envoy-style, expressed through a Kubernetes Gateway API HTTPRoute) looks like this – routing and a header rewrite, nothing domain-aware:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: product-edge
spec:
  parentRefs:
    - name: public-gateway
  hostnames: ["api.shop.example.com"]
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/products
      filters:
        - type: RequestHeaderModifier
          requestHeaderModifier:
            set:
              - name: x-edge-version
                value: "v1"
      backendRefs:
        - name: product-bff
          port: 8080

2 – When to introduce a backend-for-frontend per client experience

A BFF is a server owned by a frontend team that exposes exactly the API one client experience needs. You introduce one when client requirements diverge enough that a single generic API forces every consumer to compromise. Concrete signals:

The discipline that keeps BFFs from multiplying into chaos: one BFF per experience, not per microservice and not per team. Web BFF, mobile BFF, partner/public BFF. Three is a healthy number; thirty is a smell. Each BFF is a thin orchestration and shaping layer – it must contain no business rules that the downstream services should own. If you find a BFF computing tax or deciding whether an order can ship, that logic has leaked out of the domain and will drift between your web and mobile BFFs.

A common failure mode is the “shared BFF” that tries to serve web and mobile at once to avoid duplication. You then add if (client == "mobile") branches until it is a generic API again, with worse cohesion. Duplication across BFFs is acceptable; coupling two experiences through one codebase is not.
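Per-experience shaping is usually just a projection function that lives in one BFF. A minimal sketch, assuming a hypothetical rich `ProductPage` shape produced by aggregation and a hypothetical `MobileProductCard` contract: the mobile BFF keeps only what its screen renders, while the web BFF would return the full object. Because each BFF owns its own projection, changing one cannot break the other.

```typescript
// Hypothetical full payload assembled by the aggregation step.
interface ProductPage {
  id: string;
  name: string;
  description: string;
  images: string[];
  price: { amount: number; currency: string } | null;
  inStock: boolean | null;
  reviews: { count: number; average: number | null };
}

// What the mobile product screen renders (an assumed contract).
interface MobileProductCard {
  id: string;
  name: string;
  thumbnail: string | null;
  price: { amount: number; currency: string } | null;
  inStock: boolean | null;
  rating: number | null;
}

// Owned by the mobile team and deployed only in the mobile BFF.
function toMobileCard(page: ProductPage): MobileProductCard {
  return {
    id: page.id,
    name: page.name,
    thumbnail: page.images[0] ?? null, // first image only, no gallery on mobile
    price: page.price,
    inStock: page.inStock,
    rating: page.reviews.average,
  };
}
```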

3 – Request aggregation and API composition over many microservices

Aggregation is the BFF’s core job: turn one client request into several downstream calls and assemble the result. The product page needs product details, current price, stock, and a review summary – four services. The naive sequential version costs the sum of all latencies. Fan out concurrently instead and you pay roughly the slowest call.

Here is the aggregation in the web BFF (Node, but the shape is language-agnostic). Note three things that matter: concurrency, per-call timeouts, and partial-response tolerance so one slow non-critical dependency cannot sink the whole page.

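// Web BFF: compose the product page from 4 services.
// Assumed helpers defined elsewhere in the BFF: fetchJson(url, init) resolves to
// parsed JSON and rejects on a non-2xx status or abort; ctx.fwd() returns the
// headers to forward (traceparent, verified identity headers); PRODUCT, PRICING,
// INVENTORY and REVIEWS hold the service base URLs; UpstreamError(dep, status)
// is mapped to an HTTP error response by the framework.
async function getProductPage(productId: string, ctx: RequestContext) {
  // One shared 300 ms signal: the four concurrent calls share the same deadline.
  const opts = { signal: AbortSignal.timeout(300), headers: ctx.fwd() };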

  const [product, price, stock, reviews] = await Promise.allSettled([
    fetchJson(`${PRODUCT}/products/${productId}`, opts),
    fetchJson(`${PRICING}/prices/${productId}`, opts),
    fetchJson(`${INVENTORY}/stock/${productId}`, opts),
    fetchJson(`${REVIEWS}/summary/${productId}`, opts),
  ]);

  // Product is critical: if it failed, the page has no meaning.
  if (product.status === "rejected") throw new UpstreamError("product", 502);

  // The rest are degradable -- render the page without them.
  return {
    ...product.value,
    price: settledOr(price, null),
    inStock: settledOr(stock, { available: null })?.available ?? null,
    reviews: settledOr(reviews, { count: 0, average: null }),
    _partial: [price, stock, reviews].some((r) => r.status === "rejected"),
  };
}

function settledOr<T>(r: PromiseSettledResult<T>, fallback: T): T {
  return r.status === "fulfilled" ? r.value : fallback;
}

Promise.allSettled is the key choice over Promise.all: the latter rejects the moment any call fails, discarding the successful ones. With allSettled, you decide per-dependency whether a failure is fatal (product) or degradable (reviews). Surfacing _partial lets the client and your dashboards distinguish a fully-rendered page from a degraded one.

When the fan-out gets wide or the shapes vary per client, a GraphQL BFF is the natural evolution – the client declares the fields it wants and resolvers fan out to services. Use it when field-level selection genuinely varies across screens; do not adopt it just to have a graph, because in return you inherit non-trivial caching problems and the N+1 resolver problem (solved with DataLoader-style batching).
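DataLoader-style batching is simple enough to sketch. A minimal version, assuming a hypothetical `batchFn` backed by a batch endpoint: every `load(id)` issued in the same tick is queued, dispatched as one batch call, and each caller gets its own result back. This is a simplified stand-in for the `dataloader` npm package (no caching, no maximum batch size), not its actual API.

```typescript
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

interface Pending<K, V> {
  key: K;
  resolve: (value: V) => void;
  reject: (err: unknown) => void;
}

// Collects load() calls made in the same tick and sends them as one batch call.
class MiniLoader<K, V> {
  private queue: Pending<K, V>[] = [];
  private readonly batchFn: BatchFn<K, V>;

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // First key of a new batch: schedule the dispatch as a microtask, so every
      // load() issued synchronously in this tick joins the same batch.
      if (this.queue.length === 1) {
        Promise.resolve().then(() => this.dispatch());
      }
    });
  }

  private async dispatch(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      // Contract: one value per key, in the same order as the keys.
      const values = await this.batchFn(batch.map((p) => p.key));
      if (values.length !== batch.length) throw new Error("batch size mismatch");
      batch.forEach((p, i) => p.resolve(values[i]));
    } catch (err) {
      batch.forEach((p) => p.reject(err));
    }
  }
}
```

In a GraphQL BFF you would create one loader per incoming request, so batching (and any cache you add later) never mixes data across users.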

4 – Cross-cutting concerns: auth, rate limiting, and request transformation

These belong at the gateway precisely because they are uniform. Get the split right between gateway and BFF:

AuthN at the gateway, fine-grained AuthZ at the service. The gateway validates the token cryptographically and rejects anything unsigned, expired, or for the wrong audience – cheaply, before traffic ever reaches your code. It does not know whether user 123 may view order 456; that is a domain decision the order service makes. Validate the JWT at the edge and forward verified claims as trusted headers:

# Envoy JWT authentication filter: verify signature/iss/aud, then
# forward selected claims as request headers for downstream services.
http_filters:
  - name: envoy.filters.http.jwt_authn
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
      providers:
        auth0:
          issuer: https://auth.shop.example.com/
          audiences: ["api.shop.example.com"]
          remote_jwks:
            http_uri:
              uri: https://auth.shop.example.com/.well-known/jwks.json
              cluster: jwks_cluster
              timeout: 5s
            cache_duration: { seconds: 600 }
          forward: true                 # keep the original Authorization header
          claim_to_headers:
            - { header_name: x-user-sub, claim_name: sub }
            - { header_name: x-user-scope, claim_name: scope }
      rules:
        - match: { prefix: /v1/ }
          requires: { provider_name: auth0 }

Downstream services can trust x-user-sub only because the network guarantees traffic arrives through the gateway (mTLS, network policy) – never expose those headers on a path a client could reach directly, or you have built an auth bypass. The gateway should also strip any inbound x-user-* header before re-injecting its own, so a client cannot spoof identity.
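The strip-then-inject rule is easy to get subtly wrong, so here it is as gateway-agnostic logic. A minimal TypeScript sketch, assuming the JWT has already been verified and its claims are available, and that headers arrive as a plain name-to-value record; `sanitizeIdentityHeaders`, `VerifiedClaims`, and `IDENTITY_PREFIX` are hypothetical names.

```typescript
type HeaderMap = Record<string, string>;

interface VerifiedClaims {
  sub: string;
  scope?: string;
}

const IDENTITY_PREFIX = "x-user-";

// Drop every client-supplied identity header (case-insensitively), then inject
// only values derived from the verified token, so x-user-sub cannot be spoofed.
function sanitizeIdentityHeaders(inbound: HeaderMap, claims: VerifiedClaims | null): HeaderMap {
  const out: HeaderMap = {};
  for (const [name, value] of Object.entries(inbound)) {
    if (!name.toLowerCase().startsWith(IDENTITY_PREFIX)) out[name] = value;
  }
  if (claims) {
    out["x-user-sub"] = claims.sub;
    if (claims.scope !== undefined) out["x-user-scope"] = claims.scope;
  }
  return out;
}
```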

Rate limiting is a load-shaping control, so it lives at the edge. Apply at least two tiers: a coarse global limit to absorb floods, and a per-principal limit so one tenant cannot starve others. Here the gateway builds descriptors and an external rate-limit service does the counting (the reference envoyproxy/ratelimit service counts in fixed windows per unit):

# Route-level actions (Envoy): one descriptor per API key, one per upstream cluster.
rate_limits:
  - actions:
      - request_headers: { header_name: x-api-key, descriptor_key: client }
  - actions:
      - { destination_cluster: {} }
# Descriptor config (rate-limit service). The domain name is illustrative and
# must match the domain configured on the gateway's rate-limit filter.
domain: shop-edge
descriptors:
  - key: client                 # no value: each distinct API key gets its own 600/min
    rate_limit: { unit: minute, requests_per_unit: 600 }
  - key: destination_cluster
    value: reviews-service
    rate_limit: { unit: second, requests_per_unit: 50 }

Return 429 with a Retry-After header so well-behaved clients back off instead of hammering. Keep transformation here minimal and declarative – header set/remove, path rewrite, CORS. The instant a transform needs to read or reshape a JSON body for a specific client, move it into the BFF where it is testable.
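The two-tier idea can be illustrated in-process with token buckets. This is only a sketch of the algorithm, assuming a single gateway instance and an injected clock; the Envoy setup above delegates counting to an external rate-limit service precisely so limits hold across many gateway replicas. `TokenBucket` and `TwoTierLimiter` are hypothetical names, and a rejection carries a `retryAfterSeconds` value suitable for the 429 `Retry-After` header.

```typescript
class TokenBucket {
  private tokens: number;
  private lastMs: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full
    this.lastMs = nowMs;
  }

  // Refill for the elapsed time, then try to spend one token.
  // Returns 0 on success, otherwise whole seconds until a token is available.
  take(nowMs: number): number {
    const elapsedSec = Math.max(0, nowMs - this.lastMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    return Math.ceil((1 - this.tokens) / this.refillPerSec);
  }
}

type Decision = { allowed: true } | { allowed: false; status: 429; retryAfterSeconds: number };

class TwoTierLimiter {
  private readonly global: TokenBucket;
  private readonly perPrincipal = new Map<string, TokenBucket>(); // no eviction in this sketch
  private readonly principalPerMin: number;

  constructor(globalPerSec: number, principalPerMin: number, nowMs: number) {
    // Global burst capacity equals one second of the global rate (an assumed policy).
    this.global = new TokenBucket(globalPerSec, globalPerSec, nowMs);
    this.principalPerMin = principalPerMin;
  }

  check(principal: string, nowMs: number): Decision {
    let bucket = this.perPrincipal.get(principal);
    if (!bucket) {
      bucket = new TokenBucket(this.principalPerMin, this.principalPerMin / 60, nowMs);
      this.perPrincipal.set(principal, bucket);
    }
    // Per-principal first, so a noisy tenant is rejected before it drains the
    // shared global budget.
    const principalWait = bucket.take(nowMs);
    if (principalWait > 0) return { allowed: false, status: 429, retryAfterSeconds: principalWait };
    const globalWait = this.global.take(nowMs);
    if (globalWait > 0) return { allowed: false, status: 429, retryAfterSeconds: globalWait };
    return { allowed: true };
  }
}
```

Checking the per-principal bucket first means a request that passes it but fails the global check still spends the principal's token; that is an accepted simplification here.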

5 – Resilience at the edge with timeouts, retries, and circuit breakers

The edge concentrates traffic, so a missing timeout here is a cluster-wide outage rather than a single slow request. Three controls, layered, and each is a deliberate decision about load on a struggling dependency.

Timeouts and deadline propagation. Every outbound call from the BFF needs a per-attempt timeout (the 300 ms AbortSignal.timeout above). Just as important, propagate the remaining budget downstream so a service does not start work the client has already given up waiting for. Set a hard ceiling at the gateway too:

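# Gateway route timeout: never let a single edge request hang a worker.
# Envoy applies retry_policy regardless of HTTP method, so attach it only to
# idempotent (read) routes such as this one.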
route:
  cluster: product-bff
  timeout: 1.5s          # overall ceiling for the edge call
  retry_policy:
    retry_on: "5xx,reset,connect-failure"
    num_retries: 2
    per_try_timeout: 0.5s
    retry_back_off: { base_interval: 0.05s, max_interval: 0.5s }
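Propagating the remaining budget can be sketched as a small deadline object created once per edge request. A minimal sketch, assuming the budget travels between services in a hypothetical `x-request-deadline-ms` header that carries the remaining milliseconds (gRPC does the equivalent with its `grpc-timeout` metadata), with the clock injected for testability. Each outbound call gets the smaller of its own per-call cap and what is left of the overall budget.

```typescript
const DEADLINE_HEADER = "x-request-deadline-ms"; // hypothetical header name

class Deadline {
  private readonly expiresAtMs: number;
  private readonly now: () => number;

  constructor(budgetMs: number, now: () => number = Date.now) {
    this.now = now;
    this.expiresAtMs = now() + budgetMs;
  }

  // Rebuild from an inbound header, falling back to a default budget when the
  // header is missing or malformed.
  static fromHeader(value: string | undefined, fallbackMs: number, now: () => number = Date.now): Deadline {
    const parsed = value === undefined ? NaN : Number(value);
    const budget = Number.isFinite(parsed) && parsed >= 0 ? parsed : fallbackMs;
    return new Deadline(budget, now);
  }

  remainingMs(): number {
    return Math.max(0, this.expiresAtMs - this.now());
  }

  // Timeout for one downstream attempt: never longer than the budget left.
  attemptTimeoutMs(perCallCapMs: number): number {
    return Math.min(perCallCapMs, this.remainingMs());
  }

  // Headers to forward so the next hop sees the shrunken budget.
  toHeaders(): Record<string, string> {
    return { [DEADLINE_HEADER]: String(this.remainingMs()) };
  }
}
```

In the BFF the per-call signal would then become `AbortSignal.timeout(deadline.attemptTimeoutMs(300))`, and a remaining budget of 0 is a reason to fail fast instead of calling downstream at all.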

Retries – only where safe. Retry idempotent reads, never blind writes. A retried POST /orders without an idempotency key can double-charge a customer. Two non-negotiables: (1) retry budgets, so retries are capped as a fraction of traffic (e.g. 20%) and cannot turn a brown-out into a retry storm; (2) jittered backoff, so retries do not synchronize into a thundering herd. For writes, require an Idempotency-Key header that the service deduplicates against.
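Both non-negotiables fit in a small helper for retries issued from BFF code (Envoy has its own equivalent, `retry_budget` under cluster circuit-breaker thresholds). A minimal sketch, assuming an in-process budget and the "full jitter" variant of exponential backoff, i.e. a uniformly random delay between zero and the exponential cap; `RetryBudget`, `withRetries`, `backoffMs`, and the 20% default ratio are illustrative, and the caller must wrap only idempotent calls.

```typescript
// Allows retries only while they stay under a fraction of requests seen.
// A real budget would use a sliding window; these counters never reset.
class RetryBudget {
  private requests = 0;
  private retries = 0;
  private readonly ratio: number;
  private readonly minRetries: number;

  constructor(ratio = 0.2, minRetries = 10) {
    this.ratio = ratio;
    this.minRetries = minRetries; // floor so low traffic can still retry
  }

  recordRequest(): void {
    this.requests++;
  }

  tryAcquireRetry(): boolean {
    const allowed = Math.max(this.minRetries, this.requests * this.ratio);
    if (this.retries >= allowed) return false;
    this.retries++;
    return true;
  }
}

// Full jitter: uniform in [0, min(maxMs, baseMs * 2^attempt)).
function backoffMs(attempt: number, baseMs: number, maxMs: number, rand: () => number = Math.random): number {
  const cap = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(rand() * cap);
}

async function withRetries<T>(
  call: () => Promise<T>,
  budget: RetryBudget,
  opts: { maxRetries: number; baseMs: number; maxMs: number; sleep?: (ms: number) => Promise<void> },
): Promise<T> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  budget.recordRequest();
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up when out of attempts or when the shared budget is spent.
      if (attempt >= opts.maxRetries || !budget.tryAcquireRetry()) throw err;
      await sleep(backoffMs(attempt, opts.baseMs, opts.maxMs));
    }
  }
}
```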

Circuit breakers and outlier ejection. When a dependency is failing, stop calling it for a cooldown so it can recover and so your workers are not all blocked on a corpse. Envoy provides this declaratively via connection-pool limits plus outlier detection that ejects hosts returning consecutive 5xx:

clusters:
  - name: reviews-service
    circuit_breakers:
      thresholds:
        - { max_connections: 200, max_pending_requests: 100, max_requests: 200 }
    outlier_detection:
      consecutive_5xx: 5
      interval: 5s
      base_ejection_time: 30s
      max_ejection_percent: 50      # never eject the whole cluster

max_ejection_percent: 50 is the guardrail that prevents outlier detection from ejecting the entire upstream and converting a partial degradation into a total one. Pair every breaker with a fallback in the BFF – when reviews are down, the product page renders with reviews: { count: 0, average: null } rather than a 500.
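Envoy's breaker protects the connection pool, but the fallback itself has to live in BFF code. A minimal in-process sketch of that pairing, assuming a simple consecutive-failure breaker (closed, open, half-open) with an injected clock; `CircuitBreaker` is a hypothetical class whose defaults intentionally mirror `consecutive_5xx: 5` and `base_ejection_time: 30s` above. While open, the upstream is not called at all and the fallback is returned immediately.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAtMs = 0;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold = 5, cooldownMs = 30_000, now: () => number = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  async call<T>(fn: () => Promise<T>, fallback: T): Promise<T> {
    if (this.state === "open") {
      // Still cooling down: skip the upstream entirely.
      if (this.now() - this.openedAtMs < this.cooldownMs) return fallback;
      // Cooldown elapsed: allow trial calls (concurrent probes are not capped here).
      this.state = "half-open";
    }
    try {
      const value = await fn();
      this.state = "closed";
      this.failures = 0;
      return value;
    } catch {
      this.failures++;
      // A failed probe, or too many consecutive failures, (re)opens the breaker.
      if (this.state === "half-open" || this.failures >= this.threshold) {
        this.state = "open";
        this.openedAtMs = this.now();
      }
      return fallback;
    }
  }

  get currentState(): BreakerState {
    return this.state;
  }
}
```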

6 – API versioning strategies and managing breaking-change rollout

Versioning is a contract-evolution problem, not a URL problem. The cardinal rule: additive changes are not breaking; everything else is. Adding an optional field, a new endpoint, a new enum value the client is told to ignore – safe. Removing a field, renaming one, tightening validation, changing a type or the meaning of a value – breaking. Engineer for the first category so you rarely need the second.

The pragmatic strategy for an external HTTP API is URI major versioning (/v1, /v2) for breaking changes, combined with strict additive evolution inside a major version. URI versioning is explicit, cache-friendly, trivially routable at the gateway, and unambiguous in logs. Header/media-type versioning (Accept: application/vnd.shop.v2+json) is more RESTful-purist but harder to test, cache, and debug at the edge – a real cost on a public surface.

Rolling out v2 without a coordinated big-bang:

  1. Run v1 and v2 side by side. Route by prefix at the gateway to separate deployments (or one service that serves both contracts). Never break v1 to ship v2.
  2. Build v2 as an adapter over the same domain where possible. Reshaping output (renamed/restructured fields) is a translation layer; you rarely need to fork the whole service.
  3. Deprecate with signals, not surprises. Emit the standard Deprecation and Sunset HTTP headers so clients learn programmatically that an endpoint is going away and when:

     HTTP/1.1 200 OK
     Deprecation: true
     Sunset: Sat, 31 Oct 2026 23:59:59 GMT
     Link: <https://api.shop.example.com/v2/products/42>; rel="successor-version"

  4. Instrument adoption. Tag every request with its version and emit per-version, per-client traffic so you can prove no one is still on v1 before you delete it. You retire a version on data, not a calendar promise.
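Retiring "on data" requires per-version, per-client counts you can query. A minimal in-memory sketch, assuming every request has already been tagged with its API version and a client identifier (for example an API key or OAuth client ID); in production these counts would come from your metrics backend rather than process memory. `VersionAdoption`, `clientsOn`, and `safeToRetire` are hypothetical names.

```typescript
interface Hit {
  version: string;
  clientId: string;
  atMs: number;
}

class VersionAdoption {
  private readonly hits: Hit[] = [];

  record(version: string, clientId: string, atMs: number): void {
    this.hits.push({ version, clientId, atMs });
  }

  // Clients still calling a version since a cut-off, busiest first.
  clientsOn(version: string, sinceMs: number): { clientId: string; calls: number }[] {
    const counts = new Map<string, number>();
    for (const h of this.hits) {
      if (h.version === version && h.atMs >= sinceMs) {
        counts.set(h.clientId, (counts.get(h.clientId) ?? 0) + 1);
      }
    }
    return Array.from(counts, ([clientId, calls]) => ({ clientId, calls }))
      .sort((a, b) => b.calls - a.calls);
  }

  // Retire only when the observation window shows zero traffic on the version.
  safeToRetire(version: string, sinceMs: number): boolean {
    return this.clientsOn(version, sinceMs).length === 0;
  }
}
```

The `clientsOn` list doubles as the outreach list: it names exactly which consumers still need to migrate.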

Schema discipline beats version proliferation. With Protobuf, never reuse or renumber a field tag and never change a field’s type – those are wire-breaking. Add new fields with new tags; mark removed ones reserved. With JSON, make clients default-tolerant (ignore unknown fields) so the server can add fields freely. Most “we need a new version” requests evaporate once the change is genuinely additive.
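On the consuming side, "default-tolerant" means reading only the fields you use and mapping enum values you do not recognize to a safe default instead of failing. A minimal sketch, assuming a hypothetical stock payload whose `status` enum the server may extend later; `parseStock` and the `UNKNOWN` fallback are illustrative.

```typescript
type StockStatus = "IN_STOCK" | "OUT_OF_STOCK" | "UNKNOWN";
const KNOWN_STATUSES: readonly string[] = ["IN_STOCK", "OUT_OF_STOCK"];

interface Stock {
  sku: string;
  status: StockStatus;
}

// Reads only the fields this client needs: extra fields are ignored, and an
// unrecognized enum value degrades to UNKNOWN instead of throwing.
function parseStock(json: string): Stock {
  const raw = JSON.parse(json) as Record<string, unknown>;
  const sku = raw.sku;
  if (typeof sku !== "string") throw new Error("sku is required");
  const rawStatus = raw.status;
  const status: StockStatus =
    typeof rawStatus === "string" && KNOWN_STATUSES.includes(rawStatus)
      ? (rawStatus as StockStatus)
      : "UNKNOWN";
  return { sku, status };
}
```

Required fields still fail loudly; only additive changes (new fields, new enum values) are absorbed.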

7 – Avoiding the chatty gateway and the distributed monolith trap

Two failure modes recur. The chatty gateway is one where the BFF calls a service, gets a list of IDs, then loops making one call per ID – a classic N+1 across the network. A cart page becomes one call for the cart plus 50 calls for its line items. Fix it by giving services batch endpoints so the BFF asks once for many:

// Bad: N+1 across the network, one request per ID.
const slowItems = await Promise.all(ids.map((id) => fetchJson(`${CATALOG}/items/${id}`)));

// Good: one batched call. The service does the set lookup efficiently.
const items = await fetchJson(`${CATALOG}/items:batchGet`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ ids }),
});

The distributed monolith trap is subtler and more dangerous: services so tightly coupled that you cannot deploy one without deploying its neighbors in lockstep. The edge is where this leaks in. If your BFF assumes the exact internal shape of three services and any of their changes forces a synchronized BFF release, you have a monolith with network latency between its functions – the worst of both worlds. Defend against it by treating every BFF-to-service call as a versioned contract validated in CI (consumer-driven contract tests with a tool like Pact), and by keeping the BFF tolerant of additive upstream changes. Independent deployability is the property that makes microservices worth their cost; if you have lost it, simplify the topology rather than adding gateways.

A related smell is a gateway or BFF that makes synchronous calls in series to satisfy one request, so that p99 latency becomes the sum of a long chain. Aggregate concurrently (section 3), cache aggressively at the edge for read-heavy data, and push genuinely slow work to async patterns (return 202 with a status URL) rather than holding an edge worker for seconds.
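The 202-plus-status-URL pattern can be sketched without any framework. A minimal in-memory version, assuming a hypothetical `/v1/jobs/{id}` status route and an injected ID generator; a real implementation would persist jobs in a durable store and run the work on a queue consumer rather than inside the request process. The handler answers immediately with a `Location` the client polls.

```typescript
type JobState<T> =
  | { status: "pending" }
  | { status: "done"; result: T }
  | { status: "failed"; error: string };

interface Accepted {
  status: 202;
  headers: { Location: string };
}

class JobRegistry<T> {
  private readonly jobs = new Map<string, JobState<T>>();
  private readonly nextId: () => string;

  constructor(nextId: () => string) {
    this.nextId = nextId;
  }

  // Start the slow work without awaiting it, and answer 202 right away.
  submit(work: () => Promise<T>): Accepted {
    const id = this.nextId();
    this.jobs.set(id, { status: "pending" });
    work().then(
      (result) => this.jobs.set(id, { status: "done", result }),
      (err) => this.jobs.set(id, { status: "failed", error: String(err) }),
    );
    return { status: 202, headers: { Location: `/v1/jobs/${id}` } };
  }

  // What GET /v1/jobs/{id} would return; undefined maps to a 404.
  poll(id: string): JobState<T> | undefined {
    return this.jobs.get(id);
  }
}
```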

8 – Observability and security hardening for the public API surface

The public edge is your most-attacked, most-observed surface. Treat it as such.

Observability. Propagate W3C traceparent so a single product-page request is one distributed trace spanning the BFF and all four downstreams – this is how you find which fan-out leg blew the p99. Emit RED metrics (Rate, Errors, Duration) per route, per version, and per client. A query like this (KQL, Azure Monitor / Application Insights) surfaces the routes degrading fastest:

requests
| where timestamp > ago(1h)
| extend route = tostring(customDimensions["route"]),
         apiVersion = tostring(customDimensions["x-edge-version"])
| summarize p99 = percentile(duration, 99),
            errRate = 100.0 * countif(success == false) / count(),
            calls = count()
  by route, apiVersion, bin(timestamp, 5m)
| where calls > 50 and (p99 > 800 or errRate > 1.0)
| order by p99 desc
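Propagating trace context means forwarding the incoming trace ID while replacing the parent-id with the ID of the BFF's own span. A minimal sketch of the W3C Trace Context `traceparent` format (`version-traceid-parentid-flags`, lowercase hex), handling only version `00` and assuming an injectable span-ID generator; in practice an OpenTelemetry SDK does this for you, so treat the hypothetical `childTraceparent` as an illustration rather than something to hand-roll.

```typescript
interface TraceContext {
  traceId: string; // 32 hex chars
  parentId: string; // 16 hex chars
  flags: string; // 2 hex chars, "01" = sampled
}

const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string | undefined): TraceContext | null {
  const m = header === undefined ? null : TRACEPARENT.exec(header.trim());
  if (!m) return null;
  const [, traceId, parentId, flags] = m;
  // All-zero trace or parent IDs are invalid per the spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { traceId, parentId, flags };
}

// Math.random is not cryptographically strong; acceptable for this sketch.
function randomHex(bytes: number): string {
  let out = "";
  for (let i = 0; i < bytes; i++) out += Math.floor(Math.random() * 256).toString(16).padStart(2, "0");
  return out;
}

// Header to send downstream: same trace, this hop's span as the new parent.
// Without a valid inbound header, start a new sampled trace (an assumed policy).
function childTraceparent(inbound: string | undefined, newSpanId: () => string = () => randomHex(8)): string {
  const ctx = parseTraceparent(inbound);
  const traceId = ctx ? ctx.traceId : randomHex(16);
  const flags = ctx ? ctx.flags : "01";
  return `00-${traceId}-${newSpanId()}-${flags}`;
}
```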

Security hardening. The non-negotiable list for a public API edge:

Verify

Prove the edge behaves before you call it done.

# 1. AuthN is enforced at the gateway: no token -> 401, never reaches the BFF.
curl -s -o /dev/null -w '%{http_code}\n' https://api.shop.example.com/v1/products/42
# expect: 401

# 2. Aggregation degrades, not fails: kill the reviews service, page still renders.
curl -s -H "Authorization: Bearer $TOKEN" \
  https://api.shop.example.com/v1/products/42 | jq '{price, reviews, _partial}'
# expect: a 200 body with reviews defaulted and "_partial": true

# 3. Rate limit fires per key and returns Retry-After.
for i in $(seq 1 700); do
  curl -s -o /dev/null -w '%{http_code} ' -H "x-api-key: $KEY" \
    https://api.shop.example.com/v1/products/42
done | tr ' ' '\n' | sort | uniq -c
# expect: a mix of 200 and 429 once the per-minute budget is exhausted

# 4. Deprecation signalling on v1 is present.
curl -sI -H "Authorization: Bearer $TOKEN" \
  https://api.shop.example.com/v1/products/42 | grep -iE 'deprecation|sunset'
# expect: Deprecation: true and a Sunset date

# 5. Circuit breaker ejects a failing upstream (observe via gateway stats).
#    With reviews returning 5xx, confirm hosts are ejected and recover after cooldown.
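curl -s http://envoy-admin:9901/clusters | grep 'reviews-service.*failed_outlier_check'
curl -s http://envoy-admin:9901/stats | grep 'cluster.reviews-service.outlier_detection.ejections_active'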

Enterprise scenario

A retail platform team ran a single shared gateway with body-level aggregation expressed in Lua filters inside Envoy – the gateway itself fanned out to product, pricing, and inventory and merged the JSON. It worked until Black Friday traffic exposed two problems. First, the aggregation logic was untestable: a change to the pricing payload shape broke the product page, but the logic lived in gateway config owned by the platform team, not the frontend team, so no CI pipeline exercised it and the break surfaced only in production. Second, mobile and web shared that one merged response; mobile was over-fetching 28 KB per product on cellular, and the web team could not change the shape without risking the mobile app.

The constraint was hard: they could not take a maintenance window during peak season, and the mobile app in the field expected the old shape until users updated.

They moved aggregation out of the gateway and into two thin BFFs – one web, one mobile – owned by the respective frontend teams, with the gateway reduced to TLS, JWT validation, rate limiting, and routing only. The mobile BFF returned a trimmed payload; the web BFF kept the rich one. Crucially, they did not break the old contract: the legacy merged endpoint stayed live behind /v1 while the new BFFs served /v2, and they emitted Deprecation/Sunset headers plus per-version traffic metrics to watch migration. Each downstream call became a Pact contract test in CI, so an upstream shape change failed the build instead of the product page. The decisive part of the fix was the resilience config on the new mobile BFF – one slow inventory call had been sinking the whole page, so they made it degradable:

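# Mobile BFF -> inventory: fast connect failure, breaker, and outlier ejection.
# connect_timeout bounds only the TCP handshake; the per-request timeout comes
# from the route or the BFF deadline, and the fallback that degrades the stock
# badge (not the whole page) lives in the mobile BFF code.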
clusters:
  - name: inventory
    connect_timeout: 0.2s
    circuit_breakers:
      thresholds:
        - { max_pending_requests: 64, max_requests: 256 }
    outlier_detection:
      consecutive_5xx: 5
      base_ejection_time: 30s
      max_ejection_percent: 40

The product page p99 dropped from 1.9 s to 410 ms, mobile payload fell from 28 KB to 6 KB, and they retired /v1 ninety days later only after the metrics showed zero traffic on it – on data, not a guess.
