A US health-insurance payer — the kind that processes tens of millions of claims a year and lives inside HIPAA, HITRUST, and a stack of state regulators — gets a finding from its annual assessment that lands like a brick on the network team’s desk: traffic between application tiers inside its AWS footprint is essentially unwatched. North-south, at the internet edge, there are firewalls and a WAF. But east-west — the claims API talking to the eligibility service, the member portal reaching into the rating engine, a partner integration VPC reaching back into core — moves between subnets and accounts with nothing inspecting the payload. The auditors’ phrasing is precise and damning: “lateral movement would be undetected.” The security leadership turns it into a mandate. Every flow that crosses a trust boundary must pass through a real next-generation inspection stack — IPS, TLS visibility, an application firewall — and it must do so transparently, without re-IP-ing a single workload, without asking forty application teams to change a route table, and without becoming the new reason the claims pipeline is down at 2 a.m.
That last clause is where most designs die. The historical way to force traffic through an inspection appliance is to make the appliance a routed hop: a next-hop in a route table, a NAT, a bump-in-the-wire that mangles addresses. It works for one VPC and becomes a topology nightmare across fifty accounts. The flows that should be inspected get policy-routed around it the first time it causes a latency complaint, and the appliance becomes a single fragile chokepoint that the on-call team is terrified to patch. The payer needs the opposite: inspection that is mandatory by construction, horizontally scalable, and decoupled from the appliance’s own lifecycle so a BIG-IP can be drained, upgraded, and returned without anyone in claims noticing. The pattern that delivers exactly this on AWS is a fleet of F5 BIG-IP virtual appliances behind a Gateway Load Balancer (GWLB), reached over GENEVE encapsulation, in a dedicated security VPC that every other VPC routes through. This article is the reference architecture for building it in a way that a payer’s security and network teams will both sign off on.
Why the obvious approaches fail
Three shortcuts get proposed on every project like this, and naming why each fails saves a quarter of arguing.
Make the firewall a routed next-hop in each VPC. This is the traditional bump-in-the-wire: point the subnet route table at the appliance’s ENI. It re-IPs return paths, forces source NAT to keep flow symmetry, and ties every application VPC’s routing to the health of one appliance. Scaling means manually carving traffic across appliances, and a single failed instance blackholes whatever was pinned to it. It does not survive contact with fifty accounts.
Use the AWS-native managed firewall everywhere and call it done. AWS Network Firewall is excellent and belongs in the picture, but the payer already owns deep F5 expertise, a library of F5 ASM (Advanced WAF) policies tuned over years for its claims and member-facing applications, and signatures it cannot simply abandon. “Rip out F5 and relearn everything” is not an architecture decision; it is a multi-year retraining project the audit deadline does not allow. The win is bringing the existing BIG-IP capability into AWS as a first-class, scalable service — not replacing the team’s hard-won policy with a generic one.
TAP/mirror the traffic to an out-of-band sensor. Traffic mirroring (or a packet broker feeding an IDS) gives you detection but not prevention — a mirror is a copy, so the malicious packet has already been delivered by the time the sensor sees it. The auditors asked for the ability to stop lateral movement, not merely to log it after the fact. Out-of-band is a complement, never the control.
GWLB threads the needle the others cannot. It is a transparent bump-in-the-wire that operates at Layer 3, presents a single endpoint (a GWLB Endpoint, or GWLBE) that consuming VPCs route to like any other gateway, and load-balances flows across a fleet of appliances with flow stickiness so both directions of a connection always hit the same BIG-IP. The appliances live behind it, register and deregister with health checks, and can be scaled or patched without the consumers knowing. Inspection becomes a property of the path, enforced by routing, not a hope that every team configured their VPC correctly.
Architecture overview
The topology has one organizing idea: a centralized security VPC owns the BIG-IP fleet and the Gateway Load Balancer, and every other VPC (application spokes, the partner integration VPC, the egress VPC) reaches it through AWS Transit Gateway and a GWLB Endpoint placed inside an inspection subnet in that spoke. Traffic that needs scrubbing is steered into the GWLBE by route tables; the GWLBE is a PrivateLink-powered Gateway Load Balancer endpoint (its own endpoint type, not an interface endpoint) that acts as a route-table target and hands the entire original packet to the load balancer.
The mechanism that makes this transparent is GENEVE (Generic Network Virtualization Encapsulation) on UDP 6081. When a flow arrives at the GWLBE, GWLB wraps the original Layer-3 packet — addresses, ports, payload, untouched — inside a GENEVE tunnel and forwards it to a chosen BIG-IP target. The original packet rides inside as payload, so the appliance sees the real source and destination, not a NATed substitute. GWLB attaches GENEVE TLV metadata (the endpoint ID, flow cookie) so the appliance and the load balancer agree on which flow this is. The BIG-IP decapsulates, runs the packet through its full inspection stack — IPS signatures, ASM/Advanced WAF application-layer policy, optional TLS decryption and re-encryption, malware and DLP modules — and if the verdict is allow, re-encapsulates the (possibly modified) packet back into GENEVE and returns it to GWLB, which forwards it onward to its original destination. If the verdict is deny, the appliance simply drops it. Nothing downstream is re-addressed; the workloads never learn an inspector exists.
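The framing described above can be sketched in a few lines. This is a minimal illustration, assuming the header layout from RFC 8926; the option class and type numbers are assumptions based on AWS's published appliance-integration guidance and should be checked against current documentation, and the function names are hypothetical. It is not how GWLB or BIG-IP implement the tunnel.

```python
import struct

GENEVE_UDP_PORT = 6081
ETHERTYPE_IPV4 = 0x0800
# Assumption: AWS option class and option types for GWLB metadata.
AWS_OPTION_CLASS = 0x0108
OPT_GWLBE_ID, OPT_ATTACHMENT_ID, OPT_FLOW_COOKIE = 1, 2, 3


def pack_option(opt_class: int, opt_type: int, data: bytes) -> bytes:
    """Encode one GENEVE TLV; the length field counts 4-byte words of data."""
    if len(data) % 4 or len(data) // 4 > 31:
        raise ValueError("option data must be 4-byte aligned and at most 124 bytes")
    return struct.pack("!HBB", opt_class, opt_type, len(data) // 4) + data


def encapsulate(inner_packet: bytes, vni: int, options: bytes) -> bytes:
    """Prefix the untouched inner packet with a GENEVE header."""
    first = (0 << 6) | (len(options) // 4)  # version 0, option length in words
    header = struct.pack("!BBH", first, 0, ETHERTYPE_IPV4)
    header += struct.pack("!I", vni << 8)   # 24-bit VNI followed by 8 reserved bits
    return header + options + inner_packet


def decapsulate(payload: bytes):
    """Return (vni, {option_type: data}, inner_packet) from a GENEVE UDP payload."""
    first, _flags, _proto, vni_word = struct.unpack("!BBHI", payload[:8])
    opt_len = (first & 0x3F) * 4
    options, offset = {}, 8
    while offset < 8 + opt_len:
        _cls, opt_type, length = struct.unpack("!HBB", payload[offset:offset + 4])
        size = (length & 0x1F) * 4
        options[opt_type] = payload[offset + 4:offset + 4 + size]
        offset += 4 + size
    return vni_word >> 8, options, payload[8 + opt_len:]
```

The point of the sketch is that the inner packet is carried byte for byte: whatever comes out of `decapsulate` is exactly what went into `encapsulate`, which is why the appliance sees real source and destination addresses.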
Following an east-west flow, control and data together:
- The claims API in the claims-prod VPC initiates a connection to the eligibility service in the core-prod VPC. Both attach to Transit Gateway.
- TGW route tables are built so that inter-VPC traffic is sent to the security VPC’s inspection attachment first — this is the appliance-mode TGW attachment that guarantees flow symmetry, so the return packet comes back to the same Availability Zone’s appliance.
- Inside the spoke’s inspection subnet, the GWLB Endpoint receives the packet and GWLB GENEVE-encapsulates it to a healthy F5 BIG-IP target in that AZ, with flow stickiness based on the 5-tuple.
- The BIG-IP decapsulates, runs IPS + ASM; for member-facing or API flows it can terminate TLS using keys it pulls from HashiCorp Vault (so private keys never live on the appliance’s disk), inspect cleartext, then re-encrypt.
- On allow, the appliance re-encapsulates and hands the packet back to GWLB, which forwards it to the eligibility service. On deny, it is dropped and a log event fires.
- The return path retraces the same AZ’s appliance because of appliance-mode symmetry; the connection completes with full bidirectional inspection and the workloads none the wiser.
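The stickiness property the walk-through relies on, that both directions of a connection reach the same appliance, can be illustrated with a direction-independent hash. This is a sketch under stated assumptions: the hashing scheme, the `FiveTuple` type, and the appliance names are hypothetical and are not AWS's actual selection algorithm.

```python
import hashlib
from typing import NamedTuple, Sequence


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # 6 = TCP, 17 = UDP

    def reply(self) -> "FiveTuple":
        """The same connection as seen on the return path."""
        return FiveTuple(self.dst_ip, self.src_ip, self.dst_port, self.src_port, self.protocol)


def flow_key(flow: FiveTuple) -> bytes:
    """Order the two endpoints so request and reply produce the same key."""
    lo, hi = sorted([(flow.src_ip, flow.src_port), (flow.dst_ip, flow.dst_port)])
    return f"{lo}|{hi}|{flow.protocol}".encode()


def pick_appliance(flow: FiveTuple, healthy: Sequence[str]) -> str:
    """Map a flow onto one healthy appliance, identically for both directions."""
    digest = hashlib.sha256(flow_key(flow)).digest()
    return healthy[int.from_bytes(digest[:8], "big") % len(healthy)]
```

For example, a claims-to-eligibility request and its reply resolve to the same target:

```python
request = FiveTuple("10.1.0.10", "10.2.0.20", 49152, 443, 6)
fleet = ["bigip-az1-a", "bigip-az1-b"]  # hypothetical appliance names
assert pick_appliance(request, fleet) == pick_appliance(request.reply(), fleet)
```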
North-south and egress ride the same fleet: internet-bound traffic from the spokes is steered through the security VPC’s GWLBE before it reaches the egress NAT/IGW path, so the same BIG-IP policy that watches east-west also scrubs outbound — catching, for instance, a compromised host beaconing to a command-and-control domain.
Component breakdown
| Layer | Service / tool | Role in the platform | Key configuration choices |
|---|---|---|---|
| Steering fabric | AWS Transit Gateway | Connects all VPCs; forces inter-VPC traffic through the security VPC | Appliance-mode on the inspection attachment for flow symmetry; separate route tables per security zone |
| Transparent LB | Gateway Load Balancer | Single endpoint; GENEVE tunneling; flow-sticky distribution across appliances | Cross-zone off (keep flow in-AZ); 5-tuple stickiness; deregistration delay tuned for drains |
| Consumer hook | GWLB Endpoint (GWLBE) | The “gateway” each spoke routes to; PrivateLink-style entry into GWLB | One GWLBE per AZ per consuming VPC; route tables point at it |
| Inspection appliances | F5 BIG-IP (virtual edition) | IPS, ASM/Advanced WAF, TLS visibility, DLP — the actual scrubbing | Auto Scaling group across AZs; GENEVE/UDP 6081 listener; health-check VIP |
| App firewall policy | F5 ASM (Advanced WAF) | Layer-7 protection for claims/member APIs | OWASP Top-10 signatures; bot defense; per-app policy from existing on-prem library |
| Secrets / TLS keys | HashiCorp Vault | Issues and rotates the private keys used for TLS decryption | Appliance authenticates via IAM auth method; short-lived leases; keys never on disk |
| Cloud posture | Wiz / Wiz Code | Detects any route that bypasses inspection; scans the IaC before it ships | Attack-path analysis on TGW/route tables; Wiz Code gate on Terraform PRs |
| Runtime security | CrowdStrike Falcon | Endpoint/runtime protection on the BIG-IP host and protected workloads | Sensor on appliance host OS and EC2 fleets; detections to the SOC |
| Observability | Datadog / Dynatrace | Flow telemetry, appliance health, latency-added budgets, traces | VPC Flow Logs and GWLB CloudWatch metrics ingested; per-AZ appliance dashboards; SLO on added latency |
| ITSM / change | ServiceNow | Change gates for policy edits and fleet upgrades; auto-incidents on drops | CAB approval before an ASM policy change; ticket on inspection-bypass alert |
| CI / IaC | GitHub Actions + Terraform | Builds the network and appliance fleet as code; Argo CD for policy GitOps | OIDC to AWS (no stored creds); plan/apply gates; Ansible for BIG-IP config |
| Config management | Ansible | Declarative BIG-IP provisioning (VIPs, profiles, policies) at scale | f5networks collection; idempotent runs against the Auto Scaling fleet |
A few of these choices are the ones teams get wrong, so they deserve the why.
Why appliance-mode on the Transit Gateway attachment is non-negotiable. Stateful inspection requires that both directions of a flow traverse the same appliance: a BIG-IP that saw the SYN must see the SYN-ACK, or the connection table desyncs and the firewall drops legitimate traffic. Without appliance mode, TGW keeps each packet in the Availability Zone it arrived from, so when client and server sit in different AZs the forward and return paths enter the security VPC in different AZs and land on different appliances. Turning on appliance mode tells TGW to pick one AZ in the security VPC per flow and hold it for the life of that flow, which combined with GWLB’s in-AZ stickiness keeps each connection pinned to one inspector end to end. Skip this and you will spend a week debugging “random” asymmetric-flow drops.
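A toy model makes the asymmetry visible. It is a deliberately simplified sketch: the two selection functions are assumptions that approximate ingress-AZ affinity and per-flow hashing, and the AZ names and types are hypothetical. It does not reproduce Transit Gateway's real algorithm.

```python
import hashlib
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    ingress_az: str  # AZ from which the packet enters Transit Gateway


def az_without_appliance_mode(pkt: Packet, inspection_azs: Sequence[str]) -> str:
    """Modelled default: stay in the packet's own ingress AZ when possible."""
    return pkt.ingress_az if pkt.ingress_az in inspection_azs else inspection_azs[0]


def az_with_appliance_mode(pkt: Packet, inspection_azs: Sequence[str]) -> str:
    """Modelled appliance mode: a direction-independent hash picks one AZ per flow."""
    key = "|".join(sorted([pkt.src_ip, pkt.dst_ip])).encode()
    index = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(inspection_azs)
    return inspection_azs[index]


azs = ["use1-az1", "use1-az2"]
request = Packet("10.1.0.10", "10.2.0.20", ingress_az="use1-az1")
reply = Packet("10.2.0.20", "10.1.0.10", ingress_az="use1-az2")

# Without appliance mode the two directions are inspected in different AZs.
asymmetric = az_without_appliance_mode(request, azs) != az_without_appliance_mode(reply, azs)
# With appliance mode both directions agree on one AZ.
symmetric = az_with_appliance_mode(request, azs) == az_with_appliance_mode(reply, azs)
```

In this model `asymmetric` and `symmetric` are both true, which is the whole argument for the setting: the first case is the one that desyncs a stateful appliance's connection table.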
Why GENEVE rather than a routed next-hop. GENEVE preserves the original packet verbatim and carries flow metadata in TLVs, so the appliance inspects real addresses and GWLB can guarantee stickiness without NAT. The alternative — making each appliance a routed hop — forces source NAT to keep symmetry, which destroys the very source-IP visibility the auditors want and couples every consumer’s routing to the appliance’s health. GENEVE keeps inspection a transparent overlay.
Why keep TLS keys in Vault, not on the appliance. Inspecting TLS means the BIG-IP terminates and re-originates the session, which requires the server’s private key. Baking keys onto the appliance image turns every snapshot and every scaled-out instance into a key-exfiltration risk. Instead the appliance authenticates to HashiCorp Vault (IAM auth method, bound to the instance role) and pulls short-lived key material at boot, with leases that expire and rotate centrally. A decommissioned appliance carries no usable secret. This is the difference between “we can decrypt for inspection” and “we have scattered our crown-jewel keys across an Auto Scaling group.”
Failure modes and high availability
The whole point of the GWLB pattern is that the appliance is no longer the fragile thing. Design the failure behavior explicitly.
Appliance failure. Each BIG-IP exposes a health-check VIP that GWLB probes. On failure, GWLB stops sending new flows to that target, and you must decide the fate of existing flows explicitly: depending on the target group’s failover setting, they either stay pinned to the dead appliance until they reset or time out, or are rebalanced onto a healthy peer that holds no state for them. With cross-zone load balancing disabled, new flows in that AZ go to a healthy peer in the same AZ. Run at least two appliances per AZ across at least two AZs, managed by an Auto Scaling group, so a single instance or even a full AZ loss degrades capacity without losing the control.
The “fail-open vs fail-closed” decision is a policy call, not a default. If every appliance in an AZ is unhealthy, do flows in that AZ bypass inspection (fail-open, available but unguarded) or stop (fail-closed, guarded but down)? For a HIPAA payer, some zones — the partner integration VPC, anything PHI-bearing — are fail-closed: better an outage than uninspected lateral movement. Lower-sensitivity zones may fail-open to preserve availability. You encode this in how the spoke route tables and TGW routes behave when the GWLBE has no healthy backends, and you document it so the on-call engineer is not improvising at 2 a.m.
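One way to keep this decision from living only in people's heads is to write the per-zone policy down as data. The following is a minimal sketch, assuming a hypothetical zone-to-policy table and hypothetical action names; how the resulting action is actually enforced in route tables is outside its scope.

```python
from enum import Enum


class FailMode(Enum):
    OPEN = "fail-open"      # available but unguarded
    CLOSED = "fail-closed"  # guarded but down


# Hypothetical zone names and assignments, for illustration only.
ZONE_POLICY = {
    "partner-integration": FailMode.CLOSED,
    "claims-prod": FailMode.CLOSED,
    "dev-sandbox": FailMode.OPEN,
}


def route_action(zone: str, healthy_appliances: int) -> str:
    """Return what should happen to flows in a zone given appliance health."""
    if healthy_appliances > 0:
        return "inspect"
    # Zones missing from the table default to fail-closed, the safer choice.
    mode = ZONE_POLICY.get(zone, FailMode.CLOSED)
    return "bypass" if mode is FailMode.OPEN else "drop"
```

Defaulting unknown zones to fail-closed is a design choice in this sketch: a newly onboarded VPC that nobody classified should not silently bypass inspection.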
Asymmetric routing is the classic silent killer — covered above; appliance mode plus in-AZ stickiness is the cure, and a synthetic flow test in the pipeline is how you keep it cured after every change.
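MTU and fragmentation. GENEVE adds tunnel overhead: the outer IPv4, UDP, and fixed GENEVE headers total 36 bytes, and the metadata TLVs add more on top. If you do not account for it, large packets fragment or get black-holed and you get baffling “works for small requests, hangs on big uploads” reports. AWS supports jumbo frames within a VPC, but the smallest MTU on the path governs, and Transit Gateway and GWLB have documented limits below the in-VPC maximum. Size the inspection path’s MTU deliberately and test with full-size payloads.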
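The overhead arithmetic is simple enough to encode and check. In this sketch the fixed header sizes come from the IPv4, UDP, and GENEVE (RFC 8926) specifications; the option length and the MTU values in the usage example are assumptions to replace with figures measured or documented for your own path.

```python
OUTER_IPV4_BYTES = 20   # outer IPv4 header without IP options
OUTER_UDP_BYTES = 8
GENEVE_BASE_BYTES = 8   # fixed GENEVE header


def geneve_overhead(option_bytes: int) -> int:
    """Total bytes added in front of the original packet."""
    if option_bytes < 0 or option_bytes % 4:
        raise ValueError("GENEVE options are a non-negative multiple of 4 bytes")
    return OUTER_IPV4_BYTES + OUTER_UDP_BYTES + GENEVE_BASE_BYTES + option_bytes


def max_inner_packet(appliance_mtu: int, option_bytes: int) -> int:
    """Largest original packet that fits unfragmented on the appliance link."""
    return appliance_mtu - geneve_overhead(option_bytes)


def required_appliance_mtu(inner_packet_bytes: int, option_bytes: int) -> int:
    """MTU the appliance interface needs to carry a given original packet size."""
    return inner_packet_bytes + geneve_overhead(option_bytes)
```

As a usage example, with an assumed 32 bytes of metadata options the overhead is `geneve_overhead(32)`, or 68 bytes, so carrying 1500-byte original packets needs an appliance MTU of at least `required_appliance_mtu(1500, 32)`, or 1568.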
Scaling
The fleet scales horizontally, which is the property the old routed-hop design never had.
| Scaling lever | What it addresses | How |
|---|---|---|
| Auto Scaling the BIG-IP fleet | Aggregate throughput / connection count | Scale on appliance CPU, concurrent connections, and GENEVE flow count; warm pool to cut boot time |
| Adding AZs | Resilience and locality | More GWLB targets and GWLBEs; each AZ self-contained for flow symmetry |
| Right-sizing the instance | Per-flow inspection cost (TLS is expensive) | TLS decryption is CPU-heavy; size for the decrypted throughput, not the wire rate |
| Selective inspection | Avoid paying to inspect benign bulk | Route only trust-boundary-crossing flows through the GWLBE; keep intra-tier chatter local |
The subtle cost trap is TLS decryption: terminating and re-encrypting at line rate can cut an appliance’s effective throughput by more than half versus passthrough. Size instances against the cleartext rate you actually need to inspect, and be deliberate about which flows are decrypted — you rarely need to crack open every internal call, only those crossing into or out of sensitive zones.
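A back-of-the-envelope sizing calculation shows how quickly decryption changes the fleet size. This is a sketch under stated assumptions: the blended-throughput model, the parameter names, and every number in the usage example are hypothetical, and real sizing should come from load tests of the chosen instance type.

```python
import math


def appliances_needed(
    required_gbps: float,
    passthrough_gbps_per_appliance: float,
    decrypted_fraction: float,
    tls_throughput_factor: float,
    azs: int,
    spare_per_az: int = 1,
) -> int:
    """Estimate fleet size when part of the traffic is TLS-decrypted.

    tls_throughput_factor is the share of passthrough throughput an appliance
    retains while decrypting (e.g. 0.4 means a 60 percent cut).
    """
    if not 0.0 <= decrypted_fraction <= 1.0 or not 0.0 < tls_throughput_factor <= 1.0:
        raise ValueError("fractions must be within (0, 1]")
    # Appliance-capacity consumed per Gbps of offered load, blended across
    # passthrough and decrypted traffic.
    passthrough_cost = (1.0 - decrypted_fraction) / passthrough_gbps_per_appliance
    decrypt_cost = decrypted_fraction / (passthrough_gbps_per_appliance * tls_throughput_factor)
    raw_appliances = required_gbps * (passthrough_cost + decrypt_cost)
    per_az = math.ceil(raw_appliances / azs) + spare_per_az
    return per_az * azs
```

With assumed inputs of 20 Gbps offered load, 10 Gbps passthrough per appliance, half the traffic decrypted at a 0.4 factor, and two AZs with one spare each, the estimate is six appliances; with no decryption and 15 Gbps it drops to four.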
Security and governance
The architecture is a security control, so its own posture must be unimpeachable.
- Drift detection on the steering fabric. The entire model rests on route tables sending traffic through the GWLBE. A single fat-fingered route that bypasses inspection silently re-opens the audit finding. Wiz runs attack-path analysis across the TGW and VPC route tables and alerts on any path that reaches a protected workload without traversing the security VPC; Wiz Code gates the Terraform pull request so a bypassing route never merges in the first place.
- Identity for the operators. Network and security engineers reach the BIG-IP management plane and the AWS console through SSO (Okta federated to AWS IAM Identity Center, or Entra ID where the payer’s directory lives) with MFA and just-in-time elevation, so appliance administration is tied to a named human, not a shared `admin` password.
- Runtime protection of the inspectors themselves. CrowdStrike Falcon sensors run on the BIG-IP host OS and the protected EC2 fleets; if an appliance host is tampered with, the SOC sees it. An inspection box that is itself compromised is the worst-case blast radius, so it gets the strongest endpoint posture.
- Change control with teeth. ASM policy edits and fleet upgrades flow through ServiceNow change requests with CAB approval, and any inspection-bypass or fail-closed-drop alert auto-opens an incident. The audit trail — who changed which signature, when, approved by whom — is the artifact the next assessment wants to see.
- Posture as code. Policies are versioned in Git and rolled out via Argo CD (GitOps) and Ansible against the fleet, so the running configuration always matches a reviewed commit and a rollback is a `git revert`, not a midnight console session.
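The drift check on the steering fabric can be approximated with a small pre-merge test over planned routes. This is a simplified sketch, not how Wiz or Wiz Code work: the `Route` shape, the rule that `vpce-` and `tgw-` targets count as inspected, and the sample IDs are all assumptions, and a real check would also model longest-prefix matching and Transit Gateway route tables.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class Route:
    route_table_id: str
    destination_cidr: str
    target: str  # e.g. "vpce-...", "tgw-...", "pcx-...", "igw-...", "local"


# Assumption: routes to a GWLB endpoint or to Transit Gateway are on the
# inspected path; "local" stays inside one VPC and crosses no trust boundary.
INSPECTED_TARGET_PREFIXES = ("vpce-", "tgw-")


def find_bypass_routes(routes: Iterable[Route], protected_cidrs: Sequence[str]) -> List[Route]:
    """Flag routes that can reach a protected CIDR without an inspected target."""
    protected = [ipaddress.ip_network(cidr) for cidr in protected_cidrs]
    flagged = []
    for route in routes:
        if route.target == "local" or route.target.startswith(INSPECTED_TARGET_PREFIXES):
            continue
        destination = ipaddress.ip_network(route.destination_cidr)
        if any(destination.overlaps(net) for net in protected):
            flagged.append(route)
    return flagged
```

Run against a Terraform plan rendered into `Route` objects, a non-empty result would fail the pull request; for instance, a peering route (`pcx-...`) into the eligibility service's CIDR is exactly the kind of shortcut this is meant to catch.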
Observability
You cannot operate an inline control you cannot see. Datadog (or Dynatrace) ingests VPC Flow Logs for the GWLB and endpoint network interfaces, GWLB CloudWatch metrics, BIG-IP appliance metrics, and connection telemetry into per-AZ dashboards. Three signals matter most: appliance health and capacity (CPU, concurrent connections, flow count per instance, so Auto Scaling fires before saturation); added latency (the inline tax: hold an SLO, e.g. p99 inspection overhead under a few milliseconds for passthrough, and alert when TLS-decrypted paths breach budget); and deny/anomaly events (drops, ASM rule hits, IPS signatures), which both feed the SOC and prove to auditors the control is active, not merely deployed. Dynatrace’s anomaly detection on flow patterns is useful for catching the slow drift, such as an appliance quietly degrading, that threshold alerts miss.
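The added-latency SLO reduces to a percentile check over measured inspection overhead. The sketch below uses a nearest-rank percentile; the function names, the sample data, and the budget values are hypothetical, and in practice the monitoring platform would compute this rather than custom code.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample set."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def slo_breached(added_latency_ms: Sequence[float], budget_ms: float, pct: float = 99.0) -> bool:
    """True when the chosen percentile of added latency exceeds the budget."""
    return percentile(added_latency_ms, pct) > budget_ms
```

Using a high percentile rather than the mean matters here: a handful of slow TLS-decrypted flows can breach the budget while the average still looks healthy.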
Cost and explicit tradeoffs
Honest economics, because this is not a free control.
- GWLB bills per hour plus usage-based capacity units, and each GWLB Endpoint bills per hour plus per GB processed, on top of Transit Gateway attachment and data-processing fees and the F5 BIG-IP licensing (BYOL or marketplace hourly) across the fleet. Centralizing the appliances in one security VPC is itself the main cost optimization: you amortize a shared fleet across every spoke instead of standing up firewalls in each account.
- The latency tax is real. Every inspected flow takes an extra hop through TGW and the appliance, and TLS decryption adds materially more. For most claims and portal traffic this is invisible; for a latency-critical internal path you may consciously exclude it from inspection and document the risk acceptance.
- Selective steering is the lever. You are not obligated to inspect every byte. Route only flows crossing trust boundaries through the GWLBE; let intra-application chatter stay local. This is the single biggest knob on both cost and latency.
| Tradeoff | The cost | Why the payer accepts it |
|---|---|---|
| Centralized security VPC | A blast-radius and capacity concentration point | HA fleet across AZs + fail-closed on sensitive zones; one place to govern beats fifty to audit |
| Inline (not out-of-band) | Added latency; appliances on the critical path | Auditors required prevention, not just detection — a mirror cannot stop lateral movement |
| TLS decryption | CPU cost, key-management burden, privacy considerations | PHI-bearing, sensitive-zone flows only; keys in Vault; cleartext visibility is the whole ask |
| F5 over AWS-native firewall | Licensing and operational ownership | Reuses years of tuned ASM policy and deep team expertise; no multi-year relearn under a deadline |
Closing
The audit finding was, at heart, a routing problem dressed as a security one: traffic crossed trust boundaries with nothing in the path to inspect it, and every traditional fix made the inspector a fragile routed hop that teams would eventually engineer around. The GWLB-plus-BIG-IP pattern inverts that. Inspection becomes a property of the path — enforced by Transit Gateway routing and a transparent GENEVE overlay — while the appliances themselves become a stateless-to-the-consumer, horizontally scalable, independently patchable fleet behind a single endpoint. The payer keeps its hard-won F5 ASM policy, gains mandatory east-west and egress scrubbing, and gets the one thing the assessors actually demanded: the ability to stop lateral movement, not just to read about it afterward. Wrap it in Vault-managed keys, Wiz drift detection, Falcon runtime protection, ServiceNow change control, and Datadog visibility, and you have a control a payer’s CISO and network lead will both put their name on — and one the next audit closes instead of reopens.