A payments company is consolidating twelve product VPCs behind a shared-services account and the security architect has one hard requirement: every packet leaving a workload VPC for the internet, and every packet crossing between VPCs, must pass through a real F5 BIG-IP for L4-L7 inspection — TLS visibility, an iRule-driven WAF policy, and IPS — with no single point of failure and no re-IP of the application fleet. The previous design pinned all traffic through a single BIG-IP in one Availability Zone; when that AZ blipped during a maintenance window, the entire egress path went dark and the on-call paged at 3 a.m. This guide rebuilds that inspection layer the way it should have been done the first time: a pair (or more) of BIG-IP Virtual Edition appliances behind an AWS Gateway Load Balancer (GWLB), inspecting transparently in active-active across two AZs, so losing an appliance or a whole zone costs you capacity, not connectivity.
GWLB is the piece that makes this clean. It is a bump-in-the-wire L3 load balancer that uses the GENEVE protocol (UDP 6081) to tunnel original, unmodified packets to a pool of appliances and bring them back, while preserving 5-tuple flow stickiness so both directions of a connection always hit the same BIG-IP — which is exactly what a stateful firewall needs. You insert it with a GWLB endpoint (GWLBE) in each spoke and a few route-table edits; the application VPCs never learn that an inspection layer exists. This is the centralized-inspection pattern AWS documents, built with F5 as the virtual appliance.
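To make the flow-stickiness idea concrete, here is a minimal sketch of a symmetric 5-tuple hash. It is an illustration only, not GWLB's actual algorithm: the function name `pick_appliance`, the appliance names, and the sample addresses are all hypothetical. The point it shows is that sorting the two endpoints before hashing makes both directions of a connection resolve to the same appliance.

```python
import hashlib

def pick_appliance(src_ip, src_port, dst_ip, dst_port, proto, appliances):
    """Hypothetical symmetric 5-tuple hash: both directions pick the same target."""
    # Sort the two endpoints so (A -> B) and (B -> A) build the same key.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return appliances[int.from_bytes(digest[:8], "big") % len(appliances)]

# Illustrative names and addresses, not values from a real deployment.
APPLIANCES = ["bigip-ve-a", "bigip-ve-b"]
forward = pick_appliance("10.1.0.5", 44312, "93.184.216.34", 443, "tcp", APPLIANCES)
reverse = pick_appliance("93.184.216.34", 443, "10.1.0.5", 44312, "tcp", APPLIANCES)
print(forward == reverse)  # request and reply resolve to the same appliance
```

A stateful firewall only sees a complete conversation if this property holds, which is why the rest of the design leans on it.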
Prerequisites
- An AWS account (or a shared inspection/security VPC in a multi-account org via AWS Organizations) and Terraform >= 1.6 with the aws provider >= 5.40. Ansible is optional for in-guest BIG-IP config.
- A subscribed F5 BIG-IP Virtual Edition AMI from AWS Marketplace: either BYOL or a PAYG / hourly bundle that includes LTM + AFM + ASM (you need the firewall and WAF modules, not just LTM). Note the AMI ID per region.
- An EC2 key pair, and an instance type F5 supports for VE: m5.xlarge or larger (BIG-IP VE wants 8 GB+ RAM and at least 4 vCPU for production throughput).
- A grasp of GWLB, GENEVE, VPC route tables, and the fact that GWLB targets are registered by instance ID or IP, health-checked on a TCP/HTTP port you choose.
- CLI access: aws v2 configured, terraform, jq, and SSH. A bastion or SSM access into the management subnet.
- IAM permission to create VPCs, GWLB, VPC endpoints, route tables, EC2 instances, and (for secrets) read access to HashiCorp Vault.
Target topology
The design is a centralized inspection VPC fronted by GWLB, with spoke VPCs steering traffic to it through GWLB endpoints:
- A dedicated inspection VPC (10.100.0.0/16) spanning two AZs (a and b). Each AZ holds three subnets: a management subnet, a data/GENEVE subnet where GWLB sends tunneled traffic, and a public subnet for the egress NAT/IGW path.
- Two BIG-IP VE instances, one per AZ, each registered as a GWLB target. Both are active and inspecting simultaneously; GWLB hashes flows across them. Lose one and its share rebalances to the survivor.
- A Gateway Load Balancer with one target group containing both BIG-IPs, health-checked so a failed appliance is pulled in seconds.
- A GWLB endpoint service (powered by AWS PrivateLink) that the spokes consume. Each spoke VPC gets a GWLBE in each AZ.
- Route tables that force the path: spoke subnet → GWLBE → GWLB → BIG-IP (inspect) → GWLB → back out via the Internet/NAT gateway, which lives in the inspection VPC. Symmetric routing is enforced by GWLB flow stickiness, so return traffic lands on the same appliance.
Everything below provisions this with Terraform, configures the BIG-IPs to inspect GENEVE traffic, validates the path, and wires it into the operating model.
1. Lay down the inspection VPC and BIG-IP subnets
Start with the network. Keep the management, data (GWLB target), and public subnets separate per AZ: BIG-IP VE is multi-NIC and you do not want GENEVE traffic and SSH sharing an interface.
# versions.tf
terraform {
required_version = ">= 1.6"
required_providers {
aws = { source = "hashicorp/aws", version = ">= 5.40" }
}
}
# vpc.tf
locals {
azs = ["us-east-1a", "us-east-1b"]
}
resource "aws_vpc" "inspection" {
cidr_block = "10.100.0.0/16"
enable_dns_hostnames = true
tags = { Name = "insp-vpc", Tier = "security" }
}
# Per-AZ: mgmt (.0.x/.1.x), data/GWLB (.10.x/.11.x)
resource "aws_subnet" "mgmt" {
for_each = { "a" = ["10.100.0.0/24", local.azs[0]], "b" = ["10.100.1.0/24", local.azs[1]] }
vpc_id = aws_vpc.inspection.id
cidr_block = each.value[0]
availability_zone = each.value[1]
tags = { Name = "insp-mgmt-${each.key}" }
}
resource "aws_subnet" "data" {
for_each = { "a" = ["10.100.10.0/24", local.azs[0]], "b" = ["10.100.11.0/24", local.azs[1]] }
vpc_id = aws_vpc.inspection.id
cidr_block = each.value[0]
availability_zone = each.value[1]
tags = { Name = "insp-data-${each.key}" }
}
# Public subnets for the egress NAT/IGW path, one per AZ
resource "aws_subnet" "public" {
for_each = { "a" = ["10.100.20.0/24", local.azs[0]], "b" = ["10.100.21.0/24", local.azs[1]] }
vpc_id = aws_vpc.inspection.id
cidr_block = each.value[0]
availability_zone = each.value[1]
map_public_ip_on_launch = true
tags = { Name = "insp-public-${each.key}" }
}
resource "aws_internet_gateway" "igw" {
vpc_id = aws_vpc.inspection.id
tags = { Name = "insp-igw" }
}
Apply this slice first so the AZ/subnet IDs exist for the appliance and GWLB resources:
terraform init
terraform apply -target=aws_vpc.inspection \
-target=aws_subnet.mgmt -target=aws_subnet.data \
-target=aws_subnet.public -target=aws_internet_gateway.igw
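Before applying, it can help to sanity-check the address plan offline. The sketch below uses Python's standard `ipaddress` module to confirm that each subnet CIDR from the Terraform above sits inside the VPC and that no two overlap. The helper name `check_plan` and the dictionary labels are illustrative, not part of the Terraform.

```python
import ipaddress
from itertools import combinations

VPC = ipaddress.ip_network("10.100.0.0/16")
# CIDRs copied from vpc.tf; the labels are just for readable error messages.
SUBNETS = {
    "mgmt-a": "10.100.0.0/24", "mgmt-b": "10.100.1.0/24",
    "data-a": "10.100.10.0/24", "data-b": "10.100.11.0/24",
    "public-a": "10.100.20.0/24", "public-b": "10.100.21.0/24",
}

def check_plan(vpc, subnets):
    """Return a list of human-readable problems; empty means the plan is consistent."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    problems = []
    for name, net in nets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside {vpc}")
    for (n1, a), (n2, b) in combinations(nets.items(), 2):
        if a.overlaps(b):
            problems.append(f"{n1} overlaps {n2}")
    return problems

problems = check_plan(VPC, SUBNETS)
print(problems or "subnet plan is consistent")
```

The same check is worth rerunning whenever a spoke CIDR or an extra AZ is added to the plan.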
2. Launch two BIG-IP VE appliances, one per AZ
Each BIG-IP gets two NICs: mgmt (eth0, for the GUI/SSH/iControl REST API) and data (eth1, where GWLB delivers GENEVE-tunneled packets). Pull the Marketplace AMI dynamically so the module is region-portable.
# bigip.tf
data "aws_ami" "bigip" {
most_recent = true
owners = ["679593333241"] # AWS Marketplace
filter {
name = "name"
# PAYG bundle with LTM+AFM+ASM (Good/Better/Best). Match your subscription.
values = ["F5 BIGIP-17.* PAYG-Best 25Mbps*"]
}
}
resource "aws_security_group" "bigip_mgmt" {
  name_prefix = "bigip-mgmt-"
  vpc_id      = aws_vpc.inspection.id
  # HCL allows only one argument in a single-line block, so each rule is spelled out.
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/8"]
  }
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/8"]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
resource "aws_security_group" "bigip_data" {
  name_prefix = "bigip-data-"
  vpc_id      = aws_vpc.inspection.id
  # GENEVE from GWLB
  ingress {
    from_port   = 6081
    to_port     = 6081
    protocol    = "udp"
    cidr_blocks = ["10.100.0.0/16"]
  }
  # GWLB health check (TCP/8080 to a BIG-IP virtual server we create in step 4)
  ingress {
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = ["10.100.0.0/16"]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
# Data-plane ENIs (source/dest check OFF — this box routes through, not to, itself)
resource "aws_network_interface" "data" {
for_each = aws_subnet.data
subnet_id = each.value.id
security_groups = [aws_security_group.bigip_data.id]
source_dest_check = false
tags = { Name = "bigip-data-${each.key}" }
}
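# Management ENIs (eth0). aws_instance cannot combine subnet_id / vpc_security_group_ids
# with network_interface blocks, so both NICs are declared as ENIs and attached by index.
resource "aws_network_interface" "mgmt" {
  for_each        = aws_subnet.mgmt
  subnet_id       = each.value.id
  security_groups = [aws_security_group.bigip_mgmt.id]
  tags            = { Name = "bigip-mgmt-${each.key}" }
}
resource "aws_instance" "bigip" {
  for_each      = aws_subnet.mgmt
  ami           = data.aws_ami.bigip.id
  instance_type = "m5.xlarge"
  key_name      = var.key_pair_name
  # eth0 = mgmt
  network_interface {
    network_interface_id = aws_network_interface.mgmt[each.key].id
    device_index         = 0
  }
  # eth1 = data
  network_interface {
    network_interface_id = aws_network_interface.data[each.key].id
    device_index         = 1
  }
  user_data = file("${path.module}/bigip-onboard.sh") # F5 Declarative Onboarding bootstrap
  tags      = { Name = "bigip-ve-${each.key}", Role = "inspection" }
}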
The source_dest_check = false on the data ENI is mandatory — without it AWS drops the transit packets and you will chase a phantom “GENEVE arrives but nothing returns” bug for hours. Set the management password and licensing via F5 Declarative Onboarding (DO) in bigip-onboard.sh; pull the BIG-IP admin password and (for BYOL) the license key from HashiCorp Vault at boot rather than baking them into user-data:
#!/bin/bash
# bigip-onboard.sh: runs on first boot via cloud-init
# Assumes the vault CLI is present and VAULT_ADDR plus a valid auth token are already available on the box.
set -euo pipefail
ADMIN_PW=$(vault kv get -field=admin_password secret/aws/bigip/inspection)
cat > /config/cloud/do.json <<EOF
{ "schemaVersion": "1.0.0", "class": "Device", "Common": { "class": "Tenant",
"myProvision": { "class": "Provision", "ltm": "nominal", "afm": "nominal", "asm": "nominal" },
"admin": { "class": "User", "userType": "regular", "password": "${ADMIN_PW}", "shell": "bash" }
}}
EOF
# DO is applied by f5-cloud-libs already present on the VE image
3. Create the Gateway Load Balancer and target group
GWLB lives in the data subnets and load-balances across both BIG-IPs. The target group protocol is GENEVE on port 6081; the health check is a separate, ordinary TCP/HTTP probe to a port the BIG-IP answers.
# gwlb.tf
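resource "aws_lb" "gwlb" {
  name               = "insp-gwlb"
  load_balancer_type = "gateway"
  subnets            = [for s in aws_subnet.data : s.id]
  # Cross-zone is off by default on GWLB. Enabling it lets the other AZ's BIG-IP
  # take flows when the local one fails, at the cost of cross-AZ data transfer.
  enable_cross_zone_load_balancing = true
  tags               = { Name = "insp-gwlb" }
}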
resource "aws_lb_target_group" "bigip" {
name = "insp-bigip-tg"
protocol = "GENEVE"
port = 6081
vpc_id = aws_vpc.inspection.id
target_type = "instance"
health_check {
protocol = "TCP"
port = 8080 # answered by the health-check virtual server in step 4
interval = 10
healthy_threshold = 3
unhealthy_threshold = 3
}
# Hash on source IP, destination IP, and protocol so related flows share an appliance
stickiness {
  type    = "source_ip_dest_ip_proto"
  enabled = true
}
}
resource "aws_lb_target_group_attachment" "bigip" {
for_each = aws_instance.bigip
target_group_arn = aws_lb_target_group.bigip.arn
target_id = each.value.id
}
resource "aws_lb_listener" "gwlb" {
load_balancer_arn = aws_lb.gwlb.arn
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.bigip.arn
}
}
# Publish GWLB as an endpoint service the spokes consume
resource "aws_vpc_endpoint_service" "gwlb" {
acceptance_required = false
gateway_load_balancer_arns = [aws_lb.gwlb.arn]
tags = { Name = "insp-gwlb-svc" }
}
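The stickiness setting controls which fields GWLB hashes. The default is the 5-tuple; source_ip_dest_ip_proto narrows it to a 3-tuple, so every flow between the same source, destination, and protocol lands on the same appliance. Either way, GWLB pins an established flow to the appliance it first chose, which is what keeps a stateful firewall from seeing a connection rehashed mid-stream and dropping it.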
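The effect of the stickiness type is easiest to see by looking at which fields feed the hash. The following is a simplified model, not GWLB's real implementation: `pick_target`, the `FIELDS` table, and the sample connections are hypothetical, and direction handling is left out for brevity. It shows that under the 3-tuple setting two connections that differ only in source port map to the same appliance.

```python
import hashlib

# Which packet fields feed the hash for each mode (simplified model).
FIELDS = {
    "5-tuple": ("src_ip", "src_port", "dst_ip", "dst_port", "proto"),
    "source_ip_dest_ip_proto": ("src_ip", "dst_ip", "proto"),
    "source_ip_dest_ip": ("src_ip", "dst_ip"),
}

def pick_target(flow, mode, targets):
    """Hash only the fields selected by the stickiness mode and pick a target."""
    key = "|".join(str(flow[f]) for f in FIELDS[mode]).encode()
    idx = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % len(targets)
    return targets[idx]

targets = ["bigip-ve-a", "bigip-ve-b"]
conn1 = {"src_ip": "10.1.0.5", "src_port": 40001,
         "dst_ip": "203.0.113.10", "dst_port": 443, "proto": "tcp"}
conn2 = dict(conn1, src_port=40002)  # same client and server, new ephemeral port

# Source port is not part of the 3-tuple key, so both connections share a target.
same_3tuple = (pick_target(conn1, "source_ip_dest_ip_proto", targets)
               == pick_target(conn2, "source_ip_dest_ip_proto", targets))
print(same_3tuple)
```

The trade-off is distribution: the fewer fields in the key, the more a single busy client/server pair concentrates on one appliance.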
4. Configure BIG-IP to inspect the GENEVE traffic
This is the F5-specific heart of the build, done in-guest (via the GUI, tmsh, or Ansible with f5networks.f5_modules). BIG-IP must (a) terminate the GENEVE tunnel from GWLB, (b) hand decapsulated traffic to a wildcard forwarding virtual server that applies your firewall/WAF policy, and (c) expose a health-check VIP on TCP 8080.
# Run on each BIG-IP. Create the GENEVE tunnel that faces GWLB.
create net tunnels tunnel gwlb-tunnel { profile geneve local-address 10.100.10.10 }
# A wildcard "L3 forwarding" virtual that inspects everything arriving on the tunnel.
# type forwarding(ip) = transparent in/out; attach AFM + ASM policy here.
# ip-forward is what makes this a Forwarding (IP) virtual rather than a standard one.
create ltm virtual vs_inspect_all {
destination 0.0.0.0:any
mask any
ip-forward
ip-protocol any
profiles add { fastL4 { } }
vlans add { gwlb-tunnel }
vlans-enabled
translate-address disabled
translate-port disabled
}
# iRule that answers the probe; create it first so the virtual can reference it
create ltm rule hc_200_ok {
when CLIENT_ACCEPTED { TCP::respond "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"; TCP::close }
}
# Health-check responder GWLB probes on TCP 8080
create ltm virtual vs_healthcheck {
destination 0.0.0.0:8080
ip-protocol tcp
profiles add { tcp { } }
rules { hc_200_ok }
}
save sys config
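Attach your security policy to vs_inspect_all: an AFM network-firewall policy for the L3-L4 rules and an ASM WAF policy (or an iRule) for L7. Note that an ASM policy needs an HTTP profile on the virtual server it is attached to, so in practice the WAF policy usually sits on a standard virtual server scoped to the HTTP(S) ports rather than on the fastL4 forwarding virtual itself. Because the virtual is translate-address disabled, BIG-IP inspects the original client IP and destination transparently: the application servers still see the real source, and BIG-IP hands the packet back to GWLB over the same GENEVE tunnel. This is what “transparent bump-in-the-wire” means in F5 terms.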
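A TCP health check passes when the connection handshake completes, so the responder on 8080 can be tested from a bastion in the inspection VPC before GWLB ever probes it. The sketch below is a hypothetical helper (`tcp_probe` is not an AWS or F5 tool) that approximates what a TCP health check verifies.

```python
import socket

def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

For example, tcp_probe("10.100.10.10", 8080) run from a host that the data security group allows should return True once vs_healthcheck is in place; the address here is the tunnel local-address used in this step and will differ per appliance.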
5. Insert GWLB endpoints in the spokes and fix the route tables
Now make a spoke VPC actually use the inspection layer. Create a GWLBE per AZ in the spoke, then edit route tables so traffic detours through it. The pattern for egress inspection: the spoke’s private subnets default-route to the GWLBE; the GWLBE’s subnet default-routes to the local IGW/NAT path (which here lives in the inspection VPC, reached via a Transit Gateway or VPC peering in a real multi-VPC build — shown single-VPC here for clarity).
# spoke-endpoints.tf
resource "aws_vpc_endpoint" "gwlbe" {
for_each = toset(["a", "b"])
service_name = aws_vpc_endpoint_service.gwlb.service_name
vpc_endpoint_type = "GatewayLoadBalancer"
vpc_id = aws_vpc.spoke.id
subnet_ids = [aws_subnet.spoke_gwlbe[each.key].id]
}
# Spoke app subnet default-routes INTO the GWLB endpoint (traffic gets inspected)
resource "aws_route" "app_to_gwlbe" {
for_each = aws_route_table.spoke_app
route_table_id = each.value.id
destination_cidr_block = "0.0.0.0/0"
vpc_endpoint_id = aws_vpc_endpoint.gwlbe[each.key].id
}
# Inspection VPC: after BIG-IP, traffic egresses to the internet via IGW
resource "aws_route_table" "data_egress" {
for_each = aws_subnet.data
vpc_id = aws_vpc.inspection.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.igw.id
}
}
Apply the whole stack and capture the GWLB endpoint IDs:
terraform plan -out tf.plan && terraform apply tf.plan
terraform output -json | jq '.gwlbe_ids.value' # assumes an output named gwlbe_ids is defined
aws ec2 describe-vpc-endpoints \
--filters Name=vpc-endpoint-type,Values=GatewayLoadBalancer \
--query 'VpcEndpoints[].{id:VpcEndpointId,state:State}' --output table
6. Wire it into the operating model
Inspection appliances that no one watches rot. Bolt the BIG-IP pair onto the platform you already run:
- Identity / management access. Front the BIG-IP Configuration Utility and iControl REST API with SSO via Okta (or Microsoft Entra ID) using SAML, so admins authenticate with the corporate IdP and MFA instead of local BIG-IP accounts; map IdP groups to BIG-IP roles (admin vs. auditor). Local admin stays as glass-break only, its password leased from HashiCorp Vault.
- Secrets. All sensitive values (the BIG-IP admin password, BYOL license/registration keys, the TLS private keys used for SSL-forward-proxy inspection) live in HashiCorp Vault and are pulled at boot or rotated via the Vault agent, never committed to the Terraform repo.
- Cloud posture & IaC scanning. Run Wiz for agentless CSPM across the inspection account (it flags a BIG-IP data ENI that drifts to a public IP, a security group opened too wide, or a missing source_dest_check change) and run Wiz Code in the pull request to scan the Terraform for misconfigurations (public exposure, permissive SGs) before merge.
- Runtime security. Endpoint workloads behind the inspection layer run CrowdStrike Falcon sensors for runtime threat detection on the EC2 fleet; Falcon detections and BIG-IP AFM/ASM blocks both feed the SOC, giving correlated network-plus-host visibility.
- Observability. Ship BIG-IP telemetry with the F5 Telemetry Streaming (TS) declaration to Datadog (or Dynatrace): throughput per appliance, CPU/TMM utilization, active connections, and GWLB target health, with a dashboard and a monitor that alerts when a target goes unhealthy or one appliance carries a lopsided share of flows. Pair it with CloudWatch metrics on the GWLB target group (HealthyHostCount, UnHealthyHostCount).
- Change management. Every Terraform apply and every BIG-IP policy change flows through a ServiceNow change request; a guardrail breach (an appliance unhealthy for > 5 min, a sustained ASM attack signature) auto-raises a ServiceNow incident so network security gets a ticket, not just a Datadog blip.
- Pipeline. The Terraform and the BIG-IP AS3/DO declarations live in Git; GitHub Actions (or Jenkins) runs terraform plan, the Wiz Code scan, and a policy lint on every PR, authenticating to AWS via OIDC with no stored keys. For the Kubernetes-adjacent pieces of the platform, Argo CD reconciles the declared state. Ansible (with f5networks.f5_modules) applies the in-guest tmsh/AS3 config so the appliances are reproducible, not hand-built.
This is also exactly the kind of shared virtual-appliance inspection layer that fronts other internet-facing estates — e.g. a university’s Moodle LMS or an Akamai-fronted web property where Akamai handles edge CDN/WAF and the BIG-IP pair does the deeper L4-L7 inspection and IPS on traffic reaching origin.
Validation
Prove the path before you trust it.
- Targets healthy. Both BIG-IPs should be healthy in the GWLB target group:
aws elbv2 describe-target-health \
--target-group-arn "$(terraform output -raw bigip_tg_arn)" \
--query 'TargetHealthDescriptions[].{id:Target.Id,health:TargetHealth.State}' --output table
- Traffic is actually inspected. From a spoke instance, generate egress and watch it on the BIG-IP. On each appliance:
# On the BIG-IP, confirm GENEVE arrives and the wildcard virtual sees connections
tcpdump -nni 0.0 udp port 6081 -c 20 # GENEVE-tunneled packets from GWLB
tmsh show ltm virtual vs_inspect_all | grep -E 'Connections|Bits'
From the spoke:
curl -s https://ifconfig.me ; echo # should succeed via the inspection path
- Active-active distribution. Run sustained traffic from several source IPs and confirm connections land on both appliances (tmsh show ltm virtual on each shows non-zero, roughly balanced connection counts). GWLB hashes by flow, so many flows spread; a single long flow stays pinned.
- AFM/ASM enforcement. Send a request that violates the WAF policy (e.g. a crafted SQLi pattern) and confirm BIG-IP blocks it and logs the event to Datadog/the SOC.
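The target-health check lends itself to automation in a pipeline gate. As a rough sketch, the helper below parses the raw JSON from aws elbv2 describe-target-health (run without --query) and lists any target that is not healthy. The function name `unhealthy_targets` and the sample instance IDs are made up for illustration; the field names match the ones used in the --query expression above.

```python
import json

def unhealthy_targets(describe_output):
    """Return the IDs of targets whose state is anything other than 'healthy'."""
    doc = json.loads(describe_output)
    return [
        d["Target"]["Id"]
        for d in doc["TargetHealthDescriptions"]
        if d["TargetHealth"]["State"] != "healthy"
    ]

# Illustrative payload only; the instance IDs are invented.
SAMPLE = """
{"TargetHealthDescriptions": [
  {"Target": {"Id": "i-0aaa", "Port": 6081}, "TargetHealth": {"State": "healthy"}},
  {"Target": {"Id": "i-0bbb", "Port": 6081}, "TargetHealth": {"State": "unhealthy"}}
]}
"""
bad = unhealthy_targets(SAMPLE)
print(bad)
```

A non-empty result is a reasonable condition for failing the validation stage or holding a change window.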
Failover test (chaos drill)
Do not wait for a real AZ event. Stop one BIG-IP and prove the survivor carries the load:
aws ec2 stop-instances --instance-ids "$(terraform output -raw bigip_a_id)"
# Within ~30s (3 failed health checks * 10s) the target group marks it unhealthy
watch -n5 'aws elbv2 describe-target-health \
--target-group-arn "$(terraform output -raw bigip_tg_arn)" \
--query "TargetHealthDescriptions[].TargetHealth.State"'
# Re-run the curl from the spoke — egress must still succeed, now via bigip-b only
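Once the target is marked unhealthy, new flows hash to the healthy appliance. Existing flows pinned to the stopped box reset once (acceptable for a stateful firewall failure) and reconnect through the survivor; how long they linger before that depends on the target group's target-failover setting, so confirm the behavior in your environment. Start the instance again and confirm flows rebalance.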
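The ~30 second figure in the drill comes straight from the health-check settings in step 3. A small sketch of that arithmetic, using a hypothetical helper name and treating the result as an approximation (real detection can take up to one extra interval depending on when the failure lands between probes):

```python
def health_check_windows(interval_s, healthy_threshold, unhealthy_threshold):
    """Approximate time to flip a target's state, in seconds."""
    return {
        "mark_unhealthy_s": interval_s * unhealthy_threshold,
        "mark_healthy_s": interval_s * healthy_threshold,
    }

# Values from the target group's health_check block in step 3.
windows = health_check_windows(interval_s=10, healthy_threshold=3, unhealthy_threshold=3)
print(windows)
```

Shortening the interval or the threshold tightens the window, at the price of more sensitivity to a single dropped probe.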
Rollback / teardown
Because the spokes only reach the inspection layer through a default route into the GWLBE, you can un-insert the inspection layer instantly without destroying the appliances — restore the spoke’s default route to its own NAT gateway:
# Emergency bypass: point the spoke app subnets back at the NAT gateway
aws ec2 replace-route --route-table-id "$SPOKE_RTB" \
--destination-cidr-block 0.0.0.0/0 --nat-gateway-id "$SPOKE_NAT"
Full teardown is ordinary Terraform, but order matters — GWLB endpoints must go before the endpoint service and the GWLB:
terraform destroy -target=aws_vpc_endpoint.gwlbe # remove spoke endpoints first
terraform destroy -target=aws_vpc_endpoint_service.gwlb
terraform destroy # then the rest
If a destroy hangs on the endpoint service, confirm no GWLBE connections remain (aws ec2 describe-vpc-endpoint-connections); a lingering accepted connection blocks deletion.
Common pitfalls
- source_dest_check left on. The single most common failure: GENEVE packets arrive at the BIG-IP data ENI but never return. Disable source/dest check on the data ENI (step 2).
- Health check pointed at GENEVE/6081. GWLB cannot health-check the GENEVE port itself; it needs a real TCP/HTTP responder. Use the dedicated TCP/8080 virtual server (step 4); if you skip it, both targets show unhealthy and all traffic blackholes.
- Asymmetric routing. If return traffic does not come back through GWLB (e.g. the inspection VPC’s egress route is wrong), the stateful BIG-IP sees half a conversation and drops it. Keep stickiness on and verify both directions traverse the appliance.
- Cross-AZ data charges and AZ affinity. With cross-zone load balancing off (the GWLB default), each zone only uses its own appliances, so a zone whose only BIG-IP is down has no healthy target to send to. With it on, the other AZ’s appliance picks up the traffic (works, but you pay cross-AZ data transfer). Choose deliberately, and run at least one appliance per AZ so each zone is self-sufficient in normal operation.
- MTU. GENEVE adds encapsulation overhead. Either lower the workload MTU or ensure the path supports the larger frames, or large packets fragment and throughput tanks.
- Marketplace AMI not subscribed. Terraform apply fails with an opaque error if the F5 BIG-IP Marketplace listing has not been subscribed in that account/region. Subscribe first.
- Wrong module bundle. A PAYG image with only LTM cannot run AFM/ASM. Pick the Best bundle (LTM+AFM+ASM) for full inspection.
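The MTU pitfall above comes down to simple arithmetic. The sketch below assumes a 68-byte GENEVE encapsulation overhead, which is the figure AWS's GWLB documentation gives as I recall it; treat it as an assumption and verify against the current docs. The helper names are hypothetical.

```python
# Assumption: GWLB's GENEVE encapsulation adds 68 bytes per packet.
GENEVE_OVERHEAD_BYTES = 68

def appliance_mtu_needed(inner_packet_bytes):
    """MTU the appliance data interface must accept for a given inner packet size."""
    return inner_packet_bytes + GENEVE_OVERHEAD_BYTES

def max_inner_packet(appliance_mtu_bytes):
    """Largest original packet that fits without fragmentation at a given appliance MTU."""
    return appliance_mtu_bytes - GENEVE_OVERHEAD_BYTES

print(appliance_mtu_needed(1500))  # encapsulated size of a standard 1500-byte packet
print(max_inner_packet(1500))      # what fits if the appliance NIC is left at 1500
```

If the data interface stays at a 1500-byte MTU, full-size workload packets no longer fit once encapsulated, which is the fragmentation case the pitfall describes.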
Security notes
Run the appliances under least privilege: management plane reachable only from the corporate CIDR or via SSM, SAML SSO through Okta/Entra with MFA for admins, and the local admin account as glass-break only with its credential in HashiCorp Vault. For TLS inspection (SSL forward proxy), the signing CA key is the crown jewel — store it in Vault, never on disk in the repo. Let Wiz continuously verify the posture (no public data ENI, no over-broad SG) and Wiz Code gate the Terraform PR; let CrowdStrike Falcon cover the host runtime. AFM enforces the L3-L4 firewall and ASM the L7 WAF on the wildcard virtual, so inspection is a real control, not a passthrough. Pin the BIG-IP software version explicitly and patch through the ServiceNow change gate.
Cost notes
The recurring spend is two BIG-IP VE instances (m5.xlarge on-demand is meaningful 24/7 — buy Savings Plans or Reserved Instances once the design is stable), the F5 license (PAYG hourly bundle vs. cheaper-at-scale BYOL — BYOL wins past steady utilization), the GWLB hourly + LCU/GB-processed charge, and the GWLB endpoint hourly + data-processing charge per spoke. Watch cross-AZ data transfer: keeping one appliance per AZ avoids the cross-zone charge that a single-AZ design silently incurs. Right-size the instance type to measured throughput in Datadog rather than guessing — BIG-IP VE PAYG tiers are throughput-capped (e.g. 25 Mbps/200 Mbps/1 Gbps), so an over-spec’d tier is pure waste. Scaling out (more appliances in the same target group) is the lever for capacity; the active-active design means each added appliance adds usable throughput, not just standby.