Designing Multi-Account VPC Connectivity with Transit Gateway and Centralized Egress

A two-account network you can mesh by hand. A forty-account estate with prod/non-prod isolation, a single internet egress point, and on-prem connectivity is an architecture problem. Get the hub wrong early and you inherit a peering mess, overlapping CIDRs you can never renumber, and a flat network where any compromised workload can reach every other one. This is how to build the hub correctly the first time — and how to read it at 02:00 when a spoke “can’t reach the database” and you need to know in five minutes whether the problem is a missing route, a wrong association, a default-allow firewall, or an asymmetric flow the stateful engine is silently dropping.

The keystone is the AWS Transit Gateway (TGW) — a regional, horizontally-scaled cloud router that every VPC and every on-prem link attaches to once, replacing the O(N²) tangle of VPC peering. But a TGW is only as good as the discipline around it: a non-overlapping CIDR plan enforced by IPAM, Resource Access Manager (RAM) sharing so spokes attach without an invitation dance, and above all route-table segmentation — the single feature that lets one shared router enforce that prod and non-prod can never reach each other while both reach shared services and a single inspected egress point. Isolation here is the absence of a route, not a firewall rule, and that distinction is the whole game.

By the end you will be able to design the hub, share it across an AWS Organization, segment it into routing domains, force all egress through a central inspected path, resolve DNS consistently, terminate Direct Connect on the hub, and prove the data plane with Reachability Analyzer rather than trusting the console. Because this is a reference you will return to mid-incident, the topology choices, the IPAM hierarchy, the association/propagation matrix, the firewall policy actions, the service quotas, the cost drivers and the failure playbook are all laid out as scannable tables — read the prose once, then keep the tables open.

What problem this solves

Multi-account AWS is the default operating model — separate accounts for prod, non-prod, security, logging, and each business unit — because the account boundary is the strongest blast-radius and billing boundary AWS offers. But accounts are network islands. Two VPCs in two accounts cannot talk until you explicitly connect them, and the naive connectors do not scale: VPC peering is 1:1 and non-transitive, so N VPCs need N(N-1)/2 peerings and spoke A still cannot reach spoke C through B.

What breaks without a deliberate hub: teams peer VPCs ad hoc until the mesh is unmaintainable; someone provisions a VPC as 10.0.0.0/16, a second team does the same in another account, and now the two can never be routed to each other because you cannot renumber a live VPC; every spoke runs its own NAT gateway, so egress is sprawled across forty accounts with no central inspection or logging and a NAT bill multiplied by forty; and the network is flat — once anything is reachable, a compromised non-prod box can reach prod because nobody designed an isolation boundary into the routing.

Who hits this: any platform or cloud-foundations team past a handful of accounts; anyone running a landing zone; anyone with a compliance requirement to inspect and log egress centrally or to isolate environments at the network layer. The TGW with proper segmentation solves all four problems at once — transitive any-to-any-under-policy reachability, a single place to enforce isolation, one inspected egress point, and one hybrid termination point. The rest of this guide builds exactly that.

To frame the whole field before the deep dive, here is every problem this article solves, the failure you get without it, and the section that fixes it:

| Problem | What breaks without the hub | The mechanism that fixes it | Section |
| --- | --- | --- | --- |
| Accounts are network islands | Ad-hoc peering mesh, O(N²) | TGW + RAM share | Step 2 |
| CIDR collisions you can’t undo | Two VPCs claim 10.0.0.0/16; unroutable | IPAM hierarchy, disjoint super-blocks | Step 1 |
| Flat network, no isolation | Non-prod box reaches prod | Route-table domains; isolation = no route | Step 3 |
| Egress sprawl & no inspection | NAT × 40; nothing logged or filtered | Central egress VPC + Network Firewall | Step 4 |
| Inconsistent DNS across spokes | PHZ/on-prem names don’t resolve | Route 53 Resolver endpoints + rules | Step 5 |
| On-prem terminated per-VPC | Many VPN/DX tunnels, no control | Terminate DX/VPN on the TGW | Step 6 |
| “Is the path open?” guesswork | Multi-hour incident triage | Reachability Analyzer + flow logs | Verify |

Learning objectives

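By the end of this article you can:

- Design a regional Transit Gateway hub on a non-overlapping, IPAM-enforced CIDR plan.
- Share the hub across an AWS Organization with RAM.
- Segment the hub into routing domains using association and propagation.
- Force all egress through a central inspected path.
- Resolve DNS consistently with Route 53 Resolver.
- Terminate Direct Connect on the hub.
- Prove the data plane with Reachability Analyzer.
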
Prerequisites & where this fits

You should already understand single-VPC fundamentals (subnets, route tables, an internet gateway, a NAT gateway, security groups vs NACLs) and basic AWS Organizations (a management account, member accounts, OUs). You should be comfortable running AWS CLI commands, reading JSON, and applying Terraform. This guide assumes the depth of "Deep dive into VPC: subnets, routing, IGW, NAT, and endpoints" and "VPC networking fundamentals explained" as the layer beneath it.

This sits at the network-foundations layer of a landing zone, directly above account vending and just below per-workload networking. It pairs tightly with "CIDR & IPAM management: allocation and BYOIP at scale" (the address plan it depends on), "AWS Network Firewall: centralized egress inspection" (the inspection layer), "Route 53 Resolver: DNS Firewall, endpoints, rules, hybrid resolution" (centralized DNS), and "Direct Connect + Transit Gateway: resilient hybrid" (hybrid termination). The guardrails come from "Organizations SCPs, guardrails & delegated admin".

A quick map of who owns what, so you call the right team fast during an incident:

| Layer | What lives here | Who usually owns it | Failure classes it can cause |
| --- | --- | --- | --- |
| IPAM / address plan | Pools, allocations, super-blocks | Network / platform | Overlap → unroutable spoke; summarization breaks |
| AWS Organizations / RAM | Account tree, OUs, resource shares | Cloud foundations | Spoke can’t see the shared TGW |
| Network account (hub) | TGW, route tables, egress VPC, resolver | Network team | Wrong association/propagation → broken isolation |
| Spoke account (VPC) | VPC, subnets, TGW attachment | App / dev team | Missing AZ ENI, wrong subnet, default route absent |
| Egress VPC | Network Firewall, NAT, IGW | Network / security | Default-allow; asymmetric drop; no return route |
| Hybrid (DX/VPN) | DXGW, VIFs, VPN attachment | Network team | Over-advertised prefixes; env reachable it shouldn’t be |
| Observability | TGW/VPC/FW flow logs, Athena | Platform / SRE | Blind triage; no exfil detection |

Core concepts

Six mental models make every later decision obvious.

A Transit Gateway is a regional router, not a global one. The TGW is a horizontally-scaled, managed router that lives in one region. Everything in that region — VPCs, VPN, Direct Connect via a Direct Connect Gateway — attaches to it once and gets transitive reachability, governed by route tables. A global estate needs one TGW per region, joined with inter-region peering attachments. Plan accounts and CIDRs with that regional boundary in mind from day one; a packet from eu-west-1 to us-east-1 crosses a peering attachment, and the CIDR plan must keep regions disjoint.

An attachment is an ENI in your subnets; routing happens in the TGW. When a VPC attaches to a TGW, the platform places an elastic network interface in one subnet per AZ you choose. Traffic only reaches AZs where the attachment has an ENI — attach in every AZ you run workloads in, or that AZ’s traffic blackholes. The VPC’s own route table sends a destination (often 0.0.0.0/0 or a summary) to the TGW attachment; from there, the TGW route table the attachment associates to makes the next-hop decision.
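
The spoke-side half of that hand-off can be sketched as a single route. This is a minimal illustration, not the guide's definitive layout: `var.shared_tgw_id` and `var.spoke_private_route_table_id` are hypothetical inputs, and it assumes the spoke's TGW attachment already exists in the same configuration.

```hcl
# Hypothetical inputs, not defined elsewhere in this guide.
variable "shared_tgw_id" {
  type        = string
  description = "ID of the RAM-shared Transit Gateway"
}

variable "spoke_private_route_table_id" {
  type        = string
  description = "Route table used by the spoke's workload subnets"
}

# Send everything that is not VPC-local to the TGW. The TGW route table
# that the attachment is associated with then makes the next-hop decision.
resource "aws_route" "spoke_default_to_tgw" {
  route_table_id         = var.spoke_private_route_table_id
  destination_cidr_block = "0.0.0.0/0"
  transit_gateway_id     = var.shared_tgw_id

  # The route can only be created once the VPC is attached to the TGW.
  depends_on = [aws_ec2_transit_gateway_vpc_attachment.spoke]
}
```

A summary route (for example the 10.0.0.0/8 block) works the same way if the spoke should keep a different default path.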

Association decides which table you use; propagation decides which tables learn your routes. This is the crux of segmentation and the line everyone gets backwards at first. Association = “which TGW route table do I consult for my outbound decisions” — every attachment associates to exactly one. Propagation = “into which TGW route tables do my VPC’s CIDRs get advertised.” To let prod reach shared services, you propagate the shared-services attachment into the prod table and propagate prod into the shared table. Because you never propagate prod into the non-prod table (and vice versa), those two domains have no route to each other even though they share one router. Isolation is the absence of a route, not a firewall rule.

Allocation is finite and the CIDR plan is permanent. The TGW route table is a longest-prefix-match router. Two VPCs both advertising 10.0.0.0/16 cannot both be routed, and you cannot renumber a live VPC without downtime. Solve allocation centrally, before anyone provisions a VPC, with AWS IPAM as the single source of truth and disjoint super-blocks per environment so summarization stays clean.

Centralized egress trades NAT sprawl for one inspected, billed path. Instead of a NAT gateway in every spoke, you run one egress VPC in the network account with NAT gateways and an AWS Network Firewall, and point every spoke’s default route at the TGW, which forwards to the egress VPC. The catch: TGW data-processing and Network Firewall both bill per-GB, firewall endpoints are AZ-local (so flows must be AZ-symmetric or the stateful engine drops return traffic), and you must propagate every spoke back into the egress route table or the return path blackholes.

The data plane is the source of truth; the console lies by omission. A route can exist in the console and still not deliver a packet (wrong AZ, NACL, security group, asymmetric firewall). Reachability Analyzer and TGW Flow Logs are the authoritative way to prove a path is — or is not — open, end to end across the whole hub.
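
As a rough sketch of what proving a path looks like in Terraform, the two resources below define a path and run one analysis against it. The ENI variables and the PostgreSQL port are illustrative assumptions; a path that crosses accounts may also need Reachability Analyzer's cross-account setup, which is not shown here.

```hcl
# Hypothetical inputs: the ENIs at each end of the flow you want to prove.
variable "source_eni_id" {
  type = string
}

variable "destination_eni_id" {
  type = string
}

# The path definition: who talks to whom, on which protocol and port.
resource "aws_ec2_network_insights_path" "spoke_to_db" {
  source           = var.source_eni_id
  destination      = var.destination_eni_id
  protocol         = "tcp"
  destination_port = 5432 # assumed database port
}

# One analysis run over that path, evaluated hop by hop.
resource "aws_ec2_network_insights_analysis" "spoke_to_db" {
  network_insights_path_id = aws_ec2_network_insights_path.spoke_to_db.id
}

# true only if every hop (routes, SGs, NACLs, TGW) lets the flow through.
output "spoke_to_db_reachable" {
  value = aws_ec2_network_insights_analysis.spoke_to_db.path_found
}
```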

The vocabulary in one table

Pin down every moving part before the deep sections. The glossary repeats these for lookup; this is the mental model side by side:

| Concept | One-line definition | Where it lives | Why it matters |
| --- | --- | --- | --- |
| Transit Gateway (TGW) | Regional managed router | Network account, per region | The hub everything attaches to |
| Attachment | An ENI-backed link (VPC/VPN/DX/peer) | In your subnets / TGW | No ENI in an AZ → that AZ blackholes |
| TGW route table | A routing domain | On the TGW | The segmentation primitive |
| Association | Which table an attachment uses | Attachment ↔ table | “My outbound decisions” |
| Propagation | Which tables learn an attachment’s CIDRs | Attachment → tables | “Who can route to me” |
| IPAM pool | Hierarchical address allocator | IPAM scope | Source of truth; no overlaps |
| RAM share | Cross-account resource grant | RAM | Spokes attach without invites |
| Egress VPC | Central NAT + inspection VPC | Network account | One inspected internet exit |
| Network Firewall | Stateful/stateless inspection | Egress VPC, AZ-local | Drops/allows egress; per-GB billed |
| Resolver endpoint | Inbound/outbound DNS NIC | Shared/egress VPC | Hybrid + consistent DNS |
| Blackhole route | A route that drops traffic | TGW route table | Intentional isolation or a bug |
| Reachability Analyzer | Path-proving service | VPC | Authoritative data-plane test |

Topology: Transit Gateway vs. peering vs. PrivateLink

These three are not competitors; they solve different problems. Pick deliberately — choosing peering for a forty-account estate, or a TGW to share a single API, are both expensive mistakes.

| Pattern | Connectivity model | Transitive routing | Scales to | Best for | Worst for |
| --- | --- | --- | --- | --- | --- |
| VPC peering | 1:1, full IP reachability | No | A handful of VPCs | Lowest latency, no hub fee, 2–10 VPCs | Many VPCs (N(N-1)/2), any transitive need |
| Transit Gateway | Hub-and-spoke, regional router | Yes (policy-controlled) | Thousands of attachments | Many VPCs/accounts, segmentation, hybrid, central egress | Sharing one app without IP routing |
| PrivateLink | Service endpoint (one ENI) | N/A (no IP routing) | One service, many consumers | Exposing a single service across a trust boundary | Everything-talks-to-everything |

Peering does not scale, and the reason is arithmetic. PrivateLink is the right tool when you want to share one application without granting network-layer reachability — it sidesteps CIDR overlap entirely because there is no routing. For everything-talks-to-everything-under-policy across accounts, the TGW is the answer. Here is the head-to-head on the dimensions that actually decide a design:

| Dimension | VPC peering | Transit Gateway | PrivateLink |
| --- | --- | --- | --- |
| Connections for N VPCs | N(N-1)/2 | N (one each) | 1 endpoint per consumer/service |
| Transitive (A→B→C) | No | Yes | N/A |
| Overlapping CIDRs allowed | No | No (LPM router) | Yes (no IP routing) |
| Per-GB data charge | No | Yes (data processing) | Yes (per-GB + endpoint-hour) |
| Cross-region | Yes (inter-region peering) | Yes (TGW peering) | Yes (with extra config) |
| Central inspection point | No | Yes (egress VPC) | N/A |
| Bandwidth ceiling | VPC-to-VPC line rate | ~50 Gbps per VPC attachment (burst) | Per-ENI |
| Typical use | 2–10 VPCs, latency-sensitive | Landing-zone hub | Internal API / SaaS endpoint |
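
To make the peering arithmetic concrete, here is a small worked example in Terraform locals. The figure of forty VPCs is taken from the estate size used throughout this guide; the local and output names are illustrative.

```hcl
locals {
  vpc_count = 40

  # Full mesh: every VPC peers with every other VPC exactly once.
  peering_connections = local.vpc_count * (local.vpc_count - 1) / 2

  # Hub-and-spoke: every VPC attaches to the TGW exactly once.
  tgw_attachments = local.vpc_count
}

output "peering_connections" {
  value = local.peering_connections # 40 * 39 / 2 = 780
}

output "tgw_attachments" {
  value = local.tgw_attachments # 40
}
```

Adding a forty-first VPC costs forty new peerings in the mesh and one new attachment on the hub.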

Remember that a TGW is a regional resource: a global estate runs one TGW per region, joined with inter-region peering attachments, so accounts and CIDRs must be planned around that boundary. The rest of this guide builds the regional hub.

Step 1 — A non-overlapping CIDR plan with IPAM

The single most expensive mistake in multi-account networking is CIDR overlap. The TGW route table is a longest-prefix-match router; two VPCs advertising 10.0.0.0/16 cannot both be routed, and you cannot renumber a live VPC. Solve allocation centrally before anyone provisions a VPC.

Use AWS IPAM as the source of truth. Carve a top-level pool, then per-environment and per-region pools beneath it, and force every VPC to draw from IPAM so uniqueness is guaranteed by construction rather than by a spreadsheet someone forgets to update.

resource "aws_vpc_ipam" "main" {
  operating_regions { region_name = "eu-west-1" }
}

resource "aws_vpc_ipam_pool" "top" {
  address_family = "ipv4"
  ipam_scope_id  = aws_vpc_ipam.main.private_default_scope_id
  locale         = "eu-west-1"
}

resource "aws_vpc_ipam_pool_cidr" "top" {
  ipam_pool_id = aws_vpc_ipam_pool.top.id
  cidr         = "10.0.0.0/8"
}

# Environment pool: prod gets a /12 out of the /8
resource "aws_vpc_ipam_pool" "prod" {
  address_family      = "ipv4"
  ipam_scope_id       = aws_vpc_ipam.main.private_default_scope_id
  locale              = "eu-west-1"
  source_ipam_pool_id = aws_vpc_ipam_pool.top.id
}

resource "aws_vpc_ipam_pool_cidr" "prod" {
  ipam_pool_id   = aws_vpc_ipam_pool.prod.id
  netmask_length = 12
}

Spoke VPCs then allocate from the pool instead of hard-coding a block. IPAM hands out a free, non-overlapping range and tracks it:

resource "aws_vpc" "spoke" {
  ipv4_ipam_pool_id    = aws_vpc_ipam_pool.prod.id
  ipv4_netmask_length  = 20            # IPAM hands out a free /20
  enable_dns_support   = true
  enable_dns_hostnames = true
}

Reserve disjoint super-blocks per environment so route-table summarization stays clean later, and reserve a separate block for on-prem so hybrid routes never collide. A worked allocation that leaves room to grow and summarizes to one prefix per domain:

| Domain | Super-block | Mask | VPC size handed out | Approx VPC capacity | Summarized as |
| --- | --- | --- | --- | --- | --- |
| Prod | 10.16.0.0/12 | /12 | /20 | 256 /20s | 10.16.0.0/12 |
| Non-prod | 10.32.0.0/12 | /12 | /20 | 256 /20s | 10.32.0.0/12 |
| Shared services | 10.48.0.0/12 | /12 | /22 | 1,024 /22s | 10.48.0.0/12 |
| Egress / inspection | 10.64.0.0/16 | /16 | /24 | 256 /24s | 10.64.0.0/16 |
| On-prem (reserved) | 10.200.0.0/13 | /13 | n/a (BGP) | data-center owned | 10.200.0.0/13 |
| Future / spare | 10.96.0.0/11 | /11 | reserved | growth | |
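
The capacity of a super-block follows directly from the mask difference: a /12 carved into /20s yields 2^(20-12) = 256 blocks. The sketch below works that out with Terraform's built-in `cidrsubnet` and `pow` functions; in the real design IPAM performs the allocation, so this is only a way to sanity-check the plan, and the local names are illustrative.

```hcl
locals {
  prod_super_block = "10.16.0.0/12"
  vpc_mask         = 20

  # Bits added when carving a /20 out of a /12.
  new_bits = local.vpc_mask - 12 # 8

  # Number of /20 blocks the /12 can hold: 2^8 = 256.
  prod_vpc_capacity = pow(2, local.new_bits)

  # The first, second, and last /20 inside the prod super-block.
  first_prod_vpc  = cidrsubnet(local.prod_super_block, local.new_bits, 0)   # 10.16.0.0/20
  second_prod_vpc = cidrsubnet(local.prod_super_block, local.new_bits, 1)   # 10.16.16.0/20
  last_prod_vpc   = cidrsubnet(local.prod_super_block, local.new_bits, 255) # 10.31.240.0/20
}

output "prod_vpc_capacity" {
  value = local.prod_vpc_capacity
}
```

Because the last /20 ends at 10.31.255.255 and non-prod starts at 10.32.0.0, the two domains can never overlap, and each still summarizes to a single /12.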

The IPAM hierarchy itself, level by level, and what each level is for:

| IPAM level | Scope / mask | Owns | Why it exists |
| --- | --- | --- | --- |
| Top pool | private scope, /8 | The 10.0.0.0/8 block (RFC 1918) | Single root of truth |
| Locale | region binding | A region’s allocations | Keeps regions disjoint |
| Environment pool | /12–/13 | prod / non-prod / shared | Summarizable domains |
| Account/team pool (optional) | /16 | One BU or account | Delegated self-service |
| VPC allocation | /20–/24 | One VPC | Drawn at provision time |

IPAM allocation settings worth knowing, with their defaults and the gotcha each guards against:

| Setting | What it controls | Default | When to change | Gotcha if wrong |
| --- | --- | --- | --- | --- |
| auto_import | Pull existing CIDRs into the pool | false | Migrating legacy VPCs | Imports overlaps as findings, not blocks |
| allocation_min_netmask_length | Smallest mask length (largest block) IPAM will hand out | pool-defined | Enforce a mask floor (e.g. /24) | Teams grab huge blocks |
| allocation_max_netmask_length | Largest mask length (smallest block) IPAM will hand out | pool-defined | Cap tiny allocations | Fragmentation |
| allocation_default_netmask_length | Default size on request | none | Standardize VPC size | Inconsistent VPCs |
| publicly_advertisable | BYOIP advertisement | false | BYOIP only | Accidental public advertise |
| Resource discovery (org) | Cross-account CIDR visibility | off | Multi-account (always) | Blind to other accounts’ overlaps |
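
A hedged sketch of how those allocation settings might be applied to an environment pool follows. The pool name `prod_guarded` and the specific mask values are assumptions chosen to match the /20 to /24 VPC sizes used in this guide; it reuses the `aws_vpc_ipam.main` and `aws_vpc_ipam_pool.top` resources defined earlier.

```hcl
# Illustrative variant of the prod pool with allocation guardrails.
resource "aws_vpc_ipam_pool" "prod_guarded" {
  address_family      = "ipv4"
  ipam_scope_id       = aws_vpc_ipam.main.private_default_scope_id
  locale              = "eu-west-1"
  source_ipam_pool_id = aws_vpc_ipam_pool.top.id

  # A request that does not name a size gets a /20.
  allocation_default_netmask_length = 20

  # No allocation larger than a /20 (a smaller mask number is a bigger block).
  allocation_min_netmask_length = 20

  # No allocation smaller than a /24, which limits fragmentation.
  allocation_max_netmask_length = 24

  # Keep legacy CIDRs out until they have been reviewed for overlap.
  auto_import = false
}
```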

Renumbering is not an option for a live VPC. Every byte of effort spent on the address plan now saves a quarter of migration pain later. If you inherit overlaps, the only clean fixes are PrivateLink (no routing) for the affected service or a brand-new VPC behind a fresh IPAM allocation with a workload migration — never a TGW route hack.

Step 2 — Provision the TGW and share it with RAM

Create the TGW in a dedicated network account (part of your AWS Organizations structure), then share it to every other account with Resource Access Manager (RAM). Turn off the default automation so route propagation and association become explicit, policy-driven decisions rather than accidents.

resource "aws_ec2_transit_gateway" "hub" {
  description                     = "Org hub TGW"
  default_route_table_association = "disable"
  default_route_table_propagation = "disable"
  dns_support                     = "enable"
  vpn_ecmp_support                = "enable"
  amazon_side_asn                 = 64512   # for any future BGP attachments
  tags = { Name = "tgw-hub" }
}

The TGW-level options you set once, with their effect and the recommended value for a segmented hub:

| Option | Values | Default | Recommended (segmented hub) | Why |
| --- | --- | --- | --- | --- |
| default_route_table_association | enable / disable | enable | disable | Force explicit association; segmentation depends on it |
| default_route_table_propagation | enable / disable | enable | disable | No accidental any-to-any reachability |
| dns_support | enable / disable | enable | enable | Cross-VPC DNS resolution over the TGW |
| vpn_ecmp_support | enable / disable | enable | enable | Multi-path over redundant VPN tunnels |
| amazon_side_asn | 64512–65534, 4200000000–4294967294 | 64512 | A value you control | BGP for DX/VPN; avoid clashing with on-prem ASN |
| multicast_support | enable / disable | disable | disable (unless needed) | Niche; off by default |
| auto_accept_shared_attachments | enable / disable | disable | disable | Approve attachments deliberately |
| transit_gateway_cidr_blocks | CIDR list | none | set for Connect/peering | Required for some attachment types |

Sharing with the whole organization removes the per-account invitation dance. This requires that you have enabled RAM sharing within AWS Organizations once (aws ram enable-sharing-with-aws-organization):

resource "aws_ram_resource_share" "tgw" {
  name                      = "tgw-hub-share"
  allow_external_principals = false
}

resource "aws_ram_resource_association" "tgw" {
  resource_arn       = aws_ec2_transit_gateway.hub.arn
  resource_share_arn = aws_ram_resource_share.tgw.arn
}

# Share to the entire org (or to specific OUs by ARN)
resource "aws_ram_principal_association" "org" {
  principal          = "arn:aws:organizations::111122223333:organization/o-exampleorgid"
  resource_share_arn = aws_ram_resource_share.tgw.arn
}

RAM lets you scope the share precisely; pick the narrowest principal that still avoids per-account toil:

| RAM principal type | Example ARN / value | Scope | When to use |
| --- | --- | --- | --- |
| Whole organization | organization/o-xxxx | Every account, now and future | Landing-zone default |
| Organizational unit | ou-xxxx-yyyy | Accounts under that OU | Share only to workload OUs |
| Single account ID | 123456789012 | One account | Pilots, exceptions |
| IAM role/user (external) | role ARN | One principal | Rare; requires allow_external_principals=true |
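
If sharing to the whole organization is too broad, the same share can be scoped to organizational units instead. The sketch below assumes a hypothetical `workload_ou_arns` variable; the ARN in the comment is a placeholder, not a real OU.

```hcl
# Hypothetical input: ARNs of the OUs that should see the shared TGW, e.g.
# arn:aws:organizations::111122223333:ou/o-exampleorgid/ou-examplerootid-exampleouid
variable "workload_ou_arns" {
  type = set(string)
}

# One principal association per OU. Accounts moved into one of these OUs
# later pick up the share without any extra invitation.
resource "aws_ram_principal_association" "workload_ous" {
  for_each           = var.workload_ou_arns
  principal          = each.value
  resource_share_arn = aws_ram_resource_share.tgw.arn
}
```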

Once shared, a spoke account creates its attachment locally, referencing the shared TGW ID. This is the clean ownership split: the network account owns the TGW and its route tables; the spoke owns its VPC and attachment.

resource "aws_ec2_transit_gateway_vpc_attachment" "spoke" {
  transit_gateway_id     = "tgw-0abc123..."                  # the shared TGW
  vpc_id                 = aws_vpc.spoke.id
  subnet_ids             = [for s in aws_subnet.tgw : s.id]  # one /28 per AZ
  dns_support            = "enable"
  appliance_mode_support = "disable"                         # enable only for inline appliances
  tags = { Name = "att-spoke-prod-app1" }
}

Attachment options and when each matters. appliance_mode_support is the one people miss, and it changes flow symmetry:

| Attachment option | Values | Default | When to change | Effect |
| --- | --- | --- | --- | --- |
| subnet_ids | one subnet per AZ | required | Always one /28 per AZ | Places the ENI; missing AZ = blackhole |
| dns_support | enable / disable | enable | rarely | DNS resolution over the attachment |
| appliance_mode_support | enable / disable | disable | inline inspection appliance VPC | Pins a flow to one AZ’s appliance (symmetry) |
| ipv6_support | enable / disable | disable | dual-stack | IPv6 routing over the TGW |
| transit_gateway_default_route_table_association | bool (provider) | follows TGW | keep disabled | Explicit association instead |

Give the TGW its own tiny attachment subnets — a /28 per AZ is plenty — separate from workload subnets. Attach in every AZ you run workloads in; an attachment only delivers traffic to AZs where it has an ENI, and intra-AZ traffic avoids cross-AZ data charges.
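
One way to carve those attachment subnets is sketched below. It assumes the spoke VPC is the /20 that IPAM allocated earlier and takes the last /28s in the range so they stay clear of workload subnets; the `azs` variable and its default AZ names are illustrative.

```hcl
# Hypothetical input: the AZs this spoke runs workloads in.
variable "azs" {
  type    = list(string)
  default = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
}

# One dedicated /28 per AZ for the TGW attachment ENIs.
resource "aws_subnet" "tgw" {
  for_each = { for i, az in var.azs : az => i }

  vpc_id            = aws_vpc.spoke.id
  availability_zone = each.key

  # /20 plus 8 new bits is a /28. Network numbers 255, 254, 253 are the
  # last /28s in the VPC, counted down from the top of the range.
  cidr_block = cidrsubnet(aws_vpc.spoke.cidr_block, 8, 255 - each.value)

  tags = { Name = "tgw-${each.key}" }
}
```

Because the resource uses `for_each`, the attachment's `[for s in aws_subnet.tgw : s.id]` expression picks up every AZ automatically when the list grows.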

The attachment types a TGW supports, and the route mechanism each uses:

| Attachment type | Connects | Routes via | Notes |
| --- | --- | --- | --- |
| VPC | A VPC in any account | Static + propagation | The common case; one ENI per AZ |
| VPN (Site-to-Site) | On-prem over IPsec | BGP (dynamic) or static | ECMP across tunnels |
| Direct Connect (via DXGW) | On-prem over DX | BGP | Through a Direct Connect Gateway |
| TGW peering | Another TGW (cross-region) | Static only | No transitive peering; per-pair |
| Connect (GRE) | SD-WAN / virtual routers | BGP over GRE | For third-party appliances |
| Multicast domain | Multicast group members | Multicast routing | Niche; off by default |

Service quotas and limits that bite

Design to the documented quotas, not to a guess — and treat the soft ones as raisable via Service Quotas, the hard ones as design constraints. These are the figures that most often force a redesign; confirm the current values and your account’s applied limits before you build to a ceiling:

| Limit | Default / value | Soft or hard | What hitting it looks like |
| --- | --- | --- | --- |
| VPC attachments per TGW | ~5,000 | Soft (raisable) | New attachment rejected at scale |
| Routes per TGW route table | ~10,000 | Hard | Propagation/route install fails |
| TGW route tables per TGW | ~20 | Soft | Can’t add another routing domain |
| Attachments per VPC (same TGW) | 5 | Hard | Limits per-AZ/redundant designs |
| TGWs per Region per account | ~5 | Soft | Can’t split into more hubs |
| Subnets per VPC attachment (AZs) | one per AZ | Hard | Missing AZ = blackhole |
| Bandwidth per VPC attachment | ~50 Gbps (burst) | Hard | Throughput ceiling per VPC |
| DXGW allowed prefixes to on-prem | ~20 | Hard | Over-advertised routes dropped |
| DXGW associations (TGWs) | ~6 | Hard | Limits hub count behind one DXGW |
| Peering attachments per TGW | ~50 | Soft | Multi-region fan-out ceiling |
| Resolver endpoints’ ENIs (per endpoint) | ≥2 | Hard | Need 2+ for AZ resilience |
| RAM resource shares per account | ~5,000 | Soft | Many fine-grained shares |

Step 3 — Route-table segmentation

This is where a TGW earns its keep. A TGW route table is a routing domain. By controlling which attachments associate to a domain (which table they consult for outbound decisions) and which propagate into it (whose CIDRs appear there), you build isolation that a flat network cannot. The classic layout:

                   +------------------+
    prod spokes -> | prod RT          | -> shared svc, egress (no non-prod)
                   +------------------+
                   +------------------+
non-prod spokes -> | non-prod RT      | -> shared svc, egress (no prod)
                   +------------------+
                   +------------------+
 shared svc VPC -> | shared RT        | -> prod + non-prod (serves both)
                   +------------------+
                   +------------------+
     egress VPC -> | egress RT        | <- default route lives here; learns ALL spokes
                   +------------------+

Goal: prod talks to prod and to shared services; non-prod talks to non-prod and to shared services; prod and non-prod never reach each other; everyone reaches the internet only through the central egress VPC. Here is the full association/propagation matrix — read a row as “this attachment associates to its own table and propagates into the ticked tables”:

Attachment ↓ / propagates into → prod RT non-prod RT shared RT egress RT hybrid RT
Prod spoke (assoc: prod) self ✓ (if cleared)
Non-prod spoke (assoc: non-prod) self
Shared-svc VPC (assoc: shared) self ✓ (if cleared)
Egress VPC (assoc: egress) static 0/0 static 0/0 static 0/0 self
Hybrid (DX/VPN) (assoc: hybrid) ✓ (if cleared) self

The mental model that keeps this straight, stated as a decision table you can apply to any new requirement:

| If you want… | Then… | Concretely |
| --- | --- | --- |
| A to use a domain’s routes for its decisions | Associate A to that table | Prod spoke associates to prod RT |
| B to be reachable from A’s domain | Propagate B into A’s table | Propagate shared into prod RT |
| A↔B mutual reachability | Propagate each into the other’s table | Prod↔shared both directions |
| A and B fully isolated | Propagate neither into the other’s table | Prod & non-prod: no mutual propagation |
| Everyone to reach the internet | Static 0/0 → egress attachment in each domain | Per-domain default route |
| Egress to return traffic to a spoke | Propagate that spoke into the egress table | All spokes → egress RT |

In Terraform, a prod spoke associates to the prod table and propagates into shared so shared services can reach it:

resource "aws_ec2_transit_gateway_route_table" "prod" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  tags = { Name = "rt-prod" }
}

# A prod spoke associates to the prod table...
resource "aws_ec2_transit_gateway_route_table_association" "prod_app1" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.spoke.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.prod.id
}

# ...and propagates its CIDR INTO the shared-services table (so shared svc can reach it)
resource "aws_ec2_transit_gateway_route_table_propagation" "prod_into_shared" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.spoke.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.shared.id
}
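
The block above covers only one direction. For prod to reach shared services, the shared-services attachment also has to be propagated into the prod table. A sketch of that reverse half follows, assuming a hypothetical `aws_ec2_transit_gateway_vpc_attachment.shared` for the shared-services VPC and the same `shared` route table referenced above.

```hcl
# The shared-services attachment consults the shared table for its own
# outbound decisions...
resource "aws_ec2_transit_gateway_route_table_association" "shared" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.shared.id # hypothetical
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.shared.id
}

# ...and advertises its CIDR INTO the prod table, so prod spokes have a
# route to it. Together with prod_into_shared this gives prod <-> shared.
resource "aws_ec2_transit_gateway_route_table_propagation" "shared_into_prod" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.shared.id # hypothetical
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.prod.id
}
```

Nothing here propagates a non-prod attachment into the prod table, which is exactly what keeps the two environments apart.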

The default route to the egress VPC is a static route in each spoke domain pointing at the egress attachment:

resource "aws_ec2_transit_gateway_route" "prod_default" {
  destination_cidr_block         = "0.0.0.0/0"
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.egress.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.prod.id
}

Static vs propagated routes behave differently when they collide; know which wins:

| Route property | Static route | Propagated route |
| --- | --- | --- |
| Source | You declare it | Learned from an attachment (BGP/auto) |
| Priority on exact-match tie | Wins over propagated | Loses to a static of same prefix |
| Used for | Default 0/0 → egress; blackholes | Reachability of real VPC CIDRs |
| Survives attachment delete | Yes (manual cleanup) | No (withdrawn) |
| Longest-prefix match | Applies first, across both | Applies first, across both |
| Blackhole option | Yes (explicit drop) | No |
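
One place a static blackhole may be worth considering: once the prod table carries a 0.0.0.0/0 route to egress, a packet from prod addressed to a non-prod CIDR no longer has "no route". It matches the default and heads for the egress VPC, where only the firewall stands in its way. A more specific blackhole keeps the drop at the TGW. This is a sketch under that assumption, using the non-prod super-block from the address plan in Step 1; the resource name is illustrative.

```hcl
# Drop prod -> non-prod at the TGW itself. Longest-prefix match means the
# /12 wins over the 0.0.0.0/0 default that points at the egress attachment.
resource "aws_ec2_transit_gateway_route" "prod_blackhole_nonprod" {
  destination_cidr_block         = "10.32.0.0/12" # non-prod super-block
  blackhole                      = true
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.prod.id
}
```

The mirror-image route (10.16.0.0/12 blackholed in the non-prod table) would be needed for the other direction.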

Isolation is the absence of a route. Because you never propagate prod into the non-prod table (and vice versa), those two domains have no route to each other even though they share one TGW. You do not need a firewall rule to keep them apart — you need the route to simply not exist. This is cheaper (no per-GB inspection) and harder to misconfigure than a deny rule.

Common segmentation models beyond the four-domain default, with the trade-off each carries:

| Segmentation model | Route tables | Isolation strength | Operational cost |
| --- | --- | --- | --- |
| Flat (one table) | 1 | None (any-to-any) | Lowest; unsafe past a few VPCs |
| Env split (this guide) | prod / non-prod / shared / egress | Strong env boundary | Moderate; the sweet spot |
| Per-BU domains | One per business unit | BU-level blast radius | Higher; many tables |
| Per-tier (web/app/data) | Tier tables | East-west micro-isolation | High; verbose, use SGs instead |
| Inspection-forced | All east-west via firewall RT | Maximum (everything inspected) | Highest $ (per-GB on all flows) |

Step 4 — Centralized egress through a shared NAT + Network Firewall VPC

Per-VPC NAT gateways are a cost and governance sprawl: every spoke pays for NAT, and you have no single place to inspect or log egress. Consolidate into one egress VPC in the network account that owns the NAT gateways and an AWS Network Firewall for inspection. Spoke default routes already point at this VPC’s TGW attachment (Step 3).

The traffic path matters. Inside the egress VPC, force the flow TGW → firewall endpoint → NAT gateway → internet. That means three subnet tiers per AZ and route tables that hand off between them:

ingress from TGW
   |
   v  (TGW subnet route table: 0.0.0.0/0 -> firewall endpoint)
[firewall subnet: AWS Network Firewall endpoint]
   |
   v  (firewall subnet route table: 0.0.0.0/0 -> NAT gateway)
[public subnet: NAT gateway + IGW]
   |
   v
internet

The exact route tables that build that hand-off — get one wrong and traffic bypasses the firewall or blackholes:

| Subnet tier (per AZ) | Route added | Next hop | Purpose |
| --- | --- | --- | --- |
| TGW attachment subnet | 0.0.0.0/0 | Firewall endpoint (this AZ) | Force inbound-from-TGW through inspection |
| TGW attachment subnet | spoke summaries | (local/TGW) | Return path knowledge |
| Firewall subnet | 0.0.0.0/0 | NAT gateway (this AZ) | Inspected traffic to NAT |
| Firewall subnet | spoke summaries (e.g. 10.16.0.0/12) | TGW attachment | Return traffic back to spokes |
| Public subnet | 0.0.0.0/0 | Internet gateway | NAT to the internet |
| Public subnet | spoke summaries | Firewall endpoint (this AZ) | Symmetric return through inspection |

The firewall itself takes one subnet mapping per AZ, which creates one endpoint in each firewall subnet:

resource "aws_networkfirewall_firewall" "egress" {
  name                = "fw-central-egress"
  firewall_policy_arn = aws_networkfirewall_firewall_policy.egress.arn
  vpc_id              = aws_vpc.egress.id

  dynamic "subnet_mapping" {
    for_each = aws_subnet.firewall
    content { subnet_id = subnet_mapping.value.id }
  }
}
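
The routes that point at "the firewall endpoint (this AZ)" need the endpoint ID for each AZ, which the firewall only exposes through its sync state. A commonly used pattern is sketched below; `aws_route_table.tgw_subnet` is a hypothetical resource assumed to be a `for_each` map keyed by AZ name.

```hcl
locals {
  # AZ name -> firewall endpoint ID, read from the firewall's sync state.
  fw_endpoint_by_az = {
    for s in aws_networkfirewall_firewall.egress.firewall_status[0].sync_states :
    s.availability_zone => s.attachment[0].endpoint_id
  }
}

# TGW attachment subnets: hand everything arriving from the TGW to the
# firewall endpoint in the SAME AZ, so the stateful engine sees both
# directions of each flow.
resource "aws_route" "tgw_subnet_to_firewall" {
  for_each               = aws_route_table.tgw_subnet # hypothetical, keyed by AZ name
  route_table_id         = each.value.id
  destination_cidr_block = "0.0.0.0/0"
  vpc_endpoint_id        = local.fw_endpoint_by_az[each.key]
}
```

The public-subnet return routes for the spoke summaries can reuse the same local map.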

Critically, set the firewall policy to drop unmatched traffic and add explicit allow rules — a default-allow inspection layer inspects nothing useful:

resource "aws_networkfirewall_firewall_policy" "egress" {
  name = "policy-central-egress"
  firewall_policy {
    stateless_default_actions          = ["aws:forward_to_sfe"]
    stateless_fragment_default_actions = ["aws:forward_to_sfe"]
    stateful_engine_options { rule_order = "STRICT_ORDER" }
    stateful_default_actions = ["aws:drop_established", "aws:alert_established"]

    stateful_rule_group_reference {
      resource_arn = aws_networkfirewall_rule_group.allowlist.arn
      priority     = 100
    }
  }
}

The Network Firewall policy actions, what each does, and where to use it — the stateless/stateful split trips people up:

| Action | Engine | Meaning | Use for |
| --- | --- | --- | --- |
| aws:pass | stateless/stateful | Allow, stop evaluating | Known-good flows |
| aws:drop | stateless/stateful | Silently discard | Default-deny posture |
| aws:forward_to_sfe | stateless | Hand to the stateful engine | Default stateless action |
| aws:alert | stateful | Log but allow | Triage / detection-only |
| aws:drop_established | stateful default | Drop unless a rule allowed it | The secure default |
| aws:alert_established | stateful default | Log the dropped flow | Visibility on drops |

Rule-order matters; the two modes evaluate very differently:

| Rule order | Evaluation | Default action semantics | When to use |
| --- | --- | --- | --- |
| DEFAULT_ACTION_ORDER | Pass rules, then drop, then alert (action groups) | Implicit ordering | Simple allowlists |
| STRICT_ORDER | Strict numeric priority, top-down | You set the default explicitly | Production; predictable, auditable |
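
The policy above references an `allowlist` rule group that is not shown. A minimal sketch of what it might contain follows, as a domain allowlist in strict order. The domain names, the capacity, and the HOME_NET range are assumptions for illustration. HOME_NET is widened to the 10.0.0.0/8 plan because in a centralized design the traffic originates in the spokes, not in the firewall's own VPC.

```hcl
resource "aws_networkfirewall_rule_group" "allowlist" {
  name     = "egress-domain-allowlist"
  type     = "STATEFUL"
  capacity = 100 # assumed; capacity cannot be changed after creation

  rule_group {
    # Treat every spoke range as "home" so spoke traffic is evaluated.
    rule_variables {
      ip_sets {
        key = "HOME_NET"
        ip_set {
          definition = ["10.0.0.0/8"]
        }
      }
    }

    rules_source {
      rules_source_list {
        generated_rules_type = "ALLOWLIST"
        target_types         = ["TLS_SNI", "HTTP_HOST"]
        targets              = [".amazonaws.com", "example.com"] # placeholder domains
      }
    }

    # Must match the STRICT_ORDER setting on the firewall policy.
    stateful_rule_options {
      rule_order = "STRICT_ORDER"
    }
  }
}
```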

The return path is the part people miss: the TGW route table for the egress VPC must carry routes back to every spoke CIDR (propagate all spokes into the egress domain), and the firewall subnet route table needs each spoke summary pointing back at the TGW. Because firewall endpoints are AZ-local, keep traffic symmetric — route an AZ’s flow through that same AZ’s firewall endpoint so the stateful engine sees both directions.
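A small sketch of why AZ symmetry matters. It models each AZ's firewall endpoint as an independent connection table, which is a deliberate simplification of the stateful engine; the class, function, and IP addresses are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class FirewallEndpoint:
    """Toy stateful engine for one AZ: it only knows flows it saw start."""
    az: str
    flows: set = field(default_factory=set)

    def outbound(self, src, dst):
        # Forward direction: record the flow so the reply can be matched.
        self.flows.add((src, dst))
        return "forwarded"

    def inbound(self, src, dst):
        # Return direction: allowed only if this endpoint saw the forward flow.
        return "forwarded" if (dst, src) in self.flows else "dropped"

def round_trip(out_az, return_az, spoke_ip="10.16.1.10", dest_ip="203.0.113.7"):
    """Send a request via out_az and route the reply via return_az."""
    endpoints = {az: FirewallEndpoint(az) for az in {out_az, return_az}}
    endpoints[out_az].outbound(spoke_ip, dest_ip)
    return endpoints[return_az].inbound(dest_ip, spoke_ip)

print(round_trip("eu-west-1a", "eu-west-1a"))  # forwarded
print(round_trip("eu-west-1a", "eu-west-1b"))  # dropped
```

The second call fails for the reason described above: the endpoint in the other AZ never saw the outbound half of the flow.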

A side-by-side of centralized vs per-spoke egress so the trade-off is explicit:

Aspect | Per-spoke NAT (no hub) | Centralized egress (this design)
NAT gateways | One+ per spoke VPC | A few (per AZ in egress VPC)
Inspection | None (or N firewalls) | One Network Firewall
Logging | Scattered | Central (S3 / CW)
Cost shape | NAT × N spokes | NAT × AZ + TGW/FW per-GB
Egress IP allow-listing | N sets of EIPs | One small set of EIPs
Governance (SCP block IGW) | Hard (each VPC needs IGW) | Easy (only egress VPC has IGW)
Failure blast radius | Per-VPC | Shared egress (design for AZ HA)

Network Firewall is billed per endpoint-hour plus per-GB processed. Centralizing means you pay for the endpoints once instead of per spoke, but the per-GB cost is real. That is why we drop east-west prod/non-prod traffic at the TGW (free, via missing routes) rather than hairpinning it through the firewall, and why bulk AWS-service traffic (S3, DynamoDB) gets gateway endpoints in the spoke (see the Real-world scenario below).

Step 5 — Centralized DNS with Route 53 Resolver

Spokes need to resolve private hosted zones, on-prem names, and AWS service endpoints consistently. Run Route 53 Resolver endpoints in the shared-services (or egress) VPC and point every spoke at them, rather than standing up resolver infrastructure in every account.

resource "aws_route53_resolver_endpoint" "outbound" {
  name      = "rslv-outbound"
  direction = "OUTBOUND"
  security_group_ids = [aws_security_group.resolver.id]
  dynamic "ip_address" {
    for_each = aws_subnet.resolver
    content { subnet_id = ip_address.value.id }
  }
}

resource "aws_route53_resolver_rule" "onprem" {
  name                 = "fwd-corp-internal"
  domain_name          = "corp.internal"
  rule_type            = "FORWARD"
  resolver_endpoint_id = aws_route53_resolver_endpoint.outbound.id
  target_ip { ip = "10.200.0.10" }
  target_ip { ip = "10.200.0.11" }
}

Share the rule across accounts with RAM, then associate it in each spoke VPC so the spoke honors it. The resolver building blocks and what each is for:

Component | Direction | Resolves | Shared via | Notes
Inbound endpoint | On-prem → AWS | PHZ / AWS private names | n/a (in hub VPC) | 2+ ENIs across 2 AZs
Outbound endpoint | AWS → on-prem | Forwarded domains | n/a (in hub VPC) | SG must allow TCP+UDP 53
FORWARD rule | AWS → target IPs | e.g. corp.internal | RAM (share + associate) | Target 2 on-prem IPs in 2 sites
SYSTEM rule | n/a | Override a FORWARD for a subdomain | RAM | Carve out exceptions
Private hosted zone | In-VPC | e.g. aws.example.com | VPC association | Associate to each spoke (or automate)
.2 resolver | In-VPC | Everything (default) | implicit | Rules ride underneath it

The forwarding-rule types, since picking the wrong one silently breaks resolution:

Rule type | Behaviour | Use when
FORWARD | Send matching queries to target IPs | On-prem or third-party DNS
SYSTEM | Use Route 53 Resolver, ignore a broader FORWARD | Exempt a subdomain (e.g. an AWS PHZ inside corp.internal)
RECURSIVE (default behaviour) | Standard Route 53 resolution | No rule needed
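When several rules could match a query, Resolver applies the one with the most specific domain name, which is what lets a SYSTEM rule carve a subdomain out of a broader FORWARD. A minimal sketch of that selection, assuming rules are plain (domain, type) pairs; `pick_rule` and the `aws.corp.internal` carve-out are illustrative, not part of any AWS SDK.

```python
def pick_rule(qname, rules):
    """Return the (domain, rule_type) with the longest suffix match for qname.

    rules: list of (domain, rule_type) tuples. Falls back to the built-in
    recursive behaviour when nothing matches.
    """
    labels = qname.rstrip(".").lower().split(".")
    best, best_depth = (".", "RECURSIVE"), 0
    for domain, rule_type in rules:
        d = domain.rstrip(".").lower().split(".")
        # A rule matches when its labels are a suffix of the query's labels.
        if len(d) <= len(labels) and labels[-len(d):] == d and len(d) > best_depth:
            best, best_depth = (domain, rule_type), len(d)
    return best

rules = [("corp.internal", "FORWARD"), ("aws.corp.internal", "SYSTEM")]
print(pick_rule("db.corp.internal", rules))       # ('corp.internal', 'FORWARD')
print(pick_rule("api.aws.corp.internal", rules))  # ('aws.corp.internal', 'SYSTEM')
print(pick_rule("example.com", rules))            # ('.', 'RECURSIVE')
```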

For private hosted zones, associate the zone with each spoke VPC — or, at scale, share it and automate association. Spokes keep using the VPC .2 resolver; the rules ride underneath. The security-group rules the resolver endpoints need (a frequent failure point — UDP works, TCP for large answers does not):

Endpoint | Direction | Protocol/port | Source/Dest | Why
Outbound | Egress | UDP 53 | On-prem resolver IPs | Standard DNS
Outbound | Egress | TCP 53 | On-prem resolver IPs | Answers > 512 bytes / DNSSEC
Inbound | Ingress | UDP 53 | On-prem CIDR | On-prem queries in
Inbound | Ingress | TCP 53 | On-prem CIDR | Large answers / retries after UDP truncation

Step 6 — Hybrid connectivity into the hub

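Terminate Direct Connect or Site-to-Site VPN on the TGW, not on individual VPCs; that is the whole point of the hub. For VPN, create the VPN connection directly against the TGW, which creates the attachment for you. For Direct Connect, associate a Transit VIF with a Direct Connect Gateway, then associate that DXGW with the TGW:

resource "aws_dx_gateway_association" "dx" {
  dx_gateway_id         = aws_dx_gateway.main.id
  associated_gateway_id = aws_ec2_transit_gateway.hub.id
  # Prefixes AWS advertises to on-prem; var.dx_allowed_prefixes is a placeholder
  # for the summarized super-blocks from Step 1.
  allowed_prefixes = var.dx_allowed_prefixes
}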

Put the hybrid attachment in its own TGW route table. This lets you control exactly which environments on-prem can reach: propagate prod into the hybrid table only if on-prem is allowed to reach prod, and propagate the hybrid attachment into an environment’s table only if that environment is cleared to reach the data center. Advertise summarized routes (your reserved super-blocks from Step 1) over BGP rather than hundreds of /20s: the DXGW has an allowed-prefixes limit, and summarization keeps you well under it.

DX vs VPN onto the TGW, on the dimensions that decide which (or both) you use:

Dimension | Direct Connect (via DXGW) | Site-to-Site VPN
Transport | Private fiber | IPsec over internet
Bandwidth | 1/10/100 Gbps ports | ~1.25 Gbps per tunnel (ECMP to scale)
Latency / jitter | Low, consistent | Variable (internet)
Provisioning time | Weeks (cross-connect) | Minutes
Encryption | Not by default (add MACsec / IPsec) | Built-in IPsec
Cost shape | Port-hours + data | Tunnel-hours + data
Resilience pattern | 2 DX at 2 locations | 2 tunnels per connection; VPN as DX backup
Routing | BGP via DXGW → TGW | BGP or static; ECMP with vpn_ecmp_support

BGP prefix discipline on the hybrid edge — summarize or you will hit the limit and starve the table:

Knob | Why it matters | Good practice
DXGW allowed prefixes | Hard cap on what AWS advertises to on-prem | Advertise summarized /12s, not /20s
On-prem advertised routes | Counts against TGW route limits | Summarize the data-center space (one /13)
BGP communities / AS-path | Influence path selection, DX-vs-VPN failover | Tag DX-preferred; longer AS-path on VPN backup
Blackhole on withdrawal | Avoid stale routes | Let propagation withdraw on link down
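Summarization can be checked mechanically before a prefix list goes anywhere near the DXGW. A minimal sketch using Python's standard `ipaddress` module; the sixteen /20 allocations are hypothetical, and `summarize` is an illustrative helper.

```python
import ipaddress

def summarize(cidrs):
    """Collapse contiguous CIDRs into the fewest prefixes that cover exactly them."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# Sixteen contiguous /20 spoke allocations (hypothetical) filling 10.16.0.0/16
spokes = [f"10.16.{i * 16}.0/20" for i in range(16)]
print(len(spokes), "->", summarize(spokes))  # 16 -> ['10.16.0.0/16']

# A gap in the allocation prevents collapsing, so nothing is over-advertised
print(summarize(["10.16.0.0/20", "10.16.32.0/20"]))
```

Because `collapse_addresses` only merges blocks that are fully covered, the result never advertises space you have not allocated; advertising a reserved super-block that is only partly used is a separate, deliberate choice.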

Architecture at a glance

Trace a single packet to internalize the whole design. A workload in a prod spoke VPC wants to reach a SaaS API on the internet. Its subnet route table has 0.0.0.0/0 pointing at the TGW attachment (an ENI sitting in a tiny /28 per-AZ subnet, with a CIDR IPAM handed out from the prod /12). The packet enters the TGW and lands in the prod route-table domain the attachment associates to. That domain has a static 0.0.0.0/0 to the egress VPC attachment — but crucially it has no route to the non-prod /12, because non-prod was never propagated here. That missing route is the isolation: prod simply cannot address non-prod. The default route forwards the packet to the egress VPC in the network account, where the TGW-attachment subnet’s route table sends 0.0.0.0/0 to that AZ’s Network Firewall endpoint. The firewall, running a STRICT-ORDER policy that defaults to drop, checks the flow against the allowlist; if permitted, the firewall subnet’s route table forwards to that same AZ’s NAT gateway, which translates to its Elastic IP and exits via the internet gateway. Return traffic retraces the path through the same AZ’s firewall endpoint — AZ symmetry is mandatory or the stateful engine drops the return.

Off to the side, the same hub carries hybrid and DNS: a Direct Connect Gateway attaches to the TGW in its own route table (so you choose exactly which environments on-prem can reach), and Route 53 Resolver endpoints plus RAM-shared FORWARD rules give every spoke consistent resolution of on-prem and private-zone names. Everything is observed: TGW Flow Logs, VPC Flow Logs, and firewall logs land in S3 for Athena. The numbered badges mark the five places this architecture most often fails — an overlapping CIDR or a missing AZ ENI (1), a wrong association/propagation that breaks isolation (2), a default-allow or asymmetric firewall (3), a missing egress return route or an avoidable NAT/firewall bill (4), and an over-advertised hybrid prefix or absent flow logs (5). The legend narrates each as symptom, confirm, and fix.

Figure: Hub-and-spoke AWS multi-account architecture: prod and non-prod spoke VPCs with per-AZ /28 TGW attachments and 0.0.0.0/0 pointing at a regional Transit Gateway shared org-wide via RAM; the TGW holds prod, non-prod, shared, and egress route-table domains where association and propagation enforce isolation (prod and non-prod never propagate into each other); the egress route domain statically forwards to a central egress VPC where an AZ-local Network Firewall endpoint inspects with a STRICT_ORDER drop policy before a per-AZ NAT gateway and internet gateway reach the internet; a Direct Connect/VPN attachment in its own route table and Route 53 Resolver inbound/outbound endpoints with RAM-shared FORWARD rules provide hybrid connectivity and DNS, with TGW, VPC, and firewall flow logs shipped to S3 and Athena. Five numbered badges mark CIDR/AZ-ENI failures, broken isolation, firewall default-allow or asymmetry, missing egress return routes or NAT cost, and over-advertised hybrid prefixes or absent logging.

Real-world scenario

A retail platform team — call them NorthWind Retail — had the textbook hub from this guide running across ~30 accounts: prod and non-prod isolated by route-table domains, all egress hairpinned through one Network Firewall VPC, Direct Connect terminated on the TGW for store-back-office connectivity, and Route 53 Resolver giving every account consistent DNS. It worked beautifully — until the AWS bill for the network account tripled in a single month and the FinOps lead escalated.

The investigation, driven by TGW Flow Logs queried in Athena, found the culprit fast: every spoke was reaching S3 over the public path — TGW data-processing, plus Network Firewall per-GB, plus NAT data-processing — for what was internal bulk data. A nightly analytics job alone pushed terabytes through the central firewall, and the per-GB charges on both the TGW and the firewall dwarfed the compute. The team had centralized egress for governance and accidentally routed bulk storage traffic through the most expensive path in the account.

The charges on each path told the story:

Egress path for S3 traffic | TGW data-proc | Firewall per-GB | NAT data-proc | Net per-GB cost | Inspected?
Through the hub (before) | Yes | Yes | Yes | Highest | Yes (but pointless for S3)
Gateway VPC endpoint (after) | No | No | No | ~Free | No (not needed; IAM-scoped)
Interface endpoint (PrivateLink) | No | No | No | Endpoint-hour + per-GB | No

The fix was to keep S3 and DynamoDB traffic off the hub entirely with gateway VPC endpoints in each spoke. A gateway endpoint is free, adds a prefix-list route in the spoke’s own route table, and never touches the TGW or the firewall:

resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.spoke.id
  service_name      = "com.amazonaws.eu-west-1.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}

The gotcha: the gateway-endpoint prefix-list route is longest-prefix-match against the 0.0.0.0/0 that pointed at the TGW, so it wins automatically for S3 — but only inside the VPC that owns the endpoint. They templated it into the spoke module so every account got one by default, then used an SCP Deny on s3:* unless aws:sourceVpce matched an approved endpoint, closing the public path for good. The result: firewall-processed GB dropped by roughly 70%, the network account bill fell back below its prior baseline, and the central inspection layer went back to inspecting traffic that actually leaves the estate. The lesson the team wrote into their design guide: centralize egress for what needs inspecting; never hairpin bulk AWS-service traffic that has a free private path.
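To put a rough figure on the scenario, the sketch below stacks the indicative per-GB prices from this lesson's cost table across the hops a byte crosses. The 100 TB monthly volume is a hypothetical input, not NorthWind's actual traffic, and `monthly_data_cost` is an illustrative helper.

```python
# Indicative per-GB prices quoted in this lesson's cost table (USD)
PER_GB = {"tgw": 0.02, "firewall": 0.065, "nat": 0.045}

def monthly_data_cost(gb, hops):
    """Data-processing cost for `gb` gigabytes crossing the given hops."""
    return round(gb * sum((PER_GB[h] for h in hops), 0.0), 2)

bulk_gb = 100_000  # hypothetical: ~100 TB/month of S3 traffic
via_hub = monthly_data_cost(bulk_gb, ["tgw", "firewall", "nat"])
via_gateway_endpoint = monthly_data_cost(bulk_gb, [])  # no billable hops

print(via_hub, via_gateway_endpoint)
```

At these prices the hub path costs about $0.13 per GB in data processing alone, while the gateway endpoint path adds none, which is why moving bulk S3 traffic changes the bill so sharply.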

Advantages and disadvantages

Advantages | Disadvantages
Transitive any-to-any reachability under policy | Regional resource; multi-region needs peering + planning
One router replaces an O(N²) peering mesh | Per-GB data-processing charge on all TGW traffic
Strong env isolation via route-table domains | Association/propagation model is easy to get backwards
One inspected, logged, allow-listed egress point | Egress VPC is a shared dependency; must design AZ-HA
Single hybrid (DX/VPN) termination for all accounts | Firewall per-GB cost if you hairpin bulk traffic
RAM org-share removes per-account toil | CIDR overlaps are unfixable without renumbering
Clean ownership split (network acct vs spoke) | Cross-AZ traffic incurs data charges if not localized
Reachability Analyzer proves the data plane | More moving parts than peering; steeper learning curve

The advantages dominate past roughly a dozen VPCs or any multi-account compliance requirement; below that, peering is simpler and cheaper. The disadvantages are mostly disciplines rather than blockers: the per-GB charge is controlled by keeping east-west off the firewall and bulk AWS-service traffic on gateway endpoints; the association/propagation confusion is solved by a written matrix and Reachability Analyzer checks in CI; the shared-egress blast radius is solved by per-AZ firewall endpoints and NAT. The one genuinely permanent risk is CIDR overlap — which is exactly why Step 1 comes first.

Hands-on lab

A minimal, single-account proof you can run to see segmentation work without forty accounts. You build a TGW, two spoke VPCs, two route-table domains, and prove that one spoke can reach a shared VPC while the two spokes cannot reach each other. Free-tier friendly except for TGW attachment-hours and a couple of t3.micro instances; tear down at the end.

  1. Set variables and create the TGW (default association/propagation disabled):
REGION=eu-west-1
TGW=$(aws ec2 create-transit-gateway --region $REGION \
  --options DefaultRouteTableAssociation=disable,DefaultRouteTablePropagation=disable \
  --query 'TransitGateway.TransitGatewayId' --output text)
echo "TGW=$TGW"
  2. Create three VPCs: spoke-a (10.16.0.0/20), spoke-b (10.32.0.0/20), shared (10.48.0.0/20), each with one workload subnet and a /28 TGW-attachment subnet. (Use the console or a short Terraform module; the key is disjoint CIDRs.)

  3. Attach each VPC to the TGW:

ATT_A=$(aws ec2 create-transit-gateway-vpc-attachment --transit-gateway-id $TGW \
  --vpc-id $VPC_A --subnet-ids $TGW_SUBNET_A \
  --query 'TransitGatewayVpcAttachment.TransitGatewayAttachmentId' --output text)
# repeat for ATT_B, ATT_SHARED; wait until each attachment's State is "available"

  4. Create two route-table domains and associate the spokes:

RT_SPOKE=$(aws ec2 create-transit-gateway-route-table --transit-gateway-id $TGW \
  --query 'TransitGatewayRouteTable.TransitGatewayRouteTableId' --output text)
RT_SHARED=$(aws ec2 create-transit-gateway-route-table --transit-gateway-id $TGW \
  --query 'TransitGatewayRouteTable.TransitGatewayRouteTableId' --output text)

aws ec2 associate-transit-gateway-route-table --transit-gateway-route-table-id $RT_SPOKE --transit-gateway-attachment-id $ATT_A
aws ec2 associate-transit-gateway-route-table --transit-gateway-route-table-id $RT_SPOKE --transit-gateway-attachment-id $ATT_B
aws ec2 associate-transit-gateway-route-table --transit-gateway-route-table-id $RT_SHARED --transit-gateway-attachment-id $ATT_SHARED

  5. Propagate so spokes↔shared work but the spokes stay isolated from each other: propagate shared into the spoke table, and each spoke into the shared table. Never propagate a spoke into the spoke table. Because both spokes share that one table, add an explicit blackhole if you want the drop to be visible there, or use separate tables per spoke:

aws ec2 enable-transit-gateway-route-table-propagation --transit-gateway-route-table-id $RT_SPOKE  --transit-gateway-attachment-id $ATT_SHARED
aws ec2 enable-transit-gateway-route-table-propagation --transit-gateway-route-table-id $RT_SHARED --transit-gateway-attachment-id $ATT_A
aws ec2 enable-transit-gateway-route-table-propagation --transit-gateway-route-table-id $RT_SHARED --transit-gateway-attachment-id $ATT_B
# Blackhole spoke-a -> spoke-b to prove isolation inside the shared spoke table
aws ec2 create-transit-gateway-route --transit-gateway-route-table-id $RT_SPOKE \
  --destination-cidr-block 10.32.0.0/20 --blackhole

  6. Add VPC route-table entries in each VPC sending the other CIDRs to the TGW attachment, and launch a t3.micro in each of spoke-a, spoke-b, and shared, with security groups that allow ICMP from the other lab CIDRs.

  7. Prove it. From the spoke-a instance, ping the shared instance (should work) and the spoke-b instance (should fail: blackholed). Confirm with a route search:

# Should return a blackhole entry for spoke-b from the spoke table
aws ec2 search-transit-gateway-routes --transit-gateway-route-table-id $RT_SPOKE \
  --filters Name=type,Values=static Name=state,Values=blackhole

  8. Tear down to stop attachment-hour charges:
aws ec2 delete-transit-gateway-vpc-attachment --transit-gateway-attachment-id $ATT_A
# delete ATT_B, ATT_SHARED, the route tables, the TGW, then the VPCs and instances

Expected result: spoke-a → shared succeeds; spoke-a → spoke-b fails because the route is a blackhole — segmentation demonstrated with the absence (or explicit drop) of a route, exactly as production isolation works.
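The lab's routing outcome can also be reasoned about offline. The sketch below is a toy model of the two route tables built above, assuming longest-prefix match and the same CIDRs; `TgwRouteTable` and the attachment names are hypothetical stand-ins, not AWS objects.

```python
import ipaddress

class TgwRouteTable:
    """Toy TGW route table: longest-prefix match over propagated/static routes."""

    def __init__(self):
        self.routes = {}  # network -> attachment name, or None for a blackhole

    def propagate(self, cidr, attachment):
        self.routes[ipaddress.ip_network(cidr)] = attachment

    def blackhole(self, cidr):
        self.routes[ipaddress.ip_network(cidr)] = None

    def lookup(self, ip):
        addr = ipaddress.ip_address(ip)
        matches = [n for n in self.routes if addr in n]
        if not matches:
            return "no route"
        best = max(matches, key=lambda n: n.prefixlen)
        return self.routes[best] or "blackhole"

rt_spoke, rt_shared = TgwRouteTable(), TgwRouteTable()
rt_spoke.propagate("10.48.0.0/20", "att-shared")  # shared -> spoke table
rt_shared.propagate("10.16.0.0/20", "att-a")      # spoke-a -> shared table
rt_shared.propagate("10.32.0.0/20", "att-b")      # spoke-b -> shared table
rt_spoke.blackhole("10.32.0.0/20")                # explicit drop from the lab

# An attachment's outbound lookup uses the table it is associated with.
association = {"att-a": rt_spoke, "att-b": rt_spoke, "att-shared": rt_shared}

print(association["att-a"].lookup("10.48.0.10"))  # att-shared
print(association["att-a"].lookup("10.32.0.10"))  # blackhole
print(association["att-b"].lookup("10.16.0.10"))  # no route
```

The last lookup shows the quieter form of isolation: spoke-b has no blackhole for spoke-a, it simply has no route at all.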

Common mistakes & troubleshooting

The hub fails in a small number of characteristic ways. This is the playbook — match the symptom, run the confirm command, apply the fix. Keep it open during an incident.

# | Symptom | Root cause | Confirm (exact command / path) | Fix
1 | A whole spoke is unreachable | Attachment has no ENI in the workload’s AZ | aws ec2 describe-transit-gateway-vpc-attachments → check SubnetIds per AZ | Add a /28 attachment subnet in every workload AZ
2 | Traffic to one AZ blackholes, others fine | Missing attachment subnet in that AZ | VPC route table points at TGW but no ENI there | Attach in the missing AZ
3 | Prod can reach non-prod (security finding) | Prod or non-prod wrongly propagated into the other table | aws ec2 search-transit-gateway-routes --transit-gateway-route-table-id <prod> --filters Name=route-search.subnet-of-match,Values=10.32.0.0/12 returns a route | Remove the propagation; verify with Reachability Analyzer
4 | Prod can’t reach shared services | Shared not propagated into prod (or prod not into shared) | Route search for the shared CIDR in the prod table returns nothing | Enable propagation both directions
5 | Spoke has no internet | No static 0.0.0.0/0 → egress attachment in the spoke’s domain | Search the spoke table for a default route | Add static 0.0.0.0/0 → egress attachment
6 | Egress works outbound, replies never return | Spokes not propagated into the egress route table | Egress table route search for the spoke CIDR is empty | Propagate every spoke into the egress domain
7 | Intermittent drops under load through egress | Asymmetric flow: return via a different AZ’s firewall endpoint | Firewall flow logs show one-sided flows; compare AZ of in/out routes | Make per-AZ route tables symmetric; consider appliance mode
8 | Egress traffic not being inspected | Firewall policy defaults to allow, or route bypasses the FW endpoint | Inspect stateful_default_actions; check FW-subnet 0/0 next hop | Set aws:drop_established; route 0/0 via FW endpoint
9 | New VPC can’t attach to the TGW | RAM share not reaching the account, or not accepted | aws ram get-resource-shares / get-resource-share-associations | Share to the org/OU; run aws ram enable-sharing-with-aws-organization
10 | Overlapping CIDR, route won’t install | Two attachments advertise the same prefix | search-transit-gateway-routes shows the prefix from another attachment | Renumber one VPC (new IPAM CIDR + migrate) or use PrivateLink
11 | On-prem reaches an env it shouldn’t | Over-propagation into the hybrid route table | Inspect the hybrid table’s propagated routes | Restrict propagation; own route table per hybrid attach
12 | DX advertises but on-prem missing routes | DXGW allowed-prefixes limit hit, or not summarized | aws directconnect describe-direct-connect-gateway-associations → check the allowed prefixes | Advertise summarized /12s, raise/trim allowed prefixes
13 | DNS for on-prem names fails intermittently | Resolver SG missing TCP/53, or on-prem DNS down | aws ec2 describe-security-group-rules on the endpoint SG; dig +tcp | Allow TCP 53 alongside UDP 53; target 2 on-prem IPs
14 | Network bill spiked | Bulk S3/DDB hairpinning through TGW+FW+NAT | TGW Flow Logs in Athena by bytes/destination | Add gateway VPC endpoints in spokes; SCP-enforce
15 | “It should work but doesn’t” | Console route present, data plane blocked (NACL/SG/AZ) | Reachability Analyzer path source→dest | Fix the actual blocking hop the analyzer names

The single most important confirm command, because a green console can still mean a blocked path:

# Authoritative data-plane proof: is this path open across the whole TGW?
PATH_ID=$(aws ec2 create-network-insights-path \
  --source $SRC_ENI --destination $DST_ENI --protocol tcp --destination-port 443 \
  --query 'NetworkInsightsPath.NetworkInsightsPathId' --output text)
aws ec2 start-network-insights-analysis --network-insights-path-id $PATH_ID
# The analysis runs asynchronously: repeat this call until status is "succeeded"
aws ec2 describe-network-insights-analyses --network-insights-path-id $PATH_ID \
  --query 'NetworkInsightsAnalyses[0].{status:Status, reachable:NetworkPathFound, blocker:Explanations[0].ExplanationCode}'

A quick decision table for the “can’t reach X” class of tickets:

If a route lookup returns… | It’s probably… | Do this
Nothing for the destination CIDR | Missing propagation | Propagate the target into this table
A blackhole route | Intentional isolation or a stale static | Confirm intent; remove if it’s a bug
A route, but ping still fails | NACL/SG/AZ-ENI/firewall block | Run Reachability Analyzer; fix the named hop
The CIDR from two attachments | Overlap | Renumber one; you cannot route both
The default 0/0 but no internet | Egress return path missing | Propagate spokes into the egress table
A route present but in blackhole state after a delete | Stale static route | Delete the orphaned static; re-add if needed
The right route but DNS fails | Resolver SG/rule issue, not routing | Check resolver SG (TCP/UDP 53) and FORWARD rule

Best practices

Security notes

The network is a security control surface, and the hub concentrates several of them. Least-privilege applies to routing as much as to IAM: a domain should learn only the routes it needs, and isolation should be the default (no propagation) rather than an exception.

Control | What to do | Why
Route-based isolation | No mutual propagation between trust domains | Compromised non-prod cannot address prod
Egress inspection | Network Firewall default-drop + FQDN allowlist | Stop data exfil and command-and-control
Egress IP allow-listing | One small EIP set from the egress VPC | Partners allow-list a handful of IPs, not forty
IGW/NAT guardrail | SCP deny on IGW/NAT in spokes | Egress cannot bypass the inspected path
DNS exfil detection | Route 53 Resolver query logging + DNS Firewall | DNS tunneling is invisible without logs
Flow visibility | TGW + VPC + FW flow logs to S3 | Forensics and anomaly detection
RAM scope | Share to the org/OU, never external principals | Don’t leak the hub outside the org
Hybrid prefix control | DXGW allowed-prefixes + own route table | On-prem reaches only cleared environments
Encryption in transit | IPsec on VPN; MACsec/IPsec over DX | DX is not encrypted by default
Endpoint policies | IAM + aws:sourceVpce conditions on S3/DDB | Bind data access to approved endpoints
TGW attachment ownership | Spoke owns its attachment; network acct owns routing | Least-privilege; no cross-account route edits
Resolver query logging scope | Log all VPCs, ship to S3 + alarm on anomalies | Detect DNS tunneling and exfil early

The identity-and-encryption layer pairs with “Organizations SCPs, guardrails & delegated admin” for the guardrails and “AWS Network Firewall: centralized egress inspection” for the inspection rules; the DNS-exfil controls live in “Route 53 Resolver: DNS Firewall, endpoints, rules, hybrid resolution”.

Cost & sizing

The hub’s bill has three movable drivers: TGW attachment-hours, TGW data-processing per-GB, and Network Firewall (endpoint-hours plus per-GB). NAT and cross-AZ data ride alongside. Figures are indicative (eu-west-1, USD; INR at ~₹84/USD) — confirm against the current price list.

Cost driver | Rough unit price | Scales with | Lever to reduce
TGW attachment-hour | ~$0.05 / attachment-hour | Number of attachments | Consolidate VPCs; don’t over-attach
TGW data processing | ~$0.02 / GB | Traffic crossing the TGW | Keep east-west off; gateway endpoints for S3/DDB
Network Firewall endpoint | ~$0.395 / endpoint-hour | Endpoints (per AZ) | Right-size AZ count
Network Firewall data | ~$0.065 / GB | Inspected GB | Don’t hairpin bulk traffic
NAT gateway | ~$0.045 / hour + ~$0.045 / GB | Egress volume | Gateway endpoints bypass NAT for S3/DDB
Cross-AZ data | ~$0.01 / GB each way | Cross-AZ traffic | Attach and route within-AZ
Reachability Analyzer | ~$0.10 / analysis | Ad-hoc checks | Negligible; use freely

A worked monthly estimate for a 30-account hub with three egress AZs, modest egress, and bulk S3 moved off the hub:

Line item | Quantity | Monthly USD | Monthly INR (~₹84)
TGW attachments (35 × 730h) | 35 attach | ~$1,278 | ~₹1.07 L
TGW data processing | ~20 TB | ~$400 | ~₹33.6 K
Network Firewall endpoints (3 AZ) | 3 × 730h | ~$865 | ~₹72.7 K
Network Firewall data | ~8 TB | ~$520 | ~₹43.7 K
NAT gateways (3 AZ) | 3 × 730h + data | ~$200 | ~₹16.8 K
Cross-AZ data | minimized | ~$80 | ~₹6.7 K
Approx total | | ~$3,343/mo | ~₹2.81 L/mo
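The estimate is simple enough to recompute whenever a quantity or price changes. A minimal sketch, assuming 730 hours per month and 1 TB = 1,000 GB; the NAT and cross-AZ lines reuse the table's rounded figures because their underlying data volumes are not stated.

```python
HOURS = 730
USD_TO_INR = 84  # approximate rate used in this lesson

line_items = {
    "tgw_attachments": 35 * HOURS * 0.05,
    "tgw_data": 20_000 * 0.02,             # ~20 TB as 20,000 GB
    "firewall_endpoints": 3 * HOURS * 0.395,
    "firewall_data": 8_000 * 0.065,        # ~8 TB as 8,000 GB
    "nat_gateways": 200,                   # table's rounded figure (hours + data)
    "cross_az": 80,                        # table's rounded figure
}

total_usd = sum(line_items.values())
total_inr = total_usd * USD_TO_INR

for name, usd in line_items.items():
    print(f"{name:20s} ${usd:>9,.2f}")
print(f"{'total':20s} ${total_usd:>9,.2f}  (about INR {total_inr:,.0f})")
```

The total comes to roughly $3,343 a month, matching the table; attachment-hours and firewall endpoint-hours are fixed costs, so routing discipline mainly moves the two data lines.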

Sizing rules of thumb, and the free-tier reality:

Question | Guidance
How many AZs for the egress VPC? | Match workload AZs (usually 2–3); each adds a firewall + NAT endpoint cost
When is centralized egress worth it? | Past ~10 spokes, or any inspection/compliance requirement
What dominates the bill? | Attachment-hours + firewall per-GB; control the latter by routing discipline
Free tier? | None for TGW/Network Firewall; the lab incurs attachment-hours, so tear down
Biggest accidental cost? | Bulk AWS-service traffic on the public path (fix with gateway endpoints)

Interview & exam questions

Q1. Why does VPC peering not scale to a large multi-account estate? Peering is 1:1 and non-transitive: N VPCs need N(N-1)/2 peerings, and spoke A cannot reach spoke C through B. A Transit Gateway gives transitive, policy-controlled reachability with one attachment per VPC. (SAP-C02, ANS-C01)

Q2. Explain association vs propagation on a TGW route table. Association sets which route table an attachment uses for its own outbound decisions (exactly one). Propagation sets which route tables learn an attachment’s VPC CIDRs. Isolation between two domains is achieved by never propagating one into the other’s table. (ANS-C01)

Q3. How do you isolate prod from non-prod on a shared TGW without a firewall? Put each in its own route-table domain and never propagate one into the other’s table. Isolation is the absence of a route — no deny rule needed, and no per-GB inspection cost. (SAP-C02)

Q4. Why must centralized-egress flows be AZ-symmetric? Network Firewall endpoints are AZ-local and the stateful engine must see both directions of a flow. If the return path uses a different AZ’s endpoint, the engine never saw the forward direction and drops the return. Keep per-AZ route tables symmetric. (ANS-C01)

Q5. A spoke can send traffic out to the internet but replies never come back. What’s wrong? The egress VPC’s TGW route table is missing routes back to the spoke. Propagate every spoke into the egress domain, and ensure the firewall subnet route table sends spoke summaries back to the TGW. (ANS-C01)

Q6. How do you keep S3 traffic from inflating the TGW/firewall bill? Use a gateway VPC endpoint for S3 (and DynamoDB) in each spoke. It is free, adds a prefix-list route that wins by longest-prefix match, and bypasses the TGW, firewall, and NAT entirely. Enforce with an SCP keyed on aws:sourceVpce. (SAP-C02)

Q7. Why disable default route-table association and propagation when creating the TGW? The defaults auto-associate every attachment with one default table and propagate every attachment’s routes into it, producing any-to-any reachability. Disabling them forces explicit, reviewable routing decisions, which segmentation depends on. (ANS-C01)

Q8. How should you advertise routes from AWS to on-prem over Direct Connect? Terminate DX on the TGW via a Direct Connect Gateway, put the attachment in its own route table, and advertise summarized super-blocks (your /12s) rather than hundreds of /20s — the DXGW has an allowed-prefixes limit. (ANS-C01)

Q9. What is the authoritative way to prove a path is open across the TGW? Reachability Analyzer (create-network-insights-path / start-network-insights-analysis). It evaluates the full data plane — routes, NACLs, security groups, AZ ENIs — not just whether a route exists in the console. (ANS-C01, SAP-C02)

Q10. When would you choose PrivateLink over a TGW? When you need to expose a single service across a trust boundary without granting network-layer reachability — especially with overlapping CIDRs, since PrivateLink does no IP routing. (SAP-C02)

Q11. How do you stop spoke accounts from bypassing the central egress? An SCP that denies creating internet gateways and NAT gateways in spoke accounts, so the only path to the internet is the spoke’s default route to the TGW and the central egress VPC. (SAP-C02)

Q12. What’s the regional scope of a TGW and how do you go global? A TGW is regional. For a global estate, deploy one TGW per region and join them with inter-region TGW peering attachments (static routes only), keeping CIDRs disjoint per region for clean summarization. (ANS-C01)
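The scaling argument in Q1 is worth seeing as numbers. A short sketch comparing full-mesh peering links with hub attachments; the function names are illustrative.

```python
def peering_links(n_vpcs):
    """Full-mesh VPC peering needs one link per unordered pair of VPCs."""
    return n_vpcs * (n_vpcs - 1) // 2

def tgw_attachments(n_vpcs):
    """A hub needs one attachment per VPC."""
    return n_vpcs

for n in (2, 12, 40):
    print(f"{n} VPCs: {peering_links(n)} peerings vs {tgw_attachments(n)} attachments")
```

At 40 VPCs a full mesh needs 780 peering connections, against 40 attachments on the hub.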

Quick check

  1. You associate a spoke attachment to the prod route table and propagate it into the shared table. Which direction of reachability does the propagation enable?
  2. Prod and non-prod must never reach each other but both must reach shared services. What propagation rule keeps them isolated?
  3. Egress works but return traffic is dropped under load only. What is the most likely cause?
  4. A spoke’s CIDR is 10.16.0.0/20 and another account provisioned 10.16.0.0/16. Can the TGW route both? Why or why not?
  5. Which single tool proves whether a path across the TGW is actually open, beyond what the route table shows?

Answers

  1. Propagating the spoke into the shared table advertises the spoke’s CIDR into the shared domain, so shared services can reach the spoke. To let the spoke reach shared services, you must also propagate shared into the prod table.
  2. Keep prod and non-prod in separate route-table domains and never propagate one into the other’s table; propagate shared into both. Isolation is the absence of a route.
  3. Asymmetric flow — the return path is going through a different AZ’s Network Firewall endpoint than the forward path, so the stateful engine drops it. Make per-AZ route tables symmetric (or enable appliance mode).
  4. Not correctly. The two prefixes overlap, and the TGW forwards by longest-prefix match: every address in 10.16.0.0/20 goes to the /20 spoke, so the same range inside the /16 VPC is unreachable. One VPC must be renumbered (new IPAM allocation + migration) or exposed via PrivateLink instead.
  5. Reachability Analyzer (create-network-insights-path / start-network-insights-analysis) — it evaluates routes, NACLs, security groups, and AZ ENIs end to end, unlike a console route lookup.
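The overlap in question 4 is the kind of thing worth catching before an attachment is ever created. A minimal pre-flight check with Python's standard `ipaddress` module; the allocation names are hypothetical and `find_overlaps` is an illustrative helper, not an IPAM API.

```python
import ipaddress
from itertools import combinations

def find_overlaps(allocations):
    """Return (name, name) pairs whose CIDRs overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in allocations.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]

allocations = {
    "spoke-a": "10.16.0.0/20",
    "rogue": "10.16.0.0/16",   # the conflicting allocation from question 4
    "spoke-b": "10.32.0.0/20",
}
print(find_overlaps(allocations))  # [('spoke-a', 'rogue')]
```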

Glossary

Next steps
