Designing Multi-Account VPC Connectivity with Transit Gateway and Centralized Egress

A two-account network you can mesh by hand. A forty-account estate with prod/non-prod isolation, a single internet egress point, and on-prem connectivity is an architecture problem. Get the hub wrong early and you inherit a peering mess, overlapping CIDRs you can never renumber, and a flat network where any compromised workload can reach every other one. This is how to build the hub correctly the first time — and how to read it at 02:00 when a spoke “can’t reach the database” and you need to know in five minutes whether the problem is a missing route, a wrong association, a default-allow firewall, or an asymmetric flow the stateful engine is silently dropping.

The keystone is the AWS Transit Gateway (TGW) — a regional, horizontally-scaled cloud router that every VPC and every on-prem link attaches to once, replacing the O(N²) tangle of VPC peering. But a TGW is only as good as the discipline around it: a non-overlapping CIDR plan enforced by IPAM, Resource Access Manager (RAM) sharing so spokes attach without an invitation dance, and above all route-table segmentation — the single feature that lets one shared router enforce that prod and non-prod can never reach each other while both reach shared services and a single inspected egress point. Isolation here is the absence of a route, not a firewall rule, and that distinction is the whole game.

By the end you will be able to design the hub, share it across an AWS Organization, segment it into routing domains, force all egress through a central inspected path, resolve DNS consistently, terminate Direct Connect on the hub, and prove the data plane with Reachability Analyzer rather than trusting the console. Because this is a reference you will return to mid-incident, the topology choices, the IPAM hierarchy, the association/propagation matrix, the firewall policy actions, the service quotas, the cost drivers and the failure playbook are all laid out as scannable tables — read the prose once, then keep the tables open.

What problem this solves

Multi-account AWS is the default operating model — separate accounts for prod, non-prod, security, logging, and each business unit — because the account boundary is the strongest blast-radius and billing boundary AWS offers. But accounts are network islands. Two VPCs in two accounts cannot talk until you explicitly connect them, and the naive connectors do not scale: VPC peering is 1:1 and non-transitive, so N VPCs need N(N-1)/2 peerings and spoke A still cannot reach spoke C through B.
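The arithmetic is easy to check for yourself. The following is a minimal Python sketch (the function names are illustrative, not part of any AWS tooling) that compares the number of connections a full peering mesh needs with the number of attachments a hub needs:

```python
def peering_links(n: int) -> int:
    """Full-mesh VPC peering: one connection per unordered pair of VPCs."""
    return n * (n - 1) // 2


def hub_attachments(n: int) -> int:
    """Hub-and-spoke: each VPC attaches to the hub exactly once."""
    return n


for n in (2, 10, 40):
    print(f"{n:>2} VPCs: {peering_links(n):>3} peerings vs {hub_attachments(n):>2} attachments")
```

At 40 VPCs the mesh needs 780 peerings against 40 hub attachments, and every one of those peerings also needs route entries on both sides.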

What breaks without a deliberate hub: teams peer VPCs ad hoc until the mesh is unmaintainable; someone provisions a VPC as 10.0.0.0/16, a second team does the same in another account, and now the two can never be routed to each other because you cannot renumber a live VPC; every spoke runs its own NAT gateway, so egress is sprawled across forty accounts with no central inspection or logging and a NAT bill multiplied by forty; and the network is flat — once anything is reachable, a compromised non-prod box can reach prod because nobody designed an isolation boundary into the routing.

Who hits this: any platform or cloud-foundations team past a handful of accounts; anyone running a landing zone; anyone with a compliance requirement to inspect and log egress centrally or to isolate environments at the network layer. The TGW with proper segmentation, backed by a central address plan, addresses these problems together: transitive any-to-any-under-policy reachability replaces the peering mesh, route tables give you a single place to enforce isolation, one inspected egress point replaces NAT sprawl, and the hub doubles as the single hybrid termination point. CIDR overlap is the one problem the TGW cannot fix by itself; the IPAM plan in Step 1 handles it. The rest of this guide builds exactly that.

To frame the whole field before the deep dive, here is every problem this article solves, the failure you get without it, and the section that fixes it:

Problem	What breaks without the hub	The mechanism that fixes it	Section
Accounts are network islands	Ad-hoc peering mesh, O(N²)	TGW + RAM share	Step 2
CIDR collisions you can’t undo	Two VPCs claim 10.0.0.0/16; unroutable	IPAM hierarchy, disjoint super-blocks	Step 1
Flat network, no isolation	Non-prod box reaches prod	Route-table domains; isolation = no route	Step 3
Egress sprawl & no inspection	NAT × 40; nothing logged or filtered	Central egress VPC + Network Firewall	Step 4
Inconsistent DNS across spokes	PHZ/on-prem names don’t resolve	Route 53 Resolver endpoints + rules	Step 5
On-prem terminated per-VPC	Many VPN/DX tunnels, no control	Terminate DX/VPN on the TGW	Step 6
“Is the path open?” guesswork	Multi-hour incident triage	Reachability Analyzer + flow logs	Verify

Learning objectives

By the end of this article you can:

Choose deliberately between Transit Gateway, VPC peering, and PrivateLink for a given connectivity requirement, and explain why peering does not scale and PrivateLink sidesteps CIDR overlap.
Design a non-overlapping CIDR plan with AWS IPAM: a pooled hierarchy, disjoint per-environment super-blocks, and a reserved on-prem range, so route summarization stays clean and you never have to renumber.
Provision a TGW in a dedicated network account, disable default association/propagation, and share it org-wide with RAM so spokes attach without the invitation dance.
Build route-table segmentation using the association-vs-propagation model so prod and non-prod are isolated by the absence of a route, while both reach shared services and one egress point.
Force centralized egress through a Network Firewall → NAT → IGW path, keep flows AZ-symmetric for the stateful engine, default the policy to drop, and keep east-west traffic off the firewall to control cost.
Deliver consistent DNS with Route 53 Resolver inbound/outbound endpoints and RAM-shared forwarding rules, and terminate Direct Connect/VPN on the hub with per-environment hybrid route control.
Prove the data plane with Reachability Analyzer and TGW Flow Logs, and diagnose the common failure modes (blackholes, broken isolation, asymmetric drops, missing return routes) from a symptom→cause→confirm→fix playbook.

Prerequisites & where this fits

You should already understand single-VPC fundamentals (subnets, route tables, an internet gateway, a NAT gateway, security groups vs NACLs) and basic AWS Organizations (a management account, member accounts, OUs). You should be comfortable running aws CLI commands, reading JSON, and applying Terraform. This guide assumes the depth of “Deep dive into VPC: subnets, routing, IGW, NAT, and endpoints” and “VPC networking fundamentals explained” as the layer beneath it.

This sits at the network-foundations layer of a landing zone, directly above account vending and just below per-workload networking. It pairs tightly with “CIDR & IPAM management: allocation and BYOIP at scale” (the address plan it depends on), “AWS Network Firewall: centralized egress inspection” (the inspection layer), “Route 53 Resolver: DNS Firewall, endpoints, rules, hybrid resolution” (centralized DNS), and “Direct Connect + Transit Gateway: resilient hybrid” (hybrid termination). The guardrails come from “Organizations SCPs, guardrails & delegated admin”.

A quick map of who owns what, so you call the right team fast during an incident:

Layer	What lives here	Who usually owns it	Failure classes it can cause
IPAM / address plan	Pools, allocations, super-blocks	Network / platform	Overlap → unroutable spoke; summarization breaks
AWS Organizations / RAM	Account tree, OUs, resource shares	Cloud foundations	Spoke can’t see the shared TGW
Network account (hub)	TGW, route tables, egress VPC, resolver	Network team	Wrong association/propagation → broken isolation
Spoke account (VPC)	VPC, subnets, TGW attachment	App / dev team	Missing AZ ENI, wrong subnet, default route absent
Egress VPC	Network Firewall, NAT, IGW	Network / security	Default-allow; asymmetric drop; no return route
Hybrid (DX/VPN)	DXGW, VIFs, VPN attachment	Network team	Over-advertised prefixes; env reachable it shouldn’t be
Observability	TGW/VPC/FW flow logs, Athena	Platform / SRE	Blind triage; no exfil detection

Core concepts

Six mental models make every later decision obvious.

A Transit Gateway is a regional router, not a global one. The TGW is a horizontally-scaled, managed router that lives in one region. Everything in that region — VPCs, VPN, Direct Connect via a Direct Connect Gateway — attaches to it once and gets transitive reachability, governed by route tables. A global estate needs one TGW per region, joined with inter-region peering attachments. Plan accounts and CIDRs with that regional boundary in mind from day one; a packet from eu-west-1 to us-east-1 crosses a peering attachment, and the CIDR plan must keep regions disjoint.

An attachment is an ENI in your subnets; routing happens in the TGW. When a VPC attaches to a TGW, the platform places an elastic network interface in one subnet per AZ you choose. Traffic only reaches AZs where the attachment has an ENI — attach in every AZ you run workloads in, or that AZ’s traffic blackholes. The VPC’s own route table sends a destination (often 0.0.0.0/0 or a summary) to the TGW attachment; from there, the TGW route table the attachment associates to makes the next-hop decision.

Association decides which table you use; propagation decides which tables learn your routes. This is the crux of segmentation and the line everyone gets backwards at first. Association = “which TGW route table do I consult for my outbound decisions” — every attachment associates to exactly one. Propagation = “into which TGW route tables do my VPC’s CIDRs get advertised.” To let prod reach shared services, you propagate the shared-services attachment into the prod table and propagate prod into the shared table. Because you never propagate prod into the non-prod table (and vice versa), those two domains have no route to each other even though they share one router. Isolation is the absence of a route, not a firewall rule.

Allocation is finite and the CIDR plan is permanent. The TGW route table is a longest-prefix-match router. Two VPCs both advertising 10.0.0.0/16 cannot both be routed, and you cannot renumber a live VPC without downtime. Solve allocation centrally, before anyone provisions a VPC, with AWS IPAM as the single source of truth and disjoint super-blocks per environment so summarization stays clean.

Centralized egress trades NAT sprawl for one inspected, billed path. Instead of a NAT gateway in every spoke, you run one egress VPC in the network account with NAT gateways and an AWS Network Firewall, and point every spoke’s default route at the TGW, which forwards to the egress VPC. The catch: TGW data-processing and Network Firewall both bill per-GB, firewall endpoints are AZ-local (so flows must be AZ-symmetric or the stateful engine drops return traffic), and you must propagate every spoke back into the egress route table or the return path blackholes.

The data plane is the source of truth; the console lies by omission. A route can exist in the console and still not deliver a packet (wrong AZ, NACL, security group, asymmetric firewall). Reachability Analyzer and TGW Flow Logs are the authoritative way to prove a path is — or is not — open, end to end across the whole hub.

The vocabulary in one table

Pin down every moving part before the deep sections. The glossary repeats these for lookup; this is the mental model side by side:

Concept	One-line definition	Where it lives	Why it matters
Transit Gateway (TGW)	Regional managed router	Network account, per region	The hub everything attaches to
Attachment	An ENI-backed link (VPC/VPN/DX/peer)	In your subnets / TGW	No ENI in an AZ → that AZ blackholes
TGW route table	A routing domain	On the TGW	The segmentation primitive
Association	Which table an attachment uses	Attachment ↔ table	“My outbound decisions”
Propagation	Which tables learn an attachment’s CIDRs	Attachment → tables	“Who can route to me”
IPAM pool	Hierarchical address allocator	IPAM scope	Source of truth; no overlaps
RAM share	Cross-account resource grant	RAM	Spokes attach without invites
Egress VPC	Central NAT + inspection VPC	Network account	One inspected internet exit
Network Firewall	Stateful/stateless inspection	Egress VPC, AZ-local	Drops/allows egress; per-GB billed
Resolver endpoint	Inbound/outbound DNS NIC	Shared/egress VPC	Hybrid + consistent DNS
Blackhole route	A route that drops traffic	TGW route table	Intentional isolation or a bug
Reachability Analyzer	Path-proving service	VPC	Authoritative data-plane test

Topology: Transit Gateway vs. peering vs. PrivateLink

These three are not competitors; they solve different problems. Pick deliberately — choosing peering for a forty-account estate, or a TGW to share a single API, are both expensive mistakes.

Pattern	Connectivity model	Transitive routing	Scales to	Best for	Worst for
VPC peering	1:1, full IP reachability	No	A handful of VPCs	Lowest latency, no hub fee, 2–10 VPCs	Many VPCs (N(N-1)/2), any transitive need
Transit Gateway	Hub-and-spoke, regional router	Yes (policy-controlled)	Thousands of attachments	Many VPCs/accounts, segmentation, hybrid, central egress	Sharing one app without IP routing
PrivateLink	Service endpoint (one ENI per AZ)	N/A (no IP routing)	One service, many consumers	Exposing a single service across a trust boundary	Everything-talks-to-everything

Peering does not scale, and the reason is arithmetic. PrivateLink is the right tool when you want to share one application without granting network-layer reachability — it sidesteps CIDR overlap entirely because there is no routing. For everything-talks-to-everything-under-policy across accounts, the TGW is the answer. Here is the head-to-head on the dimensions that actually decide a design:

Dimension	VPC peering	Transit Gateway	PrivateLink
Connections for N VPCs	N(N-1)/2	N (one each)	1 endpoint per consumer/service
Transitive (A→B→C)	No	Yes	N/A
Overlapping CIDRs allowed	No	No (LPM router)	Yes (no IP routing)
Per-GB data charge	No hub fee (standard cross-AZ and cross-region transfer rates apply)	Yes (data processing)	Yes (per-GB + endpoint-hour)
Cross-region	Yes (inter-region peering)	Yes (TGW peering)	Yes (with extra config)
Central inspection point	No	Yes (egress VPC)	N/A
Bandwidth ceiling	VPC-to-VPC line rate	~50 Gbps per VPC attachment (burst)	Per-ENI
Typical use	2–10 VPCs, latency-sensitive	Landing-zone hub	Internal API / SaaS endpoint

The rest of this guide builds one regional hub. A global estate repeats the pattern per region and joins the TGWs with inter-region peering attachments, which is why the CIDR plan has to keep regions disjoint from day one.

Step 1 — A non-overlapping CIDR plan with IPAM

The single most expensive mistake in multi-account networking is CIDR overlap. The TGW route table is a longest-prefix-match router; two VPCs advertising 10.0.0.0/16 cannot both be routed, and you cannot renumber a live VPC. Solve allocation centrally before anyone provisions a VPC.

Use AWS IPAM as the source of truth. Carve a top-level pool, then per-environment and per-region pools beneath it, and force every VPC to draw from IPAM so uniqueness is guaranteed by construction rather than by a spreadsheet someone forgets to update.

resource "aws_vpc_ipam" "main" {
  operating_regions { region_name = "eu-west-1" }
}

resource "aws_vpc_ipam_pool" "top" {
  address_family = "ipv4"
  ipam_scope_id  = aws_vpc_ipam.main.private_default_scope_id
  locale         = "eu-west-1"
}

resource "aws_vpc_ipam_pool_cidr" "top" {
  ipam_pool_id = aws_vpc_ipam_pool.top.id
  cidr         = "10.0.0.0/8"
}

# Environment pool: prod gets a /12 out of the /8
resource "aws_vpc_ipam_pool" "prod" {
  address_family      = "ipv4"
  ipam_scope_id       = aws_vpc_ipam.main.private_default_scope_id
  locale              = "eu-west-1"
  source_ipam_pool_id = aws_vpc_ipam_pool.top.id
}

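resource "aws_vpc_ipam_pool_cidr" "prod" {
  ipam_pool_id = aws_vpc_ipam_pool.prod.id
  # Pinned to match the allocation plan; netmask_length = 12 would instead
  # take the next available /12 in the parent pool.
  cidr         = "10.16.0.0/12"

  # The parent block must be provisioned before a child pool can draw from it.
  depends_on = [aws_vpc_ipam_pool_cidr.top]
}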

Spoke VPCs then allocate from the pool instead of hard-coding a block. IPAM hands out a free, non-overlapping range and tracks it:

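resource "aws_vpc" "spoke" {
  ipv4_ipam_pool_id    = aws_vpc_ipam_pool.prod.id
  ipv4_netmask_length  = 20            # IPAM hands out a free /20
  enable_dns_support   = true
  enable_dns_hostnames = true

  # The pool must hold a provisioned CIDR before a VPC can allocate from it.
  depends_on = [aws_vpc_ipam_pool_cidr.prod]
}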
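To make "a free, non-overlapping range" concrete, here is a toy Python model of the allocation step. It is only a sketch of the idea, not how IPAM is implemented: the function name and the first-fit strategy are assumptions made for illustration.

```python
import ipaddress


def next_free_block(pool: str, allocated: list[str], prefixlen: int) -> ipaddress.IPv4Network:
    """Return the first block of the requested size that overlaps nothing already allocated."""
    pool_net = ipaddress.ip_network(pool)
    taken = [ipaddress.ip_network(a) for a in allocated]
    for candidate in pool_net.subnets(new_prefix=prefixlen):
        if not any(candidate.overlaps(t) for t in taken):
            return candidate
    raise ValueError(f"no free /{prefixlen} left in {pool}")


# Two /20s are already in use in the prod super-block; ask for a third.
allocated = ["10.16.0.0/20", "10.16.16.0/20"]
print(next_free_block("10.16.0.0/12", allocated, 20))  # 10.16.32.0/20
```

The point of the model is the invariant, not the strategy: whatever block comes back cannot overlap an existing allocation, which is exactly what a spreadsheet cannot guarantee.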

Reserve disjoint super-blocks per environment so route-table summarization stays clean later, and reserve a separate block for on-prem so hybrid routes never collide. A worked allocation that leaves room to grow and summarizes to one prefix per domain:

Domain	Super-block	Mask	VPC size handed out	Approx VPC capacity	Summarized as
Prod	10.16.0.0/12	/12	/20	256 /20s	10.16.0.0/12
Non-prod	10.32.0.0/12	/12	/20	256 /20s	10.32.0.0/12
Shared services	10.48.0.0/12	/12	/22	1,024 /22s	10.48.0.0/12
Egress / inspection	10.64.0.0/16	/16	/24	~256 /24s	10.64.0.0/16
On-prem (reserved)	10.200.0.0/13	/13	n/a (BGP)	data-center owned	10.200.0.0/13
Future / spare	10.96.0.0/11	/11	reserved	growth	—
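A plan like this is cheap to sanity-check offline before anything is provisioned. The sketch below uses Python's standard ipaddress module with the super-blocks from the table; the PLAN dictionary and the helper names are illustrative assumptions. It confirms that no two super-blocks overlap and computes how many VPC-sized blocks fit in each.

```python
import ipaddress
from itertools import combinations

# Super-block and the VPC prefix length handed out from it (None = not used for VPCs).
PLAN = {
    "prod": ("10.16.0.0/12", 20),
    "non-prod": ("10.32.0.0/12", 20),
    "shared": ("10.48.0.0/12", 22),
    "egress": ("10.64.0.0/16", 24),
    "on-prem": ("10.200.0.0/13", None),
    "spare": ("10.96.0.0/11", None),
}


def overlapping_pairs(plan):
    """Return every pair of domains whose super-blocks share address space."""
    nets = {name: ipaddress.ip_network(cidr) for name, (cidr, _) in plan.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


def vpc_capacity(cidr, vpc_prefixlen):
    """Number of VPC-sized blocks that fit in a super-block."""
    return 2 ** (vpc_prefixlen - ipaddress.ip_network(cidr).prefixlen)


print("overlaps:", overlapping_pairs(PLAN))
for name, (cidr, vpc_len) in PLAN.items():
    if vpc_len is not None:
        print(f"{name}: {vpc_capacity(cidr, vpc_len)} x /{vpc_len}")
```

Because ip_network rejects a CIDR with host bits set, the same script also catches a super-block that is not aligned to its mask.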

The IPAM hierarchy itself, level by level, and what each level is for:

IPAM level	Scope / mask	Owns	Why it exists
Top pool	private scope, /8	The 10.0.0.0/8 range (one of the three RFC 1918 blocks)	Single root of truth
Locale	region binding	A region’s allocations	Keeps regions disjoint
Environment pool	/12–/13	prod / non-prod / shared	Summarizable domains
Account/team pool (optional)	/16	One BU or account	Delegated self-service
VPC allocation	/20–/24	One VPC	Drawn at provision time

IPAM allocation settings worth knowing, with their defaults and the gotcha each guards against:

Setting	What it controls	Default	When to change	Gotcha if wrong
`auto_import`	Pull existing CIDRs into the pool	false	Migrating legacy VPCs	Imports overlaps as findings, not blocks
`allocation_min_netmask_length`	Shortest mask (largest block) IPAM will hand out	pool-defined	Cap block size (e.g. nothing larger than a /16)	Teams grab huge blocks
`allocation_max_netmask_length`	Largest mask (smallest network)	pool-defined	Cap tiny allocations	Fragmentation
`allocation_default_netmask_length`	Default size on request	none	Standardize VPC size	Inconsistent VPCs
`publicly_advertisable`	BYOIP advertisement	false	BYOIP only	Accidental public advertise
Resource discovery (org)	Cross-account CIDR visibility	off	Multi-account (always)	Blind to other accounts’ overlaps

Renumbering is not an option for a live VPC. Effort spent on the address plan now saves months of migration work later. If you inherit overlaps, the only clean fixes are PrivateLink (no routing) for the affected service or a brand-new VPC behind a fresh IPAM allocation with a workload migration; a TGW route hack is never one of them.

Step 2 — Provision the TGW and share it with RAM

Create the TGW in a dedicated network account (part of your AWS Organizations structure), then share it to every other account with Resource Access Manager (RAM). Turn off the default automation so route propagation and association become explicit, policy-driven decisions rather than accidents.

resource "aws_ec2_transit_gateway" "hub" {
  description                     = "Org hub TGW"
  default_route_table_association = "disable"
  default_route_table_propagation = "disable"
  dns_support                     = "enable"
  vpn_ecmp_support                = "enable"
  amazon_side_asn                 = 64512   # for any future BGP attachments
  tags = { Name = "tgw-hub" }
}

The TGW-level options you set once, with their effect and the recommended value for a segmented hub:

Option	Values	Default	Recommended (segmented hub)	Why
`default_route_table_association`	enable / disable	enable	disable	Force explicit association; segmentation depends on it
`default_route_table_propagation`	enable / disable	enable	disable	No accidental any-to-any reachability
`dns_support`	enable / disable	enable	enable	Cross-VPC DNS resolution over the TGW
`vpn_ecmp_support`	enable / disable	enable	enable	Multi-path over redundant VPN tunnels
`amazon_side_asn`	64512–65534, 4200000000–4294967294	64512	A value you control	BGP for DX/VPN; avoid clashing with on-prem ASN
`multicast_support`	enable / disable	disable	disable (unless needed)	Niche; off by default
`auto_accept_shared_attachments`	enable / disable	disable	disable	Approve attachments deliberately
`transit_gateway_cidr_blocks`	CIDR list	none	set for Connect/peering	Required for some attachment types

Sharing with the whole organization removes the per-account invitation dance. This requires that you have enabled RAM sharing within AWS Organizations once (aws ram enable-sharing-with-aws-organization):

resource "aws_ram_resource_share" "tgw" {
  name                      = "tgw-hub-share"
  allow_external_principals = false
}

resource "aws_ram_resource_association" "tgw" {
  resource_arn       = aws_ec2_transit_gateway.hub.arn
  resource_share_arn = aws_ram_resource_share.tgw.arn
}

# Share to the entire org (or to specific OUs by ARN)
resource "aws_ram_principal_association" "org" {
  principal          = "arn:aws:organizations::111122223333:organization/o-exampleorgid"
  resource_share_arn = aws_ram_resource_share.tgw.arn
}

RAM lets you scope the share precisely; pick the narrowest principal that still avoids per-account toil:

RAM principal type	Example ARN / value	Scope	When to use
Whole organization	`organization/o-xxxx`	Every account, now and future	Landing-zone default
Organizational unit	`ou-xxxx-yyyy`	Accounts under that OU	Share only to workload OUs
Single account ID	`123456789012`	One account	Pilots, exceptions
Account outside the organization	external account ID	One external account	Rare; requires `allow_external_principals=true`

Once shared, a spoke account creates its attachment locally, referencing the shared TGW ID. This is the clean ownership split: the network account owns the TGW and its route tables; the spoke owns its VPC and attachment.

resource "aws_ec2_transit_gateway_vpc_attachment" "spoke" {
  transit_gateway_id     = "tgw-0abc123..."                  # the shared TGW
  vpc_id                 = aws_vpc.spoke.id
  subnet_ids             = [for s in aws_subnet.tgw : s.id]  # one /28 per AZ
  dns_support            = "enable"
  appliance_mode_support = "disable"                         # enable only for inline appliances
  tags = { Name = "att-spoke-prod-app1" }
}

Attachment options and when each matters — appliance_mode is the one people miss and it changes flow symmetry:

Attachment option	Values	Default	When to change	Effect
`subnet_ids`	one subnet per AZ	required	Always one /28 per AZ	Places the ENI; missing AZ = blackhole
`dns_support`	enable / disable	enable	rarely	DNS resolution over the attachment
`appliance_mode_support`	enable / disable	disable	inline inspection appliance VPC	Pins a flow to one AZ’s appliance (symmetry)
`ipv6_support`	enable / disable	disable	dual-stack	IPv6 routing over the TGW
`transit_gateway_default_route_table_association`	bool (provider)	follows TGW	keep disabled	Explicit association instead

Give the TGW its own tiny attachment subnets — a /28 per AZ is plenty — separate from workload subnets. Attach in every AZ you run workloads in; an attachment only delivers traffic to AZs where it has an ENI, and intra-AZ traffic avoids cross-AZ data charges.
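One way to plan those attachment subnets is to reserve the last /28s of each VPC range, one per AZ. That placement is a convention assumed here for illustration, not an AWS requirement, and the helper names are hypothetical. The Python sketch below carves the subnets and flags any AZ that runs workloads but has no attachment ENI:

```python
import ipaddress


def tgw_attachment_subnets(vpc_cidr: str, azs: list[str]) -> dict[str, ipaddress.IPv4Network]:
    """Reserve the last /28s of the VPC range, one per AZ, for TGW attachment ENIs."""
    if not azs:
        return {}
    slices = list(ipaddress.ip_network(vpc_cidr).subnets(new_prefix=28))
    return dict(zip(azs, slices[-len(azs):]))


def blackholed_azs(workload_azs: set[str], attachment_azs: set[str]) -> set[str]:
    """AZs that run workloads but have no attachment ENI, so their traffic cannot reach the TGW."""
    return workload_azs - attachment_azs


subnets = tgw_attachment_subnets("10.16.32.0/20", ["eu-west-1a", "eu-west-1b", "eu-west-1c"])
for az, net in subnets.items():
    print(az, net)

print(blackholed_azs({"eu-west-1a", "eu-west-1b", "eu-west-1c"}, {"eu-west-1a", "eu-west-1b"}))
```

Running the second check in CI against your real subnet inventory is a cheap guard against the missing-AZ blackhole.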

The attachment types a TGW supports, and the route mechanism each uses:

Attachment type	Connects	Routes via	Notes
VPC	A VPC in any account	Static + propagation	The common case; one ENI per AZ
VPN (Site-to-Site)	On-prem over IPsec	BGP (dynamic) or static	ECMP across tunnels
Direct Connect (via DXGW)	On-prem over DX	BGP	Through a Direct Connect Gateway
TGW peering	Another TGW (cross-region)	Static only	No transitive peering; per-pair
Connect (GRE)	SD-WAN / virtual routers	BGP over GRE	For third-party appliances
Multicast domain	Multicast group members	Multicast routing	Niche; off by default

Service quotas and limits that bite

Design to the documented quotas, not to a guess — and treat the soft ones as raisable via Service Quotas, the hard ones as design constraints. These are the figures that most often force a redesign; confirm the current values and your account’s applied limits before you build to a ceiling:

Limit	Default / value	Soft or hard	What hitting it looks like
VPC attachments per TGW	~5,000	Soft (raisable)	New attachment rejected at scale
Routes per TGW route table	~10,000	Hard	Propagation/route install fails
TGW route tables per TGW	~20	Soft	Can’t add another routing domain
Attachments per VPC (same TGW)	5	Hard	Limits per-AZ/redundant designs
TGWs per Region per account	~5	Soft	Can’t split into more hubs
Subnets per VPC attachment (AZs)	one per AZ	Hard	Missing AZ = blackhole
Bandwidth per VPC attachment	~50 Gbps (burst)	Hard	Throughput ceiling per VPC
DXGW allowed prefixes to on-prem	~20	Hard	Over-advertised routes dropped
DXGW associations (TGWs)	~6	Hard	Limits hub count behind one DXGW
Peering attachments per TGW	~50	Soft	Multi-region fan-out ceiling
Resolver endpoints’ ENIs (per endpoint)	≥2	Hard	Need 2+ for AZ resilience
RAM resource shares per account	~5,000	Soft	Many fine-grained shares

Step 3 — Route-table segmentation

This is where a TGW earns its keep. A TGW route table is a routing domain. By controlling which attachments associate to a domain (which table they consult for outbound decisions) and which propagate into it (whose CIDRs appear there), you build isolation that a flat network cannot. The classic layout:

                   +------------------+
    prod spokes -> | prod RT          | -> shared svc, egress (no non-prod)
                   +------------------+
                   +------------------+
non-prod spokes -> | non-prod RT      | -> shared svc, egress (no prod)
                   +------------------+
                   +------------------+
 shared svc VPC -> | shared RT        | -> prod + non-prod (serves both)
                   +------------------+
                   +------------------+
     egress VPC -> | egress RT        | <- default route lives here; learns ALL spokes
                   +------------------+

Goal: prod talks to prod and to shared services; non-prod talks to non-prod and to shared services; prod and non-prod never reach each other; everyone reaches the internet only through the central egress VPC. Here is the full association/propagation matrix. Read a row as “this attachment associates to its own table and propagates into the ticked tables”; “self” marks the table the attachment associates to, and a spoke also propagates into its own table so that VPCs in the same domain can reach each other:

Attachment ↓ / propagates into →	prod RT	non-prod RT	shared RT	egress RT	hybrid RT
Prod spoke (assoc: prod)	self	—	✓	✓	✓ (if cleared)
Non-prod spoke (assoc: non-prod)	—	self	✓	✓	—
Shared-svc VPC (assoc: shared)	✓	✓	self	✓	✓ (if cleared)
Egress VPC (assoc: egress)	static 0/0	static 0/0	static 0/0	self	—
Hybrid (DX/VPN) (assoc: hybrid)	✓ (if cleared)	—	✓	—	self

The mental model that keeps this straight, stated as a decision table you can apply to any new requirement:

If you want…	Then…	Concretely
A to use a domain’s routes for its decisions	Associate A to that table	Prod spoke associates to prod RT
B to be reachable from A’s domain	Propagate B into A’s table	Propagate shared into prod RT
A↔B mutual reachability	Propagate each into the other’s table	Prod↔shared both directions
A and B fully isolated	Propagate neither into the other’s table	Prod & non-prod: no mutual propagation
Everyone to reach the internet	Static 0/0 → egress attachment in each domain	Per-domain default route
Egress to return traffic to a spoke	Propagate that spoke into the egress table	All spokes → egress RT
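The decision table can be exercised as a small Python model. This is a deliberately simplified sketch: it tracks attachments rather than CIDRs, ignores static routes such as the 0/0 to egress, and all class and attachment names are made up for illustration.

```python
from collections import defaultdict


class TransitGatewayModel:
    """Toy model: association picks the table an attachment consults,
    propagation decides which tables hold a route to that attachment."""

    def __init__(self):
        self.association = {}            # attachment -> route table it consults
        self.routes = defaultdict(set)   # route table -> attachments it has a route to

    def associate(self, attachment, table):
        self.association[attachment] = table

    def propagate(self, attachment, table):
        self.routes[table].add(attachment)

    def can_reach(self, src, dst):
        """Two-way reachability: each side needs a route to the other in its own table."""
        forward = dst in self.routes[self.association[src]]
        back = src in self.routes[self.association[dst]]
        return forward and back


tgw = TransitGatewayModel()
tgw.associate("prod-app1", "prod")
tgw.associate("nonprod-app1", "non-prod")
tgw.associate("shared-svc", "shared")

for table in ("prod", "shared"):
    tgw.propagate("prod-app1", table)
for table in ("non-prod", "shared"):
    tgw.propagate("nonprod-app1", table)
for table in ("prod", "non-prod", "shared"):
    tgw.propagate("shared-svc", table)

print(tgw.can_reach("prod-app1", "shared-svc"))    # True
print(tgw.can_reach("prod-app1", "nonprod-app1"))  # False
```

Nothing in the model denies prod-to-non-prod traffic; the route is simply never installed, which is the isolation property the matrix encodes.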

In Terraform, a prod spoke associates to the prod table and propagates into shared so shared services can reach it:

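resource "aws_ec2_transit_gateway_route_table" "prod" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  tags = { Name = "rt-prod" }
}

# The shared-services table that the propagation below writes into
resource "aws_ec2_transit_gateway_route_table" "shared" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  tags = { Name = "rt-shared" }
}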

# A prod spoke associates to the prod table...
resource "aws_ec2_transit_gateway_route_table_association" "prod_app1" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.spoke.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.prod.id
}

# ...and propagates its CIDR INTO the shared-services table (so shared svc can reach it)
resource "aws_ec2_transit_gateway_route_table_propagation" "prod_into_shared" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.spoke.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.shared.id
}

The default route to the egress VPC is a static route in each spoke domain pointing at the egress attachment:

resource "aws_ec2_transit_gateway_route" "prod_default" {
  destination_cidr_block         = "0.0.0.0/0"
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.egress.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.prod.id
}

Static vs propagated routes behave differently when they collide; know which wins:

Route property	Static route	Propagated route
Source	You declare it	Learned from an attachment (BGP/auto)
Priority on exact-match tie	Wins over propagated	Loses to a static of same prefix
Used for	Default 0/0 → egress; blackholes	Reachability of real VPC CIDRs
Survives attachment delete	Yes (manual cleanup)	No (withdrawn)
Longest-prefix match	Applies first, across both	Applies first, across both
Blackhole option	Yes (explicit drop)	No
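The selection order in that table can be sketched in a few lines of Python. This is an illustrative model only: the Route type and the attachment names are assumptions, and it does not model the preference order among different kinds of propagated routes.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    cidr: str
    target: str            # attachment name, or "blackhole"
    static: bool = False


def select_route(table, destination):
    """Pick the route a longest-prefix-match router would use for one destination IP."""
    dst = ipaddress.ip_address(destination)
    matches = [r for r in table if dst in ipaddress.ip_network(r.cidr)]
    if not matches:
        return None
    # Longest prefix wins; on an identical prefix, static beats propagated.
    return max(matches, key=lambda r: (ipaddress.ip_network(r.cidr).prefixlen, r.static))


prod_table = [
    Route("0.0.0.0/0", "egress", static=True),
    Route("10.48.0.0/12", "shared-svc"),
    Route("10.48.4.0/22", "shared-dns"),
    Route("10.48.4.0/22", "blackhole", static=True),
]

print(select_route(prod_table, "10.48.4.10").target)   # blackhole
print(select_route(prod_table, "10.48.200.1").target)  # shared-svc
print(select_route(prod_table, "8.8.8.8").target)      # egress
```

Note that the static 0.0.0.0/0 never beats a more specific propagated route; the static preference only decides ties on the same prefix.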

Isolation is the absence of a route. Because you never propagate prod into the non-prod table (and vice versa), those two domains have no route to each other even though they share one TGW. You do not need a firewall rule to keep them apart — you need the route to simply not exist. This is cheaper (no per-GB inspection) and harder to misconfigure than a deny rule.

Common segmentation models, including the environment split used in this guide, with the trade-off each carries:

Segmentation model	Route tables	Isolation strength	Operational cost
Flat (one table)	1	None — any-to-any	Lowest; unsafe past a few VPCs
Env split (this guide)	prod / non-prod / shared / egress	Strong env boundary	Moderate; the sweet spot
Per-BU domains	One per business unit	BU-level blast radius	Higher; many tables
Per-tier (web/app/data)	Tier tables	East-west micro-isolation	High; verbose, use SGs instead
Inspection-forced	All east-west via firewall RT	Maximum (everything inspected)	Highest $ (per-GB on all flows)

Step 4 — Centralized egress through a shared NAT + Network Firewall VPC

Per-VPC NAT gateways are a cost and governance sprawl: every spoke pays for NAT, and you have no single place to inspect or log egress. Consolidate into one egress VPC in the network account that owns the NAT gateways and an AWS Network Firewall for inspection. Spoke default routes already point at this VPC’s TGW attachment (Step 3).

The traffic path matters. Inside the egress VPC, force the flow TGW → firewall endpoint → NAT gateway → internet. That means three subnet tiers per AZ and route tables that hand off between them:

ingress from TGW
   |
   v  (TGW subnet route table: 0.0.0.0/0 -> firewall endpoint)
[firewall subnet: AWS Network Firewall endpoint]
   |
   v  (firewall subnet route table: 0.0.0.0/0 -> NAT gateway)
[public subnet: NAT gateway + IGW]
   |
   v
internet

The exact route tables that build that hand-off — get one wrong and traffic bypasses the firewall or blackholes:

Subnet tier (per AZ)	Route added	Next hop	Purpose
TGW attachment subnet	`0.0.0.0/0`	Firewall endpoint (this AZ)	Force inbound-from-TGW through inspection
TGW attachment subnet	spoke summaries	none needed	Return traffic re-enters the TGW from the firewall subnet, so this table only needs the default route
Firewall subnet	`0.0.0.0/0`	NAT gateway (this AZ)	Inspected traffic to NAT
Firewall subnet	spoke summaries (e.g. 10.16.0.0/12)	TGW attachment	Return traffic back to spokes
Public subnet	`0.0.0.0/0`	Internet gateway	NAT to the internet
Public subnet	spoke summaries	Firewall endpoint (this AZ)	Symmetric return through inspection
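To see how the three tiers hand off, it helps to trace a packet through the tables. The Python sketch below models one AZ with one spoke summary; the tier names stand in for the firewall endpoint, NAT gateway, and so on, and all identifiers are illustrative assumptions rather than AWS resource names.

```python
import ipaddress

# One AZ of the egress VPC: each tier's route table as (destination, next hop).
ROUTE_TABLES = {
    "tgw-subnet": [("0.0.0.0/0", "firewall-subnet")],
    "firewall-subnet": [("0.0.0.0/0", "public-subnet"), ("10.16.0.0/12", "tgw")],
    "public-subnet": [("0.0.0.0/0", "igw"), ("10.16.0.0/12", "firewall-subnet")],
}


def next_hop(tier, destination):
    """Longest-prefix match inside one subnet tier's route table."""
    dst = ipaddress.ip_address(destination)
    candidates = [(ipaddress.ip_network(c), hop) for c, hop in ROUTE_TABLES[tier]]
    candidates = [(net, hop) for net, hop in candidates if dst in net]
    return max(candidates, key=lambda c: c[0].prefixlen)[1]


def trace(start_tier, destination, max_hops=8):
    """Follow next hops until the packet leaves the modelled tiers."""
    path, tier = [start_tier], start_tier
    while tier in ROUTE_TABLES and len(path) <= max_hops:
        tier = next_hop(tier, destination)
        path.append(tier)
    return path


print(trace("tgw-subnet", "8.8.8.8"))        # outbound from a spoke
print(trace("public-subnet", "10.16.5.10"))  # return traffic after NAT
```

The outbound trace passes through the firewall tier before reaching the internet gateway, and the return trace passes through it again on the way back to the TGW. Delete the spoke summary from the public subnet's table and the return trace heads to the internet gateway instead, which is the missing-return-route failure.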

resource "aws_networkfirewall_firewall" "egress" {
  name                = "fw-central-egress"
  firewall_policy_arn = aws_networkfirewall_firewall_policy.egress.arn
  vpc_id              = aws_vpc.egress.id

  dynamic "subnet_mapping" {
    for_each = aws_subnet.firewall
    content { subnet_id = subnet_mapping.value.id }
  }
}

Critically, set the firewall policy to drop unmatched traffic and add explicit allow rules — a default-allow inspection layer inspects nothing useful:

resource "aws_networkfirewall_firewall_policy" "egress" {
  name = "policy-central-egress"
  firewall_policy {
    stateless_default_actions          = ["aws:forward_to_sfe"]
    stateless_fragment_default_actions = ["aws:forward_to_sfe"]
    stateful_engine_options { rule_order = "STRICT_ORDER" }
    stateful_default_actions = ["aws:drop_established", "aws:alert_established"]

    stateful_rule_group_reference {
      resource_arn = aws_networkfirewall_rule_group.allowlist.arn
      priority     = 100
    }
  }
}
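
The policy above references a rule group called allowlist that this section never defines. As a hedged sketch of what it might hold, the function below prints the JSON body for a stateful domain-allowlist rule group in the shape that `aws network-firewall create-rule-group --rule-group` accepts. The function name, the `.example.com` target and the 10.0.0.0/8 HOME_NET are illustrative assumptions, not values from this design. HOME_NET is overridden because it otherwise defaults to the firewall VPC's own CIDR, which would not cover spoke sources arriving over the TGW.

```shell
# Print a stateful rule-group body: allow TLS/HTTP only to listed domains.
# The rule group uses STRICT_ORDER so it can be referenced by a strict-order policy.
allowlist_rule_group_json() {
  cat <<'EOF'
{
  "RuleVariables": {
    "IPSets": { "HOME_NET": { "Definition": ["10.0.0.0/8"] } }
  },
  "RulesSource": {
    "RulesSourceList": {
      "Targets": [".example.com"],
      "TargetTypes": ["TLS_SNI", "HTTP_HOST"],
      "GeneratedRulesType": "ALLOWLIST"
    }
  },
  "StatefulRuleOptions": { "RuleOrder": "STRICT_ORDER" }
}
EOF
}

# Possible usage (name and capacity are placeholders):
# aws network-firewall create-rule-group --rule-group-name allowlist \
#   --type STATEFUL --capacity 100 --rule-group "$(allowlist_rule_group_json)"
```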

The Network Firewall policy actions, what each does, and where to use it — the stateless/stateful split trips people up:

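Action	Engine	Meaning	Use for
`aws:pass`	stateless	Allow without stateful inspection, stop evaluating	Known-good flows that need no inspection
`aws:drop`	stateless	Silently discard	Traffic that should never reach the stateful engine
`aws:forward_to_sfe`	stateless	Hand to the stateful engine	Default stateless action
`pass` / `drop` / `reject`	stateful rule	Allow, silently discard, or discard and send a TCP reset	Allowlist entries and explicit denies
`alert`	stateful rule	Log the match; evaluation continues	Triage / detection-only
`aws:drop_established`	stateful default	Drop packets of established connections that no rule passed	The secure default
`aws:alert_established`	stateful default	Log those same unmatched established-connection packets	Visibility on drops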

Rule-order matters; the two modes evaluate very differently:

Rule order	Evaluation	Default action semantics	When to use
`DEFAULT_ACTION_ORDER`	Pass rules, then drop, then reject, then alert (grouped by action)	Implicit ordering	Simple allowlists
`STRICT_ORDER`	Strict numeric priority, top-down	You set the default explicitly	Production; predictable, auditable

The return path is the part people miss: the TGW route table for the egress VPC must carry routes back to every spoke CIDR (propagate all spokes into the egress domain), and the firewall subnet route table needs each spoke summary pointing back at the TGW. Because firewall endpoints are AZ-local, keep traffic symmetric — route an AZ’s flow through that same AZ’s firewall endpoint so the stateful engine sees both directions.

A side-by-side of centralized vs per-spoke egress so the trade-off is explicit:

Aspect	Per-spoke NAT (no hub)	Centralized egress (this design)
NAT gateways	One+ per spoke VPC	A few (per AZ in egress VPC)
Inspection	None (or N firewalls)	One Network Firewall
Logging	Scattered	Central (S3 / CW)
Cost shape	NAT × N spokes	NAT × AZ + TGW/FW per-GB
Egress IP allow-listing	N sets of EIPs	One small set of EIPs
Governance (SCP block IGW)	Hard (each VPC needs IGW)	Easy (only egress VPC has IGW)
Failure blast radius	Per-VPC	Shared egress (design for AZ HA)

Network Firewall is billed per endpoint-hour plus per-GB processed. Centralizing means you pay for the endpoints once instead of per spoke, but the per-GB cost is real. That is why east-west prod/non-prod traffic is dropped at the TGW (free, via missing routes) rather than hairpinned through the firewall, and why bulk AWS-service traffic (S3, DynamoDB) gets gateway endpoints in the spoke (see the Real-world scenario and Cost & sizing below).

Step 5 — Centralized DNS with Route 53 Resolver

Spokes need to resolve private hosted zones, on-prem names, and AWS service endpoints consistently. Run Route 53 Resolver endpoints in the shared-services (or egress) VPC and point every spoke at them, rather than standing up resolver infrastructure in every account.

An inbound resolver endpoint lets on-prem DNS forward AWS private names into Route 53.
An outbound resolver endpoint plus forwarding rules sends queries for on-prem domains (e.g. corp.internal) to the on-prem resolvers.

resource "aws_route53_resolver_endpoint" "outbound" {
  name      = "rslv-outbound"
  direction = "OUTBOUND"
  security_group_ids = [aws_security_group.resolver.id]
  dynamic "ip_address" {
    for_each = aws_subnet.resolver
    content { subnet_id = ip_address.value.id }
  }
}

resource "aws_route53_resolver_rule" "onprem" {
  name                 = "fwd-corp-internal"
  domain_name          = "corp.internal"
  rule_type            = "FORWARD"
  resolver_endpoint_id = aws_route53_resolver_endpoint.outbound.id
  target_ip { ip = "10.200.0.10" }
  target_ip { ip = "10.200.0.11" }
}

Share the rule across accounts with RAM, then associate it in each spoke VPC so the spoke honors it. The resolver building blocks and what each is for:

Component	Direction	Resolves	Shared via	Notes
Inbound endpoint	On-prem → AWS	PHZ / AWS private names	n/a (in hub VPC)	2+ ENIs across 2 AZs
Outbound endpoint	AWS → on-prem	Forwarded domains	n/a (in hub VPC)	SG must allow TCP+UDP 53
FORWARD rule	AWS → target IPs	e.g. `corp.internal`	RAM (share + associate)	Target 2 on-prem IPs in 2 sites
SYSTEM rule	—	Override a FORWARD for a subdomain	RAM	Carve out exceptions
Private hosted zone	In-VPC	e.g. `aws.example.com`	VPC association	Associate to each spoke (or automate)
`.2` resolver	In-VPC	Everything (default)	implicit	Rules ride underneath it
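
To make the "share + associate" step concrete, here is a minimal sketch of the association half, run in a spoke account once the RAM share is visible there. The function name and the association-name prefix are assumptions; `associate-resolver-rule` itself is the Route 53 Resolver CLI call.

```shell
# Associate one shared resolver rule with every VPC ID passed after the rule ID.
associate_rule_with_vpcs() {
  local rule_id=$1; shift
  local vpc
  for vpc in "$@"; do
    aws route53resolver associate-resolver-rule \
      --resolver-rule-id "$rule_id" --vpc-id "$vpc" --name "fwd-corp-internal-$vpc"
  done
}

# Example with made-up IDs:
# associate_rule_with_vpcs rslv-rr-0123456789abcdef vpc-0aaa vpc-0bbb
```

Until a rule is associated with a VPC, that VPC keeps resolving the domain through the default path, which is why a shared but unassociated rule appears to do nothing.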

The forwarding-rule types, since picking the wrong one silently breaks resolution:

Rule type	Behaviour	Use when
`FORWARD`	Send matching queries to target IPs	On-prem or third-party DNS
`SYSTEM`	Use Route 53 Resolver, ignore a broader FORWARD	Exempt a subdomain (e.g. an AWS PHZ inside `corp.internal`)
`RECURSIVE` (default behaviour)	Standard Route 53 resolution	No rule needed

For private hosted zones, associate the zone with each spoke VPC — or, at scale, share it and automate association. Spokes keep using the VPC .2 resolver; the rules ride underneath. The security-group rules the resolver endpoints need (a frequent failure point — UDP works, TCP for large answers does not):

Endpoint	Direction	Protocol/port	Source/Dest	Why
Outbound	Egress	UDP 53	On-prem resolver IPs	Standard DNS
Outbound	Egress	TCP 53	On-prem resolver IPs	Answers > 512 bytes / DNSSEC
Inbound	Ingress	UDP 53	On-prem CIDR	On-prem queries in
Inbound	Ingress	TCP 53	On-prem CIDR	Large answers / zone-ish queries

Step 6 — Hybrid connectivity into the hub

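Terminate Direct Connect or Site-to-Site VPN on the TGW, not on individual VPCs; that is the whole point of the hub. For VPN, create the connection directly against the TGW, which creates a VPN attachment. For Direct Connect, associate a Transit VIF with a Direct Connect Gateway, then associate that DXGW with the TGW. In Terraform the association is an aws_dx_gateway_association (aws_ec2_transit_gateway_dx_gateway_attachment is a data source for looking up the resulting attachment, not a resource):

resource "aws_dx_gateway_association" "dx" {
  dx_gateway_id         = aws_dx_gateway.main.id
  associated_gateway_id = aws_ec2_transit_gateway.hub.id

  # Summaries AWS advertises to on-prem; example values, substitute your Step 1 super-blocks
  allowed_prefixes = ["10.16.0.0/12", "10.32.0.0/12"]
}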
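
For the VPN side, a minimal CLI sketch follows, assuming a customer gateway already exists. The function names and the IDs in the usage lines are placeholders; the second helper finds the TGW attachment that the VPN connection creates, which is what you associate with the hybrid route table.

```shell
# Create a Site-to-Site VPN that terminates on the TGW and print its connection ID.
# Args: customer gateway ID, transit gateway ID.
create_tgw_vpn() {
  local cgw_id=$1 tgw_id=$2
  aws ec2 create-vpn-connection --type ipsec.1 \
    --customer-gateway-id "$cgw_id" --transit-gateway-id "$tgw_id" \
    --query 'VpnConnection.VpnConnectionId' --output text
}

# Print the TGW attachment ID that belongs to a VPN connection ID.
vpn_attachment_id() {
  aws ec2 describe-transit-gateway-attachments \
    --filters "Name=resource-id,Values=$1" \
    --query 'TransitGatewayAttachments[0].TransitGatewayAttachmentId' --output text
}

# Example with made-up IDs:
# VPN_ID=$(create_tgw_vpn cgw-0aaa tgw-0bbb)
# ATT_VPN=$(vpn_attachment_id "$VPN_ID")
```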

Put the hybrid attachment in its own TGW route table. This lets you control exactly which environments on-prem can reach: propagate an environment's attachments into the hybrid table only if that environment is cleared to talk to the data center, and propagate the hybrid attachment only into the tables of those same environments. Advertise summarized routes (your reserved super-blocks from Step 1) over BGP rather than hundreds of /20s: the DXGW has an allowed-prefixes limit, and summarization keeps you well under it.

DX vs VPN onto the TGW, on the dimensions that decide which (or both) you use:

Dimension	Direct Connect (via DXGW)	Site-to-Site VPN
Transport	Private fiber	IPsec over internet
Bandwidth	1/10/100 Gbps ports	~1.25 Gbps per tunnel (ECMP to scale)
Latency / jitter	Low, consistent	Variable (internet)
Provisioning time	Weeks (cross-connect)	Minutes
Encryption	Not by default (add MACsec / IPsec)	Built-in IPsec
Cost shape	Port-hours + data	Tunnel-hours + data
Resilience pattern	2 DX at 2 locations	2 tunnels per connection; VPN as DX backup
Routing	BGP via DXGW → TGW	BGP or static; ECMP with `vpn_ecmp_support`

BGP prefix discipline on the hybrid edge — summarize or you will hit the limit and starve the table:

Knob	Why it matters	Good practice
DXGW allowed prefixes	Hard cap on what AWS advertises to on-prem	Advertise summarized /12s, not /20s
On-prem advertised routes	Counts against TGW route limits	Summarize the data-center space (one /13)
BGP communities / AS-path	Influence path selection, DX-vs-VPN failover	Tag DX-preferred; longer AS-path on VPN backup
Blackhole on withdrawal	Avoid stale routes	Let propagation withdraw on link down

Architecture at a glance

Trace a single packet to internalize the whole design. A workload in a prod spoke VPC wants to reach a SaaS API on the internet. Its subnet route table has 0.0.0.0/0 pointing at the TGW attachment (an ENI sitting in a tiny /28 per-AZ subnet, with a CIDR that IPAM handed out from the prod /12). The packet enters the TGW and lands in the prod route-table domain the attachment is associated with. That domain has a static 0.0.0.0/0 to the egress VPC attachment, but crucially it has no route to the non-prod /12, because non-prod was never propagated here. That missing route is the isolation: prod simply cannot address non-prod.

The default route forwards the packet to the egress VPC in the network account, where the TGW-attachment subnet’s route table sends 0.0.0.0/0 to that AZ’s Network Firewall endpoint. The firewall, running a STRICT_ORDER policy that defaults to drop, checks the flow against the allowlist; if permitted, the firewall subnet’s route table forwards to that same AZ’s NAT gateway, which translates to its Elastic IP and exits via the internet gateway. Return traffic retraces the path through the same AZ’s firewall endpoint. AZ symmetry is mandatory, or the stateful engine drops the return.

Off to the side, the same hub carries hybrid and DNS: a Direct Connect Gateway attaches to the TGW in its own route table (so you choose exactly which environments on-prem can reach), and Route 53 Resolver endpoints plus RAM-shared FORWARD rules give every spoke consistent resolution of on-prem and private-zone names. Everything is observed: TGW Flow Logs, VPC Flow Logs, and firewall logs land in S3 for Athena. The numbered badges mark the five places this architecture most often fails — an overlapping CIDR or a missing AZ ENI (1), a wrong association/propagation that breaks isolation (2), a default-allow or asymmetric firewall (3), a missing egress return route or an avoidable NAT/firewall bill (4), and an over-advertised hybrid prefix or absent flow logs (5). The legend narrates each as symptom, confirm, and fix.

Real-world scenario

A retail platform team — call them NorthWind Retail — had the textbook hub from this guide running across ~30 accounts: prod and non-prod isolated by route-table domains, all egress hairpinned through one Network Firewall VPC, Direct Connect terminated on the TGW for store-back-office connectivity, and Route 53 Resolver giving every account consistent DNS. It worked beautifully — until the AWS bill for the network account tripled in a single month and the FinOps lead escalated.

The investigation, driven by TGW Flow Logs queried in Athena, found the culprit fast: every spoke was reaching S3 over the public path — TGW data-processing, plus Network Firewall per-GB, plus NAT data-processing — for what was internal bulk data. A nightly analytics job alone pushed terabytes through the central firewall, and the per-GB charges on both the TGW and the firewall dwarfed the compute. The team had centralized egress for governance and accidentally routed bulk storage traffic through the most expensive path in the account.

The numbers told the story precisely:

Egress path for S3 traffic	TGW data-proc	Firewall per-GB	NAT data-proc	Net per-GB cost	Inspected?
Through the hub (before)	Yes	Yes	Yes	Highest	Yes (but pointless for S3)
Gateway VPC endpoint (after)	No	No	No	~Free	No (not needed; IAM-scoped)
Interface endpoint (PrivateLink)	No	No	No	Endpoint-hour + per-GB	No

The fix was to keep S3 and DynamoDB traffic off the hub entirely with gateway VPC endpoints in each spoke. A gateway endpoint is free, adds a prefix-list route in the spoke’s own route table, and never touches the TGW or the firewall:

resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.spoke.id
  service_name      = "com.amazonaws.eu-west-1.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}

The gotcha: the gateway-endpoint prefix-list route is longest-prefix-match against the 0.0.0.0/0 that pointed at the TGW, so it wins automatically for S3 — but only inside the VPC that owns the endpoint. They templated it into the spoke module so every account got one by default, then used an SCP Deny on s3:* unless aws:sourceVpce matched an approved endpoint, closing the public path for good. The result: firewall-processed GB dropped by roughly 70%, the network account bill fell back below its prior baseline, and the central inspection layer went back to inspecting traffic that actually leaves the estate. The lesson the team wrote into their design guide: centralize egress for what needs inspecting; never hairpin bulk AWS-service traffic that has a free private path.

Advantages and disadvantages

Advantages	Disadvantages
Transitive any-to-any reachability under policy	Regional resource; multi-region needs peering + planning
One router replaces an O(N²) peering mesh	Per-GB data-processing charge on all TGW traffic
Strong env isolation via route-table domains	Association/propagation model is easy to get backwards
One inspected, logged, allow-listed egress point	Egress VPC is a shared dependency — must design AZ-HA
Single hybrid (DX/VPN) termination for all accounts	Firewall per-GB cost if you hairpin bulk traffic
RAM org-share removes per-account toil	CIDR overlaps are unfixable without renumbering
Clean ownership split (network acct vs spoke)	Cross-AZ traffic incurs data charges if not localized
Reachability Analyzer proves the data plane	More moving parts than peering; steeper learning curve

The advantages dominate past roughly a dozen VPCs or any multi-account compliance requirement; below that, peering is simpler and cheaper. The disadvantages are mostly disciplines rather than blockers: the per-GB charge is controlled by keeping east-west off the firewall and bulk AWS-service traffic on gateway endpoints; the association/propagation confusion is solved by a written matrix and Reachability Analyzer checks in CI; the shared-egress blast radius is solved by per-AZ firewall endpoints and NAT. The one genuinely permanent risk is CIDR overlap — which is exactly why Step 1 comes first.

Hands-on lab

A minimal, single-account proof you can run to see segmentation work without forty accounts. You build a TGW, two spoke VPCs, two route-table domains, and prove that one spoke can reach a shared VPC while the two spokes cannot reach each other. Free-tier friendly except for TGW attachment-hours and a couple of t3.micro instances; tear down at the end.

Set variables and create the TGW (default association/propagation disabled):

REGION=eu-west-1
TGW=$(aws ec2 create-transit-gateway --region $REGION \
  --options DefaultRouteTableAssociation=disable,DefaultRouteTablePropagation=disable \
  --query 'TransitGateway.TransitGatewayId' --output text)
echo "TGW=$TGW"

Create three VPCs — spoke-a (10.16.0.0/20), spoke-b (10.32.0.0/20), shared (10.48.0.0/20) — each with one subnet and a /28 TGW-attachment subnet. (Use the console or a short Terraform module; the key is disjoint CIDRs.)
Attach each VPC to the TGW:

ATT_A=$(aws ec2 create-transit-gateway-vpc-attachment --transit-gateway-id $TGW \
  --vpc-id $VPC_A --subnet-ids $TGW_SUBNET_A \
  --query 'TransitGatewayVpcAttachment.TransitGatewayAttachmentId' --output text)
# repeat for ATT_B, ATT_SHARED

Create two route-table domains and associate the spokes:

RT_SPOKE=$(aws ec2 create-transit-gateway-route-table --transit-gateway-id $TGW \
  --query 'TransitGatewayRouteTable.TransitGatewayRouteTableId' --output text)
RT_SHARED=$(aws ec2 create-transit-gateway-route-table --transit-gateway-id $TGW \
  --query 'TransitGatewayRouteTable.TransitGatewayRouteTableId' --output text)

aws ec2 associate-transit-gateway-route-table --transit-gateway-route-table-id $RT_SPOKE --transit-gateway-attachment-id $ATT_A
aws ec2 associate-transit-gateway-route-table --transit-gateway-route-table-id $RT_SPOKE --transit-gateway-attachment-id $ATT_B
aws ec2 associate-transit-gateway-route-table --transit-gateway-route-table-id $RT_SHARED --transit-gateway-attachment-id $ATT_SHARED

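Propagate so spokes↔shared work but the spokes stay isolated from each other: propagate shared into the spoke table, and each spoke into the shared table. Neither spoke is propagated into the spoke table, so that table holds no route between them and they are already isolated. The static blackhole below makes the isolation explicit and keeps it in place if someone later propagates spoke-b into the table by mistake, because a static route beats a propagated route for the same prefix (separate tables per spoke are the alternative):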

aws ec2 enable-transit-gateway-route-table-propagation --transit-gateway-route-table-id $RT_SPOKE  --transit-gateway-attachment-id $ATT_SHARED
aws ec2 enable-transit-gateway-route-table-propagation --transit-gateway-route-table-id $RT_SHARED --transit-gateway-attachment-id $ATT_A
aws ec2 enable-transit-gateway-route-table-propagation --transit-gateway-route-table-id $RT_SHARED --transit-gateway-attachment-id $ATT_B
# Blackhole spoke-b's CIDR in the spoke table so spoke-a -> spoke-b is an explicit drop
aws ec2 create-transit-gateway-route --transit-gateway-route-table-id $RT_SPOKE \
  --destination-cidr-block 10.32.0.0/20 --blackhole

Add VPC route-table entries in each VPC sending the other VPCs’ CIDRs to the TGW, and launch a t3.micro in each of the three VPCs with a security group that allows ICMP from the other lab CIDRs.
Prove it. From the spoke-a instance, ping the shared instance (should work) and the spoke-b instance (should fail, because the route is a blackhole). Confirm with a route search:

# Should return a blackhole entry for spoke-b from the spoke table
aws ec2 search-transit-gateway-routes --transit-gateway-route-table-id $RT_SPOKE \
  --filters Name=type,Values=static Name=state,Values=blackhole

Tear down to stop attachment-hour charges:

aws ec2 delete-transit-gateway-vpc-attachment --transit-gateway-attachment-id $ATT_A
# delete ATT_B, ATT_SHARED, the route tables, the TGW, then the VPCs and instances

Expected result: spoke-a → shared succeeds; spoke-a → spoke-b fails because the route is a blackhole — segmentation demonstrated with the absence (or explicit drop) of a route, exactly as production isolation works.

Common mistakes & troubleshooting

The hub fails in a small number of characteristic ways. This is the playbook — match the symptom, run the confirm command, apply the fix. Keep it open during an incident.

#	Symptom	Root cause	Confirm (exact command / path)	Fix
1	A whole spoke is unreachable	Attachment has no ENI in that AZ	`aws ec2 describe-transit-gateway-vpc-attachments` → check `SubnetIds` per AZ	Add a `/28` attachment subnet in every workload AZ
2	Traffic to one AZ blackholes, others fine	Missing attachment subnet in that AZ	VPC route table points at TGW but no ENI there	Attach in the missing AZ
3	Prod can reach non-prod (security finding)	Prod or non-prod wrongly propagated into the other table	`aws ec2 search-transit-gateway-routes --transit-gateway-route-table-id <prod> --filters Name=route-search.subnet-of-match,Values=10.32.0.0/12` returns a route	Remove the propagation; verify with Reachability Analyzer
4	Prod can’t reach shared services	Shared not propagated into prod (or prod not into shared)	Route search for the shared CIDR in the prod table returns nothing	Enable propagation both directions
5	Spoke has no internet	No static `0.0.0.0/0` → egress attachment in the spoke’s domain	Search the spoke table for a default route	Add static `0.0.0.0/0` → egress attachment
6	Egress works outbound, replies never return	Spokes not propagated into the egress route table	Egress table route search for the spoke CIDR is empty	Propagate every spoke into the egress domain
7	Intermittent drops under load through egress	Asymmetric flow — return via a different AZ’s firewall endpoint	Firewall flow logs show one-sided flows; compare AZ of in/out routes	Make per-AZ route tables symmetric; consider appliance mode
8	Egress traffic not being inspected	Firewall policy defaults to allow, or route bypasses the FW endpoint	Inspect `stateful_default_actions`; check the TGW-attachment-subnet 0/0 next hop	Set `aws:drop_established`; route the TGW-subnet 0/0 via the FW endpoint
9	New VPC can’t attach to the TGW	RAM share not reaching the account, or not accepted	`aws ram get-resource-shares --resource-owner SELF` / `get-resource-share-associations`	Share to the org/OU; run `aws ram enable-sharing-with-aws-organization`
10	Overlapping CIDR, route won’t install	Two attachments advertise the same prefix	`search-transit-gateway-routes` shows the prefix from another attachment	Renumber one VPC (new IPAM CIDR + migrate) or use PrivateLink
11	On-prem reaches an env it shouldn’t	Over-propagation into the hybrid route table	Inspect the hybrid table’s propagated routes	Restrict propagation; own route table per hybrid attach
12	DX advertises but on-prem missing routes	DXGW allowed-prefixes limit hit, or not summarized	`aws directconnect describe-direct-connect-gateway-associations` → check `allowedPrefixesToDirectConnectGateway`	Advertise summarized /12s, raise/trim allowed prefixes
13	DNS for on-prem names fails intermittently	Resolver SG missing TCP/53, or on-prem DNS down	`describe-security-group-rules` on the endpoint SG; `dig +tcp`	Allow TCP 53 alongside UDP 53; target 2 on-prem IPs
14	Network bill spiked	Bulk S3/DDB hairpinning through TGW+FW+NAT	TGW Flow Logs in Athena by `bytes`/destination	Add gateway VPC endpoints in spokes; SCP-enforce
15	“It should work but doesn’t”	Console route present, data plane blocked (NACL/SG/AZ)	Reachability Analyzer path source→dest	Fix the actual blocking hop the analyzer names

The single most important confirm command, because a green console can still mean a blocked path:

# Authoritative data-plane proof: is this path open across the whole TGW?
PATH_ID=$(aws ec2 create-network-insights-path \
  --source $SRC_ENI --destination $DST_ENI --protocol tcp --destination-port 443 \
  --query 'NetworkInsightsPath.NetworkInsightsPathId' --output text)
aws ec2 start-network-insights-analysis --network-insights-path-id $PATH_ID
# The analysis is asynchronous: re-run this until status is no longer "running"
aws ec2 describe-network-insights-analyses --network-insights-path-id $PATH_ID \
  --query 'NetworkInsightsAnalyses[0].{status:Status, reachable:NetworkPathFound, blocker:Explanations[0].ExplanationCode}'
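
The same tool can assert the negative case, for example that a prod to non-prod path stays closed, as a CI gate. The function below is a sketch under assumptions: its name and the POLL_SECONDS variable are hypothetical, and it expects the ID of an existing network insights path. It exits 0 only when the analysis completes and finds no path.

```shell
# Succeed only when Reachability Analyzer finds NO path for the given path ID.
# Returns 1 if a path exists, 2 if the analysis could not be completed.
assert_not_reachable() {
  local path_id=$1 analysis_id status found
  analysis_id=$(aws ec2 start-network-insights-analysis \
    --network-insights-path-id "$path_id" \
    --query 'NetworkInsightsAnalysis.NetworkInsightsAnalysisId' --output text) || return 2
  while :; do
    status=$(aws ec2 describe-network-insights-analyses \
      --network-insights-analysis-ids "$analysis_id" \
      --query 'NetworkInsightsAnalyses[0].Status' --output text) || return 2
    [ "$status" != "running" ] && break
    sleep "${POLL_SECONDS:-5}"
  done
  [ "$status" = "succeeded" ] || return 2   # the analysis itself failed
  found=$(aws ec2 describe-network-insights-analyses \
    --network-insights-analysis-ids "$analysis_id" \
    --query 'NetworkInsightsAnalyses[0].NetworkPathFound' --output text) || return 2
  # --output text renders the boolean as True / False
  [ "$found" = "False" ]
}

# Example with a made-up path ID:
# assert_not_reachable nip-0123456789abcdef || echo "isolation broken or analysis failed"
```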

A quick decision table for the “can’t reach X” class of tickets:

If a route lookup returns…	It’s probably…	Do this
Nothing for the destination CIDR	Missing propagation	Propagate the target into this table
A `blackhole` route	Intentional isolation or a stale static	Confirm intent; remove if it’s a bug
A route, but ping still fails	NACL/SG/AZ-ENI/firewall block	Run Reachability Analyzer; fix the named hop
The CIDR from two attachments	Overlap	Renumber one; you cannot route both
The default 0/0 but no internet	Egress return path missing	Propagate spokes into the egress table
A route present but in `blackhole` state after a delete	Stale static route	Delete the orphaned static; re-add if needed
The right route but DNS fails	Resolver SG/rule issue, not routing	Check resolver SG (TCP/UDP 53) and FORWARD rule
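
The overlap row is the one worth checking before a VPC is ever attached. As a small illustration (plain shell arithmetic, IPv4 only, helper names are my own), two CIDRs overlap when their network bits match under the shorter of the two prefix lengths:

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Exit 0 if the two CIDRs overlap, 1 if they are disjoint.
cidrs_overlap() {
  local a_len=${1#*/} b_len=${2#*/} len mask
  len=$(( a_len < b_len ? a_len : b_len ))
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "${1%/*}") & mask )) -eq $(( $(ip_to_int "${2%/*}") & mask )) ]
}

# cidrs_overlap 10.16.0.0/20 10.16.0.0/16 && echo "overlap: renumber before attaching"
```

IPAM performs this check for you across the organization; the function only shows the comparison being made.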

Best practices

Allocate every CIDR from IPAM. No manual blocks anywhere; enable org-wide resource discovery so IPAM can see overlaps across accounts. Reserve disjoint super-blocks per environment and a separate block for on-prem.
Create the TGW with default association and propagation disabled. Segmentation is impossible if everything auto-associates to one table. Make every association and propagation an explicit, reviewed decision.
Keep prod and non-prod route tables with no mutual propagation. Isolation is the absence of a route. Prove it in CI with a Reachability Analyzer check that asserts prod→non-prod is not reachable.
Give the TGW its own tiny attachment subnets (a /28 per AZ) separate from workload subnets, and attach in every AZ you run workloads in. A missing AZ ENI blackholes that AZ.
Default the firewall policy to drop with STRICT_ORDER and an explicit allowlist. A default-allow inspection layer inspects nothing useful. Keep flows AZ-symmetric so the stateful engine sees both directions.
Never hairpin bulk AWS-service traffic through the firewall. Add gateway VPC endpoints for S3 and DynamoDB in every spoke (template it into the spoke module), and SCP-enforce the private path.
Drop east-west at the TGW, not the firewall. Isolation via missing routes is free; isolation via firewall rules is billed per-GB. Reserve the firewall for traffic that genuinely leaves the estate.
Terminate DX/VPN on the TGW in a dedicated hybrid route table. Control precisely which environments on-prem can reach, and advertise summarized prefixes (your /12s) over BGP, not hundreds of /20s.
Enable TGW Flow Logs and Network Firewall logging from day one. Centralize in S3, query with Athena. They turn a multi-hour “can’t reach the database” into a five-minute lookup.
Guardrail egress with SCPs. Deny creating internet gateways and NAT gateways in spoke accounts so egress cannot bypass the hub. The central hub plus RAM plus SCPs is what makes a forty-team network safe.
Keep cross-AZ traffic minimal. Attach and route within-AZ; cross-AZ data is charged. Symmetric per-AZ routing serves both cost and firewall correctness.
Plan for the regional boundary. One TGW per region, joined with inter-region peering; keep CIDRs disjoint per region so peering routes summarize cleanly.
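
The SCP guardrail from the list above could look roughly like the following. This is a sketch, not a vetted policy: the function name, Sid and policy name are placeholders, and the statement should be attached to the spoke OUs only, never to the network account that owns the egress VPC.

```shell
# Print an SCP that denies creating the resources a spoke would need for its own internet path.
igw_nat_guardrail_scp() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenySpokeInternetPaths",
      "Effect": "Deny",
      "Action": [
        "ec2:CreateInternetGateway",
        "ec2:AttachInternetGateway",
        "ec2:CreateEgressOnlyInternetGateway",
        "ec2:CreateNatGateway"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

# Possible usage (run from the Organizations management account):
# aws organizations create-policy --name deny-spoke-igw-nat --type SERVICE_CONTROL_POLICY \
#   --description "Spokes must egress via the hub" --content "$(igw_nat_guardrail_scp)"
```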

Security notes

The network is a security control surface, and the hub concentrates several of them. Least-privilege applies to routing as much as to IAM: a domain should learn only the routes it needs, and isolation should be the default (no propagation) rather than an exception.

Control	What to do	Why
Route-based isolation	No mutual propagation between trust domains	Compromised non-prod cannot address prod
Egress inspection	Network Firewall default-drop + FQDN allowlist	Stop data exfil and command-and-control
Egress IP allow-listing	One small EIP set from the egress VPC	Partners allow-list a handful of IPs, not forty
IGW/NAT guardrail	SCP deny on IGW/NAT in spokes	Egress cannot bypass the inspected path
DNS exfil detection	Route 53 Resolver query logging + DNS Firewall	DNS tunneling is invisible without logs
Flow visibility	TGW + VPC + FW flow logs to S3	Forensics and anomaly detection
RAM scope	Share to the org/OU, never external principals	Don’t leak the hub outside the org
Hybrid prefix control	DXGW allowed-prefixes + own route table	On-prem reaches only cleared environments
Encryption in transit	IPsec on VPN; MACsec/IPsec over DX	DX is not encrypted by default
Endpoint policies	IAM + `aws:sourceVpce` conditions on S3/DDB	Bind data access to approved endpoints
TGW attachment ownership	Spoke owns its attachment; network acct owns routing	Least-privilege; no cross-account route edits
Resolver query logging scope	Log all VPCs, ship to S3 + alarm on anomalies	Detect DNS tunneling and exfil early

The identity-and-encryption layer pairs with “Organizations SCPs, guardrails & delegated admin” for the guardrails and “AWS Network Firewall: centralized egress inspection” for the inspection rules; the DNS-exfil controls live in “Route 53 Resolver: DNS Firewall, endpoints, rules, hybrid resolution”.

Cost & sizing

The hub’s bill has three movable drivers: TGW attachment-hours, TGW data-processing per-GB, and Network Firewall (endpoint-hours plus per-GB). NAT and cross-AZ data ride alongside. Figures are indicative (eu-west-1, USD; INR at ~₹84/USD) — confirm against the current price list.

Cost driver	Rough unit price	Scales with	Lever to reduce
TGW attachment-hour	~$0.05 / attachment-hour	Number of attachments	Consolidate VPCs; don’t over-attach
TGW data processing	~$0.02 / GB	Traffic crossing the TGW	Keep east-west off; gateway endpoints for S3/DDB
Network Firewall endpoint	~$0.395 / endpoint-hour	Endpoints (per AZ)	Right-size AZ count
Network Firewall data	~$0.065 / GB	Inspected GB	Don’t hairpin bulk traffic
NAT gateway	~$0.045 / hour + ~$0.045 / GB	Egress volume	Gateway endpoints bypass NAT for S3/DDB
Cross-AZ data	~$0.01 / GB each way	Cross-AZ traffic	Attach and route within-AZ
Reachability Analyzer	~$0.10 / analysis	Ad-hoc checks	Negligible; use freely

A worked monthly estimate for a 30-account hub with three egress AZs, modest egress, and bulk S3 moved off the hub:

Line item	Quantity	Monthly USD	Monthly INR (~₹84)
TGW attachments (35 × 730h)	35 attach	~$1,278	~₹1.07 L
TGW data processing	~20 TB	~$400	~₹33.6 K
Network Firewall endpoints (3 AZ)	3 × 730h	~$865	~₹72.7 K
Network Firewall data	~8 TB	~$520	~₹43.7 K
NAT gateways (3 AZ)	3 × 730h + data	~$200	~₹16.8 K
Cross-AZ data	minimized	~$80	~₹6.7 K
Approx total	—	~$3,343/mo	~₹2.81 L/mo
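
To rerun the estimate with your own volumes, the arithmetic behind the table can be sketched as below. It uses the indicative unit prices from the cost-driver table, counts 1 TB as 1,000 GB as the table does, and takes NAT and cross-AZ as lump sums; the function name is an assumption.

```shell
# Monthly hub estimate in USD.
# Args: attachments, TGW GB, firewall AZs, firewall GB, NAT USD (lump), cross-AZ USD (lump).
hub_monthly_usd() {
  awk -v att="$1" -v tgw_gb="$2" -v azs="$3" -v fw_gb="$4" -v nat="$5" -v xaz="$6" 'BEGIN {
    hours = 730
    total = att * hours * 0.05      # TGW attachment-hours
    total += tgw_gb * 0.02          # TGW data processing
    total += azs * hours * 0.395    # Network Firewall endpoint-hours
    total += fw_gb * 0.065          # Network Firewall data
    total += nat + xaz              # lump sums taken from the table
    printf "%.0f\n", total
  }'
}

hub_monthly_usd 35 20000 3 8000 200 80   # prints 3343
```

Doubling the inspected volume to 16 TB adds about $520 a month, which is the lever the routing discipline above is meant to control.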

Sizing rules of thumb, and the free-tier reality:

Question	Guidance
How many AZs for the egress VPC?	Match workload AZs (usually 2–3); each adds a firewall + NAT endpoint cost
When is centralized egress worth it?	Past ~10 spokes, or any inspection/compliance requirement
What dominates the bill?	Attachment-hours + firewall per-GB; control the latter by routing discipline
Free tier?	None for TGW/Network Firewall; the lab incurs attachment-hours — tear down
Biggest accidental cost?	Bulk AWS-service traffic on the public path (fix with gateway endpoints)

Interview & exam questions

Q1. Why does VPC peering not scale to a large multi-account estate? Peering is 1:1 and non-transitive: N VPCs need N(N-1)/2 peerings, and spoke A cannot reach spoke C through B. A Transit Gateway gives transitive, policy-controlled reachability with one attachment per VPC. (SAP-C02, ANS-C01)

Q2. Explain association vs propagation on a TGW route table. Association sets which route table an attachment uses for its own outbound decisions (exactly one). Propagation sets which route tables learn an attachment’s VPC CIDRs. Isolation between two domains is achieved by never propagating one into the other’s table. (ANS-C01)

Q3. How do you isolate prod from non-prod on a shared TGW without a firewall? Put each in its own route-table domain and never propagate one into the other’s table. Isolation is the absence of a route — no deny rule needed, and no per-GB inspection cost. (SAP-C02)

Q4. Why must centralized-egress flows be AZ-symmetric? Network Firewall endpoints are AZ-local and the stateful engine must see both directions of a flow. If the return path uses a different AZ’s endpoint, the engine never saw the forward direction and drops the return. Keep per-AZ route tables symmetric. (ANS-C01)

Q5. A spoke can send traffic out to the internet but replies never come back. What’s wrong? The egress VPC’s TGW route table is missing routes back to the spoke. Propagate every spoke into the egress domain, and ensure the firewall subnet route table sends spoke summaries back to the TGW. (ANS-C01)

Q6. How do you keep S3 traffic from inflating the TGW/firewall bill? Use a gateway VPC endpoint for S3 (and DynamoDB) in each spoke. It is free, adds a prefix-list route that wins by longest-prefix match, and bypasses the TGW, firewall, and NAT entirely. Enforce with an SCP keyed on aws:sourceVpce. (SAP-C02)

Q7. Why disable default route-table association and propagation when creating the TGW? The defaults associate every new attachment with a single default route table and propagate every attachment’s routes into it, producing any-to-any reachability. Disabling them forces explicit, reviewable routing decisions, which segmentation depends on. (ANS-C01)

Q8. How should you advertise routes from AWS to on-prem over Direct Connect? Terminate DX on the TGW via a Direct Connect Gateway, put the attachment in its own route table, and advertise summarized super-blocks (your /12s) rather than hundreds of /20s — the DXGW has an allowed-prefixes limit. (ANS-C01)

Q9. What is the authoritative way to prove a path is open across the TGW? Reachability Analyzer (create-network-insights-path / start-network-insights-analysis). It evaluates the full data plane — routes, NACLs, security groups, AZ ENIs — not just whether a route exists in the console. (ANS-C01, SAP-C02)

Q10. When would you choose PrivateLink over a TGW? When you need to expose a single service across a trust boundary without granting network-layer reachability — especially with overlapping CIDRs, since PrivateLink does no IP routing. (SAP-C02)

Q11. How do you stop spoke accounts from bypassing the central egress? An SCP that denies creating internet gateways and NAT gateways in spoke accounts, so the only path to the internet is the spoke’s default route to the TGW and the central egress VPC. (SAP-C02)

Q12. What’s the regional scope of a TGW and how do you go global? A TGW is regional. For a global estate, deploy one TGW per region and join them with inter-region TGW peering attachments (static routes only), keeping CIDRs disjoint per region for clean summarization. (ANS-C01)

Quick check

You associate a spoke attachment to the prod route table and propagate it into the shared table. Which direction of reachability does the propagation enable?
Prod and non-prod must never reach each other but both must reach shared services. What propagation rule keeps them isolated?
Egress works but return traffic is dropped under load only. What is the most likely cause?
A spoke’s CIDR is 10.16.0.0/20 and another account provisioned 10.16.0.0/16. Can the TGW route both? Why or why not?
Which single tool proves whether a path across the TGW is actually open, beyond what the route table shows?

Answers

Propagating the spoke into the shared table advertises the spoke’s CIDR into the shared domain, so shared services can reach the spoke. To let the spoke reach shared services, you must also propagate shared into the prod table.
Keep prod and non-prod in separate route-table domains and never propagate one into the other’s table; propagate shared into both. Isolation is the absence of a route.
Asymmetric flow: the return path crosses a different AZ’s Network Firewall endpoint than the forward path, so the stateful engine, which never saw the flow open, drops it. Make the per-AZ route tables symmetric (or enable appliance mode on the egress VPC’s TGW attachment).
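No, not correctly. Both prefixes can sit in a TGW route table, but longest-prefix match sends every packet for 10.16.0.0/20 to the /20 spoke, so the part of the /16 VPC inside that range is unreachable, and the /16 VPC’s own local route keeps it from ever reaching the /20 spoke. One VPC must be renumbered (new IPAM allocation + migration) or exposed via PrivateLink instead.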
Reachability Analyzer (create-network-insights-path / start-network-insights-analysis) — it evaluates routes, NACLs, security groups, and AZ ENIs end to end, unlike a console route lookup.
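The overlap in the fourth answer is the kind of conflict that can be caught before any attachment exists. A minimal sketch using the standard-library `ipaddress` module follows; `find_overlaps` is a hypothetical helper, and the third CIDR is an assumed non-conflicting spoke added for contrast.

```python
import ipaddress

def find_overlaps(cidrs):
    """Return every pair of CIDRs that share at least one address."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [
        (str(a), str(b))
        for i, a in enumerate(nets)
        for b in nets[i + 1:]
        if a.overlaps(b)
    ]

conflicts = find_overlaps(["10.16.0.0/20", "10.16.0.0/16", "10.32.0.0/20"])
print(conflicts)
```

Only the /20 and /16 pair is reported. IPAM enforces the same check at allocation time, which is why the address plan belongs under IPAM rather than in a spreadsheet.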

Glossary

Transit Gateway (TGW): A regional, managed, horizontally-scaled cloud router that VPCs and on-prem links attach to once for transitive, policy-controlled connectivity.
Attachment: A connection between a TGW and a VPC, VPN, Direct Connect Gateway, or peer TGW. Only VPC attachments are ENI-backed: the TGW places one ENI in one subnet per chosen AZ.
TGW route table: A routing domain on the TGW; controlling associations and propagations into it is the segmentation mechanism.
Association: Which TGW route table is consulted for traffic arriving from an attachment; each attachment associates with exactly one.
Propagation: Which TGW route tables learn an attachment’s VPC CIDRs; the lever that grants (or, by omission, denies) reachability.
AWS IPAM: IP Address Manager — a hierarchical pool service that is the source of truth for CIDR allocation and enforces non-overlap across accounts.
RAM (Resource Access Manager): Shares AWS resources (like a TGW or a resolver rule) across accounts/OUs/org; with sharing inside AWS Organizations enabled, no per-account invitations are needed.
Egress VPC: A central VPC in the network account holding NAT gateways and Network Firewall, through which all spoke internet traffic is routed and inspected.
AWS Network Firewall: A managed stateful/stateless inspection service with AZ-local endpoints, billed per endpoint-hour plus per-GB processed.
Blackhole route: A TGW route that explicitly drops matching traffic; used for intentional isolation or seen as a symptom of a bug.
Direct Connect Gateway (DXGW): A global object that associates a Direct Connect Transit VIF with one or more TGWs; has an allowed-prefixes limit.
Gateway VPC endpoint: A free, route-based endpoint for S3 and DynamoDB that keeps that traffic inside AWS, off the TGW, firewall, and NAT.
Reachability Analyzer: A service that traces a source→destination path across the data plane (routes, NACLs, SGs, ENIs) to prove whether it is open.
Appliance mode: An attachment setting that pins a flow to one AZ’s appliance/endpoint so stateful inspection sees both directions symmetrically.
Inter-region peering: A TGW-to-TGW attachment that joins regional hubs into a global topology (static routes only).

Next steps

Lock down the address plan first with CIDR & IPAM management: allocation and BYOIP at scale — the foundation everything else depends on.
Engineer the inspection layer with AWS Network Firewall: centralized egress inspection and its rule-writing companion Suricata egress inspection rule engineering.
Make DNS consistent across the hub with Route 53 Resolver: DNS Firewall, endpoints, rules, hybrid resolution.
Add resilient hybrid connectivity with Direct Connect + Transit Gateway: resilient hybrid.
Prove and guardrail the design with Network Reachability Analyzer & Access Analyzer connectivity validation and Organizations SCPs, guardrails & delegated admin.