Amazon VPC, In Depth: Subnets, Route Tables, IGW, NAT, Endpoints & Every Component

Every workload you run in AWS that touches the network — an EC2 instance, an RDS database, a Lambda function reaching a private API, a load balancer — lives inside a Virtual Private Cloud (VPC): your own logically isolated, software-defined slice of the AWS network where you choose the IP address range, carve it into subnets across Availability Zones, and decide exactly what can reach the internet, what stays private, and how packets are routed. Get the design right and everything downstream just works. Get it wrong and you feel it for years: you run out of addresses mid-migration, you cannot peer two VPCs because their ranges overlap, traffic that should stay private egresses through a NAT gateway you are paying for by the gigabyte, or a “private” instance silently has a route to the internet.

This is the exhaustive lesson. We go component by component — the VPC CIDR and how to add IPv6 and secondary ranges, every field on a subnet and the five IP addresses AWS reserves in each one, route tables and the immovable local route, the Internet Gateway, the long-running argument of NAT Gateway versus a self-managed NAT instance, DHCP option sets, the two DNS attributes that break name resolution when they are off, the difference between gateway and interface (PrivateLink) endpoints, where peering ends and Transit Gateway begins, and VPC Flow Logs — until you can whiteboard a production VPC from memory and answer the follow-up questions a Solutions Architect interview or the SAA-C03 and ANS-C01 exams will throw at you. It is beginner-accessible — every term is defined as it appears — but complete: read it once and you know the service end to end.

Learning objectives

By the end of this lesson you will be able to:

Plan a VPC CIDR block with room to grow, add secondary IPv4 ranges, and enable IPv6, understanding what you can and cannot change after creation.
Size and place subnets across Availability Zones, classify them as public or private by their routing, and account for the five reserved IP addresses AWS takes in every subnet.
Build route tables correctly — the main vs custom distinction, the unremovable local route, the 0.0.0.0/0 default route, and route priority (longest-prefix match).
Attach an Internet Gateway and reason about what actually makes a subnet public.
Choose between a NAT Gateway and a NAT instance for outbound-only internet access, and size both for cost and throughput.
Configure DHCP option sets and the enableDnsSupport / enableDnsHostnames attributes, and explain what each one controls.
Decide between gateway endpoints (S3, DynamoDB) and interface endpoints / PrivateLink, and know when each saves money or is required.
Connect VPCs with peering and know where Transit Gateway takes over, and turn on VPC Flow Logs for visibility.

Prerequisites & where this fits

You need an AWS account and the basics of regions, Availability Zones, and the CLI/console from the earlier Fundamentals lessons, plus a working idea of what an IP address and a subnet are. No deep networking background is assumed — CIDR, routing, and NAT are all explained from first principles. This is the opening Networking deep-dive of the AWS Zero-to-Hero course and the foundation that every later networking lesson builds on. The very next lesson, AWS Security Groups vs Network ACLs, In Depth, covers the filtering layer that sits on top of the routing layer you design here; this lesson deliberately stays on addressing, routing, and connectivity, and points you there for firewalls. When your address planning outgrows a spreadsheet, Amazon VPC IPAM: Hierarchical CIDR Planning, Allocation, and BYOIP at Scale automates it; when one VPC becomes dozens, Designing Multi-Account VPC Connectivity with Transit Gateway replaces the peering mesh.

Core concepts

A VPC (Virtual Private Cloud) is a regional resource — it spans every Availability Zone in one AWS Region but cannot cross Regions — that defines a private IPv4 address range (and optionally IPv6) which is yours alone. Inside it you build a network using a small set of primitives that fit together predictably. Anchor everything that follows on these mental models:

The VPC is the building; subnets are the floors. You give the building an address space (e.g. 10.0.0.0/16) and partition it into subnets (10.0.1.0/24, 10.0.2.0/24, …). Every network interface — and therefore every instance, database, or load balancer node — attaches to a subnet, never to the VPC directly.
A subnet lives in exactly one Availability Zone. This is the single most important fact for designing high availability: to survive an AZ failure you need at least two subnets in two different AZs, and you place a copy of your workload in each.
Routing decides “public” vs “private”, not the subnet itself. A subnet is “public” only because its route table sends 0.0.0.0/0 to an Internet Gateway. There is no checkbox called “public”; it is a property of the routes.
Everything inside is reachable by default; the edges are controlled. Every subnet in a VPC can reach every other subnet via the built-in local route. What crosses the edge — to the internet, to another VPC, to on-premises — is what you explicitly enable with gateways and routes.
Filtering is a separate layer. Routing gets a packet to a destination; security groups (stateful, on the network interface) and network ACLs (stateless, on the subnet) decide whether it is allowed. They are covered in the next lesson — keep them mentally separate from routing.

Key terms you will see throughout: CIDR (Classless Inter-Domain Routing: the /16, /24 notation that defines how many addresses a block holds and how each address is split between network and host bits), ENI (Elastic Network Interface: the virtual NIC that everything in a VPC actually attaches to), IGW (Internet Gateway: the VPC’s door to the public internet), NAT (Network Address Translation: letting many private addresses share one public address for outbound traffic), route table (the set of destination → target rules that decides where a packet goes next; the most specific match wins, not the first one listed), and endpoint (a private on-ramp to an AWS service that keeps traffic off the internet).

Default VPC vs a custom VPC

Every Region in a new account comes with a default VPC so that you can launch an instance immediately without designing a network first. Understanding what makes it “default” tells you what a custom VPC does not give you for free.

Property	Default VPC	Custom VPC (one you create)
CIDR	`172.31.0.0/16`, fixed	You choose
Subnets	One default subnet per AZ, all public	None until you create them
Internet Gateway	Created and attached	You attach it yourself
Route to `0.0.0.0/0`	Present in the main route table → IGW	You add it
Public IP on launch	Auto-assign public IPv4 = on in default subnets	Off by default
DNS hostnames	Enabled	Disabled by default
Good for	Quick demos, getting started	Everything real — explicit control

The convenience of the default VPC is also its danger: every default subnet is public and auto-assigns a public IP, so an instance launched there is internet-reachable the moment a permissive security group is attached. For anything beyond a throwaway test, build a custom VPC where nothing is public unless you deliberately route it that way. You can delete the default VPC, and recreate it later from the console if you ever need it back.

VPC CIDR: primary, secondary, and IPv6

When you create a VPC the one truly load-bearing decision is the primary IPv4 CIDR block. It defines the pool of private addresses every subnet will be carved from, and it cannot be changed or removed for the life of the VPC — you can only add secondary blocks.

Setting	What it is	Choices / limits	Default	When to change / gotcha
Primary IPv4 CIDR	The main private address range	`/16` (65,536 addresses) down to `/28` (16 addresses); use RFC 1918 private ranges	None — required	Permanent. Pick a `/16` for production so subnets have room. Cannot overlap with any network you will peer or connect to on-prem.
Secondary IPv4 CIDRs	Extra ranges added later when you run out	Default quota of 5 IPv4 blocks per VPC, counting the primary (adjustable up to 50); must not overlap existing blocks or reserved AWS ranges	None	Add when subnets fill up. A new block cannot overlap an existing one, and AWS restricts mixing RFC 1918 families (for example, a VPC whose primary block is in `10.0.0.0/8` cannot add a block from `172.16.0.0/12` or `192.168.0.0/16`).
IPv6 CIDR	An optional `/56` block	Amazon-provided (you get a `/56`, subnets are `/64`) or your own (BYOIP)	Off	Enable for IPv6 workloads or to use egress-only internet gateways. IPv6 addresses are public and globally routable — there is no “private IPv6” in the RFC 1918 sense.

Two rules save most teams from pain. First, size for the whole estate, not today’s app — a /16 costs nothing extra over a /28 (you pay for traffic and resources, never for address space), and running out of contiguous space later forces ugly secondary-CIDR workarounds. Second, never reuse the same CIDR across VPCs you might connect. If vpc-a and vpc-b are both 10.0.0.0/16, you can never peer them or attach them to the same Transit Gateway — overlapping ranges have no unambiguous route. Allocate a unique block per VPC up front; when this becomes hard to track by hand, that is exactly the problem VPC IPAM solves.

# Create a custom VPC with a /16 primary block
aws ec2 create-vpc \
  --cidr-block 10.0.0.0/16 \
  --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=vpc-lab}]'

# Add a secondary IPv4 block later
aws ec2 associate-vpc-cidr-block --vpc-id vpc-0abc... --cidr-block 10.1.0.0/16

# Add an Amazon-provided IPv6 /56
aws ec2 associate-vpc-cidr-block --vpc-id vpc-0abc... --amazon-provided-ipv6-cidr-block
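
The "never reuse a CIDR" rule above can be checked mechanically before you allocate a block. The following is a minimal sketch in plain POSIX shell, assuming well-formed IPv4 CIDRs as input; the helper names (`ip_to_int`, `cidr_range`, `cidrs_overlap`) are illustrative and not part of the AWS CLI.

```shell
#!/bin/sh
# Sketch: test whether two IPv4 CIDR blocks overlap (helper names are hypothetical).

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  o1=${1%%.*}; rest=${1#*.}
  o2=${rest%%.*}; rest=${rest#*.}
  o3=${rest%%.*}; o4=${rest#*.}
  echo $(( (o1 << 24) + (o2 << 16) + (o3 << 8) + o4 ))
}

# Print "start end" (inclusive integer range) for a CIDR such as 10.0.0.0/16.
cidr_range() {
  len=${1#*/}
  base=$(ip_to_int "${1%/*}")
  size=$(( 1 << (32 - len) ))
  start=$(( base / size * size ))   # align down to the network address
  echo "$start $(( start + size - 1 ))"
}

# Exit status 0 if the two blocks share any address, 1 if they are disjoint.
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

for candidate in 10.0.128.0/17 10.1.0.0/16; do
  if cidrs_overlap 10.0.0.0/16 "$candidate"; then
    echo "$candidate overlaps 10.0.0.0/16: cannot be peered"
  else
    echo "$candidate is disjoint from 10.0.0.0/16"
  fi
done
```

Running a check like this against every block already in use is the manual version of what an address-management tool automates.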

Subnets: public vs private, AZ placement, sizing, and reserved IPs

A subnet is a sub-range of the VPC CIDR that lives in exactly one Availability Zone. Resources attach to subnets, and the subnet’s route table determines whether it is public or private.

Setting	What it is	Choices	Default	When / trade-off / gotcha
VPC	The parent network	Any VPC in the Region	—	The subnet’s CIDR must fall inside the VPC’s CIDR.
Availability Zone	Physical location of the subnet	Any AZ in the Region	AWS picks if unspecified	Pin it explicitly and spread workloads across ≥2 AZs for HA. A subnet cannot span or move AZs.
IPv4 CIDR block	The subnet’s address range	`/16` to `/28` within the VPC	Required	`/24` (256 addresses) is a comfortable default. Smaller than `/28` is not allowed because of reserved IPs.
IPv6 CIDR	Optional `/64` from the VPC’s `/56`	One `/64` per subnet	None	Required if the subnet hosts IPv6 resources.
Auto-assign public IPv4	Give launched instances a public IP automatically	On / Off	Off	Usually switched on for public subnets, but it does not make a subnet public by itself: it only matters alongside a route to an IGW.
Auto-assign IPv6	Auto-assign an IPv6 address on launch	On / Off	Off	Enable for IPv6 subnets.

Public vs private is purely about routing. A subnet is public when its route table has a 0.0.0.0/0 route pointing at an Internet Gateway (and, in practice, auto-assign public IP is on or instances carry Elastic IPs). It is private when it has no such route — instances reach the internet only outbound via a NAT gateway, or not at all. A common third tier is an isolated subnet with no internet route in either direction (for databases), reachable only inside the VPC and via endpoints.
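
Because the classification depends only on routing, it can be expressed as a lookup on the target of the subnet's IPv4 default route. This is a small illustrative sketch: the function name `classify_subnet` is hypothetical, and it assumes you have already read the `0.0.0.0/0` target from the subnet's route table (an empty string meaning there is no default route).

```shell
#!/bin/sh
# Sketch: classify a subnet tier from the target of its 0.0.0.0/0 route.
# classify_subnet is a hypothetical helper; it only inspects the ID prefix.
classify_subnet() {
  case "$1" in
    igw-*) echo "public" ;;      # default route to an Internet Gateway
    nat-*) echo "private" ;;     # outbound-only via a NAT gateway
    "")    echo "isolated" ;;    # no default route at all
    *)     echo "private (default route via $1)" ;;
  esac
}

classify_subnet igw-0abc
classify_subnet nat-0abc
classify_subnet ""
```

The sketch ignores IPv6 and whether instances actually hold public addresses; it only captures the routing half of the definition.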

The five reserved IP addresses

AWS reserves the first four and the last IP address in every subnet, so a /24 (256 addresses) gives you 251 usable, not 256. Memorise this — it is a classic exam question and it bites IP planning.

Address (in `10.0.1.0/24`)	Reserved for
`10.0.1.0`	Network address
`10.0.1.1`	VPC router (the implied default gateway)
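`10.0.1.2`	Reserved for Amazon-provided DNS (the resolver itself answers at the VPC base `+2`, e.g. `10.0.0.2` in a `10.0.0.0/16` VPC; the `+2` address of every subnet is reserved as well)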
`10.0.1.3`	Reserved for future use
`10.0.1.255`	Network broadcast (broadcast is not supported, but the address is still reserved)

Because five addresses always disappear, the smallest permitted subnet is a /28 (16 addresses → 11 usable). The .2 resolver in particular matters later: the Amazon DNS server answers at the base of the VPC’s primary CIDR plus two (10.0.0.2 for a 10.0.0.0/16 VPC), and several DNS features depend on it.
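
The arithmetic behind the five-address rule is simply the block size minus five. A minimal sketch, assuming the /16 to /28 subnet limits stated above; `usable_ips` is an illustrative name, not an AWS tool.

```shell
#!/bin/sh
# Sketch: usable addresses in an AWS subnet = 2^(32 - prefix) - 5 reserved.
usable_ips() {
  prefix=$1
  if [ "$prefix" -lt 16 ] || [ "$prefix" -gt 28 ]; then
    echo "prefix must be between /16 and /28" >&2
    return 1
  fi
  echo $(( (1 << (32 - prefix)) - 5 ))
}

for p in 16 24 27 28; do
  echo "/$p -> $(usable_ips "$p") usable addresses"
done
```

The relative loss grows as subnets shrink: a /24 gives up about 2% of its addresses, while a /28 gives up almost a third.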

# Two subnets in two different AZs (HA), with auto-assign public IP on for the first
aws ec2 create-subnet --vpc-id vpc-0abc... --cidr-block 10.0.1.0/24 \
  --availability-zone ap-south-1a \
  --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=public-1a}]'
aws ec2 modify-subnet-attribute --subnet-id subnet-0pub... --map-public-ip-on-launch

aws ec2 create-subnet --vpc-id vpc-0abc... --cidr-block 10.0.11.0/24 \
  --availability-zone ap-south-1b \
  --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=private-1b}]'
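
When you plan many subnets it helps to derive their CIDRs rather than type them by hand. The sketch below computes the Nth equally sized subnet of a VPC block; the helper names (`ip_to_int`, `int_to_ip`, `nth_subnet`) are hypothetical, and it assumes the VPC CIDR is given with a correctly aligned network address.

```shell
#!/bin/sh
# Sketch: derive the INDEX-th (0-based) subnet of a given size inside a VPC CIDR.

ip_to_int() {
  o1=${1%%.*}; rest=${1#*.}
  o2=${rest%%.*}; rest=${rest#*.}
  o3=${rest%%.*}; o4=${rest#*.}
  echo $(( (o1 << 24) + (o2 << 16) + (o3 << 8) + o4 ))
}

int_to_ip() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# nth_subnet VPC_CIDR NEW_PREFIX INDEX
nth_subnet() {
  vpc_len=${1#*/}
  base=$(ip_to_int "${1%/*}")
  new_len=$2
  index=$3
  if [ "$new_len" -lt "$vpc_len" ]; then
    echo "/$new_len is larger than the VPC block $1" >&2
    return 1
  fi
  count=$(( 1 << (new_len - vpc_len) ))   # how many subnets of this size fit
  if [ "$index" -ge "$count" ]; then
    echo "only $count /$new_len subnets fit inside $1" >&2
    return 1
  fi
  echo "$(int_to_ip $(( base + index * (1 << (32 - new_len)) )))/$new_len"
}

nth_subnet 10.0.0.0/16 24 1    # the public subnet used above
nth_subnet 10.0.0.0/16 24 11   # the private subnet used above
```

A common convention, and only a convention, is to reserve index ranges per tier (for example low indexes for public subnets and a separate range for private ones) so the tier is readable from the address.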

Route tables: main vs custom, the local route, and priority

A route table is a set of rules (destination CIDR → target) that the VPC router consults for every packet leaving a network interface; the order of the rules does not matter, because the most specific matching route wins. Each subnet is associated with exactly one route table at a time; if you do not associate one explicitly, the subnet uses the VPC’s main route table.

Concept	What it is	Detail / gotcha
Main route table	The default table every new subnet implicitly uses	One per VPC; you can edit it, but the safer pattern is to leave it minimal (private) and attach custom tables to public subnets.
Custom route table	A table you create and explicitly associate with subnets	The recommended way to define “public” vs “private” — one custom table per tier.
`local` route	An automatic route for the entire VPC CIDR with target `local`	Always present and cannot be deleted; its target is only ever changed in specialised middlebox-inspection designs. It is why every subnet can reach every other subnet with zero configuration.
`0.0.0.0/0` (IPv4) / `::/0` (IPv6)	The “default route” — everything not matched elsewhere	Point it at an IGW (public), a NAT gateway (private outbound), a Transit Gateway, a peering connection, or an egress-only IGW (IPv6).
Subnet associations	Which subnets use this table	A subnet has one table; a table can serve many subnets.
Route propagation	Auto-learn routes from a VPN/Direct Connect gateway via BGP	Toggle per route table; avoids hand-maintaining on-prem prefixes.

Route priority is longest-prefix match. When several routes could apply, the VPC router picks the most specific (longest prefix) one. A packet to 10.0.5.7 matches 10.0.0.0/16 → local over 0.0.0.0/0 → nat because /16 is more specific than /0. The local route therefore always wins for in-VPC traffic, which is precisely why you cannot accidentally route internal traffic out to the internet. Static routes beat propagated (BGP-learned) routes of the same prefix.
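
Longest-prefix match is easy to simulate. The following sketch picks the winning route for a destination address from a list of `CIDR=target` pairs; the function names and the pair format are assumptions made for illustration, and it models IPv4 CIDR routes only (no prefix lists, and no static-versus-propagated tie-break).

```shell
#!/bin/sh
# Sketch: longest-prefix match over a toy route table.

ip_to_int() {
  o1=${1%%.*}; rest=${1#*.}
  o2=${rest%%.*}; rest=${rest#*.}
  o3=${rest%%.*}; o4=${rest#*.}
  echo $(( (o1 << 24) + (o2 << 16) + (o3 << 8) + o4 ))
}

# pick_route DEST_IP "CIDR=TARGET" ...
# Prints the target of the most specific route containing DEST_IP.
pick_route() {
  dest=$(ip_to_int "$1")
  shift
  best_len=-1
  best_target=""
  for route in "$@"; do
    cidr=${route%%=*}
    target=${route#*=}
    len=${cidr#*/}
    net=$(ip_to_int "${cidr%/*}")
    size=$(( 1 << (32 - len) ))
    # A route matches when DEST_IP falls in the same aligned block as its network.
    if [ $(( dest / size )) -eq $(( net / size )) ] && [ "$len" -gt "$best_len" ]; then
      best_len=$len
      best_target=$target
    fi
  done
  [ -n "$best_target" ] && echo "$best_target"
}

pick_route 10.0.5.7 "0.0.0.0/0=nat-gateway" "10.0.0.0/16=local"
pick_route 8.8.8.8  "0.0.0.0/0=nat-gateway" "10.0.0.0/16=local"
```

Note that the default route is listed first in both calls and still loses for `10.0.5.7`: position in the table is irrelevant, only prefix length decides.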

# A public route table: default route to the Internet Gateway, associated to the public subnet
aws ec2 create-route-table --vpc-id vpc-0abc... \
  --tag-specifications 'ResourceType=route-table,Tags=[{Key=Name,Value=rtb-public}]'
aws ec2 create-route --route-table-id rtb-0pub... \
  --destination-cidr-block 0.0.0.0/0 --gateway-id igw-0xyz...
aws ec2 associate-route-table --route-table-id rtb-0pub... --subnet-id subnet-0pub...

Internet Gateway: the door to the public internet

An Internet Gateway (IGW) is a horizontally scaled, redundant, highly available VPC component that allows communication between instances in your VPC and the internet. It does two jobs: it provides a target in your route tables for internet-bound traffic, and it performs one-to-one NAT between an instance’s private IPv4 address and its public IPv4 address (or Elastic IP).

Three conditions must all be true for an instance to be reachable from the internet over IPv4 — miss any one and connectivity silently fails, which is the most common “why can’t I reach my instance” support question:

An IGW is attached to the VPC.
The subnet’s route table has 0.0.0.0/0 → the IGW.
The instance has a public IPv4 address (auto-assigned, or an Elastic IP) and its security group / network ACL allow the traffic.

Key facts: a VPC can have only one IGW attached at a time; the IGW itself is free (you pay for data transfer and, since 2024, for public IPv4 addresses); and for IPv6 there is a separate egress-only internet gateway that allows outbound IPv6 only (the IPv6 equivalent of a NAT gateway, since IPv6 has no NAT).

aws ec2 create-internet-gateway \
  --tag-specifications 'ResourceType=internet-gateway,Tags=[{Key=Name,Value=igw-lab}]'
aws ec2 attach-internet-gateway --internet-gateway-id igw-0xyz... --vpc-id vpc-0abc...

NAT Gateway vs NAT instance: outbound-only internet

Instances in a private subnet often still need outbound internet — to download patches, call a third-party API, or pull a container image — without being reachable inbound. That is Network Address Translation (NAT): many private addresses share one public address for outbound flows, and return traffic for those flows is allowed back, but nothing can initiate a connection to the private instances from outside.

You route the private subnet’s 0.0.0.0/0 to the NAT, and the NAT itself sits in a public subnet (it needs the IGW to reach the internet). There are two ways to provide NAT:

Dimension	NAT Gateway (managed)	NAT instance (self-managed EC2)
What it is	A fully managed AWS service	An EC2 instance running NAT software
Availability	Highly available within one AZ; deploy one per AZ for zone resilience	Single instance = single point of failure; you build HA yourself
Throughput	Scales automatically 5 → 100 Gbps	Bounded by the instance type’s network/CPU
Management	Zero — no patching, no sizing	You patch, monitor, and size it
Source/dest check	N/A	Must disable source/destination check or it will not forward
Security groups	Cannot attach an SG (control via NACL / the private route)	Has a security group like any instance
Port forwarding / bastion	Not possible	Possible (it is a normal instance)
Cost	Hourly charge + per-GB data processing	Just the EC2 instance (often a small/free-tier type)
Use it when	Almost always — the default	Cost-sensitive dev/test, or you need features only an instance gives

Default to the NAT Gateway: it is the managed, scalable, low-effort choice and the right answer in virtually every exam scenario. There are two things to know cold. First, it is zonal, so a truly resilient design places one NAT gateway in each AZ and points each AZ’s private subnets at the NAT gateway in their own AZ (this also avoids paying cross-AZ data transfer). Second, its bill has two parts, an hourly rate plus a per-GB data-processing charge, which is exactly why pulling large objects from S3 through a NAT gateway is wasteful when a free gateway endpoint would keep that traffic off the NAT entirely (see the VPC endpoints section below).
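
To see how the two-part bill behaves, the sketch below adds an hourly component and a per-GB component. The function name `nat_cost` is hypothetical, and the rates passed in are round placeholder numbers chosen for easy arithmetic, not real AWS prices; substitute the current rates for your Region from the pricing page.

```shell
#!/bin/sh
# Sketch: NAT gateway cost = hours * hourly rate + GB processed * per-GB rate.
# nat_cost HOURS GB HOURLY_RATE PER_GB_RATE
nat_cost() {
  awk -v h="$1" -v gb="$2" -v hr="$3" -v pg="$4" \
    'BEGIN { printf "%.2f\n", h * hr + gb * pg }'
}

# Placeholder rates (0.05 per hour, 0.05 per GB) are NOT real AWS prices.
echo "730 h, 1000 GB through the NAT: $(nat_cost 730 1000 0.05 0.05)"
echo "730 h, data moved to a gateway endpoint: $(nat_cost 730 0 0.05 0.05)"
```

With these placeholder numbers the data-processing part is larger than the hourly part, which is the pattern that makes moving bulk S3 traffic onto a gateway endpoint worthwhile.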

A NAT instance is the legacy approach. The detail interviewers love is that, because the instance forwards traffic for other hosts, you must disable the source/destination check (aws ec2 modify-instance-attribute --instance-id <instance-id> --no-source-dest-check); by default an instance drops packets whose source or destination is not itself.

# Allocate an Elastic IP and create a NAT gateway in the PUBLIC subnet
aws ec2 allocate-address --domain vpc           # returns an AllocationId
aws ec2 create-nat-gateway --subnet-id subnet-0pub... --allocation-id eipalloc-0...

# Point the PRIVATE subnet's default route at the NAT gateway
aws ec2 create-route --route-table-id rtb-0priv... \
  --destination-cidr-block 0.0.0.0/0 --nat-gateway-id nat-0...

DHCP option sets

When an instance boots, it gets its network configuration — DNS servers, domain name, NTP servers — via DHCP, and a DHCP option set is the VPC-level object that defines those values. Every VPC has one associated; the default one points at AmazonProvidedDNS and is fine for most cases.

Option	What it controls	Default	When to change
`domain-name-servers`	Which DNS resolvers instances use	`AmazonProvidedDNS` (the `.2` resolver)	Point at custom resolvers (e.g. on-prem AD DNS, or Route 53 Resolver inbound endpoints) for hybrid name resolution.
`domain-name`	The domain suffix applied to hostnames	Region-specific (e.g. `ap-south-1.compute.internal`)	Set a corporate suffix like `corp.example.com`.
`ntp-servers`	Time servers	Not set (instances can still reach Amazon Time Sync at `169.254.169.123`)	Set only if you have a specific NTP requirement.
`netbios-name-servers` / `netbios-node-type`	Legacy Windows NetBIOS	None	Rarely needed; set `node-type=2` for Windows estates that use it.

The important gotchas: you cannot edit an option set in place — you create a new one and associate it with the VPC. After re-associating, existing instances pick up the change only when their DHCP lease renews (or on reboot), so do not expect it to take effect instantly. Replacing AmazonProvidedDNS with custom servers is the usual reason to touch this, and it is how you wire VPC DNS into a hybrid Active Directory environment.

DNS in the VPC: enableDnsSupport and enableDnsHostnames

Two VPC attributes control DNS, and confusing them is a perennial source of “my private endpoint resolves to a public IP” tickets. enableDnsSupport defaults to on everywhere; enableDnsHostnames defaults to on in the default VPC but off in a custom VPC.

Attribute	What it does	Default (custom VPC)	If turned off
`enableDnsSupport`	Whether the Amazon DNS resolver (the `.2` address) answers queries in the VPC	On	Instances cannot resolve names via the AWS resolver; DNS-based features (including private DNS for endpoints) break.
`enableDnsHostnames`	Whether instances with a public IP get a public DNS hostname auto-assigned	Off	Instances get no public DNS name; private DNS names for interface endpoints will not resolve even if support is on.

The rule to remember: enableDnsSupport must be on for any DNS to work at all, and enableDnsHostnames must also be on for interface (PrivateLink) endpoints’ private DNS names to resolve to the endpoint’s private IP. In the default VPC both are on; in a custom VPC enableDnsSupport is on but enableDnsHostnames is off by default. So when you adopt PrivateLink and find a service name such as ssm.ap-south-1.amazonaws.com still resolving to a public IP, the first thing to check is whether enableDnsHostnames is on.

aws ec2 modify-vpc-attribute --vpc-id vpc-0abc... --enable-dns-support '{"Value":true}'
aws ec2 modify-vpc-attribute --vpc-id vpc-0abc... --enable-dns-hostnames '{"Value":true}'

VPC endpoints: gateway vs interface (PrivateLink)

A VPC endpoint lets resources in your VPC reach supported AWS services (and third-party / your-own services) privately, over the AWS network, without an Internet Gateway, NAT gateway, or public IPs. There are two fundamentally different kinds, and knowing which is which — and when each is even available — is core SAA/ANS material.

Dimension	Gateway endpoint	Interface endpoint (PrivateLink)
Supported services	Only Amazon S3 and DynamoDB	Most AWS services (SSM, EC2 API, ECR, CloudWatch, Secrets Manager, SQS, KMS, …) and partner/your-own services
How it works	A route in your route table whose destination is the service’s prefix list (`pl-…`) and whose target is the endpoint	An ENI with a private IP placed in your subnet(s)
What you point at it	Route tables	DNS — queries to the service name resolve to the ENI’s private IP (with private DNS enabled)
Cost	Free (no hourly or data charge)	Hourly per-endpoint, per-AZ charge + per-GB data processing
Cross-Region / on-prem reachable	No (stays in-Region, in-VPC)	Yes — reachable over peering, TGW, VPN, Direct Connect
Access control	Endpoint policy (a resource policy on the endpoint)	Endpoint policy + security group on the ENI
Use it for	Keeping S3/DynamoDB traffic off the NAT gateway — the classic cost win	Private access to every other AWS service API

The decision tree is simple. Is it S3 or DynamoDB? Use a gateway endpoint — it is free and removes that traffic from your NAT bill entirely (a private subnet that only talks to S3 may not need a NAT gateway at all). Anything else? Use an interface endpoint, accepting the hourly cost in exchange for keeping API traffic private. Interface endpoints are built on AWS PrivateLink, the same technology you use to expose your own service privately to other VPCs — covered in depth in AWS PrivateLink for Service Providers and Consumers. Two gotchas: gateway endpoints are Region-local and route-based, so they do not work for on-prem or cross-Region callers (use an interface endpoint there); and interface-endpoint private DNS only works when both enableDnsSupport and enableDnsHostnames are on (see the DNS section above).
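
The decision tree above fits in a few lines. This is an illustrative sketch only: `choose_endpoint` and its `in-vpc`, `on-prem`, and `cross-region` caller labels are made-up names that encode the rules stated in this section, not an AWS API.

```shell
#!/bin/sh
# Sketch: pick an endpoint type from the service and where the caller sits.
# choose_endpoint SERVICE [CALLER]   (CALLER defaults to in-vpc)
choose_endpoint() {
  service=$1
  caller=${2:-in-vpc}
  case "$service" in
    s3|dynamodb)
      # Gateway endpoints are route-based, so they only serve callers inside the VPC.
      if [ "$caller" = "in-vpc" ]; then
        echo "gateway"
      else
        echo "interface"
      fi
      ;;
    *)
      echo "interface"
      ;;
  esac
}

choose_endpoint s3
choose_endpoint s3 on-prem
choose_endpoint ssm
```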

# Gateway endpoint for S3 (free) — attach to the private route table
aws ec2 create-vpc-endpoint --vpc-id vpc-0abc... \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.ap-south-1.s3 \
  --route-table-ids rtb-0priv...

# Interface endpoint for SSM (PrivateLink) — ENIs in the private subnets, with private DNS
aws ec2 create-vpc-endpoint --vpc-id vpc-0abc... \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.ap-south-1.ssm \
  --subnet-ids subnet-0priv... \
  --security-group-ids sg-0... \
  --private-dns-enabled

Connecting VPCs: peering vs Transit Gateway (in brief)

A single VPC is rarely the whole story — you connect VPCs to each other and to on-premises. Two options dominate, and the line between them is a frequent interview question.

Dimension	VPC peering	Transit Gateway (TGW)
Topology	One-to-one link between two VPCs	Hub-and-spoke; one TGW connects many VPCs (and VPN/Direct Connect)
Transitivity	Non-transitive — if A↔B and B↔C, A still cannot reach C	Transitive — all attached VPCs can route to each other
Scale	Connections explode as `n(n-1)/2` (a full mesh of 10 VPCs = 45 peerings)	Linear — each VPC attaches once
Routing control	Per-VPC route tables	Central TGW route tables; segmentation via multiple route tables
Cost	No hourly fee; pay data transfer	Hourly per-attachment fee + per-GB; more, but far simpler at scale
Cross-Region	Inter-Region peering supported	TGW peering across Regions
Use it when	A handful of VPCs, simple any-to-any	Many VPCs / accounts, central egress, hybrid connectivity

The headline rule: peering is non-transitive and does not scale — it is fine for two or three VPCs, but a growing estate becomes an unmanageable mesh, at which point you move to a Transit Gateway, which is transitive, centrally routed, and the standard for multi-account networking. Peering also requires non-overlapping CIDRs (you cannot peer two 10.0.0.0/16 VPCs) and does not support edge-to-edge routing (you cannot use a peer’s IGW or NAT). The full hub-and-spoke design, segmentation, and centralised egress are covered in Designing Multi-Account VPC Connectivity with Transit Gateway.
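
The scaling gap is plain arithmetic: a full mesh needs n(n-1)/2 peering connections, while a hub needs one attachment per VPC. A quick sketch (the function name `peering_links` is illustrative):

```shell
#!/bin/sh
# Sketch: full-mesh peering connections versus Transit Gateway attachments.
peering_links() {
  echo $(( $1 * ($1 - 1) / 2 ))
}

for n in 3 10 50; do
  echo "$n VPCs: $(peering_links "$n") peering connections vs $n TGW attachments"
done
```

Each peering connection also needs route entries on both sides, so the operational burden grows even faster than the connection count suggests.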

VPC Flow Logs: seeing the traffic

You cannot debug — or secure — a network you cannot see. VPC Flow Logs capture metadata about the IP traffic going to and from network interfaces: source and destination IP and port, protocol, packet and byte counts, the action (ACCEPT or REJECT), and more. They do not capture packet contents — this is NetFlow-style metadata, not a packet capture.

Setting	What it is	Choices	Notes
Scope	What the logs cover	VPC, subnet, or a single ENI	VPC-level captures everything beneath it; start there.
Filter	Which traffic to record	All, Accepted, or Rejected	`Rejected` is great for spotting blocked traffic / misconfigured security groups.
Destination	Where logs go	CloudWatch Logs, S3, or Amazon Data Firehose (formerly Kinesis Data Firehose)	S3 is cheapest for archival/analytics (query with Athena); CloudWatch for alerting.
Format	Which fields	Default or custom	Add fields like `vpc-id`, `subnet-id`, `pkt-srcaddr`, `tcp-flags` for richer analysis.
Aggregation interval	How often records are emitted	1 min or 10 min	1-minute is more granular but higher volume.

The single most important caveat for troubleshooting: flow logs show the result of security group and NACL evaluation, not the rules themselves. A REJECT tells you traffic was blocked but not by which layer — that you reason out from the stateful/stateless behaviour you will learn in the next lesson. Turn flow logs on for every production VPC; they are inexpensive (you pay for log storage/ingestion) and indispensable the day something breaks or a security review asks “what talked to what”.

aws ec2 create-flow-logs \
  --resource-type VPC --resource-ids vpc-0abc... \
  --traffic-type ALL \
  --log-destination-type s3 \
  --log-destination arn:aws:s3:::my-flow-logs-bucket/vpc/
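
Once records arrive, a quick way to find blocked traffic is to filter on the action field. The sketch below assumes the default record format (version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log-status); the sample records, account ID, and ENI IDs are made up for illustration, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: list rejected flows from default-format VPC Flow Log records.

# Made-up sample records; real ones come from your S3 bucket or log group.
sample_records() {
  cat <<'EOF'
2 123456789012 eni-0aaa1111 10.0.1.25 10.0.2.40 49152 5432 6 12 4096 1700000000 1700000060 ACCEPT OK
2 123456789012 eni-0aaa1111 203.0.113.9 10.0.1.25 55000 22 6 3 180 1700000000 1700000060 REJECT OK
2 123456789012 eni-0bbb2222 10.0.2.40 198.51.100.7 40000 443 6 8 2048 1700000000 1700000060 ACCEPT OK
EOF
}

# Field 13 is the action; print source and destination for every REJECT.
rejected_flows() {
  awk '$13 == "REJECT" { printf "%s:%s -> %s:%s proto=%s\n", $4, $6, $5, $7, $8 }'
}

sample_records | rejected_flows
```

If you define a custom format the field positions change, so a filter like this has to be adjusted to match the field order you chose.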

The complete picture

The diagram below assembles every component into one production-shaped VPC: a /16 split into public and private subnets across two Availability Zones, an Internet Gateway on the public tier, a NAT Gateway per AZ for private-subnet egress, a free gateway endpoint pulling S3 traffic off the NAT, an interface endpoint for an AWS API, the route tables that tie it together, and flow logs watching it all.

[Figure: Amazon VPC anatomy]

Trace a packet through it and the whole lesson clicks: an instance in a private subnet hits the internet via its route table’s 0.0.0.0/0 → NAT gateway (in its AZ) → IGW; the same instance reaches S3 via the more specific prefix-list route to the gateway endpoint, with no NAT involved; and a packet to another subnet in the same VPC matches the local route and never leaves the VPC.

Hands-on lab

You will build a minimal but complete two-tier VPC — one public and one private subnet, an IGW, a NAT gateway, and an S3 gateway endpoint — entirely from the CLI, validate routing, then tear it all down. Everything here is AWS Free Tier eligible except the NAT gateway and the Elastic IP, so follow the Cost note and clean up promptly.

Prerequisites: the AWS CLI v2 configured (aws configure) with a region set (examples use ap-south-1).

Step 1 — Create the VPC and turn on DNS hostnames.

VPC=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
  --query Vpc.VpcId --output text)
aws ec2 modify-vpc-attribute --vpc-id $VPC --enable-dns-hostnames '{"Value":true}'
aws ec2 create-tags --resources $VPC --tags Key=Name,Value=lab-vpc
echo "VPC=$VPC"

Step 2 — Create one public and one private subnet in the same AZ (for simplicity).

PUB=$(aws ec2 create-subnet --vpc-id $VPC --cidr-block 10.0.1.0/24 \
  --availability-zone ap-south-1a --query Subnet.SubnetId --output text)
aws ec2 modify-subnet-attribute --subnet-id $PUB --map-public-ip-on-launch
PRIV=$(aws ec2 create-subnet --vpc-id $VPC --cidr-block 10.0.2.0/24 \
  --availability-zone ap-south-1a --query Subnet.SubnetId --output text)
echo "PUB=$PUB PRIV=$PRIV"

Step 3 — Attach an Internet Gateway and make the public subnet public.

IGW=$(aws ec2 create-internet-gateway --query InternetGateway.InternetGatewayId --output text)
aws ec2 attach-internet-gateway --internet-gateway-id $IGW --vpc-id $VPC
RTPUB=$(aws ec2 create-route-table --vpc-id $VPC --query RouteTable.RouteTableId --output text)
aws ec2 create-route --route-table-id $RTPUB --destination-cidr-block 0.0.0.0/0 --gateway-id $IGW
aws ec2 associate-route-table --route-table-id $RTPUB --subnet-id $PUB

Step 4 — Create a NAT gateway in the public subnet and route the private subnet through it.

EIP=$(aws ec2 allocate-address --domain vpc --query AllocationId --output text)
NAT=$(aws ec2 create-nat-gateway --subnet-id $PUB --allocation-id $EIP \
  --query NatGateway.NatGatewayId --output text)
# Wait until the NAT gateway is available (takes a couple of minutes)
aws ec2 wait nat-gateway-available --nat-gateway-ids $NAT
RTPRIV=$(aws ec2 create-route-table --vpc-id $VPC --query RouteTable.RouteTableId --output text)
aws ec2 create-route --route-table-id $RTPRIV --destination-cidr-block 0.0.0.0/0 --nat-gateway-id $NAT
aws ec2 associate-route-table --route-table-id $RTPRIV --subnet-id $PRIV

Step 5 — Add a free S3 gateway endpoint to the private route table.

aws ec2 create-vpc-endpoint --vpc-id $VPC --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.ap-south-1.s3 --route-table-ids $RTPRIV

Step 6 — Validate. Confirm the routing is exactly what you intended:

# Public route table should show 0.0.0.0/0 -> igw-...
aws ec2 describe-route-tables --route-table-ids $RTPUB \
  --query 'RouteTables[].Routes' --output table
# Private route table should show 0.0.0.0/0 -> nat-... AND an S3 prefix-list -> vpce-...
aws ec2 describe-route-tables --route-table-ids $RTPRIV \
  --query 'RouteTables[].Routes' --output table

Expected: the public table has a local route plus 0.0.0.0/0 → igw-…; the private table has local, 0.0.0.0/0 → nat-…, and a pl-… (S3) → vpce-… route. That last line is the gateway endpoint at work — S3 traffic now bypasses the NAT entirely.
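The same check can be scripted so it fails loudly instead of depending on someone reading a table. The sketch below is one possible approach, not part of the original lab. The helper names `default_route_target` and `assert_default_route` are hypothetical, and it assumes the AWS CLI text output prints the two target fields on one tab-separated line, with `None` for the field that is empty.

```shell
# default_route_target RTB_ID -> prints "GatewayId<TAB>NatGatewayId" for the
# 0.0.0.0/0 route of the given route table (assumed text-output shape).
default_route_target() {
  aws ec2 describe-route-tables --route-table-ids "$1" \
    --query "RouteTables[0].Routes[?DestinationCidrBlock=='0.0.0.0/0'] | [0].[GatewayId, NatGatewayId]" \
    --output text
}

# assert_default_route RTB_ID PREFIX -> succeeds only if the default route
# target starts with PREFIX (for example igw- or nat-).
assert_default_route() {
  case " $(default_route_target "$1" | tr '\t' ' ') " in
    *" $2"*) echo "OK: $1 default route targets $2..." ;;
    *) echo "FAIL: $1 has no 0.0.0.0/0 route to $2..." >&2; return 1 ;;
  esac
}
```

With the lab variables still set, `assert_default_route "$RTPUB" igw-` and `assert_default_route "$RTPRIV" nat-` should both report OK; swapping the prefixes should fail.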

Cleanup — delete in reverse dependency order (endpoints and NAT before the IGW and subnets, or the deletes will fail):

EPID=$(aws ec2 describe-vpc-endpoints --filters Name=vpc-id,Values=$VPC \
  --query 'VpcEndpoints[0].VpcEndpointId' --output text)
aws ec2 delete-vpc-endpoints --vpc-endpoint-ids $EPID
aws ec2 delete-nat-gateway --nat-gateway-id $NAT
aws ec2 wait nat-gateway-deleted --nat-gateway-ids $NAT
aws ec2 release-address --allocation-id $EIP
aws ec2 detach-internet-gateway --internet-gateway-id $IGW --vpc-id $VPC
aws ec2 delete-internet-gateway --internet-gateway-id $IGW
aws ec2 delete-subnet --subnet-id $PUB
aws ec2 delete-subnet --subnet-id $PRIV
aws ec2 delete-route-table --route-table-id $RTPUB
aws ec2 delete-route-table --route-table-id $RTPRIV
aws ec2 delete-vpc --vpc-id $VPC

Cost note: the VPC, subnets, IGW, route tables, and the S3 gateway endpoint are free. The two charged items are the NAT gateway (an hourly rate plus per-GB processing) and the Elastic IP: since February 2024 AWS bills every public IPv4 address by the hour, whether it is attached or idle. Running this lab for under an hour and cleaning up costs a few US cents at most, but do not leave the NAT gateway running, as its hourly charge accrues around the clock.
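To make the hourly-plus-per-GB shape of the NAT gateway charge concrete, here is a small calculator sketch. The function name `nat_gateway_cost` is hypothetical, and the rates are arguments on purpose: the figures in the usage lines are placeholders, not quoted AWS prices, so look up the current rates for your Region before relying on any result.

```shell
# nat_gateway_cost HOURS GB HOURLY_RATE PER_GB_RATE -> estimated charge in USD
# cost = hours * hourly rate + GB processed * per-GB rate
nat_gateway_cost() {
  awk -v hours="$1" -v gb="$2" -v hourly="$3" -v per_gb="$4" \
    'BEGIN { printf "%.2f\n", hours * hourly + gb * per_gb }'
}

# Placeholder rates (0.05 USD/hour, 0.05 USD/GB) purely for illustration.
nat_gateway_cost 1 10 0.05 0.05     # one lab hour, 10 GB processed
nat_gateway_cost 730 10 0.05 0.05   # the same gateway left running for a month
```

Comparing the two lines shows why the cleanup step matters: the hourly component dominates as soon as the gateway outlives the lab.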

Common mistakes & troubleshooting

Symptom	Likely cause	Fix
Instance in a “public” subnet has no internet	Missing one of the three conditions (no IGW, no `0.0.0.0/0` route, or no public IP)	Verify IGW attached, route table has `0.0.0.0/0 → igw`, and the instance has a public/Elastic IP.
Private instance cannot reach the internet outbound	No NAT route, or NAT gateway is in a private subnet	Put the NAT gateway in a public subnet and point the private route table’s `0.0.0.0/0` at it.
`Cannot create subnet: CIDR not within VPC`	Subnet CIDR outside the VPC block, or overlaps another subnet	Choose a sub-range that fits inside the VPC CIDR and does not overlap.
Interface endpoint name still resolves to a public IP	`enableDnsHostnames` is off (or private DNS not enabled)	Turn on `enableDnsHostnames` and `enableDnsSupport`; enable private DNS on the endpoint.
Cannot peer two VPCs	Overlapping CIDR ranges	Re-IP one VPC, or use distinct ranges from the start — overlapping blocks cannot be peered.
Self-built NAT instance forwards nothing	Source/destination check still enabled	Run `aws ec2 modify-instance-attribute --instance-id <instance-id> --no-source-dest-check`.
S3 traffic is inflating the NAT bill	No gateway endpoint; S3 traffic flows through NAT	Add a free S3 gateway endpoint to the private route table.
App “A” cannot reach app “C” through a middle VPC	Peering is non-transitive	Add a direct peering A↔C, or move to a Transit Gateway.
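Two rows in the table above (the subnet CIDR error and the peering failure) come down to the same question: do two CIDR blocks overlap? The sketch below answers it in plain POSIX shell with no AWS calls. The function names are hypothetical, and it assumes well-formed IPv4 CIDRs without leading zeros in the octets.

```shell
# ip_to_int A.B.C.D -> the address as a 32-bit integer
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1            # split the dotted quad into four positional parameters
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# cidr_range CIDR -> "first last" addresses of the block, as integers
cidr_range() {
  ip=${1%/*}; prefix=${1#*/}
  size=$(( 1 << (32 - prefix) ))
  start=$(( $(ip_to_int "$ip") / size * size ))   # align down to the network address
  echo "$start $(( start + size - 1 ))"
}

# cidrs_overlap CIDR1 CIDR2 -> exit 0 if the two blocks share any address
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")    # start1 end1 start2 end2
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

if cidrs_overlap 10.0.0.0/16 10.0.128.0/20; then
  echo "10.0.0.0/16 and 10.0.128.0/20 overlap: these VPCs cannot be peered"
fi
```

Running a check like this against every planned VPC block before creating anything is cheaper than re-addressing a VPC later.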

Best practices

Plan CIDR for the whole estate. Allocate a unique, non-overlapping block per VPC, size production VPCs at /16, and leave room for secondary blocks. Use IPAM once you have more than a handful.
Multi-AZ by default. At least two subnets in two AZs per tier; one NAT gateway per AZ so an AZ failure never takes out egress and you avoid cross-AZ data charges.
Three subnet tiers. Public (load balancers, NAT), private-with-egress (app servers), and isolated (databases) — separated by their route tables.
Leave the main route table private. Attach explicit custom route tables to public subnets so nothing becomes public by accident.
Use endpoints aggressively. A free gateway endpoint for S3/DynamoDB and interface endpoints for the AWS APIs your private workloads call — this both saves NAT cost and keeps traffic off the public internet.
Turn on flow logs everywhere (to S3 for cheap archival), and tag every network resource (env, owner, tier) for cost allocation and automation.
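The last practice (flow logs to S3 plus consistent tags) can be wrapped in two small helpers. This is a sketch under assumptions: the function names are hypothetical, the S3 bucket is assumed to exist already and to accept flow-log delivery, and the tag keys simply mirror the env/owner/tier examples above.

```shell
# enable_vpc_flow_logs VPC_ID BUCKET_ARN -> capture ALL traffic metadata to S3.
# Assumes the bucket already exists and permits log delivery.
enable_vpc_flow_logs() {
  aws ec2 create-flow-logs --resource-type VPC --resource-ids "$1" \
    --traffic-type ALL --log-destination-type s3 --log-destination "$2"
}

# tag_network_resource RESOURCE_ID ENV OWNER TIER -> apply the three tags
# used for cost allocation and automation.
tag_network_resource() {
  aws ec2 create-tags --resources "$1" \
    --tags "Key=env,Value=$2" "Key=owner,Value=$3" "Key=tier,Value=$4"
}
```

For example, `enable_vpc_flow_logs "$VPC" arn:aws:s3:::example-flow-logs` followed by `tag_network_resource "$VPC" prod platform network`, where the bucket name and tag values are placeholders.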

Security notes

Private by default. Build custom VPCs where nothing is internet-reachable unless a route deliberately makes it so; reserve public subnets for the few resources that truly need ingress.
Endpoints reduce exposure. Reaching AWS services through interface/gateway endpoints keeps that traffic on the AWS network and off any IGW/NAT, shrinking your attack surface; pair with endpoint policies to restrict which resources can be reached.
Flow logs are an audit and detection tool. REJECT records surface scanning and misconfiguration; ship them to S3 and query with Athena, or to CloudWatch for alarms.
Routing is not a firewall. A route gets a packet to a destination; security groups and network ACLs decide whether it is allowed — design both layers, and read the next lesson for how stateful vs stateless filtering actually behaves.
Mind the IGW NAT. The Internet Gateway’s one-to-one NAT means any instance with a public IP and a permissive security group is directly exposed — audit public IP assignment.
Egress control at scale. For centralised, inspected egress across many VPCs, route through a Transit Gateway to an inspection VPC rather than per-VPC NAT.
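Auditing public IP assignment, as the IGW note recommends, can start with two read-only queries. The helpers below are a hedged sketch with hypothetical names: one lists subnets that auto-assign public IPv4 addresses, the other lists instances that currently hold one.

```shell
# list_auto_public_subnets VPC_ID -> subnets that hand out public IPs at launch
list_auto_public_subnets() {
  aws ec2 describe-subnets --filters "Name=vpc-id,Values=$1" \
    --query 'Subnets[?MapPublicIpOnLaunch].SubnetId' --output text
}

# list_public_instances VPC_ID -> "instance-id<TAB>public-ip" per exposed instance
list_public_instances() {
  aws ec2 describe-instances --filters "Name=vpc-id,Values=$1" \
    --query 'Reservations[].Instances[] | [?PublicIpAddress].[InstanceId, PublicIpAddress]' \
    --output text
}
```

Anything these return that is not a load balancer node, NAT, or deliberate bastion deserves a second look at its security group.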

Interview & exam questions

What makes a subnet “public”? Its route table has a 0.0.0.0/0 route pointing at an Internet Gateway. (In practice you also enable auto-assign public IP or attach Elastic IPs.) There is no “public” flag on the subnet itself — it is purely routing.
How many usable IPs are in a /24 subnet, and why not 256? 251. AWS reserves the first four addresses (network, VPC router, Amazon DNS, future use) and the last (broadcast) in every subnet.
NAT Gateway vs NAT instance — give three differences. NAT Gateway is managed, auto-scales to 100 Gbps, and is HA within an AZ; the NAT instance is a self-managed EC2 (single point of failure, fixed throughput, you patch it) but can act as a bastion/port-forwarder and needs source/destination check disabled.
Why deploy one NAT gateway per Availability Zone? A NAT gateway is zonal; one per AZ removes the single-AZ dependency and avoids cross-AZ data-transfer charges by keeping each AZ’s egress local. If the AZ holding your only NAT gateway fails, all private-subnet egress fails.
Gateway endpoint vs interface endpoint — when do you use each? Gateway endpoints serve only S3 and DynamoDB, are free, and work via a route-table entry. Interface endpoints (PrivateLink) serve most other services, place an ENI with a private IP in your subnet, cost per hour + per GB, and are reachable from peered VPCs (including cross-Region) and from on-premises networks over VPN or Direct Connect.
Can you change a VPC’s primary CIDR after creation? No. The primary IPv4 CIDR is permanent. You can only add up to four (default) secondary CIDR blocks that do not overlap existing ranges.
An interface endpoint’s DNS name resolves to a public IP — what is wrong? enableDnsHostnames (and enableDnsSupport) must be on, and private DNS must be enabled on the endpoint, for the service name to resolve to the endpoint’s private IP.
Is VPC peering transitive? No. If A↔B and B↔C are peered, A cannot reach C through B. You add a direct A↔C peering or move to a Transit Gateway, which is transitive.
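What is the local route and can you remove it? An automatic route for the entire VPC CIDR with target local that lets every subnet reach every other subnet. It cannot be deleted, and by default it carries all in-VPC traffic. The one exception is middlebox routing, where AWS lets you add a more specific route for a subnet’s CIDR (longest-prefix match then prefers it) to steer traffic through an inspection appliance.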
You need a private instance to reach the internet outbound but never be reachable inbound — what do you build? A NAT gateway in a public subnet, with the private subnet’s 0.0.0.0/0 route pointing at it. For IPv6, an egress-only internet gateway instead.
How do route tables decide between two matching routes? Longest-prefix match — the most specific route wins (e.g. /24 over /0); static routes beat propagated BGP routes of the same prefix.
How do you cut S3 data-transfer costs through a NAT gateway? Add a free S3 gateway endpoint to the private subnet’s route table so S3 traffic bypasses the NAT entirely.
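The usable-address arithmetic from the /24 question generalises to any subnet size: total addresses are 2^(32 − prefix), minus the five AWS reserves. A minimal shell sketch, with `usable_ips` as a hypothetical helper name:

```shell
# usable_ips PREFIX -> assignable IPv4 addresses in an AWS subnet of that size.
# Total = 2^(32 - PREFIX); AWS reserves 5 addresses in every subnet.
usable_ips() {
  prefix=$1
  # AWS accepts IPv4 subnet prefixes from /16 to /28 only.
  if [ "$prefix" -lt 16 ] || [ "$prefix" -gt 28 ]; then
    echo "prefix must be between 16 and 28" >&2
    return 1
  fi
  echo $(( (1 << (32 - prefix)) - 5 ))
}

usable_ips 24   # prints 251
usable_ips 28   # prints 11
```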

Quick check

True or false: a subnet can span two Availability Zones.
Which two AWS services are supported by gateway endpoints?
Which VPC attribute must be on for interface-endpoint private DNS names to resolve correctly?
Where must a NAT gateway be placed — a public or a private subnet — and why?
What is the smallest subnet size AWS allows, and what limits it?

Answers

False — a subnet lives in exactly one AZ; use multiple subnets across AZs for HA.
Amazon S3 and DynamoDB (only these two).
enableDnsHostnames (alongside enableDnsSupport, which must also be on).
A public subnet — the NAT gateway needs a route to the Internet Gateway to reach the internet on behalf of private instances.
/28 (16 addresses, 11 usable). AWS enforces /28 as the minimum subnet size, and the five reserved IPs leave 11 addresses for your resources.
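The DNS answer above is easy to verify on a live VPC. The sketch below reads both attributes and succeeds only when both are on. The function name `check_vpc_dns` is hypothetical, and the comparison assumes the AWS CLI text output renders a true boolean as `True`.

```shell
# check_vpc_dns VPC_ID -> exit 0 only if enableDnsSupport and enableDnsHostnames
# are both on (assumes text output prints booleans as True/False).
check_vpc_dns() {
  vpc_id=$1
  support=$(aws ec2 describe-vpc-attribute --vpc-id "$vpc_id" \
    --attribute enableDnsSupport --query 'EnableDnsSupport.Value' --output text)
  hostnames=$(aws ec2 describe-vpc-attribute --vpc-id "$vpc_id" \
    --attribute enableDnsHostnames --query 'EnableDnsHostnames.Value' --output text)
  echo "enableDnsSupport=$support enableDnsHostnames=$hostnames"
  [ "$support" = "True" ] && [ "$hostnames" = "True" ]
}
```

If the check fails, `aws ec2 modify-vpc-attribute --vpc-id "$VPC" --enable-dns-hostnames '{"Value":true}'` turns the hostname attribute on; the two attributes must be changed in separate calls.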

Exercise

Design (on paper or in the console) a production-ready VPC for a three-tier web application in the ap-south-1 Region that must survive the loss of one Availability Zone:

Choose a /16 CIDR and carve six subnets — public, private-app, and isolated-database tiers across two AZs.
Decide where the IGW and NAT gateways go (a VPC has at most one IGW attached), and how many NAT gateways you need for AZ resilience.
Add a free S3 gateway endpoint and at least one interface endpoint (e.g. SSM, so you can manage instances without SSH/bastion).
Sketch the route tables for each tier and confirm the database tier has no internet route.
List which components are free and which incur cost, and estimate the dominant cost driver.

Bonus: explain what you would change to add a second VPC and connect the two, and at what point you would replace peering with a Transit Gateway.
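As a starting point for the carving step, the sketch below prints consecutive equal-sized blocks from a base address. It is one possible layout, not the expected answer: the function names are hypothetical, the /20 size is an assumption, and mapping blocks to tiers and AZs is left to you.

```shell
# ip_to_int A.B.C.D -> the address as a 32-bit integer
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# int_to_ip N -> dotted-quad form of a 32-bit integer
int_to_ip() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# carve_subnets BASE_IP NEW_PREFIX COUNT -> COUNT consecutive CIDR blocks
carve_subnets() {
  base=$(ip_to_int "$1"); new_prefix=$2; count=$3
  size=$(( 1 << (32 - new_prefix) ))      # addresses per block
  i=0
  while [ "$i" -lt "$count" ]; do
    echo "$(int_to_ip $(( base + i * size )))/$new_prefix"
    i=$(( i + 1 ))
  done
}

# Six /20 blocks from 10.0.0.0/16: 10.0.0.0/20, 10.0.16.0/20, ... 10.0.80.0/20
carve_subnets 10.0.0.0 20 6
```

Six /20 blocks use well under half of the /16, which leaves room for more tiers or a third AZ later.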

Certification mapping

Exam	Objective area this supports
SAA-C03 (Solutions Architect – Associate)	Design secure and resilient architectures — VPC/subnet/AZ design, public vs private routing, NAT for egress, gateway vs interface endpoints, and peering vs Transit Gateway trade-offs.
ANS-C01 (Advanced Networking – Specialty)	Network design and connectivity — CIDR/IPv6 planning, route-table behaviour and priority, PrivateLink/endpoints, DHCP option sets and hybrid DNS, and flow-log-based troubleshooting.
DVA-C02 (Developer – Associate)	Deployment and security — placing application resources in the right subnet tier and reaching AWS services privately via endpoints.
SOA-C02 (SysOps – Associate)	Networking and monitoring — operating NAT gateways, route tables, and VPC Flow Logs for day-to-day troubleshooting.

Glossary

VPC (Virtual Private Cloud) — a logically isolated, software-defined virtual network in one AWS Region.
CIDR — Classless Inter-Domain Routing; the /16-style notation defining an address block’s size.
Subnet — a sub-range of the VPC CIDR confined to a single Availability Zone.
Availability Zone (AZ) — one or more discrete data centres in a Region with independent power and networking.
Reserved IPs — the five addresses (first four + last) AWS reserves in every subnet.
Route table — the set of routes mapping destination CIDRs to targets, evaluated by longest-prefix match rather than by order; each subnet is associated with exactly one (the main table by default).
local route — the unremovable route for the VPC’s own CIDR that makes all subnets mutually reachable.
Internet Gateway (IGW) — the VPC’s door to the public internet; performs one-to-one NAT for public IPs.
Egress-only internet gateway — the IPv6 equivalent of NAT: outbound-only IPv6 internet access.
NAT (Network Address Translation) — letting many private addresses share a public address for outbound traffic.
NAT Gateway / NAT instance — the managed vs self-managed ways to provide outbound-only internet to private subnets.
Elastic IP (EIP) — a static public IPv4 address you allocate and attach.
ENI (Elastic Network Interface) — the virtual NIC that resources attach to in a subnet.
DHCP option set — VPC-level DNS/domain/NTP configuration handed to instances at boot.
enableDnsSupport / enableDnsHostnames — the two VPC attributes that control AWS DNS resolution and public/endpoint DNS names.
VPC endpoint — a private on-ramp to AWS (or partner) services; gateway (S3/DynamoDB, free, route-based) or interface (PrivateLink, ENI-based).
PrivateLink — the technology behind interface endpoints; also exposes your own services privately.
VPC peering — a one-to-one, non-transitive connection between two VPCs.
Transit Gateway (TGW) — a transitive hub connecting many VPCs and on-prem links.
VPC Flow Logs — metadata (not packet contents) about IP traffic on ENIs/subnets/VPCs.

Next steps

Continue the course with AWS Security Groups vs Network ACLs, In Depth — now that you control where traffic flows, learn the filtering layer that decides what is allowed, including the stateful-vs-stateless difference and the classic “return traffic blocked by a NACL” gotcha. Then deepen your networking with:

Amazon VPC IPAM: Hierarchical CIDR Planning, Allocation & BYOIP at Scale — automate the address planning this lesson did by hand.
Designing Multi-Account VPC Connectivity with Transit Gateway — replace the peering mesh with a transitive hub and centralised egress.
AWS PrivateLink for Service Providers & Consumers — expose and consume private services across accounts using the technology behind interface endpoints.