Dual-Stack Done Deliberately: IPv6 Across VPCs, VNets, and Load Balancers

Most teams reach for IPv6 reactively: a partner mandates it, an ALB hits a v4-exhaustion edge case, or someone finally notices that three acquired business units all picked 10.0.0.0/16 and the VPC peering refuses to land. Dual-stack is not hard, but it is full of asymmetries with IPv4 that fail silently — a security group that looks closed is wide open on v6, an egress path that works for v4 black-holes for v6, a health check that passes while real traffic doesn’t. This is a deliberate walkthrough of designing and rolling out dual-stack on AWS and Azure without breaking the v4 estate you already run.

The governing principle: dual-stack means both protocols are first-class and independently configured. Every rule, route, and record you maintain for v4 has a v6 sibling, and forgetting the sibling is the entire failure mode.

1. Why IPv6 now: the three forcing functions

There are exactly three reasons that justify the work, and you should know which one you’re solving:

RFC 1918 exhaustion and overlap. Large estates burn through 10.0.0.0/8 faster than expected once you count EKS/AKS pod IPs (a single node can consume a /28 or worse), Lambda ENIs, and per-AZ subnet padding. Worse is overlap: mergers and multi-account sprawl produce colliding 10.x ranges that make peering and Transit Gateway routing impossible without NAT-on-NAT hairballs. IPv6 gives every workload a globally unique, non-overlapping address.
External-facing requirements. US Government (OMB M-21-07 targets IPv6-only), mobile carriers, and an increasing share of residential ISPs are v6-first. If your API is v4-only, those clients traverse carrier-grade NAT (CGNAT) — which means degraded geolocation, shared-IP reputation problems, and rate-limit collisions.
Cost. On AWS, public IPv4 addresses are billed per-hour per-address. At fleet scale, moving internal and egress traffic to v6 measurably trims the v4 bill.

If none of these bite you, dual-stack is premature. If one does, read on.

2. The addressing plan: get the boundaries right before you touch a console

IPv6 addressing forgives nothing later, so spend your design budget here. Two decisions dominate.

Provider-assigned vs. BYOIP. For a greenfield rollout, take the provider-assigned GUA block — AWS hands you a /56 per VPC from Amazon’s pool; Azure lets you assign from /40 down to /64 on a VNet. Use BYOIP only when you must keep a portable prefix (regulatory pinning, multi-cloud identity, or an existing ASN advertisement). BYOIP adds ROA/RPKI and advertisement overhead you don’t want on day one.

The /56 and /64 boundaries are not negotiable. The single rule that prevents most pain:

A subnet in a cloud VPC/VNet must be a /64. SLAAC, EUI-64, and the providers’ own address management assume it. Do not try to be clever with /80 or /112 subnets — autoconfiguration breaks and you’ll fight the platform forever.

So the hierarchy is:

Scope	Prefix	Notes
Organization pool (BYOIP)	/48 or larger	One advertisement, many regions
VPC / VNet	/56 (AWS default)	256 possible /64s
Subnet	/64	Mandatory; one per AZ/zone tier

A /56 gives you 256 /64 subnets per VPC, plenty for public/private/data tiers across every AZ. Plan the subnet layout the same way you do for v4: reserve contiguous ranges per tier so route tables and NSGs stay readable. With an Amazon-provided block, AWS picks the VPC’s /56; you choose which /64 inside it each subnet gets.
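To make the contiguous-range-per-tier idea concrete, here is a minimal sketch using Python's standard ipaddress module. The function name plan_subnets, the tier names, and the AZ labels are illustrative assumptions, not part of any AWS or Azure API; the /56 is the example prefix used in this article.

```python
import ipaddress


def plan_subnets(vpc_cidr, tiers, azs):
    """Assign one /64 per (tier, AZ), giving each tier its own block of 16 /64s.

    Hypothetical helper: with 16 slots per tier, the third hex digit of the
    subnet ID identifies the tier and the fourth identifies the AZ.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    if vpc.version != 6 or vpc.prefixlen != 56:
        raise ValueError("expected an IPv6 /56")
    if len(tiers) > 16 or len(azs) > 16:
        raise ValueError("this layout supports at most 16 tiers and 16 AZs")

    subnets = list(vpc.subnets(new_prefix=64))  # the 256 possible /64s
    plan = {}
    for t, tier in enumerate(tiers):
        for a, az in enumerate(azs):
            plan[(tier, az)] = subnets[t * 16 + a]
    return plan
```

Called as plan_subnets("2600:1f18:abcd::/56", ["public", "private", "data"], ["a", "b", "c"]), this puts the public tier in subnet IDs 00 to 0f, private in 10 to 1f, and data in 20 to 2f, so a route or rule can be read by tier at a glance.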

3. Enabling dual-stack on subnets without breaking existing IPv4

The cardinal rule: add v6 alongside v4; never convert. Existing ENIs keep their v4 addresses and routes untouched. You are layering, not migrating in place.

AWS — associate an IPv6 CIDR to the VPC, then to subnets:

# 1. Give the VPC an Amazon-provided /56
aws ec2 associate-vpc-cidr-block \
  --vpc-id vpc-0abc123 \
  --amazon-provided-ipv6-cidr-block

# 2. Read back the assigned /56
aws ec2 describe-vpcs --vpc-ids vpc-0abc123 \
  --query 'Vpcs[0].Ipv6CidrBlockAssociationSet[0].Ipv6CidrBlock' --output text
# -> 2600:1f18:abcd::/56

# 3. Carve a /64 per subnet out of the /56 (abcd:0006::/64, abcd:0007::/64, ...). The existing IPv4 CIDR is unchanged.
aws ec2 associate-subnet-cidr-block \
  --subnet-id subnet-0public1 \
  --ipv6-cidr-block 2600:1f18:abcd:0006::/64

# 4. Auto-assign a v6 address to new ENIs in that subnet
aws ec2 modify-subnet-attribute \
  --subnet-id subnet-0public1 \
  --assign-ipv6-address-on-creation

In Terraform, the same intent (note assign_generated_ipv6_cidr_block on the VPC and the ipv6_cidr_block math on subnets):

resource "aws_vpc" "main" {
  cidr_block                       = "10.20.0.0/16"
  assign_generated_ipv6_cidr_block = true
}

resource "aws_subnet" "public_a" {
  vpc_id          = aws_vpc.main.id
  cidr_block      = "10.20.0.0/20"
  ipv6_cidr_block = cidrsubnet(aws_vpc.main.ipv6_cidr_block, 8, 6)

  assign_ipv6_address_on_creation = true
  map_public_ip_on_launch         = true
}

Azure — IPv6 is added at the VNet and subnet level as a second address prefix:

# Add an IPv6 prefix to an existing dual-stack-capable VNet
az network vnet update \
  --resource-group rg-net --name vnet-core \
  --address-prefixes 10.30.0.0/16 fd00:db8:cafe::/56

# Subnet carries BOTH a v4 and a v6 prefix
az network vnet subnet update \
  --resource-group rg-net --vnet-name vnet-core --name snet-app \
  --address-prefixes 10.30.1.0/24 fd00:db8:cafe:1::/64

Azure caveat that bites people: IPv6 is never an edit to the existing v4 IP configuration. On a NIC, the v6 address is its own IP configuration added next to the v4 one, and some resource types only support dual-stack when it is configured at creation, so check each resource type and plan to recreate where an in-place change is not supported. Remember too that an Azure subnet’s IPv6 space must be a /64.

4. Egress without NAT: there is no NAT for IPv6

This is the asymmetry that surprises the most people. There is no NAT gateway for IPv6, by design — every address is already globally routable, so the entire concept of address translation for outbound traffic disappears. What replaces “private subnet behind a NAT” is different on each cloud:

AWS — the egress-only internet gateway (EIGW). It is the v6 analog of a NAT gateway’s security posture: it permits outbound-initiated traffic and the return packets, but blocks inbound-initiated connections. It performs no translation.

aws ec2 create-egress-only-internet-gateway --vpc-id vpc-0abc123
# -> eigw-0aabb

# Route private-subnet v6 default traffic to the EIGW (not to a NAT gw)
aws ec2 create-route \
  --route-table-id rtb-0private \
  --destination-ipv6-cidr-block ::/0 \
  --egress-only-internet-gateway-id eigw-0aabb

Public subnets keep using the regular internet gateway for v6 — the IGW handles both protocols. The EIGW exists only to give private subnets stateful outbound v6.
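The black-hole failure mode is easy to reason about with a small longest-prefix-match model. This is a sketch only: next_hop is a hypothetical helper, the route list is hand-written rather than read from AWS, and the target IDs are the placeholder names used in this article.

```python
import ipaddress


def next_hop(routes, destination):
    """Return the target of the most specific matching route, or None.

    routes is a list of (cidr, target) pairs. None means no route matched,
    so the packet would be dropped (the silent v6 black hole).
    """
    dest = ipaddress.ip_address(destination)
    best = None
    for cidr, target in routes:
        net = ipaddress.ip_network(cidr)
        # v4 routes never match v6 destinations, and vice versa.
        if net.version != dest.version or dest not in net:
            continue
        if best is None or net.prefixlen > best[0].prefixlen:
            best = (net, target)
    return best[1] if best else None


# A private route table that was only ever planned for v4.
private_routes = [
    ("10.20.0.0/16", "local"),
    ("0.0.0.0/0", "nat-0abc"),
    ("2600:1f18:abcd::/56", "local"),
]
```

With this table, next_hop(private_routes, "198.51.100.7") returns the NAT gateway while next_hop(private_routes, "2001:db8::1") returns None. Appending ("::/0", "eigw-0aabb") is the model's equivalent of the create-route call above.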

Azure — there is no EIGW equivalent. Outbound v6 for “private” workloads is controlled by NSG egress rules plus the absence of an inbound allow, and by routing. If you need a stable outbound v6 footprint, Azure NAT Gateway is IPv4-only; for v6 you rely on the standard load balancer / public IP outbound or simply default routing constrained by NSGs. The practical consequence: in Azure your “no unsolicited inbound” guarantee for v6 comes from NSG rules, not from a gateway type. Get the NSG right (Section 6) or you have no perimeter.

The single most important mental correction:

Stop thinking “private = un-routable.” In v6, every workload has a routable address. “Private” now means “the security/routing policy denies inbound,” not “the address can’t be reached.” If your NSG/security group is wrong, that database is on the public internet.

5. Load balancers and DNS: AAAA, dual-stack frontends, and Happy Eyeballs

A v6 client can only reach you if (a) the load balancer has a v6 frontend and (b) DNS returns an AAAA record. Both, or it silently falls back to v4 — masking your misconfiguration until the v4 path also fails.

AWS ALB/NLB — set the IP address type to dualstack:

aws elbv2 set-ip-address-type \
  --load-balancer-arn arn:aws:elasticloadbalancing:...:loadbalancer/app/web/abc \
  --ip-address-type dualstack

This makes the ELB resolve to both A and AAAA records. Targets can remain IPv4 — the ALB terminates v6 from the client and talks v4 to the backend, which means you do not have to make every instance dual-stack on day one. That property is what makes incremental rollout viable.

DNS via Route 53: an alias record is per record type, so even with a dualstack ELB the A alias only answers A queries. Publish an AAAA alias next to the existing A alias:

{
  "Comment": "dual-stack frontend",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "api.example.com",
        "Type": "AAAA",
        "AliasTarget": {
          "HostedZoneId": "Z35SXDOTRQ7X7K",
          "DNSName": "dualstack.web-1234.us-east-1.elb.amazonaws.com",
          "EvaluateTargetHealth": true
        }
      }
    }
  ]
}
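Because the A and AAAA aliases should always point at the same target, it can help to generate both from one place. The following is a sketch under that assumption; alias_change_batch is a hypothetical helper, and the output only mirrors the change-batch shape shown above.

```python
import json


def alias_change_batch(name, elb_dns_name, elb_zone_id):
    """Build a Route 53 change batch with matching A and AAAA alias records."""
    changes = []
    for record_type in ("A", "AAAA"):
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": record_type,
                "AliasTarget": {
                    "HostedZoneId": elb_zone_id,
                    "DNSName": elb_dns_name,
                    "EvaluateTargetHealth": True,
                },
            },
        })
    return {"Comment": "dual-stack frontend", "Changes": changes}


def to_json(batch):
    """Serialize the batch for use as a change-batch file."""
    return json.dumps(batch, indent=2)
```

The serialized result can be written to a file and passed to aws route53 change-resource-record-sets with --change-batch file://..., using your own hosted zone ID for the record and the ELB's zone ID inside AliasTarget.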

Azure — Standard Load Balancer with both a v4 and v6 frontend IP and matching rules. Each load-balancing rule is single-family, so you create a v6 frontend, a v6 public IP, and a parallel rule:

resource pip6 'Microsoft.Network/publicIPAddresses@2023-09-01' = {
  name: 'pip-lb-v6'
  location: location
  sku: { name: 'Standard' }
  properties: {
    publicIPAddressVersion: 'IPv6'
    publicIPAllocationMethod: 'Static'
  }
}

Then add a frontend IP config referencing pip6 and a second load-balancing rule bound to it; the v4 rule stays as-is.

Happy Eyeballs (RFC 8305) is your safety net and your blind spot. Modern clients race v6 and v4 connections and use whichever answers first, giving v6 a small head start. This is great for users: a broken v6 path degrades gracefully to v4, typically within a few hundred milliseconds (the RFC recommends a 250 ms connection attempt delay). It is terrible for your observability: a half-broken v6 frontend produces zero user complaints because clients silently fall back. You will only catch it with explicit v6 monitoring (see the “Verify” section). Do not let “no one complained” stand in for “v6 works.”
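Explicit monitoring means probing each family on its own, with fallback disabled. Here is a minimal sketch using only the standard socket module; probe is a hypothetical helper, it checks TCP reachability only (no TLS or HTTP), and the default timeout is an arbitrary choice.

```python
import socket


def probe(host, port, family, timeout=3.0):
    """Attempt a TCP connect using only the given address family.

    Unlike a Happy Eyeballs client, this never falls back to the other
    family, so a broken v6 path shows up as False instead of being masked.
    """
    try:
        infos = socket.getaddrinfo(host, port, family=family, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # no usable address of this family (e.g. missing AAAA)

    for fam, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            continue  # try the next address of the same family
    return False
```

Running probe("api.example.com", 443, socket.AF_INET6) and the AF_INET equivalent side by side from a v6-capable host gives you two independent signals to alert on.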

6. Security groups and NSGs: the v6 rule gaps that silently leave ports open

This section is where real incidents happen. v4 and v6 rules are independent. A rule scoped to a v4 CIDR provides no protection on v6, and vice versa.

The classic AWS failure:

# This allows SSH from one v4 admin block...
aws ec2 authorize-security-group-ingress \
  --group-id sg-0abc --protocol tcp --port 22 --cidr 10.0.0.0/8

# ...and does NOTHING for v6.

The trap is the default egress rule. A freshly created EC2 security group ships with an allow-all egress rule for both 0.0.0.0/0 and ::/0. The moment you attach a v6 address, that workload has unrestricted outbound v6 — even though you may have carefully locked down v4 egress and never touched the v6 line. Likewise, if you intend to allow inbound on v6 you must add the explicit --ipv6-cidr rule; the v4 rule won’t cover it:

# Inbound v6 must be its own rule
aws ec2 authorize-security-group-ingress \
  --group-id sg-0web --protocol tcp --port 443 --ipv6-cidr ::/0

# Tighten v6 egress deliberately (revoke the implicit allow-all if your policy requires)
aws ec2 revoke-security-group-egress \
  --group-id sg-0web \
  --ip-permissions 'IpProtocol=-1,Ipv6Ranges=[{CidrIpv6=::/0}]'

Audit rule: for every security group, enumerate ingress AND egress and confirm each has both an IPv4 and an IPv6 disposition that you intended. For egress, “I never added a v6 rule” does not mean “v6 denied”: check whether the default allow-all ::/0 rule is still in place.
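The audit rule can be automated against the JSON that aws ec2 describe-security-groups returns. The sketch below assumes one element of its SecurityGroups list as input; audit_security_group and the wording of its findings are hypothetical, and it only covers CIDR-based rules (not prefix lists or group references).

```python
from collections import defaultdict


def audit_security_group(sg):
    """Flag rules that cover only v4 ranges, and any allow-all v6 egress."""
    findings = []
    for direction, key in (("ingress", "IpPermissions"), ("egress", "IpPermissionsEgress")):
        # Group ranges by protocol and port so a v6 sibling is found even if
        # it is listed as a separate permission entry.
        slots = defaultdict(lambda: {"v4": [], "v6": []})
        for perm in sg.get(key, []):
            slot = (perm.get("IpProtocol"), perm.get("FromPort"), perm.get("ToPort"))
            slots[slot]["v4"] += [r["CidrIp"] for r in perm.get("IpRanges", [])]
            slots[slot]["v6"] += [r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])]

        for (proto, low, high), fam in slots.items():
            label = f"{direction} proto={proto} ports={low}-{high}"
            if fam["v4"] and not fam["v6"]:
                findings.append(f"{label}: v4 only, confirm v6 is meant to be closed")
            if direction == "egress" and "::/0" in fam["v6"]:
                findings.append(f"{label}: allow-all v6 egress")
    return findings
```

A finding is a prompt to decide, not a verdict: a v4-only SSH rule may be exactly what you want, but it should be a recorded decision rather than an accident.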

Azure NSGs have the same independence but a friendlier default: NSGs default-deny inbound for both families, so the inbound risk is lower. The gap is in your allow rules — a rule with sourceAddressPrefix: '10.0.0.0/8' does not match v6 traffic, and a rule meant to permit your CDN or partner over v6 must specify v6 prefixes. Use a parallel rule and keep priorities aligned:

az network nsg rule create \
  --resource-group rg-net --nsg-name nsg-app --name Allow-HTTPS-v6 \
  --priority 210 --direction Inbound --access Allow --protocol Tcp \
  --destination-port-ranges 443 \
  --source-address-prefixes '2001:db8:5b1c::/48' \
  --destination-address-prefixes '*'

Also remember link-local and ICMPv6: IPv6 depends on ICMPv6 (Neighbor Discovery, Router Advertisement, Path MTU Discovery) far more than v4 depends on ICMP. Blanket-blocking ICMPv6 breaks IPv6 in subtle ways — PMTUD failures cause hangs on large responses. Permit the essential ICMPv6 types rather than dropping all of them.
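One way to keep a firewall review honest is to check any proposed ICMPv6 block list against the types IPv6 cannot work without. The type numbers below are the standard ICMPv6 assignments; the table name and the broken_by helper are illustrative, and which types you actually control depends on the platform (cloud fabrics handle some Neighbor Discovery traffic for you).

```python
# ICMPv6 types whose loss breaks basic IPv6 operation.
ESSENTIAL_ICMPV6 = {
    1: "Destination Unreachable",
    2: "Packet Too Big (Path MTU Discovery)",
    3: "Time Exceeded",
    4: "Parameter Problem",
    133: "Router Solicitation",
    134: "Router Advertisement",
    135: "Neighbor Solicitation",
    136: "Neighbor Advertisement",
}


def broken_by(blocked_types):
    """Return the essential functions a proposed block list would break."""
    return [
        ESSENTIAL_ICMPV6[t]
        for t in sorted(set(blocked_types))
        if t in ESSENTIAL_ICMPV6
    ]
```

Blocking only echo request and reply (types 128 and 129) returns an empty list, while a list that includes type 2 is flagged, which is the PMTUD hang described above.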

7. On-prem and hybrid: IPv6 over ExpressRoute / Direct Connect

Hybrid is where dual-stack maturity differs sharply by provider, so verify against current docs for your region before committing a design.

AWS Direct Connect supports IPv6 on transit and private virtual interfaces. You configure a dual-stack BGP peering — separate v4 and v6 address families over the same VIF, advertising your v6 prefixes from on-prem and receiving the VPC/Transit Gateway v6 routes. Transit Gateway propagates v6 routes alongside v4. The key design point: BGP runs two address families, and you must advertise/accept v6 prefixes explicitly; enabling the VIF is not enough.
Azure ExpressRoute supports IPv6 on private peering for the connection between your edge and the Microsoft Enterprise Edge routers, with v6 BGP sessions. Dual-stack the ExpressRoute Gateway and ensure the connected VNets carry v6 prefixes.

The cross-cloud lesson is the same as everywhere else in this article: the v6 address family is configured separately from v4 on the same circuit. A working v4 BGP session over Direct Connect tells you nothing about v6 — confirm the v6 prefixes are actually in the route tables on both ends.
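That confirmation can be scripted once you have exported the learned routes from each side. This is a minimal sketch: missing_prefixes is a hypothetical helper, and both lists are assumed to be plain CIDR strings you collected yourself (for example from a router's BGP table and a cloud route table).

```python
import ipaddress


def missing_prefixes(expected, learned):
    """Return the expected prefixes not covered by any learned route."""
    learned_nets = [ipaddress.ip_network(p) for p in learned]
    missing = []
    for prefix in expected:
        want = ipaddress.ip_network(prefix)
        covered = any(
            # Compare within one family only; subnet_of rejects mixed versions.
            net.version == want.version and want.subnet_of(net)
            for net in learned_nets
        )
        if not covered:
            missing.append(prefix)
    return missing
```

A healthy v4 session with no v6 address family shows up immediately: the v4 prefix is covered and the v6 prefix is returned as missing. Note that a learned default route (::/0) covers everything in this model, so exclude defaults if you want to check for specific advertisements.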

For transit, decide early whether v6 is end-to-end routed (preferred — every prefix unique, no translation anywhere) or terminated/proxied at an edge. End-to-end is the entire point of v6; resist the urge to recreate NAT semantics out of habit.

Verify

Do not trust the console’s green checkmarks. Prove reachability on the wire, from a real v6 client, and prove v4 still works (parity).

# From a v6-capable client / host, confirm AAAA resolution and reachability
dig +short AAAA api.example.com
curl -6 -sS -o /dev/null -w "v6 -> HTTP %{http_code} in %{time_total}s\n" https://api.example.com/health
curl -4 -sS -o /dev/null -w "v4 -> HTTP %{http_code} in %{time_total}s\n" https://api.example.com/health

# Confirm an instance actually got a v6 address and a default route to the EIGW/IGW
ip -6 addr show
ip -6 route show   # expect a default (::/0) route via fe80::...

Confirm the AWS route table sends private v6 to the EIGW (and not nowhere):

aws ec2 describe-route-tables --route-table-ids rtb-0private \
  --query "RouteTables[0].Routes[?DestinationIpv6CidrBlock=='::/0']"

On Azure, validate the NSG actually evaluates as you intend for a v6 source using the network-watcher IP flow check:

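# Placeholder addresses. The CLI help describes --local/--remote in IPv4 X.X.X.X:PORT form,
# so confirm your CLI version accepts IPv6 endpoints before relying on this check.
az network watcher test-ip-flow \
  --resource-group rg-net \
  --vm vm-app01 \
  --direction Inbound --protocol TCP \
  --local 'fd00:db8:cafe:1::4:443' \
  --remote '2001:db8:5b1c::10:51000'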

A KQL pass over flow logs to catch the silent-open case — v6 inbound on ports you believed were closed:

AzureNetworkAnalytics_CL
| where SubType_s == "FlowLog" and FlowType_s == "ExternalPublic"
| where SrcIP_s contains ":"            // crude v6 filter
| where FlowDirection_s == "I"          // inbound
| summarize hits = count() by DestPort_d, DestIP_s, L7Protocol_s
| order by hits desc

Parity checklist for the cutover: every endpoint that answers on v4 must answer on v6 with the same status, same TLS cert (SAN must still match the host), same auth behavior, and security tooling must see v6 flows. A v6 path that bypasses your WAF because the WAF only had a v4 listener is a real and common gap.
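The checklist reduces to a field-by-field comparison of what each family observed. In the sketch below, the field names (status, cert_sha256, auth, waf_seen) and the parity_gaps helper are assumptions for illustration; populate the two dictionaries from whatever your probes and security tooling actually report.

```python
def parity_gaps(v4_result, v6_result,
                fields=("status", "cert_sha256", "auth", "waf_seen")):
    """Return {field: (v4_value, v6_value)} for every field that differs."""
    return {
        field: (v4_result.get(field), v6_result.get(field))
        for field in fields
        if v4_result.get(field) != v6_result.get(field)
    }
```

An empty result means the two paths matched on every field you chose to compare; a result such as {"waf_seen": (True, False)} is the v4-only WAF listener gap.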

Enterprise scenario

A fintech platform team ran a customer-facing API behind an AWS ALB for a UK open-banking integration. A regulated aggregator partner announced they were moving to IPv6-only egress from their data center within a quarter; the platform’s API was v4-only, and the partner’s traffic would otherwise be forced through CGNAT, breaking the IP-allowlisting both sides relied on for the mTLS-plus-source-IP control.

The constraint: zero downtime, no backend rewrite. Hundreds of ECS tasks behind the ALB were IPv4-only and could not be re-platformed on the timeline.

The solution leaned on the ALB’s protocol-translation property. They associated an Amazon /56 to the VPC, carved /64s into the existing public subnets, and flipped the ALB to dual-stack; backends stayed IPv4. They published an AAAA alias in Route 53 pointing at the ALB’s dualstack.-prefixed DNS name. The genuinely important fix was the security group: the ALB’s SG allowed 443 from the partner’s v4 block only, so they added the matching v6 ingress and, critically, revoked the implicit allow-all v6 egress on the task SGs that had silently appeared the moment v6 addresses attached.

# ALB to dual-stack; backends remain IPv4 (no task changes)
aws elbv2 set-ip-address-type --load-balancer-arn $ALB_ARN --ip-address-type dualstack

# Allow the partner's v6 prefix to the ALB on 443
aws ec2 authorize-security-group-ingress \
  --group-id $ALB_SG --protocol tcp --port 443 \
  --ipv6-cidr 2001:db8:5b1c::/48

Validation caught the subtle failure: their WAF web ACL was associated with the ALB (good), but their CloudWatch synthetic canary only probed v4. They added a curl -6 canary and immediately saw the v6 path failing. The ALB certificate’s SAN list was fine; the rejection came from a downstream IP-allowlist Lambda authorizer that keyed on a v4-shaped X-Forwarded-For and refused the v6 source. They normalized the authorizer to parse both families, and v6 parity went green. Total customer-facing downtime: none.
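The team's actual authorizer is not shown, so the following is only a sketch of what a family-agnostic allowlist check could look like. It assumes the load balancer appends the client address as the right-most X-Forwarded-For entry; client_ip_allowed and the allowlist contents are hypothetical.

```python
import ipaddress


def client_ip_allowed(xff_header, allowlist):
    """Check the right-most X-Forwarded-For entry against v4 and v6 networks."""
    entries = [e.strip() for e in xff_header.split(",") if e.strip()]
    if not entries:
        return False
    try:
        # ip_address parses both families, unlike a dotted-quad regex.
        client = ipaddress.ip_address(entries[-1])
    except ValueError:
        return False

    # Treat IPv4-mapped IPv6 (::ffff:a.b.c.d) as the IPv4 address it wraps.
    if client.version == 6 and client.ipv4_mapped is not None:
        client = client.ipv4_mapped

    networks = [ipaddress.ip_network(n) for n in allowlist]
    return any(client.version == net.version and client in net for net in networks)
```

With an allowlist of ["203.0.113.0/24", "2001:db8:5b1c::/48"], the same function admits the partner over either family and rejects anything that does not parse as an address at all.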

The lesson the team wrote up: the address plumbing was an afternoon; the silent gaps — default v6 egress, a v4-only canary, a v4-shaped XFF parser — were the actual work.

Written by Vinod