Quick take: A reusable Terraform module for AWS NAT Gateway. It provisions per-AZ public or private NAT gateways with EIP allocation, route table wiring, and an optional CloudWatch log group for flow logs, targeting hashicorp/aws ~> 5.0. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "aws" {
region = "us-east-1"
}
module "nat_gateway" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-nat-gateway?ref=v1.0.0"
name = "..." # Name prefix for NAT gateways, EIPs, and the log group (…
nat_gateways = {} # Gateways to create, keyed by AZ. Each entry: `subnet_id…
}
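Then run terraform init && terraform apply. Note that nat_gateways must contain at least one entry; the empty map shown above is a placeholder and fails the module's validation. Every other input has a sensible default; see Inputs below to override behaviour.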
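For orientation, a filled-in call might look like the sketch below. The subnet and route table IDs and the name value are hypothetical placeholders, not values shipped with the module; the entry shape follows the nat_gateways variable definition in variables.tf.

```hcl
# Sketch only: the IDs and the name are made-up placeholders; substitute your own.
module "nat_gateway" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-nat-gateway?ref=v1.0.0"

  name = "demo"

  nat_gateways = {
    "us-east-1a" = {
      subnet_id               = "subnet-0123456789abcdef0" # public subnet in us-east-1a
      private_route_table_ids = ["rtb-0123456789abcdef0"]  # private route tables in the same AZ
    }
  }
}
```

With literal IDs like these, the module's route keys are known at plan time, so a single terraform apply should be enough.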
What this module is
A NAT (Network Address Translation) Gateway is a managed AWS service that lets resources in private subnets initiate outbound connections to the internet (or to other VPCs/on-prem) while preventing the internet from initiating inbound connections back to them. It is the standard way to give private EC2 instances, Lambda-in-VPC, ECS tasks, and EKS nodes egress for pulling packages, calling external APIs, or reaching SaaS endpoints — without giving every workload a public IP.
A NAT gateway lives in one public subnet and serves traffic for one Availability Zone. There is no “multi-AZ NAT gateway” primitive: highly-available designs deploy one gateway per AZ and point each AZ’s private route tables at the gateway in the same AZ. Getting that right by hand is repetitive and error-prone (mismatched AZs silently add cross-AZ data charges, a single gateway becomes a SPOF, EIPs leak when destroys go wrong). Wrapping aws_nat_gateway in a module makes the per-AZ fan-out, EIP lifecycle, and route wiring declarative and consistent across every VPC and environment.
This module supports both connectivity_type = "public" (the default: internet egress via an Elastic IP) and connectivity_type = "private" (NAT between VPCs without an EIP, e.g. via Transit Gateway). It provisions and tracks the EIPs, creates the 0.0.0.0/0 default routes in the private route tables you hand it, and can optionally create a CloudWatch log group for NAT flow logs. As written, main.tf creates only the log group; delivering flow logs into it requires an aws_flow_log resource in the calling configuration.
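A private NAT call could look roughly like the following sketch. The aws_subnet.private_nat and aws_route_table.private resources and the 10.20.0.0/16 destination are assumptions made for illustration; they are not part of the module. Because a private NAT gateway usually serves only selected prefixes, the sketch turns off the module's 0.0.0.0/0 routes and adds a narrower route in the caller.

```hcl
# Sketch only: resource names and the 10.20.0.0/16 CIDR are hypothetical.
module "nat_gateway_private" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-nat-gateway?ref=v1.0.0"

  name              = "shared-private"
  connectivity_type = "private" # no EIP is allocated; internet_gateway_id stays null

  # Skip the module's 0.0.0.0/0 routes; only chosen prefixes should use the NAT.
  create_default_routes = false

  nat_gateways = {
    "eu-west-1a" = {
      subnet_id = aws_subnet.private_nat["eu-west-1a"].id
    }
  }
}

# Route only the remote network through the private NAT gateway.
resource "aws_route" "to_remote_network" {
  route_table_id         = aws_route_table.private["eu-west-1a"].id
  destination_cidr_block = "10.20.0.0/16"
  nat_gateway_id         = module.nat_gateway_private.nat_gateway_ids["eu-west-1a"]
}
```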
When to use it
- You run workloads in private subnets (EKS nodes, ECS Fargate tasks, RDS-adjacent app tiers, Lambda-in-VPC) that need outbound-only internet access for package mirrors, OS patching, or third-party APIs.
- You want HA egress: a NAT gateway in every AZ so the loss of one AZ does not take down outbound traffic for the others.
- You need private NAT (no public IP) to route between VPCs without CIDR-overlap conflicts, or to reach on-prem networks over Transit Gateway / VPC peering.
- You are standardizing VPC builds across many accounts and want the same per-AZ NAT pattern, EIP tagging, and route wiring everywhere.
- You do not need this if your private subnets require no egress at all, or if a single shared NAT instance (cheaper, self-managed) is an acceptable trade-off for non-prod — NAT gateway is the managed, scalable, production choice.
Module structure
terraform-module-aws-nat-gateway/
├── versions.tf # provider + Terraform version pins
├── main.tf # aws_eip, aws_nat_gateway, aws_route, optional log group
├── variables.tf # var-driven inputs with validations
└── outputs.tf # ids, eips, route ids, az→id map
# versions.tf
terraform {
required_version = ">= 1.5.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
# main.tf
locals {
# One NAT gateway per public-subnet entry the caller supplies.
# Each map key is a stable logical name (typically the AZ, e.g. "us-east-1a").
is_public = var.connectivity_type == "public"
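# Build the route fan-out: every (gateway_key, route_table_id) pair becomes
# one aws_route. Callers map each AZ's private route tables to that AZ's NAT.
# Note: the route table ID is part of the for_each key below, so the IDs must
# be known at plan time (for example, route tables that already exist in state).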
routes = merge([
for key, cfg in var.nat_gateways : {
for rt in cfg.private_route_table_ids :
"${key}:${rt}" => {
nat_key = key
route_table_id = rt
}
}
]...)
}
# Elastic IPs — only for public NAT gateways. Allocated in the VPC domain.
resource "aws_eip" "this" {
for_each = local.is_public ? var.nat_gateways : {}
domain = "vpc"
public_ipv4_pool = var.public_ipv4_pool
network_border_group = var.network_border_group
tags = merge(
var.tags,
{ Name = "${var.name}-nat-eip-${each.key}" },
)
# When the EIP has to be replaced, allocate the new one before releasing the
# old one so the replacement does not start with a missing address.
lifecycle {
create_before_destroy = true
}
}
resource "aws_nat_gateway" "this" {
for_each = var.nat_gateways
connectivity_type = var.connectivity_type
subnet_id = each.value.subnet_id
# Public NAT requires an allocation_id; private NAT must not set one.
allocation_id = local.is_public ? aws_eip.this[each.key].id : null
# Optional fixed private IPv4 address for the gateway (auto-assigned when null).
private_ip = each.value.private_ip
tags = merge(
var.tags,
{
Name = "${var.name}-nat-${each.key}"
AvailabilityZone = each.key
},
)
# The gateway depends on an Internet Gateway being attached to the VPC for
# public connectivity; surface that ordering when the caller passes the IGW.
depends_on = [var.internet_gateway_id]
lifecycle {
create_before_destroy = true
}
}
# Default route (0.0.0.0/0) from each private route table to its AZ's NAT GW.
resource "aws_route" "default_ipv4" {
for_each = var.create_default_routes ? local.routes : {}
route_table_id = each.value.route_table_id
destination_cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.this[each.value.nat_key].id
timeouts {
create = "5m"
}
}
# Optional CloudWatch log group intended to receive NAT flow logs. This module
# creates only the group; the caller adds an aws_flow_log to deliver logs to it.
resource "aws_cloudwatch_log_group" "nat_flow" {
count = var.enable_flow_logs ? 1 : 0
name = "/aws/vpc/nat/${var.name}"
retention_in_days = var.flow_logs_retention_days
kms_key_id = var.flow_logs_kms_key_arn
tags = merge(var.tags, { Name = "${var.name}-nat-flow-logs" })
}
# variables.tf
variable "name" {
description = "Name prefix applied to the NAT gateways, EIPs, and log group."
type = string
validation {
condition = can(regex("^[a-zA-Z0-9-]{1,40}$", var.name))
error_message = "name must be 1-40 chars, alphanumeric and hyphens only."
}
}
variable "nat_gateways" {
description = <<-EOT
Map of NAT gateways to create, keyed by a stable logical name (use the AZ,
e.g. "eu-west-1a"). For public NAT, subnet_id must be a PUBLIC subnet in
that AZ. private_route_table_ids are the private route tables in the SAME
AZ that should default-route through this gateway.
EOT
type = map(object({
subnet_id = string
private_route_table_ids = optional(list(string), [])
private_ip = optional(string)
}))
validation {
condition = length(var.nat_gateways) > 0
error_message = "Provide at least one NAT gateway entry."
}
validation {
condition = alltrue([
for k, v in var.nat_gateways : can(regex("^subnet-", v.subnet_id))
])
error_message = "Every nat_gateways[*].subnet_id must be a subnet- ID."
}
}
variable "connectivity_type" {
description = "NAT connectivity: 'public' (internet egress via EIP) or 'private' (VPC-to-VPC, no EIP)."
type = string
default = "public"
validation {
condition = contains(["public", "private"], var.connectivity_type)
error_message = "connectivity_type must be 'public' or 'private'."
}
}
variable "create_default_routes" {
description = "Whether to create 0.0.0.0/0 routes from the supplied private route tables to each AZ's NAT gateway."
type = bool
default = true
}
variable "internet_gateway_id" {
description = "ID of the VPC's Internet Gateway. Used as an explicit dependency so public NAT is created after the IGW is attached. Set null for private NAT."
type = string
default = null
}
variable "public_ipv4_pool" {
description = "EC2 public IPv4 pool (e.g. a BYOIP pool ID) to allocate NAT EIPs from. Defaults to Amazon's pool when null."
type = string
default = null
}
variable "network_border_group" {
description = "Network border group that limits the EIP to a Local Zone / Wavelength group. Null for standard Regional EIPs."
type = string
default = null
}
variable "enable_flow_logs" {
description = "Create a CloudWatch Log Group for NAT egress flow logging."
type = bool
default = false
}
variable "flow_logs_retention_days" {
description = "Retention in days for the NAT flow-log group."
type = number
default = 30
validation {
condition = contains(
[1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1096, 1827, 2192, 2557, 2922, 3288, 3653],
var.flow_logs_retention_days
)
error_message = "flow_logs_retention_days must be a value CloudWatch Logs accepts."
}
}
variable "flow_logs_kms_key_arn" {
description = "Optional KMS key ARN to encrypt the NAT flow-log group at rest."
type = string
default = null
}
variable "tags" {
description = "Tags applied to all resources created by the module."
type = map(string)
default = {}
}
# outputs.tf
output "nat_gateway_ids" {
description = "Map of logical key (AZ) => NAT gateway ID."
value = { for k, gw in aws_nat_gateway.this : k => gw.id }
}
output "nat_gateway_id_list" {
description = "Flat list of all NAT gateway IDs."
value = [for gw in aws_nat_gateway.this : gw.id]
}
output "nat_gateway_public_ips" {
description = "Map of logical key (AZ) => allocated public IP (empty for private NAT)."
value = { for k, gw in aws_nat_gateway.this : k => gw.public_ip }
}
output "nat_gateway_private_ips" {
description = "Map of logical key (AZ) => assigned private IP of the gateway."
value = { for k, gw in aws_nat_gateway.this : k => gw.private_ip }
}
output "eip_allocation_ids" {
description = "Map of logical key (AZ) => EIP allocation ID (empty map for private NAT)."
value = { for k, eip in aws_eip.this : k => eip.id }
}
output "default_route_ids" {
description = "Map of 'natKey:routeTableId' => created default-route ID."
value = { for k, r in aws_route.default_ipv4 : k => r.id }
}
output "flow_log_group_name" {
description = "Name of the NAT flow-log CloudWatch group, or null when disabled."
value = try(aws_cloudwatch_log_group.nat_flow[0].name, null)
}
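To make the local.routes fan-out in main.tf concrete, here is a stand-alone sketch that applies the same merge-and-expand expression to hypothetical sample data. The example_ names and the rtb- IDs are invented for illustration.

```hcl
# Stand-alone illustration; the map below is hypothetical sample data.
locals {
  example_nat_gateways = {
    "eu-west-1a" = { private_route_table_ids = ["rtb-aaa", "rtb-bbb"] }
    "eu-west-1b" = { private_route_table_ids = ["rtb-ccc"] }
  }

  # Same shape as local.routes: one inner map per gateway, merged into one.
  example_routes = merge([
    for key, cfg in local.example_nat_gateways : {
      for rt in cfg.private_route_table_ids :
      "${key}:${rt}" => {
        nat_key        = key
        route_table_id = rt
      }
    }
  ]...)
}

output "example_routes" {
  value = local.example_routes
}
```

The resulting map has three keys, eu-west-1a:rtb-aaa, eu-west-1a:rtb-bbb, and eu-west-1b:rtb-ccc, each carrying the nat_key and route_table_id that a single aws_route would consume.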
How to use it
locals {
azs = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
}
module "nat_gateway" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-nat-gateway?ref=v1.0.0"
name = "prod-shared"
connectivity_type = "public"
internet_gateway_id = aws_internet_gateway.this.id
# One NAT gateway per AZ. Each public subnet hosts the gateway; the matching
# private route tables in the SAME AZ default-route through it.
nat_gateways = {
"eu-west-1a" = {
subnet_id = aws_subnet.public["eu-west-1a"].id
private_route_table_ids = [aws_route_table.private["eu-west-1a"].id]
}
"eu-west-1b" = {
subnet_id = aws_subnet.public["eu-west-1b"].id
private_route_table_ids = [aws_route_table.private["eu-west-1b"].id]
}
"eu-west-1c" = {
subnet_id = aws_subnet.public["eu-west-1c"].id
private_route_table_ids = [aws_route_table.private["eu-west-1c"].id]
}
}
enable_flow_logs = true
flow_logs_retention_days = 90
tags = {
Environment = "prod"
CostCenter = "platform-network"
ManagedBy = "terraform"
}
}
# Downstream reference: allowlist the NAT egress IPs at a partner's firewall by
# feeding the stable public IPs into a security automation / SSM parameter.
resource "aws_ssm_parameter" "nat_egress_ips" {
name = "/network/prod/nat-egress-ips"
type = "StringList"
value = join(",", values(module.nat_gateway.nat_gateway_public_ips))
tags = {
Description = "Stable NAT egress IPs for partner firewall allowlists"
}
}
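The three hand-written entries in the nat_gateways map above share one shape, so the map can also be generated from local.azs. This is an alternative sketch of the same call, assuming the aws_subnet.public and aws_route_table.private resources are keyed by AZ name exactly as in the example above.

```hcl
locals {
  azs = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]

  # One entry per AZ, equivalent to writing the three entries by hand.
  nat_gateways = {
    for az in local.azs : az => {
      subnet_id               = aws_subnet.public[az].id
      private_route_table_ids = [aws_route_table.private[az].id]
    }
  }
}

module "nat_gateway" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-nat-gateway?ref=v1.0.0"

  name                = "prod-shared"
  connectivity_type   = "public"
  internet_gateway_id = aws_internet_gateway.this.id
  nat_gateways        = local.nat_gateways
}
```

Adding or removing an AZ then becomes a one-line change to local.azs.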
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
backend = "s3"
generate = { path = "backend.tf", if_exists = "overwrite" }
config = {
# ...s3 state bucket/container + key per path...
}
}
2. Module config — live/prod/nat_gateway/terragrunt.hcl:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-nat-gateway?ref=v1.0.0"
}
inputs = {
name = "..."
nat_gateways = {}
}
3. Deploy this module on its own, or roll out every module in the environment together:
cd live/prod/nat_gateway && terragrunt apply # this module
cd live/prod && terragrunt run-all apply # every module under live/prod (run from the repo root)
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
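In a real environment the inputs usually come from another module rather than literals. The sketch below assumes a hypothetical sibling ../vpc Terragrunt module that exposes outputs named internet_gateway_id, public_subnet_ids_by_az, and private_route_table_ids_by_az; those output names are invented for illustration and would need to match whatever your VPC module actually exports.

```hcl
# live/prod/nat_gateway/terragrunt.hcl (sketch)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-nat-gateway?ref=v1.0.0"
}

# Hypothetical sibling module that creates the VPC, subnets, and route tables.
dependency "vpc" {
  config_path = "../vpc"
}

inputs = {
  name                = "prod-shared"
  internet_gateway_id = dependency.vpc.outputs.internet_gateway_id # assumed output name

  # Assumed outputs: maps of AZ name => subnet ID and AZ name => route table ID.
  nat_gateways = {
    for az, subnet_id in dependency.vpc.outputs.public_subnet_ids_by_az : az => {
      subnet_id               = subnet_id
      private_route_table_ids = [dependency.vpc.outputs.private_route_table_ids_by_az[az]]
    }
  }
}
```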
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| name | string | n/a | Yes | Name prefix for NAT gateways, EIPs, and the log group (1-40 chars, alphanumeric + hyphens). |
| nat_gateways | map(object) | n/a | Yes | Gateways to create, keyed by AZ. Each entry: subnet_id (public subnet for public NAT), private_route_table_ids, optional private_ip. |
| connectivity_type | string | "public" | No | "public" (internet egress via EIP) or "private" (VPC-to-VPC, no EIP). |
| create_default_routes | bool | true | No | Create 0.0.0.0/0 routes from the supplied private route tables to each AZ’s gateway. |
| internet_gateway_id | string | null | No | VPC Internet Gateway ID, used as an explicit dependency for public NAT ordering. |
| public_ipv4_pool | string | null | No | EC2 public IPv4 / BYOIP pool to allocate EIPs from. Amazon pool when null. |
| network_border_group | string | null | No | Limits EIPs to a Local Zone / Wavelength border group. |
| enable_flow_logs | bool | false | No | Create a CloudWatch Log Group for NAT egress flow logging. |
| flow_logs_retention_days | number | 30 | No | Retention for the flow-log group (must be a CloudWatch-accepted value). |
| flow_logs_kms_key_arn | string | null | No | KMS key ARN to encrypt the flow-log group at rest. |
| tags | map(string) | {} | No | Tags applied to every resource the module creates. |
Outputs
| Name | Description |
|---|---|
| nat_gateway_ids | Map of logical key (AZ) => NAT gateway ID. |
| nat_gateway_id_list | Flat list of all NAT gateway IDs. |
| nat_gateway_public_ips | Map of AZ => allocated public IP (empty for private NAT). |
| nat_gateway_private_ips | Map of AZ => assigned private IP of the gateway. |
| eip_allocation_ids | Map of AZ => EIP allocation ID (empty for private NAT). |
| default_route_ids | Map of natKey:routeTableId => created default-route ID. |
| flow_log_group_name | Name of the NAT flow-log CloudWatch group, or null when disabled. |
Enterprise scenario
A regulated fintech runs its core ledger services on EKS across three AZs in eu-west-1, with all node groups in private subnets. The security team requires that outbound calls to a payment processor leave from a fixed, allowlisted set of IPs, while inbound from the internet stays impossible. They consume this module once per cluster VPC with connectivity_type = "public" and one gateway per AZ, publish the three EIPs via the nat_gateway_public_ips output into an SSM parameter that the processor’s onboarding pipeline reads, and turn on flow logs (enable_flow_logs = true, 90-day retention, KMS-encrypted) to satisfy auditors who need a record of every egress flow. Per-AZ gateways guarantee that an AZ failure degrades only that zone’s egress rather than the whole platform.
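Since main.tf creates only the log group, the flow-log delivery in a scenario like this would be wired up in the calling configuration. The following is a minimal sketch, assuming enable_flow_logs = true on a module "nat_gateway" call; the role and policy names are hypothetical, and the permissive resources = ["*"] statement is a simplification that should be narrowed for production.

```hcl
# Sketch only: names are hypothetical; requires enable_flow_logs = true.
data "aws_cloudwatch_log_group" "nat_flow" {
  name = module.nat_gateway.flow_log_group_name
}

# Look up each gateway to find its elastic network interface.
data "aws_nat_gateway" "this" {
  for_each = module.nat_gateway.nat_gateway_ids
  id       = each.value
}

data "aws_iam_policy_document" "flow_logs_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["vpc-flow-logs.amazonaws.com"]
    }
  }
}

data "aws_iam_policy_document" "flow_logs_write" {
  statement {
    actions = [
      "logs:CreateLogStream",
      "logs:PutLogEvents",
      "logs:DescribeLogGroups",
      "logs:DescribeLogStreams",
    ]
    resources = ["*"] # simplification; scope to the log group where possible
  }
}

resource "aws_iam_role" "flow_logs" {
  name               = "prod-shared-nat-flow-logs"
  assume_role_policy = data.aws_iam_policy_document.flow_logs_assume.json
}

resource "aws_iam_role_policy" "flow_logs" {
  name   = "write-nat-flow-logs"
  role   = aws_iam_role.flow_logs.id
  policy = data.aws_iam_policy_document.flow_logs_write.json
}

# One flow log per NAT gateway, attached to the gateway's network interface.
resource "aws_flow_log" "nat" {
  for_each = data.aws_nat_gateway.this

  eni_id               = each.value.network_interface_id
  traffic_type         = "ALL"
  log_destination_type = "cloud-watch-logs"
  log_destination      = data.aws_cloudwatch_log_group.nat_flow.arn
  iam_role_arn         = aws_iam_role.flow_logs.arn
}
```

Attaching the flow log to each gateway's network interface keeps the captured traffic limited to NAT egress rather than the whole VPC.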
Best practices
- One gateway per AZ, routed in-zone. Always pair each AZ’s private route tables with the NAT gateway in the same AZ. Routing AZ-b’s subnets through AZ-a’s gateway works but quietly adds cross-AZ data-transfer charges and couples two AZs’ fate together.
- Mind the cost model. NAT gateways bill both an hourly charge and per-GB processed. For chatty S3/DynamoDB/ECR traffic, add VPC Gateway/Interface Endpoints so that traffic bypasses the NAT entirely — this is often the single biggest NAT cost saving. In non-prod, consider a single shared gateway or a NAT instance.
- Treat NAT EIPs as stable, allowlisted assets. Surface them via outputs, tag them clearly, and feed them into partner firewall allowlists. create_before_destroy on the EIP/gateway avoids a brief egress outage during replacements, but a full destroy will change the IP, so coordinate allowlist updates.
- Use private NAT for VPC-to-VPC. When you need NAT without internet exposure (overlapping-CIDR avoidance, on-prem reach via Transit Gateway), set connectivity_type = "private" and omit the EIP/IGW; never hang a public IP off a gateway that only talks internally.
- Log egress for audit and forensics. Enable flow logs with KMS encryption and a retention that matches your compliance window; NAT is the natural choke point to observe what private workloads are reaching on the internet.
- Right-size resilience to the SLA. Three gateways for prod HA, but a single gateway (one AZ) is a legitimate cost choice for dev/test where a zone outage is tolerable; make that a deliberate, documented decision via the nat_gateways map rather than an accident.
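The endpoint advice in the cost bullet can be sketched as follows. aws_vpc.this and aws_route_table.private are assumed to exist in the calling configuration and are not created by this module; the tag values are hypothetical.

```hcl
# Sketch only: aws_vpc.this and aws_route_table.private are assumed resources.
data "aws_region" "current" {}

# Gateway endpoint: S3 traffic from the private route tables bypasses the NAT.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.this.id
  service_name      = "com.amazonaws.${data.aws_region.current.name}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [for rt in aws_route_table.private : rt.id]

  tags = { Name = "prod-shared-s3-endpoint" }
}

# Same pattern for DynamoDB, the other service offered as a gateway endpoint.
resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id            = aws_vpc.this.id
  service_name      = "com.amazonaws.${data.aws_region.current.name}.dynamodb"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [for rt in aws_route_table.private : rt.id]

  tags = { Name = "prod-shared-dynamodb-endpoint" }
}
```

Gateway endpoints add prefix-list routes to the listed route tables, which take precedence over the 0.0.0.0/0 route to the NAT gateway because they are more specific. Services such as ECR use interface endpoints instead, which are configured with subnets and security groups rather than route tables.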