Quick take — A reusable Terraform module for google_compute_router_nat on hashicorp/google ~> 5.0: var-driven subnetwork NAT selection, dynamic port allocation, min/max ports per VM, and logging for private VM and GKE outbound internet access. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "google" {
  project = "my-project"
  region  = "us-central1"
}
module "cloud_nat" {
  source     = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-nat?ref=v1.0.0"
  project_id = "..." # GCP project ID that owns the NAT gateway.
  name       = "..." # NAT gateway name (1-63 chars, lowercase RFC1035).
  region     = "..." # Region for the NAT; must match the router's region.
  network    = "..." # VPC name or self-link; required because create_router defaults to true.
}
Then run terraform init && terraform apply. Every other input has a default; see Inputs below to override behaviour.
What this module is
Cloud NAT is GCP’s fully managed, distributed Network Address Translation service. It lets instances without external IP addresses — private GKE nodes, Compute Engine VMs, Cloud Run jobs on a connector — reach the internet for outbound traffic (package mirrors, OS updates, third-party APIs) while staying unreachable from the outside. Unlike a NAT appliance, there is no proxy VM in the data path and nothing to scale or patch: Google programs the translation into the VPC’s software-defined network, so throughput grows with your instances.
The one structural fact that trips people up is that Cloud NAT is not a standalone resource — google_compute_router_nat must attach to a Cloud Router in the same region and VPC. The router is just the control-plane anchor here; no BGP is required for NAT to work. Wrapping this in a module is worth it because the raw resource has sharp edges: nat_ips is only valid with MANUAL_ONLY allocation, the per-subnet subnetwork block only applies when source_subnetwork_ip_ranges_to_nat = "LIST_OF_SUBNETWORKS", and enable_dynamic_port_allocation is mutually exclusive with enable_endpoint_independent_mapping — set both and apply fails. This module turns those into validated variables, can either create its own router or attach to one you already manage, and exports the gateway ID and the resolved router name downstream resources need.
When to use it
- You run private GKE clusters or VMs with no external IPs that still need outbound internet for image pulls, apt/yum updates, or calling external SaaS APIs.
- You need stable, allow-listed egress IPs: a partner or payment gateway whitelists your source addresses, so you want reserved static NAT IPs in MANUAL_ONLY mode.
- You are standardising a landing-zone / Shared-VPC topology where every region needs identical NAT behaviour and you want one audited, policy-checked module instead of copy-pasted HCL.
- You have connection-heavy workloads (crawlers, CI runners, high fan-out microservices) and need to tune port allocation to avoid the silent, intermittent failures caused by NAT port exhaustion.
- Skip it when your workloads already have external IPs (they egress directly and bypass NAT), or when egress must stay fully private via Private Google Access / VPC Service Controls with no internet path at all.
Module structure
terraform-module-gcp-cloud-nat/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf
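# versions.tf
terraform {
  # The router_name and network validation blocks reference var.create_router;
  # cross-variable validation needs Terraform 1.9 or later.
  required_version = ">= 1.9.0"
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}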
# main.tf
locals {
  # Either create a router here, or attach the NAT to a router supplied by the
  # caller. Exactly one of the two paths resolves the router name used below.
  create_router = var.create_router
  router_name   = local.create_router ? google_compute_router.this[0].name : var.router_name
  # Dynamic port allocation and endpoint-independent mapping cannot both be on.
  # When DPA is enabled we force EIM off regardless of the input.
  endpoint_independent_mapping = (
    var.enable_dynamic_port_allocation ? false : var.enable_endpoint_independent_mapping
  )
}
# Optional control-plane router. NAT requires a router in the same region/VPC;
# no BGP config is needed for NAT-only use.
resource "google_compute_router" "this" {
  count       = local.create_router ? 1 : 0
  name        = coalesce(var.router_name, "${var.name}-router")
  project     = var.project_id
  region      = var.region
  network     = var.network
  description = var.router_description
}
resource "google_compute_router_nat" "this" {
  name                   = var.name
  project                = var.project_id
  region                 = var.region
  router                 = local.router_name
  nat_ip_allocate_option = var.nat_ip_allocate_option
  # nat_ips is only valid with MANUAL_ONLY allocation.
  nat_ips = var.nat_ip_allocate_option == "MANUAL_ONLY" ? var.nat_ips : null
  # Drain (gracefully release) IPs being removed from a MANUAL_ONLY pool.
  drain_nat_ips = (
    var.nat_ip_allocate_option == "MANUAL_ONLY" ? var.drain_nat_ips : null
  )
  source_subnetwork_ip_ranges_to_nat = var.source_subnetwork_ip_ranges_to_nat
  # Per-subnet selection only applies in LIST_OF_SUBNETWORKS mode.
  dynamic "subnetwork" {
    for_each = (
      var.source_subnetwork_ip_ranges_to_nat == "LIST_OF_SUBNETWORKS"
      ? var.subnetworks
      : []
    )
    content {
      name                     = subnetwork.value.name
      source_ip_ranges_to_nat  = subnetwork.value.source_ip_ranges_to_nat
      secondary_ip_range_names = try(subnetwork.value.secondary_ip_range_names, null)
    }
  }
  # Port allocation. With DPA, Cloud NAT scales ports per VM between min and max.
  min_ports_per_vm                    = var.min_ports_per_vm
  max_ports_per_vm                    = var.enable_dynamic_port_allocation ? var.max_ports_per_vm : null
  enable_dynamic_port_allocation      = var.enable_dynamic_port_allocation
  enable_endpoint_independent_mapping = local.endpoint_independent_mapping
  # Connection idle timeouts (seconds).
  udp_idle_timeout_sec             = var.udp_idle_timeout_sec
  icmp_idle_timeout_sec            = var.icmp_idle_timeout_sec
  tcp_established_idle_timeout_sec = var.tcp_established_idle_timeout_sec
  tcp_transitory_idle_timeout_sec  = var.tcp_transitory_idle_timeout_sec
  tcp_time_wait_timeout_sec        = var.tcp_time_wait_timeout_sec
  log_config {
    enable = var.log_enable
    filter = var.log_filter
  }
}
# variables.tf
variable "project_id" {
  type        = string
  description = "GCP project ID that owns the Cloud NAT gateway."
}
variable "name" {
  type        = string
  description = "Name of the Cloud NAT gateway. Must be a valid GCP resource name."
  validation {
    condition     = can(regex("^[a-z]([-a-z0-9]{0,61}[a-z0-9])?$", var.name))
    error_message = "name must be 1-63 chars, lowercase letters, digits or hyphens, starting with a letter."
  }
}
variable "region" {
  type        = string
  description = "Region in which to create the NAT gateway (e.g. asia-south1). Must match the router's region."
}
# ---- Router wiring ---------------------------------------------------------
variable "create_router" {
  type        = bool
  description = "If true, create a Cloud Router for this NAT. If false, attach to an existing router named by router_name."
  default     = true
}
variable "router_name" {
  type        = string
  description = "Name of the Cloud Router. Required when create_router = false; otherwise used as the created router's name (defaults to <name>-router)."
  default     = null
  validation {
    condition     = var.create_router || var.router_name != null
    error_message = "router_name must be set when create_router = false."
  }
}
variable "network" {
  type        = string
  description = "Self-link or name of the VPC network. Required only when create_router = true."
  default     = null
  validation {
    condition     = !var.create_router || var.network != null
    error_message = "network must be set when create_router = true."
  }
}
variable "router_description" {
  type        = string
  description = "Optional description applied to the router when create_router = true."
  default     = null
}
# ---- NAT IP allocation -----------------------------------------------------
variable "nat_ip_allocate_option" {
  type        = string
  description = "How NAT IPs are allocated: AUTO_ONLY (Google-managed, may change) or MANUAL_ONLY (reserved static IPs)."
  default     = "AUTO_ONLY"
  validation {
    condition     = contains(["AUTO_ONLY", "MANUAL_ONLY"], var.nat_ip_allocate_option)
    error_message = "nat_ip_allocate_option must be AUTO_ONLY or MANUAL_ONLY."
  }
}
variable "nat_ips" {
  type        = list(string)
  description = "Self-links of reserved regional external IPs to use when nat_ip_allocate_option = MANUAL_ONLY."
  default     = []
}
variable "drain_nat_ips" {
  type        = list(string)
  description = "Self-links of NAT IPs to drain (stop assigning new connections to) while removing them from a MANUAL_ONLY pool."
  default     = []
}
# ---- Subnetwork NAT selection ----------------------------------------------
variable "source_subnetwork_ip_ranges_to_nat" {
  type        = string
  description = "Which subnet ranges get NAT: ALL_SUBNETWORKS_ALL_IP_RANGES, ALL_SUBNETWORKS_ALL_PRIMARY_IP_RANGES, or LIST_OF_SUBNETWORKS."
  default     = "ALL_SUBNETWORKS_ALL_IP_RANGES"
  validation {
    condition = contains([
      "ALL_SUBNETWORKS_ALL_IP_RANGES",
      "ALL_SUBNETWORKS_ALL_PRIMARY_IP_RANGES",
      "LIST_OF_SUBNETWORKS",
    ], var.source_subnetwork_ip_ranges_to_nat)
    error_message = "Invalid source_subnetwork_ip_ranges_to_nat value."
  }
}
variable "subnetworks" {
  type = list(object({
    name                     = string
    source_ip_ranges_to_nat  = list(string)
    secondary_ip_range_names = optional(list(string))
  }))
  description = "Per-subnet NAT config; used only when source_subnetwork_ip_ranges_to_nat = LIST_OF_SUBNETWORKS. source_ip_ranges_to_nat items: ALL_IP_RANGES, PRIMARY_IP_RANGE, or LIST_OF_SECONDARY_IP_RANGES."
  default     = []
}
# ---- Port allocation -------------------------------------------------------
variable "min_ports_per_vm" {
  type        = number
  description = "Minimum number of ports allocated to each VM. Raise for connection-heavy workloads to avoid port exhaustion."
  default     = 64
  validation {
    condition     = var.min_ports_per_vm >= 64 && var.min_ports_per_vm <= 65536
    error_message = "min_ports_per_vm must be between 64 and 65536 (and ideally a power of two)."
  }
}
variable "max_ports_per_vm" {
  type        = number
  description = "Maximum ports per VM when enable_dynamic_port_allocation = true. Must be >= min_ports_per_vm."
  default     = 2048
}
variable "enable_dynamic_port_allocation" {
  type        = bool
  description = "Let Cloud NAT scale ports per VM between min and max on demand. Mutually exclusive with endpoint-independent mapping (module forces EIM off when true)."
  default     = false
}
variable "enable_endpoint_independent_mapping" {
  type        = bool
  description = "Enable endpoint-independent mapping. Ignored (forced false) when enable_dynamic_port_allocation = true."
  default     = false
}
# ---- Idle timeouts ---------------------------------------------------------
variable "udp_idle_timeout_sec" {
  type        = number
  description = "UDP idle timeout in seconds."
  default     = 30
}
variable "icmp_idle_timeout_sec" {
  type        = number
  description = "ICMP idle timeout in seconds."
  default     = 30
}
variable "tcp_established_idle_timeout_sec" {
  type        = number
  description = "TCP established connection idle timeout in seconds."
  default     = 1200
}
variable "tcp_transitory_idle_timeout_sec" {
  type        = number
  description = "TCP transitory connection idle timeout in seconds."
  default     = 30
}
variable "tcp_time_wait_timeout_sec" {
  type        = number
  description = "Timeout (seconds) for TCP connections in TIME_WAIT before the port is reusable."
  default     = 120
}
# ---- Logging ---------------------------------------------------------------
variable "log_enable" {
  type        = bool
  description = "Enable Cloud NAT logging to Cloud Logging."
  default     = true
}
variable "log_filter" {
  type        = string
  description = "Which NAT events to log: ERRORS_ONLY, TRANSLATIONS_ONLY, or ALL."
  default     = "ERRORS_ONLY"
  validation {
    condition     = contains(["ERRORS_ONLY", "TRANSLATIONS_ONLY", "ALL"], var.log_filter)
    error_message = "log_filter must be ERRORS_ONLY, TRANSLATIONS_ONLY, or ALL."
  }
}
# outputs.tf
output "nat_id" {
  description = "The fully-qualified ID of the Cloud NAT gateway."
  value       = google_compute_router_nat.this.id
}
output "nat_name" {
  description = "The name of the Cloud NAT gateway."
  value       = google_compute_router_nat.this.name
}
output "router_name" {
  description = "The name of the Cloud Router the NAT is attached to (created or supplied)."
  value       = local.router_name
}
output "router_id" {
  description = "The ID of the created Cloud Router, or null when attaching to an existing router."
  value       = local.create_router ? google_compute_router.this[0].id : null
}
output "router_self_link" {
  description = "The self-link of the created Cloud Router, or null when attaching to an existing router."
  value       = local.create_router ? google_compute_router.this[0].self_link : null
}
output "nat_ip_allocate_option" {
  description = "The NAT IP allocation mode in effect (AUTO_ONLY or MANUAL_ONLY)."
  value       = google_compute_router_nat.this.nat_ip_allocate_option
}
output "nat_ips" {
  description = "The set of reserved external IP self-links in use (empty for AUTO_ONLY)."
  value       = google_compute_router_nat.this.nat_ips
}
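The validation blocks in variables.tf mostly check one input at a time, so combinations such as MANUAL_ONLY with an empty nat_ips list, or LIST_OF_SUBNETWORKS with no subnetworks, would only surface as an API error at apply time. A minimal sketch of how the module could reject them at plan time with preconditions follows. The terraform_data resource and its name input_guard are assumptions for illustration and are not part of the module as published; the rules themselves are taken from the variable descriptions above.

```hcl
# Hypothetical addition to main.tf: fail the plan early on input combinations
# that the per-variable validations cannot see.
resource "terraform_data" "input_guard" {
  lifecycle {
    # MANUAL_ONLY needs at least one reserved address to hand out.
    precondition {
      condition     = var.nat_ip_allocate_option != "MANUAL_ONLY" || length(var.nat_ips) > 0
      error_message = "nat_ips must contain at least one address when nat_ip_allocate_option = MANUAL_ONLY."
    }

    # LIST_OF_SUBNETWORKS with an empty list would NAT nothing.
    precondition {
      condition     = var.source_subnetwork_ip_ranges_to_nat != "LIST_OF_SUBNETWORKS" || length(var.subnetworks) > 0
      error_message = "subnetworks must not be empty when source_subnetwork_ip_ranges_to_nat = LIST_OF_SUBNETWORKS."
    }

    # Mirrors the rule stated in the max_ports_per_vm description.
    precondition {
      condition     = !var.enable_dynamic_port_allocation || var.max_ports_per_vm >= var.min_ports_per_vm
      error_message = "max_ports_per_vm must be >= min_ports_per_vm when dynamic port allocation is enabled."
    }
  }
}
```

Preconditions are used here rather than variable validation because they can reference any number of variables and work on older Terraform releases that predate cross-variable validation.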
How to use it
module "cloud_nat" {
  source     = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-nat?ref=v1.0.0"
  project_id = "kloudvin-prod-net"
  name       = "nat-prod-asia-south1"
  region     = "asia-south1"
  # Create a dedicated router for this NAT and attach it to the Shared VPC.
  create_router = true
  network       = google_compute_network.shared_vpc.self_link
  # Stable egress IPs that the partner payment gateway allow-lists.
  nat_ip_allocate_option = "MANUAL_ONLY"
  nat_ips = [
    google_compute_address.nat[0].self_link,
    google_compute_address.nat[1].self_link,
  ]
  # NAT only the private GKE node subnet and its pod/service secondary ranges.
  source_subnetwork_ip_ranges_to_nat = "LIST_OF_SUBNETWORKS"
  subnetworks = [
    {
      name                     = google_compute_subnetwork.gke.id
      source_ip_ranges_to_nat  = ["ALL_IP_RANGES"]
      secondary_ip_range_names = []
    },
  ]
  # Connection-heavy nodes: let NAT scale ports on demand.
  enable_dynamic_port_allocation = true
  min_ports_per_vm               = 256
  max_ports_per_vm               = 8192
  log_enable                     = true
  log_filter                     = "ERRORS_ONLY"
}
# Downstream: a monitoring alert that fires on dropped (port-exhausted) NAT
# allocations, scoped to this gateway via the exported NAT name.
resource "google_monitoring_alert_policy" "nat_dropped" {
  project      = "kloudvin-prod-net"
  display_name = "Cloud NAT dropped allocations - ${module.cloud_nat.nat_name}"
  combiner     = "OR"
  conditions {
    display_name = "Dropped sent packets > 0"
    condition_threshold {
      filter = join("", [
        "resource.type = \"nat_gateway\" AND ",
        "metric.type = \"router.googleapis.com/nat/dropped_sent_packets_count\" AND ",
        "resource.labels.gateway_name = \"${module.cloud_nat.nat_name}\"",
      ])
      comparison      = "COMPARISON_GT"
      threshold_value = 0
      duration        = "300s"
      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_RATE"
      }
    }
  }
}
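The example above references google_compute_address.nat without defining it. A minimal sketch of the two reserved regional addresses it assumes is shown here; the address names and the extra output are illustrative choices, and the network and subnetwork resources are assumed to exist elsewhere in the configuration.

```hcl
# Two reserved regional external IPs for the MANUAL_ONLY pool.
# They must live in the same region as the NAT gateway.
resource "google_compute_address" "nat" {
  count        = 2
  project      = "kloudvin-prod-net"
  name         = "nat-prod-asia-south1-ip-${count.index}" # hypothetical naming
  region       = "asia-south1"
  address_type = "EXTERNAL"
}

# The module's nat_ips output carries self-links, so the literal addresses
# a partner needs for allow-listing are read from the address resources.
output "nat_egress_addresses" {
  description = "Literal egress IPs to hand to the partner for allow-listing."
  value       = google_compute_address.nat[*].address
}
```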
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
  backend  = "gcs"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...GCS state bucket + per-path prefix...
  }
}
2. Module config — live/prod/cloud_nat/terragrunt.hcl:
include "root" {
  path = find_in_parent_folders()
}
terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-nat?ref=v1.0.0"
}
inputs = {
  project_id = "..."
  name       = "..."
  region     = "..."
  network    = "..." # required because create_router defaults to true
}
3. Deploy one environment, or roll out all modules together:
# Both commands are run from the repository root.
(cd live/prod/cloud_nat && terragrunt apply) # this module only
(cd live/prod && terragrunt run-all apply)   # every module under live/prod
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
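The root config shown earlier only generates the backend. If the provider is also to live in one place, a generate block along the following lines could sit next to remote_state in live/terragrunt.hcl. This is a sketch: the project and region values are the Quickstart placeholders, and the file name provider.tf is an assumption.

```hcl
# live/terragrunt.hcl (addition): write a provider.tf into every module's
# working directory so child configs do not declare the provider themselves.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "google" {
  project = "my-project"
  region  = "us-central1"
}
EOF
}
```

With this in place the per-environment terragrunt.hcl stays limited to the source and inputs blocks.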
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| project_id | string | — | yes | GCP project ID that owns the NAT gateway. |
| name | string | — | yes | NAT gateway name (1-63 chars, lowercase RFC1035). |
| region | string | — | yes | Region for the NAT; must match the router’s region. |
| create_router | bool | true | no | Create a Cloud Router for this NAT, or attach to an existing one. |
| router_name | string | null | no | Existing router name (required when create_router = false); otherwise the created router’s name (default <name>-router). |
| network | string | null | yes, when create_router = true (the default) | VPC self-link or name; required when create_router = true. |
| router_description | string | null | no | Description for the router when create_router = true. |
| nat_ip_allocate_option | string | "AUTO_ONLY" | no | AUTO_ONLY (Google-managed) or MANUAL_ONLY (reserved IPs). |
| nat_ips | list(string) | [] | no | Reserved external IP self-links when MANUAL_ONLY. |
| drain_nat_ips | list(string) | [] | no | NAT IP self-links to drain while removing them (MANUAL_ONLY). |
| source_subnetwork_ip_ranges_to_nat | string | "ALL_SUBNETWORKS_ALL_IP_RANGES" | no | Which subnet ranges receive NAT. |
| subnetworks | list(object) | [] | no | Per-subnet NAT config for LIST_OF_SUBNETWORKS mode. |
| min_ports_per_vm | number | 64 | no | Minimum NAT ports per VM (64-65536). |
| max_ports_per_vm | number | 2048 | no | Maximum NAT ports per VM when dynamic allocation is on. |
| enable_dynamic_port_allocation | bool | false | no | Scale ports per VM between min and max; disables EIM. |
| enable_endpoint_independent_mapping | bool | false | no | Endpoint-independent mapping (forced off when DPA is on). |
| udp_idle_timeout_sec | number | 30 | no | UDP idle timeout (seconds). |
| icmp_idle_timeout_sec | number | 30 | no | ICMP idle timeout (seconds). |
| tcp_established_idle_timeout_sec | number | 1200 | no | TCP established idle timeout (seconds). |
| tcp_transitory_idle_timeout_sec | number | 30 | no | TCP transitory idle timeout (seconds). |
| tcp_time_wait_timeout_sec | number | 120 | no | TCP TIME_WAIT timeout (seconds). |
| log_enable | bool | true | no | Enable Cloud NAT logging. |
| log_filter | string | "ERRORS_ONLY" | no | ERRORS_ONLY, TRANSLATIONS_ONLY, or ALL. |
Outputs
| Name | Description |
|---|---|
| nat_id | Fully-qualified ID of the Cloud NAT gateway. |
| nat_name | Name of the NAT gateway; used to scope metrics and alerts. |
| router_name | Name of the router the NAT is attached to (created or supplied). |
| router_id | ID of the created router, or null when attaching to an existing one. |
| router_self_link | Self-link of the created router, or null when attaching to an existing one. |
| nat_ip_allocate_option | The NAT IP allocation mode in effect. |
| nat_ips | Reserved external IP self-links in use (empty for AUTO_ONLY). |
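The router_id and router_self_link outputs are null whenever the module attaches to a router it did not create. A short sketch of that path, assuming a router named google_compute_router.existing is managed elsewhere in the same configuration (that resource name is hypothetical):

```hcl
module "cloud_nat_existing_router" {
  source     = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-nat?ref=v1.0.0"
  project_id = "kloudvin-prod-net"
  name       = "nat-prod-asia-south1"
  region     = "asia-south1"

  # Attach to a router owned by another part of the configuration.
  # network is not needed on this path because no router is created.
  create_router = false
  router_name   = google_compute_router.existing.name # hypothetical resource
}

# Always populated: the router the NAT ended up on.
output "nat_router_name" {
  value = module.cloud_nat_existing_router.router_name
}

# Null on this path, because the module did not create the router.
output "nat_router_id" {
  value = module.cloud_nat_existing_router.router_id
}
```

Downstream code that needs the router's ID in this mode should read it from the router resource itself rather than from the module.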
Enterprise scenario
A fintech enterprise runs private GKE clusters across asia-south1 and asia-southeast1 in a Shared VPC landing zone, with zero external IPs on any node for compliance. Each region instantiates this module once with MANUAL_ONLY allocation and two reserved static egress IPs, which the upstream card-network and KYC providers allow-list — so outbound calls succeed without ever exposing a node publicly. Because the CI runners and batch jobs open thousands of short-lived connections, the team enables dynamic port allocation (min_ports_per_vm = 256, max_ports_per_vm = 8192) and wires the exported nat_name into a Cloud Monitoring alert on dropped_sent_packets_count, turning silent port exhaustion into a paged incident instead of a mystery.
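One way the per-region instantiation in this scenario might look is a single module call with for_each over the two regions. This is a sketch under assumptions: the project ID and the google_compute_network.shared_vpc reference are borrowed from the usage example, and the address names and local names are made up for illustration.

```hcl
locals {
  nat_regions = ["asia-south1", "asia-southeast1"]

  # One entry per (region, index) pair,
  # e.g. "asia-south1-0" => "asia-south1".
  nat_address_regions = {
    for pair in setproduct(local.nat_regions, ["0", "1"]) :
    "${pair[0]}-${pair[1]}" => pair[0]
  }
}

# Two reserved egress IPs in each region.
resource "google_compute_address" "egress" {
  for_each     = local.nat_address_regions
  project      = "kloudvin-prod-net"
  name         = "nat-egress-${each.key}" # hypothetical naming
  region       = each.value
  address_type = "EXTERNAL"
}

module "cloud_nat" {
  for_each = toset(local.nat_regions)
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-nat?ref=v1.0.0"

  project_id = "kloudvin-prod-net"
  name       = "nat-prod-${each.key}"
  region     = each.key
  network    = google_compute_network.shared_vpc.self_link

  # Only the addresses reserved in this module instance's region.
  nat_ip_allocate_option = "MANUAL_ONLY"
  nat_ips = [
    for key, region in local.nat_address_regions :
    google_compute_address.egress[key].self_link if region == each.key
  ]

  enable_dynamic_port_allocation = true
  min_ports_per_vm               = 256
  max_ports_per_vm               = 8192
}
```

Each instance is then addressable as module.cloud_nat["asia-south1"] and module.cloud_nat["asia-southeast1"], so the alert policy from the usage example can be repeated per region with the same for_each.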
Best practices
- Use MANUAL_ONLY reserved IPs for anything an external party allow-lists. AUTO_ONLY addresses can change as the gateway scales, silently breaking partner firewall rules; reserve regional static IPs and size the pool (each IP provides 64,512 usable ports per protocol) for peak concurrent connections.
- Prefer dynamic port allocation for high fan-out workloads, setting min_ports_per_vm to a sane floor and max_ports_per_vm to a ceiling. It avoids over-reserving ports on idle VMs while absorbing bursts, which is far better than a single large static min_ports_per_vm. Remember it cannot coexist with endpoint-independent mapping.
- Scope NAT with LIST_OF_SUBNETWORKS rather than ALL_SUBNETWORKS_ALL_IP_RANGES in shared environments, and include pod/service secondary ranges explicitly for GKE. Blanket NAT can hand internet egress to subnets that were meant to stay private.
- Keep logging at ERRORS_ONLY in steady state, switching to ALL only while troubleshooting. ERRORS_ONLY surfaces dropped/exhausted allocations cheaply; translation logging (TRANSLATIONS_ONLY or ALL) on a busy gateway generates large Cloud Logging volumes and real cost.
- Alert on nat/dropped_sent_packets_count and nat/port_usage so port exhaustion is caught before users see intermittent connection failures; these are the canonical signals that you need more ports per VM or more NAT IPs.
- Deploy one NAT per region per VPC, named nat-<env>-<region>, and use drain_nat_ips when rotating MANUAL_ONLY addresses so in-flight connections drain gracefully instead of being severed.
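Sizing the MANUAL_ONLY pool comes down to simple arithmetic: each NAT IP provides 64,512 usable source ports per protocol (65,536 minus the 1,024 well-known ports), and every VM can claim up to max_ports_per_vm of them. A rough worst-case estimate can be kept next to the module call as locals. The VM count below is a made-up figure, and the formula assumes every VM reaches its ceiling at once, so treat the result as an upper bound rather than a requirement.

```hcl
locals {
  vm_count            = 300   # hypothetical fleet size behind this gateway
  max_ports_per_vm    = 8192  # ceiling used in the usage example
  usable_ports_per_ip = 64512 # 65,536 minus the 1,024 well-known ports

  # Worst case: all VMs at their port ceiling simultaneously.
  nat_ips_needed = ceil(
    local.vm_count * local.max_ports_per_vm / local.usable_ports_per_ip
  )
}

output "nat_ips_needed" {
  description = "Upper-bound estimate of reserved NAT IPs for this gateway."
  value       = local.nat_ips_needed
}
```

With these numbers the estimate is 39 addresses (2,457,600 ports divided by 64,512, rounded up). In practice dynamic port allocation means most VMs sit near min_ports_per_vm, so the nat/port_usage metric is the better guide to how far below that bound the pool can safely go.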