Terraform Module: GCP Cloud NAT — private egress without external IPs

Quick take — A reusable Terraform module for google_compute_router_nat on hashicorp/google ~> 5.0: var-driven subnetwork NAT selection, dynamic port allocation, min/max ports per VM, and logging for private VM and GKE outbound internet access. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "google" {
  project = "my-project"
  region  = "us-central1"
}

module "cloud_nat" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-nat?ref=v1.0.0"

  project_id = "..."  # GCP project ID that owns the NAT gateway.
  name       = "..."  # NAT gateway name (1-63 chars, lowercase RFC1035).
  region     = "..."  # Region for the NAT; must match the router's region.
  network    = "..."  # VPC name or self-link; required because create_router defaults to true.
}

Then run terraform init && terraform apply. Every other input has a default; see Inputs below to override behaviour.

What this module is

Cloud NAT is GCP’s fully managed, distributed Network Address Translation service. It lets instances without external IP addresses — private GKE nodes, Compute Engine VMs, Cloud Run jobs on a connector — reach the internet for outbound traffic (package mirrors, OS updates, third-party APIs) while staying unreachable from the outside. Unlike a NAT appliance, there is no proxy VM in the data path and nothing to scale or patch: Google programs the translation into the VPC’s software-defined network, so throughput grows with your instances.
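
For context, an instance "without an external IP" is simply one whose network interface has no access_config block. A minimal sketch, assuming a hypothetical subnetwork google_compute_subnetwork.private and placeholder project, name, zone, machine type and image that are not part of this module:

```hcl
# Illustrative only: project, name, zone, machine type and image are assumptions.
resource "google_compute_instance" "private_vm" {
  project      = "my-project"
  name         = "private-vm"
  machine_type = "e2-small"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    # Hypothetical subnetwork in the VPC served by the NAT gateway.
    subnetwork = google_compute_subnetwork.private.id

    # No access_config block here, so the VM gets no external IP and
    # depends on Cloud NAT for outbound internet access.
  }
}
```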

The one structural fact that trips people up is that Cloud NAT is not a standalone resource — google_compute_router_nat must attach to a Cloud Router in the same region and VPC. The router is just the control-plane anchor here; no BGP is required for NAT to work. Wrapping this in a module is worth it because the raw resource has sharp edges: nat_ips is only valid with MANUAL_ONLY allocation, the per-subnet subnetwork block only applies when source_subnetwork_ip_ranges_to_nat = "LIST_OF_SUBNETWORKS", and enable_dynamic_port_allocation is mutually exclusive with enable_endpoint_independent_mapping — set both and apply fails. This module turns those into validated variables, can either create its own router or attach to one you already manage, and exports the gateway ID and the resolved router name downstream resources need.
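
As a rough sketch of the attach-to-existing-router path described above: create_router and router_name are the module's own inputs, while the router name, gateway name and project ID are placeholders assumed for illustration.

```hcl
module "cloud_nat_existing_router" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-nat?ref=v1.0.0"

  project_id = "my-project"
  name       = "nat-us-central1"
  region     = "us-central1"

  # Attach to a router managed elsewhere instead of creating one.
  # "shared-router-us-central1" is a hypothetical name; the router must
  # already exist in the same region and VPC.
  create_router = false
  router_name   = "shared-router-us-central1"

  # network is not needed on this path; the module only uses it when it
  # creates the router itself.
}
```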

When to use it

Module structure

terraform-module-gcp-cloud-nat/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf
# versions.tf
terraform {
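  # 1.9+ is needed because the variable validations in variables.tf
  # reference other variables (e.g. router_name checks create_router).
  required_version = ">= 1.9.0"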

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}
# main.tf

locals {
  # Either create a router here, or attach the NAT to a router supplied by the
  # caller. Exactly one of the two paths resolves the router name used below.
  create_router = var.create_router
  router_name   = local.create_router ? google_compute_router.this[0].name : var.router_name

  # Dynamic port allocation and endpoint-independent mapping cannot both be on.
  # When DPA is enabled we force EIM off regardless of the input.
  endpoint_independent_mapping = (
    var.enable_dynamic_port_allocation ? false : var.enable_endpoint_independent_mapping
  )
}

# Optional control-plane router. NAT requires a router in the same region/VPC;
# no BGP config is needed for NAT-only use.
resource "google_compute_router" "this" {
  count = local.create_router ? 1 : 0

  name        = coalesce(var.router_name, "${var.name}-router")
  project     = var.project_id
  region      = var.region
  network     = var.network
  description = var.router_description
}

resource "google_compute_router_nat" "this" {
  name    = var.name
  project = var.project_id
  region  = var.region
  router  = local.router_name

  nat_ip_allocate_option = var.nat_ip_allocate_option
  # nat_ips is only valid with MANUAL_ONLY allocation.
  nat_ips = var.nat_ip_allocate_option == "MANUAL_ONLY" ? var.nat_ips : null
  # Drain (gracefully release) IPs being removed from a MANUAL_ONLY pool.
  drain_nat_ips = (
    var.nat_ip_allocate_option == "MANUAL_ONLY" ? var.drain_nat_ips : null
  )

  source_subnetwork_ip_ranges_to_nat = var.source_subnetwork_ip_ranges_to_nat

  # Per-subnet selection only applies in LIST_OF_SUBNETWORKS mode.
  dynamic "subnetwork" {
    for_each = (
      var.source_subnetwork_ip_ranges_to_nat == "LIST_OF_SUBNETWORKS"
      ? var.subnetworks
      : []
    )
    content {
      name                     = subnetwork.value.name
      source_ip_ranges_to_nat  = subnetwork.value.source_ip_ranges_to_nat
      secondary_ip_range_names = try(subnetwork.value.secondary_ip_range_names, null)
    }
  }

  # Port allocation. With DPA, Cloud NAT scales ports per VM between min and max.
  min_ports_per_vm                    = var.min_ports_per_vm
  max_ports_per_vm                    = var.enable_dynamic_port_allocation ? var.max_ports_per_vm : null
  enable_dynamic_port_allocation      = var.enable_dynamic_port_allocation
  enable_endpoint_independent_mapping = local.endpoint_independent_mapping

  # Connection idle timeouts (seconds).
  udp_idle_timeout_sec             = var.udp_idle_timeout_sec
  icmp_idle_timeout_sec            = var.icmp_idle_timeout_sec
  tcp_established_idle_timeout_sec = var.tcp_established_idle_timeout_sec
  tcp_transitory_idle_timeout_sec  = var.tcp_transitory_idle_timeout_sec
  tcp_time_wait_timeout_sec        = var.tcp_time_wait_timeout_sec

  log_config {
    enable = var.log_enable
    filter = var.log_filter
  }
}
# variables.tf

variable "project_id" {
  type        = string
  description = "GCP project ID that owns the Cloud NAT gateway."
}

variable "name" {
  type        = string
  description = "Name of the Cloud NAT gateway. Must be a valid GCP resource name."

  validation {
    condition     = can(regex("^[a-z]([-a-z0-9]{0,61}[a-z0-9])?$", var.name))
    error_message = "name must be 1-63 chars, lowercase letters, digits or hyphens, starting with a letter."
  }
}

variable "region" {
  type        = string
  description = "Region in which to create the NAT gateway (e.g. asia-south1). Must match the router's region."
}

# ---- Router wiring ---------------------------------------------------------

variable "create_router" {
  type        = bool
  description = "If true, create a Cloud Router for this NAT. If false, attach to an existing router named by router_name."
  default     = true
}

variable "router_name" {
  type        = string
  description = "Name of the Cloud Router. Required when create_router = false; otherwise used as the created router's name (defaults to <name>-router)."
  default     = null

  validation {
    condition     = var.create_router || var.router_name != null
    error_message = "router_name must be set when create_router = false."
  }
}

variable "network" {
  type        = string
  description = "Self-link or name of the VPC network. Required only when create_router = true."
  default     = null

  validation {
    condition     = !var.create_router || var.network != null
    error_message = "network must be set when create_router = true."
  }
}

variable "router_description" {
  type        = string
  description = "Optional description applied to the router when create_router = true."
  default     = null
}

# ---- NAT IP allocation -----------------------------------------------------

variable "nat_ip_allocate_option" {
  type        = string
  description = "How NAT IPs are allocated: AUTO_ONLY (Google-managed, may change) or MANUAL_ONLY (reserved static IPs)."
  default     = "AUTO_ONLY"

  validation {
    condition     = contains(["AUTO_ONLY", "MANUAL_ONLY"], var.nat_ip_allocate_option)
    error_message = "nat_ip_allocate_option must be AUTO_ONLY or MANUAL_ONLY."
  }
}

variable "nat_ips" {
  type        = list(string)
  description = "Self-links of reserved regional external IPs to use when nat_ip_allocate_option = MANUAL_ONLY."
  default     = []
}

variable "drain_nat_ips" {
  type        = list(string)
  description = "Self-links of NAT IPs to drain (stop assigning new connections to) while removing them from a MANUAL_ONLY pool."
  default     = []
}

# ---- Subnetwork NAT selection ----------------------------------------------

variable "source_subnetwork_ip_ranges_to_nat" {
  type        = string
  description = "Which subnet ranges get NAT: ALL_SUBNETWORKS_ALL_IP_RANGES, ALL_SUBNETWORKS_ALL_PRIMARY_IP_RANGES, or LIST_OF_SUBNETWORKS."
  default     = "ALL_SUBNETWORKS_ALL_IP_RANGES"

  validation {
    condition = contains([
      "ALL_SUBNETWORKS_ALL_IP_RANGES",
      "ALL_SUBNETWORKS_ALL_PRIMARY_IP_RANGES",
      "LIST_OF_SUBNETWORKS",
    ], var.source_subnetwork_ip_ranges_to_nat)
    error_message = "Invalid source_subnetwork_ip_ranges_to_nat value."
  }
}

variable "subnetworks" {
  type = list(object({
    name                     = string
    source_ip_ranges_to_nat  = list(string)
    secondary_ip_range_names = optional(list(string))
  }))
  description = "Per-subnet NAT config; used only when source_subnetwork_ip_ranges_to_nat = LIST_OF_SUBNETWORKS. source_ip_ranges_to_nat items: ALL_IP_RANGES, PRIMARY_IP_RANGE, or LIST_OF_SECONDARY_IP_RANGES."
  default     = []
}

# ---- Port allocation -------------------------------------------------------

variable "min_ports_per_vm" {
  type        = number
  description = "Minimum number of ports allocated to each VM. Raise for connection-heavy workloads to avoid port exhaustion."
  default     = 64

  validation {
    condition     = var.min_ports_per_vm >= 64 && var.min_ports_per_vm <= 65536
    error_message = "min_ports_per_vm must be between 64 and 65536 (and ideally a power of two)."
  }
}

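variable "max_ports_per_vm" {
  type        = number
  description = "Maximum ports per VM when enable_dynamic_port_allocation = true. Must be >= min_ports_per_vm."
  default     = 2048

  # Only enforced when DPA is on, since max_ports_per_vm is ignored otherwise.
  validation {
    condition     = !var.enable_dynamic_port_allocation || var.max_ports_per_vm >= var.min_ports_per_vm
    error_message = "max_ports_per_vm must be >= min_ports_per_vm when enable_dynamic_port_allocation = true."
  }
}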

variable "enable_dynamic_port_allocation" {
  type        = bool
  description = "Let Cloud NAT scale ports per VM between min and max on demand. Mutually exclusive with endpoint-independent mapping (module forces EIM off when true)."
  default     = false
}

variable "enable_endpoint_independent_mapping" {
  type        = bool
  description = "Enable endpoint-independent mapping. Ignored (forced false) when enable_dynamic_port_allocation = true."
  default     = false
}

# ---- Idle timeouts ---------------------------------------------------------

variable "udp_idle_timeout_sec" {
  type        = number
  description = "UDP idle timeout in seconds."
  default     = 30
}

variable "icmp_idle_timeout_sec" {
  type        = number
  description = "ICMP idle timeout in seconds."
  default     = 30
}

variable "tcp_established_idle_timeout_sec" {
  type        = number
  description = "TCP established connection idle timeout in seconds."
  default     = 1200
}

variable "tcp_transitory_idle_timeout_sec" {
  type        = number
  description = "TCP transitory connection idle timeout in seconds."
  default     = 30
}

variable "tcp_time_wait_timeout_sec" {
  type        = number
  description = "Timeout (seconds) for TCP connections in TIME_WAIT before the port is reusable."
  default     = 120
}

# ---- Logging ---------------------------------------------------------------

variable "log_enable" {
  type        = bool
  description = "Enable Cloud NAT logging to Cloud Logging."
  default     = true
}

variable "log_filter" {
  type        = string
  description = "Which NAT events to log: ERRORS_ONLY, TRANSLATIONS_ONLY, or ALL."
  default     = "ERRORS_ONLY"

  validation {
    condition     = contains(["ERRORS_ONLY", "TRANSLATIONS_ONLY", "ALL"], var.log_filter)
    error_message = "log_filter must be ERRORS_ONLY, TRANSLATIONS_ONLY, or ALL."
  }
}
# outputs.tf

output "nat_id" {
  description = "The fully-qualified ID of the Cloud NAT gateway."
  value       = google_compute_router_nat.this.id
}

output "nat_name" {
  description = "The name of the Cloud NAT gateway."
  value       = google_compute_router_nat.this.name
}

output "router_name" {
  description = "The name of the Cloud Router the NAT is attached to (created or supplied)."
  value       = local.router_name
}

output "router_id" {
  description = "The ID of the created Cloud Router, or null when attaching to an existing router."
  value       = local.create_router ? google_compute_router.this[0].id : null
}

output "router_self_link" {
  description = "The self-link of the created Cloud Router, or null when attaching to an existing router."
  value       = local.create_router ? google_compute_router.this[0].self_link : null
}

output "nat_ip_allocate_option" {
  description = "The NAT IP allocation mode in effect (AUTO_ONLY or MANUAL_ONLY)."
  value       = google_compute_router_nat.this.nat_ip_allocate_option
}

output "nat_ips" {
  description = "The set of reserved external IP self-links in use (empty for AUTO_ONLY)."
  value       = google_compute_router_nat.this.nat_ips
}

How to use it

module "cloud_nat" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-nat?ref=v1.0.0"

  project_id = "kloudvin-prod-net"
  name       = "nat-prod-asia-south1"
  region     = "asia-south1"

  # Create a dedicated router for this NAT and attach it to the Shared VPC.
  create_router = true
  network       = google_compute_network.shared_vpc.self_link

  # Stable egress IPs that the partner payment gateway allow-lists.
  nat_ip_allocate_option = "MANUAL_ONLY"
  nat_ips = [
    google_compute_address.nat[0].self_link,
    google_compute_address.nat[1].self_link,
  ]

  # NAT only the private GKE node subnet and its pod/service secondary ranges.
  source_subnetwork_ip_ranges_to_nat = "LIST_OF_SUBNETWORKS"
  subnetworks = [
    {
      name                     = google_compute_subnetwork.gke.id
      source_ip_ranges_to_nat  = ["ALL_IP_RANGES"]
      secondary_ip_range_names = []
    },
  ]

  # Connection-heavy nodes: let NAT scale ports on demand.
  enable_dynamic_port_allocation = true
  min_ports_per_vm               = 256
  max_ports_per_vm               = 8192

  log_enable = true
  log_filter = "ERRORS_ONLY"
}

# Downstream: a monitoring alert that fires on dropped (port-exhausted) NAT
# allocations, scoped to this gateway via the exported NAT name.
resource "google_monitoring_alert_policy" "nat_dropped" {
  project      = "kloudvin-prod-net"
  display_name = "Cloud NAT dropped allocations - ${module.cloud_nat.nat_name}"
  combiner     = "OR"

  conditions {
    display_name = "Dropped sent packets > 0"
    condition_threshold {
      filter = join("", [
        "resource.type = \"nat_gateway\" AND ",
        "metric.type = \"router.googleapis.com/nat/dropped_sent_packets_count\" AND ",
        "resource.labels.gateway_name = \"${module.cloud_nat.nat_name}\"",
      ])
      comparison      = "COMPARISON_GT"
      threshold_value = 0
      duration        = "300s"
      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_RATE"
      }
    }
  }
}
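
The example above references google_compute_address.nat[0] and [1] without showing them. One way they might be defined, assuming the names below (they are not part of the module) and the same project and region as the gateway:

```hcl
# Assumed definition of the two reserved egress IPs used above.
resource "google_compute_address" "nat" {
  count = 2

  project      = "kloudvin-prod-net"
  name         = "nat-prod-asia-south1-${count.index}"
  region       = "asia-south1" # must match the NAT gateway's region
  address_type = "EXTERNAL"
}
```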

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config: live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "gcs"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...GCS state bucket + per-path prefix...
  }
}

2. Module config: live/prod/cloud_nat/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-nat?ref=v1.0.0"
}

inputs = {
  project_id = "..."
  name       = "..."
  region     = "..."
  network    = "..." # required because create_router defaults to true
}

3. Deploy one environment, or roll out all modules together:

cd live/prod/cloud_nat && terragrunt apply   # this module only
cd live/prod && terragrunt run-all apply     # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
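
The root config shown earlier only covers remote state. A provider could be generated from the same root file along these lines; this is a sketch, and the project and region values are placeholders borrowed from the Quickstart rather than anything the module prescribes:

```hcl
# live/terragrunt.hcl (root), alongside the remote_state block
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "google" {
  project = "my-project"
  region  = "us-central1"
}
EOF
}
```

Each environment's terragrunt.hcl then inherits both the backend and the provider through its include block.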

Inputs

| Name | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| project_id | string | (none) | yes | GCP project ID that owns the NAT gateway. |
| name | string | (none) | yes | NAT gateway name (1-63 chars, lowercase RFC1035). |
| region | string | (none) | yes | Region for the NAT; must match the router's region. |
| create_router | bool | true | no | Create a Cloud Router for this NAT, or attach to an existing one. |
| router_name | string | null | only when create_router = false | Existing router name when create_router = false; otherwise the created router's name (default <name>-router). |
| network | string | null | only when create_router = true (the default) | VPC self-link or name, used for the created router. |
| router_description | string | null | no | Description for the router when create_router = true. |
| nat_ip_allocate_option | string | "AUTO_ONLY" | no | AUTO_ONLY (Google-managed) or MANUAL_ONLY (reserved IPs). |
| nat_ips | list(string) | [] | no | Reserved external IP self-links when MANUAL_ONLY. |
| drain_nat_ips | list(string) | [] | no | NAT IP self-links to drain while removing them (MANUAL_ONLY). |
| source_subnetwork_ip_ranges_to_nat | string | "ALL_SUBNETWORKS_ALL_IP_RANGES" | no | Which subnet ranges receive NAT. |
| subnetworks | list(object) | [] | no | Per-subnet NAT config for LIST_OF_SUBNETWORKS mode. |
| min_ports_per_vm | number | 64 | no | Minimum NAT ports per VM (64-65536). |
| max_ports_per_vm | number | 2048 | no | Maximum NAT ports per VM when dynamic allocation is on. |
| enable_dynamic_port_allocation | bool | false | no | Scale ports per VM between min and max; disables EIM. |
| enable_endpoint_independent_mapping | bool | false | no | Endpoint-independent mapping (forced off when DPA is on). |
| udp_idle_timeout_sec | number | 30 | no | UDP idle timeout (seconds). |
| icmp_idle_timeout_sec | number | 30 | no | ICMP idle timeout (seconds). |
| tcp_established_idle_timeout_sec | number | 1200 | no | TCP established idle timeout (seconds). |
| tcp_transitory_idle_timeout_sec | number | 30 | no | TCP transitory idle timeout (seconds). |
| tcp_time_wait_timeout_sec | number | 120 | no | TCP TIME_WAIT timeout (seconds). |
| log_enable | bool | true | no | Enable Cloud NAT logging. |
| log_filter | string | "ERRORS_ONLY" | no | ERRORS_ONLY, TRANSLATIONS_ONLY, or ALL. |
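
When sizing min_ports_per_vm against a fixed pool of NAT IPs, it can help to estimate how many VMs that pool can serve. Each NAT IP provides 64,512 usable source ports per protocol (65,536 minus the 1,024 well-known ports). The locals below are a back-of-envelope sketch with hypothetical names; they are not part of the module, and with dynamic port allocation the real figure is lower whenever VMs grow past their minimum.

```hcl
locals {
  ports_per_nat_ip = 64512 # 65536 - 1024 well-known ports
  nat_ip_count     = 2     # assumed size of the MANUAL_ONLY pool
  min_ports_per_vm = 256   # assumed per-VM minimum

  # Upper bound on VMs if every VM holds only its minimum allocation.
  max_vms_at_min = floor(local.nat_ip_count * local.ports_per_nat_ip / local.min_ports_per_vm)
}

output "max_vms_at_min" {
  value = local.max_vms_at_min # 2 * 64512 / 256 = 504
}
```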

Outputs

| Name | Description |
| --- | --- |
| nat_id | Fully-qualified ID of the Cloud NAT gateway. |
| nat_name | Name of the NAT gateway; used to scope metrics and alerts. |
| router_name | Name of the router the NAT is attached to (created or supplied). |
| router_id | ID of the created router, or null when attaching to an existing one. |
| router_self_link | Self-link of the created router, or null when attaching to an existing one. |
| nat_ip_allocate_option | The NAT IP allocation mode in effect. |
| nat_ips | Reserved external IP self-links in use (empty for AUTO_ONLY). |

Enterprise scenario

A fintech enterprise runs private GKE clusters across asia-south1 and asia-southeast1 in a Shared VPC landing zone, with zero external IPs on any node for compliance. Each region instantiates this module once with MANUAL_ONLY allocation and two reserved static egress IPs, which the upstream card-network and KYC providers allow-list — so outbound calls succeed without ever exposing a node publicly. Because the CI runners and batch jobs open thousands of short-lived connections, the team enables dynamic port allocation (min_ports_per_vm = 256, max_ports_per_vm = 8192) and wires the exported nat_name into a Cloud Monitoring alert on dropped_sent_packets_count, turning silent port exhaustion into a paged incident instead of a mystery.
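
The per-region rollout in this scenario could be expressed with for_each on the module. This is a sketch under assumptions: the address resource, the naming scheme, and google_compute_network.shared_vpc (reused from the usage example) are illustrative, and per-subnet selection and logging are left at their defaults for brevity.

```hcl
locals {
  nat_regions = toset(["asia-south1", "asia-southeast1"])

  # One entry per reserved IP: "<region>-<index>" => region.
  nat_addresses = merge([
    for region in local.nat_regions : {
      for i in range(2) : "${region}-${i}" => region
    }
  ]...)
}

# Two reserved static egress IPs per region (hypothetical names).
resource "google_compute_address" "nat" {
  for_each = local.nat_addresses

  project      = "kloudvin-prod-net"
  name         = "nat-prod-${each.key}"
  region       = each.value
  address_type = "EXTERNAL"
}

# One module instance per region, each with its own router.
module "cloud_nat" {
  for_each = local.nat_regions
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-nat?ref=v1.0.0"

  project_id = "kloudvin-prod-net"
  name       = "nat-prod-${each.key}"
  region     = each.key
  network    = google_compute_network.shared_vpc.self_link

  nat_ip_allocate_option = "MANUAL_ONLY"
  nat_ips = [
    for key, region in local.nat_addresses :
    google_compute_address.nat[key].self_link if region == each.key
  ]

  enable_dynamic_port_allocation = true
  min_ports_per_vm               = 256
  max_ports_per_vm               = 8192
}
```

With for_each, the alert policy from the usage example would reference module.cloud_nat["asia-south1"].nat_name (or iterate over the module map) rather than module.cloud_nat.nat_name.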

Best practices
