
Terraform Module: GCP GKE Node Pool — Decoupled, Auto-Repairing Worker Capacity for Your Clusters

Quick take — A production-ready Terraform module for google_container_node_pool on hashicorp/google ~> 5.0: autoscaling, surge upgrades, Workload Identity, Spot nodes, taints, and least-privilege node service accounts. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "google" {
  project = "my-project"
  region  = "us-central1"
}

module "gke_node_pool" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-gke-node-pool?ref=v1.0.0"

  name         = "..."  # Node pool name; lowercase, DNS-compatible, <= 40 chars.
  project_id   = "..."  # GCP project ID owning the cluster.
  location     = "..."  # Region (regional pool) or zone (zonal pool).
  cluster_name = "..."  # Existing GKE cluster to attach the pool to.
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.

What this module is

A GKE node pool is a group of nodes within a Google Kubernetes Engine cluster that all share the same configuration: machine type, disk, OAuth scopes, labels, taints, and lifecycle behaviour. The cluster control plane is managed by Google, but the worker capacity that actually runs your Pods lives in node pools — and almost every real workload needs more than one. You typically run a small general-purpose pool for system add-ons, a larger pool for stateless apps, and perhaps a Spot or GPU pool for batch and ML.

The reason to wrap google_container_node_pool in a reusable module — rather than defining it inline next to the cluster — is decoupling and repetition. Node pools are the part of GKE you change most often: you resize them, swap machine families during cost reviews, add Spot capacity, roll new node images, and occasionally recreate them entirely. Keeping the pool in its own module lets you add, scale, or destroy worker capacity without ever touching the google_container_cluster resource (and risking control-plane churn). It also forces consistency: every pool created through this module gets autoscaling bounds, surge-upgrade settings, auto-repair/auto-upgrade, Workload Identity metadata, a least-privilege service account, and shielded-node defaults — instead of each team hand-rolling a slightly different node_config.
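Decoupling only works if the cluster resource does not own a node pool of its own. Below is a minimal sketch of a companion cluster definition, assuming a regional cluster. The resource label `primary`, the cluster name, and the project ID are hypothetical. It deletes the default pool that GKE creates at cluster-creation time. It also enables the Workload Identity pool, which the module's `GKE_METADATA` mode depends on.

```hcl
# Hypothetical companion cluster: control plane only, no worker capacity.
resource "google_container_cluster" "primary" {
  name     = "example-gke"     # hypothetical cluster name
  project  = "example-project" # hypothetical project ID
  location = "asia-south1"     # regional cluster

  # GKE must create a default pool; remove it right away so every pool is
  # managed through the node pool module instead.
  remove_default_node_pool = true
  initial_node_count       = 1

  # Needed for the module's enable_workload_identity = true (GKE_METADATA).
  workload_identity_config {
    workload_pool = "example-project.svc.id.goog"
  }
}
```

Passing `cluster_name = google_container_cluster.primary.name` into the module also gives Terraform the dependency edge, so the cluster always exists before any pool is created.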

When to use it

Module structure

terraform-module-gcp-gke-node-pool/
├── versions.tf      # provider + Terraform version pins
├── main.tf          # google_container_node_pool resource
├── variables.tf     # all input variables + validations
└── outputs.tf       # id, name, version, instance group URLs, node SA

versions.tf

terraform {
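  # >= 1.9 because the max_node_count validation references min_node_count;
  # older versions only let a validation refer to its own variable.
  required_version = ">= 1.9.0"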

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

main.tf

locals {
  # GKE_METADATA requires Workload Identity (a workload pool) on the cluster;
  # otherwise nodes fall back to the plain GCE metadata server.
  workload_metadata_mode = var.enable_workload_identity ? "GKE_METADATA" : "GCE_METADATA"

  # Google recommends the broad cloud-platform scope, with access restricted
  # through IAM roles on the node service account; callers can append extras.
  oauth_scopes = distinct(concat(
    ["https://www.googleapis.com/auth/cloud-platform"],
    var.additional_oauth_scopes,
  ))
}

resource "google_container_node_pool" "this" {
  provider = google

  name     = var.name
  project  = var.project_id
  location = var.location
  cluster  = var.cluster_name

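  # Either a fixed size OR autoscaling, never both. node_count is omitted
  # when autoscaling is on so GKE can own the replica count; the pool is
  # seeded with min_node_count nodes per zone at creation instead.
  node_count         = var.enable_autoscaling ? null : var.node_count
  initial_node_count = var.enable_autoscaling ? var.min_node_count : null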

  # Pin nodes to an explicit set of zones; null inherits the cluster-level
  # node_locations.
  node_locations = length(var.node_locations) > 0 ? var.node_locations : null

  max_pods_per_node = var.max_pods_per_node

  dynamic "autoscaling" {
    for_each = var.enable_autoscaling ? [1] : []
    content {
      min_node_count  = var.min_node_count
      max_node_count  = var.max_node_count
      location_policy = var.autoscaling_location_policy
    }
  }

  management {
    auto_repair  = var.auto_repair
    auto_upgrade = var.auto_upgrade
  }

  # Surge upgrades keep capacity available while nodes roll. max_surge adds
  # temporary nodes; max_unavailable bounds disruption.
  upgrade_settings {
    strategy        = var.upgrade_strategy
    max_surge       = var.upgrade_strategy == "SURGE" ? var.max_surge : null
    max_unavailable = var.upgrade_strategy == "SURGE" ? var.max_unavailable : null
  }

  node_config {
    machine_type    = var.machine_type
    image_type      = var.image_type
    disk_size_gb    = var.disk_size_gb
    disk_type       = var.disk_type
    spot            = var.spot
    service_account = var.node_service_account
    oauth_scopes    = local.oauth_scopes

    labels = var.node_labels
    tags   = var.network_tags

    # Shielded nodes: verified boot + integrity monitoring against rootkits.
    shielded_instance_config {
      enable_secure_boot          = var.enable_secure_boot
      enable_integrity_monitoring = var.enable_integrity_monitoring
    }

    # Bind the GCE metadata server to Workload Identity so Pods get GCP IAM
    # via KSA->GSA federation instead of node-level credentials.
    workload_metadata_config {
      mode = local.workload_metadata_mode
    }

    dynamic "taint" {
      for_each = var.node_taints
      content {
        key    = taint.value.key
        value  = taint.value.value
        effect = taint.value.effect
      }
    }

    metadata = merge(
      { "disable-legacy-endpoints" = "true" },
      var.node_metadata,
    )

    resource_labels = var.resource_labels
  }

  lifecycle {
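    # node_count is null when autoscaling is on, so the autoscaler-owned live
    # count does not show up as drift. initial_node_count forces replacement
    # if it changes, so it is ignored after creation. node_labels is not
    # ignored, so label changes made through var.node_labels are applied.
    ignore_changes = [initial_node_count]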
  }

  timeouts {
    create = "30m"
    update = "30m"
    delete = "30m"
  }
}
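When `upgrade_strategy = "BLUE_GREEN"` is set, the resource above sets only the strategy and leaves GKE's default blue-green rollout settings in place. Below is a hedged sketch of how `upgrade_settings` could be extended to expose them. The two variables are hypothetical additions and are not part of v1.0.0. The GKE API documents `batch_percentage` as a fraction in the range (0.0, 1.0].

```hcl
# Hypothetical inputs (would go in variables.tf).
variable "blue_green_batch_percentage" {
  description = "Fraction of blue-pool nodes drained per batch, in (0.0, 1.0]."
  type        = number
  default     = 0.25
}

variable "blue_green_node_pool_soak_duration" {
  description = "How long the drained blue pool is kept for rollback."
  type        = string
  default     = "1800s"
}

# Replacement for the upgrade_settings block inside google_container_node_pool.this.
upgrade_settings {
  strategy        = var.upgrade_strategy
  max_surge       = var.upgrade_strategy == "SURGE" ? var.max_surge : null
  max_unavailable = var.upgrade_strategy == "SURGE" ? var.max_unavailable : null

  # Emitted only for blue-green upgrades.
  dynamic "blue_green_settings" {
    for_each = var.upgrade_strategy == "BLUE_GREEN" ? [1] : []
    content {
      node_pool_soak_duration = var.blue_green_node_pool_soak_duration
      standard_rollout_policy {
        batch_percentage    = var.blue_green_batch_percentage
        batch_soak_duration = "60s"
      }
    }
  }
}
```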

variables.tf

variable "name" {
  description = "Name of the node pool. Should be short and DNS-compatible."
  type        = string

  validation {
    condition     = can(regex("^[a-z][a-z0-9-]{0,38}[a-z0-9]$", var.name))
    error_message = "name must use lowercase letters, digits, and hyphens, start with a letter, end with a letter or digit, and be 2-40 chars."
  }
}

variable "project_id" {
  description = "GCP project ID that owns the cluster."
  type        = string
}

variable "location" {
  description = "Cluster location: a region (e.g. asia-south1) for regional pools or a zone for zonal."
  type        = string
}

variable "cluster_name" {
  description = "Name of the existing GKE cluster to attach this node pool to."
  type        = string
}

variable "node_locations" {
  description = "Optional explicit list of zones to spread nodes across. Empty inherits the cluster's node_locations."
  type        = list(string)
  default     = []
}

variable "machine_type" {
  description = "Compute Engine machine type for nodes (e.g. e2-standard-4, n2-standard-8)."
  type        = string
  default     = "e2-standard-4"
}

variable "image_type" {
  description = "Node image type. COS_CONTAINERD is the supported default for GKE."
  type        = string
  default     = "COS_CONTAINERD"

  validation {
    condition     = contains(["COS_CONTAINERD", "UBUNTU_CONTAINERD", "COS"], var.image_type)
    error_message = "image_type must be one of COS_CONTAINERD, UBUNTU_CONTAINERD, or COS."
  }
}

variable "disk_size_gb" {
  description = "Boot disk size per node in GB."
  type        = number
  default     = 100

  validation {
    condition     = var.disk_size_gb >= 30
    error_message = "disk_size_gb must be at least 30 GB (a floor set by this module; GKE itself accepts 10 GB)."
  }
}

variable "disk_type" {
  description = "Boot disk type (pd-standard, pd-balanced, pd-ssd)."
  type        = string
  default     = "pd-balanced"

  validation {
    condition     = contains(["pd-standard", "pd-balanced", "pd-ssd"], var.disk_type)
    error_message = "disk_type must be pd-standard, pd-balanced, or pd-ssd."
  }
}

variable "spot" {
  description = "Run nodes as Spot VMs (cheaper, preemptible). Use only for fault-tolerant workloads."
  type        = bool
  default     = false
}

variable "node_count" {
  description = "Fixed node count per zone when autoscaling is disabled."
  type        = number
  default     = 1
}

variable "enable_autoscaling" {
  description = "Enable the cluster autoscaler for this pool. When true, node_count is ignored."
  type        = bool
  default     = true
}

variable "min_node_count" {
  description = "Minimum nodes per zone when autoscaling is enabled."
  type        = number
  default     = 1
}

variable "max_node_count" {
  description = "Maximum nodes per zone when autoscaling is enabled."
  type        = number
  default     = 5

  validation {
    condition     = var.max_node_count >= var.min_node_count
    error_message = "max_node_count must be greater than or equal to min_node_count."
  }
}

variable "autoscaling_location_policy" {
  description = "Autoscaler placement policy: BALANCED (spread) or ANY (best-effort, Spot-friendly)."
  type        = string
  default     = "BALANCED"

  validation {
    condition     = contains(["BALANCED", "ANY"], var.autoscaling_location_policy)
    error_message = "autoscaling_location_policy must be BALANCED or ANY."
  }
}

variable "max_pods_per_node" {
  description = "Maximum Pods per node. Lower values conserve the cluster's IP range."
  type        = number
  default     = 110
}

variable "auto_repair" {
  description = "Automatically repair unhealthy nodes."
  type        = bool
  default     = true
}

variable "auto_upgrade" {
  description = "Automatically upgrade nodes to match the control plane version."
  type        = bool
  default     = true
}

variable "upgrade_strategy" {
  description = "Node upgrade strategy: SURGE (rolling with extra capacity) or BLUE_GREEN."
  type        = string
  default     = "SURGE"

  validation {
    condition     = contains(["SURGE", "BLUE_GREEN"], var.upgrade_strategy)
    error_message = "upgrade_strategy must be SURGE or BLUE_GREEN."
  }
}

variable "max_surge" {
  description = "Extra nodes added during a SURGE upgrade."
  type        = number
  default     = 1
}

variable "max_unavailable" {
  description = "Nodes that may be unavailable during a SURGE upgrade."
  type        = number
  default     = 0
}

variable "node_service_account" {
  description = "Email of the least-privilege IAM service account the nodes run as. Strongly recommended for prod; null falls back to the Compute Engine default service account."
  type        = string
  default     = null
}

variable "additional_oauth_scopes" {
  description = "Extra OAuth scopes beyond cloud-platform. Usually empty when using Workload Identity."
  type        = list(string)
  default     = []
}

variable "enable_workload_identity" {
  description = "Bind nodes to the GKE metadata server for Workload Identity (KSA->GSA federation)."
  type        = bool
  default     = true
}

variable "enable_secure_boot" {
  description = "Enable Shielded VM Secure Boot on nodes."
  type        = bool
  default     = true
}

variable "enable_integrity_monitoring" {
  description = "Enable Shielded VM integrity monitoring on nodes."
  type        = bool
  default     = true
}

variable "node_labels" {
  description = "Kubernetes labels applied to nodes (for nodeSelector/affinity)."
  type        = map(string)
  default     = {}
}

variable "node_taints" {
  description = "Kubernetes taints to keep general workloads off specialised pools."
  type = list(object({
    key    = string
    value  = string
    effect = string
  }))
  default = []

  validation {
    condition = alltrue([
      for t in var.node_taints : contains(["NO_SCHEDULE", "PREFER_NO_SCHEDULE", "NO_EXECUTE"], t.effect)
    ])
    error_message = "Each taint effect must be NO_SCHEDULE, PREFER_NO_SCHEDULE, or NO_EXECUTE."
  }
}

variable "network_tags" {
  description = "GCE network tags on node VMs (for firewall rule targeting)."
  type        = list(string)
  default     = []
}

variable "node_metadata" {
  description = "Additional GCE instance metadata key/value pairs for nodes."
  type        = map(string)
  default     = {}
}

variable "resource_labels" {
  description = "GCE resource labels applied to node VMs for billing/cost allocation."
  type        = map(string)
  default     = {}
}
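Cross-variable guards can also be written as a lifecycle precondition on the resource, which Terraform has supported since 1.2. Below is a sketch that assumes you want plan-time rejection of a SURGE configuration where both `max_surge` and `max_unavailable` are 0. GKE needs at least one of them above zero for a surge upgrade to make progress. This is an illustrative addition and is not part of the module as published.

```hcl
# Goes inside google_container_node_pool.this. A resource may have only one
# lifecycle block, so merge this precondition into the existing one.
lifecycle {
  # ...keep the existing ignore_changes list here...

  precondition {
    condition     = var.upgrade_strategy != "SURGE" || var.max_surge + var.max_unavailable > 0
    error_message = "For SURGE upgrades, max_surge and max_unavailable cannot both be 0."
  }
}
```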

outputs.tf

output "id" {
  description = "Fully qualified node pool ID (projects/.../nodePools/...)."
  value       = google_container_node_pool.this.id
}

output "name" {
  description = "Name of the node pool."
  value       = google_container_node_pool.this.name
}

output "version" {
  description = "Kubernetes version currently running on the node pool."
  value       = google_container_node_pool.this.version
}

output "instance_group_urls" {
  description = "Managed instance group URLs backing the node pool (one per zone)."
  value       = google_container_node_pool.this.instance_group_urls
}

output "managed_instance_group_urls" {
  description = "Managed instance group manager URLs for the node pool."
  value       = google_container_node_pool.this.managed_instance_group_urls
}

output "service_account" {
  description = "Service account email the nodes run as (resolved or default)."
  value       = try(google_container_node_pool.this.node_config[0].service_account, null)
}

How to use it

A typical cluster consumes this module twice: an on-demand apps pool for steady traffic and a tainted Spot batch pool for fault-tolerant jobs.

module "gke_apps_pool" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-gke-node-pool?ref=v1.0.0"

  name         = "apps"
  project_id   = "kloudvin-prod"
  location     = "asia-south1"
  cluster_name = "kloudvin-prod-gke"

  machine_type        = "n2-standard-8"
  enable_autoscaling  = true
  min_node_count      = 2
  max_node_count      = 12
  disk_type           = "pd-ssd"

  node_service_account     = google_service_account.gke_nodes.email
  enable_workload_identity = true

  node_labels     = { team = "platform", tier = "general" }
  resource_labels = { cost-center = "platform", env = "prod" }
}

module "gke_batch_pool" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-gke-node-pool?ref=v1.0.0"

  name         = "batch-spot"
  project_id   = "kloudvin-prod"
  location     = "asia-south1"
  cluster_name = "kloudvin-prod-gke"

  machine_type                = "c2-standard-16"
  spot                        = true
  enable_autoscaling          = true
  min_node_count              = 0
  max_node_count              = 20
  autoscaling_location_policy = "ANY"

  node_service_account     = google_service_account.gke_nodes.email
  enable_workload_identity = true

  node_taints = [{
    key    = "workload-type"
    value  = "batch"
    effect = "NO_SCHEDULE"
  }]

  resource_labels = { cost-center = "data-eng", env = "prod" }
}

# Least-privilege node identity shared by both pools.
resource "google_service_account" "gke_nodes" {
  account_id   = "gke-prod-nodes"
  display_name = "GKE prod node pool service account"
  project      = "kloudvin-prod"
}

# Downstream references: expose this pool's instance groups (e.g. for a load
# balancer backend) and surface the node SA so IAM bindings elsewhere can use it.
output "apps_pool_migs" {
  value = module.gke_apps_pool.instance_group_urls
}

output "node_runtime_sa" {
  value = module.gke_apps_pool.service_account
}
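The example above creates `gke_nodes` but grants it no roles. A custom node service account typically needs roles for writing logs and metrics, and for pulling images. Below is a sketch assuming the predefined `roles/container.defaultNodeServiceAccount` role is available. If it is not, the older combination is `roles/logging.logWriter`, `roles/monitoring.metricWriter`, `roles/monitoring.viewer`, and `roles/stackdriver.resourceMetadata.writer`. The block also shows one Workload Identity binding. The `payments` namespace, the `payments-api` Kubernetes service account, and the matching Google service account are hypothetical.

```hcl
# Baseline roles for the shared node identity.
resource "google_project_iam_member" "gke_nodes" {
  for_each = toset([
    "roles/container.defaultNodeServiceAccount",
    "roles/artifactregistry.reader", # pull images from Artifact Registry
  ])

  project = "kloudvin-prod"
  role    = each.value
  member  = "serviceAccount:${google_service_account.gke_nodes.email}"
}

# Hypothetical workload identity for Pods, separate from the node identity.
resource "google_service_account" "payments_api" {
  account_id   = "payments-api"
  display_name = "Payments API workload identity"
  project      = "kloudvin-prod"
}

# Lets KSA payments/payments-api impersonate the GSA above (KSA->GSA federation).
resource "google_service_account_iam_member" "payments_api_wi" {
  service_account_id = google_service_account.payments_api.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:kloudvin-prod.svc.id.goog[payments/payments-api]"
}
```

On the Kubernetes side, the `payments-api` service account would also carry the `iam.gke.io/gcp-service-account` annotation pointing at the Google service account's email.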

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config: live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "gcs"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...GCS state bucket + prefix per path...
  }
}
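For reference, here is one way the root config could look with the placeholder filled in for a hypothetical bucket. It adds a `generate "provider"` block so the provider is also defined once. The bucket name, project ID, and region are assumptions. `path_relative_to_include()` gives each module its own state prefix, such as `prod/gke_node_pool`.

```hcl
# live/terragrunt.hcl (hypothetical values)
remote_state {
  backend  = "gcs"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    bucket = "example-tf-state"          # hypothetical state bucket
    prefix = path_relative_to_include()  # one state path per module directory
  }
}

# Writes provider.tf into every module directory Terragrunt runs in.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "google" {
  project = "example-project"
  region  = "asia-south1"
}
EOF
}
```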

2. Module config: live/prod/gke_node_pool/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-gke-node-pool?ref=v1.0.0"
}

inputs = {
  name = "..."
  project_id = "..."
  location = "..."
  cluster_name = "..."
}

3. Deploy one environment, or roll out all modules together:

cd live/prod/gke_node_pool && terragrunt apply    # this module only
cd live/prod && terragrunt run-all apply          # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module. The inputs block is overridden per environment (dev / stage / prod) without forking the module. run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules. For a single stack, the plain Quickstart above is enough.
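Below is a sketch of a dev environment that overrides inputs and declares its dependency on the cluster, so run-all applies the cluster first. The `../gke_cluster` path, its `name` output, and the dev project ID are hypothetical. The `mock_outputs` let `validate` and `plan` run before the cluster exists.

```hcl
# live/dev/gke_node_pool/terragrunt.hcl (hypothetical dev environment)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-gke-node-pool?ref=v1.0.0"
}

# Hypothetical sibling module that creates the cluster.
dependency "cluster" {
  config_path = "../gke_cluster"

  mock_outputs = {
    name = "mock-cluster"
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  name         = "apps"
  project_id   = "example-project-dev" # hypothetical
  location     = "asia-south1"
  cluster_name = dependency.cluster.outputs.name

  # Smaller, cheaper footprint than prod.
  machine_type   = "e2-standard-4"
  spot           = true
  min_node_count = 0
  max_node_count = 3
}
```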

Inputs

Name | Type | Default | Required | Description
name | string | - | Yes | Node pool name; lowercase, DNS-compatible, <= 40 chars.
project_id | string | - | Yes | GCP project ID owning the cluster.
location | string | - | Yes | Region (regional pool) or zone (zonal pool).
cluster_name | string | - | Yes | Existing GKE cluster to attach the pool to.
node_locations | list(string) | [] | No | Explicit zones to spread nodes across; empty inherits the cluster's node locations.
machine_type | string | e2-standard-4 | No | Compute Engine machine type for nodes.
image_type | string | COS_CONTAINERD | No | Node image type (COS_CONTAINERD/UBUNTU_CONTAINERD/COS).
disk_size_gb | number | 100 | No | Boot disk size per node in GB (>= 30).
disk_type | string | pd-balanced | No | Boot disk type (pd-standard/pd-balanced/pd-ssd).
spot | bool | false | No | Run nodes as Spot VMs for fault-tolerant workloads.
node_count | number | 1 | No | Fixed node count per zone when autoscaling is off.
enable_autoscaling | bool | true | No | Enable the cluster autoscaler for this pool.
min_node_count | number | 1 | No | Minimum nodes per zone when autoscaling.
max_node_count | number | 5 | No | Maximum nodes per zone when autoscaling (>= min).
autoscaling_location_policy | string | BALANCED | No | Autoscaler placement: BALANCED or ANY.
max_pods_per_node | number | 110 | No | Maximum Pods per node; lower conserves IP range.
auto_repair | bool | true | No | Automatically repair unhealthy nodes.
auto_upgrade | bool | true | No | Auto-upgrade nodes to match the control plane.
upgrade_strategy | string | SURGE | No | Upgrade strategy: SURGE or BLUE_GREEN.
max_surge | number | 1 | No | Extra nodes added during a SURGE upgrade.
max_unavailable | number | 0 | No | Nodes allowed unavailable during a SURGE upgrade.
node_service_account | string | null | No | Least-privilege IAM SA email for nodes (set in prod; null uses the Compute Engine default SA).
additional_oauth_scopes | list(string) | [] | No | Extra OAuth scopes beyond cloud-platform.
enable_workload_identity | bool | true | No | Bind nodes to GKE metadata server for Workload Identity.
enable_secure_boot | bool | true | No | Enable Shielded VM Secure Boot.
enable_integrity_monitoring | bool | true | No | Enable Shielded VM integrity monitoring.
node_labels | map(string) | {} | No | Kubernetes labels on nodes for nodeSelector/affinity.
node_taints | list(object) | [] | No | Taints (key, value, effect) to keep general workloads off the pool.
network_tags | list(string) | [] | No | GCE network tags for firewall targeting.
node_metadata | map(string) | {} | No | Additional GCE instance metadata for nodes.
resource_labels | map(string) | {} | No | GCE resource labels for billing/cost allocation.

Outputs

Name | Description
id | Fully qualified node pool ID (projects/…/nodePools/…).
name | Name of the node pool.
version | Kubernetes version currently running on the pool.
instance_group_urls | Managed instance group URLs backing the pool (one per zone).
managed_instance_group_urls | Managed instance group manager URLs for the pool.
service_account | Service account email the nodes run as.

Enterprise scenario

A fintech running a regional GKE cluster in asia-south1 uses this module to split capacity by risk and cost. Real-time payment APIs land on an on-demand n2-standard-8 apps pool with min_node_count = 3 and BLUE_GREEN upgrades. This means a bad node image can be rolled back quickly while the old nodes are still held during the soak period, without surge churn during settlement windows. Overnight reconciliation and fraud-model retraining run on a tainted c2-standard-16 Spot pool that scales from 0 to 20 nodes per zone with autoscaling_location_policy = "ANY". This cuts batch compute spend by roughly 70 percent, while the NO_SCHEDULE taint guarantees latency-sensitive Pods never land there. Because the pools are separate module instances, the platform team resizes or recreates the Spot pool during a machine-family migration without ever re-planning the payment-critical pool or the cluster itself.
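Below is a hypothetical rendering of the scenario's payments pool as a module call. The project ID, cluster name, service account email, and max_node_count ceiling are assumptions; only the machine type, minimum, and upgrade strategy come from the scenario. min_node_count is per zone, so a cluster spanning three zones would keep at least nine payment nodes.

```hcl
module "gke_payments_pool" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-gke-node-pool?ref=v1.0.0"

  name         = "payments"
  project_id   = "example-fintech-prod" # hypothetical
  location     = "asia-south1"
  cluster_name = "example-fintech-gke"  # hypothetical

  machine_type       = "n2-standard-8"
  enable_autoscaling = true
  min_node_count     = 3  # per zone
  max_node_count     = 10 # hypothetical ceiling
  upgrade_strategy   = "BLUE_GREEN"

  node_service_account = "gke-nodes@example-fintech-prod.iam.gserviceaccount.com" # hypothetical
}
```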

Best practices


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.

