
Terraform Module: GCP AlloyDB — a private, HA PostgreSQL cluster with continuous backup and a read pool

Quick take — A reusable hashicorp/google ~> 5.0 module for google_alloydb_cluster and google_alloydb_instance: PSA-only networking, a regional HA primary, continuous backup for PITR, an optional read-pool, and CMEK. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "google" {
  project = "my-project"
  region  = "us-central1"
}

module "alloydb" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-alloydb?ref=v1.0.0"

  project_id            = "..."  # Project ID that hosts the AlloyDB cluster.
  region                = "..."  # Region for the cluster and its backups.
  cluster_id            = "..."  # Cluster ID; also the prefix for instance IDs (2–63 char…
  network               = "..."  # VPC self-link for Private Service Access.
  initial_user_password = "..."  # Superuser password (sensitive; source from Secret Manag…
}

Then run terraform init && terraform apply. Every other input has a sensible default; see Inputs below to override behaviour.

What this module is

AlloyDB is GCP’s fully managed, PostgreSQL-compatible database built for demanding transactional and hybrid (HTAP) workloads. Unlike Cloud SQL, AlloyDB splits the control surface into two distinct resources: a cluster (google_alloydb_cluster) that owns the storage layer, the network attachment, backups, encryption, and the database version, and one or more instances (google_alloydb_instance) that provide the compute that serves queries. A cluster is useless without at least one PRIMARY instance; you then add READ_POOL instances for horizontal read scaling against the same underlying storage, with node counts of 1–20.

The architectural details that make AlloyDB different from a stock Postgres also make it easy to misconfigure. AlloyDB is private by default: an instance is reached over a VPC through Private Service Access (Service Networking peering) or a Private Service Connect endpoint, and this module deliberately exposes no public IP option, so network_config.network (or psc_config) is mandatory, not optional. A production cluster wants four things: a regional HA primary (availability_type = "REGIONAL") for an automatic standby; continuous backup, which gives point-in-time recovery down to the second within the recovery window and is separate from scheduled automated_backup_policy snapshots; deletion protection; and CMEK if your compliance baseline forbids Google-managed keys. The initial superuser password is set on the cluster, so it belongs in Secret Manager, never as a literal in HCL.
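The Private Service Access prerequisite described above is not created by this module. A minimal sketch of it follows, assuming a VPC named core-vpc in the provider's default project; the resource names, the range name alloydb-psa-range, and the /16 prefix length are illustrative assumptions, not values from the module.

```hcl
# Assumption: the VPC already exists and is called "core-vpc".
data "google_compute_network" "vpc" {
  name = "core-vpc"
}

# Reserve an internal range that Google-managed services may allocate from.
resource "google_compute_global_address" "psa_range" {
  name          = "alloydb-psa-range" # hypothetical name
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 16                  # assumed size; pick one that fits your IP plan
  network       = data.google_compute_network.vpc.id
}

# Peer the VPC with Service Networking using the reserved range.
resource "google_service_networking_connection" "psa" {
  network                 = data.google_compute_network.vpc.id
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.psa_range.name]
}
```

If the peering lives in the same configuration as the module call, adding depends_on = [google_service_networking_connection.psa] to the module block is one way to make sure the cluster is created only after the connection exists.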

This module wraps the cluster, its primary, and an optional read pool into one opinionated, variable-driven block. It defaults to PSA-only networking with a REGIONAL primary, continuous backup with a configurable recovery window, scheduled automated backups, and deletion protection on. It optionally enables CMEK and creates a read-pool instance with a chosen node count, then emits the cluster name, primary connection IP, and read-pool IP as outputs so a GKE workload, a Cloud Run service, or a Secret Manager secret can consume them.

When to use it

Reach for Cloud SQL instead when you want SQL Server/MySQL or the lowest-cost small Postgres instance, Spanner when you need horizontal write scaling beyond a single primary and global consistency, or BigQuery when the workload is purely analytical rather than transactional.

Module structure

terraform-module-gcp-alloydb/
├── versions.tf      # provider + required_version pins
├── main.tf          # cluster, primary instance, optional read pool
├── variables.tf     # var-driven inputs with validation
└── outputs.tf       # cluster/instance ids, names, connection IPs

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

main.tf

locals {
  # Use a CMEK block only when a KMS key is supplied.
  cmek = var.kms_key_name == null ? [] : [1]
}

resource "google_alloydb_cluster" "this" {
  project    = var.project_id
  cluster_id = var.cluster_id
  location   = var.region

  # This module is private-only: the cluster is attached to a VPC via
  # Private Service Access (Service Networking peering).
  network_config {
    network = var.network
  }

  database_version = var.database_version

  # Initial superuser. The password should come from Secret Manager /
  # a sensitive variable — never a literal committed to HCL.
  initial_user {
    user     = var.initial_user
    password = var.initial_user_password
  }

  # Continuous backup gives point-in-time recovery to the second within
  # the recovery window. This is distinct from scheduled snapshots below.
  continuous_backup_config {
    enabled              = true
    recovery_window_days = var.continuous_backup_recovery_window_days

    dynamic "encryption_config" {
      for_each = local.cmek
      content {
        kms_key_name = var.kms_key_name
      }
    }
  }

  # Scheduled snapshot backups, retained by count.
  automated_backup_policy {
    location      = var.region
    backup_window = "3600s"
    enabled       = var.automated_backup_enabled

    weekly_schedule {
      days_of_week = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY", "SUNDAY"]
      start_times {
        hours   = var.backup_start_hour
        minutes = 0
        seconds = 0
        nanos   = 0
      }
    }

    quantity_based_retention {
      count = var.automated_backup_retention_count
    }

    dynamic "encryption_config" {
      for_each = local.cmek
      content {
        kms_key_name = var.kms_key_name
      }
    }
  }

  # CMEK for the cluster's primary storage.
  dynamic "encryption_config" {
    for_each = local.cmek
    content {
      kms_key_name = var.kms_key_name
    }
  }

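  # DEFAULT refuses to delete a cluster that still has instances; FORCE
  # deletes the cluster together with its instances. This is not a hard
  # guard against `terraform destroy`, which removes the instances first.
  deletion_policy = var.deletion_protection ? "DEFAULT" : "FORCE"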

  labels = var.labels

  lifecycle {
    # Ignore later changes to the superuser password, so rotating the
    # secret outside Terraform does not produce a diff on every plan.
    ignore_changes = [initial_user[0].password]
  }
}

# The PRIMARY instance: the compute that serves reads and writes.
resource "google_alloydb_instance" "primary" {
  cluster       = google_alloydb_cluster.this.name
  instance_id   = "${var.cluster_id}-primary"
  instance_type = "PRIMARY"

  # REGIONAL gives an automatic standby in a second zone (HA);
  # ZONAL is single-zone and cheaper for non-prod.
  availability_type = var.availability_type

  machine_config {
    cpu_count = var.primary_cpu_count
  }

  database_flags = var.database_flags

  labels = var.labels
}

# Optional READ_POOL for horizontal read scaling against the same storage.
resource "google_alloydb_instance" "read_pool" {
  count = var.read_pool_node_count > 0 ? 1 : 0

  cluster       = google_alloydb_cluster.this.name
  instance_id   = "${var.cluster_id}-read-pool"
  instance_type = "READ_POOL"

  read_pool_config {
    node_count = var.read_pool_node_count
  }

  machine_config {
    cpu_count = var.read_pool_cpu_count
  }

  labels = var.labels

  # The primary must exist first so the cluster is fully initialised.
  depends_on = [google_alloydb_instance.primary]
}

variables.tf

variable "project_id" {
  description = "Project ID that hosts the AlloyDB cluster."
  type        = string
}

variable "region" {
  description = "Region for the cluster and its backups (e.g. asia-south1)."
  type        = string
}

variable "cluster_id" {
  description = "Cluster ID. Also used as the prefix for instance IDs."
  type        = string

  validation {
    condition     = can(regex("^[a-z][a-z0-9-]{0,61}[a-z0-9]$", var.cluster_id))
    error_message = "cluster_id must be 2-63 chars, lowercase letters, digits, or hyphens, and start with a letter."
  }
}

variable "network" {
  description = "Self-link of the VPC for Private Service Access, e.g. projects/PROJECT/global/networks/NETWORK."
  type        = string
}

variable "database_version" {
  description = "PostgreSQL major version for the cluster."
  type        = string
  default     = "POSTGRES_15"

  validation {
    condition     = contains(["POSTGRES_14", "POSTGRES_15", "POSTGRES_16"], var.database_version)
    error_message = "database_version must be one of POSTGRES_14, POSTGRES_15, or POSTGRES_16."
  }
}

variable "initial_user" {
  description = "Name of the initial superuser created on the cluster."
  type        = string
  default     = "postgres"
}

variable "initial_user_password" {
  description = "Password for the initial superuser. Source from Secret Manager / a sensitive var, not a literal."
  type        = string
  sensitive   = true
}

variable "availability_type" {
  description = "Primary availability: REGIONAL (HA, automatic standby) or ZONAL (single zone)."
  type        = string
  default     = "REGIONAL"

  validation {
    condition     = contains(["REGIONAL", "ZONAL"], var.availability_type)
    error_message = "availability_type must be REGIONAL or ZONAL."
  }
}

variable "primary_cpu_count" {
  description = "vCPU count for the primary instance. AlloyDB requires 2, 4, 8, 16, 32, 64, or 96."
  type        = number
  default     = 4

  validation {
    condition     = contains([2, 4, 8, 16, 32, 64, 96], var.primary_cpu_count)
    error_message = "primary_cpu_count must be one of 2, 4, 8, 16, 32, 64, or 96."
  }
}

variable "read_pool_node_count" {
  description = "Number of nodes in the read pool (1-20). Set to 0 to create no read pool."
  type        = number
  default     = 0

  validation {
    condition     = var.read_pool_node_count >= 0 && var.read_pool_node_count <= 20
    error_message = "read_pool_node_count must be between 0 and 20."
  }
}

variable "read_pool_cpu_count" {
  description = "vCPU count per read-pool node."
  type        = number
  default     = 4

  validation {
    condition     = contains([2, 4, 8, 16, 32, 64, 96], var.read_pool_cpu_count)
    error_message = "read_pool_cpu_count must be one of 2, 4, 8, 16, 32, 64, or 96."
  }
}

variable "continuous_backup_recovery_window_days" {
  description = "Days of continuous backup retained for point-in-time recovery (1-35)."
  type        = number
  default     = 14

  validation {
    condition     = var.continuous_backup_recovery_window_days >= 1 && var.continuous_backup_recovery_window_days <= 35
    error_message = "continuous_backup_recovery_window_days must be between 1 and 35."
  }
}

variable "automated_backup_enabled" {
  description = "Whether scheduled (snapshot) automated backups are enabled."
  type        = bool
  default     = true
}

variable "backup_start_hour" {
  description = "Hour of day (UTC, 0-23) for the scheduled backup window to start."
  type        = number
  default     = 18

  validation {
    condition     = var.backup_start_hour >= 0 && var.backup_start_hour <= 23
    error_message = "backup_start_hour must be between 0 and 23."
  }
}

variable "automated_backup_retention_count" {
  description = "Number of scheduled automated backups to retain."
  type        = number
  default     = 14
}

variable "database_flags" {
  description = "Map of PostgreSQL flags applied to the primary, e.g. { \"max_connections\" = \"200\" }."
  type        = map(string)
  default     = {}
}

variable "kms_key_name" {
  description = "Cloud KMS key for CMEK on cluster storage and backups. Null uses Google-managed keys."
  type        = string
  default     = null
}

variable "deletion_protection" {
  description = "When true, the cluster cannot be destroyed without first relaxing the deletion policy."
  type        = bool
  default     = true
}

variable "labels" {
  description = "Labels applied to the cluster and instances."
  type        = map(string)
  default     = {}
}

outputs.tf

output "cluster_id" {
  description = "Fully qualified AlloyDB cluster resource ID."
  value       = google_alloydb_cluster.this.id
}

output "cluster_name" {
  description = "Cluster name (projects/.../locations/.../clusters/...)."
  value       = google_alloydb_cluster.this.name
}

output "primary_instance_id" {
  description = "Fully qualified ID of the primary instance."
  value       = google_alloydb_instance.primary.id
}

output "primary_ip_address" {
  description = "Private IP address clients use to connect to the primary."
  value       = google_alloydb_instance.primary.ip_address
}

output "read_pool_ip_address" {
  description = "Private IP address of the read pool, or null when no read pool is created."
  value       = try(google_alloydb_instance.read_pool[0].ip_address, null)
}

output "database_version" {
  description = "PostgreSQL version actually running on the cluster."
  value       = google_alloydb_cluster.this.database_version
}

How to use it

module "alloydb" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-alloydb?ref=v1.0.0"

  project_id = "kv-payments-prod"
  region     = "asia-south1"
  cluster_id = "payments-core"

  # VPC already configured for Private Service Access (Service Networking peering).
  network = "projects/kv-payments-prod/global/networks/core-vpc"

  database_version = "POSTGRES_16"

  # Superuser password pulled from Secret Manager, never inlined.
  initial_user          = "postgres"
  initial_user_password = data.google_secret_manager_secret_version.alloydb_pw.secret_data

  # HA primary plus a 3-node read pool for reporting/read traffic.
  availability_type    = "REGIONAL"
  primary_cpu_count    = 8
  read_pool_node_count = 3
  read_pool_cpu_count  = 4

  # 30-day PITR window and CMEK for a regulated workload.
  continuous_backup_recovery_window_days = 30
  kms_key_name                           = "projects/kv-payments-prod/locations/asia-south1/keyRings/db/cryptoKeys/alloydb"

  database_flags = {
    "max_connections"        = "400"
    "alloydb.enable_pg_cron" = "on"
  }

  deletion_protection = true

  labels = {
    team        = "payments"
    environment = "prod"
  }
}

data "google_secret_manager_secret_version" "alloydb_pw" {
  secret = "alloydb-payments-core-superuser"
}

# Downstream: publish the primary's private IP to the app's runtime config
# so a GKE workload connects over the VPC with no public exposure.
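resource "google_secret_manager_secret_version" "db_host" {
  # Unlike the data source above, this resource takes the full secret ID.
  secret      = "projects/kv-payments-prod/secrets/payments-core-db-host"
  secret_data = module.alloydb.primary_ip_address
}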
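The example above passes kms_key_name, which only works if AlloyDB is allowed to use that key. A hedged sketch of the grant follows. It assumes the AlloyDB service agent already exists in the project and follows the usual service-PROJECT_NUMBER@gcp-sa-alloydb.iam.gserviceaccount.com naming; the resource names are illustrative and nothing here is part of the module.

```hcl
# Look up the project number to build the service agent's email.
data "google_project" "this" {
  project_id = "kv-payments-prod"
}

# Let the AlloyDB service agent encrypt and decrypt with the CMEK key.
resource "google_kms_crypto_key_iam_member" "alloydb_cmek" {
  crypto_key_id = "projects/kv-payments-prod/locations/asia-south1/keyRings/db/cryptoKeys/alloydb"
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"

  # Assumption: the service agent has already been provisioned for this project.
  member = "serviceAccount:service-${data.google_project.this.number}@gcp-sa-alloydb.iam.gserviceaccount.com"
}
```

Applying the grant before the cluster (for example with depends_on on the module block) avoids a failed first apply caused by a missing key permission.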

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config, live/terragrunt.hcl (inherited by every module):

remote_state {
  backend  = "gcs"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...GCS state bucket + prefix per path...
  }
}

2. Module config, live/prod/alloydb/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-alloydb?ref=v1.0.0"
}

inputs = {
  project_id            = "..."
  region                = "..."
  cluster_id            = "..."
  network               = "..."
  initial_user_password = "..."
}

3. Deploy one environment, or roll out all modules together:

# From the repository root, pick one:
(cd live/prod/alloydb && terragrunt apply)   # this module only
(cd live/prod && terragrunt run-all apply)   # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
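The root config shown in step 1 covers only the backend. Defining the provider once, as the paragraph above describes, could look roughly like the following addition to live/terragrunt.hcl. This is a sketch: the project and region are the placeholder values from the Quickstart, and the generated file name provider.tf is an assumption.

```hcl
# live/terragrunt.hcl (root): write a provider.tf into every module's
# working directory so no module has to declare the provider itself.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "google" {
  project = "my-project"
  region  = "us-central1"
}
EOF
}
```

The overwrite_terragrunt setting replaces only files that Terragrunt generated earlier, so a hand-written provider.tf in a module would cause an error rather than being silently overwritten.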

Inputs

| Name | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| project_id | string | - | Yes | Project ID that hosts the AlloyDB cluster. |
| region | string | - | Yes | Region for the cluster and its backups. |
| cluster_id | string | - | Yes | Cluster ID; also the prefix for instance IDs (2–63 chars, validated). |
| network | string | - | Yes | VPC self-link for Private Service Access. |
| database_version | string | POSTGRES_15 | No | PostgreSQL major version (14/15/16, validated). |
| initial_user | string | postgres | No | Name of the initial superuser. |
| initial_user_password | string | - | Yes | Superuser password (sensitive; source from Secret Manager). |
| availability_type | string | REGIONAL | No | REGIONAL (HA) or ZONAL for the primary. |
| primary_cpu_count | number | 4 | No | vCPUs for the primary (2/4/8/16/32/64/96, validated). |
| read_pool_node_count | number | 0 | No | Read-pool node count (0–20); 0 creates no pool. |
| read_pool_cpu_count | number | 4 | No | vCPUs per read-pool node (validated). |
| continuous_backup_recovery_window_days | number | 14 | No | PITR window in days (1–35). |
| automated_backup_enabled | bool | true | No | Enable scheduled snapshot backups. |
| backup_start_hour | number | 18 | No | UTC hour (0–23) the backup window starts. |
| automated_backup_retention_count | number | 14 | No | Number of scheduled backups to retain. |
| database_flags | map(string) | {} | No | PostgreSQL flags applied to the primary. |
| kms_key_name | string | null | No | Cloud KMS key for CMEK; null uses Google-managed keys. |
| deletion_protection | bool | true | No | Block cluster destroy unless the deletion policy is relaxed. |
| labels | map(string) | {} | No | Labels applied to the cluster and instances. |
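As an illustration of how these inputs combine, here is a sketch of a cheaper non-production call that overrides only the defaults that matter. The module label alloydb_dev, the variable alloydb_dev_password, and the specific values are assumptions chosen to sit inside the validated ranges listed above; the "..." placeholders are left for you to fill in.

```hcl
# Hypothetical sensitive variable supplied by the caller (e.g. via TF_VAR_...).
variable "alloydb_dev_password" {
  type      = string
  sensitive = true
}

module "alloydb_dev" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-alloydb?ref=v1.0.0"

  project_id            = "..."
  region                = "..."
  cluster_id            = "..."
  network               = "..."
  initial_user_password = var.alloydb_dev_password

  # Single-zone primary, smallest validated size, no read pool.
  availability_type    = "ZONAL"
  primary_cpu_count    = 2
  read_pool_node_count = 0

  # Shorter recovery window and fewer retained snapshots than the defaults.
  continuous_backup_recovery_window_days = 7
  automated_backup_retention_count       = 7

  # Allow the environment to be torn down freely.
  deletion_protection = false
}
```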

Outputs

| Name | Description |
| --- | --- |
| cluster_id | Fully qualified AlloyDB cluster resource ID. |
| cluster_name | Cluster name (projects/.../locations/.../clusters/...). |
| primary_instance_id | Fully qualified ID of the primary instance. |
| primary_ip_address | Private IP clients use to connect to the primary. |
| read_pool_ip_address | Private IP of the read pool, or null when none is created. |
| database_version | PostgreSQL version actually running on the cluster. |
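Because read_pool_ip_address is null when no read pool exists, a caller that wants a single "read host" value may need a fallback. One possible sketch, assuming the module is instantiated as module.alloydb; the local and output names are illustrative.

```hcl
locals {
  # Writes always go to the primary.
  db_write_host = module.alloydb.primary_ip_address

  # Reads go to the read pool when it exists, otherwise to the primary.
  # coalesce() returns the first argument that is neither null nor "".
  db_read_host = coalesce(
    module.alloydb.read_pool_ip_address,
    module.alloydb.primary_ip_address,
  )
}

output "db_hosts" {
  description = "Write and read endpoints for application config."
  value = {
    write = local.db_write_host
    read  = local.db_read_host
  }
}
```

This keeps application configuration identical across environments with and without a read pool; only the resolved address changes.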

Enterprise scenario

A payments platform runs its core ledger on a single REGIONAL AlloyDB primary in asia-south1 for sub-millisecond, strongly consistent writes, while the finance and analytics teams hammer a 3-node read pool for end-of-day reconciliation reports without ever touching the write path. Continuous backup is set to a 30-day recovery window so the platform can satisfy an auditor’s “restore the ledger to 14:32 on the 3rd” request to the second, and CMEK ties both cluster storage and every backup to a Cloud KMS key the security team controls. Because the module attaches the cluster over PSA only and keeps the deletion policy at DEFAULT, the database has no public IP to scan and is harder to remove by accident.
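The auditor's point-in-time request could be served by restoring into a new cluster rather than touching the live one. The following is a sketch only, outside the module: the resource label, the cluster ID payments-core-restore, and the timestamp are hypothetical, and a CMEK-protected source may need an encryption_config on the restored cluster as well.

```hcl
resource "google_alloydb_cluster" "ledger_restore" {
  project    = "kv-payments-prod"
  cluster_id = "payments-core-restore" # hypothetical ID for the restored copy
  location   = "asia-south1"

  network_config {
    network = "projects/kv-payments-prod/global/networks/core-vpc"
  }

  # Restore from the source cluster's continuous backup at a chosen instant.
  restore_continuous_backup_source {
    cluster       = module.alloydb.cluster_name
    point_in_time = "2024-06-03T14:32:00Z" # hypothetical RFC 3339 timestamp
  }

  lifecycle {
    # The restore source only matters at creation time.
    ignore_changes = [restore_continuous_backup_source]
  }
}
```

The restored cluster still needs its own PRIMARY instance before anyone can query it, and the timestamp must fall inside the source cluster's recovery window (30 days in this scenario).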

Best practices
