Terraform Module: GCP AlloyDB — a private, HA PostgreSQL cluster with continuous backup and a read pool

Quick take — A reusable hashicorp/google ~> 5.0 module for google_alloydb_cluster and google_alloydb_instance: PSA-only networking, a regional HA primary, continuous backup for PITR, an optional read-pool, and CMEK. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "google" {
  project = "my-project"
  region  = "us-central1"
}

module "alloydb" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-alloydb?ref=v1.0.0"

  project_id            = "..."  # Project ID that hosts the AlloyDB cluster.
  region                = "..."  # Region for the cluster and its backups.
  cluster_id            = "..."  # Cluster ID; also the prefix for instance IDs (2–63 char…
  network               = "..."  # VPC self-link for Private Service Access.
  initial_user_password = "..."  # Superuser password (sensitive; source from Secret Manag…
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.

What this module is

AlloyDB is GCP’s fully managed, PostgreSQL-compatible database built for demanding transactional and hybrid (HTAP) workloads. Unlike Cloud SQL, AlloyDB splits the control surface into two distinct resources: a cluster (google_alloydb_cluster) that owns the storage layer, the network attachment, backups, encryption, and the database version, and one or more instances (google_alloydb_instance) that provide the compute that serves queries. A cluster is useless without at least one PRIMARY instance; you then add READ_POOL instances for horizontal read scaling against the same underlying storage, with node counts of 1–20.

The architectural details that make AlloyDB different from a stock Postgres also make it easy to misconfigure. AlloyDB instances are private by default: they are reached over a VPC through Private Service Access (Service Networking peering) or a Private Service Connect endpoint, so network_config.network (or psc_config) on the cluster is mandatory, not optional. Public IP is an opt-in, per-instance setting that this module never enables. A production cluster wants a regional HA primary (availability_type = "REGIONAL") for an automatic standby, continuous backup enabled so you get point-in-time recovery down to the second within the recovery window (this is separate from scheduled automated_backup_policy snapshots), deletion protection, and CMEK if your compliance baseline forbids Google-managed keys. The initial superuser password is set on the cluster, so it belongs in Secret Manager, never as a literal in HCL.
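
The module expects the VPC to already have Private Service Access in place. As a rough sketch of that prerequisite, the configuration below reserves a peering range and creates the Service Networking connection. The network name, project, range name, and prefix length are all assumptions for illustration and are not part of the module.

```hcl
# Hypothetical names throughout; adjust to your own VPC and project.
data "google_compute_network" "vpc" {
  name    = "core-vpc"
  project = "my-project"
}

# Internal range handed to Google-managed services over VPC peering.
resource "google_compute_global_address" "psa_range" {
  project       = "my-project"
  name          = "alloydb-psa-range"
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 16 # assumed size; pick one that fits your address plan
  network       = data.google_compute_network.vpc.id
}

# The Service Networking peering that AlloyDB attaches through.
resource "google_service_networking_connection" "psa" {
  network                 = data.google_compute_network.vpc.id
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.psa_range.name]
}
```

With this in place, the module's network input can be set to data.google_compute_network.vpc.id, and adding depends_on = [google_service_networking_connection.psa] to the module call keeps the cluster from being created before the peering exists.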

This module wraps the cluster, its primary, and an optional read pool into one opinionated, variable-driven block. It defaults to PSA-only networking with a REGIONAL primary, continuous backup with a configurable recovery window, scheduled automated backups, and deletion protection on. It optionally enables CMEK and creates a read-pool instance with a chosen node count, then emits the cluster name, primary connection IP, and read-pool IP as outputs so a GKE workload, a Cloud Run service, or a Secret Manager secret can consume them.

When to use it

You need a managed, PostgreSQL-compatible database with better price-performance than stock Postgres for transactional or mixed HTAP workloads, and you want one reviewed, hardened shape instead of bespoke cluster/instance blocks per team.
Security baselines require no public surface: AlloyDB is private by default, reachable over a VPC via Private Service Access or PSC, and this module never enables the opt-in public IP.
You want regional high availability (an automatic standby in a second zone) and continuous backup with point-in-time recovery to be the default for production, alongside scheduled snapshot backups, rather than toggles someone remembers later.
You need to scale reads horizontally with a read pool (1–20 nodes) that shares the primary’s storage with low replication lag, provisioned from the same module call.
Compliance requires customer-managed encryption keys (CMEK) on both the cluster storage and its backups, wired to a Cloud KMS key.

Reach for Cloud SQL instead when you want SQL Server/MySQL or the lowest-cost small Postgres instance, Spanner when you need horizontal write scaling beyond a single primary and global consistency, or BigQuery when the workload is purely analytical rather than transactional.

Module structure

terraform-module-gcp-alloydb/
├── versions.tf      # provider + required_version pins
├── main.tf          # cluster, primary instance, optional read pool
├── variables.tf     # var-driven inputs with validation
└── outputs.tf       # cluster/instance ids, names, connection IPs

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

main.tf

locals {
  # Use a CMEK block only when a KMS key is supplied.
  cmek = var.kms_key_name == null ? [] : [1]
}

resource "google_alloydb_cluster" "this" {
  project    = var.project_id
  cluster_id = var.cluster_id
  location   = var.region

  # Private by default: the cluster attaches to a VPC via Private Service
  # Access (Service Networking peering). This module never enables public IP.
  network_config {
    network = var.network
  }

  database_version = var.database_version

  # Initial superuser. The password should come from Secret Manager /
  # a sensitive variable — never a literal committed to HCL.
  initial_user {
    user     = var.initial_user
    password = var.initial_user_password
  }

  # Continuous backup gives point-in-time recovery to the second within
  # the recovery window. This is distinct from scheduled snapshots below.
  continuous_backup_config {
    enabled              = true
    recovery_window_days = var.continuous_backup_recovery_window_days

    dynamic "encryption_config" {
      for_each = local.cmek
      content {
        kms_key_name = var.kms_key_name
      }
    }
  }

  # Scheduled snapshot backups, retained by count.
  automated_backup_policy {
    location      = var.region
    backup_window = "3600s"
    enabled       = var.automated_backup_enabled

    weekly_schedule {
      days_of_week = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY", "SUNDAY"]
      start_times {
        hours   = var.backup_start_hour
        minutes = 0
        seconds = 0
        nanos   = 0
      }
    }

    quantity_based_retention {
      count = var.automated_backup_retention_count
    }

    dynamic "encryption_config" {
      for_each = local.cmek
      content {
        kms_key_name = var.kms_key_name
      }
    }
  }

  # CMEK for the cluster's primary storage.
  dynamic "encryption_config" {
    for_each = local.cmek
    content {
      kms_key_name = var.kms_key_name
    }
  }

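  # DEFAULT refuses to delete a cluster that still contains instances; FORCE
  # deletes the cluster together with its instances. This is not a hard guard
  # against `terraform destroy`, which removes the instances first.
  deletion_policy = var.deletion_protection ? "DEFAULT" : "FORCE"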

  labels = var.labels

  lifecycle {
    # Ignore later changes to the password input so an out-of-band rotation
    # is not reverted. The initial value is still recorded in state.
    ignore_changes = [initial_user[0].password]
  }
}

# The PRIMARY instance: the compute that serves reads and writes.
resource "google_alloydb_instance" "primary" {
  cluster       = google_alloydb_cluster.this.name
  instance_id   = "${var.cluster_id}-primary"
  instance_type = "PRIMARY"

  # REGIONAL gives an automatic standby in a second zone (HA);
  # ZONAL is single-zone and cheaper for non-prod.
  availability_type = var.availability_type

  machine_config {
    cpu_count = var.primary_cpu_count
  }

  database_flags = var.database_flags

  labels = var.labels
}

# Optional READ_POOL for horizontal read scaling against the same storage.
resource "google_alloydb_instance" "read_pool" {
  count = var.read_pool_node_count > 0 ? 1 : 0

  cluster       = google_alloydb_cluster.this.name
  instance_id   = "${var.cluster_id}-read-pool"
  instance_type = "READ_POOL"

  read_pool_config {
    node_count = var.read_pool_node_count
  }

  machine_config {
    cpu_count = var.read_pool_cpu_count
  }

  labels = var.labels

  # The primary must exist first so the cluster is fully initialised.
  depends_on = [google_alloydb_instance.primary]
}

variables.tf

variable "project_id" {
  description = "Project ID that hosts the AlloyDB cluster."
  type        = string
}

variable "region" {
  description = "Region for the cluster and its backups (e.g. asia-south1)."
  type        = string
}

variable "cluster_id" {
  description = "Cluster ID. Also used as the prefix for instance IDs."
  type        = string

  validation {
    condition     = can(regex("^[a-z][a-z0-9-]{0,61}[a-z0-9]$", var.cluster_id))
    error_message = "cluster_id must be 2-63 chars of lowercase letters, digits, or hyphens, starting with a letter and ending with a letter or digit."
  }
}

variable "network" {
  description = "Self-link of the VPC for Private Service Access, e.g. projects/PROJECT/global/networks/NETWORK."
  type        = string
}

variable "database_version" {
  description = "PostgreSQL major version for the cluster."
  type        = string
  default     = "POSTGRES_15"

  validation {
    condition     = contains(["POSTGRES_14", "POSTGRES_15", "POSTGRES_16"], var.database_version)
    error_message = "database_version must be one of POSTGRES_14, POSTGRES_15, or POSTGRES_16."
  }
}

variable "initial_user" {
  description = "Name of the initial superuser created on the cluster."
  type        = string
  default     = "postgres"
}

variable "initial_user_password" {
  description = "Password for the initial superuser. Source from Secret Manager / a sensitive var, not a literal."
  type        = string
  sensitive   = true
}

variable "availability_type" {
  description = "Primary availability: REGIONAL (HA, automatic standby) or ZONAL (single zone)."
  type        = string
  default     = "REGIONAL"

  validation {
    condition     = contains(["REGIONAL", "ZONAL"], var.availability_type)
    error_message = "availability_type must be REGIONAL or ZONAL."
  }
}

variable "primary_cpu_count" {
  description = "vCPU count for the primary instance. AlloyDB requires 2, 4, 8, 16, 32, 64, or 96."
  type        = number
  default     = 4

  validation {
    condition     = contains([2, 4, 8, 16, 32, 64, 96], var.primary_cpu_count)
    error_message = "primary_cpu_count must be one of 2, 4, 8, 16, 32, 64, or 96."
  }
}

variable "read_pool_node_count" {
  description = "Number of nodes in the read pool (1-20). Set to 0 to create no read pool."
  type        = number
  default     = 0

  validation {
    condition     = var.read_pool_node_count >= 0 && var.read_pool_node_count <= 20
    error_message = "read_pool_node_count must be between 0 and 20."
  }
}

variable "read_pool_cpu_count" {
  description = "vCPU count per read-pool node."
  type        = number
  default     = 4

  validation {
    condition     = contains([2, 4, 8, 16, 32, 64, 96], var.read_pool_cpu_count)
    error_message = "read_pool_cpu_count must be one of 2, 4, 8, 16, 32, 64, or 96."
  }
}

variable "continuous_backup_recovery_window_days" {
  description = "Days of continuous backup retained for point-in-time recovery (1-35)."
  type        = number
  default     = 14

  validation {
    condition     = var.continuous_backup_recovery_window_days >= 1 && var.continuous_backup_recovery_window_days <= 35
    error_message = "continuous_backup_recovery_window_days must be between 1 and 35."
  }
}

variable "automated_backup_enabled" {
  description = "Whether scheduled (snapshot) automated backups are enabled."
  type        = bool
  default     = true
}

variable "backup_start_hour" {
  description = "Hour of day (UTC, 0-23) for the scheduled backup window to start."
  type        = number
  default     = 18

  validation {
    condition     = var.backup_start_hour >= 0 && var.backup_start_hour <= 23
    error_message = "backup_start_hour must be between 0 and 23."
  }
}

variable "automated_backup_retention_count" {
  description = "Number of scheduled automated backups to retain."
  type        = number
  default     = 14
}

variable "database_flags" {
  description = "Map of PostgreSQL flags applied to the primary, e.g. { \"max_connections\" = \"200\" }."
  type        = map(string)
  default     = {}
}

variable "kms_key_name" {
  description = "Cloud KMS key for CMEK on cluster storage and backups. Null uses Google-managed keys."
  type        = string
  default     = null
}

variable "deletion_protection" {
  description = "When true, deletion_policy is DEFAULT, so Terraform will not force-delete a cluster that still contains instances; false sets FORCE."
  type        = bool
  default     = true
}

variable "labels" {
  description = "Labels applied to the cluster and instances."
  type        = map(string)
  default     = {}
}

outputs.tf

output "cluster_id" {
  description = "Fully qualified AlloyDB cluster resource ID."
  value       = google_alloydb_cluster.this.id
}

output "cluster_name" {
  description = "Cluster name (projects/.../locations/.../clusters/...)."
  value       = google_alloydb_cluster.this.name
}

output "primary_instance_id" {
  description = "Fully qualified ID of the primary instance."
  value       = google_alloydb_instance.primary.id
}

output "primary_ip_address" {
  description = "Private IP address clients use to connect to the primary."
  value       = google_alloydb_instance.primary.ip_address
}

output "read_pool_ip_address" {
  description = "Private IP address of the read pool, or null when no read pool is created."
  value       = try(google_alloydb_instance.read_pool[0].ip_address, null)
}

output "database_version" {
  description = "PostgreSQL version actually running on the cluster."
  value       = google_alloydb_cluster.this.database_version
}
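
A caller can combine these outputs into connection settings. The sketch below, written for the calling root module, falls back to the primary when read_pool_ip_address is null. The user name "app" and database name "ledger" are hypothetical, and 5432 is assumed as the PostgreSQL port.

```hcl
locals {
  # Writes always go to the primary.
  write_host = module.alloydb.primary_ip_address

  # coalesce() skips null, so reads use the primary when no read pool exists.
  read_host = coalesce(
    module.alloydb.read_pool_ip_address,
    module.alloydb.primary_ip_address,
  )
}

output "write_dsn" {
  description = "Connection URI for read/write traffic (no password embedded)."
  value       = "postgresql://app@${local.write_host}:5432/ledger"
}

output "read_dsn" {
  description = "Connection URI for read-only traffic."
  value       = "postgresql://app@${local.read_host}:5432/ledger"
}
```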

How to use it

module "alloydb" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-alloydb?ref=v1.0.0"

  project_id = "kv-payments-prod"
  region     = "asia-south1"
  cluster_id = "payments-core"

  # VPC already configured for Private Service Access (Service Networking peering).
  network = "projects/kv-payments-prod/global/networks/core-vpc"

  database_version = "POSTGRES_16"

  # Superuser password pulled from Secret Manager, never inlined.
  initial_user          = "postgres"
  initial_user_password = data.google_secret_manager_secret_version.alloydb_pw.secret_data

  # HA primary plus a 3-node read pool for reporting/read traffic.
  availability_type    = "REGIONAL"
  primary_cpu_count    = 8
  read_pool_node_count = 3
  read_pool_cpu_count  = 4

  # 30-day PITR window and CMEK for a regulated workload.
  continuous_backup_recovery_window_days = 30
  kms_key_name                           = "projects/kv-payments-prod/locations/asia-south1/keyRings/db/cryptoKeys/alloydb"

  database_flags = {
    "max_connections"        = "400"
    "alloydb.enable_pg_cron" = "on"
  }

  deletion_protection = true

  labels = {
    team        = "payments"
    environment = "prod"
  }
}

data "google_secret_manager_secret_version" "alloydb_pw" {
  secret = "alloydb-payments-core-superuser"
}

# Downstream: publish the primary's private IP to the app's runtime config
# so a GKE workload connects over the VPC with no public exposure.
# The secret itself must already exist; `secret` takes its full resource ID.
resource "google_secret_manager_secret_version" "db_host" {
  secret      = "projects/kv-payments-prod/secrets/payments-core-db-host"
  secret_data = module.alloydb.primary_ip_address
}
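
For contrast, a non-production call might shrink the same module to a single-zone primary with no read pool. This is only a sketch: the dev project, network, and variable name are hypothetical, and the sizes are illustrative rather than recommendations.

```hcl
# Hypothetical sensitive input, e.g. supplied via TF_VAR_alloydb_dev_password.
variable "alloydb_dev_password" {
  description = "Superuser password for the dev cluster."
  type        = string
  sensitive   = true
}

module "alloydb_dev" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-alloydb?ref=v1.0.0"

  project_id = "kv-payments-dev" # hypothetical project
  region     = "asia-south1"
  cluster_id = "payments-core-dev"
  network    = "projects/kv-payments-dev/global/networks/core-vpc" # hypothetical VPC

  initial_user_password = var.alloydb_dev_password

  # Single zone, smallest primary, no read pool (read_pool_node_count defaults to 0).
  availability_type = "ZONAL"
  primary_cpu_count = 2

  # Shorter PITR window and an easy teardown for a disposable environment.
  continuous_backup_recovery_window_days = 7
  deletion_protection                    = false

  labels = {
    team        = "payments"
    environment = "dev"
  }
}
```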

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config — live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "gcs"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...GCS state bucket + prefix per path...
  }
}
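
The root config above shows only remote_state. If the provider is also meant to live in the root, one way to do it is a Terragrunt generate block in the same file, sketched here. The project and region values are placeholders copied from the Quickstart, not something the module prescribes.

```hcl
# Writes provider.tf into every module's working directory.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "google" {
  project = "my-project"
  region  = "us-central1"
}
EOF
}
```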

2. Module config — live/prod/alloydb/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-alloydb?ref=v1.0.0"
}

inputs = {
  project_id            = "..."
  region                = "..."
  cluster_id            = "..."
  network               = "..."
  initial_user_password = "..."
}
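
To keep the password out of the committed terragrunt.hcl, the inputs map can read it from the environment with Terragrunt's get_env function. A minimal sketch, assuming a hypothetical ALLOYDB_SUPERUSER_PASSWORD variable exported by your CI or shell, and reusing the example values from the usage section:

```hcl
inputs = {
  project_id = "kv-payments-prod"
  region     = "asia-south1"
  cluster_id = "payments-core"
  network    = "projects/kv-payments-prod/global/networks/core-vpc"

  # Fails fast if the variable is unset, because no default is given.
  initial_user_password = get_env("ALLOYDB_SUPERUSER_PASSWORD")
}
```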

3. Deploy one environment, or roll out all modules together:

(cd live/prod/alloydb && terragrunt apply)   # this module only
(cd live/prod && terragrunt run-all apply)   # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.

Inputs

Name	Type	Default	Required	Description
project_id	string	—	Yes	Project ID that hosts the AlloyDB cluster.
region	string	—	Yes	Region for the cluster and its backups.
cluster_id	string	—	Yes	Cluster ID; also the prefix for instance IDs (2–63 chars, validated).
network	string	—	Yes	VPC self-link for Private Service Access.
database_version	string	`POSTGRES_15`	No	PostgreSQL major version (14/15/16, validated).
initial_user	string	`postgres`	No	Name of the initial superuser.
initial_user_password	string	—	Yes	Superuser password (sensitive; source from Secret Manager).
availability_type	string	`REGIONAL`	No	`REGIONAL` (HA) or `ZONAL` for the primary.
primary_cpu_count	number	`4`	No	vCPUs for the primary (2/4/8/16/32/64/96, validated).
read_pool_node_count	number	`0`	No	Read-pool node count (0–20); 0 creates no pool.
read_pool_cpu_count	number	`4`	No	vCPUs per read-pool node (validated).
continuous_backup_recovery_window_days	number	`14`	No	PITR window in days (1–35).
automated_backup_enabled	bool	`true`	No	Enable scheduled snapshot backups.
backup_start_hour	number	`18`	No	UTC hour (0–23) the backup window starts.
automated_backup_retention_count	number	`14`	No	Number of scheduled backups to retain.
database_flags	map(string)	`{}`	No	PostgreSQL flags applied to the primary.
kms_key_name	string	`null`	No	Cloud KMS key for CMEK; null uses Google-managed keys.
deletion_protection	bool	`true`	No	`true` sets deletion_policy to DEFAULT (no forced delete of a cluster that still has instances); `false` sets FORCE.
labels	map(string)	`{}`	No	Labels applied to the cluster and instances.

Outputs

Name	Description
cluster_id	Fully qualified AlloyDB cluster resource ID.
cluster_name	Cluster name (`projects/.../locations/.../clusters/...`).
primary_instance_id	Fully qualified ID of the primary instance.
primary_ip_address	Private IP clients use to connect to the primary.
read_pool_ip_address	Private IP of the read pool, or null when none is created.
database_version	PostgreSQL version actually running on the cluster.

Enterprise scenario

A payments platform runs its core ledger on a single REGIONAL AlloyDB primary in asia-south1 for low-latency, strongly consistent writes, while the finance and analytics teams hammer a 3-node read pool for end-of-day reconciliation reports without ever touching the write path. Continuous backup is set to a 30-day recovery window so the platform can satisfy an auditor’s “restore the ledger to 14:32 on the 3rd” request to the second, and CMEK ties both cluster storage and every backup to a Cloud KMS key the security team controls. Because the cluster is PSA-only, the database has no public IP to scan, and with deletion_protection left on, Terraform never force-deletes a cluster that still has instances.
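
The auditor's restore request could be served by creating a second cluster from the continuous backup, roughly as sketched below. The restored cluster and instance IDs are hypothetical, and the timestamp is illustrative: the scenario does not give a month or time zone, so an RFC 3339 UTC value is assumed.

```hcl
# Hypothetical restore target built from the source cluster's continuous backup.
resource "google_alloydb_cluster" "ledger_restore" {
  project    = "kv-payments-prod"
  cluster_id = "payments-core-restore"
  location   = "asia-south1"

  network_config {
    network = "projects/kv-payments-prod/global/networks/core-vpc"
  }

  restore_continuous_backup_source {
    cluster       = module.alloydb.cluster_name
    point_in_time = "2024-06-03T14:32:00Z" # illustrative timestamp
  }
}

# A restored cluster needs its own primary instance before it can serve queries.
resource "google_alloydb_instance" "ledger_restore_primary" {
  cluster       = google_alloydb_cluster.ledger_restore.name
  instance_id   = "payments-core-restore-primary"
  instance_type = "PRIMARY"

  machine_config {
    cpu_count = 8
  }
}
```

A CMEK-protected source would presumably also need an encryption_config block on the restored cluster; check the provider documentation before relying on this shape.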

Best practices

Keep the superuser password out of HCL, and treat state as sensitive. Source initial_user_password from Secret Manager; the value is still written to Terraform state, so restrict access to the state backend and encrypt it. Keep the ignore_changes = [initial_user[0].password] lifecycle rule so a password rotated outside Terraform is not reverted on the next apply. Prefer AlloyDB IAM database authentication for application identities over the static superuser.
Run REGIONAL for prod, ZONAL for everything else. The automatic standby is what gives you the HA SLA; downgrading non-prod clusters to ZONAL and a smaller primary_cpu_count is the single biggest AlloyDB cost lever.
Treat continuous backup and scheduled backups as different tools. Continuous backup (recovery_window_days) is your PITR safety net; the automated_backup_policy snapshots are your retained restore points. Size the recovery window to your real RPO/audit requirement — every extra day costs storage.
Right-size the read pool, don’t over-provision the primary. Offload reporting and read-heavy traffic to READ_POOL nodes (scale 1–20) rather than buying a bigger primary; the pool shares the primary’s storage with low lag and is cheaper to grow and shrink.
Enforce CMEK and private networking as policy. Set kms_key_name for any regulated workload so storage and backups use a key you control, and rely on the module’s PSA-only network_config. The module never enables public IP, so the VPC peering and firewall posture are your entire perimeter.
Name and label consistently. Derive instance IDs from cluster_id (the module does this) and apply team/environment labels so cost, backups, and IAM all line up per workload across projects.
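
On the CMEK point: the module only passes kms_key_name through, so the AlloyDB service agent still needs permission to use the key. A minimal sketch of that grant follows. It assumes the service agent already exists in the project and follows the usual service-PROJECT_NUMBER@gcp-sa-alloydb.iam.gserviceaccount.com naming, and it reuses the key path from the usage example.

```hcl
# Looks up the project number used in the service agent's email.
data "google_project" "this" {
  project_id = "kv-payments-prod"
}

# Lets the AlloyDB service agent encrypt and decrypt with the CMEK key.
resource "google_kms_crypto_key_iam_member" "alloydb_cmek" {
  crypto_key_id = "projects/kv-payments-prod/locations/asia-south1/keyRings/db/cryptoKeys/alloydb"
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member        = "serviceAccount:service-${data.google_project.this.number}@gcp-sa-alloydb.iam.gserviceaccount.com"
}
```

Applying this grant before the module call (for example with depends_on on the module) avoids a cluster create that fails because the key is not yet usable.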