Quick take — A reusable hashicorp/google Terraform module for GCP Bigtable: autoscaling SSD/HDD clusters, multi-cluster replication, deletion protection, and tables with column families and GC policies. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "google" {
project = "my-project"
region = "us-central1"
}
module "bigtable" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-bigtable?ref=v1.0.0"
project_id = "..." # GCP project ID hosting the Bigtable instance.
app = "..." # Workload short name used in the instance name (validate…
environment = "..." # One of `dev`, `staging`, `prod`, `sandbox`.
location_short = "..." # Cosmetic region/zone token for naming (e.g. `euw1`).
clusters = [{ cluster_id = "...", zone = "..." }] # 1–8 clusters; each with zone, `storage_type`, and fixed…
}
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
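The clusters input is a list of objects rather than plain strings, so it helps to see one filled in. The snippet below is a minimal sketch of a single fixed-size cluster; the cluster ID and zone are hypothetical values chosen for illustration, not defaults of the module.

```hcl
# Hypothetical values: replace the cluster ID and zone with your own.
clusters = [
  {
    cluster_id   = "myapp-dev-usc1-c1" # must be unique within the instance
    zone         = "us-central1-b"
    storage_type = "SSD" # optional, defaults to "SSD"
    num_nodes    = 1     # optional, defaults to 1; ignored when autoscaling is set
  },
]
```

Adding a second object with a different zone is what turns the instance into a replicated one.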
What this module is
Cloud Bigtable is GCP’s fully managed, petabyte-scale, wide-column NoSQL database, the same engine that backs Search, Maps, and Gmail. It is built for workloads that need single-digit-millisecond reads/writes at very high throughput: time-series telemetry, IoT ingestion, ad-tech and fraud-scoring feature stores, financial tick data, and any HBase-compatible application. Unlike Spanner, Bigtable is not relational: it has no joins, no multi-row transactions, and no secondary indexes. You design a single row key and read by key or key range, and you pay for provisioned nodes (fixed or autoscaled) plus storage.
The raw resource graph is fiddly. A real Bigtable deployment is never just an instance: it is a google_bigtable_instance with one or more nested cluster blocks (each pinned to a zone, with a storage type and either a fixed num_nodes or an autoscaling_config), plus google_bigtable_table resources that declare column families, plus a google_bigtable_gc_policy per family so old cell versions actually get garbage-collected. Get the cluster/zone/replication wiring wrong and you either over-provision (Bigtable’s biggest cost trap) or accidentally tear down a stateful cluster on the next apply.
This module wraps google_bigtable_instance and its companions behind clean, validated variables. You declare clusters as a list of objects, tables as a map with their column families and GC rules, and the module handles lifecycle guards, deletion protection, and consistent app-env-region naming so every team ships Bigtable the same safe way.
When to use it
- You run high-throughput, low-latency key/value or time-series workloads (IoT, observability metrics, clickstream, feature stores) that have outgrown Firestore/Memorystore but don’t need SQL or joins.
- You need an HBase-compatible backend and want to lift Apache HBase workloads (or migrate from Cassandra) onto a managed service.
- You want multi-cluster replication across zones or regions for HA and read locality, with eventual consistency and automatic failover via app profiles.
- You want autoscaling Bigtable nodes tied to CPU/storage utilization so you stop paying for idle capacity overnight.
- You are standardizing a platform and want every Bigtable instance to carry the same labels, deletion protection, and table/GC conventions.
Reach for Spanner instead if you need strong consistency with SQL and transactions, or Firestore for document/mobile-sync workloads. Bigtable shines when the access pattern is “give me this row key (range) as fast as possible, at scale.”
Module structure
terraform-module-gcp-bigtable/
├── versions.tf # provider + Terraform version pins
├── main.tf # google_bigtable_instance + tables + GC policies + app profile
├── variables.tf # var-driven inputs with validation
└── outputs.tf # instance id/name, cluster ids, table ids
versions.tf
terraform {
required_version = ">= 1.5.0"
required_providers {
google = {
source = "hashicorp/google"
version = "~> 5.0"
}
}
}
main.tf
locals {
# Consistent app-env-region naming, e.g. "telemetry-prod-euw1".
instance_name = "${var.app}-${var.environment}-${var.location_short}"
# Bigtable autoscaling and fixed sizing are mutually exclusive per cluster.
# We normalize the cluster list once so the resource block stays readable.
clusters = {
for c in var.clusters : c.cluster_id => c
}
}
resource "google_bigtable_instance" "this" {
project = var.project_id
name = local.instance_name
display_name = coalesce(var.display_name, local.instance_name)
# DEVELOPMENT instances are single-node and cheaper but cannot replicate
# or autoscale. PRODUCTION is the default for anything real. The provider
# marks instance_type as deprecated, so prefer leaving it at the default.
instance_type = var.instance_type
# Guards against `terraform destroy` / accidental recreation of a stateful DB.
deletion_protection = var.deletion_protection
dynamic "cluster" {
for_each = local.clusters
content {
cluster_id = cluster.value.cluster_id
zone = cluster.value.zone
storage_type = cluster.value.storage_type
# Fixed sizing: only set when autoscaling is NOT configured.
num_nodes = cluster.value.autoscaling == null ? cluster.value.num_nodes : null
# Customer-managed encryption key (optional, per cluster). kms_key_name is a
# plain argument on the cluster block, so null simply leaves CMEK off.
kms_key_name = cluster.value.kms_key_name
dynamic "autoscaling_config" {
for_each = cluster.value.autoscaling == null ? [] : [cluster.value.autoscaling]
content {
min_nodes = autoscaling_config.value.min_nodes
max_nodes = autoscaling_config.value.max_nodes
cpu_target = autoscaling_config.value.cpu_target
storage_target = autoscaling_config.value.storage_target
}
}
}
}
labels = var.labels
lifecycle {
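# num_nodes drifts when autoscaling is active; ignore it so apply is a no-op.
# ignore_changes only accepts static references, so this covers the first
# cluster block only; list one entry per autoscaled cluster when replicating.
ignore_changes = [cluster[0].num_nodes]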
}
}
# CMEK has to be passed via the cluster block in google ~> 5.0; the
# kms_key_name argument lives directly on the cluster, so we set it inline.
# (Handled above through cluster.value.kms_key_name when present.)
resource "google_bigtable_table" "this" {
for_each = var.tables
project = var.project_id
instance_name = google_bigtable_instance.this.name
name = each.key
# Optional initial row-key splits for pre-warming / avoiding hotspotting.
# split_keys is a list argument, not a nested block; null skips pre-splitting.
split_keys = length(each.value.split_keys) > 0 ? each.value.split_keys : null
dynamic "column_family" {
for_each = each.value.column_families
content {
family = column_family.value
}
}
# Tables are stateful; do not let a config tweak silently drop one.
deletion_protection = var.table_deletion_protection ? "PROTECTED" : "UNPROTECTED"
lifecycle {
prevent_destroy = false
}
}
# One GC policy per (table, column family). Bigtable will not garbage-collect
# old cell versions unless a policy says so — critical for cost on time-series.
resource "google_bigtable_gc_policy" "this" {
for_each = {
for gc in flatten([
for table_name, table in var.tables : [
for family, rule in table.gc_rules : {
key = "${table_name}.${family}"
table_name = table_name
family = family
max_age_days = rule.max_age_days
max_versions = rule.max_versions
}
]
]) : gc.key => gc
}
project = var.project_id
instance_name = google_bigtable_instance.this.name
table = each.value.table_name
column_family = each.value.family
deletion_policy = "ABANDON"
dynamic "max_age" {
for_each = each.value.max_age_days == null ? [] : [each.value.max_age_days]
content {
duration = "${max_age.value * 24}h"
}
}
dynamic "max_version" {
for_each = each.value.max_versions == null ? [] : [each.value.max_versions]
content {
number = max_version.value
}
}
depends_on = [google_bigtable_table.this]
}
# App profile controls routing for multi-cluster reads/writes. Single-cluster
# routing pins traffic to one cluster (lower latency, no replication conflicts);
# multi-cluster routing load-balances and fails over automatically.
resource "google_bigtable_app_profile" "this" {
count = var.app_profile == null ? 0 : 1
project = var.project_id
instance = google_bigtable_instance.this.name
app_profile_id = var.app_profile.id
description = var.app_profile.description
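# Use null rather than false for single-cluster routing, so only one of the
# two mutually exclusive routing settings is ever present in the config.
multi_cluster_routing_use_any = var.app_profile.routing == "multi_cluster" ? true : null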
dynamic "single_cluster_routing" {
for_each = var.app_profile.routing == "single_cluster" ? [var.app_profile] : []
content {
cluster_id = single_cluster_routing.value.cluster_id
allow_transactional_writes = single_cluster_routing.value.allow_transactional_writes
}
}
ignore_warnings = true
}
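One gap worth noting in the GC policy resource above: when a family sets both max_age_days and max_versions, both rule blocks are emitted but nothing says how to combine them. The google provider exposes a mode argument ("UNION" or "INTERSECTION") for that case. The block below is a standalone sketch with literal names borrowed from the usage example later in this post; it illustrates the argument and is not a resource the module defines.

```hcl
# Standalone illustration only. In the module these values would come from
# each.value; the literal names mirror the usage example.
resource "google_bigtable_gc_policy" "raw_age_or_versions" {
  project       = "kv-data-prod"
  instance_name = "telemetry-prod-euw1"
  table         = "device_events"
  column_family = "raw"

  # UNION: a cell is collected when it matches EITHER rule.
  # INTERSECTION: a cell is collected only when it matches BOTH rules.
  mode = "UNION"

  max_age {
    duration = "720h" # 30 days
  }

  max_version {
    number = 3
  }
}
```

Inside the module, one possible adaptation (an assumption, not the published behaviour) would be to set mode to "UNION" only when both max_age_days and max_versions are non-null, and to null otherwise.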
variables.tf
variable "project_id" {
description = "GCP project ID that will host the Bigtable instance."
type = string
}
variable "app" {
description = "Application/workload short name, used in the instance name (e.g. \"telemetry\")."
type = string
validation {
condition = can(regex("^[a-z][a-z0-9-]{1,20}$", var.app))
error_message = "app must be lowercase alphanumeric/hyphen, 2-21 chars, starting with a letter."
}
}
variable "environment" {
description = "Deployment environment (dev, staging, prod, ...)."
type = string
validation {
condition = contains(["dev", "staging", "prod", "sandbox"], var.environment)
error_message = "environment must be one of: dev, staging, prod, sandbox."
}
}
variable "location_short" {
description = "Short region/zone token for naming, e.g. \"euw1\", \"use4\". Cosmetic only."
type = string
}
variable "display_name" {
description = "Human-friendly instance display name. Defaults to the generated instance name."
type = string
default = null
}
variable "instance_type" {
description = "Bigtable instance type: PRODUCTION (replicable, autoscalable) or DEVELOPMENT (single node, cheap)."
type = string
default = "PRODUCTION"
validation {
condition = contains(["PRODUCTION", "DEVELOPMENT"], var.instance_type)
error_message = "instance_type must be PRODUCTION or DEVELOPMENT."
}
}
variable "deletion_protection" {
description = "Prevent the instance from being destroyed by Terraform. Keep true in prod."
type = bool
default = true
}
variable "clusters" {
description = <<-EOT
List of Bigtable clusters. Each cluster lives in one zone. Provide either a
fixed num_nodes OR an autoscaling block (not both). Multiple clusters enable
replication across zones/regions.
EOT
type = list(object({
cluster_id = string
zone = string
storage_type = optional(string, "SSD")
num_nodes = optional(number, 1)
kms_key_name = optional(string)
autoscaling = optional(object({
min_nodes = number
max_nodes = number
cpu_target = number
# GiB per node. 2560 suits SSD clusters; HDD clusters need a higher target.
storage_target = optional(number, 2560)
}))
}))
validation {
condition = length(var.clusters) >= 1 && length(var.clusters) <= 8
error_message = "Provide between 1 and 8 clusters (Bigtable's replication limit)."
}
validation {
condition = alltrue([for c in var.clusters : contains(["SSD", "HDD"], c.storage_type)])
error_message = "Each cluster storage_type must be SSD or HDD."
}
validation {
condition = alltrue([
for c in var.clusters : c.autoscaling == null ? true :
(c.autoscaling.cpu_target >= 10 && c.autoscaling.cpu_target <= 80)
])
error_message = "autoscaling.cpu_target must be between 10 and 80 percent."
}
}
variable "tables" {
description = <<-EOT
Map of tables to create, keyed by table name. Each table declares its
column_families, optional row-key split_keys for pre-splitting, and gc_rules
(max_age_days and/or max_versions) per family.
EOT
type = map(object({
column_families = optional(list(string), [])
split_keys = optional(list(string), [])
gc_rules = optional(map(object({
max_age_days = optional(number)
max_versions = optional(number)
})), {})
}))
default = {}
}
variable "table_deletion_protection" {
description = "Mark created tables PROTECTED so a config change cannot drop them."
type = bool
default = true
}
variable "app_profile" {
description = <<-EOT
Optional custom app profile controlling read/write routing. routing is
"multi_cluster" (auto failover/load-balance) or "single_cluster" (pin to one
cluster, required for transactional single-row writes).
EOT
type = object({
id = string
description = optional(string, "Managed by Terraform")
routing = string
cluster_id = optional(string)
allow_transactional_writes = optional(bool, false)
})
default = null
validation {
condition = var.app_profile == null ? true : contains(["multi_cluster", "single_cluster"], var.app_profile.routing)
error_message = "app_profile.routing must be multi_cluster or single_cluster."
}
validation {
condition = var.app_profile == null ? true : (var.app_profile.routing != "single_cluster" || var.app_profile.cluster_id != null)
error_message = "single_cluster routing requires app_profile.cluster_id to be set."
}
}
variable "labels" {
description = "Labels applied to the Bigtable instance."
type = map(string)
default = {}
}
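Two more checks could reasonably sit beside the existing validations on the clusters variable. main.tf keys its cluster map by cluster_id, so duplicate IDs would fail with a less friendly error, and nothing currently checks that min_nodes does not exceed max_nodes. The fragment below is a sketch of those two validation blocks; they are suggestions and not part of the module as published.

```hcl
# Sketch only: these blocks would be placed inside variable "clusters",
# next to the existing validation blocks.
validation {
  condition     = length(distinct([for c in var.clusters : c.cluster_id])) == length(var.clusters)
  error_message = "Each cluster_id must be unique within the instance."
}

validation {
  condition = alltrue([
    for c in var.clusters : c.autoscaling == null ? true :
    (c.autoscaling.min_nodes >= 1 && c.autoscaling.min_nodes <= c.autoscaling.max_nodes)
  ])
  error_message = "autoscaling.min_nodes must be at least 1 and no greater than max_nodes."
}
```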
outputs.tf
output "instance_id" {
description = "Fully qualified Bigtable instance ID (projects/<project>/instances/<name>)."
value = google_bigtable_instance.this.id
}
output "instance_name" {
description = "Bigtable instance name (used in client connection strings and CLI)."
value = google_bigtable_instance.this.name
}
output "cluster_ids" {
description = "Map of cluster_id => zone for every cluster in the instance."
value = { for c in var.clusters : c.cluster_id => c.zone }
}
output "table_ids" {
description = "Map of table name => fully qualified table ID."
value = { for name, t in google_bigtable_table.this : name => t.id }
}
output "table_names" {
description = "List of created table names."
value = keys(google_bigtable_table.this)
}
output "app_profile_id" {
description = "App profile ID, or null if no custom profile was created."
value = try(google_bigtable_app_profile.this[0].app_profile_id, null)
}
How to use it
The example below provisions a replicated, autoscaling telemetry instance with one device_events table, then grants an ingestion service account access to it using the instance name from the module output instead of a hardcoded value.
module "bigtable" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-bigtable?ref=v1.0.0"
project_id = "kv-data-prod"
app = "telemetry"
environment = "prod"
location_short = "euw1"
instance_type = "PRODUCTION"
deletion_protection = true
clusters = [
{
cluster_id = "telemetry-prod-euw1-c1"
zone = "europe-west1-b"
storage_type = "SSD"
autoscaling = {
min_nodes = 3
max_nodes = 20
cpu_target = 60
}
},
{
# Second cluster in another zone = replication + read locality.
cluster_id = "telemetry-prod-euw1-c2"
zone = "europe-west1-c"
storage_type = "SSD"
autoscaling = {
min_nodes = 3
max_nodes = 20
cpu_target = 60
}
},
]
tables = {
"device_events" = {
column_families = ["raw", "agg"]
# Pre-split on a reversed-device-id prefix to avoid write hotspotting.
split_keys = ["1", "3", "5", "7", "9", "b", "d", "f"]
gc_rules = {
raw = { max_age_days = 30 } # drop raw cells after 30 days
agg = { max_versions = 1 } # keep only the latest aggregate
}
}
}
app_profile = {
id = "ingest"
routing = "multi_cluster"
}
labels = {
team = "data-platform"
cost-center = "kv-1042"
workload = "telemetry"
}
}
# Downstream: grant the ingestion SA write access on this exact instance,
# using the module output rather than a hardcoded name.
resource "google_bigtable_instance_iam_member" "ingest_writer" {
project = "kv-data-prod"
instance = module.bigtable.instance_name
role = "roles/bigtable.user"
member = "serviceAccount:telemetry-ingest@kv-data-prod.iam.gserviceaccount.com"
}
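The example uses multi-cluster routing. For workloads that need transactional single-row writes, the app_profile variable also accepts single-cluster routing. The value below is a sketch of that alternative; the profile ID and description are hypothetical, and cluster_id has to match one of the cluster IDs passed in clusters.

```hcl
# Alternative app_profile value for the module call above (hypothetical ID).
app_profile = {
  id                         = "billing-writer"
  description                = "Pinned to c1 for single-row transactions"
  routing                    = "single_cluster"
  cluster_id                 = "telemetry-prod-euw1-c1" # must exist in clusters
  allow_transactional_writes = true
}
```

Traffic using this profile goes only to the named cluster, so it gives up automatic failover in exchange for read-your-writes behaviour on that cluster.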
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
backend = "gcs"
generate = { path = "backend.tf", if_exists = "overwrite" }
config = {
# ...GCS state bucket + prefix per path...
}
}
2. Module config — live/prod/bigtable/terragrunt.hcl:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-bigtable?ref=v1.0.0"
}
inputs = {
project_id = "..."
app = "..."
environment = "..."
location_short = "..."
clusters = [{ cluster_id = "...", zone = "..." }]
}
3. Deploy one environment, or roll out all modules together:
cd live/prod/bigtable && terragrunt apply # this module only
cd live/prod && terragrunt run-all apply # every module under live/prod (both commands start from the repo root)
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / staging / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules. For a single stack, the plain Quickstart above is enough.
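To make the per-environment override concrete, here is a sketch of what a dev counterpart to the prod config might look like. The path, project ID, and cluster ID are assumptions made for illustration; only the input names come from the module.

```hcl
# live/dev/bigtable/terragrunt.hcl (hypothetical path and values)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-bigtable?ref=v1.0.0"
}

inputs = {
  project_id     = "kv-data-dev" # assumed dev project
  app            = "telemetry"
  environment    = "dev"
  location_short = "euw1"

  # One small fixed-size cluster keeps dev cheap; prod uses autoscaling.
  clusters = [
    {
      cluster_id = "telemetry-dev-euw1-c1"
      zone       = "europe-west1-b"
      num_nodes  = 1
    },
  ]

  # Allow teardown in dev; keep both set to true in prod.
  deletion_protection       = false
  table_deletion_protection = false
}
```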
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| `project_id` | `string` | n/a | Yes | GCP project ID hosting the Bigtable instance. |
| `app` | `string` | n/a | Yes | Workload short name used in the instance name (validated lowercase). |
| `environment` | `string` | n/a | Yes | One of `dev`, `staging`, `prod`, `sandbox`. |
| `location_short` | `string` | n/a | Yes | Cosmetic region/zone token for naming (e.g. `euw1`). |
| `display_name` | `string` | `null` | No | Human-friendly display name; defaults to the generated instance name. |
| `instance_type` | `string` | `"PRODUCTION"` | No | `PRODUCTION` or `DEVELOPMENT`. |
| `deletion_protection` | `bool` | `true` | No | Block `terraform destroy` on the instance. |
| `clusters` | `list(object)` | n/a | Yes | 1–8 clusters; each with zone, `storage_type`, and fixed `num_nodes` or an `autoscaling` block. |
| `tables` | `map(object)` | `{}` | No | Tables keyed by name, with `column_families`, `split_keys`, and per-family `gc_rules`. |
| `table_deletion_protection` | `bool` | `true` | No | Create tables as `PROTECTED`. |
| `app_profile` | `object` | `null` | No | Optional app profile: `multi_cluster` or `single_cluster` routing. |
| `labels` | `map(string)` | `{}` | No | Labels applied to the instance. |
Outputs
| Name | Description |
|---|---|
| `instance_id` | Fully qualified instance ID (`projects/<project>/instances/<name>`). |
| `instance_name` | Instance name used in client connections and the `cbt` CLI. |
| `cluster_ids` | Map of cluster_id => zone for every cluster. |
| `table_ids` | Map of table name => fully qualified table ID. |
| `table_names` | List of created table names. |
| `app_profile_id` | Custom app profile ID, or null if none was created. |
Enterprise scenario
A connected-vehicle platform ingests ~400k telemetry messages/second from a global fleet. They deploy this module per region with a two-cluster, autoscaling (min_nodes = 6, max_nodes = 40, cpu_target = 60) PRODUCTION instance and a device_events table pre-split on a reversed-VIN row key to spread writes evenly. A 30-day max_age GC policy on the raw family keeps storage flat while a daily aggregation job writes a max_versions = 1 rollup family for the dashboards. A multi_cluster app profile gives them automatic zone failover during maintenance, and instance_name feeds the Dataflow pipeline and per-team IAM bindings — so onboarding a new region is a copy-paste of one module block plus a cluster_id.
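A per-region rollout like this one can be sketched with for_each on the module block. Everything specific below is hypothetical (the project ID, the region map, the zone choices, and the rollup family name); only the sizing figures and the input names come from the scenario and the module.

```hcl
locals {
  # Hypothetical region map: naming token => the two zones hosting the clusters.
  regions = {
    euw1 = ["europe-west1-b", "europe-west1-c"]
    use1 = ["us-east1-b", "us-east1-c"]
  }
}

module "bigtable" {
  for_each = local.regions
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-bigtable?ref=v1.0.0"

  project_id     = "fleet-telemetry-prod" # hypothetical project
  app            = "fleet"
  environment    = "prod"
  location_short = each.key

  # One autoscaling cluster per zone, named <app>-<env>-<region>-c<n>.
  clusters = [
    for i, zone in each.value : {
      cluster_id  = "fleet-prod-${each.key}-c${i + 1}"
      zone        = zone
      autoscaling = { min_nodes = 6, max_nodes = 40, cpu_target = 60 }
    }
  ]

  tables = {
    device_events = {
      column_families = ["raw", "rollup"]
      gc_rules = {
        raw    = { max_age_days = 30 } # raw telemetry expires after 30 days
        rollup = { max_versions = 1 }  # keep only the latest aggregate
      }
    }
  }

  app_profile = { id = "ingest", routing = "multi_cluster" }
}
```

With this layout, onboarding a region means adding one entry to local.regions, and downstream resources can read module.bigtable["euw1"].instance_name.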
Best practices
- Design the row key first, then pre-split. Bigtable performance lives and dies on key design: avoid sequential/timestamp prefixes that hotspot a single node. Use split_keys (and field-promotion/salting/reversed-ID patterns) to distribute writes across tablets from day one.
- Always attach a GC policy per column family. Without max_age or max_versions, old cell versions accumulate forever and storage cost grows unbounded. Set gc_rules on every family, especially time-series tables.
- Prefer autoscaling and right-size cpu_target. Nodes are the dominant cost. Set min_nodes to your floor, a generous max_nodes ceiling, and cpu_target around 60 (keep below ~70 to leave headroom for replication and tail latency). Keep ignore_changes on num_nodes so autoscaling drift doesn’t churn the plan.
- Use replication for HA, and choose routing deliberately. Two+ clusters across zones give failover and read locality, but use a single_cluster app profile for workloads that need transactional single-row writes or must avoid cross-cluster read-your-writes surprises.
- Lock down statefulness and encryption. Keep deletion_protection = true and table_deletion_protection = true in prod, and pass a CMEK kms_key_name per cluster where compliance requires customer-managed keys.
- Standardize naming and labels. The app-env-region instance name plus team/cost-center labels make Bigtable spend attributable and instances greppable across projects. This is non-negotiable once you run more than a couple of clusters.
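As a small illustration of the first practice, split points can be generated instead of typed by hand. The sketch below assumes row keys begin with a lowercase hex character (for example a hashed or reversed device-ID prefix); that key layout is an assumption for the example, not something the module enforces.

```hcl
locals {
  # Assumption: row keys start with a lowercase hex character, so the hex
  # digits 1..f make evenly spaced split points (15 splits, 16 key ranges).
  hex_split_keys = [for i in range(1, 16) : format("%x", i)]

  # Table definition shaped for the module's tables input.
  tables = {
    device_events = {
      column_families = ["raw"]
      split_keys      = local.hex_split_keys
      gc_rules        = { raw = { max_age_days = 30 } }
    }
  }
}
```

Passing local.tables as the module's tables input pre-splits the table at creation time. Split keys only apply when a table is first created, so changing them later is not a way to rebalance an existing table.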