Terraform Module: GCP Vertex AI Featurestore — autoscaled online serving with CMEK in one wrapper

Quick take — Provision a GCP Vertex AI Featurestore with Terraform: autoscaled or fixed-node online serving, customer-managed encryption (CMEK), labels and force_destroy, all behind a reusable module. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "google" {
  project = "my-project"
  region  = "us-central1"
}

module "vertex_featurestore" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-featurestore?ref=v1.0.0"

  name   = "..."  # required: Featurestore name (allowed format is listed under Inputs)
  region = "..."  # required: GCP region for the regional Featurestore
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.

What this module is

A Vertex AI Featurestore is GCP’s managed home for ML features — the engineered signals (a user’s 30-day spend, an item’s rolling click-through rate, a device’s fraud score) that both training jobs and live prediction services read. The top-level google_vertex_ai_featurestore resource is the container: it owns the regional storage and, critically, the online serving tier that answers low-latency point lookups at prediction time. Entity types and individual features live underneath it, but the Featurestore is where you make the decisions that cost money and drive latency: whether online serving runs on a fixed node count or autoscales, and whether the data is encrypted with a customer-managed key.

Those are exactly the knobs teams get wrong when they click through the console. A reusable module pins them down. It forces a deliberate choice between fixed_node_count (predictable cost, predictable throughput) and scaling (min/max nodes that follow traffic), wires in CMEK via encryption_spec so the data inherits your org’s key policy, and standardizes labels and the force_destroy flag so a terraform destroy in a sandbox doesn’t get blocked by stray entity types while production stays protected. Wrapping it means every Featurestore across your estate serves online traffic the same way, encrypts the same way, and tags the same way — without anyone re-deriving the right online_serving_config block from the docs each time.
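As a rough sketch of that choice, the two module calls below differ only in their serving inputs. The module labels, Featurestore names, and node counts are hypothetical; the inputs themselves (fixed_node_count, scaling) are the ones this module declares in variables.tf.

```hcl
# Hypothetical example: steady traffic, so capacity is pinned.
module "pricing_features" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-featurestore?ref=v1.0.0"

  name             = "pricing_features_prod" # illustrative name
  region           = "us-central1"
  fixed_node_count = 3 # constant node count; scaling is left at its null default
}

# Hypothetical example: spiky traffic, so capacity follows load.
module "reco_features" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-featurestore?ref=v1.0.0"

  name   = "reco_features_prod" # illustrative name
  region = "us-central1"

  # When scaling is set, the module ignores fixed_node_count.
  scaling = {
    min_node_count = 1
    max_node_count = 5
  }
}
```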

When to use it

You need a low-latency online feature store for real-time inference (fraud scoring, recommendations, dynamic pricing) and want online serving capacity defined as code.
You want autoscaling online serving so node count tracks request volume instead of paying for peak 24/7, or conversely a locked fixed_node_count for steady, predictable workloads.
Your organization mandates CMEK — Featurestore data at rest must be encrypted with a Cloud KMS key you control and can rotate or disable.
You run multiple Featurestores (per team, per region, per environment) and need consistent labels, region placement, and lifecycle behavior across all of them.
You are standing up the classic Featurestore (entity-type / feature model). If you only need the newer Feature Registry + online store (google_vertex_ai_feature_online_store), that is a different resource and a different module.

Module structure

terraform-module-gcp-vertex-featurestore/
├── versions.tf      # provider + Terraform version pins
├── main.tf          # google_vertex_ai_featurestore resource
├── variables.tf     # var-driven inputs with validation
└── outputs.tf       # id, name, region, serving mode, etc.

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

main.tf

locals {
  # At most one online serving mode is active. When neither applies, the
  # online_serving_config block is omitted entirely and the store is
  # offline-only.
  use_scaling = var.online_serving_enabled && var.scaling != null
  use_fixed   = var.online_serving_enabled && var.scaling == null && var.fixed_node_count > 0
  use_online  = local.use_scaling || local.use_fixed
}

resource "google_vertex_ai_featurestore" "this" {
  provider = google

  name   = var.name
  region = var.region
  labels = var.labels

  # If true, `terraform destroy` will delete the Featurestore even when
  # entity types still exist under it. Keep false in production.
  force_destroy = var.force_destroy

  # Emitted only when a serving mode is active, so an offline-only store
  # does not carry an empty online_serving_config block.
  dynamic "online_serving_config" {
    for_each = local.use_online ? [1] : []
    content {
      # Fixed-node path: a constant number of serving nodes.
      fixed_node_count = local.use_fixed ? var.fixed_node_count : null

      # Autoscaling path: nodes float between min and max with load.
      dynamic "scaling" {
        for_each = local.use_scaling ? [var.scaling] : []
        content {
          min_node_count = scaling.value.min_node_count
          max_node_count = scaling.value.max_node_count
        }
      }
    }
  }

  # Customer-managed encryption (CMEK). Omitted entirely when no key is
  # supplied, in which case Google-managed encryption applies.
  dynamic "encryption_spec" {
    for_each = var.kms_key_name != null ? [1] : []
    content {
      kms_key_name = var.kms_key_name
    }
  }

  timeouts {
    create = var.create_timeout
    update = var.update_timeout
    delete = var.delete_timeout
  }
}

variables.tf

variable "name" {
  description = "Name of the Featurestore. Must be unique within the project and region."
  type        = string

  validation {
    condition     = can(regex("^[a-z][a-z0-9_]{0,59}$", var.name))
    error_message = "name must start with a lowercase letter and contain only lowercase letters, digits, and underscores (max 60 chars)."
  }
}

variable "region" {
  description = "GCP region for the Featurestore, e.g. us-central1. Featurestores are regional resources."
  type        = string
}

variable "labels" {
  description = "Key/value labels applied to the Featurestore for cost allocation and ownership."
  type        = map(string)
  default     = {}
}

variable "online_serving_enabled" {
  description = "Whether to provision online serving capacity. When false, the store is offline-only (no node cost)."
  type        = bool
  default     = true
}

variable "fixed_node_count" {
  description = "Number of nodes for fixed-capacity online serving. Used only when scaling is null. Set 0 to disable fixed serving."
  type        = number
  default     = 1

  validation {
    condition     = var.fixed_node_count >= 0 && var.fixed_node_count <= 100
    error_message = "fixed_node_count must be between 0 and 100."
  }
}

variable "scaling" {
  description = "Autoscaling config for online serving. When set, overrides fixed_node_count. Null disables autoscaling."
  type = object({
    min_node_count = number
    max_node_count = number
  })
  default = null

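  validation {
    # A conditional is used instead of `||` because Terraform releases
    # before 1.12 evaluate both operands of `||`, which fails on the
    # attribute lookups when scaling is null (the default).
    condition = var.scaling == null ? true : (
      var.scaling.min_node_count >= 1 &&
      var.scaling.max_node_count >= var.scaling.min_node_count &&
      var.scaling.max_node_count <= 100
    )
    error_message = "scaling requires 1 <= min_node_count <= max_node_count <= 100."
  }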
}

variable "kms_key_name" {
  description = "Full resource ID of a Cloud KMS key for CMEK, e.g. projects/p/locations/us-central1/keyRings/r/cryptoKeys/k. Null uses Google-managed encryption."
  type        = string
  default     = null

  validation {
    condition     = var.kms_key_name == null || can(regex("^projects/.+/locations/.+/keyRings/.+/cryptoKeys/.+$", var.kms_key_name))
    error_message = "kms_key_name must be a full Cloud KMS cryptoKey resource ID or null."
  }
}

variable "force_destroy" {
  description = "If true, allow destroying the Featurestore even when entity types still exist. Keep false in production."
  type        = bool
  default     = false
}

variable "create_timeout" {
  description = "Timeout for create operations."
  type        = string
  default     = "20m"
}

variable "update_timeout" {
  description = "Timeout for update operations."
  type        = string
  default     = "20m"
}

variable "delete_timeout" {
  description = "Timeout for delete operations."
  type        = string
  default     = "20m"
}

outputs.tf

output "id" {
  description = "Fully qualified Featurestore ID (projects/{project}/locations/{region}/featurestores/{name})."
  value       = google_vertex_ai_featurestore.this.id
}

output "name" {
  description = "Short name of the Featurestore."
  value       = google_vertex_ai_featurestore.this.name
}

output "region" {
  description = "Region the Featurestore is deployed in."
  value       = google_vertex_ai_featurestore.this.region
}

output "etag" {
  description = "Used for optimistic concurrency control on updates."
  value       = google_vertex_ai_featurestore.this.etag
}

output "online_serving_mode" {
  description = "Resolved online serving mode: 'autoscaling', 'fixed', or 'offline'."
  value       = local.use_scaling ? "autoscaling" : (local.use_fixed ? "fixed" : "offline")
}

output "kms_key_name" {
  description = "CMEK key in use, or null when Google-managed encryption applies."
  value       = var.kms_key_name
}

How to use it

module "vertex_ai_featurestore" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-featurestore?ref=v1.0.0"

  name   = "fraud_features_prod"
  region = "us-central1"

  # Autoscale online serving between 2 and 8 nodes as traffic rises.
  scaling = {
    min_node_count = 2
    max_node_count = 8
  }

  # Encrypt feature data at rest with our own KMS key.
  kms_key_name = "projects/kv-ml-prod/locations/us-central1/keyRings/vertex/cryptoKeys/featurestore"

  force_destroy = false

  labels = {
    team        = "risk-ml"
    environment = "prod"
    cost-center = "ml-platform"
  }
}

# Downstream: define an entity type inside the Featurestore returned above,
# referencing the module's id output.
resource "google_vertex_ai_featurestore_entitytype" "user" {
  name         = "user"
  featurestore = module.vertex_ai_featurestore.id

  description = "Per-user aggregated fraud signals."
}
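
Individual features hang off the entity type in the same way. The block below is a minimal sketch that builds on the user entity type above; the feature name, value type, and description are illustrative assumptions rather than part of the module.

```hcl
# Hypothetical feature under the "user" entity type defined above.
resource "google_vertex_ai_featurestore_entitytype_feature" "spend_30d" {
  name       = "spend_30d" # illustrative feature name
  entitytype = google_vertex_ai_featurestore_entitytype.user.id
  value_type = "DOUBLE"

  description = "Rolling 30-day spend per user (example feature)."
}
```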

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config — live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "gcs"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...GCS state bucket + per-path prefix...
  }
}
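
The root config above generates only the backend. If the provider is also meant to live in the root, one way to do it is a Terragrunt generate block like the sketch below. The file name provider.tf and the project and region values are assumptions carried over from the Quickstart, not settings the module requires.

```hcl
# Sketch: have Terragrunt write a provider.tf into every module directory.
# Project and region are placeholders taken from the Quickstart.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "google" {
  project = "my-project"
  region  = "us-central1"
}
EOF
}
```

With if_exists set to overwrite_terragrunt, Terragrunt replaces only files it generated itself and stops with an error if a hand-written provider.tf is already present.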

2. Module config — live/prod/vertex_featurestore/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-featurestore?ref=v1.0.0"
}

inputs = {
  name = "..."
  region = "..."
}

3. Deploy this module on its own, or roll out every module in the environment together (each command is run from the repository root):

cd live/prod/vertex_featurestore && terragrunt apply   # this module only
cd live/prod && terragrunt run-all apply               # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
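
To make the per-environment override concrete, a dev counterpart to the prod config might look like the following sketch. The path live/dev/vertex_featurestore/terragrunt.hcl, the Featurestore name, and the label values are hypothetical; only the input names come from this module.

```hcl
# Hypothetical file: live/dev/vertex_featurestore/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-featurestore?ref=v1.0.0"
}

inputs = {
  name   = "fraud_features_dev" # illustrative name
  region = "us-central1"

  # Dev keeps a single fixed node and allows a clean teardown.
  fixed_node_count = 1
  force_destroy    = true

  labels = {
    team        = "risk-ml"
    environment = "dev"
  }
}
```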

Inputs

Name	Type	Default	Required	Description
name	`string`	—	Yes	Featurestore name; lowercase letters, digits, underscores, starts with a letter, max 60 chars.
region	`string`	—	Yes	GCP region for the regional Featurestore (e.g. `us-central1`).
labels	`map(string)`	`{}`	No	Labels for cost allocation and ownership.
online_serving_enabled	`bool`	`true`	No	Provision online serving capacity; `false` makes the store offline-only.
fixed_node_count	`number`	`1`	No	Fixed serving node count (0–100); used only when `scaling` is null.
scaling	`object({ min_node_count, max_node_count })`	`null`	No	Autoscaling bounds for online serving; overrides `fixed_node_count` when set.
kms_key_name	`string`	`null`	No	Full Cloud KMS cryptoKey ID for CMEK; null uses Google-managed encryption.
force_destroy	`bool`	`false`	No	Allow destroy even when entity types exist.
create_timeout	`string`	`"20m"`	No	Timeout for create operations.
update_timeout	`string`	`"20m"`	No	Timeout for update operations.
delete_timeout	`string`	`"20m"`	No	Timeout for delete operations.

Outputs

Name	Description
id	Fully qualified Featurestore ID (`projects/{project}/locations/{region}/featurestores/{name}`).
name	Short name of the Featurestore.
region	Region the Featurestore is deployed in.
etag	Etag for optimistic concurrency control on updates.
online_serving_mode	Resolved serving mode: `autoscaling`, `fixed`, or `offline`.
kms_key_name	CMEK key in use, or null when Google-managed encryption applies.

Enterprise scenario

A digital-payments company runs real-time fraud scoring on every transaction. Their risk-ML team uses this module to stand up fraud_features_prod in us-central1 with autoscaling online serving (2–8 nodes) so it absorbs payday and holiday traffic spikes without manual resizing, and pins a Cloud KMS key via kms_key_name to satisfy PCI-driven encryption requirements. The same module, with online_serving_enabled = false and force_destroy = true, provisions a throwaway offline-only store in their sandbox project for feature backfill experiments — same code, zero online node cost, and a clean teardown.
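
The sandbox half of that scenario could be expressed roughly as below. The module label, Featurestore name, and label values are made up for illustration; online_serving_enabled and force_destroy are the module inputs the scenario relies on.

```hcl
# Sketch of the throwaway sandbox store described above.
module "fraud_features_sandbox" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-featurestore?ref=v1.0.0"

  name   = "fraud_features_sandbox" # illustrative name
  region = "us-central1"

  online_serving_enabled = false # offline-only: no online serving nodes
  force_destroy          = true  # allow destroy even if entity types remain

  labels = {
    team        = "risk-ml"
    environment = "sandbox"
  }
}
```

With these inputs the module's online_serving_mode output resolves to "offline".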

Best practices

Choose scaling vs. fixed deliberately. Use scaling for spiky inference traffic so you don’t pay for peak nodes around the clock; use fixed_node_count only when load is steady and you want a hard throughput ceiling. The module only ever passes one of the two to the resource: when scaling is set it wins, and fixed_node_count (default 1) is ignored, so there is no need to zero it out.
Always set CMEK in regulated environments. Pass a kms_key_name so feature data inherits your key rotation and revocation policy; granting the Vertex AI service agent roles/cloudkms.cryptoKeyEncrypterDecrypter on that key is a prerequisite, so provision the IAM binding before the Featurestore.
Keep force_destroy = false in production. It is a deliberate guardrail — destroying a Featurestore with live entity types wipes every feature value. Only flip it on in disposable sandbox or CI projects.
Right-size — and zero out — offline-only stores. If a store is purely for batch/training reads, set online_serving_enabled = false; online nodes are billed per node-hour whether queried or not, so an idle online tier is pure waste.
Standardize names and labels. Use a {domain}_{purpose}_{env} convention (e.g. fraud_features_prod) and mandatory team / environment / cost-center labels so Featurestores are attributable in billing exports and easy to find across regions.
Co-locate the Featurestore with consumers. Place it in the same region as the prediction service and source BigQuery datasets to cut online-serving latency and avoid cross-region egress on ingestion.
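
The CMEK prerequisite above can be wired into the same configuration. The sketch below assumes the Featurestore lives in the provider's default project, that the Vertex AI service agent follows the usual service-PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com pattern, and that the key is the one from the usage example; the resource labels are hypothetical.

```hcl
# Looks up the project number of the provider's default project.
data "google_project" "current" {}

# Grants the Vertex AI service agent use of the CMEK key (assumed key ID
# taken from the usage example earlier in this document).
resource "google_kms_crypto_key_iam_member" "vertex_cmek" {
  crypto_key_id = "projects/kv-ml-prod/locations/us-central1/keyRings/vertex/cryptoKeys/featurestore"
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member        = "serviceAccount:service-${data.google_project.current.number}@gcp-sa-aiplatform.iam.gserviceaccount.com"
}

module "vertex_ai_featurestore" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-featurestore?ref=v1.0.0"

  name         = "fraud_features_prod"
  region       = "us-central1"
  kms_key_name = "projects/kv-ml-prod/locations/us-central1/keyRings/vertex/cryptoKeys/featurestore"

  # Ensures the IAM binding exists before the Featurestore is created.
  depends_on = [google_kms_crypto_key_iam_member.vertex_cmek]
}
```

The explicit depends_on is there because passing the key name as a string creates no implicit dependency on the IAM binding. If the key lives in a different project from the Featurestore, the project number should still be that of the Featurestore's project, since that is where the service agent belongs.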