Quick take — Provision a GCP Vertex AI Featurestore with Terraform: autoscaled or fixed-node online serving, customer-managed encryption (CMEK), labels and force_destroy, all behind a reusable module. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "google" {
  project = "my-project"
  region  = "us-central1"
}
module "vertex_featurestore" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-featurestore?ref=v1.0.0"
  name   = "..." # Featurestore name; lowercase letters, digits, underscor…
  region = "..." # GCP region for the regional Featurestore (e.g. `us-cent…
}
Then run terraform init && terraform apply. Every other input has a sensible default; see Inputs below to override behaviour.
What this module is
A Vertex AI Featurestore is GCP’s managed home for ML features — the engineered signals (a user’s 30-day spend, an item’s rolling click-through rate, a device’s fraud score) that both training jobs and live prediction services read. The top-level google_vertex_ai_featurestore resource is the container: it owns the regional storage and, critically, the online serving tier that answers low-latency point lookups at prediction time. Entity types and individual features live underneath it, but the Featurestore is where you make the decisions that cost money and drive latency: whether online serving runs on a fixed node count or autoscales, and whether the data is encrypted with a customer-managed key.
Those are exactly the knobs teams get wrong when they click through the console. A reusable module pins them down. It forces a deliberate choice between fixed_node_count (predictable cost, predictable throughput) and scaling (min/max nodes that follow traffic), wires in CMEK via encryption_spec so the data inherits your org’s key policy, and standardizes labels and the force_destroy flag so a terraform destroy in a sandbox doesn’t get blocked by stray entity types while production stays protected. Wrapping it means every Featurestore across your estate serves online traffic the same way, encrypts the same way, and tags the same way — without anyone re-deriving the right online_serving_config block from the docs each time.
When to use it
- You need a low-latency online feature store for real-time inference (fraud scoring, recommendations, dynamic pricing) and want online serving capacity defined as code.
- You want autoscaling online serving so node count tracks request volume instead of paying for peak 24/7, or conversely a locked fixed_node_count for steady, predictable workloads.
- Your organization mandates CMEK: Featurestore data at rest must be encrypted with a Cloud KMS key you control and can rotate or disable.
- You run multiple Featurestores (per team, per region, per environment) and need consistent labels, region placement, and lifecycle behavior across all of them.
- You are standing up the classic Featurestore (entity-type / feature model). If you only need the newer Feature Registry + online store (google_vertex_ai_feature_online_store), that is a different resource and a different module.
Module structure
terraform-module-gcp-vertex-featurestore/
├── versions.tf # provider + Terraform version pins
├── main.tf # google_vertex_ai_featurestore resource
├── variables.tf # var-driven inputs with validation
└── outputs.tf # id, name, region, serving mode, etc.
versions.tf
terraform {
  required_version = ">= 1.5.0"
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}
main.tf
locals {
  # At most one online serving mode is active. When online serving is
  # disabled (or fixed_node_count is 0 and scaling is null), neither is,
  # the online_serving_config block is omitted, and the store is offline-only.
  use_scaling = var.online_serving_enabled && var.scaling != null
  use_fixed   = var.online_serving_enabled && var.scaling == null && var.fixed_node_count > 0
}
resource "google_vertex_ai_featurestore" "this" {
  provider = google
  name     = var.name
  region   = var.region
  labels   = var.labels
  # If true, `terraform destroy` will delete the Featurestore even when
  # entity types still exist under it. Keep false in production.
  force_destroy = var.force_destroy
  # Emitted only when a serving mode is active. An empty block may be
  # rejected by the provider, which expects fixed_node_count or scaling.
  dynamic "online_serving_config" {
    for_each = (local.use_scaling || local.use_fixed) ? [1] : []
    content {
      # Fixed-node path: a constant number of serving nodes.
      fixed_node_count = local.use_fixed ? var.fixed_node_count : null
      # Autoscaling path: nodes float between min and max with load.
      dynamic "scaling" {
        for_each = local.use_scaling ? [var.scaling] : []
        content {
          min_node_count = scaling.value.min_node_count
          max_node_count = scaling.value.max_node_count
        }
      }
    }
  }
  # Customer-managed encryption (CMEK). Omitted entirely when no key is
  # supplied, in which case Google-managed encryption applies.
  dynamic "encryption_spec" {
    for_each = var.kms_key_name != null ? [1] : []
    content {
      kms_key_name = var.kms_key_name
    }
  }
  timeouts {
    create = var.create_timeout
    update = var.update_timeout
    delete = var.delete_timeout
  }
}
variables.tf
variable "name" {
  description = "Name of the Featurestore. Must be unique within the project and region."
  type        = string
  validation {
    condition     = can(regex("^[a-z][a-z0-9_]{0,59}$", var.name))
    error_message = "name must start with a lowercase letter and contain only lowercase letters, digits, and underscores (max 60 chars)."
  }
}
variable "region" {
  description = "GCP region for the Featurestore, e.g. us-central1. Featurestores are regional resources."
  type        = string
}
variable "labels" {
  description = "Key/value labels applied to the Featurestore for cost allocation and ownership."
  type        = map(string)
  default     = {}
}
variable "online_serving_enabled" {
  description = "Whether to provision online serving capacity. When false, the store is offline-only (no node cost)."
  type        = bool
  default     = true
}
variable "fixed_node_count" {
  description = "Number of nodes for fixed-capacity online serving. Used only when scaling is null. Set 0 to disable fixed serving."
  type        = number
  default     = 1
  validation {
    condition     = var.fixed_node_count >= 0 && var.fixed_node_count <= 100
    error_message = "fixed_node_count must be between 0 and 100."
  }
}
variable "scaling" {
  description = "Autoscaling config for online serving. When set, overrides fixed_node_count. Null disables autoscaling."
  type = object({
    min_node_count = number
    max_node_count = number
  })
  default = null
  validation {
    # Conditional form avoids dereferencing a null object on Terraform
    # versions whose || operator does not short-circuit.
    condition = var.scaling == null ? true : (
      var.scaling.min_node_count >= 1 &&
      var.scaling.max_node_count >= var.scaling.min_node_count &&
      var.scaling.max_node_count <= 100
    )
    error_message = "scaling requires 1 <= min_node_count <= max_node_count <= 100."
  }
}
variable "kms_key_name" {
  description = "Full resource ID of a Cloud KMS key for CMEK, e.g. projects/p/locations/us-central1/keyRings/r/cryptoKeys/k. Null uses Google-managed encryption."
  type        = string
  default     = null
  validation {
    condition     = var.kms_key_name == null || can(regex("^projects/.+/locations/.+/keyRings/.+/cryptoKeys/.+$", var.kms_key_name))
    error_message = "kms_key_name must be a full Cloud KMS cryptoKey resource ID or null."
  }
}
variable "force_destroy" {
  description = "If true, allow destroying the Featurestore even when entity types still exist. Keep false in production."
  type        = bool
  default     = false
}
variable "create_timeout" {
  description = "Timeout for create operations."
  type        = string
  default     = "20m"
}
variable "update_timeout" {
  description = "Timeout for update operations."
  type        = string
  default     = "20m"
}
variable "delete_timeout" {
  description = "Timeout for delete operations."
  type        = string
  default     = "20m"
}
outputs.tf
output "id" {
  description = "Fully qualified Featurestore ID (projects/{project}/locations/{region}/featurestores/{name})."
  value       = google_vertex_ai_featurestore.this.id
}
output "name" {
  description = "Short name of the Featurestore."
  value       = google_vertex_ai_featurestore.this.name
}
output "region" {
  description = "Region the Featurestore is deployed in."
  value       = google_vertex_ai_featurestore.this.region
}
output "etag" {
  description = "Used for optimistic concurrency control on updates."
  value       = google_vertex_ai_featurestore.this.etag
}
output "online_serving_mode" {
  description = "Resolved online serving mode: 'autoscaling', 'fixed', or 'offline'."
  value       = local.use_scaling ? "autoscaling" : (local.use_fixed ? "fixed" : "offline")
}
output "kms_key_name" {
  description = "CMEK key in use, or null when Google-managed encryption applies."
  value       = var.kms_key_name
}
How to use it
module "vertex_ai_featurestore" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-featurestore?ref=v1.0.0"
  name   = "fraud_features_prod"
  region = "us-central1"
  # Autoscale online serving between 2 and 8 nodes as traffic rises.
  scaling = {
    min_node_count = 2
    max_node_count = 8
  }
  # Encrypt feature data at rest with our own KMS key.
  kms_key_name  = "projects/kv-ml-prod/locations/us-central1/keyRings/vertex/cryptoKeys/featurestore"
  force_destroy = false
  labels = {
    team        = "risk-ml"
    environment = "prod"
    cost-center = "ml-platform"
  }
}
# Downstream: define an entity type inside the Featurestore returned above,
# referencing the module's id output.
resource "google_vertex_ai_featurestore_entitytype" "user" {
  name         = "user"
  featurestore = module.vertex_ai_featurestore.id
  description  = "Per-user aggregated fraud signals."
}
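For a steady workload, the same module can be pinned to a fixed node count instead of autoscaling. The following is a minimal sketch under stated assumptions, not part of the original configuration: the module label featurestore_fixed, the store name, the node count of 3, the label values, and the output name are all illustrative.

```hcl
# Hypothetical fixed-capacity variant; names and values are illustrative.
module "featurestore_fixed" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-featurestore?ref=v1.0.0"

  name   = "pricing_features_prod" # assumed name
  region = "us-central1"

  # Leave `scaling` unset (null) so the module takes the fixed-node path.
  fixed_node_count = 3

  labels = {
    team        = "pricing-ml" # assumed label values
    environment = "prod"
  }
}

# Surfaces which path the module resolved. With scaling unset and a
# positive fixed_node_count, the module's locals resolve to "fixed".
output "pricing_featurestore_serving_mode" {
  value = module.featurestore_fixed.online_serving_mode
}
```

Because scaling takes precedence inside the module, adding a scaling block here later would switch the store to autoscaling without touching fixed_node_count.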
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
  backend  = "gcs"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...GCS state bucket + per-path prefix...
  }
}
2. Module config — live/prod/vertex_featurestore/terragrunt.hcl:
include "root" {
  path = find_in_parent_folders()
}
terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-featurestore?ref=v1.0.0"
}
inputs = {
  name   = "..."
  region = "..."
}
3. Deploy one environment, or roll out all modules together:
cd live/prod/vertex_featurestore && terragrunt apply # this module only
terragrunt run-all apply # run from live/prod instead: every module under it
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
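To make the per-environment override concrete, here is a sketch of what a dev counterpart to the prod config might look like. The path live/dev/vertex_featurestore/terragrunt.hcl, the store name, and the label values are assumptions for illustration; only the input names come from the module.

```hcl
# live/dev/vertex_featurestore/terragrunt.hcl (hypothetical path)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-featurestore?ref=v1.0.0"
}

inputs = {
  name   = "fraud_features_dev" # assumed name
  region = "us-central1"

  # Dev differs from prod only in these inputs: one fixed node instead of
  # autoscaling, and teardown allowed even if entity types exist.
  fixed_node_count = 1
  force_destroy    = true

  labels = {
    team        = "risk-ml"
    environment = "dev"
  }
}
```

The module source and backend wiring stay identical across environments; only the inputs map changes.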
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| name | string | n/a | Yes | Featurestore name; lowercase letters, digits, underscores, starts with a letter, max 60 chars. |
| region | string | n/a | Yes | GCP region for the regional Featurestore (e.g. us-central1). |
| labels | map(string) | {} | No | Labels for cost allocation and ownership. |
| online_serving_enabled | bool | true | No | Provision online serving capacity; false makes the store offline-only. |
| fixed_node_count | number | 1 | No | Fixed serving node count (0–100); used only when scaling is null. |
| scaling | object({ min_node_count, max_node_count }) | null | No | Autoscaling bounds for online serving; overrides fixed_node_count when set. |
| kms_key_name | string | null | No | Full Cloud KMS cryptoKey ID for CMEK; null uses Google-managed encryption. |
| force_destroy | bool | false | No | Allow destroy even when entity types exist. |
| create_timeout | string | "20m" | No | Timeout for create operations. |
| update_timeout | string | "20m" | No | Timeout for update operations. |
| delete_timeout | string | "20m" | No | Timeout for delete operations. |
Outputs
| Name | Description |
|---|---|
| id | Fully qualified Featurestore ID (projects/{project}/locations/{region}/featurestores/{name}). |
| name | Short name of the Featurestore. |
| region | Region the Featurestore is deployed in. |
| etag | Etag for optimistic concurrency control on updates. |
| online_serving_mode | Resolved serving mode: autoscaling, fixed, or offline. |
| kms_key_name | CMEK key in use, or null when Google-managed encryption applies. |
Enterprise scenario
A digital-payments company runs real-time fraud scoring on every transaction. Their risk-ML team uses this module to stand up fraud_features_prod in us-central1 with autoscaling online serving (2–8 nodes) so it absorbs payday and holiday traffic spikes without manual resizing, and pins a Cloud KMS key via kms_key_name to satisfy PCI-driven encryption requirements. The same module, with online_serving_enabled = false and force_destroy = true, provisions a throwaway offline-only store in their sandbox project for feature backfill experiments — same code, zero online node cost, and a clean teardown.
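The sandbox half of that scenario could be expressed roughly as follows. This is a sketch under assumptions: the module label featurestore_sandbox, the store name, and the label values are hypothetical; the two toggles are the ones named in the scenario.

```hcl
# Hypothetical sandbox instance of the same module.
module "featurestore_sandbox" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-featurestore?ref=v1.0.0"

  name   = "fraud_features_sandbox" # assumed name
  region = "us-central1"

  # Offline-only: no online serving nodes are provisioned.
  online_serving_enabled = false

  # Allow `terraform destroy` to remove the store even if backfill
  # experiments left entity types behind.
  force_destroy = true

  labels = {
    team        = "risk-ml"
    environment = "sandbox"
  }
}
```

No kms_key_name is passed here, so the sandbox store would fall back to Google-managed encryption; pass a key if sandbox data is also in scope for your encryption policy.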
Best practices
- Choose scaling vs. fixed deliberately. Use scaling for spiky inference traffic so you don’t pay for peak nodes around the clock; use fixed_node_count only when load is steady and you want a hard throughput ceiling. Don’t set both: if you do, the module’s locals make scaling win and fixed_node_count is ignored.
- Always set CMEK in regulated environments. Pass a kms_key_name so feature data inherits your key rotation and revocation policy. Granting the Vertex AI service agent roles/cloudkms.cryptoKeyEncrypterDecrypter on that key is a prerequisite, so provision the IAM binding before the Featurestore.
- Keep force_destroy = false in production. It is a deliberate guardrail: destroying a Featurestore with live entity types wipes every feature value. Only flip it on in disposable sandbox or CI projects.
- Right-size, and zero out, offline-only stores. If a store is purely for batch/training reads, set online_serving_enabled = false; online nodes are billed per node-hour whether queried or not, so an idle online tier is pure waste.
- Standardize names and labels. Use a {domain}_{purpose}_{env} convention (e.g. fraud_features_prod) and mandatory team / environment / cost-center labels so Featurestores are attributable in billing exports and easy to find across regions.
- Co-locate the Featurestore with consumers. Place it in the same region as the prediction service and source BigQuery datasets to cut online-serving latency and avoid cross-region egress on ingestion.
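The CMEK prerequisite above can be wired so the IAM binding is created before the Featurestore. The following is a sketch, not the module's own code. It assumes the google provider's default project is the one hosting the Featurestore, that the Vertex AI service agent already exists in that project, and that the agent follows the service-PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com naming. The local name, resource labels, and module label are hypothetical.

```hcl
locals {
  # Key ID reused from the usage example; substitute your own key.
  featurestore_kms_key = "projects/kv-ml-prod/locations/us-central1/keyRings/vertex/cryptoKeys/featurestore"
}

# Looks up the provider's default project to obtain its project number.
data "google_project" "current" {}

# Lets the Vertex AI service agent encrypt and decrypt with the key.
resource "google_kms_crypto_key_iam_member" "vertex_cmek" {
  crypto_key_id = local.featurestore_kms_key
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member        = "serviceAccount:service-${data.google_project.current.number}@gcp-sa-aiplatform.iam.gserviceaccount.com"
}

module "featurestore_cmek" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-featurestore?ref=v1.0.0"

  name         = "fraud_features_prod"
  region       = "us-central1"
  kms_key_name = local.featurestore_kms_key

  labels = {
    team        = "risk-ml"
    environment = "prod"
    cost-center = "ml-platform"
  }

  # Forces the IAM binding to exist before the Featurestore is created.
  depends_on = [google_kms_crypto_key_iam_member.vertex_cmek]
}
```

The explicit depends_on is needed because the module only receives the key ID as a string, so Terraform cannot infer the ordering from references alone.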