Terraform Module: GCP Vertex AI — Reproducible, governed model-serving endpoints

Quick take: build production-ready Vertex AI prediction endpoints with Terraform, including private serving via Private Service Connect (PSC), CMEK encryption, and clean outputs you can wire into downstream IaC, while the deployed models and their traffic split stay with your CD pipeline. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "google" {
  project = "my-project"
  region  = "us-central1"
}

# The module's service-identity resource runs on the google-beta provider.
provider "google-beta" {
  project = "my-project"
  region  = "us-central1"
}

module "vertex_ai" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-ai?ref=v1.0.0"

  project_id  = "..."  # GCP project ID that will host the Vertex AI endpoint.
  region      = "..."  # Region for the endpoint, e.g. us-central1 or asia-south1.
  environment = "..."  # Deployment environment: dev, stage, or prod.
  name_prefix = "..."  # Lowercase prefix for the endpoint display name, e.g. fraud-scoring.
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.

What this module is

Vertex AI is Google Cloud’s managed ML platform. The google_vertex_ai_endpoint resource provisions a prediction endpoint — a stable, regional serving target that exposes a public or private REST/gRPC URL. You upload a trained model into the Model Registry, deploy it onto an endpoint (which spins up a dedicated machine pool behind the scenes), and clients call the endpoint with :predict / :rawPredict requests. The endpoint decouples the address clients depend on from the model version answering the request, so you can roll out a new model behind the same URL.

The endpoint resource itself is deceptively small, but the production decisions around it are not: do you serve over the public internet or pin it to a VPC via Private Service Connect (PSC); are payloads encrypted with a customer-managed key (CMEK); which dedicated PSC consumer projects are allowed to reach a private endpoint; and how do you keep Terraform from fighting the live traffic split that your CD pipeline mutates during canary rollouts. Wrapping all of that in a reusable module gives every team a consistent, governed, auditable way to stand up an inference endpoint without re-deriving the PSC + CMEK + IAM wiring each time.

This module owns the endpoint and its network/encryption posture. It deliberately leaves which model is deployed (and the canary traffic percentages) to your delivery pipeline, because those change far more often than the endpoint’s identity and should not trigger Terraform drift on every deploy.

When to use it

You need a stable inference URL that survives model retraining and version bumps, with the model swap handled out-of-band by CD.
Your security baseline requires private-only serving (no public endpoint) reachable through Private Service Connect from a known set of consumer projects.
Compliance mandates CMEK so prediction request/response data and the deployed model artifacts are encrypted with a key you control in Cloud KMS.
You run multiple environments (dev/stage/prod) or many model families and want one parameterized module instead of copy-pasted endpoint blocks.
You want the resource skeleton (endpoint + IAM + service identity) under Terraform while keeping live traffic-split mutations outside of it.

If you only ever click “Deploy to endpoint” in the console for a one-off experiment, the module is overkill — reach for it once an endpoint becomes a dependency something else relies on.

Module structure

terraform-module-gcp-vertex-ai/
├── versions.tf      # provider + required version pinning
├── main.tf          # endpoint, service-identity, PSC/CMEK wiring, IAM
├── variables.tf     # var-driven inputs with validation
└── outputs.tf       # id, name, endpoint URL, network config

versions.tf

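terraform {
  required_version = ">= 1.5.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    # Required by google_project_service_identity, which is beta-only.
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 5.0"
    }
  }
}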

main.tf

locals {
  # Vertex AI endpoint display names must be <= 128 chars; we keep a tidy,
  # environment-qualified name so endpoints are recognisable in the console.
  endpoint_display_name = "${var.name_prefix}-${var.environment}-endpoint"

  # PSC is mutually exclusive with public/automatic networking. We only emit a
  # PSC block when private serving is requested.
  use_psc = var.private_service_connect != null
}

# Ensure the Vertex AI service identity exists in this project. The CMEK key
# must be granted to THIS service account, so we surface it for the key binding.
resource "google_project_service_identity" "vertex" {
  provider = google-beta

  project = var.project_id
  service = "aiplatform.googleapis.com"
}

# Grant the Vertex AI service agent encrypt/decrypt on the CMEK key so it can
# encrypt model artifacts and online-prediction data at rest.
resource "google_kms_crypto_key_iam_member" "vertex_cmek" {
  count = var.encryption_kms_key_name != null ? 1 : 0

  crypto_key_id = var.encryption_kms_key_name
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member        = "serviceAccount:${google_project_service_identity.vertex.email}"
}

resource "google_vertex_ai_endpoint" "this" {
  provider = google

  # name is the short resource ID segment; display_name is the human label.
  name         = var.endpoint_id
  display_name = local.endpoint_display_name
  description  = var.description
  location     = var.region
  project      = var.project_id

  # Customer-managed encryption for model + online prediction data at rest.
  dynamic "encryption_spec" {
    for_each = var.encryption_kms_key_name != null ? [1] : []
    content {
      kms_key_name = var.encryption_kms_key_name
    }
  }

  # ---- Networking: pick exactly ONE serving posture ----

  # (a) Legacy VPC peering: a fully-qualified network the endpoint peers into.
  network = var.peered_network

  # (b) Private Service Connect: private serving with explicit consumer allow-list.
  dynamic "private_service_connect_config" {
    for_each = local.use_psc ? [var.private_service_connect] : []
    content {
      enable_private_service_connect = true
      project_allowlist              = private_service_connect_config.value.project_allowlist
    }
  }

  labels = merge(
    {
      environment = var.environment
      managed-by  = "terraform"
      component   = "vertex-ai-endpoint"
    },
    var.labels,
  )

  # The deployed model + traffic split are mutated by the CD pipeline at deploy
  # time. Ignore them here so routine canary rollouts never show as drift.
  lifecycle {
    ignore_changes = [
      deployed_models,
      traffic_split,
    ]

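    # The endpoint supports one private-serving mode at a time, so fail at plan
    # time if both are configured. This does NOT prevent a destroy/recreate:
    # network and CMEK settings are fixed at create time, and changing them
    # replaces the endpoint (and with it the stable serving URL).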
    precondition {
      condition     = !(var.peered_network != null && local.use_psc)
      error_message = "Set either peered_network OR private_service_connect, not both."
    }
  }

  depends_on = [google_kms_crypto_key_iam_member.vertex_cmek]
}

# Grant caller identities permission to invoke predictions on this endpoint.
resource "google_vertex_ai_endpoint_iam_member" "predictors" {
  for_each = toset(var.predictor_members)

  provider = google

  project  = var.project_id
  location = var.region
  endpoint = google_vertex_ai_endpoint.this.name
  role     = "roles/aiplatform.user"
  member   = each.value
}

variables.tf

variable "project_id" {
  type        = string
  description = "GCP project ID that will host the Vertex AI endpoint."
}

variable "region" {
  type        = string
  description = "Region for the endpoint (e.g. us-central1, europe-west4, asia-south1)."

  validation {
    condition     = can(regex("^[a-z]+-[a-z]+[0-9]+$", var.region))
    error_message = "region must be a valid GCP region like us-central1 or asia-south1."
  }
}

variable "environment" {
  type        = string
  description = "Deployment environment, used in the display name and labels."

  validation {
    condition     = contains(["dev", "stage", "prod"], var.environment)
    error_message = "environment must be one of: dev, stage, prod."
  }
}

variable "name_prefix" {
  type        = string
  description = "Short prefix for the endpoint display name (e.g. fraud-scoring)."

  validation {
    condition     = can(regex("^[a-z][a-z0-9-]{1,40}$", var.name_prefix))
    error_message = "name_prefix must be lowercase alphanumeric/hyphen, 2-41 chars, starting with a letter."
  }
}

variable "endpoint_id" {
  type        = string
  description = "Optional user-specified resource ID for the endpoint. If null, GCP auto-generates a numeric ID."
  default     = null

  validation {
    condition     = var.endpoint_id == null || can(regex("^[a-z0-9-]{1,63}$", coalesce(var.endpoint_id, "x")))
    error_message = "endpoint_id must be lowercase alphanumeric/hyphen, up to 63 chars."
  }
}

variable "description" {
  type        = string
  description = "Human-readable description for the endpoint."
  default     = "Managed by Terraform — Vertex AI online prediction endpoint."
}

variable "encryption_kms_key_name" {
  type        = string
  description = "Fully-qualified Cloud KMS CryptoKey ID for CMEK (projects/.../locations/.../keyRings/.../cryptoKeys/...). Null = Google-managed encryption."
  default     = null

  validation {
    condition     = var.encryption_kms_key_name == null || can(regex("^projects/[^/]+/locations/[^/]+/keyRings/[^/]+/cryptoKeys/[^/]+$", var.encryption_kms_key_name))
    error_message = "encryption_kms_key_name must be a fully-qualified Cloud KMS CryptoKey resource ID."
  }
}

variable "peered_network" {
  type        = string
  description = "Fully-qualified VPC network (projects/{number}/global/networks/{name}) for legacy private serving via VPC peering. Mutually exclusive with private_service_connect."
  default     = null
}

variable "private_service_connect" {
  type = object({
    project_allowlist = list(string)
  })
  description = "Enable Private Service Connect serving. project_allowlist is the set of consumer project IDs/numbers allowed to create PSC endpoints to this service. Mutually exclusive with peered_network."
  default     = null

  validation {
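    # Terraform's || does not reliably short-circuit, so use a conditional to
    # avoid reading an attribute of a null object.
    condition     = var.private_service_connect == null ? true : length(var.private_service_connect.project_allowlist) > 0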
    error_message = "private_service_connect.project_allowlist must contain at least one consumer project."
  }
}

variable "predictor_members" {
  type        = list(string)
  description = "IAM members (user:, serviceAccount:, group:) to grant roles/aiplatform.user for invoking predictions."
  default     = []
}

variable "labels" {
  type        = map(string)
  description = "Additional labels merged onto the endpoint."
  default     = {}
}

outputs.tf

output "id" {
  description = "Fully-qualified endpoint resource ID (projects/.../locations/.../endpoints/...)."
  value       = google_vertex_ai_endpoint.this.id
}

output "name" {
  description = "Short numeric/string resource name of the endpoint (the segment used in API paths)."
  value       = google_vertex_ai_endpoint.this.name
}

output "display_name" {
  description = "Human-readable display name shown in the Vertex AI console."
  value       = google_vertex_ai_endpoint.this.display_name
}

output "region" {
  description = "Region the endpoint lives in — handy for building the regional API host."
  value       = google_vertex_ai_endpoint.this.location
}

output "predict_uri" {
  description = "Regional public REST URI for :predict calls. With Private Service Connect, clients call through their consumer-side PSC endpoint instead of this host."
  value       = "https://${google_vertex_ai_endpoint.this.location}-aiplatform.googleapis.com/v1/${google_vertex_ai_endpoint.this.id}:predict"
}

output "service_attachment" {
  description = "PSC service attachment URI (only set when Private Service Connect is enabled); use it to create consumer-side PSC endpoints."
  value       = try(google_vertex_ai_endpoint.this.private_service_connect_config[0].service_attachment, null)
}

output "vertex_service_agent_email" {
  description = "Email of the Vertex AI service identity (the SA that must hold CMEK encrypt/decrypt)."
  value       = google_project_service_identity.vertex.email
}

How to use it

module "vertex_ai" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-ai?ref=v1.0.0"

  project_id  = "kv-ml-prod-7421"
  region      = "asia-south1"
  environment = "prod"
  name_prefix = "fraud-scoring"
  endpoint_id = "fraud-scoring-rt"

  # CMEK so prediction payloads + model artifacts are encrypted with our key.
  encryption_kms_key_name = "projects/kv-ml-prod-7421/locations/asia-south1/keyRings/vertex/cryptoKeys/endpoint-cmek"

  # Private-only serving: only these consumer projects may reach the endpoint.
  private_service_connect = {
    project_allowlist = [
      "kv-payments-prod-3310",
      "kv-platform-shared-0098",
    ]
  }

  # The online-scoring service account is allowed to call :predict.
  predictor_members = [
    "serviceAccount:scoring-svc@kv-payments-prod-3310.iam.gserviceaccount.com",
  ]

  labels = {
    team        = "risk-ml"
    cost-center = "ml-platform"
  }
}

# Downstream: a Cloud Run scoring service receives the stable endpoint ID and
# region as env vars so it never hard-codes the inference URL.
resource "google_cloud_run_v2_service" "scorer" {
  name     = "txn-scorer"
  location = "asia-south1"
  project  = "kv-payments-prod-3310"

  template {
    containers {
      image = "asia-south1-docker.pkg.dev/kv-payments-prod-3310/svc/txn-scorer:1.8.2"

      env {
        name  = "VERTEX_ENDPOINT_ID"
        value = module.vertex_ai.name
      }
      env {
        name  = "VERTEX_REGION"
        value = module.vertex_ai.region
      }
      env {
        name  = "VERTEX_PREDICT_URI"
        value = module.vertex_ai.predict_uri
      }
    }
  }
}
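Because this example enables Private Service Connect, the payments project also needs a consumer-side PSC endpoint before the Cloud Run service can reach the model; that is what the service_attachment output is for. The following is a minimal sketch under stated assumptions: the subnet and VPC names (apps-asia-south1, apps-vpc) are hypothetical, Cloud Run is assumed to have VPC egress into that network, and the attachment must already be populated. Vertex AI may only report it once a model has been deployed, in which case apply these resources after the first CD rollout.

```hcl
# Reserve an internal IP in the consumer VPC for the PSC endpoint.
# The subnetwork path is hypothetical; replace it with your own.
resource "google_compute_address" "vertex_psc" {
  project      = "kv-payments-prod-3310"
  region       = "asia-south1"
  name         = "vertex-fraud-scoring-psc"
  address_type = "INTERNAL"
  subnetwork   = "projects/kv-payments-prod-3310/regions/asia-south1/subnetworks/apps-asia-south1"
}

# A PSC endpoint is a forwarding rule whose target is the producer's service
# attachment; load_balancing_scheme must be empty for PSC endpoints.
resource "google_compute_forwarding_rule" "vertex_psc" {
  project               = "kv-payments-prod-3310"
  region                = "asia-south1"
  name                  = "vertex-fraud-scoring-psc"
  network               = "projects/kv-payments-prod-3310/global/networks/apps-vpc" # hypothetical VPC
  ip_address            = google_compute_address.vertex_psc.id
  target                = module.vertex_ai.service_attachment
  load_balancing_scheme = ""
}
```

Clients in the consumer VPC would then send :predict requests to the reserved internal IP rather than to the public regional host.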

With Terragrunt

Terragrunt keeps this module DRY across environments: define the remote-state backend (and, optionally, a generated provider block) once in a root config, then give each environment a thin terragrunt.hcl that supplies only the inputs that differ.

1. Root config — live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "gcs"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...GCS state bucket + a state prefix per module path...
  }
}

2. Module config — live/prod/vertex_ai/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-ai?ref=v1.0.0"
}

inputs = {
  project_id  = "..."
  region      = "..."
  environment = "..."
  name_prefix = "..."
}

3. Deploy one environment, or roll out all modules together:

cd live/prod/vertex_ai && terragrunt apply    # this module only
cd live/prod && terragrunt run-all apply      # every module under live/prod

Why Terragrunt here: the backend (and, if you generate it in the root config, the provider) lives in one place instead of being copy-pasted into every module; the inputs map is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules. For a single stack, the plain Quickstart above is enough.
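The root config in step 1 only generates the backend. If you also want the provider defined once, a Terragrunt generate block in the same live/terragrunt.hcl can write it into every module's working directory. A sketch, with project and region left as placeholders; it configures google-beta as well because the module's google_project_service_identity resource uses that provider.

```hcl
# live/terragrunt.hcl (root), next to the remote_state block.
# Writes provider.tf into each module's working directory at run time.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "google" {
  project = "..."
  region  = "..."
}

provider "google-beta" {
  project = "..."
  region  = "..."
}
EOF
}
```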

Inputs

Name	Type	Default	Required	Description
`project_id`	`string`	—	Yes	GCP project ID that will host the Vertex AI endpoint.
`region`	`string`	—	Yes	Region for the endpoint (e.g. `us-central1`, `asia-south1`); validated against the GCP region pattern.
`environment`	`string`	—	Yes	Deployment environment; must be `dev`, `stage`, or `prod`. Used in display name and labels.
`name_prefix`	`string`	—	Yes	Lowercase prefix for the endpoint display name (e.g. `fraud-scoring`).
`endpoint_id`	`string`	`null`	No	Optional user-specified resource ID; if null, GCP auto-generates a numeric ID.
`description`	`string`	`"Managed by Terraform — Vertex AI online prediction endpoint."`	No	Human-readable description for the endpoint.
`encryption_kms_key_name`	`string`	`null`	No	Fully-qualified Cloud KMS CryptoKey ID for CMEK; null means Google-managed encryption.
`peered_network`	`string`	`null`	No	Fully-qualified VPC network for legacy private serving via VPC peering. Mutually exclusive with `private_service_connect`.
`private_service_connect`	`object({ project_allowlist = list(string) })`	`null`	No	Enable PSC private serving with a consumer-project allow-list. Mutually exclusive with `peered_network`.
`predictor_members`	`list(string)`	`[]`	No	IAM members granted `roles/aiplatform.user` to invoke predictions.
`labels`	`map(string)`	`{}`	No	Additional labels merged onto the endpoint.
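peered_network expects the host project's number, not its ID, which is easy to get wrong by hand. A sketch of the peering variant, assuming a hypothetical second endpoint (churn-scoring) and a hypothetical network ml-vpc that already has private services access configured. It looks the number up with the google_project data source and leaves private_service_connect unset, since the two modes are mutually exclusive.

```hcl
# Resolve the project number that peered_network requires.
data "google_project" "ml" {
  project_id = "kv-ml-prod-7421"
}

module "vertex_ai_peered" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-ai?ref=v1.0.0"

  project_id  = "kv-ml-prod-7421"
  region      = "asia-south1"
  environment = "prod"
  name_prefix = "churn-scoring"    # hypothetical model family
  endpoint_id = "churn-scoring-rt" # hypothetical endpoint ID

  # Legacy VPC peering; private_service_connect stays null (its default).
  # "ml-vpc" is a hypothetical network with private services access set up.
  peered_network = "projects/${data.google_project.ml.number}/global/networks/ml-vpc"
}
```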

Outputs

Name	Description
`id`	Fully-qualified endpoint resource ID (`projects/.../locations/.../endpoints/...`).
`name`	Short resource name segment used in API paths.
`display_name`	Human-readable display name shown in the Vertex AI console.
`region`	Region the endpoint lives in.
`predict_uri`	Regional public REST URI for `:predict` calls; with Private Service Connect, clients call through their PSC endpoint instead.
`service_attachment`	PSC service attachment URI (only when Private Service Connect is enabled).
`vertex_service_agent_email`	Email of the Vertex AI service identity that must hold CMEK encrypt/decrypt.

Enterprise scenario

A payments risk team runs real-time transaction fraud scoring in asia-south1. They stamp out one prod endpoint per model family with this module, pinned to Private Service Connect so the inference URL is unreachable from the public internet and only the payments and shared-platform projects appear in the project_allowlist. CMEK is mandated by their PCI scope, so every endpoint encrypts model artifacts and prediction payloads with a regional Cloud KMS key the team owns and rotates. Because deployed_models and traffic_split are ignored in the module’s lifecycle, their CD pipeline can canary a freshly retrained model to 5% of traffic and ramp it without any terraform apply ever showing drift.
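The scenario's regional, team-owned key could be declared alongside the endpoint. A sketch reusing the key ring and key names from the usage example (vertex, endpoint-cmek); the 90-day rotation period is an assumption, not a PCI requirement. The key's id attribute comes out in the fully-qualified form that encryption_kms_key_name validates against.

```hcl
# Key ring in the SAME region as the endpoint.
resource "google_kms_key_ring" "vertex" {
  project  = "kv-ml-prod-7421"
  name     = "vertex"
  location = "asia-south1"
}

resource "google_kms_crypto_key" "endpoint_cmek" {
  name            = "endpoint-cmek"
  key_ring        = google_kms_key_ring.vertex.id
  rotation_period = "7776000s" # 90 days (assumed policy)

  # Destroying a Terraform-managed CryptoKey destroys its key versions,
  # which would make data encrypted with them unreadable.
  lifecycle {
    prevent_destroy = true
  }
}
```

Passing google_kms_crypto_key.endpoint_cmek.id as encryption_kms_key_name lets the module grant the Vertex AI service agent encrypt/decrypt on that key.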

Best practices

Keep the endpoint immutable, let CD own the model. Networking posture (PSC vs. peering) and CMEK are fixed at create time, and a destroy/recreate would change the stable serving URL every consumer depends on. The module ignores deployed_models/traffic_split so CD rollouts never show as drift, and its precondition rejects configuring both networking modes at once; it does not block a change that forces replacement, so review every plan for a replaced endpoint before applying.
Default to Private Service Connect, not public. For any endpoint handling regulated or PII-bearing payloads, set private_service_connect with a tight project_allowlist and wire consumers off the service_attachment output rather than exposing a public :predict URL.
Grant the CMEK key to the Vertex service agent first. Encryption fails if the aiplatform.googleapis.com service identity lacks roles/cloudkms.cryptoKeyEncrypterDecrypter; the module creates that binding and declares depends_on on it so the key grant lands before the endpoint is created. Use a regional key in the same region as the endpoint.
Control cost at deploy time, not at the endpoint. The endpoint itself is free; the spend comes from the machine pool behind each deployed model. In your deployment step, right-size machine_type, cap max_replica_count, and keep min_replica_count as low as your latency target allows; an idle, over-provisioned replica pool is the usual source of bill shock.
Name and label for chargeback. Use name_prefix/environment to produce predictable fraud-scoring-prod-endpoint display names, and push team/cost-center through labels so Vertex AI inference spend is attributable in billing exports.
Scope predictor IAM narrowly. Grant roles/aiplatform.user only to the specific calling service accounts via predictor_members; avoid project-level Vertex roles so a compromised workload in the project can’t silently invoke or enumerate every model endpoint.
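One way to keep predictor_members narrow is to derive it from the calling workload's own service account instead of typing the email by hand. A sketch, assuming a hypothetical dedicated identity for the txn-scorer service from the usage example.

```hcl
# Hypothetical dedicated runtime identity for the txn-scorer Cloud Run service.
resource "google_service_account" "scorer" {
  project      = "kv-payments-prod-3310"
  account_id   = "scoring-svc"
  display_name = "txn-scorer runtime identity"
}

locals {
  # Pass this as predictor_members on the module call, and set the Cloud Run
  # service's template.service_account to the same email, so the only
  # principal holding roles/aiplatform.user on the endpoint is the caller.
  vertex_predictor_members = [
    "serviceAccount:${google_service_account.scorer.email}",
  ]
}
```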