
Terraform Module: GCP Vertex AI — Reproducible, governed model-serving endpoints

Quick take: this module builds production-ready Vertex AI prediction endpoints with Terraform. It gives you VPC-private serving via Private Service Connect (PSC), CMEK encryption, lifecycle rules that leave deployed models and traffic splits to your CD pipeline, and clean outputs you can wire into downstream IaC. New here? Jump to the Quickstart below to deploy it in minutes, then read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

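provider "google" {
  project = "my-project"
  region  = "us-central1"
}

# The module also uses the google-beta provider (for the Vertex AI service identity).
provider "google-beta" {
  project = "my-project"
  region  = "us-central1"
}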

module "vertex_ai" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-ai?ref=v1.0.0"

  project_id  = "..."  # GCP project ID that will host the Vertex AI endpoint.
  region      = "..."  # Region for the endpoint (e.g. `us-central1`, `asia-sout…
  environment = "..."  # Deployment environment; must be `dev`, `stage`, or `pro…
  name_prefix = "..."  # Lowercase prefix for the endpoint display name (e.g. `f…
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.

What this module is

Vertex AI is Google Cloud’s managed ML platform. The google_vertex_ai_endpoint resource provisions a prediction endpoint — a stable, regional serving target that exposes a public or private REST/gRPC URL. You upload a trained model into the Model Registry, deploy it onto an endpoint (which spins up a dedicated machine pool behind the scenes), and clients call the endpoint with :predict / :rawPredict requests. The endpoint decouples the address clients depend on from the model version answering the request, so you can roll out a new model behind the same URL.

The endpoint resource itself is deceptively small, but the production decisions around it are not. Do you serve over the public internet or pin the endpoint to a VPC via Private Service Connect (PSC)? Are payloads encrypted with a customer-managed key (CMEK)? Which consumer projects may reach a private endpoint over PSC? And how do you keep Terraform from fighting the live traffic split that your CD pipeline mutates during canary rollouts? Wrapping all of that in a reusable module gives every team a consistent, governed, auditable way to stand up an inference endpoint, without re-deriving the PSC + CMEK + IAM wiring each time.
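For contrast, the bare resource this module wraps can be as small as the sketch below. It is a minimal sketch that assumes a configured hashicorp/google provider. The project, display name and numeric ID `1234567890` are hypothetical placeholders. None of the PSC, CMEK, IAM or drift handling discussed above is present, and that gap is what the module fills.

```hcl
# Minimal, unwrapped endpoint: public serving, Google-managed encryption,
# no IAM bindings and no lifecycle rules. All values are hypothetical.
resource "google_vertex_ai_endpoint" "bare" {
  name         = "1234567890" # hypothetical endpoint ID
  display_name = "fraud-scoring-dev-endpoint"
  location     = "us-central1"
  project      = "my-project"
}
```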

This module owns the endpoint and its network/encryption posture. It deliberately leaves which model is deployed (and the canary traffic percentages) to your delivery pipeline, because those change far more often than the endpoint’s identity and should not trigger Terraform drift on every deploy.

When to use it

If you only ever click “Deploy to endpoint” in the console for a one-off experiment, the module is overkill — reach for it once an endpoint becomes a dependency something else relies on.

Module structure

terraform-module-gcp-vertex-ai/
├── versions.tf      # provider + required version pinning
├── main.tf          # endpoint, service-identity, PSC/CMEK wiring, IAM
├── variables.tf     # var-driven inputs with validation
└── outputs.tf       # id, name, endpoint URL, network config

versions.tf

terraform {
  required_version = ">= 1.5.0"

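  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    # google_project_service_identity (used in main.tf) is only in the beta provider.
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 5.0"
    }
  }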
}

main.tf

locals {
  # Vertex AI endpoint display names must be <= 128 chars; we keep a tidy,
  # environment-qualified name so endpoints are recognisable in the console.
  endpoint_display_name = "${var.name_prefix}-${var.environment}-endpoint"

  # PSC and VPC peering (the endpoint's network field) are mutually exclusive;
  # with neither set, the endpoint is served publicly. We only emit a PSC block
  # when private serving via PSC is requested.
  use_psc = var.private_service_connect != null
}

# Ensure the Vertex AI service identity exists in this project. The CMEK key
# must be granted to THIS service account, so we surface it for the key binding.
resource "google_project_service_identity" "vertex" {
  provider = google-beta

  project = var.project_id
  service = "aiplatform.googleapis.com"
}

# Grant the Vertex AI service agent encrypt/decrypt on the CMEK key so it can
# encrypt model artifacts and online-prediction data at rest.
resource "google_kms_crypto_key_iam_member" "vertex_cmek" {
  count = var.encryption_kms_key_name != null ? 1 : 0

  crypto_key_id = var.encryption_kms_key_name
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member        = "serviceAccount:${google_project_service_identity.vertex.email}"
}

resource "google_vertex_ai_endpoint" "this" {
  provider = google

  # name is the short resource ID segment; display_name is the human label.
  name         = var.endpoint_id
  display_name = local.endpoint_display_name
  description  = var.description
  location     = var.region
  project      = var.project_id

  # Customer-managed encryption for model + online prediction data at rest.
  dynamic "encryption_spec" {
    for_each = var.encryption_kms_key_name != null ? [1] : []
    content {
      kms_key_name = var.encryption_kms_key_name
    }
  }

  # ---- Networking: set AT MOST ONE private posture (neither = public) ----

  # (a) Legacy VPC peering: a fully-qualified network the endpoint peers into.
  network = var.peered_network

  # (b) Private Service Connect: private serving with explicit consumer allow-list.
  dynamic "private_service_connect_config" {
    for_each = local.use_psc ? [var.private_service_connect] : []
    content {
      enable_private_service_connect = true
      project_allowlist              = private_service_connect_config.value.project_allowlist
    }
  }

  labels = merge(
    {
      environment = var.environment
      managed-by  = "terraform"
      component   = "vertex-ai-endpoint"
    },
    var.labels,
  )

  # The deployed model + traffic split are mutated by the CD pipeline at deploy
  # time. Ignore them here so routine canary rollouts never show as drift.
  lifecycle {
    ignore_changes = [
      deployed_models,
      traffic_split,
    ]

    # peered_network and private_service_connect are mutually exclusive at the
    # API; fail fast at plan time with a clear message instead.
    precondition {
      condition     = !(var.peered_network != null && local.use_psc)
      error_message = "Set either peered_network OR private_service_connect, not both."
    }
  }

  depends_on = [google_kms_crypto_key_iam_member.vertex_cmek]
}

# Grant caller identities permission to invoke predictions on this endpoint.
resource "google_vertex_ai_endpoint_iam_member" "predictors" {
  for_each = toset(var.predictor_members)

  provider = google

  project  = var.project_id
  location = var.region
  endpoint = google_vertex_ai_endpoint.this.name
  role     = "roles/aiplatform.user"
  member   = each.value
}
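The CMEK binding above expects the key to exist already. If the same stack also owns the key, one possible wiring is sketched below. It assumes a key ring named `vertex` and a key named `endpoint-cmek` (both hypothetical) in the endpoint's region. The key ID is assembled from configured arguments rather than the key's computed `id`, so the module's `count = var.encryption_kms_key_name != null ? 1 : 0` should be known at plan time even on the first apply.

```hcl
locals {
  kms_project = "kv-ml-prod-7421"
  kms_region  = "asia-south1" # CMEK key must be in the same region as the endpoint
}

resource "google_kms_key_ring" "vertex" {
  project  = local.kms_project
  name     = "vertex" # hypothetical key ring
  location = local.kms_region
}

resource "google_kms_crypto_key" "endpoint_cmek" {
  name            = "endpoint-cmek" # hypothetical key
  key_ring        = google_kms_key_ring.vertex.id
  purpose         = "ENCRYPT_DECRYPT"
  rotation_period = "7776000s" # 90 days

  # Destroying the key destroys its versions, making CMEK-protected data unreadable.
  lifecycle {
    prevent_destroy = true
  }
}

module "vertex_ai" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-ai?ref=v1.0.0"

  project_id  = local.kms_project
  region      = local.kms_region
  environment = "prod"
  name_prefix = "fraud-scoring"

  # Built from configured (plan-time known) values, not the key's computed id.
  # The references still order the key before the module's CMEK IAM binding.
  encryption_kms_key_name = "projects/${local.kms_project}/locations/${local.kms_region}/keyRings/${google_kms_key_ring.vertex.name}/cryptoKeys/${google_kms_crypto_key.endpoint_cmek.name}"
}
```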

variables.tf

variable "project_id" {
  type        = string
  description = "GCP project ID that will host the Vertex AI endpoint."
}

variable "region" {
  type        = string
  description = "Region for the endpoint (e.g. us-central1, europe-west4, asia-south1)."

  validation {
    # [0-9]+ so two-digit regions such as europe-west10 also pass.
    condition     = can(regex("^[a-z]+-[a-z]+[0-9]+$", var.region))
    error_message = "region must be a valid GCP region like us-central1 or asia-south1."
  }
}

variable "environment" {
  type        = string
  description = "Deployment environment, used in the display name and labels."

  validation {
    condition     = contains(["dev", "stage", "prod"], var.environment)
    error_message = "environment must be one of: dev, stage, prod."
  }
}

variable "name_prefix" {
  type        = string
  description = "Short prefix for the endpoint display name (e.g. fraud-scoring)."

  validation {
    condition     = can(regex("^[a-z][a-z0-9-]{1,40}$", var.name_prefix))
    error_message = "name_prefix must be lowercase alphanumeric/hyphen, 2-41 chars, starting with a letter."
  }
}

variable "endpoint_id" {
  type        = string
  description = "Optional user-specified resource ID for the endpoint. If null, GCP auto-generates a numeric ID."
  default     = null

  validation {
    condition     = var.endpoint_id == null || can(regex("^[a-z0-9-]{1,63}$", coalesce(var.endpoint_id, "x")))
    error_message = "endpoint_id must be lowercase alphanumeric/hyphen, up to 63 chars."
  }
}

variable "description" {
  type        = string
  description = "Human-readable description for the endpoint."
  default     = "Managed by Terraform — Vertex AI online prediction endpoint."
}

variable "encryption_kms_key_name" {
  type        = string
  description = "Fully-qualified Cloud KMS CryptoKey ID for CMEK (projects/.../locations/.../keyRings/.../cryptoKeys/...). Null = Google-managed encryption."
  default     = null

  validation {
    condition     = var.encryption_kms_key_name == null || can(regex("^projects/[^/]+/locations/[^/]+/keyRings/[^/]+/cryptoKeys/[^/]+$", var.encryption_kms_key_name))
    error_message = "encryption_kms_key_name must be a fully-qualified Cloud KMS CryptoKey resource ID."
  }
}

variable "peered_network" {
  type        = string
  description = "Fully-qualified VPC network (projects/{number}/global/networks/{name}) for legacy private serving via VPC peering. Mutually exclusive with private_service_connect."
  default     = null
}

variable "private_service_connect" {
  type = object({
    project_allowlist = list(string)
  })
  description = "Enable Private Service Connect serving. project_allowlist is the set of consumer project IDs/numbers allowed to create PSC endpoints to this service. Mutually exclusive with peered_network."
  default     = null

  validation {
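    # A conditional, not ||: older Terraform versions do not short-circuit ||,
    # so reading .project_allowlist on the null default would raise an error.
    condition     = var.private_service_connect == null ? true : length(var.private_service_connect.project_allowlist) > 0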
    error_message = "private_service_connect.project_allowlist must contain at least one consumer project."
  }
}

variable "predictor_members" {
  type        = list(string)
  description = "IAM members (user:, serviceAccount:, group:) to grant roles/aiplatform.user for invoking predictions."
  default     = []
}

variable "labels" {
  type        = map(string)
  description = "Additional labels merged onto the endpoint."
  default     = {}
}
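`peered_network` expects the project number, not the project ID. The sketch below derives the number with the `google_project` data source. It assumes a hypothetical VPC named `ml-vpc` that already has private services access configured, because the module does not create that for you.

```hcl
# Look up the project number; the network path uses the number, not the ID.
data "google_project" "ml" {
  project_id = "kv-ml-prod-7421"
}

module "vertex_ai" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-ai?ref=v1.0.0"

  project_id  = "kv-ml-prod-7421"
  region      = "asia-south1"
  environment = "prod"
  name_prefix = "fraud-scoring"

  # Hypothetical VPC with private services access already configured.
  # Leave private_service_connect unset: the module's precondition rejects both.
  peered_network = "projects/${data.google_project.ml.number}/global/networks/ml-vpc"
}
```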

outputs.tf

output "id" {
  description = "Fully-qualified endpoint resource ID (projects/.../locations/.../endpoints/...)."
  value       = google_vertex_ai_endpoint.this.id
}

output "name" {
  description = "Short numeric/string resource name of the endpoint (the segment used in API paths)."
  value       = google_vertex_ai_endpoint.this.name
}

output "display_name" {
  description = "Human-readable display name shown in the Vertex AI console."
  value       = google_vertex_ai_endpoint.this.display_name
}

output "region" {
  description = "Region the endpoint lives in — handy for building the regional API host."
  value       = google_vertex_ai_endpoint.this.location
}

output "predict_uri" {
  description = "Public regional REST URI for :predict calls. PSC-enabled endpoints are instead reached through a consumer-side PSC endpoint."
  value       = "https://${google_vertex_ai_endpoint.this.location}-aiplatform.googleapis.com/v1/${google_vertex_ai_endpoint.this.id}:predict"
}

output "service_attachment" {
  description = "PSC service attachment URI (only set when Private Service Connect is enabled); use it to create consumer-side PSC endpoints."
  value       = try(google_vertex_ai_endpoint.this.private_service_connect_config[0].service_attachment, null)
}

output "vertex_service_agent_email" {
  description = "Email of the Vertex AI service identity (the SA that must hold CMEK encrypt/decrypt)."
  value       = google_project_service_identity.vertex.email
}
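A consumer project uses the `service_attachment` output to reach a PSC-enabled endpoint. Below is a minimal consumer-side sketch. It assumes `kv-payments-prod-3310` is on the allow-list, and that a network `payments-vpc` and subnet `payments-psc` (both hypothetical) already exist in the endpoint's region.

```hcl
# Reserve an internal IP in the consumer VPC for the PSC endpoint.
resource "google_compute_address" "vertex_psc" {
  name         = "vertex-psc-ip"
  project      = "kv-payments-prod-3310"
  region       = "asia-south1"
  address_type = "INTERNAL"
  subnetwork   = "projects/kv-payments-prod-3310/regions/asia-south1/subnetworks/payments-psc" # hypothetical
}

# A PSC endpoint is a forwarding rule that targets the producer's service
# attachment; load_balancing_scheme is the empty string for this case.
resource "google_compute_forwarding_rule" "vertex_psc" {
  name                  = "vertex-psc-endpoint"
  project               = "kv-payments-prod-3310"
  region                = "asia-south1"
  network               = "projects/kv-payments-prod-3310/global/networks/payments-vpc" # hypothetical
  ip_address            = google_compute_address.vertex_psc.id
  target                = module.vertex_ai.service_attachment
  load_balancing_scheme = ""
}
```

Clients in that VPC then send prediction requests to the reserved internal IP (or a private DNS name pointing at it), not to the public regional host.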

How to use it

module "vertex_ai" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-ai?ref=v1.0.0"

  project_id  = "kv-ml-prod-7421"
  region      = "asia-south1"
  environment = "prod"
  name_prefix = "fraud-scoring"
  endpoint_id = "fraud-scoring-rt"

  # CMEK so prediction payloads + model artifacts are encrypted with our key.
  encryption_kms_key_name = "projects/kv-ml-prod-7421/locations/asia-south1/keyRings/vertex/cryptoKeys/endpoint-cmek"

  # Private-only serving: only these consumer projects may reach the endpoint.
  private_service_connect = {
    project_allowlist = [
      "kv-payments-prod-3310",
      "kv-platform-shared-0098",
    ]
  }

  # The online-scoring service account is allowed to call :predict.
  predictor_members = [
    "serviceAccount:scoring-svc@kv-payments-prod-3310.iam.gserviceaccount.com",
  ]

  labels = {
    team        = "risk-ml"
    cost-center = "ml-platform"
  }
}

# Downstream: a Cloud Run scoring service receives the stable endpoint ID and
# region as env vars so it never hard-codes the inference URL.
resource "google_cloud_run_v2_service" "scorer" {
  name     = "txn-scorer"
  location = "asia-south1"
  project  = "kv-payments-prod-3310"

  template {
    containers {
      image = "asia-south1-docker.pkg.dev/kv-payments-prod-3310/svc/txn-scorer:1.8.2"

      env {
        name  = "VERTEX_ENDPOINT_ID"
        value = module.vertex_ai.name
      }
      env {
        name  = "VERTEX_REGION"
        value = module.vertex_ai.region
      }
      env {
        name  = "VERTEX_PREDICT_URI"
        value = module.vertex_ai.predict_uri
      }
    }
  }
}
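With only the required inputs, the same module gives the lightest posture, which may suit a dev environment. Neither `peered_network` nor `private_service_connect` is set, so the endpoint is served publicly and uses Google-managed encryption. The project ID and group below are hypothetical.

```hcl
# Lightest posture: required inputs plus one caller. No CMEK key, no peering,
# no PSC, so the endpoint is public with Google-managed encryption.
module "vertex_ai_dev" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-ai?ref=v1.0.0"

  project_id  = "kv-ml-dev-1234" # hypothetical dev project
  region      = "asia-south1"
  environment = "dev"
  name_prefix = "fraud-scoring"

  predictor_members = [
    "group:risk-ml-dev@example.com", # hypothetical group
  ]
}
```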

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config, live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "gcs"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    bucket = "..."                      # GCS bucket that holds Terraform state
    prefix = path_relative_to_include() # one state prefix per module directory
  }
}

2. Module config, live/prod/vertex_ai/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-ai?ref=v1.0.0"
}

inputs = {
  project_id = "..."
  region = "..."
  environment = "..."
  name_prefix = "..."
}

3. Deploy one environment, or roll out all modules together:

cd live/prod/vertex_ai && terragrunt apply        # this module
cd live/prod && terragrunt run-all apply          # every module under live/prod
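Step 1 shows only the backend half of the root config. The sketch below adds the provider half with Terragrunt's `generate` block, which writes a `provider.tf` into every module. The `...` values are placeholders, as in step 2. Configuring `google-beta` next to `google` is an assumption based on the module's use of the beta provider for the service identity.

```hcl
# live/terragrunt.hcl (root), next to the remote_state block.
# Writes provider.tf into each module's working directory before Terraform runs.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "google" {
  project = "..."
  region  = "..."
}

provider "google-beta" {
  project = "..."
  region  = "..."
}
EOF
}
```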

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; the inputs block is set per environment (dev / stage / prod) without forking the module; and run-all applies modules in dependency order. Reach for it once you have more than one environment or more than a handful of modules. For a single stack, the plain Quickstart above is enough.
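`run-all` learns that order from `dependency` blocks. In the sketch below, a hypothetical `live/prod/txn_scorer/terragrunt.hcl` consumes this module's outputs. The scorer module source and its input names are illustrative only. The `mock_outputs` let `validate` and `plan` run before `vertex_ai` has been applied.

```hcl
# live/prod/txn_scorer/terragrunt.hcl (hypothetical consumer module)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "..." # hypothetical scorer module
}

# run-all applies ../vertex_ai before this module because of this block.
dependency "vertex_ai" {
  config_path = "../vertex_ai"

  # Placeholder outputs so validate/plan work before vertex_ai is applied.
  mock_outputs = {
    name   = "0000000000"
    region = "asia-south1"
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  vertex_endpoint_id = dependency.vertex_ai.outputs.name # hypothetical input names
  vertex_region      = dependency.vertex_ai.outputs.region
}
```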

Inputs

| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| project_id | string | none | Yes | GCP project ID that will host the Vertex AI endpoint. |
| region | string | none | Yes | Region for the endpoint (e.g. us-central1, asia-south1); validated against the GCP region pattern. |
| environment | string | none | Yes | Deployment environment; must be dev, stage, or prod. Used in display name and labels. |
| name_prefix | string | none | Yes | Lowercase prefix for the endpoint display name (e.g. fraud-scoring). |
| endpoint_id | string | null | No | Optional user-specified resource ID; if null, GCP auto-generates a numeric ID. |
| description | string | "Managed by Terraform — Vertex AI online prediction endpoint." | No | Human-readable description for the endpoint. |
| encryption_kms_key_name | string | null | No | Fully-qualified Cloud KMS CryptoKey ID for CMEK; null means Google-managed encryption. |
| peered_network | string | null | No | Fully-qualified VPC network for legacy private serving via VPC peering. Mutually exclusive with private_service_connect. |
| private_service_connect | object({ project_allowlist = list(string) }) | null | No | Enable PSC private serving with a consumer-project allow-list. Mutually exclusive with peered_network. |
| predictor_members | list(string) | [] | No | IAM members granted roles/aiplatform.user to invoke predictions. |
| labels | map(string) | {} | No | Additional labels merged onto the endpoint. |

Outputs

| Name | Description |
|---|---|
| id | Fully-qualified endpoint resource ID (projects/.../locations/.../endpoints/...). |
| name | Short resource name segment used in API paths. |
| display_name | Human-readable display name shown in the Vertex AI console. |
| region | Region the endpoint lives in. |
| predict_uri | Regional REST URI for :predict calls against this endpoint. |
| service_attachment | PSC service attachment URI (only when Private Service Connect is enabled). |
| vertex_service_agent_email | Email of the Vertex AI service identity that must hold CMEK encrypt/decrypt. |

Enterprise scenario

A payments risk team runs real-time transaction fraud scoring in asia-south1. They stamp out one prod endpoint per model family with this module, pinned to Private Service Connect so the inference URL is unreachable from the public internet and only the payments and shared-platform projects appear in the project_allowlist. CMEK is mandated by their PCI scope, so every endpoint encrypts model artifacts and prediction payloads with a regional Cloud KMS key the team owns and rotates. Because deployed_models and traffic_split are ignored in the module’s lifecycle, their CD pipeline can canary a freshly retrained model to 5% of traffic and ramp it without any terraform apply ever showing drift.
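Stamping out one endpoint per model family maps naturally onto `for_each` on the module block, which Terraform supports for modules since 0.13. The sketch below assumes hypothetical family names. It reuses the project, key and allow-list from How to use it.

```hcl
locals {
  # Hypothetical model families; each one gets its own endpoint.
  model_families = toset(["fraud-scoring", "chargeback-risk"])
}

module "vertex_ai" {
  for_each = local.model_families
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-vertex-ai?ref=v1.0.0"

  project_id  = "kv-ml-prod-7421"
  region      = "asia-south1"
  environment = "prod"
  name_prefix = each.key

  encryption_kms_key_name = "projects/kv-ml-prod-7421/locations/asia-south1/keyRings/vertex/cryptoKeys/endpoint-cmek"

  private_service_connect = {
    project_allowlist = ["kv-payments-prod-3310", "kv-platform-shared-0098"]
  }
}

# One service attachment per family, for the consumer-side PSC endpoints.
output "service_attachments" {
  value = { for family, endpoint in module.vertex_ai : family => endpoint.service_attachment }
}
```

One design caveat: every instance declares the same non-authoritative CMEK `iam_member` binding for the shared service agent. Destroying any single instance therefore removes that binding for all of them. Managing the key binding once, outside the module, avoids this.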

Best practices


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
