Quick take — A reusable hashicorp/google ~> 5.0 module for google_cloud_run_v2_service: autoscaling, concurrency, secrets from Secret Manager, VPC egress, health probes, and a least-privilege runtime service account. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "google" {
project = "my-project"
region = "us-central1"
}
module "cloud_run" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-run?ref=v1.0.0"
project_id = "..." # GCP project ID hosting the service.
name = "..." # Service name; RFC1035, lowercase, <= 49 chars.
location = "..." # Region, e.g. `asia-south1`.
image = "..." # Container image, ideally pinned by digest.
}
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
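To see where the service landed after the apply, one option is to surface the module's uri output from the root configuration. This is a minimal sketch; the output name service_url is an arbitrary choice for illustration and is not part of the module.

```hcl
# Hypothetical root-level output; `service_url` is an illustrative name.
output "service_url" {
  description = "HTTPS URL of the deployed Cloud Run service."
  value       = module.cloud_run.uri
}
```

With this in place, terraform output service_url prints the address once the apply has finished.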
What this module is
Cloud Run is GCP’s fully managed serverless container platform. You hand it a container image, and it runs that image behind an HTTPS endpoint, scaling the number of instances from zero up to your ceiling based on incoming traffic — you pay (by default) only while a request is being served. There are no nodes to patch, no autoscaler to tune, and no load balancer to wire up for the basic case.
The trouble is that a correct Cloud Run service is rarely just an image and a port. In production you almost always need: a dedicated runtime service account (not the default Compute SA with project-wide Editor), CPU/memory limits, an autoscaling floor and ceiling, request concurrency tuning, secrets injected from Secret Manager rather than baked into the image, startup/liveness probes so bad revisions never take traffic, and frequently private egress through a VPC connector to reach Cloud SQL or internal APIs. Hand-writing the google_cloud_run_v2_service block for every service means every team re-derives those settings — and gets the security-sensitive ones subtly wrong.
This module wraps google_cloud_run_v2_service (the v2 / Knative-free API) into a single, opinionated, variable-driven block. It creates a least-privilege runtime service account, wires Secret Manager references as environment variables, sets sane resource and scaling defaults, and exposes the service URL and revision name as outputs so downstream resources (a load balancer, a DNS record, a Pub/Sub push subscription) can consume them.
When to use it
- You deploy stateless HTTP/gRPC containers and want autoscaling-to-zero without managing GKE or instance groups.
- You have many similar services (APIs, BFFs, webhook handlers, internal tools) and want one consistent, reviewed pattern instead of bespoke blocks per repo.
- You need secrets from Secret Manager mounted as env vars, and a dedicated runtime identity per service for least-privilege IAM.
- You front the service with an external HTTPS Load Balancer (via a Serverless NEG) or expose it directly, and want the URL as a clean Terraform output.
- You need private egress to a VPC (Cloud SQL private IP, internal Memorystore, on-prem over Interconnect) via a Direct VPC egress or a Serverless VPC Access connector.
Skip it if you need long-lived stateful workloads, GPU/TPU batch jobs better suited to Cloud Run Jobs or GKE, or sub-millisecond cold-start guarantees that only always-on infrastructure provides.
Module structure
terraform-module-gcp-cloud-run/
├── versions.tf # provider + required_version pins
├── main.tf # runtime SA, IAM, the v2 service, invoker binding
├── variables.tf # var-driven inputs with validation
└── outputs.tf # id/name, url, revision, service account email
versions.tf
terraform {
required_version = ">= 1.9.0" # variables.tf validates one variable against another, which needs Terraform 1.9+
required_providers {
google = {
source = "hashicorp/google"
version = "~> 5.0"
}
}
}
main.tf
locals {
# A stable, predictable runtime SA id derived from the service name.
# SA account_id must be 6-30 chars, lowercase, start with a letter, and end
# with a letter or digit, so truncate the name (not the suffix) to fit.
service_account_id = "${substr(var.name, 0, 26)}-run"
}
# Dedicated least-privilege runtime identity for this service.
resource "google_service_account" "runtime" {
count = var.create_service_account ? 1 : 0
project = var.project_id
account_id = local.service_account_id
display_name = "Cloud Run runtime SA for ${var.name}"
description = "Identity assumed by the ${var.name} Cloud Run service at runtime."
}
locals {
runtime_sa_email = var.create_service_account ? google_service_account.runtime[0].email : var.service_account_email
}
# Allow the runtime SA to read each referenced secret. The module only grants
# access to the exact secrets the service consumes, at the secret level.
resource "google_secret_manager_secret_iam_member" "runtime_accessor" {
for_each = { for s in var.secret_env : s.name => s }
project = var.project_id
secret_id = each.value.secret_id
role = "roles/secretmanager.secretAccessor"
member = "serviceAccount:${local.runtime_sa_email}"
}
resource "google_cloud_run_v2_service" "this" {
project = var.project_id
name = var.name
location = var.location
ingress = var.ingress
deletion_protection = var.deletion_protection # confirm the pinned provider release supports this argument on this resource
labels = var.labels
template {
service_account = local.runtime_sa_email
timeout = "${var.request_timeout_seconds}s"
max_instance_request_concurrency = var.max_concurrency
execution_environment = var.execution_environment
scaling {
min_instance_count = var.min_instances
max_instance_count = var.max_instances
}
# Optional private egress into a VPC (Cloud SQL private IP, internal APIs).
dynamic "vpc_access" {
for_each = var.vpc_connector == null && length(var.network_interfaces) == 0 ? [] : [1]
content {
connector = var.vpc_connector
egress = var.vpc_egress
dynamic "network_interfaces" {
for_each = var.network_interfaces
content {
network = network_interfaces.value.network
subnetwork = network_interfaces.value.subnetwork
tags = network_interfaces.value.tags # optional attribute; null when omitted
}
}
}
}
containers {
image = var.image
dynamic "ports" {
for_each = var.container_port == null ? [] : [var.container_port]
content {
container_port = ports.value
}
}
resources {
limits = var.resource_limits
cpu_idle = var.cpu_idle
startup_cpu_boost = var.startup_cpu_boost
}
# Plain (non-secret) environment variables.
dynamic "env" {
for_each = var.env
content {
name = env.key
value = env.value
}
}
# Secret-backed environment variables sourced from Secret Manager.
dynamic "env" {
for_each = { for s in var.secret_env : s.name => s }
content {
name = env.value.name
value_source {
secret_key_ref {
secret = env.value.secret_id
version = env.value.version # the variable's optional() default supplies "latest" when omitted
}
}
}
}
# Startup probe: a revision only receives traffic once this passes.
dynamic "startup_probe" {
for_each = var.startup_probe_path == null ? [] : [1]
content {
initial_delay_seconds = var.startup_probe_initial_delay
period_seconds = var.startup_probe_period
failure_threshold = var.startup_probe_failure_threshold
timeout_seconds = var.startup_probe_timeout
http_get {
path = var.startup_probe_path
port = var.container_port
}
}
}
# Liveness probe: a failing instance is restarted.
dynamic "liveness_probe" {
for_each = var.liveness_probe_path == null ? [] : [1]
content {
period_seconds = var.liveness_probe_period
failure_threshold = var.liveness_probe_failure_threshold
timeout_seconds = var.liveness_probe_timeout
http_get {
path = var.liveness_probe_path
port = var.container_port
}
}
}
}
}
# Traffic always points at the latest healthy revision unless overridden.
traffic {
type = "TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST"
percent = 100
}
}
# Who may invoke the service. For a public endpoint, pass ["allUsers"].
# For internal-only, pass the calling service accounts as members.
resource "google_cloud_run_v2_service_iam_member" "invokers" {
for_each = toset(var.invokers)
project = var.project_id
location = google_cloud_run_v2_service.this.location
name = google_cloud_run_v2_service.this.name
role = "roles/run.invoker"
member = each.value
}
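The vpc_access block above passes both connector and network_interfaces through when both inputs are set, even though the two egress modes are meant to be mutually exclusive. One way to fail fast at plan time is a precondition. The following is a sketch only: the terraform_data resource named vpc_guard is a hypothetical addition, not part of the original module.

```hcl
# Hypothetical guard resource (not in the original module). It creates nothing
# in GCP; it exists only to carry the precondition below.
resource "terraform_data" "vpc_guard" {
  lifecycle {
    precondition {
      # Passes when at most one of the two egress modes is configured.
      condition     = var.vpc_connector == null || length(var.network_interfaces) == 0
      error_message = "Set either vpc_connector or network_interfaces, not both."
    }
  }
}
```

A precondition is used here rather than a variable validation so the check sits next to the resources it protects; either placement would work under the stated assumptions.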
variables.tf
variable "project_id" {
description = "GCP project ID that hosts the Cloud Run service."
type = string
}
variable "name" {
description = "Cloud Run service name (lowercase, RFC1035: letters, digits, hyphens; <= 49 chars)."
type = string
validation {
condition = can(regex("^[a-z]([-a-z0-9]*[a-z0-9])?$", var.name)) && length(var.name) <= 49
error_message = "name must be lowercase RFC1035 (start with a letter, hyphens allowed) and <= 49 chars."
}
}
variable "location" {
description = "Region for the service, e.g. asia-south1, europe-west1, us-central1."
type = string
}
variable "image" {
description = "Fully qualified container image, ideally pinned by digest (e.g. REGION-docker.pkg.dev/PROJ/REPO/app@sha256:...)."
type = string
}
variable "container_port" {
description = "Port the container listens on. Set null to use Cloud Run's default ($PORT, 8080)."
type = number
default = 8080
}
variable "resource_limits" {
description = "CPU and memory limits for the container. Memory must be >= 512Mi when cpu < 1."
type = map(string)
default = {
cpu = "1"
memory = "512Mi"
}
}
variable "cpu_idle" {
description = "If true, CPU is throttled when no request is in flight (request-based billing). Set false for always-allocated CPU (background work)."
type = bool
default = true
}
variable "startup_cpu_boost" {
description = "Temporarily double CPU during container startup to reduce cold-start latency."
type = bool
default = true
}
variable "min_instances" {
description = "Minimum number of warm instances. 0 allows scale-to-zero; >= 1 removes cold starts at a cost."
type = number
default = 0
validation {
condition = var.min_instances >= 0
error_message = "min_instances must be >= 0."
}
}
variable "max_instances" {
description = "Maximum number of instances the service may scale to."
type = number
default = 10
validation {
condition = var.max_instances >= 1
error_message = "max_instances must be >= 1."
}
}
variable "max_concurrency" {
description = "Max concurrent requests per instance (1-1000). Lower for CPU-bound apps, higher for I/O-bound."
type = number
default = 80
validation {
condition = var.max_concurrency >= 1 && var.max_concurrency <= 1000
error_message = "max_concurrency must be between 1 and 1000."
}
}
variable "request_timeout_seconds" {
description = "Maximum request duration in seconds (1-3600)."
type = number
default = 300
validation {
condition = var.request_timeout_seconds >= 1 && var.request_timeout_seconds <= 3600
error_message = "request_timeout_seconds must be between 1 and 3600."
}
}
variable "execution_environment" {
description = "Sandbox generation: EXECUTION_ENVIRONMENT_GEN1 or EXECUTION_ENVIRONMENT_GEN2 (gen2 needed for NFS/some syscalls)."
type = string
default = "EXECUTION_ENVIRONMENT_GEN2"
validation {
condition = contains(["EXECUTION_ENVIRONMENT_GEN1", "EXECUTION_ENVIRONMENT_GEN2"], var.execution_environment)
error_message = "execution_environment must be EXECUTION_ENVIRONMENT_GEN1 or EXECUTION_ENVIRONMENT_GEN2."
}
}
variable "ingress" {
description = "Ingress setting: INGRESS_TRAFFIC_ALL, INGRESS_TRAFFIC_INTERNAL_ONLY, or INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER."
type = string
default = "INGRESS_TRAFFIC_ALL"
validation {
condition = contains([
"INGRESS_TRAFFIC_ALL",
"INGRESS_TRAFFIC_INTERNAL_ONLY",
"INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER",
], var.ingress)
error_message = "ingress must be one of INGRESS_TRAFFIC_ALL, INGRESS_TRAFFIC_INTERNAL_ONLY, INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER."
}
}
variable "invokers" {
description = "IAM members granted roles/run.invoker. Use [\"allUsers\"] for a public endpoint, or specific service accounts for private."
type = list(string)
default = []
}
variable "create_service_account" {
description = "Create a dedicated runtime service account. If false, you must supply service_account_email."
type = bool
default = true
}
variable "service_account_email" {
description = "Existing runtime SA email to use when create_service_account is false."
type = string
default = null
validation {
condition = var.create_service_account || var.service_account_email != null
error_message = "service_account_email is required when create_service_account is false."
}
}
variable "env" {
description = "Plain (non-secret) environment variables as a name => value map."
type = map(string)
default = {}
}
variable "secret_env" {
description = "Secret-backed env vars from Secret Manager. Each: { name, secret_id, version }. version defaults to 'latest'."
type = list(object({
name = string
secret_id = string
version = optional(string, "latest")
}))
default = []
}
variable "vpc_connector" {
description = "Serverless VPC Access connector ID for private egress. Mutually exclusive with network_interfaces (Direct VPC egress)."
type = string
default = null
}
variable "network_interfaces" {
description = "Direct VPC egress interfaces. Each: { network, subnetwork, tags }. Leave empty to use vpc_connector or no VPC."
type = list(object({
network = string
subnetwork = string
tags = optional(list(string))
}))
default = []
}
variable "vpc_egress" {
description = "Egress mode when a VPC is attached: ALL_TRAFFIC or PRIVATE_RANGES_ONLY."
type = string
default = "PRIVATE_RANGES_ONLY"
validation {
condition = contains(["ALL_TRAFFIC", "PRIVATE_RANGES_ONLY"], var.vpc_egress)
error_message = "vpc_egress must be ALL_TRAFFIC or PRIVATE_RANGES_ONLY."
}
}
variable "startup_probe_path" {
description = "HTTP path for the startup probe (e.g. /healthz). Null disables the probe."
type = string
default = null
}
variable "startup_probe_initial_delay" {
description = "Seconds to wait before the first startup probe."
type = number
default = 0
}
variable "startup_probe_period" {
description = "Seconds between startup probes."
type = number
default = 10
}
variable "startup_probe_failure_threshold" {
description = "Consecutive startup probe failures before the revision is marked failed."
type = number
default = 3
}
variable "startup_probe_timeout" {
description = "Per-attempt startup probe timeout in seconds."
type = number
default = 3
}
variable "liveness_probe_path" {
description = "HTTP path for the liveness probe (e.g. /healthz). Null disables the probe."
type = string
default = null
}
variable "liveness_probe_period" {
description = "Seconds between liveness probes."
type = number
default = 30
}
variable "liveness_probe_failure_threshold" {
description = "Consecutive liveness probe failures before the instance is restarted."
type = number
default = 3
}
variable "liveness_probe_timeout" {
description = "Per-attempt liveness probe timeout in seconds."
type = number
default = 3
}
variable "deletion_protection" {
description = "Block accidental deletion of the service via Terraform."
type = bool
default = true
}
variable "labels" {
description = "Labels applied to the Cloud Run service."
type = map(string)
default = {}
}
outputs.tf
output "id" {
description = "Fully qualified Cloud Run service ID."
value = google_cloud_run_v2_service.this.id
}
output "name" {
description = "Name of the Cloud Run service."
value = google_cloud_run_v2_service.this.name
}
output "uri" {
description = "HTTPS URL that Cloud Run assigns to the service (the run.app address). Reachability depends on ingress and invokers."
value = google_cloud_run_v2_service.this.uri
}
output "location" {
description = "Region the service is deployed in."
value = google_cloud_run_v2_service.this.location
}
output "latest_ready_revision" {
description = "Name of the latest revision that is serving / ready."
value = google_cloud_run_v2_service.this.latest_ready_revision
}
output "service_account_email" {
description = "Runtime service account email used by the service."
value = local.runtime_sa_email
}
How to use it
# The secret is managed outside the module; the module grants its runtime SA accessor on it.
resource "google_secret_manager_secret" "db_url" {
project = var.project_id
secret_id = "orders-api-db-url"
replication {
auto {}
}
}
module "cloud_run" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-run?ref=v1.0.0"
project_id = var.project_id
name = "orders-api"
location = "asia-south1"
# Pin by digest in real pipelines; tag shown for readability.
image = "asia-south1-docker.pkg.dev/${var.project_id}/services/orders-api:1.8.2"
container_port = 8080
min_instances = 1 # keep one warm instance to avoid cold starts on a customer-facing API
max_instances = 30
max_concurrency = 60
resource_limits = {
cpu = "2"
memory = "1Gi"
}
env = {
LOG_LEVEL = "info"
REGION = "asia-south1"
}
secret_env = [
{
name = "DATABASE_URL"
secret_id = google_secret_manager_secret.db_url.secret_id
version = "latest"
},
]
# Private egress to Cloud SQL over the VPC.
vpc_connector = "projects/${var.project_id}/locations/asia-south1/connectors/run-conn"
vpc_egress = "PRIVATE_RANGES_ONLY"
startup_probe_path = "/healthz"
liveness_probe_path = "/healthz"
# Fronted by an external HTTPS LB, so allow only internal and load-balancer traffic.
ingress = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"
invokers = ["allUsers"]
labels = {
team = "payments"
environment = "prod"
}
}
# Downstream: attach the service to an external HTTPS Load Balancer via a Serverless NEG.
resource "google_compute_region_network_endpoint_group" "orders_neg" {
project = var.project_id
name = "orders-api-neg"
region = "asia-south1"
network_endpoint_type = "SERVERLESS"
cloud_run {
service = module.cloud_run.name # <- module output wires the NEG to the service
}
}
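The uri output can feed other consumers in the same way, for example a Pub/Sub push subscription. The sketch below assumes a topic named orders-events, a push identity named orders-pubsub-pusher, and an /events handler path on the service; all three are hypothetical and do not come from the module.

```hcl
# Hypothetical identity Pub/Sub uses to sign its push requests.
resource "google_service_account" "pubsub_pusher" {
  project      = var.project_id
  account_id   = "orders-pubsub-pusher"
  display_name = "Pub/Sub push identity for orders-api"
}

# Hypothetical topic carrying order events.
resource "google_pubsub_topic" "orders_events" {
  project = var.project_id
  name    = "orders-events"
}

resource "google_pubsub_subscription" "orders_push" {
  project = var.project_id
  name    = "orders-events-push"
  topic   = google_pubsub_topic.orders_events.id

  push_config {
    # `/events` is an assumed handler path on the service.
    push_endpoint = "${module.cloud_run.uri}/events"

    # Attach an OIDC token so the service can authenticate the caller.
    oidc_token {
      service_account_email = google_service_account.pubsub_pusher.email
    }
  }
}
```

If the service is not open to allUsers, the push identity would also need to appear in the module's invokers list, written as "serviceAccount:" followed by its email.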
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
backend = "gcs"
generate = { path = "backend.tf", if_exists = "overwrite" }
config = {
# ...gcs state bucket + prefix per path...
}
}
2. Module config — live/prod/cloud_run/terragrunt.hcl:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-run?ref=v1.0.0"
}
inputs = {
project_id = "..."
name = "..."
location = "..."
image = "..."
}
3. Deploy one environment, or roll out all modules together:
cd live/prod/cloud_run && terragrunt apply # this module
cd live/prod && terragrunt run-all apply # every module under live/prod
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
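To make the per-environment override concrete, here is a sketch of what a dev counterpart could look like. The path live/dev/cloud_run/terragrunt.hcl and the override values are assumptions chosen for illustration; the required inputs stay as placeholders.

```hcl
# live/dev/cloud_run/terragrunt.hcl (hypothetical dev environment)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-run?ref=v1.0.0"
}

inputs = {
  project_id = "..."
  name       = "..."
  location   = "..."
  image      = "..."

  # Dev-only overrides: scale to zero, keep the ceiling small, allow teardown.
  min_instances       = 0
  max_instances       = 2
  deletion_protection = false
}
```

Only the three override lines differ from the prod file, which is the point of keeping the backend and provider in the root config.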
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| project_id | string | — | yes | GCP project ID hosting the service. |
| name | string | — | yes | Service name; RFC1035, lowercase, <= 49 chars. |
| location | string | — | yes | Region, e.g. asia-south1. |
| image | string | — | yes | Container image, ideally pinned by digest. |
| container_port | number | 8080 | no | Port the container listens on; null for Cloud Run default. |
| resource_limits | map(string) | {cpu="1",memory="512Mi"} | no | CPU/memory limits per instance. |
| cpu_idle | bool | true | no | Throttle CPU between requests (request-based billing). |
| startup_cpu_boost | bool | true | no | Double CPU during startup to cut cold-start latency. |
| min_instances | number | 0 | no | Warm instance floor; 0 allows scale-to-zero. |
| max_instances | number | 10 | no | Instance ceiling. |
| max_concurrency | number | 80 | no | Concurrent requests per instance (1-1000). |
| request_timeout_seconds | number | 300 | no | Max request duration (1-3600). |
| execution_environment | string | EXECUTION_ENVIRONMENT_GEN2 | no | Sandbox generation (gen1/gen2). |
| ingress | string | INGRESS_TRAFFIC_ALL | no | All / internal-only / internal-LB ingress. |
| invokers | list(string) | [] | no | IAM members granted roles/run.invoker. |
| create_service_account | bool | true | no | Create a dedicated runtime SA. |
| service_account_email | string | null | no | Existing runtime SA email when not creating one. |
| env | map(string) | {} | no | Plain environment variables. |
| secret_env | list(object) | [] | no | Secret Manager-backed env vars {name, secret_id, version}. |
| vpc_connector | string | null | no | Serverless VPC Access connector ID for private egress. |
| network_interfaces | list(object) | [] | no | Direct VPC egress interfaces {network, subnetwork, tags}. |
| vpc_egress | string | PRIVATE_RANGES_ONLY | no | Egress mode when a VPC is attached. |
| startup_probe_path | string | null | no | Startup probe HTTP path; null disables. |
| startup_probe_initial_delay | number | 0 | no | Delay before first startup probe. |
| startup_probe_period | number | 10 | no | Seconds between startup probes. |
| startup_probe_failure_threshold | number | 3 | no | Failures before a revision is marked failed. |
| startup_probe_timeout | number | 3 | no | Per-attempt startup probe timeout. |
| liveness_probe_path | string | null | no | Liveness probe HTTP path; null disables. |
| liveness_probe_period | number | 30 | no | Seconds between liveness probes. |
| liveness_probe_failure_threshold | number | 3 | no | Failures before an instance restarts. |
| liveness_probe_timeout | number | 3 | no | Per-attempt liveness probe timeout. |
| deletion_protection | bool | true | no | Block Terraform deletion of the service. |
| labels | map(string) | {} | no | Labels on the service. |
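As an illustration of the network_interfaces input, the sketch below calls the module with Direct VPC egress instead of a connector. The module label, the service name orders-worker, and the network and subnetwork names are hypothetical; the image is left as a placeholder.

```hcl
module "cloud_run_direct_vpc" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-run?ref=v1.0.0"

  project_id = var.project_id
  name       = "orders-worker" # hypothetical service name
  location   = "asia-south1"
  image      = "..."

  # Direct VPC egress: leave vpc_connector unset and list the interface(s).
  network_interfaces = [
    {
      network    = "shared-vpc"      # hypothetical network name
      subnetwork = "run-asia-south1" # hypothetical subnetwork name
      tags       = ["cloud-run"]     # optional network tags
    },
  ]
  vpc_egress = "PRIVATE_RANGES_ONLY"
}
```

Because tags is declared optional in the variable type, the attribute can be dropped entirely when no network tags are needed.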
Outputs
| Name | Description |
|---|---|
| id | Fully qualified Cloud Run service ID. |
| name | Service name. |
| uri | HTTPS URL that Cloud Run assigns to the service (the run.app address). |
| location | Region the service runs in. |
| latest_ready_revision | Name of the latest ready/serving revision. |
| service_account_email | Runtime service account email used by the service. |
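The service_account_email output is the hook for granting the runtime identity anything beyond secret access. As a sketch, assuming the service needs to connect to Cloud SQL, a root module could add the binding below; the resource label runtime_cloudsql_client is an illustrative name.

```hcl
# Grants the module's runtime SA the Cloud SQL Client role at project level.
# The resource label is hypothetical; narrow the role or scope as needed.
resource "google_project_iam_member" "runtime_cloudsql_client" {
  project = var.project_id
  role    = "roles/cloudsql.client"
  member  = "serviceAccount:${module.cloud_run.service_account_email}"
}
```

Keeping extra roles in the calling root module, rather than inside the module, leaves each grant visible in the repository of the team that owns the service.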
Enterprise scenario
A payments platform runs roughly 40 internal microservices behind a single external HTTPS Load Balancer. Each team owns a thin root module that calls this Cloud Run module once per service, setting ingress = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER" so the only public path is through the LB (where Cloud Armor and WAF rules live), pulling DB credentials and API keys from Secret Manager via secret_env, and reaching private Cloud SQL through a shared VPC connector. Because every service gets its own runtime service account with accessor rights on only its own secrets, a compromised container cannot read another team’s credentials, and the platform team can audit the entire fleet’s IAM surface from one consistent pattern.
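A team that owns several of those services could drive the module from a map instead of repeating the block. This is a sketch under assumptions: the local named services, its keys and ceilings, and the module label services are hypothetical, and the images are placeholders.

```hcl
locals {
  # Hypothetical per-service settings; images are left as placeholders.
  services = {
    "orders-api"   = { image = "...", max_instances = 30 }
    "payments-bff" = { image = "...", max_instances = 20 }
  }
}

module "services" {
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-run?ref=v1.0.0"
  for_each = local.services

  project_id    = var.project_id
  name          = each.key
  location      = "asia-south1"
  image         = each.value.image
  max_instances = each.value.max_instances

  # Only reachable through the load balancer, as in the scenario above.
  ingress  = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"
  invokers = ["allUsers"]
}

# One runtime identity per service, keyed by service name.
output "runtime_service_accounts" {
  value = { for name, svc in module.services : name => svc.service_account_email }
}
```

Each map entry yields its own runtime service account, so the isolation described in the scenario is preserved while the shared settings live in one place.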
Best practices
- Pin images by digest, not tags. Pass image = "...@sha256:..." so a revision is immutable and reproducible; a moving :latest tag means a redeploy can silently ship a different image and you lose rollback determinism.
- Always use a dedicated runtime SA (least privilege). Keep create_service_account = true and grant that SA only the roles it needs (the module already scopes Secret Manager access to the exact secrets). Never run on the default Compute Engine service account, which carries broad project permissions.
- Lock down ingress and invocation. For LB-fronted services use INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER; for service-to-service calls drop "allUsers" from invokers and list the caller service accounts instead so the endpoint requires authenticated run.invoker.
- Tune concurrency and CPU for cost. Leave cpu_idle = true for request-driven APIs so you only pay during requests, and raise max_concurrency for I/O-bound services to pack more requests per instance, since fewer instances means lower cost. Use a min_instances floor only on latency-sensitive paths.
- Protect against cold starts and bad revisions with probes. Set startup_probe_path so traffic shifts only after the app reports healthy, and liveness_probe_path so wedged instances are recycled; combine with startup_cpu_boost to shorten cold starts.
- Standardize naming and labels. Keep service names RFC1035 and prefix by domain (orders-api, payments-bff), and populate labels with team/environment so billing export and Cloud Monitoring can slice cost and SLOs per owner.
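Several of these practices can be combined in one call. The sketch below assumes an internal-only service named ledger-internal that is invoked by the orders-api runtime identity; the service name, caller email, and concurrency value are hypothetical, and the digest is left elided.

```hcl
module "ledger_internal" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-run?ref=v1.0.0"

  project_id = var.project_id
  name       = "ledger-internal" # hypothetical, domain-prefixed name
  location   = "asia-south1"

  # Pinned by digest; the digest itself is left elided here.
  image = "asia-south1-docker.pkg.dev/${var.project_id}/services/ledger@sha256:..."

  # Internal traffic only, and only the named caller may invoke.
  ingress  = "INGRESS_TRAFFIC_INTERNAL_ONLY"
  invokers = ["serviceAccount:orders-api-run@${var.project_id}.iam.gserviceaccount.com"]

  # Request-based billing with denser packing for an I/O-bound service.
  cpu_idle        = true
  max_concurrency = 200 # assumed value; tune against real load

  startup_probe_path  = "/healthz"
  liveness_probe_path = "/healthz"

  labels = {
    team        = "payments"
    environment = "prod"
  }
}
```

The caller email follows the name-plus-"-run" convention the module uses for runtime service accounts, so it would need adjusting if the calling service supplies its own identity.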