Terraform Module: GCP Cloud Run — Production-Ready Serverless Containers in One Block

Quick take — A reusable hashicorp/google ~> 5.0 module for google_cloud_run_v2_service: autoscaling, concurrency, secrets from Secret Manager, VPC egress, health probes, and a least-privilege runtime service account. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "google" {
  project = "my-project"
  region  = "us-central1"
}

module "cloud_run" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-run?ref=v1.0.0"

  project_id = "..."  # GCP project ID hosting the service.
  name       = "..."  # Service name; RFC1035, lowercase, <= 49 chars.
  location   = "..."  # Region, e.g. `asia-south1`.
  image      = "..."  # Container image, ideally pinned by digest.
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
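
After the apply you will usually want the service URL. A minimal sketch, assuming the module block above is still named cloud_run; the output name service_url is an arbitrary choice for illustration:

```hcl
# Re-export the module's `uri` output at the root so `terraform output` prints it.
# "service_url" is a hypothetical name; pick whatever fits your stack.
output "service_url" {
  description = "HTTPS URL assigned to the Cloud Run service."
  value       = module.cloud_run.uri
}
```

Running terraform output service_url then prints the address. Note that with the default invokers = [] no caller has roles/run.invoker, so the URL rejects unauthenticated requests until you add members.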

What this module is

Cloud Run is GCP’s fully managed serverless container platform. You hand it a container image, and it runs that image behind an HTTPS endpoint, scaling the number of instances from zero up to your ceiling based on incoming traffic — you pay (by default) only while a request is being served. There are no nodes to patch, no autoscaler to tune, and no load balancer to wire up for the basic case.

The trouble is that a correct Cloud Run service is rarely just an image and a port. In production you almost always need: a dedicated runtime service account (not the default Compute SA with project-wide Editor), CPU/memory limits, an autoscaling floor and ceiling, request concurrency tuning, secrets injected from Secret Manager rather than baked into the image, startup/liveness probes so bad revisions never take traffic, and frequently private egress through a VPC connector to reach Cloud SQL or internal APIs. Hand-writing the google_cloud_run_v2_service block for every service means every team re-derives those settings — and gets the security-sensitive ones subtly wrong.

This module wraps google_cloud_run_v2_service (the v2 / Knative-free API) into a single, opinionated, variable-driven block. It creates a least-privilege runtime service account, wires Secret Manager references as environment variables, sets sane resource and scaling defaults, and exposes the service URL and revision name as outputs so downstream resources (a load balancer, a DNS record, a Pub/Sub push subscription) can consume them.

When to use it

You deploy stateless HTTP/gRPC containers and want autoscaling-to-zero without managing GKE or instance groups.
You have many similar services (APIs, BFFs, webhook handlers, internal tools) and want one consistent, reviewed pattern instead of bespoke blocks per repo.
You need secrets from Secret Manager mounted as env vars, and a dedicated runtime identity per service for least-privilege IAM.
You front the service with an external HTTPS Load Balancer (via a Serverless NEG) or expose it directly, and want the URL as a clean Terraform output.
You need private egress to a VPC (Cloud SQL private IP, internal Memorystore, on-prem over Interconnect) via Direct VPC egress or a Serverless VPC Access connector.

Skip it if you need long-lived stateful workloads, GPU/TPU batch jobs better suited to Cloud Run Jobs or GKE, or sub-millisecond cold-start guarantees that only always-on infrastructure provides.
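
The private-egress case in the list above can be sketched with Direct VPC egress instead of a connector. This is an illustration under stated assumptions: the service name, network, subnetwork, and tag are hypothetical, and the root variables are declared only to keep the snippet self-contained.

```hcl
variable "project_id" { type = string }
variable "image"      { type = string } # hypothetical root variable holding the image reference

module "cloud_run_direct_egress" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-run?ref=v1.0.0"

  project_id = var.project_id
  name       = "inventory-api" # hypothetical service name
  location   = "asia-south1"
  image      = var.image

  # Direct VPC egress: leave vpc_connector unset and describe the interface instead.
  network_interfaces = [
    {
      network    = "prod-vpc"    # hypothetical VPC name
      subnetwork = "run-egress"  # hypothetical subnet in the same region
      tags       = ["cloud-run"] # hypothetical network tag for firewall rules
    },
  ]

  # Only RFC 1918 / private ranges go through the VPC; public traffic egresses normally.
  vpc_egress = "PRIVATE_RANGES_ONLY"
}
```

Setting both vpc_connector and network_interfaces is not meaningful, since the module documents them as mutually exclusive; choose one per service.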

Module structure

terraform-module-gcp-cloud-run/
├── versions.tf      # provider + required_version pins
├── main.tf          # runtime SA, IAM, the v2 service, invoker binding
├── variables.tf     # var-driven inputs with validation
└── outputs.tf       # id/name, uri, location, revision, service account email

versions.tf

terraform {
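  # 1.9+ is required: the service_account_email validation in variables.tf
  # references another variable (create_service_account).
  required_version = ">= 1.9.0"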

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

main.tf

locals {
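  # A stable, predictable runtime SA id derived from the service name.
  # SA account_id must be 6-30 chars, lowercase, start with a letter.
  # Truncate the name rather than the whole string so the "-run" suffix always
  # survives; a one-character name still falls below the 6-char minimum.
  service_account_id = "${substr(var.name, 0, 26)}-run"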
}

# Dedicated least-privilege runtime identity for this service.
resource "google_service_account" "runtime" {
  count = var.create_service_account ? 1 : 0

  project      = var.project_id
  account_id   = local.service_account_id
  display_name = "Cloud Run runtime SA for ${var.name}"
  description  = "Identity assumed by the ${var.name} Cloud Run service at runtime."
}

locals {
  runtime_sa_email = var.create_service_account ? google_service_account.runtime[0].email : var.service_account_email
}

# Allow the runtime SA to read each referenced secret. The module only grants
# access to the exact secrets the service consumes, at the secret level.
resource "google_secret_manager_secret_iam_member" "runtime_accessor" {
  for_each = { for s in var.secret_env : s.name => s }

  project   = var.project_id
  secret_id = each.value.secret_id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${local.runtime_sa_email}"
}

resource "google_cloud_run_v2_service" "this" {
  project             = var.project_id
  name                = var.name
  location            = var.location
  ingress             = var.ingress
  deletion_protection = var.deletion_protection

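  labels = var.labels

  # Grant secret access before the first revision starts; otherwise the deploy
  # can race the IAM bindings and fail to read its secrets.
  depends_on = [google_secret_manager_secret_iam_member.runtime_accessor]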

  template {
    service_account                  = local.runtime_sa_email
    timeout                          = "${var.request_timeout_seconds}s"
    max_instance_request_concurrency = var.max_concurrency
    execution_environment            = var.execution_environment

    scaling {
      min_instance_count = var.min_instances
      max_instance_count = var.max_instances
    }

    # Optional private egress into a VPC (Cloud SQL private IP, internal APIs).
    dynamic "vpc_access" {
      for_each = var.vpc_connector == null && length(var.network_interfaces) == 0 ? [] : [1]
      content {
        connector = var.vpc_connector
        egress    = var.vpc_egress

        dynamic "network_interfaces" {
          for_each = var.network_interfaces
          content {
            network    = network_interfaces.value.network
            subnetwork = network_interfaces.value.subnetwork
            tags       = lookup(network_interfaces.value, "tags", null)
          }
        }
      }
    }

    containers {
      image = var.image

      dynamic "ports" {
        for_each = var.container_port == null ? [] : [var.container_port]
        content {
          container_port = ports.value
        }
      }

      resources {
        limits            = var.resource_limits
        cpu_idle          = var.cpu_idle
        startup_cpu_boost = var.startup_cpu_boost
      }

      # Plain (non-secret) environment variables.
      dynamic "env" {
        for_each = var.env
        content {
          name  = env.key
          value = env.value
        }
      }

      # Secret-backed environment variables sourced from Secret Manager.
      dynamic "env" {
        for_each = { for s in var.secret_env : s.name => s }
        content {
          name = env.value.name
          value_source {
            secret_key_ref {
              secret  = env.value.secret_id
              version = lookup(env.value, "version", "latest")
            }
          }
        }
      }

      # Startup probe: a revision only receives traffic once this passes.
      dynamic "startup_probe" {
        for_each = var.startup_probe_path == null ? [] : [1]
        content {
          initial_delay_seconds = var.startup_probe_initial_delay
          period_seconds        = var.startup_probe_period
          failure_threshold     = var.startup_probe_failure_threshold
          timeout_seconds       = var.startup_probe_timeout
          http_get {
            path = var.startup_probe_path
            port = var.container_port
          }
        }
      }

      # Liveness probe: a failing instance is restarted.
      dynamic "liveness_probe" {
        for_each = var.liveness_probe_path == null ? [] : [1]
        content {
          period_seconds    = var.liveness_probe_period
          failure_threshold = var.liveness_probe_failure_threshold
          timeout_seconds   = var.liveness_probe_timeout
          http_get {
            path = var.liveness_probe_path
            port = var.container_port
          }
        }
      }
    }
  }

  # Traffic always points at the latest ready revision.
  traffic {
    type    = "TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST"
    percent = 100
  }
}

# Who may invoke the service. For a public endpoint, pass ["allUsers"].
# For internal-only, pass the calling service accounts as members.
resource "google_cloud_run_v2_service_iam_member" "invokers" {
  for_each = toset(var.invokers)

  project  = var.project_id
  location = google_cloud_run_v2_service.this.location
  name     = google_cloud_run_v2_service.this.name
  role     = "roles/run.invoker"
  member   = each.value
}

variables.tf

variable "project_id" {
  description = "GCP project ID that hosts the Cloud Run service."
  type        = string
}

variable "name" {
  description = "Cloud Run service name (lowercase, RFC1035: letters, digits, hyphens; <= 49 chars)."
  type        = string

  validation {
    condition     = can(regex("^[a-z]([-a-z0-9]*[a-z0-9])?$", var.name)) && length(var.name) <= 49
    error_message = "name must be lowercase RFC1035 (start with a letter, hyphens allowed) and <= 49 chars."
  }
}

variable "location" {
  description = "Region for the service, e.g. asia-south1, europe-west1, us-central1."
  type        = string
}

variable "image" {
  description = "Fully qualified container image, ideally pinned by digest (e.g. REGION-docker.pkg.dev/PROJ/REPO/app@sha256:...)."
  type        = string
}

variable "container_port" {
  description = "Port the container listens on. Set null to use Cloud Run's default ($PORT, 8080)."
  type        = number
  default     = 8080
}

variable "resource_limits" {
  description = "CPU and memory limits for the container. The gen2 execution environment (the module default) needs memory >= 512Mi."
  type        = map(string)
  default = {
    cpu    = "1"
    memory = "512Mi"
  }
}

variable "cpu_idle" {
  description = "If true, CPU is throttled when no request is in flight (request-based billing). Set false for always-allocated CPU (background work)."
  type        = bool
  default     = true
}

variable "startup_cpu_boost" {
  description = "Temporarily allocate extra CPU during container startup to reduce cold-start latency."
  type        = bool
  default     = true
}

variable "min_instances" {
  description = "Minimum number of warm instances. 0 allows scale-to-zero; >= 1 removes cold starts at a cost."
  type        = number
  default     = 0

  validation {
    condition     = var.min_instances >= 0
    error_message = "min_instances must be >= 0."
  }
}

variable "max_instances" {
  description = "Maximum number of instances the service may scale to."
  type        = number
  default     = 10

  validation {
    condition     = var.max_instances >= 1
    error_message = "max_instances must be >= 1."
  }
}

variable "max_concurrency" {
  description = "Max concurrent requests per instance (1-1000). Lower for CPU-bound apps, higher for I/O-bound."
  type        = number
  default     = 80

  validation {
    condition     = var.max_concurrency >= 1 && var.max_concurrency <= 1000
    error_message = "max_concurrency must be between 1 and 1000."
  }
}

variable "request_timeout_seconds" {
  description = "Maximum request duration in seconds (1-3600)."
  type        = number
  default     = 300

  validation {
    condition     = var.request_timeout_seconds >= 1 && var.request_timeout_seconds <= 3600
    error_message = "request_timeout_seconds must be between 1 and 3600."
  }
}

variable "execution_environment" {
  description = "Sandbox generation: EXECUTION_ENVIRONMENT_GEN1 or EXECUTION_ENVIRONMENT_GEN2 (gen2 needed for NFS/some syscalls)."
  type        = string
  default     = "EXECUTION_ENVIRONMENT_GEN2"

  validation {
    condition     = contains(["EXECUTION_ENVIRONMENT_GEN1", "EXECUTION_ENVIRONMENT_GEN2"], var.execution_environment)
    error_message = "execution_environment must be EXECUTION_ENVIRONMENT_GEN1 or EXECUTION_ENVIRONMENT_GEN2."
  }
}

variable "ingress" {
  description = "Ingress setting: INGRESS_TRAFFIC_ALL, INGRESS_TRAFFIC_INTERNAL_ONLY, or INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER."
  type        = string
  default     = "INGRESS_TRAFFIC_ALL"

  validation {
    condition = contains([
      "INGRESS_TRAFFIC_ALL",
      "INGRESS_TRAFFIC_INTERNAL_ONLY",
      "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER",
    ], var.ingress)
    error_message = "ingress must be one of INGRESS_TRAFFIC_ALL, INGRESS_TRAFFIC_INTERNAL_ONLY, INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER."
  }
}

variable "invokers" {
  description = "IAM members granted roles/run.invoker. Use [\"allUsers\"] for a public endpoint, or specific service accounts for private."
  type        = list(string)
  default     = []
}

variable "create_service_account" {
  description = "Create a dedicated runtime service account. If false, you must supply service_account_email."
  type        = bool
  default     = true
}

variable "service_account_email" {
  description = "Existing runtime SA email to use when create_service_account is false."
  type        = string
  default     = null

  validation {
    condition     = var.create_service_account || var.service_account_email != null
    error_message = "service_account_email is required when create_service_account is false."
  }
}

variable "env" {
  description = "Plain (non-secret) environment variables as a name => value map."
  type        = map(string)
  default     = {}
}

variable "secret_env" {
  description = "Secret-backed env vars from Secret Manager. Each: { name, secret_id, version }. version defaults to 'latest'."
  type = list(object({
    name      = string
    secret_id = string
    version   = optional(string, "latest")
  }))
  default = []
}

variable "vpc_connector" {
  description = "Serverless VPC Access connector ID for private egress. Mutually exclusive with network_interfaces (Direct VPC egress)."
  type        = string
  default     = null
}

variable "network_interfaces" {
  description = "Direct VPC egress interfaces. Each: { network, subnetwork, tags }. Leave empty to use vpc_connector or no VPC."
  type = list(object({
    network    = string
    subnetwork = string
    tags       = optional(list(string))
  }))
  default = []
}

variable "vpc_egress" {
  description = "Egress mode when a VPC is attached: ALL_TRAFFIC or PRIVATE_RANGES_ONLY."
  type        = string
  default     = "PRIVATE_RANGES_ONLY"

  validation {
    condition     = contains(["ALL_TRAFFIC", "PRIVATE_RANGES_ONLY"], var.vpc_egress)
    error_message = "vpc_egress must be ALL_TRAFFIC or PRIVATE_RANGES_ONLY."
  }
}

variable "startup_probe_path" {
  description = "HTTP path for the startup probe (e.g. /healthz). Null disables the probe."
  type        = string
  default     = null
}

variable "startup_probe_initial_delay" {
  description = "Seconds to wait before the first startup probe."
  type        = number
  default     = 0
}

variable "startup_probe_period" {
  description = "Seconds between startup probes."
  type        = number
  default     = 10
}

variable "startup_probe_failure_threshold" {
  description = "Consecutive startup probe failures before the revision is marked failed."
  type        = number
  default     = 3
}

variable "startup_probe_timeout" {
  description = "Per-attempt startup probe timeout in seconds."
  type        = number
  default     = 3
}

variable "liveness_probe_path" {
  description = "HTTP path for the liveness probe (e.g. /healthz). Null disables the probe."
  type        = string
  default     = null
}

variable "liveness_probe_period" {
  description = "Seconds between liveness probes."
  type        = number
  default     = 30
}

variable "liveness_probe_failure_threshold" {
  description = "Consecutive liveness probe failures before the instance is restarted."
  type        = number
  default     = 3
}

variable "liveness_probe_timeout" {
  description = "Per-attempt liveness probe timeout in seconds."
  type        = number
  default     = 3
}

variable "deletion_protection" {
  description = "Block accidental deletion of the service via Terraform."
  type        = bool
  default     = true
}

variable "labels" {
  description = "Labels applied to the Cloud Run service."
  type        = map(string)
  default     = {}
}

outputs.tf

output "id" {
  description = "Fully qualified Cloud Run service ID."
  value       = google_cloud_run_v2_service.this.id
}

output "name" {
  description = "Name of the Cloud Run service."
  value       = google_cloud_run_v2_service.this.name
}

output "uri" {
  description = "HTTPS URL Cloud Run assigns to the service (the run.app address)."
  value       = google_cloud_run_v2_service.this.uri
}

output "location" {
  description = "Region the service is deployed in."
  value       = google_cloud_run_v2_service.this.location
}

output "latest_ready_revision" {
  description = "Name of the latest revision that is serving / ready."
  value       = google_cloud_run_v2_service.this.latest_ready_revision
}

output "service_account_email" {
  description = "Runtime service account email used by the service."
  value       = local.runtime_sa_email
}

How to use it

# Secret container defined alongside the service; the module grants its runtime
# SA accessor on it. A secret version must exist before the service can start.
resource "google_secret_manager_secret" "db_url" {
  project   = var.project_id
  secret_id = "orders-api-db-url"
  replication {
    auto {}
  }
}

module "cloud_run" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-run?ref=v1.0.0"

  project_id = var.project_id
  name       = "orders-api"
  location   = "asia-south1"

  # Pin by digest in real pipelines; tag shown for readability.
  image = "asia-south1-docker.pkg.dev/${var.project_id}/services/orders-api:1.8.2"

  container_port  = 8080
  min_instances   = 1     # keep one warm instance to avoid cold starts on a customer-facing API
  max_instances   = 30
  max_concurrency = 60

  resource_limits = {
    cpu    = "2"
    memory = "1Gi"
  }

  env = {
    LOG_LEVEL = "info"
    REGION    = "asia-south1"
  }

  secret_env = [
    {
      name      = "DATABASE_URL"
      secret_id = google_secret_manager_secret.db_url.secret_id
      version   = "latest"
    },
  ]

  # Private egress to Cloud SQL over the VPC.
  vpc_connector = "projects/${var.project_id}/locations/asia-south1/connectors/run-conn"
  vpc_egress    = "PRIVATE_RANGES_ONLY"

  startup_probe_path  = "/healthz"
  liveness_probe_path = "/healthz"

  # Fronted by an external HTTPS LB, so keep ingress restricted to the LB.
  ingress  = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"
  invokers = ["allUsers"]

  labels = {
    team        = "payments"
    environment = "prod"
  }
}

# Downstream: attach the service to an external HTTPS Load Balancer via a Serverless NEG.
resource "google_compute_region_network_endpoint_group" "orders_neg" {
  project               = var.project_id
  name                  = "orders-api-neg"
  region                = "asia-south1"
  network_endpoint_type = "SERVERLESS"

  cloud_run {
    service = module.cloud_run.name # <- module output wires the NEG to the service
  }
}
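
The uri output can feed other downstream resources in the same way. As a sketch only, here is a Pub/Sub push subscription that delivers to the service. It builds on the example above (module.cloud_run and var.project_id); the topic, subscription, and service account names and the /events path are all hypothetical.

```hcl
# Hypothetical identity that Pub/Sub uses to sign its push requests.
resource "google_service_account" "pubsub_pusher" {
  project      = var.project_id
  account_id   = "orders-pubsub-pusher"
  display_name = "Pub/Sub push identity for orders-api"
}

resource "google_pubsub_topic" "orders_events" {
  project = var.project_id
  name    = "orders-events" # hypothetical topic
}

resource "google_pubsub_subscription" "orders_push" {
  project = var.project_id
  name    = "orders-events-push" # hypothetical subscription
  topic   = google_pubsub_topic.orders_events.id

  push_config {
    # Module output wires the subscription to the service; "/events" is an assumed handler path.
    push_endpoint = "${module.cloud_run.uri}/events"

    # Attach an OIDC token so Cloud Run can authenticate the caller.
    oidc_token {
      service_account_email = google_service_account.pubsub_pusher.email
    }
  }
}
```

For the push requests to be accepted, the pusher identity would also need to appear in the module's invokers list. Because the module iterates invokers with for_each, each member string must be known at plan time, so it is safer to spell it out, for example "serviceAccount:orders-pubsub-pusher@${var.project_id}.iam.gserviceaccount.com", than to reference the account's computed email attribute.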

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config — live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "gcs"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...GCS state bucket + prefix per path...
  }
}

2. Module config — live/prod/cloud_run/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-run?ref=v1.0.0"
}

inputs = {
  project_id = "..."
  name       = "..."
  location   = "..."
  image      = "..."
}

3. Deploy one environment, or roll out all modules together:

cd live/prod/cloud_run && terragrunt apply   # this module only
cd live/prod && terragrunt run-all apply     # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
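
To make the per-environment override concrete, a dev counterpart of the prod config might look like the sketch below. The live/dev/cloud_run path is an assumption that mirrors the prod layout, and the override values are illustrative, not recommendations.

```hcl
# live/dev/cloud_run/terragrunt.hcl (hypothetical path mirroring live/prod)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-run?ref=v1.0.0"
}

inputs = {
  project_id = "..."
  name       = "..."
  location   = "..."
  image      = "..."

  # Dev-only overrides: scale to zero, keep the ceiling low, allow teardown.
  min_instances       = 0
  max_instances       = 2
  deletion_protection = false
}
```

Only the inputs block differs between environments; the module source and the inherited root config stay identical, which is what keeps the environments from drifting apart.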

Inputs

Name	Type	Default	Required	Description
project_id	string	—	yes	GCP project ID hosting the service.
name	string	—	yes	Service name; RFC1035, lowercase, <= 49 chars.
location	string	—	yes	Region, e.g. `asia-south1`.
image	string	—	yes	Container image, ideally pinned by digest.
container_port	number	`8080`	no	Port the container listens on; `null` for Cloud Run default.
resource_limits	map(string)	`{cpu="1",memory="512Mi"}`	no	CPU/memory limits per instance.
cpu_idle	bool	`true`	no	Throttle CPU between requests (request-based billing).
startup_cpu_boost	bool	`true`	no	Allocate extra CPU during startup to cut cold-start latency.
min_instances	number	`0`	no	Warm instance floor; `0` allows scale-to-zero.
max_instances	number	`10`	no	Instance ceiling.
max_concurrency	number	`80`	no	Concurrent requests per instance (1-1000).
request_timeout_seconds	number	`300`	no	Max request duration (1-3600).
execution_environment	string	`EXECUTION_ENVIRONMENT_GEN2`	no	Sandbox generation (gen1/gen2).
ingress	string	`INGRESS_TRAFFIC_ALL`	no	All / internal-only / internal-LB ingress.
invokers	list(string)	`[]`	no	IAM members granted `roles/run.invoker`.
create_service_account	bool	`true`	no	Create a dedicated runtime SA.
service_account_email	string	`null`	no	Existing runtime SA email when not creating one.
env	map(string)	`{}`	no	Plain environment variables.
secret_env	list(object)	`[]`	no	Secret Manager-backed env vars `{name, secret_id, version}`.
vpc_connector	string	`null`	no	Serverless VPC Access connector ID for private egress.
network_interfaces	list(object)	`[]`	no	Direct VPC egress interfaces `{network, subnetwork, tags}`.
vpc_egress	string	`PRIVATE_RANGES_ONLY`	no	Egress mode when a VPC is attached.
startup_probe_path	string	`null`	no	Startup probe HTTP path; `null` disables.
startup_probe_initial_delay	number	`0`	no	Delay before first startup probe.
startup_probe_period	number	`10`	no	Seconds between startup probes.
startup_probe_failure_threshold	number	`3`	no	Failures before a revision is marked failed.
startup_probe_timeout	number	`3`	no	Per-attempt startup probe timeout.
liveness_probe_path	string	`null`	no	Liveness probe HTTP path; `null` disables.
liveness_probe_period	number	`30`	no	Seconds between liveness probes.
liveness_probe_failure_threshold	number	`3`	no	Failures before an instance restarts.
liveness_probe_timeout	number	`3`	no	Per-attempt liveness probe timeout.
deletion_protection	bool	`true`	no	Block Terraform deletion of the service.
labels	map(string)	`{}`	no	Labels on the service.

Outputs

Name	Description
id	Fully qualified Cloud Run service ID.
name	Service name.
uri	HTTPS URL Cloud Run assigns to the service (the `run.app` address).
location	Region the service runs in.
latest_ready_revision	Name of the latest ready/serving revision.
service_account_email	Runtime service account email used by the service.

Enterprise scenario

A payments platform runs roughly 40 internal microservices behind a single external HTTPS Load Balancer. Each team owns a thin root module that calls this Cloud Run module once per service. Every call sets ingress = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER" so the only public path is through the LB (where the Cloud Armor WAF rules live), pulls DB credentials and API keys from Secret Manager via secret_env, and reaches private Cloud SQL through a shared VPC connector. Because every service gets its own runtime service account with accessor rights on only its own secrets, a compromised container cannot read another team’s credentials, and the platform team can audit the entire fleet’s IAM surface from one consistent pattern.
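
One way a team's thin root module could express "once per service" is a module-level for_each over a small service catalogue. This is a sketch under stated assumptions, not the platform's actual layout: the local.services map, its field names, and the secret IDs are hypothetical, and the image references are left as placeholders.

```hcl
variable "project_id" { type = string }

locals {
  # Hypothetical catalogue: service name => image and env-var-to-secret mapping.
  services = {
    "orders-api" = {
      image   = "..."
      secrets = { DATABASE_URL = "orders-api-db-url" }
    }
    "payments-bff" = {
      image   = "..."
      secrets = { API_KEY = "payments-bff-api-key" }
    }
  }
}

module "service" {
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-run?ref=v1.0.0"
  for_each = local.services

  project_id = var.project_id
  name       = each.key
  location   = "asia-south1"
  image      = each.value.image

  # Only reachable through the load balancer; the LB itself is unauthenticated.
  ingress  = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"
  invokers = ["allUsers"]

  # Each service receives accessor rights on its own secrets only.
  secret_env = [
    for env_name, secret_id in each.value.secrets : {
      name      = env_name
      secret_id = secret_id
    }
  ]

  # Shared connector for private Cloud SQL access.
  vpc_connector = "projects/${var.project_id}/locations/asia-south1/connectors/run-conn"
}
```

Each map entry becomes its own module instance, so every service still gets a separate runtime service account and its own secret bindings, and adding a service is a single new map entry.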

Best practices

Pin images by digest, not tags. Pass image = "...@sha256:..." so a revision is immutable and reproducible; a moving :latest tag means a redeploy can silently ship a different image and you lose rollback determinism.
Always use a dedicated runtime SA (least privilege). Keep create_service_account = true and grant that SA only the roles it needs (the module already scopes Secret Manager access to the exact secrets). Never run on the default Compute Engine service account, which carries broad project permissions.
Lock down ingress and invocation. For LB-fronted services use INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER; for service-to-service calls drop "allUsers" from invokers and list the caller service accounts instead so the endpoint requires authenticated run.invoker.
Tune concurrency and CPU for cost. Leave cpu_idle = true for request-driven APIs so you only pay during requests, and raise max_concurrency for I/O-bound services to pack more requests per instance — fewer instances means lower cost. Use a min_instances floor only on latency-sensitive paths.
Protect against cold starts and bad revisions with probes. Set startup_probe_path so traffic shifts only after the app reports healthy, and liveness_probe_path so wedged instances are recycled; combine with startup_cpu_boost to shorten cold starts.
Standardize naming and labels. Keep service names RFC1035 and prefix by domain (orders-api, payments-bff), and populate labels with team/environment so billing export and Cloud Monitoring can slice cost and SLOs per owner.
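
Taken together, the practices above might look like this for a private, service-to-service API. Treat it as a sketch: the ledger-internal name, the concurrency figure, and the caller identity are hypothetical, and the digest is deliberately left elided.

```hcl
variable "project_id" { type = string }

module "ledger_internal" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-cloud-run?ref=v1.0.0"

  project_id = var.project_id
  name       = "ledger-internal" # hypothetical, domain-prefixed RFC1035 name
  location   = "asia-south1"

  # Pinned by digest so the revision is immutable (digest elided here).
  image = "asia-south1-docker.pkg.dev/${var.project_id}/services/ledger-internal@sha256:..."

  # No public path, and no "allUsers": only the named caller may invoke.
  ingress = "INGRESS_TRAFFIC_INTERNAL_ONLY"
  invokers = [
    # Assumed caller: the runtime SA this module would create for a service named "orders-api".
    "serviceAccount:orders-api-run@${var.project_id}.iam.gserviceaccount.com",
  ]

  # Cost tuning for an I/O-bound API: request-based CPU, denser packing, scale to zero.
  cpu_idle        = true
  max_concurrency = 200 # illustrative value; measure before adopting
  min_instances   = 0

  # Probes gate traffic on health and recycle wedged instances.
  startup_probe_path  = "/healthz"
  liveness_probe_path = "/healthz"
  startup_cpu_boost   = true

  labels = {
    team        = "payments"
    environment = "prod"
  }
}
```

The dedicated runtime service account comes from the create_service_account = true default, so the block does not need to mention it.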