Quick take — A reusable Terraform module for google_composer_environment on hashicorp/google ~> 5.0: Composer 2/3 with private IPs, workload-tuned scheduler/worker sizing, PyPI packages, and Airflow config overrides. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "google" {
project = "my-project"
region = "us-central1"
}
module "composer" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-composer?ref=v1.0.0"
name = "..." # Composer environment name (RFC-1035, <=64 chars).
project_id = "..." # GCP project hosting the environment.
region = "..." # Region, e.g. `asia-south1`.
network = "..." # VPC network self-link or name.
subnetwork = "..." # Subnetwork self-link or name.
service_account = "..." # Worker service account email (least privilege).
}
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
What this module is
Cloud Composer is Google Cloud’s managed Apache Airflow service. It runs the Airflow scheduler, web server, workers, and metadata database on a GKE-backed control plane that Google operates for you, so your team writes DAGs and uploads them to a GCS bucket instead of babysitting Airflow infrastructure. In Composer 2 and 3 the worker fleet autoscales, and you pay for the Cloud Composer compute (CPU/memory/storage SKUs) plus the underlying environment, rather than a fixed cluster.
The problem is that a production-grade google_composer_environment is anything but a five-line resource. You have to reason about the image version (which pins both the Composer release and the Airflow version), private IP and IP aliasing for the GKE cluster, the service account and its IAM, scheduler/worker/web-server CPU and memory, the environment size, a maintenance window, optional CMEK, PyPI dependencies, and Airflow [section]-key config overrides. Hand-rolling that per environment leads to drift between dev, staging, and prod, and to one-off mistakes like a public environment or an under-provisioned scheduler.
This module wraps google_composer_environment behind a small, validated variable surface so every environment your platform team stands up is private-by-default, correctly sized, and consistent. You feed it a name, a region, a service account, and a sizing profile; it returns the Airflow web UI URI, the DAG GCS bucket, and the GKE cluster so downstream automation and CI/CD can wire DAG deployment to it.
When to use it
- You run more than one Composer environment (per stage, per team, or per data domain) and want them provisioned identically from code.
- You need Airflow to orchestrate GCP-native pipelines — BigQuery loads, Dataproc/Dataflow jobs, Cloud Storage transfers, dbt runs — and want the metadata DB, scheduler, and workers fully managed.
- You require private IP environments with no public GKE endpoint for compliance, and want that to be the default rather than an afterthought.
- You want PyPI packages (e.g. `apache-airflow-providers-snowflake`, `dbt-bigquery`) and Airflow config overrides (parallelism, DAG concurrency, email alerting) declared as Terraform inputs and version-controlled.
Reach for plain Cloud Scheduler + Cloud Functions/Workflows instead if you only have a handful of unrelated triggers — Composer is overkill (and not cheap) for trivial scheduling. Use Composer when you genuinely need Airflow’s DAG model, backfills, sensors, and the provider ecosystem.
Module structure
terraform-module-gcp-composer/
├── versions.tf # provider + Terraform version pins
├── main.tf # google_composer_environment + locals
├── variables.tf # validated input surface
└── outputs.tf # env id, Airflow URI, DAG bucket, GKE cluster
versions.tf
terraform {
required_version = ">= 1.5.0"
required_providers {
google = {
source = "hashicorp/google"
version = "~> 5.0"
}
}
}
main.tf
locals {
# Composer 3 drops the explicit web_server_config / node count knobs that
# Composer 2 exposes. We branch on the major version parsed from the image.
is_composer_v3 = can(regex("composer-3", var.image_version))
labels = merge(
{
managed-by = "terraform"
environment = var.environment
},
var.labels,
)
}
resource "google_composer_environment" "this" {
provider = google
name = var.name
project = var.project_id
region = var.region
labels = local.labels
config {
environment_size = var.environment_size
# ---- Software (image, PyPI, Airflow overrides, env vars) ----
software_config {
image_version = var.image_version
pypi_packages = var.pypi_packages
airflow_config_overrides = var.airflow_config_overrides
env_variables = var.env_variables
}
# ---- Networking: private IP, attach to an existing VPC/subnet ----
node_config {
network = var.network
subnetwork = var.subnetwork
service_account = var.service_account
dynamic "ip_allocation_policy" {
for_each = var.pods_secondary_range_name != null ? [1] : []
content {
cluster_secondary_range_name = var.pods_secondary_range_name
services_secondary_range_name = var.services_secondary_range_name
}
}
}
private_environment_config {
enable_private_endpoint = var.enable_private_endpoint
master_ipv4_cidr_block = var.master_ipv4_cidr_block
cloud_composer_network_ipv4_cidr_block = var.composer_network_ipv4_cidr_block
}
# ---- Workload sizing (Composer 2/3 autopilot model) ----
workloads_config {
scheduler {
cpu = var.scheduler.cpu
memory_gb = var.scheduler.memory_gb
storage_gb = var.scheduler.storage_gb
count = var.scheduler.count
}
# web_server is only configurable on Composer 2.
dynamic "web_server" {
for_each = local.is_composer_v3 ? [] : [1]
content {
cpu = var.web_server.cpu
memory_gb = var.web_server.memory_gb
storage_gb = var.web_server.storage_gb
}
}
worker {
cpu = var.worker.cpu
memory_gb = var.worker.memory_gb
storage_gb = var.worker.storage_gb
min_count = var.worker.min_count
max_count = var.worker.max_count
}
}
# ---- Maintenance window (drain/upgrade outside business hours) ----
dynamic "maintenance_window" {
for_each = var.maintenance_window != null ? [var.maintenance_window] : []
content {
start_time = maintenance_window.value.start_time
end_time = maintenance_window.value.end_time
recurrence = maintenance_window.value.recurrence
}
}
# ---- Optional CMEK for the environment + metadata DB ----
dynamic "encryption_config" {
for_each = var.kms_key_name != null ? [1] : []
content {
kms_key_name = var.kms_key_name
}
}
# ---- Optional Resilience mode (HA scheduler + DB, Composer 2 zone-redundant) ----
resilience_mode = var.resilience_mode
}
timeouts {
create = var.create_timeout
update = var.update_timeout
delete = var.delete_timeout
}
}
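To make the version branching concrete, here is a minimal standalone sketch that evaluates the same `can(regex(...))` expression against the two image strings used as examples in this article. The `sample_images` local and the `is_v3_by_image` output are hypothetical names introduced only for illustration; they are not part of the module.

```hcl
# Standalone sketch: no provider needed, run in an empty directory.
locals {
  # Hypothetical sample inputs mirroring the image_version examples.
  sample_images = [
    "composer-2.9.7-airflow-2.9.3",
    "composer-3-airflow-2.10.2-build.x",
  ]

  # Same expression as local.is_composer_v3: regex() raises an error when
  # nothing matches, and can() converts that error into false.
  is_v3_by_image = {
    for image in local.sample_images : image => can(regex("composer-3", image))
  }
}

output "is_v3_by_image" {
  value = local.is_v3_by_image
}
```

The Composer 2 image maps to `false` and the Composer 3 image to `true`, which is what drives the `for_each` on the dynamic `web_server` block. The pattern is unanchored; anchoring it as `^composer-3` would be a slightly stricter variant if you prefer to match only the start of the string.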
variables.tf
variable "name" {
type = string
description = "Name of the Cloud Composer environment (lowercase letters, digits, hyphens; must start with a letter)."
validation {
condition = can(regex("^[a-z]([-a-z0-9]*[a-z0-9])?$", var.name)) && length(var.name) <= 64
error_message = "name must be 1-64 chars, lowercase RFC-1035: start with a letter, then letters/digits/hyphens."
}
}
variable "project_id" {
type = string
description = "GCP project ID that will host the Composer environment."
}
variable "region" {
type = string
description = "Region for the Composer environment (e.g. us-central1, asia-south1)."
}
variable "environment" {
type = string
description = "Logical environment label (dev/staging/prod/sandbox), applied as a label on the environment."
default = "dev"
validation {
condition = contains(["dev", "staging", "prod", "sandbox"], var.environment)
error_message = "environment must be one of: dev, staging, prod, sandbox."
}
}
variable "image_version" {
type = string
description = "Composer/Airflow image, e.g. 'composer-2.9.7-airflow-2.9.3' or 'composer-3-airflow-2.10.2-build.x'."
default = "composer-2.9.7-airflow-2.9.3"
validation {
condition = can(regex("^composer-(2|3)", var.image_version))
error_message = "image_version must target Composer 2 or 3 (start with 'composer-2' or 'composer-3')."
}
}
variable "environment_size" {
type = string
description = "Composer environment size: ENVIRONMENT_SIZE_SMALL | _MEDIUM | _LARGE."
default = "ENVIRONMENT_SIZE_SMALL"
validation {
condition = contains(
["ENVIRONMENT_SIZE_SMALL", "ENVIRONMENT_SIZE_MEDIUM", "ENVIRONMENT_SIZE_LARGE"],
var.environment_size
)
error_message = "environment_size must be ENVIRONMENT_SIZE_SMALL, _MEDIUM, or _LARGE."
}
}
# ---- Networking ----
variable "network" {
type = string
description = "Self-link or name of the VPC network the environment attaches to."
}
variable "subnetwork" {
type = string
description = "Self-link or name of the subnetwork for the environment's GKE nodes."
}
variable "pods_secondary_range_name" {
type = string
description = "Secondary range name on the subnet for GKE pods (VPC-native). Null to let Composer auto-allocate."
default = null
}
variable "services_secondary_range_name" {
type = string
description = "Secondary range name on the subnet for GKE services. Used only when pods_secondary_range_name is set."
default = null
}
variable "enable_private_endpoint" {
type = bool
description = "If true, the GKE control plane has no public endpoint (fully private). Defaults to true."
default = true
}
variable "master_ipv4_cidr_block" {
type = string
description = "RFC-1918 /28 CIDR for the GKE control plane in a private environment."
default = "172.16.0.0/28"
validation {
condition = can(cidrhost(var.master_ipv4_cidr_block, 0)) && endswith(var.master_ipv4_cidr_block, "/28")
error_message = "master_ipv4_cidr_block must be a valid CIDR (a /28 is required for the GKE master)."
}
}
variable "composer_network_ipv4_cidr_block" {
type = string
description = "CIDR block for the Composer-managed network in a private environment (Composer 2). Null to use the default."
default = null
}
# ---- Service account ----
variable "service_account" {
type = string
description = "Email of the service account the environment's workers run as. Grant it least-privilege roles for your DAGs."
validation {
condition = can(regex("^[^@]+@[^@]+\\.iam\\.gserviceaccount\\.com$|^[^@]+@[^@]+\\.gserviceaccount\\.com$", var.service_account))
error_message = "service_account must be a valid service account email (…@PROJECT.iam.gserviceaccount.com)."
}
}
# ---- Workload sizing ----
variable "scheduler" {
type = object({
cpu = number
memory_gb = number
storage_gb = number
count = number
})
description = "Airflow scheduler sizing. count > 1 enables multiple schedulers (Airflow 2)."
default = {
cpu = 1
memory_gb = 2
storage_gb = 1
count = 1
}
}
variable "web_server" {
type = object({
cpu = number
memory_gb = number
storage_gb = number
})
description = "Airflow web server sizing (Composer 2 only; ignored on Composer 3)."
default = {
cpu = 1
memory_gb = 2
storage_gb = 1
}
}
variable "worker" {
type = object({
cpu = number
memory_gb = number
storage_gb = number
min_count = number
max_count = number
})
description = "Airflow worker sizing and autoscaling bounds."
default = {
cpu = 1
memory_gb = 2
storage_gb = 1
min_count = 1
max_count = 3
}
validation {
condition = var.worker.min_count >= 1 && var.worker.max_count >= var.worker.min_count
error_message = "worker.min_count must be >= 1 and worker.max_count must be >= worker.min_count."
}
}
# ---- Software ----
variable "pypi_packages" {
type = map(string)
description = "PyPI packages to install, e.g. { \"dbt-bigquery\" = \"==1.8.0\", \"apache-airflow-providers-snowflake\" = \"\" }."
default = {}
}
variable "airflow_config_overrides" {
type = map(string)
description = "Airflow config overrides keyed as 'section-key', e.g. { \"core-dags_are_paused_at_creation\" = \"True\" }."
default = {}
}
variable "env_variables" {
type = map(string)
description = "Environment variables injected into the Airflow scheduler/worker processes (non-secret only)."
default = {}
}
# ---- Reliability / security ----
variable "resilience_mode" {
type = string
description = "Resilience mode: STANDARD_RESILIENCE or HIGH_RESILIENCE (zone-redundant scheduler + DB, Composer 2)."
default = "STANDARD_RESILIENCE"
validation {
condition = contains(["STANDARD_RESILIENCE", "HIGH_RESILIENCE"], var.resilience_mode)
error_message = "resilience_mode must be STANDARD_RESILIENCE or HIGH_RESILIENCE."
}
}
variable "kms_key_name" {
type = string
description = "Full resource ID of a Cloud KMS CryptoKey for CMEK. Null uses Google-managed encryption."
default = null
}
variable "maintenance_window" {
type = object({
start_time = string # RFC3339, e.g. "2024-01-01T01:00:00Z"
end_time = string # RFC3339, e.g. "2024-01-01T05:00:00Z"
recurrence = string # RRULE, e.g. "FREQ=WEEKLY;BYDAY=SA,SU"
})
description = "Weekly maintenance window for environment upgrades/maintenance. Null lets Google pick."
default = null
}
variable "labels" {
type = map(string)
description = "Additional labels merged onto the environment (managed-by and environment are added automatically)."
default = {}
}
# ---- Timeouts ----
variable "create_timeout" {
type = string
description = "Create timeout (Composer environments take a while to build)."
default = "60m"
}
variable "update_timeout" {
type = string
description = "Update timeout."
default = "60m"
}
variable "delete_timeout" {
type = string
description = "Delete timeout."
default = "30m"
}
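The validations above each look at a single variable. With the module's `required_version = ">= 1.5.0"` pin, a `validation` block cannot reliably reference other variables (cross-variable references arrived in later Terraform releases), so rules that span two inputs need another home. One possible approach, sketched here under the assumption that it is added inside the module, is a `terraform_data` resource carrying `precondition` checks. The resource name `input_guards` is hypothetical.

```hcl
# Sketch only: place inside the module so var.* resolves to the inputs
# declared in variables.tf. terraform_data is built into Terraform >= 1.4.
resource "terraform_data" "input_guards" {
  lifecycle {
    # The services range is only used when the pods range is also set
    # (see the dynamic ip_allocation_policy block in main.tf).
    precondition {
      condition     = var.services_secondary_range_name == null || var.pods_secondary_range_name != null
      error_message = "services_secondary_range_name requires pods_secondary_range_name to be set."
    }
  }
}
```

A failed precondition stops the plan with the given message. The same `precondition` block could instead sit in the `lifecycle` of `google_composer_environment.this`; the separate resource simply keeps the guards in one place.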
outputs.tf
output "id" {
description = "Fully-qualified Composer environment ID (projects/.../environments/NAME)."
value = google_composer_environment.this.id
}
output "name" {
description = "Name of the Composer environment."
value = google_composer_environment.this.name
}
output "airflow_uri" {
description = "URI of the Apache Airflow web UI for this environment."
value = google_composer_environment.this.config[0].airflow_uri
}
output "dag_gcs_prefix" {
description = "GCS path prefix where DAGs are stored — point your CI/CD DAG sync here."
value = google_composer_environment.this.config[0].dag_gcs_prefix
}
output "gke_cluster" {
description = "Self-link of the GKE cluster backing the environment."
value = google_composer_environment.this.config[0].gke_cluster
}
output "service_account" {
description = "Service account the environment's workers run as."
value = var.service_account
}
How to use it
module "cloud_composer" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-composer?ref=v1.0.0"
name = "data-orchestrator-prod"
project_id = "kloudvin-data-prod"
region = "asia-south1"
environment = "prod"
image_version = "composer-2.9.7-airflow-2.9.3"
environment_size = "ENVIRONMENT_SIZE_MEDIUM"
# Attach to the shared data VPC and use pre-created secondary ranges.
network = "projects/kloudvin-net/global/networks/data-vpc"
subnetwork = "projects/kloudvin-net/regions/asia-south1/subnetworks/composer-asia-south1"
pods_secondary_range_name = "composer-pods"
services_secondary_range_name = "composer-services"
enable_private_endpoint = true
master_ipv4_cidr_block = "172.16.8.0/28"
service_account = "composer-prod-worker@kloudvin-data-prod.iam.gserviceaccount.com"
# HA scheduler + zone-redundant metadata DB for production.
resilience_mode = "HIGH_RESILIENCE"
scheduler = { cpu = 2, memory_gb = 7.5, storage_gb = 5, count = 2 }
worker = { cpu = 2, memory_gb = 7.5, storage_gb = 10, min_count = 2, max_count = 8 }
pypi_packages = {
"apache-airflow-providers-snowflake" = "==5.7.0"
"dbt-bigquery" = "==1.8.0"
}
airflow_config_overrides = {
"core-max_active_tasks_per_dag" = "32" # replaces dag_concurrency, renamed in Airflow 2.2
"core-dags_are_paused_at_creation" = "True"
"scheduler-catchup_by_default" = "False"
"email-email_backend" = "airflow.providers.sendgrid.utils.emailer.send_email"
}
maintenance_window = {
start_time = "2024-01-01T18:00:00Z" # 23:30 IST
end_time = "2024-01-01T22:00:00Z"
recurrence = "FREQ=WEEKLY;BYDAY=SA,SU"
}
labels = {
team = "data-platform"
cost-center = "analytics"
}
}
# Downstream: sync DAGs from the repo into the environment's GCS bucket.
# dag_gcs_prefix looks like gs://<bucket>/dags, so strip the gs:// prefix
# and the trailing /dags to recover the bucket name for the object resource.
locals {
dag_bucket = regex("^gs://([^/]+)/", module.cloud_composer.dag_gcs_prefix)[0]
}
resource "google_storage_bucket_object" "etl_dag" {
name = "dags/etl_daily.py"
bucket = local.dag_bucket
source = "${path.module}/dags/etl_daily.py"
}
output "airflow_console_url" {
description = "Open the Airflow UI here."
value = module.cloud_composer.airflow_uri
}
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
backend = "gcs"
generate = { path = "backend.tf", if_exists = "overwrite" }
config = {
# ...GCS state bucket + prefix per path...
}
}
2. Module config — live/prod/composer/terragrunt.hcl:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-composer?ref=v1.0.0"
}
inputs = {
name = "..."
project_id = "..."
region = "..."
network = "..."
subnetwork = "..."
service_account = "..."
}
3. Deploy one environment, or roll out all modules together:
cd live/prod/composer && terragrunt apply # this module only
cd live/prod && terragrunt run-all apply # every module under live/prod
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / staging / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules. For a single stack, the plain Quickstart above is enough.
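As an illustration of the dependency orchestration mentioned above, the sketch below wires the Composer stack to a sibling network stack with a Terragrunt `dependency` block. The `../network` path and the output names `network_self_link` and `subnetwork_self_link` are assumptions made for this example; substitute whatever your own network module exposes.

```hcl
# live/prod/composer/terragrunt.hcl (sketch)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-composer?ref=v1.0.0"
}

# Hypothetical sibling stack at live/prod/network that creates the VPC.
dependency "network" {
  config_path = "../network"
}

inputs = {
  name            = "..."
  project_id      = "..."
  region          = "..."
  service_account = "..."

  # Assumed output names on the network stack.
  network    = dependency.network.outputs.network_self_link
  subnetwork = dependency.network.outputs.subnetwork_self_link
}
```

With this in place, `run-all apply` applies the network stack before the Composer stack, because Terragrunt orders units by their `dependency` blocks.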
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| `name` | `string` | (none) | Yes | Composer environment name (RFC-1035, <=64 chars). |
| `project_id` | `string` | (none) | Yes | GCP project hosting the environment. |
| `region` | `string` | (none) | Yes | Region, e.g. `asia-south1`. |
| `environment` | `string` | `"dev"` | No | Logical stage label (dev/staging/prod/sandbox). |
| `image_version` | `string` | `"composer-2.9.7-airflow-2.9.3"` | No | Composer + Airflow image; must target Composer 2 or 3. |
| `environment_size` | `string` | `"ENVIRONMENT_SIZE_SMALL"` | No | SMALL / MEDIUM / LARGE environment size. |
| `network` | `string` | (none) | Yes | VPC network self-link or name. |
| `subnetwork` | `string` | (none) | Yes | Subnetwork self-link or name. |
| `pods_secondary_range_name` | `string` | `null` | No | Secondary range for GKE pods (VPC-native). |
| `services_secondary_range_name` | `string` | `null` | No | Secondary range for GKE services. |
| `enable_private_endpoint` | `bool` | `true` | No | Make the GKE control plane fully private. |
| `master_ipv4_cidr_block` | `string` | `"172.16.0.0/28"` | No | /28 CIDR for the GKE control plane. |
| `composer_network_ipv4_cidr_block` | `string` | `null` | No | CIDR for the Composer-managed network (Composer 2). |
| `service_account` | `string` | (none) | Yes | Worker service account email (least privilege). |
| `scheduler` | `object` | `{cpu=1, memory_gb=2, storage_gb=1, count=1}` | No | Scheduler sizing; count>1 = multiple schedulers. |
| `web_server` | `object` | `{cpu=1, memory_gb=2, storage_gb=1}` | No | Web server sizing (Composer 2 only). |
| `worker` | `object` | `{cpu=1, memory_gb=2, storage_gb=1, min_count=1, max_count=3}` | No | Worker sizing + autoscaling bounds. |
| `pypi_packages` | `map(string)` | `{}` | No | PyPI packages → version constraints. |
| `airflow_config_overrides` | `map(string)` | `{}` | No | Airflow overrides keyed section-key. |
| `env_variables` | `map(string)` | `{}` | No | Non-secret env vars for Airflow processes. |
| `resilience_mode` | `string` | `"STANDARD_RESILIENCE"` | No | STANDARD or HIGH resilience (HA). |
| `kms_key_name` | `string` | `null` | No | Cloud KMS key for CMEK. |
| `maintenance_window` | `object` | `null` | No | Weekly maintenance window (start/end/RRULE). |
| `labels` | `map(string)` | `{}` | No | Extra labels merged onto the environment. |
| `create_timeout` | `string` | `"60m"` | No | Create operation timeout. |
| `update_timeout` | `string` | `"60m"` | No | Update operation timeout. |
| `delete_timeout` | `string` | `"30m"` | No | Delete operation timeout. |
Outputs
| Name | Description |
|---|---|
| `id` | Fully-qualified environment ID (projects/.../environments/NAME). |
| `name` | Environment name. |
| `airflow_uri` | URI of the Apache Airflow web UI. |
| `dag_gcs_prefix` | GCS prefix where DAGs live; target for CI/CD DAG sync. |
| `gke_cluster` | Self-link of the backing GKE cluster. |
| `service_account` | Service account the workers run as. |
Enterprise scenario
A retail analytics group runs nightly BigQuery transforms, Dataproc Spark jobs, and dbt models orchestrated from Airflow. They stamp out one Composer environment per stage — data-orchestrator-dev, -staging, and -prod — from this single module, all attached to a shared data VPC with enable_private_endpoint = true so no Airflow web server or GKE master is exposed to the internet (access is via IAP). Production runs HIGH_RESILIENCE with two schedulers and an 8-worker ceiling to absorb month-end backfills, while dev stays on ENVIRONMENT_SIZE_SMALL with a single worker to keep the bill down. A GitHub Actions pipeline reads each environment’s dag_gcs_prefix output and syncs the DAG repo into the right bucket on merge, so a DAG change flows to dev, then prod, with zero console clicks.
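One way to stamp out the per-stage environments described in this scenario is a single `module` block with `for_each`. The sketch below is illustrative only: the `stages` local, the staging values, and the dev and staging control-plane CIDRs are assumptions; the prod sizing and CIDR follow the usage example earlier in this article, and the `"..."` placeholders are left for you to fill.

```hcl
locals {
  # Hypothetical per-stage settings; only the values that differ are listed.
  stages = {
    dev = {
      environment_size = "ENVIRONMENT_SIZE_SMALL"
      resilience_mode  = "STANDARD_RESILIENCE"
      master_cidr      = "172.16.0.0/28"
      scheduler        = { cpu = 1, memory_gb = 2, storage_gb = 1, count = 1 }
      worker           = { cpu = 1, memory_gb = 2, storage_gb = 1, min_count = 1, max_count = 1 }
    }
    staging = {
      environment_size = "ENVIRONMENT_SIZE_SMALL"
      resilience_mode  = "STANDARD_RESILIENCE"
      master_cidr      = "172.16.0.16/28"
      scheduler        = { cpu = 1, memory_gb = 2, storage_gb = 1, count = 1 }
      worker           = { cpu = 1, memory_gb = 2, storage_gb = 1, min_count = 1, max_count = 3 }
    }
    prod = {
      environment_size = "ENVIRONMENT_SIZE_MEDIUM"
      resilience_mode  = "HIGH_RESILIENCE"
      master_cidr      = "172.16.8.0/28"
      scheduler        = { cpu = 2, memory_gb = 7.5, storage_gb = 5, count = 2 }
      worker           = { cpu = 2, memory_gb = 7.5, storage_gb = 10, min_count = 2, max_count = 8 }
    }
  }
}

module "composer" {
  for_each = local.stages
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-composer?ref=v1.0.0"

  name        = "data-orchestrator-${each.key}"
  environment = each.key

  project_id      = "..." # fill in per your project layout
  region          = "..."
  network         = "..."
  subnetwork      = "..."
  service_account = "..."

  environment_size       = each.value.environment_size
  resilience_mode        = each.value.resilience_mode
  master_ipv4_cidr_block = each.value.master_cidr # non-overlapping /28 per stage
  scheduler              = each.value.scheduler
  worker                 = each.value.worker
}

# Map of stage name to DAG prefix, for the CI/CD sync job to consume.
output "dag_gcs_prefixes" {
  value = { for stage, env in module.composer : stage => env.dag_gcs_prefix }
}
```

Keeping all stages in one root module is a design choice: it makes the differences between stages visible in one map, at the cost of sharing a single state. Separate root modules (or the Terragrunt layout above) trade that visibility for isolation.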
Best practices
- Pin the image, upgrade deliberately. `image_version` couples the Composer release and the Airflow version; never float it. Test a new `composer-X-airflow-Y` in dev (DAGs + providers) before bumping prod, and keep the change in its own PR so the in-place upgrade is reviewable.
- Private by default, IAP for humans. Keep `enable_private_endpoint = true`, place the environment on a dedicated subnet with explicit pod/service secondary ranges, and reach the Airflow UI through Identity-Aware Proxy rather than a public endpoint. Size `master_ipv4_cidr_block` as a non-overlapping /28 per environment.
- Least-privilege worker SA. Give the `service_account` only the roles its DAGs need (e.g. `roles/bigquery.dataEditor`, `roles/dataproc.editor`), not project Editor. Put credentials and tokens in Secret Manager and read them in DAGs via the Secret Manager backend; never pass secrets through `env_variables` (they are visible in the environment config).
- Right-size workers and disable catch-up to control cost. Composer 2/3 bills by worker CPU/memory-seconds, so set `worker.min_count` low and let autoscaling burst to `max_count`; set `scheduler-catchup_by_default = False` so a paused-then-resumed DAG doesn’t stampede a month of historical runs and a fleet of workers.
- Run HA in prod only. `HIGH_RESILIENCE` plus `scheduler.count = 2` gives zone-redundant scheduling and metadata DB, which matters for prod SLAs but roughly doubles baseline cost. Leave dev/staging on `STANDARD_RESILIENCE` with a single scheduler.
- Set a maintenance window and consistent labels. Pin `maintenance_window` to off-hours so Google performs disruptive maintenance when no critical DAGs run. The usage example gives the times in UTC, and IST is UTC+5:30, so subtract 5:30 from the local time (23:30 IST is 18:00 UTC). Rely on the module’s `managed-by`/`environment` labels plus `team`/`cost-center` for cost attribution and inventory.
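The least-privilege and Secret Manager points above can be sketched in Terraform as follows. This is an illustrative fragment, not part of the module: the project ID, `account_id`, and role list are assumptions, and your DAGs may need a different set of roles. `roles/composer.worker` is the baseline role Composer expects on the environment's service account; depending on your Composer version, the Composer service agent may also need a binding on this account, which is outside this sketch.

```hcl
# Hypothetical project and account names, for illustration only.
resource "google_service_account" "composer_worker" {
  project      = "my-project"
  account_id   = "composer-worker"
  display_name = "Cloud Composer worker"
}

locals {
  # Assumed role set: baseline Composer role plus what the DAGs touch.
  worker_roles = [
    "roles/composer.worker",
    "roles/bigquery.dataEditor",
    "roles/dataproc.editor",
    "roles/secretmanager.secretAccessor",
  ]
}

resource "google_project_iam_member" "composer_worker" {
  for_each = toset(local.worker_roles)

  project = "my-project"
  role    = each.value
  member  = "serviceAccount:${google_service_account.composer_worker.email}"
}

module "composer" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-composer?ref=v1.0.0"

  name       = "..."
  project_id = "my-project"
  region     = "..."
  network    = "..."
  subnetwork = "..."

  service_account = google_service_account.composer_worker.email

  airflow_config_overrides = {
    # Read connections and variables from Secret Manager instead of env vars.
    "secrets-backend" = "airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend"
    # Avoid a backlog of historical runs when a DAG is unpaused.
    "scheduler-catchup_by_default" = "False"
  }
}
```

Granting roles at the project level keeps the sketch short; binding them on individual datasets, buckets, or secrets is tighter where your resource layout allows it.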