
Terraform Module: GCP Cloud Composer — managed Airflow with private networking baked in

Quick take — A reusable Terraform module for google_composer_environment on hashicorp/google ~> 5.0: Composer 2/3 with private IPs, workload-tuned scheduler/worker sizing, PyPI packages, and Airflow config overrides. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "google" {
  project = "my-project"
  region  = "us-central1"
}

module "composer" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-composer?ref=v1.0.0"

  name            = "..."  # Composer environment name (RFC-1035, <=64 chars).
  project_id      = "..."  # GCP project hosting the environment.
  region          = "..."  # Region, e.g. `asia-south1`.
  network         = "..."  # VPC network self-link or name.
  subnetwork      = "..."  # Subnetwork self-link or name.
  service_account = "..."  # Worker service account email (least privilege).
}

Then run terraform init && terraform apply. Every other input has a sensible default; see Inputs below to override behaviour.

What this module is

Cloud Composer is Google Cloud’s managed Apache Airflow service. It runs the Airflow scheduler, web server, workers, and metadata database on a GKE-backed control plane that Google operates for you, so your team writes DAGs and uploads them to a GCS bucket instead of babysitting Airflow infrastructure. In Composer 2 and 3 the worker fleet autoscales, and you pay for the Cloud Composer compute (CPU/memory/storage SKUs) plus the underlying environment, rather than a fixed cluster.

The problem is that a production-grade google_composer_environment is anything but a five-line resource. You have to reason about the image version (which pins both the Composer release and the Airflow version), private IP and IP aliasing for the GKE cluster, the service account and its IAM, scheduler/worker/web-server CPU and memory, the environment size, a maintenance window, optional CMEK, PyPI dependencies, and Airflow [section]-key config overrides. Hand-rolling that per environment leads to drift between dev, staging, and prod, and to one-off mistakes like a public environment or an under-provisioned scheduler.
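To make the point about the image version concrete, here is a minimal sketch (not part of the module) that splits an image string into its Composer and Airflow parts with Terraform's regex function. The local and output names are illustrative assumptions.

```hcl
locals {
  # Example image string; any value accepted by the module's image_version works.
  example_image_version = "composer-2.9.7-airflow-2.9.3"

  # Named capture groups make regex() return a map keyed by group name.
  image_parts = regex(
    "^composer-(?P<composer>[0-9][0-9.]*)-airflow-(?P<airflow>[0-9][0-9.]*)",
    local.example_image_version
  )
}

output "composer_release" {
  value = local.image_parts.composer # "2.9.7"
}

output "airflow_version" {
  value = local.image_parts.airflow # "2.9.3"
}
```

Bumping the image string therefore changes two versions at once, which is why it deserves a deliberate, reviewed change rather than a casual edit.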

This module wraps google_composer_environment behind a small, validated variable surface so every environment your platform team stands up is private-by-default, correctly sized, and consistent. You feed it a name, a project, a region, a network and subnetwork, a service account, and a sizing profile; it returns the Airflow web UI URI, the DAG GCS prefix, and the GKE cluster so downstream automation and CI/CD can wire DAG deployment to it.

When to use it

Reach for plain Cloud Scheduler + Cloud Functions/Workflows instead if you only have a handful of unrelated triggers — Composer is overkill (and not cheap) for trivial scheduling. Use Composer when you genuinely need Airflow’s DAG model, backfills, sensors, and the provider ecosystem.

Module structure

terraform-module-gcp-composer/
├── versions.tf      # provider + Terraform version pins
├── main.tf          # google_composer_environment + locals
├── variables.tf     # validated input surface
└── outputs.tf       # env id, Airflow URI, DAG bucket, GKE cluster

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

main.tf

locals {
  # Composer 3 drops the explicit web_server_config / node count knobs that
  # Composer 2 exposes. We branch on the major version parsed from the image.
  is_composer_v3 = can(regex("composer-3", var.image_version))

  labels = merge(
    {
      managed-by  = "terraform"
      environment = var.environment
    },
    var.labels,
  )
}

resource "google_composer_environment" "this" {
  provider = google

  name    = var.name
  project = var.project_id
  region  = var.region
  labels  = local.labels

  config {
    environment_size = var.environment_size

    # ---- Software (image, PyPI, Airflow overrides, env vars) ----
    software_config {
      image_version            = var.image_version
      pypi_packages            = var.pypi_packages
      airflow_config_overrides = var.airflow_config_overrides
      env_variables            = var.env_variables
    }

    # ---- Networking: private IP, attach to an existing VPC/subnet ----
    node_config {
      network         = var.network
      subnetwork      = var.subnetwork
      service_account = var.service_account

      dynamic "ip_allocation_policy" {
        for_each = var.pods_secondary_range_name != null ? [1] : []
        content {
          cluster_secondary_range_name  = var.pods_secondary_range_name
          services_secondary_range_name = var.services_secondary_range_name
        }
      }
    }

    private_environment_config {
      enable_private_endpoint                = var.enable_private_endpoint
      master_ipv4_cidr_block                 = var.master_ipv4_cidr_block
      cloud_composer_network_ipv4_cidr_block = var.composer_network_ipv4_cidr_block
    }

    # ---- Workload sizing (Composer 2/3 autopilot model) ----
    workloads_config {
      scheduler {
        cpu        = var.scheduler.cpu
        memory_gb  = var.scheduler.memory_gb
        storage_gb = var.scheduler.storage_gb
        count      = var.scheduler.count
      }

      # web_server is only configurable on Composer 2.
      dynamic "web_server" {
        for_each = local.is_composer_v3 ? [] : [1]
        content {
          cpu        = var.web_server.cpu
          memory_gb  = var.web_server.memory_gb
          storage_gb = var.web_server.storage_gb
        }
      }

      worker {
        cpu        = var.worker.cpu
        memory_gb  = var.worker.memory_gb
        storage_gb = var.worker.storage_gb
        min_count  = var.worker.min_count
        max_count  = var.worker.max_count
      }
    }

    # ---- Maintenance window (drain/upgrade outside business hours) ----
    dynamic "maintenance_window" {
      for_each = var.maintenance_window != null ? [var.maintenance_window] : []
      content {
        start_time = maintenance_window.value.start_time
        end_time   = maintenance_window.value.end_time
        recurrence = maintenance_window.value.recurrence
      }
    }

    # ---- Optional CMEK for the environment + metadata DB ----
    dynamic "encryption_config" {
      for_each = var.kms_key_name != null ? [1] : []
      content {
        kms_key_name = var.kms_key_name
      }
    }

    # ---- Optional Resilience mode (HA scheduler + DB, Composer 2 zone-redundant) ----
    resilience_mode = var.resilience_mode
  }

  timeouts {
    create = var.create_timeout
    update = var.update_timeout
    delete = var.delete_timeout
  }
}

variables.tf

variable "name" {
  type        = string
  description = "Name of the Cloud Composer environment (lowercase letters, digits, hyphens; must start with a letter)."

  validation {
    condition     = can(regex("^[a-z]([-a-z0-9]*[a-z0-9])?$", var.name)) && length(var.name) <= 64
    error_message = "name must be 1-64 chars, lowercase RFC-1035: start with a letter, then letters/digits/hyphens."
  }
}

variable "project_id" {
  type        = string
  description = "GCP project ID that will host the Composer environment."
}

variable "region" {
  type        = string
  description = "Region for the Composer environment (e.g. us-central1, asia-south1)."
}

variable "environment" {
  type        = string
  description = "Logical environment label (dev/staging/prod/sandbox), applied as the 'environment' label."
  default     = "dev"

  validation {
    condition     = contains(["dev", "staging", "prod", "sandbox"], var.environment)
    error_message = "environment must be one of: dev, staging, prod, sandbox."
  }
}

variable "image_version" {
  type        = string
  description = "Composer/Airflow image, e.g. 'composer-2.9.7-airflow-2.9.3' or 'composer-3-airflow-2.10.2-build.x'."
  default     = "composer-2.9.7-airflow-2.9.3"

  validation {
    condition     = can(regex("^composer-(2|3)", var.image_version))
    error_message = "image_version must target Composer 2 or 3 (start with 'composer-2' or 'composer-3')."
  }
}

variable "environment_size" {
  type        = string
  description = "Composer environment size: ENVIRONMENT_SIZE_SMALL | _MEDIUM | _LARGE."
  default     = "ENVIRONMENT_SIZE_SMALL"

  validation {
    condition = contains(
      ["ENVIRONMENT_SIZE_SMALL", "ENVIRONMENT_SIZE_MEDIUM", "ENVIRONMENT_SIZE_LARGE"],
      var.environment_size
    )
    error_message = "environment_size must be ENVIRONMENT_SIZE_SMALL, _MEDIUM, or _LARGE."
  }
}

# ---- Networking ----

variable "network" {
  type        = string
  description = "Self-link or name of the VPC network the environment attaches to."
}

variable "subnetwork" {
  type        = string
  description = "Self-link or name of the subnetwork for the environment's GKE nodes."
}

variable "pods_secondary_range_name" {
  type        = string
  description = "Secondary range name on the subnet for GKE pods (VPC-native). Null to let Composer auto-allocate."
  default     = null
}

variable "services_secondary_range_name" {
  type        = string
  description = "Secondary range name on the subnet for GKE services. Used only when pods_secondary_range_name is set."
  default     = null
}

variable "enable_private_endpoint" {
  type        = bool
  description = "If true, the GKE control plane has no public endpoint (fully private). Defaults to true."
  default     = true
}

variable "master_ipv4_cidr_block" {
  type        = string
  description = "RFC-1918 /28 CIDR for the GKE control plane in a private environment."
  default     = "172.16.0.0/28"

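  validation {
    # Enforce the /28 mask that the error message promises, not just CIDR syntax.
    condition     = can(cidrhost(var.master_ipv4_cidr_block, 0)) && endswith(var.master_ipv4_cidr_block, "/28")
    error_message = "master_ipv4_cidr_block must be a valid /28 CIDR (required for the GKE master)."
  }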
}

variable "composer_network_ipv4_cidr_block" {
  type        = string
  description = "CIDR block for the Composer-managed network in a private environment (Composer 2). Null to use the default."
  default     = null
}

# ---- Service account ----

variable "service_account" {
  type        = string
  description = "Email of the service account the environment's workers run as. Grant it least-privilege roles for your DAGs."

  validation {
    condition     = can(regex("^[^@]+@[^@]+\\.iam\\.gserviceaccount\\.com$|^[^@]+@[^@]+\\.gserviceaccount\\.com$", var.service_account))
    error_message = "service_account must be a valid service account email (…@PROJECT.iam.gserviceaccount.com)."
  }
}

# ---- Workload sizing ----

variable "scheduler" {
  type = object({
    cpu        = number
    memory_gb  = number
    storage_gb = number
    count      = number
  })
  description = "Airflow scheduler sizing. count > 1 enables multiple schedulers (Airflow 2)."
  default = {
    cpu        = 1
    memory_gb  = 2
    storage_gb = 1
    count      = 1
  }
}

variable "web_server" {
  type = object({
    cpu        = number
    memory_gb  = number
    storage_gb = number
  })
  description = "Airflow web server sizing (Composer 2 only; ignored on Composer 3)."
  default = {
    cpu        = 1
    memory_gb  = 2
    storage_gb = 1
  }
}

variable "worker" {
  type = object({
    cpu        = number
    memory_gb  = number
    storage_gb = number
    min_count  = number
    max_count  = number
  })
  description = "Airflow worker sizing and autoscaling bounds."
  default = {
    cpu        = 1
    memory_gb  = 2
    storage_gb = 1
    min_count  = 1
    max_count  = 3
  }

  validation {
    condition     = var.worker.min_count >= 1 && var.worker.max_count >= var.worker.min_count
    error_message = "worker.min_count must be >= 1 and worker.max_count must be >= worker.min_count."
  }
}

# ---- Software ----

variable "pypi_packages" {
  type        = map(string)
  description = "PyPI packages to install, e.g. { \"dbt-bigquery\" = \"==1.8.0\", \"apache-airflow-providers-snowflake\" = \"\" }."
  default     = {}
}

variable "airflow_config_overrides" {
  type        = map(string)
  description = "Airflow config overrides keyed as 'section-key', e.g. { \"core-dags_are_paused_at_creation\" = \"True\" }."
  default     = {}
}

variable "env_variables" {
  type        = map(string)
  description = "Environment variables injected into the Airflow scheduler/worker processes (non-secret only)."
  default     = {}
}

# ---- Reliability / security ----

variable "resilience_mode" {
  type        = string
  description = "Resilience mode: STANDARD_RESILIENCE or HIGH_RESILIENCE (zone-redundant scheduler + DB, Composer 2)."
  default     = "STANDARD_RESILIENCE"

  validation {
    condition     = contains(["STANDARD_RESILIENCE", "HIGH_RESILIENCE"], var.resilience_mode)
    error_message = "resilience_mode must be STANDARD_RESILIENCE or HIGH_RESILIENCE."
  }
}

variable "kms_key_name" {
  type        = string
  description = "Full resource ID of a Cloud KMS CryptoKey for CMEK. Null uses Google-managed encryption."
  default     = null
}

variable "maintenance_window" {
  type = object({
    start_time = string # RFC3339, e.g. "2024-01-01T01:00:00Z"
    end_time   = string # RFC3339, e.g. "2024-01-01T05:00:00Z"
    recurrence = string # RRULE, e.g. "FREQ=WEEKLY;BYDAY=FR,SA,SU"
  })
  description = "Weekly maintenance window for environment upgrades/maintenance (Composer expects the windows to total at least 12 hours per week). Null lets Google pick."
  default     = null
}

variable "labels" {
  type        = map(string)
  description = "Additional labels merged onto the environment (managed-by and environment are added automatically)."
  default     = {}
}

# ---- Timeouts ----

variable "create_timeout" {
  type        = string
  description = "Create timeout (Composer environments take a while to build)."
  default     = "60m"
}

variable "update_timeout" {
  type        = string
  description = "Update timeout."
  default     = "60m"
}

variable "delete_timeout" {
  type        = string
  description = "Delete timeout."
  default     = "30m"
}
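The validation blocks above can be exercised without deploying anything. The following is a sketch of a native Terraform test file, assuming Terraform 1.7 or newer (the test framework and mock_provider are newer than the module's 1.5.0 floor). The file path and all input values are hypothetical.

```hcl
# tests/validation.tftest.hcl (hypothetical path inside the module directory)

# Mock the provider so the plan needs no GCP credentials.
mock_provider "google" {}

# Baseline inputs that satisfy every validation rule.
variables {
  name            = "composer-test"
  project_id      = "example-project"
  region          = "us-central1"
  network         = "example-vpc"
  subnetwork      = "example-subnet"
  service_account = "composer-test@example-project.iam.gserviceaccount.com"
}

run "rejects_non_rfc1035_name" {
  command = plan

  variables {
    name = "Bad_Name" # uppercase and underscore both violate the regex
  }

  expect_failures = [var.name]
}

run "rejects_inverted_worker_bounds" {
  command = plan

  variables {
    worker = { cpu = 1, memory_gb = 2, storage_gb = 1, min_count = 3, max_count = 1 }
  }

  expect_failures = [var.worker]
}
```

Running terraform test from the module directory should report both runs as passing, because each expected validation failure is triggered.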

outputs.tf

output "id" {
  description = "Fully-qualified Composer environment ID (projects/.../environments/NAME)."
  value       = google_composer_environment.this.id
}

output "name" {
  description = "Name of the Composer environment."
  value       = google_composer_environment.this.name
}

output "airflow_uri" {
  description = "URI of the Apache Airflow web UI for this environment."
  value       = google_composer_environment.this.config[0].airflow_uri
}

output "dag_gcs_prefix" {
  description = "GCS path prefix where DAGs are stored — point your CI/CD DAG sync here."
  value       = google_composer_environment.this.config[0].dag_gcs_prefix
}

output "gke_cluster" {
  description = "Self-link of the GKE cluster backing the environment."
  value       = google_composer_environment.this.config[0].gke_cluster
}

output "service_account" {
  description = "Service account the environment's workers run as."
  value       = var.service_account
}

How to use it

module "cloud_composer" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-composer?ref=v1.0.0"

  name        = "data-orchestrator-prod"
  project_id  = "kloudvin-data-prod"
  region      = "asia-south1"
  environment = "prod"

  image_version    = "composer-2.9.7-airflow-2.9.3"
  environment_size = "ENVIRONMENT_SIZE_MEDIUM"

  # Attach to the shared data VPC and use pre-created secondary ranges.
  network                       = "projects/kloudvin-net/global/networks/data-vpc"
  subnetwork                    = "projects/kloudvin-net/regions/asia-south1/subnetworks/composer-asia-south1"
  pods_secondary_range_name     = "composer-pods"
  services_secondary_range_name = "composer-services"

  enable_private_endpoint = true
  master_ipv4_cidr_block  = "172.16.8.0/28"

  service_account = "composer-prod-worker@kloudvin-data-prod.iam.gserviceaccount.com"

  # HA scheduler + zone-redundant metadata DB for production.
  resilience_mode = "HIGH_RESILIENCE"

  scheduler = { cpu = 2, memory_gb = 7.5, storage_gb = 5, count = 2 }
  worker    = { cpu = 2, memory_gb = 7.5, storage_gb = 10, min_count = 2, max_count = 8 }

  pypi_packages = {
    "apache-airflow-providers-snowflake" = "==5.7.0"
    "dbt-bigquery"                       = "==1.8.0"
  }

  airflow_config_overrides = {
    "core-max_active_tasks_per_dag"    = "32" # called dag_concurrency before Airflow 2.2
    "core-dags_are_paused_at_creation" = "True"
    "scheduler-catchup_by_default"     = "False"
    "email-email_backend"              = "airflow.providers.sendgrid.utils.emailer.send_email"
  }

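  # Composer expects maintenance windows to total at least 12 hours per week,
  # so this uses three 4-hour windows.
  maintenance_window = {
    start_time = "2024-01-01T18:00:00Z" # 23:30 IST
    end_time   = "2024-01-01T22:00:00Z"
    recurrence = "FREQ=WEEKLY;BYDAY=FR,SA,SU"
  }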

  labels = {
    team        = "data-platform"
    cost-center = "analytics"
  }
}

# Downstream: sync DAGs from the repo into the environment's GCS bucket.
# dag_gcs_prefix looks like gs://<bucket>/dags, so strip the gs:// prefix
# and the trailing /dags to recover the bucket name for the object resource.
locals {
  dag_bucket = regex("^gs://([^/]+)/", module.cloud_composer.dag_gcs_prefix)[0]
}

resource "google_storage_bucket_object" "etl_dag" {
  name   = "dags/etl_daily.py"
  bucket = local.dag_bucket
  source = "${path.module}/dags/etl_daily.py"
}

output "airflow_console_url" {
  description = "Open the Airflow UI here."
  value       = module.cloud_composer.airflow_uri
}
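The module expects the worker service account to exist already. As a rough sketch of what that prerequisite might look like for a Composer 2 environment, the resources below create the account and grant two roles. The resource names are illustrative assumptions, and your DAGs will likely need further, workload-specific roles.

```hcl
data "google_project" "this" {
  project_id = "kloudvin-data-prod"
}

# Dedicated identity for the environment's workers.
resource "google_service_account" "composer_worker" {
  project      = data.google_project.this.project_id
  account_id   = "composer-prod-worker"
  display_name = "Composer prod worker"
}

# Baseline role the environment's nodes need to run Airflow components.
resource "google_project_iam_member" "composer_worker" {
  project = data.google_project.this.project_id
  role    = "roles/composer.worker"
  member  = "serviceAccount:${google_service_account.composer_worker.email}"
}

# Composer 2: let the Composer service agent act on the worker account.
resource "google_service_account_iam_member" "composer_agent_ext" {
  service_account_id = google_service_account.composer_worker.name
  role               = "roles/composer.ServiceAgentV2Ext"
  member             = "serviceAccount:service-${data.google_project.this.number}@cloudcomposer-accounts.iam.gserviceaccount.com"
}
```

With this in place, the module call could take service_account = google_service_account.composer_worker.email, plus a depends_on covering the two IAM resources so the bindings exist before the environment is created.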

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config, live/terragrunt.hcl (inherited by every module):

remote_state {
  backend  = "gcs"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...GCS state bucket + prefix per path...
  }
}

2. Module config, live/prod/composer/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-composer?ref=v1.0.0"
}

inputs = {
  name            = "..."
  project_id      = "..."
  region          = "..."
  network         = "..."
  subnetwork      = "..."
  service_account = "..."
}

3. Deploy one environment, or roll out all modules together:

# Run each command from the repository root.
cd live/prod/composer && terragrunt apply   # this module only
cd live/prod && terragrunt run-all apply    # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; the inputs block is overridden per environment (dev / staging / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules; for a single stack, the plain Quickstart above is enough.
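To illustrate the dependency orchestration mentioned above, here is a sketch of a Composer terragrunt.hcl that reads its network inputs from a sibling module. The ../network path and its output names (network_self_link, subnetwork_self_link) are assumptions for illustration; substitute whatever your network module actually exposes.

```hcl
# live/prod/composer/terragrunt.hcl (sketch)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-composer?ref=v1.0.0"
}

# Hypothetical sibling module that creates the VPC and subnet.
dependency "network" {
  config_path = "../network"

  # Placeholder values so validate/plan work before the network is applied.
  mock_outputs = {
    network_self_link    = "projects/example/global/networks/mock"
    subnetwork_self_link = "projects/example/regions/asia-south1/subnetworks/mock"
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  name            = "data-orchestrator-prod"
  project_id      = "kloudvin-data-prod"
  region          = "asia-south1"
  network         = dependency.network.outputs.network_self_link
  subnetwork      = dependency.network.outputs.subnetwork_self_link
  service_account = "composer-prod-worker@kloudvin-data-prod.iam.gserviceaccount.com"
}
```

With the dependency declared, run-all apply applies the network module before Composer, and run-all destroy tears them down in the reverse order.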

Inputs

| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| name | string | | Yes | Composer environment name (RFC-1035, <=64 chars). |
| project_id | string | | Yes | GCP project hosting the environment. |
| region | string | | Yes | Region, e.g. asia-south1. |
| environment | string | "dev" | No | Logical stage label (dev/staging/prod/sandbox). |
| image_version | string | "composer-2.9.7-airflow-2.9.3" | No | Composer + Airflow image; must target Composer 2 or 3. |
| environment_size | string | "ENVIRONMENT_SIZE_SMALL" | No | SMALL / MEDIUM / LARGE environment size. |
| network | string | | Yes | VPC network self-link or name. |
| subnetwork | string | | Yes | Subnetwork self-link or name. |
| pods_secondary_range_name | string | null | No | Secondary range for GKE pods (VPC-native). |
| services_secondary_range_name | string | null | No | Secondary range for GKE services. |
| enable_private_endpoint | bool | true | No | Make the GKE control plane fully private. |
| master_ipv4_cidr_block | string | "172.16.0.0/28" | No | /28 CIDR for the GKE control plane. |
| composer_network_ipv4_cidr_block | string | null | No | CIDR for the Composer-managed network (Composer 2). |
| service_account | string | | Yes | Worker service account email (least privilege). |
| scheduler | object | {cpu=1, memory_gb=2, storage_gb=1, count=1} | No | Scheduler sizing; count>1 = multiple schedulers. |
| web_server | object | {cpu=1, memory_gb=2, storage_gb=1} | No | Web server sizing (Composer 2 only). |
| worker | object | {cpu=1, memory_gb=2, storage_gb=1, min_count=1, max_count=3} | No | Worker sizing + autoscaling bounds. |
| pypi_packages | map(string) | {} | No | PyPI packages → version constraints. |
| airflow_config_overrides | map(string) | {} | No | Airflow overrides keyed section-key. |
| env_variables | map(string) | {} | No | Non-secret env vars for Airflow processes. |
| resilience_mode | string | "STANDARD_RESILIENCE" | No | STANDARD or HIGH resilience (HA). |
| kms_key_name | string | null | No | Cloud KMS key for CMEK. |
| maintenance_window | object | null | No | Weekly maintenance window (start/end/RRULE). |
| labels | map(string) | {} | No | Extra labels merged onto the environment. |
| create_timeout | string | "60m" | No | Create operation timeout. |
| update_timeout | string | "60m" | No | Update operation timeout. |
| delete_timeout | string | "30m" | No | Delete operation timeout. |
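The section-key format for airflow_config_overrides is easy to get wrong (for example, writing core.parallelism instead of core-parallelism). The module as published does not check it; the snippet below is one possible way to add such a check, assuming section and option names are lower-case with underscores, which may not hold for every Airflow option.

```hcl
variable "airflow_config_overrides" {
  type        = map(string)
  description = "Airflow config overrides keyed as 'section-key'."
  default     = {}

  validation {
    # Every key must be <section>-<option>; an empty map passes trivially.
    condition = alltrue([
      for k in keys(var.airflow_config_overrides) :
      can(regex("^[a-z_]+-[a-z0-9_]+$", k))
    ])
    error_message = "Each key must look like 'section-key', e.g. 'core-dags_are_paused_at_creation'."
  }
}
```

A key such as "core-dags_are_paused_at_creation" corresponds to the dags_are_paused_at_creation option in the [core] section of airflow.cfg.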

Outputs

| Name | Description |
|---|---|
| id | Fully-qualified environment ID (projects/.../environments/NAME). |
| name | Environment name. |
| airflow_uri | URI of the Apache Airflow web UI. |
| dag_gcs_prefix | GCS prefix where DAGs live; target for CI/CD DAG sync. |
| gke_cluster | Self-link of the backing GKE cluster. |
| service_account | Service account the workers run as. |

Enterprise scenario

A retail analytics group runs nightly BigQuery transforms, Dataproc Spark jobs, and dbt models orchestrated from Airflow. They stamp out one Composer environment per stage — data-orchestrator-dev, -staging, and -prod — from this single module, all attached to a shared data VPC with enable_private_endpoint = true so no Airflow web server or GKE master is exposed to the internet (access is via IAP). Production runs HIGH_RESILIENCE with two schedulers and an 8-worker ceiling to absorb month-end backfills, while dev stays on ENVIRONMENT_SIZE_SMALL with a single worker to keep the bill down. A GitHub Actions pipeline reads each environment’s dag_gcs_prefix output and syncs the DAG repo into the right bucket on merge, so a DAG change flows to dev, then prod, with zero console clicks.
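The per-stage stamping described in this scenario could be sketched with a module-level for_each. Project IDs, network paths, service account names, and control-plane CIDRs below are hypothetical placeholders, and the sizing numbers only mirror the scenario (single worker in dev, two schedulers and an 8-worker ceiling in prod).

```hcl
locals {
  # Per-stage knobs; everything not listed falls back to module defaults.
  stages = {
    dev = {
      environment_size = "ENVIRONMENT_SIZE_SMALL"
      resilience_mode  = "STANDARD_RESILIENCE"
      master_cidr      = "172.16.0.0/28"
      scheduler_count  = 1
      worker_min       = 1
      worker_max       = 1
    }
    staging = {
      environment_size = "ENVIRONMENT_SIZE_SMALL"
      resilience_mode  = "STANDARD_RESILIENCE"
      master_cidr      = "172.16.0.16/28"
      scheduler_count  = 1
      worker_min       = 1
      worker_max       = 3
    }
    prod = {
      environment_size = "ENVIRONMENT_SIZE_MEDIUM"
      resilience_mode  = "HIGH_RESILIENCE"
      master_cidr      = "172.16.0.32/28"
      scheduler_count  = 2
      worker_min       = 2
      worker_max       = 8
    }
  }
}

module "composer" {
  for_each = local.stages
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-gcp-composer?ref=v1.0.0"

  name        = "data-orchestrator-${each.key}"
  project_id  = "example-data-${each.key}" # hypothetical project per stage
  region      = "asia-south1"
  environment = each.key

  network                = "projects/example-net/global/networks/data-vpc"
  subnetwork             = "projects/example-net/regions/asia-south1/subnetworks/composer-${each.key}"
  master_ipv4_cidr_block = each.value.master_cidr
  service_account        = "composer-${each.key}-worker@example-data-${each.key}.iam.gserviceaccount.com"

  environment_size = each.value.environment_size
  resilience_mode  = each.value.resilience_mode

  scheduler = { cpu = 1, memory_gb = 2, storage_gb = 1, count = each.value.scheduler_count }
  worker = {
    cpu        = 1
    memory_gb  = 2
    storage_gb = 1
    min_count  = each.value.worker_min
    max_count  = each.value.worker_max
  }
}

# Map of stage => DAG prefix for the CI/CD pipeline to consume.
output "dag_gcs_prefixes" {
  value = { for stage, env in module.composer : stage => env.dag_gcs_prefix }
}
```

Keeping the stages in one map makes the differences between environments reviewable in a single diff; teams that prefer isolated state per stage would use the Terragrunt layout instead.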

Best practices
