Terraform Module: Azure Container Instances — serverless containers without the cluster tax

Quick take — A production-ready Terraform module for azurerm_container_group: multi-container groups, managed identity, private VNet injection, secure env vars, and log analytics — all var-driven for hashicorp/azurerm ~> 4.0. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "azurerm" {
  features {}
  # azurerm 4.x needs a subscription: set subscription_id here or export ARM_SUBSCRIPTION_ID.
}

module "container_instances" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-container-instances?ref=v1.0.0"

  name                = "..."           # Container group name; lowercase alphanumeric/hyphens, <…
  resource_group_name = "..."           # Resource group holding the group.
  location            = "..."           # Azure region (e.g. `centralindia`).
  containers          = ["...", "..."]  # Container definitions: image, cpu, memory, ports, env, …
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
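Note that containers is a list of objects, so each "..." placeholder stands in for a full container definition rather than a string. As a rough illustration, a filled-in module block might look like the sketch below. The group name, resource group, and the Microsoft sample image are assumptions chosen for the example, not values the module prescribes:

```hcl
module "container_instances" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-container-instances?ref=v1.0.0"

  name                = "aci-hello-dev" # hypothetical group name
  resource_group_name = "rg-sandbox"    # hypothetical; the resource group must already exist
  location            = "centralindia"

  containers = [
    {
      name   = "hello"
      image  = "mcr.microsoft.com/azuredocs/aci-helloworld:latest" # public sample image, assumed for illustration
      cpu    = 0.5
      memory = 1.0
      ports  = [{ port = 80 }] # protocol falls back to the module default, "TCP"
    }
  ]
}
```

With the defaults left alone, this would come up as a Linux group with a public IP and a system-assigned identity. For anything beyond a smoke test, pin the image to a specific tag instead of :latest.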

What this module is

Azure Container Instances (ACI) is the serverless way to run a container, or a small group of co-located containers, on Azure without standing up AKS, managing a node pool, or paying for an idle control plane. You hand Azure an image, a CPU/memory request, and a few knobs, and it bills you per second for the vCPU and memory (GB) allocated to the group while it runs. The unit of deployment is the container group (azurerm_container_group): one or more containers that share a lifecycle, a network namespace, an IP, and optional mounted volumes. It is the ACI analogue of a Kubernetes pod.

The raw resource is deceptively large. A correct production container group has to juggle the OS type, an image registry credential block (or, better, a managed identity), an IP-address-type that swaps shape depending on whether you go public or VNet-injected, exposed ports, secure vs. plain environment variables, volume mounts, a restart policy, an optional liveness/readiness probe, and a diagnostics sink. Copy-pasting that across services is how you end up with one group logging to App Insights, another to nowhere, and a third still pulling :latest with an admin password in plaintext env vars.

This module wraps azurerm_container_group so a team gets a single, opinionated, variable-driven front door: pass a list of container definitions, pick Public or Private, optionally attach a user-assigned identity for ACR pulls, and the module wires the rest — Log Analytics diagnostics, secure environment variables, and a system-assigned identity by default — with validations that stop the obviously-wrong configurations at plan time.

When to use it

Reach for this module when the workload is bursty, batch, or stateless and small, and a full orchestrator would be overkill.

Do not use it for stateful databases, anything needing horizontal autoscaling or rolling updates, services that must survive node failure with self-healing, or workloads needing more than the ACI per-group CPU/memory ceilings. Those belong on AKS, Container Apps, or App Service. ACI has no built-in load balancing or autoscaling — if you need those, this is the wrong primitive.

Module structure

terraform-module-azure-container-instances/
├── versions.tf      # provider + Terraform version pins
├── main.tf          # azurerm_container_group + diagnostic settings
├── variables.tf     # var-driven inputs with validations
└── outputs.tf       # id, fqdn, ip, identity principal id, etc.

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

main.tf

locals {
  # ACI only accepts a dns_name_label when the IP is exposed publicly.
  effective_dns_label = var.ip_address_type == "Public" ? var.dns_name_label : null

  # User-assigned and system-assigned identities can be combined; null means no identity is attached.
  has_user_identities = length(var.user_assigned_identity_ids) > 0
  identity_type = (
    local.has_user_identities && var.enable_system_assigned_identity
    ? "SystemAssigned, UserAssigned"
    : local.has_user_identities
    ? "UserAssigned"
    : var.enable_system_assigned_identity
    ? "SystemAssigned"
    : null
  )
}

resource "azurerm_container_group" "this" {
  name                = var.name
  resource_group_name = var.resource_group_name
  location            = var.location

  os_type             = var.os_type
  restart_policy      = var.restart_policy
  ip_address_type     = var.ip_address_type
  dns_name_label      = local.effective_dns_label
  subnet_ids          = var.ip_address_type == "Private" ? var.subnet_ids : null

  # Zone pinning is only valid for Linux + Private (VNet-injected) groups.
  zones               = var.zones

  tags                = var.tags

  # Omitted entirely when enable_system_assigned_identity is false and no user-assigned IDs are given.
  dynamic "identity" {
    for_each = local.identity_type != null ? [local.identity_type] : []
    content {
      type         = identity.value
      identity_ids = local.has_user_identities ? var.user_assigned_identity_ids : null
    }
  }

  # Private registry auth — one block per registry. Omit entirely for public images.
  dynamic "image_registry_credential" {
    for_each = var.image_registry_credentials
    content {
      server                    = image_registry_credential.value.server
      username                  = lookup(image_registry_credential.value, "username", null)
      password                  = lookup(image_registry_credential.value, "password", null)
      user_assigned_identity_id = lookup(image_registry_credential.value, "user_assigned_identity_id", null)
    }
  }

  dynamic "container" {
    for_each = var.containers
    content {
      name   = container.value.name
      image  = container.value.image
      cpu    = container.value.cpu
      memory = container.value.memory

      cpu_limit    = lookup(container.value, "cpu_limit", null)
      memory_limit = lookup(container.value, "memory_limit", null)

      environment_variables        = lookup(container.value, "environment_variables", null)
      secure_environment_variables = lookup(container.value, "secure_environment_variables", null)
      commands                     = lookup(container.value, "commands", null)

      dynamic "ports" {
        for_each = lookup(container.value, "ports", [])
        content {
          port     = ports.value.port
          protocol = lookup(ports.value, "protocol", "TCP")
        }
      }

      dynamic "volume" {
        for_each = lookup(container.value, "volumes", [])
        content {
          name                 = volume.value.name
          mount_path           = volume.value.mount_path
          read_only            = lookup(volume.value, "read_only", false)
          storage_account_name = lookup(volume.value, "storage_account_name", null)
          storage_account_key  = lookup(volume.value, "storage_account_key", null)
          share_name           = lookup(volume.value, "share_name", null)
        }
      }

      dynamic "liveness_probe" {
        for_each = lookup(container.value, "liveness_probe", null) != null ? [container.value.liveness_probe] : []
        content {
          initial_delay_seconds = lookup(liveness_probe.value, "initial_delay_seconds", null)
          period_seconds        = lookup(liveness_probe.value, "period_seconds", null)
          failure_threshold     = lookup(liveness_probe.value, "failure_threshold", null)
          dynamic "http_get" {
            for_each = lookup(liveness_probe.value, "http_get", null) != null ? [liveness_probe.value.http_get] : []
            content {
              path   = lookup(http_get.value, "path", null)
              port   = http_get.value.port
              scheme = lookup(http_get.value, "scheme", "Http")
            }
          }
        }
      }

      dynamic "readiness_probe" {
        for_each = lookup(container.value, "readiness_probe", null) != null ? [container.value.readiness_probe] : []
        content {
          initial_delay_seconds = lookup(readiness_probe.value, "initial_delay_seconds", null)
          period_seconds        = lookup(readiness_probe.value, "period_seconds", null)
          failure_threshold     = lookup(readiness_probe.value, "failure_threshold", null)
          dynamic "http_get" {
            for_each = lookup(readiness_probe.value, "http_get", null) != null ? [readiness_probe.value.http_get] : []
            content {
              path   = lookup(http_get.value, "path", null)
              port   = http_get.value.port
              scheme = lookup(http_get.value, "scheme", "Http")
            }
          }
        }
      }
    }
  }

  # Ship stdout/stderr to a Log Analytics workspace when one is supplied.
  dynamic "diagnostics" {
    for_each = var.log_analytics_workspace_id != null ? [1] : []
    content {
      log_analytics {
        workspace_id  = var.log_analytics_workspace_id
        workspace_key = var.log_analytics_workspace_key
      }
    }
  }

  lifecycle {
    # Ignore only the last_deployed tag (assumed to be stamped outside Terraform) so it never shows up as drift.
    ignore_changes = [tags["last_deployed"]]
  }
}
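
The subnet_ids input is documented as required for Private groups, but main.tf as written does not enforce that. One possible way to fail fast is a precondition in the resource's existing lifecycle block. This is a sketch, not part of the published module; preconditions need Terraform 1.2 or later, which the >= 1.5.0 pin already covers:

```hcl
# Sketch only: merge into the lifecycle block of azurerm_container_group.this.
lifecycle {
  ignore_changes = [tags["last_deployed"]]

  # Cross-variable check: a Private group needs at least one delegated subnet.
  precondition {
    condition     = var.ip_address_type != "Private" || length(var.subnet_ids) > 0
    error_message = "subnet_ids must contain at least one subnet ID when ip_address_type is \"Private\"."
  }
}
```

A precondition is used here rather than a variable validation block because validation blocks can only reference other variables from Terraform 1.9 onward.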

variables.tf

variable "name" {
  description = "Name of the container group. Must be a valid ACI DNS-compatible name."
  type        = string

  validation {
    condition     = can(regex("^[a-z0-9]([-a-z0-9]*[a-z0-9])?$", var.name)) && length(var.name) <= 63
    error_message = "name must be lowercase alphanumeric/hyphens, start and end alphanumeric, and be <= 63 chars."
  }
}

variable "resource_group_name" {
  description = "Resource group that will hold the container group."
  type        = string
}

variable "location" {
  description = "Azure region for the container group (e.g. centralindia, eastus)."
  type        = string
}

variable "os_type" {
  description = "Operating system for the container group."
  type        = string
  default     = "Linux"

  validation {
    condition     = contains(["Linux", "Windows"], var.os_type)
    error_message = "os_type must be either 'Linux' or 'Windows'."
  }
}

variable "restart_policy" {
  description = "Restart behaviour: Always (services), OnFailure (jobs), or Never (one-shot)."
  type        = string
  default     = "Always"

  validation {
    condition     = contains(["Always", "OnFailure", "Never"], var.restart_policy)
    error_message = "restart_policy must be one of: Always, OnFailure, Never."
  }
}

variable "ip_address_type" {
  description = "Public (internet-facing IP), Private (VNet-injected via subnet_ids), or None."
  type        = string
  default     = "Public"

  validation {
    condition     = contains(["Public", "Private", "None"], var.ip_address_type)
    error_message = "ip_address_type must be one of: Public, Private, None."
  }
}

variable "dns_name_label" {
  description = "DNS label for the public FQDN (<label>.<region>.azurecontainer.io). Public groups only."
  type        = string
  default     = null
}

variable "subnet_ids" {
  description = "Subnet IDs to inject the group into. Required (and only used) when ip_address_type = Private."
  type        = list(string)
  default     = []
}

variable "zones" {
  description = "Availability zones to pin the group to. Linux + Private groups only; null to leave unzoned."
  type        = list(string)
  default     = null
}

variable "enable_system_assigned_identity" {
  description = "Attach a system-assigned managed identity (used for ACR pulls / Key Vault access)."
  type        = bool
  default     = true
}

variable "user_assigned_identity_ids" {
  description = "User-assigned managed identity resource IDs to attach to the group."
  type        = list(string)
  default     = []
}

variable "image_registry_credentials" {
  description = <<-EOT
    Private registry credentials. Each entry needs a `server`, plus EITHER
    username/password OR a user_assigned_identity_id for AAD-based ACR pulls.
  EOT
  type = list(object({
    server                    = string
    username                  = optional(string)
    password                  = optional(string)
    user_assigned_identity_id = optional(string)
  }))
  default   = []
  sensitive = true
}

variable "containers" {
  description = "List of containers in the group. cpu/memory are in cores/GB."
  type = list(object({
    name         = string
    image        = string
    cpu          = number
    memory       = number
    cpu_limit    = optional(number)
    memory_limit = optional(number)
    commands     = optional(list(string))

    environment_variables        = optional(map(string))
    secure_environment_variables = optional(map(string))

    ports = optional(list(object({
      port     = number
      protocol = optional(string, "TCP")
    })), [])

    volumes = optional(list(object({
      name                 = string
      mount_path           = string
      read_only            = optional(bool, false)
      storage_account_name = optional(string)
      storage_account_key  = optional(string)
      share_name           = optional(string)
    })), [])

    liveness_probe = optional(object({
      initial_delay_seconds = optional(number)
      period_seconds        = optional(number)
      failure_threshold     = optional(number)
      http_get = optional(object({
        path   = optional(string)
        port   = number
        scheme = optional(string, "Http")
      }))
    }))

    readiness_probe = optional(object({
      initial_delay_seconds = optional(number)
      period_seconds        = optional(number)
      failure_threshold     = optional(number)
      http_get = optional(object({
        path   = optional(string)
        port   = number
        scheme = optional(string, "Http")
      }))
    }))
  }))

  validation {
    condition     = length(var.containers) > 0
    error_message = "At least one container must be defined."
  }

  validation {
    condition     = alltrue([for c in var.containers : c.cpu > 0 && c.memory > 0])
    error_message = "Every container must request cpu > 0 and memory > 0."
  }
}

variable "log_analytics_workspace_id" {
  description = "Log Analytics workspace ID (the workspace GUID) for container diagnostics. null disables it."
  type        = string
  default     = null
}

variable "log_analytics_workspace_key" {
  description = "Primary/secondary shared key for the Log Analytics workspace."
  type        = string
  default     = null
  sensitive   = true
}

variable "tags" {
  description = "Tags applied to the container group."
  type        = map(string)
  default     = {}
}
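
The containers variable above validates only that the list is non-empty and that cpu/memory are positive. If stricter plan-time checks are wanted, two further validation blocks could be added inside variable "containers". Both are illustrative additions, not part of the module as published:

```hcl
# Sketch only: extra validation blocks to place inside variable "containers".

# Container names must be unique within a group.
validation {
  condition     = length(distinct([for c in var.containers : c.name])) == length(var.containers)
  error_message = "Container names must be unique within the group."
}

# Every declared port must use a protocol the provider accepts.
validation {
  condition = alltrue(flatten([
    for c in var.containers : [for p in c.ports : contains(["TCP", "UDP"], p.protocol)]
  ]))
  error_message = "Each port protocol must be TCP or UDP."
}
```

Both conditions reference only var.containers, so they work with the module's >= 1.5.0 Terraform pin. Containers without ports pass the second check because ports defaults to an empty list.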

outputs.tf

output "id" {
  description = "Resource ID of the container group."
  value       = azurerm_container_group.this.id
}

output "name" {
  description = "Name of the container group."
  value       = azurerm_container_group.this.name
}

output "ip_address" {
  description = "IP address allocated to the container group (public or private)."
  value       = azurerm_container_group.this.ip_address
}

output "fqdn" {
  description = "Fully qualified domain name for a Public group with a dns_name_label (null otherwise)."
  value       = azurerm_container_group.this.fqdn
}

output "identity_principal_id" {
  description = "Principal ID of the system-assigned identity, for RBAC role assignments (ACR pull, Key Vault)."
  value       = try(azurerm_container_group.this.identity[0].principal_id, null)
}

output "identity_tenant_id" {
  description = "Tenant ID of the system-assigned identity."
  value       = try(azurerm_container_group.this.identity[0].tenant_id, null)
}
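
When the system-assigned identity is enabled (the default), identity_principal_id can feed role assignments directly. A minimal sketch, assuming a Key Vault resource named azurerm_key_vault.platform exists in the calling configuration (a hypothetical name) and that the vault uses Azure RBAC for authorization:

```hcl
# Sketch only: let the group's system-assigned identity read secrets from a vault.
resource "azurerm_role_assignment" "aci_kv_secrets" {
  scope                = azurerm_key_vault.platform.id # hypothetical vault in the caller
  role_definition_name = "Key Vault Secrets User"      # built-in Azure role
  principal_id         = module.container_instances.identity_principal_id
}
```

If only user-assigned identities are attached, take the principal ID from the azurerm_user_assigned_identity resource instead, since these outputs describe the system-assigned identity.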

How to use it

A private, VNet-injected ingest worker plus a log-shipper sidecar, pulling from ACR via a user-assigned identity and shipping logs to Log Analytics:

module "container_instances" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-container-instances?ref=v1.0.0"

  name                = "aci-ingest-prod"
  resource_group_name = azurerm_resource_group.workloads.name
  location            = "centralindia"

  os_type         = "Linux"
  restart_policy  = "OnFailure"
  ip_address_type = "Private"
  subnet_ids      = [azurerm_subnet.aci_delegated.id]

  enable_system_assigned_identity = false
  user_assigned_identity_ids      = [azurerm_user_assigned_identity.aci_acr.id]

  image_registry_credentials = [{
    server                    = "kloudvinacr.azurecr.io"
    user_assigned_identity_id = azurerm_user_assigned_identity.aci_acr.id
  }]

  containers = [
    {
      name   = "ingest"
      image  = "kloudvinacr.azurecr.io/ingest:1.8.3"
      cpu    = 1.0
      memory = 2.0
      ports  = [{ port = 8080, protocol = "TCP" }]
      environment_variables = {
        QUEUE_NAME = "events-in"
        LOG_LEVEL  = "info"
      }
      secure_environment_variables = {
        SERVICEBUS_CONNECTION = data.azurerm_key_vault_secret.sb_conn.value
      }
      liveness_probe = {
        initial_delay_seconds = 15
        period_seconds        = 20
        http_get              = { path = "/healthz", port = 8080 }
      }
    },
    {
      name   = "logship"
      image  = "kloudvinacr.azurecr.io/fluentbit-sidecar:2.2.0"
      cpu    = 0.25
      memory = 0.5
    }
  ]

  log_analytics_workspace_id  = azurerm_log_analytics_workspace.platform.workspace_id
  log_analytics_workspace_key = azurerm_log_analytics_workspace.platform.primary_shared_key

  tags = {
    environment = "prod"
    workload    = "event-ingest"
    owner       = "platform-team"
  }
}

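# Downstream: grant the user-assigned identity AcrPull on the registry. System-assigned identity is disabled above,
# so this references the identity resource directly rather than a module output.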
resource "azurerm_role_assignment" "aci_acr_pull" {
  scope                = azurerm_container_registry.kloudvin.id
  role_definition_name = "AcrPull"
  principal_id         = azurerm_user_assigned_identity.aci_acr.principal_id
}

# Downstream: a Private DNS A-record pointing at the group's private IP.
resource "azurerm_private_dns_a_record" "ingest" {
  name                = "ingest"
  zone_name           = azurerm_private_dns_zone.internal.name
  resource_group_name = azurerm_resource_group.networking.name
  ttl                 = 300
  records             = [module.container_instances.ip_address]
}
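
The example above references azurerm_subnet.aci_delegated without showing it. A VNet-injected group needs a subnet delegated to ACI; a sketch of what that resource could look like follows. The subnet name, the azurerm_virtual_network.platform reference, and the address range are assumptions for illustration:

```hcl
# Sketch only: a subnet delegated to Azure Container Instances.
resource "azurerm_subnet" "aci_delegated" {
  name                 = "snet-aci"                             # hypothetical
  resource_group_name  = azurerm_resource_group.networking.name
  virtual_network_name = azurerm_virtual_network.platform.name  # hypothetical VNet
  address_prefixes     = ["10.20.4.0/24"]                       # hypothetical range

  delegation {
    name = "aci"

    service_delegation {
      name    = "Microsoft.ContainerInstance/containerGroups"
      actions = ["Microsoft.Network/virtualNetworks/subnets/action"]
    }
  }
}
```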

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config: live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm state bucket/container + key per path...
  }
}

2. Module config: live/prod/container_instances/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-container-instances?ref=v1.0.0"
}

inputs = {
  name                = "..."
  resource_group_name = "..."
  location            = "..."
  containers          = ["...", "..."]
}

3. Deploy one environment, or roll out all modules together:

# Run each command from the repository root.
cd live/prod/container_instances && terragrunt apply   # this module only
cd live/prod && terragrunt run-all apply               # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
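
The root config shown earlier covers only remote state. To keep the provider in one place as well, the root terragrunt.hcl could also carry a generate block along these lines. This is a sketch under the assumption that every module under live/ uses the same azurerm provider settings:

```hcl
# Sketch only: add to live/terragrunt.hcl so each module gets a generated provider.tf.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "azurerm" {
  features {}
}
EOF
}
```

The overwrite_terragrunt setting only replaces files that Terragrunt itself generated, so a hand-written provider.tf in a module would raise an error rather than being silently clobbered.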

Inputs

| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| name | string | | Yes | Container group name; lowercase alphanumeric/hyphens, <= 63 chars. |
| resource_group_name | string | | Yes | Resource group holding the group. |
| location | string | | Yes | Azure region (e.g. centralindia). |
| os_type | string | "Linux" | No | Linux or Windows. |
| restart_policy | string | "Always" | No | Always, OnFailure, or Never. |
| ip_address_type | string | "Public" | No | Public, Private (VNet-injected), or None. |
| dns_name_label | string | null | No | DNS label for the public FQDN; Public groups only. |
| subnet_ids | list(string) | [] | No | Delegated subnet IDs; required when ip_address_type = Private. |
| zones | list(string) | null | No | Availability zones; Linux + Private groups only. |
| enable_system_assigned_identity | bool | true | No | Attach a system-assigned managed identity. |
| user_assigned_identity_ids | list(string) | [] | No | User-assigned identity resource IDs to attach. |
| image_registry_credentials | list(object) | [] | No | Private registry auth (username/password or UAMI). Sensitive. |
| containers | list(object) | | Yes | Container definitions: image, cpu, memory, ports, env, volumes, probes. |
| log_analytics_workspace_id | string | null | No | Log Analytics workspace GUID for diagnostics. |
| log_analytics_workspace_key | string | null | No | Shared key for the Log Analytics workspace. Sensitive. |
| tags | map(string) | {} | No | Tags applied to the container group. |

Outputs

| Name | Description |
|---|---|
| id | Resource ID of the container group. |
| name | Name of the container group. |
| ip_address | IP address allocated to the group (public or private). |
| fqdn | Public FQDN when a dns_name_label is set; null otherwise. |
| identity_principal_id | Principal ID of the system-assigned identity, for RBAC assignments. |
| identity_tenant_id | Tenant ID of the system-assigned identity. |

Enterprise scenario

A retail analytics platform runs a nightly inventory-reconciliation job that pulls files from three ERP feeds, normalizes them, and writes to a data lake. Azure Data Factory triggers this module’s container group on a schedule with restart_policy = "OnFailure" and ip_address_type = "Private" so the job runs inside the data-platform VNet, reaching the lake over a private endpoint. The job container pulls its image from ACR using a user-assigned identity (no registry passwords in state), streams stdout to the shared Log Analytics workspace for the on-call team’s Kusto dashboards, and the whole group bills for only the ~12 minutes it runs each night instead of parking an AKS node pool or a VM 24/7.

Best practices
