Quick take — A production-ready Terraform module for azurerm_container_group: multi-container groups, managed identity, private VNet injection, secure env vars, and log analytics — all var-driven for hashicorp/azurerm ~> 4.0. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "azurerm" {
features {}
# azurerm 4.x requires a subscription: set subscription_id here or export ARM_SUBSCRIPTION_ID.
}
module "container_instances" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-container-instances?ref=v1.0.0"
name = "..." # Container group name; lowercase alphanumeric/hyphens, <…
resource_group_name = "..." # Resource group holding the group.
location = "..." # Azure region (e.g. `centralindia`).
containers = ["...", "..."] # Container definitions: image, cpu, memory, ports, env, …
}
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
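The containers placeholder in the Quickstart stands in for a list of objects, not a list of strings. As a minimal sketch of what a filled-in call might look like, assuming a public group running a single public image (the names aci-hello-dev and rg-demo and the nginx:1.27 image are illustrative assumptions, not part of the module):

```hcl
module "container_instances" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-container-instances?ref=v1.0.0"

  name                = "aci-hello-dev" # hypothetical group name
  resource_group_name = "rg-demo"       # hypothetical; the resource group must already exist
  location            = "centralindia"

  # Each entry is an object; name, image, cpu, and memory are the required keys.
  containers = [
    {
      name   = "web"
      image  = "nginx:1.27" # illustrative public image; pin your own tag or digest
      cpu    = 0.5          # cores
      memory = 1.0          # GB
      ports  = [{ port = 80 }] # protocol falls back to TCP
    }
  ]
}
```

Everything not listed here falls back to the defaults in variables.tf, so this sketch would produce a Linux group with a public IP and restart_policy = "Always".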
What this module is
Azure Container Instances (ACI) is the serverless way to run a container — or a small group of co-located containers — on Azure without standing up AKS, managing a node pool, or paying for an idle control plane. You hand Azure an image, a CPU/memory request, and a few knobs, and it bills you per second for the vCPU and GB-seconds the group actually consumes. The unit of deployment is the container group (azurerm_container_group): one or more containers that share a lifecycle, a network namespace, an IP, and optional mounted volumes — the ACI analogue of a Kubernetes pod.
The raw resource is deceptively large. A correct production container group has to juggle the OS type, an image registry credential block (or, better, a managed identity), an IP-address-type that swaps shape depending on whether you go public or VNet-injected, exposed ports, secure vs. plain environment variables, volume mounts, a restart policy, an optional liveness/readiness probe, and a diagnostics sink. Copy-pasting that across services is how you end up with one group logging to App Insights, another to nowhere, and a third still pulling :latest with an admin password in plaintext env vars.
This module wraps azurerm_container_group so a team gets a single, opinionated, variable-driven front door: pass a list of container definitions, pick Public or Private, optionally attach a user-assigned identity for ACR pulls, and the module wires the rest — Log Analytics diagnostics, secure environment variables, and a system-assigned identity by default — with validations that stop the obviously-wrong configurations at plan time.
When to use it
Reach for this module when the workload is bursty, batch, or stateless and small, and a full orchestrator would be overkill:
- Scheduled and event-driven jobs (nightly ETL, report generation, image processing) launched by Logic Apps, Azure Functions, or a Data Factory Web/Container activity, then torn down.
- CI/CD ephemeral build agents or test sandboxes that spin up per-pipeline and exit, where per-second billing beats a parked VM.
- Sidecar-style helper groups — an app container plus a log-shipper or a small proxy — that need a private IP inside an existing VNet but don’t justify a node pool.
- Burst capacity from AKS via virtual nodes, or a quick “lift a Docker Compose service into Azure” without rewriting it as a Deployment.
Do not use it for stateful databases, anything needing horizontal autoscaling or rolling updates, services that must survive node failure with self-healing, or workloads needing more than the ACI per-group CPU/memory ceilings. Those belong on AKS, Container Apps, or App Service. ACI has no built-in load balancing or autoscaling — if you need those, this is the wrong primitive.
Module structure
terraform-module-azure-container-instances/
├── versions.tf # provider + Terraform version pins
├── main.tf # azurerm_container_group + diagnostic settings
├── variables.tf # var-driven inputs with validations
└── outputs.tf # id, fqdn, ip, identity principal id, etc.
versions.tf
terraform {
required_version = ">= 1.5.0"
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "~> 4.0"
}
}
}
main.tf
locals {
# ACI only accepts a dns_name_label when the IP is exposed publicly.
effective_dns_label = var.ip_address_type == "Public" ? var.dns_name_label : null
# Identity type: both kinds when user-assigned IDs are supplied and system-assigned is enabled, user-assigned only, else system-assigned.
identity_type = (
length(var.user_assigned_identity_ids) > 0 && var.enable_system_assigned_identity
? "SystemAssigned, UserAssigned"
: length(var.user_assigned_identity_ids) > 0
? "UserAssigned"
: "SystemAssigned"
)
}
resource "azurerm_container_group" "this" {
name = var.name
resource_group_name = var.resource_group_name
location = var.location
os_type = var.os_type
restart_policy = var.restart_policy
ip_address_type = var.ip_address_type
dns_name_label = local.effective_dns_label
subnet_ids = var.ip_address_type == "Private" ? var.subnet_ids : null
# Zone pinning is only valid for Linux + Private (VNet-injected) groups.
zones = var.zones
tags = var.tags
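# Emit the identity block only when an identity is requested; otherwise the group gets none.
dynamic "identity" {
for_each = (var.enable_system_assigned_identity || length(var.user_assigned_identity_ids) > 0) ? [1] : []
content {
type = local.identity_type
identity_ids = length(var.user_assigned_identity_ids) > 0 ? var.user_assigned_identity_ids : null
}
}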
# Private registry auth — one block per registry. Omit entirely for public images.
dynamic "image_registry_credential" {
for_each = var.image_registry_credentials
content {
server = image_registry_credential.value.server
username = lookup(image_registry_credential.value, "username", null)
password = lookup(image_registry_credential.value, "password", null)
user_assigned_identity_id = lookup(image_registry_credential.value, "user_assigned_identity_id", null)
}
}
dynamic "container" {
for_each = var.containers
content {
name = container.value.name
image = container.value.image
cpu = container.value.cpu
memory = container.value.memory
cpu_limit = lookup(container.value, "cpu_limit", null)
memory_limit = lookup(container.value, "memory_limit", null)
environment_variables = lookup(container.value, "environment_variables", null)
secure_environment_variables = lookup(container.value, "secure_environment_variables", null)
commands = lookup(container.value, "commands", null)
dynamic "ports" {
for_each = lookup(container.value, "ports", [])
content {
port = ports.value.port
protocol = lookup(ports.value, "protocol", "TCP")
}
}
dynamic "volume" {
for_each = lookup(container.value, "volumes", [])
content {
name = volume.value.name
mount_path = volume.value.mount_path
read_only = lookup(volume.value, "read_only", false)
storage_account_name = lookup(volume.value, "storage_account_name", null)
storage_account_key = lookup(volume.value, "storage_account_key", null)
share_name = lookup(volume.value, "share_name", null)
}
}
dynamic "liveness_probe" {
for_each = lookup(container.value, "liveness_probe", null) != null ? [container.value.liveness_probe] : []
content {
initial_delay_seconds = lookup(liveness_probe.value, "initial_delay_seconds", null)
period_seconds = lookup(liveness_probe.value, "period_seconds", null)
failure_threshold = lookup(liveness_probe.value, "failure_threshold", null)
dynamic "http_get" {
for_each = lookup(liveness_probe.value, "http_get", null) != null ? [liveness_probe.value.http_get] : []
content {
path = lookup(http_get.value, "path", null)
port = http_get.value.port
scheme = lookup(http_get.value, "scheme", "Http")
}
}
}
}
dynamic "readiness_probe" {
for_each = lookup(container.value, "readiness_probe", null) != null ? [container.value.readiness_probe] : []
content {
initial_delay_seconds = lookup(readiness_probe.value, "initial_delay_seconds", null)
period_seconds = lookup(readiness_probe.value, "period_seconds", null)
failure_threshold = lookup(readiness_probe.value, "failure_threshold", null)
dynamic "http_get" {
for_each = lookup(readiness_probe.value, "http_get", null) != null ? [readiness_probe.value.http_get] : []
content {
path = lookup(http_get.value, "path", null)
port = http_get.value.port
scheme = lookup(http_get.value, "scheme", "Http")
}
}
}
}
}
}
# Ship stdout/stderr to a Log Analytics workspace when one is supplied.
dynamic "diagnostics" {
for_each = var.log_analytics_workspace_id != null ? [1] : []
content {
log_analytics {
workspace_id = var.log_analytics_workspace_id
workspace_key = var.log_analytics_workspace_key
}
}
}
lifecycle {
# Ignore changes to the last_deployed tag so updates made outside this configuration are not reported as drift.
ignore_changes = [tags["last_deployed"]]
}
}
variables.tf
variable "name" {
description = "Name of the container group. Must be a valid ACI DNS-compatible name."
type = string
validation {
condition = can(regex("^[a-z0-9]([-a-z0-9]*[a-z0-9])?$", var.name)) && length(var.name) <= 63
error_message = "name must be lowercase alphanumeric/hyphens, start and end alphanumeric, and be <= 63 chars."
}
}
variable "resource_group_name" {
description = "Resource group that will hold the container group."
type = string
}
variable "location" {
description = "Azure region for the container group (e.g. centralindia, eastus)."
type = string
}
variable "os_type" {
description = "Operating system for the container group."
type = string
default = "Linux"
validation {
condition = contains(["Linux", "Windows"], var.os_type)
error_message = "os_type must be either 'Linux' or 'Windows'."
}
}
variable "restart_policy" {
description = "Restart behaviour: Always (services), OnFailure (jobs), or Never (one-shot)."
type = string
default = "Always"
validation {
condition = contains(["Always", "OnFailure", "Never"], var.restart_policy)
error_message = "restart_policy must be one of: Always, OnFailure, Never."
}
}
variable "ip_address_type" {
description = "Public (internet-facing IP), Private (VNet-injected via subnet_ids), or None."
type = string
default = "Public"
validation {
condition = contains(["Public", "Private", "None"], var.ip_address_type)
error_message = "ip_address_type must be one of: Public, Private, None."
}
}
variable "dns_name_label" {
description = "DNS label for the public FQDN (<label>.<region>.azurecontainer.io). Public groups only."
type = string
default = null
}
variable "subnet_ids" {
description = "Subnet IDs to inject the group into. Required (and only used) when ip_address_type = Private."
type = list(string)
default = []
}
variable "zones" {
description = "Availability zones to pin the group to. Linux + Private groups only; null to leave unzoned."
type = list(string)
default = null
}
variable "enable_system_assigned_identity" {
description = "Attach a system-assigned managed identity (used for ACR pulls / Key Vault access)."
type = bool
default = true
}
variable "user_assigned_identity_ids" {
description = "User-assigned managed identity resource IDs to attach to the group."
type = list(string)
default = []
}
variable "image_registry_credentials" {
description = <<-EOT
Private registry credentials. Each entry needs a `server`, plus EITHER
username/password OR a user_assigned_identity_id for AAD-based ACR pulls.
EOT
type = list(object({
server = string
username = optional(string)
password = optional(string)
user_assigned_identity_id = optional(string)
}))
default = []
sensitive = true
}
variable "containers" {
description = "List of containers in the group. cpu/memory are in cores/GB."
type = list(object({
name = string
image = string
cpu = number
memory = number
cpu_limit = optional(number)
memory_limit = optional(number)
commands = optional(list(string))
environment_variables = optional(map(string))
secure_environment_variables = optional(map(string))
ports = optional(list(object({
port = number
protocol = optional(string, "TCP")
})), [])
volumes = optional(list(object({
name = string
mount_path = string
read_only = optional(bool, false)
storage_account_name = optional(string)
storage_account_key = optional(string)
share_name = optional(string)
})), [])
liveness_probe = optional(object({
initial_delay_seconds = optional(number)
period_seconds = optional(number)
failure_threshold = optional(number)
http_get = optional(object({
path = optional(string)
port = number
scheme = optional(string, "Http")
}))
}))
readiness_probe = optional(object({
initial_delay_seconds = optional(number)
period_seconds = optional(number)
failure_threshold = optional(number)
http_get = optional(object({
path = optional(string)
port = number
scheme = optional(string, "Http")
}))
}))
}))
validation {
condition = length(var.containers) > 0
error_message = "At least one container must be defined."
}
validation {
condition = alltrue([for c in var.containers : c.cpu > 0 && c.memory > 0])
error_message = "Every container must request cpu > 0 and memory > 0."
}
}
variable "log_analytics_workspace_id" {
description = "Log Analytics workspace ID (the workspace GUID) for container diagnostics. null disables it."
type = string
default = null
}
variable "log_analytics_workspace_key" {
description = "Primary/secondary shared key for the Log Analytics workspace."
type = string
default = null
sensitive = true
}
variable "tags" {
description = "Tags applied to the container group."
type = map(string)
default = {}
}
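The subnet_ids description says the list is required when ip_address_type = Private, but none of the validation blocks above checks it: in Terraform releases before 1.9, a variable validation can only refer to its own variable. One possible way to enforce the rule at plan time, sketched here as an assumption rather than as part of the published module, is a precondition on a terraform_data resource (available since Terraform 1.4). The resource name input_checks is hypothetical:

```hcl
# Hypothetical addition to main.tf: a cross-variable check that fails the plan early.
resource "terraform_data" "input_checks" {
  lifecycle {
    precondition {
      # try() guards against subnet_ids being passed explicitly as null.
      condition     = var.ip_address_type != "Private" || try(length(var.subnet_ids), 0) > 0
      error_message = "subnet_ids must contain at least one subnet ID when ip_address_type is \"Private\"."
    }
  }
}
```

The terraform_data resource creates nothing in Azure; it only adds a small entry to state so the precondition has somewhere to live. The same pattern could carry other cross-variable rules, such as requiring a workspace key whenever a workspace ID is set.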
outputs.tf
output "id" {
description = "Resource ID of the container group."
value = azurerm_container_group.this.id
}
output "name" {
description = "Name of the container group."
value = azurerm_container_group.this.name
}
output "ip_address" {
description = "IP address allocated to the container group (public or private)."
value = azurerm_container_group.this.ip_address
}
output "fqdn" {
description = "Fully qualified domain name for a Public group with a dns_name_label (null otherwise)."
value = azurerm_container_group.this.fqdn
}
output "identity_principal_id" {
description = "Principal ID of the system-assigned identity, for RBAC role assignments (ACR pull, Key Vault)."
value = try(azurerm_container_group.this.identity[0].principal_id, null)
}
output "identity_tenant_id" {
description = "Tenant ID of the system-assigned identity."
value = try(azurerm_container_group.this.identity[0].tenant_id, null)
}
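As a sketch of how identity_principal_id might be consumed, assuming the group keeps its system-assigned identity and that a vault declared as azurerm_key_vault.platform (a hypothetical resource) uses Azure RBAC for authorization:

```hcl
# Let the group's system-assigned identity read secrets from a vault.
resource "azurerm_role_assignment" "aci_kv_secrets" {
  scope                = azurerm_key_vault.platform.id # hypothetical vault
  role_definition_name = "Key Vault Secrets User"      # built-in Azure role
  principal_id         = module.container_instances.identity_principal_id
}
```

When a group carries only a user-assigned identity, this output is not expected to hold a usable value; in that case the principal ID would come from the azurerm_user_assigned_identity resource itself.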
How to use it
A private, VNet-injected ingest worker plus a log-shipper sidecar, pulling from ACR via a user-assigned identity and shipping logs to Log Analytics:
module "container_instances" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-container-instances?ref=v1.0.0"
name = "aci-ingest-prod"
resource_group_name = azurerm_resource_group.workloads.name
location = "centralindia"
os_type = "Linux"
restart_policy = "OnFailure"
ip_address_type = "Private"
subnet_ids = [azurerm_subnet.aci_delegated.id]
enable_system_assigned_identity = false
user_assigned_identity_ids = [azurerm_user_assigned_identity.aci_acr.id]
image_registry_credentials = [{
server = "kloudvinacr.azurecr.io"
user_assigned_identity_id = azurerm_user_assigned_identity.aci_acr.id
}]
containers = [
{
name = "ingest"
image = "kloudvinacr.azurecr.io/ingest:1.8.3"
cpu = 1.0
memory = 2.0
ports = [{ port = 8080, protocol = "TCP" }]
environment_variables = {
QUEUE_NAME = "events-in"
LOG_LEVEL = "info"
}
secure_environment_variables = {
SERVICEBUS_CONNECTION = data.azurerm_key_vault_secret.sb_conn.value
}
liveness_probe = {
initial_delay_seconds = 15
period_seconds = 20
http_get = { path = "/healthz", port = 8080 }
}
},
{
name = "logship"
image = "kloudvinacr.azurecr.io/fluentbit-sidecar:2.2.0"
cpu = 0.25
memory = 0.5
}
]
log_analytics_workspace_id = azurerm_log_analytics_workspace.platform.workspace_id
log_analytics_workspace_key = azurerm_log_analytics_workspace.platform.primary_shared_key
tags = {
environment = "prod"
workload = "event-ingest"
owner = "platform-team"
}
}
# Downstream: grant the user-assigned identity AcrPull on the registry so the group can pull its images.
resource "azurerm_role_assignment" "aci_acr_pull" {
scope = azurerm_container_registry.kloudvin.id
role_definition_name = "AcrPull"
principal_id = azurerm_user_assigned_identity.aci_acr.principal_id
}
# Downstream: a Private DNS A-record pointing at the group's private IP.
resource "azurerm_private_dns_a_record" "ingest" {
name = "ingest"
zone_name = azurerm_private_dns_zone.internal.name
resource_group_name = azurerm_resource_group.networking.name
ttl = 300
records = [module.container_instances.ip_address]
}
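The example assumes azurerm_subnet.aci_delegated already exists. A minimal sketch of such a subnet follows; the subnet name, address range, and the azurerm_virtual_network.platform reference are hypothetical:

```hcl
# A subnet delegated to ACI, as required for VNet-injected container groups.
resource "azurerm_subnet" "aci_delegated" {
  name                 = "snet-aci"                            # hypothetical name
  resource_group_name  = azurerm_resource_group.networking.name
  virtual_network_name = azurerm_virtual_network.platform.name # hypothetical VNet
  address_prefixes     = ["10.20.4.0/24"]                      # hypothetical range

  delegation {
    name = "aci-delegation"

    service_delegation {
      name    = "Microsoft.ContainerInstance/containerGroups"
      actions = ["Microsoft.Network/virtualNetworks/subnets/action"]
    }
  }
}
```

A delegated subnet is reserved for that service, so it is usually simplest to give ACI a subnet of its own rather than sharing one with VMs or private endpoints.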
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
backend = "azurerm"
generate = { path = "backend.tf", if_exists = "overwrite" }
config = {
# ...azurerm state bucket/container + key per path...
}
}
2. Module config — live/prod/container_instances/terragrunt.hcl:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-container-instances?ref=v1.0.0"
}
inputs = {
name = "..."
resource_group_name = "..."
location = "..."
containers = ["...", "..."]
}
3. From the repository root, deploy this module on its own, or roll out every module in the environment together:
cd live/prod/container_instances && terragrunt apply # this module only
cd live/prod && terragrunt run-all apply # every module under live/prod
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
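To illustrate the dependency orchestration mentioned above, here is a sketch of how the module config could take its subnet from a sibling Terragrunt module. The ../network path and its aci_subnet_id output are hypothetical; only the dependency block and the dependency.<name>.outputs syntax come from Terragrunt itself:

```hcl
# live/prod/container_instances/terragrunt.hcl (excerpt)
dependency "network" {
  config_path = "../network" # hypothetical sibling module
}

inputs = {
  ip_address_type = "Private"
  subnet_ids      = [dependency.network.outputs.aci_subnet_id] # hypothetical output name
}
```

With a dependency declared this way, run-all applies the network module before this one, so the subnet ID is available when the container group is planned.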
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| name | string | - | Yes | Container group name; lowercase alphanumeric/hyphens, <= 63 chars. |
| resource_group_name | string | - | Yes | Resource group holding the group. |
| location | string | - | Yes | Azure region (e.g. centralindia). |
| os_type | string | "Linux" | No | Linux or Windows. |
| restart_policy | string | "Always" | No | Always, OnFailure, or Never. |
| ip_address_type | string | "Public" | No | Public, Private (VNet-injected), or None. |
| dns_name_label | string | null | No | DNS label for the public FQDN; Public groups only. |
| subnet_ids | list(string) | [] | No | Delegated subnet IDs; required when ip_address_type = Private. |
| zones | list(string) | null | No | Availability zones; Linux + Private groups only. |
| enable_system_assigned_identity | bool | true | No | Attach a system-assigned managed identity. |
| user_assigned_identity_ids | list(string) | [] | No | User-assigned identity resource IDs to attach. |
| image_registry_credentials | list(object) | [] | No | Private registry auth (username/password or UAMI). Sensitive. |
| containers | list(object) | - | Yes | Container definitions: image, cpu, memory, ports, env, volumes, probes. |
| log_analytics_workspace_id | string | null | No | Log Analytics workspace GUID for diagnostics. |
| log_analytics_workspace_key | string | null | No | Shared key for the Log Analytics workspace. Sensitive. |
| tags | map(string) | {} | No | Tags applied to the container group. |
Outputs
| Name | Description |
|---|---|
| id | Resource ID of the container group. |
| name | Name of the container group. |
| ip_address | IP address allocated to the group (public or private). |
| fqdn | Public FQDN when a dns_name_label is set; null otherwise. |
| identity_principal_id | Principal ID of the system-assigned identity, for RBAC assignments. |
| identity_tenant_id | Tenant ID of the system-assigned identity. |
Enterprise scenario
A retail analytics platform runs a nightly inventory-reconciliation job that pulls files from three ERP feeds, normalizes them, and writes to a data lake. Azure Data Factory triggers this module’s container group on a schedule with restart_policy = "OnFailure" and ip_address_type = "Private" so the job runs inside the data-platform VNet, reaching the lake over a private endpoint. The job container pulls its image from ACR using a user-assigned identity (no registry passwords in state), streams stdout to the shared Log Analytics workspace for the on-call team’s Kusto dashboards, and the whole group bills for only the ~12 minutes it runs each night instead of parking an AKS node pool or a VM 24/7.
Best practices
- Pull with a managed identity, never :latest. Pass a user_assigned_identity_id in image_registry_credentials and grant it AcrPull (as the downstream example shows) so no registry password ever lands in state. Always pin images to an immutable tag or digest (ingest:1.8.3), because ACI does not auto-restart on a new push.
- Secrets go in secure_environment_variables, not environment_variables. Plain env vars are visible in az container show and the portal; secure ones are write-only. Better still, source their values from an azurerm_key_vault_secret data source so they’re never hard-coded in HCL.
- Inject into a delegated subnet for anything internal. ip_address_type = "Private" with a subnet delegated to Microsoft.ContainerInstance/containerGroups keeps the group off the public internet and lets you front it with a private DNS record. ACI gives you no NSG-on-the-IP for public groups, so private is the safer default.
- Right-size cpu/memory and pick the correct restart_policy. ACI bills per vCPU-second and GB-second, so over-requesting cores is pure waste; use OnFailure or Never for jobs so a completed run actually stops billing instead of looping under Always.
- Always wire log_analytics_workspace_id. A container group with no diagnostics sink loses stdout/stderr the moment it’s deleted; shipping to Log Analytics is the only durable way to debug a job that already exited.
- Name and tag for ownership and cost. Use a predictable convention (aci-<workload>-<env>) and tag environment, owner, and workload so per-second ACI spend is attributable in Cost Management and ephemeral groups are easy to identify and reap.
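The advice on secure_environment_variables relies on a Key Vault data source such as the data.azurerm_key_vault_secret.sb_conn used in the usage example. A minimal sketch of how that lookup might be declared, with hypothetical vault, resource group, and secret names:

```hcl
# Look up an existing vault, then read one secret from it.
data "azurerm_key_vault" "platform" {
  name                = "kv-platform-prod" # hypothetical vault name
  resource_group_name = "rg-platform-prod" # hypothetical resource group
}

data "azurerm_key_vault_secret" "sb_conn" {
  name         = "servicebus-connection" # hypothetical secret name
  key_vault_id = data.azurerm_key_vault.platform.id
}
```

The secret's value is still recorded in Terraform state once it is read, so this keeps secrets out of HCL and version control but does not remove the need for an access-controlled, encrypted state backend.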