
Terraform Module: Azure Batch Account — managed-identity batch compute with secure storage and key-vault encryption

Quick take — Reusable hashicorp/azurerm ~> 4.0 module for Azure Batch Account: user-assigned identity, linked storage with managed auth, batch-pool allocation mode, and customer-managed key encryption for production HPC and rendering workloads. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "azurerm" {
  features {}
}

module "batch" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-batch?ref=v1.0.0"

  name                 = "..."  # Batch Account name; 3-24 lowercase alphanumeric chars, …
  resource_group_name  = "..."  # Resource group for the account, identity, and storage.
  location             = "..."  # Azure region for all resources.
  storage_account_name = "..."  # Auto-storage account name; 3-24 lowercase alphanumeric,…
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.

What this module is

Azure Batch is a managed job-scheduling and compute-pool service for running large-scale parallel and high-performance computing (HPC) workloads — render farms, Monte Carlo risk simulations, genomics pipelines, media transcoding — without standing up and babysitting your own scheduler. The control-plane object that everything hangs off is the Batch Account: it owns the pools (the VM fleets), the jobs, and the tasks, and it decides how compute is allocated.

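That last point is the one that trips people up. A Batch Account runs in one of two pool_allocation_mode values, and the choice is immutable for the life of the account. In BatchService mode (the default here) pool VMs are allocated in subscriptions managed by the Batch service. In UserSubscription mode pool VMs are created in your own subscription, and the account must be linked to a Key Vault.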

Wrapping this in a Terraform module matters because a correct Batch Account is never just the azurerm_batch_account resource on its own. In production you almost always pair it with a storage account (for application packages, task output, and auto-storage), a user-assigned managed identity (so pools and tasks authenticate to storage/ACR/Key Vault without secret sprawl), and frequently a customer-managed key (CMK) for encryption-at-rest. This module wires those four things together with sane defaults, validation, and outputs, so a consuming team gets a compliant account from a five-line module block instead of re-deriving the storage-auth and identity plumbing every time.
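
To make the allocation-mode choice concrete, here is a minimal sketch of calling this module in UserSubscription mode. The input names (pool_allocation_mode, key_vault_id, key_vault_url) come from the module's variables. The vault, resource group, and account names are hypothetical, and the sketch assumes the vault already exists and that the Azure Batch service has already been granted whatever access UserSubscription mode needs on the subscription and the vault.

```hcl
# Hypothetical existing Key Vault to link to the Batch Account.
data "azurerm_key_vault" "batch" {
  name                = "kv-sim-batch-prod"  # assumption: not from the article
  resource_group_name = "rg-sim-batch-prod"  # assumption: not from the article
}

module "batch_usersub" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-batch?ref=v1.0.0"

  name                 = "kvsimbatchprod"    # hypothetical
  resource_group_name  = "rg-sim-batch-prod" # hypothetical
  location             = "centralindia"
  storage_account_name = "stkvsimbatchprod"  # hypothetical

  # Immutable after creation: pool VMs land in your own subscription.
  pool_allocation_mode = "UserSubscription"

  # Both values are required by the module in this mode.
  key_vault_id  = data.azurerm_key_vault.batch.id
  key_vault_url = data.azurerm_key_vault.batch.vault_uri
}
```

Because the mode cannot be changed later, switching an existing account from one mode to the other means Terraform will plan a destroy and recreate of the Batch Account.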

When to use it

Module structure

terraform-module-azure-batch/
├── versions.tf      # provider + Terraform version pins
├── main.tf          # identity, storage, batch account, CMK encryption
├── variables.tf     # var-driven inputs with validation
└── outputs.tf       # account id/name, endpoint, identity, storage id

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

main.tf

# ---------------------------------------------------------------------------
# User-assigned identity used by the Batch Account, its auto-storage auth,
# and (optionally) by pools/tasks to authenticate to storage/ACR/Key Vault.
# ---------------------------------------------------------------------------
resource "azurerm_user_assigned_identity" "batch" {
  name                = "id-${var.name}"
  resource_group_name = var.resource_group_name
  location            = var.location
  tags                = var.tags
}

# ---------------------------------------------------------------------------
# Auto-storage account: holds application packages, task resource files,
# and task output. Identity-based auth keeps account keys out of the picture.
# ---------------------------------------------------------------------------
resource "azurerm_storage_account" "batch" {
  name                            = var.storage_account_name
  resource_group_name             = var.resource_group_name
  location                        = var.location
  account_tier                    = "Standard"
  account_replication_type        = var.storage_replication_type
  account_kind                    = "StorageV2"
  min_tls_version                 = "TLS1_2"
  https_traffic_only_enabled      = true
  allow_nested_items_to_be_public = false
  shared_access_key_enabled       = var.storage_authentication_mode == "StorageKeys"

  tags = var.tags
}

# Let the Batch identity read/write the auto-storage account when auth is
# identity-based (the recommended, keyless mode).
resource "azurerm_role_assignment" "batch_storage" {
  count = var.storage_authentication_mode == "BatchAccountManagedIdentity" ? 1 : 0

  scope                = azurerm_storage_account.batch.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_user_assigned_identity.batch.principal_id
}

# ---------------------------------------------------------------------------
# Batch Account
# ---------------------------------------------------------------------------
resource "azurerm_batch_account" "this" {
  name                = var.name
  resource_group_name = var.resource_group_name
  location            = var.location

  pool_allocation_mode = var.pool_allocation_mode

  # How Batch authenticates to the auto-storage account: the managed identity
  # (keyless, recommended) or storage account keys.
  storage_account_authentication_mode = var.storage_authentication_mode
  storage_account_id                  = azurerm_storage_account.batch.id
  storage_account_node_identity = (
    var.storage_authentication_mode == "BatchAccountManagedIdentity"
    ? azurerm_user_assigned_identity.batch.id
    : null
  )

  # Public network access to the account; set the variable to false when
  # fronting the account with a private endpoint.
  public_network_access_enabled = var.public_network_access_enabled

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.batch.id]
  }

  # UserSubscription mode allocates pool VMs into the caller's subscription
  # and therefore requires a linked Key Vault.
  dynamic "key_vault_reference" {
    for_each = var.pool_allocation_mode == "UserSubscription" ? [1] : []
    content {
      id  = var.key_vault_id
      url = var.key_vault_url
    }
  }

  # Customer-managed key for encryption at rest (BatchService mode).
  dynamic "encryption" {
    for_each = var.customer_managed_key_id != null ? [1] : []
    content {
      key_vault_key_id = var.customer_managed_key_id
    }
  }

  tags = var.tags

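  depends_on = [azurerm_role_assignment.batch_storage]

  lifecycle {
    # Fail early, with a clear message, when UserSubscription mode is
    # selected without a Key Vault link.
    precondition {
      condition = (
        var.pool_allocation_mode != "UserSubscription" ||
        (var.key_vault_id != null && var.key_vault_url != null)
      )
      error_message = "key_vault_id and key_vault_url are required when pool_allocation_mode is UserSubscription."
    }
  }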
}

variables.tf

variable "name" {
  description = "Name of the Batch Account. Lowercase letters and numbers only, 3-24 chars, unique within the Azure region."
  type        = string

  validation {
    condition     = can(regex("^[a-z0-9]{3,24}$", var.name))
    error_message = "name must be 3-24 characters, lowercase letters and numbers only (no hyphens)."
  }
}

variable "resource_group_name" {
  description = "Name of the resource group that holds the Batch Account, identity, and storage."
  type        = string
}

variable "location" {
  description = "Azure region for all resources (e.g. centralindia, southeastasia)."
  type        = string
}

variable "storage_account_name" {
  description = "Name of the auto-storage account. Lowercase letters and numbers, 3-24 chars, globally unique."
  type        = string

  validation {
    condition     = can(regex("^[a-z0-9]{3,24}$", var.storage_account_name))
    error_message = "storage_account_name must be 3-24 characters, lowercase letters and numbers only."
  }
}

variable "storage_replication_type" {
  description = "Replication for the auto-storage account."
  type        = string
  default     = "LRS"

  validation {
    condition     = contains(["LRS", "ZRS", "GRS", "RAGRS"], var.storage_replication_type)
    error_message = "storage_replication_type must be one of LRS, ZRS, GRS, RAGRS."
  }
}

variable "storage_authentication_mode" {
  description = "How Batch authenticates to the auto-storage account. BatchAccountManagedIdentity is keyless and recommended."
  type        = string
  default     = "BatchAccountManagedIdentity"

  validation {
    condition     = contains(["BatchAccountManagedIdentity", "StorageKeys"], var.storage_authentication_mode)
    error_message = "storage_authentication_mode must be BatchAccountManagedIdentity or StorageKeys."
  }
}

variable "pool_allocation_mode" {
  description = "Pool allocation mode. IMMUTABLE after creation. UserSubscription requires key_vault_id/url."
  type        = string
  default     = "BatchService"

  validation {
    condition     = contains(["BatchService", "UserSubscription"], var.pool_allocation_mode)
    error_message = "pool_allocation_mode must be BatchService or UserSubscription."
  }
}

variable "key_vault_id" {
  description = "Resource ID of the Key Vault to link. REQUIRED when pool_allocation_mode is UserSubscription."
  type        = string
  default     = null
}

variable "key_vault_url" {
  description = "URI (vault_uri) of the linked Key Vault. REQUIRED when pool_allocation_mode is UserSubscription."
  type        = string
  default     = null
}

variable "customer_managed_key_id" {
  description = "Versioned Key Vault key ID for customer-managed encryption at rest. Null uses Microsoft-managed keys."
  type        = string
  default     = null
}

variable "public_network_access_enabled" {
  description = "Allow public access to the Batch management endpoint. Set false when using a private endpoint."
  type        = bool
  default     = true
}

variable "tags" {
  description = "Tags applied to every resource created by this module."
  type        = map(string)
  default     = {}
}

# The UserSubscription rule (key_vault_id and key_vault_url must both be set)
# spans several variables. Variable validation blocks cannot reference other
# variables before Terraform 1.9, and this module supports 1.5+, so the rule
# belongs in a precondition on the resource rather than in a variable
# validation. A placeholder variable that only checks itself enforces nothing.

outputs.tf

output "id" {
  description = "Resource ID of the Batch Account."
  value       = azurerm_batch_account.this.id
}

output "name" {
  description = "Name of the Batch Account."
  value       = azurerm_batch_account.this.name
}

output "account_endpoint" {
  description = "Account endpoint (e.g. myaccount.centralindia.batch.azure.com) used by the Batch SDK/CLI."
  value       = azurerm_batch_account.this.account_endpoint
}

output "identity_principal_id" {
  description = "Principal (object) ID of the user-assigned identity — grant it RBAC on ACR, Key Vault, or storage."
  value       = azurerm_user_assigned_identity.batch.principal_id
}

output "identity_client_id" {
  description = "Client ID of the user-assigned identity, for pool/task identity references."
  value       = azurerm_user_assigned_identity.batch.client_id
}

output "identity_id" {
  description = "Resource ID of the user-assigned identity."
  value       = azurerm_user_assigned_identity.batch.id
}

output "storage_account_id" {
  description = "Resource ID of the auto-storage account linked to the Batch Account."
  value       = azurerm_storage_account.batch.id
}

How to use it

resource "azurerm_resource_group" "batch" {
  name     = "rg-render-batch-prod"
  location = "centralindia"
}

module "batch_account" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-batch?ref=v1.0.0"

  name                 = "kvrenderbatchprod"
  resource_group_name  = azurerm_resource_group.batch.name
  location             = azurerm_resource_group.batch.location
  storage_account_name = "stkvrenderbatchprod"

  # Keyless storage auth + nodes allocated by the Batch service (recommended).
  storage_authentication_mode = "BatchAccountManagedIdentity"
  pool_allocation_mode        = "BatchService"
  storage_replication_type    = "ZRS"

  # Customer-managed encryption at rest.
  customer_managed_key_id       = azurerm_key_vault_key.batch_cmk.id
  public_network_access_enabled = false

  tags = {
    workload    = "render-farm"
    environment = "prod"
    costcenter  = "media-7781"
  }
}

# Downstream: grant the Batch identity pull access to the ACR that holds
# container task images, using the module's exported principal ID.
resource "azurerm_role_assignment" "batch_acr_pull" {
  scope                = azurerm_container_registry.images.id
  role_definition_name = "AcrPull"
  principal_id         = module.batch_account.identity_principal_id
}

# Downstream: a Batch pool wired to the module's account and identity.
resource "azurerm_batch_pool" "render" {
  name                = "linux-render"
  resource_group_name = azurerm_resource_group.batch.name
  account_name        = module.batch_account.name
  display_name        = "Linux render pool"
  vm_size             = "Standard_D4s_v5"
  node_agent_sku_id   = "batch.node.ubuntu 22.04"

  identity {
    type         = "UserAssigned"
    identity_ids = [module.batch_account.identity_id]
  }

  storage_image_reference {
    publisher = "canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts"
    version   = "latest"
  }

  fixed_scale {
    target_dedicated_nodes = 4
  }
}
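
The example above sets public_network_access_enabled = false, which only makes sense if the account is reachable some other way, typically a private endpoint. A minimal sketch of that endpoint follows. The subnet variable and the resource names are hypothetical, private DNS zone wiring is left out, and the block reuses module.batch_account and azurerm_resource_group.batch from the example.

```hcl
# Hypothetical input: ID of an existing subnet that hosts private endpoints.
variable "private_endpoint_subnet_id" {
  description = "Subnet ID for the Batch private endpoint (assumption, not a module input)."
  type        = string
}

resource "azurerm_private_endpoint" "batch" {
  name                = "pe-${module.batch_account.name}"
  resource_group_name = azurerm_resource_group.batch.name
  location            = azurerm_resource_group.batch.location
  subnet_id           = var.private_endpoint_subnet_id

  private_service_connection {
    name                           = "psc-${module.batch_account.name}"
    private_connection_resource_id = module.batch_account.id
    subresource_names              = ["batchAccount"]
    is_manual_connection           = false
  }
}
```

Keeping the endpoint outside the module leaves network topology decisions (which VNet, which DNS zones) with the consuming team rather than baking them into the account module.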

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config, live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm state storage account/container + key per path...
  }
}

2. Module config, live/prod/batch/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-batch?ref=v1.0.0"
}

inputs = {
  name                 = "..."
  resource_group_name  = "..."
  location             = "..."
  storage_account_name = "..."
}

3. Deploy one environment, or roll out all modules together:

cd live/prod/batch && terragrunt apply        # this module
terragrunt run-all apply                      # run from live/prod: every module under it
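
Step 1 shows only the remote_state block. Since the root config is also where the provider is meant to be defined once, a generate block along these lines could sit next to it. The file name provider.tf and the bare features {} block are assumptions, not taken from the original root config.

```hcl
# live/terragrunt.hcl (addition to the root config; a sketch, not the original)
generate "provider" {
  path      = "provider.tf"          # assumption: generated file name
  if_exists = "overwrite_terragrunt" # only overwrite files Terragrunt generated
  contents  = <<EOF
provider "azurerm" {
  features {}
}
EOF
}
```

With this in place, each environment's terragrunt.hcl needs no provider block of its own.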

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
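
As a sketch of the per-environment override described above, a dev configuration might differ from prod only in its inputs. The path, names, and tag values below are hypothetical; the input keys are the module's own variables.

```hcl
# live/dev/batch/terragrunt.hcl (hypothetical path)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-batch?ref=v1.0.0"
}

inputs = {
  name                 = "kvrenderbatchdev"    # hypothetical
  resource_group_name  = "rg-render-batch-dev" # hypothetical
  location             = "centralindia"
  storage_account_name = "stkvrenderbatchdev"  # hypothetical

  # Cheaper replication for a non-production environment.
  storage_replication_type = "LRS"

  tags = {
    environment = "dev"
  }
}
```

The module source and version pin stay identical across environments, so promoting a module release is a one-line ref change per environment.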

Inputs

| Name | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | string | | Yes | Batch Account name; 3-24 lowercase alphanumeric chars, unique within the region. |
| resource_group_name | string | | Yes | Resource group for the account, identity, and storage. |
| location | string | | Yes | Azure region for all resources. |
| storage_account_name | string | | Yes | Auto-storage account name; 3-24 lowercase alphanumeric, globally unique. |
| storage_replication_type | string | "LRS" | No | Auto-storage replication: LRS, ZRS, GRS, or RAGRS. |
| storage_authentication_mode | string | "BatchAccountManagedIdentity" | No | Storage auth: keyless BatchAccountManagedIdentity or StorageKeys. |
| pool_allocation_mode | string | "BatchService" | No | BatchService or UserSubscription. Immutable after creation. |
| key_vault_id | string | null | No | Linked Key Vault resource ID; required for UserSubscription. |
| key_vault_url | string | null | No | Linked Key Vault URI; required for UserSubscription. |
| customer_managed_key_id | string | null | No | Versioned Key Vault key ID for CMK encryption at rest. |
| public_network_access_enabled | bool | true | No | Allow public access to the management endpoint. |
| tags | map(string) | {} | No | Tags applied to all created resources. |

Outputs

| Name | Description |
| --- | --- |
| id | Resource ID of the Batch Account. |
| name | Name of the Batch Account. |
| account_endpoint | Account endpoint used by the Batch SDK/CLI. |
| identity_principal_id | Principal (object) ID of the user-assigned identity for RBAC grants. |
| identity_client_id | Client ID of the user-assigned identity for pool/task identity references. |
| identity_id | Resource ID of the user-assigned identity. |
| storage_account_id | Resource ID of the linked auto-storage account. |
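
A short sketch of consuming these outputs from the calling configuration, assuming the module is instantiated as module.batch_account as in the usage example. The local and output names are hypothetical.

```hcl
locals {
  # account_endpoint is a bare hostname, so prefix the scheme for SDK/CLI use.
  batch_url = "https://${module.batch_account.account_endpoint}"
}

output "batch_account_url" {
  description = "Full HTTPS URL of the Batch Account endpoint."
  value       = local.batch_url
}

output "batch_identity_client_id" {
  description = "Client ID to hand to workloads that authenticate as the Batch identity."
  value       = module.batch_account.identity_client_id
}
```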

Enterprise scenario

A media post-production house runs nightly render bursts that spike from zero to roughly 2,000 vCPUs and pull frame assets and proprietary plugins from a private blob container and an ACR. They deploy this module per region with storage_authentication_mode = "BatchAccountManagedIdentity" so render nodes authenticate to storage and ACR through the user-assigned identity — no account keys ever land on a pool VM — and enable a customer-managed key from a hardened Key Vault to satisfy a studio client’s contractual encryption-at-rest clause. The exported identity_principal_id is fed straight into AcrPull and Storage Blob Data Reader role assignments, so onboarding a new render queue is a one-PR change rather than a ticket to the security team.
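
The read-only storage grant mentioned in this scenario could be sketched as follows, scoped to a single blob container. The assets storage account, its resource group, and the container name frames are hypothetical; only identity_principal_id comes from the module.

```hcl
# Hypothetical existing storage account that holds the frame assets.
data "azurerm_storage_account" "assets" {
  name                = "stkvrenderassets" # assumption
  resource_group_name = "rg-render-assets" # assumption
}

resource "azurerm_role_assignment" "batch_assets_read" {
  # Scope the grant to one container rather than the whole account.
  scope                = "${data.azurerm_storage_account.assets.id}/blobServices/default/containers/frames"
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = module.batch_account.identity_principal_id
}
```

Scoping at the container level keeps the render identity from reading unrelated data in the same storage account.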

Best practices
