Quick take — Reusable hashicorp/azurerm ~> 4.0 module for Azure Batch Account: user-assigned identity, linked storage with managed auth, batch-pool allocation mode, and customer-managed key encryption for production HPC and rendering workloads. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
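provider "azurerm" {
features {}
# azurerm 4.x requires a subscription ID: set subscription_id here or export ARM_SUBSCRIPTION_ID.
# The module disables storage shared keys by default, so let the provider
# use Entra ID when it talks to the storage data plane.
storage_use_azuread = true
}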
module "batch" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-batch?ref=v1.0.0"
name = "..." # Batch Account name; 3-24 lowercase alphanumeric chars, …
resource_group_name = "..." # Resource group for the account, identity, and storage.
location = "..." # Azure region for all resources.
storage_account_name = "..." # Auto-storage account name; 3-24 lowercase alphanumeric,…
}
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
What this module is
Azure Batch is a managed job-scheduling and compute-pool service for running large-scale parallel and high-performance computing (HPC) workloads — render farms, Monte Carlo risk simulations, genomics pipelines, media transcoding — without standing up and babysitting your own scheduler. The control-plane object that everything hangs off is the Batch Account: it owns the pools (the VM fleets), the jobs, and the tasks, and it decides how compute is allocated.
That last point is the one that trips people up. A Batch Account runs in one of two pool_allocation_mode values, and the choice is immutable for the life of the account:
- BatchService: Azure owns the underlying subscription that hosts pool VMs. Simpler, the default, and the right call for most workloads.
- UserSubscription: pool VMs are deployed into your subscription, which lets you hit higher core quotas, use reserved instances, and apply Azure Policy to the compute. This mode requires a linked Key Vault and a specific service-principal grant.
Wrapping this in a Terraform module matters because a correct Batch Account is never just the azurerm_batch_account resource on its own. In production you almost always pair it with a storage account (for application packages, task output, and auto-storage), a user-assigned managed identity (so pools and tasks authenticate to storage/ACR/Key Vault without secret sprawl), and frequently a customer-managed key (CMK) for encryption-at-rest. This module wires those four things together with sane defaults, validation, and outputs, so a consuming team gets a compliant account from a five-line module block instead of re-deriving the storage-auth and identity plumbing every time.
When to use it
- You run bursty, embarrassingly-parallel compute (rendering, simulation, encoding, batch ETL) and want a managed pool scheduler instead of hand-rolling VMSS + a queue.
- You need storage-keyless task auth — pools and tasks reach blob storage, ACR, or Key Vault via a user-assigned identity rather than embedded account keys.
- You are standardizing Batch across many teams and want a single audited module that bakes in CMK encryption, identity-based auto-storage, and public-network controls.
- You require UserSubscription allocation to use your own core quota, spot/reserved VMs, or VNet-injected pools, and want the Key Vault link and identity created and associated correctly.
- When not to reach for it: one-off container jobs or cron-style work fit Azure Container Apps Jobs or AKS Jobs far better. Batch shines at scale (hundreds to thousands of cores) and at workloads with explicit task graphs.
Module structure
terraform-module-azure-batch/
├── versions.tf # provider + Terraform version pins
├── main.tf # identity, storage, batch account, CMK encryption
├── variables.tf # var-driven inputs with validation
└── outputs.tf # account id/name, endpoint, identity, storage id
versions.tf
terraform {
required_version = ">= 1.5.0"
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "~> 4.0"
}
}
}
main.tf
# ---------------------------------------------------------------------------
# User-assigned identity used by the Batch Account, its auto-storage auth,
# and (optionally) by pools/tasks to authenticate to storage/ACR/Key Vault.
# ---------------------------------------------------------------------------
resource "azurerm_user_assigned_identity" "batch" {
name = "id-${var.name}"
resource_group_name = var.resource_group_name
location = var.location
tags = var.tags
}
# ---------------------------------------------------------------------------
# Auto-storage account: holds application packages, task resource files,
# and task output. Identity-based auth keeps account keys out of the picture.
# ---------------------------------------------------------------------------
resource "azurerm_storage_account" "batch" {
name = var.storage_account_name
resource_group_name = var.resource_group_name
location = var.location
account_tier = "Standard"
account_replication_type = var.storage_replication_type
account_kind = "StorageV2"
min_tls_version = "TLS1_2"
https_traffic_only_enabled = true
allow_nested_items_to_be_public = false
shared_access_key_enabled = var.storage_authentication_mode == "StorageKeys"
tags = var.tags
}
# Let the Batch identity read/write the auto-storage account when auth is
# identity-based (the recommended, keyless mode).
resource "azurerm_role_assignment" "batch_storage" {
count = var.storage_authentication_mode == "BatchAccountManagedIdentity" ? 1 : 0
scope = azurerm_storage_account.batch.id
role_definition_name = "Storage Blob Data Contributor"
principal_id = azurerm_user_assigned_identity.batch.principal_id
}
# ---------------------------------------------------------------------------
# Batch Account
# ---------------------------------------------------------------------------
resource "azurerm_batch_account" "this" {
name = var.name
resource_group_name = var.resource_group_name
location = var.location
pool_allocation_mode = var.pool_allocation_mode
# How the Batch service authenticates to the auto-storage account:
# "BatchAccountManagedIdentity" (keyless, recommended) or "StorageKeys".
storage_account_authentication_mode = var.storage_authentication_mode
storage_account_id = azurerm_storage_account.batch.id
storage_account_node_identity = (
var.storage_authentication_mode == "BatchAccountManagedIdentity"
? azurerm_user_assigned_identity.batch.id
: null
)
# Controls public network access to the account; set the variable to false
# when fronting the account with a private endpoint.
public_network_access_enabled = var.public_network_access_enabled
identity {
type = "UserAssigned"
identity_ids = [azurerm_user_assigned_identity.batch.id]
}
# UserSubscription mode allocates pool VMs into the caller's subscription
# and therefore requires a linked Key Vault.
dynamic "key_vault_reference" {
for_each = var.pool_allocation_mode == "UserSubscription" ? [1] : []
content {
id = var.key_vault_id
url = var.key_vault_url
}
}
# Customer-managed key for encryption at rest (BatchService mode).
dynamic "encryption" {
for_each = var.customer_managed_key_id != null ? [1] : []
content {
key_vault_key_id = var.customer_managed_key_id
}
}
tags = var.tags
depends_on = [azurerm_role_assignment.batch_storage]
}
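The comment on public_network_access_enabled mentions a private endpoint, but the module as listed does not create one. The fragment below is a minimal sketch of how one could be added inside the module. The variable private_endpoint_subnet_id and the resource names are hypothetical and not part of the module above; private DNS integration is deliberately left out.

```hcl
# Hypothetical extra input: subnet that will host the private endpoint.
variable "private_endpoint_subnet_id" {
  description = "Resource ID of the subnet for the Batch private endpoint (assumed input, not in the original module)."
  type        = string
}

# Private endpoint targeting the Batch Account created in main.tf.
resource "azurerm_private_endpoint" "batch" {
  name                = "pe-${var.name}"
  resource_group_name = var.resource_group_name
  location            = var.location
  subnet_id           = var.private_endpoint_subnet_id
  tags                = var.tags

  private_service_connection {
    name                           = "psc-${var.name}"
    private_connection_resource_id = azurerm_batch_account.this.id
    subresource_names              = ["batchAccount"] # account endpoint sub-resource
    is_manual_connection           = false
  }
}
```

Clients inside the virtual network still need DNS to resolve the account endpoint to the private IP, so a real deployment would pair this with a private DNS zone and a virtual network link.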
variables.tf
variable "name" {
description = "Name of the Batch Account. Lowercase letters and numbers only, 3-24 chars, unique within the Azure region."
type = string
validation {
condition = can(regex("^[a-z0-9]{3,24}$", var.name))
error_message = "name must be 3-24 characters, lowercase letters and numbers only (no hyphens)."
}
}
variable "resource_group_name" {
description = "Name of the resource group that holds the Batch Account, identity, and storage."
type = string
}
variable "location" {
description = "Azure region for all resources (e.g. centralindia, southeastasia)."
type = string
}
variable "storage_account_name" {
description = "Name of the auto-storage account. Lowercase letters and numbers, 3-24 chars, globally unique."
type = string
validation {
condition = can(regex("^[a-z0-9]{3,24}$", var.storage_account_name))
error_message = "storage_account_name must be 3-24 characters, lowercase letters and numbers only."
}
}
variable "storage_replication_type" {
description = "Replication for the auto-storage account."
type = string
default = "LRS"
validation {
condition = contains(["LRS", "ZRS", "GRS", "RAGRS"], var.storage_replication_type)
error_message = "storage_replication_type must be one of LRS, ZRS, GRS, RAGRS."
}
}
variable "storage_authentication_mode" {
description = "How Batch authenticates to the auto-storage account. BatchAccountManagedIdentity is keyless and recommended."
type = string
default = "BatchAccountManagedIdentity"
validation {
condition = contains(["BatchAccountManagedIdentity", "StorageKeys"], var.storage_authentication_mode)
error_message = "storage_authentication_mode must be BatchAccountManagedIdentity or StorageKeys."
}
}
variable "pool_allocation_mode" {
description = "Pool allocation mode. IMMUTABLE after creation. UserSubscription requires key_vault_id/url."
type = string
default = "BatchService"
validation {
condition = contains(["BatchService", "UserSubscription"], var.pool_allocation_mode)
error_message = "pool_allocation_mode must be BatchService or UserSubscription."
}
}
variable "key_vault_id" {
description = "Resource ID of the Key Vault to link. REQUIRED when pool_allocation_mode is UserSubscription."
type = string
default = null
}
variable "key_vault_url" {
description = "URI (vault_uri) of the linked Key Vault. REQUIRED when pool_allocation_mode is UserSubscription."
type = string
default = null
}
variable "customer_managed_key_id" {
description = "Versioned Key Vault key ID for customer-managed encryption at rest. Null uses Microsoft-managed keys."
type = string
default = null
}
variable "public_network_access_enabled" {
description = "Allow public access to the Batch management endpoint. Set false when using a private endpoint."
type = bool
default = true
}
variable "tags" {
description = "Tags applied to every resource created by this module."
type = map(string)
default = {}
}
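# Guard: UserSubscription mode must supply a Key Vault link.
# Variable validation can only reference other variables from Terraform 1.9,
# so a precondition on a no-op resource keeps this working on >= 1.5.0.
resource "terraform_data" "validate_user_subscription" {
lifecycle {
precondition {
condition = var.pool_allocation_mode != "UserSubscription" || (var.key_vault_id != null && var.key_vault_url != null)
error_message = "key_vault_id and key_vault_url must both be set when pool_allocation_mode is UserSubscription."
}
}
}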
outputs.tf
output "id" {
description = "Resource ID of the Batch Account."
value = azurerm_batch_account.this.id
}
output "name" {
description = "Name of the Batch Account."
value = azurerm_batch_account.this.name
}
output "account_endpoint" {
description = "Account endpoint (e.g. myaccount.centralindia.batch.azure.com) used by the Batch SDK/CLI."
value = azurerm_batch_account.this.account_endpoint
}
output "identity_principal_id" {
description = "Principal (object) ID of the user-assigned identity — grant it RBAC on ACR, Key Vault, or storage."
value = azurerm_user_assigned_identity.batch.principal_id
}
output "identity_client_id" {
description = "Client ID of the user-assigned identity, for pool/task identity references."
value = azurerm_user_assigned_identity.batch.client_id
}
output "identity_id" {
description = "Resource ID of the user-assigned identity."
value = azurerm_user_assigned_identity.batch.id
}
output "storage_account_id" {
description = "Resource ID of the auto-storage account linked to the Batch Account."
value = azurerm_storage_account.batch.id
}
How to use it
resource "azurerm_resource_group" "batch" {
name = "rg-render-batch-prod"
location = "centralindia"
}
module "batch_account" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-batch?ref=v1.0.0"
name = "kvrenderbatchprod"
resource_group_name = azurerm_resource_group.batch.name
location = azurerm_resource_group.batch.location
storage_account_name = "stkvrenderbatchprod"
# Keyless auto-storage auth, with pool VMs allocated by the Batch service (recommended).
storage_authentication_mode = "BatchAccountManagedIdentity"
pool_allocation_mode = "BatchService"
storage_replication_type = "ZRS"
# Customer-managed encryption at rest; azurerm_key_vault_key.batch_cmk is
# assumed to be defined elsewhere in your configuration.
customer_managed_key_id = azurerm_key_vault_key.batch_cmk.id
public_network_access_enabled = false
tags = {
workload = "render-farm"
environment = "prod"
costcenter = "media-7781"
}
}
# Downstream: grant the Batch identity pull access to the ACR that holds
# container task images, using the module's exported principal ID.
resource "azurerm_role_assignment" "batch_acr_pull" {
scope = azurerm_container_registry.images.id
role_definition_name = "AcrPull"
principal_id = module.batch_account.identity_principal_id
}
# Downstream: a Batch pool wired to the module's account and identity.
resource "azurerm_batch_pool" "render" {
name = "linux-render"
resource_group_name = azurerm_resource_group.batch.name
account_name = module.batch_account.name
display_name = "Linux render pool"
vm_size = "Standard_D4s_v5"
node_agent_sku_id = "batch.node.ubuntu 22.04"
identity {
type = "UserAssigned"
identity_ids = [module.batch_account.identity_id]
}
storage_image_reference {
publisher = "canonical"
offer = "0001-com-ubuntu-server-jammy"
sku = "22_04-lts"
version = "latest"
}
fixed_scale {
target_dedicated_nodes = 4
}
}
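The example above uses BatchService mode. For the UserSubscription mode described earlier, the module expects key_vault_id and key_vault_url, and the Batch service itself needs grants that the module does not create. The sketch below shows one way to wire that up. It assumes the azuread provider is configured, that the first-party service principal is listed under the display name "Microsoft Azure Batch" in your tenant, that the vault uses RBAC authorization, and that the vault and account names are placeholders invented for illustration.

```hcl
data "azurerm_subscription" "current" {}

# Existing Key Vault to link (hypothetical name).
data "azurerm_key_vault" "batch" {
  name                = "kv-batch-usersub"
  resource_group_name = azurerm_resource_group.batch.name
}

# First-party Batch service principal; the display name is an assumption
# and may differ in your tenant.
data "azuread_service_principal" "batch_service" {
  display_name = "Microsoft Azure Batch"
}

# Allow the Batch service to allocate pool VMs in your subscription.
# Contributor is the long-documented grant; a narrower role may suit you better.
resource "azurerm_role_assignment" "batch_service_subscription" {
  scope                = data.azurerm_subscription.current.id
  role_definition_name = "Contributor"
  principal_id         = data.azuread_service_principal.batch_service.object_id
}

# Allow the Batch service to manage secrets in the linked vault.
resource "azurerm_role_assignment" "batch_service_vault" {
  scope                = data.azurerm_key_vault.batch.id
  role_definition_name = "Key Vault Secrets Officer"
  principal_id         = data.azuread_service_principal.batch_service.object_id
}

module "batch_user_subscription" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-batch?ref=v1.0.0"

  name                 = "kvsimbatchprod"   # hypothetical
  resource_group_name  = azurerm_resource_group.batch.name
  location             = azurerm_resource_group.batch.location
  storage_account_name = "stkvsimbatchprod" # hypothetical

  pool_allocation_mode = "UserSubscription"
  key_vault_id         = data.azurerm_key_vault.batch.id
  key_vault_url        = data.azurerm_key_vault.batch.vault_uri

  # Create the account only after the service grants exist.
  depends_on = [
    azurerm_role_assignment.batch_service_subscription,
    azurerm_role_assignment.batch_service_vault,
  ]
}
```

Because pool_allocation_mode is immutable, it is worth running terraform plan on this variant in a non-production subscription before committing to it.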
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
backend = "azurerm"
generate = { path = "backend.tf", if_exists = "overwrite" }
config = {
# ...azurerm state storage account/container + key per path...
}
}
2. Module config — live/prod/batch/terragrunt.hcl:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-batch?ref=v1.0.0"
}
inputs = {
name = "..."
resource_group_name = "..."
location = "..."
storage_account_name = "..."
}
3. Deploy one environment, or roll out all modules together:
cd live/prod/batch && terragrunt apply # this module only
cd live/prod && terragrunt run-all apply # or, from the repo root: every module under live/prod
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
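The root config in step 1 leaves the backend settings elided and shows no provider block, although the text says both live in the root. The following is a sketch of what a fuller live/terragrunt.hcl might look like. The resource group, storage account, and container names are placeholders invented for illustration, not values from this module.

```hcl
# live/terragrunt.hcl (sketch; all backend names below are hypothetical)
remote_state {
  backend  = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    resource_group_name  = "rg-tfstate"       # hypothetical
    storage_account_name = "sttfstateexample" # hypothetical
    container_name       = "tfstate"          # hypothetical
    # One state blob per module directory, e.g. prod/batch/terraform.tfstate
    key = "${path_relative_to_include()}/terraform.tfstate"
  }
}

# Write the same provider block into every module's working directory.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite"
  contents  = <<EOF
provider "azurerm" {
  features {}
  storage_use_azuread = true
}
EOF
}
```

With azurerm 4.x the subscription still has to come from somewhere; exporting ARM_SUBSCRIPTION_ID per environment keeps it out of the generated file.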
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| name | string | n/a | Yes | Batch Account name; 3-24 lowercase alphanumeric chars, unique within the Azure region. |
| resource_group_name | string | n/a | Yes | Resource group for the account, identity, and storage. |
| location | string | n/a | Yes | Azure region for all resources. |
| storage_account_name | string | n/a | Yes | Auto-storage account name; 3-24 lowercase alphanumeric, globally unique. |
| storage_replication_type | string | "LRS" | No | Auto-storage replication: LRS, ZRS, GRS, or RAGRS. |
| storage_authentication_mode | string | "BatchAccountManagedIdentity" | No | Storage auth: keyless BatchAccountManagedIdentity or StorageKeys. |
| pool_allocation_mode | string | "BatchService" | No | BatchService or UserSubscription. Immutable after creation. |
| key_vault_id | string | null | No | Linked Key Vault resource ID; required for UserSubscription. |
| key_vault_url | string | null | No | Linked Key Vault URI; required for UserSubscription. |
| customer_managed_key_id | string | null | No | Versioned Key Vault key ID for CMK encryption at rest. |
| public_network_access_enabled | bool | true | No | Allow public access to the management endpoint. |
| tags | map(string) | {} | No | Tags applied to all created resources. |
Outputs
| Name | Description |
|---|---|
| id | Resource ID of the Batch Account. |
| name | Name of the Batch Account. |
| account_endpoint | Account endpoint used by the Batch SDK/CLI. |
| identity_principal_id | Principal (object) ID of the user-assigned identity for RBAC grants. |
| identity_client_id | Client ID of the user-assigned identity for pool/task identity references. |
| identity_id | Resource ID of the user-assigned identity. |
| storage_account_id | Resource ID of the linked auto-storage account. |
Enterprise scenario
A media post-production house runs nightly render bursts that spike from zero to roughly 2,000 vCPUs and pull frame assets and proprietary plugins from a private blob container and an ACR. They deploy this module per region with storage_authentication_mode = "BatchAccountManagedIdentity" so render nodes authenticate to storage and ACR through the user-assigned identity — no account keys ever land on a pool VM — and enable a customer-managed key from a hardened Key Vault to satisfy a studio client’s contractual encryption-at-rest clause. The exported identity_principal_id is fed straight into AcrPull and Storage Blob Data Reader role assignments, so onboarding a new render queue is a one-PR change rather than a ticket to the security team.
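A burst from zero to roughly 2,000 vCPUs maps naturally onto an autoscale pool rather than the fixed-size pool shown in the usage section. The sketch below is one possible formula, not the studio's actual configuration. It assumes Standard_D4s_v5 nodes (4 vCPUs each, so 500 nodes is about 2,000 vCPUs), roughly one pending task per node, and reuses the resource group and module names from the usage example.

```hcl
resource "azurerm_batch_pool" "render_autoscale" {
  name                = "linux-render-autoscale"
  resource_group_name = azurerm_resource_group.batch.name
  account_name        = module.batch_account.name
  display_name        = "Linux render pool (autoscale)"
  vm_size             = "Standard_D4s_v5" # 4 vCPUs per node
  node_agent_sku_id   = "batch.node.ubuntu 22.04"

  identity {
    type         = "UserAssigned"
    identity_ids = [module.batch_account.identity_id]
  }

  storage_image_reference {
    publisher = "canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts"
    version   = "latest"
  }

  auto_scale {
    evaluation_interval = "PT5M" # re-evaluate the formula every 5 minutes

    # Scale dedicated nodes to the average pending-task count over the last
    # 3 minutes, capped at 500 nodes; fall back to 0 when fewer than 70% of
    # samples are available. Nodes are removed only after their tasks finish.
    formula = <<-EOT
      maxNodes = 500;
      samplePct = $PendingTasks.GetSamplePercent(180 * TimeInterval_Second);
      pending = samplePct < 70 ? 0 : avg($PendingTasks.GetSample(180 * TimeInterval_Second));
      $TargetDedicatedNodes = min(maxNodes, pending);
      $NodeDeallocationOption = taskcompletion;
    EOT
  }
}
```

The cap and the sampling window are the two knobs most worth tuning: the cap bounds spend, and a longer window smooths out short queue spikes at the cost of slower scale-up.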
Best practices
- Prefer keyless auth end-to-end. Use BatchAccountManagedIdentity for storage and attach the user-assigned identity to every pool; in that mode the module sets shared_access_key_enabled = false on the storage account, so leaked account keys simply do not exist. Grant the identity least-privilege RBAC (Storage Blob Data Reader/Contributor, AcrPull) rather than broad roles.
- Decide pool_allocation_mode once, deliberately. It cannot be changed in place; switching forces a destroy/recreate that wipes pools and jobs. Choose UserSubscription up front if you need your own core quota, spot VMs, or VNet-injected nodes, and provision the linked Key Vault in the same plan.
- Control cost at the pool, not the account. The account is free; cores are not. Use low-priority/spot nodes and autoscale formulas for interruption-tolerant work, set target_dedicated_nodes conservatively, and tag with a costcenter so render/simulation spend can be charged back to the requesting team.
- Lock the network down for production. Set public_network_access_enabled = false and front the account with a private endpoint; keep pools on a delegated subnet and route storage/ACR traffic over private endpoints so frame data never traverses the public internet.
- Use CMK only when you can operate it. A customer-managed key gives you key rotation and revocation control, but a deleted or soft-deleted key bricks the account. Enable Key Vault purge protection, soft delete, and a rotation policy before pointing customer_managed_key_id at it.
- Keep naming deterministic and region-aware. Batch Account names must be unique within a region and storage account names must be globally unique, and neither allows hyphens, so encode workload + environment + region into a short token (e.g. kvrenderbatchprod) and let the module derive id-<name> for the identity to keep resources greppable.
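The CMK prerequisites in the list above (purge protection, soft delete, a rotation policy, and a least-privilege grant) can be sketched as follows. This is an illustration under assumptions, not part of the module: the vault name is a placeholder, the vault is assumed to use RBAC authorization, the deploying principal is assumed to have rights to create keys in it, and the resource names match the azurerm_key_vault_key.batch_cmk reference used in the usage example.

```hcl
data "azurerm_client_config" "current" {}

resource "azurerm_key_vault" "cmk" {
  name                = "kv-batch-cmk-prod" # hypothetical
  resource_group_name = azurerm_resource_group.batch.name
  location            = azurerm_resource_group.batch.location
  tenant_id           = data.azurerm_client_config.current.tenant_id
  sku_name            = "premium"

  # Attribute name as used by early azurerm 4.x releases; newer releases
  # may prefer a renamed equivalent.
  enable_rbac_authorization = true

  # Guard rails so a deleted key can be recovered instead of bricking the account.
  purge_protection_enabled   = true
  soft_delete_retention_days = 90
}

resource "azurerm_key_vault_key" "batch_cmk" {
  name         = "batch-cmk"
  key_vault_id = azurerm_key_vault.cmk.id
  key_type     = "RSA"
  key_size     = 2048
  key_opts     = ["wrapKey", "unwrapKey"]

  rotation_policy {
    automatic {
      time_before_expiry = "P30D" # rotate 30 days before expiry
    }
    expire_after         = "P90D"
    notify_before_expiry = "P29D"
  }
}

# Let the Batch identity read, wrap, and unwrap with the key, and nothing else.
resource "azurerm_role_assignment" "batch_cmk" {
  scope                = azurerm_key_vault.cmk.id
  role_definition_name = "Key Vault Crypto Service Encryption User"
  principal_id         = module.batch_account.identity_principal_id
}
```

Two caveats apply. The module creates the identity and the account in the same apply, so the grant above is not guaranteed to exist before the account first touches the key; a first apply may need a targeted run for the identity and role assignment, or a retry. Also, .id on the key is a versioned ID, matching the module's variable description, so the account stays pinned to that version after rotation unless the value is updated; .versionless_id is the alternative if you want the account to follow rotations.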