Terraform Module: Azure Availability Set — Pin VM Fault & Update Domains for In-Region Resilience

Quick take — A reusable hashicorp/azurerm 4.x Terraform module for azurerm_availability_set: validated fault/update domain counts, managed-disk alignment, and proximity placement wiring for resilient IaaS VMs. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

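provider "azurerm" {
  features {}
  # azurerm 4.x needs a subscription ID: set subscription_id here
  # or export ARM_SUBSCRIPTION_ID before running Terraform.
}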

module "availability_set" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-availability-set?ref=v1.0.0"

  name                = "..."  # Name of the availability set (1-80 chars; starts with l…
  resource_group_name = "..."  # Resource group to create the availability set in.
  location            = "..."  # Azure region; must match the region of the VMs that joi…
}

Then run terraform init && terraform apply. Every other input has a sensible default; see Inputs below to override behaviour.

What this module is

An Azure Availability Set is a logical grouping that spreads two or more IaaS virtual machines across fault domains (separate physical racks with independent power and network) and update domains (groups patched and rebooted separately during planned host maintenance). It is the in-datacenter resiliency primitive that backs the single-region 99.95% VM SLA — it protects you from a top-of-rack switch failure or a host OS update bringing down every VM in a tier at once. Unlike Availability Zones, an availability set lives inside a single datacenter and costs nothing extra; you pay only for the VMs you place in it.

Wrapping azurerm_availability_set in a reusable module matters because the resource has sharp, easy-to-miss constraints. The fault domain count is capped per region (often 2, sometimes 3). managed = true must be set so VM managed disks are also fault-domain aligned. And the set is effectively immutable: a VM can join a set only when the VM is created, so you cannot move existing VMs into or out of a set, and changing the domain counts forces Terraform to replace the set. A module bakes in validated domain counts, a managed flag that defaults to true, consistent tagging, and an optional proximity placement group reference, so every VM tier across every subscription gets the same correct, audited configuration instead of a hand-typed resource block that silently degrades resiliency.

When to use it

You run classic IaaS workloads (domain controllers, SQL Server FCI nodes, legacy app tiers, appliance VMs) in a region that does not offer Availability Zones, where an availability set is the only intra-region resiliency option.
You need to meet the 99.95% single-region VM SLA, which requires two or more VMs of the same role in one availability set.
You are placing multiple VMs behind a Basic or Standard Load Balancer / Application Gateway backend pool and want them spread across racks and update waves.
You want VMs co-located for low latency (via a proximity placement group) while still keeping fault/update domain separation.
You must support a lift-and-shift or vendor-pinned topology that cannot move to Virtual Machine Scale Sets in Flexible orchestration mode (which supersedes availability sets for most greenfield designs).

If your region supports zones and the workload is greenfield, prefer Availability Zones or a Flexible VMSS instead — an availability set cannot span zones.

Module structure

terraform-module-azure-availability-set/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

main.tf

locals {
  # An availability set only provides its SLA with 2+ VMs and 2+ fault domains.
  # Surface a single, normalized tag set so every set is auditable.
  # var.tags is merged last, so caller-supplied tags win on a key collision.
  base_tags = merge(
    {
      "managed-by"      = "terraform"
      "module"          = "terraform-module-azure-availability-set"
      "resilience-tier" = "availability-set"
    },
    var.tags
  )
}

resource "azurerm_availability_set" "this" {
  name                = var.name
  resource_group_name = var.resource_group_name
  location            = var.location

  # Fault domains = independent racks; update domains = independent patch/reboot waves.
  platform_fault_domain_count  = var.platform_fault_domain_count
  platform_update_domain_count = var.platform_update_domain_count

  # Keep true (the default) so VM managed disks are aligned to the fault domains.
  # false is the legacy unmanaged mode and breaks managed-disk VMs.
  managed = var.managed

  # Optional co-location for low-latency inter-VM traffic.
  proximity_placement_group_id = var.proximity_placement_group_id

  tags = local.base_tags
}

variables.tf

variable "name" {
  description = "Name of the availability set (1-80 chars; letters, numbers, underscores, periods, hyphens; must start with a letter or number and end with a letter, number, or underscore)."
  type        = string

  validation {
    # First char alphanumeric; for longer names, last char alphanumeric or '_'.
    condition     = can(regex("^[a-zA-Z0-9]([a-zA-Z0-9._-]{0,78}[a-zA-Z0-9_])?$", var.name))
    error_message = "name must be 1-80 chars, start with a letter or number, end with a letter, number, or '_', and contain only letters, numbers, '.', '_' or '-'."
  }
}

variable "resource_group_name" {
  description = "Name of the resource group in which to create the availability set."
  type        = string
}

variable "location" {
  description = "Azure region for the availability set (e.g. 'centralindia'). Must match the region of the VMs that will join it."
  type        = string
}

variable "platform_fault_domain_count" {
  description = "Number of fault domains (independent racks). Region-capped, commonly 2 (some regions allow 3). Use 2 for the broadest compatibility."
  type        = number
  default     = 2

  validation {
    condition     = var.platform_fault_domain_count >= 1 && var.platform_fault_domain_count <= 3
    error_message = "platform_fault_domain_count must be between 1 and 3; most regions cap it at 2."
  }
}

variable "platform_update_domain_count" {
  description = "Number of update domains (independent patch/reboot waves). Valid range is 1-20; 5 is a sensible default that balances spread and density."
  type        = number
  default     = 5

  validation {
    condition     = var.platform_update_domain_count >= 1 && var.platform_update_domain_count <= 20
    error_message = "platform_update_domain_count must be between 1 and 20."
  }
}

variable "managed" {
  description = "Align managed disks to the fault domains. Keep true for any modern managed-disk VM; false is legacy/unmanaged only."
  type        = bool
  default     = true
}

variable "proximity_placement_group_id" {
  description = "Optional resource ID of a proximity placement group to co-locate the VMs for low latency. Set null to disable."
  type        = string
  default     = null

  validation {
    condition     = var.proximity_placement_group_id == null || can(regex("/proximityPlacementGroups/", var.proximity_placement_group_id))
    error_message = "proximity_placement_group_id must be a valid proximity placement group resource ID or null."
  }
}

variable "tags" {
  description = "Additional tags merged with the module's base tags."
  type        = map(string)
  default     = {}
}
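
The static 1-3 validation on platform_fault_domain_count cannot know the cap of the region actually being targeted. One possible way to fail at plan time instead of at deploy time is a lifecycle precondition on the resource, fed by a caller-supplied map of region caps. This is a sketch under stated assumptions, not part of the module as published: region_max_fault_domains is a hypothetical input, and its values must come from your own check of each region, since none are asserted here.

```hcl
# Hypothetical extra input (NOT part of the published module).
variable "region_max_fault_domains" {
  description = "Caller-verified fault domain cap per region, keyed by region name."
  type        = map(number)
  default     = {}
}

# Same resource as in main.tf, with a precondition added.
resource "azurerm_availability_set" "this" {
  name                = var.name
  resource_group_name = var.resource_group_name
  location            = var.location

  platform_fault_domain_count  = var.platform_fault_domain_count
  platform_update_domain_count = var.platform_update_domain_count
  managed                      = var.managed

  proximity_placement_group_id = var.proximity_placement_group_id

  tags = local.base_tags

  lifecycle {
    precondition {
      # Regions missing from the map fall back to 3, the variable validation's upper bound.
      condition     = var.platform_fault_domain_count <= lookup(var.region_max_fault_domains, var.location, 3)
      error_message = "platform_fault_domain_count exceeds the cap recorded for this region in region_max_fault_domains."
    }
  }
}
```

A precondition is used here because it can reference more than one variable on the module's minimum Terraform version; a variable validation block that refers to other variables needs a newer Terraform release.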

outputs.tf

output "id" {
  description = "Resource ID of the availability set; assign to a VM's availability_set_id."
  value       = azurerm_availability_set.this.id
}

output "name" {
  description = "Name of the availability set."
  value       = azurerm_availability_set.this.name
}

output "location" {
  description = "Region of the availability set; VMs must be deployed in the same region."
  value       = azurerm_availability_set.this.location
}

output "platform_fault_domain_count" {
  description = "Effective fault domain count applied to the set."
  value       = azurerm_availability_set.this.platform_fault_domain_count
}

output "platform_update_domain_count" {
  description = "Effective update domain count applied to the set."
  value       = azurerm_availability_set.this.platform_update_domain_count
}
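
The variable validations above can be exercised without touching Azure by using Terraform's native test framework. The file below is a minimal sketch, assuming Terraform 1.7 or later (newer than the module's stated minimum, because it relies on mock_provider) and a hypothetical file path of tests/validation.tftest.hcl inside the module directory. The test values are illustrative only.

```hcl
# tests/validation.tftest.hcl (hypothetical path; assumes Terraform >= 1.7)

# Mock the provider so no Azure credentials or subscription are needed.
mock_provider "azurerm" {}

# Required inputs shared by every run block.
variables {
  name                = "avail-test"
  resource_group_name = "rg-test"
  location            = "centralindia"
}

run "accepts_defaults" {
  command = plan

  assert {
    condition     = azurerm_availability_set.this.platform_fault_domain_count == 2
    error_message = "Expected the default fault domain count of 2."
  }
}

run "rejects_four_fault_domains" {
  command = plan

  variables {
    platform_fault_domain_count = 4
  }

  # The run passes only if this variable's validation fails.
  expect_failures = [var.platform_fault_domain_count]
}

run "rejects_name_starting_with_hyphen" {
  command = plan

  variables {
    name = "-bad"
  }

  expect_failures = [var.name]
}
```

Running terraform test from the module directory would execute the three runs in order.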

How to use it

resource "azurerm_resource_group" "app" {
  name     = "rg-app-sql-prod"
  location = "centralindia"
}

module "availability_set" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-availability-set?ref=v1.0.0"

  name                = "avail-sql-prod"
  resource_group_name = azurerm_resource_group.app.name
  location            = azurerm_resource_group.app.location

  platform_fault_domain_count  = 2
  platform_update_domain_count = 5
  managed                      = true

  tags = {
    workload    = "sql-fci"
    environment = "prod"
  }
}

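# Downstream: place both SQL FCI nodes into the set using its output.
# Assumes azurerm_network_interface.sql (count = 2) and var.sql_admin_password
# (a sensitive string variable) are declared elsewhere in the root module.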
resource "azurerm_windows_virtual_machine" "sql" {
  count               = 2
  name                = "vm-sql-prod-${count.index}"
  resource_group_name = azurerm_resource_group.app.name
  location            = azurerm_resource_group.app.location
  size                = "Standard_D4s_v5"
  admin_username      = "sqladmin"
  admin_password      = var.sql_admin_password
  availability_set_id = module.availability_set.id # <-- module output wires the VM in

  network_interface_ids = [azurerm_network_interface.sql[count.index].id]

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Premium_LRS"
  }

  source_image_reference {
    publisher = "MicrosoftSQLServer"
    offer     = "sql2022-ws2022"
    sku       = "enterprise-gen2"
    version   = "latest"
  }
}
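
The example above leaves proximity_placement_group_id at its null default. A variation that wires in the optional proximity placement group might look like the sketch below. The resource name ppg-sql-prod and the module label availability_set_colocated are illustrative, and the note about VMs also referencing the group is an assumption to verify against the Azure documentation for your deployment.

```hcl
# Proximity placement group in the same resource group and region as the set.
resource "azurerm_proximity_placement_group" "sql" {
  name                = "ppg-sql-prod" # hypothetical name
  resource_group_name = azurerm_resource_group.app.name
  location            = azurerm_resource_group.app.location
}

module "availability_set_colocated" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-availability-set?ref=v1.0.0"

  name                = "avail-sql-prod"
  resource_group_name = azurerm_resource_group.app.name
  location            = azurerm_resource_group.app.location

  # The resource ID contains "/proximityPlacementGroups/", which is what
  # the module's validation on this input checks for.
  proximity_placement_group_id = azurerm_proximity_placement_group.sql.id
}

# Assumption: each VM that joins the set also sets its own
# proximity_placement_group_id argument to the same ID, alongside
# availability_set_id = module.availability_set_colocated.id.
```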

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config — live/terragrunt.hcl (inherited by every module):

remote_state {
  backend  = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm storage account/container + state key per path...
  }
}

2. Module config — live/prod/availability_set/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-availability-set?ref=v1.0.0"
}

inputs = {
  name                = "..."
  resource_group_name = "..."
  location            = "..."
}
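
Instead of hard-coding the resource group, the module config can read it from another Terragrunt unit through a dependency block. The sketch below assumes a hypothetical sibling unit at live/prod/resource_group whose outputs are named name and location; neither the unit nor those output names come from this module.

```hcl
# live/prod/availability_set/terragrunt.hcl (variation)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-availability-set?ref=v1.0.0"
}

# Hypothetical sibling unit that creates the resource group.
dependency "resource_group" {
  config_path = "../resource_group"
}

inputs = {
  name                = "avail-sql-prod"
  resource_group_name = dependency.resource_group.outputs.name     # assumed output name
  location            = dependency.resource_group.outputs.location # assumed output name
}
```

Declaring the dependency also tells Terragrunt to apply the resource group unit before this one when several units are rolled out together.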

3. Deploy one environment, or roll out all modules together:

(cd live/prod/availability_set && terragrunt apply)   # this module only
(cd live/prod && terragrunt run-all apply)            # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
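
The root config shown in step 1 only declares remote_state. Because the module itself contains no provider block, the shared provider configuration could be generated from the same root file. A minimal sketch, assuming the subscription ID is supplied through the ARM_SUBSCRIPTION_ID environment variable:

```hcl
# live/terragrunt.hcl (addition, alongside remote_state)
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "azurerm" {
  features {}
  # subscription_id is taken from ARM_SUBSCRIPTION_ID when not set here.
}
EOF
}
```

Terragrunt writes provider.tf into each unit's working directory before running Terraform, so every environment inherits the same provider settings.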

Inputs

Name	Type	Default	Required	Description
`name`	`string`	—	Yes	Name of the availability set (1-80 chars; starts with letter/number; ends with letter/number/`_`; letters, numbers, `.`, `_`, `-`).
`resource_group_name`	`string`	—	Yes	Resource group to create the availability set in.
`location`	`string`	—	Yes	Azure region; must match the region of the VMs that join the set.
`platform_fault_domain_count`	`number`	`2`	No	Fault domains (racks). Region-capped, validated 1-3; most regions allow max 2.
`platform_update_domain_count`	`number`	`5`	No	Update domains (patch/reboot waves). Validated 1-20.
`managed`	`bool`	`true`	No	Align managed disks to fault domains. Keep `true` for modern VMs.
`proximity_placement_group_id`	`string`	`null`	No	Optional PPG resource ID to co-locate VMs for low latency.
`tags`	`map(string)`	`{}`	No	Extra tags merged with the module’s base tags.

Outputs

Name	Description
`id`	Resource ID of the availability set; assign to each VM’s `availability_set_id`.
`name`	Name of the availability set.
`location`	Region of the set; VMs must be deployed in the same region.
`platform_fault_domain_count`	Effective fault domain count applied.
`platform_update_domain_count`	Effective update domain count applied.

Enterprise scenario

A bank’s core banking application runs a SQL Server Always On failover cluster instance (FCI) in centralindia, where Availability Zones are not available for the required VM SKU. The platform team consumes this module pinned to v1.0.0 to provision avail-sql-prod with two fault domains and five update domains, then deploys both FCI nodes into it via availability_set_id = module.availability_set.id. Azure places the two nodes in different update domains, so planned host maintenance reboots only one node at a time and the cluster fails over cleanly. The bank holds its single-region 99.95% VM SLA without paying for zone-redundant infrastructure.

Best practices

Always keep managed = true. With managed-disk VMs, an unmanaged set leaves disks unaligned to fault domains and silently defeats the resiliency you provisioned for. The module defaults managed to true, so a set becomes unmanaged only if a caller explicitly passes false.
Place at least two VMs of the same role per set, and never mix tiers. A single VM in an availability set earns no SLA; put web, app, and database tiers in separate sets so a rack or update wave never takes out an entire tier.
Fix the region and domain counts before deploying VMs, and treat them as immutable. Changing location, platform_fault_domain_count, or platform_update_domain_count forces Terraform to replace the set, and a VM can join a set only at creation time, so any such change means destroying and recreating the VMs. Plan the topology up front.
Default fault domains to 2 for portability. Many regions cap fault domains at 2; requesting 3 in a region that only supports 2 fails the deploy, so use the validated default unless you have confirmed the target region’s limit.
Use a consistent naming convention that encodes workload and environment (e.g. avail-<workload>-<env>), adding a region token if you deploy to more than one region, and rely on the module’s base tags (managed-by, resilience-tier) so every set is discoverable in cost and resiliency audits.
Reach for Availability Zones or a Flexible VMSS where supported — availability sets are datacenter-bound and do not survive a full datacenter outage; choose zonal deployments for any new workload in a zone-enabled region.
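
The "one set per tier" practice above can be expressed with for_each on the module call. The sketch below is illustrative: the tier names, the resource group name rg-app-prod, and the output name availability_set_ids are assumptions, not part of the module.

```hcl
locals {
  # Hypothetical tiers; each gets its own availability set.
  tiers = toset(["web", "app", "db"])
}

module "availability_set" {
  for_each = local.tiers
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-availability-set?ref=v1.0.0"

  name                = "avail-${each.key}-prod"
  resource_group_name = "rg-app-prod" # hypothetical resource group
  location            = "centralindia"

  tags = {
    tier        = each.key
    environment = "prod"
  }
}

# Map of tier name to availability set ID, for wiring VMs per tier.
output "availability_set_ids" {
  value = { for tier, set in module.availability_set : tier => set.id }
}
```

A VM in the web tier would then use availability_set_id = module.availability_set["web"].id, keeping each tier's rack and update-wave spread independent of the others.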