
Terraform Module: Azure Availability Set — Pin VM Fault & Update Domains for In-Region Resilience

Quick take — A reusable hashicorp/azurerm 4.x Terraform module for azurerm_availability_set: validated fault/update domain counts, managed-disk alignment, and proximity placement wiring for resilient IaaS VMs. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

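provider "azurerm" {
  features {}
  # azurerm 4.x requires a subscription ID: export ARM_SUBSCRIPTION_ID or set it here.
  # subscription_id = "..."
}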

module "availability_set" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-availability-set?ref=v1.0.0"

  name                = "..."  # Name of the availability set (1-80 chars; starts with l…
  resource_group_name = "..."  # Resource group to create the availability set in.
  location            = "..."  # Azure region; must match the region of the VMs that joi…
}

Then run terraform init && terraform apply. Every other input has a sensible default; see Inputs below to override behaviour.

What this module is

An Azure Availability Set is a logical grouping that spreads two or more IaaS virtual machines across fault domains (separate physical racks with independent power and network) and update domains (groups patched and rebooted separately during planned host maintenance). It is the in-datacenter resiliency primitive that backs the single-region 99.95% VM SLA — it protects you from a top-of-rack switch failure or a host OS update bringing down every VM in a tier at once. Unlike Availability Zones, an availability set lives inside a single datacenter and costs nothing extra; you pay only for the VMs you place in it.

Wrapping azurerm_availability_set in a reusable module matters because the resource has sharp, easy-to-miss constraints. The fault domain count is capped per region (often 2, sometimes 3). managed = true must be set so VM managed disks are also fault-domain aligned. And the set is effectively immutable: a VM can join a set only when it is created, so you cannot move deployed VMs into or out of an existing set, and changing the domain counts forces Terraform to replace the set. The module bakes in validated domain counts, a managed flag that defaults to true, consistent tagging, and an optional proximity placement group reference, so every VM tier across every subscription gets the same correct, audited configuration instead of a hand-typed resource block that silently degrades resiliency.
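One way to avoid hand-typing the per-region cap is for the caller to derive the fault domain count from a lookup table. The following is a minimal sketch and not part of the module: the local name max_fault_domains_by_region, the resource names, and above all the values in the map are illustrative assumptions that should be confirmed against the current Azure limits for your subscription before use.

```hcl
variable "location" {
  description = "Azure region for the availability set."
  type        = string
  default     = "centralindia"
}

locals {
  # ASSUMPTION: placeholder values for illustration only.
  # Confirm the real cap for each region you deploy to.
  max_fault_domains_by_region = {
    centralindia = 2
    eastus       = 3
  }

  # Fall back to 2, the module's default, for any region not listed.
  fault_domain_count = lookup(local.max_fault_domains_by_region, var.location, 2)
}

module "availability_set" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-availability-set?ref=v1.0.0"

  name                = "avail-app-prod" # hypothetical name
  resource_group_name = "rg-app-prod"    # hypothetical, must already exist
  location            = var.location

  platform_fault_domain_count = local.fault_domain_count
}
```

Keeping the table in the calling configuration rather than in the module means the module's own 1-3 validation stays a simple backstop while region-specific knowledge lives in one reviewable place.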

When to use it

If your region supports zones and the workload is greenfield, prefer Availability Zones or a Flexible VMSS instead — an availability set cannot span zones.

Module structure

terraform-module-azure-availability-set/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

main.tf

locals {
  # An availability set only provides its SLA with 2+ VMs and 2+ fault domains.
  # Surface a single, normalized tag set so every set is auditable.
  base_tags = merge(
    {
      "managed-by"      = "terraform"
      "module"          = "terraform-module-azure-availability-set"
      "resilience-tier" = "availability-set"
    },
    var.tags
  )
}

resource "azurerm_availability_set" "this" {
  name                = var.name
  resource_group_name = var.resource_group_name
  location            = var.location

  # Fault domains = independent racks; update domains = independent patch/reboot waves.
  platform_fault_domain_count  = var.platform_fault_domain_count
  platform_update_domain_count = var.platform_update_domain_count

  # MUST be true so VM managed disks are aligned to the fault domains.
  # Unmanaged (false) is legacy and breaks managed-disk VMs.
  managed = var.managed

  # Optional co-location for low-latency inter-VM traffic.
  proximity_placement_group_id = var.proximity_placement_group_id

  tags = local.base_tags
}

variables.tf

variable "name" {
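  description = "Name of the availability set (1-80 chars; letters, numbers, underscores, periods, hyphens; must start with a letter or number and end with a letter, number, or underscore)."
  type        = string

  validation {
    # Azure rejects names that end in '.' or '-', so the last character is checked separately.
    condition     = can(regex("^[a-zA-Z0-9]([a-zA-Z0-9._-]{0,78}[a-zA-Z0-9_])?$", var.name))
    error_message = "name must be 1-80 chars, start with a letter or number, end with a letter, number or '_', and contain only letters, numbers, '.', '_' or '-'."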
  }
}

variable "resource_group_name" {
  description = "Name of the resource group in which to create the availability set."
  type        = string
}

variable "location" {
  description = "Azure region for the availability set (e.g. 'centralindia'). Must match the region of the VMs that will join it."
  type        = string
}

variable "platform_fault_domain_count" {
  description = "Number of fault domains (independent racks). Region-capped, commonly 2 (some regions allow 3). Use 2 for the broadest compatibility."
  type        = number
  default     = 2

  validation {
    condition     = var.platform_fault_domain_count >= 1 && var.platform_fault_domain_count <= 3
    error_message = "platform_fault_domain_count must be between 1 and 3; most regions cap it at 2."
  }
}

variable "platform_update_domain_count" {
  description = "Number of update domains (independent patch/reboot waves). Valid range is 1-20; 5 is a sensible default that balances spread and density."
  type        = number
  default     = 5

  validation {
    condition     = var.platform_update_domain_count >= 1 && var.platform_update_domain_count <= 20
    error_message = "platform_update_domain_count must be between 1 and 20."
  }
}

variable "managed" {
  description = "Align managed disks to the fault domains. Keep true for any modern managed-disk VM; false is legacy/unmanaged only."
  type        = bool
  default     = true
}

variable "proximity_placement_group_id" {
  description = "Optional resource ID of a proximity placement group to co-locate the VMs for low latency. Set null to disable."
  type        = string
  default     = null

  validation {
    condition     = var.proximity_placement_group_id == null || can(regex("/proximityPlacementGroups/", var.proximity_placement_group_id))
    error_message = "proximity_placement_group_id must be a valid proximity placement group resource ID or null."
  }
}

variable "tags" {
  description = "Additional tags merged with the module's base tags."
  type        = map(string)
  default     = {}
}
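The validation rules above can be exercised without touching Azure. What follows is a hedged sketch of a test file, assuming Terraform 1.7 or later for terraform test with mock_provider (newer than the module's own >= 1.5.0 floor). The file path tests/validation.tftest.hcl, the run names, and the input values are hypothetical.

```hcl
# tests/validation.tftest.hcl (hypothetical path inside the module directory)

# Mocked provider: no credentials are needed and no real Azure calls are made.
mock_provider "azurerm" {}

# Placeholder inputs shared by every run block.
variables {
  name                = "avail-test"
  resource_group_name = "rg-test"
  location            = "centralindia"
}

run "defaults_are_accepted" {
  command = plan

  assert {
    condition     = azurerm_availability_set.this.platform_fault_domain_count == 2
    error_message = "Expected the default fault domain count of 2."
  }

  assert {
    condition     = azurerm_availability_set.this.managed == true
    error_message = "Expected managed to default to true."
  }
}

run "rejects_four_fault_domains" {
  command = plan

  variables {
    platform_fault_domain_count = 4
  }

  # The run passes only if this variable's validation block fails.
  expect_failures = [var.platform_fault_domain_count]
}

run "rejects_malformed_ppg_id" {
  command = plan

  variables {
    proximity_placement_group_id = "not-a-resource-id"
  }

  expect_failures = [var.proximity_placement_group_id]
}
```

From the module directory, terraform init followed by terraform test should run all three blocks. Using expect_failures keeps the negative cases explicit, so a later loosening of a validation rule shows up as a failing test.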

outputs.tf

output "id" {
  description = "Resource ID of the availability set; assign to a VM's availability_set_id."
  value       = azurerm_availability_set.this.id
}

output "name" {
  description = "Name of the availability set."
  value       = azurerm_availability_set.this.name
}

output "location" {
  description = "Region of the availability set; VMs must be deployed in the same region."
  value       = azurerm_availability_set.this.location
}

output "platform_fault_domain_count" {
  description = "Effective fault domain count applied to the set."
  value       = azurerm_availability_set.this.platform_fault_domain_count
}

output "platform_update_domain_count" {
  description = "Effective update domain count applied to the set."
  value       = azurerm_availability_set.this.platform_update_domain_count
}

How to use it

resource "azurerm_resource_group" "app" {
  name     = "rg-app-sql-prod"
  location = "centralindia"
}

module "availability_set" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-availability-set?ref=v1.0.0"

  name                = "avail-sql-prod"
  resource_group_name = azurerm_resource_group.app.name
  location            = azurerm_resource_group.app.location

  platform_fault_domain_count  = 2
  platform_update_domain_count = 5
  managed                      = true

  tags = {
    workload    = "sql-fci"
    environment = "prod"
  }
}

# Downstream: place both SQL FCI nodes into the set using its output.
resource "azurerm_windows_virtual_machine" "sql" {
  count               = 2
  name                = "vm-sql-prod-${count.index}"
  resource_group_name = azurerm_resource_group.app.name
  location            = azurerm_resource_group.app.location
  size                = "Standard_D4s_v5"
  admin_username      = "sqladmin"
  admin_password      = var.sql_admin_password
  availability_set_id = module.availability_set.id # <-- module output wires the VM in

  network_interface_ids = [azurerm_network_interface.sql[count.index].id]

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Premium_LRS"
  }

  source_image_reference {
    publisher = "MicrosoftSQLServer"
    offer     = "sql2022-ws2022"
    sku       = "enterprise-gen2"
    version   = "latest"
  }
}
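The example above leaves proximity_placement_group_id at its null default. As a sketch of the optional proximity placement wiring, the module block could instead be written as follows. It reuses azurerm_resource_group.app from the example above; the resource label sql and the name ppg-sql-prod are hypothetical.

```hcl
# Proximity placement group that the availability set (and its VMs) will share.
resource "azurerm_proximity_placement_group" "sql" {
  name                = "ppg-sql-prod" # hypothetical name
  resource_group_name = azurerm_resource_group.app.name
  location            = azurerm_resource_group.app.location
}

module "availability_set" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-availability-set?ref=v1.0.0"

  name                = "avail-sql-prod"
  resource_group_name = azurerm_resource_group.app.name
  location            = azurerm_resource_group.app.location

  # The ID contains "/proximityPlacementGroups/", which is what the
  # module's validation rule checks for.
  proximity_placement_group_id = azurerm_proximity_placement_group.sql.id
}
```

VMs that join the set would most likely need the same group set through their own proximity_placement_group_id argument; treat that as an assumption to verify against the provider documentation for your VM resource type.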

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config, live/terragrunt.hcl (inherited by every module):

remote_state {
  backend  = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm state storage account/container + key per path...
  }
}

2. Module config, live/prod/availability_set/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-availability-set?ref=v1.0.0"
}

inputs = {
  name                = "..."
  resource_group_name = "..."
  location            = "..."
}

3. Deploy one environment, or roll out all modules together:

# Run each command from the repository root.
(cd live/prod/availability_set && terragrunt apply)   # this module only
(cd live/prod && terragrunt run-all apply)            # every module under live/prod
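The root config in step 1 shows only the backend. A provider block can be centralised in the same file with a Terragrunt generate block; the sketch below is one way to do it, assuming the subscription is supplied through the ARM_SUBSCRIPTION_ID environment variable, which azurerm 4.x needs when subscription_id is not set in the provider block.

```hcl
# live/terragrunt.hcl (addition to the root config)

# Writes provider.tf into each module's working directory before Terraform runs.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "azurerm" {
  features {}
}
EOF
}
```

With this in place the module directories themselves stay free of provider configuration, which is what lets the same module source be reused unchanged across environments.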

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
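To illustrate the dependency orchestration mentioned above, the module config could take its resource group from a sibling unit instead of hard-coded strings. This is a sketch under assumptions: the sibling path ../resource_group and its name and location outputs are hypothetical and would need to exist in your repository.

```hcl
# live/prod/availability_set/terragrunt.hcl (variation using a dependency)

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-availability-set?ref=v1.0.0"
}

# HYPOTHETICAL sibling unit that creates the resource group and
# exposes "name" and "location" outputs.
dependency "resource_group" {
  config_path = "../resource_group"

  # Placeholder outputs so plan and validate work before the group exists.
  mock_outputs = {
    name     = "rg-placeholder"
    location = "centralindia"
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  name                = "avail-sql-prod"
  resource_group_name = dependency.resource_group.outputs.name
  location            = dependency.resource_group.outputs.location
}
```

Because the dependency is declared, run-all applies the resource group unit before this one, and the mock outputs keep a first plan from failing when nothing has been deployed yet.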

Inputs

| Name | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | string | | Yes | Name of the availability set (1-80 chars; starts with letter/number; letters, numbers, ., _, -). |
| resource_group_name | string | | Yes | Resource group to create the availability set in. |
| location | string | | Yes | Azure region; must match the region of the VMs that join the set. |
| platform_fault_domain_count | number | 2 | No | Fault domains (racks). Region-capped, validated 1-3; most regions allow max 2. |
| platform_update_domain_count | number | 5 | No | Update domains (patch/reboot waves). Validated 1-20. |
| managed | bool | true | No | Align managed disks to fault domains. Keep true for modern VMs. |
| proximity_placement_group_id | string | null | No | Optional PPG resource ID to co-locate VMs for low latency. |
| tags | map(string) | {} | No | Extra tags merged with the module’s base tags. |

Outputs

| Name | Description |
| --- | --- |
| id | Resource ID of the availability set; assign to each VM’s availability_set_id. |
| name | Name of the availability set. |
| location | Region of the set; VMs must be deployed in the same region. |
| platform_fault_domain_count | Effective fault domain count applied. |
| platform_update_domain_count | Effective update domain count applied. |

Enterprise scenario

A bank’s core banking application runs an Always On SQL Server failover cluster in centralindia, a region that does not yet expose Availability Zones for the required VM SKU. The platform team consumes this module pinned to v1.0.0 to provision avail-sql-prod with two fault domains and five update domains, then deploys both FCI nodes into it via availability_set_id = module.availability_set.id. When Azure performs planned host maintenance, the two nodes land in different update domains so only one reboots at a time, the cluster fails over cleanly, and the bank holds its single-region 99.95% VM SLA without paying for zone-redundant infrastructure.

Best practices

Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
