
Terraform Module: Azure Elastic SAN — shared block storage with per-volume-group isolation

Quick take — A reusable Terraform module for Azure Elastic SAN on azurerm ~> 4.0: provision the SAN, size base/extra TiB, carve volume groups with private endpoints, and expose iSCSI targets to your VMs and AKS. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "azurerm" {
  features {}
}

module "elastic_san" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-elastic-san?ref=v1.0.0"

  name                = "..."  # Name of the Elastic SAN (3-63 chars, lowercase alphanum…
  resource_group_name = "..."  # Resource group that will hold the SAN.
  location            = "..."  # Azure region (Elastic SAN is region-limited).
  base_size_in_tib    = 1      # Provisioned base capacity in TiB (carries performance),…
  volume_groups       = {}     # Volume groups with per-group network/encryption posture…
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.

What this module is

Azure Elastic SAN is a fully managed, cloud-native storage area network. Instead of attaching one managed disk per VM, you provision a single SAN appliance with a pool of capacity, slice that pool into volume groups, and carve individual volumes that are exposed as iSCSI targets. Multiple compute clients — IaaS VMs, VMSS, or AKS nodes via the CSI driver — connect to those targets over the storage network. The big wins are shared throughput/IOPS that can be dynamically distributed across volumes, large scale (hundreds of TiB from one resource), and a billing model split into base capacity (provisioned performance) and additional capacity (cheaper, capacity-only).
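
Because base capacity is what carries performance, it can help to surface what the platform actually provisioned. The following is a minimal sketch, assuming the azurerm_elastic_san resource exports total_iops and total_mbps as listed in the provider documentation. The two output names are hypothetical and are not part of the module as published.

```hcl
# Hypothetical extra outputs for outputs.tf (not in the published module).
output "total_iops" {
  description = "Total provisioned IOPS reported by the Elastic SAN."
  value       = azurerm_elastic_san.this.total_iops
}

output "total_mbps" {
  description = "Total provisioned throughput (MB/s) reported by the Elastic SAN."
  value       = azurerm_elastic_san.this.total_mbps
}
```

Comparing these values before and after a change to base_size_in_tib is one way to confirm that only base capacity, not extended capacity, moves the performance numbers.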

The raw provider surface is fiddly: capacity is expressed in TiB with a minimum base size, the SKU encodes both tier and redundancy (Premium_LRS vs Premium_ZRS), volume groups own the network ACLs and encryption settings, and the combined size_in_gib of all volumes has to fit within the SAN's total provisioned TiB. Wrapping azurerm_elastic_san, azurerm_elastic_san_volume_group, and azurerm_elastic_san_volume in one module gives you validated inputs (SKU name, capacity ranges, zone values, and the identity required for customer-managed keys), consistent naming, optional private-endpoint lockdown per volume group, and clean outputs (the SAN id, volume ids, and the iSCSI target IQNs) that downstream compute can consume directly.
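
The interaction between volume sizes and provisioned capacity can be caught at plan time. The sketch below is one possible approach, not something the published module contains: it sums every requested size_in_gib and compares the total with base plus extended capacity using a Terraform check block (available from Terraform 1.5, which matches the module's version pin). The local names and the check name are hypothetical.

```hcl
locals {
  # Sum of every requested volume size, in GiB.
  # concat([0], ...) guards against sum() failing on an empty list.
  requested_volume_gib = sum(concat([0], flatten([
    for group in values(var.volume_groups) : [
      for vol in values(group.volumes) : vol.size_in_gib
    ]
  ])))

  # Total SAN capacity (base + extended), converted from TiB to GiB.
  provisioned_gib = (var.base_size_in_tib + var.extended_size_in_tib) * 1024
}

check "volumes_fit_in_san" {
  assert {
    condition     = local.requested_volume_gib <= local.provisioned_gib
    error_message = "Requested volumes (${local.requested_volume_gib} GiB) exceed provisioned SAN capacity (${local.provisioned_gib} GiB)."
  }
}
```

A check block reports a warning rather than stopping the run. If a hard failure is preferred, the same condition could be placed in a lifecycle precondition on the SAN resource instead.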

When to use it

Module structure

terraform-module-azure-elastic-san/
├── versions.tf      # provider + Terraform version pins
├── main.tf          # elastic_san + volume_group(s) + volume(s) + optional private endpoints
├── variables.tf     # var-driven inputs with validation
└── outputs.tf       # SAN id/name, volume group ids, volume ids + iSCSI targets

# versions.tf
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

# main.tf

# Zone placement depends on the SKU: sku_name selects LRS (zonal) or ZRS
# (zone-redundant). Base capacity is measured in TiB and carries the
# provisioned performance; extended (additional) capacity is capacity-only
# and cheaper.

resource "azurerm_elastic_san" "this" {
  name                = var.name
  resource_group_name = var.resource_group_name
  location            = var.location

  base_size_in_tib     = var.base_size_in_tib
  extended_size_in_tib = var.extended_size_in_tib

  sku {
    name = var.sku_name
    tier = var.sku_tier
  }

  # Pin availability zones for LRS SANs that are zone-aware.
  # For Premium_ZRS leave this null (the platform spreads across zones).
  zones = var.sku_name == "Premium_ZRS" ? null : var.zones

  tags = var.tags
}

resource "azurerm_elastic_san_volume_group" "this" {
  for_each = var.volume_groups

  name           = each.key
  elastic_san_id = azurerm_elastic_san.this.id

  # iSCSI is the supported protocol type today.
  protocol_type = "Iscsi"

  # Platform-managed or customer-managed key encryption.
  encryption_type = each.value.encryption_key_url == null ? "EncryptionAtRestWithPlatformKey" : "EncryptionAtRestWithCustomerManagedKey"

  dynamic "encryption" {
    for_each = each.value.encryption_key_url == null ? [] : [each.value]
    content {
      key_vault_key_id          = encryption.value.encryption_key_url
      user_assigned_identity_id = encryption.value.identity_id
    }
  }

  dynamic "identity" {
    for_each = each.value.identity_id == null ? [] : [each.value.identity_id]
    content {
      type         = "UserAssigned"
      identity_ids = [identity.value]
    }
  }

  # Default-deny network rules; only the listed subnets may reach the targets.
  dynamic "network_rule" {
    for_each = each.value.allowed_subnet_ids
    content {
      subnet_id = network_rule.value
      action    = "Allow"
    }
  }
}

resource "azurerm_elastic_san_volume" "this" {
  for_each = local.volumes

  name            = each.value.volume_name
  volume_group_id = azurerm_elastic_san_volume_group.this[each.value.group_name].id
  size_in_gib     = each.value.size_in_gib

  # Optionally seed a new volume from a disk or snapshot source.
  dynamic "create_source" {
    for_each = each.value.create_source_id == null ? [] : [each.value]
    content {
      source_type = create_source.value.create_source_type
      source_id   = create_source.value.create_source_id
    }
  }
}

# Private endpoint per volume group for full network isolation (no public path).
resource "azurerm_private_endpoint" "this" {
  for_each = {
    for k, v in var.volume_groups : k => v
    if v.private_endpoint_subnet_id != null
  }

  name                = "pe-${var.name}-${each.key}"
  resource_group_name = var.resource_group_name
  location            = var.location
  subnet_id           = each.value.private_endpoint_subnet_id

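  private_service_connection {
    name = "psc-${var.name}-${each.key}"
    # The private-link resource is the Elastic SAN itself; the volume group is
    # selected by name as the sub-resource (group ID).
    private_connection_resource_id = azurerm_elastic_san.this.id
    subresource_names              = [azurerm_elastic_san_volume_group.this[each.key].name]
    is_manual_connection           = false
  }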

  tags = var.tags
}

locals {
  # Flatten volume_groups -> volumes into a single addressable map.
  volumes = merge([
    for group_name, group in var.volume_groups : {
      for vol_name, vol in group.volumes :
      "${group_name}/${vol_name}" => {
        group_name         = group_name
        volume_name        = vol_name
        size_in_gib        = vol.size_in_gib
        create_source_id   = vol.create_source_id
        create_source_type = vol.create_source_type
      }
    }
  ]...)
}
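
To make the flattening step concrete, here is a standalone sketch that applies the same merge expression to a small literal input. The group and volume names are invented for illustration. It needs no provider, so it can be tried with terraform init && terraform apply in an empty directory.

```hcl
locals {
  # Hypothetical input: two groups that both contain a volume named "data".
  volume_groups = {
    "vg-app" = {
      volumes = {
        "data" = { size_in_gib = 100 }
        "logs" = { size_in_gib = 50 }
      }
    }
    "vg-db" = {
      volumes = {
        "data" = { size_in_gib = 200 }
      }
    }
  }

  # Same flattening pattern as the module: one map entry per volume,
  # keyed by "<group>/<volume>" so names only need to be unique per group.
  volumes = merge([
    for group_name, group in local.volume_groups : {
      for vol_name, vol in group.volumes :
      "${group_name}/${vol_name}" => {
        group_name  = group_name
        volume_name = vol_name
        size_in_gib = vol.size_in_gib
      }
    }
  ]...)
}

output "volume_keys" {
  # keys() returns the map keys in lexical order:
  # "vg-app/data", "vg-app/logs", "vg-db/data"
  value = keys(local.volumes)
}
```

The composite key is what lets for_each address each volume individually while still allowing the same volume name to appear in more than one group.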

# variables.tf

variable "name" {
  type        = string
  description = "Name of the Elastic SAN resource."

  validation {
    condition     = can(regex("^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$", var.name))
    error_message = "name must be 3-63 chars, lowercase alphanumeric and hyphens, starting/ending alphanumeric."
  }
}

variable "resource_group_name" {
  type        = string
  description = "Resource group that will hold the Elastic SAN."
}

variable "location" {
  type        = string
  description = "Azure region. Elastic SAN is only available in a subset of regions."
}

variable "base_size_in_tib" {
  type        = number
  description = "Provisioned base capacity in TiB (carries the SAN performance). Minimum 1."

  validation {
    condition     = var.base_size_in_tib >= 1 && var.base_size_in_tib <= 100
    error_message = "base_size_in_tib must be between 1 and 100 TiB."
  }
}

variable "extended_size_in_tib" {
  type        = number
  default     = 0
  description = "Additional capacity-only TiB on top of base (cheaper, no extra performance). 0 to disable."

  validation {
    condition     = var.extended_size_in_tib >= 0 && var.extended_size_in_tib <= 900
    error_message = "extended_size_in_tib must be between 0 and 900 TiB."
  }
}

variable "sku_name" {
  type        = string
  default     = "Premium_LRS"
  description = "SKU name encoding tier + redundancy: Premium_LRS or Premium_ZRS."

  validation {
    condition     = contains(["Premium_LRS", "Premium_ZRS"], var.sku_name)
    error_message = "sku_name must be Premium_LRS or Premium_ZRS."
  }
}

variable "sku_tier" {
  type        = string
  default     = "Premium"
  description = "SKU tier. Premium is the supported tier for Elastic SAN."

  validation {
    condition     = contains(["Premium"], var.sku_tier)
    error_message = "sku_tier must be Premium."
  }
}

variable "zones" {
  type        = list(string)
  default     = null
  description = "Availability zones for an LRS SAN (e.g. [\"1\"]). Leave null for Premium_ZRS or region-default placement."

  validation {
    condition     = var.zones == null ? true : alltrue([for z in var.zones : contains(["1", "2", "3"], z)])
    error_message = "zones may only contain \"1\", \"2\", or \"3\"."
  }
}

variable "volume_groups" {
  description = "Map of volume groups. Each owns its network/encryption posture and a map of volumes."
  type = map(object({
    allowed_subnet_ids         = optional(list(string), [])
    private_endpoint_subnet_id = optional(string)
    encryption_key_url         = optional(string)
    identity_id                = optional(string)
    volumes = map(object({
      size_in_gib        = number
      create_source_id   = optional(string)
      create_source_type = optional(string, "Disk")
    }))
  }))

  validation {
    condition = alltrue([
      for g in values(var.volume_groups) : alltrue([
        for v in values(g.volumes) : v.size_in_gib >= 1 && v.size_in_gib <= 65536
      ])
    ])
    error_message = "Every volume size_in_gib must be between 1 and 65536 GiB."
  }

  validation {
    condition = alltrue([
      for g in values(var.volume_groups) :
      g.encryption_key_url == null || g.identity_id != null
    ])
    error_message = "A volume group with a customer-managed encryption_key_url must also set identity_id."
  }
}

variable "tags" {
  type        = map(string)
  default     = {}
  description = "Tags applied to the Elastic SAN and private endpoints."
}

# outputs.tf

output "id" {
  description = "Resource ID of the Elastic SAN."
  value       = azurerm_elastic_san.this.id
}

output "name" {
  description = "Name of the Elastic SAN."
  value       = azurerm_elastic_san.this.name
}

output "total_size_in_tib" {
  description = "Total provisioned capacity (base + extended) in TiB."
  value       = azurerm_elastic_san.this.total_size_in_tib
}

output "total_volume_size_in_gib" {
  description = "Total size of all volumes currently provisioned across the SAN, in GiB."
  value       = azurerm_elastic_san.this.total_volume_size_in_gib
}

output "volume_group_ids" {
  description = "Map of volume group name => volume group resource ID."
  value       = { for k, vg in azurerm_elastic_san_volume_group.this : k => vg.id }
}

output "volume_ids" {
  description = "Map of \"group/volume\" => volume resource ID."
  value       = { for k, v in azurerm_elastic_san_volume.this : k => v.id }
}

output "volume_iscsi_targets" {
  description = "Map of \"group/volume\" => iSCSI target details (target IQN, portal hostname, portal port)."
  value = {
    for k, v in azurerm_elastic_san_volume.this : k => {
      target_iqn          = v.target_iqn
      target_portal_host  = v.target_portal_hostname
      target_portal_port  = v.target_portal_port
      volume_id_for_mount = v.volume_id
    }
  }
}

output "private_endpoint_ids" {
  description = "Map of volume group name => private endpoint resource ID (only for groups with a PE subnet)."
  value       = { for k, pe in azurerm_private_endpoint.this : k => pe.id }
}
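
The volume_iscsi_targets output carries the IQN, portal host, and portal port a client needs to log in. As a hedged sketch of how a root module might consume it, the snippet below renders iscsiadm commands per volume. It assumes the module is instantiated as module.elastic_san, and the local and output names are hypothetical. The exact login workflow (multipath, session count, persistence across reboots) depends on your OS image and is not covered here.

```hcl
locals {
  # One pair of commands per "group/volume" key: register the target, then log in.
  iscsi_login_commands = {
    for key, t in module.elastic_san.volume_iscsi_targets :
    key => join("\n", [
      "iscsiadm -m node --targetname ${t.target_iqn} --portal ${t.target_portal_host}:${t.target_portal_port} -o new",
      "iscsiadm -m node --targetname ${t.target_iqn} --portal ${t.target_portal_host}:${t.target_portal_port} --login",
    ])
  }
}

output "iscsi_login_commands" {
  description = "Map of \"group/volume\" => iscsiadm commands to register and log in to the target."
  value       = local.iscsi_login_commands
}
```

The rendered strings could be passed to cloud-init or a VM extension, keeping the connection details in Terraform rather than hard-coded in scripts.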

How to use it

module "elastic_san" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-elastic-san?ref=v1.0.0"

  name                = "esan-prod-sql-weu"
  resource_group_name = azurerm_resource_group.storage.name
  location            = "westeurope"

  # 4 TiB of provisioned performance, +20 TiB cheaper capacity, zone-redundant.
  base_size_in_tib     = 4
  extended_size_in_tib = 20
  sku_name             = "Premium_ZRS"
  sku_tier             = "Premium"

  volume_groups = {
    "vg-sqlcluster" = {
      allowed_subnet_ids         = [azurerm_subnet.db.id]
      private_endpoint_subnet_id = azurerm_subnet.privatelink.id
      volumes = {
        "data01" = { size_in_gib = 2048 }
        "data02" = { size_in_gib = 2048 }
        "log01"  = { size_in_gib = 512 }
      }
    }
  }

  tags = {
    environment = "prod"
    workload    = "sql-fci"
    owner       = "platform-team"
  }
}

# Downstream: hand the iSCSI target IQN of the data01 volume to a VM
# extension / cloud-init that runs `iscsiadm` to log in and mount the LUN.
output "sql_data01_iscsi_iqn" {
  value = module.elastic_san.volume_iscsi_targets["vg-sqlcluster/data01"].target_iqn
}

resource "azurerm_role_assignment" "san_reader" {
  scope                = module.elastic_san.id
  role_definition_name = "Reader"
  principal_id         = azuread_group.dba_team.object_id
}
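
The example above relies on platform-managed keys. A customer-managed key variant might look like the sketch below, which sets encryption_key_url and identity_id so that the module switches the volume group to EncryptionAtRestWithCustomerManagedKey. The Key Vault key and user-assigned identity resources are hypothetical and assumed to be defined elsewhere in the root module, and the identity is assumed to already have permission to use the key.

```hcl
# Assumed to exist elsewhere in the root module (hypothetical names):
#   azurerm_user_assigned_identity.esan - identity granted access to the key
#   azurerm_key_vault_key.esan          - key used for encryption at rest
module "elastic_san_cmk" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-elastic-san?ref=v1.0.0"

  name                = "esan-prod-cmk-weu"
  resource_group_name = azurerm_resource_group.storage.name
  location            = "westeurope"
  base_size_in_tib    = 1

  volume_groups = {
    "vg-secure" = {
      allowed_subnet_ids = [azurerm_subnet.db.id]

      # Both values are required together; the module's validation rejects
      # a key URL without an identity.
      encryption_key_url = azurerm_key_vault_key.esan.id
      identity_id        = azurerm_user_assigned_identity.esan.id

      volumes = {
        "data01" = { size_in_gib = 256 }
      }
    }
  }
}
```

Because encryption is configured per volume group, platform-key and customer-key groups can coexist on the same SAN.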

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config: live/terragrunt.hcl (inherited by every module):

remote_state {
  backend  = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm state storage account/container + key per path...
  }
}

2. Module config: live/prod/elastic_san/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-elastic-san?ref=v1.0.0"
}

inputs = {
  name                = "..."
  resource_group_name = "..."
  location            = "..."
  base_size_in_tib    = 1 # must be at least 1 to pass the module's validation
  volume_groups       = {}
}

3. Deploy one environment, or roll out all modules together:

(cd live/prod/elastic_san && terragrunt apply)   # this module only
(cd live/prod && terragrunt run-all apply)       # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
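
As an illustration of the dependency orchestration mentioned above, the sketch below wires subnet IDs from a sibling network module into this one. The sibling module at live/prod/network, its output names (db_subnet_id, privatelink_subnet_id), the resource group name, and the mock subscription ID are all hypothetical.

```hcl
# live/prod/elastic_san/terragrunt.hcl (sketch)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-elastic-san?ref=v1.0.0"
}

# Hypothetical sibling module that creates the VNet and exposes subnet IDs.
dependency "network" {
  config_path = "../network"

  # Placeholder values so validate/plan can run before the network exists.
  mock_outputs = {
    db_subnet_id          = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/mock/providers/Microsoft.Network/virtualNetworks/mock/subnets/db"
    privatelink_subnet_id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/mock/providers/Microsoft.Network/virtualNetworks/mock/subnets/privatelink"
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  name                = "esan-prod-sql-weu"
  resource_group_name = "rg-storage-prod"
  location            = "westeurope"
  base_size_in_tib    = 4

  volume_groups = {
    "vg-sqlcluster" = {
      allowed_subnet_ids         = [dependency.network.outputs.db_subnet_id]
      private_endpoint_subnet_id = dependency.network.outputs.privatelink_subnet_id
      volumes = {
        "data01" = { size_in_gib = 2048 }
      }
    }
  }
}
```

With the dependency block in place, run-all applies the network module first and then passes its real outputs to the Elastic SAN module.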

Inputs

| Name | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | string | - | Yes | Name of the Elastic SAN (3-63 chars, lowercase alphanumeric + hyphens). |
| resource_group_name | string | - | Yes | Resource group that will hold the SAN. |
| location | string | - | Yes | Azure region (Elastic SAN is region-limited). |
| base_size_in_tib | number | - | Yes | Provisioned base capacity in TiB (carries performance), 1-100. |
| extended_size_in_tib | number | 0 | No | Additional capacity-only TiB (cheaper), 0-900. |
| sku_name | string | "Premium_LRS" | No | Premium_LRS or Premium_ZRS (tier + redundancy). |
| sku_tier | string | "Premium" | No | SKU tier; Premium is supported. |
| zones | list(string) | null | No | Zones for an LRS SAN (e.g. ["1"]); null for ZRS/region default. |
| volume_groups | map(object) | - | Yes | Volume groups with per-group network/encryption posture and a map of volumes (size_in_gib, optional create_source_id/create_source_type). |
| tags | map(string) | {} | No | Tags for the SAN and private endpoints. |

Outputs

| Name | Description |
| --- | --- |
| id | Resource ID of the Elastic SAN. |
| name | Name of the Elastic SAN. |
| total_size_in_tib | Total provisioned capacity (base + extended) in TiB. |
| total_volume_size_in_gib | Total size of all provisioned volumes in GiB. |
| volume_group_ids | Map of volume group name => volume group resource ID. |
| volume_ids | Map of "group/volume" => volume resource ID. |
| volume_iscsi_targets | Map of "group/volume" => iSCSI target details (IQN, portal host, portal port, volume id). |
| private_endpoint_ids | Map of volume group name => private endpoint resource ID. |

Enterprise scenario

A financial-services platform team migrates a SQL Server Failover Cluster Instance off a sprawl of 30+ individual P40 managed disks onto a single zone-redundant Elastic SAN. They provision a 4 TiB base + 20 TiB extended Premium_ZRS SAN, expose three volumes (two data, one log) from a vg-sqlcluster volume group locked to the database subnet via a private endpoint, and let both cluster nodes attach the same LUNs over iSCSI. Shared performance across the pool absorbs end-of-month reporting spikes that previously required over-provisioning every disk, and the extended (capacity-only) TiB keeps the cold archive volumes cheap while staying inside the same SAN and the same Terraform state.

Best practices
