Terraform Module: Azure Synapse Analytics — a governed, private-by-default workspace with pools you can scale on demand

Quick take — A reusable hashicorp/azurerm ~> 4.0 module for Azure Synapse Analytics: ADLS Gen2-backed workspace, optional dedicated/Spark pools, Entra ID admin, managed VNet, and firewall — wired for production. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "azurerm" {
  features {}
}

module "synapse" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-synapse?ref=v1.0.0"

  workspace_name      = "..."  # Globally unique workspace name, 1-50 lowercase alphanum…
  resource_group_name = "..."  # Resource group that holds the workspace.
  location            = "..."  # Azure region (e.g. `centralindia`).
  storage_account_id  = "..."  # Resource ID of an existing ADLS Gen2 (HNS-enabled) stor…
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
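
To illustrate what overriding a default looks like, here is a hedged variant of the Quickstart block (it replaces that block rather than sitting beside it) that also requests a Spark pool. It assumes the optional() defaults declared in the module's variables.tf, so only the pool name has to be supplied; the pool name "etl" is an illustrative choice.

```hcl
module "synapse" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-synapse?ref=v1.0.0"

  # Same required inputs as the Quickstart; fill in the "..." placeholders.
  workspace_name      = "..."
  resource_group_name = "..."
  location            = "..."
  storage_account_id  = "..."

  # Optional input: only `name` is set, so node size, Spark version,
  # autoscale bounds and auto-pause delay fall back to the module defaults.
  spark_pool = {
    name = "etl" # illustrative pool name
  }
}
```

Leaving spark_pool unset (its default is null) skips the pool entirely, which is what the minimal Quickstart does.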

What this module is

Azure Synapse Analytics is Microsoft’s unified analytics platform: it stitches together a serverless SQL endpoint, provisioned dedicated SQL pools (the old SQL Data Warehouse, now Gen2 DWUs), Apache Spark pools, and pipelines/integration runtimes — all anchored on an Azure Data Lake Storage (ADLS) Gen2 filesystem that the workspace treats as its primary storage. The azurerm_synapse_workspace resource is the control-plane object that everything else hangs off: pools, firewall rules, the managed virtual network, Entra ID administrators, and the managed identity used to reach storage and Key Vault.

A bare azurerm_synapse_workspace is deceptively simple to declare but easy to ship insecurely. A workspace with no firewall rules cannot be reached over its public endpoints, and the common “fix”, an AllowAll rule spanning 0.0.0.0 to 255.255.255.255, exposes the serverless and dedicated SQL endpoints to the entire internet. The managed VNet is opt-in. Double encryption with a customer-managed key (CMK) is opt-in. And the storage account behind the workspace must be a hierarchical-namespace (Gen2) account with the workspace’s managed identity granted Storage Blob Data Contributor, or pool creation and pipeline runs fail at runtime with opaque errors.
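
As a concrete picture of the firewall footgun described above, the sketch below contrasts the allow-everything rule with a scoped one. It is illustrative only: the resource labels and the azurerm_synapse_workspace.example reference are hypothetical, and 203.0.113.0/24 is a documentation range standing in for a real egress range.

```hcl
# Anti-pattern: opens the workspace SQL endpoints to every IPv4 address.
resource "azurerm_synapse_firewall_rule" "allow_all" {
  name                 = "AllowAll"
  synapse_workspace_id = azurerm_synapse_workspace.example.id # hypothetical workspace
  start_ip_address     = "0.0.0.0"
  end_ip_address       = "255.255.255.255"
}

# Safer alternative: admit only a known egress range.
resource "azurerm_synapse_firewall_rule" "corp_egress" {
  name                 = "corp-egress"
  synapse_workspace_id = azurerm_synapse_workspace.example.id # hypothetical workspace
  start_ip_address     = "203.0.113.0"   # placeholder range (RFC 5737)
  end_ip_address       = "203.0.113.255"
}
```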

This module wraps all of that into one var-driven unit: it creates the ADLS Gen2 filesystem inside an existing storage account, creates the workspace with the managed VNet and data exfiltration protection on and public network access off by default, wires the Entra ID admin, optionally stands up a dedicated SQL pool and a Spark pool, and exposes the connectivity endpoints and managed-identity principal ID as outputs so downstream RBAC and private DNS can be wired without copy-pasting GUIDs.

When to use it

Module structure

terraform-module-azure-synapse/
├── versions.tf      # provider + Terraform version pins
├── main.tf          # ADLS Gen2 FS, workspace, AAD admin, firewall, SQL + Spark pools
├── variables.tf     # var-driven inputs with validations
└── outputs.tf       # ids, endpoints, managed identity principal

versions.tf

terraform {
  required_version = ">= 1.6.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

main.tf

locals {
  # Workspace names must be globally unique, 1-50 chars, lowercase letters/numbers.
  workspace_name = lower(var.workspace_name)

  tags = merge(
    {
      module    = "terraform-module-azure-synapse"
      managedBy = "terraform"
    },
    var.tags
  )
}

# ---------------------------------------------------------------------------
# ADLS Gen2 filesystem that backs the workspace.
# The container lives in an existing hierarchical-namespace (Gen2) storage
# account whose resource ID is passed in via var.storage_account_id.
# ---------------------------------------------------------------------------
resource "azurerm_storage_data_lake_gen2_filesystem" "this" {
  name               = var.filesystem_name
  storage_account_id = var.storage_account_id
}

# ---------------------------------------------------------------------------
# Synapse workspace
# ---------------------------------------------------------------------------
resource "azurerm_synapse_workspace" "this" {
  name                                 = local.workspace_name
  resource_group_name                  = var.resource_group_name
  location                             = var.location
  storage_data_lake_gen2_filesystem_id = azurerm_storage_data_lake_gen2_filesystem.this.id

  # Secure-by-default posture.
  managed_virtual_network_enabled      = var.managed_virtual_network_enabled
  public_network_access_enabled        = var.public_network_access_enabled
  sql_identity_control_enabled         = true
  data_exfiltration_protection_enabled = var.data_exfiltration_protection_enabled
  managed_resource_group_name          = var.managed_resource_group_name
  linking_allowed_for_aad_tenant_ids   = var.data_exfiltration_protection_enabled ? var.allowed_aad_tenant_ids : null

  # No SQL administrator login is declared here. Omitting it does not by
  # itself disable SQL authentication; the provider's
  # azuread_authentication_only argument is the switch for that.

  # Optional customer-managed key for double encryption at rest.
  dynamic "customer_managed_key" {
    for_each = var.cmk_key_versionless_id != null ? [1] : []
    content {
      key_versionless_id = var.cmk_key_versionless_id
      key_name           = "synapsecmk"
    }
  }

  identity {
    type = "SystemAssigned"
  }

  tags = local.tags
}

# ---------------------------------------------------------------------------
# Entra ID (Azure AD) administrator. The workspace resource in azurerm 4.x has
# no inline admin block, so the admin is assigned through a separate resource.
# ---------------------------------------------------------------------------
resource "azurerm_synapse_workspace_aad_admin" "this" {
  count = var.aad_administrator != null ? 1 : 0

  synapse_workspace_id = azurerm_synapse_workspace.this.id
  login                = var.aad_administrator.login
  object_id            = var.aad_administrator.object_id
  tenant_id            = var.aad_administrator.tenant_id
}

# ---------------------------------------------------------------------------
# Firewall rules. Default to NONE; the caller passes explicit CIDR-equivalent
# start/end IPs. The special "AllowAllWindowsAzureIps" 0.0.0.0 rule is gated
# behind its own variable so it is a conscious, reviewable decision.
# ---------------------------------------------------------------------------
resource "azurerm_synapse_firewall_rule" "rules" {
  for_each = { for r in var.firewall_rules : r.name => r }

  name                 = each.value.name
  synapse_workspace_id = azurerm_synapse_workspace.this.id
  start_ip_address     = each.value.start_ip_address
  end_ip_address       = each.value.end_ip_address
}

resource "azurerm_synapse_firewall_rule" "allow_azure_services" {
  count = var.allow_azure_services ? 1 : 0

  name                 = "AllowAllWindowsAzureIps"
  synapse_workspace_id = azurerm_synapse_workspace.this.id
  start_ip_address     = "0.0.0.0"
  end_ip_address       = "0.0.0.0"
}

# ---------------------------------------------------------------------------
# Optional dedicated SQL pool (provisioned DW compute, Gen2 DWUs).
# ---------------------------------------------------------------------------
resource "azurerm_synapse_sql_pool" "this" {
  count = var.dedicated_sql_pool != null ? 1 : 0

  name                 = var.dedicated_sql_pool.name
  synapse_workspace_id = azurerm_synapse_workspace.this.id
  sku_name             = var.dedicated_sql_pool.sku_name
  create_mode          = "Default"
  storage_account_type = var.dedicated_sql_pool.storage_account_type
  collation            = var.dedicated_sql_pool.collation

  geo_backup_policy_enabled = var.dedicated_sql_pool.geo_backup_policy_enabled

  tags = local.tags
}

# ---------------------------------------------------------------------------
# Optional Apache Spark pool with autoscale + auto-pause.
# ---------------------------------------------------------------------------
resource "azurerm_synapse_spark_pool" "this" {
  count = var.spark_pool != null ? 1 : 0

  name                 = var.spark_pool.name
  synapse_workspace_id = azurerm_synapse_workspace.this.id
  node_size_family     = var.spark_pool.node_size_family
  node_size            = var.spark_pool.node_size
  spark_version        = var.spark_pool.spark_version
  cache_size           = var.spark_pool.cache_size

  auto_scale {
    min_node_count = var.spark_pool.min_node_count
    max_node_count = var.spark_pool.max_node_count
  }

  auto_pause {
    delay_in_minutes = var.spark_pool.auto_pause_delay_in_minutes
  }

  tags = local.tags
}
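
One combination main.tf does not guard against: data exfiltration protection is only supported on workspaces that also have the managed virtual network enabled, so those two inputs need to agree. A minimal sketch of a plan-time guard follows, assuming it is added alongside the resources above; the terraform_data resource and its name are hypothetical and not part of the module as published.

```hcl
# Fails the plan early if exfiltration protection is requested without a
# managed VNet, instead of surfacing an error partway through apply.
resource "terraform_data" "network_guard" {
  lifecycle {
    precondition {
      condition     = !var.data_exfiltration_protection_enabled || var.managed_virtual_network_enabled
      error_message = "data_exfiltration_protection_enabled requires managed_virtual_network_enabled = true."
    }
  }
}
```

terraform_data is built into Terraform 1.4 and later, so it fits within the required_version pin in versions.tf without adding a provider.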

variables.tf

variable "workspace_name" {
  description = "Globally unique Synapse workspace name (1-50 chars, lowercase letters and numbers)."
  type        = string

  validation {
    condition     = can(regex("^[a-z0-9]{1,50}$", lower(var.workspace_name)))
    error_message = "workspace_name must be 1-50 lowercase alphanumeric characters."
  }
}

variable "resource_group_name" {
  description = "Resource group that holds the workspace."
  type        = string
}

variable "location" {
  description = "Azure region for the workspace (e.g. centralindia, eastus)."
  type        = string
}

variable "storage_account_id" {
  description = "Resource ID of an existing ADLS Gen2 (hierarchical-namespace) storage account that backs the workspace."
  type        = string

  validation {
    condition     = can(regex("/providers/Microsoft.Storage/storageAccounts/", var.storage_account_id))
    error_message = "storage_account_id must be a Microsoft.Storage storageAccounts resource ID."
  }
}

variable "filesystem_name" {
  description = "Name of the Gen2 filesystem (container) created to back the workspace."
  type        = string
  default     = "synapsefs"

  validation {
    condition     = can(regex("^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$", var.filesystem_name))
    error_message = "filesystem_name must be 3-63 chars, lowercase alphanumeric or hyphen, not starting/ending with a hyphen."
  }
}

variable "managed_resource_group_name" {
  description = "Optional name for the managed resource group Synapse creates for workspace-internal resources. Null lets Azure name it."
  type        = string
  default     = null
}

variable "managed_virtual_network_enabled" {
  description = "Enable the Synapse managed virtual network for network isolation of Spark/integration compute."
  type        = bool
  default     = true
}

variable "public_network_access_enabled" {
  description = "Allow public network access to the workspace endpoints. Set false when using private endpoints only."
  type        = bool
  default     = false
}

variable "data_exfiltration_protection_enabled" {
  description = "Enable data exfiltration protection. When true, outbound connectivity is restricted to allowed AAD tenants."
  type        = bool
  default     = true
}

variable "allowed_aad_tenant_ids" {
  description = "Tenant IDs allowed for outbound linking when data exfiltration protection is enabled. The workspace's own tenant is always included."
  type        = list(string)
  default     = []
}

variable "aad_administrator" {
  description = "Entra ID (Azure AD) administrator for the workspace. Strongly recommended; null disables AAD admin assignment."
  type = object({
    login     = string
    object_id = string
    tenant_id = string
  })
  default = null
}

variable "cmk_key_versionless_id" {
  description = "Versionless Key Vault key ID for customer-managed double encryption at rest. Null uses Microsoft-managed keys."
  type        = string
  default     = null
}

variable "allow_azure_services" {
  description = "Create the 0.0.0.0 'AllowAllWindowsAzureIps' firewall rule so Azure services can reach the workspace. Use deliberately."
  type        = bool
  default     = false
}

variable "firewall_rules" {
  description = "Explicit IP firewall rules for the SQL endpoints."
  type = list(object({
    name             = string
    start_ip_address = string
    end_ip_address   = string
  }))
  default = []
}

variable "dedicated_sql_pool" {
  description = "Optional dedicated SQL pool (provisioned DW). Null skips it. sku_name is a DWU tier such as DW100c..DW30000c."
  type = object({
    name                      = string
    sku_name                  = string
    storage_account_type      = optional(string, "GRS")
    collation                 = optional(string, "SQL_Latin1_General_CP1_CI_AS")
    geo_backup_policy_enabled = optional(bool, true)
  })
  default = null

  validation {
    condition     = var.dedicated_sql_pool == null ? true : can(regex("^DW[0-9]+c$", var.dedicated_sql_pool.sku_name))
    error_message = "dedicated_sql_pool.sku_name must be a Gen2 DWU tier like DW100c, DW500c, or DW1000c."
  }

  validation {
    condition     = var.dedicated_sql_pool == null ? true : contains(["GRS", "LRS"], var.dedicated_sql_pool.storage_account_type)
    error_message = "dedicated_sql_pool.storage_account_type must be GRS or LRS."
  }
}

variable "spark_pool" {
  description = "Optional Apache Spark pool with autoscale + auto-pause. Null skips it."
  type = object({
    name                        = string
    node_size_family            = optional(string, "MemoryOptimized")
    node_size                   = optional(string, "Small")
    spark_version               = optional(string, "3.4")
    cache_size                  = optional(number, 50)
    min_node_count              = optional(number, 3)
    max_node_count              = optional(number, 10)
    auto_pause_delay_in_minutes = optional(number, 15)
  })
  default = null

  validation {
    condition     = var.spark_pool == null ? true : contains(["Small", "Medium", "Large", "XLarge", "XXLarge", "XXXLarge"], var.spark_pool.node_size)
    error_message = "spark_pool.node_size must be one of Small, Medium, Large, XLarge, XXLarge, XXXLarge."
  }

  validation {
    condition     = var.spark_pool == null ? true : (var.spark_pool.min_node_count >= 3 && var.spark_pool.max_node_count >= var.spark_pool.min_node_count)
    error_message = "spark_pool.min_node_count must be >= 3 and max_node_count must be >= min_node_count."
  }
}

variable "tags" {
  description = "Tags merged onto all created resources."
  type        = map(string)
  default     = {}
}
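
The firewall_rules variable above accepts any strings for its start and end addresses, so a typo only surfaces at apply time. The following is a hedged sketch of how the same variable could be declared with an IPv4 check; the validation block is an illustrative addition, not part of the module as published.

```hcl
variable "firewall_rules" {
  description = "Explicit IP firewall rules for the SQL endpoints."
  type = list(object({
    name             = string
    start_ip_address = string
    end_ip_address   = string
  }))
  default = []

  validation {
    # regex() checks the dotted-quad shape; cidrhost() then rejects
    # out-of-range octets such as 999. An empty list passes trivially.
    condition = alltrue([
      for r in var.firewall_rules : alltrue([
        for ip in [r.start_ip_address, r.end_ip_address] :
        can(regex("^[0-9]{1,3}(\\.[0-9]{1,3}){3}$", ip)) && can(cidrhost("${ip}/32", 0))
      ])
    ])
    error_message = "Each firewall rule needs valid IPv4 start_ip_address and end_ip_address values."
  }
}
```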

outputs.tf

output "workspace_id" {
  description = "Resource ID of the Synapse workspace."
  value       = azurerm_synapse_workspace.this.id
}

output "workspace_name" {
  description = "Name of the Synapse workspace."
  value       = azurerm_synapse_workspace.this.name
}

output "connectivity_endpoints" {
  description = "Map of workspace connectivity endpoints (web, dev, sql, sqlOnDemand, etc.)."
  value       = azurerm_synapse_workspace.this.connectivity_endpoints
}

output "identity_principal_id" {
  description = "Principal ID of the workspace system-assigned managed identity, for RBAC grants (e.g. Storage Blob Data Contributor)."
  value       = azurerm_synapse_workspace.this.identity[0].principal_id
}

output "filesystem_id" {
  description = "Resource ID of the ADLS Gen2 filesystem backing the workspace."
  value       = azurerm_storage_data_lake_gen2_filesystem.this.id
}

output "dedicated_sql_pool_id" {
  description = "Resource ID of the dedicated SQL pool, or null if not created."
  value       = try(azurerm_synapse_sql_pool.this[0].id, null)
}

output "spark_pool_id" {
  description = "Resource ID of the Spark pool, or null if not created."
  value       = try(azurerm_synapse_spark_pool.this[0].id, null)
}
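
To show how these outputs might be consumed, here is a small sketch of root-module outputs that pick individual hostnames out of connectivity_endpoints. It assumes the module call is named "synapse" as in the Quickstart and that the map carries the sql and sqlOnDemand keys listed in the output description; lookup() returns null if a key is absent.

```hcl
# Root-module outputs (illustrative names).
output "synapse_dedicated_sql_endpoint" {
  description = "Endpoint of the dedicated SQL pools."
  value       = lookup(module.synapse.connectivity_endpoints, "sql", null)
}

output "synapse_serverless_sql_endpoint" {
  description = "Endpoint of the serverless (on-demand) SQL pool."
  value       = lookup(module.synapse.connectivity_endpoints, "sqlOnDemand", null)
}
```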

How to use it

data "azurerm_client_config" "current" {}

resource "azurerm_storage_account" "lake" {
  name                     = "stkvanalyticsprod"
  resource_group_name      = "rg-analytics-prod"
  location                 = "centralindia"
  account_tier             = "Standard"
  account_replication_type = "GRS"
  account_kind             = "StorageV2"
  is_hns_enabled           = true # ADLS Gen2 hierarchical namespace — REQUIRED for Synapse
}

module "synapse_analytics" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-synapse?ref=v1.0.0"

  workspace_name      = "synwkloudvinprod"
  resource_group_name = "rg-analytics-prod"
  location            = "centralindia"
  storage_account_id  = azurerm_storage_account.lake.id
  filesystem_name     = "warehouse"

  # Entra-only admin (no SQL login).
  aad_administrator = {
    login     = "sg-synapse-admins"
    object_id = "00000000-0000-0000-0000-000000000000" # the security group's object ID
    tenant_id = data.azurerm_client_config.current.tenant_id
  }

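  # Private-by-default. IP firewall rules such as corp-egress below are only
  # enforced while public network access is enabled; with it off, clients
  # connect through private endpoints.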
  public_network_access_enabled        = false
  managed_virtual_network_enabled      = true
  data_exfiltration_protection_enabled = true
  allowed_aad_tenant_ids               = [data.azurerm_client_config.current.tenant_id]

  firewall_rules = [
    {
      name             = "corp-egress"
      start_ip_address = "203.0.113.0"
      end_ip_address   = "203.0.113.255"
    }
  ]

  dedicated_sql_pool = {
    name     = "edw"
    sku_name = "DW500c"
  }

  spark_pool = {
    name                        = "etl"
    node_size                   = "Medium"
    min_node_count              = 3
    max_node_count              = 12
    auto_pause_delay_in_minutes = 10
  }

  tags = {
    environment = "prod"
    costCenter  = "data-platform"
  }
}

# Downstream: grant the workspace managed identity data-plane access to the lake
# so pipelines and pools can read/write. Uses the module's identity_principal_id output.
resource "azurerm_role_assignment" "synapse_to_lake" {
  scope                = azurerm_storage_account.lake.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = module.synapse_analytics.identity_principal_id
}
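
Because the example turns public network access off, clients need a private path to the workspace. A minimal sketch of a private endpoint for the dedicated SQL endpoint follows, using the module's workspace_id output. The variable, resource names and subnet are hypothetical, and private DNS zone wiring is left out for brevity.

```hcl
variable "private_endpoint_subnet_id" {
  description = "Subnet that hosts the private endpoint (hypothetical input)."
  type        = string
}

resource "azurerm_private_endpoint" "synapse_sql" {
  name                = "pe-synapse-sql" # illustrative name
  resource_group_name = "rg-analytics-prod"
  location            = "centralindia"
  subnet_id           = var.private_endpoint_subnet_id

  private_service_connection {
    name                           = "psc-synapse-sql" # illustrative name
    private_connection_resource_id = module.synapse_analytics.workspace_id
    # "Sql" targets dedicated SQL; "SqlOnDemand" and "Dev" are separate
    # sub-resources that each need their own private endpoint.
    subresource_names    = ["Sql"]
    is_manual_connection = false
  }
}
```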

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config, live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm state bucket/container + key per path...
  }
}

2. Module config, live/prod/synapse/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-synapse?ref=v1.0.0"
}

inputs = {
  workspace_name      = "..."
  resource_group_name = "..."
  location            = "..."
  storage_account_id  = "..."
}

3. Deploy one environment, or roll out all modules together:

cd live/prod/synapse && terragrunt apply   # this module only
terragrunt run-all apply                   # run from live/prod instead: every module under it

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs are overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules; for a single stack, the plain Quickstart above is enough.
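
The dependency orchestration mentioned above relies on dependency blocks. As a rough sketch, the module config could read the storage account ID from a sibling unit instead of a hard-coded string; the ../lake path and the storage_account_id output name are assumptions for illustration, not something this module ships.

```hcl
# live/prod/synapse/terragrunt.hcl (sketch)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-synapse?ref=v1.0.0"
}

# Hypothetical sibling unit that creates the ADLS Gen2 storage account.
dependency "lake" {
  config_path = "../lake"
}

inputs = {
  workspace_name      = "..."
  resource_group_name = "..."
  location            = "..."
  # Hypothetical output exposed by the sibling unit.
  storage_account_id = dependency.lake.outputs.storage_account_id
}
```

With this in place, run-all applies the lake unit before the Synapse unit because the dependency block declares the ordering.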

Inputs

| Name | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| workspace_name | string | - | Yes | Globally unique workspace name, 1-50 lowercase alphanumeric chars. |
| resource_group_name | string | - | Yes | Resource group that holds the workspace. |
| location | string | - | Yes | Azure region (e.g. centralindia). |
| storage_account_id | string | - | Yes | Resource ID of an existing ADLS Gen2 (HNS-enabled) storage account. |
| filesystem_name | string | "synapsefs" | No | Gen2 filesystem/container created to back the workspace (3-63 chars). |
| managed_resource_group_name | string | null | No | Name for the Synapse-managed resource group; null lets Azure name it. |
| managed_virtual_network_enabled | bool | true | No | Enable the Synapse managed VNet for Spark/integration compute isolation. |
| public_network_access_enabled | bool | false | No | Allow public access to workspace endpoints; false for private-endpoint-only. |
| data_exfiltration_protection_enabled | bool | true | No | Restrict outbound connectivity to allowed AAD tenants. |
| allowed_aad_tenant_ids | list(string) | [] | No | Tenant IDs allowed for outbound linking when exfiltration protection is on. |
| aad_administrator | object | null | No | Entra ID admin { login, object_id, tenant_id }; strongly recommended. |
| cmk_key_versionless_id | string | null | No | Versionless Key Vault key ID for customer-managed double encryption. |
| allow_azure_services | bool | false | No | Create the 0.0.0.0 AllowAllWindowsAzureIps firewall rule. Use deliberately. |
| firewall_rules | list(object) | [] | No | Explicit IP firewall rules { name, start_ip_address, end_ip_address }. |
| dedicated_sql_pool | object | null | No | Optional dedicated SQL pool; sku_name is a Gen2 DWU tier (DW100c to DW30000c). |
| spark_pool | object | null | No | Optional Apache Spark pool with autoscale + auto-pause. |
| tags | map(string) | {} | No | Tags merged onto all created resources. |

Outputs

| Name | Description |
| --- | --- |
| workspace_id | Resource ID of the Synapse workspace. |
| workspace_name | Name of the Synapse workspace. |
| connectivity_endpoints | Map of endpoints (web, dev, sql, sqlOnDemand) for clients and tooling. |
| identity_principal_id | Principal ID of the workspace system-assigned managed identity (for RBAC grants). |
| filesystem_id | Resource ID of the ADLS Gen2 filesystem backing the workspace. |
| dedicated_sql_pool_id | Resource ID of the dedicated SQL pool, or null if not created. |
| spark_pool_id | Resource ID of the Spark pool, or null if not created. |

Enterprise scenario

A retail data-platform team runs a central lakehouse for finance and merchandising. They deploy this module once per environment from a pipeline: prod gets a DW1000c dedicated SQL pool for the nightly enterprise data warehouse load plus a Medium Spark pool (auto-paused after 10 minutes) for PySpark ETL, while dev runs no dedicated pool and a Small Spark pool capped at five nodes to hold the bill down. Public network access is off in every environment, with reachability limited to the corporate egress CIDR and private endpoints; the module’s identity_principal_id output feeds a Storage Blob Data Contributor grant so analysts’ notebooks and pipelines can read curated zones without anyone hand-managing a service principal.
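
One way to express the dev/prod split described above in plain Terraform is a per-environment lookup. The sketch below is an assumption about how such a team might structure it, not their actual configuration: the environment variable, the local names and the pool names are all hypothetical, and the values not stated in the scenario (prod max_node_count, dev auto-pause delay) simply restate the module defaults.

```hcl
variable "environment" {
  description = "Deployment environment (hypothetical input)."
  type        = string

  validation {
    condition     = contains(["dev", "prod"], var.environment)
    error_message = "environment must be dev or prod."
  }
}

locals {
  # dev: small pool capped at five nodes; prod: Medium, auto-paused after 10 minutes.
  spark_pool_by_env = {
    dev  = { name = "etl", node_size = "Small", max_node_count = 5, auto_pause_delay_in_minutes = 15 }
    prod = { name = "etl", node_size = "Medium", max_node_count = 10, auto_pause_delay_in_minutes = 10 }
  }

  # dev runs no dedicated pool; prod gets DW1000c.
  dedicated_sql_pool_by_env = {
    dev  = null
    prod = { name = "edw", sku_name = "DW1000c" }
  }
}

module "synapse" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-synapse?ref=v1.0.0"

  workspace_name      = "..."
  resource_group_name = "..."
  location            = "..."
  storage_account_id  = "..."

  spark_pool         = local.spark_pool_by_env[var.environment]
  dedicated_sql_pool = local.dedicated_sql_pool_by_env[var.environment]
}
```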

Best practices


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
