Terraform Module: Azure Synapse Analytics — a governed, private-by-default workspace with pools you can scale on demand

Quick take — A reusable hashicorp/azurerm ~> 4.0 module for Azure Synapse Analytics: ADLS Gen2-backed workspace, optional dedicated/Spark pools, Entra ID admin, managed VNet, and firewall — wired for production. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "azurerm" {
  features {}
  subscription_id = "..."  # Required by azurerm 4.x unless ARM_SUBSCRIPTION_ID is set.
}

module "synapse" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-synapse?ref=v1.0.0"

  workspace_name      = "..."  # Globally unique workspace name, 1-50 lowercase alphanum…
  resource_group_name = "..."  # Resource group that holds the workspace.
  location            = "..."  # Azure region (e.g. `centralindia`).
  storage_account_id  = "..."  # Resource ID of an existing ADLS Gen2 (HNS-enabled) stor…
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.

What this module is

Azure Synapse Analytics is Microsoft’s unified analytics platform. It combines a serverless SQL endpoint, provisioned dedicated SQL pools (formerly Azure SQL Data Warehouse, sized in Gen2 DWUs), Apache Spark pools, and pipelines/integration runtimes, all anchored on an Azure Data Lake Storage (ADLS) Gen2 filesystem that the workspace treats as its primary storage. The azurerm_synapse_workspace resource is the control-plane object that everything else hangs off: pools, firewall rules, the managed virtual network, Entra ID administrators, and the managed identity used to reach storage and Key Vault.

A bare azurerm_synapse_workspace is deceptively simple to declare but easy to ship insecurely, and its defaults invite footguns. A workspace with no firewall rules is unreachable over its public endpoints, but the common "fix", an AllowAll 0.0.0.0–255.255.255.255 rule, exposes the serverless and dedicated SQL endpoints to the entire internet. The managed VNet is opt-in. Double encryption with a customer-managed key (CMK) is opt-in. And the storage account behind it must be a hierarchical-namespace (Gen2) account with the workspace’s managed identity granted Storage Blob Data Contributor, or workloads that read and write the lake (Spark jobs, pipeline runs) fail at runtime with opaque errors.

This module wraps all of that into one var-driven unit: it creates the ADLS Gen2 filesystem that backs the workspace inside an existing HNS-enabled storage account, creates the workspace with a managed VNet and SQL-AAD-only (Entra-only) authentication on by default, wires the Entra ID admin, optionally stands up a dedicated SQL pool and a Spark pool, and exposes the connectivity endpoints and managed-identity principal ID as outputs so downstream RBAC and private DNS can be wired without copy-pasting GUIDs.

When to use it

You are building a lakehouse or enterprise data warehouse on Azure and want serverless SQL over your data lake plus optional provisioned dedicated SQL / Spark compute, governed as code.
You need repeatable, compliant environments (dev/test/prod) where SQL-AAD-only auth, managed VNet isolation, and firewall scoping are enforced by default rather than remembered per-deployment.
You want pool lifecycle (dedicated SQL DWxxxc, Spark autoscale + auto-pause) expressed as Terraform variables so cost knobs are reviewable in a pull request.
You are NOT looking for ad-hoc one-off analytics — for a throwaway notebook, Synapse is heavy. Reach for this module when the workspace is a shared, long-lived platform asset.
Skip it if you only need serverless SQL over a single storage account with no provisioned compute and no governance requirements; a thinner setup may suffice.

Module structure

terraform-module-azure-synapse/
├── versions.tf      # provider + Terraform version pins
├── main.tf          # ADLS Gen2 FS, workspace, AAD admin, firewall, SQL + Spark pools
├── variables.tf     # var-driven inputs with validations
└── outputs.tf       # ids, endpoints, managed identity principal

versions.tf

terraform {
  required_version = ">= 1.6.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

main.tf

locals {
  # Workspace names must be globally unique, 1-50 chars, lowercase letters/numbers.
  workspace_name = lower(var.workspace_name)

  tags = merge(
    {
      module    = "terraform-module-azure-synapse"
      managedBy = "terraform"
    },
    var.tags
  )
}

# ---------------------------------------------------------------------------
# ADLS Gen2 filesystem that backs the workspace.
# The container lives in an existing hierarchical-namespace (Gen2) storage
# account whose resource ID is passed in via var.storage_account_id.
# ---------------------------------------------------------------------------
resource "azurerm_storage_data_lake_gen2_filesystem" "this" {
  name               = var.filesystem_name
  storage_account_id = var.storage_account_id
}

# ---------------------------------------------------------------------------
# Synapse workspace
# ---------------------------------------------------------------------------
resource "azurerm_synapse_workspace" "this" {
  name                                 = local.workspace_name
  resource_group_name                  = var.resource_group_name
  location                             = var.location
  storage_data_lake_gen2_filesystem_id = azurerm_storage_data_lake_gen2_filesystem.this.id

  # Secure-by-default posture.
  managed_virtual_network_enabled               = var.managed_virtual_network_enabled
  public_network_access_enabled                 = var.public_network_access_enabled
  sql_identity_control_enabled                  = true
  data_exfiltration_protection_enabled          = var.data_exfiltration_protection_enabled
  managed_resource_group_name                   = var.managed_resource_group_name
  linking_allowed_for_aad_tenant_ids            = var.data_exfiltration_protection_enabled ? var.allowed_aad_tenant_ids : null

  # Entra-only authentication: no SQL-auth administrator is configured, so
  # SQL logins are disabled and access is granted through Entra ID.
  azuread_authentication_only = true

  # Optional customer-managed key for double encryption at rest.
  dynamic "customer_managed_key" {
    for_each = var.cmk_key_versionless_id != null ? [1] : []
    content {
      key_versionless_id = var.cmk_key_versionless_id
      key_name           = "synapsecmk"
    }
  }

  identity {
    type = "SystemAssigned"
  }

  tags = local.tags
}

# ---------------------------------------------------------------------------
# Entra ID (Azure AD) administrator, managed as a separate resource because
# azurerm_synapse_workspace in azurerm 4.x has no inline admin block.
# ---------------------------------------------------------------------------
resource "azurerm_synapse_workspace_aad_admin" "this" {
  count = var.aad_administrator != null ? 1 : 0

  synapse_workspace_id = azurerm_synapse_workspace.this.id
  login                = var.aad_administrator.login
  object_id            = var.aad_administrator.object_id
  tenant_id            = var.aad_administrator.tenant_id
}

# ---------------------------------------------------------------------------
# Firewall rules. Default to NONE; the caller passes explicit start/end IPs.
# Rules only take effect while public_network_access_enabled = true. The
# special "AllowAllWindowsAzureIps" 0.0.0.0 rule is gated behind its own
# variable so it is a conscious, reviewable decision.
# ---------------------------------------------------------------------------
resource "azurerm_synapse_firewall_rule" "rules" {
  for_each = { for r in var.firewall_rules : r.name => r }

  name                 = each.value.name
  synapse_workspace_id = azurerm_synapse_workspace.this.id
  start_ip_address     = each.value.start_ip_address
  end_ip_address       = each.value.end_ip_address
}

resource "azurerm_synapse_firewall_rule" "allow_azure_services" {
  count = var.allow_azure_services ? 1 : 0

  name                 = "AllowAllWindowsAzureIps"
  synapse_workspace_id = azurerm_synapse_workspace.this.id
  start_ip_address     = "0.0.0.0"
  end_ip_address       = "0.0.0.0"
}

# ---------------------------------------------------------------------------
# Optional dedicated SQL pool (provisioned DW compute, Gen2 DWUs).
# ---------------------------------------------------------------------------
resource "azurerm_synapse_sql_pool" "this" {
  count = var.dedicated_sql_pool != null ? 1 : 0

  name                 = var.dedicated_sql_pool.name
  synapse_workspace_id = azurerm_synapse_workspace.this.id
  sku_name             = var.dedicated_sql_pool.sku_name
  create_mode          = "Default"
  storage_account_type = var.dedicated_sql_pool.storage_account_type
  collation            = var.dedicated_sql_pool.collation

  geo_backup_policy_enabled = var.dedicated_sql_pool.geo_backup_policy_enabled

  tags = local.tags
}

# ---------------------------------------------------------------------------
# Optional Apache Spark pool with autoscale + auto-pause.
# ---------------------------------------------------------------------------
resource "azurerm_synapse_spark_pool" "this" {
  count = var.spark_pool != null ? 1 : 0

  name                 = var.spark_pool.name
  synapse_workspace_id = azurerm_synapse_workspace.this.id
  node_size_family     = var.spark_pool.node_size_family
  node_size            = var.spark_pool.node_size
  spark_version        = var.spark_pool.spark_version
  cache_size           = var.spark_pool.cache_size

  auto_scale {
    min_node_count = var.spark_pool.min_node_count
    max_node_count = var.spark_pool.max_node_count
  }

  auto_pause {
    delay_in_minutes = var.spark_pool.auto_pause_delay_in_minutes
  }

  tags = local.tags
}

variables.tf

variable "workspace_name" {
  description = "Globally unique Synapse workspace name (1-50 chars, lowercase letters and numbers)."
  type        = string

  validation {
    condition     = can(regex("^[a-z0-9]{1,50}$", lower(var.workspace_name)))
    error_message = "workspace_name must be 1-50 lowercase alphanumeric characters."
  }
}

variable "resource_group_name" {
  description = "Resource group that holds the workspace."
  type        = string
}

variable "location" {
  description = "Azure region for the workspace (e.g. centralindia, eastus)."
  type        = string
}

variable "storage_account_id" {
  description = "Resource ID of an existing ADLS Gen2 (hierarchical-namespace) storage account that backs the workspace."
  type        = string

  validation {
    condition     = can(regex("/providers/Microsoft.Storage/storageAccounts/", var.storage_account_id))
    error_message = "storage_account_id must be a Microsoft.Storage storageAccounts resource ID."
  }
}

variable "filesystem_name" {
  description = "Name of the Gen2 filesystem (container) created to back the workspace."
  type        = string
  default     = "synapsefs"

  validation {
    condition     = can(regex("^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$", var.filesystem_name))
    error_message = "filesystem_name must be 3-63 chars, lowercase alphanumeric or hyphen, not starting/ending with a hyphen."
  }
}

variable "managed_resource_group_name" {
  description = "Optional name for the managed resource group Synapse creates for workspace-internal resources. Null lets Azure name it."
  type        = string
  default     = null
}

variable "managed_virtual_network_enabled" {
  description = "Enable the Synapse managed virtual network for network isolation of Spark/integration compute."
  type        = bool
  default     = true
}

variable "public_network_access_enabled" {
  description = "Allow public network access to the workspace endpoints. Set false when using private endpoints only."
  type        = bool
  default     = false
}

variable "data_exfiltration_protection_enabled" {
  description = "Enable data exfiltration protection. When true, outbound connectivity is restricted to allowed AAD tenants."
  type        = bool
  default     = true
}

variable "allowed_aad_tenant_ids" {
  description = "Tenant IDs allowed for outbound linking when data exfiltration protection is enabled. The workspace's own tenant is always included."
  type        = list(string)
  default     = []
}

variable "aad_administrator" {
  description = "Entra ID (Azure AD) administrator for the workspace. Strongly recommended; null disables AAD admin assignment."
  type = object({
    login     = string
    object_id = string
    tenant_id = string
  })
  default = null
}

variable "cmk_key_versionless_id" {
  description = "Versionless Key Vault key ID for customer-managed double encryption at rest. Null uses Microsoft-managed keys."
  type        = string
  default     = null
}

variable "allow_azure_services" {
  description = "Create the 0.0.0.0 'AllowAllWindowsAzureIps' firewall rule so Azure services can reach the workspace. Use deliberately."
  type        = bool
  default     = false
}

variable "firewall_rules" {
  description = "Explicit IP firewall rules for the SQL endpoints. Only effective when public_network_access_enabled = true."
  type = list(object({
    name             = string
    start_ip_address = string
    end_ip_address   = string
  }))
  default = []
}

variable "dedicated_sql_pool" {
  description = "Optional dedicated SQL pool (provisioned DW). Null skips it. sku_name is a DWU tier such as DW100c..DW30000c."
  type = object({
    name                      = string
    sku_name                  = string
    storage_account_type      = optional(string, "GRS")
    collation                 = optional(string, "SQL_Latin1_General_CP1_CI_AS")
    geo_backup_policy_enabled = optional(bool, true)
  })
  default = null

  validation {
    condition = var.dedicated_sql_pool == null ? true : can(regex("^DW[0-9]+c$", var.dedicated_sql_pool.sku_name))
    error_message = "dedicated_sql_pool.sku_name must be a Gen2 DWU tier like DW100c, DW500c, or DW1000c."
  }

  validation {
    condition = var.dedicated_sql_pool == null ? true : contains(["GRS", "LRS"], var.dedicated_sql_pool.storage_account_type)
    error_message = "dedicated_sql_pool.storage_account_type must be GRS or LRS."
  }
}

variable "spark_pool" {
  description = "Optional Apache Spark pool with autoscale + auto-pause. Null skips it."
  type = object({
    name                        = string
    node_size_family            = optional(string, "MemoryOptimized")
    node_size                   = optional(string, "Small")
    spark_version               = optional(string, "3.4")
    cache_size                  = optional(number, 50)
    min_node_count              = optional(number, 3)
    max_node_count              = optional(number, 10)
    auto_pause_delay_in_minutes = optional(number, 15)
  })
  default = null

  validation {
    condition = var.spark_pool == null ? true : contains(["Small", "Medium", "Large", "XLarge", "XXLarge", "XXXLarge"], var.spark_pool.node_size)
    error_message = "spark_pool.node_size must be one of Small, Medium, Large, XLarge, XXLarge, XXXLarge."
  }

  validation {
    condition = var.spark_pool == null ? true : (var.spark_pool.min_node_count >= 3 && var.spark_pool.max_node_count >= var.spark_pool.min_node_count)
    error_message = "spark_pool.min_node_count must be >= 3 and max_node_count must be >= min_node_count."
  }
}

variable "tags" {
  description = "Tags merged onto all created resources."
  type        = map(string)
  default     = {}
}

outputs.tf

output "workspace_id" {
  description = "Resource ID of the Synapse workspace."
  value       = azurerm_synapse_workspace.this.id
}

output "workspace_name" {
  description = "Name of the Synapse workspace."
  value       = azurerm_synapse_workspace.this.name
}

output "connectivity_endpoints" {
  description = "Map of workspace connectivity endpoints (web, dev, sql, sqlOnDemand, etc.)."
  value       = azurerm_synapse_workspace.this.connectivity_endpoints
}

output "identity_principal_id" {
  description = "Principal ID of the workspace system-assigned managed identity, for RBAC grants (e.g. Storage Blob Data Contributor)."
  value       = azurerm_synapse_workspace.this.identity[0].principal_id
}

output "filesystem_id" {
  description = "Resource ID of the ADLS Gen2 filesystem backing the workspace."
  value       = azurerm_storage_data_lake_gen2_filesystem.this.id
}

output "dedicated_sql_pool_id" {
  description = "Resource ID of the dedicated SQL pool, or null if not created."
  value       = try(azurerm_synapse_sql_pool.this[0].id, null)
}

output "spark_pool_id" {
  description = "Resource ID of the Spark pool, or null if not created."
  value       = try(azurerm_synapse_spark_pool.this[0].id, null)
}
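Because versions.tf already requires Terraform >= 1.6, the secure defaults and input validations above can be pinned down with the native `terraform test` framework. A minimal sketch, assuming a hypothetical tests/defaults.tftest.hcl file, placeholder input values, and Azure credentials plus ARM_SUBSCRIPTION_ID in the environment (the runs only plan, but the azurerm provider still authenticates):

```hcl
# tests/defaults.tftest.hcl (hypothetical file name). Run `terraform test`
# from the module root. Every run only plans; nothing is created.

provider "azurerm" {
  features {}
}

# Placeholder inputs shared by every run block (illustrative values only).
variables {
  workspace_name      = "synwtestdefaults"
  resource_group_name = "rg-synapse-test"
  location            = "centralindia"
  storage_account_id  = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-synapse-test/providers/Microsoft.Storage/storageAccounts/stsynapsetest"
}

run "secure_defaults" {
  command = plan

  assert {
    condition     = azurerm_synapse_workspace.this.public_network_access_enabled == false
    error_message = "Public network access should be off by default."
  }

  assert {
    condition     = azurerm_synapse_workspace.this.managed_virtual_network_enabled == true
    error_message = "The managed VNet should be on by default."
  }

  assert {
    condition     = length(azurerm_synapse_firewall_rule.allow_azure_services) == 0
    error_message = "AllowAllWindowsAzureIps must not be created unless requested."
  }
}

run "rejects_non_gen2_dwu_sku" {
  command = plan

  variables {
    dedicated_sql_pool = {
      name     = "edw"
      sku_name = "DW500" # missing the Gen2 "c" suffix
    }
  }

  # The sku_name validation on var.dedicated_sql_pool should fail this plan,
  # which counts as a pass for this run.
  expect_failures = [var.dedicated_sql_pool]
}
```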

How to use it

data "azurerm_client_config" "current" {}

resource "azurerm_storage_account" "lake" {
  name                     = "stkvanalyticsprod"
  resource_group_name      = "rg-analytics-prod"
  location                 = "centralindia"
  account_tier             = "Standard"
  account_replication_type = "GRS"
  account_kind             = "StorageV2"
  is_hns_enabled           = true # ADLS Gen2 hierarchical namespace — REQUIRED for Synapse
}

module "synapse_analytics" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-synapse?ref=v1.0.0"

  workspace_name      = "synwkloudvinprod"
  resource_group_name = "rg-analytics-prod"
  location            = "centralindia"
  storage_account_id  = azurerm_storage_account.lake.id
  filesystem_name     = "warehouse"

  # Entra-only admin (no SQL login).
  aad_administrator = {
    login     = "sg-synapse-admins"
    object_id = "00000000-0000-0000-0000-000000000000" # the security group's object ID
    tenant_id = data.azurerm_client_config.current.tenant_id
  }

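  # Private-by-default: clients connect through private endpoints. The
  # corp-egress firewall rule below is a break-glass path that only takes
  # effect if public_network_access_enabled is switched to true.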
  public_network_access_enabled        = false
  managed_virtual_network_enabled      = true
  data_exfiltration_protection_enabled = true
  allowed_aad_tenant_ids               = [data.azurerm_client_config.current.tenant_id]

  firewall_rules = [
    {
      name             = "corp-egress"
      start_ip_address = "203.0.113.0"
      end_ip_address   = "203.0.113.255"
    }
  ]

  dedicated_sql_pool = {
    name     = "edw"
    sku_name = "DW500c"
  }

  spark_pool = {
    name                        = "etl"
    node_size                   = "Medium"
    min_node_count              = 3
    max_node_count              = 12
    auto_pause_delay_in_minutes = 10
  }

  tags = {
    environment = "prod"
    costCenter  = "data-platform"
  }
}

# Downstream: grant the workspace managed identity data-plane access to the lake
# so pipelines and pools can read/write. Uses the module's identity_principal_id output.
resource "azurerm_role_assignment" "synapse_to_lake" {
  scope                = azurerm_storage_account.lake.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = module.synapse_analytics.identity_principal_id
}
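With public_network_access_enabled = false, clients reach the workspace only through private endpoints, and the module's workspace_id output is what those endpoints target. A sketch for the dedicated SQL endpoint, assuming an existing subnet and an existing privatelink.sql.azuresynapse.net private DNS zone passed in through two hypothetical variables (pe_subnet_id, synapse_sql_dns_zone_id); the same pattern would repeat for the SqlOnDemand and Dev sub-resources:

```hcl
# Hypothetical network-side inputs; not part of the module.
variable "pe_subnet_id" {
  description = "Subnet that hosts the private endpoints."
  type        = string
}

variable "synapse_sql_dns_zone_id" {
  description = "Resource ID of the privatelink.sql.azuresynapse.net private DNS zone."
  type        = string
}

# Private endpoint for the dedicated SQL ("Sql") sub-resource of the workspace.
resource "azurerm_private_endpoint" "synapse_sql" {
  name                = "pe-synwkloudvinprod-sql"
  resource_group_name = "rg-analytics-prod"
  location            = "centralindia"
  subnet_id           = var.pe_subnet_id

  private_service_connection {
    name                           = "synwkloudvinprod-sql"
    private_connection_resource_id = module.synapse_analytics.workspace_id
    subresource_names              = ["Sql"]
    is_manual_connection           = false
  }

  # Registers the endpoint's private IP in the zone so the workspace's
  # SQL hostname resolves to the private address inside the VNet.
  private_dns_zone_group {
    name                 = "synapse-sql"
    private_dns_zone_ids = [var.synapse_sql_dns_zone_id]
  }
}
```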

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config — live/terragrunt.hcl (inherited by every module):

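remote_state {
  backend  = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm storage account/container + key per path...
  }
}

# The module ships no provider block; generate one for every child config.
# azurerm 4.x also needs subscription_id here or ARM_SUBSCRIPTION_ID set.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = "provider \"azurerm\" {\n  features {}\n}\n"
}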

2. Module config — live/prod/synapse/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-synapse?ref=v1.0.0"
}

inputs = {
  workspace_name      = "..."
  resource_group_name = "..."
  location            = "..."
  storage_account_id  = "..."
}

3. Deploy one environment, or roll out all modules together:

cd live/prod/synapse && terragrunt apply   # this module only
cd live/prod && terragrunt run-all apply   # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; the inputs map is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules; for a single stack, the plain Quickstart above is enough.
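As an illustration of that dependency orchestration, the prod Synapse config could take its storage account ID from a sibling storage module rather than a hard-coded string. A sketch, assuming a hypothetical live/prod/storage module that exposes a storage_account_id output; the mock_outputs exist only so validate and plan can run before that module has state:

```hcl
# live/prod/synapse/terragrunt.hcl (variant wired to a sibling module)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-synapse?ref=v1.0.0"
}

# Hypothetical sibling module that creates the HNS-enabled storage account.
# run-all applies it before this config because of this block.
dependency "lake" {
  config_path = "../storage"

  # Placeholder used only while ../storage has no outputs yet.
  mock_outputs = {
    storage_account_id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-analytics-prod/providers/Microsoft.Storage/storageAccounts/stmock"
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  workspace_name      = "synwkloudvinprod"
  resource_group_name = "rg-analytics-prod"
  location            = "centralindia"
  storage_account_id  = dependency.lake.outputs.storage_account_id
}
```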

Inputs

Name	Type	Default	Required	Description
workspace_name	string	—	Yes	Globally unique workspace name, 1-50 lowercase alphanumeric chars.
resource_group_name	string	—	Yes	Resource group that holds the workspace.
location	string	—	Yes	Azure region (e.g. `centralindia`).
storage_account_id	string	—	Yes	Resource ID of an existing ADLS Gen2 (HNS-enabled) storage account.
filesystem_name	string	`"synapsefs"`	No	Gen2 filesystem/container created to back the workspace (3-63 chars).
managed_resource_group_name	string	`null`	No	Name for the Synapse-managed resource group; null lets Azure name it.
managed_virtual_network_enabled	bool	`true`	No	Enable the Synapse managed VNet for Spark/integration compute isolation.
public_network_access_enabled	bool	`false`	No	Allow public access to workspace endpoints; false for private-endpoint-only.
data_exfiltration_protection_enabled	bool	`true`	No	Restrict outbound connectivity to allowed AAD tenants.
allowed_aad_tenant_ids	list(string)	`[]`	No	Tenant IDs allowed for outbound linking when exfiltration protection is on.
aad_administrator	object	`null`	No	Entra ID admin `{ login, object_id, tenant_id }`; strongly recommended.
cmk_key_versionless_id	string	`null`	No	Versionless Key Vault key ID for customer-managed double encryption.
allow_azure_services	bool	`false`	No	Create the 0.0.0.0 `AllowAllWindowsAzureIps` firewall rule. Use deliberately.
firewall_rules	list(object)	`[]`	No	Explicit IP firewall rules `{ name, start_ip_address, end_ip_address }`.
dedicated_sql_pool	object	`null`	No	Optional dedicated SQL pool; `sku_name` is a Gen2 DWU tier (`DW100c`–`DW30000c`).
spark_pool	object	`null`	No	Optional Apache Spark pool with autoscale + auto-pause.
tags	map(string)	`{}`	No	Tags merged onto all created resources.

Outputs

Name	Description
workspace_id	Resource ID of the Synapse workspace.
workspace_name	Name of the Synapse workspace.
connectivity_endpoints	Map of endpoints (web, dev, sql, sqlOnDemand) for clients and tooling.
identity_principal_id	Principal ID of the workspace system-assigned managed identity (for RBAC grants).
filesystem_id	Resource ID of the ADLS Gen2 filesystem backing the workspace.
dedicated_sql_pool_id	Resource ID of the dedicated SQL pool, or null if not created.
spark_pool_id	Resource ID of the Spark pool, or null if not created.
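A sketch of consuming the endpoint map in a root module, assuming the module call is named synapse_analytics as in How to use it and that the map carries the sql and sqlOnDemand keys listed above; lookup() returns null if a key is missing. The output names are hypothetical:

```hcl
# Surface the SQL endpoints so client tooling can be configured from
# Terraform outputs rather than copied from the portal.
output "synapse_dedicated_sql_endpoint" {
  description = "Dedicated SQL pool endpoint for JDBC/ODBC connection strings."
  value       = lookup(module.synapse_analytics.connectivity_endpoints, "sql", null)
}

output "synapse_serverless_sql_endpoint" {
  description = "Serverless (on-demand) SQL endpoint."
  value       = lookup(module.synapse_analytics.connectivity_endpoints, "sqlOnDemand", null)
}
```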

Enterprise scenario

A retail data-platform team runs a central lakehouse for finance and merchandising. They deploy this module once per environment from a pipeline: prod gets a DW1000c dedicated SQL pool for the nightly enterprise data warehouse load plus a Medium Spark pool (auto-paused after 10 minutes) for PySpark ETL, while dev runs no dedicated pool and a Small Spark pool capped at five nodes to hold the bill down. Public network access is off in every environment, so clients connect through private endpoints, and a firewall rule scoped to the corporate egress CIDR serves only as a break-glass path if public access is ever re-enabled. The module’s identity_principal_id output feeds a Storage Blob Data Contributor grant so pipelines and the notebooks they run as the workspace identity can read curated zones without anyone hand-managing a service principal.
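The per-environment split described above could be expressed against the module's pool inputs roughly as follows. A sketch, assuming hypothetical environment and storage_account_id root variables; prod's max_node_count and dev's auto-pause delay are illustrative values the scenario does not state:

```hcl
# Hypothetical root-module inputs for this sketch.
variable "environment" {
  description = "Deployment environment."
  type        = string

  validation {
    condition     = contains(["dev", "prod"], var.environment)
    error_message = "environment must be dev or prod."
  }
}

variable "storage_account_id" {
  description = "Resource ID of the HNS-enabled storage account for this environment."
  type        = string
}

locals {
  # Pool sizing per environment, mirroring the scenario.
  pools = {
    prod = {
      dedicated_sql_pool = { name = "edw", sku_name = "DW1000c" }
      spark_pool = {
        name                        = "etl"
        node_size                   = "Medium"
        max_node_count              = 12 # illustrative
        auto_pause_delay_in_minutes = 10
      }
    }
    dev = {
      dedicated_sql_pool = null # no provisioned warehouse in dev
      spark_pool = {
        name                        = "etl"
        node_size                   = "Small"
        max_node_count              = 5  # capped to hold the bill down
        auto_pause_delay_in_minutes = 10 # illustrative
      }
    }
  }
}

module "synapse_analytics" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-synapse?ref=v1.0.0"

  workspace_name      = "synwkloudvin${var.environment}"
  resource_group_name = "rg-analytics-${var.environment}"
  location            = "centralindia"
  storage_account_id  = var.storage_account_id

  # Cost knobs come from one reviewable map instead of per-env forks.
  dedicated_sql_pool = local.pools[var.environment].dedicated_sql_pool
  spark_pool         = local.pools[var.environment].spark_pool

  tags = {
    environment = var.environment
    costCenter  = "data-platform"
  }
}
```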

Best practices

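Keep public_network_access_enabled = false and use private endpoints for the Sql, SqlOnDemand, and Dev sub-resources; treat firewall rules as a break-glass path scoped to a known egress CIDR (they apply only while public access is enabled), never an AllowAll 0.0.0.0–255.255.255.255 range. Enabling data_exfiltration_protection_enabled with an explicit allowed_aad_tenant_ids list restricts outbound managed private endpoints to those tenants, blocking egress to foreign tenants. Decide on the managed VNet and exfiltration protection up front: both are fixed at workspace creation, so changing either later replaces the workspace.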
Pause or right-size compute aggressively — dedicated SQL pools bill by provisioned DWUs whether queried or not, so pause them outside load windows (or run serverless SQL where you pay per TB scanned), and always set a short Spark auto_pause_delay_in_minutes (10–15) so idle clusters spin down.
Make the workspace Entra-only: this module omits a SQL-auth administrator on purpose. Assign aad_administrator to a security group, not an individual, and grant pool/data access through Entra ID role assignments so there are no static SQL passwords to rotate.
The backing storage must be ADLS Gen2 (is_hns_enabled = true) and the workspace managed identity needs Storage Blob Data Contributor on it — wire that with the identity_principal_id output rather than over-broad subscription roles.
Enable customer-managed keys via cmk_key_versionless_id for regulated data, and grant the workspace identity Get/Unwrap Key/Wrap Key on the Key Vault; pin the key as versionless so rotation does not force workspace recreation.
Standardize naming and tagging — workspace names are globally unique and immutable, so bake env + workload into them (synwkloudvinprod), and rely on the module’s merged tags (costCenter, environment) to make per-pool spend attributable in Cost Management.
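The customer-managed-key practice implies a sequence the module itself does not cover: with double encryption, the workspace stays pending until its key is activated, and activation needs the identity's key permissions in place first. A sketch using the provider's azurerm_synapse_workspace_key resource, assuming a hypothetical Key Vault that uses access policies rather than Azure RBAC, the same versionless key ID passed into the module's cmk_key_versionless_id, and the data.azurerm_client_config.current source and module.synapse_analytics call from How to use it:

```hcl
# Hypothetical root-module inputs for this sketch.
variable "key_vault_id" {
  description = "Key Vault (access-policy model) that holds the CMK."
  type        = string
}

variable "cmk_key_versionless_id" {
  description = "Same versionless key ID passed to the module."
  type        = string
}

# Let the workspace's system-assigned identity use the key.
resource "azurerm_key_vault_access_policy" "synapse_cmk" {
  key_vault_id = var.key_vault_id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = module.synapse_analytics.identity_principal_id

  key_permissions = ["Get", "WrapKey", "UnwrapKey"]
}

# Activate the key once the identity can reach it. The name must match the
# key_name the module sets in its customer_managed_key block ("synapsecmk").
resource "azurerm_synapse_workspace_key" "cmk" {
  customer_managed_key_versionless_id = var.cmk_key_versionless_id
  synapse_workspace_id                = module.synapse_analytics.workspace_id
  active                              = true
  customer_managed_key_name           = "synapsecmk"

  depends_on = [azurerm_key_vault_access_policy.synapse_cmk]
}
```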