Quick take — A reusable hashicorp/azurerm ~> 4.0 module for Azure Synapse Analytics: ADLS Gen2-backed workspace, optional dedicated/Spark pools, Entra ID admin, managed VNet, and firewall — wired for production. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "azurerm" {
features {}
}
module "synapse" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-synapse?ref=v1.0.0"
workspace_name = "..." # Globally unique workspace name, 1-50 lowercase alphanum…
resource_group_name = "..." # Resource group that holds the workspace.
location = "..." # Azure region (e.g. `centralindia`).
storage_account_id = "..." # Resource ID of an existing ADLS Gen2 (HNS-enabled) stor…
}
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
What this module is
Azure Synapse Analytics is Microsoft’s unified analytics platform: it stitches together a serverless SQL endpoint, provisioned dedicated SQL pools (the old SQL Data Warehouse, now Gen2 DWUs), Apache Spark pools, and pipelines/integration runtimes — all anchored on an Azure Data Lake Storage (ADLS) Gen2 filesystem that the workspace treats as its primary storage. The azurerm_synapse_workspace resource is the control-plane object that everything else hangs off: pools, firewall rules, the managed virtual network, Entra ID administrators, and the managed identity used to reach storage and Key Vault.
A bare azurerm_synapse_workspace is deceptively simple to declare but easy to ship insecurely. The defaults leave you reaching for footguns: a workspace with no firewall rules is unreachable, but the common “fix” — an AllowAll 0.0.0.0–255.255.255.255 rule — exposes the serverless and dedicated SQL endpoints to the entire internet. The managed VNet is opt-in. Double-encryption with a customer-managed key (CMK) is opt-in. And the storage account behind it must be a hierarchical-namespace (Gen2) account with the workspace’s managed identity granted Storage Blob Data Contributor, or pool creation and pipeline runs fail at runtime with opaque errors.
This module wraps all of that into one var-driven unit: it creates the ADLS Gen2 filesystem in an existing storage account, creates the workspace with a managed VNet on by default and no SQL-auth administrator login, wires the Entra ID admin, optionally stands up a dedicated SQL pool and a Spark pool, and exposes the connectivity endpoints and managed-identity principal ID as outputs so downstream RBAC and private DNS can be wired without copy-pasting GUIDs.
When to use it
- You are building a lakehouse or enterprise data warehouse on Azure and want serverless SQL over your data lake plus optional provisioned dedicated SQL / Spark compute, governed as code.
- You need repeatable, compliant environments (dev/test/prod) where SQL-AAD-only auth, managed VNet isolation, and firewall scoping are enforced by default rather than remembered per-deployment.
- You want pool lifecycle (dedicated SQL DWxxxc, Spark autoscale + auto-pause) expressed as Terraform variables so cost knobs are reviewable in a pull request.
- You are NOT looking for ad-hoc one-off analytics: for a throwaway notebook, Synapse is heavy. Reach for this module when the workspace is a shared, long-lived platform asset.
- Skip it if you only need serverless SQL over a single storage account with no provisioned compute and no governance requirements; a thinner setup may suffice.
Module structure
terraform-module-azure-synapse/
├── versions.tf # provider + Terraform version pins
├── main.tf # ADLS Gen2 FS, workspace, AAD admin, firewall, SQL + Spark pools
├── variables.tf # var-driven inputs with validations
└── outputs.tf # ids, endpoints, managed identity principal
versions.tf
terraform {
required_version = ">= 1.6.0"
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "~> 4.0"
}
}
}
main.tf
locals {
# Workspace names must be globally unique, 1-50 chars, lowercase letters/numbers.
workspace_name = lower(var.workspace_name)
tags = merge(
{
module = "terraform-module-azure-synapse"
managedBy = "terraform"
},
var.tags
)
}
# ---------------------------------------------------------------------------
# ADLS Gen2 filesystem that backs the workspace.
# The container lives in an existing hierarchical-namespace (Gen2) storage
# account whose resource ID is passed in via var.storage_account_id.
# ---------------------------------------------------------------------------
resource "azurerm_storage_data_lake_gen2_filesystem" "this" {
name = var.filesystem_name
storage_account_id = var.storage_account_id
}
# ---------------------------------------------------------------------------
# Synapse workspace
# ---------------------------------------------------------------------------
resource "azurerm_synapse_workspace" "this" {
name = local.workspace_name
resource_group_name = var.resource_group_name
location = var.location
storage_data_lake_gen2_filesystem_id = azurerm_storage_data_lake_gen2_filesystem.this.id
# Secure-by-default posture.
managed_virtual_network_enabled = var.managed_virtual_network_enabled
public_network_access_enabled = var.public_network_access_enabled
sql_identity_control_enabled = true
data_exfiltration_protection_enabled = var.data_exfiltration_protection_enabled
managed_resource_group_name = var.managed_resource_group_name
linking_allowed_for_aad_tenant_ids = var.data_exfiltration_protection_enabled ? var.allowed_aad_tenant_ids : null
# Optional customer-managed key for double encryption at rest.
dynamic "customer_managed_key" {
for_each = var.cmk_key_versionless_id != null ? [1] : []
content {
key_versionless_id = var.cmk_key_versionless_id
key_name = "synapsecmk"
}
}
identity {
type = "SystemAssigned"
}
tags = local.tags
}
# ---------------------------------------------------------------------------
# Entra ID (Azure AD) administrator.
# azurerm_synapse_workspace has no azuread_administrator block, so the admin is
# assigned with the standalone azurerm_synapse_workspace_aad_admin resource.
# No SQL-auth administrator login is configured on the workspace.
# ---------------------------------------------------------------------------
resource "azurerm_synapse_workspace_aad_admin" "this" {
count = var.aad_administrator != null ? 1 : 0
synapse_workspace_id = azurerm_synapse_workspace.this.id
login = var.aad_administrator.login
object_id = var.aad_administrator.object_id
tenant_id = var.aad_administrator.tenant_id
}
# ---------------------------------------------------------------------------
# Firewall rules. Default to NONE; the caller passes explicit CIDR-equivalent
# start/end IPs. The special "AllowAllWindowsAzureIps" 0.0.0.0 rule is gated
# behind its own variable so it is a conscious, reviewable decision.
# ---------------------------------------------------------------------------
resource "azurerm_synapse_firewall_rule" "rules" {
for_each = { for r in var.firewall_rules : r.name => r }
name = each.value.name
synapse_workspace_id = azurerm_synapse_workspace.this.id
start_ip_address = each.value.start_ip_address
end_ip_address = each.value.end_ip_address
}
resource "azurerm_synapse_firewall_rule" "allow_azure_services" {
count = var.allow_azure_services ? 1 : 0
name = "AllowAllWindowsAzureIps"
synapse_workspace_id = azurerm_synapse_workspace.this.id
start_ip_address = "0.0.0.0"
end_ip_address = "0.0.0.0"
}
# ---------------------------------------------------------------------------
# Optional dedicated SQL pool (provisioned DW compute, Gen2 DWUs).
# ---------------------------------------------------------------------------
resource "azurerm_synapse_sql_pool" "this" {
count = var.dedicated_sql_pool != null ? 1 : 0
name = var.dedicated_sql_pool.name
synapse_workspace_id = azurerm_synapse_workspace.this.id
sku_name = var.dedicated_sql_pool.sku_name
create_mode = "Default"
storage_account_type = var.dedicated_sql_pool.storage_account_type
collation = var.dedicated_sql_pool.collation
geo_backup_policy_enabled = var.dedicated_sql_pool.geo_backup_policy_enabled
tags = local.tags
}
# ---------------------------------------------------------------------------
# Optional Apache Spark pool with autoscale + auto-pause.
# ---------------------------------------------------------------------------
resource "azurerm_synapse_spark_pool" "this" {
count = var.spark_pool != null ? 1 : 0
name = var.spark_pool.name
synapse_workspace_id = azurerm_synapse_workspace.this.id
node_size_family = var.spark_pool.node_size_family
node_size = var.spark_pool.node_size
spark_version = var.spark_pool.spark_version
cache_size = var.spark_pool.cache_size
auto_scale {
min_node_count = var.spark_pool.min_node_count
max_node_count = var.spark_pool.max_node_count
}
auto_pause {
delay_in_minutes = var.spark_pool.auto_pause_delay_in_minutes
}
tags = local.tags
}
variables.tf
variable "workspace_name" {
description = "Globally unique Synapse workspace name (1-50 chars, lowercase letters and numbers)."
type = string
validation {
condition = can(regex("^[a-z0-9]{1,50}$", lower(var.workspace_name)))
error_message = "workspace_name must be 1-50 lowercase alphanumeric characters."
}
}
variable "resource_group_name" {
description = "Resource group that holds the workspace."
type = string
}
variable "location" {
description = "Azure region for the workspace (e.g. centralindia, eastus)."
type = string
}
variable "storage_account_id" {
description = "Resource ID of an existing ADLS Gen2 (hierarchical-namespace) storage account that backs the workspace."
type = string
validation {
condition = can(regex("/providers/Microsoft.Storage/storageAccounts/", var.storage_account_id))
error_message = "storage_account_id must be a Microsoft.Storage storageAccounts resource ID."
}
}
variable "filesystem_name" {
description = "Name of the Gen2 filesystem (container) created to back the workspace."
type = string
default = "synapsefs"
validation {
condition = can(regex("^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$", var.filesystem_name))
error_message = "filesystem_name must be 3-63 chars, lowercase alphanumeric or hyphen, not starting/ending with a hyphen."
}
}
variable "managed_resource_group_name" {
description = "Optional name for the managed resource group Synapse creates for workspace-internal resources. Null lets Azure name it."
type = string
default = null
}
variable "managed_virtual_network_enabled" {
description = "Enable the Synapse managed virtual network for network isolation of Spark/integration compute."
type = bool
default = true
}
variable "public_network_access_enabled" {
description = "Allow public network access to the workspace endpoints. Set false when using private endpoints only."
type = bool
default = false
}
variable "data_exfiltration_protection_enabled" {
description = "Enable data exfiltration protection. When true, outbound connectivity is restricted to allowed AAD tenants."
type = bool
default = true
}
variable "allowed_aad_tenant_ids" {
description = "Tenant IDs allowed for outbound linking when data exfiltration protection is enabled. The workspace's own tenant is always included."
type = list(string)
default = []
}
variable "aad_administrator" {
description = "Entra ID (Azure AD) administrator for the workspace. Strongly recommended; null disables AAD admin assignment."
type = object({
login = string
object_id = string
tenant_id = string
})
default = null
}
variable "cmk_key_versionless_id" {
description = "Versionless Key Vault key ID for customer-managed double encryption at rest. Null uses Microsoft-managed keys."
type = string
default = null
}
variable "allow_azure_services" {
description = "Create the 0.0.0.0 'AllowAllWindowsAzureIps' firewall rule so Azure services can reach the workspace. Use deliberately."
type = bool
default = false
}
variable "firewall_rules" {
description = "Explicit IP firewall rules for the SQL endpoints."
type = list(object({
name = string
start_ip_address = string
end_ip_address = string
}))
default = []
}
variable "dedicated_sql_pool" {
description = "Optional dedicated SQL pool (provisioned DW). Null skips it. sku_name is a DWU tier such as DW100c..DW30000c."
type = object({
name = string
sku_name = string
storage_account_type = optional(string, "GRS")
collation = optional(string, "SQL_Latin1_General_CP1_CI_AS")
geo_backup_policy_enabled = optional(bool, true)
})
default = null
validation {
condition = var.dedicated_sql_pool == null ? true : can(regex("^DW[0-9]+c$", var.dedicated_sql_pool.sku_name))
error_message = "dedicated_sql_pool.sku_name must be a Gen2 DWU tier like DW100c, DW500c, or DW1000c."
}
validation {
condition = var.dedicated_sql_pool == null ? true : contains(["GRS", "LRS"], var.dedicated_sql_pool.storage_account_type)
error_message = "dedicated_sql_pool.storage_account_type must be GRS or LRS."
}
}
variable "spark_pool" {
description = "Optional Apache Spark pool with autoscale + auto-pause. Null skips it."
type = object({
name = string
node_size_family = optional(string, "MemoryOptimized")
node_size = optional(string, "Small")
spark_version = optional(string, "3.4")
cache_size = optional(number, 50)
min_node_count = optional(number, 3)
max_node_count = optional(number, 10)
auto_pause_delay_in_minutes = optional(number, 15)
})
default = null
validation {
condition = var.spark_pool == null ? true : contains(["Small", "Medium", "Large", "XLarge", "XXLarge", "XXXLarge"], var.spark_pool.node_size)
error_message = "spark_pool.node_size must be one of Small, Medium, Large, XLarge, XXLarge, XXXLarge."
}
validation {
condition = var.spark_pool == null ? true : (var.spark_pool.min_node_count >= 3 && var.spark_pool.max_node_count >= var.spark_pool.min_node_count)
error_message = "spark_pool.min_node_count must be >= 3 and max_node_count must be >= min_node_count."
}
}
variable "tags" {
description = "Tags merged onto all created resources."
type = map(string)
default = {}
}
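One cross-variable constraint is not covered by the validations above: the azurerm provider documents that data_exfiltration_protection_enabled = true requires managed_virtual_network_enabled = true. A minimal sketch of how the module could surface that at plan time with a check block (available in the Terraform versions pinned here) follows. The check name is hypothetical, and a check block only raises a warning; it does not stop the apply.

```hcl
# Sketch only: would sit alongside the resources in main.tf.
# The check name is illustrative and not part of the published module.
check "exfiltration_requires_managed_vnet" {
  assert {
    # Passes when exfiltration protection is off, or when the managed VNet is on.
    condition = (
      !var.data_exfiltration_protection_enabled ||
      var.managed_virtual_network_enabled
    )
    error_message = "data_exfiltration_protection_enabled = true requires managed_virtual_network_enabled = true."
  }
}
```

If a hard failure is preferred over a warning, the same condition could instead go into a lifecycle precondition on the workspace resource.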
outputs.tf
output "workspace_id" {
description = "Resource ID of the Synapse workspace."
value = azurerm_synapse_workspace.this.id
}
output "workspace_name" {
description = "Name of the Synapse workspace."
value = azurerm_synapse_workspace.this.name
}
output "connectivity_endpoints" {
description = "Map of workspace connectivity endpoints (web, dev, sql, sqlOnDemand, etc.)."
value = azurerm_synapse_workspace.this.connectivity_endpoints
}
output "identity_principal_id" {
description = "Principal ID of the workspace system-assigned managed identity, for RBAC grants (e.g. Storage Blob Data Contributor)."
value = azurerm_synapse_workspace.this.identity[0].principal_id
}
output "filesystem_id" {
description = "Resource ID of the ADLS Gen2 filesystem backing the workspace."
value = azurerm_storage_data_lake_gen2_filesystem.this.id
}
output "dedicated_sql_pool_id" {
description = "Resource ID of the dedicated SQL pool, or null if not created."
value = try(azurerm_synapse_sql_pool.this[0].id, null)
}
output "spark_pool_id" {
description = "Resource ID of the Spark pool, or null if not created."
value = try(azurerm_synapse_spark_pool.this[0].id, null)
}
How to use it
data "azurerm_client_config" "current" {}
resource "azurerm_storage_account" "lake" {
name = "stkvanalyticsprod"
resource_group_name = "rg-analytics-prod"
location = "centralindia"
account_tier = "Standard"
account_replication_type = "GRS"
account_kind = "StorageV2"
is_hns_enabled = true # ADLS Gen2 hierarchical namespace — REQUIRED for Synapse
}
module "synapse_analytics" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-synapse?ref=v1.0.0"
workspace_name = "synwkloudvinprod"
resource_group_name = "rg-analytics-prod"
location = "centralindia"
storage_account_id = azurerm_storage_account.lake.id
filesystem_name = "warehouse"
# Entra-only admin (no SQL login).
aad_administrator = {
login = "sg-synapse-admins"
object_id = "00000000-0000-0000-0000-000000000000" # the security group's object ID
tenant_id = data.azurerm_client_config.current.tenant_id
}
# Private by default. Note: IP firewall rules are only enforced while public
# network access is enabled, so the corp-egress rule below is dormant until then.
public_network_access_enabled = false
managed_virtual_network_enabled = true
data_exfiltration_protection_enabled = true
allowed_aad_tenant_ids = [data.azurerm_client_config.current.tenant_id]
firewall_rules = [
{
name = "corp-egress"
start_ip_address = "203.0.113.0"
end_ip_address = "203.0.113.255"
}
]
dedicated_sql_pool = {
name = "edw"
sku_name = "DW500c"
}
spark_pool = {
name = "etl"
node_size = "Medium"
min_node_count = 3
max_node_count = 12
auto_pause_delay_in_minutes = 10
}
tags = {
environment = "prod"
costCenter = "data-platform"
}
}
# Downstream: grant the workspace managed identity data-plane access to the lake
# so pipelines and pools can read/write. Uses the module's identity_principal_id output.
resource "azurerm_role_assignment" "synapse_to_lake" {
scope = azurerm_storage_account.lake.id
role_definition_name = "Storage Blob Data Contributor"
principal_id = module.synapse_analytics.identity_principal_id
}
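With public network access off, clients usually reach the workspace through private endpoints. A minimal sketch of one endpoint for the Sql sub-resource follows, built on the module's workspace_id output. It assumes a subnet already exists; the variable private_endpoint_subnet_id and the resource names are hypothetical and not part of the module.

```hcl
# Hypothetical input: the subnet that will host the private endpoint.
variable "private_endpoint_subnet_id" {
  description = "Resource ID of an existing subnet for private endpoints."
  type        = string
}

resource "azurerm_private_endpoint" "synapse_sql" {
  name                = "pe-synapse-sql" # illustrative name
  resource_group_name = "rg-analytics-prod"
  location            = "centralindia"
  subnet_id           = var.private_endpoint_subnet_id

  private_service_connection {
    name                           = "psc-synapse-sql" # illustrative name
    private_connection_resource_id = module.synapse_analytics.workspace_id
    # One endpoint per sub-resource; repeat for "SqlOnDemand" and "Dev" as needed.
    subresource_names    = ["Sql"]
    is_manual_connection = false
  }
}
```

Name resolution for the endpoint still needs a private DNS zone linked to the client network; that wiring is left out here because it depends on how DNS is managed in your landing zone.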
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
backend = "azurerm"
generate = { path = "backend.tf", if_exists = "overwrite" }
config = {
# ...azurerm state bucket/container + key per path...
}
}
2. Module config — live/prod/synapse/terragrunt.hcl:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-synapse?ref=v1.0.0"
}
inputs = {
workspace_name = "..."
resource_group_name = "..."
location = "..."
storage_account_id = "..."
}
3. Deploy one environment, or roll out all modules together:
# Run either command from the repository root.
cd live/prod/synapse && terragrunt apply # this module only
cd live/prod && terragrunt run-all apply # every module under live/prod
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
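The dependency orchestration mentioned above can be sketched with a Terragrunt dependency block that feeds storage_account_id from a sibling unit instead of a hard-coded string. This is an illustration under assumptions: the ../storage path and its storage_account_id output are hypothetical, and the mock value exists only so validate and plan can run before the storage unit has been applied.

```hcl
# live/prod/synapse/terragrunt.hcl (sketch)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-synapse?ref=v1.0.0"
}

# Hypothetical sibling unit that creates the HNS-enabled storage account
# and exposes its resource ID as an output named storage_account_id.
dependency "lake" {
  config_path = "../storage"

  # Placeholder used only for the commands listed below.
  mock_outputs = {
    storage_account_id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-mock/providers/Microsoft.Storage/storageAccounts/mock"
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  workspace_name      = "..."
  resource_group_name = "..."
  location            = "..."
  storage_account_id  = dependency.lake.outputs.storage_account_id
}
```

With this in place, run-all applies the storage unit before the Synapse unit because the dependency block defines the ordering.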
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| workspace_name | string | — | Yes | Globally unique workspace name, 1-50 lowercase alphanumeric chars. |
| resource_group_name | string | — | Yes | Resource group that holds the workspace. |
| location | string | — | Yes | Azure region (e.g. centralindia). |
| storage_account_id | string | — | Yes | Resource ID of an existing ADLS Gen2 (HNS-enabled) storage account. |
| filesystem_name | string | "synapsefs" | No | Gen2 filesystem/container created to back the workspace (3-63 chars). |
| managed_resource_group_name | string | null | No | Name for the Synapse-managed resource group; null lets Azure name it. |
| managed_virtual_network_enabled | bool | true | No | Enable the Synapse managed VNet for Spark/integration compute isolation. |
| public_network_access_enabled | bool | false | No | Allow public access to workspace endpoints; false for private-endpoint-only. |
| data_exfiltration_protection_enabled | bool | true | No | Restrict outbound connectivity to allowed AAD tenants. |
| allowed_aad_tenant_ids | list(string) | [] | No | Tenant IDs allowed for outbound linking when exfiltration protection is on. |
| aad_administrator | object | null | No | Entra ID admin { login, object_id, tenant_id }; strongly recommended. |
| cmk_key_versionless_id | string | null | No | Versionless Key Vault key ID for customer-managed double encryption. |
| allow_azure_services | bool | false | No | Create the 0.0.0.0 AllowAllWindowsAzureIps firewall rule. Use deliberately. |
| firewall_rules | list(object) | [] | No | Explicit IP firewall rules { name, start_ip_address, end_ip_address }. |
| dedicated_sql_pool | object | null | No | Optional dedicated SQL pool; sku_name is a Gen2 DWU tier (DW100c–DW30000c). |
| spark_pool | object | null | No | Optional Apache Spark pool with autoscale + auto-pause. |
| tags | map(string) | {} | No | Tags merged onto all created resources. |
Outputs
| Name | Description |
|---|---|
| workspace_id | Resource ID of the Synapse workspace. |
| workspace_name | Name of the Synapse workspace. |
| connectivity_endpoints | Map of endpoints (web, dev, sql, sqlOnDemand) for clients and tooling. |
| identity_principal_id | Principal ID of the workspace system-assigned managed identity (for RBAC grants). |
| filesystem_id | Resource ID of the ADLS Gen2 filesystem backing the workspace. |
| dedicated_sql_pool_id | Resource ID of the dedicated SQL pool, or null if not created. |
| spark_pool_id | Resource ID of the Spark pool, or null if not created. |
Enterprise scenario
A retail data-platform team runs a central lakehouse for finance and merchandising. They deploy this module once per environment from a pipeline: prod gets a DW1000c dedicated SQL pool for the nightly enterprise data warehouse load plus a Medium Spark pool (auto-paused after 10 minutes) for PySpark ETL, while dev runs no dedicated pool and a Small Spark pool capped at five nodes to hold the bill down. Public network access is off in every environment, with reachability limited to the corporate egress CIDR and private endpoints; the module’s identity_principal_id output feeds a Storage Blob Data Contributor grant so analysts’ notebooks and pipelines can read curated zones without anyone hand-managing a service principal.
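As a rough illustration of the dev side of that scenario, the module call below leaves dedicated_sql_pool at its null default and caps a Small Spark pool at five nodes. The module label, the pool name, and the 10-minute auto-pause delay are assumptions made for the sketch; the "..." placeholders are left for your own values.

```hcl
module "synapse_dev" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-synapse?ref=v1.0.0"

  workspace_name      = "..."
  resource_group_name = "..."
  location            = "..."
  storage_account_id  = "..."

  public_network_access_enabled = false

  # dedicated_sql_pool is omitted: it defaults to null, so no pool is created.

  spark_pool = {
    name                        = "etl"   # illustrative name
    node_size                   = "Small"
    min_node_count              = 3       # the module's validation requires >= 3
    max_node_count              = 5       # cap from the scenario above
    auto_pause_delay_in_minutes = 10      # assumed value for dev
  }

  tags = {
    environment = "dev"
  }
}
```

Keeping prod and dev as two calls to the same pinned module version means the only reviewable difference between environments is the input block.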
Best practices
- Keep public_network_access_enabled = false and use private endpoints for the Sql, SqlOnDemand, and Dev sub-resources; treat firewall rules as a break-glass path scoped to a known egress CIDR, never an AllowAll 0.0.0.0–255.255.255.255 range. Enabling data_exfiltration_protection_enabled with an explicit allowed_aad_tenant_ids list blocks data egress to foreign tenants.
- Pause or right-size compute aggressively: dedicated SQL pools bill by provisioned DWUs whether queried or not, so pause them outside load windows (or run serverless SQL where you pay per TB scanned), and always set a short Spark auto_pause_delay_in_minutes (10–15) so idle clusters spin down.
- Make the workspace Entra-only: this module omits a SQL-auth administrator on purpose. Assign aad_administrator to a security group, not an individual, and grant pool/data access through Entra ID role assignments so there are no static SQL passwords to rotate.
- The backing storage must be ADLS Gen2 (is_hns_enabled = true) and the workspace managed identity needs Storage Blob Data Contributor on it; wire that with the identity_principal_id output rather than over-broad subscription roles.
- Enable customer-managed keys via cmk_key_versionless_id for regulated data, and grant the workspace identity Get/Unwrap Key/Wrap Key on the Key Vault; pin the key as versionless so rotation does not force workspace recreation.
- Standardize naming and tagging: workspace names are globally unique and immutable, so bake env + workload into them (synwkloudvinprod), and rely on the module's merged tags (costCenter, environment) to make per-pool spend attributable in Cost Management.
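The Key Vault grant for customer-managed keys can be sketched as follows, using the module's identity_principal_id output. This assumes the vault uses access policies rather than Azure RBAC, and that the module is instantiated as module.synapse_analytics as in the usage example; the key_vault_id variable and the resource name are hypothetical.

```hcl
# Omit this data source if your configuration already declares it.
data "azurerm_client_config" "current" {}

# Hypothetical input: the Key Vault that holds the customer-managed key.
variable "key_vault_id" {
  description = "Resource ID of the Key Vault holding the CMK."
  type        = string
}

# Lets the workspace managed identity read, wrap, and unwrap with the key.
resource "azurerm_key_vault_access_policy" "synapse_cmk" {
  key_vault_id = var.key_vault_id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = module.synapse_analytics.identity_principal_id

  key_permissions = ["Get", "WrapKey", "UnwrapKey"]
}
```

A CMK-enabled workspace may also need its key activated after this grant; the provider offers an azurerm_synapse_workspace_key resource for that, which this module does not create, so check its documentation before relying on the sketch end to end.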