Quick take — A production Terraform module for azurerm_databricks_workspace: VNet injection, no-public-IP, CMK for managed services and DBFS, and Unity Catalog access connector — fully var-driven for azurerm ~> 4.0. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "azurerm" {
features {}
}
module "databricks" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-databricks?ref=v1.0.0"
name = "..." # Workspace name; 3-64 chars, alphanumerics/hyphens/under…
resource_group_name = "..." # Existing resource group to hold the workspace.
location = "..." # Azure region (e.g. `centralindia`).
}
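Then run terraform init && terraform apply. Every other input has a default; see Inputs below to override behaviour. Note that the defaults set public_network_access_enabled = false and leave virtual_network_id null, a combination that assumes Private Link on a VNet-injected workspace, so for a first test deployment you will probably need to either supply the VNet inputs or set public_network_access_enabled = true.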
What this module is
An Azure Databricks Workspace is the managed analytics control plane that Microsoft and Databricks jointly operate on Azure. When you create the azurerm_databricks_workspace resource, Azure provisions a managed resource group into your subscription that holds the workspace’s data-plane plumbing — the worker/driver VMs, a managed VNet (unless you inject your own), storage for DBFS, and the network security groups that wire it all together. You get a Databricks UI, clusters, SQL warehouses, jobs, and Delta Lake, all billed through your Azure subscription via Databricks Units (DBUs).
The catch is that almost every interesting production setting lives in nested blocks and a free-form custom_parameters map, not in tidy top-level arguments: VNet injection, Secure Cluster Connectivity (no public IP), customer-managed keys for both managed services and the DBFS root, NAT-gateway egress, and the storage account name are all easy to get subtly wrong. Worse, several of them are immutable — set no_public_ip or the injected subnets wrong on day one and your only fix is to destroy and recreate the workspace, taking every cluster, job, and notebook reference with it.
Wrapping the workspace in a reusable module fixes that. The module encodes a secure-by-default posture (Premium SKU, Secure Cluster Connectivity, public network access disabled, and infrastructure encryption on, with VNet injection and CMK switched on by passing the relevant inputs), validates the name, SKU, and storage inputs so a bad value fails at plan instead of after a 20-minute apply, and emits the outputs (workspace URL, managed RG, and the Unity Catalog access connector principal) that downstream Terraform and platform teams actually consume.
When to use it
- You run more than one Databricks workspace (dev / staging / prod, or one per data domain in a data-mesh setup) and need them provisioned identically with only the environment-specific knobs differing.
- Security or compliance requires VNet injection plus Secure Cluster Connectivity so clusters have no public IP and all egress flows through your firewall or NAT gateway — the single hardest Databricks setup to get right by hand.
- You need customer-managed keys (CMK) in Key Vault for the managed services (notebooks, secrets, query results) and/or the DBFS root storage to satisfy data-residency or key-rotation policy.
- You are standing up Unity Catalog and need the workspace to ship with an azurerm_databricks_access_connector (a managed identity) that can be granted RBAC on your ADLS Gen2 metastore storage.
- You want the immutable, recreate-on-change parameters validated before apply, not discovered the hard way in production.
If you only need a throwaway sandbox workspace with defaults and a public endpoint, the bare resource is fine — this module is aimed at workspaces that have to survive an audit.
Module structure
terraform-module-azure-databricks/
├── versions.tf # provider + required_version pins
├── main.tf # access connector + workspace (VNet injection, SCC, CMK)
├── variables.tf # var-driven inputs with validation
└── outputs.tf # id/name/url + managed RG + connector identity
versions.tf
terraform {
required_version = ">= 1.6.0"
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "~> 4.0"
}
}
}
main.tf
locals {
# Unity Catalog access connector is created unless the caller opts out.
create_access_connector = var.create_access_connector
# CMK for managed services is on whenever a Key Vault key ID is supplied.
managed_cmk_enabled = var.managed_services_cmk_key_vault_key_id != null
# customer_managed_key_enabled applies to the DBFS root storage account only,
# so it follows the DBFS root key and not the managed-disk key.
dbfs_cmk_enabled = var.dbfs_root_cmk_key_vault_key_id != null
default_tags = {
ManagedBy = "Terraform"
Module = "terraform-module-azure-databricks"
}
tags = merge(local.default_tags, var.tags)
}
# Managed identity used by Unity Catalog to reach ADLS Gen2 metastore/data.
resource "azurerm_databricks_access_connector" "this" {
count = local.create_access_connector ? 1 : 0
name = coalesce(var.access_connector_name, "${var.name}-ac")
resource_group_name = var.resource_group_name
location = var.location
identity {
type = "SystemAssigned"
}
tags = local.tags
}
resource "azurerm_databricks_workspace" "this" {
name = var.name
resource_group_name = var.resource_group_name
location = var.location
sku = var.sku
# Name of the auto-created managed resource group that holds the data plane.
managed_resource_group_name = coalesce(
var.managed_resource_group_name,
"${var.resource_group_name}-${var.name}-managed"
)
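# Front-end access: when false, the workspace UI/API is reachable only through
# Private Link. This is separate from Secure Cluster Connectivity (no_public_ip below).
public_network_access_enabled = var.public_network_access_enabled
# Must be set when public access is disabled; this value assumes back-end Private Link.
network_security_group_rules_required = var.public_network_access_enabled ? null : "NoAzureDatabricksRules"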
# Customer-managed keys for managed services (notebooks, secrets, results).
managed_services_cmk_key_vault_key_id = var.managed_services_cmk_key_vault_key_id
# Enables the managed identity on the DBFS root storage account so that a
# customer-managed key can be applied to it.
customer_managed_key_enabled = local.dbfs_cmk_enabled
# Customer-managed key for the cluster managed disks.
managed_disk_cmk_key_vault_key_id = var.managed_disk_cmk_key_vault_key_id
# Infrastructure (double) encryption: a second layer of encryption at rest
# on top of the default encryption of the workspace's managed storage.
infrastructure_encryption_enabled = var.infrastructure_encryption_enabled
custom_parameters {
# No public IP on cluster nodes (Secure Cluster Connectivity). IMMUTABLE.
no_public_ip = var.no_public_ip
# VNet injection: bring your own VNet + the two delegated subnets. IMMUTABLE.
virtual_network_id = var.virtual_network_id
public_subnet_name = var.public_subnet_name
private_subnet_name = var.private_subnet_name
public_subnet_network_security_group_association_id = var.public_subnet_nsg_association_id
private_subnet_network_security_group_association_id = var.private_subnet_nsg_association_id
# Deterministic name for the managed DBFS root storage account.
storage_account_name = var.storage_account_name
storage_account_sku_name = var.storage_account_sku_name
# Route all cluster egress through a NAT gateway in the managed RG.
nat_gateway_name = var.nat_gateway_name
public_ip_name = var.public_ip_name
}
tags = local.tags
lifecycle {
# Ignore a CreatedDate tag stamped outside Terraform (for example by a tagging
# policy) so it does not show up as drift on every plan. Changes to the
# immutable inputs (managed RG, subnets, SCC flags) still plan a replace.
ignore_changes = [
tags["CreatedDate"],
]
}
}
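Note that main.tf above accepts dbfs_root_cmk_key_vault_key_id but never passes it to a resource. A minimal sketch of one way to wire it, assuming the azurerm_databricks_workspace_root_dbfs_customer_managed_key resource from azurerm 4.x and a Key Vault that uses access policies rather than RBAC. The variable dbfs_root_cmk_key_vault_id is hypothetical and not part of the module as published:

```hcl
# Hypothetical extra input: resource ID of the Key Vault holding the DBFS root key.
variable "dbfs_root_cmk_key_vault_id" {
  type        = string
  default     = null
  description = "Key Vault that contains dbfs_root_cmk_key_vault_key_id."
}

# Let the DBFS root storage account's managed identity use the key.
# storage_account_identity is only populated when customer_managed_key_enabled is true.
resource "azurerm_key_vault_access_policy" "dbfs_root" {
  count = var.dbfs_root_cmk_key_vault_key_id != null ? 1 : 0

  key_vault_id = var.dbfs_root_cmk_key_vault_id
  tenant_id    = azurerm_databricks_workspace.this.storage_account_identity[0].tenant_id
  object_id    = azurerm_databricks_workspace.this.storage_account_identity[0].principal_id

  key_permissions = ["Get", "UnwrapKey", "WrapKey"]
}

# Apply the customer-managed key to the DBFS root once access is in place.
resource "azurerm_databricks_workspace_root_dbfs_customer_managed_key" "this" {
  count = var.dbfs_root_cmk_key_vault_key_id != null ? 1 : 0

  workspace_id     = azurerm_databricks_workspace.this.id
  key_vault_key_id = var.dbfs_root_cmk_key_vault_key_id

  depends_on = [azurerm_key_vault_access_policy.dbfs_root]
}
```

If your vault uses Azure RBAC instead, the access policy would be replaced by a role assignment on the same principal; that variant is not shown here.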
variables.tf
variable "name" {
type = string
description = "Name of the Databricks workspace. 3-64 chars, alphanumerics, hyphens and underscores."
validation {
condition = can(regex("^[A-Za-z0-9_-]{3,64}$", var.name))
error_message = "name must be 3-64 characters: letters, digits, hyphens or underscores only."
}
}
variable "resource_group_name" {
type = string
description = "Name of the existing resource group that will contain the workspace."
}
variable "location" {
type = string
description = "Azure region for the workspace (e.g. centralindia, eastus2)."
}
variable "sku" {
type = string
default = "premium"
description = "Workspace SKU. Use premium for VNet injection, CMK and Unity Catalog."
validation {
condition = contains(["standard", "premium", "trial"], var.sku)
error_message = "sku must be one of: standard, premium, trial."
}
}
variable "managed_resource_group_name" {
type = string
default = null
description = "Override name for the auto-created managed resource group. Defaults to <rg>-<name>-managed."
}
# ---------------------------------------------------------------------------
# Networking — VNet injection + Secure Cluster Connectivity (all IMMUTABLE)
# ---------------------------------------------------------------------------
variable "public_network_access_enabled" {
type = bool
default = false
description = "Allow access to the workspace from the public internet. Disable to require Private Link."
}
variable "no_public_ip" {
type = bool
default = true
description = "Secure Cluster Connectivity: deploy cluster nodes with no public IP. IMMUTABLE after create."
}
variable "virtual_network_id" {
type = string
default = null
description = "Resource ID of the VNet to inject the workspace into. Null uses a Databricks-managed VNet."
}
variable "public_subnet_name" {
type = string
default = null
description = "Name of the delegated 'host' subnet for VNet injection. Required when virtual_network_id is set."
}
variable "private_subnet_name" {
type = string
default = null
description = "Name of the delegated 'container' subnet for VNet injection. Required when virtual_network_id is set."
}
variable "public_subnet_nsg_association_id" {
type = string
default = null
description = "ID of the subnet-NSG association for the public subnet. Required for VNet injection."
}
variable "private_subnet_nsg_association_id" {
type = string
default = null
description = "ID of the subnet-NSG association for the private subnet. Required for VNet injection."
}
variable "nat_gateway_name" {
type = string
default = null
description = "Name of a NAT gateway created in the managed RG for deterministic cluster egress (SCC only)."
}
variable "public_ip_name" {
type = string
default = null
description = "Name of the public IP attached to the managed NAT gateway. Pairs with nat_gateway_name."
}
# ---------------------------------------------------------------------------
# Storage
# ---------------------------------------------------------------------------
variable "storage_account_name" {
type = string
default = null
description = "Name of the managed DBFS root storage account. 3-24 lowercase alphanumerics. IMMUTABLE."
validation {
condition = var.storage_account_name == null || can(regex("^[a-z0-9]{3,24}$", var.storage_account_name))
error_message = "storage_account_name must be 3-24 lowercase letters/digits."
}
}
variable "storage_account_sku_name" {
type = string
default = "Standard_GRS"
description = "SKU of the managed DBFS root storage account."
validation {
condition = contains(["Standard_LRS", "Standard_GRS", "Standard_RAGRS", "Standard_ZRS"], var.storage_account_sku_name)
error_message = "storage_account_sku_name must be a valid Standard storage SKU."
}
}
# ---------------------------------------------------------------------------
# Encryption — customer-managed keys
# ---------------------------------------------------------------------------
variable "managed_services_cmk_key_vault_key_id" {
type = string
default = null
description = "Key Vault key versioned ID for encrypting managed services (notebooks, secrets, results). Premium only."
}
variable "managed_disk_cmk_key_vault_key_id" {
type = string
default = null
description = "Key Vault key versioned ID for encrypting cluster managed disks with a customer key."
}
variable "dbfs_root_cmk_key_vault_key_id" {
type = string
default = null
description = "Key Vault key versioned ID for encrypting the DBFS root storage account. Enables customer-managed key."
}
variable "infrastructure_encryption_enabled" {
type = bool
default = true
description = "Enable double (infrastructure) encryption on the workspace's managed disks and storage."
}
# ---------------------------------------------------------------------------
# Unity Catalog access connector
# ---------------------------------------------------------------------------
variable "create_access_connector" {
type = bool
default = true
description = "Create a system-assigned-identity access connector for Unity Catalog metastore/data access."
}
variable "access_connector_name" {
type = string
default = null
description = "Override name for the access connector. Defaults to <name>-ac."
}
variable "tags" {
type = map(string)
default = {}
description = "Additional tags merged onto the workspace and access connector."
}
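Several descriptions above say "Required when virtual_network_id is set", but no validation enforces that pairing. Because the module pins Terraform >= 1.6.0, where a variable's validation block cannot reference other variables, one option is a precondition on a built-in terraform_data resource. This is a sketch only; the resource name vnet_injection_guard is an assumption and not part of the module as published:

```hcl
# Fails the run when a VNet is supplied without its subnets and NSG associations.
resource "terraform_data" "vnet_injection_guard" {
  lifecycle {
    precondition {
      condition = var.virtual_network_id == null || alltrue([
        var.public_subnet_name != null,
        var.private_subnet_name != null,
        var.public_subnet_nsg_association_id != null,
        var.private_subnet_nsg_association_id != null,
      ])
      error_message = "When virtual_network_id is set, both subnet names and both subnet-NSG association IDs must also be set."
    }
  }
}
```

When the association IDs come from resources created in the same run, their values are unknown at plan time, so Terraform defers this check to apply for that run.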
outputs.tf
output "id" {
description = "Resource ID of the Databricks workspace."
value = azurerm_databricks_workspace.this.id
}
output "name" {
description = "Name of the Databricks workspace."
value = azurerm_databricks_workspace.this.name
}
output "workspace_url" {
description = "Per-workspace URL (e.g. adb-1234567890123456.7.azuredatabricks.net) used for the API host."
value = azurerm_databricks_workspace.this.workspace_url
}
output "workspace_id" {
description = "Unique numeric Databricks workspace ID (organization id), used by the databricks provider."
value = azurerm_databricks_workspace.this.workspace_id
}
output "managed_resource_group_id" {
description = "Resource ID of the auto-created managed resource group holding the data plane."
value = azurerm_databricks_workspace.this.managed_resource_group_id
}
output "managed_resource_group_name" {
description = "Name of the auto-created managed resource group."
value = azurerm_databricks_workspace.this.managed_resource_group_name
}
output "access_connector_id" {
description = "Resource ID of the Unity Catalog access connector (null if not created)."
value = try(azurerm_databricks_access_connector.this[0].id, null)
}
output "access_connector_principal_id" {
description = "System-assigned identity principal ID of the access connector — grant this RBAC on metastore storage."
value = try(azurerm_databricks_access_connector.this[0].identity[0].principal_id, null)
}
How to use it
This example injects the workspace into an existing hub VNet with two delegated subnets, turns on Secure Cluster Connectivity, supplies a managed-services CMK from Key Vault, and then grants the access connector Storage Blob Data Contributor on the Unity Catalog metastore storage account using one of the module’s outputs.
module "databricks_workspace" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-databricks?ref=v1.0.0"
name = "kv-analytics-prod"
resource_group_name = azurerm_resource_group.data.name
location = "centralindia"
sku = "premium"
# Secure Cluster Connectivity + VNet injection into the hub.
public_network_access_enabled = false
no_public_ip = true
virtual_network_id = azurerm_virtual_network.hub.id
public_subnet_name = azurerm_subnet.dbx_host.name
private_subnet_name = azurerm_subnet.dbx_container.name
public_subnet_nsg_association_id = azurerm_subnet_network_security_group_association.dbx_host.id
private_subnet_nsg_association_id = azurerm_subnet_network_security_group_association.dbx_container.id
# Deterministic managed storage + customer-managed key for managed services.
storage_account_name = "kvanalyticsproddbfs"
storage_account_sku_name = "Standard_ZRS"
managed_services_cmk_key_vault_key_id = azurerm_key_vault_key.dbx_managed.id # versioned key ID, as the variable description requires
create_access_connector = true
tags = {
Environment = "prod"
CostCenter = "data-platform"
Domain = "analytics"
}
}
# Downstream: grant the workspace's Unity Catalog identity access to the
# metastore storage, using the module's access_connector_principal_id output.
resource "azurerm_role_assignment" "uc_metastore" {
scope = azurerm_storage_account.uc_metastore.id
role_definition_name = "Storage Blob Data Contributor"
principal_id = module.databricks_workspace.access_connector_principal_id
}
# The databricks provider can now authenticate against the new workspace.
provider "databricks" {
host = "https://${module.databricks_workspace.workspace_url}"
}
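The example above references subnets and NSG associations that it does not define. A minimal sketch of what they might look like, assuming the hub VNet from the example; the subnet names, address prefixes, and the NSG name are placeholders chosen for illustration:

```hcl
# Empty NSG; Databricks adds the rules it needs through the subnet delegation.
resource "azurerm_network_security_group" "dbx" {
  name                = "nsg-dbx" # placeholder name
  location            = azurerm_virtual_network.hub.location
  resource_group_name = azurerm_virtual_network.hub.resource_group_name
}

# Host ("public") subnet, delegated to Databricks.
resource "azurerm_subnet" "dbx_host" {
  name                 = "snet-dbx-host" # placeholder name
  resource_group_name  = azurerm_virtual_network.hub.resource_group_name
  virtual_network_name = azurerm_virtual_network.hub.name
  address_prefixes     = ["10.10.1.0/24"] # placeholder range

  delegation {
    name = "databricks"
    service_delegation {
      name = "Microsoft.Databricks/workspaces"
      actions = [
        "Microsoft.Network/virtualNetworks/subnets/join/action",
        "Microsoft.Network/virtualNetworks/subnets/prepareNetworkPolicies/action",
        "Microsoft.Network/virtualNetworks/subnets/unprepareNetworkPolicies/action",
      ]
    }
  }
}

# Container ("private") subnet, delegated the same way.
resource "azurerm_subnet" "dbx_container" {
  name                 = "snet-dbx-container" # placeholder name
  resource_group_name  = azurerm_virtual_network.hub.resource_group_name
  virtual_network_name = azurerm_virtual_network.hub.name
  address_prefixes     = ["10.10.2.0/24"] # placeholder range

  delegation {
    name = "databricks"
    service_delegation {
      name = "Microsoft.Databricks/workspaces"
      actions = [
        "Microsoft.Network/virtualNetworks/subnets/join/action",
        "Microsoft.Network/virtualNetworks/subnets/prepareNetworkPolicies/action",
        "Microsoft.Network/virtualNetworks/subnets/unprepareNetworkPolicies/action",
      ]
    }
  }
}

# The association IDs are what the module's *_nsg_association_id inputs expect.
resource "azurerm_subnet_network_security_group_association" "dbx_host" {
  subnet_id                 = azurerm_subnet.dbx_host.id
  network_security_group_id = azurerm_network_security_group.dbx.id
}

resource "azurerm_subnet_network_security_group_association" "dbx_container" {
  subnet_id                 = azurerm_subnet.dbx_container.id
  network_security_group_id = azurerm_network_security_group.dbx.id
}
```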
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
backend = "azurerm"
generate = { path = "backend.tf", if_exists = "overwrite" }
config = {
# ...azurerm state bucket/container + key per path...
}
}
2. Module config — live/prod/databricks/terragrunt.hcl:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-databricks?ref=v1.0.0"
}
inputs = {
name = "..."
resource_group_name = "..."
location = "..."
}
3. Deploy one environment, or roll out all modules together:
(cd live/prod/databricks && terragrunt apply) # this module
(cd live/prod && terragrunt run-all apply) # every module under live/prod
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
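The root config in step 1 shows only the backend. If the provider is also meant to live there, a generate block in live/terragrunt.hcl is one way to do it. The sketch below assumes the subscription ID is supplied through the environment:

```hcl
# live/terragrunt.hcl (addition): write provider.tf into every module directory.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "azurerm" {
  features {}
  # azurerm 4.x needs a subscription ID: set subscription_id here
  # or export ARM_SUBSCRIPTION_ID before running terragrunt.
}
EOF
}
```

With this in place, each environment's terragrunt.hcl carries only its source and inputs.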
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| name | string | n/a | Yes | Workspace name; 3-64 chars, alphanumerics/hyphens/underscores. |
| resource_group_name | string | n/a | Yes | Existing resource group to hold the workspace. |
| location | string | n/a | Yes | Azure region (e.g. centralindia). |
| sku | string | "premium" | No | standard, premium, or trial. Premium needed for VNet injection/CMK/Unity Catalog. |
| managed_resource_group_name | string | null | No | Override managed RG name; defaults to <rg>-<name>-managed. |
| public_network_access_enabled | bool | false | No | Allow public internet access to the workspace control plane. |
| no_public_ip | bool | true | No | Secure Cluster Connectivity; no public IP on nodes. Immutable. |
| virtual_network_id | string | null | No | VNet resource ID for injection; null uses managed VNet. Immutable. |
| public_subnet_name | string | null | No | Delegated host subnet name for VNet injection. Immutable. |
| private_subnet_name | string | null | No | Delegated container subnet name for VNet injection. Immutable. |
| public_subnet_nsg_association_id | string | null | No | Subnet-NSG association ID for the host subnet. |
| private_subnet_nsg_association_id | string | null | No | Subnet-NSG association ID for the container subnet. |
| nat_gateway_name | string | null | No | NAT gateway name in the managed RG for deterministic egress (SCC). |
| public_ip_name | string | null | No | Public IP name attached to the managed NAT gateway. |
| storage_account_name | string | null | No | Managed DBFS root storage account name; 3-24 lowercase alnum. Immutable. |
| storage_account_sku_name | string | "Standard_GRS" | No | SKU of the managed DBFS root storage account. |
| managed_services_cmk_key_vault_key_id | string | null | No | Key Vault key ID for managed-services CMK (Premium only). |
| managed_disk_cmk_key_vault_key_id | string | null | No | Key Vault key ID for cluster managed-disk CMK. |
| dbfs_root_cmk_key_vault_key_id | string | null | No | Key Vault key ID for DBFS root storage CMK. |
| infrastructure_encryption_enabled | bool | true | No | Enable double (infrastructure) encryption on managed disks/storage. |
| create_access_connector | bool | true | No | Create a system-assigned-identity access connector for Unity Catalog. |
| access_connector_name | string | null | No | Override access connector name; defaults to <name>-ac. |
| tags | map(string) | {} | No | Additional tags merged onto the workspace and connector. |
Outputs
| Name | Description |
|---|---|
| id | Resource ID of the Databricks workspace. |
| name | Name of the Databricks workspace. |
| workspace_url | Per-workspace URL used as the API/UI host. |
| workspace_id | Numeric Databricks workspace (organization) ID for the databricks provider. |
| managed_resource_group_id | Resource ID of the auto-created managed resource group. |
| managed_resource_group_name | Name of the auto-created managed resource group. |
| access_connector_id | Resource ID of the Unity Catalog access connector (null if not created). |
| access_connector_principal_id | Access connector system-assigned identity principal ID; grant it RBAC on metastore storage. |
Enterprise scenario
A retail bank’s data-platform team runs three identical Databricks workspaces — dev, staging, and prod — each injected into a region-local spoke VNet with Secure Cluster Connectivity so no cluster ever receives a public IP and all egress is forced through the central Azure Firewall for inspection. Production uses a managed_services_cmk_key_vault_key_id from an HSM-backed Key Vault to satisfy the regulator’s requirement that notebook content and query results be encrypted under a customer-controlled, annually rotated key. Each workspace ships with its access connector pre-bound to the shared ADLS Gen2 Unity Catalog metastore via the access_connector_principal_id output, so a new environment is fully governable the moment terraform apply finishes.
Best practices
- Treat the immutable inputs as a one-shot decision. no_public_ip, the injected virtual_network_id/subnets, and storage_account_name cannot change in place: Terraform plans a destroy-and-recreate, which wipes clusters, jobs, and DBFS. The module's regex validations catch bad values at plan time; review any plan that shows the workspace being replaced before you apply.
- Default to premium and Secure Cluster Connectivity. VNet injection, customer-managed keys, and Unity Catalog all require the Premium SKU. Keep no_public_ip = true and public_network_access_enabled = false, then front the workspace with Private Endpoints so the control plane is never reachable from the internet.
- Right-size the storage SKU for cost. The managed DBFS root defaults to Standard_GRS; in non-prod, set storage_account_sku_name = "Standard_LRS" to cut replication cost, and reserve zone-redundant Standard_ZRS for production resilience. Remember that DBUs, not this storage, are usually the dominant Databricks bill, so pair the module with cluster policies and auto-termination.
- Rotate CMKs deliberately. The CMK inputs are documented as versioned Key Vault key IDs, so pass the id attribute of azurerm_key_vault_key rather than versionless_id, and roll each new key version out through a Terraform apply. Grant Key Vault access before creating the workspace so the first apply doesn't fail on a missing key permission.
- Name deterministically across environments. Drive name, managed_resource_group_name, storage_account_name, and access_connector_name from a shared naming convention (<org>-<workload>-<env>) so the managed RG and DBFS storage are predictable in audits and cost reports rather than random Databricks-generated strings.
- Grant the access connector least privilege. Assign its access_connector_principal_id only Storage Blob Data Contributor on the specific metastore/container scopes it needs, never at subscription or resource-group scope, and keep Unity Catalog as the single governance plane rather than per-workspace direct mounts.
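The Key Vault bullet above asks for key access to exist before the workspace is created. A minimal sketch of that ordering for the managed-services key, assuming a vault that uses access policies and that the principal to authorise is the AzureDatabricks enterprise application in your tenant. The variable azure_databricks_sp_object_id and the azurerm_key_vault.dbx resource are hypothetical names used only for illustration:

```hcl
# Hypothetical input: object ID of the AzureDatabricks enterprise application.
variable "azure_databricks_sp_object_id" {
  type = string
}

# Allow Databricks to wrap/unwrap with keys in the vault.
resource "azurerm_key_vault_access_policy" "dbx_managed_services" {
  key_vault_id = azurerm_key_vault.dbx.id # hypothetical vault
  tenant_id    = azurerm_key_vault.dbx.tenant_id
  object_id    = var.azure_databricks_sp_object_id

  key_permissions = ["Get", "UnwrapKey", "WrapKey"]
}

# Same module call as in "How to use it", trimmed to the relevant inputs,
# with depends_on forcing the access policy to be created first.
module "databricks_workspace" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-databricks?ref=v1.0.0"

  name                = "kv-analytics-prod"
  resource_group_name = azurerm_resource_group.data.name
  location            = "centralindia"

  managed_services_cmk_key_vault_key_id = azurerm_key_vault_key.dbx_managed.id

  depends_on = [azurerm_key_vault_access_policy.dbx_managed_services]
}
```

A module-level depends_on makes every resource in the module wait for the policy, which is coarse but keeps the ordering explicit.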