Terraform Module: Azure Databricks Workspace — VNet-Injected, Customer-Managed Keys, Locked Down by Default

Quick take — A production Terraform module for azurerm_databricks_workspace: VNet injection, no-public-IP, CMK for managed services and DBFS, and Unity Catalog access connector — fully var-driven for azurerm ~> 4.0. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "azurerm" {
  features {}
}

module "databricks" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-databricks?ref=v1.0.0"

  name                = "..."  # Workspace name; 3-64 chars, alphanumerics/hyphens/underscores.
  resource_group_name = "..."  # Existing resource group to hold the workspace.
  location            = "..."  # Azure region (e.g. `centralindia`).
}

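Then run terraform init && terraform apply. Every other input has a sensible default; see Inputs below to override behaviour. Note that public_network_access_enabled defaults to false, so the new workspace is reachable only over Private Link until you override that input.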

What this module is

An Azure Databricks Workspace is the managed analytics control plane that Microsoft and Databricks jointly operate on Azure. When you create the azurerm_databricks_workspace resource, Azure provisions a managed resource group into your subscription that holds the workspace’s data-plane plumbing — the worker/driver VMs, a managed VNet (unless you inject your own), storage for DBFS, and the network security groups that wire it all together. You get a Databricks UI, clusters, SQL warehouses, jobs, and Delta Lake, all billed through your Azure subscription via Databricks Units (DBUs).

The catch is that almost every interesting production setting lives in nested blocks and a free-form custom_parameters map, not in tidy top-level arguments: VNet injection, Secure Cluster Connectivity (no public IP), customer-managed keys for both managed services and the DBFS root, NAT-gateway egress, and the storage account name are all easy to get subtly wrong. Worse, several of them are immutable — set no_public_ip or the injected subnets wrong on day one and your only fix is to destroy and recreate the workspace, taking every cluster, job, and notebook reference with it.

Wrapping the workspace in a reusable module fixes that. The module encodes a secure-by-default posture — Premium SKU, VNet injection, Secure Cluster Connectivity, and CMK enabled — pins the immutable inputs behind validations so a bad value fails at plan instead of after a 20-minute apply, and emits the outputs (workspace URL, managed RG, and the Unity Catalog access connector principal) that downstream Terraform and platform teams actually consume.

When to use it

If you only need a throwaway sandbox workspace with defaults and a public endpoint, the bare resource is fine — this module is aimed at workspaces that have to survive an audit.

Module structure

terraform-module-azure-databricks/
├── versions.tf      # provider + required_version pins
├── main.tf          # access connector + workspace (VNet injection, SCC, CMK)
├── variables.tf     # var-driven inputs with validation
└── outputs.tf       # id/name/url + managed RG + connector identity

versions.tf

terraform {
  required_version = ">= 1.6.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

main.tf

locals {
  # Unity Catalog access connector is created unless the caller opts out.
  create_access_connector = var.create_access_connector

  # Managed-services CMK is switched on simply by supplying a Key Vault key ID.
  managed_cmk_enabled = var.managed_services_cmk_key_vault_key_id != null

  # DBFS root CMK: this flag only prepares the root storage account for a
  # customer key; the key itself is bound by a separate
  # azurerm_databricks_workspace_root_dbfs_customer_managed_key resource.
  dbfs_cmk_enabled = var.dbfs_root_cmk_key_vault_key_id != null

  default_tags = {
    ManagedBy = "Terraform"
    Module    = "terraform-module-azure-databricks"
  }

  tags = merge(local.default_tags, var.tags)
}

# Managed identity used by Unity Catalog to reach ADLS Gen2 metastore/data.
resource "azurerm_databricks_access_connector" "this" {
  count = local.create_access_connector ? 1 : 0

  name                = coalesce(var.access_connector_name, "${var.name}-ac")
  resource_group_name = var.resource_group_name
  location            = var.location

  identity {
    type = "SystemAssigned"
  }

  tags = local.tags
}

resource "azurerm_databricks_workspace" "this" {
  name                = var.name
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = var.sku

  # Name of the auto-created managed resource group that holds the data plane.
  managed_resource_group_name = coalesce(
    var.managed_resource_group_name,
    "${var.resource_group_name}-${var.name}-managed"
  )

  # Front-end (UI/API) public access. When disabled, the workspace is reachable
  # only over Private Link and the NSG rules mode must be set explicitly.
  public_network_access_enabled         = var.public_network_access_enabled
  network_security_group_rules_required = var.public_network_access_enabled ? null : "NoAzureDatabricksRules"

  # Customer-managed key for managed services (notebooks, secrets, results).
  managed_services_cmk_key_vault_key_id = var.managed_services_cmk_key_vault_key_id

  # Prepare the DBFS root storage account for a customer-managed key.
  customer_managed_key_enabled = local.dbfs_cmk_enabled

  # Customer-managed key for cluster managed disks.
  managed_disk_cmk_key_vault_key_id = var.managed_disk_cmk_key_vault_key_id

  # Infrastructure (double) encryption: a second, platform-managed encryption
  # layer on the DBFS root storage account.
  infrastructure_encryption_enabled = var.infrastructure_encryption_enabled

  custom_parameters {
    # No public IP on cluster nodes (Secure Cluster Connectivity). IMMUTABLE.
    no_public_ip = var.no_public_ip

    # VNet injection: bring your own VNet + the two delegated subnets. IMMUTABLE.
    virtual_network_id  = var.virtual_network_id
    public_subnet_name  = var.public_subnet_name
    private_subnet_name = var.private_subnet_name

    public_subnet_network_security_group_association_id  = var.public_subnet_nsg_association_id
    private_subnet_network_security_group_association_id = var.private_subnet_nsg_association_id

    # Deterministic name and SKU for the managed DBFS root storage account.
    storage_account_name     = var.storage_account_name
    storage_account_sku_name = var.storage_account_sku_name

    # NAT gateway and public IP names for SCC egress. The provider documents
    # these as applying to workspaces with a Databricks-managed VNet.
    nat_gateway_name = var.nat_gateway_name
    public_ip_name   = var.public_ip_name
  }

  tags = local.tags

  lifecycle {
    # Ignore a CreatedDate tag stamped outside Terraform so it does not show up
    # as drift on every plan. Changes to immutable arguments still plan a
    # replacement, so review plans before applying.
    ignore_changes = [
      tags["CreatedDate"],
    ]
  }
}
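
The dbfs_root_cmk_key_vault_key_id input is declared in variables.tf, but main.tf never passes it to a resource. One way to bind it, sketched here as an assumption rather than as part of the published module, is the azurerm provider's azurerm_databricks_workspace_root_dbfs_customer_managed_key resource. The resource label and the count guard are illustrative, and the root storage account's managed identity must already be allowed to use the key in Key Vault, which this sketch leaves to the caller.

```hcl
# Hypothetical addition to main.tf: binds the caller's key to the DBFS root
# storage account once the workspace exists. Created only when a key is given.
resource "azurerm_databricks_workspace_root_dbfs_customer_managed_key" "this" {
  count = var.dbfs_root_cmk_key_vault_key_id != null ? 1 : 0

  workspace_id     = azurerm_databricks_workspace.this.id
  key_vault_key_id = var.dbfs_root_cmk_key_vault_key_id
}
```

Because count must be known at plan time, this guard works when the key ID is a known value. If the key is created in the same apply, a separate boolean input would be a safer switch.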

variables.tf

variable "name" {
  type        = string
  description = "Name of the Databricks workspace. 3-64 chars, alphanumerics, hyphens and underscores."

  validation {
    condition     = can(regex("^[A-Za-z0-9_-]{3,64}$", var.name))
    error_message = "name must be 3-64 characters: letters, digits, hyphens or underscores only."
  }
}

variable "resource_group_name" {
  type        = string
  description = "Name of the existing resource group that will contain the workspace."
}

variable "location" {
  type        = string
  description = "Azure region for the workspace (e.g. centralindia, eastus2)."
}

variable "sku" {
  type        = string
  default     = "premium"
  description = "Workspace SKU. Use premium for VNet injection, CMK and Unity Catalog."

  validation {
    condition     = contains(["standard", "premium", "trial"], var.sku)
    error_message = "sku must be one of: standard, premium, trial."
  }
}

variable "managed_resource_group_name" {
  type        = string
  default     = null
  description = "Override name for the auto-created managed resource group. Defaults to <rg>-<name>-managed."
}

# ---------------------------------------------------------------------------
# Networking — VNet injection + Secure Cluster Connectivity (all IMMUTABLE)
# ---------------------------------------------------------------------------

variable "public_network_access_enabled" {
  type        = bool
  default     = false
  description = "Allow access to the workspace from the public internet. Disable to require Private Link."
}

variable "no_public_ip" {
  type        = bool
  default     = true
  description = "Secure Cluster Connectivity: deploy cluster nodes with no public IP. IMMUTABLE after create."
}

variable "virtual_network_id" {
  type        = string
  default     = null
  description = "Resource ID of the VNet to inject the workspace into. Null uses a Databricks-managed VNet."
}

variable "public_subnet_name" {
  type        = string
  default     = null
  description = "Name of the delegated 'host' subnet for VNet injection. Required when virtual_network_id is set."
}

variable "private_subnet_name" {
  type        = string
  default     = null
  description = "Name of the delegated 'container' subnet for VNet injection. Required when virtual_network_id is set."
}

variable "public_subnet_nsg_association_id" {
  type        = string
  default     = null
  description = "ID of the subnet-NSG association for the public subnet. Required for VNet injection."
}

variable "private_subnet_nsg_association_id" {
  type        = string
  default     = null
  description = "ID of the subnet-NSG association for the private subnet. Required for VNet injection."
}

variable "nat_gateway_name" {
  type        = string
  default     = null
  description = "Name of a NAT gateway created in the managed RG for deterministic cluster egress (SCC only)."
}

variable "public_ip_name" {
  type        = string
  default     = null
  description = "Name of the public IP attached to the managed NAT gateway. Pairs with nat_gateway_name."
}

# ---------------------------------------------------------------------------
# Storage
# ---------------------------------------------------------------------------

variable "storage_account_name" {
  type        = string
  default     = null
  description = "Name of the managed DBFS root storage account. 3-24 lowercase alphanumerics. IMMUTABLE."

  validation {
    condition     = var.storage_account_name == null || can(regex("^[a-z0-9]{3,24}$", var.storage_account_name))
    error_message = "storage_account_name must be 3-24 lowercase letters/digits."
  }
}

variable "storage_account_sku_name" {
  type        = string
  default     = "Standard_GRS"
  description = "SKU of the managed DBFS root storage account."

  validation {
    condition     = contains(["Standard_LRS", "Standard_GRS", "Standard_RAGRS", "Standard_ZRS"], var.storage_account_sku_name)
    error_message = "storage_account_sku_name must be a valid Standard storage SKU."
  }
}

# ---------------------------------------------------------------------------
# Encryption — customer-managed keys
# ---------------------------------------------------------------------------

variable "managed_services_cmk_key_vault_key_id" {
  type        = string
  default     = null
  description = "Key Vault key versioned ID for encrypting managed services (notebooks, secrets, results). Premium only."
}

variable "managed_disk_cmk_key_vault_key_id" {
  type        = string
  default     = null
  description = "Key Vault key versioned ID for encrypting cluster managed disks with a customer key."
}

variable "dbfs_root_cmk_key_vault_key_id" {
  type        = string
  default     = null
  description = "Key Vault key versioned ID for encrypting the DBFS root storage account. Enables customer-managed key."
}

variable "infrastructure_encryption_enabled" {
  type        = bool
  default     = true
  description = "Enable infrastructure (double) encryption on the DBFS root storage account."
}

# ---------------------------------------------------------------------------
# Unity Catalog access connector
# ---------------------------------------------------------------------------

variable "create_access_connector" {
  type        = bool
  default     = true
  description = "Create a system-assigned-identity access connector for Unity Catalog metastore/data access."
}

variable "access_connector_name" {
  type        = string
  default     = null
  description = "Override name for the access connector. Defaults to <name>-ac."
}

variable "tags" {
  type        = map(string)
  default     = {}
  description = "Additional tags merged onto the workspace and access connector."
}
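
Several descriptions above say an input is "Required when virtual_network_id is set", but no validation block enforces that. Cross-variable validation conditions need Terraform 1.9 or later, and versions.tf allows 1.6, so one sketch that stays within that constraint is a precondition on a terraform_data resource. The resource name vnet_injection_guard and the local below are hypothetical, not part of the module.

```hcl
locals {
  # Inputs that must accompany virtual_network_id for VNet injection.
  vnet_injection_inputs = [
    var.public_subnet_name,
    var.private_subnet_name,
    var.public_subnet_nsg_association_id,
    var.private_subnet_nsg_association_id,
  ]
}

# Hypothetical guard: no infrastructure is created, only the check is evaluated.
resource "terraform_data" "vnet_injection_guard" {
  lifecycle {
    precondition {
      condition = var.virtual_network_id == null || alltrue([
        for v in local.vnet_injection_inputs : v != null
      ])
      error_message = "When virtual_network_id is set, both subnet names and both subnet-NSG association IDs must also be set."
    }
  }
}
```

When every value is known, a missing input fails the plan. Values that are still unknown at plan time (for example association IDs created in the same run) defer the check to apply.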

outputs.tf

output "id" {
  description = "Resource ID of the Databricks workspace."
  value       = azurerm_databricks_workspace.this.id
}

output "name" {
  description = "Name of the Databricks workspace."
  value       = azurerm_databricks_workspace.this.name
}

output "workspace_url" {
  description = "Per-workspace URL (e.g. adb-1234567890123456.7.azuredatabricks.net) used for the API host."
  value       = azurerm_databricks_workspace.this.workspace_url
}

output "workspace_id" {
  description = "Unique numeric Databricks workspace ID (organization id), used by the databricks provider."
  value       = azurerm_databricks_workspace.this.workspace_id
}

output "managed_resource_group_id" {
  description = "Resource ID of the auto-created managed resource group holding the data plane."
  value       = azurerm_databricks_workspace.this.managed_resource_group_id
}

output "managed_resource_group_name" {
  description = "Name of the auto-created managed resource group."
  value       = azurerm_databricks_workspace.this.managed_resource_group_name
}

output "access_connector_id" {
  description = "Resource ID of the Unity Catalog access connector (null if not created)."
  value       = try(azurerm_databricks_access_connector.this[0].id, null)
}

output "access_connector_principal_id" {
  description = "System-assigned identity principal ID of the access connector — grant this RBAC on metastore storage."
  value       = try(azurerm_databricks_access_connector.this[0].identity[0].principal_id, null)
}

How to use it

This example injects the workspace into an existing hub VNet with two delegated subnets, turns on Secure Cluster Connectivity, supplies a managed-services CMK from Key Vault, and then grants the access connector Storage Blob Data Contributor on the Unity Catalog metastore storage account using one of the module’s outputs.

module "databricks_workspace" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-databricks?ref=v1.0.0"

  name                = "kv-analytics-prod"
  resource_group_name = azurerm_resource_group.data.name
  location            = "centralindia"
  sku                 = "premium"

  # Secure Cluster Connectivity + VNet injection into the hub.
  public_network_access_enabled     = false
  no_public_ip                      = true
  virtual_network_id                = azurerm_virtual_network.hub.id
  public_subnet_name                = azurerm_subnet.dbx_host.name
  private_subnet_name               = azurerm_subnet.dbx_container.name
  public_subnet_nsg_association_id  = azurerm_subnet_network_security_group_association.dbx_host.id
  private_subnet_nsg_association_id = azurerm_subnet_network_security_group_association.dbx_container.id

  # Deterministic managed storage + customer-managed key for managed services.
  storage_account_name                  = "kvanalyticsproddbfs"
  storage_account_sku_name              = "Standard_ZRS"
  managed_services_cmk_key_vault_key_id = azurerm_key_vault_key.dbx_managed.id # versioned key ID, as the input expects

  create_access_connector = true

  tags = {
    Environment = "prod"
    CostCenter  = "data-platform"
    Domain      = "analytics"
  }
}

# Downstream: grant the workspace's Unity Catalog identity access to the
# metastore storage, using the module's access_connector_principal_id output.
resource "azurerm_role_assignment" "uc_metastore" {
  scope                = azurerm_storage_account.uc_metastore.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = module.databricks_workspace.access_connector_principal_id
}

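# Point the databricks provider at the new workspace. Authentication itself
# comes from the Azure CLI or service-principal credentials in the environment.
provider "databricks" {
  host                        = "https://${module.databricks_workspace.workspace_url}"
  azure_workspace_resource_id = module.databricks_workspace.id
}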
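
The example above refers to azurerm_subnet.dbx_host, azurerm_subnet.dbx_container and their NSG associations without showing them. A minimal sketch of those prerequisites follows. The address prefixes, the resource names, and the choice to place everything in the hub VNet's resource group are assumptions to adapt; the Microsoft.Databricks/workspaces delegation is what VNet injection expects on both subnets.

```hcl
# Assumed prerequisites for the example above; labels match its references.
resource "azurerm_network_security_group" "dbx" {
  name                = "nsg-dbx" # hypothetical name
  resource_group_name = azurerm_virtual_network.hub.resource_group_name
  location            = azurerm_virtual_network.hub.location
}

locals {
  # Delegation actions shared by the host and container subnets.
  dbx_delegation_actions = [
    "Microsoft.Network/virtualNetworks/subnets/join/action",
    "Microsoft.Network/virtualNetworks/subnets/prepareNetworkPolicies/action",
    "Microsoft.Network/virtualNetworks/subnets/unprepareNetworkPolicies/action",
  ]
}

resource "azurerm_subnet" "dbx_host" {
  name                 = "snet-dbx-host" # hypothetical name
  resource_group_name  = azurerm_virtual_network.hub.resource_group_name
  virtual_network_name = azurerm_virtual_network.hub.name
  address_prefixes     = ["10.0.10.0/24"] # assumption: pick a free range

  delegation {
    name = "databricks"
    service_delegation {
      name    = "Microsoft.Databricks/workspaces"
      actions = local.dbx_delegation_actions
    }
  }
}

resource "azurerm_subnet" "dbx_container" {
  name                 = "snet-dbx-container" # hypothetical name
  resource_group_name  = azurerm_virtual_network.hub.resource_group_name
  virtual_network_name = azurerm_virtual_network.hub.name
  address_prefixes     = ["10.0.11.0/24"] # assumption: pick a free range

  delegation {
    name = "databricks"
    service_delegation {
      name    = "Microsoft.Databricks/workspaces"
      actions = local.dbx_delegation_actions
    }
  }
}

# The module takes the IDs of these associations, not of the NSG itself.
resource "azurerm_subnet_network_security_group_association" "dbx_host" {
  subnet_id                 = azurerm_subnet.dbx_host.id
  network_security_group_id = azurerm_network_security_group.dbx.id
}

resource "azurerm_subnet_network_security_group_association" "dbx_container" {
  subnet_id                 = azurerm_subnet.dbx_container.id
  network_security_group_id = azurerm_network_security_group.dbx.id
}
```

Passing the association IDs into the module, rather than just the subnet names, gives Terraform an explicit dependency so the workspace is not created before the NSG is attached.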

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config, live/terragrunt.hcl (inherited by every module):

remote_state {
  backend  = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm storage account/container + key per path...
  }
}

2. Module config, live/prod/databricks/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-databricks?ref=v1.0.0"
}

inputs = {
  name                = "..."
  resource_group_name = "..."
  location            = "..."
}

3. Deploy one environment, or roll out all modules together:

cd live/prod/databricks && terragrunt apply   # this module only
cd live/prod && terragrunt run-all apply      # or, from the repo root: every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
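
The root config shown earlier covers only the backend. To keep the provider in one place as well, Terragrunt's generate block can write a provider file into each module's working directory. This is a sketch: the file name provider.tf is an assumption, and the block should be dropped if the calling stack already declares the provider.

```hcl
# live/terragrunt.hcl (root), alongside remote_state
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "azurerm" {
  features {}
}
EOF
}
```

With if_exists set to overwrite_terragrunt, Terragrunt replaces only files it generated itself and refuses to clobber a hand-written provider.tf.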

Inputs

| Name | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | string | (none) | Yes | Workspace name; 3-64 chars, alphanumerics/hyphens/underscores. |
| resource_group_name | string | (none) | Yes | Existing resource group to hold the workspace. |
| location | string | (none) | Yes | Azure region (e.g. centralindia). |
| sku | string | "premium" | No | standard, premium, or trial. Premium needed for VNet injection/CMK/Unity Catalog. |
| managed_resource_group_name | string | null | No | Override managed RG name; defaults to <rg>-<name>-managed. |
| public_network_access_enabled | bool | false | No | Allow public internet access to the workspace control plane. |
| no_public_ip | bool | true | No | Secure Cluster Connectivity; no public IP on nodes. Immutable. |
| virtual_network_id | string | null | No | VNet resource ID for injection; null uses managed VNet. Immutable. |
| public_subnet_name | string | null | No | Delegated host subnet name for VNet injection. Immutable. |
| private_subnet_name | string | null | No | Delegated container subnet name for VNet injection. Immutable. |
| public_subnet_nsg_association_id | string | null | No | Subnet-NSG association ID for the host subnet. |
| private_subnet_nsg_association_id | string | null | No | Subnet-NSG association ID for the container subnet. |
| nat_gateway_name | string | null | No | NAT gateway name in the managed RG for deterministic egress (SCC). |
| public_ip_name | string | null | No | Public IP name attached to the managed NAT gateway. |
| storage_account_name | string | null | No | Managed DBFS root storage account name; 3-24 lowercase alnum. Immutable. |
| storage_account_sku_name | string | "Standard_GRS" | No | SKU of the managed DBFS root storage account. |
| managed_services_cmk_key_vault_key_id | string | null | No | Key Vault key ID for managed-services CMK (Premium only). |
| managed_disk_cmk_key_vault_key_id | string | null | No | Key Vault key ID for cluster managed-disk CMK. |
| dbfs_root_cmk_key_vault_key_id | string | null | No | Key Vault key ID for DBFS root storage CMK. |
| infrastructure_encryption_enabled | bool | true | No | Enable infrastructure (double) encryption on the DBFS root storage account. |
| create_access_connector | bool | true | No | Create a system-assigned-identity access connector for Unity Catalog. |
| access_connector_name | string | null | No | Override access connector name; defaults to <name>-ac. |
| tags | map(string) | {} | No | Additional tags merged onto the workspace and connector. |

Outputs

| Name | Description |
| --- | --- |
| id | Resource ID of the Databricks workspace. |
| name | Name of the Databricks workspace. |
| workspace_url | Per-workspace URL used as the API/UI host. |
| workspace_id | Numeric Databricks workspace (organization) ID for the databricks provider. |
| managed_resource_group_id | Resource ID of the auto-created managed resource group. |
| managed_resource_group_name | Name of the auto-created managed resource group. |
| access_connector_id | Resource ID of the Unity Catalog access connector (null if not created). |
| access_connector_principal_id | Access connector system-assigned identity principal ID; grant RBAC on metastore storage. |

Enterprise scenario

A retail bank’s data-platform team runs three identical Databricks workspaces — dev, staging, and prod — each injected into a region-local spoke VNet with Secure Cluster Connectivity so no cluster ever receives a public IP and all egress is forced through the central Azure Firewall for inspection. Production uses a managed_services_cmk_key_vault_key_id from an HSM-backed Key Vault to satisfy the regulator’s requirement that notebook content and query results be encrypted under a customer-controlled, annually rotated key. Each workspace ships with its access connector pre-bound to the shared ADLS Gen2 Unity Catalog metastore via the access_connector_principal_id output, so a new environment is fully governable the moment terraform apply finishes.
