
Terraform Module: Azure Machine Learning Workspace — Private, Governed MLOps Foundations

Quick take: Provision an Azure Machine Learning Workspace with Terraform, with customer-managed keys, an optional private endpoint for the workspace, bindings to its Storage/Key Vault/ACR dependencies, a system-assigned identity, and clean MLOps-ready outputs. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

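provider "azurerm" {
  features {}
  # azurerm 4.x needs a subscription ID: set subscription_id here
  # or export ARM_SUBSCRIPTION_ID before running Terraform.
}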

module "machine_learning" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-machine-learning?ref=v1.0.0"

  name                    = "..."  # Workspace name (3-33 chars, alphanumeric/hyphen, valida…
  location                = "..."  # Azure region; must match the dependent resources.
  resource_group_name     = "..."  # Resource group containing the workspace.
  application_insights_id = "..."  # App Insights resource ID for telemetry.
  key_vault_id            = "..."  # Key Vault resource ID for connection secrets.
  storage_account_id      = "..."  # Storage Account resource ID (default datastore).
}

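Then run terraform init && terraform apply. Every other input has a default; see Inputs below to override behaviour. Note that public_network_access_enabled defaults to false and private_endpoint defaults to null, so a workspace deployed from this minimal configuration has public access disabled and no private endpoint until you set one of them.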
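
The three required IDs in the Quickstart point at resources the caller owns. If they do not exist yet, a minimal sketch of creating them could look like the following. Every resource name and the region here are hypothetical placeholders, and the settings are bare defaults rather than a hardened configuration. The Log Analytics workspace is included on the assumption that you want a workspace-based Application Insights instance.

```hcl
# Hypothetical names throughout; adjust to your own naming convention.
data "azurerm_client_config" "current" {}

resource "azurerm_resource_group" "ml" {
  name     = "rg-ml-example"
  location = "eastus2"
}

resource "azurerm_log_analytics_workspace" "ml" {
  name                = "log-ml-example"
  location            = azurerm_resource_group.ml.location
  resource_group_name = azurerm_resource_group.ml.name
  sku                 = "PerGB2018"
}

# Telemetry sink for training jobs and online endpoints.
resource "azurerm_application_insights" "ml" {
  name                = "appi-ml-example"
  location            = azurerm_resource_group.ml.location
  resource_group_name = azurerm_resource_group.ml.name
  workspace_id        = azurerm_log_analytics_workspace.ml.id
  application_type    = "web"
}

# Holds workspace connection secrets. Name must be globally unique.
resource "azurerm_key_vault" "ml" {
  name                = "kv-ml-example"
  location            = azurerm_resource_group.ml.location
  resource_group_name = azurerm_resource_group.ml.name
  tenant_id           = data.azurerm_client_config.current.tenant_id
  sku_name            = "standard"
}

# Default datastore. Name must be globally unique, lowercase alphanumeric.
resource "azurerm_storage_account" "ml" {
  name                     = "stmlexample"
  location                 = azurerm_resource_group.ml.location
  resource_group_name      = azurerm_resource_group.ml.name
  account_tier             = "Standard"
  account_replication_type = "LRS"
}
```

The module inputs then become azurerm_application_insights.ml.id, azurerm_key_vault.ml.id, and azurerm_storage_account.ml.id in place of the "..." placeholders.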

What this module is

An Azure Machine Learning (Azure ML) Workspace is the top-level resource that anchors everything in Azure ML: compute targets, datastores, registered models, environments, jobs, endpoints, and the experiment/run history. Critically, a workspace is not self-contained. At creation time it binds to three mandatory dependent resources and one optional one: a Storage Account (for artifacts, snapshots and datasets), a Key Vault (for connection secrets and credentials), an Application Insights instance (for telemetry from training jobs and online endpoints), and optionally a Container Registry (ACR, for environment images used by jobs and deployments). Those bindings are immutable for the life of the workspace, which makes the workspace an awkward thing to click together in the portal: get one dependency wrong and you are recreating the whole thing.

That immutability is exactly why this belongs in a reusable Terraform module. Wrapping azurerm_machine_learning_workspace lets you encode the non-negotiable production posture once (system-assigned managed identity, customer-managed encryption keys (CMK), public network access disabled in favour of a private endpoint, high-business-impact (HBI) data handling, and consistent naming/tagging) and then stamp out identical dev, staging, and prod workspaces that differ only by inputs. The module also creates the Key Vault role assignment for the workspace identity once that identity exists, so consumers do not have to wire up CMK key access themselves.
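
To make "differ only by inputs" concrete, here is one possible way to stamp out several workspaces from a single root configuration with for_each. This is a sketch, not the author's layout: the environments variable, the mlw-fraud- name prefix, and the workspace_ids output are all hypothetical, and many teams would keep each environment in its own state instead.

```hcl
# Hypothetical per-environment inputs, keyed by environment name (dev, staging, prod).
variable "environments" {
  type = map(object({
    location                = string
    resource_group_name     = string
    application_insights_id = string
    key_vault_id            = string
    storage_account_id      = string
  }))
}

module "machine_learning" {
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-machine-learning?ref=v1.0.0"
  for_each = var.environments

  # Only these values vary; the security posture comes from the module defaults.
  name                    = "mlw-fraud-${each.key}"
  location                = each.value.location
  resource_group_name     = each.value.resource_group_name
  application_insights_id = each.value.application_insights_id
  key_vault_id            = each.value.key_vault_id
  storage_account_id      = each.value.storage_account_id

  tags = {
    env = each.key
  }
}

# Map of environment name to workspace resource ID.
output "workspace_ids" {
  value = { for env, ws in module.machine_learning : env => ws.id }
}
```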

When to use it

Reach for the raw resource instead only for a quick throwaway sandbox where you will accept the Azure-managed defaults and a public endpoint.

Module structure

terraform-module-azure-machine-learning/
├── versions.tf      # provider + Terraform version pins
├── main.tf          # workspace + identity + CMK + private endpoint
├── variables.tf     # var-driven inputs with validations
└── outputs.tf       # id/name + identity + discovery URLs

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

main.tf

# The workspace binds to up to four dependent resources at creation (the
# container registry is optional). These IDs are passed in so the module
# stays composable (callers own lifecycle of the deps).
resource "azurerm_machine_learning_workspace" "this" {
  name                = var.name
  location            = var.location
  resource_group_name = var.resource_group_name

  application_insights_id = var.application_insights_id
  key_vault_id            = var.key_vault_id
  storage_account_id      = var.storage_account_id
  # Container registry is optional but required to build custom environments.
  container_registry_id   = var.container_registry_id

  friendly_name = var.friendly_name
  description   = var.description
  sku_name      = var.sku_name

  # High Business Impact: scrubs diagnostic data that could leak sensitive
  # content out of the workspace boundary. Immutable after creation.
  high_business_impact = var.high_business_impact

  # Lock the data plane down; reach the workspace only via private endpoint.
  public_network_access_enabled = var.public_network_access_enabled

  # v1 legacy mode keeps the workspace on the v1 API and may prevent use of
  # features that are only available through the v2 API.
  # Leave false on new workspaces.
  v1_legacy_mode_enabled = var.v1_legacy_mode_enabled

  identity {
    type         = var.user_assigned_identity_ids == null ? "SystemAssigned" : "SystemAssigned, UserAssigned"
    identity_ids = var.user_assigned_identity_ids
  }

  # Customer-managed key encryption for workspace metadata (Cosmos DB, the
  # internal search and storage that Azure ML provisions on your behalf).
  dynamic "encryption" {
    for_each = var.customer_managed_key == null ? [] : [var.customer_managed_key]
    content {
      key_vault_id              = encryption.value.key_vault_id
      key_id                    = encryption.value.key_id
      user_assigned_identity_id = try(encryption.value.user_assigned_identity_id, null)
    }
  }

  tags = var.tags
}

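# Grant the workspace's system-assigned identity permission to use the CMK.
# Created whenever customer_managed_key is set; this assumes the key's vault
# uses Azure RBAC rather than access policies. The assignment is applied after
# the workspace exists, because it needs the identity's principal ID.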
resource "azurerm_role_assignment" "cmk_crypto_user" {
  count = var.customer_managed_key == null ? 0 : 1

  scope                = var.customer_managed_key.key_vault_id
  role_definition_name = "Key Vault Crypto Service Encryption User"
  principal_id         = azurerm_machine_learning_workspace.this.identity[0].principal_id
}

# Private endpoint for the workspace control/data plane. The amlworkspace
# sub-resource also fronts the notebook and inference scoring traffic.
resource "azurerm_private_endpoint" "this" {
  count = var.private_endpoint == null ? 0 : 1

  name                = "${var.name}-pe"
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = var.private_endpoint.subnet_id

  private_service_connection {
    name                           = "${var.name}-psc"
    private_connection_resource_id = azurerm_machine_learning_workspace.this.id
    subresource_names              = ["amlworkspace"]
    is_manual_connection           = false
  }

  dynamic "private_dns_zone_group" {
    for_each = length(var.private_endpoint.private_dns_zone_ids) > 0 ? [1] : []
    content {
      name                 = "aml-dns"
      private_dns_zone_ids = var.private_endpoint.private_dns_zone_ids
    }
  }

  tags = var.tags
}

variables.tf

variable "name" {
  type        = string
  description = "Name of the Azure Machine Learning workspace."

  validation {
    # ML workspace names: 3-33 chars, alphanumerics and hyphens.
    condition     = can(regex("^[a-zA-Z0-9][a-zA-Z0-9-]{1,31}[a-zA-Z0-9]$", var.name))
    error_message = "name must be 3-33 chars, alphanumeric or hyphens, and start/end alphanumeric."
  }
}

variable "location" {
  type        = string
  description = "Azure region for the workspace (must match its dependent resources)."
}

variable "resource_group_name" {
  type        = string
  description = "Resource group that will contain the workspace."
}

variable "application_insights_id" {
  type        = string
  description = "Resource ID of the Application Insights instance for job/endpoint telemetry."
}

variable "key_vault_id" {
  type        = string
  description = "Resource ID of the Key Vault that stores workspace connection secrets."
}

variable "storage_account_id" {
  type        = string
  description = "Resource ID of the Storage Account used as the default datastore."
}

variable "container_registry_id" {
  type        = string
  description = "Resource ID of the ACR used to build/store environment images. Premium SKU required when private."
  default     = null
}

variable "friendly_name" {
  type        = string
  description = "Display name shown in Azure ML Studio."
  default     = null
}

variable "description" {
  type        = string
  description = "Description of the workspace shown in Azure ML Studio."
  default     = null
}

variable "sku_name" {
  type        = string
  description = "Workspace SKU tier."
  default     = "Basic"

  validation {
    condition     = contains(["Basic", "Standard"], var.sku_name)
    error_message = "sku_name must be 'Basic' or 'Standard'."
  }
}

variable "high_business_impact" {
  type        = bool
  description = "Enable HBI to suppress collection of sensitive diagnostic data. Immutable after creation."
  default     = true
}

variable "public_network_access_enabled" {
  type        = bool
  description = "Allow public network access to the workspace. Set false and use a private endpoint in production."
  default     = false
}

variable "v1_legacy_mode_enabled" {
  type        = bool
  description = "Enable v1 legacy mode (v1 API features). May block features of the v2 API; keep false for new workspaces."
  default     = false
}

variable "user_assigned_identity_ids" {
  type        = list(string)
  description = "Optional list of user-assigned identity IDs to attach alongside the system-assigned identity."
  default     = null
}

variable "customer_managed_key" {
  type = object({
    key_vault_id              = string
    key_id                    = string
    user_assigned_identity_id = optional(string)
  })
  description = "Customer-managed key for workspace data encryption. Null uses Microsoft-managed keys."
  default     = null
}

variable "private_endpoint" {
  type = object({
    subnet_id            = string
    private_dns_zone_ids = optional(list(string), [])
  })
  description = "Private endpoint config for the 'amlworkspace' sub-resource. Null skips PE creation."
  default     = null
}

variable "tags" {
  type        = map(string)
  description = "Tags applied to the workspace and private endpoint."
  default     = {}
}
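
The validation blocks above can be exercised without touching Azure by using Terraform's native test framework. The following is a sketch under stated assumptions: it needs Terraform 1.7 or later for mock_provider (newer than the module's 1.5.0 floor), the file path is hypothetical, and the resource IDs are made-up placeholders that only need to look well-formed.

```hcl
# tests/validation.tftest.hcl (hypothetical path; assumes Terraform >= 1.7)

# Mock the provider so no credentials or real resources are required.
mock_provider "azurerm" {}

# Baseline inputs shared by every run block; all IDs are placeholders.
variables {
  name                    = "mlw-test-dev"
  location                = "eastus2"
  resource_group_name     = "rg-test"
  application_insights_id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-test/providers/Microsoft.Insights/components/appi-test"
  key_vault_id            = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-test/providers/Microsoft.KeyVault/vaults/kv-test"
  storage_account_id      = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-test/providers/Microsoft.Storage/storageAccounts/sttest"
}

# The name regex requires an alphanumeric last character.
run "rejects_name_ending_in_hyphen" {
  command = plan

  variables {
    name = "mlw-test-"
  }

  expect_failures = [var.name]
}

# The module only allows "Basic" or "Standard".
run "rejects_unknown_sku" {
  command = plan

  variables {
    sku_name = "Premium"
  }

  expect_failures = [var.sku_name]
}
```

Running terraform test from the module directory should report both runs as passing, since each expects the corresponding variable validation to fail.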

outputs.tf

output "id" {
  description = "Resource ID of the Azure Machine Learning workspace."
  value       = azurerm_machine_learning_workspace.this.id
}

output "name" {
  description = "Name of the workspace."
  value       = azurerm_machine_learning_workspace.this.name
}

output "discovery_url" {
  description = "Discovery URL used by the Azure ML SDK/CLI to locate workspace endpoints."
  value       = azurerm_machine_learning_workspace.this.discovery_url
}

output "workspace_id" {
  description = "Immutable GUID of the workspace (used in diagnostic/metric queries)."
  value       = azurerm_machine_learning_workspace.this.workspace_id
}

output "principal_id" {
  description = "Principal ID of the workspace's system-assigned managed identity."
  value       = azurerm_machine_learning_workspace.this.identity[0].principal_id
}

output "tenant_id" {
  description = "Tenant ID of the workspace's system-assigned managed identity."
  value       = azurerm_machine_learning_workspace.this.identity[0].tenant_id
}

output "private_endpoint_id" {
  description = "Resource ID of the workspace private endpoint, if created."
  value       = try(azurerm_private_endpoint.this[0].id, null)
}

How to use it

module "machine_learning_workspace" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-machine-learning?ref=v1.0.0"

  name                = "mlw-fraud-prod-eus2"
  location            = "eastus2"
  resource_group_name = azurerm_resource_group.ml.name

  # Three mandatory dependencies plus the optional ACR, created/owned by the caller.
  application_insights_id = azurerm_application_insights.ml.id
  key_vault_id            = azurerm_key_vault.ml.id
  storage_account_id      = azurerm_storage_account.ml.id
  container_registry_id   = azurerm_container_registry.ml.id

  friendly_name = "Fraud Detection (Prod)"
  description   = "Production training + scoring for the fraud risk models."
  sku_name      = "Basic"

  high_business_impact          = true
  public_network_access_enabled = false

  # Encrypt workspace metadata with our own key.
  customer_managed_key = {
    key_vault_id = azurerm_key_vault.cmk.id
    key_id       = azurerm_key_vault_key.aml.id
  }

  # Land the workspace inside the platform VNet.
  private_endpoint = {
    subnet_id            = azurerm_subnet.ml_pe.id
    private_dns_zone_ids = [azurerm_private_dns_zone.aml_api.id, azurerm_private_dns_zone.aml_notebooks.id]
  }

  tags = {
    env         = "prod"
    workload    = "fraud-detection"
    cost_center = "ml-platform"
  }
}

# Downstream: grant a CI/CD service principal the ability to submit jobs and
# manage assets in the workspace, scoped to its resource ID output.
resource "azurerm_role_assignment" "ci_ml_contributor" {
  scope                = module.machine_learning_workspace.id
  role_definition_name = "AzureML Data Scientist"
  principal_id         = azuread_service_principal.ml_pipeline.object_id
}
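
The example above passes two private DNS zone IDs (aml_api and aml_notebooks) without showing where they come from. A minimal sketch of those zones follows, assuming the Azure public cloud zone names and a hypothetical azurerm_virtual_network.platform that the private endpoint subnet belongs to. In a landing zone these zones are often owned centrally, in which case you would look them up with data sources instead.

```hcl
# Zone for the workspace API endpoints (Azure public cloud name).
resource "azurerm_private_dns_zone" "aml_api" {
  name                = "privatelink.api.azureml.ms"
  resource_group_name = azurerm_resource_group.ml.name
}

# Zone for notebook / compute instance endpoints (Azure public cloud name).
resource "azurerm_private_dns_zone" "aml_notebooks" {
  name                = "privatelink.notebooks.azure.net"
  resource_group_name = azurerm_resource_group.ml.name
}

# Link both zones to the VNet so clients inside it resolve the private IPs.
# azurerm_virtual_network.platform is a hypothetical reference.
resource "azurerm_private_dns_zone_virtual_network_link" "aml_api" {
  name                  = "aml-api-link"
  resource_group_name   = azurerm_resource_group.ml.name
  private_dns_zone_name = azurerm_private_dns_zone.aml_api.name
  virtual_network_id    = azurerm_virtual_network.platform.id
}

resource "azurerm_private_dns_zone_virtual_network_link" "aml_notebooks" {
  name                  = "aml-notebooks-link"
  resource_group_name   = azurerm_resource_group.ml.name
  private_dns_zone_name = azurerm_private_dns_zone.aml_notebooks.name
  virtual_network_id    = azurerm_virtual_network.platform.id
}
```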

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config: live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm state storage account/container + key per path...
  }
}

2. Module config: live/prod/machine_learning/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-machine-learning?ref=v1.0.0"
}

inputs = {
  name                    = "..."
  location                = "..."
  resource_group_name     = "..."
  application_insights_id = "..."
  key_vault_id            = "..."
  storage_account_id      = "..."
}

3. Deploy this module alone, or roll out every module in the environment together (each command starts from the repository root):

cd live/prod/machine_learning && terragrunt apply   # this module only
cd live/prod && terragrunt run-all apply            # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
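
Dependency orchestration works through Terragrunt dependency blocks, which let this module read outputs from sibling modules instead of hard-coding resource IDs. The sketch below assumes hypothetical sibling folders (../application_insights, ../key_vault, ../storage_account) that each expose an id output; the name and location values are taken from the usage example earlier in this article.

```hcl
# live/prod/machine_learning/terragrunt.hcl (sketch with hypothetical siblings)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-machine-learning?ref=v1.0.0"
}

# Each dependency points at a sibling module folder; run-all applies them first.
dependency "application_insights" {
  config_path = "../application_insights"
}

dependency "key_vault" {
  config_path = "../key_vault"
}

dependency "storage_account" {
  config_path = "../storage_account"
}

inputs = {
  name                = "mlw-fraud-prod-eus2"
  location            = "eastus2"
  resource_group_name = "..."

  # Assumes every sibling module exposes an output named "id".
  application_insights_id = dependency.application_insights.outputs.id
  key_vault_id            = dependency.key_vault.outputs.id
  storage_account_id      = dependency.storage_account.outputs.id
}
```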

Inputs

Name | Type | Default | Required | Description
--- | --- | --- | --- | ---
name | string | - | Yes | Workspace name (3-33 chars, alphanumeric/hyphen, validated).
location | string | - | Yes | Azure region; must match the dependent resources.
resource_group_name | string | - | Yes | Resource group containing the workspace.
application_insights_id | string | - | Yes | App Insights resource ID for telemetry.
key_vault_id | string | - | Yes | Key Vault resource ID for connection secrets.
storage_account_id | string | - | Yes | Storage Account resource ID (default datastore).
container_registry_id | string | null | No | ACR resource ID for environment images (Premium when private).
friendly_name | string | null | No | Display name in Azure ML Studio.
description | string | null | No | Workspace description shown in Studio.
sku_name | string | "Basic" | No | Workspace SKU: Basic or Standard (validated).
high_business_impact | bool | true | No | Suppress sensitive diagnostic data collection (immutable).
public_network_access_enabled | bool | false | No | Allow public access; keep false with a private endpoint.
v1_legacy_mode_enabled | bool | false | No | Enable v1 legacy mode (v1 API features); keep false for new workspaces.
user_assigned_identity_ids | list(string) | null | No | Extra user-assigned identities alongside the system identity.
customer_managed_key | object | null | No | CMK config (key_vault_id, key_id, optional user_assigned_identity_id).
private_endpoint | object | null | No | PE config (subnet_id, optional private_dns_zone_ids).
tags | map(string) | {} | No | Tags applied to the workspace and private endpoint.

Outputs

Name | Description
--- | ---
id | Resource ID of the Azure Machine Learning workspace.
name | Name of the workspace.
discovery_url | Discovery URL used by the Azure ML SDK/CLI to locate endpoints.
workspace_id | Immutable workspace GUID for diagnostic/metric queries.
principal_id | Principal ID of the workspace system-assigned managed identity.
tenant_id | Tenant ID of the workspace system-assigned managed identity.
private_endpoint_id | Resource ID of the workspace private endpoint, if created.

Enterprise scenario

A retail bank’s fraud-analytics team runs three Azure ML workspaces — dev, staging, and prod — each in a separate spoke VNet under the platform landing zone. Because transaction data is HBI, every workspace is provisioned through this module with high_business_impact = true, public_network_access_enabled = false, a customer-managed key rotated quarterly in a dedicated Key Vault, and a private endpoint into the spoke so data scientists reach Azure ML Studio only over ExpressRoute. The module’s principal_id output feeds RBAC assignments that let the workspace identity pull environment images from a shared Premium ACR, while the id output wires a GitHub Actions service principal as AzureML Data Scientist for automated retraining — giving the team identical, audited workspaces with zero portal clicks.
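
The ACR part of that scenario could be wired up roughly as follows. This is a sketch: azurerm_container_registry.shared is a hypothetical reference to the shared Premium registry, and the module label matches the usage example earlier in this article.

```hcl
# Let the workspace's system-assigned identity pull environment images
# from the shared registry (hypothetical resource reference).
resource "azurerm_role_assignment" "workspace_acr_pull" {
  scope                = azurerm_container_registry.shared.id
  role_definition_name = "AcrPull"
  principal_id         = module.machine_learning_workspace.principal_id
}
```

Because the assignment is driven by the principal_id output, the same block works unchanged for dev, staging, and prod as long as each environment instantiates the module under the same label.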

Best practices
