Quick take — Provision an Azure Monitor Workspace for managed Prometheus with Terraform (azurerm ~> 4.0): default-DCE/DCR endpoints, query metrics access, public/private network controls, and clean outputs. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "azurerm" {
  # azurerm 4.x needs a subscription: set subscription_id here
  # or export ARM_SUBSCRIPTION_ID before running Terraform.
  features {}
}

module "monitor_workspace" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-workspace?ref=v1.0.0"

  name                = "..." # Name of the Monitor Workspace; 3-44 chars, validated ag…
  resource_group_name = "..." # Resource group that contains the workspace.
  location            = "..." # Azure region. Place it close to the clusters/agents tha…
}
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
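Once the apply finishes, you will usually want the query endpoint and the default DCR ID surfaced from the root configuration. A minimal sketch, assuming the module block is named monitor_workspace as in the Quickstart; the root output names are illustrative and not part of the module:

```hcl
# Root-module outputs that re-export values from the Quickstart module call.
# The names prometheus_query_endpoint and prometheus_default_dcr_id are
# illustrative; the module outputs they read are listed under Outputs.
output "prometheus_query_endpoint" {
  description = "PromQL query endpoint of the Monitor Workspace."
  value       = module.monitor_workspace.query_endpoint
}

output "prometheus_default_dcr_id" {
  description = "Default Data Collection Rule ID of the Monitor Workspace."
  value       = module.monitor_workspace.default_data_collection_rule_id
}
```

After the next apply, terraform output prometheus_query_endpoint prints the URL to paste into a Grafana data source.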
What this module is
An Azure Monitor Workspace (resource type azurerm_monitor_workspace) is the regional, fully managed store behind Azure Monitor managed service for Prometheus. When you enable managed Prometheus on an AKS cluster, or scrape any endpoint with the Azure Monitor metrics agent, the time series land in a Monitor Workspace and become queryable with PromQL through the workspace’s query endpoint. It is the metrics counterpart to a Log Analytics workspace, and it is what Azure Managed Grafana points at as a data source.
A single azurerm_monitor_workspace resource is deceptively small, but in production it never travels alone. The moment you create one, Azure auto-provisions a default Data Collection Endpoint (DCE) and default Data Collection Rule (DCR) behind it; you almost always want to wire Grafana’s managed identity for read access, attach diagnostic settings so the workspace’s own platform metrics flow somewhere, and decide whether the query endpoint is public or locked behind a private endpoint. Wrapping all of that in a module means every team gets the same query-access RBAC, the same diagnostics, and the same naming — instead of fifteen slightly-different hand-built workspaces where half forgot to grant Grafana the Monitoring Data Reader role.
This module exposes azurerm_monitor_workspace plus the two sub-resources you reach for on day one: a role assignment that grants a Grafana (or other) principal Monitoring Data Reader on the workspace, and an optional diagnostic setting that ships the workspace’s metrics/logs to Log Analytics.
When to use it
- You are enabling managed Prometheus for AKS and need a destination workspace before you create the cluster’s azurerm_monitor_data_collection_rule_association.
- You run Azure Managed Grafana and want a governed Prometheus data source with RBAC baked in, not click-ops.
- You are consolidating self-hosted Prometheus/Thanos onto a managed backend to cut the operational toil of running stateful Prometheus pods and long-term storage.
- You need per-environment or per-region metric stores (dev/stage/prod, or one per landing zone) created identically from a pipeline.
- You want the query endpoint reachable only over a private endpoint for a regulated workload, and you want that decision enforced as code.
If you only need logs (KQL) rather than Prometheus time-series, use a Log Analytics workspace module instead — azurerm_monitor_workspace is specifically the Prometheus metrics store.
Module structure
terraform-module-azure-monitor-workspace/
├── versions.tf # provider + Terraform version pins
├── main.tf # azurerm_monitor_workspace + RBAC + diagnostics
├── variables.tf # var-driven inputs with validation
└── outputs.tf # id, name, query endpoint, default DCR/DCE ids
# versions.tf
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}
# main.tf
# Regional managed Prometheus metrics store.
resource "azurerm_monitor_workspace" "this" {
  name                = var.name
  resource_group_name = var.resource_group_name
  location            = var.location

  # When false, the query endpoint is reachable only via a private endpoint.
  public_network_access_enabled = var.public_network_access_enabled

  tags = var.tags
}

# Grant read access (PromQL query) to a set of principals — typically the
# Azure Managed Grafana system-assigned identity, plus any service principals
# that run dashboards or alert evaluation. "Monitoring Data Reader" is the
# least-privilege role for querying metrics from the workspace.
resource "azurerm_role_assignment" "data_reader" {
  for_each = var.query_reader_principal_ids

  scope                = azurerm_monitor_workspace.this.id
  role_definition_name = "Monitoring Data Reader"
  principal_id         = each.value
}

# Optionally ship the workspace's own platform metrics/logs to Log Analytics
# so you can alert on ingestion health and active time-series counts.
resource "azurerm_monitor_diagnostic_setting" "this" {
  count = var.diagnostic_log_analytics_workspace_id == null ? 0 : 1

  name                       = "${var.name}-diag"
  target_resource_id         = azurerm_monitor_workspace.this.id
  log_analytics_workspace_id = var.diagnostic_log_analytics_workspace_id

  # enabled_metric was added during the azurerm 4.x series; on older 4.x
  # releases, use the metric block instead.
  enabled_metric {
    category = "AllMetrics"
  }
}
# variables.tf
variable "name" {
  description = "Name of the Azure Monitor Workspace (Prometheus metrics store)."
  type        = string

  validation {
    condition     = can(regex("^[a-zA-Z0-9][a-zA-Z0-9._-]{2,43}$", var.name))
    error_message = "name must be 3-44 chars, start alphanumeric, and use only letters, digits, '.', '_' or '-'."
  }
}

variable "resource_group_name" {
  description = "Resource group that will contain the Monitor Workspace."
  type        = string
}

variable "location" {
  description = "Azure region. Managed Prometheus is region-bound; place it near the clusters/agents that write to it."
  type        = string
}

variable "public_network_access_enabled" {
  description = "If true, the query endpoint is reachable over the public internet. Set false to require a private endpoint."
  type        = bool
  default     = true
}

variable "query_reader_principal_ids" {
  description = "Map of friendly key => principal object ID granted 'Monitoring Data Reader' (e.g. Grafana identity). Keys must be stable across applies."
  type        = map(string)
  default     = {}

  validation {
    condition = alltrue([
      for id in values(var.query_reader_principal_ids) :
      can(regex("^[0-9a-fA-F-]{36}$", id))
    ])
    error_message = "Every value in query_reader_principal_ids must be a 36-character GUID (principal object ID)."
  }
}

variable "diagnostic_log_analytics_workspace_id" {
  description = "Optional Log Analytics workspace resource ID to receive the Monitor Workspace's platform metrics. Null disables diagnostics."
  type        = string
  default     = null
}

variable "tags" {
  description = "Tags applied to the Monitor Workspace."
  type        = map(string)
  default     = {}
}
# outputs.tf
output "id" {
  description = "Resource ID of the Azure Monitor Workspace."
  value       = azurerm_monitor_workspace.this.id
}

output "name" {
  description = "Name of the Azure Monitor Workspace."
  value       = azurerm_monitor_workspace.this.name
}

output "query_endpoint" {
  description = "PromQL query endpoint URL — use this as the Prometheus data source URL in Grafana."
  value       = azurerm_monitor_workspace.this.query_endpoint
}

output "default_data_collection_endpoint_id" {
  description = "ID of the auto-created default Data Collection Endpoint (DCE) for this workspace."
  value       = azurerm_monitor_workspace.this.default_data_collection_endpoint_id
}

output "default_data_collection_rule_id" {
  description = "ID of the auto-created default Data Collection Rule (DCR). Associate this with AKS to start scraping."
  value       = azurerm_monitor_workspace.this.default_data_collection_rule_id
}
How to use it
# Reference the existing Grafana instance so we can grant its identity read access.
data "azurerm_dashboard_grafana" "platform" {
  name                = "kv-grafana-prod"
  resource_group_name = "rg-observability-prod"
}

module "monitor_workspace_prometheus_prod" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-workspace?ref=v1.0.0"

  name                = "kv-amw-prod-weu"
  resource_group_name = "rg-observability-prod"
  location            = "westeurope"

  # Keep the query endpoint public for now; flip to false once the
  # private endpoint + DNS zone are in place.
  public_network_access_enabled = true

  query_reader_principal_ids = {
    grafana = data.azurerm_dashboard_grafana.platform.identity[0].principal_id
  }

  # azurerm_log_analytics_workspace.platform is assumed to be defined
  # elsewhere in this configuration.
  diagnostic_log_analytics_workspace_id = azurerm_log_analytics_workspace.platform.id

  tags = {
    env       = "prod"
    workload  = "observability"
    managedBy = "terraform"
  }
}

# Downstream: associate the workspace's default DCR with an AKS cluster so
# the managed Prometheus agent starts shipping cluster metrics into it.
# azurerm_kubernetes_cluster.prod is assumed to be defined elsewhere.
resource "azurerm_monitor_data_collection_rule_association" "aks_prometheus" {
  name                    = "amw-prometheus"
  target_resource_id      = azurerm_kubernetes_cluster.prod.id
  data_collection_rule_id = module.monitor_workspace_prometheus_prod.default_data_collection_rule_id
}
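When you are ready to flip public_network_access_enabled to false, the private endpoint and DNS zone mentioned in the comment above might look like the following. This is a sketch under stated assumptions, not part of the module: the two variables, the resource names, and the region in the DNS zone name are hypothetical, and the prometheusMetrics sub-resource and the privatelink.<region>.prometheus.monitor.azure.com zone pattern should be checked against the current Azure Private Link documentation before use.

```hcl
# Hypothetical inputs: the subnet that hosts the private endpoint and the
# virtual network whose workloads must resolve the private name.
variable "private_endpoint_subnet_id" {
  type = string
}

variable "virtual_network_id" {
  type = string
}

# Private DNS zone for the workspace query endpoint (region-specific name;
# the zone pattern is an assumption to verify against Azure's documentation).
resource "azurerm_private_dns_zone" "prometheus" {
  name                = "privatelink.westeurope.prometheus.monitor.azure.com"
  resource_group_name = "rg-observability-prod"
}

# Link the zone to the virtual network so clients resolve the private IP.
resource "azurerm_private_dns_zone_virtual_network_link" "prometheus" {
  name                  = "prometheus-link"
  resource_group_name   = "rg-observability-prod"
  private_dns_zone_name = azurerm_private_dns_zone.prometheus.name
  virtual_network_id    = var.virtual_network_id
}

# Private endpoint in front of the Monitor Workspace created by the module.
resource "azurerm_private_endpoint" "amw_query" {
  name                = "pe-kv-amw-prod-weu"
  location            = "westeurope"
  resource_group_name = "rg-observability-prod"
  subnet_id           = var.private_endpoint_subnet_id

  private_service_connection {
    name                           = "amw-query"
    private_connection_resource_id = module.monitor_workspace_prometheus_prod.id
    subresource_names              = ["prometheusMetrics"] # assumed group ID
    is_manual_connection           = false
  }

  private_dns_zone_group {
    name                 = "prometheus"
    private_dns_zone_ids = [azurerm_private_dns_zone.prometheus.id]
  }
}
```

Applying the endpoint first and switching the module input to false in a later change keeps dashboards reachable throughout the cutover.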
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
  backend  = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }

  config = {
    # ...azurerm state storage account/container + key per path...
  }
}
2. Module config — live/prod/monitor_workspace/terragrunt.hcl:
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-workspace?ref=v1.0.0"
}

inputs = {
  name                = "..."
  resource_group_name = "..."
  location            = "..."
}
3. Deploy one environment, or roll out all modules together:
cd live/prod/monitor_workspace && terragrunt apply # this module only
cd live/prod && terragrunt run-all apply # or, from the repo root: every module under live/prod
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
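The dependency orchestration mentioned above can be sketched with a Terragrunt dependency block. This assumes a sibling unit at live/prod/log_analytics that exposes an output named id; both the path and the output name are hypothetical and only illustrate how one unit's output can feed this module's diagnostic_log_analytics_workspace_id input:

```hcl
# live/prod/monitor_workspace/terragrunt.hcl (extended sketch)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-workspace?ref=v1.0.0"
}

# Assumption: a sibling unit deploys the Log Analytics workspace and
# exports its resource ID as an output called "id".
dependency "log_analytics" {
  config_path = "../log_analytics"
}

inputs = {
  name                = "..."
  resource_group_name = "..."
  location            = "..."

  # Wire the dependency's output into the module's optional diagnostics input.
  diagnostic_log_analytics_workspace_id = dependency.log_analytics.outputs.id
}
```

With the dependency declared, run-all applies the Log Analytics unit before this one.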
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| name | string | — | Yes | Name of the Monitor Workspace; 3-44 chars, validated against the allowed character set. |
| resource_group_name | string | — | Yes | Resource group that contains the workspace. |
| location | string | — | Yes | Azure region. Place it close to the clusters/agents that write metrics. |
| public_network_access_enabled | bool | true | No | false requires the query endpoint to be reached via a private endpoint. |
| query_reader_principal_ids | map(string) | {} | No | Map of key => principal object ID granted Monitoring Data Reader (validated as GUIDs). |
| diagnostic_log_analytics_workspace_id | string | null | No | Log Analytics workspace ID for the workspace’s platform metrics; null disables diagnostics. |
| tags | map(string) | {} | No | Tags applied to the workspace. |
Outputs
| Name | Description |
|---|---|
| id | Resource ID of the Azure Monitor Workspace. |
| name | Name of the Monitor Workspace. |
| query_endpoint | PromQL query endpoint URL — use as the Prometheus data source in Grafana. |
| default_data_collection_endpoint_id | ID of the auto-created default Data Collection Endpoint (DCE). |
| default_data_collection_rule_id | ID of the auto-created default Data Collection Rule (DCR); associate with AKS to begin scraping. |
Enterprise scenario
A fintech platform team runs twelve AKS clusters across three landing zones and needs unified Prometheus dashboards without operating self-hosted Prometheus/Thanos. They deploy this module once per region from the platform pipeline — kv-amw-prod-weu, -neu, -eus — each granting only the shared Azure Managed Grafana identity Monitoring Data Reader, and each feeding diagnostics into the central Log Analytics workspace for ingestion-health alerts. Every cluster’s data_collection_rule_association points at the matching workspace’s default_data_collection_rule_id, so onboarding a new cluster is a two-line module reference rather than a bespoke observability project.
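A rollout like this could be expressed with for_each on the module call. The sketch below is illustrative only: the region map, the three variables, and the resource group name are assumptions, not something the scenario specifies.

```hcl
# Hypothetical inputs for the platform pipeline.
variable "grafana_principal_id" {
  description = "Object ID of the shared Azure Managed Grafana identity."
  type        = string
}

variable "central_log_analytics_workspace_id" {
  description = "Resource ID of the central Log Analytics workspace."
  type        = string
}

variable "aks_clusters" {
  description = "AKS clusters to onboard: resource ID plus the key of the workspace they write to."
  type = map(object({
    id         = string
    region_key = string
  }))
}

locals {
  # Short region key => Azure region name (assumed mapping).
  regions = {
    weu = "westeurope"
    neu = "northeurope"
    eus = "eastus"
  }
}

# One Monitor Workspace per region, all configured identically.
module "monitor_workspace" {
  for_each = local.regions
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-workspace?ref=v1.0.0"

  name                = "kv-amw-prod-${each.key}"
  resource_group_name = "rg-observability-prod" # assumed resource group
  location            = each.value

  query_reader_principal_ids = {
    grafana = var.grafana_principal_id
  }

  diagnostic_log_analytics_workspace_id = var.central_log_analytics_workspace_id
}

# Point each cluster at the default DCR of the workspace in its region.
resource "azurerm_monitor_data_collection_rule_association" "aks" {
  for_each = var.aks_clusters

  name                    = "amw-prometheus"
  target_resource_id      = each.value.id
  data_collection_rule_id = module.monitor_workspace[each.value.region_key].default_data_collection_rule_id
}
```

Onboarding another cluster then means adding one entry to aks_clusters, and adding a region means adding one entry to local.regions.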
Best practices
- Grant Monitoring Data Reader, never Contributor, for query access. Reading PromQL only needs the data-reader role; reserve write/management roles for the pipeline identity that runs Terraform. Pass principals through query_reader_principal_ids so RBAC is reviewable in code.
- Use the default_data_collection_rule_id output, not a hand-rolled DCR. Azure creates a default DCE/DCR per workspace specifically for managed Prometheus; associating that one with AKS is the supported path and avoids duplicate ingestion.
- Keep the workspace in the same region as the clusters writing to it. azurerm_monitor_workspace is a regional resource, so co-locating it with AKS reduces latency and keeps cross-region egress off the bill.
- Lock the query endpoint down for regulated data. Set public_network_access_enabled = false and front the workspace with a private endpoint + private DNS zone so PromQL traffic never traverses the public internet.
- Mind cost: managed Prometheus bills per metric sample ingested. Tune scrape intervals and drop high-cardinality labels at the DCR level rather than over-provisioning workspaces; one workspace can back many clusters.
- Name predictably and tag for ownership. A convention like kv-amw-<env>-<region-short> plus env/workload/managedBy tags makes multi-region observability estates navigable and cost-attributable.