Terraform Module: Azure Data Explorer (Kusto) — opinionated clusters with hot-cache-tuned databases

Quick take — A reusable hashicorp/azurerm ~> 4.0 module for Azure Data Explorer: SKU and autoscale, system-assigned identity, double encryption, per-database hot cache and retention, RBAC database principals, and an Event Hub ingestion connection wired into clean outputs. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

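provider "azurerm" {
  features {}
  # azurerm 4.x requires a subscription: set subscription_id here
  # or export ARM_SUBSCRIPTION_ID before running terraform.
}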

module "data_explorer" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-data-explorer?ref=v1.0.0"

  cluster_name        = "..."  # Globally unique cluster name (4-22 lowercase alphanumer…
  resource_group_name = "..."  # Resource group for the cluster and databases.
  location            = "..."  # Azure region, e.g. `centralindia`.
}

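Then run terraform init && terraform apply. Every other input has a default; see Inputs below to override behaviour. Note that public_network_access_enabled defaults to false, so a Quickstart cluster is reachable only through a Private Endpoint unless you set it to true.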

What this module is

Azure Data Explorer (ADX, internally Kusto) is a fully managed, columnar, time-series analytics engine: the store-and-query engine behind Log Analytics, Application Insights, and Microsoft Sentinel, exposed for your own data via the Kusto Query Language (KQL). You provision a cluster (the compute + SSD-cache tier, billed per VM-hour) and inside it one or more databases, each with its own hot-cache window and retention (soft-delete) period. Data lands by streaming from Event Hubs / IoT Hub, by LightIngest batch loads, or by queued ingestion, and you query it interactively over billions of rows, with the fastest responses coming from data that is still in hot cache.

The raw resource graph rewards getting a handful of decisions right and punishes the rest: the cluster sku couples a VM family to a cache-disk size and a price point, auto_scale and a fixed capacity are mutually exclusive, double encryption is fixed at creation (changing it forces a new cluster), and a database’s hot_cache_period should never exceed its soft_delete_period, because data older than the retention window is deleted and cannot be served from cache. Wrapping azurerm_kusto_cluster + azurerm_kusto_database, plus the RBAC and ingestion sub-resources every team ends up needing, in one reviewed, tagged, version-pinned module turns those decisions into explicit inputs with reviewed defaults, so each workload ships a correctly-sized, least-privilege cluster instead of copy-pasting a block that hot-caches 30 days into a 7-day database.

When to use it

You need interactive, ad-hoc analytics over telemetry, logs, metrics, or IoT time-series at a scale where a SQL database or Log Analytics workspace gets slow or expensive.
You are building an observability or security data lake and want KQL parity with Sentinel/Log Analytics but with your own retention, cost, and schema control.
You want to stream from Event Hubs / IoT Hub straight into queryable tables with a managed data connection rather than a custom consumer.
You need per-database hot-cache and retention tuning — e.g. 31 days hot for the live dashboard, 2 years cold for compliance — under one cluster’s compute.
You want every cluster to carry consistent SKU, autoscale bounds, managed-identity, encryption, and database-scoped RBAC enforced by code review, not portal clicks.

Reach for a different tool when your workload is transactional, needs row-level updates/deletes, or is sub-gigabyte and infrequently queried — that is Azure SQL, Cosmos DB, or a plain Log Analytics workspace, not ADX.

Module structure

terraform-module-azure-data-explorer/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

main.tf

locals {
  # Autoscale and a fixed node count are mutually exclusive on the cluster.
  autoscale_enabled = var.auto_scale != null

  # Build a flat map keyed by "<db>/<tenant_id>/<object_id>/<role>" so RBAC
  # assignments can be created with a single for_each across every
  # database/principal pair. The keys embed tenant_id and object_id, so
  # those values must be known at plan time.
  database_principals = merge([
    for db_name, db in var.databases : {
      for p in db.principals :
      "${db_name}/${p.tenant_id}/${p.object_id}/${p.role}" => {
        database_name  = db_name
        principal_id   = p.object_id
        principal_type = p.principal_type
        role           = p.role
        tenant_id      = p.tenant_id
      }
    }
  ]...)
}

resource "azurerm_kusto_cluster" "this" {
  name                = var.cluster_name
  resource_group_name = var.resource_group_name
  location            = var.location

  sku {
    name     = var.sku_name
    capacity = local.autoscale_enabled ? null : var.capacity
  }

  dynamic "optimized_auto_scale" {
    for_each = local.autoscale_enabled ? [var.auto_scale] : []
    content {
      minimum_instances = optimized_auto_scale.value.minimum_instances
      maximum_instances = optimized_auto_scale.value.maximum_instances
    }
  }

  identity {
    type = "SystemAssigned"
  }

  # Changing double_encryption_enabled forces a new cluster; set it deliberately at first apply.
  double_encryption_enabled     = var.double_encryption_enabled
  disk_encryption_enabled       = var.disk_encryption_enabled
  public_network_access_enabled = var.public_network_access_enabled
  auto_stop_enabled             = var.auto_stop_enabled
  streaming_ingestion_enabled   = var.streaming_ingestion_enabled
  purge_enabled                 = var.purge_enabled
  zones                         = var.availability_zones

  tags = var.tags
}

resource "azurerm_kusto_database" "this" {
  for_each = var.databases

  name                = each.key
  resource_group_name = var.resource_group_name
  location            = var.location
  cluster_name        = azurerm_kusto_cluster.this.name

  hot_cache_period   = each.value.hot_cache_period
  soft_delete_period = each.value.soft_delete_period
}

# Database-scoped RBAC (Admin / Ingestor / Viewer / etc.) for Entra principals.
resource "azurerm_kusto_database_principal_assignment" "this" {
  for_each = local.database_principals

  name                = replace(each.key, "/", "-")
  resource_group_name = var.resource_group_name
  cluster_name        = azurerm_kusto_cluster.this.name
  database_name       = azurerm_kusto_database.this[each.value.database_name].name

  tenant_id      = each.value.tenant_id
  principal_id   = each.value.principal_id
  principal_type = each.value.principal_type
  role           = each.value.role
}

# Optional managed ingestion from Event Hub straight into a table.
resource "azurerm_kusto_eventhub_data_connection" "this" {
  for_each = var.eventhub_connections

  name                = each.key
  resource_group_name = var.resource_group_name
  location            = var.location
  cluster_name        = azurerm_kusto_cluster.this.name
  database_name       = azurerm_kusto_database.this[each.value.database_name].name

  eventhub_id    = each.value.eventhub_id
  consumer_group = each.value.consumer_group

  table_name        = each.value.table_name
  mapping_rule_name = each.value.mapping_rule_name
  data_format       = each.value.data_format
  compression       = each.value.compression
  # For the system-assigned identity, identity_id takes the cluster's resource ID.
  identity_id       = azurerm_kusto_cluster.this.id
}

variables.tf

variable "cluster_name" {
  description = "Globally unique ADX cluster name (4-22 chars, lowercase letters and numbers, must start with a letter)."
  type        = string

  validation {
    condition     = can(regex("^[a-z][a-z0-9]{3,21}$", var.cluster_name))
    error_message = "cluster_name must be 4-22 chars, start with a lowercase letter, and contain only lowercase letters and digits."
  }
}

variable "resource_group_name" {
  description = "Resource group that will contain the cluster, databases, and connections."
  type        = string
}

variable "location" {
  description = "Azure region for the cluster, e.g. centralindia."
  type        = string
}

variable "sku_name" {
  description = "Cluster SKU (VM family + cache disk). Dev/Test SKUs have no SLA. e.g. Standard_E2ads_v5, Standard_D13_v2."
  type        = string
  default     = "Standard_E2ads_v5"

  validation {
    condition     = can(regex("^(Dev\\(No SLA\\)_)?Standard_", var.sku_name))
    error_message = "sku_name must be a valid Kusto SKU such as Standard_E2ads_v5 or Dev(No SLA)_Standard_E2a_v4."
  }
}

variable "capacity" {
  description = "Fixed node count when autoscale is disabled. Ignored if auto_scale is set."
  type        = number
  default     = 2

  validation {
    condition     = var.capacity >= 1 && var.capacity <= 1000
    error_message = "capacity must be between 1 and 1000."
  }
}

variable "auto_scale" {
  description = "Optional optimized autoscale. When set, capacity is ignored and the cluster scales between the bounds."
  type = object({
    minimum_instances = number
    maximum_instances = number
  })
  default = null

  validation {
    condition = var.auto_scale == null || (
      var.auto_scale.minimum_instances >= 2 &&
      var.auto_scale.maximum_instances >= var.auto_scale.minimum_instances
    )
    error_message = "auto_scale.minimum_instances must be >= 2 and maximum_instances >= minimum_instances."
  }
}

variable "availability_zones" {
  description = "Availability zones to spread cluster nodes across, e.g. [\"1\", \"2\", \"3\"]. Empty for zone-agnostic."
  type        = list(string)
  default     = []
}

variable "double_encryption_enabled" {
  description = "Enable infrastructure (double) encryption. IMMUTABLE after creation."
  type        = bool
  default     = true
}

variable "disk_encryption_enabled" {
  description = "Encrypt the cluster's data disks."
  type        = bool
  default     = true
}

variable "public_network_access_enabled" {
  description = "Allow access over the public endpoint. Set false when fronting the cluster with a Private Endpoint."
  type        = bool
  default     = false
}

variable "streaming_ingestion_enabled" {
  description = "Enable low-latency streaming ingestion (needed for near-real-time Event Hub ingest)."
  type        = bool
  default     = true
}

variable "purge_enabled" {
  description = "Allow hard data purges (GDPR/right-to-erasure). Off by default."
  type        = bool
  default     = false
}

variable "auto_stop_enabled" {
  description = "Auto-stop the cluster after a period of inactivity to save cost (dev/test friendly)."
  type        = bool
  default     = false
}

variable "databases" {
  description = "Map of database name => settings. hot_cache_period must be <= soft_delete_period. Use ISO 8601 durations (e.g. P31D)."
  type = map(object({
    hot_cache_period   = optional(string, "P31D")
    soft_delete_period = optional(string, "P365D")
    principals = optional(list(object({
      object_id      = string
      tenant_id      = string
      principal_type = string # User | Group | App
      role           = string # Admin | Ingestor | Monitor | User | UnrestrictedViewer | Viewer
    })), [])
  }))
  default = {}
}

variable "eventhub_connections" {
  description = "Map of connection name => Event Hub ingestion settings landing events into a database table."
  type = map(object({
    database_name     = string
    eventhub_id       = string
    consumer_group    = optional(string, "$Default")
    table_name        = optional(string)
    mapping_rule_name = optional(string)
    data_format       = optional(string, "JSON")
    compression       = optional(string, "None")
  }))
  default = {}
}

variable "tags" {
  description = "Tags applied to the cluster."
  type        = map(string)
  default     = {}
}
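
The databases description asks for hot_cache_period <= soft_delete_period, but nothing in variables.tf enforces it. Below is a minimal sketch of a validation that could be added inside the variable "databases" block. It assumes both periods are written as whole-day durations of the form PnD (for example P31D). Other ISO 8601 forms such as P1Y or PT12H are rejected by this check rather than compared.

```hcl
  # Sketch only: place inside variable "databases" { ... } in variables.tf.
  # Assumes whole-day durations like "P31D"; anything else fails the check.
  validation {
    condition = alltrue([
      for name, db in var.databases :
      try(
        tonumber(regex("^P(\\d+)D$", db.hot_cache_period)[0]) <=
        tonumber(regex("^P(\\d+)D$", db.soft_delete_period)[0]),
        false
      )
    ])
    error_message = "Each database needs hot_cache_period <= soft_delete_period, both written as whole days (e.g. P31D)."
  }
```

With this in place, a database declared with hot_cache_period = "P30D" and soft_delete_period = "P7D" would fail at plan time instead of being created.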

outputs.tf

output "cluster_id" {
  description = "Resource ID of the ADX cluster (use for RBAC, diagnostic settings, Private Endpoint)."
  value       = azurerm_kusto_cluster.this.id
}

output "cluster_name" {
  description = "Name of the ADX cluster."
  value       = azurerm_kusto_cluster.this.name
}

output "cluster_uri" {
  description = "Query endpoint URI, e.g. https://<name>.<region>.kusto.windows.net."
  value       = azurerm_kusto_cluster.this.uri
}

output "data_ingestion_uri" {
  description = "Ingestion endpoint URI for queued/batch ingestion clients."
  value       = azurerm_kusto_cluster.this.data_ingestion_uri
}

output "identity_principal_id" {
  description = "Object ID of the cluster's system-assigned identity (grant it Event Hub / Storage data roles)."
  value       = azurerm_kusto_cluster.this.identity[0].principal_id
}

output "database_ids" {
  description = "Map of database name to resource ID."
  value       = { for k, db in azurerm_kusto_database.this : k => db.id }
}

output "database_names" {
  description = "List of database names created in the cluster."
  value       = keys(azurerm_kusto_database.this)
}

output "eventhub_connection_ids" {
  description = "Map of Event Hub data-connection name to resource ID."
  value       = { for k, c in azurerm_kusto_eventhub_data_connection.this : k => c.id }
}

How to use it

module "data_explorer_kusto_observability" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-data-explorer?ref=v1.0.0"

  cluster_name        = "adxobsprodcin"
  resource_group_name = azurerm_resource_group.analytics.name
  location            = azurerm_resource_group.analytics.location

  sku_name = "Standard_E4ads_v5"

  # Scale between 2 and 6 nodes on demand instead of a flat node count.
  auto_scale = {
    minimum_instances = 2
    maximum_instances = 6
  }

  availability_zones            = ["1", "2", "3"]
  double_encryption_enabled     = true
  public_network_access_enabled = false # fronted by a Private Endpoint
  streaming_ingestion_enabled   = true

  databases = {
    "telemetry" = {
      hot_cache_period   = "P31D"  # 31 days hot for live dashboards
      soft_delete_period = "P365D" # 1 year total retention
      principals = [
        {
          object_id      = azuread_group.observability_admins.object_id
          tenant_id      = data.azurerm_client_config.current.tenant_id
          principal_type = "Group"
          role           = "Admin"
        },
        {
          object_id      = azuread_group.dashboard_readers.object_id
          tenant_id      = data.azurerm_client_config.current.tenant_id
          principal_type = "Group"
          role           = "Viewer"
        }
      ]
    }
    "audit" = {
      hot_cache_period   = "P7D"
      soft_delete_period = "P730D" # 2 years for compliance, mostly cold
    }
  }

  # Stream raw telemetry from Event Hub straight into the RawEvents table.
  eventhub_connections = {
    "telemetry-stream" = {
      database_name     = "telemetry"
      eventhub_id       = module.event_hub.eventhub_id
      consumer_group    = "adx-ingest"
      table_name        = "RawEvents"
      mapping_rule_name = "RawEvents_mapping"
      data_format       = "JSON"
    }
  }

  tags = {
    workload    = "observability"
    environment = "prod"
    owner       = "data-platform"
  }
}

# Downstream: the cluster identity needs to read the Event Hub it ingests from.
resource "azurerm_role_assignment" "adx_eventhub_receiver" {
  scope                = module.event_hub.namespace_id
  role_definition_name = "Azure Event Hubs Data Receiver"
  principal_id         = module.data_explorer_kusto_observability.identity_principal_id
}

# Downstream: send the cluster's own query and command logs to Log Analytics
# via a diagnostic setting, referencing the cluster_id output.
resource "azurerm_monitor_diagnostic_setting" "adx_to_law" {
  name                           = "adx-to-law"
  target_resource_id             = module.data_explorer_kusto_observability.cluster_id
  log_analytics_workspace_id     = azurerm_log_analytics_workspace.hub.id
  log_analytics_destination_type = "Dedicated"

  enabled_log {
    category = "Query"
  }

  enabled_log {
    category = "Command"
  }
}
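
The example sets public_network_access_enabled = false and notes that the cluster is fronted by a Private Endpoint, without showing one. A minimal sketch of that endpoint, built from the cluster_id output, might look like the following. The subnet reference (azurerm_subnet.private_endpoints) and the resource names are hypothetical, and the private DNS zone setup that clients need to resolve the cluster is omitted.

```hcl
# Sketch only: Private Endpoint for the ADX cluster.
# azurerm_subnet.private_endpoints is a hypothetical subnet, not part of the module.
resource "azurerm_private_endpoint" "adx" {
  name                = "pe-adxobsprodcin"
  resource_group_name = azurerm_resource_group.analytics.name
  location            = azurerm_resource_group.analytics.location
  subnet_id           = azurerm_subnet.private_endpoints.id

  private_service_connection {
    name                           = "adx-cluster"
    private_connection_resource_id = module.data_explorer_kusto_observability.cluster_id
    subresource_names              = ["cluster"]
    is_manual_connection           = false
  }
}
```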

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config — live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm state storage account/container + key per path...
  }
}
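
The root config above generates only the backend. A sketch of the matching provider file, using Terragrunt's generate block, could sit alongside it. It assumes the subscription is supplied through the ARM_SUBSCRIPTION_ID environment variable rather than hard-coded.

```hcl
# live/terragrunt.hcl (continued): write provider.tf into every module.
# Assumes ARM_SUBSCRIPTION_ID is exported in the environment.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "azurerm" {
  features {}
}
EOF
}
```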

2. Module config — live/prod/data_explorer/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-data-explorer?ref=v1.0.0"
}

inputs = {
  cluster_name        = "..."
  resource_group_name = "..."
  location            = "..."
}

3. Deploy one environment, or roll out all modules together:

cd live/prod/data_explorer && terragrunt apply   # this module only
cd live/prod && terragrunt run-all apply         # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
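
For run-all to order modules correctly, the dependencies have to be declared. One way to do that is a dependency block that also feeds outputs into inputs. In the sketch below, the sibling unit at ../resource_group and its name and location outputs are hypothetical; the inputs block would replace the one shown in step 2.

```hcl
# live/prod/data_explorer/terragrunt.hcl
# Hypothetical sibling unit at live/prod/resource_group exposing
# "name" and "location" outputs.
dependency "resource_group" {
  config_path = "../resource_group"
}

inputs = {
  cluster_name        = "..."
  resource_group_name = dependency.resource_group.outputs.name
  location            = dependency.resource_group.outputs.location
}
```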

Inputs

Name	Type	Default	Required	Description
`cluster_name`	`string`	—	Yes	Globally unique cluster name (4-22 lowercase alphanumeric, validated).
`resource_group_name`	`string`	—	Yes	Resource group for the cluster and databases.
`location`	`string`	—	Yes	Azure region, e.g. `centralindia`.
`sku_name`	`string`	`"Standard_E2ads_v5"`	No	VM family + cache disk SKU. Dev/Test SKUs have no SLA.
`capacity`	`number`	`2`	No	Fixed node count (1-1000) when autoscale is off.
`auto_scale`	`object`	`null`	No	Optimized autoscale bounds (`minimum_instances` >= 2). Overrides `capacity`.
`availability_zones`	`list(string)`	`[]`	No	Zones to spread nodes across, e.g. `["1","2","3"]`.
`double_encryption_enabled`	`bool`	`true`	No	Infrastructure (double) encryption. Immutable after creation.
`disk_encryption_enabled`	`bool`	`true`	No	Encrypt the cluster data disks.
`public_network_access_enabled`	`bool`	`false`	No	Allow public endpoint. Pair `false` with Private Endpoint.
`streaming_ingestion_enabled`	`bool`	`true`	No	Enable low-latency streaming ingestion.
`purge_enabled`	`bool`	`false`	No	Allow hard data purges (GDPR erasure).
`auto_stop_enabled`	`bool`	`false`	No	Auto-stop on inactivity to save cost (dev/test).
`databases`	`map(object)`	`{}`	No	Per-DB hot cache, soft delete, and RBAC principals.
`eventhub_connections`	`map(object)`	`{}`	No	Event Hub ingestion connections into DB tables.
`tags`	`map(string)`	`{}`	No	Tags applied to the cluster.

Outputs

Name	Description
`cluster_id`	Resource ID of the cluster (RBAC, diagnostics, Private Endpoint).
`cluster_name`	Name of the cluster.
`cluster_uri`	KQL query endpoint URI (`https://<name>.<region>.kusto.windows.net`).
`data_ingestion_uri`	Ingestion endpoint URI for queued/batch clients.
`identity_principal_id`	Object ID of the cluster’s system-assigned identity.
`database_ids`	Map of database name to resource ID.
`database_names`	List of database names in the cluster.
`eventhub_connection_ids`	Map of Event Hub connection name to resource ID.

Enterprise scenario

A fintech runs a fraud-and-observability platform on adxobsprodcin, a 3-zone, autoscaling Standard_E4ads_v5 cluster. The module provisions a telemetry database (31 days hot for the live Grafana/ADX dashboards, 1 year retained) fed by a managed Event Hub connection that lands ~80 GB/day of transaction events into a RawEvents table, plus a separate audit database kept 2 years for the regulator but only 7 days hot to keep cache cost down. Database-scoped RBAC grants the SRE group Admin and analysts Viewer via Entra groups — no standing portal access — while the cluster’s system-assigned identity is the only principal allowed to read the source Event Hub, so the entire hot path runs without a single shared secret.

Best practices

Match hot_cache_period to your real query window, not your retention. Hot cache (SSD) is the expensive part of ADX; soft_delete_period (cold blob) is cheap. Keep 7-31 days hot for dashboards and let the long tail age into cold storage — the module validates nothing here, so set them deliberately and never let hot exceed soft-delete.
Use optimized autoscale instead of a flat node count. Set auto_scale with a floor of 2 (for SLA) and a ceiling sized to peak ingest/query; you pay for nodes by the second, so a tight min/max band beats over-provisioning a fixed capacity you only need at month-end.
Lock the cluster down and prefer the managed identity. Provision with public_network_access_enabled = false plus a Private Endpoint, and grant the cluster’s identity_principal_id the Azure Event Hubs Data Receiver / Storage Blob Data Reader roles on its sources rather than embedding connection strings — the Event Hub data connection uses identity_id for exactly this.
Right-size the SKU family to the workload. The E-series (Standard_E*ads_v5) is memory/compute-balanced for typical telemetry; pick storage-optimised L-series only for cache-heavy, query-bound workloads, and use a Dev(No SLA) SKU plus auto_stop_enabled = true for non-prod to cut idle spend.
Set double encryption and zones at creation; they are immutable. double_encryption_enabled and availability_zones cannot be flipped in place, so getting them right on the first apply avoids a destroy/recreate and a full data reload. public_network_access_enabled can be changed later, but decide it up front so the Private Endpoint is in place before clients depend on it.
Name and tag for cost attribution. Use a CAF-style adx-prefixed lowercase name (no hyphens; the cluster_name validation only accepts lowercase letters and digits) and apply workload/environment/owner tags so per-cluster VM-hour spend lands in the right cost centre.
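
The non-prod advice above (a Dev(No SLA) SKU plus auto_stop_enabled = true) can be sketched as a second module call. The cluster name and resource group references are hypothetical. The single-node capacity and the choice to open the public endpoint in dev are assumptions. Check whether the remaining defaults, such as double encryption and streaming ingestion, suit a Dev SKU before applying.

```hcl
# Sketch only: a cheaper dev/test cluster using the same module.
module "data_explorer_dev" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-data-explorer?ref=v1.0.0"

  cluster_name        = "adxobsdevcin"                                # hypothetical name
  resource_group_name = azurerm_resource_group.analytics_dev.name     # hypothetical resource group
  location            = azurerm_resource_group.analytics_dev.location

  sku_name          = "Dev(No SLA)_Standard_E2a_v4"
  capacity          = 1    # assumption: Dev SKUs run a single node
  auto_stop_enabled = true # stop the cluster when idle to cut spend

  public_network_access_enabled = true # assumption: no Private Endpoint in dev

  databases = {
    "telemetry" = {
      hot_cache_period   = "P7D"
      soft_delete_period = "P30D"
    }
  }

  tags = {
    workload    = "observability"
    environment = "dev"
    owner       = "data-platform"
  }
}
```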