
Terraform Module: Azure Monitor Action Group — reusable on-call notification fan-out

Quick take — Wrap azurerm_monitor_action_group in a reusable Terraform module: email, SMS, webhook, and ITSM receivers with validated short names so every alert in your subscription routes to the right on-call team. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "azurerm" {
  features {}
}

module "monitor_action_group" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-action-group?ref=v1.0.0"

  name                = "..."  # Action group name, unique within the resource group (1–…
  resource_group_name = "..."  # Resource group to create the action group in.
  short_name          = "..."  # Sender ID for SMS/email; validated to ≤ 12 chars (Azure…
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.

What this module is

An Azure Monitor Action Group is the “what do I do when this fires” half of every alert in Azure. Metric alerts, log search alerts, Service Health alerts, Activity Log alerts, and budget alerts don’t notify anyone on their own: they reference one or more action groups, and the action group is what actually fans the signal out to humans and systems, whether by email, SMS, voice call, push notification to the Azure mobile app, an Azure Function or Logic App, an event hub, an automation runbook, or an ITSM connector into ServiceNow/PagerDuty.

The resource itself is azurerm_monitor_action_group, and on its own it’s deceptively simple. In practice, though, every team re-types the same fiddly details: a name that must be unique within its resource group, a short_name that Azure hard-caps at 12 characters (it’s what shows up as the SMS/email sender), and repeated email_receiver / sms_receiver / webhook_receiver blocks that must stay in sync across dev, staging, and prod. A short_name longer than 12 characters fails the apply; forgetting use_common_alert_schema = true on a webhook means your downstream automation receives the legacy payload, whose shape varies by alert type, instead of the common schema.
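For comparison, here is a minimal sketch of the hand-written resource that teams tend to copy between environments. The resource group name, email address, and webhook URL are placeholders assumed for illustration; the arguments themselves are the ones the module wraps.

```hcl
# Hand-written action group, without the module (illustrative values only).
resource "azurerm_monitor_action_group" "oncall" {
  name                = "ag-platform-oncall-dev"
  resource_group_name = "rg-monitoring-dev" # hypothetical resource group
  short_name          = "platdev"           # must be 12 characters or fewer

  email_receiver {
    name                    = "platform-dl"
    email_address           = "platform-oncall@example.com" # placeholder address
    use_common_alert_schema = true
  }

  webhook_receiver {
    name                    = "teams-critical"
    service_uri             = "https://example.com/hooks/alerts" # placeholder URL
    use_common_alert_schema = true # easy to drop when copy-pasting
  }
}
```

Repeat this across three environments and several receivers, and it becomes clear where drift creeps in.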

Wrapping it in a module gives you one validated, list-driven interface. You pass a list of email receivers and a list of SMS receivers (each a small object with a name and contact details); the module expands them into receiver blocks, enforces the 12-character short-name limit at plan time, and emits the action group id that your alert modules consume. Standardising this once means on-call routing is consistent, reviewable in code, and much harder to typo into a broken apply.

When to use it

If you only ever have one or two ad-hoc alerts, an inline receiver is fine — reach for the module once routing becomes a shared concern.

Module structure

terraform-module-azure-monitor-action-group/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

main.tf

resource "azurerm_monitor_action_group" "this" {
  name                = var.name
  resource_group_name = var.resource_group_name
  short_name          = var.short_name
  enabled             = var.enabled
  location            = var.location
  tags                = var.tags

  # Email receivers — one block per address.
  dynamic "email_receiver" {
    for_each = { for r in var.email_receivers : r.name => r }
    content {
      name                    = email_receiver.value.name
      email_address           = email_receiver.value.email_address
      use_common_alert_schema = email_receiver.value.use_common_alert_schema
    }
  }

  # SMS receivers — country_code + phone_number.
  dynamic "sms_receiver" {
    for_each = { for r in var.sms_receivers : r.name => r }
    content {
      name         = sms_receiver.value.name
      country_code = sms_receiver.value.country_code
      phone_number = sms_receiver.value.phone_number
    }
  }

  # Azure mobile-app push receivers.
  dynamic "azure_app_push_receiver" {
    for_each = { for r in var.azure_app_push_receivers : r.name => r }
    content {
      name          = azure_app_push_receiver.value.name
      email_address = azure_app_push_receiver.value.email_address
    }
  }

  # Webhook receivers — Logic Apps, Functions, Teams/Slack relays, etc.
  dynamic "webhook_receiver" {
    for_each = { for r in var.webhook_receivers : r.name => r }
    content {
      name                    = webhook_receiver.value.name
      service_uri             = webhook_receiver.value.service_uri
      use_common_alert_schema = webhook_receiver.value.use_common_alert_schema
    }
  }

  # ITSM receivers — ServiceNow / System Center via an ITSM connection.
  dynamic "itsm_receiver" {
    for_each = { for r in var.itsm_receivers : r.name => r }
    content {
      name                 = itsm_receiver.value.name
      workspace_id         = itsm_receiver.value.workspace_id
      connection_id        = itsm_receiver.value.connection_id
      ticket_configuration = itsm_receiver.value.ticket_configuration
      region               = itsm_receiver.value.region
    }
  }
}

variables.tf

variable "name" {
  description = "Name of the action group. Must be unique within the resource group."
  type        = string

  validation {
    condition     = length(var.name) >= 1 && length(var.name) <= 260
    error_message = "name must be between 1 and 260 characters."
  }
}

variable "resource_group_name" {
  description = "Resource group in which to create the action group."
  type        = string
}

variable "short_name" {
  description = "Short name (max 12 chars) used as the sender ID in SMS and email notifications."
  type        = string

  validation {
    condition     = length(var.short_name) >= 1 && length(var.short_name) <= 12
    error_message = "short_name must be between 1 and 12 characters (Azure hard limit)."
  }
}

variable "enabled" {
  description = "Whether the action group is enabled. Disable to mute all receivers without deleting them."
  type        = bool
  default     = true
}

variable "location" {
  description = "Region for the action group resource. 'global' is recommended for resilience."
  type        = string
  default     = "global"
}

variable "email_receivers" {
  description = "List of email receivers. Set use_common_alert_schema true unless a consumer needs the legacy payload."
  type = list(object({
    name                    = string
    email_address           = string
    use_common_alert_schema = optional(bool, true)
  }))
  default = []

  validation {
    condition = alltrue([
      for r in var.email_receivers : can(regex("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$", r.email_address))
    ])
    error_message = "Each email_receivers[*].email_address must be a valid email address."
  }
}

variable "sms_receivers" {
  description = "List of SMS receivers. country_code is digits only (e.g. '91' for India, '1' for US)."
  type = list(object({
    name         = string
    country_code = string
    phone_number = string
  }))
  default = []

  validation {
    condition = alltrue([
      for r in var.sms_receivers : can(regex("^[0-9]{1,4}$", r.country_code))
    ])
    error_message = "Each sms_receivers[*].country_code must be 1-4 digits, no '+' prefix."
  }
}

variable "azure_app_push_receivers" {
  description = "List of Azure mobile-app push receivers, keyed by the recipient's Azure account email."
  type = list(object({
    name          = string
    email_address = string
  }))
  default = []
}

variable "webhook_receivers" {
  description = "List of webhook receivers (Logic App, Function, or chat relay URLs)."
  type = list(object({
    name                    = string
    service_uri             = string
    use_common_alert_schema = optional(bool, true)
  }))
  default = []

  validation {
    condition = alltrue([
      for r in var.webhook_receivers : can(regex("^https://", r.service_uri))
    ])
    error_message = "Each webhook_receivers[*].service_uri must be an HTTPS URL."
  }
}

variable "itsm_receivers" {
  description = "List of ITSM receivers (ServiceNow / SCSM) wired to a Log Analytics workspace ITSM connection."
  type = list(object({
    name                 = string
    workspace_id         = string
    connection_id        = string
    ticket_configuration = string
    region               = string
  }))
  default = []
}

variable "tags" {
  description = "Tags to apply to the action group."
  type        = map(string)
  default     = {}
}
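One gap worth noting: main.tf keys each for_each map on the receiver name, so two receivers with the same name in one list fail at plan time with a duplicate-key error rather than a friendly message. The sketch below shows one way to catch that earlier. It is an assumption, not part of the module as published: a variant of the email_receivers declaration with a second validation block added after the existing regex check.

```hcl
# Hypothetical variant of variable "email_receivers" with a uniqueness check.
variable "email_receivers" {
  description = "List of email receivers. Receiver names must be unique."
  type = list(object({
    name                    = string
    email_address           = string
    use_common_alert_schema = optional(bool, true)
  }))
  default = []

  # Existing check: every address looks like an email address.
  validation {
    condition = alltrue([
      for r in var.email_receivers : can(regex("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$", r.email_address))
    ])
    error_message = "Each email_receivers[*].email_address must be a valid email address."
  }

  # Added check (assumption): no two receivers share a name.
  validation {
    condition = length(distinct([
      for r in var.email_receivers : r.name
    ])) == length(var.email_receivers)
    error_message = "Each email_receivers[*].name must be unique within the list."
  }
}
```

The same pattern would apply to the other receiver lists, since they are keyed on name in the same way.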

outputs.tf

output "id" {
  description = "Resource ID of the action group. Reference this from alert rules."
  value       = azurerm_monitor_action_group.this.id
}

output "name" {
  description = "Name of the action group."
  value       = azurerm_monitor_action_group.this.name
}

output "short_name" {
  description = "Short name (sender ID) of the action group."
  value       = azurerm_monitor_action_group.this.short_name
}

output "email_addresses" {
  description = "Email addresses configured on this action group, for documentation/runbooks."
  value       = [for r in var.email_receivers : r.email_address]
}

How to use it

module "monitor_action_group" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-action-group?ref=v1.0.0"

  name                = "ag-platform-oncall-prod"
  resource_group_name = azurerm_resource_group.monitoring.name
  short_name          = "platprod" # <= 12 chars, shown as SMS sender

  email_receivers = [
    { name = "platform-dl", email_address = "platform-oncall@kloudvin.com" },
    { name = "sre-lead", email_address = "vinod@kloudvin.com" },
  ]

  sms_receivers = [
    { name = "primary-oncall", country_code = "91", phone_number = "9876543210" },
  ]

  azure_app_push_receivers = [
    { name = "vinod-mobile", email_address = "vinod@kloudvin.com" },
  ]

  # Relay critical alerts into a Teams channel via a Logic App.
  webhook_receivers = [
    {
      name        = "teams-critical"
      service_uri = azurerm_logic_app_trigger_http_request.teams.callback_url
    },
  ]

  tags = {
    environment = "prod"
    team        = "platform"
    managed_by  = "terraform"
  }
}

# Downstream: a metric alert consuming the module's `id` output.
resource "azurerm_monitor_metric_alert" "high_cpu" {
  name                = "alert-vmss-cpu-high"
  resource_group_name = azurerm_resource_group.monitoring.name
  scopes              = [azurerm_linux_virtual_machine_scale_set.web.id]
  description         = "VMSS average CPU over 85% for 5 minutes."
  severity            = 2
  frequency           = "PT1M"
  window_size         = "PT5M"

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachineScaleSets"
    metric_name      = "Percentage CPU"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 85
  }

  action {
    action_group_id = module.monitor_action_group.id
  }
}
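The same id output can feed other alert types. As a hedged sketch, here is a Service Health alert scoped to the current subscription; the resource and data source names are illustrative, and the location argument reflects my understanding of the 4.x provider, so check it against the provider version you pin.

```hcl
# Looks up the subscription the provider is authenticated against.
data "azurerm_subscription" "current" {}

# Activity Log alert for Service Health events, routed to the same action group.
resource "azurerm_monitor_activity_log_alert" "service_health" {
  name                = "alert-service-health"
  resource_group_name = azurerm_resource_group.monitoring.name
  location            = "global" # assumed value; verify for your provider version
  scopes              = [data.azurerm_subscription.current.id]
  description         = "Azure Service Health events affecting this subscription."

  criteria {
    category = "ServiceHealth"
  }

  action {
    action_group_id = module.monitor_action_group.id
  }
}
```

Because both alerts reference the module output rather than a hard-coded resource ID, changing who is on call means editing the module inputs only.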

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config, live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm state storage account/container + key per path...
  }
}

2. Module config, live/prod/monitor_action_group/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-action-group?ref=v1.0.0"
}

inputs = {
  name                = "..."
  resource_group_name = "..."
  short_name          = "..."
}

3. Deploy one environment, or roll out all modules together:

(cd live/prod/monitor_action_group && terragrunt apply)   # this module only
(cd live/prod && terragrunt run-all apply)                # every module under live/prod
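To make the per-environment idea concrete, a dev counterpart to the prod config might look like the sketch below. The path, resource group name, and email address are assumptions chosen for illustration; only the module source is taken from the article.

```hcl
# live/dev/monitor_action_group/terragrunt.hcl (hypothetical dev counterpart)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-action-group?ref=v1.0.0"
}

inputs = {
  name                = "ag-platform-oncall-dev"
  resource_group_name = "rg-monitoring-dev" # assumed resource group name
  short_name          = "platdev"

  # Dev routes to a mailbox only; no SMS or push receivers.
  email_receivers = [
    { name = "platform-dl", email_address = "platform-dev@example.com" },
  ]
}
```

Only the inputs differ from prod; the module source and the included root config stay the same.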

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
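As a sketch of the dependency orchestration mentioned above, an alert module living next to the action group could read its id through a Terragrunt dependency block. The metric_alerts directory, its local module path, and the action_group_id input are hypothetical; they are not defined anywhere in this article.

```hcl
# live/prod/metric_alerts/terragrunt.hcl (hypothetical sibling module)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../../modules/metric-alerts" # hypothetical local module path
}

# Tells Terragrunt to apply the action group first and expose its outputs.
dependency "action_group" {
  config_path = "../monitor_action_group"
}

inputs = {
  # Assumes the alert module declares an action_group_id input variable.
  action_group_id = dependency.action_group.outputs.id
}
```

With this in place, run-all apply from live/prod should apply the action group before the alerts that depend on it.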

Inputs

| Name | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | string | - | Yes | Action group name, unique within the resource group (1–260 chars). |
| resource_group_name | string | - | Yes | Resource group to create the action group in. |
| short_name | string | - | Yes | Sender ID for SMS/email; validated to ≤ 12 chars (Azure hard limit). |
| enabled | bool | true | No | Enable/disable all receivers without deleting the group. |
| location | string | "global" | No | Region for the resource; global recommended for resilience. |
| email_receivers | list(object) | [] | No | Email receivers; each email_address is regex-validated, use_common_alert_schema defaults true. |
| sms_receivers | list(object) | [] | No | SMS receivers; country_code validated as 1–4 digits with no +. |
| azure_app_push_receivers | list(object) | [] | No | Azure mobile-app push receivers keyed by account email. |
| webhook_receivers | list(object) | [] | No | Webhook receivers; service_uri validated as HTTPS, use_common_alert_schema defaults true. |
| itsm_receivers | list(object) | [] | No | ITSM receivers wired to a Log Analytics workspace ITSM connection. |
| tags | map(string) | {} | No | Tags applied to the action group. |

Outputs

| Name | Description |
| --- | --- |
| id | Resource ID of the action group; reference this from alert rules. |
| name | Name of the action group. |
| short_name | Short name (sender ID) of the action group. |
| email_addresses | List of configured email addresses, for runbooks/documentation. |

Enterprise scenario

A retail bank runs a landing-zone subscription per environment and mandates that every alert routes through a tiered on-call model. They instantiate this module three times per subscription — ag-sev1-oncall (SMS + voice + ITSM ticket into ServiceNow), ag-sev2-oncall (email + Teams webhook), and ag-sev3-info (email distribution list only) — so a Sev-1 payments outage pages the primary engineer’s phone and opens an incident automatically, while a Sev-3 cert-expiry warning quietly hits a mailbox. Because the itsm_receivers connection IDs and on-call phone numbers live in code, the bank’s audit team can prove who was reachable for any historical incident straight from the Git history.
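A rough sketch of how the three tiers could be declared follows. Several things here are assumptions: the resource group name, all contact details, the placeholder GUIDs, the ITSM region, and the webhook URL are invented for illustration, and the ticket_configuration payload is modelled on the provider documentation's example, so check it against your own ITSM connection. The module as shown above does not expose a voice receiver, so the Sev-1 tier here covers SMS and ITSM only.

```hcl
module "ag_sev1_oncall" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-action-group?ref=v1.0.0"

  name                = "ag-sev1-oncall"
  resource_group_name = "rg-monitoring-prod" # assumed resource group name
  short_name          = "sev1oncall"         # 10 characters

  sms_receivers = [
    { name = "primary-oncall", country_code = "91", phone_number = "9876543210" },
  ]

  itsm_receivers = [
    {
      name          = "servicenow-incident"
      workspace_id  = "00000000-0000-0000-0000-000000000000|11111111-1111-1111-1111-111111111111" # "<subscription id>|<workspace id>", placeholders
      connection_id = "22222222-2222-2222-2222-222222222222"                                      # placeholder
      region        = "southcentralus"                                                            # assumed region
      ticket_configuration = jsonencode({
        PayloadRevision  = 0
        WorkItemType     = "Incident"
        UseTemplate      = false
        WorkItemData     = "{}"
        CreateOneWIPerCI = false
      })
    },
  ]
}

module "ag_sev2_oncall" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-action-group?ref=v1.0.0"

  name                = "ag-sev2-oncall"
  resource_group_name = "rg-monitoring-prod"
  short_name          = "sev2oncall"

  email_receivers = [
    { name = "oncall-dl", email_address = "oncall@example.com" },
  ]

  webhook_receivers = [
    { name = "teams-relay", service_uri = "https://example.com/teams-relay" }, # placeholder URL
  ]
}

module "ag_sev3_info" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-action-group?ref=v1.0.0"

  name                = "ag-sev3-info"
  resource_group_name = "rg-monitoring-prod"
  short_name          = "sev3info"

  email_receivers = [
    { name = "info-dl", email_address = "platform-info@example.com" },
  ]
}
```

Each alert rule then picks the tier by referencing the matching module's id output, which keeps severity routing visible in code review.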

Best practices
