Terraform Module: Azure Monitor Action Group — reusable on-call notification fan-out

Quick take — Wrap azurerm_monitor_action_group in a reusable Terraform module: email, SMS, webhook, and ITSM receivers with validated short names so every alert in your subscription routes to the right on-call team. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "azurerm" {
  features {}
}

module "monitor_action_group" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-action-group?ref=v1.0.0"

  name                = "..."  # Action group name, unique within the resource group (1–…
  resource_group_name = "..."  # Resource group to create the action group in.
  short_name          = "..."  # Sender ID for SMS/email; validated to ≤ 12 chars (Azure…
}

Then run terraform init && terraform apply. Every other input has a sensible default; see Inputs below to override behaviour.

What this module is

An Azure Monitor Action Group is the “what do I do when this fires” half of every alert in Azure. Metric alerts, log search alerts, Service Health alerts, Activity Log alerts, and budget alerts don’t notify anyone on their own. They reference one or more action groups, and the action group is what actually fans the signal out to humans and systems: email, SMS, voice call, push notification to the Azure mobile app, an Azure Function or Logic App, an event hub, an automation runbook, or an ITSM connector into ServiceNow/PagerDuty.

The resource itself is azurerm_monitor_action_group, and on its own it’s deceptively simple. In practice, though, every team re-types the same fiddly details: a name that must be unique within its resource group, a short_name that Azure hard-caps at 12 characters (it’s what shows up as the SMS/email sender), and repeated email_receiver / sms_receiver / webhook_receiver blocks that must stay in sync across dev, staging, and prod. Getting the short_name wrong fails the apply; forgetting use_common_alert_schema = true on a webhook means your downstream automation receives an inconsistent legacy payload.

Wrapping it in a module gives you one validated, list-driven interface. You pass a list of email receivers and a list of SMS receivers (each a small object with a name and contact details); the module expands them into receiver blocks, enforces the 12-character short-name limit at plan time, and emits the action group id that your alert modules consume. Standardising this once means on-call routing is consistent, reviewable in code, and much harder to typo into a broken apply.

When to use it

You have more than a handful of alerts and want notification targets defined once and referenced everywhere, instead of inline receivers copy-pasted per alert.
You run multiple environments or teams and need each to have its own routing (platform team vs. app team vs. security) with identical structure.
You want on-call rotation as code — adding or removing a person from the pager is a reviewed pull request, not a portal click nobody can audit.
You integrate alerts with downstream automation (Logic Apps for Teams/Slack, Functions for auto-remediation, or an ITSM tool) and need the common alert schema enforced everywhere.
You are building alert modules (metric/log/Activity Log) and need a stable action_group_id output to wire them to.

If you only ever have one or two ad-hoc alerts, an inline receiver is fine — reach for the module once routing becomes a shared concern.

Module structure

terraform-module-azure-monitor-action-group/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

main.tf

resource "azurerm_monitor_action_group" "this" {
  name                = var.name
  resource_group_name = var.resource_group_name
  short_name          = var.short_name
  enabled             = var.enabled
  location            = var.location
  tags                = var.tags

  # Email receivers — one block per address.
  dynamic "email_receiver" {
    for_each = { for r in var.email_receivers : r.name => r }
    content {
      name                    = email_receiver.value.name
      email_address           = email_receiver.value.email_address
      use_common_alert_schema = email_receiver.value.use_common_alert_schema
    }
  }

  # SMS receivers — country_code + phone_number.
  dynamic "sms_receiver" {
    for_each = { for r in var.sms_receivers : r.name => r }
    content {
      name         = sms_receiver.value.name
      country_code = sms_receiver.value.country_code
      phone_number = sms_receiver.value.phone_number
    }
  }

  # Azure mobile-app push receivers.
  dynamic "azure_app_push_receiver" {
    for_each = { for r in var.azure_app_push_receivers : r.name => r }
    content {
      name          = azure_app_push_receiver.value.name
      email_address = azure_app_push_receiver.value.email_address
    }
  }

  # Webhook receivers — Logic Apps, Functions, Teams/Slack relays, etc.
  dynamic "webhook_receiver" {
    for_each = { for r in var.webhook_receivers : r.name => r }
    content {
      name                    = webhook_receiver.value.name
      service_uri             = webhook_receiver.value.service_uri
      use_common_alert_schema = webhook_receiver.value.use_common_alert_schema
    }
  }

  # ITSM receivers — ServiceNow / System Center via an ITSM connection.
  dynamic "itsm_receiver" {
    for_each = { for r in var.itsm_receivers : r.name => r }
    content {
      name                 = itsm_receiver.value.name
      workspace_id         = itsm_receiver.value.workspace_id
      connection_id        = itsm_receiver.value.connection_id
      ticket_configuration = itsm_receiver.value.ticket_configuration
      region               = itsm_receiver.value.region
    }
  }
}

variables.tf

variable "name" {
  description = "Name of the action group. Must be unique within the resource group."
  type        = string

  validation {
    condition     = length(var.name) >= 1 && length(var.name) <= 260
    error_message = "name must be between 1 and 260 characters."
  }
}

variable "resource_group_name" {
  description = "Resource group in which to create the action group."
  type        = string
}

variable "short_name" {
  description = "Short name (max 12 chars) used as the sender ID in SMS and email notifications."
  type        = string

  validation {
    condition     = length(var.short_name) >= 1 && length(var.short_name) <= 12
    error_message = "short_name must be between 1 and 12 characters (Azure hard limit)."
  }
}

variable "enabled" {
  description = "Whether the action group is enabled. Disable to mute all receivers without deleting them."
  type        = bool
  default     = true
}

variable "location" {
  description = "Region for the action group resource. 'global' is recommended for resilience."
  type        = string
  default     = "global"
}

variable "email_receivers" {
  description = "List of email receivers. Set use_common_alert_schema true unless a consumer needs the legacy payload."
  type = list(object({
    name                    = string
    email_address           = string
    use_common_alert_schema = optional(bool, true)
  }))
  default = []

  validation {
    condition = alltrue([
      for r in var.email_receivers : can(regex("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$", r.email_address))
    ])
    error_message = "Each email_receivers[*].email_address must be a valid email address."
  }
}

variable "sms_receivers" {
  description = "List of SMS receivers. country_code is digits only (e.g. '91' for India, '1' for US)."
  type = list(object({
    name         = string
    country_code = string
    phone_number = string
  }))
  default = []

  validation {
    condition = alltrue([
      for r in var.sms_receivers : can(regex("^[0-9]{1,4}$", r.country_code))
    ])
    error_message = "Each sms_receivers[*].country_code must be 1-4 digits, no '+' prefix."
  }
}

variable "azure_app_push_receivers" {
  description = "List of Azure mobile-app push receivers, keyed by the recipient's Azure account email."
  type = list(object({
    name          = string
    email_address = string
  }))
  default = []
}

variable "webhook_receivers" {
  description = "List of webhook receivers (Logic App, Function, or chat relay URLs)."
  type = list(object({
    name                    = string
    service_uri             = string
    use_common_alert_schema = optional(bool, true)
  }))
  default = []

  validation {
    condition = alltrue([
      for r in var.webhook_receivers : can(regex("^https://", r.service_uri))
    ])
    error_message = "Each webhook_receivers[*].service_uri must be an HTTPS URL."
  }
}

variable "itsm_receivers" {
  description = "List of ITSM receivers (ServiceNow / SCSM) wired to a Log Analytics workspace ITSM connection."
  type = list(object({
    name                 = string
    workspace_id         = string
    connection_id        = string
    ticket_configuration = string
    region               = string
  }))
  default = []
}

variable "tags" {
  description = "Tags to apply to the action group."
  type        = map(string)
  default     = {}
}
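
One gap worth closing: main.tf keys every for_each on the receiver's name, so two receivers that share a name fail the plan with a duplicate-key error that does not say which input is at fault. A minimal sketch of a validation that catches this earlier is shown below. The variable name example_receivers is hypothetical and used only for illustration; the same validation block could be added alongside the existing one in email_receivers, sms_receivers, and the other receiver variables.

```hcl
# Hypothetical variable, for illustration only. The validation block is the
# part that could be copied into the module's real receiver variables.
variable "example_receivers" {
  description = "Receivers whose names must be unique, because main.tf keys for_each on name."
  type = list(object({
    name          = string
    email_address = string
  }))
  default = []

  validation {
    # distinct() drops repeated names, so the two lengths differ only when
    # at least one name appears more than once.
    condition     = length(distinct([for r in var.example_receivers : r.name])) == length(var.example_receivers)
    error_message = "Each receiver name must be unique within the list."
  }
}
```

Terraform allows several validation blocks per variable, so this can sit next to the email and country-code checks rather than replacing them.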

outputs.tf

output "id" {
  description = "Resource ID of the action group. Reference this from alert rules."
  value       = azurerm_monitor_action_group.this.id
}

output "name" {
  description = "Name of the action group."
  value       = azurerm_monitor_action_group.this.name
}

output "short_name" {
  description = "Short name (sender ID) of the action group."
  value       = azurerm_monitor_action_group.this.short_name
}

output "email_addresses" {
  description = "Email addresses configured on this action group, for documentation/runbooks."
  value       = [for r in var.email_receivers : r.email_address]
}

How to use it

module "monitor_action_group" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-action-group?ref=v1.0.0"

  name                = "ag-platform-oncall-prod"
  resource_group_name = azurerm_resource_group.monitoring.name
  short_name          = "platprod" # <= 12 chars, shown as SMS sender

  email_receivers = [
    { name = "platform-dl", email_address = "platform-oncall@kloudvin.com" },
    { name = "sre-lead", email_address = "vinod@kloudvin.com" },
  ]

  sms_receivers = [
    { name = "primary-oncall", country_code = "91", phone_number = "9876543210" },
  ]

  azure_app_push_receivers = [
    { name = "vinod-mobile", email_address = "vinod@kloudvin.com" },
  ]

  # Relay critical alerts into a Teams channel via a Logic App.
  webhook_receivers = [
    {
      name        = "teams-critical"
      service_uri = azurerm_logic_app_trigger_http_request.teams.callback_url
    },
  ]

  tags = {
    environment = "prod"
    team        = "platform"
    managed_by  = "terraform"
  }
}

# Downstream: a metric alert consuming the module's `id` output.
resource "azurerm_monitor_metric_alert" "high_cpu" {
  name                = "alert-vmss-cpu-high"
  resource_group_name = azurerm_resource_group.monitoring.name
  scopes              = [azurerm_linux_virtual_machine_scale_set.web.id]
  description         = "VMSS average CPU over 85% for 5 minutes."
  severity            = 2
  frequency           = "PT1M"
  window_size         = "PT5M"

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachineScaleSets"
    metric_name      = "Percentage CPU"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 85
  }

  action {
    action_group_id = module.monitor_action_group.id
  }
}
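
The same id output can feed other alert types. The sketch below wires a Service Health alert (an Activity Log alert) to the action group; it assumes the azurerm ~> 4.0 schema as I understand it, where azurerm_monitor_activity_log_alert takes a location argument, so check the argument names against the provider documentation for the version you pin. The resource and data source labels are illustrative.

```hcl
# Subscription the alert is scoped to (the one the provider is authenticated against).
data "azurerm_subscription" "current" {}

resource "azurerm_monitor_activity_log_alert" "service_health" {
  name                = "alert-service-health"
  resource_group_name = azurerm_resource_group.monitoring.name
  location            = "global" # assumption: accepted by azurerm 4.x for this resource
  scopes              = [data.azurerm_subscription.current.id]
  description         = "Azure Service Health incidents affecting this subscription."

  criteria {
    category = "ServiceHealth"

    service_health {
      events = ["Incident"]
    }
  }

  # Same wiring as the metric alert: reference the module's id output.
  action {
    action_group_id = module.monitor_action_group.id
  }
}
```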

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config — live/terragrunt.hcl (inherited by every module):

remote_state {
  backend  = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm storage account/container + state key per path...
  }
}

2. Module config — live/prod/monitor_action_group/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-action-group?ref=v1.0.0"
}

inputs = {
  name                = "..."
  resource_group_name = "..."
  short_name          = "..."
}

3. Deploy a single module, or roll out every module in an environment together (both commands run from the repository root):

(cd live/prod/monitor_action_group && terragrunt apply)   # this module only
(cd live/prod && terragrunt run-all apply)                # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
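
To make the dependency ordering concrete, here is a sketch of the module config with a dependency block. It assumes a sibling unit at live/prod/resource_group that exposes a name output; that unit and its output are hypothetical and not part of this module.

```hcl
# live/prod/monitor_action_group/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-action-group?ref=v1.0.0"
}

# Hypothetical sibling unit; run-all applies it before this one.
dependency "resource_group" {
  config_path = "../resource_group"
}

inputs = {
  name                = "ag-platform-oncall-prod"
  resource_group_name = dependency.resource_group.outputs.name
  short_name          = "platprod"
}
```

With this in place, terragrunt run-all apply from live/prod creates the resource group first and passes its name through, instead of relying on a hand-typed string.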

Inputs

Name	Type	Default	Required	Description
`name`	`string`	—	Yes	Action group name, unique within the resource group (1–260 chars).
`resource_group_name`	`string`	—	Yes	Resource group to create the action group in.
`short_name`	`string`	—	Yes	Sender ID for SMS/email; validated to ≤ 12 chars (Azure hard limit).
`enabled`	`bool`	`true`	No	Enable/disable all receivers without deleting the group.
`location`	`string`	`"global"`	No	Region for the resource; `global` recommended for resilience.
`email_receivers`	`list(object)`	`[]`	No	Email receivers; each `email_address` is regex-validated, `use_common_alert_schema` defaults `true`.
`sms_receivers`	`list(object)`	`[]`	No	SMS receivers; `country_code` validated as 1–4 digits with no `+`.
`azure_app_push_receivers`	`list(object)`	`[]`	No	Azure mobile-app push receivers keyed by account email.
`webhook_receivers`	`list(object)`	`[]`	No	Webhook receivers; `service_uri` validated as HTTPS, `use_common_alert_schema` defaults `true`.
`itsm_receivers`	`list(object)`	`[]`	No	ITSM receivers wired to a Log Analytics workspace ITSM connection.
`tags`	`map(string)`	`{}`	No	Tags applied to the action group.

Outputs

Name	Description
`id`	Resource ID of the action group; reference this from alert rules.
`name`	Name of the action group.
`short_name`	Short name (sender ID) of the action group.
`email_addresses`	List of configured email addresses, for runbooks/documentation.

Enterprise scenario

A retail bank runs a landing-zone subscription per environment and mandates that every alert routes through a tiered on-call model. They instantiate this module three times per subscription: ag-sev1-oncall (SMS + ITSM ticket into ServiceNow), ag-sev2-oncall (email + Teams webhook), and ag-sev3-info (email distribution list only). A Sev-1 payments outage pages the primary engineer’s phone and opens an incident automatically, while a Sev-3 cert-expiry warning quietly hits a mailbox. Voice calls are not part of this setup, because the module as listed above does not expose a voice receiver. Because the itsm_receivers connection IDs and on-call phone numbers live in code, the bank’s audit team can prove who was reachable for any historical incident straight from the Git history.
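
One way to express the three tiers without repeating the module block is a for_each over a local map. This is a sketch under assumptions: the tier map, the module label oncall, and the receiver entries are illustrative, and the webhook and ITSM receivers are left out because their URLs and connection IDs are environment-specific.

```hcl
locals {
  # Illustrative tier map; every short_name fits the 12-character limit.
  oncall_tiers = {
    sev1 = {
      name            = "ag-sev1-oncall"
      short_name      = "sev1-oncall"
      email_receivers = []
      sms_receivers = [
        { name = "primary-oncall", country_code = "91", phone_number = "9876543210" },
      ]
    }
    sev2 = {
      name       = "ag-sev2-oncall"
      short_name = "sev2-oncall"
      email_receivers = [
        { name = "platform-dl", email_address = "platform-oncall@kloudvin.com" },
      ]
      sms_receivers = []
    }
    sev3 = {
      name       = "ag-sev3-info"
      short_name = "sev3-info"
      email_receivers = [
        { name = "info-dl", email_address = "platform-oncall@kloudvin.com" },
      ]
      sms_receivers = []
    }
  }
}

module "oncall" {
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-action-group?ref=v1.0.0"
  for_each = local.oncall_tiers

  name                = each.value.name
  resource_group_name = azurerm_resource_group.monitoring.name
  short_name          = each.value.short_name
  email_receivers     = each.value.email_receivers
  sms_receivers       = each.value.sms_receivers
}
```

Alert rules then pick a tier by key, for example action_group_id = module.oncall["sev1"].id for a Sev-1 rule.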

Best practices

Keep short_name meaningful and unique — it’s the only identifier a half-asleep engineer sees on a 3 a.m. SMS. Encode team + environment (platprod, sec-prod) within the 12-character budget; the module fails the plan if you exceed it.
Set use_common_alert_schema = true everywhere (it’s the module default). The common schema gives webhooks, Functions, and Logic Apps a single stable payload shape, so you don’t rewrite parsers when you add new alert types.
Prefer location = "global" so the action group itself isn’t tied to a single region’s availability — notifications still fire if the region hosting your workload is impaired.
Don’t hard-code phone numbers or ITSM connection IDs in .tf files casually — pass them via tfvars from a secured pipeline variable group or Key Vault data source, and never commit real on-call numbers to a public repo.
Use enabled = false to mute, never delete — during a planned maintenance window, flip the flag rather than destroying the group, so alert rules keep their reference intact and you avoid a noisy recreate.
Layer action groups by severity, not by service — one group per on-call tier (Sev-1/2/3) reusing this module scales far better than a bespoke group per resource, and keeps notification fatigue under control.
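
Two of these practices (keeping contact details out of committed code, and muting rather than deleting) can be sketched in the calling configuration as below. The variable names oncall_sms_receivers and maintenance_mode are hypothetical; values would come from a tfvars file or TF_VAR_ environment variables supplied by the pipeline.

```hcl
# Hypothetical root-module variables; real values are supplied at plan time.
variable "oncall_sms_receivers" {
  description = "On-call SMS contacts, injected by the pipeline rather than committed."
  type = list(object({
    name         = string
    country_code = string
    phone_number = string
  }))
  default = []
}

variable "maintenance_mode" {
  description = "Set to true during a planned maintenance window to mute the action group."
  type        = bool
  default     = false
}

module "monitor_action_group" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-action-group?ref=v1.0.0"

  name                = "ag-platform-oncall-prod"
  resource_group_name = azurerm_resource_group.monitoring.name
  short_name          = "platprod"

  # Flip the flag to mute; alert rules keep their action_group_id reference.
  enabled       = !var.maintenance_mode
  sms_receivers = var.oncall_sms_receivers
}
```

The SMS variable is deliberately not marked sensitive: main.tf feeds the list into for_each, and Terraform rejects sensitive values as for_each arguments, so protecting the numbers is left to the pipeline and to state-file access controls.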