Quick take — Wrap azurerm_monitor_action_group in a reusable Terraform module: email, SMS, webhook, and ITSM receivers with validated short names so every alert in your subscription routes to the right on-call team. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "azurerm" {
  features {}
}

module "monitor_action_group" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-action-group?ref=v1.0.0"

  name                = "..." # Action group name, unique within the resource group (1–…
  resource_group_name = "..." # Resource group to create the action group in.
  short_name          = "..." # Sender ID for SMS/email; validated to ≤ 12 chars (Azure…
}
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
What this module is
An Azure Monitor Action Group is the “what do I do when this fires” half of every alert in Azure. Metric alerts, log search alerts, Service Health alerts, Activity Log alerts, and budget alerts don’t notify anyone on their own. They reference one or more action groups, and the action group is what actually fans the signal out to humans and systems: email, SMS, voice call, push notification to the Azure mobile app, an Azure Function or Logic App, an event hub, an automation runbook, or an ITSM connector into ServiceNow/PagerDuty.
The resource itself is azurerm_monitor_action_group, and on its own it’s deceptively simple. In practice, though, every team re-types the same fiddly details: a name that must be unique within the resource group, a short_name that Azure hard-caps at 12 characters (it’s what shows up as the SMS/email sender), and repeated email_receiver / sms_receiver / webhook_receiver blocks that must stay in sync across dev, staging, and prod. Getting the short_name wrong fails the apply; forgetting use_common_alert_schema = true on a webhook means your downstream automation receives a legacy payload whose shape varies by alert type.
Wrapping it in a module gives you one validated, list-driven interface. You pass lists of receiver objects (email, SMS, mobile push, webhook, ITSM); the module expands them into receiver blocks, enforces the 12-character short-name limit at plan time, and emits the action group id that your alert modules consume. Standardising this once means on-call routing is consistent, reviewable in code, and much harder to typo into a broken apply.
When to use it
- You have more than a handful of alerts and want notification targets defined once and referenced everywhere, instead of inline receivers copy-pasted per alert.
- You run multiple environments or teams and need each to have its own routing (platform team vs. app team vs. security) with identical structure.
- You want on-call rotation as code — adding or removing a person from the pager is a reviewed pull request, not a portal click nobody can audit.
- You integrate alerts with downstream automation (Logic Apps for Teams/Slack, Functions for auto-remediation, or an ITSM tool) and need the common alert schema enforced everywhere.
- You are building alert modules (metric/log/Activity Log) and need a stable id output to wire into each rule’s action_group_id.
If you only ever have one or two ad-hoc alerts, an inline receiver is fine — reach for the module once routing becomes a shared concern.
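For comparison, the inline approach mentioned above might look like the following minimal sketch. The resource label, group name, resource group, and email address are placeholders invented for illustration, not values from this module.

```hcl
# A single ad-hoc action group with one inline email receiver, no module involved.
# All names and the address below are hypothetical placeholders.
resource "azurerm_monitor_action_group" "adhoc" {
  name                = "ag-adhoc-example"
  resource_group_name = "rg-monitoring"
  short_name          = "adhoc" # Azure rejects anything longer than 12 characters

  email_receiver {
    name                    = "owner"
    email_address           = "owner@example.com"
    use_common_alert_schema = true
  }
}
```

Once a second or third copy of this block appears with slightly different receivers, that is usually the point where the module starts to pay for itself.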
Module structure
terraform-module-azure-monitor-action-group/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf
versions.tf
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}
main.tf
resource "azurerm_monitor_action_group" "this" {
  name                = var.name
  resource_group_name = var.resource_group_name
  short_name          = var.short_name
  enabled             = var.enabled
  location            = var.location
  tags                = var.tags

  # Email receivers: one block per address.
  dynamic "email_receiver" {
    for_each = { for r in var.email_receivers : r.name => r }
    content {
      name                    = email_receiver.value.name
      email_address           = email_receiver.value.email_address
      use_common_alert_schema = email_receiver.value.use_common_alert_schema
    }
  }

  # SMS receivers: country_code + phone_number.
  dynamic "sms_receiver" {
    for_each = { for r in var.sms_receivers : r.name => r }
    content {
      name         = sms_receiver.value.name
      country_code = sms_receiver.value.country_code
      phone_number = sms_receiver.value.phone_number
    }
  }

  # Azure mobile-app push receivers.
  dynamic "azure_app_push_receiver" {
    for_each = { for r in var.azure_app_push_receivers : r.name => r }
    content {
      name          = azure_app_push_receiver.value.name
      email_address = azure_app_push_receiver.value.email_address
    }
  }

  # Webhook receivers: Logic Apps, Functions, Teams/Slack relays, etc.
  dynamic "webhook_receiver" {
    for_each = { for r in var.webhook_receivers : r.name => r }
    content {
      name                    = webhook_receiver.value.name
      service_uri             = webhook_receiver.value.service_uri
      use_common_alert_schema = webhook_receiver.value.use_common_alert_schema
    }
  }

  # ITSM receivers: ServiceNow / System Center via an ITSM connection.
  dynamic "itsm_receiver" {
    for_each = { for r in var.itsm_receivers : r.name => r }
    content {
      name                 = itsm_receiver.value.name
      workspace_id         = itsm_receiver.value.workspace_id
      connection_id        = itsm_receiver.value.connection_id
      ticket_configuration = itsm_receiver.value.ticket_configuration
      region               = itsm_receiver.value.region
    }
  }
}
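The listing above covers five receiver types, while the introduction also mentions voice calls. As a rough sketch of how the same pattern could be extended, the fragment below adds a voice receiver. The voice_receivers variable is hypothetical and not part of the module as published; the voice_receiver block and its three arguments come from the azurerm provider.

```hcl
# Hypothetical addition to variables.tf (not in the module as listed).
variable "voice_receivers" {
  description = "List of voice-call receivers. country_code is digits only, no '+' prefix."
  type = list(object({
    name         = string
    country_code = string
    phone_number = string
  }))
  default = []
}

# Hypothetical addition to main.tf. This block would sit inside
# resource "azurerm_monitor_action_group" "this", next to the other dynamic blocks.
dynamic "voice_receiver" {
  for_each = { for r in var.voice_receivers : r.name => r }
  content {
    name         = voice_receiver.value.name
    country_code = voice_receiver.value.country_code
    phone_number = voice_receiver.value.phone_number
  }
}
```

Keeping every receiver type keyed by name means adding or removing one contact changes a single block in the plan rather than reshuffling the whole list.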
variables.tf
variable "name" {
  description = "Name of the action group. Must be unique within the resource group."
  type        = string

  validation {
    condition     = length(var.name) >= 1 && length(var.name) <= 260
    error_message = "name must be between 1 and 260 characters."
  }
}

variable "resource_group_name" {
  description = "Resource group in which to create the action group."
  type        = string
}

variable "short_name" {
  description = "Short name (max 12 chars) used as the sender ID in SMS and email notifications."
  type        = string

  validation {
    condition     = length(var.short_name) >= 1 && length(var.short_name) <= 12
    error_message = "short_name must be between 1 and 12 characters (Azure hard limit)."
  }
}

variable "enabled" {
  description = "Whether the action group is enabled. Disable to mute all receivers without deleting them."
  type        = bool
  default     = true
}

variable "location" {
  description = "Region for the action group resource. 'global' is recommended for resilience."
  type        = string
  default     = "global"
}

variable "email_receivers" {
  description = "List of email receivers. Set use_common_alert_schema true unless a consumer needs the legacy payload."
  type = list(object({
    name                    = string
    email_address           = string
    use_common_alert_schema = optional(bool, true)
  }))
  default = []

  validation {
    condition = alltrue([
      for r in var.email_receivers : can(regex("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$", r.email_address))
    ])
    error_message = "Each email_receivers[*].email_address must be a valid email address."
  }
}

variable "sms_receivers" {
  description = "List of SMS receivers. country_code is digits only (e.g. '91' for India, '1' for US)."
  type = list(object({
    name         = string
    country_code = string
    phone_number = string
  }))
  default = []

  validation {
    condition = alltrue([
      for r in var.sms_receivers : can(regex("^[0-9]{1,4}$", r.country_code))
    ])
    error_message = "Each sms_receivers[*].country_code must be 1-4 digits, no '+' prefix."
  }
}

variable "azure_app_push_receivers" {
  description = "List of Azure mobile-app push receivers, keyed by the recipient's Azure account email."
  type = list(object({
    name          = string
    email_address = string
  }))
  default = []
}

variable "webhook_receivers" {
  description = "List of webhook receivers (Logic App, Function, or chat relay URLs)."
  type = list(object({
    name                    = string
    service_uri             = string
    use_common_alert_schema = optional(bool, true)
  }))
  default = []

  validation {
    condition = alltrue([
      for r in var.webhook_receivers : can(regex("^https://", r.service_uri))
    ])
    error_message = "Each webhook_receivers[*].service_uri must be an HTTPS URL."
  }
}

variable "itsm_receivers" {
  description = "List of ITSM receivers (ServiceNow / SCSM) wired to a Log Analytics workspace ITSM connection."
  type = list(object({
    name                 = string
    workspace_id         = string
    connection_id        = string
    ticket_configuration = string
    region               = string
  }))
  default = []
}

variable "tags" {
  description = "Tags to apply to the action group."
  type        = map(string)
  default     = {}
}
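Because main.tf turns each list into a map keyed by name, two receivers that share a name make the for expression fail with a generic duplicate-key error. A possible refinement, shown here as a sketch and not as part of the published module, is a second validation block that reports the problem in the module's own words. The example uses email_receivers; the same condition could be repeated for the other lists.

```hcl
variable "email_receivers" {
  description = "List of email receivers. Set use_common_alert_schema true unless a consumer needs the legacy payload."
  type = list(object({
    name                    = string
    email_address           = string
    use_common_alert_schema = optional(bool, true)
  }))
  default = []

  # Existing check from variables.tf: every address must look like an email.
  validation {
    condition = alltrue([
      for r in var.email_receivers : can(regex("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$", r.email_address))
    ])
    error_message = "Each email_receivers[*].email_address must be a valid email address."
  }

  # Suggested extra check (an assumption, not in the original module):
  # names must be unique because main.tf uses them as for_each keys.
  validation {
    condition     = length(distinct([for r in var.email_receivers : r.name])) == length(var.email_receivers)
    error_message = "Each email_receivers[*].name must be unique within the action group."
  }
}
```

Terraform evaluates both validation blocks independently, so a caller sees whichever rule they actually broke.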
outputs.tf
output "id" {
  description = "Resource ID of the action group. Reference this from alert rules."
  value       = azurerm_monitor_action_group.this.id
}

output "name" {
  description = "Name of the action group."
  value       = azurerm_monitor_action_group.this.name
}

output "short_name" {
  description = "Short name (sender ID) of the action group."
  value       = azurerm_monitor_action_group.this.short_name
}

output "email_addresses" {
  description = "Email addresses configured on this action group, for documentation/runbooks."
  value       = [for r in var.email_receivers : r.email_address]
}
How to use it
module "monitor_action_group" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-action-group?ref=v1.0.0"

  name                = "ag-platform-oncall-prod"
  resource_group_name = azurerm_resource_group.monitoring.name
  short_name          = "platprod" # <= 12 chars, shown as SMS sender

  email_receivers = [
    { name = "platform-dl", email_address = "platform-oncall@kloudvin.com" },
    { name = "sre-lead", email_address = "vinod@kloudvin.com" },
  ]

  sms_receivers = [
    { name = "primary-oncall", country_code = "91", phone_number = "9876543210" },
  ]

  azure_app_push_receivers = [
    { name = "vinod-mobile", email_address = "vinod@kloudvin.com" },
  ]

  # Relay critical alerts into a Teams channel via a Logic App.
  webhook_receivers = [
    {
      name        = "teams-critical"
      service_uri = azurerm_logic_app_trigger_http_request.teams.callback_url
    },
  ]

  tags = {
    environment = "prod"
    team        = "platform"
    managed_by  = "terraform"
  }
}

# Downstream: a metric alert consuming the module's `id` output.
resource "azurerm_monitor_metric_alert" "high_cpu" {
  name                = "alert-vmss-cpu-high"
  resource_group_name = azurerm_resource_group.monitoring.name
  scopes              = [azurerm_linux_virtual_machine_scale_set.web.id]
  description         = "VMSS average CPU over 85% for 5 minutes."
  severity            = 2
  frequency           = "PT1M"
  window_size         = "PT5M"

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachineScaleSets"
    metric_name      = "Percentage CPU"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 85
  }

  action {
    action_group_id = module.monitor_action_group.id
  }
}
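The same id output can feed other alert types. As a sketch, here is a subscription-wide Service Health alert wired to the action group from the example above. The alert name and resource labels are invented for illustration, and location is set explicitly on the assumption that the 4.x provider requires it.

```hcl
# Looks up the subscription the provider is authenticated against.
data "azurerm_subscription" "current" {}

# Hypothetical Service Health alert that reuses the module's id output.
resource "azurerm_monitor_activity_log_alert" "service_health" {
  name                = "alert-service-health"
  resource_group_name = azurerm_resource_group.monitoring.name
  location            = "global"
  scopes              = [data.azurerm_subscription.current.id]
  description         = "Azure Service Health events for this subscription."

  criteria {
    category = "ServiceHealth"
  }

  action {
    action_group_id = module.monitor_action_group.id
  }
}
```

Both alerts point at one action group, so changing who gets paged remains a single edit in the module call.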
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
  backend  = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm state bucket/container + key per path...
  }
}
2. Module config — live/prod/monitor_action_group/terragrunt.hcl:
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-action-group?ref=v1.0.0"
}

inputs = {
  name                = "..."
  resource_group_name = "..."
  short_name          = "..."
}
3. Deploy one environment, or roll out all modules together:
(cd live/prod/monitor_action_group && terragrunt apply) # this module
(cd live/prod && terragrunt run-all apply)              # every module under live/prod
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
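The dependency orchestration mentioned above relies on Terragrunt dependency blocks. A minimal sketch follows, assuming a sibling live/prod/resource_group module that exposes an output called name; neither that module nor its output appears in this article. The name and short_name values are borrowed from the usage example.

```hcl
# live/prod/monitor_action_group/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-action-group?ref=v1.0.0"
}

# Hypothetical sibling module that creates the monitoring resource group.
dependency "resource_group" {
  config_path = "../resource_group"
}

inputs = {
  name                = "ag-platform-oncall-prod"
  resource_group_name = dependency.resource_group.outputs.name # assumed output name
  short_name          = "platprod"
}
```

With the dependency declared, run-all applies the resource group first and hands its output to the action group.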
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| name | string | n/a | Yes | Action group name, unique within the resource group (1–260 chars). |
| resource_group_name | string | n/a | Yes | Resource group to create the action group in. |
| short_name | string | n/a | Yes | Sender ID for SMS/email; validated to ≤ 12 chars (Azure hard limit). |
| enabled | bool | true | No | Enable/disable all receivers without deleting the group. |
| location | string | "global" | No | Region for the resource; global recommended for resilience. |
| email_receivers | list(object) | [] | No | Email receivers; each email_address is regex-validated, use_common_alert_schema defaults true. |
| sms_receivers | list(object) | [] | No | SMS receivers; country_code validated as 1–4 digits with no +. |
| azure_app_push_receivers | list(object) | [] | No | Azure mobile-app push receivers keyed by account email. |
| webhook_receivers | list(object) | [] | No | Webhook receivers; service_uri validated as HTTPS, use_common_alert_schema defaults true. |
| itsm_receivers | list(object) | [] | No | ITSM receivers wired to a Log Analytics workspace ITSM connection. |
| tags | map(string) | {} | No | Tags applied to the action group. |
Outputs
| Name | Description |
|---|---|
| id | Resource ID of the action group; reference this from alert rules. |
| name | Name of the action group. |
| short_name | Short name (sender ID) of the action group. |
| email_addresses | List of configured email addresses, for runbooks/documentation. |
Enterprise scenario
A retail bank runs a landing-zone subscription per environment and mandates that every alert routes through a tiered on-call model. They instantiate this module three times per subscription — ag-sev1-oncall (SMS + voice + ITSM ticket into ServiceNow), ag-sev2-oncall (email + Teams webhook), and ag-sev3-info (email distribution list only) — so a Sev-1 payments outage pages the primary engineer’s phone and opens an incident automatically, while a Sev-3 cert-expiry warning quietly hits a mailbox. Because the itsm_receivers connection IDs and on-call phone numbers live in code, the bank’s audit team can prove who was reachable for any historical incident straight from the Git history.
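One way to express the three tiers is a single module call with for_each over a map of tier definitions. The sketch below is illustrative only: the tier names follow the scenario, but the short names, resource group, addresses, and phone number are placeholders, and the webhook, ITSM, and voice receivers from the scenario are left out to keep it short.

```hcl
locals {
  # Hypothetical tier definitions; every contact below is a placeholder.
  oncall_tiers = {
    sev1 = {
      name            = "ag-sev1-oncall"
      short_name      = "sev1-prod"
      email_receivers = []
      sms_receivers   = [{ name = "primary-oncall", country_code = "1", phone_number = "0000000000" }]
    }
    sev2 = {
      name            = "ag-sev2-oncall"
      short_name      = "sev2-prod"
      email_receivers = [{ name = "oncall-dl", email_address = "oncall@example.com" }]
      sms_receivers   = []
    }
    sev3 = {
      name            = "ag-sev3-info"
      short_name      = "sev3-prod"
      email_receivers = [{ name = "info-dl", email_address = "platform-info@example.com" }]
      sms_receivers   = []
    }
  }
}

module "oncall" {
  for_each = local.oncall_tiers
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-action-group?ref=v1.0.0"

  name                = each.value.name
  resource_group_name = "rg-monitoring" # placeholder resource group
  short_name          = each.value.short_name
  email_receivers     = each.value.email_receivers
  sms_receivers       = each.value.sms_receivers
}

# Map of tier key to action group ID, e.g. module.oncall["sev1"].id for Sev-1 alert rules.
output "action_group_ids" {
  value = { for tier, ag in module.oncall : tier => ag.id }
}
```

Alert rules then pick a tier by key, which keeps the severity-to-routing mapping in one reviewable place.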
Best practices
- Keep short_name meaningful and unique: it’s the only identifier a half-asleep engineer sees on a 3 a.m. SMS. Encode team + environment (platprod, sec-prod) within the 12-character budget; the module fails the plan if you exceed it.
- Set use_common_alert_schema = true everywhere (it’s the module default). The common schema gives webhooks, Functions, and Logic Apps a single stable payload shape, so you don’t rewrite parsers when you add new alert types.
- Prefer location = "global" so the action group itself isn’t tied to a single region’s availability: notifications still fire if the region hosting your workload is impaired.
- Don’t hard-code phone numbers or ITSM connection IDs in .tf files casually: pass them via tfvars from a secured pipeline variable group or Key Vault data source, and never commit real on-call numbers to a public repo.
- Use enabled = false to mute, never delete: during a planned maintenance window, flip the flag rather than destroying the group, so alert rules keep their reference intact and you avoid a noisy recreate.
- Layer action groups by severity, not by service: one group per on-call tier (Sev-1/2/3) reusing this module scales far better than a bespoke group per resource, and keeps notification fatigue under control.
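Two of the practices above, keeping contact details out of committed .tf files and muting with enabled = false, can be combined in the calling configuration. This is a sketch under assumptions: the variable names oncall_sms_receivers and alerts_enabled are invented here, and the values would be supplied by the pipeline (for example through a -var-file or TF_VAR_ environment variables).

```hcl
# Hypothetical root-module variables; values come from the pipeline, not from the repo.
variable "oncall_sms_receivers" {
  description = "On-call SMS contacts, injected at plan time."
  type = list(object({
    name         = string
    country_code = string
    phone_number = string
  }))
  default = []
  # Deliberately not marked sensitive: the module iterates over this list with
  # for_each, and Terraform rejects sensitive values as for_each arguments.
}

variable "alerts_enabled" {
  description = "Set to false during a maintenance window to mute the group without deleting it."
  type        = bool
  default     = true
}

module "monitor_action_group" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-monitor-action-group?ref=v1.0.0"

  name                = "ag-platform-oncall-prod"
  resource_group_name = "rg-monitoring" # placeholder resource group
  short_name          = "platprod"
  enabled             = var.alerts_enabled
  sms_receivers       = var.oncall_sms_receivers
}
```

The numbers still end up in the Terraform state, so the state backend needs the same access controls as the variable group that supplies them.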