Quick take — A reusable hashicorp/azurerm ~> 4.0 module for azurerm_traffic_manager_profile: routing-method-driven DNS load balancing, wired endpoint monitoring, and Azure/external endpoints for global multi-region failover. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "azurerm" {
features {}
}
module "traffic_manager" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-traffic-manager?ref=v1.0.0"
name = "..." # Traffic Manager profile resource name (1–63 chars, alph…
resource_group_name = "..." # Resource group that hosts the (global) profile.
dns_relative_name = "..." # Lowercase DNS label under `trafficmanager.net`; must be…
}
Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.
What this module is
Azure Traffic Manager is a DNS-based traffic load balancer. It does not sit in the data path — clients never proxy packets through it. Instead, when a resolver asks for app.contoso.com, Traffic Manager answers with the DNS record of the best endpoint according to a routing method (Performance, Priority, Weighted, Geographic, MultiValue, or Subnet), and that answer is governed by continuous endpoint health probing. Because it operates at the DNS layer, it works across regions, across clouds, and even with on-premises endpoints — anything with a reachable hostname or public IP.
The catch is that a production-grade profile is rarely a single resource. You almost always need an azurerm_traffic_manager_profile plus one or more endpoint resources (azurerm_traffic_manager_azure_endpoint, ..._external_endpoint, or ..._nested_endpoint), a carefully tuned monitor_config block (path, protocol, port, probe interval, tolerated failures, expected status code ranges), and a dns_config with a sane TTL. Hand-authoring that for every app means copy-pasting the monitor probe settings, forgetting max_return on MultiValue profiles, or shipping a 300-second TTL that makes failover glacial.
This module wraps all of that into one var-driven unit: you declare the routing method and a map of endpoints, and it produces a fully wired profile with health monitoring and the relative DNS name reserved under *.trafficmanager.net. It validates inputs that Azure would otherwise reject only at apply time, so misconfigurations fail in plan instead of after a five-minute round trip.
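As a rough illustration of "declare the routing method and a map of endpoints", here is a minimal MultiValue sketch that sets max_return explicitly. The profile name, resource group, DNS label, and the documentation-range IP addresses are all hypothetical placeholders; MultiValue profiles only accept external endpoints that target IP addresses, which is why no Azure endpoints appear.

```hcl
module "traffic_manager_multivalue" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-traffic-manager?ref=v1.0.0"

  name                   = "tm-api-multivalue"       # hypothetical profile name
  resource_group_name    = "rg-global"               # hypothetical resource group
  dns_relative_name      = "kloudvin-api-multivalue" # hypothetical DNS label
  traffic_routing_method = "MultiValue"
  max_return             = 2 # return up to two healthy IPs per DNS answer

  endpoints = {
    "ip-a" = {
      type   = "external"
      target = "203.0.113.10" # documentation-range IP, placeholder only
    }
    "ip-b" = {
      type   = "external"
      target = "203.0.113.20"
    }
    "ip-c" = {
      type   = "external"
      target = "203.0.113.30"
    }
  }
}
```

Because main.tf only passes max_return through when the routing method is MultiValue, the same input is harmless if the routing method is later switched to something else.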
When to use it
- Global active-active or active-passive failover across two or more Azure regions (e.g. App Service in West Europe + North Europe) where you want automatic DNS-level cutover when one region’s health probe fails.
- Latency-based routing (Performance) so users in APAC resolve to your Southeast Asia endpoint while EU users resolve to West Europe, with no client-side geo logic.
- Hybrid / multi-cloud front doors where some endpoints are Azure resources and others are external (an on-prem datacentre IP, an AWS ALB hostname); Traffic Manager treats them uniformly.
- Gradual / canary cutovers using Weighted routing to send 10% of resolutions to a new stack and 90% to the existing one.
- Layering under Front Door / Application Gateway when you need regional redundancy below an L7 proxy, or when your protocol isn’t HTTP (Front Door is HTTP-only; Traffic Manager probes TCP too).
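The canary cutover in the list above could be sketched like this. It assumes two external hostnames (stable.example.com and canary.example.com), and the profile name, resource group, and DNS label are hypothetical. Weights are relative, so 90 and 10 give roughly a 90/10 split of DNS answers, not of requests.

```hcl
module "traffic_manager_canary" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-traffic-manager?ref=v1.0.0"

  name                   = "tm-shop-canary"       # hypothetical profile name
  resource_group_name    = "rg-global"            # hypothetical resource group
  dns_relative_name      = "kloudvin-shop-canary" # hypothetical DNS label
  traffic_routing_method = "Weighted"

  endpoints = {
    "stable" = {
      type   = "external"
      target = "stable.example.com" # placeholder hostname
      weight = 90
    }
    "canary" = {
      type   = "external"
      target = "canary.example.com" # placeholder hostname
      weight = 10
    }
  }
}
```

Shifting more traffic to the new stack is then a one-line change to the two weight values, followed by a plan and apply.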
Reach for Azure Front Door instead when you need a true L7 reverse proxy with WAF, TLS termination, caching, and instant (non-DNS-TTL-bound) failover for HTTP workloads. Traffic Manager is the right tool when you want global routing for any protocol, want to keep endpoints on their own public IPs, or want the cheapest global-distribution primitive.
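For the non-HTTP case, a TCP probe might look like the sketch below. The broker hostnames, port 5671, and all names are assumptions chosen for illustration. With protocol set to TCP, the module's main.tf drops path and expected_status_code_ranges, so only the port and timing settings matter.

```hcl
module "traffic_manager_broker" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-traffic-manager?ref=v1.0.0"

  name                   = "tm-broker-prod"       # hypothetical profile name
  resource_group_name    = "rg-global"            # hypothetical resource group
  dns_relative_name      = "kloudvin-broker-prod" # hypothetical DNS label
  traffic_routing_method = "Priority"

  # TCP probes only check that the port accepts a connection.
  monitor_config = {
    protocol = "TCP"
    port     = 5671 # assumed AMQP-over-TLS port, adjust to your service
  }

  endpoints = {
    "primary" = {
      type     = "external"
      target   = "broker-weu.example.com" # placeholder hostname
      priority = 1
    }
    "standby" = {
      type     = "external"
      target   = "broker-neu.example.com" # placeholder hostname
      priority = 2
    }
  }
}
```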
Module structure
terraform-module-azure-traffic-manager/
├── versions.tf # provider + Terraform version pins
├── main.tf # profile + endpoint resources
├── variables.tf # var-driven inputs with validation
└── outputs.tf # id, fqdn, endpoint ids
# versions.tf
terraform {
required_version = ">= 1.5.0"
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "~> 4.0"
}
}
}
# main.tf
locals {
# Split the endpoint map by type so each type maps to its own resource.
azure_endpoints = {
for k, v in var.endpoints : k => v if v.type == "azure"
}
external_endpoints = {
for k, v in var.endpoints : k => v if v.type == "external"
}
}
resource "azurerm_traffic_manager_profile" "this" {
name = var.name
resource_group_name = var.resource_group_name
profile_status = var.profile_status
traffic_routing_method = var.traffic_routing_method
# Only meaningful for the MultiValue routing method; null otherwise.
max_return = var.traffic_routing_method == "MultiValue" ? var.max_return : null
dns_config {
relative_name = var.dns_relative_name
ttl = var.dns_ttl
}
monitor_config {
protocol = var.monitor_config.protocol
port = var.monitor_config.port
path = contains(["HTTP", "HTTPS"], var.monitor_config.protocol) ? var.monitor_config.path : null
interval_in_seconds = var.monitor_config.interval_in_seconds
timeout_in_seconds = var.monitor_config.timeout_in_seconds
tolerated_number_of_failures = var.monitor_config.tolerated_number_of_failures
dynamic "expected_status_code_ranges" {
for_each = contains(["HTTP", "HTTPS"], var.monitor_config.protocol) ? var.monitor_config.expected_status_code_ranges : []
content {
min = expected_status_code_ranges.value.min
max = expected_status_code_ranges.value.max
}
}
dynamic "custom_header" {
for_each = var.monitor_config.custom_headers
content {
name = custom_header.value.name
value = custom_header.value.value
}
}
}
tags = var.tags
}
resource "azurerm_traffic_manager_azure_endpoint" "this" {
for_each = local.azure_endpoints
name = each.key
profile_id = azurerm_traffic_manager_profile.this.id
target_resource_id = each.value.target_resource_id
enabled = each.value.enabled
weight = each.value.weight
priority = each.value.priority
geo_mappings = each.value.geo_mappings
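# endpoint_location is intentionally not set: the Azure endpoint resource has no such
# argument, because Azure derives the region from target_resource_id.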
dynamic "custom_header" {
for_each = each.value.custom_headers
content {
name = custom_header.value.name
value = custom_header.value.value
}
}
}
resource "azurerm_traffic_manager_external_endpoint" "this" {
for_each = local.external_endpoints
name = each.key
profile_id = azurerm_traffic_manager_profile.this.id
target = each.value.target
enabled = each.value.enabled
weight = each.value.weight
priority = each.value.priority
geo_mappings = each.value.geo_mappings
# External endpoints need an explicit location for Performance routing,
# since Azure can't infer the region from a resource id.
endpoint_location = each.value.endpoint_location
dynamic "custom_header" {
for_each = each.value.custom_headers
content {
name = custom_header.value.name
value = custom_header.value.value
}
}
}
# variables.tf
variable "name" {
description = "Name of the Traffic Manager profile (resource name, not the DNS label)."
type = string
validation {
# The optional group lets a single-character name pass, matching the 1-63 rule below.
condition = can(regex("^[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?$", var.name))
error_message = "name must be 1-63 chars, alphanumeric or hyphen, and not start/end with a hyphen."
}
}
variable "resource_group_name" {
description = "Resource group that hosts the profile. Traffic Manager is a global resource but still lives in an RG."
type = string
}
variable "dns_relative_name" {
description = "DNS label under trafficmanager.net (e.g. 'contoso-prod' => contoso-prod.trafficmanager.net). Must be globally unique."
type = string
validation {
# The optional group lets a single-character label pass; hyphens may not lead or trail.
condition = can(regex("^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$", var.dns_relative_name))
error_message = "dns_relative_name must be lowercase alphanumeric/hyphen and globally unique under trafficmanager.net."
}
}
variable "dns_ttl" {
description = "TTL (seconds) Traffic Manager sets on the DNS responses it hands back. Lower = faster failover, more queries/cost."
type = number
default = 30
validation {
condition = var.dns_ttl >= 0 && var.dns_ttl <= 2147483647
error_message = "dns_ttl must be between 0 and 2147483647 seconds."
}
}
variable "traffic_routing_method" {
description = "Routing method: Performance, Priority, Weighted, Geographic, MultiValue, or Subnet."
type = string
default = "Performance"
validation {
condition = contains(
["Performance", "Priority", "Weighted", "Geographic", "MultiValue", "Subnet"],
var.traffic_routing_method
)
error_message = "traffic_routing_method must be one of Performance, Priority, Weighted, Geographic, MultiValue, Subnet."
}
}
variable "max_return" {
description = "Max number of endpoints returned for a MultiValue profile (1-8). Ignored for other routing methods."
type = number
default = null
validation {
condition = var.max_return == null || (var.max_return >= 1 && var.max_return <= 8)
error_message = "max_return must be between 1 and 8 when set."
}
}
variable "profile_status" {
description = "Whether the profile is Enabled or Disabled. Disabled profiles return NXDOMAIN."
type = string
default = "Enabled"
validation {
condition = contains(["Enabled", "Disabled"], var.profile_status)
error_message = "profile_status must be Enabled or Disabled."
}
}
variable "monitor_config" {
description = "Endpoint health probe settings. path/expected_status_code_ranges only apply to HTTP/HTTPS probes."
type = object({
protocol = optional(string, "HTTPS")
port = optional(number, 443)
path = optional(string, "/")
interval_in_seconds = optional(number, 30)
timeout_in_seconds = optional(number, 10)
tolerated_number_of_failures = optional(number, 3)
expected_status_code_ranges = optional(list(object({ min = number, max = number })), [{ min = 200, max = 299 }])
custom_headers = optional(list(object({ name = string, value = string })), [])
})
default = {}
validation {
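# Case-sensitive on purpose: main.tf compares protocol against "HTTP"/"HTTPS" exactly,
# so a lowercase value must not slip through validation.
condition = contains(["HTTP", "HTTPS", "TCP"], var.monitor_config.protocol)
error_message = "monitor_config.protocol must be exactly HTTP, HTTPS, or TCP (upper case)."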
}
validation {
# With a 10s interval ("fast probing") Azure caps the probe timeout at 9s.
# Tolerated failures are range-checked separately in the next validation block.
condition = (
var.monitor_config.interval_in_seconds != 10 ||
(var.monitor_config.timeout_in_seconds <= 9)
)
error_message = "With a 10s probe interval (fast probing), timeout_in_seconds must be <= 9."
}
validation {
condition = var.monitor_config.tolerated_number_of_failures >= 0 && var.monitor_config.tolerated_number_of_failures <= 9
error_message = "monitor_config.tolerated_number_of_failures must be between 0 and 9."
}
}
variable "endpoints" {
description = <<-EOT
Map of endpoints keyed by endpoint name. 'type' is "azure" or "external".
- azure endpoints set target_resource_id (App Service, Public IP, etc.).
- external endpoints set target (FQDN or IP) and should set endpoint_location for Performance routing.
weight (1-1000) is used by Weighted routing; priority (1-1000, unique) by Priority routing;
geo_mappings by Geographic routing.
EOT
type = map(object({
type = string
target_resource_id = optional(string)
target = optional(string)
enabled = optional(bool, true)
weight = optional(number, 1)
priority = optional(number)
endpoint_location = optional(string)
geo_mappings = optional(list(string))
custom_headers = optional(list(object({ name = string, value = string })), [])
}))
default = {}
validation {
condition = alltrue([for k, v in var.endpoints : contains(["azure", "external"], v.type)])
error_message = "Each endpoint 'type' must be either \"azure\" or \"external\"."
}
validation {
condition = alltrue([for k, v in var.endpoints : v.type != "azure" || v.target_resource_id != null])
error_message = "Azure endpoints must set target_resource_id."
}
validation {
condition = alltrue([for k, v in var.endpoints : v.type != "external" || v.target != null])
error_message = "External endpoints must set target (an FQDN or IP)."
}
validation {
condition = alltrue([for k, v in var.endpoints : v.weight == null || (v.weight >= 1 && v.weight <= 1000)])
error_message = "endpoint weight must be between 1 and 1000."
}
}
variable "tags" {
description = "Tags applied to the profile."
type = map(string)
default = {}
}
# outputs.tf
output "id" {
description = "Resource ID of the Traffic Manager profile."
value = azurerm_traffic_manager_profile.this.id
}
output "name" {
description = "Name of the Traffic Manager profile."
value = azurerm_traffic_manager_profile.this.name
}
output "fqdn" {
description = "Public FQDN of the profile (e.g. contoso-prod.trafficmanager.net) — CNAME your custom domain to this."
value = azurerm_traffic_manager_profile.this.fqdn
}
output "azure_endpoint_ids" {
description = "Map of Azure endpoint name => resource id."
value = { for k, v in azurerm_traffic_manager_azure_endpoint.this : k => v.id }
}
output "external_endpoint_ids" {
description = "Map of external endpoint name => resource id."
value = { for k, v in azurerm_traffic_manager_external_endpoint.this : k => v.id }
}
How to use it
module "traffic_manager" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-traffic-manager?ref=v1.0.0"
name = "tm-shop-prod"
resource_group_name = azurerm_resource_group.global.name
dns_relative_name = "kloudvin-shop-prod" # => kloudvin-shop-prod.trafficmanager.net
traffic_routing_method = "Performance"
dns_ttl = 30
monitor_config = {
protocol = "HTTPS"
port = 443
path = "/healthz"
interval_in_seconds = 30
timeout_in_seconds = 10
tolerated_number_of_failures = 3
expected_status_code_ranges = [{ min = 200, max = 299 }]
custom_headers = [{ name = "Host", value = "shop.kloudvin.com" }]
}
endpoints = {
"weu-app" = {
type = "azure"
target_resource_id = azurerm_linux_web_app.weu.id
endpoint_location = "westeurope"
priority = 1
}
"neu-app" = {
type = "azure"
target_resource_id = azurerm_linux_web_app.neu.id
endpoint_location = "northeurope"
priority = 2
}
# An on-prem datacentre kept as a last-resort external endpoint.
"dc-onprem" = {
type = "external"
target = "shop-dr.contoso.internal.example.com"
endpoint_location = "uksouth"
enabled = true
}
}
tags = {
environment = "prod"
workload = "shop"
managed_by = "terraform"
}
}
# Downstream: CNAME a custom subdomain at the profile FQDN.
# (A CNAME record cannot live at the zone apex, so this pattern is for subdomains only.)
resource "azurerm_dns_cname_record" "shop" {
name = "shop"
zone_name = azurerm_dns_zone.kloudvin.name
resource_group_name = azurerm_resource_group.dns.name
ttl = 60
record = module.traffic_manager.fqdn # kloudvin-shop-prod.trafficmanager.net
}
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
backend = "azurerm"
generate = { path = "backend.tf", if_exists = "overwrite" }
config = {
# ...azurerm state bucket/container + key per path...
}
}
2. Module config — live/prod/traffic_manager/terragrunt.hcl:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-traffic-manager?ref=v1.0.0"
}
inputs = {
name = "..."
resource_group_name = "..."
dns_relative_name = "..."
}
3. Deploy one environment, or roll out all modules together:
cd live/prod/traffic_manager && terragrunt apply # this module
terragrunt run-all apply # run from live/prod instead: every module under it
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
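To make the cross-module dependency point concrete, here is a sketch of a per-environment terragrunt.hcl that reads the resource group name from a sibling unit. The sibling path ../resource_group and its name output are assumptions for illustration, as are the profile name and DNS label.

```hcl
# live/prod/traffic_manager/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-traffic-manager?ref=v1.0.0"
}

# Hypothetical sibling unit at live/prod/resource_group that exposes a `name` output.
dependency "rg" {
  config_path = "../resource_group"
}

inputs = {
  name                = "tm-shop-prod"       # hypothetical profile name
  resource_group_name = dependency.rg.outputs.name
  dns_relative_name   = "kloudvin-shop-prod" # hypothetical DNS label
}
```

With the dependency block in place, run-all applies the resource group unit before this one.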
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| name | string | (none) | Yes | Traffic Manager profile resource name (1–63 chars, alphanumeric/hyphen). |
| resource_group_name | string | (none) | Yes | Resource group that hosts the (global) profile. |
| dns_relative_name | string | (none) | Yes | Lowercase DNS label under trafficmanager.net; must be globally unique. |
| dns_ttl | number | 30 | No | TTL in seconds for the DNS answers Traffic Manager returns (0–2147483647). |
| traffic_routing_method | string | "Performance" | No | One of Performance, Priority, Weighted, Geographic, MultiValue, Subnet. |
| max_return | number | null | No | Endpoints returned for a MultiValue profile (1–8); ignored otherwise. |
| profile_status | string | "Enabled" | No | Enabled or Disabled (disabled profiles return NXDOMAIN). |
| monitor_config | object | {} | No | Probe settings: protocol (HTTP/HTTPS/TCP), port, path, interval_in_seconds, timeout_in_seconds, tolerated_number_of_failures, expected_status_code_ranges, custom_headers. |
| endpoints | map(object) | {} | No | Endpoints keyed by name. Each has type (azure/external), target_resource_id or target, enabled, weight, priority, endpoint_location, geo_mappings, custom_headers. |
| tags | map(string) | {} | No | Tags applied to the profile. |
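The geo_mappings field on each endpoint only matters for Geographic routing. A minimal sketch, assuming two placeholder external hostnames and hypothetical names, maps Europe to one endpoint and everything else to a catch-all:

```hcl
module "traffic_manager_geo" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-traffic-manager?ref=v1.0.0"

  name                   = "tm-shop-geo"       # hypothetical profile name
  resource_group_name    = "rg-global"         # hypothetical resource group
  dns_relative_name      = "kloudvin-shop-geo" # hypothetical DNS label
  traffic_routing_method = "Geographic"

  endpoints = {
    "europe" = {
      type         = "external"
      target       = "eu.example.com" # placeholder hostname
      geo_mappings = ["GEO-EU"]       # all of Europe
    }
    "rest-of-world" = {
      type         = "external"
      target       = "global.example.com" # placeholder hostname
      geo_mappings = ["WORLD"]            # catch-all for unmapped regions
    }
  }
}
```

Each geographic code can be mapped to only one endpoint in a profile, so keeping a WORLD catch-all avoids unanswered queries from regions you did not list.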
Outputs
| Name | Description |
|---|---|
| id | Resource ID of the Traffic Manager profile. |
| name | Name of the Traffic Manager profile. |
| fqdn | Public FQDN (<label>.trafficmanager.net) to CNAME your custom domain at. |
| azure_endpoint_ids | Map of Azure endpoint name → resource id. |
| external_endpoint_ids | Map of external endpoint name → resource id. |
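One way these outputs might be consumed downstream, assuming the module is instantiated as module.traffic_manager as in the usage example: a delete lock scoped to the profile id, plus a re-exported FQDN for other stacks. The lock name and notes are hypothetical.

```hcl
# Guard the global profile against accidental deletion.
resource "azurerm_management_lock" "traffic_manager" {
  name       = "tm-shop-prod-nodelete" # hypothetical lock name
  scope      = module.traffic_manager.id
  lock_level = "CanNotDelete"
  notes      = "Global DNS entry point; remove the lock before destroying."
}

# Re-export the FQDN so other stacks can read it from this stack's state.
output "traffic_manager_fqdn" {
  description = "FQDN of the Traffic Manager profile."
  value       = module.traffic_manager.fqdn
}
```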
Enterprise scenario
A retail platform runs its checkout API as identical App Service stacks in West Europe and North Europe behind a single shop.kloudvin.com. The team consumes this module with Priority routing — West Europe at priority 1, North Europe at priority 2, and an on-prem DR site as a disabled external endpoint they enable only during a regional outage. The monitor_config probes /healthz over HTTPS every 30 seconds with three tolerated failures, so a dead region is pulled from DNS within ~90 seconds, and the 30-second dns_ttl keeps client cutover fast without flooding the resolvers. Because the module is versioned in Azure Repos, the same pattern is reused verbatim for the cart, search, and account services by changing only the name, dns_relative_name, and endpoint targets.
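A sketch of how that scenario could be expressed with this module follows. It assumes the two azurerm_linux_web_app resources from the usage example exist elsewhere in the configuration, and the DR hostname and resource group reference are placeholders.

```hcl
module "traffic_manager_checkout" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-traffic-manager?ref=v1.0.0"

  name                   = "tm-shop-prod"
  resource_group_name    = azurerm_resource_group.global.name # assumed to exist
  dns_relative_name      = "kloudvin-shop-prod"
  traffic_routing_method = "Priority"
  dns_ttl                = 30

  monitor_config = {
    protocol                     = "HTTPS"
    port                         = 443
    path                         = "/healthz"
    interval_in_seconds          = 30
    tolerated_number_of_failures = 3
  }

  endpoints = {
    "weu-app" = {
      type               = "azure"
      target_resource_id = azurerm_linux_web_app.weu.id # assumed to exist
      priority           = 1
    }
    "neu-app" = {
      type               = "azure"
      target_resource_id = azurerm_linux_web_app.neu.id # assumed to exist
      priority           = 2
    }
    # DR site stays defined but disabled until a regional outage.
    "dc-onprem" = {
      type     = "external"
      target   = "shop-dr.example.com" # placeholder hostname
      priority = 3
      enabled  = false
    }
  }
}
```

Enabling the DR site during an outage is then a change of enabled to true on the dc-onprem endpoint.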
Best practices
- Tune the TTL/probe trade-off deliberately. Failover latency is roughly (interval × tolerated_failures) + dns_ttl. A 30s TTL with a 30s interval and 3 failures is a sane default; dropping to a 10s “fast probe” interval cuts detection time but increases probe traffic and requires timeout_in_seconds <= 9, which the module validates for you.
- Probe a real health endpoint, not /. Point monitor_config.path at a /healthz route that checks downstream dependencies, and constrain expected_status_code_ranges to 200–299 so a 503 from a degraded-but-listening instance correctly marks the endpoint Degraded and pulls it from rotation.
- Set endpoint_location on external endpoints whenever you use Performance routing. Azure can infer the region for Azure endpoints from their resource id, but external endpoints need it explicitly or latency routing degrades to round-robin.
- Layer Traffic Manager above regional load balancers, not instances. Point endpoints at App Service, Front Door, or Application Gateway in each region rather than at individual VMs, so intra-region failover is handled locally and Traffic Manager only decides between regions.
- Mind the cost model. Traffic Manager bills per million DNS queries and per monitored endpoint (with a surcharge for fast/10s probing and external endpoints). Keep the TTL high enough to cache aggressively, and disable rather than delete standby endpoints so you stop paying for their probes while preserving the config.
- Use a stable, environment-scoped dns_relative_name. The *.trafficmanager.net label is globally unique and is what your custom-domain CNAME and downstream verification target, so bake the environment and workload into it (tm-shop-prod) and never reuse a freed label casually, because DNS caches and CNAMEs will follow it.
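The rough failover formula from the first bullet can be worked through with plain locals. This is only the approximation stated above, not a guarantee from Azure, and the local and output names are made up for the example.

```hcl
locals {
  # Module defaults.
  interval_in_seconds          = 30
  tolerated_number_of_failures = 3
  dns_ttl                      = 30

  # (interval x tolerated_failures) + dns_ttl, per the rule of thumb above.
  default_failover_seconds = local.interval_in_seconds * local.tolerated_number_of_failures + local.dns_ttl

  # Same formula with the 10s fast-probe interval.
  fast_failover_seconds = 10 * local.tolerated_number_of_failures + local.dns_ttl
}

output "default_failover_seconds" {
  value = local.default_failover_seconds # 30 * 3 + 30 = 120
}

output "fast_failover_seconds" {
  value = local.fast_failover_seconds # 10 * 3 + 30 = 60
}
```

By this estimate, fast probing halves worst-case cutover from about two minutes to about one, in exchange for the fast-probe surcharge and extra probe traffic.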