
Terraform Module: Azure Traffic Manager — DNS-based global load balancing with health-checked endpoints

Quick take — A reusable hashicorp/azurerm ~> 4.0 module for azurerm_traffic_manager_profile: routing-method-driven DNS load balancing, wired endpoint monitoring, and Azure/external endpoints for global multi-region failover. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "azurerm" {
  features {}
}

module "traffic_manager" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-traffic-manager?ref=v1.0.0"

  name                = "..."  # Traffic Manager profile resource name (1–63 chars, alph…
  resource_group_name = "..."  # Resource group that hosts the (global) profile.
  dns_relative_name   = "..."  # Lowercase DNS label under `trafficmanager.net`; must be…
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.

What this module is

Azure Traffic Manager is a DNS-based traffic load balancer. It does not sit in the data path — clients never proxy packets through it. Instead, when a resolver asks for app.contoso.com, Traffic Manager answers with the DNS record of the best endpoint according to a routing method (Performance, Priority, Weighted, Geographic, MultiValue, or Subnet), and that answer is governed by continuous endpoint health probing. Because it operates at the DNS layer, it works across regions, across clouds, and even with on-premises endpoints — anything with a reachable hostname or public IP.

The catch is that a production-grade profile is rarely a single resource. You almost always need an azurerm_traffic_manager_profile plus one or more endpoint resources (azurerm_traffic_manager_azure_endpoint, ..._external_endpoint, or ..._nested_endpoint), a carefully tuned monitor_config block (path, protocol, port, probe interval, tolerated failures, expected status code ranges), and a dns_config with a sane TTL. Hand-authoring that for every app invites mistakes: copy-pasted monitor probe settings that drift apart, a forgotten max_return on MultiValue profiles, or a 300-second TTL that makes failover glacial.
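
To put rough numbers on why the TTL and probe settings matter, here is a back-of-envelope sketch. It assumes an endpoint is marked degraded once tolerated_number_of_failures + 1 consecutive probes have failed, and that clients honour the DNS TTL; the local and output names are illustrative only. It needs no provider, so it can be tried in an empty directory.

```hcl
# Rough failover estimate, measured from the first failed probe.
locals {
  interval_in_seconds          = 30
  timeout_in_seconds           = 10
  tolerated_number_of_failures = 3
  dns_ttl                      = 30

  # Assumption: (tolerated + 1) consecutive failures are needed, i.e. `tolerated`
  # further intervals after the first failed probe, plus the last probe's timeout.
  detection_seconds = local.tolerated_number_of_failures * local.interval_in_seconds + local.timeout_in_seconds

  # Resolvers may keep serving the cached answer for up to one TTL after that.
  worst_case_cutover_seconds = local.detection_seconds + local.dns_ttl
}

output "detection_seconds" {
  value = local.detection_seconds # 3 * 30 + 10 = 100
}

output "worst_case_cutover_seconds" {
  value = local.worst_case_cutover_seconds # 100 + 30 = 130; with a 300s TTL it would be 400
}
```

Under these assumptions the TTL is a direct additive term, which is why a 300-second TTL dominates the cutover time no matter how aggressively the probes are tuned.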

This module wraps all of that into one var-driven unit: you declare the routing method and a map of endpoints, and it produces a fully wired profile with health monitoring and the relative DNS name reserved under *.trafficmanager.net. It validates inputs that Azure would otherwise reject only at apply time, so misconfigurations fail in plan instead of after a five-minute round trip.
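
As an illustration of that plan-time validation, here is a minimal MultiValue sketch. The profile name, resource group, DNS label, and IP addresses are placeholders, and a configured azurerm provider is assumed as in the Quickstart. MultiValue profiles are built from external endpoints that target IP addresses, so the sketch uses documentation-range IPs.

```hcl
module "traffic_manager_multivalue" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-traffic-manager?ref=v1.0.0"

  name                   = "tm-multi-example" # placeholder
  resource_group_name    = "rg-global"        # placeholder; must already exist
  dns_relative_name      = "example-multi"    # placeholder; must be globally unique
  traffic_routing_method = "MultiValue"

  # Up to two healthy endpoints are returned per DNS query.
  # A value such as 9 would fail the module's 1-8 validation during plan.
  max_return = 2

  endpoints = {
    "ip-a" = { type = "external", target = "203.0.113.10" }
    "ip-b" = { type = "external", target = "203.0.113.20" }
    "ip-c" = { type = "external", target = "203.0.113.30" }
  }
}
```

Because max_return is only passed through when the routing method is MultiValue, leaving it set while switching to another method should not change the profile.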

When to use it

Reach for Azure Front Door instead when you need a true L7 reverse proxy with WAF, TLS termination, caching, and instant (non-DNS-TTL-bound) failover for HTTP workloads. Traffic Manager is the right tool when you want global routing for any protocol, want to keep endpoints on their own public IPs, or want the cheapest global-distribution primitive.

Module structure

terraform-module-azure-traffic-manager/
├── versions.tf      # provider + Terraform version pins
├── main.tf          # profile + endpoint resources
├── variables.tf     # var-driven inputs with validation
└── outputs.tf       # id, fqdn, endpoint ids
# versions.tf
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}
# main.tf

locals {
  # Split the endpoint map by type so each type maps to its own resource.
  azure_endpoints = {
    for k, v in var.endpoints : k => v if v.type == "azure"
  }
  external_endpoints = {
    for k, v in var.endpoints : k => v if v.type == "external"
  }
}

resource "azurerm_traffic_manager_profile" "this" {
  name                   = var.name
  resource_group_name    = var.resource_group_name
  profile_status         = var.profile_status
  traffic_routing_method = var.traffic_routing_method
  # Only meaningful for the MultiValue routing method; null otherwise.
  max_return = var.traffic_routing_method == "MultiValue" ? var.max_return : null

  dns_config {
    relative_name = var.dns_relative_name
    ttl           = var.dns_ttl
  }

  monitor_config {
    protocol                     = var.monitor_config.protocol
    port                         = var.monitor_config.port
    path                         = contains(["HTTP", "HTTPS"], var.monitor_config.protocol) ? var.monitor_config.path : null
    interval_in_seconds          = var.monitor_config.interval_in_seconds
    timeout_in_seconds           = var.monitor_config.timeout_in_seconds
    tolerated_number_of_failures = var.monitor_config.tolerated_number_of_failures

    dynamic "expected_status_code_ranges" {
      for_each = contains(["HTTP", "HTTPS"], var.monitor_config.protocol) ? var.monitor_config.expected_status_code_ranges : []
      content {
        min = expected_status_code_ranges.value.min
        max = expected_status_code_ranges.value.max
      }
    }

    dynamic "custom_header" {
      for_each = var.monitor_config.custom_headers
      content {
        name  = custom_header.value.name
        value = custom_header.value.value
      }
    }
  }

  tags = var.tags
}

resource "azurerm_traffic_manager_azure_endpoint" "this" {
  for_each = local.azure_endpoints

  name               = each.key
  profile_id         = azurerm_traffic_manager_profile.this.id
  target_resource_id = each.value.target_resource_id
  enabled            = each.value.enabled
  weight             = each.value.weight
  priority           = each.value.priority
  geo_mappings       = each.value.geo_mappings
  # endpoint_location is deliberately not set here: it belongs to external (and nested)
  # endpoints, while Azure endpoints take their region from target_resource_id.

  dynamic "custom_header" {
    for_each = each.value.custom_headers
    content {
      name  = custom_header.value.name
      value = custom_header.value.value
    }
  }
}

resource "azurerm_traffic_manager_external_endpoint" "this" {
  for_each = local.external_endpoints

  name              = each.key
  profile_id        = azurerm_traffic_manager_profile.this.id
  target            = each.value.target
  enabled           = each.value.enabled
  weight            = each.value.weight
  priority          = each.value.priority
  geo_mappings      = each.value.geo_mappings
  # External endpoints need an explicit location for Performance routing,
  # since Azure can't infer the region from a resource id.
  endpoint_location = each.value.endpoint_location

  dynamic "custom_header" {
    for_each = each.value.custom_headers
    content {
      name  = custom_header.value.name
      value = custom_header.value.value
    }
  }
}
# variables.tf

variable "name" {
  description = "Name of the Traffic Manager profile (resource name, not the DNS label)."
  type        = string

  validation {
    # The optional group lets a single-character name pass, matching the 1-63 rule.
    condition     = can(regex("^[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?$", var.name))
    error_message = "name must be 1-63 chars, alphanumeric or hyphen, and not start/end with a hyphen."
  }
}

variable "resource_group_name" {
  description = "Resource group that hosts the profile. Traffic Manager is a global resource but still lives in an RG."
  type        = string
}

variable "dns_relative_name" {
  description = "DNS label under trafficmanager.net (e.g. 'contoso-prod' => contoso-prod.trafficmanager.net). Must be globally unique."
  type        = string

  validation {
    # Format check only; global uniqueness can only be verified by Azure at apply time.
    condition     = can(regex("^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$", var.dns_relative_name))
    error_message = "dns_relative_name must be 1-63 chars, lowercase alphanumeric or hyphen, and not start/end with a hyphen."
  }
}

variable "dns_ttl" {
  description = "TTL (seconds) Traffic Manager sets on the DNS responses it hands back. Lower = faster failover, more queries/cost."
  type        = number
  default     = 30

  validation {
    condition     = var.dns_ttl >= 0 && var.dns_ttl <= 2147483647
    error_message = "dns_ttl must be between 0 and 2147483647 seconds."
  }
}

variable "traffic_routing_method" {
  description = "Routing method: Performance, Priority, Weighted, Geographic, MultiValue, or Subnet."
  type        = string
  default     = "Performance"

  validation {
    condition = contains(
      ["Performance", "Priority", "Weighted", "Geographic", "MultiValue", "Subnet"],
      var.traffic_routing_method
    )
    error_message = "traffic_routing_method must be one of Performance, Priority, Weighted, Geographic, MultiValue, Subnet."
  }
}

variable "max_return" {
  description = "Max number of endpoints returned for a MultiValue profile (1-8). Ignored for other routing methods."
  type        = number
  default     = null

  validation {
    condition     = var.max_return == null || (var.max_return >= 1 && var.max_return <= 8)
    error_message = "max_return must be between 1 and 8 when set."
  }
}

variable "profile_status" {
  description = "Whether the profile is Enabled or Disabled. Disabled profiles return NXDOMAIN."
  type        = string
  default     = "Enabled"

  validation {
    condition     = contains(["Enabled", "Disabled"], var.profile_status)
    error_message = "profile_status must be Enabled or Disabled."
  }
}

variable "monitor_config" {
  description = "Endpoint health probe settings. path/expected_status_code_ranges only apply to HTTP/HTTPS probes."
  type = object({
    protocol                     = optional(string, "HTTPS")
    port                         = optional(number, 443)
    path                         = optional(string, "/")
    interval_in_seconds          = optional(number, 30)
    timeout_in_seconds           = optional(number, 10)
    tolerated_number_of_failures = optional(number, 3)
    expected_status_code_ranges  = optional(list(object({ min = number, max = number })), [{ min = 200, max = 299 }])
    custom_headers               = optional(list(object({ name = string, value = string })), [])
  })
  default = {}

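  validation {
    # Case-sensitive on purpose: main.tf compares protocol against "HTTP"/"HTTPS" exactly,
    # so a lowercase value would otherwise pass here and silently drop path and status ranges.
    condition     = contains(["HTTP", "HTTPS", "TCP"], var.monitor_config.protocol)
    error_message = "monitor_config.protocol must be HTTP, HTTPS, or TCP (uppercase)."
  }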

  validation {
    # Azure probes every 30s (normal) or 10s (fast probing); with the 10s interval
    # the timeout has to stay below the interval, i.e. at most 9s.
    condition = (
      var.monitor_config.interval_in_seconds != 10 ||
      (var.monitor_config.timeout_in_seconds <= 9)
    )
    error_message = "With a 10s probe interval (fast probing), timeout_in_seconds must be <= 9."
  }

  validation {
    condition     = var.monitor_config.tolerated_number_of_failures >= 0 && var.monitor_config.tolerated_number_of_failures <= 9
    error_message = "monitor_config.tolerated_number_of_failures must be between 0 and 9."
  }
}

variable "endpoints" {
  description = <<-EOT
    Map of endpoints keyed by endpoint name. 'type' is "azure" or "external".
    - azure endpoints set target_resource_id (App Service, Public IP, etc.).
    - external endpoints set target (FQDN or IP) and should set endpoint_location for Performance routing.
    weight (1-1000) is used by Weighted routing; priority (1-1000, unique) by Priority routing;
    geo_mappings by Geographic routing.
  EOT
  type = map(object({
    type               = string
    target_resource_id = optional(string)
    target             = optional(string)
    enabled            = optional(bool, true)
    weight             = optional(number, 1)
    priority           = optional(number)
    endpoint_location  = optional(string)
    geo_mappings       = optional(list(string))
    custom_headers     = optional(list(object({ name = string, value = string })), [])
  }))
  default = {}

  validation {
    condition     = alltrue([for k, v in var.endpoints : contains(["azure", "external"], v.type)])
    error_message = "Each endpoint 'type' must be either \"azure\" or \"external\"."
  }

  validation {
    condition     = alltrue([for k, v in var.endpoints : v.type != "azure" || v.target_resource_id != null])
    error_message = "Azure endpoints must set target_resource_id."
  }

  validation {
    condition     = alltrue([for k, v in var.endpoints : v.type != "external" || v.target != null])
    error_message = "External endpoints must set target (an FQDN or IP)."
  }

  validation {
    condition     = alltrue([for k, v in var.endpoints : v.weight == null || (v.weight >= 1 && v.weight <= 1000)])
    error_message = "endpoint weight must be between 1 and 1000."
  }
}

variable "tags" {
  description = "Tags applied to the profile."
  type        = map(string)
  default     = {}
}
# outputs.tf

output "id" {
  description = "Resource ID of the Traffic Manager profile."
  value       = azurerm_traffic_manager_profile.this.id
}

output "name" {
  description = "Name of the Traffic Manager profile."
  value       = azurerm_traffic_manager_profile.this.name
}

output "fqdn" {
  description = "Public FQDN of the profile (e.g. contoso-prod.trafficmanager.net) — CNAME your custom domain to this."
  value       = azurerm_traffic_manager_profile.this.fqdn
}

output "azure_endpoint_ids" {
  description = "Map of Azure endpoint name => resource id."
  value       = { for k, v in azurerm_traffic_manager_azure_endpoint.this : k => v.id }
}

output "external_endpoint_ids" {
  description = "Map of external endpoint name => resource id."
  value       = { for k, v in azurerm_traffic_manager_external_endpoint.this : k => v.id }
}

How to use it

module "traffic_manager" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-traffic-manager?ref=v1.0.0"

  name                   = "tm-shop-prod"
  resource_group_name    = azurerm_resource_group.global.name
  dns_relative_name      = "kloudvin-shop-prod" # => kloudvin-shop-prod.trafficmanager.net
  traffic_routing_method = "Performance"
  dns_ttl                = 30

  monitor_config = {
    protocol                     = "HTTPS"
    port                         = 443
    path                         = "/healthz"
    interval_in_seconds          = 30
    timeout_in_seconds           = 10
    tolerated_number_of_failures = 3
    expected_status_code_ranges  = [{ min = 200, max = 299 }]
    custom_headers               = [{ name = "Host", value = "shop.kloudvin.com" }]
  }

  endpoints = {
    "weu-app" = {
      type               = "azure"
      target_resource_id = azurerm_linux_web_app.weu.id
      endpoint_location  = "westeurope"
      priority           = 1
    }
    "neu-app" = {
      type               = "azure"
      target_resource_id = azurerm_linux_web_app.neu.id
      endpoint_location  = "northeurope"
      priority           = 2
    }
    # An on-prem datacentre kept as a last-resort external endpoint.
    "dc-onprem" = {
      type              = "external"
      target            = "shop-dr.contoso.internal.example.com"
      endpoint_location = "uksouth"
      enabled           = true
    }
  }

  tags = {
    environment = "prod"
    workload    = "shop"
    managed_by  = "terraform"
  }
}

# Downstream: CNAME a custom subdomain at the profile FQDN (CNAME records are not allowed at a zone apex).
resource "azurerm_dns_cname_record" "shop" {
  name                = "shop"
  zone_name           = azurerm_dns_zone.kloudvin.name
  resource_group_name = azurerm_resource_group.dns.name
  ttl                 = 60
  record              = module.traffic_manager.fqdn # kloudvin-shop-prod.trafficmanager.net
}
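
The same call shape covers the other routing methods. Below is a sketch of a Weighted profile splitting DNS answers roughly 90/10 between a stable and a canary deployment. Every name and hostname in it is a placeholder, and external endpoints are used only so the snippet does not depend on other resources.

```hcl
module "traffic_manager_canary" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-traffic-manager?ref=v1.0.0"

  name                   = "tm-shop-canary"      # placeholder
  resource_group_name    = "rg-global"           # placeholder; must already exist
  dns_relative_name      = "example-shop-canary" # placeholder; must be globally unique
  traffic_routing_method = "Weighted"

  endpoints = {
    # weight is relative: 90 vs 10 means about nine in ten answers point at "stable".
    "stable" = { type = "external", target = "stable.example.com", weight = 90 }
    "canary" = { type = "external", target = "canary.example.com", weight = 10 }
  }
}
```

Since the split happens per DNS answer and resolvers cache results for the TTL, the observed traffic ratio only approximates the configured weights.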

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config: live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm state bucket/container + key per path...
  }
}

2. Module config: live/prod/traffic_manager/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-traffic-manager?ref=v1.0.0"
}

inputs = {
  name                = "..."
  resource_group_name = "..."
  dns_relative_name   = "..."
}

3. Deploy one environment, or roll out all modules together:

# Both commands are run from the repository root.
cd live/prod/traffic_manager && terragrunt apply   # this module only
cd live/prod && terragrunt run-all apply           # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
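
To make the per-environment override concrete, a dev configuration might look like the sketch below. The live/dev path and all input values are assumptions chosen for illustration; only the source URL and the input names come from the module above.

```hcl
# live/dev/traffic_manager/terragrunt.hcl (hypothetical path)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-traffic-manager?ref=v1.0.0"
}

inputs = {
  name                = "tm-shop-dev"      # placeholder
  resource_group_name = "rg-shop-dev"      # placeholder; must already exist
  dns_relative_name   = "example-shop-dev" # placeholder; must be globally unique

  # Dev can tolerate slower failover, so a longer TTL reduces DNS query volume.
  dns_ttl = 60

  # Only the attributes that differ need to be listed; the rest fall back to
  # the optional() defaults declared in variables.tf.
  monitor_config = {
    path = "/healthz"
  }
}
```

The prod file would differ only in its inputs block, which is what keeps the module itself unforked.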

Inputs

| Name | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | string | | Yes | Traffic Manager profile resource name (1–63 chars, alphanumeric/hyphen). |
| resource_group_name | string | | Yes | Resource group that hosts the (global) profile. |
| dns_relative_name | string | | Yes | Lowercase DNS label under trafficmanager.net; must be globally unique. |
| dns_ttl | number | 30 | No | TTL in seconds for the DNS answers Traffic Manager returns (0–2147483647). |
| traffic_routing_method | string | "Performance" | No | One of Performance, Priority, Weighted, Geographic, MultiValue, Subnet. |
| max_return | number | null | No | Endpoints returned for a MultiValue profile (1–8); ignored otherwise. |
| profile_status | string | "Enabled" | No | Enabled or Disabled (disabled profiles return NXDOMAIN). |
| monitor_config | object | {} | No | Probe settings: protocol (HTTP/HTTPS/TCP), port, path, interval_in_seconds, timeout_in_seconds, tolerated_number_of_failures, expected_status_code_ranges, custom_headers. |
| endpoints | map(object) | {} | No | Endpoints keyed by name. Each has type (azure/external), target_resource_id or target, enabled, weight, priority, endpoint_location, geo_mappings, custom_headers. |
| tags | map(string) | {} | No | Tags applied to the profile. |
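
Because monitor_config is an object with optional() defaults, a caller can override just the attributes that matter. The sketch below switches to a TCP probe; the names, port, and hostnames are placeholders. With protocol set to TCP, main.tf passes a null path and emits no expected_status_code_ranges blocks.

```hcl
module "traffic_manager_tcp" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-traffic-manager?ref=v1.0.0"

  name                   = "tm-tcp-example" # placeholder
  resource_group_name    = "rg-global"      # placeholder; must already exist
  dns_relative_name      = "example-tcp"    # placeholder; must be globally unique
  traffic_routing_method = "Priority"

  # Only protocol and port are overridden; interval, timeout and tolerated
  # failures keep the module defaults (30s, 10s, 3).
  monitor_config = {
    protocol = "TCP"
    port     = 5432
  }

  endpoints = {
    "primary"   = { type = "external", target = "db-a.example.com", priority = 1 }
    "secondary" = { type = "external", target = "db-b.example.com", priority = 2 }
  }
}
```

A TCP probe only confirms that the port accepts connections, so it suits non-HTTP services where no health path exists.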

Outputs

| Name | Description |
| --- | --- |
| id | Resource ID of the Traffic Manager profile. |
| name | Name of the Traffic Manager profile. |
| fqdn | Public FQDN (<label>.trafficmanager.net) to CNAME your custom domain at. |
| azure_endpoint_ids | Map of Azure endpoint name → resource id. |
| external_endpoint_ids | Map of external endpoint name → resource id. |
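
A short sketch of how a calling configuration might consume these outputs, assuming the module "traffic_manager" block from the usage example is in the same root module. The local and output names here are illustrative, not part of the module.

```hcl
locals {
  # Both outputs are maps keyed by endpoint name, so merge() yields one
  # lookup table covering every endpoint in the profile.
  traffic_manager_endpoint_ids = merge(
    module.traffic_manager.azure_endpoint_ids,
    module.traffic_manager.external_endpoint_ids,
  )
}

# Re-export the values other stacks or pipelines typically need.
output "shop_traffic_manager_fqdn" {
  description = "FQDN to use as the CNAME target for the custom domain."
  value       = module.traffic_manager.fqdn
}

output "shop_traffic_manager_endpoint_ids" {
  description = "Endpoint name => resource id, across Azure and external endpoints."
  value       = local.traffic_manager_endpoint_ids
}
```

The merge is safe because endpoint names are unique keys of the single endpoints input map, so the two output maps cannot collide.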

Enterprise scenario

A retail platform runs its checkout API as identical App Service stacks in West Europe and North Europe behind a single shop.kloudvin.com. The team consumes this module with Priority routing: West Europe at priority 1, North Europe at priority 2, and an on-prem DR site as a disabled external endpoint that they enable only during a regional outage. The monitor_config probes /healthz over HTTPS every 30 seconds with three tolerated failures, so a dead region is pulled from DNS answers about 100 seconds after its first failed probe (four consecutive failures: three 30-second intervals plus the last probe's 10-second timeout), and the 30-second dns_ttl keeps client cutover fast without flooding the resolvers. Because the module is versioned in Azure Repos, the same pattern is reused verbatim for the cart, search, and account services by changing only the name, dns_relative_name, and endpoint targets.
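
The scenario could be expressed roughly as follows. This is a sketch, not the team's actual configuration: the profile name and DNS label are invented, and the resource references (azurerm_resource_group.global, azurerm_linux_web_app.weu and .neu) are borrowed from the usage example and assumed to exist in the calling configuration.

```hcl
module "traffic_manager_checkout" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-traffic-manager?ref=v1.0.0"

  name                   = "tm-checkout-prod"       # placeholder
  resource_group_name    = azurerm_resource_group.global.name
  dns_relative_name      = "kloudvin-checkout-prod" # placeholder; must be globally unique
  traffic_routing_method = "Priority"
  dns_ttl                = 30

  monitor_config = {
    path                         = "/healthz"
    interval_in_seconds          = 30
    tolerated_number_of_failures = 3
  }

  endpoints = {
    # Lowest priority number wins while its endpoint is healthy.
    "weu-app" = { type = "azure", target_resource_id = azurerm_linux_web_app.weu.id, priority = 1 }
    "neu-app" = { type = "azure", target_resource_id = azurerm_linux_web_app.neu.id, priority = 2 }

    # Kept disabled in normal operation; flip enabled to true during a regional outage.
    "dc-onprem" = {
      type     = "external"
      target   = "shop-dr.contoso.internal.example.com"
      priority = 3
      enabled  = false
    }
  }
}
```

Reusing the pattern for the cart, search, and account services would then mean copying this block and changing name, dns_relative_name, and the endpoint targets.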

Best practices
