
Terraform Module: Azure Network Watcher — One enabled regional instance, no accidental duplicates

Quick take — A reusable hashicorp/azurerm ~> 4.0 module for azurerm_network_watcher that pins one Network Watcher per region, wires in flow logs and connection monitors, and keeps the auto-created NetworkWatcherRG under control. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "azurerm" {
  features {}
}

module "network_watcher" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"

  name     = "..."  # Name of the Network Watcher (e.g. `nw-eastus-prod`). On…
  location = "..."  # Azure region. A region holds only one Network Watcher p…
}

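Then terraform init && terraform apply. Every other input has a default; see Inputs below to override behaviour. Note that the defaults create a resource group named NetworkWatcherRG, so if Azure has already auto-created that group in your subscription, set create_resource_group = false or choose a different resource_group_name.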

What this module is

Azure Network Watcher is the regional service behind packet-level network diagnostics: NSG flow logs, VNet flow logs, connection monitors, IP flow verify, next-hop, packet capture, and the topology view. The catch is that it is a singleton per region per subscription. When you create a virtual network in a region, Azure silently auto-provisions a Network Watcher named NetworkWatcher_<region> inside a resource group called NetworkWatcherRG. If your Terraform also tries to create one, the apply usually fails: either with an “already exists” conflict that has to be resolved by importing the resource, or, when the name or resource group differs, because Azure refuses a second watcher in a region that already has one.

This module wraps azurerm_network_watcher so that your IaC owns exactly one explicit, named, tagged instance per region — instead of leaving Azure’s implicit one floating around unmanaged. It optionally provisions the resource group that holds it, attaches a Storage Account-backed NSG flow log with traffic analytics, and stands up a connection monitor so reachability checks are codified rather than clicked together in the portal. Wrapping it in a module means the “one per region” rule, the naming convention, and the flow-log retention policy are enforced identically across every subscription in the landing zone.
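One way to bring Azure's implicit watcher under this module, rather than colliding with it, is a Terraform import block (available from Terraform 1.5, so within the module's >= 1.6.0 constraint). The following is a minimal sketch, not part of the module itself. It assumes the auto-created watcher follows the NetworkWatcher_<region> naming described above, that the provider is configured as in the Quickstart, and it uses a placeholder subscription ID.

```hcl
# Assumption: Azure has already auto-created NetworkWatcher_eastus in NetworkWatcherRG.
import {
  # Address of the watcher resource inside the module call below.
  to = module.network_watcher.azurerm_network_watcher.this

  # Placeholder subscription ID; replace it with your own.
  id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/NetworkWatcherRG/providers/Microsoft.Network/networkWatchers/NetworkWatcher_eastus"
}

module "network_watcher" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"

  name                  = "NetworkWatcher_eastus"
  location              = "eastus"
  resource_group_name   = "NetworkWatcherRG"
  create_resource_group = false # adopt the existing group instead of creating it
}
```

Running terraform plan with this in place should report the watcher as an import rather than a create. Once the import has been applied, the import block can be removed.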

When to use it

Module structure

terraform-module-azure-network-watcher/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf

versions.tf

terraform {
  required_version = ">= 1.6.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

main.tf

# Optionally own the resource group that holds the Network Watcher.
# Azure auto-creates "NetworkWatcherRG"; set create_resource_group = false
# and point resource_group_name at it to adopt the existing one instead.
resource "azurerm_resource_group" "this" {
  count = var.create_resource_group ? 1 : 0

  name     = var.resource_group_name
  location = var.location
  tags     = var.tags
}

locals {
  resource_group_name = var.create_resource_group ? azurerm_resource_group.this[0].name : var.resource_group_name
}

# The singleton-per-region Network Watcher. Only ONE may exist per region per
# subscription, so this module is intended to be called once per region.
resource "azurerm_network_watcher" "this" {
  name                = var.name
  location            = var.location
  resource_group_name = local.resource_group_name
  tags                = var.tags
}

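# NSG flow log (optional). Requires an existing Storage Account in the same
# region, and a Log Analytics workspace if traffic analytics is enabled.
# Note: Microsoft has announced the retirement of NSG flow logs in favour of
# VNet flow logs, so check current availability before creating new ones.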
resource "azurerm_network_watcher_flow_log" "this" {
  for_each = var.flow_logs

  name                      = each.value.name
  network_watcher_name      = azurerm_network_watcher.this.name
  resource_group_name       = local.resource_group_name
  network_security_group_id = each.value.network_security_group_id
  storage_account_id        = each.value.storage_account_id
  enabled                   = each.value.enabled
  version                   = each.value.version
  tags                      = var.tags

  retention_policy {
    enabled = each.value.retention_enabled
    days    = each.value.retention_days
  }

  dynamic "traffic_analytics" {
    for_each = each.value.traffic_analytics == null ? [] : [each.value.traffic_analytics]
    content {
      enabled               = traffic_analytics.value.enabled
      workspace_id          = traffic_analytics.value.workspace_id
      workspace_region      = traffic_analytics.value.workspace_region
      workspace_resource_id = traffic_analytics.value.workspace_resource_id
      interval_in_minutes   = traffic_analytics.value.interval_in_minutes
    }
  }
}

# Connection monitor (optional) for codified synthetic reachability tests.
resource "azurerm_network_connection_monitor" "this" {
  for_each = var.connection_monitors

  name               = each.value.name
  network_watcher_id = azurerm_network_watcher.this.id
  location           = var.location
  tags               = var.tags

  dynamic "endpoint" {
    for_each = each.value.endpoints
    content {
      name               = endpoint.value.name
      target_resource_id = endpoint.value.target_resource_id
      address            = endpoint.value.address
    }
  }

  dynamic "test_configuration" {
    for_each = each.value.test_configurations
    content {
      name                      = test_configuration.value.name
      protocol                  = test_configuration.value.protocol
      test_frequency_in_seconds = test_configuration.value.test_frequency_in_seconds

      dynamic "tcp_configuration" {
        for_each = test_configuration.value.tcp_port == null ? [] : [test_configuration.value.tcp_port]
        content {
          port = tcp_configuration.value
        }
      }

      dynamic "http_configuration" {
        for_each = test_configuration.value.http_method == null ? [] : [test_configuration.value.http_method]
        content {
          method = http_configuration.value
        }
      }
    }
  }

  dynamic "test_group" {
    for_each = each.value.test_groups
    content {
      name                     = test_group.value.name
      destination_endpoints    = test_group.value.destination_endpoints
      source_endpoints         = test_group.value.source_endpoints
      test_configuration_names = test_group.value.test_configuration_names
      enabled                  = test_group.value.enabled
    }
  }
}

variables.tf

variable "name" {
  type        = string
  description = "Name of the Network Watcher (e.g. nw-eastus-prod). One per region per subscription."

  validation {
    condition     = length(var.name) >= 1 && length(var.name) <= 80
    error_message = "name must be between 1 and 80 characters."
  }
}

variable "location" {
  type        = string
  description = "Azure region for the Network Watcher. A region may hold only one Network Watcher per subscription."
}

variable "resource_group_name" {
  type        = string
  description = "Resource group that holds the Network Watcher. Azure's implicit watcher uses 'NetworkWatcherRG'."
  default     = "NetworkWatcherRG"
}

variable "create_resource_group" {
  type        = bool
  description = "Create the resource group (true) or adopt an existing one such as NetworkWatcherRG (false)."
  default     = true
}

variable "tags" {
  type        = map(string)
  description = "Tags applied to the resource group, watcher, flow logs, and connection monitors."
  default     = {}
}

variable "flow_logs" {
  description = "Map of NSG flow logs to create. Each requires a Storage Account in the same region as the watcher."
  type = map(object({
    name                      = string
    network_security_group_id = string
    storage_account_id        = string
    enabled                   = optional(bool, true)
    version                   = optional(number, 2)
    retention_enabled         = optional(bool, true)
    retention_days            = optional(number, 90)
    traffic_analytics = optional(object({
      enabled               = optional(bool, true)
      workspace_id          = string
      workspace_region      = string
      workspace_resource_id = string
      interval_in_minutes   = optional(number, 10)
    }))
  }))
  default = {}

  validation {
    condition     = alltrue([for fl in values(var.flow_logs) : contains([1, 2], fl.version)])
    error_message = "flow_logs[*].version must be 1 or 2."
  }

  validation {
    condition     = alltrue([for fl in values(var.flow_logs) : fl.retention_days >= 0 && fl.retention_days <= 365])
    error_message = "flow_logs[*].retention_days must be between 0 and 365."
  }

  validation {
    condition = alltrue([
      for fl in values(var.flow_logs) :
      fl.traffic_analytics == null ? true : contains([10, 60], fl.traffic_analytics.interval_in_minutes)
    ])
    error_message = "traffic_analytics.interval_in_minutes must be 10 or 60."
  }
}

variable "connection_monitors" {
  description = "Map of connection monitors for synthetic reachability tests."
  type = map(object({
    name = string
    endpoints = list(object({
      name               = string
      target_resource_id = optional(string)
      address            = optional(string)
    }))
    test_configurations = list(object({
      name                      = string
      protocol                  = string
      test_frequency_in_seconds = optional(number, 60)
      tcp_port                  = optional(number)
      http_method               = optional(string)
    }))
    test_groups = list(object({
      name                     = string
      destination_endpoints    = list(string)
      source_endpoints         = list(string)
      test_configuration_names = list(string)
      enabled                  = optional(bool, true)
    }))
  }))
  default = {}

  validation {
    condition = alltrue(flatten([
      for cm in values(var.connection_monitors) : [
        for tc in cm.test_configurations : contains(["Tcp", "Http", "Icmp"], tc.protocol)
      ]
    ]))
    error_message = "test_configurations[*].protocol must be one of Tcp, Http, or Icmp."
  }
}

outputs.tf

output "id" {
  description = "Resource ID of the Network Watcher."
  value       = azurerm_network_watcher.this.id
}

output "name" {
  description = "Name of the Network Watcher."
  value       = azurerm_network_watcher.this.name
}

output "location" {
  description = "Region of the Network Watcher."
  value       = azurerm_network_watcher.this.location
}

output "resource_group_name" {
  description = "Resource group that holds the Network Watcher."
  value       = local.resource_group_name
}

output "flow_log_ids" {
  description = "Map of flow-log keys to their resource IDs."
  value       = { for k, fl in azurerm_network_watcher_flow_log.this : k => fl.id }
}

output "connection_monitor_ids" {
  description = "Map of connection-monitor keys to their resource IDs."
  value       = { for k, cm in azurerm_network_connection_monitor.this : k => cm.id }
}

How to use it

module "network_watcher" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"

  name                  = "nw-eastus-prod"
  location              = "eastus"
  resource_group_name   = "rg-network-watcher-prod"
  create_resource_group = true

  flow_logs = {
    hub_nsg = {
      name                      = "fl-hub-nsg"
      network_security_group_id = azurerm_network_security_group.hub.id
      storage_account_id        = azurerm_storage_account.flowlogs.id
      retention_days            = 90
      traffic_analytics = {
        workspace_id          = azurerm_log_analytics_workspace.security.workspace_id
        workspace_region      = "eastus"
        workspace_resource_id = azurerm_log_analytics_workspace.security.id
        interval_in_minutes   = 10
      }
    }
  }

  connection_monitors = {
    hub_to_onprem = {
      name = "cm-hub-to-onprem"
      endpoints = [
        {
          name               = "hub-vm"
          target_resource_id = azurerm_linux_virtual_machine.hub_probe.id
        },
        {
          name    = "onprem-dns"
          address = "10.50.0.10"
        }
      ]
      test_configurations = [
        {
          name                      = "dns-tcp-53"
          protocol                  = "Tcp"
          tcp_port                  = 53
          test_frequency_in_seconds = 30
        }
      ]
      test_groups = [
        {
          name                     = "tg-onprem-dns"
          source_endpoints         = ["hub-vm"]
          destination_endpoints    = ["onprem-dns"]
          test_configuration_names = ["dns-tcp-53"]
        }
      ]
    }
  }

  tags = {
    environment = "prod"
    owner       = "platform-network"
  }
}

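# Downstream: feed the connection monitor ID into a metric alert that fires
# when the monitor reports a reachability drop. The scope is the connection
# monitor rather than the watcher, because the metric namespace below belongs
# to connection monitors.
resource "azurerm_monitor_metric_alert" "reachability" {
  name                = "alert-reachability-eastus"
  resource_group_name = module.network_watcher.resource_group_name
  scopes              = [module.network_watcher.connection_monitor_ids["hub_to_onprem"]]
  description         = "Fires when end-to-end reachability falls below threshold."

  criteria {
    metric_namespace = "Microsoft.Network/networkWatchers/connectionMonitors"
    # Verify the metric name against the monitor's published metrics:
    # ProbesFailedPercent is the classic name, and newer connection monitors
    # may expose ChecksFailedPercent instead.
    metric_name = "ProbesFailedPercent"
    aggregation = "Average"
    operator    = "GreaterThan"
    threshold   = 10
  }
}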

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config: live/terragrunt.hcl (inherited by every module):

remote_state {
  backend  = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm state storage account/container + key per path...
  }
}

2. Module config: live/prod/network_watcher/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"
}

inputs = {
  name     = "..."
  location = "..."
}

3. Deploy this one module, or roll out every module in the environment together (both commands are run from the repository root):

cd live/prod/network_watcher && terragrunt apply   # this module only
cd live/prod && terragrunt run-all apply           # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
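To make the cross-module dependency point concrete, here is a hedged sketch of a module-level terragrunt.hcl that pulls the NSG and Storage Account IDs from sibling stacks through Terragrunt dependency blocks. The sibling paths (../hub_nsg, ../flowlog_storage) and their id outputs are hypothetical; the original layout does not define them.

```hcl
# live/prod/network_watcher/terragrunt.hcl (sketch)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"
}

# Hypothetical sibling stacks; their paths and output names are assumptions.
dependency "nsg" {
  config_path = "../hub_nsg"
}

dependency "storage" {
  config_path = "../flowlog_storage"
}

inputs = {
  name                = "nw-eastus-prod"
  location            = "eastus"
  resource_group_name = "rg-network-watcher-prod"

  flow_logs = {
    hub_nsg = {
      name                      = "fl-hub-nsg"
      network_security_group_id = dependency.nsg.outputs.id
      storage_account_id        = dependency.storage.outputs.id
    }
  }
}
```

With dependency blocks in place, run-all applies the NSG and storage stacks before this one, since their outputs are needed to resolve the inputs.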

Inputs

| Name | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | string | - | Yes | Name of the Network Watcher (e.g. nw-eastus-prod). One per region per subscription. |
| location | string | - | Yes | Azure region. A region holds only one Network Watcher per subscription. |
| resource_group_name | string | "NetworkWatcherRG" | No | Resource group that holds the watcher. |
| create_resource_group | bool | true | No | Create the resource group, or adopt an existing one such as NetworkWatcherRG. |
| tags | map(string) | {} | No | Tags applied to the RG, watcher, flow logs, and connection monitors. |
| flow_logs | map(object) | {} | No | NSG flow logs to create. Each needs a Storage Account in the watcher’s region; optional traffic analytics. |
| connection_monitors | map(object) | {} | No | Connection monitors with endpoints, test configurations, and test groups. |
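As an illustration of how the connection_monitors input fits together, the sketch below defines an HTTP check instead of a TCP one, using the http_method attribute from the module's variable type. The destination address www.example.com is a hypothetical placeholder, and azurerm_linux_virtual_machine.hub_probe is assumed to be declared elsewhere in the calling configuration.

```hcl
module "network_watcher" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"

  name                = "nw-eastus-prod"
  location            = "eastus"
  resource_group_name = "rg-network-watcher-prod"

  connection_monitors = {
    hub_to_web = {
      name = "cm-hub-to-web"
      endpoints = [
        {
          name               = "hub-vm"
          target_resource_id = azurerm_linux_virtual_machine.hub_probe.id # assumed to exist
        },
        {
          name    = "web"
          address = "www.example.com" # hypothetical destination
        }
      ]
      test_configurations = [
        {
          name        = "http-get"
          protocol    = "Http"
          http_method = "Get"
          # test_frequency_in_seconds falls back to the module default of 60.
        }
      ]
      test_groups = [
        {
          name                     = "tg-web"
          source_endpoints         = ["hub-vm"]
          destination_endpoints    = ["web"]
          test_configuration_names = ["http-get"]
        }
      ]
    }
  }
}
```

The names listed in each test group must match the name values of the endpoints and test configurations declared in the same monitor.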

Outputs

| Name | Description |
| --- | --- |
| id | Resource ID of the Network Watcher. |
| name | Name of the Network Watcher. |
| location | Region of the Network Watcher. |
| resource_group_name | Resource group that holds the Network Watcher. |
| flow_log_ids | Map of flow-log keys to their resource IDs. |
| connection_monitor_ids | Map of connection-monitor keys to their resource IDs. |
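Because the two map outputs are keyed by the same keys passed in through flow_logs and connection_monitors, a caller can look up a single ID directly. A small sketch, assuming the module was called with the hub_nsg and hub_to_onprem keys from the usage example; the output names on the calling side are hypothetical.

```hcl
# Re-export one flow log ID from the root module, looked up by its input key.
output "hub_flow_log_id" {
  description = "Resource ID of the flow log keyed hub_nsg."
  value       = module.network_watcher.flow_log_ids["hub_nsg"]
}

# Re-export one connection monitor ID the same way.
output "onprem_monitor_id" {
  description = "Resource ID of the connection monitor keyed hub_to_onprem."
  value       = module.network_watcher.connection_monitor_ids["hub_to_onprem"]
}
```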

Enterprise scenario

A bank’s platform team runs a hub-and-spoke topology across eastus and westeurope in three subscriptions (connectivity, prod, non-prod). They call this module once per region in the connectivity subscription, each time enabling NSG flow logs on the hub firewall subnet with 90-day retention and Traffic Analytics shipping to the central security Log Analytics workspace. Connection monitors continuously probe TCP/53 and TCP/443 from a hub probe VM to the on-prem ExpressRoute DNS and core banking endpoints, and the module’s outputs feed a metric alert that pages the network on-call when ProbesFailedPercent crosses 10%. Because the watcher name, retention, and RG are fixed in the module, every region’s diagnostics look identical and the auditors get a single, predictable place to verify flow-log coverage.
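The "once per region" call pattern in this scenario can be sketched with a module-level for_each. This is an illustrative layout rather than the bank's actual configuration: the per-region resource group naming and the network_watcher_ids output are assumptions introduced here.

```hcl
locals {
  # Regions taken from the scenario above.
  regions = toset(["eastus", "westeurope"])
}

module "network_watcher" {
  for_each = local.regions
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"

  name     = "nw-${each.key}-prod"
  location = each.key

  # One resource group per region, so the two module instances never try to
  # create the same group. The naming pattern is an assumption.
  resource_group_name   = "rg-network-watcher-${each.key}-prod"
  create_resource_group = true

  tags = {
    environment = "prod"
    owner       = "platform-network"
  }
}

# Map of region => watcher ID for downstream consumers.
output "network_watcher_ids" {
  value = { for region, nw in module.network_watcher : region => nw.id }
}
```

Keying the instances by region keeps the state addresses stable (module.network_watcher["eastus"]), so adding a third region later does not disturb the existing two.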

Best practices


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
