Terraform Module: Azure Network Watcher — One enabled regional instance, no accidental duplicates

Quick take — A reusable hashicorp/azurerm ~> 4.0 module for azurerm_network_watcher that pins one Network Watcher per region, wires in flow logs and connection monitors, and keeps the auto-created NetworkWatcherRG under control. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "azurerm" {
  features {}
  # azurerm 4.x needs a subscription ID: set subscription_id here or export ARM_SUBSCRIPTION_ID.
}

module "network_watcher" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"

  name     = "..."  # Name of the Network Watcher (e.g. `nw-eastus-prod`). On…
  location = "..."  # Azure region. A region holds only one Network Watcher p…
}

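Then run terraform init && terraform apply. Every other input has a default; see Inputs below to override behaviour. Note that the defaults create a resource group named NetworkWatcherRG. If Azure has already auto-created that group or a watcher in the region, set create_resource_group = false and import the existing watcher rather than creating a second one.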
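
If the subscription already has a VNet in the region, Azure has most likely created NetworkWatcher_<region> in NetworkWatcherRG, and a fresh create would collide with it. One way to adopt the existing watcher is a Terraform import block in the root module. The sketch below assumes the eastus region and uses a placeholder subscription ID; neither value comes from the module itself.

```hcl
# Sketch: adopt Azure's auto-created watcher instead of creating a second one.
# <subscription-id> is a placeholder and eastus is an assumed region.
import {
  to = module.network_watcher.azurerm_network_watcher.this
  id = "/subscriptions/<subscription-id>/resourceGroups/NetworkWatcherRG/providers/Microsoft.Network/networkWatchers/NetworkWatcher_eastus"
}

module "network_watcher" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"

  name                  = "NetworkWatcher_eastus" # must match the existing watcher's name
  location              = "eastus"
  resource_group_name   = "NetworkWatcherRG"
  create_resource_group = false # the resource group already exists
}
```

Run terraform plan first: it should report the watcher as imported rather than created.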

What this module is

Azure Network Watcher is the regional service behind packet-level network diagnostics: NSG flow logs, VNet flow logs, connection monitors, IP flow verify, next-hop, packet capture, and the topology view. The catch is that it is a singleton per region per subscription. When you create a virtual network in a region, Azure silently auto-provisions a Network Watcher named NetworkWatcher_<region> inside a resource group called NetworkWatcherRG. If your Terraform also tries to create one, the apply fails: either with an “already exists” conflict that has to be resolved by an import, or with a rejection because the region already holds its one watcher in a different resource group.

This module wraps azurerm_network_watcher so that your IaC owns exactly one explicit, named, tagged instance per region — instead of leaving Azure’s implicit one floating around unmanaged. It optionally provisions the resource group that holds it, attaches a Storage Account-backed NSG flow log with traffic analytics, and stands up a connection monitor so reachability checks are codified rather than clicked together in the portal. Wrapping it in a module means the “one per region” rule, the naming convention, and the flow-log retention policy are enforced identically across every subscription in the landing zone.

When to use it

You are building a landing zone and want explicit, tagged Network Watcher instances instead of the implicit NetworkWatcher_<region> that Azure auto-creates, so it shows up in your CMDB and cost/governance reports.
You need NSG or VNet flow logs with Traffic Analytics enabled consistently across regions for security monitoring, retained for a fixed number of days to satisfy audit requirements.
You run synthetic reachability tests (connection monitors) between hubs, spokes, on-prem gateways, or external endpoints and want them version-controlled.
You manage multiple regions/subscriptions and want one module call per region to guarantee no duplicate watchers and a uniform naming scheme.
Skip this module if you only ever use Azure’s auto-created watcher and have zero flow-log or connection-monitor requirements — in that case there is nothing to manage.
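
For the multi-region case, one module call per region can be expressed with for_each on the module block. This is a minimal sketch: the region list and the naming pattern are assumptions for illustration, and each region gets its own resource group so that two instances never try to create the same group.

```hcl
locals {
  # Hypothetical region list; replace with the regions you actually deploy to.
  watcher_regions = toset(["eastus", "westeurope"])
}

module "network_watcher" {
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"
  for_each = local.watcher_regions

  # One instance per region, so the one-watcher-per-region rule holds by construction.
  name                = "nw-${each.key}-prod"
  location            = each.key
  resource_group_name = "rg-network-watcher-${each.key}-prod"
}
```

Outputs are then addressed per region, for example module.network_watcher["eastus"].id.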

Module structure

terraform-module-azure-network-watcher/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf

versions.tf

terraform {
  required_version = ">= 1.6.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

main.tf

# Optionally own the resource group that holds the Network Watcher.
# Azure auto-creates "NetworkWatcherRG"; set create_resource_group = false
# and point resource_group_name at it to adopt the existing one instead.
resource "azurerm_resource_group" "this" {
  count = var.create_resource_group ? 1 : 0

  name     = var.resource_group_name
  location = var.location
  tags     = var.tags
}

locals {
  resource_group_name = var.create_resource_group ? azurerm_resource_group.this[0].name : var.resource_group_name
}

# The singleton-per-region Network Watcher. Only ONE may exist per region per
# subscription, so this module is intended to be called once per region.
resource "azurerm_network_watcher" "this" {
  name                = var.name
  location            = var.location
  resource_group_name = local.resource_group_name
  tags                = var.tags
}

# NSG flow log (optional). Requires an existing Storage Account in the same
# region, and a Log Analytics workspace if traffic analytics is enabled.
resource "azurerm_network_watcher_flow_log" "this" {
  for_each = var.flow_logs

  name                      = each.value.name
  network_watcher_name      = azurerm_network_watcher.this.name
  resource_group_name       = local.resource_group_name
  network_security_group_id = each.value.network_security_group_id
  storage_account_id        = each.value.storage_account_id
  enabled                   = each.value.enabled
  version                   = each.value.version
  tags                      = var.tags

  retention_policy {
    enabled = each.value.retention_enabled
    days    = each.value.retention_days
  }

  dynamic "traffic_analytics" {
    for_each = each.value.traffic_analytics == null ? [] : [each.value.traffic_analytics]
    content {
      enabled               = traffic_analytics.value.enabled
      workspace_id          = traffic_analytics.value.workspace_id
      workspace_region      = traffic_analytics.value.workspace_region
      workspace_resource_id = traffic_analytics.value.workspace_resource_id
      interval_in_minutes   = traffic_analytics.value.interval_in_minutes
    }
  }
}

# Connection monitor (optional) for codified synthetic reachability tests.
resource "azurerm_network_connection_monitor" "this" {
  for_each = var.connection_monitors

  name               = each.value.name
  network_watcher_id = azurerm_network_watcher.this.id
  location           = var.location
  tags               = var.tags

  dynamic "endpoint" {
    for_each = each.value.endpoints
    content {
      name               = endpoint.value.name
      target_resource_id = endpoint.value.target_resource_id
      address            = endpoint.value.address
    }
  }

  dynamic "test_configuration" {
    for_each = each.value.test_configurations
    content {
      name                      = test_configuration.value.name
      protocol                  = test_configuration.value.protocol
      test_frequency_in_seconds = test_configuration.value.test_frequency_in_seconds

      dynamic "tcp_configuration" {
        for_each = test_configuration.value.tcp_port == null ? [] : [test_configuration.value.tcp_port]
        content {
          port = tcp_configuration.value
        }
      }

      dynamic "http_configuration" {
        for_each = test_configuration.value.http_method == null ? [] : [test_configuration.value.http_method]
        content {
          method = http_configuration.value
        }
      }
    }
  }

  dynamic "test_group" {
    for_each = each.value.test_groups
    content {
      name                     = test_group.value.name
      destination_endpoints    = test_group.value.destination_endpoints
      source_endpoints         = test_group.value.source_endpoints
      test_configuration_names = test_group.value.test_configuration_names
      enabled                  = test_group.value.enabled
    }
  }
}
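
The flow-log resource above targets NSGs through network_security_group_id. A VNet-scoped variant might look like the sketch below. It assumes the pinned azurerm release exposes target_resource_id on azurerm_network_watcher_flow_log (check the provider documentation for your version), and the two variables are hypothetical and not part of this module.

```hcl
# Sketch only: a VNet-level flow log alongside the NSG-based ones.
resource "azurerm_network_watcher_flow_log" "vnet" {
  name                 = "fl-hub-vnet" # hypothetical name
  network_watcher_name = azurerm_network_watcher.this.name
  resource_group_name  = local.resource_group_name
  target_resource_id   = var.hub_virtual_network_id      # hypothetical variable
  storage_account_id   = var.flow_log_storage_account_id # hypothetical variable
  enabled              = true

  retention_policy {
    enabled = true
    days    = 90
  }
}
```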

variables.tf

variable "name" {
  type        = string
  description = "Name of the Network Watcher (e.g. nw-eastus-prod). One per region per subscription."

  validation {
    condition     = length(var.name) >= 1 && length(var.name) <= 80
    error_message = "name must be between 1 and 80 characters."
  }
}

variable "location" {
  type        = string
  description = "Azure region for the Network Watcher. A region may hold only one Network Watcher per subscription."
}

variable "resource_group_name" {
  type        = string
  description = "Resource group that holds the Network Watcher. Azure's implicit watcher uses 'NetworkWatcherRG'."
  default     = "NetworkWatcherRG"
}

variable "create_resource_group" {
  type        = bool
  description = "Create the resource group (true) or adopt an existing one such as NetworkWatcherRG (false)."
  default     = true
}

variable "tags" {
  type        = map(string)
  description = "Tags applied to the resource group, watcher, flow logs, and connection monitors."
  default     = {}
}

variable "flow_logs" {
  description = "Map of NSG flow logs to create. Each requires a Storage Account in the same region as the watcher."
  type = map(object({
    name                      = string
    network_security_group_id = string
    storage_account_id        = string
    enabled                   = optional(bool, true)
    version                   = optional(number, 2)
    retention_enabled         = optional(bool, true)
    retention_days            = optional(number, 90)
    traffic_analytics = optional(object({
      enabled               = optional(bool, true)
      workspace_id          = string
      workspace_region      = string
      workspace_resource_id = string
      interval_in_minutes   = optional(number, 10)
    }))
  }))
  default = {}

  validation {
    condition     = alltrue([for fl in values(var.flow_logs) : contains([1, 2], fl.version)])
    error_message = "flow_logs[*].version must be 1 or 2."
  }

  validation {
    condition     = alltrue([for fl in values(var.flow_logs) : fl.retention_days >= 0 && fl.retention_days <= 365])
    error_message = "flow_logs[*].retention_days must be between 0 and 365."
  }

  validation {
    condition = alltrue([
      for fl in values(var.flow_logs) :
      fl.traffic_analytics == null ? true : contains([10, 60], fl.traffic_analytics.interval_in_minutes)
    ])
    error_message = "traffic_analytics.interval_in_minutes must be 10 or 60."
  }
}

variable "connection_monitors" {
  description = "Map of connection monitors for synthetic reachability tests."
  type = map(object({
    name = string
    endpoints = list(object({
      name               = string
      target_resource_id = optional(string)
      address            = optional(string)
    }))
    test_configurations = list(object({
      name                      = string
      protocol                  = string
      test_frequency_in_seconds = optional(number, 60)
      tcp_port                  = optional(number)
      http_method               = optional(string)
    }))
    test_groups = list(object({
      name                     = string
      destination_endpoints    = list(string)
      source_endpoints         = list(string)
      test_configuration_names = list(string)
      enabled                  = optional(bool, true)
    }))
  }))
  default = {}

  validation {
    condition = alltrue(flatten([
      for cm in values(var.connection_monitors) : [
        for tc in cm.test_configurations : contains(["Tcp", "Http", "Icmp"], tc.protocol)
      ]
    ]))
    error_message = "test_configurations[*].protocol must be one of Tcp, Http, or Icmp."
  }
}
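
The protocol check above does not catch a Tcp test configuration that omits tcp_port, in which case main.tf would emit no tcp_configuration block at all. A possible extra guard, assuming a Tcp test is only valid when a port is supplied, is a second validation block placed inside variable "connection_monitors":

```hcl
# Sketch: add inside variable "connection_monitors", next to the protocol check.
validation {
  condition = alltrue(flatten([
    for cm in values(var.connection_monitors) : [
      for tc in cm.test_configurations : tc.protocol != "Tcp" || tc.tcp_port != null
    ]
  ]))
  error_message = "test_configurations[*].tcp_port is required when protocol is Tcp."
}
```

Failing at plan time with a clear message is usually cheaper than discovering the problem from an API error during apply.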

outputs.tf

output "id" {
  description = "Resource ID of the Network Watcher."
  value       = azurerm_network_watcher.this.id
}

output "name" {
  description = "Name of the Network Watcher."
  value       = azurerm_network_watcher.this.name
}

output "location" {
  description = "Region of the Network Watcher."
  value       = azurerm_network_watcher.this.location
}

output "resource_group_name" {
  description = "Resource group that holds the Network Watcher."
  value       = local.resource_group_name
}

output "flow_log_ids" {
  description = "Map of flow-log keys to their resource IDs."
  value       = { for k, fl in azurerm_network_watcher_flow_log.this : k => fl.id }
}

output "connection_monitor_ids" {
  description = "Map of connection-monitor keys to their resource IDs."
  value       = { for k, cm in azurerm_network_connection_monitor.this : k => cm.id }
}

How to use it

module "network_watcher" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"

  name                  = "nw-eastus-prod"
  location              = "eastus"
  resource_group_name   = "rg-network-watcher-prod"
  create_resource_group = true

  flow_logs = {
    hub_nsg = {
      name                      = "fl-hub-nsg"
      network_security_group_id = azurerm_network_security_group.hub.id
      storage_account_id        = azurerm_storage_account.flowlogs.id
      retention_days            = 90
      traffic_analytics = {
        workspace_id          = azurerm_log_analytics_workspace.security.workspace_id
        workspace_region      = "eastus"
        workspace_resource_id = azurerm_log_analytics_workspace.security.id
        interval_in_minutes   = 10
      }
    }
  }

  connection_monitors = {
    hub_to_onprem = {
      name = "cm-hub-to-onprem"
      endpoints = [
        {
          name               = "hub-vm"
          target_resource_id = azurerm_linux_virtual_machine.hub_probe.id
        },
        {
          name    = "onprem-dns"
          address = "10.50.0.10"
        }
      ]
      test_configurations = [
        {
          name                      = "dns-tcp-53"
          protocol                  = "Tcp"
          tcp_port                  = 53
          test_frequency_in_seconds = 30
        }
      ]
      test_groups = [
        {
          name                     = "tg-onprem-dns"
          source_endpoints         = ["hub-vm"]
          destination_endpoints    = ["onprem-dns"]
          test_configuration_names = ["dns-tcp-53"]
        }
      ]
    }
  }

  tags = {
    environment = "prod"
    owner       = "platform-network"
  }
}

# Downstream: feed the connection monitor ID into a metric alert that fires
# when the monitor reports a reachability drop. The metric is emitted by the
# connection monitor resource, so that (not the watcher) is the alert scope.
resource "azurerm_monitor_metric_alert" "reachability" {
  name                = "alert-reachability-eastus"
  resource_group_name = module.network_watcher.resource_group_name
  scopes              = [module.network_watcher.connection_monitor_ids["hub_to_onprem"]]
  description         = "Fires when end-to-end reachability falls below threshold."

  criteria {
    metric_namespace = "Microsoft.Network/networkWatchers/connectionMonitors"
    metric_name      = "ChecksFailedPercent" # ProbesFailedPercent belongs to the classic connection monitor
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 10
  }
}
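
The same call shape works for a cheaper non-prod profile. The sketch below assumes it runs against a separate non-prod subscription (a second watcher in the same subscription and region would be rejected), and the referenced NSG, Storage Account, and workspace resources are hypothetical.

```hcl
module "network_watcher_nonprod" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"

  name                = "nw-eastus-nonprod"
  location            = "eastus"
  resource_group_name = "rg-network-watcher-nonprod"

  flow_logs = {
    spoke_nsg = {
      name                      = "fl-spoke-nsg"
      network_security_group_id = azurerm_network_security_group.spoke.id  # hypothetical
      storage_account_id        = azurerm_storage_account.flowlogs_nonprod.id # hypothetical
      retention_days            = 30 # shorter retention than prod
      traffic_analytics = {
        workspace_id          = azurerm_log_analytics_workspace.nonprod.workspace_id # hypothetical
        workspace_region      = "eastus"
        workspace_resource_id = azurerm_log_analytics_workspace.nonprod.id
        interval_in_minutes   = 60 # coarser and cheaper than the 10-minute default
      }
    }
  }
}
```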

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config — live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "azurerm"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...azurerm storage account/container + state key per path...
  }
}
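
The root config shown here only generates the backend. Because the module itself contains no provider block, the root would also need to generate one for the provider to live in a single place. A minimal sketch, assuming the subscription ID is supplied through the ARM_SUBSCRIPTION_ID environment variable:

```hcl
# live/terragrunt.hcl (addition): write provider.tf into every module's working directory.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "azurerm" {
  features {}
}
EOF
}
```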

2. Module config — live/prod/network_watcher/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"
}

inputs = {
  name     = "..."
  location = "..."
}

3. Deploy one environment, or roll out all modules together:

(cd live/prod/network_watcher && terragrunt apply)   # this module only
(cd live/prod && terragrunt run-all apply)           # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.

Inputs

Name	Type	Default	Required	Description
`name`	`string`	—	Yes	Name of the Network Watcher (e.g. `nw-eastus-prod`). One per region per subscription.
`location`	`string`	—	Yes	Azure region. A region holds only one Network Watcher per subscription.
`resource_group_name`	`string`	`"NetworkWatcherRG"`	No	Resource group that holds the watcher.
`create_resource_group`	`bool`	`true`	No	Create the resource group, or adopt an existing one such as `NetworkWatcherRG`.
`tags`	`map(string)`	`{}`	No	Tags applied to the RG, watcher, flow logs, and connection monitors.
`flow_logs`	`map(object)`	`{}`	No	NSG flow logs to create. Each needs a Storage Account in the watcher’s region; optional traffic analytics.
`connection_monitors`	`map(object)`	`{}`	No	Connection monitors with endpoints, test configurations, and test groups.

Outputs

Name	Description
`id`	Resource ID of the Network Watcher.
`name`	Name of the Network Watcher.
`location`	Region of the Network Watcher.
`resource_group_name`	Resource group that holds the Network Watcher.
`flow_log_ids`	Map of flow-log keys to their resource IDs.
`connection_monitor_ids`	Map of connection-monitor keys to their resource IDs.

Enterprise scenario

A bank’s platform team runs a hub-and-spoke topology across eastus and westeurope in three subscriptions (connectivity, prod, non-prod). They call this module once per region in the connectivity subscription, each time enabling NSG flow logs on the hub firewall subnet with 90-day retention and Traffic Analytics shipping to the central security Log Analytics workspace. Connection monitors continuously probe TCP/53 and TCP/443 from a hub probe VM to the on-prem ExpressRoute DNS and core banking endpoints, and the connection_monitor_ids output feeds a metric alert that pages the network on-call when ChecksFailedPercent crosses 10%. Because the watcher name, retention, and RG are fixed in the module, every region’s diagnostics look identical and the auditors get a single, predictable place to verify flow-log coverage.
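
Pinning the calls to the connectivity subscription could be sketched with an aliased provider passed into each module call. The variable and resource group names below are assumptions for illustration, not part of the module.

```hcl
variable "connectivity_subscription_id" {
  type        = string
  description = "Hypothetical input: ID of the connectivity subscription."
}

provider "azurerm" {
  alias           = "connectivity"
  subscription_id = var.connectivity_subscription_id
  features {}
}

module "network_watcher_eastus" {
  source    = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"
  providers = { azurerm = azurerm.connectivity }

  name                = "nw-eastus-prod"
  location            = "eastus"
  resource_group_name = "rg-network-watcher-eastus-prod"
}

module "network_watcher_westeurope" {
  source    = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"
  providers = { azurerm = azurerm.connectivity }

  name                = "nw-westeurope-prod"
  location            = "westeurope"
  resource_group_name = "rg-network-watcher-westeurope-prod"
}
```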

Best practices

Treat it as a per-region singleton. Azure auto-creates NetworkWatcher_<region> in NetworkWatcherRG the moment a VNet appears. Call this module exactly once per region and either adopt that RG (create_resource_group = false, resource_group_name = "NetworkWatcherRG", terraform import) or fully own a named RG — never let two watchers coexist in one region.
Keep flow-log Storage Accounts in the same region as the watcher and lock them down. Network Watcher cannot write NSG flow logs to a Storage Account in another region. Use a dedicated account with min_tls_version = "TLS1_2", public access disabled, and a lifecycle rule that expires blobs in line with retention_days so cold flow-log data does not pile up cost.
Tune retention and Traffic Analytics interval for cost. Flow logs and a 10-minute Traffic Analytics interval are the expensive parts, not the watcher itself (which is free). Use 60-minute intervals and shorter retention in non-prod; reserve 10-minute granularity and 90+ day retention for regulated prod subnets.
Name by region and environment, not generically. nw-eastus-prod immediately tells operators which region and subscription tier they are looking at; avoid nw-01-style names that collide across the estate and defeat the point of explicit management.
Prefer VNet flow logs over per-NSG flow logs going forward. Microsoft is retiring NSG flow logs in favor of VNet flow logs; standardize new deployments on VNet-level capture so you get full coverage with fewer log resources and an easier migration path.
Scope diagnostic permissions tightly. Grant the platform pipeline only Network Contributor on the watcher RG plus write access to the flow-log Storage Account and workspace — not subscription-wide rights — so the singleton and its logs cannot be reconfigured by unrelated teams.
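
The Storage Account guidance above could be sketched as follows. The account name and the azurerm_resource_group.logging reference are hypothetical, "public access disabled" is interpreted here as disabling anonymous blob access, and the 90-day expiry is assumed to match the module's default retention_days.

```hcl
resource "azurerm_storage_account" "flowlogs" {
  name                            = "stflowlogseastusprod" # hypothetical; must be globally unique
  resource_group_name             = azurerm_resource_group.logging.name # hypothetical resource group
  location                        = "eastus" # same region as the watcher
  account_tier                    = "Standard"
  account_replication_type        = "LRS"
  min_tls_version                 = "TLS1_2"
  allow_nested_items_to_be_public = false
}

# Expire flow-log blobs in line with retention_days so old data does not accrue cost.
resource "azurerm_storage_management_policy" "flowlogs" {
  storage_account_id = azurerm_storage_account.flowlogs.id

  rule {
    name    = "expire-flow-logs"
    enabled = true

    filters {
      blob_types   = ["blockBlob"]
      prefix_match = ["insights-logs-networksecuritygroupflowevent"] # container used by NSG flow logs
    }

    actions {
      base_blob {
        delete_after_days_since_modification_greater_than = 90
      }
    }
  }
}
```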