Quick take — A reusable hashicorp/azurerm ~> 4.0 module for azurerm_network_watcher that pins one Network Watcher per region, wires in flow logs and connection monitors, and keeps the auto-created NetworkWatcherRG under control. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.
Quickstart (copy-paste)
Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):
provider "azurerm" {
features {}
}
module "network_watcher" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"
name = "..." # Name of the Network Watcher (e.g. `nw-eastus-prod`). On…
location = "..." # Azure region. A region holds only one Network Watcher p…
}
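Then run terraform init && terraform apply. Every other input has a default; see Inputs below to override behaviour. Note that the defaults create a resource group named NetworkWatcherRG, so if Azure has already auto-created that group or a watcher in the target region, the apply fails with an "already exists" error until those resources are imported (see Best practices).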
What this module is
Azure Network Watcher is the regional service behind packet-level network diagnostics: NSG flow logs, VNet flow logs, connection monitors, IP flow verify, next-hop, packet capture, and the topology view. The catch is that it is a singleton per region per subscription. When you create a virtual network in a region, Azure silently auto-provisions a Network Watcher named NetworkWatcher_<region> inside a resource group called NetworkWatcherRG. If your Terraform also tries to create one, the apply fails: either with an “already exists” error that calls for an import, or with Azure rejecting a second watcher for the region.
This module wraps azurerm_network_watcher so that your IaC owns exactly one explicit, named, tagged instance per region — instead of leaving Azure’s implicit one floating around unmanaged. It optionally provisions the resource group that holds it, attaches a Storage Account-backed NSG flow log with traffic analytics, and stands up a connection monitor so reachability checks are codified rather than clicked together in the portal. Wrapping it in a module means the “one per region” rule, the naming convention, and the flow-log retention policy are enforced identically across every subscription in the landing zone.
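If Azure has already auto-created a watcher in the target region, one way to bring it under this module's control is a Terraform import block (available from Terraform 1.5, so within the module's >= 1.6.0 constraint). The following is a minimal sketch, assuming the eastus region and a placeholder subscription ID; the module inputs have to match the existing watcher's name and resource group for the import to line up.

```hcl
# Adopt Azure's auto-created watcher instead of creating a second one.
# The subscription ID below is a placeholder, not a real value.
import {
  to = module.network_watcher.azurerm_network_watcher.this
  id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/NetworkWatcherRG/providers/Microsoft.Network/networkWatchers/NetworkWatcher_eastus"
}

module "network_watcher" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"

  name                  = "NetworkWatcher_eastus" # must match the existing watcher
  location              = "eastus"
  resource_group_name   = "NetworkWatcherRG"
  create_resource_group = false # the RG already exists; do not try to create it
}
```

Running terraform plan with this in place should report the watcher as an import rather than a create. Once the import has been applied, the import block can be removed.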
When to use it
- You are building a landing zone and want explicit, tagged Network Watcher instances instead of the implicit NetworkWatcher_<region> that Azure auto-creates, so they show up in your CMDB and cost/governance reports.
- You need NSG or VNet flow logs with Traffic Analytics enabled consistently across regions for security monitoring, retained for a fixed number of days to satisfy audit requirements.
- You run synthetic reachability tests (connection monitors) between hubs, spokes, on-prem gateways, or external endpoints and want them version-controlled.
- You manage multiple regions/subscriptions and want one module call per region to guarantee no duplicate watchers and a uniform naming scheme.
- Skip this module if you only ever use Azure’s auto-created watcher and have zero flow-log or connection-monitor requirements — in that case there is nothing to manage.
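The "one module call per region" pattern from the list above can be expressed with for_each on the module block. This is a sketch only: the region map, the watcher names, and the per-region resource group naming are illustrative assumptions, not part of the module.

```hcl
# Hypothetical region map: keys are Azure region names, values are watcher names.
locals {
  network_watcher_regions = {
    eastus     = "nw-eastus-prod"
    westeurope = "nw-westeurope-prod"
  }
}

# One module instance per region, so each region gets exactly one watcher.
module "network_watcher" {
  source   = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"
  for_each = local.network_watcher_regions

  name                  = each.value
  location              = each.key
  resource_group_name   = "rg-network-watcher-${each.key}-prod" # assumed naming scheme
  create_resource_group = true

  tags = {
    environment = "prod"
    owner       = "platform-network"
  }
}

# Watcher IDs keyed by region, for downstream consumers.
output "network_watcher_ids" {
  value = { for region, nw in module.network_watcher : region => nw.id }
}
```

Because the map key is the region name, adding a second entry for the same region is impossible by construction, which keeps the singleton rule visible in code review.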
Module structure
terraform-module-azure-network-watcher/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf
versions.tf
terraform {
required_version = ">= 1.6.0"
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "~> 4.0"
}
}
}
main.tf
# Optionally own the resource group that holds the Network Watcher.
# Azure auto-creates "NetworkWatcherRG"; set create_resource_group = false
# and point resource_group_name at it to adopt the existing one instead.
resource "azurerm_resource_group" "this" {
count = var.create_resource_group ? 1 : 0
name = var.resource_group_name
location = var.location
tags = var.tags
}
locals {
resource_group_name = var.create_resource_group ? azurerm_resource_group.this[0].name : var.resource_group_name
}
# The singleton-per-region Network Watcher. Only ONE may exist per region per
# subscription, so this module is intended to be called once per region.
resource "azurerm_network_watcher" "this" {
name = var.name
location = var.location
resource_group_name = local.resource_group_name
tags = var.tags
}
# NSG flow log (optional). Requires an existing Storage Account in the same
# region, and a Log Analytics workspace if traffic analytics is enabled.
resource "azurerm_network_watcher_flow_log" "this" {
for_each = var.flow_logs
name = each.value.name
network_watcher_name = azurerm_network_watcher.this.name
resource_group_name = local.resource_group_name
network_security_group_id = each.value.network_security_group_id
storage_account_id = each.value.storage_account_id
enabled = each.value.enabled
version = each.value.version
tags = var.tags
retention_policy {
enabled = each.value.retention_enabled
days = each.value.retention_days
}
dynamic "traffic_analytics" {
for_each = each.value.traffic_analytics == null ? [] : [each.value.traffic_analytics]
content {
enabled = traffic_analytics.value.enabled
workspace_id = traffic_analytics.value.workspace_id
workspace_region = traffic_analytics.value.workspace_region
workspace_resource_id = traffic_analytics.value.workspace_resource_id
interval_in_minutes = traffic_analytics.value.interval_in_minutes
}
}
}
# Connection monitor (optional) for codified synthetic reachability tests.
resource "azurerm_network_connection_monitor" "this" {
for_each = var.connection_monitors
name = each.value.name
network_watcher_id = azurerm_network_watcher.this.id
location = var.location
tags = var.tags
dynamic "endpoint" {
for_each = each.value.endpoints
content {
name = endpoint.value.name
target_resource_id = endpoint.value.target_resource_id
address = endpoint.value.address
}
}
dynamic "test_configuration" {
for_each = each.value.test_configurations
content {
name = test_configuration.value.name
protocol = test_configuration.value.protocol
test_frequency_in_seconds = test_configuration.value.test_frequency_in_seconds
dynamic "tcp_configuration" {
for_each = test_configuration.value.tcp_port == null ? [] : [test_configuration.value.tcp_port]
content {
port = tcp_configuration.value
}
}
dynamic "http_configuration" {
for_each = test_configuration.value.http_method == null ? [] : [test_configuration.value.http_method]
content {
method = http_configuration.value
}
}
}
}
dynamic "test_group" {
for_each = each.value.test_groups
content {
name = test_group.value.name
destination_endpoints = test_group.value.destination_endpoints
source_endpoints = test_group.value.source_endpoints
test_configuration_names = test_group.value.test_configuration_names
enabled = test_group.value.enabled
}
}
}
variables.tf
variable "name" {
type = string
description = "Name of the Network Watcher (e.g. nw-eastus-prod). One per region per subscription."
validation {
condition = length(var.name) >= 1 && length(var.name) <= 80
error_message = "name must be between 1 and 80 characters."
}
}
variable "location" {
type = string
description = "Azure region for the Network Watcher. A region may hold only one Network Watcher per subscription."
}
variable "resource_group_name" {
type = string
description = "Resource group that holds the Network Watcher. Azure's implicit watcher uses 'NetworkWatcherRG'."
default = "NetworkWatcherRG"
}
variable "create_resource_group" {
type = bool
description = "Create the resource group (true) or adopt an existing one such as NetworkWatcherRG (false)."
default = true
}
variable "tags" {
type = map(string)
description = "Tags applied to the resource group, watcher, flow logs, and connection monitors."
default = {}
}
variable "flow_logs" {
description = "Map of NSG flow logs to create. Each requires a Storage Account in the same region as the watcher."
type = map(object({
name = string
network_security_group_id = string
storage_account_id = string
enabled = optional(bool, true)
version = optional(number, 2)
retention_enabled = optional(bool, true)
retention_days = optional(number, 90)
traffic_analytics = optional(object({
enabled = optional(bool, true)
workspace_id = string
workspace_region = string
workspace_resource_id = string
interval_in_minutes = optional(number, 10)
}))
}))
default = {}
validation {
condition = alltrue([for fl in values(var.flow_logs) : contains([1, 2], fl.version)])
error_message = "flow_logs[*].version must be 1 or 2."
}
validation {
condition = alltrue([for fl in values(var.flow_logs) : fl.retention_days >= 0 && fl.retention_days <= 365])
error_message = "flow_logs[*].retention_days must be between 0 and 365."
}
validation {
condition = alltrue([
for fl in values(var.flow_logs) :
fl.traffic_analytics == null ? true : contains([10, 60], fl.traffic_analytics.interval_in_minutes)
])
error_message = "traffic_analytics.interval_in_minutes must be 10 or 60."
}
}
variable "connection_monitors" {
description = "Map of connection monitors for synthetic reachability tests."
type = map(object({
name = string
endpoints = list(object({
name = string
target_resource_id = optional(string)
address = optional(string)
}))
test_configurations = list(object({
name = string
protocol = string
test_frequency_in_seconds = optional(number, 60)
tcp_port = optional(number)
http_method = optional(string)
}))
test_groups = list(object({
name = string
destination_endpoints = list(string)
source_endpoints = list(string)
test_configuration_names = list(string)
enabled = optional(bool, true)
}))
}))
default = {}
validation {
condition = alltrue(flatten([
for cm in values(var.connection_monitors) : [
for tc in cm.test_configurations : contains(["Tcp", "Http", "Icmp"], tc.protocol)
]
]))
error_message = "test_configurations[*].protocol must be one of Tcp, Http, or Icmp."
}
}
outputs.tf
output "id" {
description = "Resource ID of the Network Watcher."
value = azurerm_network_watcher.this.id
}
output "name" {
description = "Name of the Network Watcher."
value = azurerm_network_watcher.this.name
}
output "location" {
description = "Region of the Network Watcher."
value = azurerm_network_watcher.this.location
}
output "resource_group_name" {
description = "Resource group that holds the Network Watcher."
value = local.resource_group_name
}
output "flow_log_ids" {
description = "Map of flow-log keys to their resource IDs."
value = { for k, fl in azurerm_network_watcher_flow_log.this : k => fl.id }
}
output "connection_monitor_ids" {
description = "Map of connection-monitor keys to their resource IDs."
value = { for k, cm in azurerm_network_connection_monitor.this : k => cm.id }
}
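The module as listed creates NSG flow logs only, through network_security_group_id. For VNet-level capture, recent 4.x releases of the azurerm provider document a target_resource_id argument on azurerm_network_watcher_flow_log that accepts a virtual network ID. The sketch below relies on that assumption and sits alongside the module call rather than inside it; azurerm_virtual_network.hub and azurerm_storage_account.flowlogs are hypothetical resources defined elsewhere, and the argument should be checked against the provider version you pin.

```hcl
# VNet flow log attached to the watcher that the module manages.
resource "azurerm_network_watcher_flow_log" "hub_vnet" {
  name                 = "fl-hub-vnet"
  network_watcher_name = module.network_watcher.name
  resource_group_name  = module.network_watcher.resource_group_name

  # Assumption: target_resource_id is available in the pinned provider version.
  target_resource_id = azurerm_virtual_network.hub.id
  storage_account_id = azurerm_storage_account.flowlogs.id
  enabled            = true

  retention_policy {
    enabled = true
    days    = 90
  }
}
```

If VNet flow logs become the standard, the same argument could replace network_security_group_id inside the module's flow_logs object, at the cost of a breaking change to the input.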
How to use it
module "network_watcher" {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"
name = "nw-eastus-prod"
location = "eastus"
resource_group_name = "rg-network-watcher-prod"
create_resource_group = true
flow_logs = {
hub_nsg = {
name = "fl-hub-nsg"
network_security_group_id = azurerm_network_security_group.hub.id
storage_account_id = azurerm_storage_account.flowlogs.id
retention_days = 90
traffic_analytics = {
workspace_id = azurerm_log_analytics_workspace.security.workspace_id
workspace_region = "eastus"
workspace_resource_id = azurerm_log_analytics_workspace.security.id
interval_in_minutes = 10
}
}
}
connection_monitors = {
hub_to_onprem = {
name = "cm-hub-to-onprem"
endpoints = [
{
name = "hub-vm"
target_resource_id = azurerm_linux_virtual_machine.hub_probe.id
},
{
name = "onprem-dns"
address = "10.50.0.10"
}
]
test_configurations = [
{
name = "dns-tcp-53"
protocol = "Tcp"
tcp_port = 53
test_frequency_in_seconds = 30
}
]
test_groups = [
{
name = "tg-onprem-dns"
source_endpoints = ["hub-vm"]
destination_endpoints = ["onprem-dns"]
test_configuration_names = ["dns-tcp-53"]
}
]
}
}
tags = {
environment = "prod"
owner = "platform-network"
}
}
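# Downstream: feed a connection monitor ID into a metric alert that fires when
# that monitor reports a reachability drop. The metric lives on the connection
# monitor resource, so the alert is scoped to it rather than to the watcher.
resource "azurerm_monitor_metric_alert" "reachability" {
name = "alert-reachability-eastus"
resource_group_name = module.network_watcher.resource_group_name
scopes = [module.network_watcher.connection_monitor_ids["hub_to_onprem"]]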
description = "Fires when end-to-end reachability falls below threshold."
criteria {
metric_namespace = "Microsoft.Network/networkWatchers/connectionMonitors"
metric_name = "ProbesFailedPercent"
aggregation = "Average"
operator = "GreaterThan"
threshold = 10
}
}
With Terragrunt
Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.
1. Root config — live/terragrunt.hcl (inherited by every module):
remote_state {
backend = "azurerm"
generate = { path = "backend.tf", if_exists = "overwrite" }
config = {
# ...azurerm state bucket/container + key per path...
}
}
2. Module config — live/prod/network_watcher/terragrunt.hcl:
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"
}
inputs = {
name = "..."
location = "..."
}
3. Deploy one environment, or roll out all modules together:
cd live/prod/network_watcher && terragrunt apply # this module only
cd live/prod && terragrunt run-all apply # every module under live/prod
Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
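As an illustration of per-environment overrides, a non-prod terragrunt.hcl might keep the same module source and change only the inputs that drive cost. This is a sketch with assumed values: the path live/dev/network_watcher/terragrunt.hcl, the dev names, the 30-day retention, and the 60-minute interval are examples, and the "..." placeholders stand for resource IDs from your own estate.

```hcl
# live/dev/network_watcher/terragrunt.hcl (hypothetical path)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-azure-network-watcher?ref=v1.0.0"
}

inputs = {
  name                = "nw-eastus-dev"
  location            = "eastus"
  resource_group_name = "rg-network-watcher-dev"

  flow_logs = {
    hub_nsg = {
      name                      = "fl-hub-nsg"
      network_security_group_id = "..." # NSG resource ID
      storage_account_id        = "..." # Storage Account resource ID
      retention_days            = 30    # shorter retention than prod

      traffic_analytics = {
        workspace_id          = "..." # Log Analytics workspace GUID
        workspace_region      = "eastus"
        workspace_resource_id = "..." # Log Analytics workspace resource ID
        interval_in_minutes   = 60    # coarser, cheaper interval for non-prod
      }
    }
  }
}
```

Both values stay within the module's own validation rules (retention between 0 and 365 days, interval of 10 or 60 minutes), so the override needs no module changes.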
Inputs
| Name | Type | Default | Required | Description |
|---|---|---|---|---|
| name | string | n/a | Yes | Name of the Network Watcher (e.g. nw-eastus-prod). One per region per subscription. |
| location | string | n/a | Yes | Azure region. A region holds only one Network Watcher per subscription. |
| resource_group_name | string | "NetworkWatcherRG" | No | Resource group that holds the watcher. |
| create_resource_group | bool | true | No | Create the resource group, or adopt an existing one such as NetworkWatcherRG. |
| tags | map(string) | {} | No | Tags applied to the RG, watcher, flow logs, and connection monitors. |
| flow_logs | map(object) | {} | No | NSG flow logs to create. Each needs a Storage Account in the watcher’s region; optional traffic analytics. |
| connection_monitors | map(object) | {} | No | Connection monitors with endpoints, test configurations, and test groups. |
Outputs
| Name | Description |
|---|---|
| id | Resource ID of the Network Watcher. |
| name | Name of the Network Watcher. |
| location | Region of the Network Watcher. |
| resource_group_name | Resource group that holds the Network Watcher. |
| flow_log_ids | Map of flow-log keys to their resource IDs. |
| connection_monitor_ids | Map of connection-monitor keys to their resource IDs. |
Enterprise scenario
A bank’s platform team runs a hub-and-spoke topology across eastus and westeurope in three subscriptions (connectivity, prod, non-prod). They call this module once per region in the connectivity subscription, each time enabling NSG flow logs on the hub firewall subnet with 90-day retention and Traffic Analytics shipping to the central security Log Analytics workspace. Connection monitors continuously probe TCP/53 and TCP/443 from a hub probe VM to the on-prem ExpressRoute DNS and core banking endpoints, and the connection_monitor_ids output feeds a metric alert that pages the network on-call when ProbesFailedPercent crosses 10%. Because the watcher name, retention, and RG are fixed in the module, every region’s diagnostics look identical and the auditors get a single, predictable place to verify flow-log coverage.
Best practices
- Treat it as a per-region singleton. Azure auto-creates NetworkWatcher_<region> in NetworkWatcherRG the moment a VNet appears. Call this module exactly once per region and either adopt that RG (create_resource_group = false, resource_group_name = "NetworkWatcherRG", terraform import) or fully own a named RG. Never let two watchers coexist in one region.
- Keep flow-log Storage Accounts in the same region as the watcher and lock them down. Network Watcher cannot write NSG flow logs to a Storage Account in another region. Use a dedicated account with min_tls_version = "TLS1_2", public access disabled, and a lifecycle rule that expires blobs in line with retention_days so cold flow-log data does not pile up cost.
- Tune retention and Traffic Analytics interval for cost. Flow logs and a 10-minute Traffic Analytics interval are the expensive parts, not the watcher itself (which is free). Use 60-minute intervals and shorter retention in non-prod; reserve 10-minute granularity and 90+ day retention for regulated prod subnets.
- Name by region and environment, not generically. nw-eastus-prod immediately tells operators which region and subscription tier they are looking at; avoid nw-01-style names that collide across the estate and defeat the point of explicit management.
- Prefer VNet flow logs over per-NSG flow logs going forward. Microsoft is retiring NSG flow logs in favor of VNet flow logs; standardize new deployments on VNet-level capture so you get full coverage with fewer log resources and an easier migration path.
- Scope diagnostic permissions tightly. Grant the platform pipeline only Network Contributor on the watcher RG plus write access to the flow-log Storage Account and workspace, not subscription-wide rights, so the singleton and its logs cannot be reconfigured by unrelated teams.
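The storage guidance above could look roughly like the following. It is a sketch under stated assumptions: the account name is hypothetical, "public access disabled" is read here as disabling anonymous blob access, the 90-day expiry mirrors the module's default retention_days, and the prefix is the container that NSG flow logs are written to.

```hcl
# Dedicated flow-log account in the same region as the watcher.
resource "azurerm_storage_account" "flowlogs" {
  name                     = "stflowlogseastusprod" # hypothetical; must be globally unique
  resource_group_name      = module.network_watcher.resource_group_name
  location                 = module.network_watcher.location
  account_tier             = "Standard"
  account_replication_type = "LRS"

  min_tls_version                 = "TLS1_2"
  allow_nested_items_to_be_public = false # no anonymous blob access
}

# Expire flow-log blobs in line with the flow log's retention_days.
resource "azurerm_storage_management_policy" "flowlogs" {
  storage_account_id = azurerm_storage_account.flowlogs.id

  rule {
    name    = "expire-nsg-flow-logs"
    enabled = true

    filters {
      blob_types   = ["blockBlob"]
      prefix_match = ["insights-logs-networksecuritygroupflowevent"]
    }

    actions {
      base_blob {
        delete_after_days_since_modification_greater_than = 90
      }
    }
  }
}
```

If the account is additionally restricted with network rules, trusted Azure services still need a path to write the logs, so test flow-log delivery after tightening access.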