Hybrid DNS at Scale: Azure DNS Private Resolver with Conditional Forwarding

For years the standard answer to hybrid DNS on Azure was a pair of forwarder VMs running BIND or the Windows DNS role, conditional-forwarding queries in both directions. They worked, but they were a maintenance tax: patching, HA, scaling, and a single point of failure sitting in the middle of every name lookup. The Azure DNS Private Resolver replaces that pattern entirely with a managed, zone-redundant service that does both inbound and outbound conditional forwarding. This guide builds bidirectional resolution end to end with Terraform and decommissions the VMs for good.

1. Why the default Azure DNS breaks for hybrid

Every Azure VNet ships with Azure-provided DNS at the virtual IP 168.63.129.16. It resolves public names, Azure internal names, and any Private DNS zones linked to the VNet. It is also a closed box with two hard limits that matter for hybrid:

It is only reachable from inside the VNet. On-premises hosts cannot query 168.63.129.16 over a VPN or ExpressRoute. It is a platform virtual IP that Azure answers only from within the virtual network, so it is not routable across the gateway. The moment on-prem needs to resolve a Private Link FQDN like mystorage.privatelink.blob.core.windows.net, it has no path to the answer.
It cannot conditionally forward. You cannot tell 168.63.129.16 “send corp.contoso.com to my on-prem domain controllers.” It only knows what is linked to the VNet.

This is the crux of the hybrid DNS problem. Private endpoints depend on Private DNS zones, and those zones only resolve through Azure-provided DNS. On-prem needs a routable resolver in Azure to reach them, and Azure workloads need a way to forward corporate domains back to on-prem. The DNS Private Resolver provides both directions as a managed service.

The failure mode is subtle. The resource's public name (for example mystorage.blob.core.windows.net) still resolves from anywhere, but it is a CNAME to the privatelink name, and only Azure-provided DNS with the linked Private DNS zone holds the private A record. On-prem clients follow the CNAME through public DNS, never see the private zone, and end up at the public IP, which the resource's firewall then refuses once public access is restricted. The symptom is “it works from Azure but times out from the office.”

2. Architecture: inbound, outbound, and rulesets

The Private Resolver is a single resource deployed into a VNet (the hub, in a hub-and-spoke topology). It exposes three concepts:

Component	Direction	Purpose
Inbound endpoint	On-prem to Azure	A private IP in a dedicated subnet that on-prem DNS forwards to. Resolves Azure Private DNS zones and Azure internal names.
Outbound endpoint	Azure to on-prem	The egress point the resolver uses to send queries out, governed by forwarding rulesets.
DNS forwarding ruleset	Azure to on-prem	A set of conditional-forwarding rules (domain to target IPs) attached to an outbound endpoint and linked to one or more VNets.

Each endpoint lives in its own dedicated subnet, delegated to Microsoft.Network/dnsResolvers. A subnet that holds a resolver endpoint cannot hold anything else, and the two endpoints cannot share a subnet. Each subnet must be at least a /28 and at most a /24; a /28 per endpoint is the practical size to plan for.

The flow in each direction:

On-prem to Azure:
  corp DNS server  -->  inbound endpoint IP  -->  Azure Private DNS zones

Azure to on-prem:
  spoke VM  -->  168.63.129.16  -->  outbound endpoint (via ruleset rule)  -->  on-prem DNS

The important and often-missed detail: spoke VMs still point at 168.63.129.16. They do not point at the outbound endpoint directly. The ruleset is linked to the VNet, which injects the conditional-forwarding behavior into Azure-provided DNS itself. You change nothing on the VM NICs.

3. Provision the resolver and delegated subnets with Terraform

Start with the subnets. In a hub VNet, carve two /28s that exist only for the resolver. Note the delegation block: the resolver will refuse to deploy into a subnet that is not delegated to it.

resource "azurerm_subnet" "dns_inbound" {
  name                 = "snet-dns-inbound"
  resource_group_name  = azurerm_resource_group.hub.name
  virtual_network_name = azurerm_virtual_network.hub.name
  address_prefixes     = ["10.0.16.0/28"]

  delegation {
    name = "Microsoft.Network.dnsResolvers"
    service_delegation {
      name    = "Microsoft.Network/dnsResolvers"
      actions = ["Microsoft.Network/virtualNetworks/subnets/join/action"]
    }
  }
}

resource "azurerm_subnet" "dns_outbound" {
  name                 = "snet-dns-outbound"
  resource_group_name  = azurerm_resource_group.hub.name
  virtual_network_name = azurerm_virtual_network.hub.name
  address_prefixes     = ["10.0.16.16/28"]

  delegation {
    name = "Microsoft.Network.dnsResolvers"
    service_delegation {
      name    = "Microsoft.Network/dnsResolvers"
      actions = ["Microsoft.Network/virtualNetworks/subnets/join/action"]
    }
  }
}
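If you would rather derive the two prefixes than hard-code them, Terraform's built-in cidrsubnet function can carve them from a parent range. This is a minimal sketch, assuming a hypothetical 10.0.16.0/24 block set aside in the hub for DNS; the local names are illustrative and not part of the configuration above.

```hcl
locals {
  # Assumption: a /24 inside the hub VNet reserved for resolver subnets.
  dns_block = "10.0.16.0/24"

  # Adding 4 bits to a /24 yields /28s; the last argument selects which one.
  dns_inbound_prefix  = cidrsubnet(local.dns_block, 4, 0) # 10.0.16.0/28
  dns_outbound_prefix = cidrsubnet(local.dns_block, 4, 1) # 10.0.16.16/28
}
```

Each subnet would then set address_prefixes = [local.dns_inbound_prefix] or [local.dns_outbound_prefix] in place of the literal strings.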

Now the resolver itself, bound to the hub VNet, plus both endpoints:

resource "azurerm_private_dns_resolver" "this" {
  name                = "dnspr-hub-prod"
  resource_group_name = azurerm_resource_group.hub.name
  location            = azurerm_resource_group.hub.location
  virtual_network_id  = azurerm_virtual_network.hub.id
}

resource "azurerm_private_dns_resolver_inbound_endpoint" "this" {
  name                    = "inbound"
  private_dns_resolver_id = azurerm_private_dns_resolver.this.id
  location                = azurerm_private_dns_resolver.this.location

  ip_configurations {
    private_ip_allocation_method = "Dynamic"
    subnet_id                    = azurerm_subnet.dns_inbound.id
  }
}

resource "azurerm_private_dns_resolver_outbound_endpoint" "this" {
  name                    = "outbound"
  private_dns_resolver_id = azurerm_private_dns_resolver.this.id
  location                = azurerm_private_dns_resolver.this.location
  subnet_id               = azurerm_subnet.dns_outbound.id
}

After apply, capture the inbound endpoint’s allocated IP. You will hand this to the on-prem team. It is exposed under the inbound endpoint’s ip_configurations:

az dns-resolver inbound-endpoint show \
  --resource-group rg-hub-prod \
  --dns-resolver-name dnspr-hub-prod \
  --name inbound \
  --query "ipConfigurations[0].privateIpAddress" -o tsv

Treat this IP as stable whichever allocation method you choose. With Dynamic allocation the resolver takes the first available address in the subnet (10.0.16.4 in a fresh 10.0.16.0/28, because Azure reserves the first four addresses of every subnet) and holds it for the life of the endpoint, which is why a dedicated subnet matters: nothing else can take that address. If you need a guaranteed value to bake into on-prem config ahead of time, set private_ip_allocation_method = "Static" and supply private_ip_address.
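A pinned address might look like the following sketch. It is a variant of the inbound endpoint shown earlier and would replace that definition rather than sit beside it. The address 10.0.16.4 is an assumption (the first usable address in 10.0.16.0/28), the output name is illustrative, and private_ip_address needs an azurerm provider version recent enough to support it.

```hcl
resource "azurerm_private_dns_resolver_inbound_endpoint" "this" {
  name                    = "inbound"
  private_dns_resolver_id = azurerm_private_dns_resolver.this.id
  location                = azurerm_private_dns_resolver.this.location

  ip_configurations {
    private_ip_allocation_method = "Static"
    private_ip_address           = "10.0.16.4" # assumption: first usable address in the subnet
    subnet_id                    = azurerm_subnet.dns_inbound.id
  }
}

# Surface the address so it can be handed to the on-prem team.
output "dns_inbound_ip" {
  value = azurerm_private_dns_resolver_inbound_endpoint.this.ip_configurations[0].private_ip_address
}
```

The output works with either allocation method, so terraform output dns_inbound_ip can stand in for the CLI query above.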

4. On-prem to Azure: point corporate DNS at the inbound endpoint

This direction is pure on-prem configuration. The inbound endpoint IP (say 10.0.16.4) is now a fully routable DNS server reachable over your VPN or ExpressRoute private peering. On the corporate DNS servers, create conditional forwarders for the Azure-side namespaces you want on-prem to resolve.

The namespaces to forward are the Private DNS zone names your private endpoints use. For example, to let on-prem resolve private endpoints for Blob, Key Vault, and your internal app domain:

On Windows Server DNS:

# Forward the Private Link zones to the Azure inbound endpoint
Add-DnsServerConditionalForwarderZone `
  -Name "privatelink.blob.core.windows.net" `
  -MasterServers 10.0.16.4

Add-DnsServerConditionalForwarderZone `
  -Name "privatelink.vaultcore.azure.net" `
  -MasterServers 10.0.16.4

Add-DnsServerConditionalForwarderZone `
  -Name "azure.contoso.internal" `
  -MasterServers 10.0.16.4

On a BIND resolver the equivalent is a forward-only zone:

zone "privatelink.blob.core.windows.net" {
    type forward;
    forward only;
    forwarders { 10.0.16.4; };
};

That is the entire on-prem-to-Azure path. The inbound endpoint resolves anything the hub VNet can see, which includes every Private DNS zone linked to the hub. The next step makes sure the right zones are linked.

5. Azure to on-prem: forwarding rulesets linked to spokes

Now the reverse direction. Create a ruleset attached to the outbound endpoint, add a rule per on-prem domain, and link the ruleset to the VNets whose workloads need on-prem resolution.

resource "azurerm_private_dns_resolver_dns_forwarding_ruleset" "this" {
  name                                       = "ruleset-onprem"
  resource_group_name                        = azurerm_resource_group.hub.name
  location                                   = azurerm_resource_group.hub.location
  private_dns_resolver_outbound_endpoint_ids = [
    azurerm_private_dns_resolver_outbound_endpoint.this.id
  ]
}

resource "azurerm_private_dns_resolver_forwarding_rule" "corp" {
  name                      = "corp-contoso-com"
  dns_forwarding_ruleset_id = azurerm_private_dns_resolver_dns_forwarding_ruleset.this.id
  domain_name               = "corp.contoso.com."   # trailing dot is required
  enabled                   = true

  target_dns_servers {
    ip_address = "192.168.10.10"
    port       = 53
  }
  target_dns_servers {
    ip_address = "192.168.10.11"
    port       = 53
  }
}

The domain_name must end in a trailing dot — this is a fully qualified domain name and the API rejects it otherwise. The rule says “any query under corp.contoso.com goes to these on-prem DNS servers.” List two or more targets for redundancy.
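With more than one on-prem domain, the rules can be generated from a map, and the trailing-dot requirement can be checked at plan time instead of at the API. The sketch below is one way to do that. The variable name and resource label are hypothetical, endswith needs Terraform 1.3 or later, and this block would replace the single corp rule above rather than coexist with it (both would produce a rule named corp-contoso-com).

```hcl
variable "onprem_domains" {
  description = "On-prem DNS suffix (with trailing dot) => target DNS server IPs."
  type        = map(list(string))
  default = {
    "corp.contoso.com." = ["192.168.10.10", "192.168.10.11"]
  }

  validation {
    condition     = alltrue([for d in keys(var.onprem_domains) : endswith(d, ".")])
    error_message = "Every domain must end with a trailing dot, for example \"corp.contoso.com.\"."
  }
}

resource "azurerm_private_dns_resolver_forwarding_rule" "onprem" {
  for_each = var.onprem_domains

  # "corp.contoso.com." becomes "corp-contoso-com"
  name                      = replace(trimsuffix(each.key, "."), ".", "-")
  dns_forwarding_ruleset_id = azurerm_private_dns_resolver_dns_forwarding_ruleset.this.id
  domain_name               = each.key
  enabled                   = true

  # One target_dns_servers block per IP in the list.
  dynamic "target_dns_servers" {
    for_each = each.value
    content {
      ip_address = target_dns_servers.value
      port       = 53
    }
  }
}
```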

Link the ruleset to each VNet that should inherit these rules. The hub VNet plus every spoke that runs workloads needing on-prem names:

resource "azurerm_private_dns_resolver_virtual_network_link" "hub" {
  name                      = "link-hub"
  dns_forwarding_ruleset_id = azurerm_private_dns_resolver_dns_forwarding_ruleset.this.id
  virtual_network_id        = azurerm_virtual_network.hub.id
}

resource "azurerm_private_dns_resolver_virtual_network_link" "spoke_app" {
  name                      = "link-spoke-app"
  dns_forwarding_ruleset_id = azurerm_private_dns_resolver_dns_forwarding_ruleset.this.id
  virtual_network_id        = azurerm_virtual_network.spoke_app.id
}

Once linked, any VM in the spoke that resolves db01.corp.contoso.com while pointing at 168.63.129.16 gets the query conditionally forwarded out the outbound endpoint to on-prem — without touching the VM. A spoke VNet must be directly linked to the ruleset; peering alone does not propagate the rules. This catches people relying on hub-spoke peering to do the work it does not do.
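Because every spoke needs its own link, a for_each over the spoke VNets keeps the links from drifting as spokes are added. This is a sketch under assumptions: the variable spoke_vnet_ids and the resource label spokes are hypothetical, and the map is expected to hold a short spoke name and the VNet resource ID.

```hcl
variable "spoke_vnet_ids" {
  description = "Short spoke name => VNet resource ID."
  type        = map(string)
  default     = {}
}

resource "azurerm_private_dns_resolver_virtual_network_link" "spokes" {
  for_each = var.spoke_vnet_ids

  name                      = "link-${each.key}"
  dns_forwarding_ruleset_id = azurerm_private_dns_resolver_dns_forwarding_ruleset.this.id
  virtual_network_id        = each.value
}
```

Adding a spoke then means adding one map entry, and a missing link shows up in the plan instead of in a failed lookup.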

6. Integrate Private DNS zones so private endpoints resolve from on-prem

For step 4’s conditional forwarders to return real answers, the Private DNS zones must be linked to the hub VNet — the VNet hosting the inbound endpoint. The inbound endpoint resolves against the zones linked to its own VNet. If your privatelink.blob.core.windows.net zone is only linked to a spoke, on-prem queries hit the inbound endpoint and get nothing.

resource "azurerm_private_dns_zone" "blob" {
  name                = "privatelink.blob.core.windows.net"
  resource_group_name = azurerm_resource_group.hub.name
}

# Link the zone to the HUB so the inbound endpoint can resolve it for on-prem
resource "azurerm_private_dns_zone_virtual_network_link" "blob_hub" {
  name                  = "link-blob-hub"
  resource_group_name   = azurerm_resource_group.hub.name
  private_dns_zone_name = azurerm_private_dns_zone.blob.name
  virtual_network_id    = azurerm_virtual_network.hub.id
}

The clean pattern at scale is to centralize all Private DNS zones in the hub, link every zone to the hub VNet, and use Azure Policy with a DeployIfNotExists (DINE) effect to auto-create private endpoint A records in the correct hub zone. Spokes resolve the same zones through their own VNet links to those zones (peering alone does not carry zone links), and on-prem resolves through the inbound endpoint: one authoritative set of zones, two consumers. Avoid scattering duplicate zones per spoke; that path leads to drift and split-brain answers.
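Centralizing zones in the hub can be expressed as a single set of zone names driving both the zones and their hub links. The following is a sketch, not the definitive layout: the local and resource labels are hypothetical, the zone list is only the two Private Link namespaces mentioned earlier, and it would replace the single blob zone and link above rather than be applied alongside them.

```hcl
locals {
  privatelink_zones = toset([
    "privatelink.blob.core.windows.net",
    "privatelink.vaultcore.azure.net",
  ])
}

resource "azurerm_private_dns_zone" "privatelink" {
  for_each            = local.privatelink_zones
  name                = each.value
  resource_group_name = azurerm_resource_group.hub.name
}

# One hub link per zone so the inbound endpoint can answer for on-prem.
resource "azurerm_private_dns_zone_virtual_network_link" "privatelink_hub" {
  for_each              = azurerm_private_dns_zone.privatelink
  name                  = "link-${replace(each.key, ".", "-")}-hub"
  resource_group_name   = azurerm_resource_group.hub.name
  private_dns_zone_name = each.value.name
  virtual_network_id    = azurerm_virtual_network.hub.id
}
```

A new Private Link service then costs one line in the set, and every zone is guaranteed to carry the hub link.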

Enterprise scenario

A retail platform team migrating from forwarder VMs hit a split-brain failure two days after cutover. AKS pods in the spoke could resolve db01.corp.contoso.com, but pods specifically could not resolve a new on-prem domain payments.corp.contoso.com that other Azure VMs resolved fine. The forwarding rule existed, the VNet was linked, and nslookup from the node host worked. The problem was CoreDNS: the cluster ran a custom Corefile ConfigMap that hard-forwarded corp.contoso.com to the old forwarder VM IPs, which had just been deleted. Pods never reached 168.63.129.16, so the ruleset never fired.

The fix was to stop overriding Azure-provided DNS inside the cluster and let the resolver own forwarding. They replaced the stale block with a default upstream that points back at the VNet resolver:

# coredns-custom ConfigMap (kube-system) — let Azure DNS + the ruleset decide
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  forward-onprem.override: |
    forward corp.contoso.com 168.63.129.16 {
      policy sequential
    }

After kubectl -n kube-system rollout restart deployment coredns, pods forwarded corp.contoso.com to 168.63.129.16, which applied the linked ruleset and reached the new outbound endpoint. The lesson: the Private Resolver only works for workloads that actually query Azure-provided DNS. Any layer that shortcuts it — CoreDNS overrides, hard-coded /etc/resolv.conf, or appliance-based DNS — bypasses the ruleset entirely. Audit those layers before deleting the old forwarders, not after.

Verify

Validate both directions explicitly. Half-working hybrid DNS is the default state, so test each path.

From an Azure spoke VM, confirm Azure-to-on-prem forwarding (an on-prem name should resolve to its private IP):

# Spoke VM still points at Azure-provided DNS; the ruleset does the forwarding
nslookup db01.corp.contoso.com        # expect on-prem private IP, e.g. 192.168.x.x

# A private endpoint resolves to its private IP, not the public one
nslookup mystorage.blob.core.windows.net   # expect 10.x.x.x via the privatelink zone

From an on-prem host, confirm on-prem-to-Azure resolution through the inbound endpoint:

# Resolve a Private Link FQDN; must return the private 10.x address
dig +short mystorage.privatelink.blob.core.windows.net

# Query the inbound endpoint directly to isolate the resolver from the forwarder chain
dig @10.0.16.4 mystorage.privatelink.blob.core.windows.net +short

Confirm the control-plane wiring from the CLI:

# Inbound endpoint IP is what on-prem forwarders target
az dns-resolver inbound-endpoint show -g rg-hub-prod \
  --dns-resolver-name dnspr-hub-prod -n inbound \
  --query "ipConfigurations[0].privateIpAddress" -o tsv

# Forwarding rules are present and enabled
az dns-resolver forwarding-rule list -g rg-hub-prod \
  --ruleset-name ruleset-onprem \
  --query "[].{domain:domainName,state:forwardingRuleState}" -o table

# The spoke VNet is actually linked to the ruleset (peering is not enough)
az dns-resolver vnet-link list -g rg-hub-prod \
  --ruleset-name ruleset-onprem --query "[].name" -o table

A correct deployment resolves on-prem names from Azure to private IPs, resolves Azure private endpoints from on-prem to private IPs, and shows the inbound IP, enabled rules, and the expected VNet links.

Decommission the forwarder VMs

Cut over without an outage by running both in parallel briefly:

Deploy the resolver, rulesets, and zone links while the VMs still serve.
Point on-prem conditional forwarders at the inbound endpoint IP instead of the forwarder VM IPs.
Link the ruleset to spokes so Azure-side queries forward via the outbound endpoint instead of the VMs.
Lower the TTL on relevant records ahead of time, then watch resolver query logs and on-prem DNS logs for a full business cycle.
Once traffic to the VMs drops to zero, stop them for a cooling-off period, then delete the VMs, their NICs, disks, and any custom DNS server settings on VNets that referenced them.

Do not forget the VNet’s custom DNS servers setting. If your VNets were configured to use the forwarder VM IPs as custom DNS, you must clear that back to Default (Azure-provided) so spokes use 168.63.129.16 and pick up the ruleset. Leaving stale custom DNS pointing at deleted VMs is the most common post-migration outage.
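If the VNets are managed in Terraform, clearing the setting is a one-line change on the VNet resource. The sketch below assumes a spoke VNet defined roughly like this; the name, resource group reference, and address space are hypothetical placeholders.

```hcl
resource "azurerm_virtual_network" "spoke_app" {
  name                = "vnet-spoke-app"                          # hypothetical
  resource_group_name = azurerm_resource_group.spoke_app.name     # hypothetical
  location            = azurerm_resource_group.spoke_app.location # hypothetical
  address_space       = ["10.1.0.0/16"]                           # hypothetical

  # Previously the forwarder VM IPs. An empty list returns the VNet
  # to Default (Azure-provided) DNS.
  dns_servers = []
}
```

Running VMs pick up the new DNS setting only after a DHCP lease renewal or a restart, so schedule that as part of the cutover.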

Migration checklist

Cost, capacity, and pitfalls

The Private Resolver bills on two axes: an hourly charge per endpoint (you have two) and a per-million-queries charge. For most environments this lands well under the fully loaded cost of two HA VMs once you count compute, patching, and on-call. There is no instance to size.

Capacity is the constraint to design around. Each endpoint has a published throughput ceiling of roughly 10,000 queries per second, and the resolver itself enforces a per-resolver QPS limit. For very high-volume estates, monitor actual QPS and treat the limit as a real planning number, not a theoretical one. The service is zone-redundant by design, so HA is not your problem anymore, but throughput is.

Enable diagnostic settings on the resolver and stream query logs to Log Analytics so you can see exactly which domains forward where and catch resolution failures early:

az monitor diagnostic-settings create \
  --name dnspr-diag \
  --resource $(az dns-resolver show -g rg-hub-prod -n dnspr-hub-prod --query id -o tsv) \
  --workspace $(az monitor log-analytics workspace show -g rg-hub-prod -n law-hub --query id -o tsv) \
  --logs '[{"categoryGroup":"allLogs","enabled":true}]'

Last, the pitfalls that bite in production:

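Trailing dots and rule matching. Forwarding rule domain_name values are FQDNs and need the trailing dot. The longest matching suffix wins regardless of rule order, so a rule for corp.contoso.com. beats a rule for contoso.com. on any name under corp. The risk runs the other way: a broad rule for contoso.com. catches every name under it that lacks a more specific rule, including names you meant to resolve in Azure. Keep rules tight.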
Peering is not a ruleset link. A spoke must be linked to the ruleset directly. Relying on hub-spoke peering to carry forwarding rules silently fails.
Zones linked to the wrong VNet. The inbound endpoint only resolves zones linked to its VNet. Centralize zones in the hub and link them there.
Stale custom DNS after cutover. Clear VNet custom DNS back to Default, or spokes keep asking dead forwarder VMs.
QPS blind spots. Without query logging you will not see the resolver approaching its throughput limit until lookups start timing out. Turn logging on from day one.

Build both directions, prove each with a real lookup to a private IP, and only then delete the VMs. Done this way, hybrid DNS stops being a fragile pair of boxes you babysit and becomes a managed, zone-redundant part of the platform that scales without your attention.

Written by Vinod
