Azure Private Link and Private DNS: Keeping PaaS Off the Public Internet

Quick take: Private Link is only half the solution. Without Private DNS, clients still resolve PaaS names to public IPs and the connection either fails or — worse — quietly egresses over the internet. You need both, wired together, on every VNet that resolves.

A security team mandated Private Endpoint for every Azure SQL database in the estate. The database team deployed the endpoints in an afternoon, flipped Public network access to Disabled, and went home. By 09:00 the next morning every application was throwing connection timeouts. The endpoints were healthy, the NSGs were open, the credentials were fine — and the apps still could not connect. The cause was not networking at all. It was DNS: the applications kept resolving mydb.database.windows.net to the public IP (which was now firewalled off), because nobody had created the Azure Private DNS zone that maps the public FQDN to the endpoint’s private IP. The fix was three az commands and zero application changes. This is the single most common Private Link incident, and it is entirely avoidable once you understand that Private Endpoint moves the IP, but Private DNS moves the name — and a client connects to a name.

This article is the practitioner’s deep dive into the pair. Azure Private Link is the umbrella feature; a Private Endpoint is the concrete object: a network interface (NIC) with a private IP from your subnet that maps to one specific PaaS resource (one SQL server, one storage account’s blob service, one Key Vault) over the Microsoft backbone, never the public internet. Azure Private DNS is the resolution layer that makes the public service FQDN return that private IP, so existing connection strings keep working untouched. You will learn every moving part: the group ID that selects which sub-resource an endpoint targets, the exact privatelink.* zone names per service, the privateDnsZoneGroup that auto-creates and lifecycle-manages the A record (and why you should almost never create that record by hand), how a query actually flows through the platform’s 168.63.129.16 resolver, how to extend resolution to on-premises with DNS Private Resolver or a forwarder VM, and the data-exfiltration story that is the real reason security teams care.

Because this is a reference you will return to mid-incident, the playbook, the group IDs, the zone names, the limits and the failure modes are all laid out as scannable tables — read the prose once, then keep the tables open when nslookup returns the wrong IP and production is down. By the end you will stop guessing whether a Private Link problem is “networking” or “DNS” (it is almost always DNS), and you will be able to confirm which in under two minutes with a single resolution check.

What problem this solves

PaaS services — Azure SQL, Storage, Key Vault, Cosmos DB, App Service, Service Bus — are born with public endpoints. mydb.database.windows.net resolves to a public IP and accepts connections from anywhere your firewall rules allow. For a great many workloads that is fine, gated by service firewalls and Service Endpoints. But for regulated, sensitive, or zero-trust workloads it is unacceptable on two counts. First, the data plane traverses the public internet (even if encrypted, the path is public, and many compliance regimes forbid it). Second, and more subtly, a public endpoint is a data-exfiltration vector: a compromised VM or a malicious insider can copy data to their own storage account, because outbound to *.blob.core.windows.net is allowed wholesale — the firewall protects your account, not the service.

Private Link solves both. The Private Endpoint gives the service a private IP inside your VNet, so traffic stays on the Microsoft backbone and the service can have its public endpoint disabled entirely. Because each endpoint maps to one specific resource rather than to the whole service, you can reach your own storage account through its endpoint while blocking egress to the service’s public addresses, and with them other tenants’ accounts, which closes the exfiltration hole. The catch, and the thing this entire article exists to drive home, is that none of it works until DNS resolves the public FQDN to the private IP. A Private Endpoint with no DNS plan is a NIC nobody can find.

What breaks without this knowledge, in production terms: applications time out after the public endpoint is disabled (the headline incident above); on-premises clients keep resolving public IPs because the Private DNS zone is invisible to corporate DNS; a hub-and-spoke estate ends up with the zone linked to one VNet but not the twenty spokes that actually need it; somebody creates the A record by hand, the endpoint is later re-created with a new private IP, and the stale record blackholes traffic; or a forced-tunnel 0.0.0.0/0 route sends the endpoint’s return traffic to a firewall that drops it. Every one of these looks like a connectivity problem and is a name-resolution or routing problem.

Who hits this: anyone running sensitive PaaS in production, especially in hub-and-spoke topologies with centralized DNS, hybrid estates with on-premises clients, and landing zones where the platform team owns DNS and app teams own endpoints. The decision of which private-access technology to use at all — Private Endpoint versus the older Service Endpoint — is upstream of this and covered in Azure Private Endpoint vs Service Endpoint: Secure PaaS Access; this article assumes you have chosen Private Endpoint and need to make it actually resolve.

To frame the whole field before the deep dive, here is every failure class this article covers, the question it forces, and the one check to run first:

| Failure class | What the symptom looks like | First question to ask | First check to run | Most common single cause |
| --- | --- | --- | --- | --- |
| Resolves to public IP | Timeout after public access disabled | Does the client get a private or public IP? | nslookup mydb.database.windows.net | Private DNS zone not linked to this VNet |
| No / stale A record | NXDOMAIN or wrong private IP | Is there a record, and is it the current PE IP? | az network private-dns record-set a list | No privateDnsZoneGroup; manual record drifted |
| NSG / route blocks the leg | Resolves right, still no connect | Is the PE NIC reachable on the port? | Network Watcher effective routes + NSG | 0.0.0.0/0 UDR blackholes; NSG drops the port |
| Public path still open | Works, but exfil still possible | Is publicNetworkAccess actually Disabled? | az sql server show … publicNetworkAccess | Endpoint added but public never turned off |
| Hybrid resolves public | On-prem clients fail, Azure clients fine | Where does the query resolve: Azure or on-prem? | nslookup from on-prem vs from a VNet VM | No conditional forwarder to a DNS resolver |
| Wrong group ID | PE created against the wrong sub-resource | Is the endpoint for blob, or for file/dfs? | az network private-endpoint show … groupIds | One PE assumed to cover all storage sub-resources |

Learning objectives

By the end of this article you can:

Prerequisites & where this fits

You should already understand Azure networking fundamentals: that a VNet is your private address space carved into subnets, that NSGs filter traffic and UDRs (user-defined routes) steer it, and how name resolution works at a basic level (an FQDN resolves to an IP via DNS, possibly through a CNAME chain). Those fundamentals are covered in Azure Virtual Network, Subnets and NSGs: Networking Fundamentals. You should be comfortable running az in Cloud Shell, reading JSON output, and you should know what a PaaS service’s public FQDN looks like (e.g. *.database.windows.net, *.blob.core.windows.net).

This sits in the Networking & Security track and is the practical follow-on to the Private Endpoint-vs-Service-Endpoint decision. It pairs tightly with VNet routing troubleshooting — when DNS is right but traffic still won’t flow, you are in Diagnosing Azure VNet Connectivity: NSGs, UDRs, Effective Routes & Network Watcher territory — and with the storage-specific access failures in Fixing Azure Storage 403 Errors: Firewalls, Private Endpoints, RBAC & SAS. In a large org the zone-and-link design is part of the platform foundation described in Azure Enterprise-Scale Landing Zone: Foundation for Large Organizations.

A quick map of who owns and confirms what during a Private Link incident, so you call the right person fast:

| Layer | What lives here | Who usually owns it | Failure classes it can cause |
| --- | --- | --- | --- |
| Application / connection string | The FQDN the client dials | App / dev team | None directly, but a hard-coded private IP is a landmine |
| Private DNS zone + links | The name→private-IP mapping | Platform / network team | Resolves public, NXDOMAIN, stale record (most failures) |
| privateDnsZoneGroup | Auto-managed A record on the PE | Whoever deploys the PE | Stale/missing record if omitted |
| Private Endpoint (NIC) | The private IP + sub-resource | App team (often) | Wrong group ID; NIC in a subnet with bad routes |
| NSG / UDR on the PE subnet | Filtering + routing of the leg | Network team | 0.0.0.0/0 blackhole; port dropped |
| PaaS service firewall | Public access on/off | App + security | Public still open (exfil); or over-locked, blocking the PE |
| On-prem DNS / forwarders | Cross-premises resolution | Corporate IT / network | Hybrid clients resolve public |

Core concepts

Six mental models make every later diagnosis obvious. Read them once; they are the spine of the whole article.

Private Endpoint moves the IP; Private DNS moves the name, and a client connects to a name. This is the thesis. A Private Endpoint is a NIC with a private IP (say 10.20.1.5) that maps to one PaaS resource over the backbone. But your app dials mydb.database.windows.net, not 10.20.1.5. Unless DNS returns the private IP for that public name, the app resolves the public IP and either egresses publicly (if public access is on) or times out (if it’s off). The Private Endpoint is necessary but useless without the matching DNS answer. Most “Private Link doesn’t work” tickets trace back to this one fact.

A Private Endpoint targets exactly one sub-resource, named by a group ID. A storage account has multiple services — blob, file, queue, table, dfs, web — each with its own FQDN (*.blob.*, *.file.*, …). A single Private Endpoint connects to one of them, selected by a group ID (also called the sub-resource). blob gets you the blob service; you need a separate endpoint (and a separate DNS zone) for file. Azure SQL uses sqlServer; Key Vault uses vault; App Service uses sites; Cosmos DB uses Sql/MongoDB/etc. Assuming one endpoint covers a whole service family is a classic mistake.

The public FQDN CNAMEs into the privatelink zone, which holds the private A record. When you create a Private Endpoint, the platform adds a CNAME in public DNS from the public name (mydb.database.windows.net) to mydb.privatelink.database.windows.net. That CNAME is visible to every client; what differs is the final answer. From a VNet linked to the privatelink.database.windows.net zone, the privatelink name returns your A record with the private IP. From a network without the zone, the same chain continues through public DNS and ends at the public IP. So you don’t override the public name directly: you create the privatelink.database.windows.net zone, link it to your VNet, and the CNAME chain lands on your private A record. The zone name is service-specific and must be exact.

The privateDnsZoneGroup auto-creates and lifecycle-manages the A record — use it. You can create the A record by hand, but you almost never should. A privateDnsZoneGroup is a small object you attach to the Private Endpoint that says “keep this privatelink zone’s A record in sync with this endpoint’s IP.” Create it, and the record appears automatically, updates if the IP ever changes, and is deleted when the endpoint is deleted. Skip it and create the record manually, and you own a brittle mapping that silently drifts the day someone re-creates the endpoint with a new IP. The zone-group is the difference between “set and forget” and “stale-record outage in six months.”

Resolution flows through the platform resolver at 168.63.129.16, which is VNet-local. Inside a VNet, Azure-provided DNS is the magic IP 168.63.129.16. It knows about Private DNS zones linked to that VNet and returns the private A record. This is why a VNet-linked zone “just works” for VNet clients. The crucial limit: 168.63.129.16 is not reachable from on-premises, because it is a virtual IP that the platform answers only for resources inside a VNet. So hybrid clients can’t use it directly; they need a forwarder inside Azure (a DNS Private Resolver inbound endpoint, or a DNS VM) that on-prem conditionally forwards to. Misunderstanding this single fact is the root of nearly every hybrid Private Link failure.

Private DNS without Private Endpoint, or vice versa, is a partial solution that fails quietly. The two are independent objects you must wire together. A Private Endpoint with no zone → resolves public. A zone with no endpoint (or pointing at a deleted endpoint) → resolves to a private IP nobody answers on → timeout. Disabling public access without first proving private resolution → instant outage. The pair is the unit of work; deploying one without the other is the bug.

The vocabulary in one table

Before the deep sections, pin down every moving part. The glossary at the end repeats these for lookup; this table is the mental model side by side:

| Concept | One-line definition | Where it lives | Why it matters to connectivity |
| --- | --- | --- | --- |
| Private Link | Umbrella feature for private PaaS access over the backbone | Platform feature | The “why”: private path, no public internet |
| Private Endpoint | A NIC with a private IP mapping to one PaaS sub-resource | Your subnet | The private IP the client must reach |
| Group ID (sub-resource) | Which service the endpoint targets (blob, sqlServer, …) | On the PE | One PE = one sub-resource; wrong ID = wrong service |
| Private Link service | Your service exposed privately to consumer VNets | Behind your Standard LB | Provider side of Private Link (you publish) |
| Private DNS zone | The privatelink.* zone holding the private A record | Resource group, linked to VNets | Maps the public FQDN to the private IP |
| privatelink.* name | The exact zone name per service (e.g. privatelink.blob.core.windows.net) | The zone’s name | Must match the service or resolution fails |
| privateDnsZoneGroup | Auto-manages the A record for a PE | On the PE | Prevents stale-record drift; the safe default |
| Virtual network link | Connects a Private DNS zone to a VNet | On the zone | A VNet resolves the zone only if linked |
| 168.63.129.16 | Azure platform DNS resolver (VNet-local) | Every VNet | Returns the private A record; not reachable on-prem |
| DNS Private Resolver | Managed DNS forwarder with inbound/outbound endpoints | A subnet in the hub | Lets on-prem and spokes resolve private zones |
| Public network access | The service-firewall switch for the public endpoint | On the PaaS resource | Must be Disabled to truly close the public path |
| Data exfiltration | Copying data to an attacker’s PaaS account | Threat model | Private Link + policy blocks egress to other tenants |

The fastest way to internalise the model is to nail down what each object is not — every one of these confusions is a real ticket:

| Belief that causes outages | Why it’s wrong | The correct mental model |
| --- | --- | --- |
| “A Private Endpoint overrides the public DNS name.” | The PE only creates a NIC + private IP; it touches no DNS by itself. | You must create the privatelink zone and link it; the PE just provides the IP the record points at. |
| “One Private Endpoint secures the whole storage account.” | A PE binds to one sub-resource (group ID), not the account. | One PE + one zone per sub-resource (blob, file, queue, …) you actually use. |
| “Disabling public access makes it private.” | It only shuts the public door; private resolution is separate. | Private DNS must already return the private IP before you disable public, or you self-inflict an outage. |
| “The hub VNet resolving means the spokes resolve.” | Each VNet resolves only the zones linked to it. | Link the zone (or point DNS at a resolver) for every spoke that dials the PaaS name. |
| “168.63.129.16 works everywhere.” | It answers only inside a VNet and is unreachable across ExpressRoute/VPN. | On-prem needs a forwarder to an in-Azure resolver; it cannot hit the platform IP directly. |
| “A correct-looking A record means it’s fine.” | A manually created record drifts when the PE is re-created with a new IP. | Use a privateDnsZoneGroup; only it stays in sync with the endpoint’s lifecycle. |

And because the four objects only work as a set, here is exactly what must exist for each outcome — read it as a truth table for “why is the answer wrong”:

| PE exists? | Zone created? | Zone linked to client VNet? | A record (zone-group)? | Public access | What the client gets |
| --- | --- | --- | --- | --- | --- |
| No | n/a | n/a | n/a | Enabled | Public IP: no private path at all |
| Yes | No | n/a | n/a | Enabled | Public IP: endpoint unused, egress public |
| Yes | Yes | No | Yes | Enabled | Public IP: zone invisible to this VNet |
| Yes | Yes | Yes | No | Enabled | NXDOMAIN on the privatelink name (by default there is no fallback to public DNS) |
| Yes | Yes | Yes | Yes | Enabled | Private IP: works, but exfil door still open |
| Yes | Yes | Yes | Yes | Disabled | Private IP: works and fully locked (the goal) |
| Yes | Yes | No | Yes | Disabled | Timeout: resolves public, public is closed (the classic outage) |
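
The truth table can also be read as a small decision function. The following is a minimal sketch in Python, not anything the platform exposes: the function name and the outcome strings are made up for illustration, and it deliberately ignores routing, NSGs and on-premises clients.

```python
def client_outcome(pe_exists, zone_created, zone_linked, a_record, public_enabled):
    """Return what a client in the VNet sees for one row of the truth table.

    Hypothetical helper: the inputs mirror the table columns, nothing more.
    """
    # The private zone is only consulted when the endpoint exists and the
    # zone is both created and linked to the client's VNet.
    private_zone_consulted = pe_exists and zone_created and zone_linked

    if not private_zone_consulted:
        # The client resolves the public IP; what happens next depends
        # on whether the public door is still open.
        if public_enabled:
            return "public IP (egress over the public path)"
        return "timeout (public IP, but public access is disabled)"

    if not a_record:
        # The linked zone answers for the name but holds no record.
        return "NXDOMAIN on the privatelink name"

    if public_enabled:
        return "private IP (works, public path still open)"
    return "private IP (works, public path closed)"


if __name__ == "__main__":
    # The classic outage: zone exists but is not linked, public disabled.
    print(client_outcome(True, True, False, True, False))
```

Reading it top to bottom matches the diagnostic order: first ask whether the private zone is consulted at all, then whether it holds a record, and only then whether the public path is closed.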

Group IDs and privatelink zone names — the canonical reference

Two pieces of trivia decide whether a Private Endpoint works at all: the group ID (which sub-resource the endpoint targets) and the exact privatelink zone name (where the A record lives). Get either wrong and the endpoint deploys cleanly but never resolves or never connects. There is no way to “figure these out” at the keyboard — you look them up. This is that lookup. Treat it as the single most-referenced table in the article.

| Service | Group ID (--group-id) | Private DNS zone name | Public FQDN pattern |
| --- | --- | --- | --- |
| Azure SQL Database | sqlServer | privatelink.database.windows.net | *.database.windows.net |
| Azure Synapse (SQL) | Sql | privatelink.sql.azuresynapse.net | *.sql.azuresynapse.net |
| Storage (Blob) | blob | privatelink.blob.core.windows.net | *.blob.core.windows.net |
| Storage (File) | file | privatelink.file.core.windows.net | *.file.core.windows.net |
| Storage (Queue) | queue | privatelink.queue.core.windows.net | *.queue.core.windows.net |
| Storage (Table) | table | privatelink.table.core.windows.net | *.table.core.windows.net |
| Storage (Data Lake Gen2) | dfs | privatelink.dfs.core.windows.net | *.dfs.core.windows.net |
| Storage (Static Web) | web | privatelink.web.core.windows.net | *.web.core.windows.net |
| Key Vault | vault | privatelink.vaultcore.azure.net | *.vault.azure.net |
| Cosmos DB (Core/SQL) | Sql | privatelink.documents.azure.com | *.documents.azure.com |
| Cosmos DB (MongoDB) | MongoDB | privatelink.mongo.cosmos.azure.com | *.mongo.cosmos.azure.com |
| App Service / Functions | sites | privatelink.azurewebsites.net | *.azurewebsites.net |
| Service Bus / Event Hubs | namespace | privatelink.servicebus.windows.net | *.servicebus.windows.net |
| Azure Container Registry | registry | privatelink.azurecr.io | *.azurecr.io (+ regional data) |
| Azure App Configuration | configurationStores | privatelink.azconfig.io | *.azconfig.io |
| Azure Monitor (AMPLS) | azuremonitor | several (privatelink.monitor.azure.com, …) | multiple |

The same lookup for the next tier of services people wire up — AKS, AI, databases and messaging — because guessing these is the same silent failure:

| Service | Group ID (--group-id) | Private DNS zone name | Public FQDN pattern |
| --- | --- | --- | --- |
| AKS API server (private cluster) | management | privatelink.<region>.azmk8s.io | *.azmk8s.io |
| Azure Cache for Redis | redisCache | privatelink.redis.cache.windows.net | *.redis.cache.windows.net |
| Azure Database for PostgreSQL (Flexible) | postgresqlServer | privatelink.postgres.database.azure.com | *.postgres.database.azure.com |
| Azure Database for MySQL (Flexible) | mysqlServer | privatelink.mysql.database.azure.com | *.mysql.database.azure.com |
| Event Grid topic | topic | privatelink.eventgrid.azure.net | *.eventgrid.azure.net |
| Azure Data Factory | dataFactory | privatelink.datafactory.azure.net | *.datafactory.azure.net |
| Azure AI Search | searchService | privatelink.search.windows.net | *.search.windows.net |
| Azure OpenAI / AI Services | account | privatelink.openai.azure.com | *.openai.azure.com |
| Azure Batch | batchAccount | privatelink.<region>.batch.azure.com | *.batch.azure.com |
| SignalR Service | signalr | privatelink.service.signalr.net | *.service.signalr.net |
| Azure Backup (Recovery Vault) | AzureBackup | privatelink.<geo>.backup.windowsazure.com | *.backup.windowsazure.com |
| Azure Web PubSub | webpubsub | privatelink.webpubsub.azure.com | *.webpubsub.azure.com |

When a service isn’t in either table, you discover its group IDs rather than guess — the platform will tell you:

| What you need | Command | Note |
| --- | --- | --- |
| List valid group IDs for a resource type | az network private-link-resource list --id <resourceId> | Returns every sub-resource the service supports |
| The required zone name(s) for a group ID | az network private-link-resource list --id <resourceId> --query "[].properties.requiredZoneNames" | The exact privatelink.* names to create |
| What an existing PE actually targets | az network private-endpoint show … --query "privateLinkServiceConnections[].groupIds" | The group ID you really deployed |
| Records the platform wants to manage | az network private-endpoint show … --query "customDnsConfigs" | FQDNs + IPs the zone-group should hold |

Three reading notes that save the most time:

| Trap | Why it bites | How to avoid it |
| --- | --- | --- |
| Key Vault’s zone is not privatelink.vault.azure.net | The data-plane zone is vaultcore.azure.net (a near-universal typo) | Copy the exact name from this table; a wrong zone resolves nothing |
| Storage needs one PE + one zone per sub-resource | blob and file are different services with different FQDNs | Deploy separate endpoints/zones for each sub-resource you use |
| Some services have multiple FQDNs / regional records | ACR has a regional data endpoint; AMPLS spans several zones | Verify all records resolve privately, not just the primary |
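
Because the zone name cannot be derived by simply prefixing privatelink. to the public suffix (Key Vault is the counter-example), a lookup is safer than string manipulation. The sketch below is a hypothetical Python helper, assuming only a handful of suffix-to-zone pairs copied from the reference tables in this article; it is not an exhaustive or authoritative list.

```python
# Public FQDN suffix -> privatelink zone, copied from the reference tables.
# Deliberately partial; extend it from the tables or from the CLI lookups.
ZONE_BY_SUFFIX = {
    "database.windows.net": "privatelink.database.windows.net",
    "blob.core.windows.net": "privatelink.blob.core.windows.net",
    "file.core.windows.net": "privatelink.file.core.windows.net",
    "dfs.core.windows.net": "privatelink.dfs.core.windows.net",
    "vault.azure.net": "privatelink.vaultcore.azure.net",  # not vault.azure.net
    "azurewebsites.net": "privatelink.azurewebsites.net",
}


def expected_zone(fqdn):
    """Return the privatelink zone for a public FQDN, or None if unknown."""
    name = fqdn.lower().rstrip(".")
    for suffix, zone in ZONE_BY_SUFFIX.items():
        if name.endswith("." + suffix):
            return zone
    return None


if __name__ == "__main__":
    for host in ("kv-shop.vault.azure.net", "stshop.blob.core.windows.net"):
        print(host, "->", expected_zone(host))
```

Returning None for an unknown suffix is intentional: an unlisted service should send you to the discovery commands rather than to a guessed zone name.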

The group ID is also the thing you confirm when an endpoint “exists but doesn’t work” — it may target the wrong sub-resource entirely:

# What sub-resource(s) does this Private Endpoint actually target?
az network private-endpoint show -n pe-sql-prod -g rg-net-prod \
  --query "privateLinkServiceConnections[].groupIds" -o tsv
# Expect: sqlServer   (if this prints 'blob', you built the wrong endpoint)

Building the private path — option by option

Here is the end-to-end build, each step with its choices, defaults, trade-offs and gotchas. The order matters: endpoint → zone → link → zone-group → disable public, validated at each step.

Step 1 — Create the Private Endpoint (and pick the group ID)

The endpoint needs a target resource ID, a group ID, and a subnet to place the NIC. Check the subnet’s privateEndpointNetworkPolicies setting before you deploy: NSGs and UDRs apply to Private Endpoint traffic only when it is enabled, and older subnets may still have it disabled (see the routing section).

# Create a Private Endpoint for Azure SQL (group-id sqlServer)
SQLID=$(az sql server show -n sql-shop-prod -g rg-data-prod --query id -o tsv)
az network private-endpoint create \
  --name pe-sql-prod --resource-group rg-net-prod \
  --vnet-name vnet-hub --subnet snet-privatelink \
  --private-connection-resource-id "$SQLID" \
  --group-id sqlServer \
  --connection-name pe-sql-conn -o table

The same endpoint in Bicep:

resource pe 'Microsoft.Network/privateEndpoints@2023-11-01' = {
  name: 'pe-sql-prod'
  location: location
  properties: {
    subnet: { id: privateLinkSubnetId }
    privateLinkServiceConnections: [ {
      name: 'pe-sql-conn'
      properties: {
        privateLinkServiceId: sqlServerId
        groupIds: [ 'sqlServer' ]   // exactly one sub-resource per endpoint
      }
    } ]
  }
}

The endpoint placement and approval options, each with its trade-off:

| Option | Values | Default | When to change | Trade-off / gotcha |
| --- | --- | --- | --- | --- |
| Group ID | service-specific (table above) | none (required) | per sub-resource | Wrong ID → endpoint targets the wrong service |
| Subnet | any subnet in the VNet | required | dedicate a PE subnet | Mixing PEs with VMs complicates NSG/route design |
| Connection approval | Auto / Manual | Auto (same tenant, owner) | cross-tenant, or governance gate | Manual leaves the PE in Pending until approved |
| Static vs dynamic PE IP | Dynamic / Static | Dynamic | when firewalls pin the IP | Static IP survives re-create; dynamic can change |
| privateEndpointNetworkPolicies | Disabled / Enabled | varies by age | enable to apply NSG/UDR to the PE NIC | Disabled means NSGs/UDRs are ignored on the NIC |

The connection state is the first thing to check if a cross-tenant or governed endpoint isn’t working:

| Connection state | Meaning | What to do |
| --- | --- | --- |
| Approved | Live and serving | Nothing; proceed to DNS |
| Pending | Awaiting manual approval on the resource owner’s side | Approve via az network private-endpoint-connection approve |
| Rejected | Owner declined | Re-request; fix whatever policy rejected it |
| Disconnected | Target resource was deleted/moved | Re-create the endpoint against the current resource |

Step 2 — Create the Private DNS zone (exact name)

The zone name must match the service exactly (from the canonical table). Create it once per service per DNS scope (usually once in the hub).

# Create the Private DNS zone for Azure SQL
az network private-dns zone create \
  --resource-group rg-net-prod \
  --name privatelink.database.windows.net -o table

The same zone in Bicep:

resource zone 'Microsoft.Network/privateDnsZones@2020-06-01' = {
  name: 'privatelink.database.windows.net'   // EXACT — see the canonical table
  location: 'global'                          // Private DNS zones are always 'global'
}

Zone-creation choices and the gotchas:

| Setting | Values | Default | When to change | Gotcha |
| --- | --- | --- | --- | --- |
| Zone name | privatelink.<service> (exact) | required | per service | A wrong name resolves nothing; no error at create time |
| Location | always global | global | never | Private DNS zones are not regional |
| Resource group | any (usually a central DNS RG) | required | centralize in the hub | Scattering zones makes hub-spoke DNS unmanageable |
| Registration vs resolution link | Resolution (for PaaS) | per link | almost always resolution-only | Auto-registration is for VM records, not PaaS PEs |

Step 3 — Link the zone to every VNet that must resolve

A VNet resolves a Private DNS zone only if a virtual-network link exists. In hub-and-spoke, this is the step teams forget for the spokes — the hub resolves, the spokes don’t, and half the estate fails.

# Link the zone to the VNet whose clients must resolve privately
VNETID=$(az network vnet show -n vnet-spoke-app -g rg-net-prod --query id -o tsv)
az network private-dns link vnet create \
  --resource-group rg-net-prod \
  --zone-name privatelink.database.windows.net \
  --name link-spoke-app --virtual-network "$VNETID" \
  --registration-enabled false -o table

The same link in Bicep:

resource link 'Microsoft.Network/privateDnsZones/virtualNetworkLinks@2020-06-01' = {
  parent: zone
  name: 'link-spoke-app'
  location: 'global'
  properties: {
    virtualNetwork: { id: spokeVnetId }
    registrationEnabled: false   // resolution only for PaaS PEs
  }
}

The linking model and its limits — the numbers matter in big estates:

| Link property | Value / limit | Why it matters |
| --- | --- | --- |
| registrationEnabled | false for PaaS PEs | true only when you want VM auto-registration (not here) |
| Links per Private DNS zone | up to ~1,000 | A single zone can serve a very large hub-and-spoke estate |
| A VNet → zones | many | One VNet links to all the privatelink.* zones it needs |
| Cross-subscription links | supported | The zone in the hub can link to spokes in other subscriptions |
| Resolution scope | the linked VNet only | An unlinked VNet resolves the public IP (the #1 spoke bug) |
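
Finding the forgotten spokes is a set difference between the VNets that dial the PaaS name and the VNets the zone is linked to. A minimal Python sketch follows, assuming you have already collected both lists as resource-ID strings (for example from the output of az network private-dns link vnet list); the function name and the sample IDs are hypothetical.

```python
def unlinked_vnets(vnets_needing_resolution, linked_vnet_ids):
    """Return the VNet IDs that need the zone but have no link to it.

    Azure resource IDs are compared case-insensitively.
    """
    linked = {vnet_id.lower() for vnet_id in linked_vnet_ids}
    return sorted(
        vnet_id
        for vnet_id in vnets_needing_resolution
        if vnet_id.lower() not in linked
    )


if __name__ == "__main__":
    # Illustrative, shortened IDs; real ones are full ARM resource IDs.
    needs = ["/vnets/vnet-hub", "/vnets/vnet-spoke-app", "/vnets/vnet-spoke-data"]
    linked = ["/VNETS/vnet-hub", "/vnets/vnet-spoke-app"]
    for missing in unlinked_vnets(needs, linked):
        print("missing link:", missing)
```

Every ID it prints is a VNet whose clients will resolve the public IP for this zone.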

Step 4 — Attach the privateDnsZoneGroup (auto A record) — the safe default

This is the step that makes the whole thing robust. Attaching a privateDnsZoneGroup to the endpoint tells Azure to create and maintain the A record in the named zone, tied to the endpoint’s lifecycle.

# Auto-create + lifecycle-manage the A record for this endpoint
az network private-endpoint dns-zone-group create \
  --resource-group rg-net-prod \
  --endpoint-name pe-sql-prod \
  --name pdzg-sql \
  --private-dns-zone privatelink.database.windows.net \
  --zone-name sql -o table

The same zone group in Bicep:

resource zoneGroup 'Microsoft.Network/privateEndpoints/privateDnsZoneGroups@2023-11-01' = {
  parent: pe
  name: 'pdzg-sql'
  properties: {
    privateDnsZoneConfigs: [ {
      name: 'sql'
      properties: { privateDnsZoneId: zone.id }
    } ]
  }
}

Auto-managed versus manual A record — pick auto every time you can:

| Approach | Record lifecycle | Drift risk | When it’s acceptable | Verdict |
| --- | --- | --- | --- | --- |
| privateDnsZoneGroup (auto) | Created/updated/deleted with the PE | None | Almost always | Default: use this |
| Manual A record (record-set a add-record) | You own it forever | High: stale on re-create | Cross-cloud edge cases, custom zones | Avoid unless forced |
| No record at all | n/a | n/a | Never | Resolution fails (NXDOMAIN) |
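
The drift risk is easiest to see in a toy model. The Python sketch below is purely illustrative and assumes a zone can be represented as a dict of name to IP; recreate_endpoint and is_stale are hypothetical helpers, not Azure APIs.

```python
def recreate_endpoint(zone_records, name, new_ip, has_zone_group):
    """Model a Private Endpoint re-create that lands on a new private IP.

    With a zone group the record follows the endpoint; a manual record
    is left exactly as it was.
    """
    records = dict(zone_records)  # copy so the caller's dict is untouched
    if has_zone_group:
        records[name] = new_ip
    return records


def is_stale(zone_records, name, current_pe_ip):
    """True when the record is missing or points at an old IP."""
    return zone_records.get(name) != current_pe_ip


if __name__ == "__main__":
    before = {"sql-shop-prod": "10.20.1.5"}
    manual = recreate_endpoint(before, "sql-shop-prod", "10.20.1.9", False)
    auto = recreate_endpoint(before, "sql-shop-prod", "10.20.1.9", True)
    print("manual record stale:", is_stale(manual, "sql-shop-prod", "10.20.1.9"))
    print("zone-group record stale:", is_stale(auto, "sql-shop-prod", "10.20.1.9"))
```

The manual record still resolves, which is exactly why the failure is quiet: clients get a private IP that no endpoint answers on.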

After this step, resolution from a linked VNet should return the private IP. Validate before touching public access:

# From a VM inside a linked VNet (NOT Cloud Shell, which isn't in your VNet):
nslookup sql-shop-prod.database.windows.net
# Expect a CNAME to sql-shop-prod.privatelink.database.windows.net → A 10.20.1.5 (private)
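
If you would rather script the check than read nslookup output, the same test can be sketched in Python using only the standard library. This assumes the script runs on a VM inside the linked VNet and that your endpoint IPs come from private address ranges; check_fqdn is a hypothetical helper, and the resolver parameter exists only so the function can be exercised without a network.

```python
import ipaddress
import socket


def check_fqdn(fqdn, resolver=socket.gethostbyname_ex):
    """Resolve fqdn and report, per returned IP, whether it is private.

    resolver must behave like socket.gethostbyname_ex and return
    (hostname, aliases, ip_list).
    """
    _, _, ips = resolver(fqdn)
    return {ip: ipaddress.ip_address(ip).is_private for ip in ips}


if __name__ == "__main__":
    # A stand-in resolver so the example runs anywhere; on a VNet VM,
    # call check_fqdn(name) with the default resolver instead.
    def fake_resolver(name):
        return (name, [], ["10.20.1.5"])

    result = check_fqdn("sql-shop-prod.database.windows.net", fake_resolver)
    for ip, private in result.items():
        print(ip, "private" if private else "PUBLIC: do not disable public access yet")
```

A single public address in the result is enough reason to stop before Step 5.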

Step 5 — Disable public network access (only after private is proven)

Now, and only now, close the public door. Disabling it before DNS resolves privately is the classic self-inflicted outage.

# Azure SQL: disable the public endpoint entirely
az sql server update -n sql-shop-prod -g rg-data-prod \
  --set publicNetworkAccess=Disabled -o table

The public-access switch differs by service in name and granularity:

| Service | How to disable public | Granularity | Note |
| --- | --- | --- | --- |
| Azure SQL | publicNetworkAccess=Disabled | All-or-nothing public | Firewall rules ignored once disabled |
| Storage | --public-network-access Disabled (+ default-action Deny) | Per-account, plus network rules | “Allow trusted services” still applies |
| Key Vault | --public-network-access Disabled | Per-vault | Combine with --default-action Deny |
| Cosmos DB | --public-network-access Disabled | Per-account | Also --ip-range-filter for exceptions |
| App Service | --public-network-access Disabled | Per-app inbound | Use with access restrictions for fine control |

A pre-flight checklist before you flip the switch — each row is an outage you avoid:

| Pre-flight check | Command / portal | Must be true |
| --- | --- | --- |
| PE connection approved | az network private-endpoint show … connectionState | Approved |
| Zone linked to the client’s VNet | az network private-dns link vnet list | A link exists |
| A record present + correct IP | az network private-dns record-set a list | Points at the PE IP |
| Resolution returns private IP | nslookup from a VNet VM | Private IP, not public |
| On-prem clients (if any) resolve private | nslookup from on-prem | Private IP via forwarder |
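
In a pipeline, the checklist works best as a hard gate in front of the disable step. The following Python sketch assumes each check has already been evaluated to a boolean by whatever tooling you use; the check names and the safe_to_disable_public function are hypothetical.

```python
# One key per row of the pre-flight table (names are illustrative).
PREFLIGHT_CHECKS = [
    "pe_connection_approved",
    "zone_linked_to_client_vnet",
    "a_record_matches_pe_ip",
    "vnet_resolves_private",
    "onprem_resolves_private",
]


def safe_to_disable_public(results, has_onprem_clients=True):
    """Return (ok, failed_checks) for a dict of check name -> bool.

    A check that is missing from results counts as failed.
    """
    required = [
        check
        for check in PREFLIGHT_CHECKS
        if has_onprem_clients or check != "onprem_resolves_private"
    ]
    failed = [check for check in required if not results.get(check, False)]
    return (not failed, failed)


if __name__ == "__main__":
    results = {
        "pe_connection_approved": True,
        "zone_linked_to_client_vnet": True,
        "a_record_matches_pe_ip": True,
        "vnet_resolves_private": True,
    }
    ok, failed = safe_to_disable_public(results, has_onprem_clients=True)
    print("safe to disable:", ok, "| failed:", failed)
```

Treating a missing result as a failure is a deliberate choice here: an unverified check should block the change, not pass silently.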

DNS resolution: how the name actually resolves

Understanding the resolution path turns “it doesn’t work” into “I know exactly which hop is wrong.” Here is the chain a VNet client walks, and the three scopes (VNet-only, hub-and-spoke, hybrid) that each change one link in it.

The CNAME chain and the platform resolver

From a client in a linked VNet, dialing sql-shop-prod.database.windows.net:

  1. The client asks Azure DNS (168.63.129.16, the VNet’s default resolver).
  2. The public name CNAMEs to sql-shop-prod.privatelink.database.windows.net.
  3. The resolver checks Private DNS zones linked to this VNet, finds privatelink.database.windows.net, and returns the A record → 10.20.1.5 (private).
  4. The client connects to 10.20.1.5 — the Private Endpoint NIC — over the backbone.

From a network without the zone linked, step 3 has no private zone to consult, the privatelink name resolves via public DNS to a public IP, and the client connects publicly (or fails if public is disabled). The entire difference is whether the resolving VNet has the link.
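
The chain can be modelled in a few lines to show that the link is the only variable. This Python sketch is a simplification under stated assumptions: the public side is collapsed to two records, 203.0.113.10 is a documentation-range placeholder standing in for the real public IP, and resolve is a hypothetical function rather than how the platform resolver is implemented.

```python
# Public DNS as every client sees it (simplified to two records).
PUBLIC_DNS = {
    "sql-shop-prod.database.windows.net":
        ("CNAME", "sql-shop-prod.privatelink.database.windows.net"),
    "sql-shop-prod.privatelink.database.windows.net":
        ("A", "203.0.113.10"),  # placeholder for the public IP
}

# Private DNS zones and their A records.
PRIVATE_ZONES = {
    "privatelink.database.windows.net": {"sql-shop-prod": "10.20.1.5"},
}


def resolve(name, linked_zones):
    """Resolve name as a client whose VNet is linked to linked_zones.

    Returns an IP string, or None to model NXDOMAIN.
    """
    for _ in range(8):  # bound the CNAME chain
        for zone in linked_zones:
            if name.endswith("." + zone):
                # A linked private zone is authoritative for this name.
                label = name[: -len(zone) - 1]
                return PRIVATE_ZONES.get(zone, {}).get(label)
        record_type, value = PUBLIC_DNS[name]
        if record_type == "A":
            return value
        name = value  # follow the CNAME
    raise RuntimeError("CNAME chain too long")


if __name__ == "__main__":
    fqdn = "sql-shop-prod.database.windows.net"
    print("linked VNet:  ", resolve(fqdn, ["privatelink.database.windows.net"]))
    print("unlinked VNet:", resolve(fqdn, []))
```

Both calls follow the same CNAME; only the presence of the linked zone changes which A record answers.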

Each hop in that chain has its own failure and its own one-line check — when resolution is wrong you walk this table top to bottom and stop at the first surprise:

| Hop | What happens | Goes wrong when… | Confirm at this hop |
| --- | --- | --- | --- |
| 1. Client → resolver | Query goes to 168.63.129.16 (or custom DNS) | VNet DNS overridden to an on-prem server with no forwarder | Get-DnsClientServerAddress / check VNet DNS settings |
| 2. Public name → CNAME | …database.windows.net CNAMEs to …privatelink.… | Nothing usually; this CNAME is platform-managed | nslookup -type=cname <fqdn> shows the privatelink target |
| 3. privatelink → zone | Resolver checks zones linked to this VNet | Zone not created, or not linked to this VNet | az network private-dns link vnet list |
| 4. Zone → A record | The privatelink name returns the private A record | No privateDnsZoneGroup, so no record exists (NXDOMAIN) | az network private-dns record-set a list |
| 5. A record → correct IP | Record holds the current PE NIC IP | Manual record drifted after a PE re-create | Compare record IP vs customDnsConfigs IP |

The resolution outcomes you’ll see, and what each tells you:

| nslookup result | What it means | Verdict |
| --- | --- | --- |
| CNAME → *.privatelink.* → private A record | Zone linked, record present, working | Correct |
| Resolves straight to a public IP | Zone not linked to this VNet (or not created) | Link the zone here |
| NXDOMAIN on the privatelink name | Zone exists but no A record | Add privateDnsZoneGroup |
| Private A record with the wrong IP | Stale manual record after PE re-create | Switch to auto zone-group |
| Resolves private on a VM but public from on-prem | No cross-prem forwarder to a resolver | Add conditional forwarder |

The three resolution scopes differ in exactly one variable — who needs to reach the zone — and that drives every design choice below:

| Dimension | Scope A: single VNet | Scope B: hub-and-spoke | Scope C: hybrid (on-prem) |
| --- | --- | --- | --- |
| Who resolves | One VNet’s clients | All spokes’ clients | On-prem + all spokes |
| Where zones live | That VNet’s RG | Centralized in the hub | Centralized in the hub |
| What links the zone | One VNet link | A link per spoke (or central resolver) | Central resolver + per-VNet links |
| Resolver needed? | No (168.63.129.16 suffices) | Optional (links suffice) | Yes (DNS Private Resolver inbound) |
| On-prem story | None | None | Conditional forwarders → resolver |
| Automation lever | Manual is fine | Azure Policy auto-link | Policy + resolver IaC |
| Typical scale | Lab / single app | Enterprise landing zone | Regulated hybrid estate |

Scope A — single VNet (the simple case)

One VNet, the zone linked to it, the zone-group on the endpoint. 168.63.129.16 does everything. Nothing else required. This is the lab and the small-deployment case.

Scope B — hub-and-spoke (the common enterprise case)

Centralize the privatelink.* zones in the hub and link them to every spoke that resolves PaaS. There is no need for a DNS server in the hub for VNet clients — peering plus the per-spoke links is enough, because each spoke’s own 168.63.129.16 consults zones linked to that spoke. (A common refinement is to point all VNets at a central DNS resolver so on-prem and custom DNS share one path — see Scope C.)

# Link the central zone to each spoke (run per spoke, or loop in a pipeline)
for SPOKE in vnet-spoke-app vnet-spoke-data vnet-spoke-web; do
  VID=$(az network vnet show -n $SPOKE -g rg-net-prod --query id -o tsv)
  az network private-dns link vnet create -g rg-net-prod \
    --zone-name privatelink.database.windows.net \
    --name link-$SPOKE --virtual-network "$VID" --registration-enabled false
done

Hub-and-spoke DNS design choices:

| Design choice | Option A | Option B | Recommendation |
| --- | --- | --- | --- |
| Where the zones live | One set in the hub | Per-spoke duplicates | Hub: single source of truth |
| How spokes resolve | Per-spoke link to hub zones | Custom DNS → resolver | Either; resolver scales better with on-prem |
| Who creates the link | Manual per spoke | Azure Policy auto-link | Policy: app teams forget links |
| New PaaS service added | Add zone once, links auto via policy | Add zone + N links by hand | Policy-driven zone management |

Scope C — hybrid (on-premises clients)

On-premises clients cannot reach 168.63.129.16. To resolve privatelink.* privately from on-prem, deploy an Azure DNS Private Resolver in the hub with an inbound endpoint, and configure your corporate DNS to conditionally forward the public PaaS suffixes to that inbound endpoint’s IP. The resolver, being in Azure, can consult the linked Private DNS zones and return the private answer to on-prem.

# DNS Private Resolver inbound endpoint IP becomes the conditional-forward target
# (requires the dns-resolver CLI extension; the resolver dnspr-hub must already exist)
az dns-resolver inbound-endpoint create \
  --dns-resolver-name dnspr-hub --resource-group rg-net-prod \
  --name inbound --location centralindia \
  --ip-configurations "[{private-ip-allocation-method:'Dynamic',id:'<inbound-subnet-id>'}]" \
  -o table
# On-prem DNS: conditional-forward database.windows.net (etc.) → this inbound IP

The hybrid resolution options compared:

| Option | What it is | Pros | Cons / cost |
| --- | --- | --- | --- |
| DNS Private Resolver | Managed inbound/outbound DNS endpoints | No VM to patch, HA built-in, scales | Hourly per endpoint + per-query |
| DNS forwarder VM(s) | IaaS VM running DNS, forwarding to 168.63.129.16 | Full control, familiar | You patch/HA/scale it yourself |
| Per-spoke links only | No on-prem story | Simple for VNet-only | On-prem clients still resolve public |

The conditional forwarders you configure on-prem, one per service suffix you use:

| On-prem conditional-forward zone | Forwards to | For which service |
| --- | --- | --- |
| database.windows.net | Resolver inbound IP | Azure SQL |
| blob.core.windows.net | Resolver inbound IP | Storage (blob) |
| vault.azure.net (and vaultcore.azure.net) | Resolver inbound IP | Key Vault |
| azurewebsites.net | Resolver inbound IP | App Service |

Forward the public suffix, not the privatelink one; the CNAME chain handles the rest.
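As an illustration of what those forwarders look like on the on-prem side, the sketch below prints forward-only zone stanzas in BIND syntax. This assumes the corporate DNS runs BIND, which the article does not state (on Windows DNS you would create conditional forwarders instead), and it uses 10.10.9.4 as the inbound endpoint IP purely as an example value. The function names are hypothetical.

```bash
# Print a BIND-style forward-only zone stanza.
# $1 = public PaaS suffix, $2 = DNS Private Resolver inbound endpoint IP
forward_zone() {
  printf 'zone "%s" {\n    type forward;\n    forward only;\n    forwarders { %s; };\n};\n' "$1" "$2"
}

# Print one stanza per suffix.
# $1 = inbound endpoint IP, remaining arguments = public suffixes to forward
forward_zones() {
  _resolver_ip=$1
  shift
  for _suffix in "$@"; do
    forward_zone "$_suffix" "$_resolver_ip"
  done
}
```

For example, `forward_zones 10.10.9.4 database.windows.net blob.core.windows.net` emits two stanzas that could be reviewed and then included from named.conf. Note that it forwards the public suffixes, never the privatelink ones.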

Routing and NSGs: when DNS is right but traffic still won’t flow

Once nslookup returns the private IP, DNS is exonerated — any remaining failure is routing or filtering on the Private Endpoint’s leg. This is the second-largest bucket of Private Link incidents and the one most often misattributed to DNS.

Forced tunneling and the 0.0.0.0/0 blackhole

In hub-and-spoke with a central firewall, a UDR sends 0.0.0.0/0 to the firewall. If the Private Endpoint’s subnet inherits that route, the return traffic (or the path to the PE) can be black-holed or asymmetrically routed through the firewall, which may drop it. The PE NIC’s effective routes tell the truth:

# Effective routes on the PE NIC — look for a 0.0.0.0/0 to a firewall that shouldn't apply
NICID=$(az network private-endpoint show -n pe-sql-prod -g rg-net-prod \
  --query "networkInterfaces[0].id" -o tsv)
az network nic show-effective-route-table --ids "$NICID" -o table

Read the next-hop column against this decision table — it tells you instantly whether the PE leg is healthy or hijacked:

| If the 0.0.0.0/0 next-hop is… | It means… | For the PE leg, do this |
| --- | --- | --- |
| VnetLocal / Internet (system) | No forced tunnel; default egress | Nothing; the leg is fine |
| VirtualAppliance (firewall IP) | Forced tunnel applies to this subnet | Add a /32 route for the PE IP as VnetLocal, or exclude the prefix |
| VirtualNetworkGateway | Routes pushed from on-prem/VPN | Confirm the PE prefix isn’t advertised back on-prem (asymmetry) |
| None (route present, no hop) | Traffic to that prefix is dropped | A blackhole route is shadowing the PE; remove or scope it |
| A more-specific /32 to VnetLocal for the PE IP | Your fix is in place | Confirmed healthy; PE bypasses the firewall |

The routing failure modes on the PE leg:

| Symptom | Root cause | Confirm | Fix |
| --- | --- | --- | --- |
| Resolves private, connection times out | 0.0.0.0/0 UDR blackholes the PE return path | show-effective-route-table shows the route | Add a /32 (PE IP) route as VnetLocal, or exclude from forced tunnel |
| Works from hub, fails from spoke | Spoke has the UDR but no return path | Effective routes on the spoke side | Symmetric routing; route the PE prefix locally |
| Intermittent / asymmetric | Firewall sees one direction only | Firewall flow logs | Ensure both directions traverse the same path or neither |
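The reason a /32 route rescues the PE leg is longest-prefix match: the most specific route covering the destination wins. The sketch below models only that rule over a list of "PREFIX NEXTHOP" lines, such as you might copy out of an effective route table. It does not model how Azure breaks ties between route sources of equal prefix length, and the function names are illustrative.

```bash
# Convert a dotted IPv4 address to an integer (subshell keeps IFS local).
ip_to_int() (
  set -f; IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# in_prefix IP CIDR -> exit status 0 if IP falls inside CIDR
in_prefix() {
  _ip=$(ip_to_int "$1")
  _net=$(ip_to_int "${2%/*}")
  _len=${2#*/}
  # A /0 matches everything; otherwise compare the top _len bits.
  [ "$_len" -eq 0 ] && return 0
  _shift=$(( 32 - _len ))
  [ $(( _ip >> _shift )) -eq $(( _net >> _shift )) ]
}

# Read "PREFIX NEXTHOP" lines on stdin; print the most specific match for IP $1.
winning_route() {
  _best_len=-1
  _best=""
  while read -r _prefix _hop; do
    [ -n "$_prefix" ] || continue
    if in_prefix "$1" "$_prefix"; then
      _l=${_prefix#*/}
      if [ "$_l" -gt "$_best_len" ]; then
        _best_len=$_l
        _best="$_prefix $_hop"
      fi
    fi
  done
  echo "$_best"
}
```

Fed the two lines `0.0.0.0/0 VirtualAppliance` and `10.20.1.5/32 VnetLocal`, `winning_route 10.20.1.5` prints the /32 route, while any other destination still falls through to the firewall.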

NSGs on the Private Endpoint subnet

Historically NSGs and UDRs did not apply to Private Endpoint NICs at all — a frequent source of “my NSG isn’t blocking it” and “my NSG isn’t protecting it” confusion. Modern subnets support applying them when privateEndpointNetworkPolicies is enabled. Know which mode your subnet is in:

| privateEndpointNetworkPolicies | NSG on PE NIC | UDR on PE NIC | Implication |
| --- | --- | --- | --- |
| Disabled (legacy default) | Ignored | Ignored | You can’t filter the PE; forced-tunnel doesn’t catch it |
| NetworkSecurityGroupEnabled | Applied | Ignored | NSG can allow/deny the port to the PE |
| RouteTableEnabled | Ignored | Applied | UDRs steer PE traffic (forced tunnel applies) |
| Enabled | Applied | Applied | Full control; modern recommended setting |

# Turn on full network policies for the PE subnet so NSG + UDR apply
az network vnet subnet update -g rg-net-prod --vnet-name vnet-hub \
  --name snet-privatelink --private-endpoint-network-policies Enabled -o table

The ports each service’s Private Endpoint needs open (if you do apply an NSG):

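| Service | Port(s) the PE serves | Protocol |
| --- | --- | --- |
| Azure SQL | 1433 (TDS) | TCP |
| Storage (blob/file/…) | 443 (445 for Azure Files over SMB) | TCP |
| Key Vault | 443 | TCP |
| Cosmos DB | 443 (gateway mode); direct mode over a PE needs a much wider TCP range, so check the SDK connectivity docs | TCP |
| App Service | 443 | TCP |
| Service Bus / Event Hubs | 443 / 5671–5672 (AMQP) | TCP |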

Data exfiltration: the security reason this exists

Disabling the public endpoint and going private is partly about the path (compliance), but the deeper security win is data-exfiltration control. A public storage endpoint lets a compromised VM azcopy your data to the attacker’s storage account, because outbound to *.blob.core.windows.net is allowed wholesale — your firewall guards your account, not the service namespace. Private Link, combined with restricting outbound, changes the calculus.

The exfiltration paths and what closes each:

| Exfiltration path | Open by default? | What closes it |
| --- | --- | --- |
| Copy to attacker’s storage over public blob endpoint | Yes | Restrict outbound to only your PE; egress firewall on *.blob.* |
| Read your data over your public endpoint | Yes (if firewall allows) | publicNetworkAccess=Disabled + Private Endpoint |
| DNS exfiltration / unexpected resolution | Possible | Central DNS + monitoring of zone queries |
| SAS-token leak used from anywhere | Yes | Combine PE with stored-access-policy + IP/PE scoping |

The layered controls, from weakest to strongest, so you know where Private Link sits:

| Control | Protects | Strength | Gap it leaves |
| --- | --- | --- | --- |
| Service firewall (IP allow-list) | Your account from unknown IPs | Weak | Path still public; SAS from allowed IP still works |
| Service Endpoint | Your account from your VNet | Medium | Service still has a public IP; no exfil-to-other-tenant block |
| Private Endpoint + private DNS | Path + your account | Strong | Needs DNS done right; per-endpoint cost |
| PE + public Disabled + egress firewall | Path + account + exfil to other tenants | Strongest | Most setup; central egress inspection |

Mapped to the way an attacker actually moves data out — and how you both detect and block each — the picture is concrete:

| Attacker technique | What they exploit | How to detect it | Control that blocks it |
| --- | --- | --- | --- |
| azcopy to attacker storage | Outbound to *.blob.core.windows.net allowed wholesale | Firewall flow logs to unknown storage FQDNs | Egress firewall: allow only your PE prefixes / FQDNs |
| Read over your public endpoint | publicNetworkAccess still Enabled | Storage/SQL diagnostic logs show public source IPs | publicNetworkAccess=Disabled + --default-action Deny |
| Stolen SAS replayed externally | SAS valid from any IP | Storage analytics: SAS auth from off-net IPs | Stored-access policy + IP/PE scoping; short expiry |
| DNS-tunnel / rogue zone record | Edit rights on the Private DNS zone | Activity log on zone record changes | RBAC zone tightly; alert on record-set writes |
| Cross-tenant Private Endpoint | Approving a PE from another tenant | Pending PE connections from unknown subs | Private Link service auto-approval allow-list |
| Hairpin via mis-routed UDR | Forced tunnel exfiltrating PE traffic | Effective routes show firewall hop on PE | /32 local route + egress inspection on the firewall |
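The difference between allowing *.blob.core.windows.net wholesale and allowing only your own accounts can be shown in a few lines. This is a toy model of the egress decision, not a firewall configuration; the account names in `ALLOWED_STORAGE` and the function name are hypothetical.

```bash
# Hypothetical allow-list: only these storage accounts belong to you.
ALLOWED_STORAGE="ststatementsprod stauditprod"

# Print allow / deny for a blob FQDN, or not-blob for anything else.
egress_verdict() {
  _fqdn=$1
  case $_fqdn in
    *.blob.core.windows.net) ;;            # a blob endpoint: check the account
    *) echo "not-blob"; return 0 ;;
  esac
  _account=${_fqdn%%.*}                    # first DNS label is the account name
  for _a in $ALLOWED_STORAGE; do
    if [ "$_a" = "$_account" ]; then
      echo "allow"
      return 0
    fi
  done
  echo "deny"                              # someone else's account: the exfil path
}
```

A wildcard rule would pass both `ststatementsprod.blob.core.windows.net` and an attacker-owned account; the allow-list passes only the first, which is the property the egress firewall row in the table relies on.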

For the storage-specific firewall, SAS and RBAC interplay — the most common 403 maze on top of Private Link — see Fixing Azure Storage 403 Errors: Firewalls, Private Endpoints, RBAC & SAS. For secret-store specifics, Azure Key Vault: Secrets, Keys and Certificates Done Right covers the vault firewall and trusted-services angle.

Limits, quotas and the numbers that bite

Real numbers you size against and hit in big estates:

| Resource / limit | Value (approx) | Why it matters |
| --- | --- | --- |
| Private Endpoints per VNet | ~1,000 | Large estates with many services can approach this |
| Private Endpoints per subnet | bounded by subnet IP space | Each PE consumes one IP; size the subnet generously |
| Private DNS zones per subscription | ~1,000 | One privatelink.* per service; estates stay well under |
| Records per Private DNS zone | ~25,000 | Effectively unbounded for PE use |
| VNet links per Private DNS zone | ~1,000 | Caps how many spokes one zone serves directly |
| Group IDs per Private Endpoint | 1 (effectively) | One sub-resource per endpoint: the core constraint |
| DNS Private Resolver inbound/outbound endpoints | small per-resolver cap | Plan endpoints per hub region |
| PE NIC IP allocation | Dynamic or Static | Static survives re-create; dynamic can shift |
| customDnsConfigs entries per PE | 1+ (service-dependent) | ACR/AMPLS emit several FQDNs the zone-group must cover |
| Conditional forwarders per resolver ruleset | ~25 per ruleset | Cap on how many PaaS suffixes one ruleset forwards |
| DNS Private Resolver QPS (inbound) | high, per-endpoint | Sized for estate-wide resolution, not a bottleneck in practice |

The same limits, but framed as the planning question each one forces — this is how you turn a number into a subnet size or an endpoint count:

| Planning question | Driven by limit | Rule of thumb |
| --- | --- | --- |
| How big should the PE subnet be? | One IP per PE; PEs per subnet | Size for 2–3× current PE count; a /26 is comfortable for most |
| How many zones do I create? | One privatelink.* per sub-resource | Enumerate sub-resources in use; typically 3–8 zones |
| Can one zone serve the whole estate? | ~1,000 VNet links per zone | Yes for nearly everyone; a single hub zone set scales |
| Do I need a second resolver? | Per-resolver endpoint cap; region locality | One resolver per hub region; co-locate with the firewall |
| How many endpoints will I run? | One PE per sub-resource per VNet scope | Count = (services × sub-resources used), not service families |
| Will data-processing cost dominate? | Per-GB through the PE | Yes for large blob/data-lake transfers; model against throughput |
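The subnet-sizing rule of thumb reduces to a little arithmetic. Azure reserves five addresses in every subnet, so a /N subnet has 2^(32−N) − 5 usable IPs, and /29 is the smallest subnet it accepts. The helper names below are illustrative, and the default headroom of 3× is simply the upper end of the 2–3× guidance above.

```bash
# Usable IPs in an Azure subnet of the given prefix length (5 are reserved).
usable_ips() {
  echo $(( (1 << (32 - $1)) - 5 ))
}

# Smallest subnet (largest prefix length) that fits COUNT endpoints with headroom.
# $1 = current PE count, $2 = headroom multiplier (default 3)
min_prefix() {
  _need=$(( $1 * ${2:-3} ))
  _p=29                                   # /29 is the smallest subnet Azure allows
  while [ "$_p" -gt 0 ] && [ "$(usable_ips "$_p")" -lt "$_need" ]; do
    _p=$(( _p - 1 ))
  done
  echo "$_p"
}
```

`usable_ips 26` prints 59, and `min_prefix 15 3` prints 26: fifteen endpoints with 3× headroom need 45 addresses, which a /27 (27 usable) cannot hold but a /26 can.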

The error and status strings you’ll actually see, what they mean, and the fix:

| Symptom / string | Where it appears | Likely cause | Fix |
| --- | --- | --- | --- |
| A network-related or instance-specific error (SQL) | App / sqlcmd | Resolves public IP, public disabled | Link the zone; confirm nslookup |
| Connection timeout, no error detail | Any client | Stale/missing A record or route blackhole | Check record + effective routes |
| NXDOMAIN on *.privatelink.* | nslookup | Zone exists, no record | Attach privateDnsZoneGroup |
| 403 AuthorizationFailure (storage) | Storage SDK | Firewall denies (PE leg not used) or RBAC | Confirm private resolution; check RBAC/firewall |
| PE stuck Pending | Portal / az … show | Manual approval not granted | Approve the connection |
| On-prem fails, Azure works | Split testing | No conditional forwarder to resolver | Add forwarder for the public suffix |
| Wrong service responds / cert mismatch | Client TLS error | Wrong group ID on the endpoint | Re-create PE with the correct sub-resource |

Architecture at a glance

The diagram traces the request exactly as it resolves and flows, then marks where the path silently breaks. Read it left to right. On the far left, an on-premises DNS server conditionally forwards privatelink-suffixed queries into Azure (badge 5 — the hybrid forwarder gap, because 168.63.129.16 is not reachable cross-premises). In the consumer VNet, the application dials the same connection string it always used (mydb…database.windows.net) and a DNS Private Resolver inbound endpoint (10.10.9.4) handles resolution for both spokes and on-prem. The query lands on name resolution: the privatelink.database.windows.net Private DNS zone (badge 1 — if it isn’t linked to this VNet, the client gets the public IP and times out) and its auto-managed A record → 10.20.1.5 (badge 2 — missing or stale if you skipped the privateDnsZoneGroup). With the private IP in hand, the client opens a TDS 1433 connection to the Private Endpoint NIC at 10.20.1.5 (group ID sqlServer), guarded by an NSG/UDR (badge 3 — a 0.0.0.0/0 forced-tunnel route or a dropped port black-holes the leg even when DNS is perfect). From the endpoint, traffic crosses the Microsoft backbone to Azure SQL with public access Disabled (badge 4 — if you never disabled it, the data is private but the exfiltration door is still open).

The lesson the diagram teaches is the diagnostic order: resolve first, route second. Every failure is one numbered hop. If nslookup returns a public IP you are at badge 1 or 5 (a missing link or a missing forwarder); if it returns a private IP but the connection still times out you are at badge 3 (routing) or badge 2 (a stale record pointing at the wrong NIC); and badge 4 is the security check you run after connectivity works, never before. The whole method is: run one resolution check, land on a badge, apply its fix.
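The resolve-first, route-second order can be written down as a small decision function. This is a sketch of the triage logic described above, not a diagnostic tool: it takes what you observed as two words and prints the badge to look at. The function name and the input vocabulary are made up for this example, and mapping NXDOMAIN to badge 2 is an inference from the missing-record case.

```bash
# $1 = resolution result: public | private | nxdomain
# $2 = connection result: ok | timeout
triage() {
  case "$1:$2" in
    public:*)
      echo "badge 1 or 5: link the zone to this VNet, or add the on-prem forwarder" ;;
    nxdomain:*)
      echo "badge 2: no A record, attach a privateDnsZoneGroup" ;;
    private:timeout)
      echo "badge 3 or 2: check effective routes and NSG, then compare the record IP with the PE NIC IP" ;;
    private:ok)
      echo "badge 4: connectivity works, now confirm public access is Disabled" ;;
    *)
      echo "unknown input"
      return 1 ;;
  esac
}
```

For example, `triage private timeout` points at routing first and a stale record second, which matches the order in which you would run the effective-route and record-set checks.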

Azure Private Link and Private DNS resolution path: an on-premises DNS server conditionally forwarding privatelink queries into Azure, a consumer VNet where an application using its original connection string and a DNS Private Resolver inbound endpoint resolve the name, a name-resolution zone showing the privatelink.database.windows.net Private DNS zone and its auto-managed A record to private IP 10.20.1.5, a Private Endpoint NIC (group ID sqlServer) on TDS port 1433 guarded by an NSG and UDR, and Azure SQL with public network access disabled reached over the Microsoft backbone — with five numbered failure badges for resolves-to-public-IP, missing-or-stale-A-record, NSG-or-UDR-blackhole, public-path-still-open, and hybrid-forwarder-gap

Real-world scenario

Meridian Bank runs a customer-statements API on Azure App Service (Central India) backed by Azure SQL and an Azure Storage account holding generated PDF statements. A regulator audit mandated that no customer data traverse the public internet and that the storage account not be reachable publicly. The platform team — five engineers — owned a hub-and-spoke network: one hub VNet, six spoke VNets (app, data, integration, two test, one shared-services), an Azure Firewall in the hub with a 0.0.0.0/0 forced-tunnel UDR on the spokes, and roughly 40 on-premises analyst workstations that queried the SQL database directly for reporting.

The rollout looked done in an afternoon. The data team created a Private Endpoint for the SQL server (group ID sqlServer) and one for the storage blob sub-resource, created the two privatelink zones, linked them to the app spoke, and flipped publicNetworkAccess=Disabled on both. They tested from an app-spoke VM — nslookup returned the private IPs, the API worked — and declared victory at 17:00.

Three failures surfaced over the next eighteen hours.

First, at 17:40 the integration spoke’s nightly reconciliation job started timing out against SQL. The zone was linked to the app spoke but not the integration spoke, so its clients resolved the now-disabled public IP. nslookup from an integration VM returned a public IP: badge 1. Fix: link both zones to every spoke (they scripted it).

Second, at 02:15 the storage path failed even though SQL worked from the same spoke. They had created the blob endpoint but the statements service also wrote to file shares, a different sub-resource needing its own endpoint and privatelink.file.core.windows.net zone. The blob endpoint resolved; the file FQDN resolved public and was now firewalled off. This is the “one PE per sub-resource” trap. Fix: a second endpoint and zone for file.

Third, and the slowest to find, at 09:00 the 40 on-prem analysts all failed to connect to SQL. Their corporate DNS had no idea about the privatelink zone, so they resolved the public IP. The team’s first instinct was “open the firewall”, which is exactly wrong. The correct fix was a DNS Private Resolver in the hub with an inbound endpoint, and a conditional forwarder on the corporate DNS for database.windows.net and *.core.windows.net pointing at the resolver’s inbound IP. nslookup from a workstation then returned the private IP: the query travelled over ExpressRoute to the resolver, which consulted the zone, and the connection then reached the endpoint privately.

A fourth, quieter issue emerged in week two during a routing review: the SQL endpoint’s NIC inherited the spoke’s 0.0.0.0/0 forced-tunnel route, and although connectivity worked, return traffic was hairpinning through the firewall, adding ~8 ms and showing up oddly in flow logs. They enabled privateEndpointNetworkPolicies=Enabled on the PE subnet and added a /32 local route for each endpoint IP so the PE legs bypassed the firewall — latency dropped and the asymmetry cleared.

The end state: every spoke linked to both zones (via Azure Policy so new spokes auto-link), separate endpoints for sqlServer, blob and file, a DNS Private Resolver serving on-prem, public access disabled on both services, and PE subnets with full network policies and local routes. Monthly Private Link cost landed around ₹2,400 (six endpoints + resolver), a rounding error against the audit finding it cleared. The lesson on the wall: “Private Endpoint is a five-minute job; Private DNS, on every VNet and on-prem, is the actual project. Resolve before you disable.”

The incident as a timeline, because the order of failures is the lesson:

| Time | Symptom | Root cause | Fix applied |
| --- | --- | --- | --- |
| 17:00 | App spoke works, victory declared | (only app spoke linked) | none |
| 17:40 | Integration job times out to SQL | Zone not linked to integration spoke | Link both zones to every spoke |
| 02:15 | Storage file path fails, blob fine | file is a separate sub-resource/zone | Add PE + zone for file |
| 09:00 | All 40 on-prem analysts fail SQL | On-prem resolves public; no forwarder | DNS Private Resolver + conditional forwarder |
| +1 wk | PE leg hairpins through firewall | 0.0.0.0/0 UDR on PE subnet | privateEndpointNetworkPolicies + /32 local route |

Advantages and disadvantages

The Private Link + Private DNS model both delivers true private PaaS and imposes a real DNS discipline. Weigh it honestly:

| Advantages (why this model wins) | Disadvantages (why it bites) |
| --- | --- |
| True private connectivity: traffic on the Microsoft backbone, public endpoint disabled | DNS is the hard part: hybrid and multi-VNet resolution must be designed, not assumed |
| No code changes: existing connection strings keep working unchanged | A skipped VNet link silently resolves public; the failure is non-obvious |
| Data-exfiltration control: block egress to other tenants’ PaaS, not just your account | Per-endpoint cost: each PE has an hourly + per-GB charge that adds up across services/sub-resources |
| Granular: one endpoint per sub-resource means least-privilege network exposure | The same granularity means more objects (a PE + zone per sub-resource) |
| Lifecycle-safe with privateDnsZoneGroup: the A record can’t drift | Created manually, the A record does drift on re-create, a six-month time bomb |
| Works across subscriptions and tenants (Private Link service) | Cross-tenant adds manual approval state to manage |
| Centralizable in a hub with one zone set for the whole estate | DNS caching can hide a fix or a break for the TTL window, confusing diagnosis |

The model is right for any sensitive PaaS in production, regulated data, and zero-trust estates. It is overkill for a dev sandbox where a Service Endpoint or even the public firewall suffices — and that lighter choice is exactly the Azure Private Endpoint vs Service Endpoint: Secure PaaS Access decision. The disadvantages are all manageable, but only if you treat DNS as the project and the endpoint as the easy part — the inverse of how most teams scope it.

Hands-on lab

Stand up Azure SQL with a Private Endpoint, wire Private DNS, prove private resolution, disable public access, and tear it all down — free-tier-friendly (we use a Basic SQL DB and a small VM; delete at the end). Run in Cloud Shell (Bash), but do the resolution test from the VM, because Cloud Shell is not inside your VNet.

Step 1 — Variables and resource group.

RG=rg-pl-lab
LOC=centralindia
VNET=vnet-pl-lab
SQL=sqlpl$RANDOM           # globally-unique server name
# Use SQLPWD, not PWD: PWD is the shell's own working-directory variable
SQLPWD='P@ssw0rd-'$RANDOM'!'  # lab only — never reuse
az group create -n $RG -l $LOC -o table

Step 2 — VNet with two subnets (one for the VM, one for the PE).

az network vnet create -g $RG -n $VNET --address-prefix 10.50.0.0/16 \
  --subnet-name snet-vm --subnet-prefix 10.50.1.0/24 -o table
az network vnet subnet create -g $RG --vnet-name $VNET \
  --name snet-pe --address-prefix 10.50.2.0/24 \
  --private-endpoint-network-policies Enabled -o table

Step 3 — A SQL server + Basic database, public for now (we’ll lock it).

az sql server create -g $RG -n $SQL -l $LOC \
  --admin-user sqladmin --admin-password "$SQLPWD" -o table
az sql db create -g $RG --server $SQL -n statementsdb \
  --service-objective Basic -o table

Step 4 — Create the Private Endpoint (group ID sqlServer).

SQLID=$(az sql server show -g $RG -n $SQL --query id -o tsv)
az network private-endpoint create -g $RG -n pe-sql \
  --vnet-name $VNET --subnet snet-pe \
  --private-connection-resource-id "$SQLID" \
  --group-id sqlServer --connection-name pe-sql-conn -o table

Step 5 — Private DNS zone, link to the VNet, and the auto A record.

az network private-dns zone create -g $RG -n privatelink.database.windows.net -o table
VID=$(az network vnet show -g $RG -n $VNET --query id -o tsv)
az network private-dns link vnet create -g $RG \
  --zone-name privatelink.database.windows.net \
  --name link-lab --virtual-network "$VID" --registration-enabled false -o table
az network private-endpoint dns-zone-group create -g $RG \
  --endpoint-name pe-sql --name pdzg \
  --private-dns-zone privatelink.database.windows.net --zone-name sql -o table

Confirm the auto-created record points at the PE IP:

az network private-dns record-set a list -g $RG \
  --zone-name privatelink.database.windows.net \
  --query "[].{name:name, ip:aRecords[0].ipv4Address}" -o table
# Expect: a record for the server name → an IP in 10.50.2.0/24

Step 6 — A tiny VM in the VNet to test resolution from inside.

az vm create -g $RG -n vm-test --image Ubuntu2204 \
  --vnet-name $VNET --subnet snet-vm \
  --admin-username azureuser --generate-ssh-keys --size Standard_B1s -o table
az vm run-command invoke -g $RG -n vm-test --command-id RunShellScript \
  --scripts "nslookup $SQL.database.windows.net"

Expected: the output shows a CNAME to $SQL.privatelink.database.windows.net resolving to a private 10.50.2.x address — DNS is working privately.
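If you want the lab to assert that result instead of reading it by eye, a small parser over the nslookup text is enough. The sketch below assumes the usual Linux nslookup layout, where the resolver's own `Address:` line comes before the first `Name:` line; the helper names are illustrative, and 10.50.2.* is the lab's snet-pe range from Step 2.

```bash
# Print the answer addresses from nslookup output on stdin, skipping the
# "Address:" line that belongs to the DNS server itself.
answer_ips() {
  awk '/^Name:/ { in_answer = 1 } in_answer && /^Address:/ { print $2 }'
}

# Exit status 0 if the address sits in the lab's PE subnet (10.50.2.0/24).
in_pe_subnet() {
  case $1 in
    10.50.2.*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Save the nslookup output from the VM to a file, run `answer_ips < file`, and pass each printed address to `in_pe_subnet`; a non-zero status means the name is not resolving to the endpoint and public access should stay enabled until it does.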

Step 7 — Now disable public access (safe, because private resolves).

az sql server update -g $RG -n $SQL --set publicNetworkAccess=Disabled -o table

Re-run the nslookup from the VM (still private) — connectivity from the VNet is unaffected; only the public door is shut.

Validation checklist. You created a Private Endpoint, wired a Private DNS zone with an auto-managed record, proved the name resolves to a private IP from inside the VNet, and only then disabled public access. The mapping of step to lesson:

| Step | What you did | What it proves |
| --- | --- | --- |
| 4 | PE with --group-id sqlServer | The endpoint targets exactly one sub-resource |
| 5 | Zone + link + dns-zone-group | Resolution needs all three, and the record auto-manages |
| 6 | nslookup from the VM | Private resolution is real and VNet-scoped (not Cloud Shell) |
| 7 | Disable public after validating | The correct order that avoids the classic outage |

Cleanup (avoid lingering charges).

az group delete -n $RG --yes --no-wait

Cost note. A Basic SQL DB and a B1s VM for an hour are a few rupees; the Private Endpoint is a fraction of a rupee per hour. Deleting the resource group stops everything. Total lab cost well under ₹50.

Common mistakes & troubleshooting

This is the playbook — the part you bookmark. First as a scannable table you can read mid-incident, then the same entries with full confirm-command detail underneath.

| # | Symptom | Root cause | Confirm (exact cmd / portal path) | Fix |
| --- | --- | --- | --- | --- |
| 1 | App times out right after public access disabled | Private DNS zone not linked to the client’s VNet | nslookup <fqdn> returns a public IP; az network private-dns link vnet list | Create a link vnet to every VNet that resolves |
| 2 | NXDOMAIN on *.privatelink.*, or no private IP | No A record (skipped privateDnsZoneGroup) | az network private-dns record-set a list (empty) | Attach a privateDnsZoneGroup to the PE |
| 3 | Resolves to a private IP but the wrong one | Stale manual A record after PE re-create | record-set a list IP ≠ private-endpoint show NIC IP | Delete manual record; use auto zone-group |
| 4 | Storage blob works, file/queue fails | One PE only covers one sub-resource | private-endpoint show … groupIds shows only blob | Add a PE + zone per sub-resource you use |
| 5 | On-prem clients fail, Azure clients fine | No conditional forwarder to a resolver | nslookup from on-prem returns public; from VM returns private | DNS Private Resolver inbound + on-prem conditional forwarder |
| 6 | Resolves private, connection still times out | 0.0.0.0/0 UDR black-holes the PE leg | nic show-effective-route-table on the PE NIC | /32 local route for the PE IP; or exclude from forced tunnel |
| 7 | Wrong service / TLS cert mismatch on connect | Wrong group ID on the endpoint | private-endpoint show … groupIds ≠ intended | Re-create PE with the correct sub-resource |
| 8 | PE stuck, never serves | Connection in Pending (manual approval) | private-endpoint show … connectionState = Pending | Approve via private-endpoint-connection approve |
| 9 | Key Vault PE resolves nothing | Wrong zone name (vault.azure.net not vaultcore) | private-dns zone list shows the wrong name | Create privatelink.vaultcore.azure.net |
| 10 | Fix applied but still broken for minutes | DNS caching (client/forwarder TTL) | ipconfig /flushdns; compare fresh nslookup | Wait out TTL; flush caches; verify on a fresh client |
| 11 | NSG “isn’t blocking/protecting” the PE | privateEndpointNetworkPolicies Disabled | vnet subnet show … privateEndpointNetworkPolicies | Set to Enabled so NSG/UDR apply |
| 12 | New spoke can’t resolve PaaS | New VNet never linked to the central zones | private-dns link vnet list lacks the new VNet | Link it; better, enforce links via Azure Policy |

The expanded form, with full reasoning for the entries that bite hardest:

1. App times out the moment public access is disabled. Root cause: The Private DNS zone is not linked to the VNet the client resolves from, so it gets the (now firewalled) public IP. Confirm: From a client/VM in that VNet, nslookup <fqdn> returns a public IP; az network private-dns link vnet list -g rg-net-prod --zone-name privatelink.database.windows.net does not list that VNet. Fix: az network private-dns link vnet create … --virtual-network <vnetId> --registration-enabled false for every VNet that must resolve. In hub-and-spoke, that’s all the spokes.

2. NXDOMAIN on the privatelink name, or it never returns a private IP. Root cause: The zone exists and is linked, but there is no A record — you created the endpoint and zone but skipped the privateDnsZoneGroup. Confirm: az network private-dns record-set a list -g rg-net-prod --zone-name privatelink.database.windows.net is empty. Fix: Attach a zone-group: az network private-endpoint dns-zone-group create … --private-dns-zone privatelink.database.windows.net. The record appears and self-manages.

3. Resolves to a private IP, but the wrong one — connection blackholes. Root cause: A manually created A record that drifted after the endpoint was deleted and re-created with a new dynamic IP. Confirm: Compare record-set a list (the IP in DNS) against az network private-endpoint show -n pe-sql-prod -g rg-net-prod --query "customDnsConfigs[0].ipAddresses" (the real NIC IP). They differ. Fix: Delete the manual record and attach a privateDnsZoneGroup so the platform keeps it correct; or pin the PE to a static IP if you truly must manage the record by hand.

4. One storage sub-resource works, another doesn’t. Root cause: A Private Endpoint targets exactly one sub-resource (group ID). A blob endpoint does nothing for file, queue, table, dfs, or web. Confirm: az network private-endpoint show -n pe-stg-blob -g rg-net-prod --query "privateLinkServiceConnections[].groupIds" shows only blob. Fix: Create a separate endpoint and matching privatelink.<sub>.core.windows.net zone for each sub-resource the app uses.

5. On-prem clients resolve public; Azure clients resolve private. Root cause: On-premises DNS cannot reach 168.63.129.16, so without a forwarder it resolves the public name publicly. Confirm: nslookup <fqdn> from an on-prem workstation returns a public IP, while the same command on an Azure VM returns the private IP. Fix: Deploy a DNS Private Resolver (inbound endpoint) in the hub and configure on-prem DNS to conditionally forward the public suffix (e.g. database.windows.net) to the resolver’s inbound IP. Forward the public suffix, not the privatelink one.

6. DNS is right (private IP) but the connection still times out. Root cause: A 0.0.0.0/0 forced-tunnel UDR on the PE subnet black-holes or asymmetrically routes the endpoint’s traffic through a firewall that drops it. Confirm: az network nic show-effective-route-table --ids <pe-nic-id> shows the 0.0.0.0/0 next-hop to a firewall applying to the PE. Fix: Add a /32 route for the PE IP with next-hop VnetLocal (or exclude the PE prefix from the forced-tunnel route), and ensure privateEndpointNetworkPolicies is Enabled so the route table actually applies.

7. Connects to the wrong thing / TLS certificate name mismatch. Root cause: The endpoint was created against the wrong group ID, so it maps to a different sub-resource than the client expects. Confirm: az network private-endpoint show … --query "privateLinkServiceConnections[].groupIds" doesn’t match the intended sub-resource. Fix: You can’t change a PE’s group ID in place — delete and re-create with the correct --group-id, and fix the matching zone.

8. The endpoint exists but never serves traffic. Root cause: The private-link connection is Pending (manual approval), common cross-tenant or under governance. Confirm: az network private-endpoint show … --query "privateLinkServiceConnections[].privateLinkServiceConnectionState.status" returns Pending. Fix: Approve it from the resource owner side: az network private-endpoint-connection approve ….

9. Key Vault Private Endpoint resolves nothing. Root cause: The zone was created as privatelink.vault.azure.net (the public suffix) instead of the data-plane zone privatelink.vaultcore.azure.net. Confirm: az network private-dns zone list -g rg-net-prod -o table shows the wrong name. Fix: Create privatelink.vaultcore.azure.net, link it, and attach the zone-group to the vault’s PE.

10. You fixed it, but it’s still broken for several minutes. Root cause: DNS caching. The client or an intermediate forwarder is still serving the old answer until its TTL expires. Confirm: A fresh nslookup (or one from a different machine) returns the correct private IP while the affected client still shows the old one. Fix: Flush the client cache (ipconfig /flushdns on Windows, or restart the local resolver service elsewhere), wait out the TTL on forwarders, and verify from a clean client before concluding the fix failed.

11. The NSG on the PE subnet seems to do nothing. Root cause: privateEndpointNetworkPolicies is Disabled (the legacy default), so NSGs and UDRs are ignored on the PE NIC. Confirm: az network vnet subnet show -g rg-net-prod --vnet-name vnet-hub -n snet-privatelink --query privateEndpointNetworkPolicies returns Disabled. Fix: Set it to Enabled (or to NetworkSecurityGroupEnabled or RouteTableEnabled if you need only one of the two).

12. A newly added spoke can’t reach any PaaS. Root cause: The new VNet was never linked to the central privatelink.* zones, so it resolves public. Confirm: az network private-dns link vnet list … lacks the new VNet. Fix: Link it to each zone; enforce link creation with Azure Policy so new spokes are auto-linked and humans can’t forget.
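
Most of the scenarios above can be told apart from a single nslookup result compared against the endpoint's real NIC IP. The following is a minimal Python sketch of that first triage step, not a diagnostic tool: the function name, the result codes, and the sample IPs are hypothetical, and it assumes the Private Endpoint subnet uses RFC 1918 address space.

```python
import ipaddress
from typing import Optional

# Assumption: Private Endpoint IPs come from RFC 1918 space in your VNet.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

# Hypothetical result codes mapped to the scenario they point at.
NEXT_STEP = {
    "NO_RECORD": "Scenario 2: no A record; attach a privateDnsZoneGroup.",
    "PUBLIC": "Scenario 5 or 12: zone not linked, or on-prem has no forwarder.",
    "STALE_RECORD": "Scenario 3: manual A record drifted from the NIC IP.",
    "DNS_OK": "Scenario 6 or 11: DNS is fine; check UDRs and NSGs on the PE leg.",
}


def triage(resolved_ip: Optional[str], endpoint_ip: str) -> str:
    """Classify a lookup result against the endpoint's current private IP.

    resolved_ip is what nslookup returned (None for NXDOMAIN / no answer);
    endpoint_ip is the IP reported by `az network private-endpoint show`.
    """
    if resolved_ip is None:
        return "NO_RECORD"
    addr = ipaddress.ip_address(resolved_ip)
    if not any(addr in net for net in RFC1918):
        return "PUBLIC"
    if addr != ipaddress.ip_address(endpoint_ip):
        return "STALE_RECORD"
    return "DNS_OK"


if __name__ == "__main__":
    code = triage("10.0.1.4", "10.0.1.5")
    print(code, "->", NEXT_STEP[code])
```

The ordering matters: a public answer is checked before a mismatch, because a public IP will also differ from the endpoint IP but has a different root cause and fix.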

Best practices

Security notes

The security controls and what each closes:

| Control | Mechanism | Closes / mitigates |
|---|---|---|
| Private Endpoint + private DNS | NIC + privatelink zone | Public data-plane path |
| publicNetworkAccess=Disabled | Service firewall | Inbound over the public endpoint |
| Egress firewall on PaaS suffixes | Azure Firewall application rules | Exfil to other tenants’ accounts |
| Private Link policies | Platform policy | Connecting to out-of-tenant PaaS |
| RBAC on the Private DNS zone | Private DNS Zone Contributor scope | Malicious/accidental record redirection |
| NSG on the PE subnet | privateEndpointNetworkPolicies Enabled | Lateral reach to the endpoint port |
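
The egress-firewall row is easier to reason about with a toy model of the rule logic. This is a sketch in Python rather than Azure Firewall configuration: the function names and both storage account FQDNs are hypothetical, and it only illustrates why an exact-FQDN allowlist closes a path that a service-wide wildcard leaves open.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# Hypothetical: the only storage account this workload legitimately uses.
OWN_FQDNS = {"contosoprod.blob.core.windows.net"}


def normalize(fqdn: str) -> str:
    """Lower-case the name and drop a trailing root dot."""
    return fqdn.lower().rstrip(".")


def wildcard_rule_allows(fqdn: str, pattern: str = "*.blob.core.windows.net") -> bool:
    """A service-wide rule: any account under the suffix is reachable."""
    return fnmatchcase(normalize(fqdn), pattern)


def exact_rule_allows(fqdn: str, own_fqdns: Iterable[str]) -> bool:
    """An allowlist rule: only the accounts you own are reachable."""
    return normalize(fqdn) in set(own_fqdns)


if __name__ == "__main__":
    for name in ("contosoprod.blob.core.windows.net", "attacker.blob.core.windows.net"):
        print(name, wildcard_rule_allows(name), exact_rule_allows(name, OWN_FQDNS))
```

Under the wildcard rule both names pass, which is the exfiltration path; under the exact rule only the workload's own account does.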

Cost & sizing

The bill drivers for Private Link are small per object but multiply across services and sub-resources:

A rough monthly picture for a typical sensitive workload: 3–6 Private Endpoints (SQL + storage sub-resources + Key Vault) at a few hundred rupees combined, the matching privatelink zones at tens of rupees, and (if hybrid) a DNS Private Resolver at roughly ₹1,500–2,500/month. Meridian Bank’s six endpoints plus resolver landed near ₹2,400/month — trivial against the compliance requirement it satisfied. The drivers and what each buys:

| Cost driver | What you pay for | Rough INR / month | What it fixes / enables | Watch-out |
|---|---|---|---|---|
| Private Endpoint (each) | Hourly + per-GB processed | ~₹150–300 + data | One sub-resource’s private path | Multiplies per sub-resource |
| Private DNS zone (each) | Per-zone + per-query | ~₹10–30 | The name→private-IP mapping | Many zones, but each is tiny |
| VNet link (each) | Included with the zone | ~₹0 | A spoke resolving the zone | Free, but easy to forget |
| DNS Private Resolver | Per-endpoint hourly + per-query | ~₹1,500–2,500 | Hybrid + central resolution | Inbound and outbound billed separately |
| Endpoint data processing | Per-GB through the PE | scales with traffic | (throughput) | Dominant for large blob transfers |
| Forwarder VM (alternative) | VM + ops | ~₹2,000+ and your time | Hybrid (the DIY way) | You patch/HA/scale it; prefer the resolver |

The right-sizing rule: you don’t size Private Link, you enumerate it — count the sub-resources you actually use, one endpoint and zone each, link to every resolving VNet, one resolver per hub region for hybrid. The cost follows the count, and the count follows your real data dependencies.
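
Since cost follows the count, the estimate is plain arithmetic. A minimal Python sketch, assuming the rough INR ranges from the table above as fixed monthly rates (they are the article's approximations, not a price list) and ignoring per-GB data processing and per-query charges; the names RATES and estimate are hypothetical.

```python
from typing import Dict, Tuple

# Rough (low, high) INR per month, copied from the cost table above.
# Assumption: flat monthly figures; data processing and queries are excluded.
RATES: Dict[str, Tuple[int, int]] = {
    "private_endpoint": (150, 300),
    "private_dns_zone": (10, 30),
    "dns_private_resolver": (1500, 2500),
}


def estimate(endpoints: int, zones: int, resolvers: int = 0) -> Tuple[int, int]:
    """Return a (low, high) monthly estimate in INR for the given counts."""
    counts = {
        "private_endpoint": endpoints,
        "private_dns_zone": zones,
        "dns_private_resolver": resolvers,
    }
    low = sum(n * RATES[item][0] for item, n in counts.items())
    high = sum(n * RATES[item][1] for item, n in counts.items())
    return low, high


if __name__ == "__main__":
    # Six endpoints, six zones, one resolver.
    print(estimate(endpoints=6, zones=6, resolvers=1))  # (2460, 4480)
```

The low end for six endpoints plus a resolver sits close to the Meridian Bank figure quoted earlier; the spread to the high end shows how much the per-endpoint rate, rather than the zones, drives the total.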

Interview & exam questions

1. Why does an application fail to connect to Azure SQL right after you disable public network access, even though the Private Endpoint is healthy? Because the application still resolves the public FQDN to the public IP — which is now firewalled off — since no Private DNS zone is linked to the client’s VNet. The endpoint moved the IP, but DNS still points the name at the public address. Fix by creating the privatelink.database.windows.net zone, linking it to the VNet, and attaching a privateDnsZoneGroup; confirm with nslookup returning the private IP.

2. What is a group ID (sub-resource) and why does it matter? It selects which service a Private Endpoint targets — sqlServer for SQL, blob/file/queue for the respective storage services, vault for Key Vault, sites for App Service. A single endpoint connects to exactly one sub-resource, so a blob endpoint does nothing for file. You must create a separate endpoint (and DNS zone) per sub-resource you use.

3. What does the privateDnsZoneGroup do, and why prefer it over a manual A record? It attaches to the Private Endpoint and tells Azure to create and lifecycle-manage the A record in the named privatelink zone — creating it on deploy, updating it if the IP changes, and deleting it when the endpoint is deleted. A manual A record drifts the day the endpoint is re-created with a new dynamic IP, causing a silent blackhole; the zone-group can’t drift.

4. A VNet client resolves the private IP but an on-premises client resolves the public IP. Why, and how do you fix it? On-premises clients cannot reach 168.63.129.16 (it’s link-local to the VNet), so they resolve the public name via corporate DNS. Fix by deploying an Azure DNS Private Resolver inbound endpoint in the hub and configuring corporate DNS to conditionally forward the public suffix (e.g. database.windows.net) to the resolver’s inbound IP.

5. DNS returns the correct private IP but the connection still times out. Where do you look? This is no longer a DNS problem — look at routing/filtering on the PE leg. A 0.0.0.0/0 forced-tunnel UDR may black-hole the endpoint through a firewall. Confirm with az network nic show-effective-route-table on the PE NIC; fix with a /32 local route for the PE IP (and ensure privateEndpointNetworkPolicies is Enabled so routes apply).

6. In a hub-and-spoke estate, where do the Private DNS zones live and how do spokes resolve? Centralize one set of privatelink.* zones in the hub and create a virtual-network link from each zone to every spoke that resolves PaaS. Each spoke’s own 168.63.129.16 then consults the zones linked to it. Enforce the links with Azure Policy so new spokes are auto-linked.

7. What’s the exact Private DNS zone name for Key Vault, and why is it a common mistake? It’s privatelink.vaultcore.azure.net — the data-plane suffix — not privatelink.vault.azure.net. People copy the public FQDN suffix (vault.azure.net) and create the wrong zone, which resolves nothing with no error at create time. Always copy the exact zone name from a reference.

8. How does Private Link help with data exfiltration, beyond just making the path private? Reaching a public PaaS endpoint means allowing outbound to the entire service namespace (*.blob.core.windows.net), so a compromised VM can copy data to an attacker’s account in the same service. A Private Endpoint maps to one specific resource, so combining it with egress filtering (restricting outbound to only your endpoints) and public access disabled blocks copying to other tenants’ resources. The path is scoped to your account, not to the whole service.

9. Do NSGs and UDRs apply to a Private Endpoint NIC? Only when privateEndpointNetworkPolicies is enabled on the subnet. The legacy default was Disabled, meaning NSGs and UDRs were ignored on the PE NIC — which surprises people both when their NSG “doesn’t protect” the PE and when a forced-tunnel route “doesn’t catch” it. Set it to Enabled for full control.

10. What is a Private Link service (as opposed to a Private Endpoint)? A Private Link service is the provider side: you put your own service behind a Standard Load Balancer and publish it so that consumers in other VNets (or other tenants) can create Private Endpoints to reach it privately. The Private Endpoint is the consumer side. Together they let you offer a SaaS-style private service across tenant boundaries.

11. You re-created a Private Endpoint and now traffic blackholes despite a correct-looking DNS record. What happened? The endpoint got a new dynamic private IP, but the A record was manually created and still points at the old IP. The fix is to use a privateDnsZoneGroup (which would have updated automatically) or pin the endpoint to a static IP. Confirm by comparing the DNS record IP against the endpoint’s current NIC IP.

12. When would you choose a Service Endpoint over a Private Endpoint? When you need to restrict a PaaS service to your VNet but don’t need a private IP or to disable the public endpoint. Service Endpoints are free and simpler, but the service keeps its public IP and they don’t block exfiltration to other tenants. For sensitive/regulated data or zero-trust, choose Private Endpoint. This is the decision covered in “Azure Private Endpoint vs Service Endpoint: Secure PaaS Access”.

These map to AZ-700 (Network Engineer): design and implement private access to Azure services (Private Link, Private Endpoint, Private DNS, DNS Private Resolver), and to AZ-500 (Security Engineer): implement platform protection / secure PaaS (public access, exfiltration, network isolation). The hub-and-spoke DNS and landing-zone angles also touch AZ-305 (Solutions Architect). A compact cert-mapping for revision:

| Question theme | Primary cert | Exam objective area |
|---|---|---|
| Private Endpoint, group IDs, zones | AZ-700 | Design & implement private access to services |
| DNS Private Resolver, hybrid forwarding | AZ-700 | Design & implement name resolution |
| Public access disabled, exfiltration | AZ-500 | Secure PaaS; platform protection |
| Hub-and-spoke DNS, zone-group automation | AZ-305 | Design network & governance |
| NSG/UDR on PE, effective routes | AZ-700 | Implement & manage VNet routing |
| Private Link service (provider side) | AZ-700 | Design & implement service delivery |

Quick check

  1. You disable public access on Azure SQL and every app instantly times out, though the Private Endpoint is healthy. What is the one check you run first, and what does a public IP in the result tell you?
  2. A storage account’s blob access works through its Private Endpoint, but file shares fail. Why, and what’s the fix?
  3. True or false: creating the A record by hand in the privatelink zone is the recommended way to wire a Private Endpoint.
  4. On-premises analysts resolve the public IP while Azure VMs resolve the private IP for the same database. Name the root cause and the fix.
  5. DNS returns the correct private IP but connections still time out. Is this a DNS problem? Where do you look?

Answers

  1. Run nslookup <fqdn> from a client inside the VNet. A public IP in the result means the Private DNS zone is missing or not linked to that VNet, so the client resolves the now-firewalled public address. Fix by creating the zone if it does not exist and adding a virtual network link from the zone to that VNet (and to every spoke in hub-and-spoke).
  2. A Private Endpoint targets exactly one sub-resource (group ID). The blob endpoint does nothing for file — they are different services with different FQDNs and zones. Create a separate endpoint and privatelink.file.core.windows.net zone for the file sub-resource.
  3. False. Use a privateDnsZoneGroup so the platform creates and lifecycle-manages the record. A manual record drifts the day the endpoint is re-created with a new dynamic IP, causing a silent blackhole.
  4. On-premises clients cannot reach 168.63.129.16, so they resolve publicly. Fix by deploying an Azure DNS Private Resolver inbound endpoint and configuring corporate DNS to conditionally forward the public suffix (e.g. database.windows.net) to the resolver’s inbound IP.
  5. No — DNS is exonerated once it returns the private IP. Look at routing/filtering on the PE leg: a 0.0.0.0/0 forced-tunnel UDR black-holing the endpoint (confirm with nic show-effective-route-table), or an NSG dropping the port. Fix with a /32 local route for the PE IP and privateEndpointNetworkPolicies=Enabled.
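
Answer 2 and the Key Vault trap both come down to the same lookup: one group ID, one exact zone name. A small Python sketch of that mapping follows, covering only the group IDs mentioned in this article; the zone names are the Azure public-cloud ones, and the names PRIVATELINK_ZONES and required_zones are hypothetical.

```python
from typing import Iterable, List

# Group ID -> Private DNS zone (Azure public cloud).
# Deliberately limited to the sub-resources discussed in this article.
PRIVATELINK_ZONES = {
    "sqlServer": "privatelink.database.windows.net",
    "blob": "privatelink.blob.core.windows.net",
    "file": "privatelink.file.core.windows.net",
    "queue": "privatelink.queue.core.windows.net",
    "table": "privatelink.table.core.windows.net",
    "dfs": "privatelink.dfs.core.windows.net",
    "web": "privatelink.web.core.windows.net",
    "vault": "privatelink.vaultcore.azure.net",  # not privatelink.vault.azure.net
    "sites": "privatelink.azurewebsites.net",
}


def required_zones(group_ids: Iterable[str]) -> List[str]:
    """Return the zones needed for the given group IDs, in input order.

    Raises KeyError for a group ID that is not in the table, so an unknown
    sub-resource is looked up in a reference instead of being guessed.
    """
    zones: List[str] = []
    for group_id in group_ids:
        zone = PRIVATELINK_ZONES[group_id]
        if zone not in zones:
            zones.append(zone)
    return zones


if __name__ == "__main__":
    # An app that uses blob and file storage plus Key Vault.
    for zone in required_zones(["blob", "file", "vault"]):
        print(zone)
```

Each name returned also implies one endpoint and one set of VNet links, which is why enumerating the group IDs an application really uses is the first planning step.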

Glossary

Next steps

You can now build a Private Endpoint, wire Private DNS on every VNet and on-premises, and diagnose any resolution or routing failure to a single hop. Build outward:

Need this built for real?

Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
