Almost every Private Link tutorial puts you on the consumer side: you create a private endpoint, point it at someone else’s storage account or someone else’s SaaS, and traffic stays off the internet. This article is the other half — you are the SaaS. You have an application behind a load balancer in your own VNet, and you want a hundred different customer subscriptions to reach it over a private IP in their address space, with no peering, no shared routing, and no exposure of your internals. That is exactly what an Azure Private Link Service does, and standing it up correctly means getting four things right that the docs gloss over: connection approval, NAT IP allocation, the source-IP problem, and the DNS story you hand your consumers.
1. Consumer endpoint vs provider service: pick your side
Private Link has two objects and people conflate them constantly.
| | Private Endpoint (consumer) | Private Link Service (provider) |
|---|---|---|
| Who creates it | The consumer, in their VNet | You, the provider, in your VNet |
| What it fronts | Someone else’s service (PaaS or a PLS) | Your Standard Load Balancer frontend |
| Address space | Consumer’s subnet | Your subnet (NAT pool) |
| Identity exchanged | n/a | A globally unique alias you give out |
| Routing between VNets | None — that is the point | None |
The key property: there is no VNet peering and no route exchange between the two sides. The Private Link Service projects your load balancer’s frontend into the consumer’s VNet as a NIC with a private IP they choose, and Microsoft’s backbone shuttles the TCP flow across. Your VNet and theirs can use the same 10.0.0.0/16; it does not matter, because nothing is routed — the connection is mapped, not routed. This is why Private Link is the only sane multi-tenant private-access model: it scales to thousands of consumers without a single route table touching another tenant.
Two hard constraints to internalize before you build:
- The Private Link Service can only front a Standard SKU Load Balancer. Basic LB and Application Gateway are not eligible. If your app sits behind App Gateway today, you put a Standard LB in front of (or alongside) it for the PLS path.
- Traffic flows consumer to provider only on the TCP connection the consumer initiates. You cannot use the PLS to call into the consumer.
2. Architecture: Standard LB, the Private Link Service, and the NAT subnet
The moving parts:
Consumer VNet (10.0.0.0/16)               Your provider VNet (10.50.0.0/16)
Private Endpoint NIC 10.0.4.9   ───►   Private Link Service
                                          │  NAT IPs drawn from
                                          ▼  subnet 10.50.1.0/24 (PLS subnet)
                                       Standard LB frontend 10.50.2.10
                                          │
                                          ▼
                                       Backend pool: app VMs / VMSS 10.50.3.x
The single most important design decision is the NAT subnet. The Private Link Service source-NATs consumer traffic onto private IPs drawn from the subnet you nominate, through its NAT IP configurations (the ipConfigurations list on the PLS). Every consumer connection reaches your load balancer from one of those NAT IPs, and many connections share each one. Size that subnet generously and dedicate it to the PLS; do not co-locate other workloads there.
The PLS NAT subnet must have privateLinkServiceNetworkPolicies disabled, exactly like a private endpoint subnet must have privateEndpointNetworkPolicies disabled. Forgetting this is the number-one “create failed for no obvious reason” error.
First, the subnet and the Standard LB. I am assuming the LB and backend already exist; the snippet below shows the PLS-relevant subnet flag.
RG="rg-saas-prod"
LOC="eastus2"
VNET="vnet-saas"
PLS_SUBNET="snet-pls-nat"
# Dedicated NAT subnet for the PLS, network policies OFF
az network vnet subnet update \
--resource-group "$RG" \
--vnet-name "$VNET" \
--name "$PLS_SUBNET" \
--private-link-service-network-policies Disabled
Now create the Private Link Service against the Standard LB’s frontend IP configuration, allocating NAT IPs dynamically from that subnet:
LB_NAME="lb-saas-std"
LB_FRONTEND="fe-app"
az network private-link-service create \
--resource-group "$RG" \
--name "pls-saas-app" \
--location "$LOC" \
--lb-name "$LB_NAME" \
--lb-frontend-ip-configs "$LB_FRONTEND" \
--vnet-name "$VNET" \
--subnet "$PLS_SUBNET" \
--private-ip-address-version IPv4 \
--enable-proxy-protocol true
Note --enable-proxy-protocol true up front — we will need it in section 4, and toggling it later forces every consumer connection to reset. After creation, capture the alias. This is the globally unique string consumers use instead of any IP or FQDN of yours:
az network private-link-service show \
--resource-group "$RG" --name "pls-saas-app" \
--query "{alias:alias, visibility:visibility, autoApproval:autoApproval}" -o jsonc
The alias looks like pls-saas-app.7b2c1f9e-...region.azure.privatelinkservice. That string, not your VNet or IP, is your product’s connection coordinate.
3. Connection approval: auto-approval vs manual visibility
When a consumer creates a private endpoint against your alias, the connection enters a Pending state on your PLS until you decide. Two settings govern who can even see your service and who gets in without a human:
- Visibility: which subscriptions can see the service and request a connection. Restrict this to an allow-list of subscription IDs, or open it to everyone (*) if you are a public SaaS.
- Auto-approval: which subscriptions are approved automatically on connect. A subscription in this list skips the pending queue entirely.
The two lists are independent and you usually want them tiered: visibility broad (so prospects can connect), auto-approval narrow (only paying tenants you have onboarded).
# Visible to two partner subs; auto-approve only the production one.
# Both flags take subscription IDs as separate, space-separated arguments.
az network private-link-service update \
--resource-group "$RG" --name "pls-saas-app" \
--visibility 1111aaaa-... 2222bbbb-... \
--auto-approval 1111aaaa-...
For everyone not in the auto-approval list, the connection sits pending and you approve it deliberately. This is the hook for your onboarding workflow — gate it behind a payment/contract check, not a human clicking a portal.
# List pending consumer connections (they are reported on the PLS resource itself)
az network private-link-service show \
--resource-group "$RG" --name "pls-saas-app" \
--query "privateEndpointConnections[?privateLinkServiceConnectionState.status=='Pending'].{name:name, desc:privateLinkServiceConnectionState.description}" -o table
# Approve one
az network private-link-service connection update \
--resource-group "$RG" --service-name "pls-saas-app" \
--name "<connection-name>" \
--connection-status Approved \
--description "Onboarded: contract ACME-4471"
The consumer can attach a free-text request message when they create their endpoint. Treat it as untrusted, but it is the channel for them to pass you a tenant ID or order number so your approval automation can correlate. Never auto-approve on the request message alone.
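A minimal sketch of such an approval gate in Python. It assumes your onboarding system can be read as a mapping from consumer subscription ID to a contract record, and that each pending connection gives you the consumer's private endpoint resource ID plus the request message. The `ONBOARDED` table, the `connection` dict shape, and the function names are all hypothetical, not an Azure API. The decision is anchored on the subscription ID taken from the resource ID; the request message is used only to correlate.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical onboarding records keyed by consumer subscription ID.
ONBOARDED = {
    "11111111-aaaa-4aaa-8aaa-111111111111": {"tenant": "ACME", "order": "4471"},
}

_SUB_RE = re.compile(r"^/subscriptions/([0-9a-fA-F-]{36})/")


def subscription_of(private_endpoint_id: str) -> Optional[str]:
    """Extract the subscription GUID from an ARM-style resource ID."""
    match = _SUB_RE.match(private_endpoint_id)
    return match.group(1).lower() if match else None


def parse_request_message(message: str) -> Dict[str, str]:
    """Parse 'key=value key=value' pairs; malformed tokens are ignored."""
    pairs = {}
    for token in message.split():
        key, sep, value = token.partition("=")
        if sep and key and value:
            pairs[key.lower()] = value
    return pairs


def decide(connection: dict, onboarded: dict = ONBOARDED) -> Tuple[str, str]:
    """Return (status, description); 'Pending' means leave it for a human."""
    sub = subscription_of(connection.get("private_endpoint_id", ""))
    record = onboarded.get(sub) if sub else None
    if record is None:
        return "Pending", "unknown subscription; needs manual review"
    claimed = parse_request_message(connection.get("request_message", ""))
    if (claimed.get("tenant") != record["tenant"]
            or claimed.get("order") != record["order"]):
        return "Pending", "request message does not match contract"
    return "Approved", f"Onboarded: contract {record['tenant']}-{record['order']}"
```

A request message that names the right tenant from an unknown subscription stays pending, which is the "never auto-approve on the request message alone" rule expressed in code.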
4. The source-IP problem: NAT, and recovering the real client IP
Here is the trap that bites every provider on day one. Because the Private Link Service NATs each connection onto an IP from your PLS subnet, your backend sees the source IP as a PLS NAT address in your own 10.50.1.0/24 — not the consumer’s real client IP. Every tenant looks like it is coming from the same handful of NAT IPs. Your access logs, your geo/IP allow-lists, your per-tenant rate limiting — all blind.
You cannot recover the original IP from the L3 header; it has been rewritten by design. The supported escape hatch is the TCP PROXY protocol. With enable-proxy-protocol set on the PLS (we did this in section 2), Azure prepends a PROXY protocol v2 header to the TCP byte stream before your application’s payload. That header carries the consumer’s original source IP and port, plus a Private Link-specific TLV with the consumer’s LinkID.
The catch every provider hits: your backend application must be coded to read and strip the PROXY header, or it will treat those binary bytes as the first bytes of the client request and corrupt the protocol. This is not transparent. You opt your application in, not just the PLS.
Most fleets terminate this at the reverse proxy. NGINX, for example, reads PROXY protocol on a listener and re-exposes the true client IP:
server {
# 'proxy_protocol' tells NGINX every connection on this listener
# begins with a PROXY protocol header (v1 or v2) it must parse.
listen 443 ssl proxy_protocol;
# $proxy_protocol_addr is the REAL consumer client IP recovered
# from the header, not the PLS NAT IP.
set_real_ip_from 10.50.1.0/24; # trust the PLS NAT subnet
real_ip_header proxy_protocol;
location / {
proxy_pass http://app_backend;
proxy_set_header X-Forwarded-For $proxy_protocol_addr;
proxy_set_header X-Real-IP $proxy_protocol_addr;
}
}
The Private Link TLV (type 0xEE, sub-type 0x01) carries the consumer’s LinkID, a stable integer per consumer connection, encoded as a 4-byte little-endian value after the sub-type byte. If you parse PROXY v2 yourself (HAProxy, Envoy, or custom code), you can read that TLV to attribute the flow to a specific approved connection, which is gold for multi-tenant audit. The Azure-specific subtype layout is the PP2_TYPE_AZURE extension:
PROXY protocol v2 header (binary)
signature 0x0D0A0D0A000D0A515549540A
ver/cmd 0x21 (v2, PROXY)
fam/proto 0x11 (AF_INET, STREAM)
len ...
src addr <consumer real client IP> <-- what you actually want
dst addr <PLS NAT IP>
TLV type 0xEE (PP2_TYPE_AZURE), subtype 0x01, value=LinkID (Azure private link)
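For custom code, the layout above can be parsed in a few dozen lines. The following is a sketch, not a hardened implementation: it assumes the Azure TLV type is 0xEE with the LinkID stored as a little-endian 32-bit integer after the subtype byte (as Microsoft documents it), handles only TCP over IPv4/IPv6, and ignores every other TLV. The function and constant names are illustrative.

```python
import ipaddress
import struct

PP2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"  # the fixed 12-byte v2 signature
PP2_TYPE_AZURE = 0xEE                       # assumption: Azure TLV type
PP2_SUBTYPE_AZURE_LINKID = 0x01             # assumption: LinkID subtype


def parse_proxy_v2(data: bytes) -> dict:
    """Parse a PROXY v2 header at the start of `data`."""
    if len(data) < 16 or data[:12] != PP2_SIGNATURE:
        raise ValueError("not a PROXY protocol v2 header")
    ver_cmd, fam_proto, length = struct.unpack("!BBH", data[12:16])
    if ver_cmd >> 4 != 2:
        raise ValueError("unsupported PROXY protocol version")
    if len(data) < 16 + length:
        raise ValueError("truncated PROXY protocol header")
    body = data[16:16 + length]
    info = {"header_len": 16 + length, "src_ip": None,
            "src_port": None, "link_id": None}

    if ver_cmd & 0x0F == 0x00:       # LOCAL command: no address block to read
        return info
    if fam_proto == 0x11:            # TCP over IPv4
        addr_len, fmt = 12, "!4s4sHH"
    elif fam_proto == 0x21:          # TCP over IPv6
        addr_len, fmt = 36, "!16s16sHH"
    else:
        raise ValueError("unsupported address family/protocol")
    if len(body) < addr_len:
        raise ValueError("address block shorter than declared family")
    src, _dst, src_port, _dst_port = struct.unpack(fmt, body[:addr_len])
    info["src_ip"] = str(ipaddress.ip_address(src))
    info["src_port"] = src_port

    # Walk the TLVs that follow the address block: type(1) length(2) value(n).
    pos = addr_len
    while pos + 3 <= len(body):
        tlv_type = body[pos]
        (tlv_len,) = struct.unpack("!H", body[pos + 1:pos + 3])
        value = body[pos + 3:pos + 3 + tlv_len]
        if (tlv_type == PP2_TYPE_AZURE and len(value) >= 5
                and value[0] == PP2_SUBTYPE_AZURE_LINKID):
            (info["link_id"],) = struct.unpack("<I", value[1:5])
        pos += 3 + tlv_len
    return info
```

The caller is expected to drop the first `header_len` bytes and hand the rest of the stream to the application protocol unchanged.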
If your app cannot speak PROXY protocol and you control nothing in front of it, your fallback is to leave it off and accept that you have no real client IP — in which case all per-tenant identity must come from your application-layer auth (mTLS client cert, bearer token), never from L3.
5. Scaling: NAT IP exhaustion and the connection ceiling
Two limits cap how far one Private Link Service scales, and they interact.
NAT IP / port pressure. Each NAT IP configuration on the PLS provides source ports for the connections mapped through it, and those NAT addresses come from your PLS subnet. Azure’s documentation puts the capacity at roughly 64k TCP ports per NAT IP per backend VM, with a published ceiling of 8 NAT IP configurations per Private Link Service, so plan to add NAT IP configurations (or backend VMs) as connection volume grows rather than assume one NAT IP serves unlimited tenants. If you see new consumer connections failing while existing ones are healthy, the provider-side analogue of SNAT exhaustion, you are out of NAT capacity, and the fix is another NAT IP configuration, not a bigger VM.
# Add a second NAT IP configuration to spread connection load
az network private-link-service update \
--resource-group "$RG" --name "pls-saas-app" \
--add ipConfigurations \
name=natipconfig2 \
properties.subnet.id="/subscriptions/.../subnets/$PLS_SUBNET" \
properties.privateIPAllocationMethod=Dynamic \
properties.primary=false
Subnet sizing. NAT IPs come out of the PLS subnet, so its free address count bounds how many NAT IP configurations you can allocate, now and later. A NAT IP is shared by many connections rather than consumed per connection, so concurrency scales with the number of NAT IPs and backend VMs, not with the prefix alone. Azure reserves five addresses in every subnet, which leaves a /24 with 251 usable. Size the NAT subnet with headroom at creation, because changing the prefix of a subnet that is already in use is hard to do without disruption.
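The arithmetic is simple enough to keep in a capacity-planning script. This sketch assumes Azure's five reserved addresses per subnet and its documented figure of about 64k TCP ports per NAT IP per backend VM; treat both constants, and the helper names, as assumptions to check against current limits.

```python
import ipaddress

AZURE_RESERVED_PER_SUBNET = 5   # Azure reserves 5 addresses in every subnet
MAX_NAT_IPS_PER_PLS = 8         # published ceiling quoted above
PORTS_PER_NAT_IP = 64_000       # assumption: ~64k TCP ports per NAT IP per VM


def usable_addresses(cidr: str) -> int:
    """Addresses in the subnet that Azure will actually hand out."""
    net = ipaddress.ip_network(cidr)
    return max(net.num_addresses - AZURE_RESERVED_PER_SUBNET, 0)


def connection_budget(nat_ips: int, backend_vms: int) -> int:
    """Rough upper bound on concurrent TCP connections through one PLS."""
    if not 1 <= nat_ips <= MAX_NAT_IPS_PER_PLS:
        raise ValueError("a PLS takes between 1 and 8 NAT IP configurations")
    if backend_vms < 1:
        raise ValueError("need at least one backend VM")
    return nat_ips * backend_vms * PORTS_PER_NAT_IP
```

For example, `usable_addresses("10.50.1.0/24")` gives the address headroom of the NAT subnet used in this article, and `connection_budget(2, 3)` estimates the ceiling for two NAT IPs in front of three backend VMs.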
Watch the per-PLS connection count and the NAT subnet’s free IP count as first-class SLO metrics. Provider-side exhaustion presents identically to a consumer outage (“we can’t connect to your SaaS”) but the root cause is entirely on your side and invisible to them.
6. Consumer experience: their private endpoint and Private DNS zone
Your consumer does almost nothing special — and that is the selling point. They create a private endpoint in their VNet pointed at your alias:
# Run by the CONSUMER, in their subscription/VNet
az network private-endpoint create \
--resource-group "rg-consumer" \
--name "pe-acme-to-saas" \
--vnet-name "vnet-consumer" --subnet "snet-pe" \
--private-connection-resource-id "<YOUR-PLS-ALIAS-or-resource-id>" \
--connection-name "conn-acme-saas" \
--manual-request true \
--request-message "tenant=ACME order=4471"
--manual-request true is what they use when connecting by your alias across tenants (they cannot see your resource, so it must go to your pending queue). If you put their subscription in auto-approval, it still flows through but is approved instantly.
Now DNS. Unlike a PaaS private endpoint, you do not get a Microsoft-managed privatelink.* zone — there is no public CNAME chain pointing at a privatelink alias, because the service is yours. The consumer must map your customer-facing FQDN to the private endpoint’s IP themselves. The clean pattern you should document for them: a Private DNS zone for your product domain, with an A record to their endpoint NIC’s IP.
# Consumer hosts YOUR app FQDN privately, pointing at their PE NIC IP
az network private-dns zone create \
--resource-group "rg-consumer" --name "app.your-saas.com"
az network private-dns link vnet create \
--resource-group "rg-consumer" --zone-name "app.your-saas.com" \
--name "link-consumer" --virtual-network "vnet-consumer" \
--registration-enabled false
PE_IP=$(az network private-endpoint show -g rg-consumer -n pe-acme-to-saas \
--query "customDnsConfigs[0].ipAddresses[0]" -o tsv)
az network private-dns record-set a add-record \
--resource-group "rg-consumer" --zone-name "app.your-saas.com" \
--record-set-name "api" --ipv4-address "$PE_IP"
Now api.app.your-saas.com resolves to the private endpoint’s IP inside the consumer VNet and to your public address everywhere else. Critically, your TLS certificate must be valid for the FQDN the consumer dials (api.app.your-saas.com). The consumer’s client validates your certificate over the private path, and there is no Azure-managed cert here. Ship them the exact FQDN, and make sure your cert SAN covers it.
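A quick pre-flight check for the SAN requirement can be sketched as follows. This is a simplified matcher, assuming the common rule that a wildcard is allowed only as the whole leftmost label and matches exactly one label; real validation is done by the client's TLS library, and the function name is illustrative.

```python
from typing import Iterable


def san_covers(hostname: str, san_entries: Iterable[str]) -> bool:
    """Return True if any SAN DNS entry matches the hostname."""
    host_labels = hostname.lower().rstrip(".").split(".")
    for entry in san_entries:
        san_labels = entry.lower().rstrip(".").split(".")
        if len(san_labels) != len(host_labels):
            continue  # a wildcard never spans more than one label
        if san_labels[0] == "*":
            # Require at least two fixed labels after the wildcard.
            if len(san_labels) > 2 and san_labels[1:] == host_labels[1:]:
                return True
        elif san_labels == host_labels:
            return True
    return False
```

Note that a certificate for `*.your-saas.com` does not cover `api.app.your-saas.com` under this rule, which is a common surprise when the private FQDN sits one level deeper than the public one.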
7. Cross-tenant, cross-region, and alias distribution
The Private Link Service is region-bound: the PLS, its LB, and its NAT subnet all live in one region. But the consumer can be in any region. Their traffic reaches your region over the Microsoft backbone; you do not deploy a PLS in the consumer’s region. If you want consumers to land in the nearest region for latency, you deploy an independent PLS per provider region and hand out a different alias per region (often fronted by Traffic Manager or Front Door at the public edge for discovery, but the private connection itself is per-region-alias).
Cross-tenant is the normal SaaS case and works out of the box: the consumer is in a completely different Entra tenant. There is no tenant trust, no guest accounts, nothing to federate — the alias plus your approval is the entire contract. Distribute the alias like an API key: it is not a secret (visibility/approval is your real control), but it is the stable identifier you put in onboarding docs and Terraform modules you ship to customers.
A subtle approval property worth stating plainly:
Auto-approval is keyed on the consumer’s subscription ID, which is stable, not on their tenant. Visibility is likewise per-subscription. So onboarding a customer means collecting their subscription ID(s) and adding them to your auto-approval list — there is no concept of “approve this tenant.”
Verify
Prove the full path from a real consumer VNet — do not trust portal “Approved” status alone.
- Endpoint provisioned and approved. From the consumer side:
az network private-endpoint show -g rg-consumer -n pe-acme-to-saas \
--query "manualPrivateLinkServiceConnections[0].privateLinkServiceConnectionState" -o jsonc
# expect: { "status": "Approved", ... }
# (an endpoint created without --manual-request reports under privateLinkServiceConnections instead)
- DNS resolves to a private IP. From a VM in the consumer VNet:
nslookup api.app.your-saas.com
# expect an answer inside the consumer's PE subnet (e.g. 10.0.4.x), NOT a public IP
- Private TCP path is live. From the same VM:
nc -vz api.app.your-saas.com 443
# expect: succeeded / connected
curl -sS https://api.app.your-saas.com/healthz
- Source IP is the REAL client, not a NAT IP. This is the one that actually proves PROXY protocol works. Hit the service from the consumer VM and read what your application logged as the client address:
# On a provider backend host, tail the access log while the consumer curls:
# NGINX log_format should emit $proxy_protocol_addr
tail -f /var/log/nginx/access.log
The logged client IP must equal the consumer VM’s private IP, not a 10.50.1.x PLS NAT address. If you see the NAT IP, PROXY protocol is enabled on the PLS but your app is not parsing it (or set_real_ip_from does not trust the NAT subnet).
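The DNS check is easy to automate from a consumer-side VM. A minimal sketch, assuming the private endpoint subnet is 10.0.4.0/24 (an illustrative value in line with the addresses used in this article) and that the helper names are your own:

```python
import ipaddress
import socket
from typing import Iterable, List


def resolve_ipv4(hostname: str) -> List[str]:
    """Resolve a hostname to its distinct IPv4 addresses via the OS resolver."""
    infos = socket.getaddrinfo(hostname, 443,
                               family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def resolves_into_subnet(ips: Iterable[str], pe_subnet: str) -> bool:
    """True only if there is at least one answer and all fall in the PE subnet."""
    net = ipaddress.ip_network(pe_subnet)
    ips = list(ips)
    return bool(ips) and all(ipaddress.ip_address(ip) in net for ip in ips)


if __name__ == "__main__":
    answers = resolve_ipv4("api.app.your-saas.com")
    ok = resolves_into_subnet(answers, "10.0.4.0/24")  # assumed PE subnet
    print(answers, "private path" if ok else "NOT resolving to the endpoint")
```

Requiring every answer to be inside the subnet catches the split case where one resolver path still returns the public address.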
Enterprise scenario
A platform team running a multi-tenant fraud-scoring API onboarded their first twenty banking customers over a single Private Link Service and shipped to production. Within two weeks, three things broke at once. First, their per-tenant rate limiter — keyed on source IP — started throttling unrelated customers together, because every tenant arrived on the same four PLS NAT IPs in their 10.60.1.0/24 subnet. Second, their geo-fencing (block requests originating outside the customer’s contracted region) blocked everyone, because the NAT IP was always their own region. Third, around 250 concurrent connections, brand-new customer endpoints started failing to connect while existing customers stayed healthy.
The root causes were the same two things this article warns about. The rate limiter and geo-fence were reading the L3 source IP, which Private Link had NATed away. And the new-connection failures were provider-side NAT exhaustion: they had quietly run out of NAT capacity on the PLS.
The fix had three parts. They enabled PROXY protocol on the PLS and put an Envoy listener in front of the app to recover the true client IP and stamp it into a trusted header, then re-keyed both the rate limiter and the geo-fence on that header instead of the socket address. They added a second NAT IP configuration plus a larger dedicated NAT subnet to lift the connection ceiling. And they added two SLO alerts they had never thought to create: PLS NAT-subnet free-IP count, and per-PLS connection count. The Envoy fragment that fixed identity attribution:
# Envoy listener: trust and parse the PROXY protocol header Azure prepends,
# so downstream filters see the consumer's real client IP, not the NAT IP.
listener_filters:
- name: envoy.filters.listener.proxy_protocol
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.listener.proxy_protocol.v3.ProxyProtocol
    # Optionally surface the Azure LinkID TLV (PP2_TYPE_AZURE, 0xEE) as request metadata
    rules:
    - tlv_type: 0xEE
      on_tlv_present:
        key: "azure_link_id"
The lesson the team wrote into their platform runbook: on the provider side of Private Link, the source IP your application sees is a lie by design, and NAT capacity is a hard ceiling you size up front. Both must be solved before the first customer, not after the third incident.