Standing up an Application Gateway is a ten-minute portal exercise. Standing up one that routes three apps off one public IP, terminates and re-encrypts TLS with certificates it never stores, autoscales across availability zones, and runs a WAF in Prevention mode that hasn’t broken a single legitimate request in six months – that takes deliberate work. This walkthrough builds exactly that, then spends real time on the part everyone skips: tuning the WAF from Detection to Prevention by reading logs instead of guessing.
Where regional L7 fits: App Gateway vs Front Door vs a plain Load Balancer
Three Azure services overlap in people’s heads and they shouldn’t. Pick wrong and you either pay for an edge you can’t lock down or hit a protocol wall.
| Service | Scope | OSI layer | WAF | Best for |
|---|---|---|---|---|
| Application Gateway v2 | Regional | L7 (HTTP/HTTPS) | Yes (regional WAF policy) | In-VNet web apps/APIs needing path routing, TLS re-encryption to backends, private backends |
| Front Door (Std/Premium) | Global | L7 (HTTP/HTTPS) | Yes (global WAF policy) | Anycast edge, caching, global failover across regions |
| Load Balancer (Standard) | Regional | L4 (TCP/UDP) | No | Non-HTTP TCP/UDP, ultra-low-latency passthrough, any protocol |
The deciding questions: do you need the L7 features (URL path routing, header rewrite, cookie affinity, TLS termination) inside a VNet where backends are private? That’s Application Gateway. Do you need a global anycast front with caching? That’s Front Door. Is it non-HTTP, or do you want pure L4 passthrough with no proxy? Standard Load Balancer.
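The three questions reduce to a small decision function. The sketch below only restates them in shell; the function name and its yes/no arguments are hypothetical, and it deliberately ignores layered designs that combine two of the services.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restates the deciding questions as code.
# Both arguments are plain "yes"/"no" strings.
pick_service() {
  local is_http="$1" needs_global_edge="$2"
  if [ "$is_http" != "yes" ]; then
    echo "Standard Load Balancer"      # non-HTTP or pure L4 passthrough
  elif [ "$needs_global_edge" = "yes" ]; then
    echo "Front Door"                  # global anycast front with caching
  else
    echo "Application Gateway v2"      # regional L7 inside a VNet
  fi
}

pick_service yes no
pick_service yes yes
pick_service no no
```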
A common and correct pattern is Front Door in front of Application Gateway: Front Door for global anycast and caching at the edge, App Gateway as the regional L7 router with private backends and a second WAF layer. They are complementary, not competing. This article focuses on the regional App Gateway tier.
A note on tiers before we touch anything: v1 is retired for new deployments – Microsoft announced retirement and you should not build on it. Everything here is v2 (Standard_v2 or WAF_v2). The WAF lives on the WAF_v2 SKU, configured through a separate WAF policy resource that you associate with the gateway, listeners, or path rules.
Step 1 - Lay down the network and a v2 gateway
Application Gateway v2 needs a dedicated subnet – nothing else can live in it – with at least a /24 recommended for headroom (each scale-unit consumes addresses). Give it a Standard public IP that is static (v2 requires static, and the zone-redundant SKU pins the IP across zones).
RG=rg-appgw-prod
LOC=eastus2
VNET=vnet-appgw
SUBNET=snet-appgw
az group create -n $RG -l $LOC
az network vnet create -g $RG -n $VNET \
--address-prefix 10.40.0.0/16 \
--subnet-name $SUBNET \
--subnet-prefix 10.40.0.0/24
# Backend subnet for the apps (private)
az network vnet subnet create -g $RG --vnet-name $VNET \
-n snet-backends --address-prefix 10.40.1.0/24
# Static, zone-redundant public IP (zones 1,2,3)
az network public-ip create -g $RG -n pip-appgw \
--sku Standard --allocation-method Static \
--zone 1 2 3
Now the gateway itself, in WAF_v2, autoscaling, zone-redundant. I’ll create it minimal here and layer routing on afterward, because the CLI’s single-shot create gets unwieldy fast.
az network application-gateway create -g $RG -n agw-prod \
--sku WAF_v2 \
--location $LOC \
--zones 1 2 3 \
--min-capacity 2 \
--max-capacity 10 \
--vnet-name $VNET --subnet $SUBNET \
--public-ip-address pip-appgw \
--priority 100 \
--frontend-port 80 \
--http-settings-port 80 \
--http-settings-protocol Http \
--servers 10.40.1.10 10.40.1.11
--min-capacity 2 is the floor that absorbs the next traffic spike while autoscale reacts; never set it to 1 in production – you want at least two instances spread across zones so a single zone loss doesn’t drop you to zero. --max-capacity 10 caps cost. Both values count instances, and each instance provides roughly 10 capacity units (one capacity unit covers about 2,500 persistent connections or 2.22 Mbps of throughput, plus one compute unit), so size the ceiling from your measured peak connections and throughput, not a guess. The --priority flag is mandatory for v2 request routing rules; lower numbers win when rules overlap.
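To turn "size it from your peak" into numbers, a rough calculation helps. The sketch below is a back-of-envelope estimate only. The per-unit figures (about 2.22 Mbps or 2,500 persistent connections per capacity unit, roughly 10 capacity units per instance) are assumptions taken from Microsoft's published capacity-unit definition, the function name is hypothetical, and compute units (TLS and WAF cost) are ignored.

```shell
#!/usr/bin/env bash
# Rough instance-count estimate for the autoscale ceiling.
# ASSUMPTIONS: 1 capacity unit ~ 2.22 Mbps or 2,500 persistent connections,
# and 1 instance ~ 10 capacity units. Check current Azure docs before relying on them.
estimate_instances() {
  local peak_mbps="$1" peak_conns="$2" cu_tp cu_conn cu
  # Integer math: work in hundredths of a Mbps so 2.22 becomes 222 (rounding up).
  cu_tp=$(( (peak_mbps * 100 + 221) / 222 ))
  cu_conn=$(( (peak_conns + 2499) / 2500 ))
  # The larger of the two dimensions drives the capacity-unit count.
  cu=$cu_tp
  if [ "$cu_conn" -gt "$cu" ]; then cu=$cu_conn; fi
  # Round up to whole instances.
  echo $(( (cu + 9) / 10 ))
}

# Hypothetical peak: 150 Mbps and 40,000 persistent connections.
estimate_instances 150 40000   # prints 7
```

With those made-up inputs, throughput dominates (68 capacity units), so a ceiling of 10 leaves some headroom. WAF inspection adds compute load that this estimate does not model.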
Step 2 - Listeners, rules, and backend pools: multi-site and URL path maps
Real gateways serve more than one site off one IP. Two routing dimensions stack:
- Multi-site listeners match on the Host header (shop.contoso.com vs api.contoso.com) – different apps, same frontend IP/port.
- URL path maps match on the path within a site (/images/* to a static pool, /api/* to the API pool).
Let’s add a second backend pool, a multi-site HTTPS listener (we wire the cert in Step 3), and a path map.
# Second backend pool for the API tier
az network application-gateway address-pool create -g $RG \
--gateway-name agw-prod -n pool-api \
--servers 10.40.1.20 10.40.1.21
# Static-content pool
az network application-gateway address-pool create -g $RG \
--gateway-name agw-prod -n pool-static \
--servers 10.40.1.30 10.40.1.31
# HTTP settings the backends actually expect (HTTPS, probe-bound).
# probe-api must already exist: run the probe create from Step 4 before this command.
az network application-gateway http-settings create -g $RG \
--gateway-name agw-prod -n hs-api \
--port 443 --protocol Https \
--host-name-from-backend-pool true \
--probe probe-api --timeout 30 \
--connection-draining-timeout 60
--host-name-from-backend-pool true forwards the backend pool member’s hostname as SNI/Host on re-encryption, which is what most app servers and their certificates expect; if your backends share one hostname, set --host-name api.internal.contoso.com explicitly instead. Now the path map – this is the URL routing brain:
# Path-based rule map: default goes to the web pool,
# /api/* to the API pool, /static/* to static.
az network application-gateway url-path-map create -g $RG \
--gateway-name agw-prod -n pathmap-shop \
--paths "/api/*" \
--address-pool pool-api \
--http-settings hs-api \
--default-address-pool appGatewayBackendPool \
--default-http-settings appGatewayBackendHttpSettings \
--rule-name rule-api
az network application-gateway url-path-map rule create -g $RG \
--gateway-name agw-prod --path-map-name pathmap-shop \
-n rule-static --paths "/static/*" \
--address-pool pool-static --http-settings hs-api
The mental model that prevents 90% of routing bugs: a request routing rule binds one listener to either a single backend (basic rule) or a path map (path-based rule). The listener decides which site; the path map decides which pool within that site. A path that matches nothing in the map falls to the default-address-pool. Order matters in path maps – first match wins – so put specific patterns before broad ones.
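The two-stage model can be mimicked in a few lines of shell. This is a toy illustration of the lookup order, not how the gateway is implemented. The function name is hypothetical, while the host, paths, and pool names mirror the ones used in this step.

```shell
#!/usr/bin/env bash
# Toy model of listener + path-map routing.
route_request() {
  local host="$1" path="$2"
  # Listener stage: the Host header selects the site.
  if [ "$host" != "shop.contoso.com" ]; then
    echo "no-listener"
    return
  fi
  # Path-map stage: patterns are tried top to bottom, first match wins.
  case "$path" in
    /api/*)    echo "pool-api" ;;
    /static/*) echo "pool-static" ;;
    *)         echo "appGatewayBackendPool" ;;   # default pool
  esac
}

route_request shop.contoso.com /api/orders        # pool-api
route_request shop.contoso.com /static/logo.png   # pool-static
route_request shop.contoso.com /checkout          # appGatewayBackendPool
```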
Step 3 - TLS termination, end-to-end re-encryption, and certs from Key Vault
You have three TLS postures, and you should know which you’re choosing:
- TLS termination only – gateway decrypts, talks HTTP to backends. Simplest; backend traffic is plaintext inside the VNet.
- End-to-end (re-encryption) – gateway decrypts (to run WAF + routing on cleartext), then re-encrypts to the backend over HTTPS. This is what regulated workloads need: the WAF must see plaintext to inspect it, but the wire to the backend is still encrypted.
- TLS passthrough – not supported on App Gateway. If you need the backend to terminate TLS itself with no decryption at the edge, use a Standard Load Balancer (L4). App Gateway is a proxy; it always terminates.
We’ll do end-to-end, and – critically – source the listener certificate from Key Vault so the gateway never holds the private key in its config and rotation happens in one place.
First, the identity and access. v2 reads Key Vault secrets through a user-assigned managed identity, and you must grant it secret get (the cert is exposed as a secret), not just certificate permissions.
# User-assigned identity the gateway will use
az identity create -g $RG -n id-appgw
IDENTITY_ID=$(az identity show -g $RG -n id-appgw --query id -o tsv)
IDENTITY_PRINCIPAL=$(az identity show -g $RG -n id-appgw --query principalId -o tsv)
KV=kv-appgw-prod
# RBAC model: the identity needs to READ secrets (the cert is a secret)
az role assignment create \
--assignee-object-id $IDENTITY_PRINCIPAL \
--assignee-principal-type ServicePrincipal \
--role "Key Vault Secrets User" \
--scope $(az keyvault show -n $KV --query id -o tsv)
# Attach the identity to the gateway
az network application-gateway identity assign -g $RG \
--gateway-name agw-prod --identity $IDENTITY_ID
If your Key Vault uses the legacy access-policy model instead of RBAC, grant --secret-permissions get (and list) to the identity’s principal via az keyvault set-policy. Using certificate permissions alone is the single most common reason the gateway shows “Unknown” health and refuses to bind the cert – App Gateway pulls the cert via the secret endpoint.
Now reference the Key Vault cert by its secret ID (use the versionless URI so rotation is picked up automatically – if you pin a version, the gateway keeps serving the old cert after rotation):
KV_SECRET_ID=$(az keyvault certificate show \
--vault-name $KV -n shop-contoso-com \
--query sid -o tsv | sed 's#/[^/]*$##') # strip version -> versionless
az network application-gateway ssl-cert create -g $RG \
--gateway-name agw-prod -n cert-shop \
--key-vault-secret-id "$KV_SECRET_ID"
# HTTPS multi-site listener bound to that cert and host
az network application-gateway http-listener create -g $RG \
--gateway-name agw-prod -n lsnr-shop-https \
--frontend-port 443 --frontend-ip appGatewayFrontendIP \
--ssl-cert cert-shop --host-name shop.contoso.com
# Bind listener -> path map via a path-based routing rule
az network application-gateway rule create -g $RG \
--gateway-name agw-prod -n rule-shop \
--rule-type PathBasedRouting \
--http-listener lsnr-shop-https \
--url-path-map pathmap-shop \
--priority 110
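Because a pinned version silently defeats rotation, it can be worth normalising the secret ID before it reaches ssl-cert create. The helper below is a hypothetical sketch that assumes the usual https://VAULT/secrets/NAME[/VERSION] shape. Unlike a blind "strip the last path segment", it leaves an ID that is already versionless untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce a Key Vault secret ID to its versionless form.
to_versionless() {
  local sid="${1%/}"                 # drop a trailing slash, if any
  local tail="${sid##*/secrets/}"    # "<name>" or "<name>/<version>"
  case "$tail" in
    */*) echo "${sid%/*}" ;;         # has a version segment: strip it
    *)   echo "$sid" ;;              # already versionless
  esac
}

# The version string here is a placeholder.
to_versionless "https://kv-appgw-prod.vault.azure.net/secrets/shop-contoso-com/0123456789abcdef"
to_versionless "https://kv-appgw-prod.vault.azure.net/secrets/shop-contoso-com"
```

Both calls print the same versionless URI, so the helper is safe to run twice.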
For re-encryption to a backend whose certificate is signed by a private/internal CA, upload the root CA as a trusted root cert on the HTTP settings so the gateway validates the backend’s chain. With a public CA-signed backend cert you can skip this; App Gateway v2 trusts well-known public roots.
az network application-gateway root-cert create -g $RG \
--gateway-name agw-prod -n root-internal-ca \
--cert-file ./internal-root-ca.cer
az network application-gateway http-settings update -g $RG \
--gateway-name agw-prod -n hs-api \
--root-certs root-internal-ca
Finally, enforce a modern TLS floor with an SSL policy – do not serve TLS 1.0/1.1 from a 2026 gateway:
az network application-gateway ssl-policy set -g $RG \
--gateway-name agw-prod \
--policy-type Predefined \
--policy-name AppGwSslPolicy20220101 # TLS 1.2+ baseline
Step 4 - Health probes, connection draining, and autoscaling
A backend pool with no custom probe uses a default probe that hits the backend’s root path and expects 200-399. That’s almost never what you want. Define an explicit probe with a real health endpoint and an accepted-status range:
az network application-gateway probe create -g $RG \
--gateway-name agw-prod -n probe-api \
--protocol Https --host-name-from-http-settings true \
--path /healthz \
--interval 15 --timeout 10 --threshold 3 \
--match-status-codes 200-399
--host-name-from-http-settings true makes the probe send the same SNI/Host the real traffic uses – essential when the backend serves multiple vhosts, otherwise the probe hits the wrong site and marks a healthy backend down. --threshold 3 means three consecutive failures before eviction, which rides out a single GC pause without flapping.
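It helps to know roughly how long a dead backend keeps receiving traffic under these settings. The arithmetic below is a back-of-envelope model, not documented gateway behaviour: it assumes the worst case is threshold failed probes spaced one interval apart, with the last probe running to its full timeout. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Worst-case seconds before a failing backend is marked unhealthy (assumed model).
probe_detection_window() {
  local interval="$1" timeout="$2" threshold="$3"
  echo $(( interval * threshold + timeout ))
}

# Values from the probe above: --interval 15 --timeout 10 --threshold 3
probe_detection_window 15 10 3   # prints 55
```

If close to a minute of failed requests is too long for a given API, shortening the interval is usually a gentler lever than lowering the threshold, which brings back flapping.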
Connection draining (set on the HTTP settings in Step 2 via --connection-draining-timeout 60) is what makes deployments graceful: when you pull a member from the pool, in-flight requests get up to 60s to finish instead of being cut. Without it, every backend deploy throws 502s at users mid-request.
Autoscaling is already on from Step 1 (--min-capacity / --max-capacity). The two facts that matter operationally:
- The floor pre-warms – those min instances are always running, so a sudden spike is absorbed by min capacity while autoscale spins up more. Scale-out is not instantaneous; min capacity is your shock absorber.
- Zone redundancy (--zones 1 2 3) spreads instances across physical zones. Combined with min-capacity >= 2, a full zone failure leaves you serving from the survivors with zero config change.
Step 5 - WAF policy: CRS rule sets, anomaly scoring, exclusions
The WAF is a separate policy resource. Create it in Detection first – never start in Prevention on a real app, you will block legitimate traffic on day one.
az network application-gateway waf-policy create -g $RG \
-n wafpol-prod --location $LOC
# Managed ruleset: OWASP CRS 3.2 (the Microsoft bot manager ruleset would need its own rule-set add)
az network application-gateway waf-policy managed-rule \
rule-set add -g $RG --policy-name wafpol-prod \
--type OWASP --version 3.2
# Start in DETECTION, request-body inspection on, sane size caps
az network application-gateway waf-policy policy-setting update \
-g $RG --policy-name wafpol-prod \
--state Enabled --mode Detection \
--request-body-check true \
--max-request-body-size-in-kb 128 \
--file-upload-limit-in-mb 100
# Associate the policy with the gateway
POLICY_ID=$(az network application-gateway waf-policy show \
-g $RG -n wafpol-prod --query id -o tsv)
az network application-gateway update -g $RG -n agw-prod \
--set firewallPolicy.id=$POLICY_ID
Understand anomaly scoring, because it’s how CRS 3.x decides to block. Each matched rule adds to a per-request score by severity (Critical = 5, Error = 4, Warning = 3, Notice = 2). When the cumulative score crosses the anomaly threshold (default 5) the request is actioned. So a single Critical rule, or a couple of lesser ones together, trips it. This is why you tune by score, not by hunting one rule – a false positive is usually one over-eager Critical match, and you exclude precisely that match for precisely that field.
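The scoring arithmetic is easy to play with offline. The sketch below is a toy calculator that uses only the severity weights and threshold quoted above. The function names are hypothetical, and "block" here means "would be blocked in Prevention" (in Detection the same request is only logged).

```shell
#!/usr/bin/env bash
# Toy anomaly-score calculator; not the WAF engine itself.
severity_points() {
  case "$1" in
    Critical) echo 5 ;;
    Error)    echo 4 ;;
    Warning)  echo 3 ;;
    Notice)   echo 2 ;;
    *)        echo 0 ;;
  esac
}

# Usage: waf_verdict <threshold> <severity>...
waf_verdict() {
  local threshold="$1" score=0 sev
  shift
  for sev in "$@"; do
    score=$(( score + $(severity_points "$sev") ))
  done
  if [ "$score" -ge "$threshold" ]; then
    echo "score=$score action=block"
  else
    echo "score=$score action=pass"
  fi
}

waf_verdict 5 Critical          # score=5 action=block
waf_verdict 5 Warning           # score=3 action=pass
waf_verdict 5 Warning Notice    # score=5 action=block
```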
Exclusions are scalpel, not hammer. Exclude a specific rule against a specific request attribute (a header, a cookie, a form arg) rather than disabling the rule globally:
# Real example: a legacy app posts HTML in a field named "description",
# tripping the XSS rule 941330. Exclude THAT rule for THAT arg only.
az network application-gateway waf-policy managed-rule \
exclusion rule-set add -g $RG --policy-name wafpol-prod \
--type OWASP --version 3.2 \
--group-name REQUEST-941-APPLICATION-ATTACK-XSS \
--rule-ids 941330 \
--match-variable RequestArgNames \
--selector-match-operator Equals \
--selector description
That keeps 941330 protecting every other argument and every other endpoint. The lazy alternative – disabling rule group 941 entirely – would strip XSS protection from the whole app to fix one field.
Step 6 - Detection-then-Prevention: read the logs, kill false positives, then enforce
This is the step that separates a WAF that protects from a WAF that gets disabled after the first outage. The discipline: run Detection in production for one to two full business cycles (at least a week, including a deploy and a peak), mine ApplicationGatewayFirewallLog for what would have been blocked, fix each false positive with a targeted exclusion, and only then flip to Prevention.
Diagnostic logs must be on – send ApplicationGatewayFirewallLog to a Log Analytics workspace (the command below assumes a workspace named law-appgw already exists in the resource group):
az monitor diagnostic-settings create \
--name diag-agw \
--resource $(az network application-gateway show -g $RG -n agw-prod --query id -o tsv) \
--workspace $(az monitor log-analytics workspace show -g $RG -n law-appgw --query id -o tsv) \
--logs '[{"category":"ApplicationGatewayFirewallLog","enabled":true},
{"category":"ApplicationGatewayAccessLog","enabled":true}]'
Now the query that does the real work. In Detection mode every rule that matches is logged with action_s == "Matched", and a request that would have been blocked is logged as "Detected" (nothing is actually blocked). Group by rule, rule group, and host, and keep a sample of the matched data so you can see which field tripped the rule:
AzureDiagnostics
| where Category == "ApplicationGatewayFirewallLog"
| where action_s in ("Matched", "Detected", "Blocked")
| summarize hits = count(),
sampleUri = any(requestUri_s),
sampleMsg = any(Message),
sampleData = any(details_data_s)
by ruleId_s, ruleGroup_s, hostname_s
| order by hits desc
Read it like a triage nurse. High-hit rules against a known-good endpoint and a known field are almost always false positives – write an exclusion (Step 5) for that rule + field. Low-hit rules with attack-shaped URIs against random paths are real probing – leave them. Anything you’re unsure about, leave the rule on: a missed attack costs you an incident page, while a missed exclusion costs one customer a 403 and you an email. Bias toward keeping protection.
Once the firewall log is quiet of false positives across a full cycle, flip to Prevention:
az network application-gateway waf-policy policy-setting update \
-g $RG --policy-name wafpol-prod --mode Prevention
Keep diagnostics on after the switch and watch action_s == "Blocked". The first 24-48 hours in Prevention is when a missed false positive surfaces as a real 403 – have the exclusion-add command ready and a rollback to Detection one CLI call away.
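One way to keep that rollback a single call is a thin wrapper around the mode switch. This is a hypothetical sketch: the function name and the AZ override are not part of the Azure CLI. AZ defaults to the real az binary, and setting AZ=echo prints the command instead of running it, which makes the wrapper safe to rehearse.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the policy-setting update shown above.
RG="${RG:-rg-appgw-prod}"

set_waf_mode() {
  local mode="$1"
  case "$mode" in
    Detection|Prevention) ;;
    *) echo "mode must be Detection or Prevention" >&2; return 1 ;;
  esac
  "${AZ:-az}" network application-gateway waf-policy policy-setting update \
    -g "$RG" --policy-name wafpol-prod --mode "$mode"
}

# Dry run: prints the rollback command without touching Azure.
AZ=echo set_waf_mode Detection
```

Rejecting anything other than the two valid modes keeps a typo during an incident from turning into a confusing CLI error.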
Step 7 - Custom rules: rate limiting and geo/IP match
Managed rules handle OWASP; custom rules handle your policy – rate limits, geo-blocks, allowlists. Custom rules evaluate by priority (lower wins) and run before managed rules, so a custom Allow can short-circuit an allowlisted partner past the managed set, and a custom Block stops abuse before it ever costs you CRS evaluation.
Rate limiting (v2 supports it natively) – throttle by client IP over a sliding window:
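az network application-gateway waf-policy custom-rule create \
-g $RG --policy-name wafpol-prod -n rateLimitPerIp \
--priority 10 --rule-type RateLimitRule \
--action Block \
--rate-limit-threshold 100 \
--rate-limit-duration OneMin \
--group-by-user-session '[{"groupByVariables":[{"variableName":"ClientAddr"}]}]'
# A rate-limit rule still needs a match condition; this one matches every client IP
az network application-gateway waf-policy custom-rule match-condition add \
-g $RG --policy-name wafpol-prod -n rateLimitPerIp \
--match-variables RemoteAddr --operator IPMatch \
--values "255.255.255.255/32" --negate true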
This blocks any single client IP exceeding 100 requests/minute – a blunt but effective brake on credential-stuffing and scraping. For login endpoints specifically, add a MatchRule condition scoping it to /login with a tighter threshold.
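A quick way to see the limit take effect is to fire a burst from one client and tally the status codes. The helper below is a hypothetical sketch. It only summarises codes read from stdin, so it can be tried offline, and it assumes blocked requests come back as 403.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally HTTP status codes read from stdin, one per line.
summarise_codes() {
  sort | uniq -c | awk '{ printf "%s=%s\n", $2, $1 }' | paste -sd' ' -
}

# Offline demonstration with canned codes in place of live traffic.
printf '200\n200\n403\n200\n403\n' | summarise_codes   # prints 200=3 403=2

# Against a live gateway (GWIP as in the Verify section), something like:
#   for i in $(seq 1 120); do
#     curl -sk --resolve shop.contoso.com:443:$GWIP https://shop.contoso.com/ \
#       -o /dev/null -w '%{http_code}\n'
#   done | summarise_codes
```

The cutover will not land exactly on request 101, so read the tally as "403s start appearing", not as a precise counter check.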
Geo and IP match conditions on a standard MatchRule:
# Block a list of countries outright (priority before rate limit if stricter)
az network application-gateway waf-policy custom-rule create \
-g $RG --policy-name wafpol-prod -n geoBlock \
--priority 5 --rule-type MatchRule --action Block
az network application-gateway waf-policy custom-rule match-condition add \
-g $RG --policy-name wafpol-prod --rule-name geoBlock \
--match-variables RemoteAddr \
--operator GeoMatch \
--values "KP" "Some-Other-CC"
# Allowlist a partner CIDR ABOVE everything (lowest priority number)
az network application-gateway waf-policy custom-rule create \
-g $RG --policy-name wafpol-prod -n partnerAllow \
--priority 1 --rule-type MatchRule --action Allow
az network application-gateway waf-policy custom-rule match-condition add \
-g $RG --policy-name wafpol-prod --rule-name partnerAllow \
--match-variables RemoteAddr --operator IPMatch \
--values "203.0.113.0/24"
GeoMatch uses Microsoft’s IP-to-country mapping on RemoteAddr. One trap: if Application Gateway sits behind Front Door or another proxy, RemoteAddr is the proxy’s IP, not the client’s – match on the X-Forwarded-For variable (RequestHeaders / X-Forwarded-For) instead, or your geo rule judges the wrong address.
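When checking access logs to see which address a rule should have judged, it helps to pull the original client out of the header. The sketch below assumes the conventional X-Forwarded-For layout (client first, then each proxy, comma-separated, entries optionally carrying a :port suffix). The function name is hypothetical, and this is a log-reading aid, not how the WAF evaluates the header.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the leftmost address from an X-Forwarded-For value.
client_ip_from_xff() {
  local first
  # Leftmost entry is the hop furthest from the gateway.
  first=$(printf '%s' "${1%%,*}" | tr -d '[:space:]')
  case "$first" in
    *.*:*) first="${first%:*}" ;;   # IPv4 with a :port suffix: drop the port
  esac
  printf '%s\n' "$first"
}

client_ip_from_xff "198.51.100.7:51234, 203.0.113.10"   # prints 198.51.100.7
client_ip_from_xff "2001:db8::1, 203.0.113.10"          # prints 2001:db8::1
```

The leftmost entry is supplied by the client unless the upstream proxy overwrites the header, so it is only trustworthy when the gateway accepts traffic from that proxy alone.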
Verify
Confirm the gateway is healthy, routing splits correctly, TLS is what you intend, and the WAF blocks attacks while passing benign traffic.
Backend health – every member should be Healthy. Anything Unknown usually means probe SNI/host mismatch, a Key Vault permission gap, or an NSG/UDR on the gateway subnet blocking probe traffic:
az network application-gateway show-backend-health \
-g $RG -n agw-prod \
--query "backendAddressPools[].backendHttpSettingsCollection[].servers[].{addr:address,health:health}" \
-o table
Routing matrix with curl – prove host and path routing land on the right pool. --resolve pins the hostname to the gateway IP so DNS isn’t in the way:
GWIP=$(az network public-ip show -g $RG -n pip-appgw --query ipAddress -o tsv)
# Each line should hit the expected pool; check the served content/headers.
curl -sk --resolve shop.contoso.com:443:$GWIP \
https://shop.contoso.com/ -o /dev/null -w "default -> %{http_code}\n"
curl -sk --resolve shop.contoso.com:443:$GWIP \
https://shop.contoso.com/api/health -o /dev/null -w "api -> %{http_code}\n"
curl -sk --resolve shop.contoso.com:443:$GWIP \
https://shop.contoso.com/static/logo.png -o /dev/null -w "static -> %{http_code}\n"
TLS posture – confirm the served cert and that TLS 1.1 is refused:
# Served leaf cert subject/issuer
echo | openssl s_client -connect $GWIP:443 \
-servername shop.contoso.com 2>/dev/null \
| openssl x509 -noout -subject -issuer
# This MUST fail to handshake on the 2022 SSL policy
# (</dev/null stops s_client from waiting on stdin if it does connect):
openssl s_client -connect $GWIP:443 -servername shop.contoso.com -tls1_1 </dev/null
Benign-attack payload test – in Prevention, a textbook (harmless) SQLi/XSS string in a query arg should return 403, and a normal request should return 200. This is a non-destructive probe of your own gateway, not anyone else’s:
# Should be 403 (Blocked) once in Prevention
curl -sk --resolve shop.contoso.com:443:$GWIP \
"https://shop.contoso.com/?id=1%27%20OR%20%271%27%3D%271" \
-o /dev/null -w "sqli-probe -> %{http_code}\n"
# Should be 200/expected (clean request)
curl -sk --resolve shop.contoso.com:443:$GWIP \
"https://shop.contoso.com/?id=42" \
-o /dev/null -w "clean -> %{http_code}\n"
Confirm the block in logs – the SQLi probe should appear as Blocked with the matching CRS rule:
AzureDiagnostics
| where Category == "ApplicationGatewayFirewallLog"
| where action_s == "Blocked"
| project TimeGenerated, ruleId_s, requestUri_s, clientIp_s, Message
| order by TimeGenerated desc
| take 20
Enterprise scenario
A payments platform team ran a single Application Gateway v2 fronting a checkout app and a partner API off one public IP, end-to-end TLS, WAF in Prevention. After a routine release of the checkout service, the partner integration team opened a Sev2: a subset of partner POSTs to /api/settlement started returning 403 with no app-side log entry. The app was healthy; the gateway’s access log showed the 403, and the firewall log showed CRS rule 942100 (SQL injection, libinjection) matching the request body.
The cause was correct WAF behavior meeting a real payload: the settlement API legitimately accepts a metadata field containing free-form merchant notes, and one merchant had started sending notes that contained a SQL-looking substring. The body matched 942100, the anomaly score crossed the threshold, the request was blocked. Disabling the SQLi group wholesale was floated and rejected – this was a payments path, stripping SQLi inspection was a non-starter.
The fix was a targeted body exclusion: rule 942100 excluded only for the metadata argument, leaving SQLi inspection intact on every other field and endpoint. They validated it in Detection on a parallel policy for 48 hours against replayed partner traffic before applying it to the live Prevention policy.
az network application-gateway waf-policy managed-rule \
exclusion rule-set add -g $RG --policy-name wafpol-prod \
--type OWASP --version 3.2 \
--group-name REQUEST-942-APPLICATION-ATTACK-SQLI \
--rule-ids 942100 \
--match-variable RequestArgNames \
--selector-match-operator Equals \
--selector metadata
The lasting lesson the team wrote into their runbook: every new field on a WAF-protected API is a potential false positive, so request-body schema changes now go through a Detection-mode soak before Prevention, and exclusions are always scoped to a single rule + single argument – never a rule group, never a global disable.