
Azure Virtual Network, Subnets and NSGs: Networking Fundamentals

A team moved a three-tier app to Azure and put every VM in one subnet with the default rules untouched. When a web server was compromised, the attacker had direct SMB access to the database — because inside a Virtual Network (VNet), Azure lets every machine talk to every other machine by default. There was no firewall between the web tier and the database; there was nothing between anything and anything else. Proper subnetting and network security groups (NSGs) would have contained the breach to the web tier and turned a database exfiltration into a logged, blocked packet. This is the single most common networking mistake on Azure, and it traces to one misunderstanding: people assume a VNet is secure by default. It is private by default — not reachable from the public internet without an explicit public IP — but it is wide open internally.

This article is the practitioner’s foundation for the three primitives every Azure network is built from. A VNet is your private IP address space in a region: 10.0.0.0/16, isolated, yours. Subnets carve that space into smaller ranges (10.0.1.0/24 for web, 10.0.2.0/24 for app, 10.0.3.0/24 for data) so you can apply different controls to different tiers. NSGs are stateful firewalls that match on a five-tuple (source address, source port, destination address, destination port, protocol) in a given direction; you attach them to a subnet or a NIC to decide which packets land and which are dropped. Get these three right and the rest of Azure networking (peering, private endpoints, gateways, firewalls) slots in cleanly on top. Get them wrong and you either ship a flat, breach-amplifying network or a maze of rules nobody can reason about at 2 a.m.

By the end you will design an address space that peers without collisions, size subnets knowing Azure silently reserves five addresses in each, write NSG rules whose priorities and default rules you fully understand, use Application Security Groups (ASGs) so a rule says “from the web tier” instead of a brittle IP list, decide between service endpoints and private endpoints for PaaS, and read effective security rules and effective routes to confirm — not guess — why a packet was dropped. Because this is a reference you will return to mid-incident, every setting, default, limit and failure mode is laid out as a scannable table alongside the prose and the az/Bicep that configures it.

What problem this solves

Cloud resources need the same network controls you had on-premises — isolation, segmentation, traffic filtering, controlled egress — but expressed in software, applied at scale, and auditable. Without VNets you have no private addressing: every VM would need a public IP and live on the internet. Without subnets you have no segmentation: one flat broadcast-free L3 network where a foothold anywhere is a foothold everywhere. Without NSGs you have no filtering: Azure’s defaults allow all intra-VNet traffic and all outbound internet, so a compromised web server reaches your database, your domain controller, and the internet for exfiltration, unimpeded.

What breaks without this knowledge is rarely a hard failure — it is a silent one. The network “works”: the app responds, traffic flows, nobody notices that the data tier accepts connections from the entire VNet, that PaaS traffic to Storage and Key Vault is exiting over the public internet, or that a forgotten “allow Any-Any” rule at priority 100 sits above every careful deny you wrote. The failure surfaces later — in a breach post-mortem, a compliance audit, a peering that collides because two teams both chose 10.0.0.0/16, or a 2 a.m. incident where a VM cannot reach SQL and you have no idea whether it is an NSG, a route, DNS, or the app itself.

Who hits this: everyone who runs anything beyond a single public web app. It bites hardest on teams lifting-and-shifting on-prem three-tier apps (they replicate the servers but not the firewalls between them), teams that will later peer VNets or connect to on-prem over VPN/ExpressRoute (address-space collisions are painful to fix after the fact), and anyone subject to PCI/HIPAA/ISO segmentation requirements. The fix is almost never “add a firewall appliance” — it is “segment into tier subnets, write deny-by-default NSGs scoped by ASG, and route PaaS over the backbone.” To frame the whole field before the deep dive, here is every problem class this article addresses, what goes wrong without it, and the first place to look:

Problem class What breaks without it Who hits it hardest First place to look
No private addressing Every VM needs a public IP, lives on the internet Anyone past a single web app VNet address space; public IP assignments
Flat network (no subnets) One foothold = total lateral movement Lift-and-shift three-tier apps Subnet list; which tier shares a subnet
Open internal traffic (no NSG) Web tier reaches DB/DC/internet freely Everyone using defaults NSG list; AllowVnetInBound default rule
Address-space collision Peering / VPN fails or routes ambiguously Multi-VNet, hybrid, M&A CIDR plan across all VNets + on-prem
Brittle IP-based rules Rules break when tiers scale or re-IP Autoscaling tiers NSG rule sources (CIDR vs ASG)
PaaS egress over internet Storage/SQL/KV traffic leaves the backbone Compliance-bound workloads Service/private endpoints; effective routes
Asymmetric / blackholed routes SYN arrives, reply vanishes NVA / forced-tunnel designs Effective routes; UDR next-hops

Learning objectives

By the end of this article you can:

Prerequisites & where this fits

You should be comfortable with IP fundamentals — what a CIDR like /24 means (256 addresses, 254 usable on-prem, fewer in Azure), the RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), and the basics of TCP/UDP ports and the difference between L3/L4 (IP/port) and L7 (HTTP/application). You should be able to run az in Cloud Shell, read JSON output, and know that an Azure resource group holds resources and a region is where they physically live. No prior Azure networking is assumed; this article is the foundation.

This sits at the base of the Networking track and everything else builds on it. It is upstream of Diagnosing Azure VNet Connectivity: NSGs, UDRs, Effective Routes & Network Watcher, which goes deep on the diagnostic tooling this article introduces. It pairs with Azure Private Endpoint vs Service Endpoint: Secure PaaS Access and Azure Private Link and Private DNS: Keeping PaaS Off the Public Internet for the PaaS-egress decisions touched on here. When NSGs aren’t enough, the L7 layer is Application Gateway v2 WAF: End-to-End TLS, mTLS, and Custom Rule Tuning; the L4 vs L7 choice is in Azure Load Balancer vs Application Gateway: Picking the Right Traffic Manager. At organisation scale, address-space planning and policy-driven NSGs live in the Azure Enterprise-Scale Landing Zone: Foundation for Large Organizations, which itself assumes the Azure Resource Hierarchy Explained: Subscriptions, Resource Groups and Resources.

A quick map of who owns each layer, so you escalate to the right person when a packet goes missing:

Layer What lives here Who usually owns it What it can cause
Address space / IPAM CIDR allocation across VNets + on-prem Network / platform team Peering collisions, no room to grow
Subnets Tier/role segmentation, delegated subnets Network + app team Flat network, lateral movement
NSG rules L4 allow/deny by 5-tuple Network + security Drops, over-exposure, priority races
ASGs Logical grouping of NICs by role App + network Rules that break on scale (if unused)
Routes (system + UDR) Next-hop decisions, forced tunnelling Network team Asymmetric drops, blackholes
Endpoints (service/private) PaaS reachability over the backbone Network + data team PaaS egress over internet; DNS issues
Diagnostics Network Watcher, flow logs SRE / network (the truth source for all the above)

Core concepts

Six mental models make every later decision obvious.

A VNet is a private L3 boundary, scoped to one region and one subscription. A VNet is an isolated slice of Azure’s network with an address space you choose (one or more CIDR blocks). Resources inside it get private IPs from that space and can reach each other; nothing outside reaches in without an explicit public IP, peering, or a gateway. A VNet lives in exactly one region and one subscription — to span regions or subscriptions you create separate VNets and peer them. The address space is the most consequential early decision because changing it after you peer or connect on-prem is painful.

Subnets segment the space; segmentation is the whole security story. A subnet is a contiguous range carved from the VNet (10.0.1.0/24). Subnets are how you apply different controls to different tiers — a web subnet open to 443, an app subnet open only from web, a data subnet open only from app. Without subnet segmentation, every NSG you write is fleet-wide and lateral movement is unconstrained. Azure reserves five addresses in every subnet (network, gateway ×1, DNS ×2, broadcast), so a /24 gives you 251 usable, not 254 — a number that bites when you size tight.

An NSG is a stateful, priority-ordered packet filter. A network security group holds security rules, each a 5-tuple match (source, source-port, destination, destination-port, protocol) plus a direction (inbound/outbound), an action (allow/deny), and a priority (100–4096; lower number wins). It is stateful: allow an inbound flow and the return traffic is automatically permitted (you do not write a matching outbound rule). Rules are evaluated by priority, first match wins, and there are immutable default rules beneath your custom ones that allow all intra-VNet traffic and deny everything else inbound. You attach an NSG to a subnet (protects every NIC in it) or a NIC (protects one VM) — or both, in which case both are evaluated.

Sources and destinations can be addresses, service tags, or ASGs. A rule’s source/destination is not limited to a CIDR. It can be a service tag — a Microsoft-maintained label for a cloud service’s IP ranges (Internet, VirtualNetwork, AzureLoadBalancer, Storage, Sql, AzureCloud) that auto-updates so you never chase IP changes — or an Application Security Group (ASG), a logical handle you attach to NICs so a rule reads “from asg-web to asg-app on 8080” and membership follows the machine, not its address. ASGs and service tags are what keep rules readable and resilient as tiers scale.

Routing is separate from filtering, and UDRs override it. NSGs decide whether a packet is allowed; routes decide where it goes next. Azure injects system routes (intra-VNet, to the internet, to peered VNets, to gateways) automatically. A user-defined route (UDR) in a route table attached to a subnet overrides them — typically to force traffic through a network virtual appliance (NVA) or Azure Firewall for inspection, or to forced-tunnel internet egress on-prem. UDRs are powerful and dangerous: a 0.0.0.0/0 route to an NVA that doesn’t route the reply back creates asymmetric routing — the connection’s SYN arrives but the response vanishes, a drop that no NSG explains.

PaaS reachability has its own model: service vs private endpoints. Azure PaaS (Storage, SQL, Key Vault) lives on public IPs by default; reaching it from a VNet normally exits to the internet. A service endpoint extends your subnet’s identity to the PaaS service over the Azure backbone (the PaaS firewall then trusts the subnet) — the traffic stays on Azure’s network but the resource keeps a public IP. A private endpoint gives the PaaS resource a private IP inside your subnet via Private Link, so it is reachable only privately and you can turn off public access entirely. Which you choose is a recurring decision with security, DNS and cost consequences.

The vocabulary in one table

Before the deep sections, pin down every moving part. The glossary at the end repeats these for lookup; this table is the mental model side by side:

Concept One-line definition Where it lives Why it matters
VNet Private IPv4/IPv6 address space in a region Subscription / resource group The isolation boundary; one region each
Address space The CIDR block(s) the VNet owns VNet property Collisions break peering/VPN
Subnet A contiguous range carved from the VNet Inside a VNet The unit of segmentation
NSG Stateful 5-tuple allow/deny firewall Subnet and/or NIC Decides which packets land
Security rule One allow/deny entry with a priority Inside an NSG First match by priority wins
Default rule Immutable base rules under yours Every NSG Allows intra-VNet, denies the rest
Service tag Microsoft-maintained IP-range label Rule source/destination Auto-updating, no IP chasing
ASG Logical group of NICs by role Attached to NICs Rules by role, survive scaling
UDR / route table A route that overrides system routing Subnet Forces traffic via NVA/firewall
Service endpoint Subnet identity extended to PaaS Subnet + PaaS firewall PaaS over backbone, public IP kept
Private endpoint PaaS gets a private IP in your subnet Subnet (NIC) Fully private PaaS, public off
Peering Connects two VNets privately Between VNets Cross-VNet/region/subscription reach
Effective rules/routes The computed result on a NIC Network Watcher / NIC The truth when a packet drops

Address space and CIDR planning

The address space is the decision you most regret getting wrong, because fixing it after peering or hybrid connectivity is established means re-IPing live workloads. Plan it once, centrally, with room to grow.

Use RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). Pick non-overlapping blocks across every VNet you will ever peer and every on-prem range you will connect — overlap makes peering refuse to create and makes routes ambiguous. The convention that scales: allocate a large supernet to the organisation (say 10.0.0.0/8), carve a /16 per VNet (65,536 addresses, far more than you need but cheap and collision-proof), and /24 subnets within. Reserve, do not assign, the ranges you will grow into.

A VNet can hold multiple address-space blocks, which lets you extend a VNet that ran out of room without re-IPing — add 10.1.0.0/16 alongside 10.0.0.0/16. Azure does not bill for the size of the space; a /16 costs the same as a /27. So size generously.

# Create a VNet with a /16 space and a first /24 subnet
az network vnet create \
  --name vnet-shop-prod --resource-group rg-shop-prod --location eastus \
  --address-prefixes 10.0.0.0/16 \
  --subnet-name snet-web --subnet-prefixes 10.0.1.0/24
// Bicep equivalent of the CLI command above
resource vnet 'Microsoft.Network/virtualNetworks@2023-11-01' = {
  name: 'vnet-shop-prod'
  location: location
  properties: {
    addressSpace: { addressPrefixes: [ '10.0.0.0/16' ] }  // add '10.1.0.0/16' later to grow, no re-IP
    subnets: [
      { name: 'snet-web', properties: { addressPrefix: '10.0.1.0/24' } }
    ]
  }
}
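The non-overlap requirement described above is easy to check mechanically before anything is deployed. The following is a minimal Python sketch, not part of any Azure tooling: it uses only the standard-library ipaddress module, and the VNet names and ranges in plan are invented for illustration.

```python
import ipaddress
from itertools import combinations

def find_overlaps(plan):
    """Return every pair of names whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]

# Hypothetical address plan (names and ranges are assumptions of this sketch)
plan = {
    "vnet-shop-prod": "10.0.0.0/16",
    "vnet-shop-dev": "10.1.0.0/16",
    "vnet-analytics": "10.0.128.0/17",  # sits inside vnet-shop-prod's /16
    "onprem-dc": "172.16.0.0/16",
}

overlaps = find_overlaps(plan)
for a, b in overlaps:
    print(f"collision: {a} overlaps {b}")
```

Running a check like this against the full list of VNet and on-prem ranges, before a peering or VPN is requested, catches the collision while it is still a one-line change in a plan rather than a re-IP of live workloads.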

The CIDR sizes you actually use, and how many hosts each yields in Azure (after the five reserved addresses):

CIDR Total addresses Azure-usable (−5) Typical use Note
/29 8 3 Tiny/test subnet Smallest usable general subnet
/28 16 11 Small service subnet Often the floor for a real tier
/27 32 27 Small app tier Was the AzureBastionSubnet minimum; Bastion now requires /26 or larger
/26 64 59 Medium tier; AzureBastionSubnet Current Bastion minimum
/24 256 251 Standard tier subnet The comfortable default
/22 1,024 1,019 Large/AKS node subnet AKS pods can need this or larger
/16 65,536 65,531 A whole VNet The recommended per-VNet block
/8 16,777,216 n/a Org supernet (reserve, don’t assign) Carve /16s from it
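The arithmetic behind the table can be turned around to answer the sizing question directly: given a host count, which prefix is the smallest that fits? This is a small Python sketch under the assumptions stated in this article (five reserved addresses per subnet, /29 as the smallest subnet); the function names are illustrative.

```python
AZURE_RESERVED = 5  # network, gateway, two DNS, broadcast

def usable_hosts(prefix_len):
    """Azure-usable addresses in an IPv4 subnet of the given prefix length."""
    return 2 ** (32 - prefix_len) - AZURE_RESERVED

def smallest_prefix_for(hosts_needed):
    """Longest prefix (smallest subnet) with enough usable addresses."""
    for prefix_len in range(29, 7, -1):  # try /29, /28, ... down to /8
        if usable_hosts(prefix_len) >= hosts_needed:
            return prefix_len
    raise ValueError("does not fit in a /8")

for hosts in (3, 12, 250, 252):
    prefix = smallest_prefix_for(hosts)
    print(f"{hosts} hosts -> /{prefix} ({usable_hosts(prefix)} usable)")
```

Note the jump at 252 hosts: a /24 holds 251, so one more host forces a /23. Sizing for the scaled-out tier rather than the day-one tier avoids hitting that edge later.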

The three private ranges and how to think about allocating them at scale:

Range Size Best use Watch-out
10.0.0.0/8 ~16.7M Org supernet; /16 per VNet Easy to overlap if uncoordinated — use IPAM
172.16.0.0/12 ~1M Secondary / acquisitions Often collides with on-prem defaults
192.168.0.0/16 ~65K Small/home/lab; avoid in cloud Clashes with home routers over VPN
Public/owned ranges varies Rare; only if you own them Never use public IPs you don’t own

The smallest VNet Azure accepts is /29 and the largest is /8; the practical limits and gotchas:

Limit / rule Value Why it matters
Smallest VNet address space /29 Cannot create anything smaller
Largest VNet address space /8 One VNet can be enormous
Address blocks per VNet Multiple supported Grow without re-IP
Subnets per VNet up to 3,000 (soft) Plenty for any real design
Overlap with peered VNet Not allowed Peering refuses to create
Overlap with on-prem (VPN/ER) Routes become ambiguous Plan around on-prem ranges first
Resizing a VNet/subnet with resources Constrained Easier to add a block than shrink
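The supernet, /16-per-VNet, /24-per-tier convention from this section can be generated rather than hand-typed. The sketch below uses Python's standard ipaddress module; the tier names match the examples in this article, while leaving 10.0.0.0/24 unassigned for shared or special-purpose subnets is an assumption of the sketch, not a rule.

```python
import ipaddress
from itertools import islice

supernet = ipaddress.ip_network("10.0.0.0/8")

# Reserve the first three /16 blocks, one per VNet
vnet_blocks = list(islice(supernet.subnets(new_prefix=16), 3))

# Carve /24 tier subnets out of the first VNet block
tiers = ["snet-web", "snet-app", "snet-data"]
candidate_24s = vnet_blocks[0].subnets(new_prefix=24)
next(candidate_24s)  # skip 10.0.0.0/24 (kept free in this sketch)
tier_subnets = dict(zip(tiers, candidate_24s))

for block in vnet_blocks:
    print("VNet block:", block)
for name, subnet in tier_subnets.items():
    print(f"{name}: {subnet}")
```

Because every block comes from one generator over one supernet, two VNets can never be handed the same range, which is the property a central IPAM process is meant to guarantee.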

Subnets: sizing, reservations and special subnets

A subnet is a range inside the VNet, and the first thing to internalise is that Azure reserves five addresses in every subnet: the network address (.0), the default gateway (.1), two reserved for Azure DNS (.2, .3), and the network broadcast (.255 on a /24). So a /24 yields 251 usable host addresses, not 254. Size a subnet for a tier that might scale, and remember this overhead: a /29 (8 total) leaves only 3 usable, so although /29 is the smallest subnet Azure accepts, /28 (11 usable) is the realistic floor for a real tier.

# Add app and data subnets to the existing VNet
az network vnet subnet create -g rg-shop-prod --vnet-name vnet-shop-prod \
  --name snet-app  --address-prefixes 10.0.2.0/24
az network vnet subnet create -g rg-shop-prod --vnet-name vnet-shop-prod \
  --name snet-data --address-prefixes 10.0.3.0/24

Several subnets are special-purpose with mandatory names and minimum sizes — get the name wrong and the gateway/bastion/firewall will not deploy into it. Subnet delegation is the related concept: some PaaS services (Azure SQL Managed Instance, App Service VNet integration, Container Apps) require a subnet delegated to them, dedicated to that service. The reserved addresses per subnet, enumerated so you never miscount:

Reserved address (on 10.0.1.0/24) Purpose Usable by you?
10.0.1.0 Network identifier No
10.0.1.1 Default gateway No
10.0.1.2 Azure DNS mapping No
10.0.1.3 Azure DNS mapping (reserved) No
10.0.1.255 Network broadcast No
10.0.1.4–10.0.1.254 Your resources Yes (251)
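The same enumeration works for any subnet, not only a /24. Here is a short Python sketch that derives the five reserved addresses and the usable range from a CIDR, following the layout in the table above; the function names are illustrative and nothing here calls Azure.

```python
import ipaddress

def azure_reserved(cidr):
    """The five addresses Azure keeps for itself in a subnet."""
    net = ipaddress.ip_network(cidr)
    first = net.network_address
    return {
        "network": first,
        "gateway": first + 1,
        "dns_1": first + 2,
        "dns_2": first + 3,
        "broadcast": net.broadcast_address,
    }

def usable_range(cidr):
    """First and last address you can assign, plus how many there are."""
    net = ipaddress.ip_network(cidr)
    first = net.network_address + 4
    last = net.broadcast_address - 1
    return first, last, int(last) - int(first) + 1

first, last, count = usable_range("10.0.1.0/24")
print(f"10.0.1.0/24 usable: {first} to {last} ({count} addresses)")

small_first, small_last, small_count = usable_range("10.0.9.0/29")
print(f"10.0.9.0/29 usable: {small_first} to {small_last} ({small_count} addresses)")
```

The /29 case makes the overhead concrete: out of eight addresses, only .4, .5 and .6 are left for workloads.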

The special-purpose subnets, their exact required names, and minimum sizes:

Purpose Required subnet name Minimum size NSG allowed? UDR allowed?
VPN / ExpressRoute gateway GatewaySubnet /29 (recommend /27) No (ignored historically) Limited
Azure Bastion AzureBastionSubnet /26 Yes (specific rules) Caution
Azure Firewall AzureFirewallSubnet /26 No Managed by service
Azure Firewall mgmt (forced tunnel) AzureFirewallManagementSubnet /26 No Managed
App Gateway v2 (any name) dedicated /24 recommended Yes (required ports) Caution
App Service VNet integration (any) delegated /28+ (size for scale) Yes Yes

Subnet design patterns and when each fits — the choice drives your whole NSG strategy:

Pattern Layout Pros Cons / when not
Per-tier (web/app/data) One subnet per tier Clean NSG-per-tier; classic segmentation More subnets to manage
Per-environment dev/test/prod VNets or subnets Strong blast-radius isolation More VNets/peering
Per-workload A subnet per app/service Fine-grained control Subnet sprawl; IPAM overhead
Hub-and-spoke Shared services hub + spokes Central firewall/DNS/gateway Requires peering + UDR discipline
Flat (single subnet) Everything in one subnet Simplest No segmentation — avoid in prod
Delegated Subnet dedicated to one PaaS Required for some services Cannot mix other resources in it

NSGs in depth: rules, priority and the default rules that surprise you

An NSG is the workhorse. Internalise three facts and most NSG bugs disappear.

First, priority and first-match. Each rule has a priority 100–4096; the platform evaluates inbound (or outbound) rules in ascending priority order and applies the first rule that matches, then stops. So a broad allow at priority 100 wins over a specific deny at priority 200 — putting allow before deny at overlapping scope is a silent over-exposure. Reserve low numbers for your most specific rules; leave gaps (100, 200, 300…) so you can insert later.

Second, statefulness. NSGs are stateful. If you allow an inbound flow, the return packets are automatically permitted — you do not write a matching outbound allow. This trips people who add redundant outbound rules and then break egress. Connection state is tracked per flow.

Third, the default rules. Beneath your custom rules sit immutable default rules you cannot delete (only override with higher-priority custom rules). They are the reason a fresh VNet is wide open internally:

# Create an NSG and inspect the default rules that already govern it
az network nsg create -g rg-shop-prod -n nsg-web
az network nsg rule list -g rg-shop-prod --nsg-name nsg-web \
  --include-default --query "[].{name:name, prio:priority, dir:direction, access:access, src:sourceAddressPrefix, dst:destinationAddressPrefix}" -o table

The full set of NSG default rules — memorise these, they explain most “why did this work / why is this open” questions:

Direction Priority Name Source Destination Action Effect
Inbound 65000 AllowVnetInBound VirtualNetwork VirtualNetwork Allow All intra-VNet traffic flows by default
Inbound 65001 AllowAzureLoadBalancerInBound AzureLoadBalancer Any Allow LB health probes reach the VM
Inbound 65500 DenyAllInBound Any Any Deny Everything else inbound is dropped
Outbound 65000 AllowVnetOutBound VirtualNetwork VirtualNetwork Allow Intra-VNet egress allowed
Outbound 65001 AllowInternetOutBound Any Internet Allow All outbound internet allowed by default
Outbound 65500 DenyAllOutBound Any Any Deny Everything else outbound dropped

The two that cause incidents: AllowVnetInBound (65000) is why the flat-network breach happened — every VM trusts every VM until you override it; and AllowInternetOutBound (65001) is why a compromised box can exfiltrate until you restrict egress. The anatomy of a custom rule, field by field:

Rule field Values Default / note Gotcha
Priority 100–4096 Lower wins Leave gaps (100/200/300) to insert later
Direction Inbound / Outbound Statefulness means you rarely need both
Access Allow / Deny Allow above Deny at same scope = over-exposed
Protocol Tcp / Udp / Icmp / Esp / Ah / * * = any Be specific; * is broad
Source CIDR / service tag / ASG / * Prefer service tag or ASG over raw CIDR
Source port port / range / * usually * Source port is ephemeral; almost always *
Destination CIDR / service tag / ASG / * Use ASG for “this tier”
Dest port port / range / list 80,443 / * List or range allowed
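Priority ordering and first-match are easier to trust once you have watched them decide a packet. The following Python sketch is a deliberately simplified model, not Azure's implementation: it matches only on source address and destination port, stands in a plain CIDR (10.0.0.0/16) for the VirtualNetwork service tag, and omits AllowAzureLoadBalancerInBound. The custom rule names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    name: str
    priority: int
    access: str      # "Allow" or "Deny"
    source: str      # a CIDR, or "*" for any
    dest_port: str   # a single port as text, or "*" for any

VNET = "10.0.0.0/16"  # stand-in for the VirtualNetwork tag (assumption)

DEFAULT_INBOUND = [
    Rule("AllowVnetInBound", 65000, "Allow", VNET, "*"),
    Rule("DenyAllInBound", 65500, "Deny", "*", "*"),
]

def matches(rule, src_ip, port):
    src_ok = rule.source == "*" or (
        ipaddress.ip_address(src_ip) in ipaddress.ip_network(rule.source))
    port_ok = rule.dest_port in ("*", str(port))
    return src_ok and port_ok

def first_match(custom_rules, src_ip, port):
    """Evaluate in ascending priority order and stop at the first match."""
    for rule in sorted(custom_rules + DEFAULT_INBOUND, key=lambda r: r.priority):
        if matches(rule, src_ip, port):
            return rule
    raise AssertionError("unreachable: DenyAllInBound matches everything")

broad_allow_first = [
    Rule("Allow-Any-Any", 100, "Allow", "*", "*"),
    Rule("Deny-Web-To-Sql", 200, "Deny", "10.0.1.0/24", "1433"),
]
specific_deny_first = [
    Rule("Deny-Web-To-Sql", 100, "Deny", "10.0.1.0/24", "1433"),
    Rule("Allow-Any-Any", 200, "Allow", "*", "*"),
]

for label, rules in (("broad allow first", broad_allow_first),
                     ("specific deny first", specific_deny_first),
                     ("defaults only", [])):
    hit = first_match(rules, "10.0.1.5", 1433)
    print(f"{label}: {hit.access} via {hit.name}")
```

The first rule set shows the silent over-exposure described above: the careful deny at 200 is never reached. The last case shows why a fresh VNet is open internally, since the only thing that matches is AllowVnetInBound.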

Write the three-tier rule set — explicit allows then an explicit deny above the default 65000:

# Web subnet: allow HTTPS from the internet, deny the rest inbound (above AllowVnetInBound)
az network nsg rule create -g rg-shop-prod --nsg-name nsg-web -n Allow-HTTPS-In \
  --priority 100 --direction Inbound --access Allow --protocol Tcp \
  --source-address-prefixes Internet --destination-port-ranges 443
az network nsg rule create -g rg-shop-prod --nsg-name nsg-web -n Deny-VNet-In \
  --priority 4000 --direction Inbound --access Deny --protocol '*' \
  --source-address-prefixes VirtualNetwork --destination-port-ranges '*'

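# App subnet: create its NSG, then allow 8080 only from the web ASG (asg-web is created in the next section)
az network nsg create -g rg-shop-prod -n nsg-app
az network nsg rule create -g rg-shop-prod --nsg-name nsg-app -n Allow-Web-8080 \
  --priority 100 --direction Inbound --access Allow --protocol Tcp \
  --source-asgs asg-web --destination-port-ranges 8080
# Without this deny, AllowVnetInBound (65000) still admits every other intra-VNet source
az network nsg rule create -g rg-shop-prod --nsg-name nsg-app -n Deny-VNet-In \
  --priority 4000 --direction Inbound --access Deny --protocol '*' \
  --source-address-prefixes VirtualNetwork --destination-port-ranges '*'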
// Bicep equivalent of the nsg-web rules above
resource nsgWeb 'Microsoft.Network/networkSecurityGroups@2023-11-01' = {
  name: 'nsg-web'
  location: location
  properties: {
    securityRules: [
      {
        name: 'Allow-HTTPS-In'
        properties: {
          priority: 100, direction: 'Inbound', access: 'Allow', protocol: 'Tcp'
          sourceAddressPrefix: 'Internet', sourcePortRange: '*'
          destinationAddressPrefix: '*', destinationPortRange: '443'
        }
      }
      {
        name: 'Deny-VNet-In'
        properties: {
          priority: 4000, direction: 'Inbound', access: 'Deny', protocol: '*'
          sourceAddressPrefix: 'VirtualNetwork', sourcePortRange: '*'
          destinationAddressPrefix: '*', destinationPortRange: '*'
        }
      }
    ]
  }
}

Subnet-NSG versus NIC-NSG — both can apply, and the evaluation order matters:

Aspect Subnet-level NSG NIC-level NSG
Scope Every NIC in the subnet One VM’s NIC
Inbound evaluation order Subnet NSG first, then NIC NSG NIC NSG after subnet
Outbound evaluation order NIC NSG first, then subnet NSG (mirror of inbound)
Both must allow? Yes — traffic must pass both Yes
Best for Tier-wide baseline Per-VM exceptions
Risk One NSG governs many VMs NSG sprawl if overused
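The table's two claims, that order flips between inbound and outbound and that traffic must pass both NSGs, can be captured in a few lines. This Python sketch models each NSG as a function from destination port to an allow/deny verdict, which is a simplification; treating a scope with no NSG attached as "nothing to filter" is an assumption of the sketch.

```python
from typing import Callable, List, Optional, Tuple

Verdict = Callable[[int], bool]  # destination port -> True if allowed

def traverse(order, port) -> Tuple[bool, List[str]]:
    """Walk the NSGs in order; the first deny stops evaluation."""
    checked = []
    for scope, nsg in order:
        if nsg is None:  # no NSG attached at this scope
            continue
        checked.append(scope)
        if not nsg(port):
            return False, checked
    return True, checked

def inbound(subnet_nsg: Optional[Verdict], nic_nsg: Optional[Verdict], port: int):
    return traverse([("subnet", subnet_nsg), ("nic", nic_nsg)], port)

def outbound(subnet_nsg: Optional[Verdict], nic_nsg: Optional[Verdict], port: int):
    return traverse([("nic", nic_nsg), ("subnet", subnet_nsg)], port)

# Hypothetical policies: the subnet baseline allows 443 and 22, the NIC only 443
subnet_policy = lambda port: port in (443, 22)
nic_policy = lambda port: port == 443

print(inbound(subnet_policy, nic_policy, 443))    # passes both
print(inbound(subnet_policy, nic_policy, 22))     # subnet allows, NIC denies
print(inbound(subnet_policy, nic_policy, 3389))   # stopped at the subnet
print(outbound(subnet_policy, nic_policy, 3389))  # stopped at the NIC
```

The port 22 case is the one that confuses people in practice: the subnet NSG shows an allow, yet the packet is still dropped, because the NIC NSG is a second gate rather than an override.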

The key NSG limits you can actually hit:

NSG limit Value (soft, raisable) When it bites
NSGs per region per subscription 5,000 Large estates
Rules per NSG 1,000 Sprawling IP lists (use ASGs/service tags)
Sources/destinations per rule 4,000 (combined) Huge address lists
ASGs per rule depends Many roles in one rule
NSGs per NIC/subnet 1 each You attach one NSG per scope

Application Security Groups: rules by role, not by IP

The fastest way to make an NSG rule set rot is to write sources as IP lists. The web tier scales out, a VM gets a new IP, someone re-IPs a subnet — and now your “allow from web” rule on the app tier is wrong, silently. Application Security Groups (ASGs) fix this: an ASG is a named handle you attach to NICs (a VM’s NIC can belong to several ASGs), and an NSG rule uses the ASG as source or destination. The rule reads “allow from asg-web to asg-app on 8080” and the membership — which actual IPs are “web” — follows the NIC automatically. Scale the web tier from 2 to 20 VMs and the rule needs no change.

# Create ASGs, attach to NICs, and reference them in a rule
az network asg create -g rg-shop-prod -n asg-web -l eastus
az network asg create -g rg-shop-prod -n asg-app -l eastus

# Attach a NIC's IP config to an ASG (repeat per VM, or set in VMSS model)
az network nic ip-config update -g rg-shop-prod --nic-name nic-web-01 \
  --name ipconfig1 --application-security-groups asg-web

# Rule: app tier accepts 8080 only from the web ASG, scoped to asg-app as destination
# (priority 110, because nsg-app already has an inbound rule at priority 100 from the previous section)
az network nsg rule create -g rg-shop-prod --nsg-name nsg-app -n Allow-Web-to-App \
  --priority 110 --direction Inbound --access Allow --protocol Tcp \
  --source-asgs asg-web --destination-asgs asg-app --destination-port-ranges 8080
// Bicep equivalent of the ASG creation above
resource asgWeb 'Microsoft.Network/applicationSecurityGroups@2023-11-01' = {
  name: 'asg-web'
  location: location
}
// In the NSG rule, reference by resource id:
//   sourceApplicationSecurityGroups: [ { id: asgWeb.id } ]
//   destinationApplicationSecurityGroups: [ { id: asgApp.id } ]

ASG versus raw-IP versus service tag — when to reach for which source/destination type:

Source/dest type Example Auto-updates? Best for Limitation
Raw CIDR 10.0.1.0/24 No Fixed on-prem ranges, partner IPs Breaks on scale/re-IP; brittle
Service tag Internet, Sql, Storage Yes (Microsoft) Cloud-service ranges you can’t track Coarse; whole-service granularity
ASG asg-web Yes (by membership) “This tier/role” within your VNet Same region; your NICs only
VirtualNetwork tag the whole VNet + peers Yes “Anything internal” Too broad for tier isolation
AzureLoadBalancer the LB probe source Yes Allowing health probes Probe-only, not data

ASG rules and limits worth knowing before you design around them:

ASG rule Behaviour Implication
A NIC can be in multiple ASGs e.g. asg-web + asg-monitored Compose roles freely
ASG and rule must be same region Region-scoped Plan per-region ASGs
All NICs in a rule’s ASG must be in the same VNet VNet-scoped membership Cross-VNet needs another approach
ASGs replace IP lists, not service tags Use both together “from asg-web” + “to Sql”
Empty ASG = matches nothing No members → rule is inert New tier silently unreachable until joined
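Two behaviours from this section are worth seeing side by side: a rule written against ASGs keeps working as members join, and a rule whose ASG has no members matches nothing. The Python sketch below is a toy model of membership lookup, not how Azure stores ASGs; the class, the asg-cache group and the IP addresses are hypothetical.

```python
import ipaddress

class AsgDirectory:
    """Maps an ASG name to the private IPs of the NICs that joined it."""

    def __init__(self):
        self._members = {}

    def join(self, asg, nic_ip):
        self._members.setdefault(asg, set()).add(ipaddress.ip_address(nic_ip))

    def contains(self, asg, ip):
        return ipaddress.ip_address(ip) in self._members.get(asg, set())

def rule_matches(directory, src_asg, dst_asg, src_ip, dst_ip):
    """A rule 'from src_asg to dst_asg' matches only current members."""
    return directory.contains(src_asg, src_ip) and directory.contains(dst_asg, dst_ip)

directory = AsgDirectory()
directory.join("asg-web", "10.0.1.4")
directory.join("asg-app", "10.0.2.4")

before_scale = rule_matches(directory, "asg-web", "asg-app", "10.0.1.5", "10.0.2.4")
directory.join("asg-web", "10.0.1.5")  # web tier scales out; the rule text is unchanged
after_scale = rule_matches(directory, "asg-web", "asg-app", "10.0.1.5", "10.0.2.4")

# asg-cache exists in the rule but no NIC has joined it yet
empty_asg = rule_matches(directory, "asg-web", "asg-cache", "10.0.1.4", "10.0.4.4")

print(before_scale, after_scale, empty_asg)
```

The design point is that the rule never mentions an address. Membership is the only thing that changes, which is also why forgetting to join a new tier's NICs to its ASG leaves that tier unreachable without any rule looking wrong.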

Commonly-used service tags you will reach for constantly:

Service tag Represents Typical use
Internet All public IP space outside Azure Allow/deny public ingress/egress
VirtualNetwork This VNet + peered + on-prem (connected) “Internal” allow/deny
AzureLoadBalancer Azure’s health-probe source Allow LB probes
Storage / Storage.EastUS Azure Storage ranges (global/regional) Egress to Storage / service endpoints
Sql / Sql.EastUS Azure SQL ranges Egress to SQL
AzureKeyVault Key Vault ranges Egress to Key Vault
AzureCloud / AzureCloud.EastUS All Azure public IPs (global/regional) Broad Azure egress
AzureActiveDirectory Entra ID endpoints Auth egress

Routing: system routes, UDRs and forced tunnelling

Filtering (NSGs) and routing (route tables) are independent. Azure gives every subnet a set of system routes automatically — to the local VNet, to peered VNets, to the internet (0.0.0.0/0 → Internet), and to gateways — and you usually never touch them. You override them with a user-defined route (UDR) in a route table attached to a subnet, almost always to force traffic through inspection (an NVA or Azure Firewall) or to forced-tunnel internet-bound traffic back on-prem.

The canonical UDR is 0.0.0.0/0 with next-hop VirtualAppliance pointing at the firewall’s IP, applied to the workload subnets so all egress is inspected. The danger is asymmetry: if the return path doesn’t traverse the same appliance (or the appliance doesn’t SNAT), the reply is dropped and you get a connection that opens but hangs — a failure no NSG rule explains, which is why effective-routes is the first thing to check after NSGs.

# Route table forcing all egress through an Azure Firewall, applied to the app subnet
az network route-table create -g rg-shop-prod -n rt-app
az network route-table route create -g rg-shop-prod --route-table-name rt-app \
  -n default-to-fw --address-prefix 0.0.0.0/0 \
  --next-hop-type VirtualAppliance --next-hop-ip-address 10.0.0.4
az network vnet subnet update -g rg-shop-prod --vnet-name vnet-shop-prod \
  -n snet-app --route-table rt-app
// Bicep equivalent of the route table above
resource rtApp 'Microsoft.Network/routeTables@2023-11-01' = {
  name: 'rt-app'
  location: location
  properties: {
    routes: [
      {
        name: 'default-to-fw'
        properties: {
          addressPrefix: '0.0.0.0/0'
          nextHopType: 'VirtualAppliance'
          nextHopIpAddress: '10.0.0.4'   // the firewall's private IP
        }
      }
    ]
  }
}

The next-hop types a UDR can use, and what each is for:

Next-hop type Meaning Typical use Gotcha
VirtualAppliance Send to an IP (NVA/firewall) Inspect/forced-tunnel egress Asymmetry if return path differs
VirtualNetworkGateway Send to VPN/ER gateway Forced-tunnel to on-prem Needs gateway + propagation
Internet Send to the internet Override a forced tunnel for specific prefixes Re-exposes that prefix
VnetLocal Stay within the VNet Keep intra-VNet local Default for VNet prefix
VirtualNetworkPeering To a peered VNet Hub-spoke routing (system route only) Created by peering; cannot be set as a UDR next hop
None Blackhole — drop Deliberately drop a prefix Accidental None = silent drop

System routes Azure injects so you understand what a UDR is overriding:

System route prefix Next hop When present
VNet address space VnetLocal Always
0.0.0.0/0 Internet Always (unless overridden)
Peered VNet space VNetPeering When peering exists
Gateway-advertised prefixes VirtualNetworkGateway When a gateway + BGP/propagation
10.0.0.0/8, 192.168.0.0/16, 172.16.0.0/12 None (historically) Reserved private space handling
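To see what "a UDR overrides a system route" means for one destination, it helps to model route selection. The Python sketch below assumes the selection order Azure documents: the longest matching prefix wins, and on an equal prefix a user-defined route beats a system route (BGP routes are left out for brevity). The route tables are hypothetical and mirror the examples in this section.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    source: str  # "system" or "udr"

def select_route(routes, dest_ip):
    """Longest prefix match; a UDR wins a tie against a system route."""
    dest = ipaddress.ip_address(dest_ip)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (ipaddress.ip_network(r.prefix).prefixlen, r.source == "udr"),
    )

routes = [
    Route("10.0.0.0/16", "VnetLocal", "system"),
    Route("0.0.0.0/0", "Internet", "system"),
    Route("0.0.0.0/0", "VirtualAppliance", "udr"),  # default-to-fw from above
]

internet_hop = select_route(routes, "203.0.113.10").next_hop
intra_vnet_hop = select_route(routes, "10.0.3.4").next_hop

# An accidental blackhole: a more specific UDR with next hop None
with_blackhole = routes + [Route("10.0.3.0/24", "None", "udr")]
blackholed_hop = select_route(with_blackhole, "10.0.3.4").next_hop

print(internet_hop, intra_vnet_hop, blackholed_hop)
```

Two things follow from the model. A 0.0.0.0/0 UDR does not capture intra-VNet traffic, because the system route for the VNet's own /16 is more specific. And a narrow UDR with next hop None silently wins over everything broader, which is the blackhole case that effective routes will show and no NSG will.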

How NSGs and routes interact on a single packet — the order that explains most “it’s allowed but doesn’t work” cases:

Step What is checked Decides If it fails
1. Route lookup Most-specific route for the destination Where the packet goes next None next-hop → silent drop
2. Outbound NSG Source NIC/subnet NSG rules Whether egress is allowed Deny → dropped at source
3. Inbound NSG Destination NIC/subnet NSG rules Whether ingress is allowed Deny → dropped at destination
4. Return traffic Stateful — auto-allowed by NSG Reply permitted Asymmetric route → reply lost

Service endpoints vs private endpoints for PaaS

When a VM in your subnet talks to Azure Storage, SQL or Key Vault, the default path exits to the internet (those services have public IPs). Two mechanisms keep that traffic private, and choosing between them is a recurring design decision.

A service endpoint turns on a flag on the subnet that extends the subnet’s identity to the PaaS service over the Azure backbone. The PaaS firewall is then configured to allow that subnet (Microsoft.Storage, Microsoft.Sql, …). Traffic stays on Azure’s network and the PaaS resource sees the private source, but the resource keeps its public IP and is still publicly resolvable — you have narrowed who can connect, not removed the public surface.

A private endpoint (built on Private Link) gives the PaaS resource a private IP inside your subnet. You then resolve the resource’s hostname to that private IP (via Private DNS) and can disable public access entirely. It is the stronger isolation and the direction most security-conscious designs go, at the cost of a per-endpoint charge and DNS plumbing.

# Service endpoint: flag the data subnet for Storage + SQL, then allow it on the resource
az network vnet subnet update -g rg-shop-prod --vnet-name vnet-shop-prod -n snet-data \
  --service-endpoints Microsoft.Storage Microsoft.Sql

# Private endpoint: give a storage account a private IP inside snet-data
az network private-endpoint create -g rg-shop-prod -n pe-stshop \
  --vnet-name vnet-shop-prod --subnet snet-data \
  --private-connection-resource-id $(az storage account show -n stshopprod -g rg-shop-prod --query id -o tsv) \
  --group-id blob --connection-name pe-stshop-conn

// Service endpoint on a subnet (Bicep)
resource dataSubnet 'Microsoft.Network/virtualNetworks/subnets@2023-11-01' = {
  name: '${vnet.name}/snet-data'
  properties: {
    addressPrefix: '10.0.3.0/24'
    serviceEndpoints: [ { service: 'Microsoft.Storage' }, { service: 'Microsoft.Sql' } ]
  }
}

The decision, head to head:

Dimension | Service endpoint | Private endpoint
What it does | Extends subnet identity to PaaS | Gives PaaS a private IP in your subnet
Resource keeps public IP? | Yes (still publicly resolvable) | Only until you disable public access
Isolation strength | Moderate (narrows who, not surface) | Strong (fully private)
DNS changes needed? | None | Yes: Private DNS zone
Cost | Free | Per-endpoint hourly + per-GB
Granularity | Per service type, per subnet | Per resource (per account/db)
Cross-region / on-prem reach | Same-region subnet only (mostly) | Reachable from peers + on-prem
Best when | Quick, free narrowing; same region | “No public surface” mandate

Common PaaS targets and the typical choice:

PaaS target | Service endpoint tag | Private endpoint group-id | Usual pick
Storage (Blob) | Microsoft.Storage | blob | Private endpoint if compliance-bound
Azure SQL DB | Microsoft.Sql | sqlServer | Private endpoint for prod data
Key Vault | Microsoft.KeyVault | vault | Private endpoint (secrets are sensitive)
Cosmos DB | Microsoft.AzureCosmosDB | Sql/MongoDB/… | Private endpoint
Service Bus | Microsoft.ServiceBus | namespace | Either; private for isolation
App Service (inbound) | n/a | sites | Private endpoint to take it off internet

Peering and connecting VNets

A VNet is one region, one subscription — to connect two you peer them, which gives private, low-latency, backbone connectivity (no gateway needed for the peering itself). Peering is non-transitive: if A peers B and B peers C, A does not reach C — you design a hub-and-spoke with a hub that does the transit (via a firewall/NVA and UDRs), or you mesh-peer. Address spaces must not overlap, which is the entire reason the CIDR planning above matters.
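
The no-overlap rule is easy to check before you ever try to peer. The sketch below uses Python's standard `ipaddress` module. The VNet names and prefixes in the sample are hypothetical, and `overlapping_pairs` is an illustrative helper, not an Azure API.

```python
import ipaddress
from itertools import combinations

def overlapping_pairs(vnets):
    """Return (vnet_a, prefix_a, vnet_b, prefix_b) for every overlapping pair.

    vnets: dict mapping a VNet name to its list of CIDR prefixes.
    """
    clashes = []
    for (a, a_prefixes), (b, b_prefixes) in combinations(vnets.items(), 2):
        for pa in a_prefixes:
            for pb in b_prefixes:
                if ipaddress.ip_network(pa).overlaps(ipaddress.ip_network(pb)):
                    clashes.append((a, pa, b, pb))
    return clashes

# Hypothetical address plan: the third VNet sits inside the second one's range.
vnets = {
    "vnet-hub": ["10.100.0.0/16"],
    "vnet-shop-prod": ["10.0.0.0/16"],
    "vnet-legacy": ["10.0.128.0/17"],
}

print(overlapping_pairs(vnets))
```

Running a check like this against the planned address spaces in a pull request is cheaper than discovering the clash when the peering is refused.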

# Peer two VNets: a peering is one-directional, so create one from each side
az network vnet peering create -g rg-shop-prod -n hub-to-spoke \
  --vnet-name vnet-hub --remote-vnet vnet-shop-prod --allow-vnet-access \
  --allow-forwarded-traffic --allow-gateway-transit
az network vnet peering create -g rg-shop-prod -n spoke-to-hub \
  --vnet-name vnet-shop-prod --remote-vnet vnet-hub --allow-vnet-access \
  --use-remote-gateways

The peering flags and what each one actually enables:

Peering flag | What it allows | Set it when
--allow-vnet-access | Basic reachability between the VNets | Almost always
--allow-forwarded-traffic | Accept traffic forwarded (not originated) by the peer | Hub NVA forwards spoke traffic
--allow-gateway-transit | Let the peer use this VNet’s gateway | On the hub with the gateway
--use-remote-gateways | Use the peer’s gateway for on-prem/VPN | On the spoke (only one side)

Ways to connect networks, compared:

Method | Connects | Transit? | Throughput | Cost driver | Best for
VNet peering | VNet ↔ VNet (any region/sub) | Non-transitive | Very high (backbone) | Per-GB both directions | Hub-spoke, cross-region
VPN gateway (S2S) | VNet ↔ on-prem | Via gateway | Up to ~10 Gbps (SKU) | Gateway hour + egress | Hybrid over internet
ExpressRoute | VNet ↔ on-prem (private) | Via gateway | Up to 100 Gbps | Circuit + gateway | Private, high-bandwidth hybrid
VPN gateway (P2S) | Client ↔ VNet | Via gateway | Per-client | Gateway hour | Remote admins/devs
Virtual WAN | Many VNets + branches | Managed transit | High | Hub units + data | Large global mesh

Architecture at a glance

The diagram below traces a request through a properly segmented VNet, left to right, and maps each security control to the exact hop where it is enforced and the exact way it most commonly fails. On the left, clients (and a known office range for admin) arrive over HTTPS. They land first at the edge / gateway zone — an Application Gateway v2 with WAF in its own dedicated /24 subnet for public web traffic, and Azure Bastion in the mandatory AzureBastionSubnet (/26) for RDP/SSH, so administrators never need a public IP on a VM. From there the path crosses three tier subnets, each fronted by its own NSG: the web subnet (10.0.1.0/24, nsg-web allowing 443 in, VMs tagged asg-web), the app subnet (10.0.2.0/24, nsg-app allowing 8080 only from asg-web, VMs tagged asg-app), and the data subnet (10.0.3.0/24, nsg-data allowing 1433 only from asg-app, holding SQL Managed Instance reachable privately and a Key Vault reached over a service endpoint).

Follow the flows: client → gateway on 443, gateway → web on allowed 443, web → app on asg-web→8080, app → data on 1433. The dashed red flow is the UDR that forces internet-bound egress through the firewall — the line that, mis-built, blackholes return traffic. The five numbered badges sit on the controls that most often bite, and the legend narrates each as symptom · how to confirm · fix: an over-open web NSG (1), an app rule written by IP instead of ASG that breaks on scale (2), a data tier exposed VNet-wide or blocking the app (3), PaaS egress leaking to the public internet for want of an endpoint (4), and a UDR that blackholes the return path (5). Read the architecture and the diagnostic map together — the picture is the playbook.

Figure: Segmented Azure VNet showing Internet clients and an office admin range entering through an Application Gateway v2 with WAF and Azure Bastion, then crossing web (10.0.1.0/24), app (10.0.2.0/24) and data (10.0.3.0/24) subnets each guarded by its own NSG with ASG-scoped rules, SQL Managed Instance and Key Vault in the private data subnet, a dashed UDR forcing egress through a firewall, and five numbered badges marking the most common rule and routing failures.

Real-world scenario

Lumio Retail runs an e-commerce platform — a React storefront, a .NET API tier, and Azure SQL Managed Instance — and migrated it to Azure under deadline. The first cut was the textbook mistake: a single VNet, one subnet (10.0.0.0/16 with everything in 10.0.0.0/24), default NSG rules untouched. It worked, passed UAT, and went live. Three weeks later a dependency vulnerability in the storefront’s image-resize library gave an attacker remote code execution on a web VM. Because AllowVnetInBound (priority 65000) lets every VM reach every VM, the attacker port-scanned the subnet, found SQL MI on 1433, and began pulling customer rows. The only reason it was caught was an unusual egress spike — there was no segmentation to slow them down at all.

The remediation, done over a weekend, is the design this article teaches. They re-architected into three tier subnets in a fresh, properly-planned address space: snet-web 10.0.1.0/24, snet-app 10.0.2.0/24, snet-data 10.0.3.0/24, with a dedicated App Gateway subnet and an AzureBastionSubnet. Each tier got its own NSG. nsg-web allowed only 443 from Internet (priority 100) and denied the rest. nsg-app allowed 8080 only from asg-web (an ASG, not an IP list, so the rule survived the API tier’s autoscaling) and denied VirtualNetwork otherwise. nsg-data allowed 1433 only from asg-app and denied all other inbound, so even a fully-owned web VM now had no permitted path to the database port (the packet is dropped by the NSG, not by routing). They added a private endpoint for Key Vault and a service endpoint for Storage so secrets and blobs left the public internet. Bastion replaced the public-IP’d jump box.

The instructive part was the confirmation. After applying the rules, the API tier couldn’t reach SQL — a 2 a.m. “the fix broke prod” moment. Instead of guessing, the on-call ran IP Flow Verify from an app NIC to the data IP on 1433: it returned Allowed by Allow-App-1433, so NSGs were fine. Then Effective Routes on the app NIC showed a 0.0.0.0/0 UDR (added that night to force egress through a new firewall) with no return route configured on the firewall side — the SYN reached SQL, the reply went to the firewall and died. Asymmetric routing, not an NSG. They exempted the VNet prefix from the forced-tunnel route and connectivity returned. Total hardening cost: the App Gateway, Bastion, a private endpoint, and a NAT/firewall already in the budget — roughly ₹18,000–22,000/month over the flat design, for turning a breach-amplifier into a contained, logged, defensible network. The lesson Lumio took away: the network “working” and the network being safe are different states, and only effective rules/routes tell you which one you’re in.

Advantages and disadvantages

Advantages | Disadvantages
Segmentation: subnets isolate tiers; a breach is contained, not amplified | Complexity: large rule sets across many subnets get hard to reason about
Stateful L4 filtering: precise allow/deny by 5-tuple, return traffic automatic | Defaults surprise you: intra-VNet allow + internet-out allow are on until you override
Free: NSGs, subnets and ASGs cost nothing themselves | L4 only: no application/HTTP awareness; not a WAF
Scales by attachment: one subnet NSG protects every VM in it | Priority races: an allow above a deny silently over-exposes
ASGs / service tags: rules by role, resilient to scaling and IP change | Region-scoped: ASGs and NSGs don’t span VNets/regions
Auditable as code: Bicep/Terraform makes rules reviewable in PRs | Diagnosis takes tooling: a drop needs Network Watcher, not a guess
Endpoints keep PaaS off the public internet (backbone / private IP) | Private endpoints add DNS + cost: plumbing and per-endpoint charges

NSGs are the right control when you need L4 segmentation between tiers and from the internet at zero cost and at scale — which is most of the time. They are the wrong sole control when the threat is application-layer (SQL injection, malicious HTTP, bot traffic): there you add Application Gateway WAF or Azure Firewall for L7 inspection, with NSGs still doing the coarse L4 job beneath. The complexity disadvantage is real but manageable: keep rules in code, scope by ASG, leave priority gaps, and the rule set stays legible. The defaults disadvantage is the dangerous one — every flat-network breach traces to AllowVnetInBound being left in force, which is exactly why “deny-by-default with explicit allows” is the first best practice below.

Hands-on lab

This builds a three-tier VNet with deny-by-default NSGs, ASGs, and a service endpoint — all free-tier-friendly except a single tiny VM you delete at the end. Run it in Cloud Shell.

1. Create the resource group and VNet with three subnets.

az group create -n rg-vnet-lab -l eastus

az network vnet create -g rg-vnet-lab -n vnet-lab \
  --address-prefixes 10.10.0.0/16 \
  --subnet-name snet-web --subnet-prefixes 10.10.1.0/24
az network vnet subnet create -g rg-vnet-lab --vnet-name vnet-lab \
  -n snet-app --address-prefixes 10.10.2.0/24
az network vnet subnet create -g rg-vnet-lab --vnet-name vnet-lab \
  -n snet-data --address-prefixes 10.10.3.0/24

2. Create NSGs and the ASGs, and attach the NSGs to subnets.

az network nsg create -g rg-vnet-lab -n nsg-web
az network nsg create -g rg-vnet-lab -n nsg-app
az network nsg create -g rg-vnet-lab -n nsg-data
az network asg create -g rg-vnet-lab -n asg-web -l eastus
az network asg create -g rg-vnet-lab -n asg-app -l eastus

az network vnet subnet update -g rg-vnet-lab --vnet-name vnet-lab -n snet-web  --network-security-group nsg-web
az network vnet subnet update -g rg-vnet-lab --vnet-name vnet-lab -n snet-app  --network-security-group nsg-app
az network vnet subnet update -g rg-vnet-lab --vnet-name vnet-lab -n snet-data --network-security-group nsg-data

3. Write the tiered rules — explicit allows, explicit denies above the defaults.

# Web: 443 from Internet, deny the rest
az network nsg rule create -g rg-vnet-lab --nsg-name nsg-web -n Allow-HTTPS \
  --priority 100 --direction Inbound --access Allow --protocol Tcp \
  --source-address-prefixes Internet --destination-port-ranges 443
az network nsg rule create -g rg-vnet-lab --nsg-name nsg-web -n Deny-VNet \
  --priority 4000 --direction Inbound --access Deny --protocol '*' \
  --source-address-prefixes VirtualNetwork --destination-port-ranges '*'

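# App: 8080 only from asg-web, then deny the rest of the VNet
# (without the deny, the default AllowVnetInBound still admits every other subnet)
az network nsg rule create -g rg-vnet-lab --nsg-name nsg-app -n Allow-Web-8080 \
  --priority 100 --direction Inbound --access Allow --protocol Tcp \
  --source-asgs asg-web --destination-port-ranges 8080
az network nsg rule create -g rg-vnet-lab --nsg-name nsg-app -n Deny-VNet \
  --priority 4000 --direction Inbound --access Deny --protocol '*' \
  --source-address-prefixes VirtualNetwork --destination-port-ranges '*'

# Data: 1433 only from asg-app, then deny the rest of the VNet
az network nsg rule create -g rg-vnet-lab --nsg-name nsg-data -n Allow-App-1433 \
  --priority 100 --direction Inbound --access Allow --protocol Tcp \
  --source-asgs asg-app --destination-port-ranges 1433
az network nsg rule create -g rg-vnet-lab --nsg-name nsg-data -n Deny-VNet \
  --priority 4000 --direction Inbound --access Deny --protocol '*' \
  --source-address-prefixes VirtualNetwork --destination-port-ranges '*'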

4. Add a service endpoint for Storage on the data subnet.

az network vnet subnet update -g rg-vnet-lab --vnet-name vnet-lab -n snet-data \
  --service-endpoints Microsoft.Storage
# Expected: the subnet now lists "serviceEndpoints": [{ "service": "Microsoft.Storage", ... }]

5. Verify with Network Watcher — confirm a packet, don’t assume. Create one tiny VM in the app subnet (joined to asg-app), then ask whether it can reach the data IP on 1433:

az vm create -g rg-vnet-lab -n vm-app-01 --image Ubuntu2204 --size Standard_B1s \
  --vnet-name vnet-lab --subnet snet-app --nsg "" --public-ip-address "" \
  --admin-username azureuser --generate-ssh-keys
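# Join the NIC to asg-app; look up the ipconfig name instead of assuming it
IPCONFIG=$(az network nic show -g rg-vnet-lab -n vm-app-01VMNic \
  --query "ipConfigurations[0].name" -o tsv)
az network nic ip-config update -g rg-vnet-lab --nic-name vm-app-01VMNic \
  --name "$IPCONFIG" --application-security-groups asg-app

# IP Flow Verify evaluates only the NSGs applied to the VM you name.
# Outbound from the app NIC → a data-subnet IP on 1433 → expect "Access: Allow"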
az network watcher test-ip-flow -g rg-vnet-lab --vm vm-app-01 \
  --direction Outbound --protocol TCP --local 10.10.2.4:50000 --remote 10.10.3.4:1433

# Effective rules + routes on the NIC — the source of truth
az network nic list-effective-nsg -g rg-vnet-lab -n vm-app-01VMNic -o table
az network nic show-effective-route-table -g rg-vnet-lab -n vm-app-01VMNic -o table

6. Tear down.

az group delete -n rg-vnet-lab --yes --no-wait

What each verification step should tell you:

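Lab check | Command | Expected result | If it differs
App→Data:1433 egress allowed | test-ip-flow --direction Outbound on vm-app-01 | Access: Allow (default AllowVnetOutBound) | An outbound deny on the app subnet or NIC NSG
App→Data:1433 ingress allowed | test-ip-flow --direction Inbound on a VM in snet-data (add one to test this) | Access: Allow (rule Allow-App-1433) | NIC not in asg-app, or rule wrong
Web→Data:1433 blocked | Same inbound test, with a web-subnet IP as the remote | Access: Deny (by an explicit VirtualNetwork deny on nsg-data; with defaults only, AllowVnetInBound returns Allow) | A stray allow, or a missing explicit deny, is over-exposing data
Effective NSG | list-effective-nsg | Your rules above the 65000 defaults | Subnet NSG not attached
Effective routes | show-effective-route-table | System routes only (no surprise UDR) | An unexpected UDR blackholes traffic
Service endpoint | subnet serviceEndpoints | Microsoft.Storage present | Flag not applied to the right subnet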

Common mistakes & troubleshooting

This is the part to keep open during an incident: a symptom → root cause → confirm (exact command/portal path) → fix playbook. Scan the table, then read the detail for the row that matches your incident.

# | Symptom | Root cause | Confirm (exact path) | Fix
1 | Web tier reachable on ports beyond 443 | Over-open rule, or no explicit deny above AllowVnetInBound | az network nic list-effective-nsg on a web NIC | Scope to 443; add an explicit Deny as a custom rule (100–4096), numbered after the allows
2 | App tier breaks when web tier scales/re-IPs | Rule source is a CIDR, not an ASG | Inspect nsg-app rule sourceAddressPrefix | Change source to asg-web; join web NICs to it
3 | Compromised web VM reaches the database | AllowVnetInBound left in force; no tier deny | IP Flow Verify web→data:1433 → Allow | Deny VirtualNetwork inbound on data; allow only asg-app
4 | App can’t reach SQL; rules look correct | A 0.0.0.0/0 UDR blackholes the return path | show-effective-route-table on app NIC | Fix the route/NVA return path; exempt VNet prefix
5 | Storage/Key Vault traffic exits to internet | No service/private endpoint on the subnet | Effective routes: PaaS prefix → Internet | Add service endpoint or private endpoint
6 | Allow rule exists but traffic still dropped | A higher-priority deny wins, or NIC+subnet NSG conflict | Effective NSG shows which rule matched | Re-prioritise; remember both NSGs must allow
7 | Peering won’t create | Overlapping address spaces | az network vnet show --query addressSpace both sides | Re-plan CIDRs; no overlap allowed
8 | A→C unreachable through a hub | Peering is non-transitive | Topology: A–B and B–C peered, no A–C | Hub NVA + UDRs, or mesh-peer A–C
9 | Gateway/Bastion won’t deploy into a subnet | Wrong subnet name or too small | Subnet name ≠ GatewaySubnet/AzureBastionSubnet | Recreate with the exact required name + size
10 | Subnet “full” far below CIDR count | Forgot the 5 reserved addresses | /29 → only 3 usable, not 8 | Size up; account for reservations
11 | LB-backed app health-probes fail | Custom deny blocked AzureLoadBalancer tag | Effective NSG; missing AllowAzureLoadBalancerInBound override | Allow AzureLoadBalancer source on the probe port
12 | Outbound to a PaaS service intermittently fails | Hard-coded PaaS IPs instead of a service tag | Rule uses raw CIDR for Storage/Sql | Use the Storage/Sql service tag (auto-updates)
13 | Connection opens then hangs (no reset) | Asymmetric routing via NVA without SNAT | Connection Troubleshoot shows one-way path | SNAT at the NVA or route symmetrically
14 | Flow logs empty / no forensic data | NSG flow logs never enabled | Network Watcher → NSG flow logs status | Enable flow logs to a storage account

Mistake 1 — Web tier open beyond 443 (the missing explicit deny)

People add Allow 443 from Internet and stop, assuming everything else is blocked. Inbound from the internet is — DenyAllInBound (65500) handles that — but inbound from inside the VNet is allowed by AllowVnetInBound (65000). If a peered VNet or another subnet is hostile, the web tier is reachable on every port internally.

Confirm. Compute the effective rules on a web NIC and look for what actually governs internal traffic:

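# One entry per NSG that applies to the NIC (subnet-level and NIC-level), with its rule names
az network nic list-effective-nsg -g rg-shop-prod -n nic-web-01 \
  --query "value[].{nsg:networkSecurityGroup.id, rules:effectiveSecurityRules[].name}" -o json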

Fix. Add an explicit deny for VirtualNetwork inbound as a custom rule (custom priorities run 100–4096, so any of them is evaluated before the 65000 default), numbered after the allows for what each tier legitimately needs.
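
To see why the explicit deny matters, here is a toy first-match evaluator. It is a simplified model that matches on source tag and port only and ignores destination, protocol and direction. The `Rule` shape and `evaluate` function are illustrative assumptions, and the three defaults are the inbound rules named above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    priority: int
    name: str
    source: str   # "Internet", "VirtualNetwork", "AzureLoadBalancer" or "*"
    port: str     # a port number as text, or "*"
    access: str   # "Allow" or "Deny"

# The default inbound rules every NSG carries (simplified to source + port).
DEFAULT_INBOUND = [
    Rule(65000, "AllowVnetInBound", "VirtualNetwork", "*", "Allow"),
    Rule(65001, "AllowAzureLoadBalancerInBound", "AzureLoadBalancer", "*", "Allow"),
    Rule(65500, "DenyAllInBound", "*", "*", "Deny"),
]

def evaluate(custom_rules, source, port):
    """Walk rules in ascending priority; the first match decides."""
    for rule in sorted(custom_rules + DEFAULT_INBOUND, key=lambda r: r.priority):
        if rule.source in ("*", source) and rule.port in ("*", str(port)):
            return rule.access, rule.name
    raise AssertionError("unreachable: DenyAllInBound matches everything")

only_allow = [Rule(100, "Allow-HTTPS", "Internet", "443", "Allow")]
with_deny = only_allow + [Rule(4000, "Deny-VNet", "VirtualNetwork", "*", "Deny")]

print(evaluate(only_allow, "Internet", 22))        # blocked by the default deny
print(evaluate(only_allow, "VirtualNetwork", 22))  # still allowed from inside the VNet
print(evaluate(with_deny, "VirtualNetwork", 22))   # the explicit deny now wins
```

The second call is the whole mistake: with only Allow-HTTPS in place, an internal source on port 22 falls through to AllowVnetInBound.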

Mistake 2 — App rules written by IP instead of ASG

The app tier’s NSG says allow 8080 from 10.0.1.0/24. It works — until the web subnet is re-CIDR’d, or you decide some web VMs live elsewhere, or a VMSS instance lands on an IP the rule didn’t anticipate. The rule is now subtly wrong and the failure is silent.

Confirm. Inspect the rule’s source:

az network nsg rule show -g rg-shop-prod --nsg-name nsg-app -n Allow-Web-to-App \
  --query "{src:sourceAddressPrefix, srcAsg:sourceApplicationSecurityGroups[].id, port:destinationPortRange}" -o json

If src is a CIDR and srcAsg is empty, that’s the bug. Fix: switch the source to asg-web and ensure web NICs are members.
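
A small illustration of why membership beats addresses. It assumes two hypothetical web NICs, one of which has landed outside the original web subnet. The helper names are illustrative, and ASG membership is modelled as a plain set of NIC names.

```python
import ipaddress

def allowed_by_cidr(source_ip, cidr):
    """A CIDR-sourced rule matches only while the sender's IP stays in range."""
    return ipaddress.ip_address(source_ip) in ipaddress.ip_network(cidr)

def allowed_by_asg(nic_name, asg_members):
    """An ASG-sourced rule matches on membership, whatever the IP is."""
    return nic_name in asg_members

# Hypothetical web NICs: the second one sits outside 10.0.1.0/24 after a re-plan.
web_nics = {"nic-web-01": "10.0.1.4", "nic-web-02": "10.0.4.7"}
asg_web = set(web_nics)  # membership follows the NIC, not its address

results = {
    nic: (allowed_by_cidr(ip, "10.0.1.0/24"), allowed_by_asg(nic, asg_web))
    for nic, ip in web_nics.items()
}
print(results)
```

The CIDR rule silently stops matching nic-web-02, while the ASG rule keeps matching both NICs. That is the failure mode described above.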

Mistake 4 — The UDR that blackholes the return path

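This is the cruel one because NSGs are innocent. Someone adds a 0.0.0.0/0 route to an NVA/firewall to inspect egress. Now the SYN from the app to SQL (or to the internet) goes out fine, but the reply is routed to the NVA, which either drops it (no return route) or sends it from a different source (no SNAT), so the client sees a connection that opens and then hangs. Route selection is longest-prefix match, so a 0.0.0.0/0 override on its own leaves intra-VNet traffic on the more specific VnetLocal system route; tier-to-tier flows are caught only when the override also covers the VNet or subnet prefixes, which is why the fix below talks about exempting the VNet prefix.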

Confirm. The effective route table on the NIC reveals the override:

az network nic show-effective-route-table -g rg-shop-prod -n nic-app-01 \
  --query "value[?addressPrefix[0]=='0.0.0.0/0'].{prefix:addressPrefix, hop:nextHopType, ip:nextHopIpAddress, source:source}" -o json
# Or end-to-end:
az network watcher test-connectivity -g rg-shop-prod \
  --source-resource vm-app-01 --dest-address 10.0.3.4 --dest-port 1433

Fix. Make routing symmetric (return traffic traverses the same appliance) or have the NVA SNAT; exempt the VNet prefix from the forced-tunnel route if intra-VNet traffic shouldn’t be inspected.

Mistake 6 — “Allow exists but it’s still dropped”

Two traps. Priority: a higher-priority (lower-number) deny matched first and stopped evaluation, so your allow at priority 300 never ran. Two NSGs: with both a subnet NSG and a NIC NSG, both must allow — one denying overrides the other allowing.

Confirm. Effective rules show exactly which rule won: if your deny matches before the allow, that’s a priority bug; if the subnet NSG allows but the NIC NSG denies, that’s the dual-NSG trap. Fix: re-prioritise (give the specific allow a lower priority number than the broad deny, or restructure), and reconcile the two NSGs.
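
Both traps fit in a few lines. This is a sketch under simplifying assumptions: rules are reduced to (priority, access, port) tuples, every NSG is assumed to be attached, and an NSG with no matching rule falls through to a deny that stands in for the default DenyAll rule. The function names are illustrative.

```python
def first_match(rules, port):
    """rules: list of (priority, access, port_or_star) tuples with unique priorities.

    The matching rule with the lowest priority number decides.
    """
    for priority, access, rule_port in sorted(rules):
        if rule_port in ("*", port):
            return access, priority
    return "Deny", None  # stand-in for the default DenyAll rule

def inbound_verdict(subnet_nsg, nic_nsg, port):
    """Inbound order: subnet NSG first, then NIC NSG. Both must allow."""
    for label, rules in (("subnet", subnet_nsg), ("nic", nic_nsg)):
        access, priority = first_match(rules, port)
        if access == "Deny":
            return f"Deny at {label} NSG (priority {priority})"
    return "Allow"

allow_8080 = [(100, "Allow", 8080)]

# Trap 1: a broad deny at 200 runs before the allow at 300.
print(inbound_verdict([(200, "Deny", "*"), (300, "Allow", 8080)], allow_8080, 8080))
# Trap 2: the subnet NSG allows, the NIC NSG denies.
print(inbound_verdict(allow_8080, [(100, "Deny", 8080)], 8080))
# Both allow.
print(inbound_verdict(allow_8080, allow_8080, 8080))
```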

Network Watcher tools — what to reach for and when

The whole reason to know NSGs and routes deeply is to confirm a drop instead of guessing. The toolset:

Tool | Answers the question | Command / path
IP Flow Verify | “Is this exact 5-tuple allowed, and by which rule?” | az network watcher test-ip-flow
Effective Security Rules | “What’s the computed NSG on this NIC?” | az network nic list-effective-nsg
Effective Routes | “Where does this NIC actually send a packet?” | az network nic show-effective-route-table
Connection Troubleshoot | “Can A reach B, and where does it die?” | az network watcher test-connectivity
NSG Flow Logs | “Show me what was allowed/denied over time” | Network Watcher → NSG flow logs
Connection Monitor | “Continuously watch this path’s health” | Network Watcher → Connection Monitor
Next Hop | “What’s the next hop for this destination?” | az network watcher show-next-hop
Topology | “Draw what’s actually wired together” | Network Watcher → Topology

The order to run them in a connectivity incident — it localises the drop fastest:

Step | Run | If it says… | Then
1 | IP Flow Verify (source→dest:port) | Deny | It’s an NSG rule: fix the rule (you even get its name)
2 (if Allow) | Effective Routes | A surprise 0.0.0.0/0 UDR | Routing/asymmetry: fix the route/NVA
3 (if routes fine) | Connection Troubleshoot | Fails at a hop | Inspect that hop (NVA, firewall, DNS)
4 (if still unclear) | NSG Flow Logs | Shows the drop | Confirm direction + rule; widen investigation
5 (app-layer) | check the app/DNS | Network is clean | It’s not the network: hand to app team
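
The same order, written as a function you could paste into a runbook. It is only a sketch of the table above: the function name, the argument names and the returned strings are illustrative, and `None` means "not checked yet".

```python
def next_step(ip_flow, surprise_udr=None, connectivity_ok=None):
    """Return the next action in a connectivity incident.

    ip_flow: "Allow" or "Deny" from IP Flow Verify.
    surprise_udr: True/False from Effective Routes, or None if not run yet.
    connectivity_ok: True/False from Connection Troubleshoot, or None if not run yet.
    """
    if ip_flow == "Deny":
        return "Fix the NSG rule named in the IP Flow Verify result"
    if surprise_udr is None:
        return "Run Effective Routes on the source NIC"
    if surprise_udr:
        return "Fix the route or the NVA return path"
    if connectivity_ok is None:
        return "Run Connection Troubleshoot"
    if not connectivity_ok:
        return "Inspect the failing hop, then check NSG flow logs"
    return "Network is clean: hand over to the app/DNS owners"

print(next_step("Allow"))
print(next_step("Allow", surprise_udr=True))
print(next_step("Allow", surprise_udr=False, connectivity_ok=True))
```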

Best practices

The leading indicators to alert on, so a misconfiguration surfaces before a breach does:

Alert on | Signal | Why it’s leading
New broad allow rule | NSG rule with * source/port added | Catches over-exposure at change time
Deny-rule hit spike | Flow logs: rising deny count | Probe/scan in progress, or a broken legit flow
Public IP attached to a tier NIC | Resource graph query | A VM that should be private went public
PaaS firewall set to “all networks” | Storage/SQL network setting | Endpoint isolation regressed
UDR 0.0.0.0/0 added/changed | Route table change event | Potential blackhole/asymmetry incoming
Flow logs stopped | No data in the log storage | Lost forensic visibility
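
The first alert can be approximated with a very small check at review time. This is one possible reading of "broad": an inbound allow whose source is wide open and whose port range is `*`. The dictionary shape and the rule names are hypothetical, not the Azure resource schema.

```python
# Sources treated as "anyone" for the purpose of this check (an assumption).
WIDE_SOURCES = {"*", "0.0.0.0/0", "Internet"}

def is_broad_allow(rule):
    """Flag an inbound Allow with a wide-open source and a '*' port range."""
    return (
        rule["access"] == "Allow"
        and rule["direction"] == "Inbound"
        and rule["source"] in WIDE_SOURCES
        and rule["port"] == "*"
    )

# Hypothetical rules as they might appear in a change under review.
rules = [
    {"name": "Allow-HTTPS", "access": "Allow", "direction": "Inbound",
     "source": "Internet", "port": "443"},
    {"name": "Temp-Debug", "access": "Allow", "direction": "Inbound",
     "source": "*", "port": "*"},
]

flagged = [r["name"] for r in rules if is_broad_allow(r)]
print(flagged)
```

A scoped rule such as Allow-HTTPS passes; the any-any rule is the one worth an alert.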

Security notes

The security control matrix — which Azure control answers which threat, and where it sits relative to NSGs:

Threat | Primary control | Layer | NSG’s role
Lateral movement between tiers | Tier NSGs + ASGs (deny-by-default) | L4 | The control
Public exposure of a VM | No public IP + Bastion + NSG | L3/L4 | Enforces “no inbound from Internet”
Application attacks (SQLi, XSS) | App Gateway WAF | L7 | Coarse L4 beneath the WAF
Malicious egress / exfiltration | Azure Firewall (FQDN filtering) + UDR | L4–L7 | Restrict egress by service tag
PaaS data exposed publicly | Private endpoint (+ disable public) | L3 | Allow only the subnet to the PE
Credential/secret theft | Key Vault + private endpoint + RBAC | L3+identity | Reach KV only privately
Undetected intrusion | NSG flow logs + traffic analytics | Observability | Source of the deny/allow record

Cost & sizing

The good news: the core primitives are free. VNets, subnets, NSGs, ASGs and route tables cost nothing — you are billed for what flows and for the gateways/endpoints you add, not for the network structure. So segment generously; there is no cost reason to run a flat network.

What actually drives the bill: VNet peering charges per GB in both directions (ingress and egress on the peering), which adds up on chatty cross-region hub-spoke traffic. Private endpoints cost a small hourly charge per endpoint plus per-GB processed — meaningful at scale (one per sensitive resource). VPN/ExpressRoute gateways are the big-ticket items (a continuous hourly charge by SKU, plus egress), justified by hybrid connectivity, not networking hygiene. Azure Firewall (if you forced-tunnel egress) carries a substantial hourly + per-GB cost. NAT Gateway for outbound SNAT scaling is a modest hourly + per-GB. Data egress to the internet is billed per GB across all of these.

For Lumio’s hardening, the structure (subnets, NSGs, ASGs) was free; the added cost was the App Gateway, Bastion, one private endpoint, and the firewall/NAT already budgeted — roughly ₹18,000–22,000/month over the flat design, which is trivial against the cost of the breach it prevents. The cost drivers and what each buys:

Cost driver | What you pay for | Rough INR / month | What it buys | Watch-out
VNet / subnet / NSG / ASG | Nothing | ₹0 | All segmentation + filtering | No reason not to segment
VNet peering | Per-GB both directions | ~₹1/GB each way | Cross-VNet/region private reach | Chatty hub-spoke adds up
Private endpoint | Hourly + per-GB | ~₹600–900 + data, each | Fully-private PaaS | One per resource; multiplies
Service endpoint | Nothing | ₹0 | PaaS over backbone (same region) | Resource keeps public IP
Azure Bastion | Hourly (SKU) + outbound | ~₹10,000–14,000 | No public IPs for admin | Per-VNet (or peered) cost
Azure Firewall | Hourly + per-GB | ~₹40,000+ | Egress inspection/FQDN filtering | Expensive: justify it
NAT Gateway | Hourly + per-GB | ~₹1,500–3,000 | Outbound SNAT scaling | Needs the right subnet
VPN gateway (S2S) | Hourly (SKU) + egress | ~₹8,000–30,000 | Hybrid over internet | SKU drives throughput + price
Internet egress | Per-GB out | ~₹7–9/GB (tiered) | (the data leaving) | The silent recurring line
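
A back-of-envelope estimator for the two lines that grow with usage. It is a rough sketch, not a pricing calculator. It assumes the table's ~₹1/GB applies to both the outbound meter on the sending VNet and the inbound meter on the receiving VNet, takes ₹750 as the midpoint of the ₹600–900 private endpoint range, and leaves out the private endpoint per-GB charge. The function names are illustrative.

```python
def peering_cost_inr(gb_sent, gb_received, rate_per_gb=1.0):
    """Each GB crossing a peering is metered twice in this model:
    outbound on the sending VNet and inbound on the receiving VNet."""
    return (gb_sent + gb_received) * 2 * rate_per_gb

def private_endpoint_cost_inr(endpoints, fixed_per_endpoint=750.0):
    """Fixed monthly part only; per-GB processing is not included."""
    return endpoints * fixed_per_endpoint

# Hypothetical month: 500 GB hub→spoke, 300 GB spoke→hub, 4 private endpoints.
print(peering_cost_inr(500, 300))
print(private_endpoint_cost_inr(4))
```

Swap in current rates for your region before relying on the numbers. The point is the shape: peering scales with traffic, private endpoints with resource count.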

Sizing rules of thumb: a /16 per VNet and /24 per tier subnet is the comfortable default that never bites; size AKS node/pod subnets generously (/22+); always honour the five-reserved overhead; and place a private endpoint per sensitive resource (not per every PaaS object) to control the per-endpoint cost.
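
The five-reserved overhead as arithmetic, using Python's standard `ipaddress` module. The prefixes are illustrative and `usable_addresses` is a helper written for this example.

```python
import ipaddress

AZURE_RESERVED = 5  # network, default gateway, two for Azure DNS, broadcast

def usable_addresses(cidr):
    """Addresses left for your resources after Azure's per-subnet reservation."""
    return ipaddress.ip_network(cidr).num_addresses - AZURE_RESERVED

for cidr in ("10.0.1.0/24", "10.0.5.0/26", "10.0.6.0/27", "10.0.7.0/29"):
    print(cidr, usable_addresses(cidr))
```

A /24 gives 251, a /26 gives 59, a /27 gives 27 and a /29 gives 3, which is why the smallest subnets fill up much sooner than their CIDR size suggests.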

Interview & exam questions

1. Why is a freshly-created Azure VNet not secure by default, even though it has no public IP? Because the default NSG rule AllowVnetInBound (priority 65000) permits all traffic between resources inside the VNet (and peered VNets). A VNet is private from the internet but flat internally — any foothold can reach any other VM on any port until you add explicit deny rules. Security requires overriding that default with tier-scoped denies.

2. Explain NSG rule priority and statefulness. Rules have a priority 100–4096; the platform evaluates in ascending order and applies the first match, then stops, so a broad allow at 100 beats a specific deny at 200. NSGs are stateful: allowing an inbound flow automatically permits its return traffic, so you don’t write a matching outbound rule. Most NSG bugs are a priority race (allow above deny) or rules written as if NSGs were stateless (extra outbound rules added for replies that never needed them).

3. What are the default NSG rules and which two cause the most incidents? Inbound: AllowVnetInBound (65000), AllowAzureLoadBalancerInBound (65001), DenyAllInBound (65500). Outbound: AllowVnetOutBound (65000), AllowInternetOutBound (65001), DenyAllOutBound (65500). The two that cause incidents are AllowVnetInBound (flat-network lateral movement) and AllowInternetOutBound (unrestricted egress/exfiltration); both must be overridden for a secure posture.

4. ASG vs raw IP vs service tag as a rule source — when each? Use an ASG for “this tier/role” inside your VNet (membership follows the NIC, surviving scale/re-IP); a service tag (Internet, Sql, Storage) for cloud-service ranges Microsoft maintains and auto-updates; a raw CIDR only for fixed external ranges (on-prem, partner IPs). Hard-coding IPs for an autoscaling tier or a Microsoft service is the classic brittle mistake.

5. How many usable addresses in a /24 subnet in Azure, and why? 251, not 254 — Azure reserves five addresses per subnet: network (.0), default gateway (.1), two for Azure DNS (.2, .3), and broadcast (.255). This overhead is why a /29 (8 total) leaves only 3 usable and is the realistic floor for a real subnet.

6. An app VM can’t reach SQL; IP Flow Verify says the traffic is Allowed. What next, and what are you looking for? The NSGs are fine, so it’s routing. Run Effective Routes on the app NIC and look for a 0.0.0.0/0 UDR to an NVA/firewall — if present and the return path isn’t symmetric (or the NVA doesn’t SNAT), you have asymmetric routing: the SYN arrives but the reply is dropped. Fix the route/return path or exempt the VNet prefix.

7. Difference between a service endpoint and a private endpoint? A service endpoint extends the subnet’s identity to a PaaS service over the Azure backbone (the PaaS firewall then trusts the subnet) — free, same-region, but the resource keeps its public IP. A private endpoint gives the PaaS resource a private IP inside your subnet (Private Link), so you can disable public access entirely — stronger isolation, but it costs per-endpoint and needs Private DNS. Compliance-bound data stores usually want private endpoints.

8. Why won’t two VNets peer, and why is peering “non-transitive”? They won’t peer if their address spaces overlap — Azure refuses the peering. Peering is non-transitive: if A↔B and B↔C are peered, A still can’t reach C; you route A↔C through a hub (NVA/firewall + UDRs) or peer A↔C directly. This is why central, non-overlapping CIDR planning is foundational.

9. You add Allow 443 from Internet to the web NSG. Is the web tier now locked down? Why or why not? No. That rule only governs inbound from the internet; inbound from inside the VNet is still allowed by AllowVnetInBound. A peered VNet or another subnet can reach the web VMs on any port. You must add an explicit deny for VirtualNetwork inbound (and allow only legitimate internal flows) to actually lock it down.

10. How do you give RDP/SSH to admins without putting a public IP on a VM? Deploy Azure Bastion into the mandatory AzureBastionSubnet (/26); it provides browser-based RDP/SSH over TLS with no public IP on the workload VMs, and you restrict who can reach Bastion to a known admin range. A public-IP’d jump box is the anti-pattern Bastion replaces.

11. A health-probe-based load-balanced app starts failing health checks after you tighten the NSG. Likely cause? Your custom deny rule blocked the AzureLoadBalancer service tag, so the platform’s health probes can’t reach the VMs and they’re marked unhealthy. Re-allow AzureLoadBalancer as a source on the probe port (the default AllowAzureLoadBalancerInBound does this until a custom rule overrides it).

12. Both a subnet NSG and a NIC NSG are attached. How is a packet evaluated? Both are evaluated and both must allow for traffic to pass. For inbound, the subnet NSG is evaluated first, then the NIC NSG; for outbound, NIC first then subnet. A common “my allow doesn’t work” bug is one NSG allowing while the other denies.

These map to AZ-104 (Administrator): configure and manage virtual networking (VNets, subnets, NSGs, ASGs, peering, service endpoints), and to AZ-700 (Network Engineer): design and implement core networking, routing, and private access. The security framing (segmentation, egress control, private endpoints) overlaps AZ-500. A compact cert-mapping for revision:

| Question theme | Primary cert | Exam objective area |
|---|---|---|
| VNets, subnets, address space | AZ-104 / AZ-700 | Configure virtual networks; design IP addressing |
| NSG rules, priority, defaults | AZ-104 | Configure network security groups |
| ASGs, service tags | AZ-104 / AZ-500 | Secure connectivity; NSG/ASG |
| UDRs, routing, NVA, forced tunnel | AZ-700 | Design and implement routing |
| Service vs private endpoints | AZ-700 / AZ-500 | Private access to PaaS |
| Peering, hub-spoke, transitivity | AZ-700 | Design hybrid/inter-VNet connectivity |
| Network Watcher diagnostics | AZ-104 / AZ-700 | Monitor and troubleshoot networking |

Quick check

  1. A new VNet has no public IPs and “isn’t reachable from the internet.” Why is it still not secure, and which default rule is responsible?
  2. An NSG has Allow Any-Any at priority 100 and Deny 3389 at priority 200. Is RDP blocked? Why?
  3. How many usable IP addresses does a /27 subnet give you in Azure, and why isn’t it 32 (or 30)?
  4. An app NIC’s IP Flow Verify to the DB on 1433 returns Allow, but the app still can’t connect. What is the single most likely culprit and the command to confirm it?
  5. You need the app tier to accept 8080 only from the web tier, which autoscales. What do you use as the rule source, and why not a CIDR?

Answers

  1. Because the default rule AllowVnetInBound (priority 65000) allows all intra-VNet (and peered-VNet) traffic — the VNet is private from the internet but flat internally, so any foothold reaches any VM on any port. You must override it with explicit tier-scoped deny rules.
  2. No, RDP is not blocked. Evaluation is by ascending priority with first-match-wins: Allow Any-Any at priority 100 matches the RDP packet first and stops evaluation, so the Deny 3389 at 200 never runs. Put specific denies above (lower number than) broad allows.
  3. 27 usable. A /27 has 32 total addresses, but Azure reserves five per subnet (network, gateway, two DNS, broadcast), leaving 27 — not 30 (the on-prem figure) and not 32.
  4. A UDR causing asymmetric routing (a 0.0.0.0/0 route to an NVA/firewall with no symmetric return path or SNAT) — the SYN arrives, the reply is dropped. Confirm with az network nic show-effective-route-table on the app NIC and look for the 0.0.0.0/0 next-hop to a VirtualAppliance.
  5. An Application Security Group (asg-web) as the source. ASG membership follows the NIC, so as the web tier autoscales or re-IPs the rule stays correct; a CIDR like 10.0.1.0/24 breaks silently when instances land on unexpected IPs or the subnet is re-planned.
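The subnet arithmetic in answer 3 is easy to verify for any prefix length. The sketch below uses Python's standard `ipaddress` module and the five reserved addresses per subnet described above; the helper names and the example subnet 10.0.1.0/27 are illustrative assumptions.

```python
import ipaddress

AZURE_RESERVED = 5  # network, default gateway, two DNS addresses, broadcast

def usable_addresses(cidr):
    """Addresses left for workloads after Azure's five per-subnet reservations."""
    net = ipaddress.ip_network(cidr)
    return max(net.num_addresses - AZURE_RESERVED, 0)

def reserved_addresses(cidr):
    """The first four addresses and the last address of the subnet."""
    net = ipaddress.ip_network(cidr)
    first = net.network_address
    return [str(first + i) for i in range(4)] + [str(net.broadcast_address)]

for cidr in ["10.0.1.0/24", "10.0.1.0/26", "10.0.1.0/27", "10.0.1.0/29"]:
    print(cidr, usable_addresses(cidr))
# 10.0.1.0/24 251
# 10.0.1.0/26 59
# 10.0.1.0/27 27
# 10.0.1.0/29 3

print(reserved_addresses("10.0.1.0/27"))
# ['10.0.1.0', '10.0.1.1', '10.0.1.2', '10.0.1.3', '10.0.1.31']
```

The /29 row shows why small subnets fill up fast: only three addresses remain for workloads.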

Glossary

Next steps

You can now design and secure a segmented VNet and confirm a drop instead of guessing. Build outward:

Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
