Azure Bastion Deep Dive: Native Client Tunneling, Shareable Links, and Just-in-Time Secure Access

Every public IP on a workload VM is an open invitation to the internet’s background noise of credential-stuffing bots and CVE scanners. The traditional answer was a jump box: one hardened VM with a public IP and RDP/SSH behind an NSG and a VPN — still an internet-facing host you patch, monitor, and explain to your auditor. Azure Bastion removes it. It is a managed, agentless PaaS service that brokers RDP and SSH over TLS 443, so your VMs need no public IP, no inbound 3389/22 from the internet, and no agent inside the guest. The Bastion host lives in a dedicated subnet inside your VNet, reaches your VMs over their private IPs, and presents the session either as an HTML5 canvas in the portal or — far more usefully at scale — as a native mstsc/OpenSSH session tunnelled through the broker.

This guide goes past the portal “Connect” button into the part that matters in production: native client tunneling for scp/Ansible/full RDP, shareable links for third parties with no Azure account, session recording for PCI/HIPAA/SOC 2 evidence, IP-based connections for on-prem and peered targets, hub-and-spoke reuse so one host serves the whole estate, and the methodical decommissioning of your legacy jump boxes. Because Bastion is a security control you operate, not a one-time deploy, the SKUs, subnet rules, NSG flows, RBAC roles, error codes, limits and cost levers are all laid out as scannable tables — read the prose once, then keep the tables open when you are sizing a host, debugging a hung session, or answering a QSA.

By the end you will know exactly which SKU to deploy and why you cannot downgrade it, how to size AzureBastionSubnet and the scale units so 200 operators connect during a release window, how to wire native tunneling and prove it with scp, how to gate every session behind Conditional Access and PIM, and how to read BastionAuditLogs to close the loop with your auditor. Knowing which knob fixes a hung connection in ninety seconds is what separates a controlled cutover from a week of “the vendor still can’t get in.”

What problem this solves

Remote administrative access is the single richest target in any estate. A workload VM with a public IP and an NSG rule allowing 3389 from Any (or even from a “corporate range” that turns out to be a /8) is permanently exposed to password-spray and to whatever RDP/SSH CVE is current that quarter. The classic mitigation — a jump box behind a VPN — does not remove the exposure, it relocates it onto one box you now own end to end: you patch its OS, rotate its credentials, monitor its logins, size its public IP, and answer for it in every audit. And a jump box is still an interactive host an attacker can pivot from once they are on it.

What breaks without Bastion: an on-call engineer cuts a “temporary” NSG hole and public IP for a vendor and forgets to close it; a contractor’s laptop with a cached SSH key walks out the door and the key is still trusted; an auditor asks “show me who RDP’d into the cardholder-data box last Tuesday and what they did,” and there is no recording, only a Windows Security log on the box itself (which the same admin could clear). Meanwhile the management ports stay open 24/7 because closing them breaks access, so the attack surface is permanent.

Who hits this: every team running IaaS VMs that need interactive administration — which is most of them. It bites hardest on regulated estates (PCI-DSS, HIPAA, SOC 2, ISO 27001) that mandate no public IPs on in-scope hosts and recorded admin sessions; on hub-and-spoke platforms where per-VNet jump boxes multiply cost and operational surface; and on hybrid shops that need the same broker to reach on-prem servers over ExpressRoute. Bastion’s promise is concrete: zero public IPs on workloads, zero inbound management ports from the internet, identity-governed access, and an immutable session ledger — provided you pick the right SKU and turn the right knobs, which is exactly what defaults will not do for you.

To frame the field before the deep dive, here is every access pattern Bastion replaces or enables, the pain it removes, and the SKU floor it needs:

Access pattern	Pain it removes	Bastion feature	Minimum SKU
RDP/SSH from the portal (HTML5)	Public IP + open `3389`/`22` on the VM	Browser connect over TLS 443	Basic
Terminal-native SSH / `scp` / Ansible	HTML5 canvas can’t run real tooling	Native client tunneling	Standard
Full `mstsc` RDP (multi-monitor, drives)	Browser RDP is a limited canvas	`az network bastion rdp` / tunnel	Standard
Third-party access (no Azure account)	Temporary public IP + NSG hole for a vendor	Shareable links	Standard
Reach on-prem / peered private-IP hosts	Separate jump box per network	IP-based connection	Standard
Recorded admin sessions for audit	No tamper-evident session evidence	Session recording	Premium
No public IP on the Bastion host itself	The broker is itself internet-facing	Private-only deployment	Premium

Learning objectives

By the end of this article you can:

Choose between the Developer, Basic, Standard, and Premium SKUs deliberately, and explain why you can only upgrade — never downgrade — and what each tier actually unlocks.
Lay down a correct AzureBastionSubnet (/26 minimum, exact name, nothing else in it) with a Standard Static public IP and zone-redundant scale units sized to peak concurrency.
Wire native client tunneling for SSH, RDP, and scp/file transfer, and explain the difference between az network bastion ssh, tunnel, and rdp.
Issue and revoke shareable links as time-boxed grants, and enable IP-based connections to reach on-prem and peered targets through the same broker.
Stand up session recording to an immutable, CMK-encrypted storage account and stream BastionAuditLogs to Log Analytics, then query the connection ledger in KQL.
Centralise one Bastion in a hub across peered spokes with the correct peering flags, and explain why the second (transitive) peering hop does not work.
Harden the host with the mandatory NSG flow set, Conditional Access, least-privilege RBAC, and PIM, and pair it with Defender for Cloud JIT so ports open only to the broker subnet.
Right-size the bill and decommission legacy jump boxes methodically, stripping their public IPs only after Bastion access is proven.

Prerequisites & where this fits

You should already understand Azure networking fundamentals: a virtual network is an address space carved into subnets, traffic between subnets and peered VNets is governed by NSGs and route tables, and a VM reaches the world through a public IP (or, increasingly, should not). If those words are fuzzy, read Azure Virtual Network Deep Dive: Every Setting and Azure Virtual Network basics: subnets, NSGs, peering first. You should be comfortable running az in Cloud Shell and reading JSON output, and you should know, at a basic level, what a managed identity and an RBAC role assignment are.

This sits in the Security / secure-access track of the Azure Zero-to-Hero program. It is downstream of VNet design and upstream of the broader Azure Zero Trust multilayer security model, of which “no public IPs, identity-governed access” is a pillar. It pairs tightly with Microsoft Entra Conditional Access at scale (which gates native sessions), PIM for Azure resources (which makes even the connect right just-in-time), and Azure Monitor & Application Insights for observability (where the audit ledger lands). If you also need to control outbound egress from VMs that no longer have public IPs, Azure NAT Gateway for deterministic egress is the complement.

A quick map of which layer owns what, so you call the right person when a session won’t land:

Layer	What lives here	Who usually owns it	Failure it can cause
Identity (Entra)	RBAC, Conditional Access, PIM	Identity team	Connect denied; CA blocks the session
Bastion host	SKU, scale units, tunneling flag	Platform / network	Native subcommands fail; concurrency capped
`AzureBastionSubnet` + NSG	Subnet size, the mandatory flow set	Network team	Silent break of `443`/`4443` or egress
VNet peering	`allowForwardedTraffic`, transitivity	Network team	Spoke VM unreachable from hub host
Target VM	Private IP, guest firewall, local creds	App / VM team	RDP/SSH refused inside the guest
Storage + Key Vault	Recording container, CMK, immutability	Platform / security	Recordings not written or tamperable

Core concepts

Six mental models make every later decision obvious.

Bastion is a broker inside your VNet, not a gateway at its edge. The Bastion host is a set of managed VMs (Microsoft calls each a scale unit or instance) that live in a dedicated subnet named exactly AzureBastionSubnet. A client reaches Bastion over TLS 443 — from the portal or, with tunneling, from the local CLI — and Bastion reaches the target VM over its private IP using ordinary 3389/22 from inside the VNet. Because the broker is in the VNet, the target needs no public IP and no internet-facing port; the only public surface is Bastion’s own 443 (and even that disappears with the private-only Premium deployment).

The SKU is a one-way ratchet. Bastion has four tiers — Developer, Basic, Standard, Premium — and each adds features the one below lacks. You can upgrade in place (Basic→Standard→Premium) but you cannot downgrade; to go down a tier you delete and redeploy. Pick deliberately, because the gaps are large: native tunneling, shareable links, custom ports, file transfer and IP-based connections all start at Standard, and session recording and private-only deployment are Premium-only.

The subnet name and size are load-bearing, not cosmetic. The platform keys off the literal name AzureBastionSubnet: call it anything else and Bastion will not deploy. The minimum size for any Bastion created on or after 2 November 2021 is /26; a /27 is rejected, and a grandfathered /27 cannot scale host instances. The subnet holds nothing else: no NICs, no NAT gateway, no other resource. An NSG is supported on it (provided it allows the required flow set), but user-defined routes are not, and the address space is Bastion’s alone.

Host scaling is how Bastion serves concurrency. Each scale unit handles roughly 20 concurrent RDP or 40 concurrent SSH sessions. Basic is fixed at 2 instances; Standard and Premium let you set 2–50. You size to peak concurrency, not VM count — a release window with 200 simultaneous operators wants ~10 instances, which is the real reason the subnet must be /26. Scale units and zone redundancy (pinning instances across zones 1/2/3) are chosen at deploy time; zones are immutable afterward, scale units you can adjust on Standard/Premium.
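As a rough sketch of that sizing arithmetic, assuming the per-unit figures above (about 20 RDP or 40 SSH sessions) hold for your workload, a small shell helper can turn a peak session count into a scale-unit count. The function name scale_units_for is hypothetical, and the result is clamped to the 2 to 50 range that Standard and Premium allow.

```shell
# Sketch: estimate Bastion scale units from peak concurrent sessions.
# Assumption: ~20 RDP or ~40 SSH sessions per scale unit, as quoted in the text.
scale_units_for() {
  local peak="$1" protocol="$2" per_unit units
  case "$protocol" in
    rdp) per_unit=20 ;;
    ssh) per_unit=40 ;;
    *) echo "usage: scale_units_for <peak-sessions> rdp|ssh" >&2; return 2 ;;
  esac
  # Integer ceiling division: round up so a partial unit still gets an instance
  units=$(( (peak + per_unit - 1) / per_unit ))
  # Clamp to the 2-50 range configurable on Standard and Premium
  [ "$units" -lt 2 ] && units=2
  [ "$units" -gt 50 ] && units=50
  echo "$units"
}

# 200 simultaneous RDP operators during a release window
scale_units_for 200 rdp
```

A result of 50 can mean the real requirement exceeds what one host offers, so treat the upper clamp as a prompt to re-check the estimate rather than as an answer.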

Native client tunneling is what makes Bastion usable for engineers. The browser HTML5 session is fine for a one-off click. For anyone who lives in a terminal, runs scp, drives Ansible, or wants a real mstsc session, the native client path (az network bastion ssh | tunnel | rdp) opens a local connection that tunnels through the broker. It requires Standard+ with the tunneling flag explicitly enabled — --enable-tunneling true — or the subcommands fail even on the right SKU.

Bastion shrinks the attack surface but is not a free pass. It removes public IPs and open ports, but the right to connect is still an RBAC outcome you must grant least-privilege, native sessions still authenticate through Entra (so Conditional Access applies), the subnet still needs a precise NSG flow set or it breaks silently, and a shareable link left standing is a standing exposure. Bastion replaces the jump box’s risks with a smaller, governable set — but only if you turn the knobs.

The vocabulary in one table

Before the deep sections, pin every moving part. The glossary repeats these for lookup; this is the mental model side by side:

Concept	One-line definition	Where it lives	Why it matters
Bastion host	Managed broker for RDP/SSH over TLS	`AzureBastionSubnet` in your VNet	The thing you deploy; SKU-gated
Scale unit (instance)	One managed VM behind the host	In the subnet	Concurrency: ~20 RDP / ~40 SSH each
`AzureBastionSubnet`	The mandatory dedicated subnet	In the VNet	Exact name + `/26`; nothing else in it
SKU tier	Developer / Basic / Standard / Premium	Host property	One-way upgrade; gates every feature
Native client tunneling	Local CLI session through the broker	Client + host	Real `scp`/`ssh`/`mstsc`; Standard+
Shareable link	URL to one VM, no Azure account	Host feature	Vendor access; auth = VM’s own creds
IP-based connection	Reach a target by private IP	Host feature	On-prem / peered hosts; Standard+
Session recording	Video of the RDP/SSH session	Premium → your storage	Tamper-evident audit evidence
`BastionAuditLogs`	Diagnostic log of each connection	Log Analytics	The who/when/what ledger
JIT VM access	Defender opens ports time-boxed	NSG via Defender	Ports open only to the broker subnet
Private-only	Bastion host with no public IP	Premium property	Removes the last public surface
Forwarded traffic	Peering flag letting brokered traffic transit	Peering config	Required for hub-and-spoke reach

1. Pick the SKU before you touch a subnet

The SKU decides which features exist and you cannot downgrade later — only upgrade. Choose deliberately; the gaps between the four tiers are large, and discovering a missing feature mid-engagement (no native SSH, no file copy, no shareable link) means a delete-and-redeploy under pressure.

The full feature matrix, every row:

Feature	Developer	Basic	Standard	Premium
Cost model	Free (shared)	Hourly + data	Hourly + data	Hourly + data
Dedicated deployment	No (shared fabric)	Yes	Yes	Yes
`AzureBastionSubnet` required	No	Yes	Yes	Yes
VNet peering reach (hub-spoke)	No	Yes	Yes	Yes
Concurrent connections	1	Fixed	Scales	Scales
Host scaling (instances)	No	Fixed (2)	2–50	2–50
Zone redundancy	No	Yes	Yes	Yes
Native client (`tunnel`/`ssh`/`rdp`)	No	No	Yes	Yes
Custom inbound ports	No	No	Yes	Yes
File transfer (upload/download)	No	No	Yes	Yes
Shareable links	No	No	Yes	Yes
IP-based connection	No	No	Yes	Yes
Kerberos authentication	No	No	Yes	Yes
Private-only deployment (no public IP)	No	No	No	Yes
Session recording	No	No	No	Yes
Upgrade path	redeploy	→Standard/Premium	→Premium	terminal

The practical read on each tier — what it is for and the trap it sets:

Tier	Use it for	The trap
Developer	Personal dev/test convenience; free	One concurrent connection; no peering → cannot serve hub-and-spoke; shared fabric
Basic	Browser-only RDP/SSH on a single VNet	No native client, no file copy, no shareable links — you hit the wall the first real day
Standard	The platform baseline	Lacks session recording and private-only — fine until an auditor or a private-only mandate appears
Premium	Regulated estates needing recording / private-only	Highest hourly rate; overkill if you owe no session audit trail

The decision in one table — match your requirement to the floor SKU:

If you need…	Smallest SKU	Why
A free sandbox, one VNet, one session	Developer	Shared fabric, no peering, single connection
Browser RDP/SSH, dedicated host, one VNet	Basic	Dedicated but feature-bare
Native `ssh`/`scp`/`mstsc`, custom ports, file copy	Standard	All the day-to-day engineering features live here
Shareable links for vendors	Standard	First tier with the feature
Reach on-prem / peered private IPs	Standard	IP-based connection
Recorded sessions for PCI/HIPAA/SOC 2	Premium	Session recording is Premium-only
No public IP on the Bastion host itself	Premium	Private-only deployment is Premium-only
Hub-and-spoke serving many spokes	Standard (Premium if recording)	Peering reach starts at Basic; features at Standard

The practical rule: for a centralized, shared Bastion in a hub, deploy Standard at minimum and Premium if you owe anyone a session audit trail or a private-only host. Developer is a personal convenience, not platform infrastructure; Basic I rarely deploy because the first request for a native SSH session or a file copy strands you.

2. Subnet design, host scaling, and zones

Bastion (every SKU except Developer) requires a dedicated subnet named exactly AzureBastionSubnet. This is not a convention you may vary — the platform keys off the literal string. The rules people get wrong, each with its consequence:

Rule	Requirement	What breaks if you ignore it
Subnet name	Exactly `AzureBastionSubnet`	Deployment fails / Bastion not offered for the subnet
Subnet size	`/26` or larger (post 2 Nov 2021)	`/27` rejected; grandfathered `/27` can’t scale instances
Subnet contents	Bastion only — no NICs, NAT GW, other resources	Conflicts; Bastion deploy refused
NSG	Supported, but must allow the required flow set	Over-zealous NSG silently breaks `443`/`4443`
Route table	Not supported on this subnet; attach no UDR	A 0.0.0.0/0 UDR to an NVA can blackhole the control plane
Public IP	Standard SKU, Static allocation	Dynamic / Basic-SKU IP rejected
Delegation	None required	—
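
A pre-flight check can catch the most common subnet mistake before the deployment does. The following is a minimal sketch, assuming you pass the planned prefix in CIDR notation; bastion_subnet_ok is a hypothetical helper name, and it checks only the /26 size rule, not the subnet name or its contents.

```shell
# Sketch: succeed only if the planned AzureBastionSubnet prefix is /26 or larger.
# Returns 0 when the size is acceptable, 1 when too small, 2 on malformed input.
bastion_subnet_ok() {
  local cidr="$1" prefix
  case "$cidr" in
    */*) prefix="${cidr##*/}" ;;   # text after the last slash
    *) return 2 ;;                 # no prefix length given
  esac
  case "$prefix" in
    ''|*[!0-9]*) return 2 ;;       # prefix length must be numeric
  esac
  [ "$prefix" -le 32 ] || return 2
  # A smaller prefix number means a larger subnet; /26 is the minimum size
  [ "$prefix" -le 26 ]
}

if bastion_subnet_ok "10.0.255.0/26"; then
  echo "subnet size ok"
else
  echo "subnet too small or malformed" >&2
fi
```

Running the same check in a pipeline before the subnet is created keeps a /27 from reaching the platform at all.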

Lay the network down with the /26 subnet and a Standard, Static public IP — Bastion will not accept a Dynamic or Basic-SKU IP:

RG=rg-hub-network
LOC=eastus
VNET=vnet-hub
BASTION=bastion-hub

# Dedicated /26 subnet — the name is mandatory and case-sensitive
az network vnet subnet create \
  --resource-group "$RG" \
  --vnet-name "$VNET" \
  --name AzureBastionSubnet \
  --address-prefixes 10.0.255.0/26

# Standard SKU and Static allocation are required; zone redundancy is recommended
az network public-ip create \
  --resource-group "$RG" \
  --name pip-bastion-hub \
  --sku Standard \
  --allocation-method Static \
  --zone 1 2 3

resource bastionSubnet 'Microsoft.Network/virtualNetworks/subnets@2023-11-01' = {
  parent: vnet
  name: 'AzureBastionSubnet'   // exact name — platform requirement
  properties: {
    addressPrefix: '10.0.255.0/26'   // /26 minimum
  }
}

resource bastionPip 'Microsoft.Network/publicIPAddresses@2023-11-01' = {
  name: 'pip-bastion-hub'
  location: location
  sku: { name: 'Standard' }                 // Standard required
  zones: [ '1', '2', '3' ]                   // zone-redundant
  properties: { publicIPAllocationMethod: 'Static' }   // Static required
}

Host scaling is how Bastion handles concurrency. Each scale unit is a managed VM behind the service. Basic is fixed at two; Standard and Premium let you set 2 to 50. Size to peak concurrency, not VM count. The sizing reference — pick the smallest instance count that covers your peak:

Scale units	~Concurrent RDP	~Concurrent SSH	Subnet draw	Typical use
2 (Basic / Standard floor)	~40	~80	small	Small team, single VNet
4	~80	~160	small	Mid platform, a few spokes
8	~160	~320	moderate	Regional hub, release windows
10	~200	~400	moderate	200-operator release; the `/26` payoff
20	~400	~800	larger	Large estate, many spokes
50 (max)	~1,000	~2,000	largest	Very large multi-spoke estate

az network bastion create \
  --resource-group "$RG" \
  --name "$BASTION" \
  --vnet-name "$VNET" \
  --public-ip-address pip-bastion-hub \
  --sku Standard \
  --scale-units 4 \
  --location "$LOC" \
  --zone 1 2 3 \
  --enable-tunneling true

--enable-tunneling true is the switch that turns on native client support. Without it, the tunnel/ssh/rdp subcommands in the next step fail even on a Standard SKU. The create-time flags that are immutable versus mutable — get the immutable ones right the first time:

Property	Set at	Mutable later?	Notes
SKU tier	Create	Upgrade only	Basic→Standard→Premium; no downgrade
Availability zones	Create	No	Cannot re-zone a live Bastion
Scale units	Create	Yes (Std/Prem)	2–50; raise/lower as concurrency changes
`--enable-tunneling`	Create or update	Yes	Required for native client
`--enable-ip-connect`	Create or update	Yes	Required for IP-based connection
`--enable-kerberos`	Create or update	Yes	For AD-joined target auth
Public IP	Create	Replaceable	Standard + Static only

Zone redundancy is set at deployment and immutable afterward — you cannot re-zone a live Bastion. In supported regions, pin instances across zones 1, 2, and 3 so a single zone failure does not sever all remote access during an incident, which is precisely when you need it most. If you skip zones and the region has a zonal event, your break-glass path is gone at the worst possible moment.

3. Native client tunneling for SSH, RDP, and file transfer

The browser experience is fine for a one-off. For engineers who live in a terminal, want scp, run Ansible, or need an RDP session richer than an HTML5 canvas, native client tunneling is what makes Bastion usable day to day. It requires Standard SKU or higher with tunneling enabled. There are three relevant subcommands, and the distinction matters — pick the right one:

Subcommand	What it does	Auth options	Best for	SKU
`az network bastion ssh`	Interactive SSH straight to a Linux VM	`AAD`, `ssh-key`, `password`	Quick terminal session, no local port	Standard+
`az network bastion tunnel`	Raw local TCP tunnel to any target port	n/a (transport only)	`scp`, DB clients, full RDP, anything	Standard+
`az network bastion rdp`	Launches native `mstsc` to a Windows VM	Windows creds	Native Windows RDP experience	Standard+

The --auth-type values for ssh, with their trade-offs:

`--auth-type`	Extra flags	What governs access	Trade-off
`AAD`	none	Entra RBAC + Conditional Access	No keys to manage; needs the AAD login extension on the VM and the VM login role
`ssh-key`	`--username`, `--ssh-key`	The key file	Familiar; key can walk out on a laptop
`password`	`--username`	Local credential	Simplest; weakest; avoid in prod

az network bastion ssh opens an interactive SSH session straight to a Linux VM by its resource ID — no public IP, no local port wrangling:

az network bastion ssh \
  --name "$BASTION" \
  --resource-group "$RG" \
  --target-resource-id "/subscriptions/<sub>/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm-linux-01" \
  --auth-type AAD

--auth-type AAD (Microsoft Entra login) is my default — access is governed by RBAC and Conditional Access instead of a key file that walks out the door on a laptop. It requires the AADSSHLoginForLinux VM extension and the Virtual Machine User Login role on the target.

az network bastion tunnel is the workhorse. It opens a raw local TCP tunnel to an arbitrary port on the target that you point any client at — real scp, a database client over the same broker, or a full RDP client:

# Open a local tunnel: localhost:50022 -> VM:22 through Bastion
az network bastion tunnel \
  --name "$BASTION" \
  --resource-group "$RG" \
  --target-resource-id "/subscriptions/<sub>/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm-linux-01" \
  --resource-port 22 \
  --port 50022

With that tunnel up, every ordinary tool just works against localhost:50022:

# In a second terminal — standard OpenSSH, standard scp, no Bastion awareness
ssh -p 50022 azureuser@127.0.0.1
scp -P 50022 ./deploy.tar.gz azureuser@127.0.0.1:/tmp/

# RDP example: tunnel 3389, then point mstsc at the local port
az network bastion tunnel -n "$BASTION" -g "$RG" \
  --target-resource-id "<vm-windows-id>" --resource-port 3389 --port 53389
# then: mstsc /v:localhost:53389

Common tunnel targets and the local-port convention people use — the tunnel is protocol-agnostic, so anything TCP works:

Target service	`--resource-port`	Typical local `--port`	Client you point at it
SSH	22	50022	`ssh -p 50022 user@127.0.0.1`
RDP	3389	53389	`mstsc /v:localhost:53389`
SQL Server	1433	51433	`sqlcmd -S 127.0.0.1,51433`
PostgreSQL	5432	55432	`psql -h 127.0.0.1 -p 55432`
WinRM (HTTPS)	5986	55986	PowerShell remoting
Custom app/admin	any	any	any TCP client

For Windows users who want the native RDP experience without managing a tunnel, az network bastion rdp launches mstsc directly:

az network bastion rdp `
  --name $Bastion `
  --resource-group $RG `
  --target-resource-id "<vm-windows-id>"

The tunnel runs only as long as the CLI process lives. For automation, background it and capture the PID so a pipeline step can tear it down deterministically rather than leaking an open broker session.
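
One way to sketch that background-and-teardown pattern is a pair of small shell helpers. The names start_tunnel and stop_tunnel are hypothetical, and the sketch assumes the backgrounded PID is the process that actually holds the tunnel open; if your az launcher spawns a child process, you may need to signal the process group instead.

```shell
# Sketch: background a long-lived tunnel command, remember its PID,
# and tear it down deterministically when the step ends.
TUNNEL_PID=""

start_tunnel() {
  # Run the given command in the background and record its PID
  "$@" &
  TUNNEL_PID=$!
}

stop_tunnel() {
  # Kill the tunnel only if it is still running, then reap it
  if [ -n "$TUNNEL_PID" ] && kill -0 "$TUNNEL_PID" 2>/dev/null; then
    kill "$TUNNEL_PID"
    wait "$TUNNEL_PID" 2>/dev/null || true
  fi
  TUNNEL_PID=""
}

# Example wiring in a pipeline step (commented out; needs a real Bastion):
#   trap stop_tunnel EXIT
#   start_tunnel az network bastion tunnel --name "$BASTION" --resource-group "$RG" \
#     --target-resource-id "<vm-id>" --resource-port 22 --port 50022
#   scp -P 50022 ./deploy.tar.gz azureuser@127.0.0.1:/tmp/
```

Registering stop_tunnel on EXIT means the tunnel is closed whether the step succeeds or fails. A real step would also wait until the local port accepts connections before running scp, since the tunnel takes a moment to come up.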

The native-client prerequisites people miss — verify all of these before debugging a “tunnel won’t open”:

Prerequisite	Check / fix	Symptom if missing
SKU is Standard+	`az network bastion show --query sku.name`	Subcommand errors “not supported on this SKU”
Tunneling enabled	`--query enableTunneling` is `true`	Subcommand errors even on Standard
Azure CLI ≥ 2.32 + SSH extension	`az extension add --name ssh`	`az network bastion ssh` not found
RBAC: Reader on Bastion + VM, NIC action	Role assignments	“Authorization failed” before connect
NSG allows the flow set	Section 7 table	Connect hangs / times out
For `--auth-type AAD`	`AADSSHLoginForLinux` extension + Virtual Machine User Login role	Falls back / auth fails

4. Shareable links and IP-based connections

Two Standard-and-up features cover the awkward access scenarios that NSG rules cannot. They look similar but solve different problems — one is about who (a person with no Azure account), the other about what (a target that isn’t an Azure VM resource):

Feature	Solves	Auth against	Target identified by	Lifecycle risk
Shareable link	Third party with no Azure account	The target VM’s own creds	VM resource ID	A standing link = standing exposure
IP-based connection	Non-Azure / peered private-IP host	Whatever the host uses	Private IP address	Reaches anything routable from the subnet

Shareable links generate a URL that lets a user connect to a specific VM via RDP/SSH without an Azure account or portal access. They authenticate against the target VM’s own credentials (local username/password or key), not against Entra. This is the sane answer to “the vendor needs to RDP into the staging box for two days” — far better than cutting a temporary public IP and an NSG hole. Create the link scoped to one VM:

az network bastion create-shareable-link \
  --name "$BASTION" \
  --resource-group "$RG" \
  --vm-id "/subscriptions/<sub>/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm-staging-01"

When the engagement ends, revoke it — do not let it rot:

az network bastion delete-shareable-link \
  --name "$BASTION" --resource-group "$RG" \
  --vm-id "<vm-staging-01-id>"

# List standing links so you can audit and prune them on a schedule
az network bastion list-shareable-link \
  --name "$BASTION" --resource-group "$RG" -o table

The shareable-link governance rules — treat every link as a time-boxed grant:

Concern	Practice
Scope	One link per VM; never a blanket grant
Duration	Time-box with a calendar reminder or automation that deletes on schedule
Auth	The target VM’s own credentials — keep those strong and rotated
Audit	`BastionAuditLogs` records link sessions; review them
Revocation	`delete-shareable-link` the moment the engagement ends
Inventory	`list-shareable-link` periodically; prune anything stale

IP-based connections let Bastion reach a target by private IP rather than Azure resource ID. That unlocks non-Azure targets reachable over the same network fabric — on-premises servers across ExpressRoute/VPN, or VMs in a peered VNet — so the same broker serves your hybrid estate. Enable the feature on the host first:

az network bastion update \
  --name "$BASTION" --resource-group "$RG" \
  --enable-ip-connect true

# Then connect to a private IP (e.g. an on-prem host over ExpressRoute)
az network bastion ssh \
  --name "$BASTION" --resource-group "$RG" \
  --target-ip-address 10.50.4.20 \
  --auth-type ssh-key --username opsadmin --ssh-key ~/.ssh/onprem_ed25519

What IP-based connection can and cannot reach — the routability rule:

Target	Reachable by IP-connect?	Condition
VM in the same VNet	Yes	Routable from `AzureBastionSubnet`
VM in a directly peered spoke	Yes	Peering with forwarded traffic allowed
On-prem host over ExpressRoute/VPN	Yes	Route exists hub→on-prem; no NSG drop
Host in a VNet peered only to a spoke	No	Peering is non-transitive (Section 6)
Public internet host	No	Bastion brokers private targets only

5. Session recording, audit logging, and Just-in-Time

Session recording (Premium only) captures the graphical RDP/SSH session as video. On disconnect, recordings land in a blob container in your storage account via a SAS URL, and you replay them from the Bastion Session Recording blade. This is the artifact auditors ask for in PCI/HIPAA/SOC 2 estates: who connected to which host, when, and what they did on screen. Point it at an immutable, customer-managed-key storage account so the evidence cannot be tampered with after the fact.

What session recording captures and how to harden the destination:

Aspect	Detail	Hardening
What’s captured	Graphical RDP/SSH session as video	—
Where it lands	Blob container in your storage account	Lock down with private endpoint + RBAC
Delivery	SAS URL on disconnect	Short SAS lifetime; least-privilege
Tamper evidence	Blob immutability (WORM) policy	Time-based retention lock
Encryption	Customer-managed keys (CMK) in Key Vault	Rotate the key; restrict KV access
Replay	Bastion Session Recording blade	RBAC-gate who can replay
Gap	SSH text sessions captured as screen video, not keystroke log	Pair with guest-side `auditd`/transcript if you need text

For audit logging, every Bastion session emits a diagnostic event. Stream BastionAuditLogs to Log Analytics and you have the connection ledger:

az monitor diagnostic-settings create \
  --name diag-bastion \
  --resource "/subscriptions/<sub>/resourceGroups/$RG/providers/Microsoft.Network/bastionHosts/$BASTION" \
  --logs '[{"category":"BastionAuditLogs","enabled":true}]' \
  --workspace "/subscriptions/<sub>/resourceGroups/rg-monitor/providers/Microsoft.OperationalInsights/workspaces/law-platform"

resource bastionDiag 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  name: 'diag-bastion'
  scope: bastion
  properties: {
    workspaceId: lawId
    logs: [ { category: 'BastionAuditLogs', enabled: true } ]
  }
}

Now you can ask real questions in KQL — for example, every session in the last day with source IP, target, and protocol:

BastionAuditLogs
| where TimeGenerated > ago(1d)
| extend p = parse_json(Properties)
| project TimeGenerated,
          UserName       = tostring(p.userName),
          ClientIp       = tostring(p.clientIpAddress),
          TargetVm       = tostring(p.targetVMIPAddress),
          Protocol       = tostring(p.protocol),
          Message        = tostring(p.message)
| order by TimeGenerated desc

The audit questions you’ll actually ask, and the one query shape for each:

Question	Filter / aggregation
Who connected in the last 24h?	`summarize by UserName, TargetVm`
Which targets are hit most?	`summarize count() by TargetVm`
Any connections from an unexpected source IP?	`where ClientIp !in (<known ranges>)`
RDP vs SSH split	`summarize count() by Protocol`
Off-hours access (e.g. 00:00–05:59 UTC)	`where hourofday(TimeGenerated) between (0 .. 5)`
Failed / disconnected sessions	`where Message has_any ("failed","disconnect")`

Just-in-Time (JIT) VM access is complementary, and the pairing is the point. JIT (a Microsoft Defender for Cloud feature) keeps the VM’s management ports closed in the NSG and opens them only for an approved, time-boxed request from a specific source. Because Bastion connects from inside the VNet (its scale units sit in AzureBastionSubnet), your JIT rule grants that subnet rather than an engineer’s roaming public IP — so the port opens just-in-time and only to the broker, never to the internet.

# Request JIT access; the allowed source is the Bastion subnet range, not a public IP
az security jit-policy initiate \
  --resource-group rg-app \
  --location "$LOC" \
  --name default \
  --vm-id "/subscriptions/<sub>/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm-linux-01" \
  --ports '[{"number":22,"duration":"PT2H","allowedSourceAddressPrefix":"10.0.255.0/26"}]'

The Bastion + JIT pairing, knob by knob:

JIT field	Value with Bastion	Why
`allowedSourceAddressPrefix`	The `AzureBastionSubnet` CIDR (e.g. `10.0.255.0/26`)	Opens the port to the broker, never a roaming IP
`number`	22 (SSH) / 3389 (RDP)	The management port the NSG keeps shut by default
`duration`	`PT1H`–`PT2H` typical	Time-box; auto-closes after
Approval	Defender request (optionally with approver)	Adds a human gate to access
NSG default	Deny `22`/`3389` inbound	Ports closed until JIT opens them
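
To keep the JIT source prefix from drifting away from the real subnet, the ports payload can be built from a lookup instead of a hard-coded CIDR. A minimal sketch: the helper name `jit_ports_json` is hypothetical, the payload shape is copied from the command above, and the `az` lookup is shown only as a comment with placeholder variables.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the JIT ports payload for one management port.
# Assumption: the JSON shape mirrors the jit-policy command shown above.
jit_ports_json() {
  local port=$1 duration=$2 cidr=$3
  # Only the two management ports that Bastion brokers make sense here.
  case "$port" in
    22|3389) ;;
    *) echo "port must be 22 or 3389" >&2; return 1 ;;
  esac
  printf '[{"number":%d,"duration":"%s","allowedSourceAddressPrefix":"%s"}]\n' \
    "$port" "$duration" "$cidr"
}

# In real use, look the CIDR up rather than typing it:
#   CIDR=$(az network vnet subnet show -g "$RG" --vnet-name "$VNET" \
#            -n AzureBastionSubnet --query addressPrefix -o tsv)
CIDR="10.0.255.0/26"   # sample value matching the examples in this guide
jit_ports_json 22 PT2H "$CIDR"
```

The printed string can be passed as the ports argument, so the grant always names whatever range `AzureBastionSubnet` actually has.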

6. Hub-and-spoke reuse with peering and centralized Bastion

You do not deploy a Bastion per VNet — that multiplies cost and operational surface for no benefit. Deploy one Bastion in the hub and let peered spokes ride it. Because Standard/Premium honour VNet peering, a centralized host reaches VMs in every directly connected spoke.

Per-spoke versus centralized, side by side:

Dimension	Bastion per spoke	One Bastion in the hub
Hosts to operate	N	1
Hourly + scale-unit cost	N ×	1 ×
`AzureBastionSubnet`s to manage	N	1
NSG flow sets to maintain	N	1
Audit/log streams	N	1 (central)
Reach	Each VNet only	All directly peered spokes
Recommended	No	Yes

Four requirements make the centralized model work:

Requirement	Setting	If missing
Peering both directions	`--allow-vnet-access true` on both sides	Spoke unreachable
Forwarded traffic allowed	`--allow-forwarded-traffic true` on both sides	Brokered traffic dropped transiting the hub
No NSG drop hub→spoke	Allow brokered `22`/`3389` from the subnet	Connect times out
Direct (single-hop) peering	Spoke peered to the hub, not via another spoke	Non-transitive: Bastion can’t reach it

# Hub <-> spoke peering, both directions, forwarded traffic allowed
az network vnet peering create \
  --name hub-to-spoke-app \
  --resource-group rg-hub-network \
  --vnet-name vnet-hub \
  --remote-vnet "/subscriptions/<sub>/resourceGroups/rg-spoke-app/providers/Microsoft.Network/virtualNetworks/vnet-spoke-app" \
  --allow-vnet-access true \
  --allow-forwarded-traffic true

az network vnet peering create \
  --name spoke-app-to-hub \
  --resource-group rg-spoke-app \
  --vnet-name vnet-spoke-app \
  --remote-vnet "/subscriptions/<sub>/resourceGroups/rg-hub-network/providers/Microsoft.Network/virtualNetworks/vnet-hub" \
  --allow-vnet-access true \
  --allow-forwarded-traffic true
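
Once both peerings exist, it helps to verify the flags mechanically rather than by eye. The sketch below is an assumption-laden helper (the name `check_peerings` and the output format are illustrative): it reads tab-separated rows of name, peering state, VNet access and forwarded traffic, and flags any row that does not meet the requirements in the table above. The sample rows stand in for real `az` output.

```shell
#!/usr/bin/env bash
# Hypothetical checker. Input (stdin), one peering per line, tab-separated:
#   name  peeringState  allowVirtualNetworkAccess  allowForwardedTraffic
check_peerings() {
  local name state access fwd bad=0
  while IFS=$'\t' read -r name state access fwd; do
    [ -z "$name" ] && continue
    # Booleans may arrive as True/False or true/false; normalise the case first.
    access=$(printf '%s' "$access" | tr '[:upper:]' '[:lower:]')
    fwd=$(printf '%s' "$fwd" | tr '[:upper:]' '[:lower:]')
    if [ "$state" = "Connected" ] && [ "$access" = "true" ] && [ "$fwd" = "true" ]; then
      echo "OK   $name"
    else
      echo "FAIL $name (state=$state vnetAccess=$access forwarded=$fwd)"
      bad=1
    fi
  done
  return "$bad"
}

# Real input would come from:
#   az network vnet peering list -g rg-hub-network --vnet-name vnet-hub \
#     --query "[].[name, peeringState, allowVirtualNetworkAccess, allowForwardedTraffic]" -o tsv
printf 'hub-to-spoke-app\tConnected\tTrue\tTrue\nhub-to-spoke-data\tConnected\tTrue\tFalse\n' \
  | check_peerings || echo "fix the FAIL rows before testing Bastion"
```

Run it against both the hub side and the spoke side, since a peering that is correct in one direction only still strands the spoke.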

One caveat worth flagging loudly: Bastion does not traverse a second hop. Peering is non-transitive — if a spoke is peered to the hub but the actual VM lives in a VNet peered only to that spoke, Bastion will not reach it. The reachability matrix:

Topology	Bastion in hub reaches it?	Fix if no
VM in the hub VNet	Yes	—
VM in a spoke directly peered to the hub	Yes	—
VM in a spoke peered only to another spoke	No	Peer that spoke directly to the hub
VM behind a VNet peered to a spoke (2nd hop)	No	Direct hub peering, or use Virtual WAN routing
On-prem host over ExpressRoute from hub	Yes (IP-connect)	Ensure route + no NSG drop

Connect spokes to the hub directly or, for a genuinely meshed estate, move to Virtual WAN, where the managed hub handles the routing (see Hub-spoke vs Virtual WAN enterprise topology).

7. Hardening: NSGs, Conditional Access, and RBAC

Bastion shrinks the attack surface, but it is not a free pass. Three layers, each with its own table.

NSG on AzureBastionSubnet

Bastion requires a specific set of flows, and an over-zealous NSG will silently break it — the session simply hangs with no obvious error. The mandatory flow set, exhaustively:

Direction	Priority (suggested)	Source	Source port	Destination	Dest port	Why
Inbound	120	`Internet`	*	`AzureBastionSubnet`	443	HTTPS from clients + control plane
Inbound	130	`GatewayManager`	*	`AzureBastionSubnet`	443, 4443	Control-plane management
Inbound	140	`AzureLoadBalancer`	*	`AzureBastionSubnet`	443	Health probes
Inbound	150	`VirtualNetwork`	*	`AzureBastionSubnet`	8080, 5701	Data-plane between instances
Outbound	100	`AzureBastionSubnet`	*	`VirtualNetwork`	22, 3389	Reach target VMs
Outbound	110	`AzureBastionSubnet`	*	`AzureCloud`	443	Dependencies (diagnostics, etc.)
Outbound	120	`AzureBastionSubnet`	*	`VirtualNetwork`	8080, 5701	Data-plane between instances
Outbound	130	`AzureBastionSubnet`	*	`Internet`	80	Session/cert validation

# Three of the eight rules, shown as examples; the remaining rows of the table follow the same pattern
az network nsg rule create -g "$RG" --nsg-name nsg-bastion \
  --name Allow-HTTPS-Inbound --priority 120 --direction Inbound --access Allow \
  --protocol Tcp --source-address-prefixes Internet \
  --destination-port-ranges 443 --destination-address-prefixes '*'

az network nsg rule create -g "$RG" --nsg-name nsg-bastion \
  --name Allow-GatewayManager-Inbound --priority 130 --direction Inbound --access Allow \
  --protocol Tcp --source-address-prefixes GatewayManager \
  --destination-port-ranges 443 4443 --destination-address-prefixes '*'

az network nsg rule create -g "$RG" --nsg-name nsg-bastion \
  --name Allow-SshRdp-Outbound --priority 100 --direction Outbound --access Allow \
  --protocol Tcp --source-address-prefixes '*' \
  --destination-port-ranges 22 3389 --destination-address-prefixes VirtualNetwork
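
The whole flow table can also be kept as data and turned into commands, which makes it harder to forget a row. This is a dry-run sketch only: it prints the commands for review and does not call Azure. The rule names beyond the three above are illustrative, the `'*'` placeholders follow the example commands rather than the table's `AzureBastionSubnet` cells, and using `Tcp` for every row is an assumption carried over from those examples.

```shell
#!/usr/bin/env bash
# Dry-run generator: prints one `az network nsg rule create` per row of the flow table.
# Fields: name|priority|direction|source|destination|ports
# Assumptions: illustrative rule names; every rule uses Tcp like the examples above.
RULES='
Allow-HTTPS-Inbound|120|Inbound|Internet|*|443
Allow-GatewayManager-Inbound|130|Inbound|GatewayManager|*|443 4443
Allow-LoadBalancer-Inbound|140|Inbound|AzureLoadBalancer|*|443
Allow-BastionHost-Inbound|150|Inbound|VirtualNetwork|*|8080 5701
Allow-SshRdp-Outbound|100|Outbound|*|VirtualNetwork|22 3389
Allow-AzureCloud-Outbound|110|Outbound|*|AzureCloud|443
Allow-BastionHost-Outbound|120|Outbound|*|VirtualNetwork|8080 5701
Allow-Http-Outbound|130|Outbound|*|Internet|80
'

emit_nsg_rules() {
  local rg=$1 nsg=$2 name prio dir src dst ports
  while IFS='|' read -r name prio dir src dst ports; do
    [ -z "$name" ] && continue   # skip blank lines around the table
    echo "az network nsg rule create -g $rg --nsg-name $nsg" \
         "--name $name --priority $prio --direction $dir --access Allow" \
         "--protocol Tcp --source-address-prefixes '$src'" \
         "--destination-address-prefixes '$dst' --destination-port-ranges $ports"
  done <<< "$RULES"
}

emit_nsg_rules rg-hub-network nsg-bastion
```

Review the eight printed lines against the table, then pipe them to `bash` once they match what you intend to deploy.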

The NSG failure modes — match the symptom to the missing rule:

Symptom	Missing / wrong rule	Confirm	Fix
Portal “Connect” hangs, never loads	Inbound 443 from `Internet` blocked	NSG effective rules	Allow 443 inbound from `Internet`
Deploy succeeds but no sessions work	Inbound 443/4443 from `GatewayManager` blocked	Effective rules	Allow `GatewayManager` 443,4443
Sessions drop intermittently	Inter-instance 8080/5701 blocked	Effective rules	Allow `VirtualNetwork` 8080,5701 both ways
Connects to portal, can’t reach VM	Outbound 22/3389 to `VirtualNetwork` blocked	Effective rules	Allow outbound 22,3389 to `VirtualNetwork`
Flaky after a UDR change	0.0.0.0/0 route to an NVA blackholes egress	`az network nic show-effective-route-table`	Exempt Bastion egress from force-tunnel

Conditional Access

Native client and Entra-based SSH authenticate through Microsoft Entra ID, which means Conditional Access applies. Require MFA and a compliant device on the Azure management surface and you have gated every native Bastion session behind your phishing-resistant posture — without touching a single VM. What CA can enforce on the session path:

CA control	Effect on Bastion session	Notes
Require MFA	Native-client / Entra ID session needs MFA	Gates the management plane
Require compliant / hybrid-joined device	Block sessions from unmanaged laptops	Strong control for admin access
Block legacy auth	Removes weak auth paths	Baseline hygiene
Named locations / IP ranges	Restrict where sessions originate	Combine with phishing-resistant MFA
Sign-in risk (Identity Protection)	Step-up or block risky sessions	Needs Entra ID P2
Session controls (sign-in frequency)	Re-auth for long sessions	Limits stale-session risk

RBAC

The ability to connect is an RBAC outcome. A user needs Reader on the Bastion, Reader on the VM, and the relevant data-plane action on the NIC; for Entra SSH/RDP they also need the VM login role. The least-privilege role set:

Action the user needs	Role / permission	Scope	Don’t over-grant
See and use Bastion	`Reader`	Bastion resource	—
See the target VM	`Reader`	VM	—
Connect through the NIC	`…/virtualNetworks/subnets/...` + NIC read action (custom connect role)	RG / VM	Not Contributor
SSH/RDP as a user (Entra)	`Virtual Machine User Login`	VM	Prefer over Admin Login
SSH/RDP as an admin (Entra)	`Virtual Machine Administrator Login`	VM	Only where truly needed
Manage shareable links	Bastion write actions	Bastion	Restrict to platform team

Scope Reader plus a custom connect role at the resource-group level; do not hand out Virtual Machine Administrator Login where Virtual Machine User Login will do. Grant the login role through PIM so even the connect right is itself just-in-time — see PIM for Azure resources. For the broader access model this sits inside, Microsoft Entra RBAC governance deep dive is the parent.
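
The built-in part of that role set can be scripted so every operator gets the same three assignments. A hedged sketch: the helper name `grant_connect`, the user, and the resource names are placeholders, the custom connect role is left out because its definition is tenant-specific, and the function only prints the `az role assignment create` commands instead of running them.

```shell
#!/usr/bin/env bash
# Dry run: print the built-in role assignments for one user and one VM.
# Assumptions: hypothetical helper name; the custom connect role is not covered here.
grant_connect() {
  local user=$1 bastion_id=$2 vm_id=$3
  # printf repeats the format once for each group of three arguments.
  printf 'az role assignment create --assignee %q --role %q --scope %q\n' \
    "$user" "Reader" "$bastion_id" \
    "$user" "Reader" "$vm_id" \
    "$user" "Virtual Machine User Login" "$vm_id"
}

SUB="/subscriptions/00000000-0000-0000-0000-000000000000"   # placeholder subscription
grant_connect ops@example.com \
  "$SUB/resourceGroups/rg-hub-network/providers/Microsoft.Network/bastionHosts/bastion-hub" \
  "$SUB/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm-linux-01"
```

Where PIM governs the login role, treat the third line as the assignment to make eligible rather than permanent.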

8. Cost optimization and decommissioning the jump boxes

Bastion bills an hourly host rate plus scale-unit and data charges (the first 5 GB/month of outbound is free). The cost levers — pull these before you pay for headroom you don’t use:

Lever	Action	Effect
One host in a hub	Replace N per-spoke hosts with 1 centralized	N× → 1× hourly + scale-unit
Right-size scale units	Set `--scale-units` to observed peak concurrency	Stop paying for idle instances
Developer SKU for sandboxes	Use the free tier where peering isn’t needed	Genuinely free
Kill workload public IPs	Strip standing public IPs once Bastion proven	Removes billed IPs and attack surface
Delete jump-box VMs	Decommission the compute you no longer run	Removes 24/7 VM + IP cost
Standard over Premium	Drop to Standard if you owe no recording/private-only	Lower hourly rate

The wins, in prose: one Premium host in a hub is cheaper and safer than N Basic hosts across spokes — and the N jump-box VMs you delete were each costing compute 24/7 plus their public IPs. Right-size scale units — do not run 50 instances for a team of five; set --scale-units to observed peak concurrency. Developer SKU for sandboxes that do not need peering reach is genuinely free. And kill the public IPs — every standing public IP on a workload is a billed resource and an attack surface.

Decommission a legacy jump box methodically — the order matters so you can roll back if something was missed:

Step	Action	Why this order
1	Stand up Bastion (right SKU, subnet, NSG, scale units)	The replacement must exist first
2	Grant RBAC + (optionally) Entra VM login to users	They can’t cut over without access
3	Cut users over to Bastion for daily access	Prove the new path under real use
4	Confirm `BastionAuditLogs` shows them connecting	Evidence the path works before you remove the old one
5	Remove the jump box’s NSG inbound rules	Close the internet path
6	Dissociate the public IP from the VM NIC	Reversible if you missed a workflow
7	Delete the public IP, then the jump-box VM	Final cleanup once proven

# Strip the public IP off a VM NIC once Bastion access is proven
az network nic ip-config update \
  --resource-group rg-app --nic-name nic-jumpbox-01 \
  --name ipconfig1 --remove publicIpAddress

az network public-ip delete -g rg-app --name pip-jumpbox-01

Architecture at a glance

The diagram traces a native-client connection as it actually flows, left to right, and marks the four hops where a session most commonly breaks. Read it this way: an engineer’s CLI (or the portal) opens a session over TLS 443 — the only inbound surface — which is gated first by Microsoft Entra (RBAC, Conditional Access, PIM) before anything reaches the network. The request lands on the Bastion host in AzureBastionSubnet (/26, Standard SKU, tunneling on, scale units sized to concurrency), whose NSG must permit the precise 443/4443 inbound and 22/3389 outbound flow set or the session hangs silently. From the subnet, Bastion brokers over the VM’s private IP — across VNet peering (forwarded-traffic on, single hop only) when the target is a spoke — to the workload VM, which now carries no public IP and keeps 22/3389 shut except when Defender JIT opens them to the broker subnet alone. The control path also fans to storage (session recordings, immutable + CMK on Premium) and Log Analytics (BastionAuditLogs), which is how the loop closes with your auditor.

The numbered badges sit on the failure-prone hops: an Entra/RBAC denial before connect, the NSG flow set that breaks the broker, the non-transitive peering hop that strands a spoke VM, and the JIT/private-IP contract on the target. The legend narrates each as symptom · how to confirm · fix — the same method as the rest of this guide: localise the break to a hop, confirm with the named command, apply the fix. The first question on any hung session is always “did identity allow it, and does the NSG let the broker through?”

Real-world scenario

A payments platform team I worked with (call them Meridian Pay) ran a hub-and-spoke estate across three regions with roughly 400 VMs, and PCI-DSS forced two hard constraints: no workload VM may carry a public IP, and every interactive admin session must be recorded and retained. Their interim state was four jump boxes per region, all internet-facing with RDP open behind NSGs, each on a Standard_D2s_v5 costing ~₹9,000/month plus its public IP. The QSA flagged them as in-scope cardholder-data-environment (CDE) ingress with no session evidence. Twelve jump boxes across three regions, twelve public IPs, twelve hosts to patch, and a finding that would not close.

We collapsed all four jump boxes per region into a single Premium Bastion in each regional hub (Premium for the session-recording requirement) with --scale-units 8 to cover ~120 concurrent operators per region during a release window. The spokes were already peered to the hub, so the only networking change was confirming --allow-forwarded-traffic true on both sides of each peering — no new subnets beyond the three AzureBastionSubnet /26s in the hubs. Session recordings were written to a storage account with a time-based immutability (WORM) policy and customer-managed keys, satisfying the tamper-evidence requirement, and BastionAuditLogs streamed to a central Log Analytics workspace gave the QSA the connection ledger they wanted, queryable by user, target and time.

The sharp edge was that the QSA also required admin ports stay closed except during approved access — recording alone was not enough. We wired Defender for Cloud JIT so the NSG kept 22/3389 shut, and the JIT grant opened them only to the hub’s AzureBastionSubnet range, never to a public source. Because Bastion brokers from inside the VNet, the source prefix on the JIT rule was the subnet, not an engineer’s roaming IP:

az security jit-policy initiate \
  --resource-group rg-spoke-payments --location eastus --name default \
  --vm-id "/subscriptions/<sub>/resourceGroups/rg-spoke-payments/providers/Microsoft.Compute/virtualMachines/vm-pay-07" \
  --ports '[{"number":3389,"duration":"PT1H","allowedSourceAddressPrefix":"10.0.255.0/26"}]'

One non-transitive-peering trap nearly bit them: a late-discovered “analytics” VNet was peered only to the payments spoke, not the hub, so the hub Bastion could not reach its two VMs — sessions just timed out with no error. The fix was a direct hub↔analytics peering with forwarded traffic, after which the host reached them immediately. The lesson went on the runbook: “If a VM is two peering hops from the hub, Bastion can’t see it — peer it directly.”

The net result: zero public IPs on workloads, ports closed by default and opened just-in-time to the broker subnet alone, full session video retained immutably, and twelve internet-facing jump boxes deleted across three regions. The CDE-ingress finding closed at the next assessment, the standing public-IP cost went with it, and the monthly spend dropped — three Premium Bastions cost meaningfully less than twelve always-on jump-box VMs plus their IPs and the patching toil around them. The before/after:

Dimension	Before (jump boxes)	After (Bastion)
Internet-facing hosts	12 (4 × 3 regions)	0
Public IPs on the access path	12	0 (Premium private-only)
Workload public IPs	several	0
Admin port exposure	`3389` open behind NSG 24/7	Closed; JIT opens to broker subnet only
Session evidence	None (host-local logs)	Immutable CMK video + central audit log
Hosts to patch	12	0 (3 managed PaaS hosts, patched by Microsoft)
QSA finding	Open (CDE ingress)	Closed

Advantages and disadvantages

Bastion’s managed-broker-inside-the-VNet model both removes a class of risk and introduces a few sharp edges. Weigh it honestly:

Advantages (why this model helps you)	Disadvantages (why it bites)
No public IP or open `3389`/`22` on workloads — the whole exposure class disappears	The Bastion host itself is a public surface on `443` unless you pay for Premium private-only
Agentless and managed: Microsoft patches the broker, not you	You give up the simplicity (and the low cost) of a single VM you fully control
Native client tunneling runs real `scp`/Ansible/`mstsc` — not just an HTML5 canvas	Tunneling needs Standard+ and an explicit flag and the SSH CLI extension — easy to miss
Shareable links grant vendor access with no Azure account or NSG hole	A standing shareable link is a standing exposure; you must time-box and revoke
Sessions authenticate through Entra → Conditional Access + PIM apply estate-wide	The SKU is a one-way ratchet; a wrong choice means delete-and-redeploy
Premium records sessions to immutable CMK storage for audit	Recording is graphical video, not a keystroke/text log — pair with guest auditing for text
One host in a hub serves all directly-peered spokes	Peering is non-transitive — a second hop strands the target with a silent timeout
`BastionAuditLogs` gives a central, queryable connection ledger	Defaults are unsafe: tunneling off, no NSG tuning, broad RBAC — you must turn the knobs

The model is right for any estate that wants no public IPs on workloads and identity-governed, auditable admin access — which is most regulated and most security-mature shops. It is overkill for a single throwaway dev VNet where the free Developer SKU or even a short-lived jump box suffices. The disadvantages are all manageable, but only if you know they exist: the SKU ratchet, the explicit tunneling flag, the NSG flow set, the non-transitive hop, and the standing-link risk are exactly the things defaults will not handle for you.

Hands-on lab

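Stand up a Standard Bastion, connect to a Linux VM with native tunneling, prove scp works, then tear it all down. Free-tier-friendly where possible (the VM is a small B1s; Bastion Standard bills hourly, so delete at the end). Run it from a local Bash shell with the Azure CLI installed: Cloud Shell is fine for Steps 1 to 4, but native client connections (Step 5) are not supported from Cloud Shell.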

Step 1 — Variables and resource group.

RG=rg-bastion-lab
LOC=eastus
VNET=vnet-lab
BASTION=bastion-lab
VM=vm-linux-lab
az group create -n $RG -l $LOC -o table

Step 2 — VNet with a workload subnet and the mandatory AzureBastionSubnet (/26).

az network vnet create -g $RG -n $VNET --address-prefixes 10.0.0.0/16 \
  --subnet-name snet-workload --subnet-prefixes 10.0.1.0/24 -o table

az network vnet subnet create -g $RG --vnet-name $VNET \
  --name AzureBastionSubnet --address-prefixes 10.0.255.0/26 -o table

Expected: the VNet plus two subnets; the Bastion subnet named exactly AzureBastionSubnet.

Step 3 — A Linux VM with NO public IP (the whole point).

az vm create -g $RG -n $VM --image Ubuntu2204 --size Standard_B1s \
  --vnet-name $VNET --subnet snet-workload \
  --public-ip-address "" \
  --admin-username azureuser --generate-ssh-keys -o table

--public-ip-address "" ensures the VM is private-only. Expected: a VM with a private IP and "publicIpAddress": "".

Step 4 — Standard public IP for Bastion (Static), then the Bastion host with tunneling on.

az network public-ip create -g $RG -n pip-bastion-lab \
  --sku Standard --allocation-method Static -o table

az network bastion create -g $RG -n $BASTION \
  --vnet-name $VNET --public-ip-address pip-bastion-lab \
  --sku Standard --scale-units 2 --enable-tunneling true -o table

Bastion takes ~5–10 minutes to provision. Expected when done: "sku": {"name": "Standard"}, "enableTunneling": true.

Step 5 — Open a native tunnel and prove scp through the broker.

# In one terminal: localhost:50022 -> VM:22 through Bastion (leave it running)
VMID=$(az vm show -g $RG -n $VM --query id -o tsv)
az network bastion tunnel -n $BASTION -g $RG \
  --target-resource-id "$VMID" --resource-port 22 --port 50022

# In a second terminal on the same machine as the tunnel: standard ssh + scp, no public IP anywhere
ssh -p 50022 azureuser@127.0.0.1 'hostname && echo connected-via-bastion'
echo "hello from bastion" > /tmp/proof.txt
scp -P 50022 /tmp/proof.txt azureuser@127.0.0.1:/tmp/proof.txt
ssh -p 50022 azureuser@127.0.0.1 'cat /tmp/proof.txt'

Expected: the VM hostname prints, connected-via-bastion, and hello from bastion round-trips back — a file copied to a VM that has no public IP and no inbound 22 from the internet.
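
Optionally, a small SSH client config saves retyping the port and keeps host keys tidy when several VMs are reached through `127.0.0.1`. This is a sketch under stated assumptions: the file location and the `Host` alias are illustrative choices, and `HostKeyAlias` is used so the key is stored under the VM's name instead of under the loopback address.

```shell
#!/usr/bin/env bash
# Write a throwaway SSH client config for the lab tunnel on localhost:50022.
# Assumptions: illustrative file path and Host alias; adjust to taste.
# HostKeyAlias stores the host key under the VM's name, so tunnels to
# different VMs on 127.0.0.1 do not collide in known_hosts.
CFG="${TMPDIR:-/tmp}/bastion-lab.sshconfig"
cat > "$CFG" <<'EOF'
Host vm-linux-lab
  HostName 127.0.0.1
  Port 50022
  User azureuser
  HostKeyAlias vm-linux-lab
EOF
echo "wrote $CFG"

# With the tunnel from Step 5 still running:
#   ssh -F "$CFG" vm-linux-lab hostname
#   scp -F "$CFG" /tmp/proof.txt vm-linux-lab:/tmp/proof.txt
```

Any tool that shells out to `ssh` can then address the VM by alias while the tunnel is up.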

Step 6 — Turn on the audit ledger (optional but instructive).

LAW=$(az monitor log-analytics workspace create -g $RG -n law-bastion-lab --query id -o tsv)
az monitor diagnostic-settings create --name diag-bastion \
  --resource $(az network bastion show -g $RG -n $BASTION --query id -o tsv) \
  --logs '[{"category":"BastionAuditLogs","enabled":true}]' \
  --workspace "$LAW"
# Reconnect once, wait a few minutes, then query BastionAuditLogs in the workspace.

Step 7 — Teardown (do this — Bastion bills hourly).

az group delete -n $RG --yes --no-wait

The lab teardown checklist, so nothing is left billing:

Resource	Bills while it exists?	Removed by `group delete`?
Bastion host (Standard)	Yes (hourly + scale units)	Yes
Public IP (Standard)	Yes (hourly)	Yes
Linux VM (`B1s`)	Yes (compute)	Yes
VNet + subnets	No	Yes
Log Analytics workspace	Yes (ingestion/retention)	Yes

Common mistakes & troubleshooting

The failure modes that actually page you, as a symptom→root-cause→confirm→fix playbook. Scan the matrix, then read the detail below for the rows that trip people up most often; the Mistake numbers match the matrix rows.

#	Symptom	Root cause	Confirm (exact command / portal path)	Fix
1	Bastion won’t deploy	Subnet not named `AzureBastionSubnet` or smaller than `/26`	`az network vnet subnet show -n AzureBastionSubnet --query addressPrefix`	Recreate subnet: exact name, `/26`+
2	“Connect” hangs in the portal	NSG blocks inbound `443` from `Internet`/`GatewayManager`	NSG effective rules on the subnet	Add the mandatory inbound flow set (Section 7)
3	`az network bastion ssh/tunnel` errors	Tunneling not enabled, or SKU is Basic	`az network bastion show --query "{sku:sku.name,tun:enableTunneling}"`	`--enable-tunneling true` on Standard+
4	Portal works, but can’t reach the VM	NSG blocks outbound `22`/`3389` to `VirtualNetwork`	Effective rules; `az network nic show-effective-route-table`	Allow outbound `22`/`3389` to `VirtualNetwork`
5	Spoke VM unreachable, times out	Non-transitive peering (2nd hop) or no forwarded traffic	`az network vnet peering list` — check both flags	Direct hub peering + `--allow-forwarded-traffic true`
6	“Authorization failed” before connect	Missing RBAC (Reader on Bastion/VM or NIC action)	`az role assignment list --assignee <user>`	Grant Reader + custom connect role
7	Entra SSH fails, key prompt instead	Missing AADSSHLogin extension or VM login role	`az vm extension list`; check role assignments	Install extension + grant VM User Login
8	Shareable link 404s / won’t connect	Link revoked, or VM’s local creds wrong	`az network bastion list-shareable-link`	Recreate link; verify the VM’s local credentials
9	No `BastionAuditLogs` rows	Diagnostic setting missing or wrong category	`az monitor diagnostic-settings list --resource <bastion-id>`	Create setting with `BastionAuditLogs` enabled
10	Session recording empty (Premium)	Storage target misconfigured / SAS/permission issue	Session Recording blade; storage container	Fix storage target, identity, container access
11	Sessions drop randomly mid-work	Inter-instance `8080`/`5701` blocked, or zone event	Effective rules; Resource Health	Allow `VirtualNetwork` 8080/5701; deploy zonal
12	Flaky right after a routing change	`0.0.0.0/0` UDR to an NVA blackholes Bastion egress	`az network nic show-effective-route-table`	Exempt Bastion egress from force-tunnel

Mistake 1 — The subnet is wrong

The single most common deploy blocker. The subnet must be named exactly AzureBastionSubnet and be /26 or larger. A typo’d name, a /27, or other resources in the subnet all stop the deploy.

Confirm. az network vnet subnet show -g $RG --vnet-name $VNET -n AzureBastionSubnet --query addressPrefix -o tsv — if this errors, the name is wrong; if it returns a /27 or smaller, the size is wrong. Fix: recreate the subnet with the exact name and /26. You cannot grow a /27 in place into a usable Bastion subnet; delete and recreate.
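
The size check is easy to automate in a pre-deploy script. A minimal sketch, assuming a hypothetical helper name (`bastion_subnet_ok`) and that the input is the `addressPrefix` string returned by the command above; it only inspects the prefix length and says nothing about the subnet name or its contents.

```shell
#!/usr/bin/env bash
# Hypothetical pre-deploy check: is this CIDR large enough for AzureBastionSubnet?
# A shorter prefix means a larger subnet, so /26, /25, /24 ... all pass.
bastion_subnet_ok() {
  local cidr=$1
  local prefix=${cidr##*/}          # text after the last slash
  if ! [[ $prefix =~ ^[0-9]+$ ]]; then
    echo "invalid"                  # empty lookup result or no prefix length
    return 2
  fi
  if (( 10#$prefix <= 26 )); then
    echo "ok"
  else
    echo "too-small"
    return 1
  fi
}

# Real input would be:
#   bastion_subnet_ok "$(az network vnet subnet show -g "$RG" --vnet-name "$VNET" \
#       -n AzureBastionSubnet --query addressPrefix -o tsv)"
bastion_subnet_ok 10.0.255.0/26            # prints ok
bastion_subnet_ok 10.0.255.0/27 || true    # prints too-small
```

An `invalid` result usually means the lookup itself failed, which points back at the subnet name.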

Mistake 2 — The NSG silently breaks the broker

Bastion needs the precise inbound flow set (443 from Internet, 443/4443 from GatewayManager, probes from AzureLoadBalancer, inter-instance 8080/5701 from VirtualNetwork). An NSG that allows less leaves sessions hanging with no clear error.

Confirm. On the subnet’s NSG, check effective rules (portal: subnet → NSG → Effective rules), or az network nsg rule list. Fix: add the mandatory inbound and outbound rules from Section 7. The give-away is that deploy succeeded but no session ever lands — control-plane flows (GatewayManager 443/4443) are blocked.

Mistake 3 — Native subcommands fail on the right-looking host

az network bastion ssh/tunnel/rdp need Standard+ and the tunneling flag and the Azure CLI SSH extension. People deploy Standard, forget --enable-tunneling true, and the subcommands error.

Confirm. az network bastion show -g $RG -n $BASTION --query "{sku:sku.name, tunneling:enableTunneling}" -o json — you want Standard/Premium and true. Fix: az network bastion update -g $RG -n $BASTION --enable-tunneling true and az extension add --name ssh.

Mistake 5 — The non-transitive peering hop

A spoke peered to the hub is reachable; a VM two hops away (in a VNet peered only to a spoke) is not — and it fails as a silent timeout, which sends people hunting the wrong layer for an hour.

Confirm. az network vnet peering list -g <rg> --vnet-name <vnet> -o table — verify the target’s VNet is peered directly to the hub and both sides have allowForwardedTraffic: true. Fix: create a direct hub↔target peering with forwarded traffic, or move meshed routing to Virtual WAN.

Mistake 9 — The audit ledger is empty

BastionAuditLogs only flow if a diagnostic setting routes them. No setting, no rows — and you discover this when the auditor asks for evidence you never captured.

Confirm. az monitor diagnostic-settings list --resource $(az network bastion show -g $RG -n $BASTION --query id -o tsv) -o json. Fix: create the setting with the BastionAuditLogs category enabled (Section 5), then reconnect once and wait a few minutes for ingestion.

Best practices

Pick the SKU deliberately and once. Standard is the platform baseline; go Premium only for session recording or private-only. Remember it is a one-way upgrade — get it right before you build around it.
Subnet /26, exact name, nothing else. Lay AzureBastionSubnet at /26 from day one so scale units have room; never co-locate other resources.
Deploy zone-redundant. Pin instances across zones 1/2/3 where supported; the zone selection cannot be changed after deployment, and a zonal event must not sever your break-glass access.
Enable tunneling at create time. Set --enable-tunneling true so native ssh/tunnel/rdp work — the single most-forgotten flag.
Size scale units to peak concurrency, not VM count. ~20 RDP / ~40 SSH per unit; right-size and stop paying for idle instances.
One host in a hub, not one per VNet. Centralize in the hub, peer spokes directly with forwarded traffic, and never rely on a transitive second hop.
Tune the NSG to the exact flow set. Allow precisely the required 443/4443/8080/5701 inbound and 22/3389/443/80/8080/5701 outbound: no more, no less.
Authenticate with Entra where you can. Prefer --auth-type AAD so Conditional Access and PIM govern the session and there are no SSH keys to leak.
Time-box every shareable link. Scope per-VM, set a revocation reminder or automation, and audit standing links regularly.
Pair Bastion with Defender JIT. Keep 22/3389 closed in the NSG and open them just-in-time to the AzureBastionSubnet range only.
Stream BastionAuditLogs and (Premium) record sessions. Send the ledger to Log Analytics and write recordings to immutable, CMK-encrypted storage.
Decommission jump boxes in order. Prove Bastion access in the audit log before you dissociate and delete any public IP, so a rollback is always possible.

Security notes

Bastion is itself a security control, so harden it as one.

Network isolation: the broker lives in the VNet and reaches targets over private IPs. Strip every workload public IP and keep 22/3389 shut, opening them only via JIT to the broker subnet; on Premium, run the host private-only so even 443 is not internet-facing.
Identity is the real perimeter: the right to connect is RBAC, so grant least privilege (Reader + a custom connect role, Virtual Machine User Login over Admin Login), put the login role behind PIM, and gate native sessions with Conditional Access requiring MFA and a compliant device.
Encryption and evidence: Bastion brokers RDP/SSH over TLS, and on Premium session recordings should land in storage with a time-based immutability (WORM) policy and customer-managed keys so the audit trail cannot be altered after the fact.
Least exposure for third parties: prefer time-boxed shareable links over any public-IP/NSG hole, scope them to one VM, and revoke on completion.
Audit everything: BastionAuditLogs to a central workspace gives the who/when/what ledger; alert on off-hours or unexpected-source connections.

This fits the broader Azure Zero Trust multilayer security model: “no public IPs, identity-governed, audited access” is precisely the network-and-access pillar of Zero Trust.

The security-control checklist, each with its lever:

Control objective	Bastion lever	Verify with
No public IP on workloads	Private-IP brokering; strip VM IPs	`az network nic ip-config show`
No public IP on the broker	Premium private-only deployment	Bastion config
Management ports closed by default	Defender JIT to subnet range	NSG rules; JIT policy
Least-privilege connect	Reader + custom role; VM User Login	`az role assignment list`
Just-in-time elevation	PIM on the VM login role	PIM blade
Phishing-resistant session access	Conditional Access (MFA + compliant device)	CA policy
Tamper-evident session evidence	Premium recording + WORM + CMK	Storage immutability policy
Central audit ledger	`BastionAuditLogs` to Log Analytics	Diagnostic settings

Cost & sizing

Bastion bills three things: an hourly host rate (per SKU), a per-scale-unit hourly rate above the included instances, and outbound data (first 5 GB/month free). The host runs 24/7 once deployed (it does not auto-pause), so the dominant lever is whether this VNet needs its own host at all, and centralizing in the hub usually answers that. What drives the bill:

Cost driver	Scales with	Lever to control it
Hourly host rate	SKU tier (Basic < Standard < Premium)	Use the lowest SKU that meets requirements
Scale-unit hours	`--scale-units` above the included count	Right-size to peak concurrency
Outbound data	GB transferred (after 5 GB free)	Usually negligible for admin sessions
Number of hosts	One per VNet vs one per hub	Centralize: one host serves all spokes
Workload public IPs (saved)	IPs you delete	Stripping them reduces cost

Rough figures (illustrative; check the Azure pricing calculator for your region). The right-sizing rule: pick the SKU by feature need and the scale units by peak concurrent sessions ÷ ~20 (RDP) or ÷ ~40 (SSH), rounded up, then add one unit for headroom.

Scenario	SKU	Scale units	Rough order of monthly cost	Note
Personal dev sandbox	Developer	n/a	Free	No peering; single session
Small team, one VNet, browser-only	Basic	2 (fixed)	Low (≈ a small VM)	No native client
Platform baseline, a few spokes	Standard	2–4	Moderate	Native client, links, IP-connect
Regional hub, release windows	Standard/Premium	8	Higher (Premium adds recording)	~160 concurrent RDP
Regulated estate, recorded sessions	Premium	8–10	Highest host rate + storage	WORM + CMK storage adds a little
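
The right-sizing rule can be sketched as a small function. Assumptions, stated loudly: the function name is hypothetical, the headroom step is read as ceiling division plus one spare unit, and the result is clamped to a 2 to 50 range, which is an assumed floor and ceiling and not something this table states.

```shell
#!/usr/bin/env bash
# Sketch of the sizing rule: ceil(peak / per-unit capacity) + 1 unit of headroom.
# Assumptions: hypothetical helper name; clamp to 2..50 is an assumed range.
scale_units() {
  local peak=$1 proto=$2 per_unit units
  case "$proto" in
    rdp) per_unit=20 ;;   # ~20 concurrent RDP sessions per scale unit
    ssh) per_unit=40 ;;   # ~40 concurrent SSH sessions per scale unit
    *) echo "usage: scale_units <peak-sessions> rdp|ssh" >&2; return 1 ;;
  esac
  units=$(( (peak + per_unit - 1) / per_unit + 1 ))   # ceiling division, plus headroom
  if (( units < 2 )); then units=2; fi     # assumed floor
  if (( units > 50 )); then units=50; fi   # assumed ceiling
  echo "$units"
}

scale_units 120 rdp   # prints 7
scale_units 200 ssh   # prints 6
```

Feed it the observed peak from the audit log, not the VM count, and pass the result to `--scale-units`.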

The savings side is easy to forget and often net-positive: deleting N jump-box VMs (each a 24/7 D2s-class VM plus its public IP) and stripping workload public IPs frequently outweighs the Bastion host cost, especially when one centralized host replaces several jump boxes. Free-tier note: the Developer SKU is genuinely free but cannot traverse peering — it is a sandbox tool, not platform infrastructure. For the broader picture of right-sizing shared platform services, see Azure FinOps & cost management at scale.

Interview & exam questions

Q1. Why does an Azure Bastion deployment require a subnet named exactly AzureBastionSubnet, and what’s the minimum size? The platform identifies the subnet by that literal name — it will not deploy into a differently named subnet. The minimum size for any Bastion created on or after 2 November 2021 is /26; smaller (e.g. /27) is rejected, and a grandfathered /27 cannot scale host instances. Maps to AZ-700 / AZ-500 networking objectives.

Q2. You deployed Standard Bastion but az network bastion ssh fails. What’s the most likely cause? Tunneling is not enabled. Native client subcommands need Standard+ and --enable-tunneling true (plus the Azure CLI SSH extension). Set the flag with az network bastion update --enable-tunneling true and confirm enableTunneling: true.

Q3. Can you downgrade a Bastion from Premium to Standard? No. SKU changes are upgrade-only (Basic→Standard→Premium). To move to a lower tier you must delete and redeploy. This is why the SKU choice must be deliberate up front.

Q4. A vendor with no Azure account needs RDP to one staging VM for two days. What’s the right Bastion feature, and what does it authenticate against? A shareable link (Standard+), scoped to that one VM. It authenticates against the target VM’s own credentials (local username/password or key), not Entra. Time-box it and revoke with delete-shareable-link when the engagement ends.

Q5. Your hub Bastion can’t reach a VM in a spoke. The spoke is peered, but only to another spoke. Why does it fail, and how do you fix it? VNet peering is non-transitive — Bastion does not traverse a second hop. The VM’s VNet must be peered directly to the hub (with allowForwardedTraffic on both sides), or you move meshed routing to Virtual WAN. The failure presents as a silent timeout.
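
The one-hop rule is easy to model. The sketch below treats peerings as undirected pairs and returns only the VNets directly peered to the hub; the VNet names and the function are hypothetical, and it deliberately ignores the forwarded-traffic flag and Virtual WAN.

```python
def bastion_reachable(hub, peerings):
    """Return the VNets a hub Bastion can reach: the hub plus direct peers."""
    reachable = {hub}
    for a, b in peerings:
        # Only peerings that touch the hub count; there is no second hop.
        if a == hub:
            reachable.add(b)
        elif b == hub:
            reachable.add(a)
    return reachable


if __name__ == "__main__":
    peerings = [("hub", "spoke-a"), ("spoke-a", "spoke-b")]
    reach = bastion_reachable("hub", peerings)
    print("spoke-a" in reach)  # True: directly peered to the hub
    print("spoke-b" in reach)  # False: two hops away
```

Feeding the function your actual peering list gives a quick answer to "which VNets still need a direct hub peering" before anyone files a timeout ticket.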

Q6. How do you keep a VM’s management ports closed yet still let Bastion connect? Pair Bastion with Defender for Cloud JIT: the NSG denies 22/3389 by default, and the JIT grant opens them time-boxed to the AzureBastionSubnet CIDR only — never a roaming public IP — because Bastion brokers from inside the VNet.
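
A JIT request can be linted before it is approved. The sketch below is a minimal check, with a hypothetical function name, that the requested source prefix falls inside the AzureBastionSubnet CIDR; wildcard sources are treated as failures. The 10.0.255.0/26 range is only an example value.

```python
import ipaddress

WILDCARD_SOURCES = {"*", "Any", "Internet"}


def jit_source_ok(source_prefix, bastion_subnet_cidr):
    """True only if the JIT source lies inside the Bastion subnet."""
    if source_prefix in WILDCARD_SOURCES:
        return False
    source = ipaddress.ip_network(source_prefix, strict=False)
    bastion = ipaddress.ip_network(bastion_subnet_cidr)
    if source.version != bastion.version:
        return False  # subnet_of() raises TypeError on mixed IP versions
    return source.subnet_of(bastion)


if __name__ == "__main__":
    print(jit_source_ok("10.0.255.0/26", "10.0.255.0/26"))   # True
    print(jit_source_ok("203.0.113.7/32", "10.0.255.0/26"))  # False
```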

Q7. Which SKU is required for session recording, and where do recordings land? Premium. On disconnect, recordings are written as video to a blob container in your storage account via a SAS URL; you replay them from the Session Recording blade. Harden the storage with a WORM immutability policy and customer-managed keys.

Q8. What’s the minimum RBAC for a user to connect to a VM through Bastion? Reader on the Bastion, Reader on the VM, and Reader on the VM’s NIC (often bundled into a custom connect role), plus Reader on the target VNet when the Bastion sits in a peered VNet. For Entra-based SSH/RDP, add Virtual Machine User Login (or Administrator Login only where truly needed). Prefer granting the login role through PIM.

Q9. How does native client tunneling differ from the browser session, and which command gives a raw TCP tunnel? The browser session is an HTML5 canvas; native tunneling opens a local connection through the broker, so real scp/Ansible/mstsc work. az network bastion tunnel opens a raw TCP tunnel from a local port to a chosen port on the target, and any client can connect through it; ssh and rdp are higher-level conveniences.

Q10. Why is “one Bastion per VNet” an anti-pattern, and what’s the alternative? It multiplies hourly cost, subnets, NSGs, and audit streams for no benefit. Deploy one host in the hub; because Standard/Premium honour peering, it reaches VMs in every directly-peered spoke. Confirm --allow-forwarded-traffic true on both sides of each peering.

Q11. Which NSG flows are mandatory on AzureBastionSubnet? Inbound: 443 from Internet, 443/4443 from GatewayManager, 443 from AzureLoadBalancer, and 8080/5701 from VirtualNetwork. Outbound: 22/3389 to VirtualNetwork, 443 to AzureCloud, 8080/5701 to VirtualNetwork, and 80 to Internet. Missing the GatewayManager flow is the classic silent break.
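
The mandatory flows lend themselves to a table-driven check. The sketch below encodes the list from Q11 as data and reports any port that a set of allow rules fails to cover. It is a simplified model with hypothetical names: rules are plain (direction, peer, ports) tuples, and priorities, deny rules, and protocols are ignored.

```python
# Mandatory flows from Q11, as (direction, peer service tag, ports).
REQUIRED_FLOWS = [
    ("Inbound", "Internet", {443}),
    ("Inbound", "GatewayManager", {443, 4443}),
    ("Inbound", "AzureLoadBalancer", {443}),
    ("Inbound", "VirtualNetwork", {8080, 5701}),
    ("Outbound", "VirtualNetwork", {22, 3389}),
    ("Outbound", "AzureCloud", {443}),
    ("Outbound", "VirtualNetwork", {8080, 5701}),
    ("Outbound", "Internet", {80}),
]


def missing_flows(allow_rules):
    """Return (direction, peer, port) tuples not covered by allow_rules."""
    allowed = {}
    for direction, peer, ports in allow_rules:
        allowed.setdefault((direction, peer), set()).update(ports)
    missing = []
    for direction, peer, ports in REQUIRED_FLOWS:
        gap = ports - allowed.get((direction, peer), set())
        missing.extend((direction, peer, port) for port in sorted(gap))
    return missing


if __name__ == "__main__":
    # Simulate the classic break: every flow present except GatewayManager.
    rules = [r for r in REQUIRED_FLOWS if r[1] != "GatewayManager"]
    print(missing_flows(rules))
```

With the GatewayManager rule removed, the check reports both of its ports as missing, which is the break Q11 warns about.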

Q12. How do native Bastion sessions inherit your organization’s MFA posture without touching the VMs? Native client and Entra-based SSH/RDP authenticate through Microsoft Entra ID, so Conditional Access applies to the management plane — require MFA and a compliant device once, and every native Bastion session is gated, with no per-VM changes. Maps to SC-300 / AZ-500.

Quick check

1. What is the exact required name and minimum size of the Bastion subnet?
2. Which SKU is the floor for native client tunneling, shareable links, and IP-based connections?
3. Why does a VM two peering hops from the hub fail to connect through a hub Bastion?
4. When you pair Bastion with Defender JIT, what source prefix does the JIT rule grant?
5. Which SKU is required for session recording, and how should the destination storage be hardened?

Answers

1. AzureBastionSubnet, /26 or larger. The platform keys off the literal name; /27 is rejected (and a grandfathered /27 can’t scale instances).
2. Standard. Native tunneling, custom ports, file transfer, shareable links, and IP-based connection all start at Standard; Basic has none of them.
3. VNet peering is non-transitive, so Bastion doesn’t traverse a second hop. Peer that VNet directly to the hub (with forwarded traffic), or use Virtual WAN routing.
4. The AzureBastionSubnet CIDR (e.g. 10.0.255.0/26), because Bastion brokers from inside the VNet; never an engineer’s roaming public IP.
5. Premium. Write recordings to a storage account with a time-based immutability (WORM) policy and customer-managed keys (CMK) so the evidence is tamper-evident.

Glossary

Azure Bastion — A managed, agentless PaaS service that brokers RDP/SSH to your VMs over TLS 443, so the VMs need no public IP or inbound management ports.
AzureBastionSubnet — The mandatory, exactly-named, /26-minimum dedicated subnet the Bastion host lives in; holds nothing else.
Scale unit (instance) — One managed VM behind the Bastion service; each handles ~20 concurrent RDP or ~40 concurrent SSH sessions.
SKU tier — Developer / Basic / Standard / Premium; an upgrade-only ratchet that gates which features exist.
Native client tunneling — Connecting from the local CLI (az network bastion ssh/tunnel/rdp) through the broker so real scp/Ansible/mstsc work; Standard+, requires --enable-tunneling.
Shareable link — A URL granting RDP/SSH to one specific VM without an Azure account; authenticates against the target VM’s own credentials.
IP-based connection — Reaching a target by its private IP (on-prem over ExpressRoute, or peered VMs) rather than by Azure resource ID; Standard+.
Session recording — Premium-only capture of the graphical RDP/SSH session as video, written to your storage account for audit.
BastionAuditLogs — The diagnostic log category recording each connection (user, source IP, target, protocol); stream it to Log Analytics.
JIT (Just-in-Time) VM access — A Defender for Cloud feature that keeps management ports closed and opens them time-boxed to a specified source — with Bastion, the broker subnet.
Private-only deployment — A Premium configuration where the Bastion host itself has no public IP, removing the last internet-facing surface.
Forwarded traffic — A VNet-peering flag (allowForwardedTraffic) that lets brokered RDP/SSH transit the hub to reach a spoke.
Non-transitive peering — The property that VNet peering does not chain: A↔B and B↔C does not give A↔C; Bastion cannot reach a target two hops away.
Customer-managed key (CMK) — Encryption with a key you own in Key Vault, used here on the recording storage so evidence stays under your control.

Next steps

Lock down the network this lives in with Azure Virtual Network Deep Dive: Every Setting and choose your topology with Hub-spoke vs Virtual WAN enterprise topology.
Gate every session behind identity: Microsoft Entra Conditional Access at scale and PIM for Azure resources.
Put Bastion in context with the Azure Zero Trust multilayer security model and harden posture with Defender for Cloud CSPM & secure score.
Wire the audit ledger and alerting through Azure Monitor & Application Insights for observability.
Control egress off your now-public-IP-less VMs with Azure NAT Gateway for deterministic egress.