Every public IP on a workload VM is an open invitation to the internet’s background noise of credential-stuffing bots and CVE scanners. The traditional answer was a jump box: one hardened VM with a public IP and RDP/SSH behind an NSG and a VPN — still an internet-facing host you patch, monitor, and explain to your auditor. Azure Bastion removes it. It is a managed, agentless PaaS service that brokers RDP and SSH over TLS 443, so your VMs need no public IP, no inbound 3389/22 from the internet, and no agent inside the guest. The Bastion host lives in a dedicated subnet inside your VNet, reaches your VMs over their private IPs, and presents the session either as an HTML5 canvas in the portal or — far more usefully at scale — as a native mstsc/OpenSSH session tunnelled through the broker.
This guide goes past the portal “Connect” button into the part that matters in production: native client tunneling for scp/Ansible/full RDP, shareable links for third parties with no Azure account, session recording for PCI/HIPAA/SOC 2 evidence, IP-based connections for on-prem and peered targets, hub-and-spoke reuse so one host serves the whole estate, and the methodical decommissioning of your legacy jump boxes. Because Bastion is a security control you operate, not a one-time deploy, the SKUs, subnet rules, NSG flows, RBAC roles, error codes, limits and cost levers are all laid out as scannable tables — read the prose once, then keep the tables open when you are sizing a host, debugging a hung session, or answering a QSA.
By the end you will know exactly which SKU to deploy and why you cannot downgrade it, how to size AzureBastionSubnet and the scale units so 200 operators connect during a release window, how to wire native tunneling and prove it with scp, how to gate every session behind Conditional Access and PIM, and how to read BastionAuditLogs to close the loop with your auditor. Knowing which knob fixes a hung connection in ninety seconds is what separates a controlled cutover from a week of “the vendor still can’t get in.”
What problem this solves
Remote administrative access is the single richest target in any estate. A workload VM with a public IP and an NSG rule allowing 3389 from Any (or even from a “corporate range” that turns out to be a /8) is permanently exposed to password-spray and to whatever RDP/SSH CVE is current that quarter. The classic mitigation — a jump box behind a VPN — does not remove the exposure, it relocates it onto one box you now own end to end: you patch its OS, rotate its credentials, monitor its logins, size its public IP, and answer for it in every audit. And a jump box is still an interactive host an attacker can pivot from once they are on it.
What breaks without Bastion: an on-call engineer cuts a “temporary” NSG hole and public IP for a vendor and forgets to close it; a contractor’s laptop with a cached SSH key walks out the door and the key is still trusted; an auditor asks “show me who RDP’d into the cardholder-data box last Tuesday and what they did,” and there is no recording, only a Windows Security log on the box itself (which the same admin could clear). Meanwhile the management ports stay open 24/7 because closing them breaks access, so the attack surface is permanent.
Who hits this: every team running IaaS VMs that need interactive administration — which is most of them. It bites hardest on regulated estates (PCI-DSS, HIPAA, SOC 2, ISO 27001) that mandate no public IPs on in-scope hosts and recorded admin sessions; on hub-and-spoke platforms where per-VNet jump boxes multiply cost and operational surface; and on hybrid shops that need the same broker to reach on-prem servers over ExpressRoute. Bastion’s promise is concrete: zero public IPs on workloads, zero inbound management ports from the internet, identity-governed access, and an immutable session ledger — provided you pick the right SKU and turn the right knobs, which is exactly what defaults will not do for you.
To frame the field before the deep dive, here is every access pattern Bastion replaces or enables, the pain it removes, and the SKU floor it needs:
| Access pattern | Pain it removes | Bastion feature | Minimum SKU |
|---|---|---|---|
| RDP/SSH from the portal (HTML5) | Public IP + open 3389/22 on the VM | Browser connect over TLS 443 | Basic |
| Terminal-native SSH / scp / Ansible | HTML5 canvas can’t run real tooling | Native client tunneling | Standard |
| Full mstsc RDP (multi-monitor, drives) | Browser RDP is a limited canvas | az network bastion rdp / tunnel | Standard |
| Third-party access (no Azure account) | Temporary public IP + NSG hole for a vendor | Shareable links | Standard |
| Reach on-prem / peered private-IP hosts | Separate jump box per network | IP-based connection | Standard |
| Recorded admin sessions for audit | No tamper-evident session evidence | Session recording | Premium |
| No public IP on the Bastion host itself | The broker is itself internet-facing | Private-only deployment | Premium |
Learning objectives
By the end of this article you can:
- Choose between the Developer, Basic, Standard, and Premium SKUs deliberately, and explain why you can only upgrade — never downgrade — and what each tier actually unlocks.
- Lay down a correct AzureBastionSubnet (/26 minimum, exact name, nothing else in it) with a Standard Static public IP and zone-redundant scale units sized to peak concurrency.
- Wire native client tunneling for SSH, RDP, and scp/file transfer, and explain the difference between az network bastion ssh, tunnel, and rdp.
- Issue and revoke shareable links as time-boxed grants, and enable IP-based connections to reach on-prem and peered targets through the same broker.
- Stand up session recording to an immutable, CMK-encrypted storage account and stream BastionAuditLogs to Log Analytics, then query the connection ledger in KQL.
- Centralise one Bastion in a hub across peered spokes with the correct peering flags, and explain why the second (transitive) peering hop does not work.
- Harden the host with the mandatory NSG flow set, Conditional Access, least-privilege RBAC, and PIM, and pair it with Defender for Cloud JIT so ports open only to the broker subnet.
- Right-size the bill and decommission legacy jump boxes methodically, stripping their public IPs only after Bastion access is proven.
Prerequisites & where this fits
You should already understand Azure networking fundamentals: a virtual network is an address space carved into subnets, traffic between subnets and peered VNets is governed by NSGs and route tables, and a VM reaches the world through a public IP (or, increasingly, should not). If those words are fuzzy, read “Azure Virtual Network Deep Dive: Every Setting” and “Azure Virtual Network basics: subnets, NSGs, peering” first. You should be comfortable running az in Cloud Shell, reading JSON output, and you should know what a managed identity and an RBAC role assignment are at a basic level.
This sits in the Security / secure-access track of the Azure Zero-to-Hero program. It is downstream of VNet design and upstream of the broader Azure Zero Trust multilayer security model, of which “no public IPs, identity-governed access” is a pillar. It pairs tightly with Microsoft Entra Conditional Access at scale (which gates native sessions), PIM for Azure resources (which makes even the connect right just-in-time), and Azure Monitor & Application Insights for observability (where the audit ledger lands). If you also need outbound egress control off those now-public-IP-less VMs, Azure NAT Gateway for deterministic egress is the complement.
A quick map of which layer owns what, so you call the right person when a session won’t land:
| Layer | What lives here | Who usually owns it | Failure it can cause |
|---|---|---|---|
| Identity (Entra) | RBAC, Conditional Access, PIM | Identity team | Connect denied; CA blocks the session |
| Bastion host | SKU, scale units, tunneling flag | Platform / network | Native subcommands fail; concurrency capped |
| AzureBastionSubnet + NSG | Subnet size, the mandatory flow set | Network team | Silent break of 443/4443 or egress |
| VNet peering | allowForwardedTraffic, transitivity | Network team | Spoke VM unreachable from hub host |
| Target VM | Private IP, guest firewall, local creds | App / VM team | RDP/SSH refused inside the guest |
| Storage + Key Vault | Recording container, CMK, immutability | Platform / security | Recordings not written or tamperable |
Core concepts
Six mental models make every later decision obvious.
Bastion is a broker inside your VNet, not a gateway at its edge. The Bastion host is a set of managed VMs (Microsoft calls each a scale unit or instance) that live in a dedicated subnet named exactly AzureBastionSubnet. A client reaches Bastion over TLS 443 — from the portal or, with tunneling, from the local CLI — and Bastion reaches the target VM over its private IP using ordinary 3389/22 from inside the VNet. Because the broker is in the VNet, the target needs no public IP and no internet-facing port; the only public surface is Bastion’s own 443 (and even that disappears with the private-only Premium deployment).
The SKU is a one-way ratchet. Bastion has four tiers — Developer, Basic, Standard, Premium — and each adds features the one below lacks. You can upgrade in place (Basic→Standard→Premium) but you cannot downgrade; to go down a tier you delete and redeploy. Pick deliberately, because the gaps are large: native tunneling, shareable links, custom ports, file transfer and IP-based connections all start at Standard, and session recording and private-only deployment are Premium-only.
The subnet name and size are load-bearing, not cosmetic. The platform keys off the literal name AzureBastionSubnet — call it anything else and Bastion will not deploy. The minimum size for any Bastion created on or after 2 November 2021 is /26; a /27 is rejected, and a grandfathered /27 cannot scale host instances. The subnet holds nothing else: no NICs, no NAT gateway, no other resource. NSGs and route tables are supported on it, but the address space is Bastion’s alone.
Host scaling is how Bastion serves concurrency. Each scale unit handles roughly 20 concurrent RDP or 40 concurrent SSH sessions. Basic is fixed at 2 instances; Standard and Premium let you set 2–50. You size to peak concurrency, not VM count — a release window with 200 simultaneous operators wants ~10 instances, which is the real reason the subnet must be /26. Scale units and zone redundancy (pinning instances across zones 1/2/3) are chosen at deploy time; zones are immutable afterward, scale units you can adjust on Standard/Premium.
Native client tunneling is what makes Bastion usable for engineers. The browser HTML5 session is fine for a one-off click. For anyone who lives in a terminal, runs scp, drives Ansible, or wants a real mstsc session, the native client path (az network bastion ssh | tunnel | rdp) opens a local connection that tunnels through the broker. It requires Standard+ with the tunneling flag explicitly enabled — --enable-tunneling true — or the subcommands fail even on the right SKU.
Bastion shrinks the attack surface but is not a free pass. It removes public IPs and open ports, but the right to connect is still an RBAC outcome you must grant least-privilege, native sessions still authenticate through Entra (so Conditional Access applies), the subnet still needs a precise NSG flow set or it breaks silently, and a shareable link left standing is a standing exposure. Bastion replaces the jump box’s risks with a smaller, governable set — but only if you turn the knobs.
The vocabulary in one table
Before the deep sections, pin every moving part. The glossary repeats these for lookup; this is the mental model side by side:
| Concept | One-line definition | Where it lives | Why it matters |
|---|---|---|---|
| Bastion host | Managed broker for RDP/SSH over TLS | AzureBastionSubnet in your VNet | The thing you deploy; SKU-gated |
| Scale unit (instance) | One managed VM behind the host | In the subnet | Concurrency: ~20 RDP / ~40 SSH each |
| AzureBastionSubnet | The mandatory dedicated subnet | In the VNet | Exact name + /26; nothing else in it |
| SKU tier | Developer / Basic / Standard / Premium | Host property | One-way upgrade; gates every feature |
| Native client tunneling | Local CLI session through the broker | Client + host | Real scp/ssh/mstsc; Standard+ |
| Shareable link | URL to one VM, no Azure account | Host feature | Vendor access; auth = VM’s own creds |
| IP-based connection | Reach a target by private IP | Host feature | On-prem / peered hosts; Standard+ |
| Session recording | Video of the RDP/SSH session | Premium → your storage | Tamper-evident audit evidence |
| BastionAuditLogs | Diagnostic log of each connection | Log Analytics | The who/when/what ledger |
| JIT VM access | Defender opens ports time-boxed | NSG via Defender | Ports open only to the broker subnet |
| Private-only | Bastion host with no public IP | Premium property | Removes the last public surface |
| Forwarded traffic | Peering flag letting brokered traffic transit | Peering config | Required for hub-and-spoke reach |
1. Pick the SKU before you touch a subnet
The SKU decides which features exist and you cannot downgrade later — only upgrade. Choose deliberately; the gaps between the four tiers are large, and discovering a missing feature mid-engagement (no native SSH, no file copy, no shareable link) means a delete-and-redeploy under pressure.
The full feature matrix, every row:
| Feature | Developer | Basic | Standard | Premium |
|---|---|---|---|---|
| Cost model | Free (shared) | Hourly + data | Hourly + data | Hourly + data |
| Dedicated deployment | No (shared fabric) | Yes | Yes | Yes |
| AzureBastionSubnet required | No | Yes | Yes | Yes |
| VNet peering reach (hub-spoke) | No | Yes | Yes | Yes |
| Concurrent connections | 1 | Fixed | Scales | Scales |
| Host scaling (instances) | No | Fixed (2) | 2–50 | 2–50 |
| Zone redundancy | No | Yes | Yes | Yes |
| Native client (tunnel/ssh/rdp) | No | No | Yes | Yes |
| Custom inbound ports | No | No | Yes | Yes |
| File transfer (upload/download) | No | No | Yes | Yes |
| Shareable links | No | No | Yes | Yes |
| IP-based connection | No | No | Yes | Yes |
| Kerberos authentication | No | No | Yes | Yes |
| Private-only deployment (no public IP) | No | No | No | Yes |
| Session recording | No | No | No | Yes |
| Upgrade path | redeploy | →Standard/Premium | →Premium | terminal |
The practical read on each tier — what it is for and the trap it sets:
| Tier | Use it for | The trap |
|---|---|---|
| Developer | Personal dev/test convenience; free | One concurrent connection; no peering → cannot serve hub-and-spoke; shared fabric |
| Basic | Browser-only RDP/SSH on a single VNet | No native client, no file copy, no shareable links — you hit the wall the first real day |
| Standard | The platform baseline | Lacks session recording and private-only — fine until an auditor or a private-only mandate appears |
| Premium | Regulated estates needing recording / private-only | Highest hourly rate; overkill if you owe no session audit trail |
The decision in one table — match your requirement to the floor SKU:
| If you need… | Smallest SKU | Why |
|---|---|---|
| A free sandbox, one VNet, one session | Developer | Shared fabric, no peering, single connection |
| Browser RDP/SSH, dedicated host, one VNet | Basic | Dedicated but feature-bare |
| Native ssh/scp/mstsc, custom ports, file copy | Standard | All the day-to-day engineering features live here |
| Shareable links for vendors | Standard | First tier with the feature |
| Reach on-prem / peered private IPs | Standard | IP-based connection |
| Recorded sessions for PCI/HIPAA/SOC 2 | Premium | Session recording is Premium-only |
| No public IP on the Bastion host itself | Premium | Private-only deployment is Premium-only |
| Hub-and-spoke serving many spokes | Standard (Premium if recording) | Peering reach starts at Basic; features at Standard |
The practical rule: for a centralized, shared Bastion in a hub, deploy Standard at minimum and Premium if you owe anyone a session audit trail or a private-only host. Developer is a personal convenience, not platform infrastructure; Basic I rarely deploy because the first request for a native SSH session or a file copy strands you.
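A provisioning pipeline can enforce the decision table instead of relying on memory, and refuse an undersized SKU before anything is deployed. What follows is a minimal bash sketch. The feature keywords (browser, dedicated, peering, native, files, ports, links, ipconnect, kerberos, recording, private-only) are hypothetical labels for the table rows. They are not Azure parameters.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map required features to the smallest Bastion SKU,
# following the decision table. Feature names are illustrative labels only.
min_sku() {
  local -a tiers=(Developer Basic Standard Premium)
  local rank=0 feature need   # rank indexes into tiers
  for feature in "$@"; do
    case "$feature" in
      browser)                                     need=0 ;;
      dedicated|peering)                           need=1 ;;
      native|files|ports|links|ipconnect|kerberos) need=2 ;;
      recording|private-only)                      need=3 ;;
      *) echo "unknown feature: $feature" >&2; return 1 ;;
    esac
    # Keep the highest tier any single requirement demands
    (( need > rank )) && rank=$need
  done
  echo "${tiers[rank]}"
}
```

For example, min_sku peering native links prints Standard, and adding recording raises the result to Premium. Since a SKU can only be upgraded, running this check before the first deploy costs less than finding the gap mid-engagement.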
2. Subnet design, host scaling, and zones
Bastion (every SKU except Developer) requires a dedicated subnet named exactly AzureBastionSubnet. This is not a convention you may vary — the platform keys off the literal string. The rules people get wrong, each with its consequence:
| Rule | Requirement | What breaks if you ignore it |
|---|---|---|
| Subnet name | Exactly AzureBastionSubnet | Deployment fails / Bastion not offered for the subnet |
| Subnet size | /26 or larger (post 2 Nov 2021) | /27 rejected; grandfathered /27 can’t scale instances |
| Subnet contents | Bastion only — no NICs, NAT GW, other resources | Conflicts; Bastion deploy refused |
| NSG | Supported, but must allow the required flow set | Over-zealous NSG silently breaks 443/4443 |
| Route table | Tolerated; don’t force-tunnel Bastion’s own egress | A 0.0.0.0/0 UDR to an NVA can blackhole the control plane |
| Public IP | Standard SKU, Static allocation | Dynamic / Basic-SKU IP rejected |
| Delegation | None required | — |
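The name and size rules are cheap to check before you call create. Below is a small bash sketch that takes the subnet name and its IPv4 prefix. The function name check_bastion_subnet is hypothetical. It validates only the two rules that fail a deployment outright; it does not inspect NSG or route-table content.

```bash
#!/usr/bin/env bash
# Pre-flight sketch for the two AzureBastionSubnet rules that fail a deploy outright:
# the exact (case-sensitive) name and a prefix of /26 or larger.
check_bastion_subnet() {
  local name=$1 cidr=$2 rc=0
  if [[ "$name" != "AzureBastionSubnet" ]]; then
    echo "name must be exactly AzureBastionSubnet (got '$name')" >&2
    rc=1
  fi
  local prefix=${cidr##*/}   # text after the last '/', e.g. 26
  if [[ ! "$prefix" =~ ^[0-9]+$ ]] || (( prefix > 26 )); then
    echo "prefix must be /26 or larger, i.e. /26, /25, ... (got '$cidr')" >&2
    rc=1
  fi
  return "$rc"
}

# Against a live subnet (requires az login; RG and VNET as in the commands below):
#   check_bastion_subnet AzureBastionSubnet "$(az network vnet subnet show \
#     -g "$RG" --vnet-name "$VNET" -n AzureBastionSubnet --query addressPrefix -o tsv)"
```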
Lay the network down with the /26 subnet and a Standard, Static public IP — Bastion will not accept a Dynamic or Basic-SKU IP:
RG=rg-hub-network
LOC=eastus
VNET=vnet-hub
BASTION=bastion-hub
# Dedicated /26 subnet — the name is mandatory and case-sensitive
az network vnet subnet create \
--resource-group "$RG" \
--vnet-name "$VNET" \
--name AzureBastionSubnet \
--address-prefixes 10.0.255.0/26
# Standard SKU, Static allocation, zone-redundant — all required/recommended
az network public-ip create \
--resource-group "$RG" \
--name pip-bastion-hub \
--sku Standard \
--allocation-method Static \
--zone 1 2 3
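param location string = resourceGroup().location
// The existing hub VNet that receives the Bastion subnet
resource vnet 'Microsoft.Network/virtualNetworks@2023-11-01' existing = {
  name: 'vnet-hub'
}
resource bastionSubnet 'Microsoft.Network/virtualNetworks/subnets@2023-11-01' = {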
parent: vnet
name: 'AzureBastionSubnet' // exact name — platform requirement
properties: {
addressPrefix: '10.0.255.0/26' // /26 minimum
}
}
resource bastionPip 'Microsoft.Network/publicIPAddresses@2023-11-01' = {
name: 'pip-bastion-hub'
location: location
sku: { name: 'Standard' } // Standard required
zones: [ '1', '2', '3' ] // zone-redundant
properties: { publicIPAllocationMethod: 'Static' } // Static required
}
Host scaling is how Bastion handles concurrency. Each scale unit is a managed VM behind the service. Basic is fixed at two; Standard and Premium let you set 2 to 50. Size to peak concurrency, not VM count. The sizing reference — pick the smallest instance count that covers your peak:
| Scale units | ~Concurrent RDP | ~Concurrent SSH | Subnet draw | Typical use |
|---|---|---|---|---|
| 2 (Basic / Standard floor) | ~40 | ~80 | small | Small team, single VNet |
| 4 | ~80 | ~160 | small | Mid platform, a few spokes |
| 8 | ~160 | ~320 | moderate | Regional hub, release windows |
| 10 | ~200 | ~400 | moderate | 200-operator release; the /26 payoff |
| 20 | ~400 | ~800 | larger | Large estate, many spokes |
| 50 (max) | ~1,000 | ~2,000 | largest | Very large multi-spoke estate |
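The sizing table is the per-unit rule of thumb (~20 RDP or ~40 SSH sessions) multiplied out. The same arithmetic appears below as a bash sketch. It treats one RDP session as two SSH-equivalents and clamps the result to the 2–50 range. scale_units_for is a hypothetical helper, and its output is an estimate, not a capacity guarantee.

```bash
#!/usr/bin/env bash
# Sizing sketch: one scale unit ~ 20 concurrent RDP or ~ 40 concurrent SSH,
# so an RDP session "costs" two SSH-equivalents. Result is clamped to 2..50.
scale_units_for() {
  local rdp=${1:-0} ssh=${2:-0}
  local ssh_equiv=$(( rdp * 2 + ssh ))
  local units=$(( (ssh_equiv + 39) / 40 ))   # ceiling division by 40
  (( units < 2 )) && units=2                 # Standard/Premium floor
  (( units > 50 )) && units=50               # service maximum
  echo "$units"
}

# Apply to a live Standard/Premium host, e.g.:
#   az network bastion update --name "$BASTION" --resource-group "$RG" \
#     --scale-units "$(scale_units_for 200 0)"
```

scale_units_for 200 0 prints 10, which matches the 200-operator row. Pass the result to --scale-units in the create command below.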
az network bastion create \
--resource-group "$RG" \
--name "$BASTION" \
--vnet-name "$VNET" \
--public-ip-address pip-bastion-hub \
--sku Standard \
--scale-units 4 \
--location "$LOC" \
--zone 1 2 3 \
--enable-tunneling true
--enable-tunneling true is the switch that turns on native client support. Without it, the tunnel/ssh/rdp subcommands in the next step fail even on a Standard SKU. The create-time flags that are immutable versus mutable — get the immutable ones right the first time:
| Property | Set at | Mutable later? | Notes |
|---|---|---|---|
| SKU tier | Create | Upgrade only | Basic→Standard→Premium; no downgrade |
| Availability zones | Create | No | Cannot re-zone a live Bastion |
| Scale units | Create | Yes (Std/Prem) | 2–50; raise/lower as concurrency changes |
| --enable-tunneling | Create or update | Yes | Required for native client |
| --enable-ip-connect | Create or update | Yes | Required for IP-based connection |
| --enable-kerberos | Create or update | Yes | For AD-joined target auth |
| Public IP | Create | Replaceable | Standard + Static only |
Zone redundancy is set at deployment and immutable afterward — you cannot re-zone a live Bastion. In supported regions, pin instances across zones 1, 2, and 3 so a single zone failure does not sever all remote access during an incident, which is precisely when you need it most. If you skip zones and the region has a zonal event, your break-glass path is gone at the worst possible moment.
3. Native client tunneling for SSH, RDP, and file transfer
The browser experience is fine for a one-off. For engineers who live in a terminal, want scp, run Ansible, or need an RDP session richer than an HTML5 canvas, native client tunneling is what makes Bastion usable day to day. It requires Standard SKU or higher with tunneling enabled. There are three relevant subcommands, and the distinction matters — pick the right one:
| Subcommand | What it does | Auth options | Best for | SKU |
|---|---|---|---|---|
| az network bastion ssh | Interactive SSH straight to a Linux VM | AAD, ssh-key, password | Quick terminal session, no local port | Standard+ |
| az network bastion tunnel | Raw local TCP tunnel to any target port | n/a (transport only) | scp, DB clients, full RDP, anything | Standard+ |
| az network bastion rdp | Launches native mstsc to a Windows VM | Windows creds | Native Windows RDP experience | Standard+ |
The --auth-type values for ssh, with their trade-offs:
| --auth-type | Extra flags | What governs access | Trade-off |
|---|---|---|---|
| AAD | none | Entra RBAC + Conditional Access | No keys to manage; needs the AAD login extension on the VM and the VM login role |
| ssh-key | --username, --ssh-key | The key file | Familiar; key can walk out on a laptop |
| password | --username | Local credential | Simplest; weakest; avoid in prod |
az network bastion ssh opens an interactive SSH session straight to a Linux VM by its resource ID — no public IP, no local port wrangling:
az network bastion ssh \
--name "$BASTION" \
--resource-group "$RG" \
--target-resource-id "/subscriptions/<sub>/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm-linux-01" \
--auth-type AAD
--auth-type AAD (Microsoft Entra login) is my default — access is governed by RBAC and Conditional Access instead of a key file that walks out the door on a laptop. It requires the AADSSHLoginForLinux VM extension and the Virtual Machine User Login role on the target.
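You can script both target-side prerequisites per VM. The bash sketch below assumes you pass a full VM resource ID and the Entra principal (object ID or UPN) that should log in. enable_entra_ssh is a hypothetical name. The role is scoped to the single VM rather than the resource group so the grant stays least-privilege.

```bash
#!/usr/bin/env bash
# Sketch: the two target-side prerequisites for --auth-type AAD.
# vm_id is a full VM resource ID; assignee is a user/group object ID or UPN.
enable_entra_ssh() {
  local vm_id=$1 assignee=$2
  local sub rg vm
  # /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachines/<vm>
  sub=$(cut -d/ -f3 <<< "$vm_id")
  rg=$(cut -d/ -f5 <<< "$vm_id")
  vm=${vm_id##*/}

  # 1. Install the Microsoft Entra login extension inside the Linux guest
  az vm extension set \
    --subscription "$sub" --resource-group "$rg" --vm-name "$vm" \
    --publisher Microsoft.Azure.ActiveDirectory \
    --name AADSSHLoginForLinux || return

  # 2. Grant the non-admin login role, scoped to this one VM only
  az role assignment create \
    --assignee "$assignee" \
    --role "Virtual Machine User Login" \
    --scope "$vm_id"
}
```

Grant Virtual Machine Administrator Login instead only to the people who genuinely need sudo on the guest.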
az network bastion tunnel is the workhorse. It opens a raw local TCP tunnel to an arbitrary port on the target that you point any client at — real scp, a database client over the same broker, or a full RDP client:
# Open a local tunnel: localhost:50022 -> VM:22 through Bastion
az network bastion tunnel \
--name "$BASTION" \
--resource-group "$RG" \
--target-resource-id "/subscriptions/<sub>/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm-linux-01" \
--resource-port 22 \
--port 50022
With that tunnel up, every ordinary tool just works against localhost:50022:
# In a second terminal — standard OpenSSH, standard scp, no Bastion awareness
ssh -p 50022 azureuser@127.0.0.1
scp -P 50022 ./deploy.tar.gz azureuser@127.0.0.1:/tmp/
# RDP example: tunnel 3389, then point mstsc at the local port
az network bastion tunnel -n "$BASTION" -g "$RG" \
--target-resource-id "<vm-windows-id>" --resource-port 3389 --port 53389
# then: mstsc /v:localhost:53389
Common tunnel targets and the local-port convention people use — the tunnel is protocol-agnostic, so anything TCP works:
| Target service | --resource-port | Typical local --port | Client you point at it |
|---|---|---|---|
| SSH | 22 | 50022 | ssh -p 50022 user@127.0.0.1 |
| RDP | 3389 | 53389 | mstsc /v:localhost:53389 |
| SQL Server | 1433 | 51433 | sqlcmd -S 127.0.0.1,51433 |
| PostgreSQL | 5432 | 55432 | psql -h 127.0.0.1 -p 55432 |
| WinRM (HTTPS) | 5986 | 55986 | PowerShell remoting |
| Custom app/admin | any | any | any TCP client |
For Windows users who want the native RDP experience without managing a tunnel, az network bastion rdp launches mstsc directly:
$Bastion = 'bastion-hub'
$RG = 'rg-hub-network'
az network bastion rdp `
--name $Bastion `
--resource-group $RG `
--target-resource-id "<vm-windows-id>"
The tunnel runs only as long as the CLI process lives. For automation, background it and capture the PID so a pipeline step can tear it down deterministically rather than leaking an open broker session.
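A pipeline step can get that deterministic teardown from a wrapper. The wrapper starts the tunnel in the background, waits for the local port, runs a single command, and then always kills the CLI process. The bash sketch below assumes BASTION, RG, and a hypothetical TARGET_ID variable holding the VM resource ID. The port probe relies on bash’s /dev/tcp pseudo-device, so it needs bash rather than sh.

```bash
#!/usr/bin/env bash
# Sketch: run one command while a Bastion tunnel is open, then always tear it down.
# Assumes BASTION, RG and TARGET_ID (the VM resource ID) are already set.

# Poll until something accepts TCP connections on 127.0.0.1:<port>.
wait_for_port() {
  local port=$1 tries=${2:-30} i
  for (( i = 0; i < tries; i++ )); do
    # The subshell exits non-zero while nothing is listening on the port
    if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

with_bastion_tunnel() {
  local resource_port=$1 local_port=$2 rc=0
  shift 2
  az network bastion tunnel \
    --name "$BASTION" --resource-group "$RG" \
    --target-resource-id "$TARGET_ID" \
    --resource-port "$resource_port" --port "$local_port" >/dev/null 2>&1 &
  local tunnel_pid=$!

  if wait_for_port "$local_port" "${TUNNEL_WAIT_SECONDS:-30}"; then
    "$@" || rc=$?          # propagate the wrapped command's exit status
  else
    echo "tunnel on localhost:$local_port did not come up" >&2
    rc=1
  fi

  # Deterministic teardown: stop the CLI process so no broker session leaks
  kill "$tunnel_pid" 2>/dev/null
  wait "$tunnel_pid" 2>/dev/null
  return "$rc"
}

# Usage:
#   with_bastion_tunnel 22 50022 scp -P 50022 ./deploy.tar.gz azureuser@127.0.0.1:/tmp/
```

The wrapper returns the wrapped command’s exit status, so a failed scp still fails the step. If the tunnel never opens, the wrapper fails after TUNNEL_WAIT_SECONDS (a hypothetical knob, default 30) instead of hanging.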
The native-client prerequisites people miss — verify all of these before debugging a “tunnel won’t open”:
| Prerequisite | Check / fix | Symptom if missing |
|---|---|---|
| SKU is Standard+ | az network bastion show --query sku.name | Subcommand errors “not supported on this SKU” |
| Tunneling enabled | --query enableTunneling is true | Subcommand errors even on Standard |
| Azure CLI ≥ 2.32 + SSH extension | az extension add --name ssh | az network bastion ssh not found |
| RBAC: Reader on Bastion + VM, NIC action | Role assignments | “Authorization failed” before connect |
| NSG allows the flow set | Section 7 table | Connect hangs / times out |
| For --auth-type AAD | AADSSHLoginForLinux extension + VM User Login role | Falls back / auth fails |
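The two host-side rows of that table can be checked mechanically before anyone starts reading logs. The bash sketch below assumes BASTION and RG are set. It reads the same sku.name and enableTunneling fields as the table’s --query examples. It compares the boolean case-insensitively in case your CLI version prints it capitalised.

```bash
#!/usr/bin/env bash
# Sketch: check the two host-side rows of the prerequisites table before
# debugging a tunnel that won't open. Assumes BASTION and RG are set.
preflight_native_client() {
  local sku tunneling
  sku=$(az network bastion show --name "$BASTION" --resource-group "$RG" \
          --query sku.name -o tsv) || return
  tunneling=$(az network bastion show --name "$BASTION" --resource-group "$RG" \
          --query enableTunneling -o tsv) || return

  case "$sku" in
    Standard|Premium) ;;
    *) echo "SKU is '$sku'; native client needs Standard or Premium" >&2
       return 1 ;;
  esac

  # Compare case-insensitively so "true" and "True" both pass
  if [[ "${tunneling,,}" != "true" ]]; then
    echo "tunneling is off; enable it with:" >&2
    echo "  az network bastion update --name $BASTION --resource-group $RG --enable-tunneling true" >&2
    return 1
  fi
  echo "host-side prerequisites OK: $sku, tunneling enabled"
}
```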
4. Shareable links and IP-based connections
Two Standard-and-up features cover the awkward access scenarios that NSG rules cannot. They look similar but solve different problems — one is about who (a person with no Azure account), the other about what (a target that isn’t an Azure VM resource):
| Feature | Solves | Auth against | Target identified by | Lifecycle risk |
|---|---|---|---|---|
| Shareable link | Third party with no Azure account | The target VM’s own creds | VM resource ID | A standing link = standing exposure |
| IP-based connection | Non-Azure / peered private-IP host | Whatever the host uses | Private IP address | Reaches anything routable from the subnet |
Shareable links generate a URL that lets a user connect to a specific VM via RDP/SSH without an Azure account or portal access. They authenticate against the target VM’s own credentials (local username/password or key), not against Entra. This is the sane answer to “the vendor needs to RDP into the staging box for two days” — far better than cutting a temporary public IP and an NSG hole. Create the link scoped to one VM:
az network bastion create-shareable-link \
--name "$BASTION" \
--resource-group "$RG" \
--vm-id "/subscriptions/<sub>/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm-staging-01"
When the engagement ends, revoke it — do not let it rot:
az network bastion delete-shareable-link \
--name "$BASTION" --resource-group "$RG" \
--vm-id "<vm-staging-01-id>"
# List standing links so you can audit and prune them on a schedule
az network bastion list-shareable-link \
--name "$BASTION" --resource-group "$RG" -o table
The shareable-link governance rules — treat every link as a time-boxed grant:
| Concern | Practice |
|---|---|
| Scope | One link per VM; never a blanket grant |
| Duration | Time-box with a calendar reminder or automation that deletes on schedule |
| Auth | The target VM’s own credentials — keep those strong and rotated |
| Audit | BastionAuditLogs records link sessions; review them |
| Revocation | delete-shareable-link the moment the engagement ends |
| Inventory | list-shareable-link periodically; prune anything stale |
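Time-boxing is easiest to enforce if you record when each link was issued and let a scheduled job revoke anything past its window. The bash sketch below works under those assumptions. link_expired and revoke_if_expired are hypothetical helpers. The creation timestamp is one you store yourself at issue time, and GNU date is required for the -d parsing. Revocation uses the delete-shareable-link command shown above.

```bash
#!/usr/bin/env bash
# Sketch: enforce the time-box on shareable links. Requires GNU date (-d).
# link_expired CREATED MAX_HOURS [NOW]: succeeds once MAX_HOURS have elapsed.
link_expired() {
  local created=$1 max_hours=$2
  local now=${3:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
  local created_s now_s
  created_s=$(date -u -d "$created" +%s) || return 2
  now_s=$(date -u -d "$now" +%s) || return 2
  (( now_s - created_s >= max_hours * 3600 ))
}

# revoke_if_expired VM_ID CREATED MAX_HOURS: delete the VM's link once its window has passed.
# CREATED is the issue time you recorded yourself when running create-shareable-link.
revoke_if_expired() {
  local vm_id=$1 created=$2 max_hours=$3
  if link_expired "$created" "$max_hours"; then
    az network bastion delete-shareable-link \
      --name "$BASTION" --resource-group "$RG" --vm-id "$vm_id"
  fi
}
```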
IP-based connections let Bastion reach a target by private IP rather than Azure resource ID. That unlocks non-Azure targets reachable over the same network fabric — on-premises servers across ExpressRoute/VPN, or VMs in a peered VNet — so the same broker serves your hybrid estate. Enable the feature on the host first:
az network bastion update \
--name "$BASTION" --resource-group "$RG" \
--enable-ip-connect true
# Then connect to a private IP (e.g. an on-prem host over ExpressRoute)
az network bastion ssh \
--name "$BASTION" --resource-group "$RG" \
--target-ip-address 10.50.4.20 \
--auth-type ssh-key --username opsadmin --ssh-key ~/.ssh/onprem_ed25519
What IP-based connection can and cannot reach — the routability rule:
| Target | Reachable by IP-connect? | Condition |
|---|---|---|
| VM in the same VNet | Yes | Routable from AzureBastionSubnet |
| VM in a directly peered spoke | Yes | Peering with forwarded traffic allowed |
| On-prem host over ExpressRoute/VPN | Yes | Route exists hub→on-prem; no NSG drop |
| Host in a VNet peered only to a spoke | No | Peering is non-transitive (Section 6) |
| Public internet host | No | Bastion brokers private targets only |
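IP-connect reaches only what is routable from AzureBastionSubnet. A quick pre-check against the prefixes you believe are routable (hub, directly peered spokes, on-prem ranges over ExpressRoute/VPN) therefore catches obvious mistakes, such as a target in a VNet peered only to a spoke. The bash sketch below takes a prefix list that you supply. It does not query Azure’s effective routes, so a pass means “inside an expected range”, not “reachable”.

```bash
#!/usr/bin/env bash
# Sketch: is an IP-connect target inside one of the prefixes you expect to be
# routable from AzureBastionSubnet? The prefix list is yours; Azure is not queried.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

ip_in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  # Network mask for the prefix length, kept to 32 bits
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}

routable_target() {
  local ip=$1 prefix
  shift
  for prefix in "$@"; do
    ip_in_cidr "$ip" "$prefix" && return 0
  done
  echo "$ip is outside every listed prefix; IP-connect is unlikely to reach it" >&2
  return 1
}

# Usage: routable_target 10.50.4.20 10.0.0.0/16 10.1.0.0/16 10.50.0.0/16
```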
5. Session recording, audit logging, and Just-in-Time
Session recording (Premium only) captures the graphical RDP/SSH session as video. On disconnect, recordings land in a blob container in your storage account via a SAS URL, and you replay them from the Bastion Session Recording blade. This is the artifact auditors ask for in PCI/HIPAA/SOC 2 estates: who connected to which host, when, and what they did on screen. Point it at an immutable, customer-managed-key storage account so the evidence cannot be tampered with after the fact.
What session recording captures and how to harden the destination:
| Aspect | Detail | Hardening |
|---|---|---|
| What’s captured | Graphical RDP/SSH session as video | — |
| Where it lands | Blob container in your storage account | Lock down with private endpoint + RBAC |
| Delivery | SAS URL on disconnect | Short SAS lifetime; least-privilege |
| Tamper evidence | Blob immutability (WORM) policy | Time-based retention lock |
| Encryption | Customer-managed keys (CMK) in Key Vault | Rotate the key; restrict KV access |
| Replay | Bastion Session Recording blade | RBAC-gate who can replay |
| Gap | SSH text sessions captured as screen video, not keystroke log | Pair with guest-side auditd/transcript if you need text |
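For the tamper-evidence row, put a time-based retention (WORM) policy on the container that receives recordings. The Azure CLI sketch below assumes a hypothetical container name, bastion-recordings, and a 365-day period. Substitute the storage account and container your Bastion writes to, and the retention your auditor actually requires.

```bash
#!/usr/bin/env bash
# Sketch: time-based retention (WORM) on the session-recording container.
# The default container name and 365-day period are hypothetical placeholders.
protect_recordings() {
  local rg=$1 account=$2 container=${3:-bastion-recordings} days=${4:-365}
  az storage container immutability-policy create \
    --resource-group "$rg" \
    --account-name "$account" \
    --container-name "$container" \
    --period "$days" || return
  # The policy starts unlocked so you can verify it. Locking it
  # (az storage container immutability-policy lock) is irreversible,
  # so treat that as a deliberate, separate step.
}
```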
For audit logging, every Bastion session emits a diagnostic event. Stream BastionAuditLogs to Log Analytics and you have the connection ledger:
az monitor diagnostic-settings create \
--name diag-bastion \
--resource "/subscriptions/<sub>/resourceGroups/$RG/providers/Microsoft.Network/bastionHosts/$BASTION" \
--logs '[{"category":"BastionAuditLogs","enabled":true}]' \
--workspace "/subscriptions/<sub>/resourceGroups/rg-monitor/providers/Microsoft.OperationalInsights/workspaces/law-platform"
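param lawId string // resource ID of the Log Analytics workspace
// The existing Bastion host the diagnostic setting attaches to
resource bastion 'Microsoft.Network/bastionHosts@2023-11-01' existing = {
  name: 'bastion-hub'
}
resource bastionDiag 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {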
name: 'diag-bastion'
scope: bastion
properties: {
workspaceId: lawId
logs: [ { category: 'BastionAuditLogs', enabled: true } ]
}
}
Now you can ask real questions in KQL — for example, every session in the last day with source IP, target, and protocol:
BastionAuditLogs
| where TimeGenerated > ago(1d)
| extend p = parse_json(Properties)
| project TimeGenerated,
UserName = tostring(p.userName),
ClientIp = tostring(p.clientIpAddress),
TargetVm = tostring(p.targetVMIPAddress),
Protocol = tostring(p.protocol),
Message = tostring(p.message)
| order by TimeGenerated desc
The audit questions you’ll actually ask, and the one query shape for each:
| Question | Filter / aggregation |
|---|---|
| Who connected in the last 24h? | summarize by UserName, TargetVm |
| Which targets are hit most? | summarize count() by TargetVm, then order by count_ desc |
| Any connections from an unexpected source IP? | where ClientIp !in (<known ranges>) |
| RDP vs SSH split | summarize count() by Protocol |
| Off-hours access (e.g. 00:00–05:00 UTC) | where hourofday(TimeGenerated) between (0 .. 5) |
| Failed / disconnected sessions | where Message has_any ("failed","disconnect") |
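Each row becomes a short KQL query that a pipeline or scheduled job can also run. The sketch below covers the off-hours row. It wraps the same parse_json(Properties) projection used above in az monitor log-analytics query, which may need the log-analytics CLI extension. The workspace argument is the workspace GUID, and off_hours_sessions is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: the "off-hours access" audit question, run from the CLI.
# The first argument is the Log Analytics workspace GUID (not its resource ID).
off_hours_sessions() {
  local workspace=$1 lookback=${2:-7d}
  local kql="BastionAuditLogs
| where TimeGenerated > ago($lookback)
| where hourofday(TimeGenerated) between (0 .. 5)
| extend p = parse_json(Properties)
| project TimeGenerated, UserName = tostring(p.userName), ClientIp = tostring(p.clientIpAddress), TargetVm = tostring(p.targetVMIPAddress)
| order by TimeGenerated desc"
  az monitor log-analytics query \
    --workspace "$workspace" \
    --analytics-query "$kql" \
    -o table
}

# Usage: off_hours_sessions "<workspace-guid>" 30d
```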
Just-in-Time (JIT) VM access is complementary, and the pairing is the point. JIT (a Microsoft Defender for Cloud feature) keeps the VM’s management ports closed in the NSG and opens them only for an approved, time-boxed request from a specific source. Because Bastion connects from inside the VNet (its scale units sit in AzureBastionSubnet), your JIT rule grants that subnet rather than an engineer’s roaming public IP — so the port opens just-in-time and only to the broker, never to the internet.
```bash
# Request JIT access; the allowed source is the Bastion subnet range, not a public IP
az security jit-policy initiate \
  --resource-group rg-app \
  --location "$LOC" \
  --name default \
  --vm-id "/subscriptions/<sub>/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm-linux-01" \
  --ports '[{"number":22,"duration":"PT2H","allowedSourceAddressPrefix":"10.0.255.0/26"}]'
```
The Bastion + JIT pairing, knob by knob:
| JIT field | Value with Bastion | Why |
|---|---|---|
| allowedSourceAddressPrefix | The AzureBastionSubnet CIDR (e.g. 10.0.255.0/26) | Opens the port to the broker, never a roaming IP |
| number | 22 (SSH) / 3389 (RDP) | The management port the NSG keeps shut by default |
| duration | PT1H–PT2H typical | Time-box; the rule auto-closes when the window ends |
| Approval | Defender request (optionally with approver) | Adds a human gate to access |
| NSG default | Deny 22/3389 inbound | Ports closed until JIT opens them |
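Because the source must always be the broker subnet and the window should stay short, a small wrapper can build the --ports payload and refuse anything else. A bash sketch, assuming the PT1H/PT2H whole-hour windows from the table above as your local policy (that ceiling is a choice, not a Defender limit); the helper name jit_ports_json is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: emits the --ports JSON for a JIT request, allowing only
# SSH/RDP, a PT1H or PT2H window, and a CIDR source (the AzureBastionSubnet).
jit_ports_json() {
  local port="$1" duration="$2" bastion_cidr="$3"
  local re='^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$'
  case "$port" in
    22|3389) ;;                                   # management ports only
    *) echo "refusing port $port" >&2; return 1 ;;
  esac
  case "$duration" in
    PT1H|PT2H) ;;                                 # keep the window short
    *) echo "refusing duration $duration" >&2; return 1 ;;
  esac
  # Reject '*' or a bare IP: the source must be a CIDR range.
  if [[ ! "$bastion_cidr" =~ $re ]]; then
    echo "refusing source $bastion_cidr" >&2; return 1
  fi
  printf '[{"number":%s,"duration":"%s","allowedSourceAddressPrefix":"%s"}]\n' \
    "$port" "$duration" "$bastion_cidr"
}

jit_ports_json 22 PT2H 10.0.255.0/26
```

The output is the same JSON the request above passes to --ports, so it can be substituted there as --ports "$(jit_ports_json 22 PT2H 10.0.255.0/26)".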
6. Hub-and-spoke reuse with peering and centralized Bastion
You do not deploy a Bastion per VNet — that multiplies cost and operational surface for no benefit. Deploy one Bastion in the hub and let peered spokes ride it. Because Standard/Premium honour VNet peering, a centralized host reaches VMs in every directly connected spoke.
Per-spoke versus centralized, side by side:
| Dimension | Bastion per spoke | One Bastion in the hub |
|---|---|---|
| Hosts to operate | N | 1 |
| Hourly + scale-unit cost | N × | 1 × |
| AzureBastionSubnets to manage | N | 1 |
| NSG flow sets to maintain | N | 1 |
| Audit/log streams | N | 1 (central) |
| Reach | Each VNet only | All directly peered spokes |
| Recommended | No | Yes |
Four requirements make the centralized model work:
| Requirement | Setting | If missing |
|---|---|---|
| Peering both directions | --allow-vnet-access true on both sides | Spoke unreachable |
| Forwarded traffic allowed | --allow-forwarded-traffic true on both sides | Brokered traffic dropped transiting the hub |
| No NSG drop hub→spoke | Allow brokered 22/3389 from the subnet | Connect times out |
| Direct (single-hop) peering | Spoke peered to the hub, not via another spoke | Non-transitive: Bastion can’t reach it |
```bash
# Hub <-> spoke peering, both directions, forwarded traffic allowed
az network vnet peering create \
  --name hub-to-spoke-app \
  --resource-group rg-hub-network \
  --vnet-name vnet-hub \
  --remote-vnet "/subscriptions/<sub>/resourceGroups/rg-spoke-app/providers/Microsoft.Network/virtualNetworks/vnet-spoke-app" \
  --allow-vnet-access true \
  --allow-forwarded-traffic true

az network vnet peering create \
  --name spoke-app-to-hub \
  --resource-group rg-spoke-app \
  --vnet-name vnet-spoke-app \
  --remote-vnet "/subscriptions/<sub>/resourceGroups/rg-hub-network/providers/Microsoft.Network/virtualNetworks/vnet-hub" \
  --allow-vnet-access true \
  --allow-forwarded-traffic true
```
One caveat worth flagging loudly: Bastion does not traverse a second hop. Peering is non-transitive — if a spoke is peered to the hub but the actual VM lives in a VNet peered only to that spoke, Bastion will not reach it. The reachability matrix:
| Topology | Bastion in hub reaches it? | Fix if no |
|---|---|---|
| VM in the hub VNet | Yes | — |
| VM in a spoke directly peered to the hub | Yes | — |
| VM in a spoke peered only to another spoke | No | Peer that spoke directly to the hub |
| VM behind a VNet peered to a spoke (2nd hop) | No | Direct hub peering, or use Virtual WAN routing |
| On-prem host over ExpressRoute from hub | Yes (IP-connect) | Ensure route + no NSG drop |
Connect spokes to the hub directly, or, for a genuinely meshed estate, consider Virtual WAN, where the managed hub handles the routing (see Hub-spoke vs Virtual WAN enterprise topology).
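The matrix reduces to one rule: the target VNet must be the hub itself or share a direct peering with it. A bash sketch that applies that rule to a list of peerings; the input format (one "vnetA vnetB" pair per line, each assumed configured on both sides) and the helper name bastion_reaches are assumptions, and in practice you would assemble the list from az network vnet peering list.

```bash
#!/usr/bin/env bash
# Hypothetical helper: can a Bastion in $hub reach VMs in $target?
# Reads peering pairs ("vnetA vnetB") on stdin. Peering is non-transitive,
# so only the hub itself or a directly peered VNet qualifies.
bastion_reaches() {
  local hub="$1" target="$2" a b
  if [ "$hub" = "$target" ]; then return 0; fi
  while read -r a b; do
    if { [ "$a" = "$hub" ] && [ "$b" = "$target" ]; } ||
       { [ "$b" = "$hub" ] && [ "$a" = "$target" ]; }; then
      return 0
    fi
  done
  return 1
}

edges='vnet-hub vnet-spoke-app
vnet-hub vnet-spoke-payments
vnet-spoke-payments vnet-analytics'

for v in vnet-hub vnet-spoke-app vnet-analytics; do
  if bastion_reaches vnet-hub "$v" <<<"$edges"; then
    echo "$v: reachable"
  else
    echo "$v: NOT reachable (peer it directly to the hub)"
  fi
done
```

With the sample edges, vnet-analytics, which is peered only to a spoke, is the one reported as not reachable.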
7. Hardening: NSGs, Conditional Access, and RBAC
Bastion shrinks the attack surface, but it is not a free pass. Three layers, each with its own table.
NSG on AzureBastionSubnet
Bastion requires a specific set of flows, and an over-zealous NSG will silently break it — the session simply hangs with no obvious error. The mandatory flow set, exhaustively:
| Direction | Priority (suggested) | Source | Source port | Destination | Dest port | Why |
|---|---|---|---|---|---|---|
| Inbound | 120 | Internet | * | AzureBastionSubnet | 443 | HTTPS from clients + control plane |
| Inbound | 130 | GatewayManager | * | AzureBastionSubnet | 443, 4443 | Control-plane management |
| Inbound | 140 | AzureLoadBalancer | * | AzureBastionSubnet | 443 | Health probes |
| Inbound | 150 | VirtualNetwork | * | AzureBastionSubnet | 8080, 5701 | Data-plane between instances |
| Outbound | 100 | AzureBastionSubnet | * | VirtualNetwork | 22, 3389 | Reach target VMs |
| Outbound | 110 | AzureBastionSubnet | * | AzureCloud | 443 | Dependencies (diagnostics, etc.) |
| Outbound | 120 | AzureBastionSubnet | * | VirtualNetwork | 8080, 5701 | Data-plane between instances |
| Outbound | 130 | AzureBastionSubnet | * | Internet | 80 | Session/cert validation |
```bash
az network nsg rule create -g "$RG" --nsg-name nsg-bastion \
  --name Allow-HTTPS-Inbound --priority 120 --direction Inbound --access Allow \
  --protocol Tcp --source-address-prefixes Internet \
  --destination-port-ranges 443 --destination-address-prefixes '*'

az network nsg rule create -g "$RG" --nsg-name nsg-bastion \
  --name Allow-GatewayManager-Inbound --priority 130 --direction Inbound --access Allow \
  --protocol Tcp --source-address-prefixes GatewayManager \
  --destination-port-ranges 443 4443 --destination-address-prefixes '*'

az network nsg rule create -g "$RG" --nsg-name nsg-bastion \
  --name Allow-SshRdp-Outbound --priority 100 --direction Outbound --access Allow \
  --protocol Tcp --source-address-prefixes '*' \
  --destination-port-ranges 22 3389 --destination-address-prefixes VirtualNetwork
```
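The three commands above cover only part of the mandatory flow set. A bash sketch that prints, without running, one az network nsg rule create command per row of the flow table so the full set can be reviewed first; the priorities, prefixes and ports are copied from the table, the rule names beyond the three above are hypothetical, '*' stands in for AzureBastionSubnet on the subnet side (as in the commands above, since the NSG is attached to that subnet), and --protocol Tcp on every rule simply mirrors those commands.

```bash
#!/usr/bin/env bash
# Fields: name|priority|direction|source|destination|ports (space-separated)
RULES='Allow-HTTPS-Inbound|120|Inbound|Internet|*|443
Allow-GatewayManager-Inbound|130|Inbound|GatewayManager|*|443 4443
Allow-AzureLoadBalancer-Inbound|140|Inbound|AzureLoadBalancer|*|443
Allow-BastionComms-Inbound|150|Inbound|VirtualNetwork|*|8080 5701
Allow-SshRdp-Outbound|100|Outbound|*|VirtualNetwork|22 3389
Allow-AzureCloud-Outbound|110|Outbound|*|AzureCloud|443
Allow-BastionComms-Outbound|120|Outbound|*|VirtualNetwork|8080 5701
Allow-Http-Outbound|130|Outbound|*|Internet|80'

# Prints (does not execute) the az command for each rule in $RULES.
print_bastion_nsg_rules() {
  local rg="$1" nsg="$2" name prio dir src dst ports
  while IFS='|' read -r name prio dir src dst ports; do
    echo "az network nsg rule create -g '$rg' --nsg-name '$nsg'" \
      "--name $name --priority $prio --direction $dir --access Allow" \
      "--protocol Tcp --source-address-prefixes '$src'" \
      "--destination-address-prefixes '$dst' --destination-port-ranges $ports"
  done <<<"$RULES"
}

print_bastion_nsg_rules rg-hub-network nsg-bastion
```

Reviewing the printed list against the table before piping it to a shell keeps the "no more, no less" discipline auditable.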
The NSG failure modes — match the symptom to the missing rule:
| Symptom | Missing / wrong rule | Confirm | Fix |
|---|---|---|---|
| Portal “Connect” hangs, never loads | Inbound 443 from Internet blocked | NSG effective rules | Allow 443 inbound from Internet |
| Deploy succeeds but no sessions work | Inbound 443/4443 from GatewayManager blocked | Effective rules | Allow GatewayManager 443,4443 |
| Sessions drop intermittently | Inter-instance 8080/5701 blocked | Effective rules | Allow VirtualNetwork 8080,5701 both ways |
| Connects to portal, can’t reach VM | Outbound 22/3389 to VirtualNetwork blocked | Effective rules | Allow outbound 22,3389 to VirtualNetwork |
| Flaky after a UDR change | 0.0.0.0/0 route to an NVA blackholes egress | az network nic show-effective-route-table | Exempt Bastion egress from force-tunnel |
Conditional Access
Native client and Entra-based SSH authenticate through Microsoft Entra ID, which means Conditional Access applies. Require MFA and a compliant device on the Azure management surface and you have gated every native Bastion session behind your phishing-resistant posture — without touching a single VM. What CA can enforce on the session path:
| CA control | Effect on Bastion session | Notes |
|---|---|---|
| Require MFA | Native/AAD session needs MFA | Gates the management plane |
| Require compliant / hybrid-joined device | Block sessions from unmanaged laptops | Strong control for admin access |
| Block legacy auth | Removes weak auth paths | Baseline hygiene |
| Named locations / IP ranges | Restrict where sessions originate | Combine with phishing-resistant MFA |
| Sign-in risk (Identity Protection) | Step-up or block risky sessions | Needs Entra ID P2 |
| Session controls (sign-in frequency) | Re-auth for long sessions | Limits stale-session risk |
RBAC
The ability to connect is an RBAC outcome. A user needs Reader on the Bastion, Reader on the VM, and the relevant data-plane action on the NIC; for Entra SSH/RDP they also need the VM login role. The least-privilege role set:
| Action the user needs | Role / permission | Scope | Don’t over-grant |
|---|---|---|---|
| See and use Bastion | Reader | Bastion resource | — |
| See the target VM | Reader | VM | — |
| Connect through the NIC | …/virtualNetworks/subnets/... + NIC read action (custom connect role) | RG / VM | Not Contributor |
| SSH/RDP as a user (Entra) | Virtual Machine User Login | VM | Prefer over Admin Login |
| SSH/RDP as an admin (Entra) | Virtual Machine Administrator Login | VM | Only where truly needed |
| Manage shareable links | Bastion write actions | Bastion | Restrict to platform team |
Scope Reader plus a custom connect role at the resource-group level; do not hand out Virtual Machine Administrator Login where Virtual Machine User Login will do. Grant the login role through PIM so even the connect right is itself just-in-time — see PIM for Azure resources. For the broader access model this sits inside, Microsoft Entra RBAC governance deep dive is the parent.
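To spot gaps and over-grants quickly, role names from az role assignment list --assignee <user> --all --query "[].roleDefinitionName" -o tsv can be piped into a small checker (--all widens the search to resource-group and resource scopes). A bash sketch; the helper name and its pass/fail policy are assumptions that encode the table above, and it does not look for your custom connect role because that role's name is yours.

```bash
#!/usr/bin/env bash
# Hypothetical checker: reads role names (one per line) on stdin, prints gaps
# and over-grants, and returns 1 if anything needs fixing.
check_bastion_roles() {
  local roles has_reader=0 has_login=0 rc=0
  roles="$(cat)"
  grep -qx 'Reader' <<<"$roles" && has_reader=1
  grep -qxE 'Virtual Machine (User|Administrator) Login' <<<"$roles" && has_login=1
  if [ "$has_reader" -eq 0 ]; then echo "MISSING: Reader"; rc=1; fi
  # The login role only matters for Entra SSH/RDP, so this is informational.
  if [ "$has_login" -eq 0 ]; then echo "NOTE: no VM login role (needed for Entra sign-in)"; fi
  if grep -qx 'Virtual Machine Administrator Login' <<<"$roles"; then
    echo "REVIEW: Administrator Login granted; prefer User Login"; rc=1
  fi
  if grep -qxE 'Contributor|Owner' <<<"$roles"; then
    echo "OVER-GRANT: Contributor/Owner is broader than connecting needs"; rc=1
  fi
  return "$rc"
}

printf '%s\n' Reader 'Virtual Machine User Login' | check_bastion_roles && echo OK
```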
8. Cost optimization and decommissioning the jump boxes
Bastion bills an hourly host rate plus scale-unit and data charges (the first 5 GB/month of outbound is free). The cost levers — pull these before you pay for headroom you don’t use:
| Lever | Action | Effect |
|---|---|---|
| One host in a hub | Replace N per-spoke hosts with 1 centralized | N× → 1× hourly + scale-unit |
| Right-size scale units | Set --scale-units to observed peak concurrency | Stop paying for idle instances |
| Developer SKU for sandboxes | Use the free tier where peering isn’t needed | Genuinely free |
| Kill workload public IPs | Strip standing public IPs once Bastion proven | Removes billed IPs and attack surface |
| Delete jump-box VMs | Decommission the compute you no longer run | Removes 24/7 VM + IP cost |
| Standard over Premium | Drop to Standard if you owe no recording/private-only | Lower hourly rate |
The wins, in prose: one Premium host in a hub is cheaper and safer than N Basic hosts across spokes — and the N jump-box VMs you delete were each costing compute 24/7 plus their public IPs. Right-size scale units — do not run 50 instances for a team of five; set --scale-units to observed peak concurrency. Developer SKU for sandboxes that do not need peering reach is genuinely free. And kill the public IPs — every standing public IP on a workload is a billed resource and an attack surface.
Decommission a legacy jump box methodically — the order matters so you can roll back if something was missed:
| Step | Action | Why this order |
|---|---|---|
| 1 | Stand up Bastion (right SKU, subnet, NSG, scale units) | The replacement must exist first |
| 2 | Grant RBAC + (optionally) Entra VM login to users | They can’t cut over without access |
| 3 | Cut users over to Bastion for daily access | Prove the new path under real use |
| 4 | Confirm BastionAuditLogs shows them connecting | Evidence the path works before you remove the old one |
| 5 | Remove the jump box’s NSG inbound rules | Close the internet path |
| 6 | Dissociate the public IP from the VM NIC | Reversible if you missed a workflow |
| 7 | Delete the public IP, then the jump-box VM | Final cleanup once proven |
```bash
# Strip the public IP off a VM NIC once Bastion access is proven
az network nic ip-config update \
  --resource-group rg-app --nic-name nic-jumpbox-01 \
  --name ipconfig1 --remove publicIpAddress
az network public-ip delete -g rg-app --name pip-jumpbox-01
```
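Step 4 is the gate for the destructive steps, so it is worth scripting rather than eyeballing. A bash sketch that refuses to print the step 6 and 7 public-IP commands until Bastion sessions have been observed; the session count is passed in (for example from a BastionAuditLogs query you run for the migrated users), and the helper name and default threshold of one session are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical gate: prints the public-IP teardown commands only when at least
# $min Bastion sessions were seen; otherwise refuses and returns non-zero.
decom_jumpbox_plan() {
  local sessions="$1" rg="$2" nic="$3" pip="$4" min="${5:-1}"
  local re='^[0-9]+$'
  if [[ ! "$sessions" =~ $re ]] || [ "$sessions" -lt "$min" ]; then
    echo "refusing: '$sessions' Bastion sessions seen, need >= $min" >&2
    return 1
  fi
  # Step 6 (reversible dissociation) comes before step 7 (deletion).
  echo "az network nic ip-config update --resource-group $rg --nic-name $nic --name ipconfig1 --remove publicIpAddress"
  echo "az network public-ip delete -g $rg --name $pip"
}

decom_jumpbox_plan 37 rg-app nic-jumpbox-01 pip-jumpbox-01
```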
Architecture at a glance
The diagram traces a native-client connection as it actually flows, left to right, and marks the four hops where a session most commonly breaks. Read it this way: an engineer’s CLI (or the portal) opens a session over TLS 443 — the only inbound surface — which is gated first by Microsoft Entra (RBAC, Conditional Access, PIM) before anything reaches the network. The request lands on the Bastion host in AzureBastionSubnet (/26, Standard SKU, tunneling on, scale units sized to concurrency), whose NSG must permit the precise 443/4443 inbound and 22/3389 outbound flow set or the session hangs silently. From the subnet, Bastion brokers over the VM’s private IP — across VNet peering (forwarded-traffic on, single hop only) when the target is a spoke — to the workload VM, which now carries no public IP and keeps 22/3389 shut except when Defender JIT opens them to the broker subnet alone. The control path also fans to storage (session recordings, immutable + CMK on Premium) and Log Analytics (BastionAuditLogs), which is how the loop closes with your auditor.
The numbered badges sit on the failure-prone hops: an Entra/RBAC denial before connect, the NSG flow set that breaks the broker, the non-transitive peering hop that strands a spoke VM, and the JIT/private-IP contract on the target. The legend narrates each as symptom · how to confirm · fix — the same method as the rest of this guide: localise the break to a hop, confirm with the named command, apply the fix. The first question on any hung session is always “did identity allow it, and does the NSG let the broker through?”
Real-world scenario
A payments platform team I worked with — call them Meridian Pay — ran a hub-and-spoke estate across three regions with roughly 400 VMs, and PCI-DSS forced two hard constraints: no workload VM may carry a public IP, and every interactive admin session must be recorded and retained. Their interim state was four per-region jump boxes — internet-facing, RDP open behind NSGs, each on a Standard_D2s_v5 costing ~₹9,000/month plus its public IP — and the QSA flagged them as in-scope cardholder-data-environment (CDE) ingress with no session evidence. Twelve jump boxes across three regions, twelve public IPs, twelve hosts to patch, and a finding that would not close.
We collapsed all four jump boxes per region into a single Premium Bastion in each regional hub (Premium for the session-recording requirement) with --scale-units 8 to cover ~120 concurrent operators per region during a release window. The spokes were already peered to the hub, so the only networking change was confirming --allow-forwarded-traffic true on both sides of each peering — no new subnets beyond the three AzureBastionSubnet /26s in the hubs. Session recordings were written to a storage account with a time-based immutability (WORM) policy and customer-managed keys, satisfying the tamper-evidence requirement, and BastionAuditLogs streamed to a central Log Analytics workspace gave the QSA the connection ledger they wanted, queryable by user, target and time.
The sharp edge was that the QSA also required admin ports stay closed except during approved access — recording alone was not enough. We wired Defender for Cloud JIT so the NSG kept 22/3389 shut, and the JIT grant opened them only to the hub’s AzureBastionSubnet range, never to a public source. Because Bastion brokers from inside the VNet, the source prefix on the JIT rule was the subnet, not an engineer’s roaming IP:
```bash
az security jit-policy initiate \
  --resource-group rg-spoke-payments --location eastus --name default \
  --vm-id "/subscriptions/<sub>/resourceGroups/rg-spoke-payments/providers/Microsoft.Compute/virtualMachines/vm-pay-07" \
  --ports '[{"number":3389,"duration":"PT1H","allowedSourceAddressPrefix":"10.0.255.0/26"}]'
```
One non-transitive-peering trap nearly bit them: a late-discovered “analytics” VNet was peered only to the payments spoke, not the hub, so the hub Bastion could not reach its two VMs — sessions just timed out with no error. The fix was a direct hub↔analytics peering with forwarded traffic, after which the host reached them immediately. The lesson went on the runbook: “If a VM is two peering hops from the hub, Bastion can’t see it — peer it directly.”
The net result: zero public IPs on workloads, ports closed by default and opened just-in-time to the broker subnet alone, full session video retained immutably, and twelve internet-facing jump boxes deleted across three regions. The CDE-ingress finding closed at the next assessment, the standing public-IP cost went with it, and the monthly spend dropped — three Premium Bastions cost meaningfully less than twelve always-on jump-box VMs plus their IPs and the patching toil around them. The before/after:
| Dimension | Before (jump boxes) | After (Bastion) |
|---|---|---|
| Internet-facing hosts | 12 (4 × 3 regions) | 0 |
| Public IPs on the access path | 12 | 0 (Premium private-only) |
| Workload public IPs | several | 0 |
| Admin port exposure | 3389 open behind NSG 24/7 | Closed; JIT opens to broker subnet only |
| Session evidence | None (host-local logs) | Immutable CMK video + central audit log |
| Hosts to patch | 12 | 3 (managed PaaS) |
| QSA finding | Open (CDE ingress) | Closed |
Advantages and disadvantages
Bastion’s managed-broker-inside-the-VNet model both removes a class of risk and introduces a few sharp edges. Weigh it honestly:
| Advantages (why this model helps you) | Disadvantages (why it bites) |
|---|---|
| No public IP or open 3389/22 on workloads — the whole exposure class disappears | The Bastion host itself is a public surface on 443 unless you pay for Premium private-only |
| Agentless and managed — Microsoft patches the broker, not you | You give up the simplicity (and the cost) of a single VM you fully control |
| Native client tunneling runs real scp/Ansible/mstsc — not just an HTML5 canvas | Tunneling needs Standard+ and an explicit flag and the SSH CLI extension — easy to miss |
| Shareable links grant vendor access with no Azure account or NSG hole | A standing shareable link is a standing exposure; you must time-box and revoke |
| Sessions authenticate through Entra → Conditional Access + PIM apply estate-wide | The SKU is a one-way ratchet; a wrong choice means delete-and-redeploy |
| Premium records sessions to immutable CMK storage for audit | Recording is graphical video, not a keystroke/text log — pair with guest auditing for text |
| One host in a hub serves all directly-peered spokes | Peering is non-transitive — a second hop strands the target with a silent timeout |
| BastionAuditLogs gives a central, queryable connection ledger | Defaults are unsafe: tunneling off, no NSG tuning, broad RBAC — you must turn the knobs |
The model is right for any estate that wants no public IPs on workloads and identity-governed, auditable admin access — which is most regulated and most security-mature shops. It is overkill for a single throwaway dev VNet where the free Developer SKU or even a short-lived jump box suffices. The disadvantages are all manageable, but only if you know they exist: the SKU ratchet, the explicit tunneling flag, the NSG flow set, the non-transitive hop, and the standing-link risk are exactly the things defaults will not handle for you.
Hands-on lab
Stand up a Standard Bastion, connect to a Linux VM with native tunneling, prove scp works, then tear it all down. Free-tier-friendly where possible (the VM is a small B1s; Bastion Standard bills hourly, so delete at the end). Run in Cloud Shell (Bash).
Step 1 — Variables and resource group.
```bash
RG=rg-bastion-lab
LOC=eastus
VNET=vnet-lab
BASTION=bastion-lab
VM=vm-linux-lab
az group create -n $RG -l $LOC -o table
```
Step 2 — VNet with a workload subnet and the mandatory AzureBastionSubnet (/26).
```bash
az network vnet create -g $RG -n $VNET --address-prefixes 10.0.0.0/16 \
  --subnet-name snet-workload --subnet-prefixes 10.0.1.0/24 -o table
az network vnet subnet create -g $RG --vnet-name $VNET \
  --name AzureBastionSubnet --address-prefixes 10.0.255.0/26 -o table
```
Expected: the VNet plus two subnets; the Bastion subnet named exactly AzureBastionSubnet.
Step 3 — A Linux VM with NO public IP (the whole point).
```bash
az vm create -g $RG -n $VM --image Ubuntu2204 --size Standard_B1s \
  --vnet-name $VNET --subnet snet-workload \
  --public-ip-address "" \
  --admin-username azureuser --generate-ssh-keys -o table
```
--public-ip-address "" ensures the VM is private-only. Expected: a VM with a private IP and "publicIpAddress": "".
Step 4 — Standard public IP for Bastion (Static), then the Bastion host with tunneling on.
```bash
az network public-ip create -g $RG -n pip-bastion-lab \
  --sku Standard --allocation-method Static -o table
az network bastion create -g $RG -n $BASTION \
  --vnet-name $VNET --public-ip-address pip-bastion-lab \
  --sku Standard --scale-units 2 --enable-tunneling true -o table
```
Bastion takes ~5–10 minutes to provision. Expected when done: "sku": {"name": "Standard"}, "enableTunneling": true.
Step 5 — Open a native tunnel and prove scp through the broker.
```bash
# In one terminal: localhost:50022 -> VM:22 through Bastion (leave it running)
VMID=$(az vm show -g $RG -n $VM --query id -o tsv)
az network bastion tunnel -n $BASTION -g $RG \
  --target-resource-id "$VMID" --resource-port 22 --port 50022

# In a second Cloud Shell tab: standard ssh + scp, no public IP anywhere
ssh -p 50022 azureuser@127.0.0.1 'hostname && echo connected-via-bastion'
echo "hello from bastion" > /tmp/proof.txt
scp -P 50022 /tmp/proof.txt azureuser@127.0.0.1:/tmp/proof.txt
ssh -p 50022 azureuser@127.0.0.1 'cat /tmp/proof.txt'
```
Expected: the VM hostname prints, connected-via-bastion, and hello from bastion round-trips back — a file copied to a VM that has no public IP and no inbound 22 from the internet.
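Step 5 has a race: the tunnel needs a few seconds before it listens, and an ssh fired too early fails with "connection refused". A bash sketch that polls the local port before you connect; it relies on bash's /dev/tcp redirection and the coreutils timeout command, and the helper name wait_for_port and its 30-second default are assumptions.

```bash
#!/usr/bin/env bash
# Polls 127.0.0.1:<port> about once per second until something accepts a TCP
# connection (the Bastion tunnel is listening) or the attempts run out.
wait_for_port() {
  local port="$1" attempts="${2:-30}" i
  for ((i = 0; i < attempts; i++)); do
    # Opening /dev/tcp/host/port in a child bash attempts a TCP connect.
    if timeout 1 bash -c "exec 3<>/dev/tcp/127.0.0.1/$port" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage in the second tab, before the ssh/scp lines:
#   wait_for_port 50022 && ssh -p 50022 azureuser@127.0.0.1 hostname
```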
Step 6 — Turn on the audit ledger (optional but instructive).
```bash
LAW=$(az monitor log-analytics workspace create -g $RG -n law-bastion-lab --query id -o tsv)
az monitor diagnostic-settings create --name diag-bastion \
  --resource "$(az network bastion show -g $RG -n $BASTION --query id -o tsv)" \
  --logs '[{"category":"BastionAuditLogs","enabled":true}]' \
  --workspace "$LAW"
# Reconnect once, wait a few minutes, then query BastionAuditLogs in the workspace.
```
Step 7 — Teardown (do this — Bastion bills hourly).
```bash
az group delete -n $RG --yes --no-wait
```
The lab teardown checklist, so nothing is left billing:
| Resource | Bills while it exists? | Removed by group delete? |
|---|---|---|
| Bastion host (Standard) | Yes (hourly + scale units) | Yes |
| Public IP (Standard) | Yes (hourly) | Yes |
| Linux VM (B1s) | Yes (compute) | Yes |
| OS disk (managed) | Yes (storage) | Yes |
| VNet + subnets | No | Yes |
| Log Analytics workspace | Yes (ingestion/retention) | Yes |
Common mistakes & troubleshooting
The failure modes that actually page you, as a symptom→root-cause→confirm→fix playbook. Scan the matrix, then read the detail for whichever row matches.
| # | Symptom | Root cause | Confirm (exact command / portal path) | Fix |
|---|---|---|---|---|
| 1 | Bastion won’t deploy | Subnet not named AzureBastionSubnet or smaller than /26 | az network vnet subnet show -g <rg> --vnet-name <vnet> -n AzureBastionSubnet --query addressPrefix | Recreate subnet: exact name, /26+ |
| 2 | “Connect” hangs in the portal | NSG blocks inbound 443 from Internet/GatewayManager | NSG effective rules on the subnet | Add the mandatory inbound flow set (Section 7) |
| 3 | az network bastion ssh/tunnel errors | Tunneling not enabled, or SKU is Basic | az network bastion show --query "{sku:sku.name,tun:enableTunneling}" | --enable-tunneling true on Standard+ |
| 4 | Portal works, but can’t reach the VM | NSG blocks outbound 22/3389 to VirtualNetwork | Effective rules; az network nic show-effective-route-table | Allow outbound 22/3389 to VirtualNetwork |
| 5 | Spoke VM unreachable, times out | Non-transitive peering (2nd hop) or no forwarded traffic | az network vnet peering list; check both flags | Direct hub peering + --allow-forwarded-traffic true |
| 6 | “Authorization failed” before connect | Missing RBAC (Reader on Bastion/VM or NIC action) | az role assignment list --assignee <user> | Grant Reader + custom connect role |
| 7 | Entra SSH fails, key prompt instead | Missing AADSSHLoginForLinux extension or VM login role | az vm extension list; check role assignments | Install extension + grant VM User Login |
| 8 | Shareable link 404s / won’t connect | Link revoked, or VM’s local creds wrong | az network bastion list-shareable-link | Recreate link; verify the VM’s local credentials |
| 9 | No BastionAuditLogs rows | Diagnostic setting missing or wrong category | az monitor diagnostic-settings list --resource <bastion-id> | Create setting with BastionAuditLogs enabled |
| 10 | Session recording empty (Premium) | Storage target misconfigured / SAS/permission issue | Session Recording blade; storage container | Fix storage target, identity, container access |
| 11 | Sessions drop randomly mid-work | Inter-instance 8080/5701 blocked, or zone event | Effective rules; Resource Health | Allow VirtualNetwork 8080/5701; deploy zone-redundant |
| 12 | Flaky right after a routing change | 0.0.0.0/0 UDR to an NVA blackholes Bastion egress | az network nic show-effective-route-table | Exempt Bastion egress from force-tunnel |
Mistake 1 — The subnet is wrong
The single most common deploy blocker. The subnet must be named exactly AzureBastionSubnet and be /26 or larger. A typo’d name, a /27, or other resources in the subnet all stop the deploy.
Confirm. az network vnet subnet show -g $RG --vnet-name $VNET -n AzureBastionSubnet --query addressPrefix -o tsv — if this errors, the name is wrong; if it returns a /27 or smaller, the size is wrong. Fix: recreate the subnet with the exact name and /26. You cannot grow a /27 in place into a usable Bastion subnet; delete and recreate.
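Both checks in the confirm step (exact name, prefix length of 26 or shorter) are mechanical, so a pre-flight function can catch them before a failed deploy. A bash sketch; the function name is hypothetical and it validates only the name and prefix length, not address overlap or whether the subnet is empty.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight: succeeds only if the subnet is named exactly
# AzureBastionSubnet and its prefix is /26 or larger (prefix length <= 26).
bastion_subnet_ok() {
  local name="$1" cidr="$2" len
  local re='^[0-9]+$'
  if [ "$name" != "AzureBastionSubnet" ]; then
    echo "bad name: '$name' (must be exactly AzureBastionSubnet)" >&2; return 1
  fi
  len="${cidr##*/}"                       # text after the last '/'
  if [[ ! "$len" =~ $re ]] || [ "$cidr" = "$len" ]; then
    echo "not a CIDR: $cidr" >&2; return 1
  fi
  if [ "$len" -gt 26 ]; then
    echo "too small: /$len (need /26 or larger)" >&2; return 1
  fi
  echo "ok: $name $cidr"
}

bastion_subnet_ok AzureBastionSubnet 10.0.255.0/26
```

Feed it the name you plan to use and the value returned by the confirm command above, and stop the pipeline on a non-zero exit.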
Mistake 2 — The NSG silently breaks the broker
Bastion needs the precise inbound flow set (443 from Internet, 443/4443 from GatewayManager, probes from AzureLoadBalancer, inter-instance 8080/5701 from VirtualNetwork). An NSG that allows less leaves sessions hanging with no clear error.
Confirm. On the subnet’s NSG, check effective rules (portal: subnet → NSG → Effective rules), or az network nsg rule list. Fix: add the mandatory inbound and outbound rules from Section 7. The give-away is that deploy succeeded but no session ever lands — control-plane flows (GatewayManager 443/4443) are blocked.
Mistake 3 — Native subcommands fail on the right-looking host
az network bastion ssh/tunnel/rdp need Standard+ and the tunneling flag and the Azure CLI SSH extension. People deploy Standard, forget --enable-tunneling true, and the subcommands error.
Confirm. az network bastion show -g $RG -n $BASTION --query "{sku:sku.name, tunneling:enableTunneling}" -o json — you want Standard/Premium and true. Fix: az network bastion update -g $RG -n $BASTION --enable-tunneling true and az extension add --name ssh.
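The confirm query can be turned into a check that fails loudly. A bash sketch that takes the SKU name and the tunneling flag as two arguments (for example from two separate --query sku.name and --query enableTunneling calls); it lowercases the flag because CLI output may render the boolean as True or true, needs bash 4+ for that, and the helper name is hypothetical. It cannot see whether the ssh extension is installed.

```bash
#!/usr/bin/env bash
# Hypothetical readiness check for native-client subcommands: the SKU must be
# Standard or Premium and tunneling must be enabled.
tunneling_ready() {
  local sku="$1" tun="${2,,}"              # normalise True/true to true
  case "$sku" in
    Standard|Premium) ;;
    *) echo "SKU $sku cannot tunnel; upgrade (one-way) to Standard+" >&2; return 1 ;;
  esac
  if [ "$tun" != "true" ]; then
    echo "tunneling is off: run az network bastion update ... --enable-tunneling true" >&2
    return 1
  fi
  echo "ready: $sku with tunneling"
}

# e.g. sku=$(az network bastion show -g "$RG" -n "$BASTION" --query sku.name -o tsv)
#      tun=$(az network bastion show -g "$RG" -n "$BASTION" --query enableTunneling -o tsv)
tunneling_ready Standard True
```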
Mistake 5 — The non-transitive peering hop
A spoke peered to the hub is reachable; a VM two hops away (in a VNet peered only to a spoke) is not — and it fails as a silent timeout, which sends people hunting the wrong layer for an hour.
Confirm. az network vnet peering list -g <rg> --vnet-name <vnet> -o table — verify the target’s VNet is peered directly to the hub and both sides have allowForwardedTraffic: true. Fix: create a direct hub↔target peering with forwarded traffic, or move meshed routing to Virtual WAN.
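To check both flags across every peering at once, project them to TSV, one row per peering, with az network vnet peering list -g <rg> --vnet-name <vnet> --query "[].[name, allowVirtualNetworkAccess, allowForwardedTraffic]" -o tsv, and filter. A bash sketch that reads those rows on stdin and prints any peering missing either flag; the helper name is hypothetical, the sample input stands in for real CLI output, and it needs bash 4+ for the case-insensitive comparison.

```bash
#!/usr/bin/env bash
# Reads "name<TAB>allowVirtualNetworkAccess<TAB>allowForwardedTraffic" rows and
# reports peerings that would strand Bastion traffic. Returns 1 if any do.
audit_peerings() {
  local name access fwd bad=0
  while IFS=$'\t' read -r name access fwd; do
    [ -z "$name" ] && continue             # skip blank lines
    if [ "${access,,}" != "true" ] || [ "${fwd,,}" != "true" ]; then
      echo "FIX $name: access=$access forwarded=$fwd"
      bad=1
    fi
  done
  return "$bad"
}

printf 'hub-to-spoke-app\tTrue\tTrue\nspoke-app-to-hub\tTrue\tFalse\n' \
  | audit_peerings || echo "peering fixes needed"
```

Run it on both the hub and each spoke VNet, since each side of a peering carries its own flags.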
Mistake 9 — The audit ledger is empty
BastionAuditLogs only flow if a diagnostic setting routes them. No setting, no rows — and you discover this when the auditor asks for evidence you never captured.
Confirm. az monitor diagnostic-settings list --resource $(az network bastion show -g $RG -n $BASTION --query id -o tsv) -o json. Fix: create the setting with the BastionAuditLogs category enabled (Section 5), then reconnect once and wait a few minutes for ingestion.
Best practices
- Pick the SKU deliberately and once. Standard is the platform baseline; go Premium only for session recording or private-only. Remember it is a one-way upgrade — get it right before you build around it.
- Subnet /26, exact name, nothing else. Lay AzureBastionSubnet at /26 from day one so scale units have room; never co-locate other resources.
- Deploy zone-redundant. Pin instances across zones 1/2/3 where supported; zones are immutable, and a zonal event must not sever your break-glass access.
- Enable tunneling at create time. Set --enable-tunneling true so native ssh/tunnel/rdp work — the single most-forgotten flag.
- Size scale units to peak concurrency, not VM count. ~20 RDP / ~40 SSH per unit; right-size and stop paying for idle instances.
- One host in a hub, not one per VNet. Centralize in the hub, peer spokes directly with forwarded traffic, and never rely on a transitive second hop.
- Tune the NSG to the exact flow set. Allow precisely the required 443/4443/8080/5701 inbound and 22/3389/8080/5701/443/80 outbound — no more, no less.
- Authenticate with Entra where you can. Prefer --auth-type AAD so Conditional Access and PIM govern the session and there are no SSH keys to leak.
- Time-box every shareable link. Scope per-VM, set a revocation reminder or automation, and audit standing links regularly.
- Pair Bastion with Defender JIT. Keep 22/3389 closed in the NSG and open them just-in-time to the AzureBastionSubnet range only.
- Stream BastionAuditLogs and (Premium) record sessions. Send the ledger to Log Analytics and write recordings to immutable, CMK-encrypted storage.
- Decommission jump boxes in order. Prove Bastion access in the audit log before you dissociate and delete any public IP, so a rollback is always possible.
Security notes
Bastion is itself a security control, so harden it as one. Network isolation: the broker lives in the VNet and reaches targets over private IPs — strip every workload public IP and keep 22/3389 shut, opening them only via JIT to the broker subnet; on Premium, run the host private-only so even 443 is not internet-facing. Identity is the real perimeter: the right to connect is RBAC, so grant least privilege (Reader + a custom connect role, Virtual Machine User Login over Admin Login), put the login role behind PIM, and gate native sessions with Conditional Access requiring MFA and a compliant device. Encryption and evidence: Bastion brokers RDP/SSH over TLS, and on Premium session recordings should land in storage with a time-based immutability (WORM) policy and customer-managed keys so the audit trail cannot be altered after the fact. Least exposure for third parties: prefer time-boxed shareable links over any public-IP/NSG hole, scope them to one VM, and revoke on completion. Audit everything: BastionAuditLogs to a central workspace gives the who/when/what ledger; alert on off-hours or unexpected-source connections. This fits the broader Azure Zero Trust multilayer security model — “no public IPs, identity-governed, audited access” is precisely the network-and-access pillar of Zero Trust.
The security-control checklist, each with its lever:
| Control objective | Bastion lever | Verify with |
|---|---|---|
| No public IP on workloads | Private-IP brokering; strip VM IPs | az network nic ip-config show |
| No public IP on the broker | Premium private-only deployment | Bastion config |
| Management ports closed by default | Defender JIT to subnet range | NSG rules; JIT policy |
| Least-privilege connect | Reader + custom role; VM User Login | az role assignment list |
| Just-in-time elevation | PIM on the VM login role | PIM blade |
| Phishing-resistant session access | Conditional Access (MFA + compliant device) | CA policy |
| Tamper-evident session evidence | Premium recording + WORM + CMK | Storage immutability policy |
| Central audit ledger | BastionAuditLogs to Log Analytics | Diagnostic settings |
Cost & sizing
Bastion bills three things: an hourly host rate (per SKU), a per-scale-unit hourly rate above the included instances, and outbound data (first 5 GB/month free). The host runs 24/7 once deployed (it does not auto-pause), so the dominant lever is whether you need a host in this VNet at all, a question that centralizing in the hub answers. What drives the bill:
| Cost driver | Scales with | Lever to control it |
|---|---|---|
| Hourly host rate | SKU tier (Basic < Standard < Premium) | Use the lowest SKU that meets requirements |
| Scale-unit hours | --scale-units above the included count | Right-size to peak concurrency |
| Outbound data | GB transferred (after 5 GB free) | Usually negligible for admin sessions |
| Number of hosts | One per VNet vs one per hub | Centralize: one host serves all spokes |
| Workload public IPs (saved) | IPs you delete | Stripping them reduces cost |
Rough figures (illustrative; check the Azure pricing calculator for your region). The right-sizing rule: pick the SKU by feature need and the scale units by peak concurrent sessions ÷ ~20 (RDP) or ÷ ~40 (SSH), rounded up, plus one unit for headroom.
| Scenario | SKU | Scale units | Rough order of monthly cost | Note |
|---|---|---|---|---|
| Personal dev sandbox | Developer | n/a | Free | No peering; single session |
| Small team, one VNet, browser-only | Basic | 2 (fixed) | Low (≈ a small VM) | No native client |
| Platform baseline, a few spokes | Standard | 2–4 | Moderate | Native client, links, IP-connect |
| Regional hub, release windows | Standard/Premium | 8 | Higher (Premium adds recording) | ~160 concurrent RDP |
| Regulated estate, recorded sessions | Premium | 8–10 | Highest host rate + storage | WORM + CMK storage adds a little |
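The right-sizing rule translates directly into code. This minimal Python sketch:
1. divides peak concurrency by the ~20 RDP / ~40 SSH per-instance figures from the text;
2. rounds up and adds one unit of headroom;
3. respects the Standard/Premium range of 2 to 50 instances.

Summing mixed RDP and SSH load as fractional instances is an assumption of this sketch, not a published formula.

```python
import math

# Per-instance rules of thumb from the text.
RDP_PER_UNIT = 20
SSH_PER_UNIT = 40
# Standard/Premium host scaling range (Basic is fixed at 2 instances).
MIN_UNITS, MAX_UNITS = 2, 50


def scale_units_for(peak_rdp, peak_ssh=0, headroom=1):
    """Scale units for a Standard/Premium host at the given peak concurrency.

    Mixed RDP + SSH load is summed as fractional instances (an assumption).
    """
    if peak_rdp < 0 or peak_ssh < 0:
        raise ValueError("session counts must be non-negative")
    load = peak_rdp / RDP_PER_UNIT + peak_ssh / SSH_PER_UNIT
    units = max(MIN_UNITS, math.ceil(load) + headroom)
    if units > MAX_UNITS:
        raise ValueError(f"{units} units exceeds the {MAX_UNITS}-unit ceiling of one host")
    return units
```

Take the 200-operator release window from the introduction: `scale_units_for(200)` returns 11, the value you would pass as `--scale-units`.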
The savings side is easy to forget and often net-positive: deleting N jump-box VMs (each a 24/7 D2s-class VM plus its public IP) and stripping workload public IPs frequently outweighs the Bastion host cost, especially when one centralized host replaces several jump boxes. Free-tier note: the Developer SKU is genuinely free but cannot traverse peering — it is a sandbox tool, not platform infrastructure. For the broader picture of right-sizing shared platform services, see Azure FinOps & cost management at scale.
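The net-positive claim is easy to check for your own estate. This Python sketch has two helpers. Every cost is a hypothetical monthly figure you supply, and a jump box's cost should include its own public IP and disk.

```python
import math


def net_monthly_change(bastion_cost, jumpbox_cost, jumpboxes_removed,
                       public_ip_cost, public_ips_removed):
    """Bastion cost minus what it lets you delete. Negative means a net saving."""
    saved = jumpboxes_removed * jumpbox_cost + public_ips_removed * public_ip_cost
    return bastion_cost - saved


def jumpboxes_to_break_even(bastion_cost, jumpbox_cost,
                            public_ip_cost=0.0, public_ips_removed=0):
    """Smallest number of retired jump boxes that makes Bastion cost-neutral or better."""
    remaining = bastion_cost - public_ips_removed * public_ip_cost
    if remaining <= 0:
        return 0
    return math.ceil(remaining / jumpbox_cost)
```

Suppose Bastion costs a placeholder 300 and each jump box costs 100. Retiring three jump boxes already breaks even, before counting any workload public IPs you strip.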
Interview & exam questions
Q1. Why does an Azure Bastion deployment require a subnet named exactly AzureBastionSubnet, and what’s the minimum size?
The platform identifies the subnet by that literal name — it will not deploy into a differently named subnet. The minimum size for any Bastion created on or after 2 November 2021 is /26; smaller (e.g. /27) is rejected, and a grandfathered /27 cannot scale host instances. Maps to AZ-700 / AZ-500 networking objectives.
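Both Q1 rules are mechanical, so a pre-deployment check can enforce them. This Python sketch uses the standard `ipaddress` module. Its exact, case-sensitive name comparison mirrors the word "exactly" in the answer; how strictly you lint the name is your choice.

```python
from ipaddress import ip_network

REQUIRED_NAME = "AzureBastionSubnet"
MAX_PREFIX_LEN = 26  # "/26 or larger" means a prefix length of 26 or less


def validate_bastion_subnet(name, cidr):
    """Return a list of problems; an empty list means the subnet passes."""
    problems = []
    if name != REQUIRED_NAME:
        problems.append(f"subnet must be named exactly {REQUIRED_NAME!r}, got {name!r}")
    net = ip_network(cidr, strict=True)  # raises ValueError if host bits are set
    if net.prefixlen > MAX_PREFIX_LEN:
        problems.append(f"{cidr} is smaller than /26 ({net.num_addresses} addresses)")
    return problems
```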
Q2. You deployed Standard Bastion but az network bastion ssh fails. What’s the most likely cause?
Tunneling is not enabled. Native client subcommands need Standard+ and --enable-tunneling true (plus the Azure CLI SSH extension). Set the flag with az network bastion update --enable-tunneling true and confirm enableTunneling: true.
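Q2's two host-side preconditions (SKU and flag) can be read straight off the host's configuration. This Python sketch assumes the JSON shape of `az network bastion show -o json`: a `sku.name` string and a top-level `enableTunneling` boolean. Confirm both against your own output. The sketch does not check for the CLI extension, because that lives on the client.

```python
import json

TUNNELING_SKUS = {"Standard", "Premium"}


def native_client_blockers(bastion_show_json):
    """List reasons native client sessions would fail, based on host config only."""
    bastion = json.loads(bastion_show_json)
    sku = (bastion.get("sku") or {}).get("name")
    blockers = []
    if sku not in TUNNELING_SKUS:
        blockers.append(f"SKU {sku!r} lacks native client support; Standard or Premium is required")
    if not bastion.get("enableTunneling"):
        blockers.append("enableTunneling is not true; run az network bastion update --enable-tunneling true")
    return blockers
```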
Q3. Can you downgrade a Bastion from Premium to Standard?
No. SKU changes are upgrade-only (Basic→Standard→Premium). To move to a lower tier you must delete and redeploy, which is why the SKU choice must be deliberate up front.
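The upgrade-only rule is a one-way ordering. This Python sketch covers the three tiers the answer names. Developer is left out because its upgrade paths are not stated here.

```python
SKU_ORDER = ["Basic", "Standard", "Premium"]  # ascending; changes may only move right


def sku_change_plan(current, target):
    """Classify a SKU change as a no-op, an in-place upgrade, or a redeploy."""
    cur, tgt = SKU_ORDER.index(current), SKU_ORDER.index(target)
    if tgt == cur:
        return "no-op"
    if tgt > cur:
        return "in-place upgrade"
    return "delete and redeploy"
```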
Q4. A vendor with no Azure account needs RDP to one staging VM for two days. What’s the right Bastion feature, and what does it authenticate against?
A shareable link (Standard+), scoped to that one VM. It authenticates against the target VM’s own credentials (local username/password or key), not Entra. Time-box it and revoke with delete-shareable-link when the engagement ends.
Q5. Your hub Bastion can’t reach a VM in a spoke. The spoke is peered, but only to another spoke. Why does it fail, and how do you fix it?
VNet peering is non-transitive — Bastion does not traverse a second hop. The VM’s VNet must be peered directly to the hub (with allowForwardedTraffic on both sides), or you move meshed routing to Virtual WAN. The failure presents as a silent timeout.
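Non-transitivity means a hub Bastion's reach is a single hop, not a graph search. This Python sketch models each peering as one directional link, since Azure creates a peering object on each side. Following the answer above, it requires `allowForwardedTraffic` on both sides. The VNet names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peering:
    local: str
    remote: str
    allow_forwarded_traffic: bool


def bastion_reachable_vnets(hub, peerings):
    """VNets a hub Bastion can reach: the hub plus directly peered VNets.

    No traversal beyond one hop, because VNet peering is non-transitive.
    """
    links = {(p.local, p.remote): p.allow_forwarded_traffic for p in peerings}
    reachable = {hub}
    for (local, remote), forwarded in links.items():
        # Require the hub-side link and the reverse link, both allowing forwarded traffic.
        if local == hub and forwarded and links.get((remote, hub)):
            reachable.add(remote)
    return reachable
```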
Q6. How do you keep a VM’s management ports closed yet still let Bastion connect?
Pair Bastion with Defender for Cloud JIT: the NSG denies 22/3389 by default, and the JIT grant opens them time-boxed to the AzureBastionSubnet CIDR only — never a roaming public IP — because Bastion brokers from inside the VNet.
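A JIT grant is only correct if its source sits inside the broker subnet. This Python sketch checks a rule's source prefix against the `AzureBastionSubnet` CIDR. The example prefixes are illustrative. Service-tag sources such as `Internet` or `*` are rejected because they are not CIDRs.

```python
from ipaddress import ip_network


def jit_source_ok(source_prefix, bastion_subnet):
    """True when the JIT grant's source is contained in AzureBastionSubnet."""
    try:
        src = ip_network(source_prefix, strict=False)
    except ValueError:
        # Service tags and wildcards are not CIDRs, so they can never be the broker subnet.
        return False
    broker = ip_network(bastion_subnet)
    # subnet_of() raises TypeError across IP versions, so compare versions first.
    return src.version == broker.version and src.subnet_of(broker)
```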
Q7. Which SKU is required for session recording, and where do recordings land?
Premium. On disconnect, recordings are written as video to a blob container in your storage account via a SAS URL; you replay them from the Session Recording blade. Harden the storage with a WORM immutability policy and customer-managed keys.
Q8. What’s the minimum RBAC for a user to connect to a VM through Bastion?
Reader on the Bastion, Reader on the VM, and the NIC data-plane action (typically via a custom connect role). For Entra-based SSH/RDP, add Virtual Machine User Login (or Administrator Login only where truly needed). Prefer granting the login role through PIM.
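The minimum-RBAC answer can be expressed as a set difference. This Python sketch makes three simplifications:
- The custom connect role name (`Bastion NIC Connect`) and the resource IDs are hypothetical.
- Assignments are matched at their exact scope only; inheritance from a resource group or subscription is not modelled.
- Administrator Login is accepted in place of User Login, because it also permits sign-in.

```python
def missing_bastion_roles(assignments, bastion_id, vm_id, nic_id,
                          connect_role="Bastion NIC Connect", entra_login=True):
    """Return the (role, scope) pairs one user still lacks.

    `assignments` is an iterable of (role_name, scope) tuples. Scopes are
    compared exactly; inherited assignments are not modelled.
    """
    have = set(assignments)
    required = {
        ("Reader", bastion_id),
        ("Reader", vm_id),
        (connect_role, nic_id),  # hypothetical custom role carrying the NIC action
    }
    if entra_login and ("Virtual Machine Administrator Login", vm_id) not in have:
        # User Login is the least-privilege choice for Entra-based sign-in.
        required.add(("Virtual Machine User Login", vm_id))
    return required - have
```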
Q9. How does native client tunneling differ from the browser session, and which command gives a raw TCP tunnel?
The browser is an HTML5 canvas; native tunneling opens a local connection through the broker for real scp/Ansible/mstsc. az network bastion tunnel gives a raw local TCP tunnel to any target port that any client can use; ssh and rdp are higher-level conveniences.
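Because `az network bastion tunnel` exposes a plain local TCP port, scripts that drive `scp` or Ansible through it usually wait for that port before starting. This Python sketch shows that wait. Each probe briefly opens and closes a connection, and the tunnel may forward that to the target as a short-lived connection.

```python
import socket
import time


def wait_for_local_port(port, host="127.0.0.1", timeout_s=30.0, interval_s=0.5):
    """Poll until something accepts TCP connections on host:port, or time out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True  # listener is up; the probe connection closes on exit
        except OSError:
            time.sleep(interval_s)
    return False
```

For example (names are hypothetical):
1. Start `az network bastion tunnel --name bas-hub --resource-group rg-hub --target-resource-id <vm-resource-id> --resource-port 22 --port 50022` in one shell.
2. Call `wait_for_local_port(50022)`.
3. Run `scp -P 50022 ./app.tar.gz azureuser@127.0.0.1:/tmp/`.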
Q10. Why is “one Bastion per VNet” an anti-pattern, and what’s the alternative?
It multiplies hourly cost, subnets, NSGs, and audit streams for no benefit. Deploy one host in the hub; because Standard/Premium honour peering, it reaches VMs in every directly-peered spoke. Confirm --allow-forwarded-traffic true on both sides of each peering.
Q11. Which NSG flows are mandatory on AzureBastionSubnet?
Inbound: 443 from Internet, 443/4443 from GatewayManager, 443 from AzureLoadBalancer, and 8080/5701 from VirtualNetwork. Outbound: 22/3389 to VirtualNetwork, 443 to AzureCloud, 8080/5701 to VirtualNetwork, and 80 to Internet. Missing the GatewayManager flow is the classic silent break.
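Q11's list doubles as a lint rule for NSGs. This Python sketch encodes the mandatory flows exactly as Q11 lists them, one port per entry, and reports which ones a given set of allowed flows lacks. It compares only allow flows by service tag and port. Rule priority, deny rules and port ranges are ignored, so treat it as a first-pass check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    direction: str  # "Inbound" or "Outbound"
    peer: str       # service tag: source for inbound, destination for outbound
    port: int


# Mandatory AzureBastionSubnet flows as listed in Q11.
REQUIRED_FLOWS = {
    Flow("Inbound", "Internet", 443),
    Flow("Inbound", "GatewayManager", 443),
    Flow("Inbound", "GatewayManager", 4443),
    Flow("Inbound", "AzureLoadBalancer", 443),
    Flow("Inbound", "VirtualNetwork", 8080),
    Flow("Inbound", "VirtualNetwork", 5701),
    Flow("Outbound", "VirtualNetwork", 22),
    Flow("Outbound", "VirtualNetwork", 3389),
    Flow("Outbound", "AzureCloud", 443),
    Flow("Outbound", "VirtualNetwork", 8080),
    Flow("Outbound", "VirtualNetwork", 5701),
    Flow("Outbound", "Internet", 80),
}


def missing_flows(allowed):
    """Return required flows absent from `allowed`, sorted for stable output."""
    return sorted(REQUIRED_FLOWS - set(allowed),
                  key=lambda f: (f.direction, f.peer, f.port))
```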
Q12. How do native Bastion sessions inherit your organization’s MFA posture without touching the VMs?
Native client and Entra-based SSH/RDP authenticate through Microsoft Entra ID, so Conditional Access applies to the management plane. Require MFA and a compliant device once, and every native Bastion session is gated, with no per-VM changes. Maps to SC-300 / AZ-500.
Quick check
- What is the exact required name and minimum size of the Bastion subnet?
- Which SKU is the floor for native client tunneling, shareable links, and IP-based connections?
- Why does a VM two peering hops from the hub fail to connect through a hub Bastion?
- When you pair Bastion with Defender JIT, what source prefix does the JIT rule grant?
- Which SKU is required for session recording, and how should the destination storage be hardened?
Answers
- `AzureBastionSubnet`, `/26` or larger. The platform keys off the literal name; `/27` is rejected (and a grandfathered `/27` can’t scale instances).
- Standard. Native tunneling, custom ports, file transfer, shareable links, and IP-based connection all start at Standard; Basic has none of them.
- VNet peering is non-transitive, so Bastion doesn’t traverse a second hop. Peer that VNet directly to the hub (with forwarded traffic), or use Virtual WAN routing.
- The `AzureBastionSubnet` CIDR (e.g. `10.0.255.0/26`), never an engineer’s roaming public IP, because Bastion brokers from inside the VNet.
- Premium. Write recordings to a storage account with a time-based immutability (WORM) policy and customer-managed keys (CMK) so the evidence is tamper-evident.
Glossary
- Azure Bastion: A managed, agentless PaaS service that brokers RDP/SSH to your VMs over TLS 443, so the VMs need no public IP or inbound management ports.
- `AzureBastionSubnet`: The mandatory, exactly named, `/26`-minimum dedicated subnet the Bastion host lives in; it holds nothing else.
- Scale unit (instance): One managed VM behind the Bastion service; each handles ~20 concurrent RDP or ~40 concurrent SSH sessions.
- SKU tier: Developer / Basic / Standard / Premium; an upgrade-only ratchet that gates which features exist.
- Native client tunneling: Connecting from the local CLI (`az network bastion ssh`/`tunnel`/`rdp`) through the broker so real `scp`/Ansible/`mstsc` work; Standard+, requires `--enable-tunneling`.
- Shareable link: A URL granting RDP/SSH to one specific VM without an Azure account; it authenticates against the target VM’s own credentials.
- IP-based connection: Reaching a target by its private IP (on-prem over ExpressRoute, or peered VMs) rather than by Azure resource ID; Standard+.
- Session recording: Premium-only capture of the graphical RDP/SSH session as video, written to your storage account for audit.
- `BastionAuditLogs`: The diagnostic log category recording each connection (user, source IP, target, protocol); stream it to Log Analytics.
- JIT (Just-in-Time) VM access: A Defender for Cloud feature that keeps management ports closed and opens them, time-boxed, to a specified source (with Bastion, the broker subnet).
- Private-only deployment: A Premium configuration where the Bastion host itself has no public IP, removing the last internet-facing surface.
- Forwarded traffic: A VNet-peering flag (`allowForwardedTraffic`) that lets brokered RDP/SSH transit the hub to reach a spoke.
- Non-transitive peering: The property that VNet peering does not chain; A↔B plus B↔C does not give A↔C, so Bastion cannot reach a target two hops away.
- Customer-managed key (CMK): Encryption with a key you own in Key Vault, used here on the recording storage so evidence stays under your control.
Next steps
- Lock down the network this lives in with Azure Virtual Network Deep Dive: Every Setting and choose your topology with Hub-spoke vs Virtual WAN enterprise topology.
- Gate every session behind identity: Microsoft Entra Conditional Access at scale and PIM for Azure resources.
- Put Bastion in context with the Azure Zero Trust multilayer security model and harden posture with Defender for Cloud CSPM & secure score.
- Wire the audit ledger and alerting through Azure Monitor & Application Insights for observability.
- Control egress from VMs that no longer have public IPs with Azure NAT Gateway for deterministic egress.