Azure Key Vault: Secrets, Keys and Certificates Done Right

Quick take: Secrets in config files and certificates on disk are liabilities you cannot audit and cannot rotate. Azure Key Vault moves them into a managed, access-controlled, logged service where every retrieval is an identity-checked, recorded event and rotation becomes a single operation instead of a fleet-wide change.

A development team I reviewed stored a production database password directly in their App Service application settings, in plaintext, copied into a .env file on three developer laptops and pasted into a runbook. When a contractor rolled off, nobody could answer the only two questions that matter: who has seen this password, and where else does it live? Rotating it meant hunting through a dozen apps and hoping they’d found them all. The fix was not a policy memo — it was Key Vault. The password moved into a vault, the apps authenticated with a managed identity instead of a stored credential, every read was logged to Azure Monitor, and the next rotation was one az keyvault secret set followed by a config refresh. The contractor’s access evaporated the moment their identity was removed. That is the entire value proposition, and this article is how you get there without the three or four mistakes that turn Key Vault from a safety net into a 3am outage.

Key Vault holds three kinds of object — secrets (arbitrary strings: connection strings, API keys, passwords), keys (cryptographic keys used for encryption, signing and wrapping, optionally HSM-backed), and certificates (X.509 certs with a managed lifecycle and auto-renewal) — behind two distinct authorization surfaces (a control plane that manages the vault itself and a data plane that reads the objects inside it), reachable either over the public endpoint or locked behind a Private Endpoint. Get the mental model of those layers right and Key Vault is boringly reliable. Get it wrong — a managed identity that was never enabled, a data-plane role you forgot to assign, a firewall that blocks your own app, a certificate nobody wired for rotation — and you get the failure modes this article enumerates exhaustively, each with the exact az command or portal blade that confirms it and the precise fix.

By the end you will treat secrets, keys and certificates as governed assets rather than files. You will know when to use RBAC over the legacy access-policy model, why soft-delete and purge protection are non-negotiable, how Key Vault references let App Service and Functions pull secrets with zero credentials in config, when a workload needs an HSM (and whether Standard, Premium, or Managed HSM is the right home), and how to make certificates renew themselves so a 2am TLS expiry never happens again. Because this is a reference you will return to mid-incident, the options, limits, error codes, roles and tiers are all laid out as scannable tables — read the prose once, then keep the tables open.

What problem this solves

Applications need secrets, keys and certificates to function, but the places teams instinctively put them are all liabilities. A connection string in appsettings.json is in source control and on every laptop that cloned the repo. An API key in an environment variable is visible to anyone with the portal or a shell on the box. A .pfx certificate on disk is a file that can be copied, has no rotation story, and silently expires. None of these can answer “who accessed this and when,” none can be rotated without touching every consumer, and all of them widen the blast radius of a single leak to your entire estate.

What breaks without Key Vault is not abstract. A leaked credential in a public Git history is among the most common breach vectors there is — and once it is in history, rotating is the only remedy, because the old value is permanent. Hard-coded secrets mean rotation is a coordinated, error-prone deployment instead of a config change, so teams simply don’t rotate, and a five-year-old database password is “fine until it isn’t.” Certificates that live on disk expire without warning and take production TLS down at the worst possible moment. And without a central audit trail, a security review cannot prove who touched what, which fails most compliance regimes outright.

Who hits this: essentially every team running anything on Azure. It bites hardest where secrets multiply — microservice estates with dozens of connection strings, apps with third-party API keys, anything terminating TLS on a custom domain, and any workload under a compliance regime (PCI-DSS, HIPAA, ISO 27001, SOC 2) that mandates key custody, rotation, and access logging. Key Vault is the Azure-native answer to all of it, and the cost of getting it slightly wrong is exactly the kind of failure that pages you. The whole field, framed before the deep dive:

Pain in production	What it looks like	Root liability	What Key Vault changes
Secret in config / source control	Password in `appsettings.json`, in Git history	Plaintext, copyable, permanent in history	Secret lives in the vault; config holds a reference, not the value
No idea who saw a credential	Contractor leaves, nobody can audit access	No access log	Every read is a logged, identity-attributed event
Rotation is a deployment	Changing a DB password touches 12 apps	Value duplicated everywhere	Rotate once in the vault; consumers re-read
Certificate expired at 2am	TLS down, frantic manual renewal	Cert on disk, no lifecycle	Managed cert with auto-renewal + expiry events
Encryption key on the app box	Key file alongside the data it protects	Key and data co-located	Key in the vault (or HSM); app calls wrap/unwrap
Compliance audit fails	Cannot prove key custody / rotation	No central control or trail	Centralized custody, RBAC, soft-delete, audit logs

Learning objectives

By the end of this article you can:

Distinguish secrets, keys and certificates precisely — what each is for, its size and shape limits, and which one a given asset belongs in.
Separate the control plane (managing the vault) from the data plane (reading objects inside it), and pick the right authorization model — Azure RBAC versus the legacy access-policy model — for each.
Wire an app to read secrets with zero credentials using a managed identity and Key Vault references, and explain exactly why a missing identity or unassigned role crash-loops the app.
Enable and reason about soft-delete and purge protection, recover a deleted vault or object, and explain why these are mandatory and irreversible.
Lock a vault down with the firewall and Private Endpoint, keep traffic off the public internet, and avoid blocking your own callers.
Choose between Standard, Premium (HSM-backed keys), and Managed HSM, and know when FIPS 140-2 Level 2/3 custody actually matters.
Configure certificate issuance and auto-rotation (integrated CA and self-signed), and set up secret/key rotation with rotation policies and Event Grid.
Read the throttling and 403 reference, diagnose a Key Vault failure to a specific cause, and fix it with the exact az/portal path.

Prerequisites & where this fits

You should be comfortable with the Azure Resource Manager model — subscriptions, resource groups, and that everything is a resource with an ID (the Azure Resource Hierarchy Explained covers this). You should understand Microsoft Entra ID (formerly Azure AD) at the level of “identities get tokens and tokens are checked against permissions,” and ideally have met managed identities before. Running az in Cloud Shell, reading JSON output, and basic TLS/certificate concepts (a cert has a private key, a chain, and an expiry) will all help. Nothing here requires cryptography expertise — Key Vault’s job is to make you not need it.

This sits at the heart of the Security & Identity track and is upstream of almost everything else. Apps pull secrets from it (Azure Functions and Serverless Patterns and App Service both use Key Vault references), gateways pull TLS certs from it (Application Gateway v2 WAF: End-to-End TLS, mTLS, and Custom Rule Tuning), container registries and storage use customer-managed keys housed in it, and App Configuration references it for secret-typed settings (Azure App Configuration in Production). When you lock it behind a Private Endpoint you are applying the same pattern as Azure Private Link and Private DNS: Keeping PaaS Off the Public Internet. A quick map of who owns what during an incident, so you call the right person:

Layer	What lives here	Who usually owns it	Failure classes it can cause
Caller identity	Managed identity, app registration	App / dev team	Unresolved KV reference, app crash-loop
Entra ID	Token issuance, RBAC assignments	Identity team	Token denied; no role assigned (403)
Vault control plane	SKU, firewall, soft-delete, RBAC mode	Platform / security	Misconfigured network, wrong auth model
Vault data plane	Secrets/keys/certs read/write	App + security	403 on get; throttling (429)
Network path	Private Endpoint, DNS, firewall ACLs	Network team	ForbiddenByFirewall; DNS resolves public
Backing CA / HSM	Certificate issuer, HSM key custody	Security / PKI	Cert won’t issue; key not exportable

Core concepts

Five mental models make every later decision obvious.

A vault is a boundary, not a database. A Key Vault is a named, regional resource (https://<name>.vault.azure.net) that holds three object types and enforces who can do what to them. It is a security and governance boundary first — you separate vaults by environment and sensitivity, not by convenience. The vault name is globally unique because it becomes a public DNS name, even when you later restrict it to a Private Endpoint.

Control plane and data plane are different doors with different keys. The control plane (Azure Resource Manager) governs the vault as a resource: create/delete it, set its firewall, change its SKU, configure soft-delete, assign data-plane roles. You authorize it with Azure RBAC roles like Key Vault Contributor, scoped at subscription/RG/vault. The data plane governs the objects inside: get a secret, sign with a key, import a cert. You authorize it either with Azure RBAC data-action roles (e.g. Key Vault Secrets User) or with the legacy per-vault access-policy list — and you pick exactly one model per vault. The single most common Key Vault mistake is confusing these: a Key Vault Contributor can manage the vault but cannot read a secret unless they also hold a data-plane role. Management access is not data access.

Identity is the currency; managed identity is the way you pay. Every data-plane call must present a valid Entra ID token proving an identity, which the vault checks against its authorization model. For apps, the right identity is a managed identity: an Entra identity Azure manages for the resource, with no secret you store anywhere. The app asks the platform for a token, the platform returns one, the app calls the vault. This is the whole point: the credential to access your secrets is itself not a stored secret. No managed identity means no token, and no token means the call fails, which is exactly why a forgotten identity makes an app crash-loop on settings that never resolved to a real secret value.
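To make the token exchange concrete, here is a minimal sketch of the two calls involved. It assumes the code runs inside App Service or Functions with a managed identity enabled (so the platform injects `IDENTITY_ENDPOINT` and `IDENTITY_HEADER`), that `curl` and `jq` are installed, and that `7.4` is an acceptable data-plane API version. The function names are illustrative, not part of any SDK.

```shell
# Sketch: read a secret with a managed identity token (App Service / Functions host assumed).

# Build the token request URL for the Key Vault resource.
mi_token_url() {
  printf '%s?resource=%s&api-version=2019-08-01\n' \
    "${IDENTITY_ENDPOINT:?managed identity is not enabled on this app}" \
    "https://vault.azure.net"
}

# Ask the platform for a token. The only credential sent is the header value
# the platform injected; nothing comes from your own config.
mi_token() {
  curl -sS -H "X-IDENTITY-HEADER: ${IDENTITY_HEADER}" "$(mi_token_url)" | jq -r '.access_token'
}

# Call the vault's data plane with the bearer token and print the secret value.
kv_get_secret() {
  local vault="$1" name="$2"
  curl -sS -H "Authorization: Bearer $(mi_token)" \
    "https://${vault}.vault.azure.net/secrets/${name}?api-version=7.4" | jq -r '.value'
}
```

Called as `kv_get_secret kv-shop-prod DbConnString`, this fails at the first step when no identity is enabled, which is the same failure the platform hits when it tries to resolve a reference on your behalf.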

Soft-delete and purge protection make deletion survivable. Soft-delete (mandatory and always on for new vaults) means a deleted vault or object enters a recoverable state for a retention period (7–90 days, default 90) instead of vanishing. Purge protection (optional but recommended, and irreversible once enabled) means that during the retention window, nobody — not even an owner, not even an attacker with full rights — can permanently purge the resource early. Together they defend against accidental delete and malicious “delete everything” attacks. The cost is that a soft-deleted vault name is reserved until it’s purged or recovered, which trips up redeployments.

Versions are immutable; rotation creates a new version. Every secret, key and certificate is versioned. Updating a secret doesn’t overwrite — it adds a new version and marks it current; old versions remain (until you disable/delete them). You can reference a specific version (pinned) or the current version (auto-following). This is what makes rotation safe: you create version 2, consumers that reference “current” pick it up, and version 1 is still there if you need to roll back. Reference the unversioned URI to follow rotation; reference the versioned URI to pin.
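The pinned-versus-following distinction is visible in the URI itself, so it can be checked mechanically. The helper below is a small sketch (the name `kv_uri_mode` is hypothetical) that classifies an object URI by counting its path segments; it does not contact the vault or validate that the object exists.

```shell
# Sketch: classify a Key Vault object URI as pinned (versioned) or following (unversioned).
kv_uri_mode() {
  local rest="${1#https://}"
  # Reject anything that did not start with https://
  [ "$rest" != "$1" ] || { echo "invalid"; return 1; }
  rest="${rest%/}"                     # a trailing slash is allowed on unversioned URIs
  case "$rest" in
    *//*|*/*/*/*/*) echo "invalid"; return 1 ;;   # empty or extra segments
    */*/*/*)        echo "pinned" ;;               # host/type/name/version
    */*/*)          echo "following" ;;            # host/type/name
    *)              echo "invalid"; return 1 ;;
  esac
}
```

For example, `kv_uri_mode https://kv-shop-prod.vault.azure.net/secrets/DbConnString/` prints `following`, while the same URI with a version segment appended prints `pinned`.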

The vocabulary in one table

Before the deep sections, pin down every moving part. The glossary at the end repeats these for lookup; this table is the model side by side:

Concept	One-line definition	Where it lives	Why it matters
Vault	A regional container for secrets/keys/certs	Subscription / resource group	The security boundary; one per env/sensitivity
Secret	An arbitrary string value (≤25 KB)	Inside a vault	Connection strings, passwords, API keys
Key	A cryptographic key (RSA/EC), optionally HSM	Inside a vault	Encrypt/decrypt, sign/verify, wrap/unwrap
Certificate	An X.509 cert with managed lifecycle	Inside a vault (key + secret pair)	TLS, mTLS, code signing; auto-renewal
Control plane	Managing the vault resource	Azure Resource Manager	Create/delete/firewall/SKU; RBAC-governed
Data plane	Reading/writing objects inside	`*.vault.azure.net`	Get/set/sign; RBAC or access policy
Access policy	Legacy per-vault permission list	On the vault	One of two auth models (the old one)
Azure RBAC (data)	Role-based data-plane access	Entra + scope	The recommended auth model
Managed identity	Secret-free Entra identity for a resource	On the app/VM/etc.	How apps authenticate with no stored creds
KV reference	`@Microsoft.KeyVault(...)` in a setting	App setting / App Config	App pulls a secret with zero creds in config
Soft-delete	Recoverable deletion window	Vault property	7–90 day grace; mandatory now
Purge protection	Block early permanent deletion	Vault property	Irreversible; defends against malicious purge
HSM	Hardware Security Module key custody	Premium / Managed HSM	Keys never leave certified hardware
Rotation policy	Auto-rotation/renewal schedule for a key or certificate	On the object	Hands-off rotation; expiry events

Secrets, keys and certificates — choosing the right object

The first decision on every asset is which object type it belongs in. Putting a TLS certificate in as a raw secret, or storing a password as a “key,” works just badly enough to cause pain later. Here is the definitive comparison:

Dimension	Secret	Key	Certificate
What it is	Arbitrary string/bytes	Cryptographic key (RSA, EC)	X.509 cert + its key
Typical use	Connection strings, passwords, API keys	Encrypt/decrypt, sign/verify, wrap/unwrap (CMK)	TLS/mTLS, code signing
You can read the value?	Yes — `get` returns the string	No — key material never leaves; you call operations	Public cert yes; private key only if exportable
Size / shape limit	≤ 25 KB value	RSA 2048/3072/4096; EC P-256/384/521	Bound by underlying key + secret limits
HSM-backed option	No	Yes (Premium / Managed HSM)	Via its key (Premium)
Versioned	Yes	Yes	Yes
Auto-rotation	No built-in schedule; near-expiry Event Grid event + your automation	Yes, via a key rotation policy	Yes (integrated CA auto-renew)
Backing storage	Single object	Single object	Stored as a key + a secret under the hood
Cert exposed as 3 objects	n/a	n/a	Certificate, Key, and Secret (the PFX/PEM) entries

Five placement rules that prevent the most common modelling mistakes:

If you have…	Put it in as a…	Not a…	Because
A database connection string	Secret	Key	It’s a string you read back; keys don’t return material
An RSA key to encrypt blobs (CMK)	Key	Secret	You want sign/wrap operations, not the raw bytes
A TLS cert for a custom domain	Certificate	Secret (raw PFX)	The certificate object gives lifecycle + auto-renew
A symmetric password/passphrase	Secret	Key	Key Vault keys are asymmetric (RSA/EC); symmetric → secret or Managed HSM
An SSH private key	Secret	Key	It’s opaque bytes you retrieve, not a KV crypto key

Secrets in depth

A secret is a versioned name→value pair where the value is any string up to 25 KB, plus optional attributes: enabled (a disabled secret can’t be read), activation date (nbf, not usable before), expiry date (exp, the date after which it should no longer be used), a content-type tag, and arbitrary metadata tags. Crucially, Key Vault does not refuse to serve a secret just because its exp has passed: the attribute is advisory, the value is still returned, and you enforce expiry yourself through rotation and monitoring. Set one and read it back:

# Create/update a secret (this becomes a new version, marked current)
az keyvault secret set --vault-name kv-shop-prod --name DbConnString \
  --value "Server=tcp:sql-shop.database.windows.net;Database=orders;..." \
  --expires "2026-12-31T00:00:00Z" --content-type "text/plain"

# Read the current version (the value comes back in plaintext to an authorized caller)
az keyvault secret show --vault-name kv-shop-prod --name DbConnString --query value -o tsv

In Bicep you generally create the vault declaratively and set secret values out-of-band (you don’t want plaintext secrets in templates), but you can declare a secret resource whose value comes from a secure parameter:

@secure()
param dbConnString string

resource kv 'Microsoft.KeyVault/vaults@2023-07-01' existing = { name: 'kv-shop-prod' }

resource secret 'Microsoft.KeyVault/vaults/secrets@2023-07-01' = {
  parent: kv
  name: 'DbConnString'
  properties: {
    value: dbConnString          // pass via secure pipeline variable, never literal
    contentType: 'text/plain'
    attributes: { enabled: true, exp: 1798675200 } // Unix epoch seconds for 2026-12-31T00:00:00Z
  }
}

The full secret attribute set and how to reason about each:

Attribute	What it does	Default	When to set it	Gotcha
`enabled`	Whether the secret can be read	`true`	Disable to revoke without deleting	A disabled current version → consumers fail
`exp` (expires)	Advisory expiry timestamp	none	Signal a rotation deadline	KV still returns it; you must monitor/rotate
`nbf` (not-before)	Not usable before this time	none	Stage a future value	Reads before `nbf` fail
`contentType`	Free-text hint (e.g. mime)	none	Label PFX vs text vs JSON	Purely informational
Tags	Key/value metadata	none	Ownership, env, rotation owner	Tags are not secret — no values in them
`recoveryLevel`	Soft-delete/purge posture (read-only)	inherits vault	—	Reflects vault soft-delete + purge settings
Value size	The string itself	n/a	—	Hard cap 25 KB; larger → use Blob + CMK
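Because `exp` is advisory, something outside the vault has to watch it. The following is a minimal sketch of that check, assuming GNU `date` (the `-d` option is not available in BSD/macOS `date`); the function names and the 30-day default window are illustrative choices.

```shell
# Sketch: turn an advisory exp timestamp into a rotation signal (GNU date assumed).

# Whole days between "now" (or an explicit second argument) and the expiry timestamp.
secret_days_left() {
  local expires="$1" now="${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  local exp_s now_s
  exp_s=$(date -u -d "$expires" +%s) || return 1
  now_s=$(date -u -d "$now" +%s) || return 1
  echo $(( (exp_s - now_s) / 86400 ))
}

# Exit 0 when the secret is inside the warning window or already past exp.
secret_needs_rotation() {
  local expires="$1" warn_days="${2:-30}" now="${3:-}"
  local left
  left=$(secret_days_left "$expires" ${now:+"$now"}) || return 2
  [ "$left" -le "$warn_days" ]
}
```

A scheduled job could feed it the live attribute, for example `secret_needs_rotation "$(az keyvault secret show --vault-name kv-shop-prod --name DbConnString --query attributes.expires -o tsv)" 30 && echo "rotate DbConnString"`.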

Keys in depth

A key is cryptographic material you never see. You don’t get the bytes; you ask the vault to perform an operation with it — encrypt/decrypt, wrap/unwrap (key-wrapping for envelope encryption), sign/verify. This is the model behind customer-managed keys (CMK) for Storage, SQL TDE, Disk Encryption and Container Registry: the service holds your data, your key stays in Key Vault, and the service calls wrap/unwrap. Keys come in RSA (2048/3072/4096) and EC (P-256/P-384/P-521, and the secp256k1 variant), each optionally HSM-backed (the -HSM key types) on Premium or Managed HSM.

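# Create an RSA 3072 key, software-protected (Standard); add --protection hsm for Premium
az keyvault key create --vault-name kv-shop-prod --name cmk-storage \
  --kty RSA --size 3072 --ops wrapKey unwrapKey

# cmk-storage permits only wrapKey/unwrapKey, so a direct encrypt call against it is rejected.
# For direct encryption, use a key that allows it (app-envelope is an illustrative name).
az keyvault key create --vault-name kv-shop-prod --name app-envelope \
  --kty RSA --size 3072 --ops encrypt decrypt

# Encrypt a small payload; the private key never leaves the vault
az keyvault key encrypt --vault-name kv-shop-prod --name app-envelope \
  --algorithm RSA-OAEP-256 --value "$(printf '%s' 'data-key' | base64)" --data-type base64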

The key option matrix — type, size, protection, allowed operations:

Setting	Values	Default	When to change	Trade-off / limit
Key type (`kty`)	RSA, RSA-HSM, EC, EC-HSM, oct-HSM	RSA	EC for smaller/faster sigs; HSM for custody	oct (symmetric) only on Managed HSM
RSA size	2048, 3072, 4096	2048	3072+ for stronger/longer-lived keys	Larger = slower ops
EC curve	P-256, P-384, P-521, P-256K	P-256	P-384/521 for higher assurance	secp256k1 niche (blockchain)
Protection	software, HSM	software (Standard)	HSM for FIPS / compliance	HSM keys can’t be exported in cleartext
Operations (`ops`)	encrypt, decrypt, sign, verify, wrapKey, unwrapKey	all	Least privilege per key	Granting all when only wrapKey/unwrapKey is needed widens exposure
Exportable	true/false (release policy)	false	Only with secure-key-release + attestation	Most keys should stay non-exportable
Rotation policy	auto/manual	manual	Schedule key rotation	New version; CMK consumers must follow

The cryptographic operations a key supports, and what each is for:

Operation	What it does	Typical caller	Algorithm examples
`encrypt` / `decrypt`	Protect small payloads directly	App doing envelope encryption	RSA-OAEP-256
`wrapKey` / `unwrapKey`	Wrap a data-encryption key (CMK)	Storage / SQL TDE / Disk	RSA-OAEP-256, AES-KW (Managed HSM)
`sign` / `verify`	Produce/check a digital signature	Token/code/document signing	RS256, PS256, ES256
`getKey` (public part)	Read the public key only	Verifiers, JWKS publishers	Public material only; private never leaves
(import)	Bring an existing key in	Migration / BYOK	RSA/EC, optionally `--byok` HSM

Certificates in depth

A certificate is the richest object: it bundles an X.509 cert, its private key (stored as a Key Vault key), and the exportable form (stored as a Key Vault secret — the PFX/PEM). That is why a single certificate shows up as three addressable objects: a certificate, a key, and a secret with the same name. Key Vault manages the lifecycle: issuance from an integrated CA (DigiCert, GlobalSign) or a self-signed/internal CA policy, and automatic renewal before expiry. This is the feature that makes “certificate expired at 2am” a solved problem.

# Create a self-signed cert with a policy (real workloads point issuerName at an integrated CA)
az keyvault certificate create --vault-name kv-shop-prod --name tls-shop \
  --policy "$(az keyvault certificate get-default-policy)"

# Inspect the certificate's own ID, its backing key (kid) and secret (sid), the subject, and renewal actions
az keyvault certificate show --vault-name kv-shop-prod --name tls-shop \
  --query "{id:id, kid:kid, sid:sid, subject:policy.x509CertificateProperties.subject, renewal:policy.lifetimeActions}" -o json

The certificate policy controls issuance and renewal — the settings that matter:

Policy setting	What it controls	Typical value	When to change	Gotcha
`issuerName`	Who signs the cert	`Self`, `DigiCert`, `GlobalSign`	Public TLS → integrated CA	`Self` certs aren’t publicly trusted
Subject / SANs	CN and Subject Alternative Names	`CN=shop.example.com` + SANs	Multi-domain certs	Missing SAN → browser errors
Key type/size	Backing key	RSA 2048/3072, EC P-256	Stronger key or EC	Must match what your endpoint accepts
Validity (months)	Cert lifetime	12 (public CAs cap ~13 months)	Shorter for higher rotation	CA may override to its max
`exportable`	Whether the PFX can be exported	true (software), false (HSM)	Non-exportable for HSM custody	Non-exportable → App Service can’t import PFX
Auto-renewal (`renewBeforeExpiry`/lifetime action)	Renew N days/% before expiry	30 days / 80% lifetime	Always set for managed certs	Self-signed renews; integrated CA needs CA wired
Renewal type	`AutoRenew` vs `EmailContacts`	AutoRenew	Hands-off vs notify-only	EmailContacts only warns; doesn’t renew

How a certificate maps to its three backing objects (the source of much confusion):

Object exposed	URI form	Contains	Use it for
Certificate	`/certificates/<name>`	Public cert + policy + metadata	Lifecycle, thumbprint, renewal status
Key	`/keys/<name>`	The private key (operations only)	Sign/decrypt without exporting the key
Secret	`/secrets/<name>`	The full PFX/PEM (if exportable)	Importing into App Service / App Gateway
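Since the three objects share one name, their URIs can be derived from the vault and certificate names alone. This sketch assumes the public-cloud DNS suffix `vault.azure.net` (pass a third argument for sovereign clouds); `kv_cert_uris` is a hypothetical helper name.

```shell
# Sketch: print the certificate, key, and secret URIs for one Key Vault certificate.
kv_cert_uris() {
  local vault="$1" name="$2" suffix="${3:-vault.azure.net}"
  local base="https://${vault}.${suffix}"
  printf '%s\n' \
    "${base}/certificates/${name}" \
    "${base}/keys/${name}" \
    "${base}/secrets/${name}"
}
```

`kv_cert_uris kv-shop-prod tls-shop` prints the three unversioned URIs in the same order as the table; the third is the one a consumer such as App Service or Application Gateway would be pointed at.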

The two authorization models — RBAC vs access policies

This is where most teams either get it right and never think about it again, or get it wrong and fight 403s for a week. Every vault uses exactly one data-plane authorization model: modern Azure RBAC or legacy access policies. You set it at vault creation with enableRbacAuthorization and changing it later is disruptive.

Access policies (the original model) are a per-vault list: “this principal may do these operations on secrets, these on keys, these on certs.” They are flat (no inheritance), capped at 1024 entries per vault, not visible to Azure RBAC tooling, and grant operation permissions (get/list/set/delete) per object type. Azure RBAC instead uses standard role assignments — built-in roles like Key Vault Secrets User assigned at management-group/subscription/RG/vault/object scope — giving you inheritance, central governance through az role assignment, PIM/just-in-time eligibility, and a single consistent model across Azure. For anything new, use RBAC.

Dimension	Azure RBAC (recommended)	Access policies (legacy)
Granularity	Built-in/custom roles, down to individual object scope	Per-object-type operation flags
Inheritance	Yes — MG → sub → RG → vault → object	No — flat list on the vault
Scale limit	Azure RBAC limits (very high)	1024 access policy entries / vault
Central management	`az role assignment`, Policy, PIM	Per-vault, bespoke
Just-in-time (PIM)	Yes (eligible assignments)	No
Separation of duties	Control vs data roles are distinct	Mixed in one place
Visibility	Standard “Access control (IAM)”	Separate “Access policies” blade
Default for new vaults	Increasingly the recommended default	Still the portal default in places

The data-plane RBAC roles you actually use — assign the narrowest that fits:

Role	Grants (data plane)	Give it to	Don’t give it to
Key Vault Secrets User	Read secret values	App managed identities	Humans who only need to manage the vault
Key Vault Secrets Officer	Full secret CRUD	Secret administrators / pipelines	Read-only apps
Key Vault Crypto User	Use keys (encrypt/sign/wrap)	Services doing crypto ops (CMK)	Apps that only read secrets
Key Vault Crypto Officer	Full key CRUD	Key administrators	App identities
Key Vault Certificates Officer	Full certificate CRUD	Cert administrators / automation	Read-only consumers
Key Vault Reader	Read metadata (not values)	Auditors, dashboards	Anyone needing values
Key Vault Crypto Service Encryption User	Wrap/unwrap for service CMK	Storage/SQL/etc. service principal	Interactive users

Control-plane roles — note they grant nothing on the data inside:

Control-plane role	Grants	Critical caveat
Key Vault Contributor	Manage the vault (firewall, SKU, policies)	Cannot read secrets — needs a data role too
Owner / Contributor (subscription)	Everything at control plane	Same caveat: not automatically a data reader
Reader	View the vault resource	No data-plane access at all
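The split between the two role tables can be encoded as a lookup, which is handy when auditing role assignments in a script. This is only a sketch covering the roles named in the tables above; the helper names are hypothetical, and other built-in or custom roles fall through to `unknown`.

```shell
# Sketch: classify the roles listed in this article by the plane they act on.
kv_role_plane() {
  case "$1" in
    "Key Vault Secrets User"|"Key Vault Secrets Officer") echo "data" ;;
    "Key Vault Crypto User"|"Key Vault Crypto Officer")   echo "data" ;;
    "Key Vault Crypto Service Encryption User")           echo "data" ;;
    "Key Vault Certificates Officer"|"Key Vault Reader")  echo "data" ;;
    "Key Vault Contributor"|"Owner"|"Contributor"|"Reader") echo "control" ;;
    *) echo "unknown" ;;
  esac
}

# Exit 0 only for the listed roles that can read secret values.
kv_role_reads_secret_values() {
  case "$1" in
    "Key Vault Secrets User"|"Key Vault Secrets Officer") return 0 ;;
    *) return 1 ;;
  esac
}

# Usage idea (requires az and a signed-in session):
#   az role assignment list --assignee "$PRINCIPAL" --scope "$VAULT_ID" \
#     --include-inherited --query "[].roleDefinitionName" -o tsv |
#   while IFS= read -r role; do echo "$role -> $(kv_role_plane "$role")"; done
```

A principal whose assignments all classify as `control` can manage the vault yet will still receive 403 on a secret read.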

Assign a data-plane role to an app’s managed identity — the canonical pattern:

# Get the app's managed identity principal, then grant Secrets User at the vault scope
PRINCIPAL=$(az webapp identity show -n app-shop-prod -g rg-shop-prod --query principalId -o tsv)
VAULT_ID=$(az keyvault show -n kv-shop-prod -g rg-shop-prod --query id -o tsv)
az role assignment create --assignee "$PRINCIPAL" \
  --role "Key Vault Secrets User" --scope "$VAULT_ID"

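// Vault in RBAC mode + a Secrets User assignment for an app's identity
param location string = resourceGroup().location
param appPrincipalId string   // object (principal) ID of the app's managed identity
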
resource kv 'Microsoft.KeyVault/vaults@2023-07-01' = {
  name: 'kv-shop-prod'
  location: location
  properties: {
    sku: { family: 'A', name: 'standard' }
    tenantId: subscription().tenantId
    enableRbacAuthorization: true        // RBAC model, not access policies
    enableSoftDelete: true
    softDeleteRetentionInDays: 90
    enablePurgeProtection: true
    publicNetworkAccess: 'Disabled'      // pair with a Private Endpoint
  }
}

resource secretsUser 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(kv.id, appPrincipalId, 'Key Vault Secrets User')
  scope: kv
  properties: {
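    // '4633e6cd-...' is an elided placeholder for the Key Vault Secrets User role definition GUID;
    // look up the full value: az role definition list --name "Key Vault Secrets User" --query "[0].name" -o tsv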
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '4633e6cd-...')
    principalId: appPrincipalId
    principalType: 'ServicePrincipal'
  }
}

The legacy access-policy equivalent, for the vaults you inherit that still use it:

# Only works on a vault with enableRbacAuthorization = false
az keyvault set-policy --name kv-legacy --object-id "$PRINCIPAL" \
  --secret-permissions get list

When to pick which model — the decision table:

If…	Use	Why
New vault, modern estate	Azure RBAC	Central governance, inheritance, PIM
You need just-in-time elevation	Azure RBAC	Access policies have no PIM
You need >1024 distinct grantees	Azure RBAC	Access policies cap at 1024
You’re maintaining a vault already on access policies	Keep access policies (or plan a migration window)	Switching models is disruptive mid-flight
You want per-secret (object-level) scope	Azure RBAC	Assign roles at the individual object scope
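Before granting access on an inherited vault, it helps to check which model it actually uses, because `az keyvault set-policy` and role assignments each apply to only one of them. The sketch below maps the vault's `enableRbacAuthorization` value to the matching grant command. Treating a missing or null value as access-policy mode is an assumption about older vaults, and the helper names are illustrative.

```shell
# Sketch: pick the grant mechanism that matches a vault's authorization model.

# Map the enableRbacAuthorization property (as printed by az) to a model name.
kv_auth_model() {
  case "$1" in
    true|True) echo "rbac" ;;
    *)         echo "access-policy" ;;   # false, empty, or null: assumed legacy mode
  esac
}

# Print the command family to use for a read-only secret grant.
kv_grant_hint() {
  if [ "$(kv_auth_model "$1")" = "rbac" ]; then
    echo 'az role assignment create --role "Key Vault Secrets User"'
  else
    echo 'az keyvault set-policy --secret-permissions get list'
  fi
}

# Usage idea (requires az):
#   mode=$(az keyvault show -n kv-legacy --query properties.enableRbacAuthorization -o tsv)
#   kv_grant_hint "$mode"
```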

Managed identity and Key Vault references — secrets with zero credentials

The payoff of all this is that an app reads its secrets without storing any credential at all. Two pieces make it work: a managed identity on the app (so it can get an Entra token), and a Key Vault reference in a setting (so the value is pulled from the vault at runtime rather than stored in config).

A Key Vault reference is a special app-setting (App Service/Functions) or App Configuration value of the form @Microsoft.KeyVault(SecretUri=https://kv-shop-prod.vault.azure.net/secrets/DbConnString/). At startup (and on a refresh interval) the platform resolves it using the app’s managed identity and injects the resolved value as the environment variable your code reads. Your code sees a normal connection string; the value never sits in config.

# 1) Give the app a system-assigned managed identity
az webapp identity assign -n app-shop-prod -g rg-shop-prod

# 2) Grant that identity read access to secrets (RBAC)
PRINCIPAL=$(az webapp identity show -n app-shop-prod -g rg-shop-prod --query principalId -o tsv)
az role assignment create --assignee "$PRINCIPAL" --role "Key Vault Secrets User" \
  --scope "$(az keyvault show -n kv-shop-prod -g rg-shop-prod --query id -o tsv)"

# 3) Point an app setting at the secret via a KV reference
az webapp config appsettings set -n app-shop-prod -g rg-shop-prod --settings \
  "DbConnString=@Microsoft.KeyVault(SecretUri=https://kv-shop-prod.vault.azure.net/secrets/DbConnString/)"

resource site 'Microsoft.Web/sites@2023-12-01' = {
  name: 'app-shop-prod'
  location: location
  identity: { type: 'SystemAssigned' }   // the identity that resolves the reference
  properties: {
    serverFarmId: plan.id
    siteConfig: {
      appSettings: [
        {
          name: 'DbConnString'
          // unversioned URI → follows rotation automatically
          value: '@Microsoft.KeyVault(SecretUri=https://kv-shop-prod${environment().suffixes.keyvaultDns}/secrets/DbConnString/)'
        }
      ]
    }
  }
}

The two reference URI styles and their behaviour:

Reference form	Example tail	Behaviour	Use when
Unversioned	`/secrets/DbConnString/`	Follows the current version (picks up rotation)	You want rotation to flow without redeploy
Versioned	`/secrets/DbConnString/<ver>`	Pinned to that exact version	You need a deterministic, audited value
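Both forms can be generated from the same inputs, which avoids hand-typing the URI in pipelines. This is a small sketch assuming the public-cloud host suffix `vault.azure.net`; `kv_reference` is a hypothetical helper, and the version argument is optional.

```shell
# Sketch: build a Key Vault reference string; omit the version to follow rotation.
kv_reference() {
  local vault="$1" secret="$2" version="${3:-}"
  printf '@Microsoft.KeyVault(SecretUri=https://%s.vault.azure.net/secrets/%s/%s)\n' \
    "$vault" "$secret" "$version"
}
```

For instance, `az webapp config appsettings set -n app-shop-prod -g rg-shop-prod --settings "DbConnString=$(kv_reference kv-shop-prod DbConnString)"` sets the unversioned form used earlier in this section.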

Why a Key Vault reference fails — the exact prerequisites, each a failure mode if missing:

Prerequisite	If missing…	Confirm	Fix
Managed identity enabled	Reference stays unresolved (the setting carries the literal `@Microsoft.KeyVault(...)` text, not the secret); app crash-loops	`az webapp identity show`	`az webapp identity assign`
Data-plane role assigned	403 on resolve; setting unresolved	`az role assignment list --assignee <principal> --scope <vaultId>`	Assign Key Vault Secrets User
Vault firewall allows the app	ForbiddenByFirewall; setting unresolved	`az keyvault show --query properties.networkAcls`	Allow trusted services / Private Endpoint
Secret exists & enabled	Reference cannot resolve	`az keyvault secret show ...`	Create/enable the secret; fix the URI
Correct `SecretUri`	Silent failure / wrong value	Compare URI to `az keyvault secret show --query id`	Fix host/object/version in the URI
App Configuration reference (if via App Config)	Setting unresolved	App Config “Key Vault reference” status	Grant App Config’s identity Secrets User too

A subtle one: when references are cached, a rotation may not be picked up until the app restarts or the platform refresh fires. The status of every reference is visible in the portal Environment variables blade (each shows resolved/error), which is the first place to look when a secret-backed setting “isn’t taking.”
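A startup guard can turn a bad reference into a clear error instead of a confusing connection failure. The sketch below assumes, per App Service's documented behaviour, that an unresolved reference leaves the literal `@Microsoft.KeyVault(...)` string in the setting; it also treats an empty value as a failure. The function names are illustrative.

```shell
# Sketch: fail fast when a secret-backed setting did not resolve.

# Classify a setting value as empty, unresolved (still the reference text), or resolved.
kv_setting_state() {
  case "$1" in
    "")                     echo "empty" ;;
    "@Microsoft.KeyVault("*) echo "unresolved" ;;
    *)                      echo "resolved" ;;
  esac
}

# Exit non-zero with a pointer to the usual causes if the setting is not usable.
require_resolved() {
  local name="$1" value="$2" state
  state=$(kv_setting_state "$value")
  if [ "$state" != "resolved" ]; then
    echo "setting ${name} is ${state}: check identity, role, firewall, and SecretUri" >&2
    return 1
  fi
}
```

An entrypoint script might run `require_resolved DbConnString "$DbConnString" || exit 1` before starting the app.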

Soft-delete, purge protection and recovery

These two features turn deletion from a catastrophe into an inconvenience — and one of them is irreversible, so understand it before you flip it.

Soft-delete is now always on for vaults (you cannot disable it on new vaults). When you delete a vault or an object, it is retained in a deleted but recoverable state for the retention period (configurable 7–90 days; default 90). During that window you can recover it. After the window — or if someone purges it deliberately — it’s gone. Purge protection closes the deliberate-purge hole: when enabled, no one can purge the vault or its objects before the retention period elapses, not even with full permissions. This is the control that defeats “attacker with Owner deletes and purges everything.” The catch: purge protection is irreversible — once on, you cannot turn it off for the life of the vault, and the retention period becomes a hard floor.

# Inspect the deletion posture
az keyvault show -n kv-shop-prod -g rg-shop-prod \
  --query "{softDelete:properties.enableSoftDelete, retention:properties.softDeleteRetentionInDays, purge:properties.enablePurgeProtection}" -o json

# Recover a soft-deleted secret (within the retention window)
az keyvault secret recover --vault-name kv-shop-prod --name DbConnString

# List and recover a soft-deleted *vault*
az keyvault list-deleted --query "[].{name:name, scheduledPurge:properties.scheduledPurgeDate}" -o table
az keyvault recover --name kv-shop-prod

The two settings, their effects, and the trade-offs:

Setting	Values	Default (new vaults)	Effect	Irreversible?
Soft-delete	on (forced)	on	Deleted objects recoverable for the retention window	n/a (always on)
Retention period	7–90 days	90	How long recovery is possible	Can’t shorten below current with purge protection on
Purge protection	on / off	off (recommended: on)	Blocks early permanent purge by anyone	Yes: cannot be disabled once on

Recovery scenarios and the exact operation:

You deleted…	State	Recover with	Caveat
A secret/key/cert	Soft-deleted	`az keyvault secret recover` (etc.)	Within retention; needs recover permission
The whole vault	Soft-deleted	`az keyvault recover -n <name>`	Name reserved until recovered/purged
And purged it (no purge protection)	Gone	—	Unrecoverable; this is what PP prevents
And purge protection was on	Cannot purge early	Wait out retention or recover	The “delete everything” attack fails here
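
Whether an object is still recoverable comes down to date arithmetic on the retention window. The sketch below is an illustration under assumptions: the function names are hypothetical, dates are UTC in YYYY-MM-DD form, GNU date is required (as in Cloud Shell), and an early purge is ignored, so it describes the purge-protected case.

```shell
#!/usr/bin/env bash
# Date on which a soft-deleted object is permanently removed.
# Usage: scheduled_purge_date <deleted-on YYYY-MM-DD> <retention-days>
scheduled_purge_date() {
  date -u -d "$1 +$2 days" +%Y-%m-%d
}

# Report whether recovery is still possible on a given day (default: today).
# Usage: recovery_state <deleted-on> <retention-days> [today]
recovery_state() {
  local deleted="$1" retention="$2" today="${3:-$(date -u +%Y-%m-%d)}"
  local purge_on
  purge_on=$(scheduled_purge_date "$deleted" "$retention")
  # Compare as epoch seconds so the check does not depend on string ordering.
  if [ "$(date -u -d "$today" +%s)" -lt "$(date -u -d "$purge_on" +%s)" ]; then
    echo "recoverable until $purge_on"
  else
    echo "gone since $purge_on"
  fi
}
```

For a real vault the authoritative value is the scheduledPurgeDate that az keyvault list-deleted reports; the arithmetic here only shows where that date comes from.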

The redeployment gotcha worth its own table — soft-delete reserves the name:

Symptom	Cause	Confirm	Fix
`VaultAlreadyExists` on create, but you don’t see it	A same-named vault is soft-deleted, holding the name	`az keyvault list-deleted`	Recover it, or purge it (if PP off and retention permits), or pick a new name
Bicep/Terraform deploy fails recreating a vault	Prior delete left a soft-deleted vault	Same as above	Use `az keyvault recover` then let IaC adopt it
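
A pre-flight check before an IaC deploy can catch the reserved-name case early. A minimal sketch, assuming the az keyvault list-deleted output shape used earlier in this section; the function name vault_name_reserved is hypothetical:

```shell
#!/usr/bin/env bash
# Exit 0 and print the scheduled purge date if a soft-deleted vault holds the
# name; exit 1 if no soft-deleted vault with that name is visible to you.
# Usage: vault_name_reserved <vault-name>
vault_name_reserved() {
  local name="$1" purge_date
  purge_date=$(az keyvault list-deleted \
      --query "[?name=='$name'].properties.scheduledPurgeDate" -o tsv)
  [ -n "$purge_date" ] || return 1
  echo "$name is soft-deleted; name reserved until $purge_date"
}
```

In a pipeline this could gate the deploy: if the function succeeds, run az keyvault recover (or purge, where permitted) before the template tries to create the vault. Note that it only sees deleted vaults in subscriptions you can list.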

Network isolation — the firewall and Private Endpoint

By default a vault is reachable on its public endpoint (still requiring auth). For sensitive data that’s not enough: you want the vault unreachable from the internet at all. Two layers do this: the vault firewall (IP/VNet allow-lists with a default-deny) and, the strong form, a Private Endpoint that gives the vault a private IP inside your VNet and removes the public path entirely (the same model covered in “Azure Private Endpoint vs Service Endpoint: Secure PaaS Access”).

The trap is locking the vault down and then blocking your own callers — including App Service Key Vault references and Azure services that need access. Two escape hatches matter: Allow trusted Microsoft services (lets certain first-party services through the firewall) and correct Private DNS so the vault’s hostname resolves to the private IP for your callers.

# Default-deny, then allow a specific VNet subnet and trusted services
az keyvault update -n kv-shop-prod -g rg-shop-prod \
  --default-action Deny --bypass AzureServices

az keyvault network-rule add -n kv-shop-prod -g rg-shop-prod \
  --vnet-name vnet-shop --subnet snet-app

# The strong form: disable public access and add a Private Endpoint
az keyvault update -n kv-shop-prod -g rg-shop-prod --public-network-access Disabled
az network private-endpoint create -n pe-kv-shop -g rg-shop-prod \
  --vnet-name vnet-shop --subnet snet-pe \
  --private-connection-resource-id "$(az keyvault show -n kv-shop-prod -g rg-shop-prod --query id -o tsv)" \
  --group-id vault --connection-name kv-conn

resource kv 'Microsoft.KeyVault/vaults@2023-07-01' = {
  name: 'kv-shop-prod'
  location: location
  properties: {
    sku: { family: 'A', name: 'standard' }
    tenantId: subscription().tenantId
    enableRbacAuthorization: true
    publicNetworkAccess: 'Disabled'
    networkAcls: {
      defaultAction: 'Deny'
      bypass: 'AzureServices'           // let trusted first-party services through
      virtualNetworkRules: [ { id: appSubnetId } ]
      ipRules: []
    }
  }
}

The network controls, what each does, and the failure it causes when wrong:

Control	Setting	Effect	Failure if misconfigured
Default action	`Deny` / `Allow`	Default-deny is the secure posture	`Deny` with no rules → you lock yourself out
IP rules	CIDR allow-list	Permit specific public IPs	Office IP changes → 403 ForbiddenByFirewall
VNet rules (service endpoint)	Subnet allow-list	Permit a subnet	Wrong subnet → caller blocked
Bypass	`AzureServices` / `None`	Let trusted services through	`None` → App Service KV refs may break
Public network access	`Enabled` / `Disabled`	Remove the public path entirely	`Disabled` without PE/DNS → nothing can reach it
Private Endpoint	+ Private DNS zone	Private IP, internet path gone	Missing DNS → hostname resolves public → blocked
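
The missing-DNS failure in the last row is quick to test from the caller’s side: resolve the vault hostname and see whether the answer is a private address. A sketch under assumptions: the function names are hypothetical, only IPv4 and the RFC 1918 ranges are considered, and the resolution step itself is left to the caller.

```shell
#!/usr/bin/env bash
# Exit 0 if the IPv4 address falls in an RFC 1918 private range.
is_private_ipv4() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Describe what a caller will hit, given the IP its DNS returned for the vault.
# Usage: kv_dns_verdict <resolved-ip>
kv_dns_verdict() {
  if is_private_ipv4 "$1"; then
    echo "private: traffic goes to the Private Endpoint"
  else
    echo "public: with public access disabled this caller is blocked"
  fi
}
```

From a VM or container inside the VNet, something like kv_dns_verdict "$(getent hosts kv-shop-prod.vault.azure.net | awk '{print $1}')" gives the verdict; a public answer usually means the privatelink.vaultcore.azure.net zone is not linked to that VNet.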

Decision table — how locked-down should this vault be?

Workload	Recommended network posture	Why
Dev/sandbox vault	Public + firewall (your IPs) or trusted services	Convenience; low sensitivity
Standard production app	Private Endpoint + public disabled	Secrets off the internet entirely
Regulated (PCI/HIPAA)	Private Endpoint + PP + RBAC + Managed HSM keys	Compliance mandates isolation + custody
Vault used by many Azure PaaS	Firewall + bypass AzureServices	First-party services need a path

HSM, Premium and Managed HSM — when hardware custody matters

For most secrets, the Standard SKU (software-protected keys) is correct and cheaper. You step up to hardware-backed key custody when compliance or risk demands that key material never exist in software. Three homes exist: Standard (software keys), Premium (a vault SKU adding HSM-protected keys on shared, FIPS 140-2 Level 2 validated HSMs), and Managed HSM (a dedicated, single-tenant pool of FIPS 140-2 Level 3 HSMs with its own RBAC and higher throughput). The decision is about assurance level and isolation, not features you can’t otherwise get.

Dimension	Standard	Premium	Managed HSM
Key protection	Software	HSM (shared)	HSM (dedicated, single-tenant)
FIPS 140-2 level	n/a (software)	Level 2	Level 3
Tenancy	Multi-tenant	Multi-tenant	Single-tenant pool
Secrets & certs	Yes	Yes	Keys-focused (no secrets/certs object types)
Throughput	Standard vault limits	Standard vault limits	Much higher, dedicated
Cost model	Per-operation, low	Per-operation + HSM key surcharge	Fixed hourly per HSM pool (significant)
RBAC	Azure RBAC / access policy	Azure RBAC / access policy	Local HSM RBAC + Azure RBAC
Use when	Most secrets/keys	HSM keys, modest scale	Strict compliance, high crypto throughput, BYOK

When to choose each — the decision table:

Requirement	Choose	Why
Connection strings, API keys, TLS certs	Standard	Software protection is fine and cheap
CMK with FIPS 140-2 Level 2	Premium	HSM-backed keys without dedicated-pool cost
FIPS 140-2 Level 3 / single-tenant custody	Managed HSM	Dedicated HSMs, strongest assurance
Very high crypto ops/sec	Managed HSM	Dedicated throughput, not shared limits
BYOK / strict key-ceremony import	Managed HSM (or Premium)	Secure key import / HSM-to-HSM
Tight budget, no compliance mandate	Standard	Avoid the HSM surcharge entirely

A common misread: you do not need Premium just to store secrets securely — Standard already encrypts everything at rest. Premium/Managed HSM is specifically about where the key material lives and what it’s certified to. Pay for it when an auditor asks “is this key in a FIPS-validated HSM,” not before.

Rotation — secrets, keys and certificate auto-renewal

Rotation is the feature that justifies the whole exercise, and the one most teams under-implement. There are three flavours, increasing in automation: manual (you set a new version on a schedule), event-driven secret rotation (an expiry date on the secret, plus an Event Grid subscription and a Function that updates the backing service too), and certificate auto-renewal (the vault renews the cert itself before expiry).

Certificates are the easy win: set a lifetime action and the vault renews automatically. Self-signed certs renew outright; integrated-CA certs renew through the wired CA. Secrets are harder because rotating a database password means also changing it in the database: Key Vault can store a new version, but something must update the backing system. The standard pattern: set an expiry date on the secret; as it approaches, Key Vault raises a SecretNearExpiry event via Event Grid, which triggers a Function that rotates the credential in the backing service and writes the new value back as a new secret version. Consumers using the unversioned reference pick it up.

# Certificate auto-renewal: renew when 30 days remain (lifetime action on the policy)
az keyvault certificate create --vault-name kv-shop-prod --name tls-shop --policy '{
  "issuerParameters": {"name": "Self"},
  "x509CertificateProperties": {"subject": "CN=shop.example.com", "validityInMonths": 12},
  "lifetimeActions": [{"trigger": {"daysBeforeExpiry": 30}, "action": {"actionType": "AutoRenew"}}],
  "keyProperties": {"exportable": true, "keyType": "RSA", "keySize": 3072, "reuseKey": false}
}'

# Secret rotation trigger: set a 90-day expiry; Key Vault emits SecretNearExpiry about 30 days before it
az keyvault secret set-attributes --vault-name kv-shop-prod --name DbPassword \
  --expires "$(date -u -d '+90 days' +%Y-%m-%dT%H:%M:%SZ)"

Wire the expiry event to automation via Event Grid:

# Subscribe a Function to Key Vault near-expiry / rotation events
az eventgrid event-subscription create --name kv-rotation \
  --source-resource-id "$(az keyvault show -n kv-shop-prod -g rg-shop-prod --query id -o tsv)" \
  --endpoint-type azurefunction \
  --endpoint "$(az functionapp function show -g rg-shop-prod -n fn-rotate --function-name Rotate --query id -o tsv)" \
  --included-event-types Microsoft.KeyVault.SecretNearExpiry Microsoft.KeyVault.CertificateNearExpiry
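
What the subscribed handler does is the part Key Vault cannot do for you. The sketch below shows the shape of that step in shell, under loud assumptions: update_backing_service is a hypothetical placeholder you must replace, new_credential is an illustrative generator, and a production handler would more likely be a Function in the app’s own language.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL hook: replace with the real call that changes the credential
# in the backing system (database, API provider, ...).
update_backing_service() {
  echo "update_backing_service is a placeholder; supply your own" >&2
  return 1
}

# Generate a 32-character random credential from /dev/urandom.
new_credential() {
  head -c 48 /dev/urandom | base64 | tr -d '/+=\n' | cut -c1-32
}

# Rotate one secret: change it at the source, then record it in the vault.
# Usage: rotate_secret <vault> <secret-name>
rotate_secret() {
  local kv="$1" name="$2" value
  value=$(new_credential)
  # If the backing update fails, stop: the vault still holds the old,
  # still-valid value and nothing is broken.
  update_backing_service "$name" "$value" || return 1
  az keyvault secret set --vault-name "$kv" --name "$name" --value "$value" \
    --expires "$(date -u -d '+90 days' +%Y-%m-%dT%H:%M:%SZ)" -o none
}
```

The ordering is a design choice with a sharp edge: if the backing update succeeds and the vault write then fails, the new credential exists nowhere readable, so a real handler should retry the write or keep the old credential valid until the new version is stored.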

The rotation approaches compared:

Approach	Automation	Updates backing service?	Effort	Best for
Manual `secret set`	None	No (you do it)	Low setup, high ongoing toil	Rarely-rotated, low-risk secrets
Secret expiry + Event Grid + Function	High	Yes (your function)	Medium (build the function)	DB passwords, signing keys, API keys
Certificate auto-renewal (self-signed)	Full	n/a (cert renews itself)	Trivial (policy lifetime action)	Internal/self-signed TLS
Certificate auto-renewal (integrated CA)	Full	n/a	Medium (wire the CA issuer)	Public TLS on custom domains

The Key Vault events you can subscribe to (the rotation triggers):

Event type	Fires when	Typical handler action
`Microsoft.KeyVault.SecretNearExpiry`	A secret approaches `exp`	Rotate the credential, write new version
`Microsoft.KeyVault.SecretExpired`	A secret has expired	Alert / emergency rotate
`Microsoft.KeyVault.CertificateNearExpiry`	A cert approaches expiry	Renew (or verify auto-renew fired)
`Microsoft.KeyVault.CertificateNewVersionCreated`	A new cert version exists	Re-import to App Service / App Gateway
`Microsoft.KeyVault.KeyNearExpiry`	A key approaches expiry	Rotate CMK; notify consumers
`Microsoft.KeyVault.SecretNewVersionCreated`	A new secret version exists	Refresh caches / restart consumers

A reality check on consumers: a new version existing doesn’t mean every consumer is using it. App Service Key Vault references refresh on an interval or restart; App Gateway/Front Door need the cert re-imported (or, with managed-identity integration, re-synced). The CertificateNewVersionCreated event is your hook to push the new cert where it’s needed.
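
A single handler subscribed to several event types needs a dispatch step. A minimal sketch of that routing, using the event type names from the table above; the function name route_kv_event and the action strings are illustrative stand-ins for real handler calls:

```shell
#!/usr/bin/env bash
# Map a Key Vault event type to the action the handler should take.
# Usage: route_kv_event <eventType> <objectName>
route_kv_event() {
  local type="$1" name="$2"
  case "$type" in
    Microsoft.KeyVault.SecretNearExpiry)
      echo "rotate secret $name" ;;
    Microsoft.KeyVault.SecretExpired)
      echo "alert: secret $name expired" ;;
    Microsoft.KeyVault.SecretNewVersionCreated)
      echo "refresh caches for $name" ;;
    Microsoft.KeyVault.CertificateNearExpiry)
      echo "verify auto-renew for $name" ;;
    Microsoft.KeyVault.CertificateNewVersionCreated)
      echo "re-import certificate $name" ;;
    Microsoft.KeyVault.KeyNearExpiry)
      echo "rotate key $name" ;;
    *)
      # Unknown types are reported and rejected rather than silently dropped.
      echo "ignore $type"
      return 1 ;;
  esac
}
```

Keeping the mapping in one place makes it obvious which events have no handler yet, which is usually how a certificate ends up renewed in the vault but stale at the front-end.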

The throttling, limits and 403 reference

Key Vault is a shared, throttled service, and almost every production surprise is one of three things: a 403 (you’re not allowed, or the firewall blocked you), a 429 (you exceeded the transaction limit), or a missing object. Scan this first when something fails.

The error/status-code reference — the lookup table you keep open:

Code	Meaning	Likely cause	How to confirm	Fix
401 Unauthorized	No/invalid token	Caller has no managed identity, or the token was requested for the wrong audience	`az webapp identity show`; inspect the `aud` claim of the token	Enable identity; request `https://vault.azure.net` audience
403 Forbidden (AccessDenied)	Authenticated but not authorized	No data-plane role / access policy	`az role assignment list --scope <vaultId>`; access-policy blade	Assign Key Vault Secrets User (or policy)
403 ForbiddenByFirewall	Network ACL blocked the caller	Firewall default-deny, caller not allow-listed	`az keyvault show --query properties.networkAcls`	Allow IP/subnet; bypass AzureServices; Private Endpoint
403 ForbiddenByRbac	RBAC model, no role at this scope	Role missing or wrong scope	IAM blade on the vault/object	Assign role at the right scope
404 SecretNotFound	Object/version doesn’t exist	Wrong name, deleted, or wrong vault	`az keyvault secret show`; `list-deleted`	Fix name/URI; recover if soft-deleted
409 Conflict	Object in a conflicting state	Soft-deleted name reused; concurrent op	`az keyvault list-deleted`	Recover/purge; serialize operations
429 Too Many Requests	Transaction limit exceeded	Burst beyond the per-vault cap	`ServiceApiResult` metric; `Retry-After` header	Cache in-process; exponential backoff; split vaults
500/503 Service error	Transient backend issue	Rare platform blip	Azure Service Health; whether a retry succeeds	Retry with backoff; if persistent, open a support case
Disabled secret read	Returns failure	`enabled=false` on the version	`az keyvault secret show --query attributes.enabled`	Enable it or roll to a good version
Expired (`exp`) advisory	Value still returned	`exp` is advisory, not enforced on read	Check `attributes.exp`	Rotate; monitor expiry proactively

The transaction limits that drive throttling, as approximate figures (limits apply per vault, per region, with a separate subscription-wide ceiling, and are subject to change, so always verify current docs):

Operation class	Approx. limit	Scope	Notes
Secret GET (and other “fast” transactions)	~25,000 / 10 s	Per vault	The cap you hit by not caching
HSM-key operations (RSA 2048+)	lower (hundreds–low-thousands / 10 s)	Per vault	HSM crypto is slower; budget accordingly
Certificate operations	lower than secret GETs	Per vault	Issuance/renewal are heavier
Managed HSM crypto ops	much higher than vault	Per HSM pool	Dedicated throughput is the point of MHSM
Backup/restore, full key ops	much lower	Per vault	Bulk ops can self-throttle

The three reading notes that save the most time:

Distinction	The trap	How to tell them apart
403 AccessDenied vs ForbiddenByFirewall	Both are “403” but fixes are opposite	The error body names it: AccessDenied = grant a role; ForbiddenByFirewall = network ACL
RBAC vault vs access-policy vault	Assigning a role on an access-policy vault does nothing	`enableRbacAuthorization` true → use roles; false → use `set-policy`
429 from your app vs from the platform	Looks like a Key Vault outage	Non-zero throttle metric + `Retry-After` → you’re over the cap; cache, don’t blame the service
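
The lookup table and the reading notes reduce to a small classifier: status code plus the error code in the body decide which hop owns the failure. A sketch under assumptions: the function name classify_kv_error is hypothetical, and the error-code strings are the ones used in the tables above.

```shell
#!/usr/bin/env bash
# Name the failing hop and the first fix to try.
# Usage: classify_kv_error <http-status> [error-code-from-body]
classify_kv_error() {
  local status="$1" code="${2:-}"
  case "$status:$code" in
    401:*)                   echo "identity: enable a managed identity and request the https://vault.azure.net audience" ;;
    403:ForbiddenByFirewall) echo "network: allow the caller IP/subnet or use a Private Endpoint with Private DNS" ;;
    403:*)                   echo "authorization: assign a data-plane role (or access policy) at the vault scope" ;;
    404:*)                   echo "object: fix the name/URI, or recover it if soft-deleted" ;;
    409:*)                   echo "conflict: recover or purge the soft-deleted object; serialize operations" ;;
    429:*)                   echo "throttle: cache in-process and retry with backoff" ;;
    5??:*)                   echo "transient: retry with backoff; check Service Health" ;;
    *)                       echo "unknown: inspect the response body"; return 1 ;;
  esac
}
```

The firewall case is matched before the generic 403 on purpose, since the two share a status code but have opposite fixes.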

Architecture at a glance

The diagram traces a secret read exactly as it happens on the wire, then maps the failure classes onto the hops where they bite. Read it left to right. On the far left, an App Service app holds a managed identity (badge 1 — if that identity is missing, no token is ever issued and the whole path is dead). The app asks the platform for a token; the request reaches Entra ID, which issues a short-lived Bearer JWT scoped to https://vault.azure.net. The app presents that token to the Key Vault data plane — but first it must clear two gates: the vault firewall (badge 2 — if public access is disabled and the caller isn’t on the Private Endpoint or an allow-listed network, it’s ForbiddenByFirewall) and the RBAC/access-policy check (badge 3 — the identity needs Key Vault Secrets User at the vault scope, or it’s AccessDenied). Only then does the data plane (badge 4 — and watch the ~25,000-GET-per-10s cap; bursts return 429) reach into the backing objects: the secret, the HSM-backed key, or the certificate.

The right edge shows the lifecycle that keeps it all current: Event Grid raises a NearExpiry event, a Function rotates the credential in its backing service and writes a new version back into the vault (badge 5 — if rotation was never wired, a cert simply expires and TLS goes down). Notice that every successful read converges on the same three facts you confirm during an incident: does the caller have an identity, can it pass the firewall, and does it hold a data-plane role? That ordering — identity, then network, then authorization, then throttle — is the whole diagnostic method. The first question on any Key Vault failure is “is this a 401 (no identity), a 403-firewall (network), a 403-RBAC (no role), or a 429 (throttle)?” — and the diagram tells you which hop owns each.

Real-world scenario

Medivault Health runs a patient-portal API on Azure App Service (Linux, .NET 8) on a P1v3 plan in Central India, with Azure SQL behind it, all under HIPAA-style controls. The platform team is five engineers; the original design stored the SQL connection string and a third-party lab-results API key as plaintext App Service settings, and terminated TLS with a .pfx an engineer renewed by hand each year. Monthly spend was about ₹42,000. Three separate incidents in one quarter forced a redesign, and the redesign was Key Vault, done properly.

The first incident was a near-miss audit finding: the plaintext connection string in app settings, visible to anyone with portal access, failed the access-control review outright. The auditor’s question — “prove who has read this credential and when” — had no answer. The team moved both the connection string and the lab API key into a vault, kv-medivault-prod in RBAC mode with purge protection on, and switched the app to system-assigned managed identity + Key Vault references. The connection string in app settings became @Microsoft.KeyVault(SecretUri=.../secrets/SqlConn/). Reads were now logged, attributable, and rotatable.

The rollout broke in a way that taught the core lesson. On first deploy, the app crash-looped: the SQL connection string resolved to empty. The reflex was to suspect the vault, the network, the secret. The actual cause: the app did have a managed identity, but the Key Vault Secrets User role had been assigned at the resource-group scope, and the vault had since been moved to a different RG, so the assignment no longer covered the vault. az role assignment list --assignee <principal> --scope <vaultId> returned nothing for the vault. Re-assigning the role at the vault scope fixed it instantly. The lesson on the wall: a Key Vault reference failing empty is almost always identity-or-role, not the secret.

The second incident was the certificate. The hand-renewed .pfx lapsed because the engineer who tracked it was on leave; the portal threw cert-expiry warnings nobody was watching, and the custom domain went to a browser TLS error for forty minutes during business hours. The fix was to move the certificate into the vault as a managed certificate with auto-renewal (AutoRenew 30 days before expiry), wired to an integrated CA, and to subscribe Event Grid CertificateNearExpiry and CertificateNewVersionCreated events to a Function that re-imported the new cert to the front-end and posted to the team’s Teams channel. The 2am-expiry class of incident was now structurally impossible.

The third was throttling. Under a reporting spike, the API, which read four secrets on every request with no caching, hit the per-vault GET cap and started getting 429s, which surfaced as request failures. The ServiceApiResult metric showed throttled transactions climbing exactly with load. The fix was not a bigger SKU: it was caching the resolved secrets in-process (refreshed every few minutes, plus on SecretNewVersionCreated) so a request did zero Key Vault calls in the hot path. Transaction volume fell ~98%, the 429s vanished, and the architecture was cheaper because Key Vault operations are billed per transaction. Final state: kv-medivault-prod with RBAC, purge protection, a Private Endpoint (public access disabled), managed-identity references, auto-renewing certs, event-driven rotation, and in-process caching. Spend was flat at ₹42,000; the audit passed; the pager went quiet.

The incident timeline, because the order of moves is the lesson:

When	Symptom	Action taken	Effect	What it should have been
Q1 audit	Plaintext secret fails review	Move secrets to vault + MI references	Logged, attributable, rotatable	Never store plaintext in the first place
First deploy	App crash-loops, SqlConn empty	Suspect vault/network	Wasted an hour	Check identity + role at vault scope first
+20 min	Still empty	`az role assignment list --scope <vaultId>` = none	Root cause: role at wrong scope	Assign data roles at vault scope
Cert lapse	TLS error 40 min, business hours	Emergency manual renew	Outage over, root cause remains	Managed cert + auto-renew + events
+1 week	Cert hardened	Auto-renew + Event Grid → Function	2am-expiry class eliminated	Should have been day-one design
Reporting spike	429s, request failures	Suspect Key Vault outage	Misdirected	Read `ServiceApiResult`; it’s your throttle
+2 days	Throttling fixed	Cache secrets in-process	−98% transactions, 429s gone, cheaper	Never read secrets per-request uncached
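
The caching fix in the last row is a plain time-to-live cache with an invalidation hook. The sketch below shows the pattern in shell for consistency with the rest of this article; it is not the Medivault implementation (that app is .NET), all names are hypothetical, KV_NAME is assumed to be set by the caller, and it needs bash 4+ for associative arrays.

```shell
#!/usr/bin/env bash
# In-process TTL cache for secret values (bash 4+).
declare -A KV_CACHE_VALUE KV_CACHE_TIME
KV_CACHE_TTL=${KV_CACHE_TTL:-300}   # seconds, i.e. "every few minutes"

# HYPOTHETICAL fetcher: the only place that actually calls Key Vault.
fetch_secret() {
  az keyvault secret show --vault-name "$KV_NAME" --name "$1" --query value -o tsv
}

# Put the cached or freshly fetched value in SECRET_VALUE.
# (A variable is set instead of printing, because $(...) would run in a
# subshell and discard the cache update.)
get_secret() {
  local name="$1" now value
  now=$(date +%s)
  if [ -z "${KV_CACHE_TIME[$name]:-}" ] ||
     [ $(( now - ${KV_CACHE_TIME[$name]} )) -ge "$KV_CACHE_TTL" ]; then
    value=$(fetch_secret "$name") || return 1   # keep the old entry on failure
    KV_CACHE_VALUE[$name]=$value
    KV_CACHE_TIME[$name]=$now
  fi
  SECRET_VALUE=${KV_CACHE_VALUE[$name]}
}

# Drop one entry, for example when SecretNewVersionCreated arrives.
invalidate_secret() {
  unset "KV_CACHE_VALUE[$1]" "KV_CACHE_TIME[$1]"
}
```

With this shape the hot path touches Key Vault at most once per secret per TTL window, which is where a reduction on the order of the ~98% in the story comes from; the invalidation hook is what keeps rotation from waiting out the full TTL.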

Advantages and disadvantages

Centralizing secrets, keys and certificates in a managed, throttled, access-controlled service is overwhelmingly the right call — but it introduces a runtime dependency and a few sharp edges you must design around. Weigh it honestly:

Advantages (why this model helps)	Disadvantages (why it bites)
Secrets leave config and source control; an app holds a reference, not the value	Adds a runtime dependency — the vault must be reachable and authorized at startup
Every read is logged and identity-attributed — audits become answerable	Misconfigured identity/role makes the app crash-loop with empty values (looks like a random failure)
Managed identity means no stored credential to access your secrets	The firewall can block your own callers if you lock down without a path/DNS
Rotation becomes a single operation; certs auto-renew	Certificate consumers (App Gateway/Front Door) may need a re-import after renewal
Soft-delete + purge protection defend against accidental and malicious deletion	Purge protection is irreversible, and soft-delete reserves the name (redeploy gotcha)
HSM/Managed HSM offer FIPS-validated custody when compliance demands it	A shared, throttled service — uncached per-request reads hit 429 under load
RBAC gives central governance, inheritance, and PIM	Two auth models (RBAC vs access policies) confuse teams; control access ≠ data access
One vault per env/sensitivity cleanly scopes blast radius	Per-transaction billing means chatty access costs money as well as throttles

The model is right for essentially every workload that handles secrets — which is all of them. It bites hardest on teams who lock down a vault without testing their own callers’ path, who read secrets per-request without caching, who forget that Key Vault Contributor is not a data reader, and who never wire rotation and then get paged by an expiry. Every disadvantage is manageable — caching defeats throttling, vault-scope role assignment defeats the crash-loop, a Private Endpoint with DNS defeats the lockout — but only if you know they exist, which is the entire point of this article.

Hands-on lab

Stand up a vault, store a secret, grant an app’s managed identity read access, wire a Key Vault reference, and confirm an unauthorized caller is denied — all free-tier-friendly (a vault costs per transaction, effectively pennies; delete at the end). Run in Cloud Shell (Bash).

Step 1 — Variables and resource group.

RG=rg-kv-lab
LOC=centralindia
KV=kv-lab-$RANDOM          # globally-unique vault name
APP=app-kv-lab-$RANDOM     # globally-unique app name
az group create -n $RG -l $LOC -o table

Step 2 — Create a vault in RBAC mode with soft-delete (and purge protection off, so you can delete it cleanly).

az keyvault create -n $KV -g $RG -l $LOC \
  --enable-rbac-authorization true \
  --retention-days 7 \
  --sku standard -o table

Expected: a vault row; enableRbacAuthorization true. (We leave purge protection off only because this is a throwaway lab — in production, turn it on.)

Step 3 — Grant yourself a data role, then store a secret. Because the vault is RBAC, even as the creator you need a data role to write a secret:

ME=$(az ad signed-in-user show --query id -o tsv)
az role assignment create --assignee "$ME" --role "Key Vault Secrets Officer" \
  --scope "$(az keyvault show -n $KV -g $RG --query id -o tsv)"

# Give RBAC ~30s to propagate, then set a secret
sleep 30
az keyvault secret set --vault-name $KV --name DemoSecret --value "hello-from-kv" -o table

Expected: the secret object, id ending /secrets/DemoSecret/<version>. If you get 403, the role hasn’t propagated — wait and retry. (This is the “control access ≠ data access” lesson, live.)

Step 4 — Create an app with a managed identity.

az appservice plan create -n plan-kv-lab -g $RG --is-linux --sku B1 -o table
az webapp create -n $APP -g $RG -p plan-kv-lab --runtime "DOTNETCORE:8.0" -o table
az webapp identity assign -n $APP -g $RG -o table

Step 5 — Grant the app’s identity read-only access and wire a Key Vault reference.

PRINCIPAL=$(az webapp identity show -n $APP -g $RG --query principalId -o tsv)
az role assignment create --assignee "$PRINCIPAL" --role "Key Vault Secrets User" \
  --scope "$(az keyvault show -n $KV -g $RG --query id -o tsv)"

SECRET_URI=$(az keyvault secret show --vault-name $KV --name DemoSecret --query id -o tsv)
# Strip the version to follow rotation (unversioned reference)
BASE_URI=$(echo "$SECRET_URI" | sed 's#/[^/]*$#/#')
az webapp config appsettings set -n $APP -g $RG \
  --settings "DemoSecret=@Microsoft.KeyVault(SecretUri=$BASE_URI)" -o table

Step 6 — Confirm the reference resolved (not empty). In the portal: the app’s Environment variables blade shows DemoSecret with a green “resolved” status (an error icon means identity/role/firewall — exactly the failure table above). Via CLI you can verify the setting is the reference:

az webapp config appsettings list -n $APP -g $RG \
  --query "[?name=='DemoSecret'].{name:name, value:value}" -o table

Step 7 — Prove unauthorized access is denied. Remove the app’s role and confirm a read would now fail (the reference would resolve empty on next refresh):

az role assignment delete --assignee "$PRINCIPAL" --role "Key Vault Secrets User" \
  --scope "$(az keyvault show -n $KV -g $RG --query id -o tsv)"
# Re-add it so the app keeps working (or leave removed to observe the crash-loop)
az role assignment create --assignee "$PRINCIPAL" --role "Key Vault Secrets User" \
  --scope "$(az keyvault show -n $KV -g $RG --query id -o tsv)"

Validation checklist. You created an RBAC vault, learned that creating it doesn’t grant data access, stored a secret, gave an app a credential-free identity, wired a Key Vault reference, and saw that removing the role is what breaks it. The steps mapped to what each proves:

Step	What you did	What it proves	Real-world analogue
2	RBAC vault, soft-delete	The secure default posture	Every production vault
3	Assign a data role to yourself	Control access ≠ data access	The #1 “why 403” confusion
5	MI + Secrets User + KV reference	Secrets with zero stored creds	The canonical app pattern
6	Check the reference resolved	The reference-status diagnostic	First look when a setting “won’t take”
7	Remove the role	Role-or-identity is what breaks references	The empty-value crash-loop, live

Cleanup (avoid lingering charges and free the vault name).

# Wait for the delete to finish (no --no-wait), so the vault is actually soft-deleted before the purge
az group delete -n $RG --yes
# Because soft-delete reserves the name, purge it if you want the name back immediately:
az keyvault purge -n $KV  # only works with purge protection OFF (as in this lab)

Cost note. A B1 plan is a few rupees per hour and Key Vault transactions are fractions of a paisa each — an hour of this lab is well under ₹50, and deleting the resource group stops everything. Remember az keyvault purge is required to fully release the name (soft-delete keeps it reserved otherwise).

Common mistakes & troubleshooting

This is the playbook — the part you bookmark. First as a scannable table you can read mid-incident, then the entries that bite hardest with full confirm-command detail.

#	Symptom	Root cause	Confirm (exact cmd / portal path)	Fix
1	App crash-loops; secret-backed setting resolves empty	App has no managed identity	`az webapp identity show -n <app> -g <rg>` (empty)	`az webapp identity assign`; then grant the role
2	403 AccessDenied reading a secret, identity exists	No data-plane role assigned (or wrong scope)	`az role assignment list --assignee <principal> --scope <vaultId>`	Assign Key Vault Secrets User at the vault scope
3	403 ForbiddenByFirewall from your own app	Firewall default-deny, caller not allow-listed; or public disabled, no PE	`az keyvault show --query properties.networkAcls`; `publicNetworkAccess`	Allow subnet / bypass AzureServices / add Private Endpoint + DNS
4	“Key Vault Contributor” still can’t read secrets	Confusing control plane with data plane	IAM blade: they have Contributor, no data role	Add a data-plane role (Secrets User/Officer)
5	Assigning an access policy “does nothing”	Vault is in RBAC mode (`enableRbacAuthorization`)	`az keyvault show --query properties.enableRbacAuthorization`	Use `az role assignment` (not `set-policy`) on RBAC vaults
6	Intermittent failures / 429 under load	Throttling — reading secrets per-request, uncached	`ServiceApiResult` metric throttled > 0; `Retry-After` header	Cache in-process; exponential backoff; split vaults
7	Rotated secret not picked up by the app	KV reference cached, or versioned URI pinned	App restarts pick it up; check URI has a version	Use unversioned URI; restart/refresh; handle `NewVersionCreated`
8	TLS broke when the cert “renewed”	App Gateway/Front Door still serving old cert	Compare served thumbprint to vault current version	Re-import cert (or MI-sync) on `CertificateNewVersionCreated`
9	Certificate silently expired	No auto-renewal lifetime action wired	`az keyvault certificate show --query policy.lifetimeActions`	Add `AutoRenew` lifetime action; subscribe NearExpiry events
10	`VaultAlreadyExists` / can’t recreate a vault	A same-named vault is soft-deleted (name reserved)	`az keyvault list-deleted`	`az keyvault recover`, or purge (if PP off), or rename
11	Can’t disable purge protection	Purge protection is irreversible	`az keyvault show --query properties.enablePurgeProtection` = true	Cannot disable; wait out retention or recreate the vault
12	Read returns success but value is wrong/empty	Secret disabled, expired (advisory), or wrong version	`az keyvault secret show --query "{en:attributes.enabled, exp:attributes.exp}"`	Enable / roll forward / fix the URI
13	App Configuration KV reference unresolved	App Config’s identity lacks Secrets User	App Config “Key Vault reference” error status	Grant App Config’s managed identity Secrets User on the vault
14	Crypto op fails on a key (encrypt/sign)	Key lacks that operation in `key_ops`, or wrong algorithm	`az keyvault key show --query key.keyOps`	Grant the op / pick a supported algorithm; Crypto User role

The expanded form, with full reasoning for the entries that bite hardest:

1. App crash-loops and a secret-backed setting resolves to empty. Root cause: The app has no managed identity, so no Entra token is issued and the Key Vault reference resolves to nothing. Confirm: az webapp identity show -n <app> -g <rg> returns empty/null; the portal Environment variables blade shows the reference with a red error. Fix: az webapp identity assign (system-assigned) or attach a user-assigned identity, then grant it the data role (mistake #2 is the very next step people forget).

2. 403 AccessDenied reading a secret even though the identity exists. Root cause: The identity has no data-plane role, or the role was assigned at the wrong scope (e.g. an RG that no longer contains the vault, as in the Medivault story). Confirm: az role assignment list --assignee <principal> --scope $(az keyvault show -n <kv> -g <rg> --query id -o tsv) returns nothing. Fix: az role assignment create --assignee <principal> --role "Key Vault Secrets User" --scope <vaultId> — assign at the vault scope (or object scope for finer control).

3. 403 ForbiddenByFirewall from your own application. Root cause: The vault firewall is default-deny and the caller isn’t allow-listed, or public access is disabled with no Private Endpoint/DNS for the caller. Confirm: az keyvault show -n <kv> -g <rg> --query "{acls:properties.networkAcls, pna:properties.publicNetworkAccess}"; the error body says ForbiddenByFirewall (not AccessDenied). Fix: Add the caller’s subnet/IP, set bypass AzureServices for first-party callers, or (the strong form) add a Private Endpoint with a Private DNS zone so the hostname resolves privately.

4. Someone with Key Vault Contributor still can’t read a secret. Root cause: Control plane ≠ data plane. Contributor manages the vault but grants no access to the objects inside. Confirm: IAM blade shows Contributor but no Secrets/Crypto/Certificates data role. Fix: Assign the appropriate data-plane role. Management access never implies data access — by design.

5. Adding an access policy has no effect. Root cause: The vault is in Azure RBAC mode (enableRbacAuthorization = true), so the access-policy list is ignored. Confirm: az keyvault show --query properties.enableRbacAuthorization returns true. Fix: Use az role assignment create instead of az keyvault set-policy. (Pick one model per vault and stick to it.)

6. Intermittent failures and 429s under load. Root cause: Throttling — the app reads secrets on every request without caching and exceeds the per-vault transaction cap. Confirm: The ServiceApiResult metric shows throttled results climbing with load; responses carry a Retry-After header. Fix: Cache the resolved secrets in-process (refresh on an interval and on SecretNewVersionCreated); add exponential backoff; for genuinely high volume, split across vaults. A bigger SKU does not fix this.

7. A rotated secret isn’t picked up. Root cause: The Key Vault reference is cached, or you referenced a versioned URI that pins an old version. Confirm: The reference URI ends in a version GUID; restarting the app picks up the new value. Fix: Use the unversioned URI to follow rotation; restart/refresh the consumer; handle SecretNewVersionCreated to refresh caches deliberately.

8. TLS broke right after a certificate “renewed.” Root cause: The renewal created a new version in the vault, but the consumer (Application Gateway, Front Door) is still serving the old cert because it wasn’t re-imported/synced. Confirm: Compare the thumbprint the endpoint serves against the vault’s current certificate version. Fix: Re-import the cert to the consumer (or rely on managed-identity cert integration), triggered by the CertificateNewVersionCreated event.

9. A certificate silently expired. Root cause: No auto-renewal lifetime action was configured (or it was EmailContacts, which only warns). Confirm: az keyvault certificate show --query policy.lifetimeActions shows no AutoRenew trigger. Fix: Add an AutoRenew lifetime action (e.g. 30 days before expiry) and subscribe to the CertificateNearExpiry and CertificateNewVersionCreated events so renewal is verified and propagated.

10. You can’t recreate a vault — VaultAlreadyExists. Root cause: A previously-deleted, same-named vault is soft-deleted and still holding the globally-unique name. Confirm: az keyvault list-deleted shows it with a scheduledPurgeDate. Fix: az keyvault recover -n <name> to bring it back (and let IaC adopt it), or az keyvault purge if purge protection is off and policy allows, or choose a different name.

11. You can’t turn off purge protection. Root cause: Purge protection is irreversible by design. Confirm: az keyvault show --query properties.enablePurgeProtection is true. Fix: There is none for the existing vault — wait out retention for soft-deleted objects, or stand up a new vault if you genuinely need a no-PP vault (rare; PP is the safer default).
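The thumbprint comparison in failure 8 can be scripted. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the vault-side thumbprint is assumed to be a hex string you have already read from the vault's current certificate version with whatever tooling you use. On older Python versions `ssl.get_server_certificate` may not send SNI, so a multi-site listener could return a different certificate than a browser sees.

```python
import hashlib
import ssl


def thumbprint(der_bytes):
    """SHA-1 thumbprint of a DER-encoded certificate, as uppercase hex."""
    return hashlib.sha1(der_bytes).hexdigest().upper()


def normalize(value):
    """Strip separators and case so 'ab:cd' and 'ABCD' compare equal."""
    return value.replace(":", "").replace(" ", "").upper()


def served_thumbprint(host, port=443):
    """Fetch the certificate the endpoint currently presents and hash it."""
    pem = ssl.get_server_certificate((host, port))
    return thumbprint(ssl.PEM_cert_to_DER_cert(pem))


def is_stale(served, vault_current):
    """True when the endpoint serves a different cert than the vault's current version."""
    return normalize(served) != normalize(vault_current)
```

Calling `is_stale(served_thumbprint("www.example.com"), vault_hex)` right after a renewal tells you whether the consumer has picked up the new version; `www.example.com` and `vault_hex` are placeholders.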

Best practices

Use Azure RBAC, not access policies, on every new vault. RBAC gives inheritance, central governance, PIM, and object-level scope; access policies are flat, capped at 1024, and invisible to the rest of Azure RBAC tooling.
Turn on purge protection; soft-delete is already forced on. Purge protection is the control that defeats a malicious “delete and purge everything.” Accept that it’s irreversible; that’s the point.
Authenticate apps with managed identity + Key Vault references. No stored credential should ever exist to reach your secrets. Verify the reference resolves (Environment variables blade) after every deploy.
Assign data-plane roles at the vault (or object) scope, narrowly. Key Vault Secrets User for read-only apps; never hand an app Secrets Officer. Remember Contributor reads nothing.
Separate vaults by environment and sensitivity. Prod and non-prod in different vaults (and ideally subscriptions); a blast-radius boundary, not a convenience grouping.
Cache resolved secrets in-process. Reading a secret on every request will get you throttled (429) and billed per transaction. Refresh on an interval and on SecretNewVersionCreated.
Lock the network down — Private Endpoint + public access disabled for production. Then test your own callers’ path and DNS, so you don’t 403 yourself. Set bypass AzureServices where first-party services need access.
Wire rotation, don’t hope. Certificate AutoRenew lifetime actions for certs; Event Grid NearExpiry → Function for secrets/keys that must change in a backing service. An unwired expiry is a scheduled outage.
Reference the unversioned URI to follow rotation; pin a version only when you need determinism. And remember consumers like App Gateway need a re-import on new cert versions.
Choose the SKU by custody requirement, not reflex. Standard for most; Premium for FIPS 140-2 L2 HSM keys; Managed HSM for L3/single-tenant/high-throughput. Don’t pay the HSM surcharge without a mandate.
Manage vault config and role assignments as code (Bicep), reviewed in PRs. A wrong scope or a missing identity is a boot-time landmine; catch it in review, not at 3am.
Alert on the leading indicators: throttled transactions, certificate/secret near-expiry, unauthorized (403) spikes, and availability — not just “app down.”
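The versioned-versus-unversioned rule above is easy to check mechanically, for example in a deploy-time lint over your app settings. This is a small sketch, assuming secret URIs of the form `https://<vault>.vault.azure.net/secrets/<name>/<version>`; `parse_secret_uri` and `is_pinned` are hypothetical helper names, not part of any Azure SDK.

```python
from urllib.parse import urlparse


def parse_secret_uri(uri):
    """Split a Key Vault secret URI into (vault_host, name, version_or_None)."""
    parsed = urlparse(uri)
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2 or parts[0] != "secrets":
        raise ValueError(f"not a secret URI: {uri}")
    # A third path segment, when present, is the pinned version identifier.
    version = parts[2] if len(parts) > 2 else None
    return parsed.netloc, parts[1], version


def is_pinned(uri):
    """True when the URI names an explicit version and so will not follow rotation."""
    return parse_secret_uri(uri)[2] is not None
```

Flagging every pinned URI in review makes a pin a deliberate decision rather than a leftover from copy-pasting the full identifier out of the portal.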

The alerts worth wiring before the next incident — leading indicators, not the lagging “app down”:

Alert on	Signal / metric	Threshold (starting point)	Why it’s leading
Throttling	`ServiceApiResult` (throttled)	> 0 sustained 5 min	First sign of uncached per-request reads before 429s cascade
Cert near-expiry	`CertificateNearExpiry` event / days-to-expiry	< 30 days	Catches a renewal that didn’t fire before TLS breaks
Secret near-expiry	`SecretNearExpiry` event	< 14 days	Rotate before consumers fail on a stale credential
Unauthorized access	403 result count	spike above baseline	Misconfig or an actual access attempt
Availability	Vault availability metric	< 99.9%	Platform issue vs your config — rule it in/out fast
Saturation toward cap	Total transactions / 10 s	approaching the GET cap	You’re about to throttle; add caching now
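The two near-expiry rows reduce to a days-remaining comparison. Below is a minimal sketch of that check, assuming you already have a list of (name, expiry) pairs from an inventory job; how you collect them is left out, and the function names and sample thresholds are illustrative.

```python
from datetime import datetime, timedelta, timezone


def days_to_expiry(expires_on, now=None):
    """Whole days remaining until expires_on (negative once expired)."""
    now = now or datetime.now(timezone.utc)
    return (expires_on - now) // timedelta(days=1)


def near_expiry(items, threshold_days, now=None):
    """Return names of items whose expiry falls inside the threshold window.

    items: iterable of (name, expires_on) pairs; expires_on is a timezone-aware
    datetime, or None for objects with no expiry set (skipped here).
    """
    flagged = []
    for name, expires_on in items:
        if expires_on is None:
            continue
        if days_to_expiry(expires_on, now) < threshold_days:
            flagged.append(name)
    return flagged
```

Running it with 30 for certificates and 14 for secrets mirrors the starting thresholds in the table. Objects with no expiry are skipped, which is worth reporting separately because an unset expiry never raises a near-expiry signal.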

Security notes

Managed identity over any stored secret. The app’s system- or user-assigned managed identity with Key Vault references means connection strings and keys never sit in plaintext config. Grant least privilege — Key Vault Secrets User, not a broad or officer role.
Purge protection + soft-delete as a deletion-resistance control. They are a security feature, not just an ops convenience: together they defeat both fat-finger deletion and a credentialed attacker trying to wipe your keys.
Network-isolate sensitive vaults. A Private Endpoint with public access disabled keeps the vault off the internet entirely; pair it with correct Private DNS. For PaaS callers that can’t use a PE, scope the firewall tightly and use bypass AzureServices deliberately, not blanket-allow.
HSM custody where compliance demands it. Premium (FIPS 140-2 L2) or Managed HSM (L3, single-tenant) when an auditor requires key material to live in certified hardware. Make CMK keys non-exportable.
Audit everything, and watch it. Enable Key Vault diagnostic logs to a Log Analytics workspace; every SecretGet, KeyOperation and policy change is recorded. Pair this with the setup in “Azure Monitor and Application Insights: Full-Stack Observability”, and alert on 403 spikes and unexpected callers.
Govern vault creation centrally. Use Azure Policy (see “Azure Policy and Governance at Scale: Enforce the Rules Automatically”) to require soft-delete + purge protection, deny public network access, and enforce RBAC mode on every vault by default.
Least privilege on the control plane too. Key Vault Contributor is powerful (firewall, SKU, policies) — restrict it; and remember it grants no data access, so don’t over-grant it trying to “let someone read a secret.”
Rotate keys and credentials on a schedule, and treat a leaked credential as permanently compromised — rotate, don’t hope it wasn’t seen. Soft-delete protects the value; rotation protects against exposure.

The security knobs that also prevent incidents — secure and resilient pull the same direction here:

Control	Setting / mechanism	Secures against	Also prevents
Managed identity + KV references	`identity` + `@Microsoft.KeyVault(...)`	Plaintext secrets in config	Hand-rolled credentials drifting/leaking
Azure RBAC, least privilege	`Key Vault Secrets User` at vault scope	Over-broad access to secrets	Officer-role mistakes; lateral movement
Soft-delete + purge protection	`enableSoftDelete`, `enablePurgeProtection`	Malicious/accidental deletion	Painful unrecoverable loss; redeploy-after-delete
Private Endpoint + public disabled	`publicNetworkAccess: 'Disabled'` + PE	Internet-exposed secrets	Some firewall lockout classes (with DNS done right)
Diagnostic logs to Log Analytics	Vault diagnostic settings	Unauditable access	Slow incident triage
HSM / Managed HSM	Premium / Managed HSM	Key material in software	Failed compliance audits
Policy: enforce vault standards	Azure Policy (deny/audit)	Drifting, insecure vaults	One team’s mistake becoming estate-wide

Cost & sizing

The bill drivers and how they interact with the design:

Operations are billed per transaction. Standard vault secret/key operations are fractions of a paisa each, so the cost lever is volume — an app reading four secrets per request at high RPS racks up both a bill and throttling. In-process caching is the single biggest cost (and 429) reducer. There is no per-vault hourly charge on Standard.
HSM keys carry a surcharge. Premium HSM-protected keys are billed per key per month (plus operations); Managed HSM is a fixed hourly charge per HSM pool (substantial — think enterprise-scale, not per-app). Don’t reach for HSM without a compliance mandate.
Certificates incur a per-renewal/operation cost (and the integrated-CA cost is the CA’s, separate from Key Vault). Auto-renewal volume is low, so this is rarely material.
Private Endpoint adds a small hourly + per-GB charge — cheap insurance to keep a sensitive vault off the internet, and almost always worth it for production.
Logging (diagnostic logs to Log Analytics) is billed per GB ingested — worth it for audit, but Key Vault log volume is modest unless you read secrets uncached at high volume (another reason to cache).
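Since in-process caching is the main lever on both the bill and the 429s, here is a minimal sketch of what it can look like. It assumes a hypothetical `fetch` callable that performs the real vault read (for example a thin wrapper around your SDK client); `SecretCache`, the 300-second default, and `invalidate` are illustrative names and values, not part of any Azure library.

```python
import threading
import time


class SecretCache:
    """In-process cache: each secret is fetched at most once per TTL window."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch          # callable: name -> value (the real vault read)
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}           # name -> (value, expires_at)

    def get(self, name):
        # Holding the lock during the fetch keeps concurrent callers
        # from all hitting the vault at once when an entry goes stale.
        with self._lock:
            entry = self._entries.get(name)
            if entry is not None and self._clock() < entry[1]:
                return entry[0]        # fresh hit: no vault transaction
            value = self._fetch(name)  # miss or stale: one vault read
            self._entries[name] = (value, self._clock() + self._ttl)
            return value

    def invalidate(self, name):
        """Drop one entry, e.g. from a SecretNewVersionCreated handler."""
        with self._lock:
            self._entries.pop(name, None)
```

The TTL bounds how long a rotated value can go unnoticed, and `invalidate` gives an event handler a way to shorten that window. The single lock is the simplest correct choice; a per-key lock would reduce contention if fetches are slow.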

A rough monthly picture: a typical app’s Key Vault footprint (a Standard vault, a handful of secrets, a managed cert, sane caching) is often ₹0–200/month — operations are that cheap when you cache. Add a Private Endpoint (~₹600–900/month) for production isolation. Premium adds per-HSM-key charges; Managed HSM is a different order of magnitude (hourly per pool — for estates with real compliance throughput, not single apps). Medivault’s vault cost stayed in the low hundreds of rupees even after Private Endpoint, because caching cut transactions ~98%. The cost drivers and what each buys you:

Cost driver	What you pay for	Rough INR / month	What it fixes / enables	Watch-out
Standard vault operations	Per-transaction secret/key/cert ops	~₹0–200 (with caching)	The base service	Uncached per-request reads → bill + 429
Private Endpoint	Hourly + per-GB	~₹600–900	Vault off the public internet	Needs VNet + Private DNS
Premium HSM keys	Per HSM-key/month + ops	varies per key	FIPS 140-2 L2 custody	Surcharge per key; only with a mandate
Managed HSM	Fixed hourly per HSM pool	high (enterprise)	L3, single-tenant, high throughput	Not for a single app’s secrets
Diagnostic logs	Per-GB ingested to Log Analytics	~₹100–500	Audit trail / alerting	Volume tracks (uncached) read volume
Certificate renewals	Per renewal/op (+ CA cost separately)	low	Auto-renewing TLS	Integrated-CA cost is the CA’s

The sizing rule in one line: right-size by transaction volume and custody requirement, not by SKU reflex. Cache to kill volume; choose Standard unless an auditor names a FIPS level; add a Private Endpoint for production; reserve Managed HSM for genuine enterprise crypto throughput.
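The “cache to kill volume” part of that rule is simple arithmetic. The sketch below compares per-request reads with timer-based cache refreshes; every input (request rate, instance count, refresh interval) is an assumed example value, not a measurement from the article.

```python
def uncached_reads_per_window(requests_per_second, secrets_per_request, window_seconds=10):
    """Vault transactions in one window when every request reads its secrets."""
    return requests_per_second * secrets_per_request * window_seconds


def uncached_reads_per_day(requests_per_second, secrets_per_request):
    """Vault transactions per day when every request reads its secrets."""
    return requests_per_second * secrets_per_request * 86_400


def cached_reads_per_day(instances, secrets, refresh_seconds):
    """Vault transactions per day when each instance refreshes its cache on a timer."""
    refreshes_per_day = 86_400 // refresh_seconds
    return instances * secrets * refreshes_per_day


# Assumed example: 200 requests/s, 4 secrets each, 6 instances, 5-minute refresh.
per_window = uncached_reads_per_window(200, 4)   # reads per 10 s, uncached
per_day_uncached = uncached_reads_per_day(200, 4)
per_day_cached = cached_reads_per_day(6, 4, 300)
```

With these assumed inputs the uncached design issues 8,000 reads per 10-second window and 69,120,000 per day, while the cached design issues 6,912 per day. Compare the per-window figure against the current published per-vault limit to see how close an uncached design runs to throttling.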

Interview & exam questions

1. What is the difference between the control plane and the data plane in Key Vault, and why does it trip people up? The control plane (Azure Resource Manager) manages the vault as a resource — create/delete, firewall, SKU, configure RBAC mode — governed by roles like Key Vault Contributor. The data plane governs the objects inside — get a secret, sign with a key — governed by data roles like Key Vault Secrets User or access policies. It trips people up because Contributor can manage the vault but cannot read a secret; management access is not data access.

2. An app’s Key Vault reference resolves to an empty value and the app crash-loops. What are the two most likely causes? Either the app has no managed identity (so no token is issued — check az webapp identity show), or the identity exists but has no data-plane role (or it’s assigned at the wrong scope — check az role assignment list --scope <vaultId>). Fix by enabling the identity and assigning Key Vault Secrets User at the vault scope.

3. When would you choose Azure RBAC over the access-policy model? Essentially always for new vaults: RBAC gives inheritance (MG→sub→RG→vault→object), central governance via az role assignment, just-in-time elevation through PIM, and object-level scope. Access policies are a flat per-vault list capped at 1024 entries with no PIM. Each vault uses exactly one model, set via enableRbacAuthorization.

4. What do soft-delete and purge protection do, and what’s the catch with purge protection? Soft-delete (always on now) keeps a deleted vault/object recoverable for a 7–90 day retention window. Purge protection blocks anyone from permanently purging during that window — defeating a malicious “delete and purge everything.” The catch: purge protection is irreversible once enabled, and it makes the retention period a hard floor.

5. Why does a TLS certificate appear as three objects in Key Vault? A certificate object bundles the X.509 cert, its private key (stored as a Key Vault key), and the exportable PFX/PEM (stored as a Key Vault secret) — so the same name is addressable as a certificate, a key, and a secret. You use the certificate object for lifecycle/renewal, the key for operations without exporting, and the secret to import the full PFX into App Service or Application Gateway.

6. An app intermittently gets 429 from Key Vault under load. What’s happening and how do you fix it? Key Vault is a throttled, shared service with a per-vault transaction cap (~25,000 fast transactions / 10 s). An app reading secrets per request without caching exceeds it under load and gets 429 with a Retry-After. The fix is in-process caching (refresh on an interval and on SecretNewVersionCreated) plus exponential backoff — not a bigger SKU.

7. You can’t recreate a vault — VaultAlreadyExists — but you don’t see it in the portal. Why? A previously-deleted, same-named vault is soft-deleted and still reserving the globally-unique name. Confirm with az keyvault list-deleted. Recover it (az keyvault recover) and let IaC adopt it, purge it (if purge protection is off and policy permits), or pick a new name.

8. What’s the difference between Standard, Premium, and Managed HSM? Standard stores software-protected keys (and is fine for most secrets/certs). Premium adds HSM-protected keys on shared FIPS 140-2 Level 2 HSMs. Managed HSM is a single-tenant pool of FIPS 140-2 Level 3 HSMs with its own RBAC and much higher throughput, billed at a fixed hourly rate per pool. Choose by required assurance level and isolation, not by feature envy.

9. How do you make a database password rotate automatically with Key Vault? A rotation policy sets an expiry; a SecretNearExpiry event via Event Grid triggers a Function that rotates the credential in the database and writes the new value back as a new secret version. Consumers using the unversioned reference pick up the new version (on restart/refresh). Key Vault alone can’t change the backing system — the Function does that half.

10. You locked a vault to a Private Endpoint and now your own app gets 403. What went wrong? Either the firewall is default-deny without the caller’s network allowed, public access is disabled without a working Private Endpoint + Private DNS for the caller, or you didn’t set bypass AzureServices for a first-party caller. The 403 body says ForbiddenByFirewall (network), distinct from AccessDenied (missing role). Fix the network path/DNS or allow-list, don’t touch the role.

11. What’s the difference between referencing a versioned and an unversioned secret URI? An unversioned URI (/secrets/Name/) follows the current version, so rotation flows through without changing the reference. A versioned URI pins an exact version — deterministic and audited, but it won’t pick up rotation. Use unversioned to auto-follow rotation, versioned when you need a fixed, reviewed value.

12. Why is Key Vault Contributor insufficient to let someone read a secret, and what would you assign instead? Contributor is a control-plane role — it manages the vault but grants no data-plane access to the objects inside, by design (separation of duties). To read secret values you assign a data-plane role: Key Vault Secrets User (read) or Secrets Officer (CRUD), at the vault or object scope.

These map to AZ-500 (Security Engineer) — manage Key Vault, secrets, keys, certificates, RBAC, network restrictions — and AZ-204 (Developer Associate) — secure app configuration data using Key Vault and managed identities. The networking angle (Private Endpoint, firewall) touches AZ-700, and governance (Policy enforcing vault standards) touches AZ-305. A compact cert-mapping for revision:

Question theme	Primary cert	Exam objective area
Control vs data plane, RBAC vs policies	AZ-500	Manage Key Vault access
Managed identity + KV references	AZ-204 / AZ-500	Secure app config; managed identities
Soft-delete, purge protection, recovery	AZ-500	Configure Key Vault security
HSM / Managed HSM, FIPS levels	AZ-500	Key management & custody
Private Endpoint / firewall	AZ-700 / AZ-500	Secure PaaS connectivity
Rotation, certificates, Event Grid	AZ-204 / AZ-500	Implement secure secret rotation
Policy enforcing vault standards	AZ-305	Design governance

Quick check

Someone has Key Vault Contributor on a vault but gets 403 reading a secret. Why, and what do you assign instead?
An app’s Key Vault reference resolves to an empty value and the app crash-loops. Name the two things to check, in order.
True or false: a bigger vault SKU is the correct fix for 429 throttling errors under load.
You enabled purge protection last week and now want to disable it. Can you, and why or why not?
A custom-domain TLS certificate stored in Key Vault expired despite being “managed.” What was almost certainly not configured?

Answers

Key Vault Contributor is a control-plane role — it manages the vault (firewall, SKU, policies) but grants no data-plane access to the objects inside, by design. Assign a data-plane role instead: Key Vault Secrets User (read) or Secrets Officer (CRUD), at the vault or object scope.
First, does the app have a managed identity (az webapp identity show — if empty, no token is issued; az webapp identity assign). Second, does that identity have a data-plane role at the vault’s actual scope (az role assignment list --assignee <principal> --scope <vaultId> — if empty, assign Key Vault Secrets User at the vault scope). It’s almost always identity-or-role, not the secret.
False. 429 is throttling against a per-vault transaction cap; a bigger SKU doesn’t raise it. The fix is in-process caching (read the secret once, refresh on an interval and on SecretNewVersionCreated) plus exponential backoff. Managed HSM has higher throughput, but the real fix is to stop reading secrets per request.
No. Purge protection is irreversible by design — once enabled it cannot be turned off for the life of the vault, and the retention period becomes a hard floor. If you genuinely need a no-PP vault you must create a new one (rare; PP is the safer default).
Auto-renewal — an AutoRenew lifetime action on the certificate policy (e.g. renew 30 days before expiry), and ideally a subscription to CertificateNearExpiry/CertificateNewVersionCreated events. A policy set to EmailContacts only warns and doesn’t renew; “stored in Key Vault” is not the same as “set to renew itself.”
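The backoff half of the throttling answer can be sketched as follows. It assumes a hypothetical `Throttled` exception that your own client wrapper raises on HTTP 429, carrying the `Retry-After` value in seconds; `call_with_backoff` and its defaults are illustrative, not an SDK API. If you use an Azure SDK client, check its built-in retry settings before adding a layer like this.

```python
import random
import time


class Throttled(Exception):
    """Hypothetical error a client wrapper raises on HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds from the Retry-After header, if present


def call_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Run operation(); on Throttled, wait and retry up to max_attempts times."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Throttled as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttle to the caller
            # Exponential ceiling: base, 2x base, 4x base, ... capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter spreads retries from many instances apart.
            delay = random.uniform(0, delay)
            # Never retry sooner than the service asked us to.
            if err.retry_after is not None:
                delay = max(delay, err.retry_after)
            sleep(delay)
```

Backoff only softens a burst. Without the cache in front of it, a per-request reader will simply spend its time sleeping.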

Glossary

Key Vault — a regional, named Azure resource (https://<name>.vault.azure.net) that stores and access-controls secrets, keys and certificates.
Secret — a versioned name→value pair holding any string up to 25 KB (connection strings, passwords, API keys); the value is readable by an authorized caller.
Key — cryptographic material (RSA/EC, optionally HSM-backed) you never read directly; you invoke operations (encrypt/decrypt, sign/verify, wrap/unwrap) on it.
Certificate — an X.509 cert with a managed lifecycle (issuance + auto-renewal), stored under the hood as a key + a secret, so it’s addressable as three objects.
Control plane — Azure Resource Manager operations that manage the vault resource (create/delete, firewall, SKU, RBAC mode); governed by RBAC roles like Key Vault Contributor.
Data plane — operations on the objects inside the vault (get/set/sign); governed by data-plane RBAC roles or access policies, over *.vault.azure.net.
Access policy — the legacy per-vault permission list (flat, capped at 1024 entries) granting per-object-type operations; one of two auth models.
Azure RBAC (data plane) — the recommended auth model using role assignments (e.g. Key Vault Secrets User) with inheritance, central governance, and PIM.
Managed identity — a secret-free Entra identity Azure manages for a resource, letting it obtain tokens and authenticate to Key Vault with no stored credential.
Key Vault reference — an app setting/App Config value of the form @Microsoft.KeyVault(SecretUri=…) that the platform resolves at runtime using the app’s managed identity.
Soft-delete — a recoverable-deletion state (7–90 day retention; always on) for deleted vaults/objects, allowing recovery within the window.
Purge protection — an irreversible setting that blocks anyone from permanently purging a vault/object before the retention period elapses.
Private Endpoint — a private IP for the vault inside your VNet that removes the public path; pair with Private DNS so the hostname resolves privately.
Vault firewall (network ACLs) — IP/VNet allow-lists with a default-deny and an optional bypass for trusted Azure services.
HSM (Hardware Security Module) — certified hardware where key material lives and never leaves in cleartext; Premium (FIPS 140-2 L2) or Managed HSM (L3, single-tenant).
CMK (customer-managed key) — your key in Key Vault used by a service (Storage/SQL/Disk/ACR) via wrap/unwrap to encrypt its data, so you control the key.
Rotation policy — a schedule/lifetime action that triggers renewal (cert auto-renew) or near-expiry events (secret/key) for hands-off rotation.
Event Grid (Key Vault events) — the eventing source for SecretNearExpiry, CertificateNewVersionCreated, etc., used to drive rotation automation.
Throttling (429) — a Too Many Requests response when transactions exceed the per-vault cap (~25,000 fast ops/10s); fixed with caching and backoff.

Next steps

You can now treat secrets, keys and certificates as governed assets and avoid the failure modes that page you. Build outward:

Next: Azure App Configuration in Production: Dynamic Refresh, Feature Flags, Key Vault References, and Snapshots — manage settings alongside Key Vault references so config and secrets are governed together.
Related: Application Gateway v2 WAF: End-to-End TLS, mTLS, and Custom Rule Tuning — pull TLS/mTLS certificates straight from Key Vault and keep end-to-end encryption clean.
Related: Azure Private Link and Private DNS: Keeping PaaS Off the Public Internet — the Private Endpoint + DNS pattern that takes your vault off the internet without locking yourself out.
Related: Azure Monitor and Application Insights: Full-Stack Observability — wire diagnostic logs and alerts so 403 spikes, throttling and near-expiry never go unnoticed.
Related: Azure Policy and Governance at Scale: Enforce the Rules Automatically — enforce soft-delete, purge protection, RBAC mode and no-public-access on every vault by default.
Related: Troubleshooting Azure App Service: 502/503 Errors, Cold Starts & Restart Loops — where a failed Key Vault reference shows up as a mysterious restart loop.