Eliminating Secrets in Azure: Key Vault, Managed Identity, and Automated Rotation

Secrets leak through a thousand small cracks: a connection string in appsettings.json, a SAS token pasted into a pipeline variable, a service principal password that hasn’t rotated since the project began. The target state is unambiguous - no secrets in source, no secrets in CI, no secrets in plaintext app settings, and anything that is a secret rotates on a schedule without a human in the loop. This walkthrough gets you there with Key Vault, managed identity, and Event Grid-driven rotation.

1. The target state

Before touching anything, fix that contract in your head: every step below exists to satisfy one of its four conditions.

The hardest part is not the tooling - it is the inventory. Grep your repos and pipeline definitions for Password=, AccountKey=, SharedAccessSignature, client_secret, and PEM headers before you start. You cannot eliminate what you have not found.
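A minimal sketch of that first sweep, assuming GNU grep and a local checkout. The function name scan_for_secrets is illustrative. The patterns are the ones listed above, matched case-insensitively, plus a private-key PEM header. The output is a list of candidates to triage, not proof, so expect some false positives in docs and tests.

```shell
#!/usr/bin/env bash
# Sketch: list files that contain common secret markers (GNU grep assumed).
scan_for_secrets() {
  local root="${1:-.}"
  # -r recurse, -I skip binary files, -i ignore case, -l print file names only.
  # .git is excluded so packed objects are not scanned.
  grep -rIil --exclude-dir=.git \
    -e 'Password=' \
    -e 'AccountKey=' \
    -e 'SharedAccessSignature' \
    -e 'client_secret' \
    -e '-----BEGIN [A-Z ]*PRIVATE KEY-----' \
    "$root" || true   # grep exits 1 when nothing matches; that is a clean result
}
```

Run it as scan_for_secrets ~/src/platform. Every file it prints goes on the inventory.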

2. Key Vault data plane: RBAC, not access policies

Create the vault with the RBAC authorization model and protection flags on from day one. Soft-delete is enabled by default and cannot be turned off; purge protection is opt-in but should be mandatory for anything production.

az keyvault create \
  --name kv-kloudvin-prod \
  --resource-group rg-platform-prod \
  --location eastus2 \
  --enable-rbac-authorization true \
  --enable-purge-protection true \
  --retention-days 90 \
  --sku standard

The legacy access policy model grants permissions as a flat list on the vault and is invisible to Azure’s central access tooling. The RBAC model uses standard Azure role assignments, so the same az role assignment list, PIM, and Access Reviews that govern the rest of your estate now cover Key Vault. Pick RBAC and never look back.

The data-plane roles you actually use:

Role                           | Use it for
------------------------------ | ------------------------------------------
Key Vault Secrets User         | App/workload read access to secret values
Key Vault Secrets Officer      | CI or operators that create/update secrets
Key Vault Certificates Officer | Managing certificate objects and issuers
Key Vault Crypto User          | Wrap/unwrap, sign/verify with keys
Key Vault Administrator        | Break-glass / full data-plane control

A critical gotcha: enable-rbac-authorization true makes the vault ignore access policies entirely. If you migrate an existing vault, assign the RBAC roles before flipping the flag, or every consumer loses access at the cutover.
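A minimal sketch of that ordering. It assumes you have already decided which role each legacy access-policy holder needs. Here every principal gets Key Vault Secrets User, which is an assumption: writers need Secrets Officer instead. The function name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: pre-assign RBAC roles, and flip the vault to RBAC only if every
# assignment succeeded. Role choice per principal is an assumption.
preassign_then_enable_rbac() {
  local vault="$1" rg="$2"
  shift 2   # remaining arguments: principal object IDs
  local vault_id oid
  vault_id=$(az keyvault show --name "$vault" --resource-group "$rg" --query id -o tsv) || return 1

  for oid in "$@"; do
    az role assignment create \
      --assignee-object-id "$oid" \
      --role "Key Vault Secrets User" \
      --scope "$vault_id" >/dev/null || return 1   # stop before the cutover
  done

  # Role assignments can take a few minutes to propagate; in a real cutover,
  # confirm the new roles work before running this step.
  az keyvault update --name "$vault" --resource-group "$rg" \
    --enable-rbac-authorization true >/dev/null
}
```

To seed the object IDs, az keyvault show --name kv-kloudvin-prod --query "properties.accessPolicies[].objectId" -o tsv lists every access-policy holder. Review each one's permissions before choosing its role.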

3. Wiring app access with managed identity

The whole point is to never hold a credential to reach Key Vault. A managed identity is an Entra ID service principal that Azure manages for you - no secret to store, rotate, or leak.

System-assigned identity is tied to one resource’s lifecycle (deleted with it) and is the right default for a single app. User-assigned identity is a standalone resource you attach to many compute targets - use it when several services share an identity, or when you need the identity (and its role assignments) to exist before the compute does, which matters for clean IaC ordering.
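For the user-assigned path, the IaC-friendly ordering looks roughly like this: identity first, role assignment second, compute attachment last. This is a sketch. The function name and the identity name you pass in are hypothetical. In practice you would express the same ordering in Bicep or Terraform.

```shell
#!/usr/bin/env bash
# Sketch: create a user-assigned identity, grant it vault read access,
# then attach it to a Function App. Names passed in are hypothetical.
provision_shared_identity() {
  local identity="$1" rg="$2" vault="$3" app="$4"
  local identity_id principal_id vault_id

  az identity create --name "$identity" --resource-group "$rg" >/dev/null || return 1
  identity_id=$(az identity show --name "$identity" --resource-group "$rg" --query id -o tsv)
  principal_id=$(az identity show --name "$identity" --resource-group "$rg" --query principalId -o tsv)
  vault_id=$(az keyvault show --name "$vault" --query id -o tsv)

  # The role assignment exists before any compute uses the identity.
  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "Key Vault Secrets User" \
    --scope "$vault_id" >/dev/null || return 1

  # Attach last; repeat for every app that shares the identity.
  az functionapp identity assign --name "$app" --resource-group "$rg" \
    --identities "$identity_id" >/dev/null
}
```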

# System-assigned on a Function App
az functionapp identity assign \
  --name func-orders-prod \
  --resource-group rg-platform-prod

PRINCIPAL_ID=$(az functionapp identity show \
  --name func-orders-prod \
  --resource-group rg-platform-prod \
  --query principalId -o tsv)

# Grant read access scoped to the vault, not the resource group
VAULT_ID=$(az keyvault show --name kv-kloudvin-prod --query id -o tsv)

az role assignment create \
  --assignee-object-id "$PRINCIPAL_ID" \
  --assignee-principal-type ServicePrincipal \
  --role "Key Vault Secrets User" \
  --scope "$VAULT_ID"

In code, you never pass a credential. The DefaultAzureCredential chain finds the managed identity at runtime:

using System;
using Azure.Identity;
using Azure.Security.KeyVault.Secrets;

var client = new SecretClient(
    new Uri("https://kv-kloudvin-prod.vault.azure.net/"),
    new DefaultAzureCredential());

KeyVaultSecret secret = await client.GetSecretAsync("Sql-ConnectionString");

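If you use a user-assigned identity, DefaultAzureCredential needs to know which one. Set its client ID explicitly, either through the AZURE_CLIENT_ID environment variable or DefaultAzureCredentialOptions.ManagedIdentityClientId in code. Otherwise the credential asks the host for its default identity, which either fails outright or authenticates as an identity that does not hold the role assignment. The resulting 403s look like an RBAC problem, especially when more than one identity is attached.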
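Setting that variable on the Function App could look like the following sketch. It assumes a user-assigned identity already exists, and the function name is illustrative. The client ID is an identifier, not a secret, so it can live in plain app settings.

```shell
#!/usr/bin/env bash
# Sketch: publish a user-assigned identity's client ID as AZURE_CLIENT_ID so
# DefaultAzureCredential requests tokens for that identity.
pin_identity_client_id() {
  local identity="$1" rg="$2" app="$3" client_id
  client_id=$(az identity show --name "$identity" --resource-group "$rg" --query clientId -o tsv)
  [[ -n "$client_id" ]] || { echo "identity $identity not found" >&2; return 1; }
  az functionapp config appsettings set --name "$app" --resource-group "$rg" \
    --settings "AZURE_CLIENT_ID=$client_id" >/dev/null
}
```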

For CI, replace the stored secret with federated credentials so the pipeline gets a short-lived token via OIDC:

az ad app federated-credential create \
  --id "$APP_OBJECT_ID" \
  --parameters '{
    "name": "github-main",
    "issuer": "https://token.actions.githubusercontent.com",
    "subject": "repo:kloudvin/platform:ref:refs/heads/main",
    "audiences": ["api://AzureADTokenExchange"]
  }'
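The subject must match the token GitHub issues exactly, and its shape depends on what triggered the job. A small helper, sketched here with a hypothetical name, builds the parameters for the common cases: a branch, a GitHub environment, or pull requests. Inputs are not JSON-escaped, which is fine for ordinary repo and branch names.

```shell
#!/usr/bin/env bash
# Sketch: build federated-credential parameters for GitHub Actions OIDC.
# kind is one of: branch, environment, pull_request.
federated_credential_params() {
  local name="$1" repo="$2" kind="$3" value="${4:-}"
  local subject
  case "$kind" in
    branch)       subject="repo:${repo}:ref:refs/heads/${value}" ;;
    environment)  subject="repo:${repo}:environment:${value}" ;;
    pull_request) subject="repo:${repo}:pull_request" ;;
    *) echo "unknown kind: $kind" >&2; return 1 ;;
  esac
  printf '{"name":"%s","issuer":"https://token.actions.githubusercontent.com","subject":"%s","audiences":["api://AzureADTokenExchange"]}\n' \
    "$name" "$subject"
}
```

For example: az ad app federated-credential create --id "$APP_OBJECT_ID" --parameters "$(federated_credential_params github-prod kloudvin/platform environment production)". A job that declares environment: production presents the environment subject, not the branch subject, so a branch-only credential will not match it.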

4. Storing and referencing secrets

Write secrets once, reference them everywhere. Avoid handing values to developers at all.

az keyvault secret set \
  --vault-name kv-kloudvin-prod \
  --name Sql-ConnectionString \
  --value "Server=tcp:sql-prod.database.windows.net;Database=orders;Authentication=Active Directory Managed Identity;"

Note the connection string above uses managed identity auth to SQL - so even this “secret” contains no password. That is the ideal: many things you used to store as secrets disappear once the downstream service supports Entra auth.

App Service / Functions: Key Vault references

App Service can resolve a Key Vault reference into an app setting at startup, using the app’s managed identity. The app code just reads an environment variable; the platform does the fetch.

az functionapp config appsettings set \
  --name func-orders-prod \
  --resource-group rg-platform-prod \
  --settings "ApiKey=@Microsoft.KeyVault(SecretUri=https://kv-kloudvin-prod.vault.azure.net/secrets/ExternalApiKey/)"

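Omit the version ID from the SecretUri (as above) so the app always resolves the current version. This is essential for rotation to take effect without a redeploy. App Service re-fetches versionless references within 24 hours and whenever the app's configuration changes, so a rotated value is not picked up instantly. Keep the previous version valid for at least that long.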
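Version pinning is easy to miss in a long settings list. Here is a sketch of a check. It assumes the two Key Vault reference syntaxes App Service accepts: SecretUri=... and VaultName=...;SecretName=...;SecretVersion=.... The function name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: flag Key Vault references that pin a secret version and therefore
# never pick up a rotated value. Reads one app-setting value per line.
# Usage:
#   az functionapp config appsettings list --name func-orders-prod \
#     --resource-group rg-platform-prod --query "[].value" -o tsv | find_pinned_kv_refs
find_pinned_kv_refs() {
  # Pattern 1: SecretUri=https://<vault>/secrets/<name>/<version>
  # Pattern 2: VaultName=...;SecretName=...;SecretVersion=<version>
  grep -E \
    -e '@Microsoft\.KeyVault\(SecretUri=https://[^/]+/secrets/[^/]+/[^/)]+' \
    -e '@Microsoft\.KeyVault\(.*SecretVersion=[^;)]+' || true
}
```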

AKS: Secrets Store CSI driver

In Kubernetes, use the Secrets Store CSI Driver with the Azure provider and workload identity. Secrets are mounted as files (and optionally synced to native Secret objects), fetched by a federated pod identity.

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: kv-orders
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    clientID: "<workload-identity-client-id>"
    keyvaultName: "kv-kloudvin-prod"
    tenantId: "<tenant-id>"
    objects: |
      array:
        - |
          objectName: ExternalApiKey
          objectType: secret

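Mount it on the pod, and enable the optional rotation poller on the driver (--set enableSecretRotation=true --set rotationPollInterval=2m on the Helm install) so mounted values refresh after a Key Vault update. Mounted files change in place. Environment variables populated from a synced Kubernetes Secret do not; they only pick up the new value when the pod restarts.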
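A pod that consumes the SecretProviderClass above might be declared like the sketch below. The manifest is wrapped in kubectl apply to keep it in shell. The pod name, the service account orders-sa, and the image are hypothetical. The service account is assumed to already carry the azure.workload.identity/client-id annotation for the identity named in the SecretProviderClass.

```shell
#!/usr/bin/env bash
# Sketch: pod mounting the kv-orders SecretProviderClass via workload identity.
# orders-sa and the image are hypothetical placeholders.
apply_orders_pod() {
  kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: orders
  labels:
    azure.workload.identity/use: "true"   # opts the pod into workload identity
spec:
  serviceAccountName: orders-sa
  containers:
    - name: orders
      image: registry.example.com/orders:1.0
      volumeMounts:
        - name: secrets-store
          mountPath: /mnt/secrets-store
          readOnly: true
  volumes:
    - name: secrets-store
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: kv-orders
EOF
}
```

The secret appears as the file /mnt/secrets-store/ExternalApiKey, named after its objectName. In a real cluster this template would sit inside a Deployment rather than a bare Pod.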

5. Automated rotation for keys

Key Vault emits Event Grid events on a secret’s lifecycle, including Microsoft.KeyVault.SecretNearExpiry (fired at ~30 days before the secret’s expiry by default). The pattern: set an expiry on the secret, subscribe a rotation Function to the near-expiry event, and have the Function mint a new credential at the source and write the new version back.

Storage account keys are the canonical example. They come in pairs (key1/key2) precisely to allow zero-downtime rotation: regenerate the inactive key, publish it, then the next cycle regenerates the other.

# Give the secret an expiry so near-expiry events fire
az keyvault secret set-attributes \
  --vault-name kv-kloudvin-prod \
  --name StorageKey \
  --expires "2026-07-01T00:00:00Z"

# Subscribe a Function to the near-expiry event
az eventgrid event-subscription create \
  --name rotate-storage-key \
  --source-resource-id "$VAULT_ID" \
  --endpoint-type azurefunction \
  --endpoint "$FUNCTION_RESOURCE_ID" \
  --included-event-types Microsoft.KeyVault.SecretNearExpiry

The rotation Function’s logic, in plain terms:

  1. Read which key is currently published (store a tag like CredentialId=key2 alongside the secret).
  2. Regenerate the other key on the storage account via the management API.
  3. Write the new key as a new version of the secret, set a fresh expiry, and flip the tag.
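
using System;
using System.Linq;
using Azure.Security.KeyVault.Secrets;

// Assumes secretClient, storageMgmt, rg and accountName are already initialized.
// 1. Find the currently published key from the CredentialId tag.
KeyVaultSecret current = await secretClient.GetSecretAsync("StorageKey");
current.Properties.Tags.TryGetValue("CredentialId", out var activeKey);
var inactiveKey = activeKey == "key1" ? "key2" : "key1";

// 2. Regenerate the inactive key, which no app is using right now.
var keys = await storageMgmt.StorageAccounts
    .RegenerateKeyAsync(rg, accountName, new StorageAccountRegenerateKeyParameters(inactiveKey));
var newValue = keys.Value.First(k => k.KeyName == inactiveKey).Value;

// 3. Publish it as a new version with a fresh expiry, and flip the tag.
var next = new KeyVaultSecret("StorageKey", newValue);
next.Properties.ExpiresOn = DateTimeOffset.UtcNow.AddDays(60);
next.Properties.Tags["CredentialId"] = inactiveKey;
await secretClient.SetSecretAsync(next);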

Because apps reference the versionless secret URI, they pick up the new value on their next refresh - no deploy, no downtime. Microsoft publishes a reference implementation of this exact pattern; treat the above as the shape, and lift the production-hardened Function from the docs sample.
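The three steps listed above can also be rehearsed from the CLI before the Function exists, or kept as a break-glass runbook. This is a sketch. It assumes the CredentialId tag convention, GNU date, and that the CLI's primary/secondary key options map to key1/key2. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: manual storage-key rotation following the key1/key2 pattern.

# Given the published key name, return the one to regenerate.
next_storage_key() {
  if [[ "$1" == "key1" ]]; then echo key2; else echo key1; fi
}

rotate_storage_key_secret() {
  local vault="$1" secret="$2" account="$3" rg="$4"
  local active inactive cli_key new_value expires

  # 1. Which key is published right now?
  active=$(az keyvault secret show --vault-name "$vault" --name "$secret" \
    --query tags.CredentialId -o tsv)
  inactive=$(next_storage_key "$active")
  if [[ "$inactive" == "key1" ]]; then cli_key=primary; else cli_key=secondary; fi

  # 2. Regenerate only the key that no app is using.
  new_value=$(az storage account keys renew --account-name "$account" \
    --resource-group "$rg" --key "$cli_key" \
    --query "[?keyName=='$inactive'].value | [0]" -o tsv) || return 1

  # 3. Publish it as a new version with a fresh expiry, and flip the tag.
  #    Note: --value is visible in the local process list while this runs.
  expires=$(date -u -d '+60 days' '+%Y-%m-%dT%H:%M:%SZ')
  az keyvault secret set --vault-name "$vault" --name "$secret" \
    --value "$new_value" --expires "$expires" \
    --tags "CredentialId=$inactive" >/dev/null
}
```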

6. Rotating certificates with an integrated issuer

For certificates, Key Vault can manage the full lifecycle if you wire in an issuer. Configure the issuer once (DigiCert and GlobalSign are natively integrated; for ACME/Let’s Encrypt you typically front it with an automation Function or use App Service managed certificates for simple cases), then create a certificate policy with auto-renewal.

az keyvault certificate issuer create \
  --vault-name kv-kloudvin-prod \
  --issuer-name DigiCertProd \
  --provider-name DigiCert \
  --account-id "$DIGICERT_ACCOUNT" \
  --password "$DIGICERT_API_KEY"

The policy controls subject, key type, and the renewal trigger. validityInMonths sets the certificate's lifetime, and the lifetime action renews it automatically before expiry:

az keyvault certificate create \
  --vault-name kv-kloudvin-prod \
  --name star-kloudvin-io \
  --policy '{
    "issuerParameters": { "name": "DigiCertProd" },
    "keyProperties": { "keyType": "RSA", "keySize": 2048, "reuseKey": false },
    "x509CertificateProperties": {
      "subject": "CN=*.kloudvin.io",
      "validityInMonths": 12
    },
    "lifetimeActions": [{
      "trigger": { "lifetimePercentage": 80 },
      "action": { "actionType": "AutoRenew" }
    }]
  }'

At 80% of lifetime, Key Vault asks the issuer for a renewal and creates a new version automatically. Consumers (App Service custom domains, Application Gateway) that bind to the versionless certificate reference pick it up; for those that cache, you still need a refresh hook.
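The percentage trigger is easier to reason about in days. Here is a quick sketch of the arithmetic, assuming a 12-month certificate is about 365 days; the issuer decides the exact validity. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: translate a lifetimePercentage trigger into days (integer math).
renewal_day() {
  local validity_days="$1" percentage="$2"
  echo $(( validity_days * percentage / 100 ))
}

# Days between the renewal attempt and expiry.
days_of_headroom() {
  local validity_days="$1" percentage="$2"
  echo $(( validity_days - $(renewal_day "$validity_days" "$percentage") ))
}
```

renewal_day 365 80 prints 292, so renewal starts about 73 days before expiry. That headroom has to cover issuer validation and any consumer that caches. If you would rather state it directly, the lifetime action trigger also accepts daysBeforeExpiry instead of lifetimePercentage.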

7. Network lockdown

A vault reachable from the public internet is one stolen token away from exfiltration. Default to deny, then allow only private traffic.

# Default-deny the firewall, but allow trusted Azure services
az keyvault update \
  --name kv-kloudvin-prod \
  --resource-group rg-platform-prod \
  --default-action Deny \
  --bypass AzureServices

# Private endpoint into the platform VNet
az network private-endpoint create \
  --name pe-kv-kloudvin-prod \
  --resource-group rg-platform-prod \
  --vnet-name vnet-platform \
  --subnet snet-privatelink \
  --private-connection-resource-id "$VAULT_ID" \
  --group-id vault \
  --connection-name kv-connection

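Then create the privatelink.vaultcore.azure.net Private DNS zone and attach it to the private endpoint with a DNS zone group, so the vault's A record is maintained for you. Link the zone to every VNet that hosts a consumer so the vault FQDN resolves to the private IP. To disable the public endpoint entirely, turn public network access off: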

az keyvault update \
  --name kv-kloudvin-prod \
  --resource-group rg-platform-prod \
  --public-network-access Disabled

Leave --bypass AzureServices on, or platform features that legitimately need the data plane (Key Vault references, certificate binding, Event Grid) can break. This is the most common self-inflicted outage during lockdown.
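The Private DNS wiring described above can be scripted in one place so that no consuming VNet is forgotten. This is a sketch. It assumes the zone lives in the same resource group as the endpoint. The function name and any VNet beyond vnet-platform are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: zone + per-VNet links + zone group for the vault private endpoint.
# VNets may be passed as names (same resource group) or full resource IDs.
wire_vault_private_dns() {
  local rg="$1" endpoint="$2"
  shift 2   # remaining arguments: consuming VNets
  local zone=privatelink.vaultcore.azure.net vnet

  az network private-dns zone create --resource-group "$rg" --name "$zone" >/dev/null || return 1

  # One link per consuming VNet; a missing link means public resolution.
  for vnet in "$@"; do
    az network private-dns link vnet create \
      --resource-group "$rg" --zone-name "$zone" \
      --name "link-${vnet##*/}" --virtual-network "$vnet" \
      --registration-enabled false >/dev/null || return 1
  done

  # The zone group writes and maintains the vault's A record in the zone.
  az network private-endpoint dns-zone-group create \
    --resource-group "$rg" --endpoint-name "$endpoint" \
    --name default --private-dns-zone "$zone" --zone-name vault >/dev/null
}
```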

8. Monitoring and protection

Send the AuditEvent logs to Log Analytics so every secret read is attributable, and confirm soft-delete / purge protection are actually on.

az monitor diagnostic-settings create \
  --name kv-audit \
  --resource "$VAULT_ID" \
  --logs '[{"category":"AuditEvent","enabled":true}]' \
  --workspace "$WORKSPACE_ID"

A KQL starting point for spotting anomalous access: count secret reads per calling app ID and source IP per hour, then look for identities or addresses you do not expect:

AzureDiagnostics
| where ResourceType == "VAULTS" and OperationName == "SecretGet"
| summarize count() by identity_claim_appid_g, CallerIPAddress, bin(TimeGenerated, 1h)
| sort by count_ desc

Enterprise scenario

A payments platform we ran had ~40 Function Apps reading secrets from one regional Key Vault. We flipped on --public-network-access Disabled with a private endpoint, rehearsed it in non-prod, and shipped. Within the hour, Key Vault references across half the estate started returning 403, and App Service health probes went red - but only for apps in a second region we had spun up later. The vault itself was fine; the apps were resolving it to the wrong address.

The gotcha: the private endpoint created a privatelink.vaultcore.azure.net A-record, but the Private DNS zone was linked to only the primary VNet. The secondary region’s VNet had no link, so its SDKs resolved the public vault.azure.net CNAME, hit the now-default-deny firewall, and failed. --bypass AzureServices saved the platform-managed Key Vault references that ran in-region; cross-region traffic had no such grace.

The fix was a single missing zone link, not a rollback:

az network private-dns link vnet create \
  --resource-group rg-platform-prod \
  --zone-name privatelink.vaultcore.azure.net \
  --name link-vnet-secondary \
  --virtual-network vnet-platform-westus2 \
  --registration-enabled false

The lesson that stuck: private endpoints are a DNS problem disguised as a networking problem. We added a post-deploy check - resolve the vault FQDN from a pod in every consuming VNet and assert it returns a 10.x address - and wired it into the pipeline gate. Locking down the data plane is the easy part; proving every consumer still resolves to the private IP is the part that actually keeps you online.
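That post-deploy gate can be as small as this sketch, which assumes a Linux host with glibc getent. It accepts any RFC 1918 address rather than only 10.x. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch of a post-deploy DNS gate (glibc getent assumed).

# True for RFC 1918 IPv4 addresses: 10/8, 172.16/12, 192.168/16.
is_private_ipv4() {
  local a b rest
  IFS=. read -r a b rest <<< "$1"
  [[ "$a" =~ ^[0-9]+$ && "$b" =~ ^[0-9]+$ ]] || return 1
  (( 10#$a == 10 )) && return 0
  (( 10#$a == 172 && 10#$b >= 16 && 10#$b <= 31 )) && return 0
  (( 10#$a == 192 && 10#$b == 168 )) && return 0
  return 1
}

# Resolve the vault FQDN as this host sees it; fail unless the answer is private.
check_vault_resolves_private() {
  local fqdn="$1" ip
  ip=$(getent ahostsv4 "$fqdn" | awk 'NR == 1 { print $1 }')
  if [[ -z "$ip" ]]; then
    echo "FAIL: $fqdn did not resolve" >&2
    return 1
  fi
  if is_private_ipv4 "$ip"; then
    echo "OK: $fqdn -> $ip"
  else
    echo "FAIL: $fqdn -> $ip is not a private address" >&2
    return 1
  fi
}
```

Run check_vault_resolves_private kv-kloudvin-prod.vault.azure.net from a pod or VM in each consuming VNet, and fail the pipeline on a non-zero exit.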

Verify

Prove the chain works end to end, not just that resources exist:

# 1. The app's identity can actually read. Run this as that identity (for example
#    from a host that carries it, after az login --identity), not with your own rights
az keyvault secret show --vault-name kv-kloudvin-prod --name StorageKey \
  --query "attributes.expires"

# 2. Confirm RBAC, not access policies, is enforcing
az keyvault show --name kv-kloudvin-prod \
  --query "properties.enableRbacAuthorization"

# 3. Confirm purge protection is irreversibly on
az keyvault show --name kv-kloudvin-prod \
  --query "properties.enablePurgeProtection"

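# 4. Force a rotation rehearsal: move the expiry inside the 30-day near-expiry
#    window (GNU date shown), then watch a NEW secret version appear after the Function runs
az keyvault secret set-attributes --vault-name kv-kloudvin-prod --name StorageKey \
  --expires "$(date -u -d '+1 day' '+%Y-%m-%dT%H:%M:%SZ')"

az keyvault secret list-versions --vault-name kv-kloudvin-prod --name StorageKey \
  --query "[].{version:id, created:attributes.created}" -o table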

The decisive test for zero-downtime: after rotation produces a new version, your app keeps serving traffic with no restart and no error spike, because it references the versionless URI. Watch your app’s dependency-failure metric across the rotation window - it should be flat.
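Step 4 of the checks above is easier to gate on if it has a timeout. Here is a sketch that polls the version count until it grows. It assumes the near-expiry event and the Function are already wired. The 15-minute default timeout and 30-second interval are arbitrary, and the function name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: wait until a secret gains a new version, or give up after a timeout.
wait_for_new_version() {
  local vault="$1" secret="$2" timeout="${3:-900}" interval="${4:-30}"
  local before now elapsed=0
  before=$(az keyvault secret list-versions --vault-name "$vault" --name "$secret" \
    --query "length(@)" -o tsv) || return 1
  while (( elapsed < timeout )); do
    sleep "$interval"
    elapsed=$(( elapsed + interval ))
    now=$(az keyvault secret list-versions --vault-name "$vault" --name "$secret" \
      --query "length(@)" -o tsv) || return 1
    if (( now > before )); then
      echo "new version after ~${elapsed}s ($before -> $now)"
      return 0
    fi
  done
  echo "no new version within ${timeout}s" >&2
  return 1
}
```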

Pitfalls and next steps

The traps that bite teams: forgetting that pinning a secret URI to a specific version silently defeats rotation; flipping a vault to RBAC without pre-assigning roles and locking everyone out; disabling public access but never linking the Private DNS zone, so SDKs resolve the public IP and hit the firewall; and treating storage-key rotation as a single-key operation instead of using the key1/key2 dance for zero downtime.

For next steps, push the same model outward: managed identity for SQL, Service Bus, and Storage data-plane access removes those secrets entirely; layer PIM and Access Reviews on the Key Vault RBAC roles so even operator access is just-in-time; and codify all of the above in Bicep or Terraform so the secure configuration is the default, not a checklist someone has to remember.
