Eliminating Secrets: Key Vault and Workload Identity Federation End to End

Every stored credential is a liability with a half-life: secrets expire at the worst moment, leak into logs and .env files, and outlive the engineer who created them. This guide walks the full path to a secret-free estate — Key Vault as the system of record for the few secrets you cannot avoid, managed identities for anything running inside Azure, and workload identity federation (OIDC) to extend that passwordless model to GitHub Actions and AKS. The destination is an estate where the only thing you rotate is trust, not strings.

The secret-zero problem

The hard part of removing secrets is the bootstrap. To read a secret from Key Vault, a workload must authenticate to Entra ID — but if that authentication itself relies on a stored client secret, you have only moved the problem one hop upstream. This is the secret-zero problem: how do you establish trust without a pre-shared credential?

The answer is platform-issued identity. The platform a workload runs on (an Azure VM, an AKS pod, a GitHub runner) issues it an identity, and Entra ID is configured to trust that platform’s token. No secret is stored anywhere. There are two mechanisms:

Mechanism	Where the identity comes from	Use it for
Managed identity	The Azure platform mints and rotates an identity bound to a resource	Anything running inside Azure (App Service, Functions, VMs, Container Apps)
Workload identity federation	An external OIDC issuer (GitHub, an AKS OIDC issuer, GitLab)	Workloads outside Azure’s IMDS reach, or pods needing per-service-account identity

The mental model: managed identity is “Azure trusts itself.” Federation is “Entra ID trusts a specific subject from a specific external issuer.” Both end in a short-lived Entra access token and zero stored secrets.

Step 1 — Key Vault foundations

Before federating anything, get the vault right. Two decisions dominate: authorization model and data protection.

RBAC over access policies. Legacy access policies are a flat list on the vault; anyone with Microsoft.KeyVault/vaults/write (Contributor, Key Vault Contributor) can grant themselves data access — a privilege-escalation path. Azure RBAC uses the standard role-assignment plane and is the recommended model. As of Key Vault API version 2026-02-01, RBAC is the default for newly created vaults.

az keyvault create \
  --name kv-plat-prod-001 \
  --resource-group rg-platform-prod \
  --location australiaeast \
  --enable-rbac-authorization true \
  --enable-purge-protection true \
  --retention-days 90 \
  --public-network-access Disabled

The data-plane RBAC roles you will actually use:

Role	Grants	Assign to
Key Vault Secrets User	Read secret values	Workloads (managed identities, federated apps)
Key Vault Secrets Officer	Create, update, delete secrets	CI/CD that seeds secrets, secret-ops humans
Key Vault Administrator	All data-plane ops on keys, secrets, certs	Break-glass and platform admins only

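Assign least privilege, and never hand a runtime workload more than Secrets User. The example below scopes the assignment to the whole vault; to narrow it to a single secret, append /secrets/<secret-name> to the scope: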

az role assignment create \
  --role "Key Vault Secrets User" \
  --assignee-object-id "$APP_PRINCIPAL_ID" \
  --assignee-principal-type ServicePrincipal \
  --scope "/subscriptions/$SUB/resourceGroups/rg-platform-prod/providers/Microsoft.KeyVault/vaults/kv-plat-prod-001"

Soft-delete and purge protection are non-negotiable for production. Soft-delete (always on) recovers a deleted vault or secret within the retention window; purge protection blocks even a privileged actor from hard-deleting before that window elapses, defeating a ransomware-style destroy. It is irreversible once enabled — that is the point.

Network isolation. --public-network-access Disabled plus a private endpoint keeps the data plane off the internet. Pair it with a Key Vault firewall that allows trusted Azure services so platform integrations still resolve.

Step 2 — Managed identities, decoded

Inside Azure, you almost never need federation — you need a managed identity. There are two flavors, and choosing wrong creates real operational pain.

System-assigned: lifecycle tied 1:1 to a single resource — created and deleted with it. Good for a standalone service where the identity should never outlive the workload.
User-assigned (UAMI): a standalone resource you create once and attach to many workloads. This is what you want at platform scale: assign Key Vault RBAC to the UAMI once, and every App Service, VM, or AKS pod that carries it inherits access. It also survives blue/green resource replacement.

# A UAMI shared across a workload family
az identity create \
  --name id-orders-api \
  --resource-group rg-platform-prod \
  --location australiaeast

APP_PRINCIPAL_ID=$(az identity show -n id-orders-api -g rg-platform-prod --query principalId -o tsv)
APP_CLIENT_ID=$(az identity show -n id-orders-api -g rg-platform-prod --query clientId -o tsv)

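For an App Service, attach the UAMI and point app settings at the vault using Key Vault references. The platform resolves them at startup using the identity and hands your app the value as an ordinary setting, so the secret string never lives in your configuration or repository. Key Vault references use the system-assigned identity by default, so with a UAMI you also set the app’s keyVaultReferenceIdentity property to the UAMI’s resource ID:

UAMI_ID="/subscriptions/$SUB/resourceGroups/rg-platform-prod/providers/Microsoft.ManagedIdentity/userAssignedIdentities/id-orders-api"

az webapp identity assign \
  --name app-orders-prod --resource-group rg-platform-prod \
  --identities "$UAMI_ID"

# Resolve Key Vault references with the UAMI rather than the system-assigned identity
az webapp update \
  --name app-orders-prod --resource-group rg-platform-prod \
  --set keyVaultReferenceIdentity="$UAMI_ID"

az webapp config appsettings set \
  --name app-orders-prod --resource-group rg-platform-prod \
  --settings "Db__ConnString=@Microsoft.KeyVault(SecretUri=https://kv-plat-prod-001.vault.azure.net/secrets/orders-db-conn/)"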

The SecretUri without a version (trailing /) resolves the current version. That single decision is the foundation of zero-downtime rotation in Step 6.
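
Because rotation depends on every reference staying versionless, it can help to lint reference strings before they ship. The following is a minimal, stdlib-only sketch; the function name secret_uri_is_versionless is hypothetical, and it assumes the usual https://<vault>.vault.azure.net/secrets/<name>[/<version>] URI shape.

```python
from urllib.parse import urlparse


def secret_uri_is_versionless(secret_uri: str) -> bool:
    """Return True when a Key Vault SecretUri names no specific version.

    Assumed path shape: /secrets/<name>[/<version>]
    """
    # Drop empty segments so a trailing slash does not count as a version.
    parts = [p for p in urlparse(secret_uri).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "secrets":
        raise ValueError(f"not a Key Vault secret URI: {secret_uri}")
    return len(parts) == 2


# Usage: flag any reference that pins a version.
uri = "https://kv-plat-prod-001.vault.azure.net/secrets/orders-db-conn/"
if not secret_uri_is_versionless(uri):
    print(f"pinned version, rotation will not propagate: {uri}")
```

A check like this fits naturally in a pre-merge pipeline step that scans app settings or infrastructure templates.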

Step 3 — Workload identity federation: how the trust works

Federation lets Entra ID accept an OIDC token from an external issuer in exchange for an Entra access token — no client secret involved. You configure a federated identity credential (FIC) on either an app registration or a user-assigned managed identity. A FIC is a trust assertion with three fields that must all match the incoming token:

issuer — the OIDC issuer URL (e.g. https://token.actions.githubusercontent.com, or your AKS cluster’s OIDC issuer URL)
subject — the exact sub claim identifying the workload (a repo+branch, a repo+environment, or a Kubernetes service account)
audience — for Entra this is api://AzureADTokenExchange

At runtime the external platform issues a short-lived OIDC token, the workload presents it to Entra ID’s token endpoint, Entra validates issuer/subject/audience against a configured FIC, and returns a normal access token. The OIDC token lives minutes; nothing durable is stored.
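
To make the three-field match concrete, here is a small model of the comparison described above. It is an illustrative sketch, not Entra ID’s implementation: the class and function names are hypothetical, and real validation also verifies the token signature against the issuer’s published keys and checks expiry, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FederatedCredential:
    """Hypothetical stand-in for one FIC: the three fields that must match."""
    issuer: str
    subject: str
    audience: str


def token_matches(fic: FederatedCredential, claims: dict) -> bool:
    """Exact, case-sensitive comparison of iss / sub / aud against one FIC."""
    aud = claims.get("aud")
    # The aud claim may be a single string or a list of strings.
    audiences = aud if isinstance(aud, list) else [aud]
    return (
        claims.get("iss") == fic.issuer
        and claims.get("sub") == fic.subject
        and fic.audience in audiences
    )


fic = FederatedCredential(
    issuer="https://token.actions.githubusercontent.com",
    subject="repo:contoso/orders-api:environment:prod",
    audience="api://AzureADTokenExchange",
)
claims = {
    "iss": "https://token.actions.githubusercontent.com",
    "sub": "repo:contoso/orders-api:environment:prod",
    "aud": "api://AzureADTokenExchange",
}
print(token_matches(fic, claims))
```

The point of the model is that there is no partial credit: a token from the right issuer with a slightly different subject is simply rejected.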

Limit: a single managed identity (or app) supports a maximum of 20 federated identity credentials. Plan subjects accordingly — one FIC per branch and per environment adds up fast. Flexible federated credentials (claims matching with wildcards) exist in preview for GitHub/GitLab/Terraform Cloud on app objects if you outgrow exact-match.

Step 4 — Federating GitHub Actions to Azure

This kills the AZURE_CREDENTIALS JSON secret that haunts so many pipelines. Create (or reuse) an app registration, then add a FIC whose subject pins the exact repo and ref.

APP_ID=$(az ad app create --display-name "gh-orders-deploy" --query appId -o tsv)
az ad sp create --id "$APP_ID"

The subject claim is where least privilege lives. Pin to a branch or a GitHub Environment — environment scoping is stronger because it lets you gate on approvals and environment protection rules:

# Environment-scoped: only the 'prod' environment of this repo can assume the identity
az ad app federated-credential create \
  --id "$APP_ID" \
  --parameters '{
    "name": "gh-orders-prod-env",
    "issuer": "https://token.actions.githubusercontent.com",
    "subject": "repo:contoso/orders-api:environment:prod",
    "audiences": ["api://AzureADTokenExchange"]
  }'

Common subject formats:

Scenario	Subject
Branch push	`repo:ORG/REPO:ref:refs/heads/main`
Tag	`repo:ORG/REPO:ref:refs/tags/v1.2.3`
Pull request	`repo:ORG/REPO:pull_request`
Environment (preferred)	`repo:ORG/REPO:environment:prod`
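
Since a typo in the subject only surfaces as a failed login, generating subjects from one helper rather than typing them by hand may reduce drift. The sketch below builds the four formats from the table; the function name github_subject and its keyword arguments are hypothetical.

```python
from typing import Optional


def github_subject(
    org: str,
    repo: str,
    *,
    environment: Optional[str] = None,
    branch: Optional[str] = None,
    tag: Optional[str] = None,
    pull_request: bool = False,
) -> str:
    """Build a GitHub Actions OIDC subject string for exactly one scenario."""
    chosen = [environment is not None, branch is not None, tag is not None, pull_request]
    if sum(chosen) != 1:
        raise ValueError("choose exactly one of environment, branch, tag, pull_request")
    prefix = f"repo:{org}/{repo}"
    if environment is not None:
        return f"{prefix}:environment:{environment}"
    if branch is not None:
        return f"{prefix}:ref:refs/heads/{branch}"
    if tag is not None:
        return f"{prefix}:ref:refs/tags/{tag}"
    return f"{prefix}:pull_request"


print(github_subject("contoso", "orders-api", environment="prod"))
```

Keep in mind that a job which declares an environment presents the environment-form subject even when it was triggered by a branch push, so the FIC has to be written for the environment, not the ref.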

Grant the app’s service principal only the roles that deployment needs — scoped to the target resource group, never the subscription. Then the workflow needs the id-token: write permission and the azure/login action with no secret:

name: deploy-orders
on:
  push:
    branches: [main]

permissions:
  id-token: write        # required to request the GitHub OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: prod      # must match the FIC subject 'environment:prod'
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          client-id: ${{ vars.AZURE_CLIENT_ID }}
          tenant-id: ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
      - run: az webapp deploy --name app-orders-prod --resource-group rg-platform-prod --src-path ./app.zip --type zip

Note AZURE_CLIENT_ID and friends are repository variables, not secrets — they are identifiers, not credentials, and leaking them grants nothing without the matching OIDC trust.

Step 5 — AKS workload identity

Inside the cluster, pod-managed identity is deprecated; Microsoft Entra Workload ID is the model. The cluster runs an OIDC issuer, and a mutating webhook injects a projected service-account token plus the environment variables the Azure SDKs expect. Enable both:

az aks update \
  --name aks-plat-prod --resource-group rg-platform-prod \
  --enable-oidc-issuer \
  --enable-workload-identity

OIDC_ISSUER=$(az aks show -n aks-plat-prod -g rg-platform-prod \
  --query "oidcIssuerProfile.issuerUrl" -o tsv)

Federate a UAMI to a specific Kubernetes service account. The subject is system:serviceaccount:<namespace>:<name> and the issuer is the cluster’s OIDC URL:

az identity federated-credential create \
  --name fic-orders-sa \
  --identity-name id-orders-api \
  --resource-group rg-platform-prod \
  --issuer "$OIDC_ISSUER" \
  --subject "system:serviceaccount:orders:sa-orders" \
  --audiences "api://AzureADTokenExchange"

Annotate the service account with the UAMI client ID, and label pods to opt in. The annotation tells the webhook which identity to broker; the pod label flips the workload into fail-close behavior.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-orders
  namespace: orders
  annotations:
    azure.workload.identity/client-id: "<APP_CLIENT_ID of id-orders-api>"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
  namespace: orders
spec:
  template:
    metadata:
      labels:
        azure.workload.identity/use: "true"   # opt this pod into the webhook
    spec:
      serviceAccountName: sa-orders
      containers:
        - name: orders-api
          image: acrplatprod.azurecr.io/orders-api:1.4.0
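
Four pieces have to agree for this to work: the FIC subject, the service account’s namespace and name, its client-id annotation, and the pod label. A pre-deploy check along the following lines can catch a mismatch early. It is a sketch under assumptions: the function name is hypothetical, and the manifests are passed in as plain dictionaries (for example, as loaded by a YAML parser of your choice).

```python
def workload_identity_problems(service_account: dict, deployment: dict, fic_subject: str) -> list:
    """Return human-readable problems; an empty list means the pieces line up."""
    problems = []
    sa_meta = service_account["metadata"]

    # The FIC subject must be exactly system:serviceaccount:<namespace>:<name>.
    expected = f"system:serviceaccount:{sa_meta['namespace']}:{sa_meta['name']}"
    if fic_subject != expected:
        problems.append(f"FIC subject {fic_subject!r} != {expected!r}")

    if "azure.workload.identity/client-id" not in sa_meta.get("annotations", {}):
        problems.append("service account lacks the client-id annotation")

    template = deployment["spec"]["template"]
    labels = template["metadata"].get("labels", {})
    if labels.get("azure.workload.identity/use") != "true":
        problems.append('pod template lacks label azure.workload.identity/use: "true"')

    if template["spec"].get("serviceAccountName") != sa_meta["name"]:
        problems.append("pod does not run as the federated service account")
    return problems


sa = {
    "metadata": {
        "name": "sa-orders",
        "namespace": "orders",
        "annotations": {"azure.workload.identity/client-id": "<APP_CLIENT_ID>"},
    }
}
deploy = {
    "spec": {
        "template": {
            "metadata": {"labels": {"azure.workload.identity/use": "true"}},
            "spec": {"serviceAccountName": "sa-orders"},
        }
    }
}
print(workload_identity_problems(sa, deploy, "system:serviceaccount:orders:sa-orders"))
```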

With DefaultAzureCredential, the SDK inside the pod now authenticates with zero config. If you prefer secrets mounted as files, layer the Azure Key Vault provider for Secrets Store CSI Driver, which also works in workload-identity mode:

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: spc-orders-kv
  namespace: orders
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    clientID: "<APP_CLIENT_ID of id-orders-api>"   # workload identity mode
    keyvaultName: "kv-plat-prod-001"
    tenantId: "<TENANT_ID>"
    objects: |
      array:
        - |
          objectName: orders-db-conn
          objectType: secret

Enable the add-on with rotation on the existing cluster:

az aks enable-addons \
  --addons azure-keyvault-secrets-provider \
  --name aks-plat-prod --resource-group rg-platform-prod \
  --enable-secret-rotation \
  --rotation-poll-interval 2m

Step 6 — Rotation without downtime

Rotation breaks applications when code pins a version. The discipline is to reference secrets without a version and let the resolver follow the current one.

App Service / Key Vault references: a versionless SecretUri (Step 2) re-resolves when the app restarts or its configuration changes, and on a periodic refresh (App Service picks up a new version within 24 hours), so rotating the secret in the vault propagates without a redeploy.
CSI driver: with --enable-secret-rotation, the provider polls the vault every rotation-poll-interval (default 2 minutes) and updates both the mounted files and any synced Kubernetes Secret. Mounted file content updates in place; apps that read the file per-request pick it up automatically. Apps that read once at startup still need a signal — watch the file or subscribe to the rotation.
Event-driven: Key Vault emits Microsoft.KeyVault.SecretNewVersionCreated to Event Grid. Wire that to a Function or webhook to trigger graceful cache invalidation or a rolling restart the moment a new version lands, rather than waiting on a poll interval.
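
For the CSI case, an app that reads its secret once at startup can be adapted with a thin wrapper that re-reads the mounted file only when it changes. This is a minimal sketch: the class name MountedSecret and the example mount path are hypothetical, and it detects change by comparing file metadata rather than subscribing to any driver event.

```python
import os
from pathlib import Path


class MountedSecret:
    """Cache a file-mounted secret and reload it when the file changes."""

    def __init__(self, path: str) -> None:
        self._path = Path(path)
        self._stamp = None
        self._value = None

    def get(self) -> str:
        # os.stat follows symlinks, so a swapped symlink target is noticed too.
        st = os.stat(self._path)
        stamp = (st.st_mtime_ns, st.st_ino, st.st_size)
        if stamp != self._stamp:
            self._value = self._path.read_text().strip()
            self._stamp = stamp
        return self._value


# Usage (hypothetical mount path): call get() wherever the value is needed,
# for example when opening a new database connection.
# conn_string = MountedSecret("/mnt/secrets-store/orders-db-conn").get()
```

The stat call is cheap enough to run per request, which avoids a background watcher thread for most services.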

The golden rule: store the secret in exactly one place (the vault), reference it versionlessly everywhere, and treat rotation as a vault-side operation that consumers observe — never a coordinated multi-system deploy.

Step 7 — Auditing and detecting orphaned secrets

You cannot claim “secret-free” without proving it. Two fronts: find the secrets you missed, and watch the vault you kept.

Find orphaned secrets. Sweep app settings and pipeline definitions for plaintext that should be a Key Vault reference or a federated identity:

# App settings that are not KV references: review each hit, since not every one is a secret
az webapp config appsettings list -n app-orders-prod -g rg-platform-prod \
  --query "[?!contains(value, '@Microsoft.KeyVault')].name" -o tsv
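
The query above returns every setting that is not a Key Vault reference, including harmless ones. A name-based filter can shorten the review list. The sketch below is a heuristic only: the function name and the keyword list are assumptions, and it expects the JSON shape that az webapp config appsettings list emits (objects with name and value).

```python
import re

# A full Key Vault reference wraps its arguments in @Microsoft.KeyVault(...).
KV_REFERENCE = re.compile(r"^@Microsoft\.KeyVault\(.*\)$")
# Hypothetical keyword list; tune it to your own naming conventions.
SUSPECT_NAME = re.compile(r"secret|password|pwd|key|token|connstr|connectionstring", re.IGNORECASE)


def suspect_inline_secrets(settings: list) -> list:
    """Names of settings that look secret-like but are not Key Vault references."""
    return [
        s["name"]
        for s in settings
        if not KV_REFERENCE.match(s.get("value") or "")
        and SUSPECT_NAME.search(s["name"])
    ]


settings = [
    {"name": "Db__ConnString",
     "value": "@Microsoft.KeyVault(SecretUri=https://kv-plat-prod-001.vault.azure.net/secrets/orders-db-conn/)"},
    {"name": "Payments__ApiKey", "value": "inline-value-that-should-move-to-the-vault"},
    {"name": "ASPNETCORE_ENVIRONMENT", "value": "Production"},
]
print(suspect_inline_secrets(settings))
```

Name matching produces false positives and misses oddly named settings, so treat the output as a starting point for review rather than proof of a clean estate.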

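Hunt the classic offenders across the estate. Resource Graph inventories every web app so the sweep above can loop over all of them. App registrations that still carry password credentials (federation candidates) live in Microsoft Graph rather than Resource Graph, so list those with az ad app list:

# Inventory every web app to feed the app-settings sweep
az graph query -q "
  resources
  | where type == 'microsoft.web/sites'
  | project name, resourceGroup, kind"

# App registrations that still hold client secrets (federation candidates)
az ad app list --all \
  --query "[?passwordCredentials].{name:displayName, appId:appId}" -o table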

Diagnostic logs. Route Key Vault AuditEvent logs to Log Analytics so every data-plane access is queryable and retained:

az monitor diagnostic-settings create \
  --name kv-audit \
  --resource "/subscriptions/$SUB/resourceGroups/rg-platform-prod/providers/Microsoft.KeyVault/vaults/kv-plat-prod-001" \
  --logs '[{"category":"AuditEvent","enabled":true}]' \
  --workspace "/subscriptions/$SUB/resourceGroups/rg-obs/providers/Microsoft.OperationalInsights/workspaces/law-platform"

Alert on anomalies. A KQL alert for access from an unexpected identity or a spike in SecretGet denials catches both misconfiguration and intrusion:

AzureDiagnostics
| where ResourceType == "VAULTS" and OperationName == "SecretGet"
| where ResultType != "Success"
| summarize denials = count() by identity_claim_appid_g, bin(TimeGenerated, 15m)
| where denials > 10

Enterprise scenario

A retail platform team federated forty-odd microservice repos to a single shared gh-deploy app registration, one FIC per repo’s prod environment. Within weeks they hit the hard cap: the 21st az ad app federated-credential create failed with The number of federated identity credentials on the application has reached the maximum allowed value of 20. The instinct was to mint more app registrations, but that scatters role assignments and audit identity across dozens of principals — exactly the sprawl they were trying to kill.

The fix was to stop modelling identity per repo. They created one app registration per deployment tier (id-deploy-prod, id-deploy-nonprod) and matched on the organisation instead of pinning each repo. Crucially, a plain sub match cannot express “any repo in this org,” so they switched the FIC to a flexible federated credential. At the time of writing the preview is limited to application objects and, for GitHub, to expressions over the sub claim, so the claimsMatchingExpression puts a wildcard in the repo position. The az ad app federated-credential command did not yet accept claimsMatchingExpression, so they posted the credential to the Microsoft Graph beta endpoint with az rest:

APP_OBJECT_ID=$(az ad app show --id "$APP_ID" --query id -o tsv)
az rest --method post \
  --url "https://graph.microsoft.com/beta/applications/$APP_OBJECT_ID/federatedIdentityCredentials" \
  --body '{
    "name": "gh-org-prod",
    "issuer": "https://token.actions.githubusercontent.com",
    "audiences": ["api://AzureADTokenExchange"],
    "claimsMatchingExpression": {
      "value": "claims['"'"'sub'"'"'] matches '"'"'repo:contoso/*:environment:prod'"'"'",
      "languageVersion": 1
    }
  }'

One credential now covers every repo the org owns, gated on the prod environment so approvals still apply. Forty FICs collapsed to two, role assignments live on two identities, and sign-in logs attribute every deploy to one auditable principal per tier. The lesson: federation subjects should map to a trust boundary, not to a repository. Model the boundary first and the credential count takes care of itself.
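
The arithmetic behind the ceiling is worth running before choosing a model. The sketch below is a back-of-envelope planner under the assumption of one exact-match FIC per repo and environment pair; the function name fic_plan is hypothetical, and the cap of 20 is the limit cited earlier in this guide.

```python
import math

MAX_FICS_PER_IDENTITY = 20  # per app registration or managed identity


def fic_plan(repos: int, environments: int) -> dict:
    """Estimate exact-match FIC demand for one FIC per (repo, environment)."""
    exact = repos * environments
    return {
        "exact_match_fics": exact,
        "fits_one_identity": exact <= MAX_FICS_PER_IDENTITY,
        "identities_needed": math.ceil(exact / MAX_FICS_PER_IDENTITY),
    }


# Forty repos with a single prod environment each already need two identities.
print(fic_plan(40, 1))
```

If the plan does not fit one identity, that is the cue to look for a wider trust boundary rather than to add principals.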

Verify

Confirm the trust chain works end to end before you delete any old secret.

# 1. GitHub Actions: the workflow run should show azure/login succeeding with no secret.
#    In the Azure portal, check Entra ID sign-in logs for the app's federated sign-in.

# 2. AKS: exec into a pod and confirm the webhook injected the workload identity env vars.
kubectl exec -n orders deploy/orders-api -- env | grep AZURE_
#   expect AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_FEDERATED_TOKEN_FILE, AZURE_AUTHORITY_HOST

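# 3. App Service: confirm the KV reference resolved (properties.status should read 'Resolved').
#    The app settings list does not report resolution status, so query the config reference API.
az rest --method get \
  --url "https://management.azure.com/subscriptions/$SUB/resourceGroups/rg-platform-prod/providers/Microsoft.Web/sites/app-orders-prod/config/configreferences/appsettings/Db__ConnString?api-version=2022-03-01" \
  --query properties.status -o tsv

# 4. Vault data plane: confirm the workload can actually read. Run this as the workload
#    identity (for example in the pipeline after azure/login), not as your own admin account.
az keyvault secret show --vault-name kv-plat-prod-001 --name orders-db-conn --query id -o tsv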

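If the AKS env vars are absent, the pod is most likely missing the azure.workload.identity/use: "true" label, or it was created before the label was added; the webhook only mutates pods at creation, so restart them. If azure/login fails, the FIC subject does not match the workflow’s environment or ref exactly: subjects are compared as exact, case-sensitive strings.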
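
When a login fails on a subject mismatch, comparing the token’s actual sub claim with the configured subject usually settles it quickly. The sketch below decodes the JWT payload without verifying the signature, which is acceptable for debugging only and never for authorization. The function names are hypothetical.

```python
import base64
import json
from typing import Optional


def unverified_claims(jwt: str) -> dict:
    """Decode the payload segment of a JWT. No signature check: debugging only."""
    payload = jwt.split(".")[1]
    # base64url segments omit padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def subject_mismatch(jwt: str, expected_subject: str) -> Optional[str]:
    """Return a description of the mismatch, or None when sub matches exactly."""
    actual = unverified_claims(jwt).get("sub")
    if actual == expected_subject:
        return None
    return f"token sub {actual!r} != FIC subject {expected_subject!r}"
```

In AKS the token to inspect is the file that AZURE_FEDERATED_TOKEN_FILE points at. Avoid printing the raw token into logs; print only the claims you need.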

Migration checklist

Pitfalls

Federation subject drift. Renaming a GitHub Environment, branch, or Kubernetes service account silently breaks the FIC. Treat subjects as a contract and change them deliberately.
The 20-FIC ceiling. One credential per branch and environment fills up fast. Consolidate with environment scoping or flexible federated credentials rather than minting more identities.
Pinned secret versions. A single versioned SecretUri or hardcoded version anywhere reintroduces a rotation outage. Audit for them.
Purge protection regret. It is irreversible — but the alternative is a recoverable destroy path. Enable it in production and move on.
Network lockout. Disabling public access without a private endpoint or trusted-services exception will lock out your own pipelines. Land the network path before flipping the switch.

Secret-zero is reached not when you have a vault, but when no stored credential anywhere grants access to it. Federation closes that last gap — the only thing left to manage is trust, expressed as issuer, subject, and audience, and trust does not leak into a log file.


Written by Vinod
