Locking Down Workload Identities: Conditional Access, Risk Detection, and Going Secretless

We spent a decade hardening human sign-ins - MFA, phishing-resistant credentials, risk-based Conditional Access - and quietly left the non-human half of the directory wide open. Every tenant has hundreds of service principals: SaaS connectors, CI deployers, automation runbooks, the random app some team consented to in 2021. Most carry a client secret valid for two years, broad Microsoft Graph application permissions, and zero conditions on where they can authenticate from. They do not get MFA prompts. They do not trigger a “new device” notification. That is exactly why attackers love them: a stolen secret is a quiet, durable foothold that survives the user password resets you do during incident response.

This guide treats workload identities as first-class subjects of governance. We inventory them, put Conditional Access around them, watch them with Identity Protection, strip the secrets out, right-size their permissions, and finish with the playbook you run when one is compromised.

Licensing note up front: workload-identity Conditional Access and workload identity risk detections require Microsoft Entra Workload ID Premium (a per-service-principal add-on), separate from the user-based Entra ID P2 your humans use. Budget for it before you design around it.

1. The blind spot: why service principals are the favorite persistence mechanism

A service principal is the local representation of an application in your tenant - the thing that actually holds credentials and gets tokens. Two properties make it dangerous:

Real intrusions (the Midnight Blizzard / Storm-0558 class of attack) followed this pattern: the attackers compromised or minted a credential on an over-privileged app, then quietly used Graph to exfiltrate mail and added their own credential for persistence. The user-focused controls never fired because no user was involved.

The fix is the same Zero Trust posture we apply to people, retargeted at workloads: verify the context of the sign-in, grant least privilege, assume the credential will leak, and remove the credential entirely where you can.

2. Inventory: app registrations, enterprise apps, and over-privileged Graph permissions

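You cannot govern what you have not enumerated. Start with the two halves of every app: the app registration (the application object) and the enterprise application (its service principal in your tenant).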

Pull a full inventory with Microsoft Graph PowerShell:

Connect-MgGraph -Scopes "Application.Read.All","Directory.Read.All","AuditLog.Read.All"

# All app registrations with credential expiry
Get-MgApplication -All -Property DisplayName,AppId,PasswordCredentials,KeyCredentials |
  Select-Object DisplayName, AppId,
    @{n='Secrets';     e={ $_.PasswordCredentials.Count }},
    @{n='Certs';       e={ $_.KeyCredentials.Count }},
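    # Wrap each side in @() so a single credential still concatenates as an array; drop nulls from empty sets
    @{n='NextExpiry';  e={ (@($_.PasswordCredentials.EndDateTime) + @($_.KeyCredentials.EndDateTime) |
                            Where-Object { $_ } | Sort-Object | Select-Object -First 1) }} |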
  Sort-Object NextExpiry | Format-Table -AutoSize

Now find the genuinely dangerous ones - principals granted high-impact application Graph permissions (not delegated). This is the query that surfaces your blast radius:

$graph = Get-MgServicePrincipal -Filter "appId eq '00000003-0000-0000-c000-000000000000'"
$dangerous = @('Directory.ReadWrite.All','RoleManagement.ReadWrite.Directory',
               'AppRoleAssignment.ReadWrite.All','Application.ReadWrite.All',
               'Mail.ReadWrite','Mail.Read','User.ReadWrite.All')

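# AppRoleAssignedTo lists the principals that were granted app roles ON Microsoft Graph
# (AppRoleAssignment would list roles granted TO the Graph service principal instead)
Get-MgServicePrincipalAppRoleAssignedTo -ServicePrincipalId $graph.Id -All |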
  Where-Object { $_.AppRoleId } | ForEach-Object {
    $roleName = ($graph.AppRoles | Where-Object Id -eq $_.AppRoleId).Value
    if ($roleName -in $dangerous) {
      [pscustomobject]@{ App = $_.PrincipalDisplayName; Permission = $roleName }
    }
  } | Sort-Object App | Format-Table -AutoSize

AppRoleAssignment.ReadWrite.All and RoleManagement.ReadWrite.Directory are effectively tenant takeover permissions - an app holding either can grant itself any other permission or assign itself Global Administrator. Treat them as Tier 0.
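To make the Tier 0 triage concrete, here is a minimal offline sketch in Python. It assumes you exported the (App, Permission) rows from the query above to a list of dictionaries; the sample rows and the function name are made up for illustration, and only the two Tier 0 permission names come from this article.

```python
# Only these two names come from the text above; extend the set to match your own tiering.
TIER0 = {"AppRoleAssignment.ReadWrite.All", "RoleManagement.ReadWrite.Directory"}


def tier0_apps(rows):
    """Return {app: sorted Tier 0 permissions} for every app holding at least one."""
    found = {}
    for row in rows:
        if row["Permission"] in TIER0:
            found.setdefault(row["App"], set()).add(row["Permission"])
    return {app: sorted(perms) for app, perms in found.items()}


# Hypothetical export in the shape the inventory query emits.
sample = [
    {"App": "ci-deployer", "Permission": "Application.ReadWrite.All"},
    {"App": "legacy-sync", "Permission": "RoleManagement.ReadWrite.Directory"},
    {"App": "legacy-sync", "Permission": "AppRoleAssignment.ReadWrite.All"},
    {"App": "mail-archiver", "Permission": "Mail.Read"},
]

print(tier0_apps(sample))
```

Anything this returns belongs on the short list you review by hand first.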

Finally, flag the orphans and stragglers: apps with no owner, apps that have not signed in for 90+ days, and apps with secrets but no certificate. Service principal sign-ins live in the sign-in logs under the service principal sign-in category (MicrosoftGraphActivityLogs records the Graph API calls made after sign-in, not the sign-ins themselves). Pull Get-MgBetaAuditLogSignIn -Filter "signInEventTypes/any(t: t eq 'servicePrincipal')" (Workload ID Premium) to see who is actually active; the signInEventTypes property is exposed on the Graph beta endpoint, so use the beta cmdlet.
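The three flags above are simple enough to sketch offline. The Python below assumes a hypothetical inventory record per app (owners, last sign-in time, secret and certificate counts) that you assembled from your own exports; the field names and flag labels are illustrative, not a Graph schema.

```python
from datetime import datetime, timedelta, timezone


def hygiene_flags(app, now, stale_after_days=90):
    """Return the hygiene flags that apply to one inventory record."""
    flags = []
    if not app["owners"]:
        flags.append("no-owner")
    last = app["last_sign_in"]  # datetime, or None if no sign-in was ever recorded
    if last is None or now - last >= timedelta(days=stale_after_days):
        flags.append("stale")
    if app["secrets"] > 0 and app["certs"] == 0:
        flags.append("secret-only")
    return flags


# Fixed reference time so the example is reproducible.
now = datetime(2025, 1, 1, tzinfo=timezone.utc)

orphan = {"owners": [], "last_sign_in": datetime(2024, 6, 1, tzinfo=timezone.utc),
          "secrets": 1, "certs": 0}
healthy = {"owners": ["platform-team"], "last_sign_in": datetime(2024, 12, 20, tzinfo=timezone.utc),
           "secrets": 0, "certs": 1}

print(hygiene_flags(orphan, now))
print(hygiene_flags(healthy, now))
```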

3. Conditional Access for workload identities (IP location restrictions)

Human CA policies do not apply to service principals. There is a separate CA target: workload identities. The single most valuable control here is a location condition - the vast majority of your service principals only ever authenticate from a fixed set of egress IPs (your CI runners, your automation subnet, a partner’s published ranges). Lock them to those IPs and a stolen secret used from an attacker’s infrastructure is simply blocked.

First define a named location for your legitimate egress:

$ip = New-MgIdentityConditionalAccessNamedLocation -BodyParameter @{
  "@odata.type" = "#microsoft.graph.ipNamedLocation"
  displayName   = "Corp egress + CI runners"
  isTrusted     = $true
  ipRanges      = @(
    @{ "@odata.type" = "#microsoft.graph.iPv4CidrRange"; cidrAddress = "203.0.113.0/24" }
    @{ "@odata.type" = "#microsoft.graph.iPv4CidrRange"; cidrAddress = "198.51.100.16/28" }
  )
}

Then create a CA policy that targets service principals (not users) and blocks sign-ins from anywhere except that location. Start in enabledForReportingButNotEnforced and read the report-only results before you flip to enabled:

$params = @{
  displayName = "WL - Block SP sign-in outside corp egress"
  state       = "enabledForReportingButNotEnforced"   # report-only first
  conditions  = @{
    clientApplications = @{
      includeServicePrincipals = @("ServicePrincipalsInMyTenant")
      # excludeServicePrincipals = @("<break-glass-automation-app-id>")
    }
    applications = @{ includeApplications = @("All") }
    locations    = @{
      includeLocations = @("All")
      excludeLocations = @($ip.Id)        # everything except our egress
    }
  }
  grantControls = @{ operator = "OR"; builtInControls = @("block") }
}
New-MgIdentityConditionalAccessPolicy -BodyParameter $params
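To reason about what the policy will do before you read the report-only results, the include-all / exclude-named-location logic can be sketched in a few lines of Python. This is only an illustration of the decision, not how Entra evaluates it internally; the CIDRs are the documentation ranges from the named location above and the function name is made up.

```python
import ipaddress

# Same ranges as the "Corp egress + CI runners" named location above.
ALLOWED = [ipaddress.ip_network(c) for c in ("203.0.113.0/24", "198.51.100.16/28")]


def location_lock_decision(source_ip):
    """Return 'block' unless the source IP falls inside an excluded (trusted) range."""
    addr = ipaddress.ip_address(source_ip)
    in_named_location = any(addr in net for net in ALLOWED)
    return "allow" if in_named_location else "block"


for ip in ("203.0.113.77", "198.51.100.20", "198.51.100.40", "192.0.2.10"):
    print(ip, location_lock_decision(ip))
```

Note the /28: 198.51.100.20 is inside 198.51.100.16-31, while 198.51.100.40 is not. Running your real egress list through a check like this is a cheap way to catch a mistyped prefix length before enforcement.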

Caveats that matter in production:

4. Detecting compromised service principals with workload identity risk signals

Entra Identity Protection extends to workload identities and produces service-principal risk detections that are exactly the signals an attacker trips:

| Detection | What it means |
| --- | --- |
| Leaked credentials | The app's secret/cert was found in a public leak (paste sites, repos). |
| Anomalous service principal activity | Behavior deviates from the app's learned baseline (new resources, unusual Graph calls). |
| Suspicious sign-ins | Sign-in patterns inconsistent with the principal's normal behavior. |
| Admin confirmed SP compromised | An analyst manually flagged it - drives automation downstream. |

Surface risky workload identities via Graph (IdentityRiskyServicePrincipal.Read.All):

Get-MgRiskyServicePrincipal -All |
  Where-Object { $_.RiskLevel -in @('high','medium') -and $_.RiskState -ne 'dismissed' } |
  Select-Object DisplayName, AppId, RiskLevel, RiskState, RiskLastUpdatedDateTime |
  Format-Table -AutoSize

Then close the loop with a risk-based Conditional Access policy for workload identities that blocks any service principal at elevated risk. This is the automated containment that runs at 3 a.m. without you:

$risk = @{
  displayName = "WL - Block risky service principals"
  state       = "enabled"
  conditions  = @{
    clientApplications     = @{ includeServicePrincipals = @("ServicePrincipalsInMyTenant") }
    applications           = @{ includeApplications = @("All") }
    servicePrincipalRiskLevels = @("high","medium")
  }
  grantControls = @{ operator = "OR"; builtInControls = @("block") }
}
New-MgIdentityConditionalAccessPolicy -BodyParameter $risk

For investigation and SIEM correlation, the AADServicePrincipalRiskEvents and AADServicePrincipalSignInLogs tables in Log Analytics / Sentinel are where you hunt. A useful KQL starting point:

AADServicePrincipalSignInLogs
| where TimeGenerated > ago(7d)
| where ResultType == "0"                     // successful sign-ins only (ResultType is a string column)
| summarize SignIns = count(), IPs = make_set(IPAddress, 50) by ServicePrincipalName, AppId
| extend DistinctIPs = array_length(IPs)
| where DistinctIPs > 5                        // apps suddenly auth'ing from many IPs
| sort by DistinctIPs desc
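If you want to tune the threshold against an exported sample before deploying the query, the same aggregation is easy to reproduce offline. The Python below is a rough equivalent under the assumption that each exported row is a dictionary with the column names used in the KQL; the sample data is invented.

```python
from collections import defaultdict


def apps_with_many_ips(rows, threshold=5):
    """Count distinct source IPs per service principal over successful sign-ins only."""
    ips = defaultdict(set)
    for r in rows:
        if str(r["ResultType"]) == "0":
            ips[(r["ServicePrincipalName"], r["AppId"])].add(r["IPAddress"])
    hits = [(name, app_id, len(s)) for (name, app_id), s in ips.items() if len(s) > threshold]
    return sorted(hits, key=lambda h: h[2], reverse=True)


def _rows(name, app_id, prefix, count, result="0"):
    # Helper to build made-up sign-in rows for the example.
    return [{"ServicePrincipalName": name, "AppId": app_id,
             "IPAddress": f"{prefix}.{i}", "ResultType": result} for i in range(count)]


sample = (
    _rows("ci-deployer", "app-1", "203.0.113", 6)                   # 6 distinct IPs, all successful
    + _rows("partner-sync", "app-2", "198.51.100", 2)               # below threshold
    + _rows("noisy-failures", "app-3", "192.0.2", 7, "7000215")     # failures are ignored
)

print(apps_with_many_ips(sample))
```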

5. Going secretless: federated credentials and managed identities

The most durable fix is to delete the secret. Two mechanisms cover almost every workload:

Managed identities for anything running in Azure (VMs, App Service, Functions, AKS, Container Apps, Automation). Azure manages the credential lifecycle entirely - there is no secret you can leak. Prefer user-assigned so the identity outlives a single resource and is reusable:

az identity create -g rg-platform-prod -n id-app-prod
PRINCIPAL_ID=$(az identity show -g rg-platform-prod -n id-app-prod --query principalId -o tsv)

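SUB_ID=$(az account show --query id -o tsv)   # subscription that holds the vault

# Grant data-plane access, not Owner - e.g. read a Key Vault.
# Passing the object ID and principal type avoids a directory lookup that can fail right after creation.
az role assignment create --assignee-object-id "$PRINCIPAL_ID" \
  --assignee-principal-type ServicePrincipal \
  --role "Key Vault Secrets User" \
  --scope "/subscriptions/$SUB_ID/resourceGroups/rg-platform-prod/providers/Microsoft.KeyVault/vaults/kv-app-prod"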

Federated identity credentials (FIC) for workloads running outside Azure - GitHub Actions, GitLab, other clouds, or Kubernetes via the OIDC issuer. Instead of a secret, the workload presents a short-lived OIDC token whose issuer/subject/audience you pre-registered. Entra exchanges it for an access token only on an exact claim match. For GitHub Actions on the main branch:

az ad app federated-credential create --id "$APP_ID" --parameters '{
  "name": "gha-main",
  "issuer": "https://token.actions.githubusercontent.com",
  "subject": "repo:kloudvin/platform:ref:refs/heads/main",
  "audiences": ["api://AzureADTokenExchange"]
}'

Once federation or a managed identity is in place, delete every password credential on the app so there is nothing left to steal:

$app = Get-MgApplication -Filter "appId eq '$AppId'"
$app.PasswordCredentials | ForEach-Object {
  Remove-MgApplicationPassword -ApplicationId $app.Id -KeyId $_.KeyId
}

The subject in a GitHub FIC is an exact string match - no wildcards. Register one credential per branch/environment you actually deploy from. That tightness is the point: a leaked workflow file cannot move the trust to another branch.
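Because the match is exact, it helps to generate subject strings rather than type them. The sketch below builds the two common GitHub Actions subject shapes (branch ref and deployment environment) and mimics the exact-match check; the helper names and the "production" environment are illustrative assumptions, and the real comparison happens inside Entra, not in your code.

```python
def github_subject(org, repo, *, branch=None, environment=None):
    """Build the OIDC subject GitHub Actions emits for a branch or an environment."""
    if (branch is None) == (environment is None):
        raise ValueError("pass exactly one of branch or environment")
    if branch is not None:
        return f"repo:{org}/{repo}:ref:refs/heads/{branch}"
    return f"repo:{org}/{repo}:environment:{environment}"


def subject_trusted(token_subject, registered_subjects):
    """Exact string match only: no prefix, wildcard, or case-insensitive comparison."""
    return token_subject in registered_subjects


registered = {
    github_subject("kloudvin", "platform", branch="main"),
    github_subject("kloudvin", "platform", environment="production"),  # hypothetical environment
}

print(subject_trusted("repo:kloudvin/platform:ref:refs/heads/main", registered))       # True
print(subject_trusted("repo:kloudvin/platform:ref:refs/heads/feature-x", registered))  # False
```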

6. Right-sizing application permissions and admin consent workflows

Two governance moves cut the permission blast radius:

Prefer delegated over application permissions, and least-privilege scopes. If an app only sends mail as one mailbox, use Application Access Policies in Exchange to constrain Mail.Send to a single mailbox instead of the whole tenant:

New-ApplicationAccessPolicy -AppId $AppId `
  -PolicyScopeGroupId "svc-mailers@contoso.com" `
  -AccessRight RestrictAccess `
  -Description "Limit app to mailboxes in svc-mailers group"

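Restrict ad-hoc user consent and route everything else through an admin consent workflow. Stop users from consenting arbitrary apps into your tenant; force a reviewer. The policy below still lets users consent to low-impact permissions from verified publishers; assign an empty list instead if you want user consent disabled entirely.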

# Restrict user consent to verified publishers + low-impact permissions only
Update-MgPolicyAuthorizationPolicy -BodyParameter @{
  defaultUserRolePermissions = @{
    permissionGrantPoliciesAssigned = @("ManagePermissionGrantsForSelf.microsoft-user-default-low")
  }
}

Then enable the admin consent request workflow so a user who needs a new app generates a request that named reviewers approve - giving you a governed front door instead of silent grants. Configure reviewers and notifications under Entra admin center > Identity > Enterprise applications > Consent and permissions > Admin consent settings, and review the resulting requests as a recurring control.

7. Credential hygiene: expiry alerting, certificate rotation, and orphan cleanup

Even with secretless as the goal, you will have legacy apps holding credentials for a while. Run hygiene as a scheduled job:

# Alert on credentials expiring within 30 days (or already expired)
$threshold = (Get-Date).AddDays(30)
Get-MgApplication -All -Property DisplayName,AppId,PasswordCredentials,KeyCredentials |
  ForEach-Object {
    foreach ($c in @($_.PasswordCredentials + $_.KeyCredentials)) {
      if ($c.EndDateTime -and $c.EndDateTime -lt $threshold) {
        [pscustomobject]@{
          App   = $_.DisplayName; AppId = $_.AppId
          Type  = if ($c.GetType().Name -match 'Key') {'Certificate'} else {'Secret'}
          KeyId = $c.KeyId; Expires = $c.EndDateTime
        }
      }
    }
  } | Sort-Object Expires | Format-Table -AutoSize

Hygiene rules worth enforcing:

New-MgPolicyAppManagementPolicy -BodyParameter @{
  displayName = "No long-lived secrets"
  isEnabled   = $true
  restrictions = @{
    passwordCredentials = @(@{
      restrictionType = "passwordLifetime"
      maxLifetime     = "P90D"        # 90-day ceiling on client secrets
      state           = "enabled"
    })
  }
}
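Before enforcing a lifetime ceiling, it is worth knowing which existing credentials would already violate it. The Python sketch below checks a credential's start/end dates against an ISO 8601 day duration such as the P90D above. It deliberately handles only the PnD form, and the function names are made up for illustration.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_day_duration(iso):
    """Parse a day-only ISO 8601 duration like 'P90D' into a timedelta."""
    match = re.fullmatch(r"P(\d+)D", iso)
    if not match:
        raise ValueError(f"unsupported duration: {iso!r}")
    return timedelta(days=int(match.group(1)))


def violates_max_lifetime(start, end, max_lifetime="P90D"):
    """True when the credential's validity window is longer than the ceiling."""
    return end - start > parse_day_duration(max_lifetime)


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
two_year_secret = datetime(2027, 1, 1, tzinfo=timezone.utc)
short_secret = datetime(2025, 3, 1, tzinfo=timezone.utc)   # 59 days

print(violates_max_lifetime(start, two_year_secret))  # True
print(violates_max_lifetime(start, short_secret))     # False
```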

8. Continuous monitoring and the compromise playbook

Wire the signals into Sentinel and define analytics rules for the high-signal events: new credential added to an app/SP, new app role (permission) granted, app added to a privileged directory role, and service principal risk detection raised. The “credential added” rule catches the classic persistence move:

AuditLogs
| where OperationName has_any ("Add service principal credentials",
                               "Certificates and secrets management")   // term match tolerates dash/whitespace variants in the operation name
| extend Target = tostring(TargetResources[0].displayName)
| extend Actor  = coalesce(tostring(InitiatedBy.user.userPrincipalName), tostring(InitiatedBy.app.displayName))   // user or app initiator
| project TimeGenerated, OperationName, Target, Actor, Result
| sort by TimeGenerated desc

When a principal is compromised, time-to-contain is everything. The playbook:

  1. Confirm and contain. In Identity Protection, mark the SP compromised (drives risk-based CA to block it). Then hard-disable: Update-MgServicePrincipal -ServicePrincipalId $id -AccountEnabled:$false.
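  2. Revoke all credentials and account for live tokens. Remove every PasswordCredentials/KeyCredentials entry on the app. Unlike users, service principals have no sign-in-session revocation call: access tokens that were already issued stay valid until they expire (typically around an hour), so the disable in step 1 plus credential removal is what stops new tokens. Treat that token lifetime as part of the exposure window.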
  3. Hunt for attacker persistence. Diff the app’s credentials, owners, app role assignments, and federated credentials against your last known-good export. Attackers add a second secret or a rogue FIC so revoking the first one does nothing - check Get-MgApplicationFederatedIdentityCredential too.
  4. Assess blast radius. Pull MicrosoftGraphActivityLogs for that AppId to see exactly which Graph resources the principal touched during the window (which mailboxes, which directory objects).
  5. Rebuild clean. Re-create the identity secretless (managed identity or FIC), grant least privilege, and bring it back behind the location + risk CA policies.
  6. Post-incident: add the attacker's source IP ranges to a block list, tighten the FIC subject, and add an analytics rule for whatever technique you missed.
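The persistence hunt in step 3 is essentially a set difference against your last known-good export. A minimal Python sketch, assuming you keep a snapshot per app with the hypothetical field names below (they are not a Graph schema, just one way to store the export):

```python
FIELDS = ("credential_key_ids", "owners", "app_roles", "federated_subjects")


def persistence_diff(known_good, current):
    """Return, per field, the entries present now that were absent from the known-good snapshot."""
    report = {}
    for field in FIELDS:
        added = sorted(set(current.get(field, [])) - set(known_good.get(field, [])))
        if added:
            report[field] = added
    return report


# Invented snapshots: the attacker added a second secret and a rogue federated credential.
known_good = {
    "credential_key_ids": ["key-aaa"],
    "owners": ["platform-team"],
    "app_roles": ["Mail.Read"],
    "federated_subjects": ["repo:kloudvin/platform:ref:refs/heads/main"],
}
current = {
    "credential_key_ids": ["key-aaa", "key-zzz"],
    "owners": ["platform-team"],
    "app_roles": ["Mail.Read"],
    "federated_subjects": ["repo:kloudvin/platform:ref:refs/heads/main",
                           "repo:attacker/fork:ref:refs/heads/main"],
}

print(persistence_diff(known_good, current))
```

Anything in the report that you cannot tie to a change ticket should be treated as attacker-added until proven otherwise.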

Verify

Confirm the controls are real, not just present:

# 1. Workload-identity CA policies exist and target service principals
Get-MgIdentityConditionalAccessPolicy -All |
  Where-Object { $_.Conditions.ClientApplications.IncludeServicePrincipals } |
  Select-Object DisplayName, State

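# 2. Risk-based blocking is live for workloads (policy exists and State is enabled)
Get-MgIdentityConditionalAccessPolicy -All |
  Where-Object { $_.Conditions.ServicePrincipalRiskLevels } |
  Select-Object DisplayName, State
Get-MgRiskyServicePrincipal -All | Measure-Object   # baseline count of risky SPs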

# 3. No app holds Tier-0 Graph permissions it should not (re-run the section-2 query)
# 4. App management policy caps secret lifetime
Get-MgPolicyAppManagementPolicy | Select-Object DisplayName, IsEnabled

Enterprise scenario

A global logistics platform team ran ~340 service principals across three regions. During a tabletop exercise, their red team demonstrated the gap precisely: they planted a CI deployer’s client secret (scraped from an old pipeline log) on a VPS in another country and authenticated to Microsoft Graph successfully - the app had Mail.Read (application) and a secret valid for another 14 months. Nothing alerted, because no user and no MFA were involved.

The constraint was that they could not go fully secretless overnight - a dozen partner-operated apps and on-prem schedulers genuinely needed credentials for another two quarters. So they sequenced it. First, workload-identity CA with a location lock: every in-tenant service principal was restricted to a named location containing only their three NAT egress CIDRs plus their GitHub runner ranges, rolled out in report-only for two weeks to catch surprises (they found four shadow automations running from a developer’s home IP, which became their first cleanup). The same VPS replay attack, re-run after enforcement, was blocked at the token endpoint:

# The location-locked block policy that stopped the replay
$params = @{
  displayName   = "WL - SP egress lock"
  state         = "enabled"
  conditions    = @{
    clientApplications = @{ includeServicePrincipals = @("ServicePrincipalsInMyTenant")
                            excludeServicePrincipals = @("<breakglass-automation-appid>") }
    applications       = @{ includeApplications = @("All") }
    locations          = @{ includeLocations = @("All"); excludeLocations = @($corpEgressLocationId) }
  }
  grantControls = @{ operator = "OR"; builtInControls = @("block") }
}
New-MgIdentityConditionalAccessPolicy -BodyParameter $params

In parallel they migrated everything Azure-resident to user-assigned managed identities, moved GitHub deployers to federated credentials, and capped secret lifetime at 90 days via an app management policy so the long-lived-secret problem could not regrow. Six months later the credential count was down from ~340 secrets to 11 (all partner apps on a tracked retirement plan), and “leaked credentials” risk detections - which had previously been unmonitored - were wired into a Sentinel playbook that auto-disabled the principal and paged the on-call. The single highest-leverage move, in their post-mortem, was not the secretless migration; it was the location lock, because it neutralized stolen secrets on day one while the slower migration caught up.

Checklist
