Operationalizing Entra ID Protection: Risk-Based Conditional Access, Detection Tuning, and Risk Investigation

Turning on the two default risk policies in Microsoft Entra ID Protection takes about ninety seconds. Running the program so it actually reduces account-takeover risk — without burying the helpdesk in password-reset tickets, without training your SOC to ignore the risky-users blade, and without a single false-positive lockout of the CFO at 6 a.m. — is the part nobody documents. Identity Protection is Microsoft’s machine-learning risk engine for identities: it scores every authentication (sign-in risk) and every identity (user risk) using trillions of daily signals from Entra ID, Microsoft Accounts, Xbox, and the Microsoft security graph, then hands those scores to you as conditions you can enforce with Conditional Access. Used well, it is the closest thing to an autopilot for account-takeover defence. Used badly, it is a random password-reset generator with a dashboard nobody reads.

This is the build I use in production tenants. It covers, in order: the risk model you have to internalize first (two scores, two firing modes, three levels, six states); the full detection catalog (anonymous IP, atypical travel, unfamiliar sign-in properties, token anomalies, leaked credentials, password spray, attacker-in-the-middle or AiTM, and the rest) with the timing and noise profile of each; the two risk-based Conditional Access policies that do the enforcement, and exactly why they beat the legacy Identity Protection toggles; the self-remediation plumbing that keeps humans out of the loop; a repeatable investigation workflow across the Risky users, Risky sign-ins, and Risk detections blades; the tuning that kills false positives at the source; and the Graph + Sentinel wiring that turns the whole thing into an operated service with KPIs.

By the end you will stop treating risk detections as alerts to acknowledge and start treating them as labelled training data flowing through a control loop: detect, enforce, remediate, label, tune. Every section carries the real Graph calls, az CLI, Terraform, and KQL — because the portal is for investigating, and everything else should be code.

What problem this solves

Credential attacks are the dominant initial-access vector against cloud identities, and they do not look like Hollywood hacking. They look like a password spray that tries three common passwords against 40,000 accounts over two weeks, a leaked credential pair bought from an infostealer log marketplace, or an AiTM phishing proxy that relays a victim’s real MFA and steals the session cookie. Static defences — password policy, blanket MFA — blunt these but do not adapt: blanket MFA prompts everyone equally (so users learn to approve reflexively), and no static policy notices that this specific sign-in came from a Tor exit node with a token minted in the wrong place.

Without a risk-based layer, three failure modes show up in production. First, compromise dwell time: a leaked credential sits usable for weeks because nothing forces a reset until an admin stumbles over it. Second, MFA fatigue: because every sign-in is challenged identically, users approve prompts they should question, and attackers exploit exactly that. Third, SOC noise blindness: teams that do enable Identity Protection but never tune it get a wall of medium-risk atypical-travel detections from their own VPN egress, dismiss them in bulk, and thereby teach both the humans and the model to ignore the one real detection in the pile.

Who hits this: every tenant with Entra ID P2 that has ticked the “enable risk policies” box without building the operating model around it — which, in my experience of inheriting tenants, is most of them. The symptoms are recognizable on sight: a risky-users blade with 400 stale entries at “at risk” going back months, a user-risk policy scoped to Medium that generates daily forced resets, no trusted named locations, no Sentinel export, and a helpdesk that has learned to bypass the whole system by resetting passwords manually (which, done without MFA-verified identity proofing, is itself the vulnerability). This article is the difference between owning a risk engine and being owned by one.

Without a tuned Identity Protection program	With the program in this article
Leaked credentials usable until someone notices	High user risk forces a secure password change on next sign-in
Blanket MFA prompts train reflexive approval	MFA fires on risky sign-ins; quiet sign-ins stay quiet
Atypical-travel noise from VPN egress buries real detections	Trusted named locations suppress the false-positive class at source
Risky-users blade is a graveyard of stale “at risk” entries	Every entry is remediated, dismissed, or confirmed within SLA
Helpdesk resets passwords on request (social-engineerable)	Self-remediation via MFA-gated password change; no human in loop
Detections live 30–90 days in the portal, then vanish	Streamed to Sentinel/Log Analytics for long retention and hunting
“Did the policy work?” is unanswerable	Report-only data, KPIs, and KQL prove coverage and gaps

Learning objectives

By the end of this article you can:

Explain the sign-in risk vs user risk split, the real-time vs offline detection axis, and why a leaked-credentials detection can never block a sign-in that has already used the leaked password.
Enumerate every documented risk detection — from anonymous IP to AiTM to suspicious API traffic — with its riskEventType value, firing mode, risk type, and typical false-positive source.
Build the two production risk policies as Conditional Access policies (sign-in risk Medium+ → MFA; user risk High → secure password change) in report-only first, via Graph and Terraform, and explain why they beat the legacy Identity Protection toggles.
Stand up the self-remediation path — MFA registration, SSPR, password writeback or on-premises password reset — so risky users clear themselves without a ticket.
Run a repeatable investigation loop across the Risky users, Risky sign-ins, and Risk detections blades, read a user’s risk history, and choose correctly between dismiss, confirm safe, and confirm compromised.
Tune false positives structurally: trusted named locations for corporate and VPN egress, threshold selection, and dismissal hygiene that feeds the model true labels.
Export and automate with Microsoft Graph (riskyUsers, riskDetections, confirmCompromised, dismiss) and hunt in Microsoft Sentinel (AADUserRiskEvents, AADRiskyUsers, SigninLogs) with real KQL.
Run the program on KPIs — self-remediation rate, false-positive rate by detection type, MTTR, high-risk-success count — and defend the P2 licensing spend.

Prerequisites & where this fits

You should already be fluent in Conditional Access mechanics — assignments, conditions, grant vs session controls, report-only mode — at the level of Designing Conditional Access at Scale: A Persona-Based Policy Framework with Authentication Context and Filters, because risk conditions are just two more inputs to that engine. You need working knowledge of Entra authentication methods (MFA registration, SSPR) and, for hybrid tenants, how Microsoft Entra Connect Sync moves password hashes and writes passwords back — two mechanics that gate what Identity Protection can detect and remediate. Basic KQL and Graph API familiarity are assumed for the automation sections.

Licensing first, because it decides scope: Identity Protection’s risk-based policies require Microsoft Entra ID P2 (standalone, or via Microsoft 365 E5 / EMS E5 / Microsoft 365 F5 Security) for every user in scope of a risk-based policy. P1 gets Conditional Access but not risk conditions, and only a redacted view of the reports. Workload identity risk (risky service principals) needs Microsoft Entra Workload ID Premium on top. Role-wise you need Conditional Access Administrator to build policies, and Security Operator or higher to work the risk blades (Security Reader can view but not remediate; Global Reader sees reports only).

Capability	Free	P1	P2
Risk-based Conditional Access (sign-in/user risk conditions)	No	No	Yes
Risky users report	Limited (no detail/history)	Limited (no detail/history)	Full access
Risky sign-ins report	Limited	Limited	Full access
Risk detections report	No	Limited (no detection detail)	Full access
Detection names visible	“Additional risk detected” for premium detections	“Additional risk detected” for premium detections	Full detection names
Users-at-risk email + weekly digest	No	No	Yes
MFA registration policy (Identity Protection)	No	No	Yes
Workload identity risk (service principals)	—	—	Requires Workload ID Premium add-on

This article sits at the centre of the identity-security track: upstream of it are Conditional Access design and break-glass account engineering (which you must finish first — a misfiring risk policy with no excluded emergency account is a self-inflicted tenant lockout); downstream are the SOC integrations, KQL threat hunting, and the phishing-resistant endgame in FIDO2 passwordless rollout, which removes the password attacks Identity Protection spends most of its time detecting.

Core concepts

Five ideas carry the whole system. Internalize them and every design decision later becomes obvious.

Two scores, two questions. Sign-in risk answers “what is the probability that this specific authentication request was not performed by the account owner?” It is computed per sign-in, from properties of that sign-in: source IP reputation and anonymity, geo-velocity against the user’s history, device and browser familiarity, token characteristics. User risk answers “what is the probability that this identity is compromised?” It accumulates across sign-ins and non-sign-in signals — the flagship being leaked credentials, where Microsoft found the user’s actual username/password pair in a breach corpus, paste site, or law-enforcement feed. A risky sign-in usually contributes to user risk, but user risk can rise with no risky sign-in at all (the leak happened elsewhere), and a single risky sign-in does not necessarily make the user risky. Two scores, two policies, two different remediations.

Real-time vs offline is an architectural constraint, not a detail. Real-time detections (anonymous IP, unfamiliar sign-in properties, some token anomalies, verified threat-actor IP) are evaluated during the authentication, so a sign-in-risk Conditional Access policy can challenge or block that very request, though even “real-time” detections can take 5–10 minutes to appear in the reports. Offline detections (leaked credentials, atypical travel, password spray, AiTM, everything sourced from Defender for Cloud Apps) are computed after the fact, typically minutes to hours later, and Microsoft documents that they can take up to 48 hours to surface in the reports, so they can only raise risk that affects the next policy evaluation. This is the single most misunderstood fact in incident reviews: a leaked-credentials detection cannot block the sign-in that used the leaked password, because the detection is raised when Microsoft processes the leak, not when the password is presented, and any sign-in that happened before that point has already been issued its tokens. Your compensating control is the user-risk policy at the next authentication, plus Continuous Access Evaluation cutting live sessions when user risk goes high.
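A minimal Python sketch of that constraint, assuming only the timing strings Graph exposes on a detection (detectionTimingType). The helper name and the conservative treatment of anything that is not realtime are illustrative assumptions, not Microsoft's evaluation logic:

```python
# Sketch: which policy evaluation can a detection influence?
# The timing strings follow the Graph riskDetection.detectionTimingType field;
# the helper itself is hypothetical.

def earliest_enforcement(detection_timing_type: str) -> str:
    """Name the earliest policy evaluation a detection can influence."""
    if detection_timing_type.lower() == "realtime":
        # Scored while the request is in flight, so CA can challenge or block it.
        return "this sign-in"
    # Assumption: anything else is treated as arriving after tokens were issued.
    return "next evaluation"


for timing in ("realtime", "offline"):
    print(f"{timing:>8} -> {earliest_enforcement(timing)}")
```

The point of writing it down is the second branch: for offline detections the only honest answer is the next evaluation, which is why the user-risk policy and session revocation carry the load.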

Levels are thresholds, not explanations. Every detection and both aggregate scores resolve to Low, Medium, or High (plus No risk, and Hidden where the licence redacts detail). Microsoft deliberately does not publish the scoring weights, and they change as the model retrains — so you tune against observed outcomes (false-positive rate per detection type), never against assumed math. Policy design maps levels to controls: annoyance-tolerant controls (MFA) at Medium+, destructive controls (password change, block) at High only.

Risk is a state machine, and you are one of its inputs. A risky user or sign-in carries a risk state that changes through remediation and admin action. The states and who moves them:

`riskState`	Meaning	Set by	Effect on risk level
`none`	No risk ever detected	System	—
`atRisk`	Active risk, nothing has cleared it	System (detection)	Low/Medium/High as scored
`remediated`	User or admin performed a remediation (secure password change/reset, risk-policy MFA)	System, after the action	Drops to none
`dismissed`	Admin declared the risk benign	Admin (Dismiss user risk)	Drops to none
`confirmedSafe`	Admin labelled a specific sign-in legitimate	Admin (Confirm sign-in safe)	That sign-in’s risk cleared; feeds model a false-positive label
`confirmedCompromised`	Admin labelled the user/sign-in as true compromise	Admin (Confirm compromised)	User risk forced to High; feeds model a true-positive label

Alongside the state, riskDetail records why the state changed — and reading it is how you audit the program. The values you will actually see:

`riskDetail` value	What happened
`userPerformedSecuredPasswordChange`	User changed password after MFA (risk policy flow) — self-remediated
`userPerformedSecuredPasswordReset`	User reset password via SSPR with MFA — self-remediated
`userPassedMFADrivenByRiskBasedPolicy`	User satisfied MFA prompted by the sign-in-risk policy — sign-in risk remediated
`adminGeneratedTemporaryPassword`	Admin reset the password — remediated (weaker: attacker with session may persist)
`adminConfirmedSigninSafe`	Admin confirmed the sign-in safe (false-positive label)
`adminDismissedAllRiskForUser`	Admin dismissed the user’s risk
`adminConfirmedSigninCompromised` / `adminConfirmedUserCompromised`	Admin confirmed compromise (true-positive label)
`aiConfirmedSigninSafe`	The model itself reclassified the sign-in as safe
`hidden`	Detail redacted — tenant lacks P2 for this view
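Because riskDetail records who closed each risk, it can be turned directly into a self-remediation rate. The following is a sketch under stated assumptions: the input is a plain list of riskDetail strings you have already exported, the grouping into "self" and "admin" buckets is my own reading of the table above, and values outside those buckets (none, hidden, aiConfirmedSigninSafe) are simply ignored.

```python
# Sketch: share of closed risks that users cleared themselves.
# The bucket membership is an assumption based on the riskDetail table above.

SELF_REMEDIATED = {
    "userPerformedSecuredPasswordChange",
    "userPerformedSecuredPasswordReset",
    "userPassedMFADrivenByRiskBasedPolicy",
}
ADMIN_ACTION = {
    "adminGeneratedTemporaryPassword",
    "adminDismissedAllRiskForUser",
    "adminConfirmedSigninSafe",
    "adminConfirmedSigninCompromised",
    "adminConfirmedUserCompromised",
}


def self_remediation_rate(risk_details):
    """Fraction of human-closed risks that were closed by the user."""
    closed = [d for d in risk_details if d in SELF_REMEDIATED or d in ADMIN_ACTION]
    if not closed:
        return 0.0
    return sum(d in SELF_REMEDIATED for d in closed) / len(closed)


sample = [
    "userPerformedSecuredPasswordChange",
    "userPassedMFADrivenByRiskBasedPolicy",
    "userPerformedSecuredPasswordReset",
    "adminGeneratedTemporaryPassword",
    "hidden",  # ignored: not a user or admin closure
]
print(f"self-remediation rate: {self_remediation_rate(sample):.0%}")
```

A falling rate usually means users lack a registered remediation path, not that the detections got worse.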

Enforcement is Conditional Access; Identity Protection is the sensor. Identity Protection produces signInRiskLevels and userRiskLevels; Conditional Access consumes them as conditions and applies grant/session controls. The legacy toggles inside the Identity Protection blade bundled both jobs and are deprecated (more below). Keep the mental model clean: detections → risk scores → CA conditions → controls → remediation → state change → (your labels) → model. That loop is the whole product.

One more moving part deserves a first-class mention: Continuous Access Evaluation (CAE). Classic OAuth enforcement waits for the access token to expire (default ~60–90 minutes) before Conditional Access re-evaluates. CAE-capable services (Exchange Online, SharePoint Online, Teams, Microsoft Graph) subscribe to critical events — user disabled or deleted, password changed or reset, refresh tokens revoked, MFA enabled, and elevation to high user risk — and revoke in near-real-time. In a tenant with CAE (on by default for eligible clients), confirming a user compromised or a High user-risk detection doesn’t just gate the next sign-in; it can cut live sessions to CAE-capable workloads within minutes. Non-CAE apps still ride out their token lifetime — which is why token revocation is a mandatory containment step, not an optional one.

Concept	One-line definition	Where you see it
Sign-in risk	Probability this authentication wasn’t the owner	`RiskLevelDuringSignIn` in SigninLogs; CA condition `signInRiskLevels`
User risk	Probability the identity is compromised	Risky users blade; CA condition `userRiskLevels`
Real-time detection	Scored during the authentication; can gate that sign-in	`DetectionTimingType == "realtime"`
Offline detection	Scored after the fact; affects next evaluation	`DetectionTimingType == "offline"`
Risk level	Low / Medium / High bucket per detection and aggregate	Policy thresholds
Risk state	Lifecycle: atRisk → remediated/dismissed/confirmed*	`riskState` on users, sign-ins, detections
Risk detail	Why the state last changed	`riskDetail`
Self-remediation	User clears own risk via MFA / secure password change	`userPerformed*` riskDetail values
CAE	Near-real-time token revocation on critical events incl. high user risk	Session cut mid-lifetime
Risk-based CA	CA policies using risk levels as conditions	The enforcement layer

The detection catalog: every risk detection and when it fires

Design decisions live or die on knowing what each detection actually detects, when it fires relative to the sign-in, and what it false-positives on. The riskEventType values below are the exact strings you filter on in Graph (identityProtection/riskDetections) and in the Sentinel table AADUserRiskEvents.
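As a small illustration of filtering on those strings, the sketch below only builds the Graph request URL for one riskEventType. It makes no network call, and the helper name is hypothetical; how you authenticate and send the request is left to whatever Graph client you already use.

```python
# Sketch: build a filtered riskDetections URL for a given riskEventType.
from urllib.parse import quote

RISK_DETECTIONS_URL = "https://graph.microsoft.com/v1.0/identityProtection/riskDetections"


def risk_detections_url(risk_event_type: str) -> str:
    """Return the riskDetections URL with an OData $filter on riskEventType."""
    odata_filter = f"riskEventType eq '{risk_event_type}'"
    # quote() percent-encodes the spaces and single quotes in the filter.
    return f"{RISK_DETECTIONS_URL}?$filter={quote(odata_filter)}"


print(risk_detections_url("unlikelyTravel"))
```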

Premium sign-in risk detections

Detection	`riskEventType`	Timing	What actually triggers it	Classic false-positive source
Anonymous IP address	`anonymizedIPAddress`	Real-time	Sign-in from a Tor exit node or anonymizing VPN/proxy service	Privacy-conscious users on consumer VPNs (Mullvad, Proton)
Unfamiliar sign-in properties	`unfamiliarFeatures`	Real-time	Sign-in deviates from the user’s learned history: IP, ASN, geo, device, browser, tenant IP subnet familiarity	New joiners (no history yet — ~first weeks are noisy), travel, new device rollouts
Verified threat actor IP	`nationStateIP`	Real-time	Source IP attributed by Microsoft Threat Intelligence to a named state-sponsored or criminal actor	Very rare — treat as high fidelity
Anomalous token	`anomalousToken`	Real-time or offline	Token with abnormal characteristics — unusual lifetime, replay from a different place than issued (token theft indicator)	Some legitimate fat-client/token-caching patterns; low volume
Malicious IP address	`maliciousIPAddress`	Offline	IP with high failed-sign-in reputation / known bad infrastructure across Microsoft’s graph	Shared egress previously abused (carrier-grade NAT, hosting ranges)
Atypical travel	`unlikelyTravel`	Offline	Two sign-ins whose geo distance vs time gap implies impossible physical travel, judged against the user’s pattern	The classic: split-tunnel VPN, cloud egress in another region, mobile vs corporate network flapping
Password spray	`passwordSpray`	Offline	The account was part of a detected spray campaign (many accounts, few passwords) — fires when the pattern is identified	Genuine broad lockout events (bad SSO config) can resemble spray
Attacker in the middle (AiTM)	`attackerinTheMiddle`	Offline	Microsoft 365 Defender / MSTIC identified the sign-in as proxied through an AiTM phishing kit (session-cookie theft)	Very high fidelity — treat every hit as real until proven otherwise
Suspicious browser	`suspiciousBrowser`	Offline	The same browser fingerprint used for risky sign-ins across multiple users/tenants	Kiosk/shared-lab browsers
Token issuer anomaly	`tokenIssuerAnomaly`	Offline	SAML token from your federation trust shows signs of forgery/manipulation (Golden-SAML-style)	Misconfigured third-party IdP token pipelines
Microsoft Entra threat intelligence (sign-in)	`investigationsThreatIntelligence`	Real-time or offline	Sign-in matches a known attack pattern from Microsoft’s internal/external intel sources	Low — high fidelity by design
Additional risk detected	`generic`	Real-time or offline	Placeholder shown when the tenant lacks the licence to see the real detection name	— (fix your licence view, not the detection)

Three of these deserve a longer look because they change how you respond:

Anomalous token and token issuer anomaly are your token-theft tripwires. Post-MFA attacks steal what MFA produced — the token or session cookie — rather than the password. anomalousToken firing on a sign-in that passed MFA is not reassuring, it is the signature of replay: the token was minted for one context and is being spent in another. Response is session revocation, not password reset alone — the stolen artifact is the token, and only revocation invalidates it.

AiTM is the MFA bypass you must plan for. An AiTM kit (Evilginx-class, or phishing-as-a-service like the campaigns Microsoft tracks) proxies the real login page; the victim types real credentials and completes real MFA, and the attacker keeps the resulting session cookie. The attackerinTheMiddle detection is offline — the session is already live when it fires — so the response runbook is: revoke sessions immediately, reset credentials, hunt the mailbox for persistence (inbox rules), and check what the session touched. The preventive control is phishing-resistant authentication (FIDO2/passkeys/Windows Hello) plus token protection, because AiTM cannot relay an origin-bound credential.

Password spray tells you about the attack, not just the account. One spray detection usually means the campaign touched many of your accounts; pivot from the single detection to the campaign (KQL later) rather than treating it as one user’s problem. Spray detections on accounts where the attacker authenticated successfully are priority-one incidents (incident severity, not the P1 licence).
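The pivot from one detection to the campaign can be sketched in Python. This assumes you already hold exported detections as dictionaries carrying the Graph riskDetection properties riskEventType, ipAddress and userPrincipalName. Grouping by source IP is a deliberate simplification (real campaigns rotate addresses), and the function name and sample data are illustrative.

```python
# Sketch: group password-spray detections by source IP to see the campaign.
from collections import defaultdict


def spray_campaigns(detections, min_accounts=2):
    """Map each source IP to the accounts it sprayed, if it hit enough of them."""
    accounts_by_ip = defaultdict(set)
    for d in detections:
        if d.get("riskEventType") == "passwordSpray":
            ip = d.get("ipAddress") or "unknown"
            accounts_by_ip[ip].add(d["userPrincipalName"])
    return {
        ip: sorted(users)
        for ip, users in accounts_by_ip.items()
        if len(users) >= min_accounts
    }


detections = [
    {"riskEventType": "passwordSpray", "ipAddress": "203.0.113.7", "userPrincipalName": "ana@example.com"},
    {"riskEventType": "passwordSpray", "ipAddress": "203.0.113.7", "userPrincipalName": "raj@example.com"},
    {"riskEventType": "passwordSpray", "ipAddress": "198.51.100.9", "userPrincipalName": "lee@example.com"},
    {"riskEventType": "unlikelyTravel", "ipAddress": "203.0.113.7", "userPrincipalName": "kim@example.com"},
]
print(spray_campaigns(detections))
```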

Sign-in risk detections sourced from Defender for Cloud Apps

If Microsoft Defender for Cloud Apps is licensed and its app connectors are on, its anomaly detections feed Identity Protection as sign-in risk. They are all offline and inherit MDCA’s own tuning (its IP ranges and anomaly policy sensitivity — tune there, not just in Entra):

Detection	`riskEventType`	What it indicates
Impossible travel	`mcasImpossibleTravel`	MDCA’s geo-velocity anomaly across app activity, not just sign-in
Activity from anonymous IP address	`riskyIPAddress`	App activity (not just auth) from an anonymizer
New country	`newCountry`	Activity from a location never seen for this org/user
Mass access to sensitive files	`mcasFinSuspiciousFileAccess`	Bulk access pattern against labelled/sensitive content
Suspicious inbox forwarding	`suspiciousInboxForwarding`	Auto-forward rule exfiltrating mail externally
Suspicious inbox manipulation rules	`mcasSuspiciousInboxManipulationRules`	Rules that hide/delete mail — the classic BEC persistence move

The two inbox detections are worth their weight in gold: they fire on post-compromise behaviour, which means they catch account takeovers that sailed past every sign-in-time control. An inbox-rule detection on a user with no risky sign-ins usually means the initial access predates your telemetry or came through a legacy path — investigate wider, not narrower.

User risk detections

Detection	`riskEventType`	Timing	What actually triggers it
Leaked credentials	`leakedCredentials`	Offline	Microsoft matched the user’s current username:password pair in breach corpora, paste sites, dark-web markets, or law-enforcement feeds
Microsoft Entra threat intelligence (user)	`investigationsThreatIntelligence`	Offline	User activity matches known attack patterns per Microsoft intel
Anomalous user activity	`anomalousUserActivity`	Offline	The user’s directory behaviour deviates from their baseline (unusual admin-ish operations)
Possible attempt to access Primary Refresh Token (PRT)	`attemptedPrtAccess`	Offline	Defender for Endpoint signal that something on the device tried to extract the PRT (device-bound token theft)
Suspicious API traffic	`suspiciousAPITraffic`	Offline	Abnormal Graph/AD enumeration volume from the user (reconnaissance signature)
Suspicious sending patterns	`suspiciousSendingPatterns`	Offline	EOP/MDCA judge the mailbox is likely being used for spam/phish outbound
User reported suspicious activity	`userReportedSuspiciousActivity`	Offline	The user pressed “No, it’s not me” / report-fraud on an MFA prompt they didn’t initiate

Two operational notes. Leaked credentials requires password hash sync for hybrid users — the match is computed against the hash Microsoft holds, so a hybrid tenant running pass-through authentication or federation without PHS gets zero leaked-credential detections for synced users. That is a silent, catastrophic coverage gap; PHS-as-backup is worth it for this detection alone. userReportedSuspiciousActivity is your users doing SOC work for you — an MFA-fatigue attack that a user reports lands the account at high risk automatically. Publicize the report button in awareness training; it converts your workforce into sensors.

What non-P2 tenants see

Tenants without P2 (or views by under-licensed admins) get the redacted catalog — worth knowing so you can read a customer’s tenant correctly:

Visible without P2	`riskEventType`	Notes
Additional risk detected (sign-in)	`generic`	Stand-in for every premium sign-in detection
Additional risk detected (user)	`generic`	Stand-in for every premium user detection
Anonymous IP address	`anonymizedIPAddress`	Shown in full to all tiers
Admin confirmed user compromised	`adminConfirmedUserCompromised`	Your own admin action, reflected as a detection
Leaked credentials	`leakedCredentials`	Shown in full to all tiers

Noise profile: where your tuning time will go

Ranked by false-positive volume in a typical corporate tenant; this is where the “Tuning false positives” section earns its keep:

Rank	Detection	Typical FP driver	Structural fix
1	Unfamiliar sign-in properties	New users, device refreshes, egress changes	Trusted named locations; accept the new-joiner burn-in
2	Atypical travel / Impossible travel	Split-tunnel VPN, cloud egress, mobile network flapping	Trusted named locations covering ALL egress; MDCA IP tagging
3	Anonymous IP address	Consumer privacy VPNs	Policy decision: block, or MFA + user education
4	Malicious IP address	Shared/carrier-grade NAT with abusive tenants	Named location for known-good shared egress; verify before dismissing
5	Anomalous token	Token-caching client quirks	Investigate first — dismiss only with device evidence
6+	AiTM, verified threat actor IP, leaked credentials, PRT access	Rare FPs	Treat as true positives by default
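To rank your own tenant rather than trust a typical one, compute the false-positive rate per detection type from the labels admins have applied. The sketch below assumes exported detections as dictionaries with riskEventType and riskDetail. Counting only "confirmed safe" as a false positive and "confirmed compromised" as a true positive is my assumption, and unlabelled or merely dismissed detections are left out of the denominator.

```python
# Sketch: false-positive rate per riskEventType, from labelled detections only.
from collections import defaultdict

FALSE_POSITIVE = {"adminConfirmedSigninSafe", "aiConfirmedSigninSafe"}
TRUE_POSITIVE = {"adminConfirmedSigninCompromised", "adminConfirmedUserCompromised"}


def false_positive_rate_by_type(detections):
    """Return {riskEventType: false positives / labelled detections}."""
    counts = defaultdict(lambda: [0, 0])  # [false positives, labelled total]
    for d in detections:
        detail = d.get("riskDetail")
        if detail in FALSE_POSITIVE or detail in TRUE_POSITIVE:
            tally = counts[d["riskEventType"]]
            tally[0] += detail in FALSE_POSITIVE
            tally[1] += 1
    return {event_type: fp / total for event_type, (fp, total) in counts.items()}


labelled = [
    {"riskEventType": "unlikelyTravel", "riskDetail": "adminConfirmedSigninSafe"},
    {"riskEventType": "unlikelyTravel", "riskDetail": "adminConfirmedSigninSafe"},
    {"riskEventType": "unlikelyTravel", "riskDetail": "aiConfirmedSigninSafe"},
    {"riskEventType": "unlikelyTravel", "riskDetail": "adminConfirmedSigninCompromised"},
    {"riskEventType": "leakedCredentials", "riskDetail": "adminConfirmedUserCompromised"},
    {"riskEventType": "anonymizedIPAddress", "riskDetail": "none"},  # unlabelled, skipped
]
print(false_positive_rate_by_type(labelled))
```

The number is only as good as the labelling discipline behind it, which is one more reason to confirm safe or compromised explicitly rather than dismiss in bulk.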

Risk-based Conditional Access: the policies that act on risk

Identity Protection historically shipped its own built-in “user risk policy” and “sign-in risk policy” toggles inside its blade. Do not use them. Microsoft has deprecated the legacy risk policies in favour of Conditional Access risk conditions, has been actively migrating tenants, and the CA versions are simply better tools:

Capability	Legacy Identity Protection policies	Risk-based Conditional Access
Report-only mode (soak before enforcing)	No	Yes — the killer feature
Scoping (groups, roles, apps, device filters)	Users only, coarse	Full CA assignment model
Break-glass / service-account exclusions	Users/groups only	Groups, roles, guest types, workload filters
Multiple policies with different thresholds	One of each	As many as your design needs
Session controls (sign-in frequency)	No	Yes — force re-auth every time on risk
Authentication strengths (require phishing-resistant MFA)	No — any MFA	Yes
Controls available	MFA (sign-in), password change (user)	MFA, auth strengths, password change, block, compliant device combos
What If tool / policy insights workbook	No	Yes
Manageable as code (Graph, Terraform)	Poorly	Fully

The migration is mechanical: note the legacy policy’s threshold and scope, rebuild as CA in report-only, compare impact, enable CA, disable legacy. Never run both enforced in parallel — users hit doubled prompts and your impact data becomes uninterpretable.

The two-policy core design

Policy A — sign-in risk gate. Medium and High sign-in risk → require MFA, re-authenticated every time. This is the real-time challenge: a genuine user passes MFA and continues (and that MFA remediates the sign-in risk — userPassedMFADrivenByRiskBasedPolicy); an attacker with only a password fails. Session control matters: without sign-in frequency “every time”, a risky sign-in can be satisfied by a cached MFA claim from hours ago, which defeats the purpose — you want fresh proof-of-presence at the moment of risk.

Policy B — user risk gate. High user risk → require MFA and secure password change. This is the remediation forcing-function: the user proves identity with strong auth, sets a new password, and the platform automatically drops user risk to none (remediated). One policy evaluation replaces the entire manual “helpdesk resets the password and hopes” workflow.
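How the two policies combine for a given request can be sketched as below. This is a toy model of the design, not the Conditional Access engine: it assumes lowercase level strings, encodes only the two thresholds described above, and relies on the fact that every matching policy's controls must be satisfied.

```python
# Sketch: controls demanded by Policy A and Policy B for one evaluation.

def required_controls(sign_in_risk: str, user_risk: str) -> list:
    """Return the grant controls the two-policy design would demand."""
    controls = []
    if sign_in_risk in ("medium", "high"):      # Policy A: sign-in risk gate
        controls.append("mfa")
    if user_risk == "high":                     # Policy B: user risk gate
        for control in ("mfa", "passwordChange"):
            if control not in controls:
                controls.append(control)
    return controls


for sign_in, user in [("medium", "none"), ("none", "high"), ("high", "high"), ("low", "medium")]:
    print(f"sign-in={sign_in:<6} user={user:<6} -> {required_controls(sign_in, user) or 'no risk control'}")
```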

Design choice	Policy A (sign-in risk)	Policy B (user risk)
Condition	`signInRiskLevels: ["medium","high"]`	`userRiskLevels: ["high"]`
Grant control	`mfa` (or an authentication strength)	`mfa` AND `passwordChange`
Operator	OR (single control)	AND — mandatory
Session control	Sign-in frequency: every time	Sign-in frequency: every time
Users	All users, minus break-glass + service accounts	Same
Apps	All resources	All resources — required by the control
Why this threshold	Medium+ is where signal outweighs noise for a non-destructive control	High is dominated by leaked credentials + confirmed compromise; Medium would force benign resets weekly

The passwordChange control carries hard platform constraints — violate them and Graph rejects the policy or users hit unsatisfiable loops:

Constraint	Detail
Must pair with MFA	`builtInControls: ["mfa","passwordChange"]`, operator `AND` — a reset without proof of identity would let the attacker holding the leaked password rotate it themselves
Must target all resources	The policy’s app assignment must be All resources with no app exclusions — a partial scope would let a risky session reach unscoped apps without remediating
No mixing with other grant controls	Cannot combine with compliant-device/other grants in the same policy
Users need a change path	SSPR registered, and for hybrid: password writeback or on-prem reset flow (next section) — otherwise the policy is a lockout machine
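The first three constraints are mechanical enough to lint before a policy ever reaches Graph. A hedged sketch: it checks a policy body shaped like the Graph conditionalAccessPolicy JSON, it covers only the built-in mfa control (not authentication strengths), and the function name and messages are mine. The fourth constraint, a working change path for users, cannot be verified from the policy body.

```python
# Sketch: pre-flight lint for the passwordChange constraints in the table above.

def password_change_violations(policy: dict) -> list:
    """Return constraint violations; an empty list means none were found."""
    grant = policy.get("grantControls") or {}
    controls = set(grant.get("builtInControls") or [])
    if "passwordChange" not in controls:
        return []  # the constraints only apply when the control is present

    problems = []
    if "mfa" not in controls or grant.get("operator") != "AND":
        problems.append("passwordChange must be paired with mfa using operator AND")
    if controls - {"mfa", "passwordChange"}:
        problems.append("passwordChange cannot be combined with other grant controls")

    apps = (policy.get("conditions") or {}).get("applications") or {}
    if apps.get("includeApplications") != ["All"] or apps.get("excludeApplications"):
        problems.append("policy must target All resources with no app exclusions")
    return problems


good = {
    "conditions": {"applications": {"includeApplications": ["All"]}},
    "grantControls": {"operator": "AND", "builtInControls": ["mfa", "passwordChange"]},
}
bad = {
    "conditions": {"applications": {"includeApplications": ["Office365"]}},
    "grantControls": {"operator": "OR", "builtInControls": ["passwordChange", "compliantDevice"]},
}
print(password_change_violations(good))
print(password_change_violations(bad))
```

Running a check like this in the pipeline that deploys Conditional Access turns a rejected API call into a failed pull request.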

Block vs remediate: when to use High sign-in risk → block

Some organizations add a third policy: High sign-in risk → block (instead of MFA). The argument for: real-time High is rare and heavily weighted toward token anomalies and threat-actor infrastructure, and an AiTM-relayed session can sometimes pass an MFA challenge — MFA is not a sufficient gate against a live relay. The argument against: block generates lockout tickets with no self-service exit, and if your named locations are untuned, a false-positive High blocks an executive abroad. My rule: run High→block only after your false-positive rate at High is proven near-zero over a month of data, and prefer requiring a phishing-resistant authentication strength at High as the middle path — it defeats AiTM relay without a hard block.

Control at High sign-in risk	Stops AiTM relay?	Lockout risk	Self-service recovery	Use when
Require MFA (any method)	Partially (relay can pass OTP/push)	Low	Yes	Default starting point
Require phishing-resistant auth strength	Yes (origin-bound)	Medium (needs FIDO2/WHfB rollout)	Yes, if methods registered	Passwordless program underway
Block	Yes (nothing issued)	High	No — helpdesk only	Proven-clean High signal; mature SOC
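For the middle row, a sketch of what the policy body might look like, built in Python so it can be reviewed before anything is sent. Several things here are assumptions to verify in your own tenant: the display name is invented to continue this article's naming scheme, and the authentication strength ID is the value I understand to be the built-in phishing-resistant MFA strength, which you should check against the authentication strength policies your tenant actually lists.

```python
# Sketch: report-only policy body requiring phishing-resistant auth at High sign-in risk.
import json

# Assumption: ID of the built-in "Phishing-resistant MFA" authentication strength.
PHISHING_RESISTANT_STRENGTH_ID = "00000000-0000-0000-0000-000000000004"


def high_risk_phishing_resistant_policy(breakglass_group_id: str) -> dict:
    """Build the policy body; nothing is sent to Graph here."""
    return {
        "displayName": "CA302-Global-SigninRisk-High-RequirePhishingResistantMFA",  # hypothetical name
        "state": "enabledForReportingButNotEnforced",
        "conditions": {
            "signInRiskLevels": ["high"],
            "clientAppTypes": ["all"],
            "applications": {"includeApplications": ["All"]},
            "users": {"includeUsers": ["All"], "excludeGroups": [breakglass_group_id]},
        },
        "grantControls": {
            "operator": "OR",
            "builtInControls": [],
            "authenticationStrength": {"id": PHISHING_RESISTANT_STRENGTH_ID},
        },
    }


print(json.dumps(high_risk_phishing_resistant_policy("<breakglass-group-guid>"), indent=2))
```

Like every policy in this article it starts in report-only, so users who have no phishing-resistant method registered show up in the data before they show up at the helpdesk.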

Build them: Graph first, report-only always

Before either policy, confirm your emergency-access accounts are excluded from every CA policy in the tenant — the classic self-inflicted outage is a risk policy that locks out your only path back in (full design in Engineering Break-Glass Emergency Access Accounts in Entra ID):

# Audit: every enabled CA policy and its exclusions; the break-glass group must appear in all
# Requires the Microsoft Graph PowerShell SDK (Microsoft.Graph.Identity.SignIns module)
Connect-MgGraph -Scopes "Policy.Read.All"
# -All pages through every policy instead of stopping at the first page of results
Get-MgIdentityConditionalAccessPolicy -All |
  Where-Object { $_.State -eq 'enabled' } |
  Select-Object DisplayName,
    @{n='ExcludedUsers'; e={$_.Conditions.Users.ExcludeUsers -join ','}},
    @{n='ExcludedGroups';e={$_.Conditions.Users.ExcludeGroups -join ','}} |
  Format-Table -AutoSize
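Eyeballing that table works for ten policies and fails for sixty. The same check can be sketched as a Python function that takes the policies as exported JSON (the value array from the Graph conditionalAccess/policies endpoint) and returns the enabled ones missing the break-glass group. The function name and sample data are illustrative, and it checks group exclusions only, so adapt it if you exclude break-glass accounts as individual users.

```python
# Sketch: list enabled CA policies that do not exclude the break-glass group.

def policies_missing_exclusion(policies, breakglass_group_id):
    """Return display names of enabled policies lacking the break-glass exclusion."""
    missing = []
    for policy in policies:
        if policy.get("state") != "enabled":
            continue  # report-only and disabled policies cannot lock anyone out
        users = (policy.get("conditions") or {}).get("users") or {}
        if breakglass_group_id not in (users.get("excludeGroups") or []):
            missing.append(policy.get("displayName", "<unnamed>"))
    return missing


exported = [
    {"displayName": "CA001", "state": "enabled",
     "conditions": {"users": {"excludeGroups": ["bg-group"]}}},
    {"displayName": "CA002", "state": "enabled",
     "conditions": {"users": {"excludeGroups": []}}},
    {"displayName": "CA300", "state": "enabledForReportingButNotEnforced",
     "conditions": {"users": {"excludeGroups": []}}},
]
print(policies_missing_exclusion(exported, "bg-group"))
```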

Policy A via Graph (note frequencyInterval: everyTime; the value and type properties are omitted when the interval is every time):

az rest --method post \
  --url "https://graph.microsoft.com/v1.0/identity/conditionalAccess/policies" \
  --body '{
    "displayName": "CA300-Global-SigninRisk-MediumHigh-RequireMFA",
    "state": "enabledForReportingButNotEnforced",
    "conditions": {
      "signInRiskLevels": ["high", "medium"],
      "clientAppTypes": ["all"],
      "applications": { "includeApplications": ["All"] },
      "users": {
        "includeUsers": ["All"],
        "excludeGroups": ["<breakglass-group-guid>", "<svc-exclusion-group-guid>"]
      }
    },
    "grantControls": { "operator": "OR", "builtInControls": ["mfa"] },
    "sessionControls": {
      "signInFrequency": {
        "isEnabled": true,
        "frequencyInterval": "everyTime",
        "authenticationType": "primaryAndSecondaryAuthentication"
      }
    }
  }'

Policy B — the user-risk remediation gate:

az rest --method post \
  --url "https://graph.microsoft.com/v1.0/identity/conditionalAccess/policies" \
  --body '{
    "displayName": "CA301-Global-UserRisk-High-SecurePasswordChange",
    "state": "enabledForReportingButNotEnforced",
    "conditions": {
      "userRiskLevels": ["high"],
      "clientAppTypes": ["all"],
      "applications": { "includeApplications": ["All"] },
      "users": {
        "includeUsers": ["All"],
        "excludeGroups": ["<breakglass-group-guid>"]
      }
    },
    "grantControls": { "operator": "AND", "builtInControls": ["mfa", "passwordChange"] },
    "sessionControls": {
      "signInFrequency": {
        "isEnabled": true,
        "frequencyInterval": "everyTime",
        "authenticationType": "primaryAndSecondaryAuthentication"
      }
    }
  }'

And as Terraform (azuread provider), because risk policies belong in the same policy-as-code pipeline as the rest of your CA estate:

resource "azuread_conditional_access_policy" "signin_risk_mfa" {
  display_name = "CA300-Global-SigninRisk-MediumHigh-RequireMFA"
  state        = "enabledForReportingButNotEnforced"

  conditions {
    client_app_types    = ["all"]
    sign_in_risk_levels = ["medium", "high"]

    applications { included_applications = ["All"] }

    users {
      included_users  = ["All"]
      excluded_groups = [var.breakglass_group_id, var.svc_exclusion_group_id]
    }
  }

  grant_controls {
    operator          = "OR"
    built_in_controls = ["mfa"]
  }

  session_controls {
    sign_in_frequency_interval            = "everyTime"
    sign_in_frequency_authentication_type = "primaryAndSecondaryAuthentication"
  }
}

resource "azuread_conditional_access_policy" "user_risk_pwd_change" {
  display_name = "CA301-Global-UserRisk-High-SecurePasswordChange"
  state        = "enabledForReportingButNotEnforced"

  conditions {
    client_app_types = ["all"]
    user_risk_levels = ["high"]

    applications { included_applications = ["All"] }

    users {
      included_users  = ["All"]
      excluded_groups = [var.breakglass_group_id]
    }
  }

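  grant_controls {
    operator          = "AND"
    built_in_controls = ["mfa", "passwordChange"]
  }

  # Mirror the Graph definition of Policy B: re-authenticate every time
  session_controls {
    sign_in_frequency_interval            = "everyTime"
    sign_in_frequency_authentication_type = "primaryAndSecondaryAuthentication"
  }
}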

Run both in report-only for a minimum of two weeks (a month if your workforce travels). Report-only evaluates and logs what would have happened (reportOnlyFailure, reportOnlyInterrupted, reportOnlySuccess in the sign-in log’s CA tab) without touching users. The KQL to size the blast radius is in the Sentinel section. Flip to enabled only when the would-be-challenged volume matches helpdesk capacity and the false-positive tuning below is done.
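If you would rather size the report-only impact from exported sign-in JSON than from the workspace, the tally is small. This is a sketch under assumptions: records are Graph-shaped sign-ins carrying `appliedConditionalAccessPolicies` entries with `displayName` and `result`, the helper names are mine, and the weekly helpdesk capacity is a hypothetical number you supply.

```python
from collections import defaultdict


def report_only_impact(signins, policy_prefix="CA30"):
    """Count sign-ins and distinct users per (policy, report-only result)."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for signin in signins:
        for ca in signin.get("appliedConditionalAccessPolicies", []):
            name, result = ca.get("displayName", ""), ca.get("result", "")
            if name.startswith(policy_prefix) and result.startswith("reportOnly"):
                key = (name, result)
                counts[key] += 1
                users[key].add(signin.get("userPrincipalName"))
    return {key: {"signins": counts[key], "users": len(users[key])} for key in counts}


def within_capacity(impact, weekly_capacity, weeks):
    """True when would-be interruptions per week fit the helpdesk budget."""
    interrupted = sum(
        stats["signins"]
        for (_, result), stats in impact.items()
        if result in ("reportOnlyFailure", "reportOnlyInterrupted")
    )
    return interrupted / weeks <= weekly_capacity


A = "CA300-Global-SigninRisk-MediumHigh-RequireMFA"
B = "CA301-Global-UserRisk-High-SecurePasswordChange"
sample_signins = [
    {"userPrincipalName": "a@contoso.example",
     "appliedConditionalAccessPolicies": [{"displayName": A, "result": "reportOnlyInterrupted"}]},
    {"userPrincipalName": "a@contoso.example",
     "appliedConditionalAccessPolicies": [{"displayName": A, "result": "reportOnlyInterrupted"}]},
    {"userPrincipalName": "b@contoso.example",
     "appliedConditionalAccessPolicies": [{"displayName": B, "result": "reportOnlyFailure"}]},
    {"userPrincipalName": "c@contoso.example",
     "appliedConditionalAccessPolicies": [{"displayName": A, "result": "reportOnlySuccess"}]},
    {"userPrincipalName": "d@contoso.example",
     "appliedConditionalAccessPolicies": [{"displayName": "CA100-Other", "result": "success"}]},
]

impact = report_only_impact(sample_signins)
print(impact)
print(within_capacity(impact, weekly_capacity=2, weeks=2))
```

Counting distinct users alongside sign-ins matters: two hundred interruptions on three users is a tuning problem, while two hundred on two hundred users is a capacity problem.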

Rollout sequencing that survives contact with reality

Phase	Duration	Action	Exit criteria
0 — Plumbing	2–4 weeks	MFA registration campaign, SSPR + writeback (hybrid), break-glass exclusions, named locations	MFA registration >95%; SSPR verified end-to-end
1 — Report-only	2–4 weeks	Both policies `enabledForReportingButNotEnforced`; weekly impact review	FP rate by detection type understood; volume ≤ helpdesk capacity
2 — Pilot enforce	2 weeks	Enable for IT + a pilot BU via include-group	Self-remediation rate >85% in pilot; no lockout escalations
3 — Staged enforce	2–6 weeks	Expand include scope by business unit	KPIs stable at each expansion
4 — Full enforce + High→strength/block	Ongoing	All users; consider phishing-resistant strength at High	Weekly dismissal review; monthly threshold review

Self-remediation: the plumbing that keeps the helpdesk out of the loop

The entire economic argument for risk-based policy is that the user remediates themselves: trip the sign-in-risk gate → pass MFA → sign-in risk clears; trip the user-risk gate → MFA + new password → user risk clears. No ticket, no admin, no dwell time. But self-remediation is a machine with four load-bearing parts, and if any one is missing the policy converts detections into lockouts:

Prerequisite	Why it gates remediation	Verify with
MFA method registered (per user)	A user with no method cannot satisfy Policy A or the MFA half of Policy B — hard stop, helpdesk call	`GET /reports/authenticationMethods/userRegistrationDetails` — `isMfaCapable`
SSPR enabled + registered	Policy B’s password change flow rides the combined registration / reset stack	Authentication methods policy + registration report
Password writeback (hybrid with cloud-initiated change)	A cloud password change that never reaches on-prem AD “clears risk” then locks the user out of domain resources	`Get-ADSyncAADPasswordResetConfiguration`; end-to-end test reset
PHS enabled (hybrid)	Gates leaked-credentials detection entirely; also enables the on-prem reset-clears-risk path	Entra Connect features; `Get-MgDirectoryOnPremiseSynchronization`

Registration coverage is a Graph one-liner you should trend weekly during Phase 0:

# Who is NOT MFA-capable right now (these users cannot self-remediate)
az rest --method get \
  --url "https://graph.microsoft.com/v1.0/reports/authenticationMethods/userRegistrationDetails?\$filter=isMfaCapable eq false&\$top=999" \
  --query "value[].{upn:userPrincipalName, sspr:isSsprRegistered}" -o table
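To trend more than a raw list, the registration report can be bucketed by what each user could actually self-remediate. A minimal sketch, assuming records shaped like the `userRegistrationDetails` response (`userPrincipalName`, `isMfaCapable`, `isSsprRegistered`); the bucket names are hypothetical, and treating SSPR registration as the Policy B prerequisite simply follows the table above.

```python
def remediation_readiness(details):
    """Bucket users by which risk policy they could satisfy without the helpdesk."""
    buckets = {"ready": [], "mfa_only": [], "lockout_risk": []}
    for record in details:
        upn = record["userPrincipalName"]
        if not record.get("isMfaCapable", False):
            buckets["lockout_risk"].append(upn)   # cannot pass either policy
        elif not record.get("isSsprRegistered", False):
            buckets["mfa_only"].append(upn)       # can pass Policy A, may stall on Policy B
        else:
            buckets["ready"].append(upn)
    return buckets


def mfa_coverage_pct(details):
    """Percentage of users who are MFA-capable, rounded to one decimal."""
    total = len(details)
    capable = sum(1 for record in details if record.get("isMfaCapable"))
    return round(100.0 * capable / total, 1) if total else 0.0


sample_details = [
    {"userPrincipalName": "a@contoso.example", "isMfaCapable": True, "isSsprRegistered": True},
    {"userPrincipalName": "b@contoso.example", "isMfaCapable": True, "isSsprRegistered": False},
    {"userPrincipalName": "c@contoso.example", "isMfaCapable": False, "isSsprRegistered": False},
    {"userPrincipalName": "d@contoso.example", "isMfaCapable": True, "isSsprRegistered": True},
]

print(remediation_readiness(sample_details))
print(mfa_coverage_pct(sample_details))
```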

Hybrid tenants get one more option worth enabling deliberately: “Allow on-premises password change to reset user risk” (Identity Protection → Settings). With PHS on, a password change performed on-premises (helpdesk AD reset, Ctrl+Alt+Del change) syncs up and remediates user risk — closing the gap where hybrid users who reset via the old on-prem path stayed flagged at High forever. Enable it only if your on-prem reset process itself verifies identity properly; otherwise you have built a risk-clearing side door for social engineers who can talk a service desk into an AD reset.

How each remediation and admin action moves the state machine — keep this table next to your runbook, because choosing the wrong action either leaves users stuck at High or mislabels your model:

Action	Performed by	Effect on risk	`riskDetail` recorded	When to use
MFA prompted by sign-in-risk policy	User	That sign-in’s risk → remediated	`userPassedMFADrivenByRiskBasedPolicy`	Automatic — Policy A
Secure password change (signed-in, post-MFA)	User	User risk → remediated	`userPerformedSecuredPasswordChange`	Automatic — Policy B
SSPR with MFA	User	User risk → remediated	`userPerformedSecuredPasswordReset`	User-initiated recovery
On-prem password change (setting enabled, PHS)	User/helpdesk	User risk → remediated	`userChangedPasswordOnPremises`	Hybrid fallback
Admin password reset	Admin	User risk → remediated (weaker — see note)	`adminGeneratedTemporaryPassword`	User unreachable / no MFA method
Dismiss user risk	Admin	Risk → none, state `dismissed`; closes the user’s detections	`adminDismissedAllRiskForUser`	Confirmed false positive at user level
Confirm sign-in safe	Admin	That sign-in cleared, state `confirmedSafe`; FP label to model	`adminConfirmedSigninSafe`	Verified-legit single sign-in
Confirm user compromised	Admin	User risk → High, state `confirmedCompromised`; TP label to model	`adminConfirmedUserCompromised`	Verified compromise — start containment
Confirm sign-in compromised	Admin	Sign-in labelled TP; drives user risk up	`adminConfirmedSigninCompromised`	Verified-malicious single sign-in
Block user (`accountEnabled=false`)	Admin	Auth stops; risk state unchanged	—	Containment while investigating

The note on admin resets: adminGeneratedTemporaryPassword remediates the score but does nothing about live sessions or tokens the attacker already holds. An admin reset without token revocation is cosmetic containment. Which is why the containment runbook in the next section always revokes.
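For runbook tooling, the table reduces to a lookup from action to resulting state. The sketch below is an illustration, not the platform's implementation: the action keys are hypothetical labels, the `riskState` and `riskDetail` strings are the ones in the table, and the `revoke_sessions_too` flag is my reading of the note on admin resets.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    risk_state: Optional[str]    # None means the risk state is left unchanged
    risk_detail: Optional[str]
    revoke_sessions_too: bool    # assumption: pair this action with token revocation


TRANSITIONS = {
    "user_passed_mfa":          Outcome("remediated", "userPassedMFADrivenByRiskBasedPolicy", False),
    "secure_password_change":   Outcome("remediated", "userPerformedSecuredPasswordChange", False),
    "sspr_with_mfa":            Outcome("remediated", "userPerformedSecuredPasswordReset", False),
    "admin_password_reset":     Outcome("remediated", "adminGeneratedTemporaryPassword", True),
    "dismiss_user_risk":        Outcome("dismissed", "adminDismissedAllRiskForUser", False),
    "confirm_signin_safe":      Outcome("confirmedSafe", "adminConfirmedSigninSafe", False),
    "confirm_user_compromised": Outcome("confirmedCompromised", "adminConfirmedUserCompromised", True),
    "block_user":               Outcome(None, None, False),
}


def apply_action(current_state: str, action: str):
    """Return (new_state, outcome) for an action taken on a user in current_state."""
    outcome = TRANSITIONS[action]
    new_state = outcome.risk_state if outcome.risk_state is not None else current_state
    return new_state, outcome


print(apply_action("atRisk", "admin_password_reset"))
print(apply_action("atRisk", "block_user"))
```

The `block_user` row is the one people get wrong: authentication stops, but the user is still `atRisk` and still needs a labelled close-out.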

The hybrid trap to test before enforcement, not after: password writeback disabled → user trips Policy B → “successfully” changes cloud password → risk clears → on-prem AD never got the new password → user locked out of every domain-joined resource, and the helpdesk queue fills by 09:30. Verify writeback with a real end-to-end reset on a pilot account.

Investigating risky users, sign-ins, and detections

The three blades under Entra admin center → Protection → Identity Protection are an evidence chain, and the discipline is to walk it the same way every time: Risky users (who), pivot to Risky sign-ins (when/where/how), corroborate in Risk detections (why), decide, act, label.

Reading the risky user

Open the user; the detail pane carries the aggregate risk level/state/last-updated and the Risk history — every risky sign-in, detection, and admin action for the account (the portal surfaces roughly the last 90 days of history; detections themselves age out of the portal, which is why the Sentinel export exists). The history answers the first triage question: is this an event or a pattern? One atypical-travel hit on Monday is an event. Leaked credentials + anonymous-IP sign-in + inbox-rule detection across three days is a kill chain.

Connect-MgGraph -Scopes "IdentityRiskyUser.Read.All","IdentityRiskEvent.Read.All"

# All users currently at high risk, still unremediated
Get-MgRiskyUser -Filter "riskLevel eq 'high' and riskState eq 'atRisk'" |
  Select-Object UserPrincipalName, RiskLevel, RiskState, RiskLastUpdatedDateTime

# Full risk history for one user — the evidence chain in one call
Get-MgRiskyUserHistory -RiskyUserId <user-object-id> |
  Select-Object RiskLastUpdatedDateTime, RiskState, RiskDetail,
    @{n='Activity';e={$_.Activity.RiskEventTypes -join ','}} |
  Sort-Object RiskLastUpdatedDateTime

Reading the risky sign-in

For each risky sign-in, the fields that carry verdict weight:

Field	What it tells you	Verdict weight
`riskLevelDuringSignIn` vs `riskLevelAggregated`	Real-time score at the moment vs final score after offline detections landed	A benign-looking sign-in that turned High aggregated = offline intel arrived; re-open it
`riskEventTypes_v2`	Which detections fired on this sign-in	AiTM/anomalousToken/nationStateIP ≫ unlikelyTravel
IP / ASN / location	Attacker infrastructure vs corporate egress vs home ISP	Hosting-provider ASN + foreign geo is a red flag; your own VPN range is a tuning task
Device ID / join state / browser	Known managed device vs unknown	Compliant hybrid-joined device argues benign
`authenticationRequirement` + MFA result	Was MFA satisfied, and how	MFA satisfied by claim in token on a risky sign-in deserves scrutiny (token replay)
`conditionalAccessStatus`	Did your policies even apply	`notApplied` on a risky success = coverage gap — a finding about your estate
Application targeted	What the session could reach	Mail/Graph/management-plane targets escalate priority

The triage decision, condensed to the table I actually pin up for SOC shifts:

Evidence pattern	Probable verdict	Action
Risky sign-in from managed, compliant device; MFA freshly satisfied; user confirms activity	False positive	Confirm sign-in safe; if user-level risk accrued, dismiss; queue tuning if pattern repeats
Atypical travel matching a real itinerary (user confirms; calendar corroborates)	Benign travel	Confirm safe; consider named-location review for frequent sites
Leaked credentials, no suspicious sign-ins yet	True positive (credential exposure)	Let Policy B force secure change at next sign-in; if user is dormant, admin-reset + revoke now
Anonymous IP + unfamiliar properties + user denies	Likely compromise	Confirm compromised → containment runbook
AiTM detection, any context	Treat as compromise	Containment immediately; hunt mailbox rules; check what session accessed
`anomalousToken` on MFA-satisfied sign-in	Token theft suspicion	Revoke sessions first, investigate second
Password spray detection, sign-in failed	Attack attempt, not compromise	No user action; pivot to campaign hunt; confirm lockout/smart-lockout held
Password spray detection, sign-in succeeded	Active compromise	Confirm compromised + containment; assume other accounts hit
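The pinned table can also be encoded as an ordered rule list, which is useful for pre-sorting a queue before a human looks at it. This is a sketch under assumptions: the `Evidence` fields and verdict strings are hypothetical, the detection names follow Graph `riskEventType` values, and anything the table does not cover falls through to human triage.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Evidence:
    detections: set = field(default_factory=set)
    signin_succeeded: bool = False
    managed_compliant_device: bool = False
    mfa_satisfied: bool = False
    user_confirms_activity: Optional[bool] = None  # None = user not yet contacted


def triage(e: Evidence):
    """Return (verdict, action); rules are checked from highest to lowest severity."""
    d = e.detections
    if "attackerinTheMiddle" in d:
        return "treat as compromise", "containment immediately"
    if "anomalousToken" in d and e.mfa_satisfied:
        return "token theft suspicion", "revoke sessions first, investigate second"
    if "passwordSpray" in d:
        if e.signin_succeeded:
            return "active compromise", "confirm compromised + containment"
        return "attack attempt", "no user action; campaign hunt"
    if {"anonymizedIPAddress", "unfamiliarFeatures"} <= d and e.user_confirms_activity is False:
        return "likely compromise", "confirm compromised + containment"
    if d == {"leakedCredentials"}:
        return "credential exposure", "let Policy B force secure change"
    if e.managed_compliant_device and e.mfa_satisfied and e.user_confirms_activity:
        return "false positive", "confirm sign-in safe"
    if "unlikelyTravel" in d and e.user_confirms_activity:
        return "benign travel", "confirm safe; review named locations"
    return "undetermined", "human triage"


print(triage(Evidence({"passwordSpray"}, signin_succeeded=True)))
print(triage(Evidence({"unlikelyTravel"}, user_confirms_activity=True)))
```

Rule order carries the policy: an AiTM detection wins even when the device is managed and the user vouches for the sign-in.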

The containment runbook, scripted

Same steps, same order, every time — the order matters because revocation before reset closes the window where the attacker’s refresh token could mint new access tokens against the old credential:

Connect-MgGraph -Scopes "User.ReadWrite.All","IdentityRiskyUser.ReadWrite.All"

$userId = "<object-id-of-compromised-user>"

# 1. Label: confirm compromised (user risk -> High, CAE-capable apps get the
#    critical event, and the model receives a true-positive label)
Confirm-MgRiskyUserCompromised -UserIds @($userId)

# 2. Contain: revoke every refresh token / session
Revoke-MgUserSignInSession -UserId $userId

# 3. Rotate: force credential change at next sign-in
Update-MgUser -UserId $userId `
  -PasswordProfile @{ forceChangePasswordNextSignIn = $true }

# 4. (If interactive access must stop entirely while you investigate)
Update-MgUser -UserId $userId -AccountEnabled:$false
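For a playbook that calls Graph directly, it helps to build the ordered request list separately from sending it, so the order is testable. A minimal sketch, assuming a caller-supplied `send` function that returns True on success; `containment_plan` and `run_plan` are hypothetical names, while the three Graph paths are the REST equivalents of the cmdlets above.

```python
GRAPH = "https://graph.microsoft.com/v1.0"


def containment_plan(user_id, disable_account=False):
    """Ordered (method, url, body) steps: label, revoke, rotate, optionally disable."""
    steps = [
        ("POST", f"{GRAPH}/identityProtection/riskyUsers/confirmCompromised",
         {"userIds": [user_id]}),
        ("POST", f"{GRAPH}/users/{user_id}/revokeSignInSessions", None),
        ("PATCH", f"{GRAPH}/users/{user_id}",
         {"passwordProfile": {"forceChangePasswordNextSignIn": True}}),
    ]
    if disable_account:
        steps.append(("PATCH", f"{GRAPH}/users/{user_id}", {"accountEnabled": False}))
    return steps


def run_plan(steps, send):
    """Send steps in order and stop at the first failure; return the completed steps."""
    done = []
    for method, url, body in steps:
        if not send(method, url, body):
            break  # never skip ahead: later steps assume earlier ones succeeded
        done.append((method, url))
    return done


plan = containment_plan("00000000-0000-0000-0000-000000000001", disable_account=True)
for method, url, _ in plan:
    print(method, url)
```

Stopping on the first failure is deliberate: a reset that runs after a failed revocation gives a false sense of containment.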

Then the blast-radius sweep — what did the session touch (mailbox rules, consent grants, new devices, role changes):

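// Entra-side changes the account initiated
AuditLogs
| where TimeGenerated > ago(7d)
| where tostring(InitiatedBy.user.id) == "<user-object-id>"
| where OperationName in ("Consent to application",
    "Add service principal credentials", "Register device", "Add member to role")
| project TimeGenerated, OperationName, Result, TargetResources
| sort by TimeGenerated asc

// Inbox rules are Exchange events: OfficeActivity (Microsoft 365 connector), not AuditLogs
OfficeActivity
| where TimeGenerated > ago(7d)
| where UserId =~ "<user-upn>" and Operation in ("New-InboxRule", "Set-InboxRule")
| project TimeGenerated, Operation, ClientIP, Parameters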

After verified remediation, close the loop: a user left at confirmedCompromised/High re-trips Policy B on every sign-in forever. The correct close-out is confirm-compromised (true label) → contain → verified clean → dismiss (return to none):

Invoke-MgDismissRiskyUser -UserIds @($userId)   # only after remediation is verified

Investigation SLAs worth adopting

Signal	Triage SLA	Rationale
High user risk (esp. leaked credentials)	4 business hours	Policy B is already gating, but dormant users never trip it — push remediation
AiTM / anomalous token / threat-actor IP	1 hour	Session already live; offline detection means the attacker has a head start
High-risk sign-in that succeeded with CA `notApplied`	1 hour	Uncontrolled compromise path — both an incident and a policy gap
Medium sign-in risk, policy challenged and passed	Daily batch review	Self-remediated; review for tuning signal only
Password spray campaign detection	Same day	Campaign scope assessment across all targeted accounts

Tuning false positives: named locations, VPN egress, and dismissal hygiene

Untuned, the noisiest detections in a corporate tenant are unfamiliar sign-in properties and atypical/impossible travel — and the culprit is almost always your own network: split-tunnel VPN egressing from another city, SD-WAN breaking out through a cloud security stack two regions away, or a mobile fleet flapping between carrier NAT and office Wi-Fi. The model sees a user “teleport” 1,200 km in four minutes because that is literally what your egress did.

Trusted named locations: the structural fix

Tag every corporate egress range — offices, VPN concentrators, cloud proxy/SWG egress (Zscaler/Netskope ranges you use), branch breakouts — as trusted named locations. isTrusted = true is the field that feeds the risk engine’s familiarity model; a plain named location without it only serves location conditions in CA and does nothing for risk scoring.

Connect-MgGraph -Scopes "Policy.ReadWrite.ConditionalAccess"

$params = @{
  "@odata.type" = "#microsoft.graph.ipNamedLocation"
  displayName   = "NL-Trusted-CorpEgress"
  isTrusted     = $true
  ipRanges      = @(
    @{ "@odata.type" = "#microsoft.graph.iPv4CidrRange"; cidrAddress = "203.0.113.0/24" },   # HQ + DC egress
    @{ "@odata.type" = "#microsoft.graph.iPv4CidrRange"; cidrAddress = "198.51.100.0/23" },  # VPN concentrators
    @{ "@odata.type" = "#microsoft.graph.iPv6CidrRange"; cidrAddress = "2001:db8:85a3::/48" }
  )
}
New-MgIdentityConditionalAccessNamedLocation -BodyParameter $params

resource "azuread_named_location" "corp_egress" {
  display_name = "NL-Trusted-CorpEgress"
  ip {
    ip_ranges = [
      "203.0.113.0/24",
      "198.51.100.0/23",
      "2001:db8:85a3::/48",
    ]
    trusted = true
  }
}

Named-location type	Risk-engine effect	CA-condition use	Gotcha
IP location, `isTrusted = true`	Lowers risk scoring for these ranges; suppresses travel/unfamiliar FPs	Usable in location conditions	Stale ranges after an egress migration silently re-open the noise
IP location, not trusted	None — labelling only	Usable in conditions	The #1 tuning miss: ranges added but `isTrusted` left false
Country/region (GPS or IP-resolved)	No direct risk suppression	Geo-fencing policies	IP-geo is approximate; GPS needs Authenticator
MDCA IP address tags (“Corporate”)	Tunes MDCA-sourced detections (`mcasImpossibleTravel`, `newCountry`)	—	Separate system: tag ranges in Defender for Cloud Apps too, or its detections keep firing

That last row matters: the MDCA-sourced detections do their geo math inside Defender for Cloud Apps, which has its own IP-range store (Settings → Cloud Apps → IP address ranges, category Corporate). A tenant that tunes Entra named locations but not MDCA tags keeps getting mcasImpossibleTravel noise and concludes tuning “doesn’t work.”

Discovery query for what to add — the egress ranges your risky-but-dismissed sign-ins actually come from:

SigninLogs
| where TimeGenerated > ago(30d)
| where RiskLevelDuringSignIn in ("medium","high") or RiskLevelAggregated in ("medium","high")
| where RiskState in ("dismissed","confirmedSafe","remediated")
| summarize signins = count(), users = dcount(UserPrincipalName),
    sampleUser = take_any(UserPrincipalName) by IPAddress, ASN = tostring(AutonomousSystemNumber)
| where users >= 3            // shared egress, not one traveller
| sort by signins desc

Any row with dozens of sign-ins and many distinct users is corporate egress you forgot to trust. One caution cuts the other way: do not trust ranges you do not control. Trusting a whole carrier NAT, a coworking ISP, or “the country we operate in” lowers scoring for every attacker inside that range. Trusted should mean egress you own or lease exclusively, such as office, VPN-concentrator, and dedicated SWG addresses, and nothing shared with other tenants. And resist blanket policy exclusions for locations: excluding a location from the risk policy (rather than trusting it in scoring) creates a clean bypass corridor, because an attacker who lands one foothold inside that range inherits the exemption.
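Checking discovery output against the ranges you already trust is a few lines with the standard library. A sketch, assuming the query results arrive as `(ip, distinct_users)` pairs and the trusted CIDRs come from your named-location inventory; the function name and sample rows are hypothetical.

```python
import ipaddress


def untrusted_shared_egress(rows, trusted_cidrs, min_users=3):
    """Return IPs used by at least min_users users that no trusted range covers."""
    networks = [ipaddress.ip_network(cidr) for cidr in trusted_cidrs]
    candidates = []
    for ip_text, users in rows:
        ip = ipaddress.ip_address(ip_text)
        covered = any(ip.version == net.version and ip in net for net in networks)
        if not covered and users >= min_users:
            candidates.append(ip_text)
    return candidates


trusted = ["203.0.113.0/24", "198.51.100.0/23", "2001:db8:85a3::/48"]
rows = [
    ("203.0.113.10", 40),      # already trusted (HQ range)
    ("198.51.101.7", 12),      # inside the /23, already trusted
    ("192.0.2.55", 9),         # shared egress nobody has trusted yet
    ("192.0.2.99", 1),         # a single user: a traveller, not egress
    ("2001:db8:85a3::1", 5),   # trusted IPv6 range
]

print(untrusted_shared_egress(rows, trusted))
```

Every candidate still needs an ownership check before it is trusted; the function only finds ranges worth asking the network team about.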

Thresholds, exclusions, and the service-account problem

Tuning lever	Fixes	Cost/risk	Verdict
Trusted named locations	Travel/unfamiliar FPs at source	Must maintain range inventory	Always do
MDCA corporate IP tags	MDCA-sourced detection FPs	Second inventory to maintain	Always do if MDCA feeds risk
Sign-in risk threshold High-only (from Medium+)	Cuts MFA-challenge volume ~60–80%	Misses medium-fidelity real attacks	Temporary step during rollout only
User risk threshold Medium+ (from High)	Catches more, earlier	Weekly benign forced resets; helpdesk load	Avoid except high-security enclaves
Excluding user groups from risk policies	Stops FP pain for that group	Standing bypass — attacker’s favourite group to land in	Break-glass + true service accounts only, access-reviewed
Per-user MFA/SSPR gaps closed	Converts lockouts into self-remediation	Campaign effort	Do in Phase 0
Dismissal hygiene (below)	Model quality long-term	Analyst discipline	Non-negotiable

Service accounts deserve their own line: user-shaped service accounts (shared mailboxes with passwords, scripts running as users) trip risk constantly — they sign in from datacenters at 03:00 with no MFA — and teams respond by excluding them from risk policies, creating permanent unmonitored bypass identities. The correct fix is not exclusion, it is migration: workloads to managed identities/service principals (Governing OAuth Consent and Application Permissions in Entra ID covers the app-identity side), and whatever genuinely must remain user-shaped goes into a tightly-membered, access-reviewed exclusion group with compensating monitoring.

Dismissal hygiene: your labels train the model

When you dismiss, confirm-safe, or confirm-compromise, you are not doing dashboard housekeeping — you are labelling training data, and the labels flow back into scoring:

Habit	Consequence
Bulk-dismiss to “clean up” the blade	Model under-weights those patterns tenant-wide; humans learn the blade is meaningless; the one real detection in the batch is now labelled benign
Dismiss without user contact or device evidence	Unverified FP labels; repeat-offender patterns never surface
Confirm safe on verified-legit sign-ins	High-quality FP label; the same context scores lower next time — the good kind of tuning
Confirm compromised on verified incidents	High-quality TP label; similar patterns score higher — you are hardening every tenant Microsoft protects
Dismiss instead of remediate for real-but-handled risk	Wrong label: dismissal says “this was never risk.” If it was real and you contained it, the close-out is confirm-compromised → contain → dismiss after verification

Weekly, audit who dismissed what — dismissals are audited in AuditLogs and reversible only by re-detection:

AuditLogs
| where TimeGenerated > ago(7d)
| where OperationName in ("DismissUser", "ConfirmSafe", "ConfirmCompromised",
    "Dismiss risky user", "Confirm user compromised", "Confirm sign-in safe")
    or OperationName has_any ("risky", "risk")
| extend Actor = tostring(InitiatedBy.user.userPrincipalName)
| summarize actions = count() by Actor, OperationName
| sort by actions desc

A single analyst carrying 80% of dismissals with zero confirm-safes is a process smell: they are clearing a queue, not investigating.
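That smell is easy to compute once the audit rows are exported. A minimal sketch, assuming each row has been normalised to an `(actor, operation)` pair with operations named `dismiss`, `confirm_safe`, or `confirm_compromised`; those labels and the 0.8 threshold are illustrative choices, not values the platform defines.

```python
from collections import Counter, defaultdict


def dismissal_smells(actions, share_threshold=0.8):
    """Flag actors who carry most dismissals while never confirming a sign-in safe."""
    per_actor = defaultdict(Counter)
    for actor, operation in actions:
        per_actor[actor][operation] += 1
    total_dismissals = sum(counts["dismiss"] for counts in per_actor.values())
    if total_dismissals == 0:
        return []
    return sorted(
        actor
        for actor, counts in per_actor.items()
        if counts["dismiss"] / total_dismissals >= share_threshold
        and counts["confirm_safe"] == 0
    )


week = (
    [("alice", "dismiss")] * 9
    + [("bob", "dismiss")]
    + [("bob", "confirm_safe")] * 2
)

print(dismissal_smells(week))
```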

Automation and integration: Graph, Sentinel, and Defender

The portal is for humans mid-investigation; everything recurring reads Graph or KQL. Identity Protection exposes three resource collections plus per-user history, and the full admin action set, over graph.microsoft.com:

Endpoint (v1.0)	Returns	Key query params	Least-privilege permission
`GET /identityProtection/riskyUsers`	Users with current risk level/state/detail	`$filter=riskLevel eq 'high'`, `riskState`, `riskLastUpdatedDateTime`	`IdentityRiskyUser.Read.All`
`GET /identityProtection/riskyUsers/{id}/history`	Full risk history for one user	—	`IdentityRiskyUser.Read.All`
`POST /identityProtection/riskyUsers/confirmCompromised`	Sets High/confirmedCompromised	body: `{"userIds":[...]}`	`IdentityRiskyUser.ReadWrite.All`
`POST /identityProtection/riskyUsers/dismiss`	Clears risk to none/dismissed	body: `{"userIds":[...]}`	`IdentityRiskyUser.ReadWrite.All`
`GET /identityProtection/riskDetections`	Individual detections	`$filter=detectedDateTime gt ...`, `riskEventType`, `riskLevel`	`IdentityRiskEvent.Read.All`
`GET /identityProtection/riskyServicePrincipals`	Workload identity risk (needs Workload ID Premium)	`riskLevel`, `riskState`	`IdentityRiskyServicePrincipal.Read.All`
`GET /identityProtection/servicePrincipalRiskDetections`	Workload detections	`$filter` as above	`IdentityRiskyServicePrincipal.Read.All`
`GET /auditLogs/signIns`	Sign-ins incl. risk fields	`$filter=riskLevelDuringSignIn eq 'high'`	`AuditLog.Read.All`

A daemon doing the SIEM pull should be an app registration with application permissions (IdentityRiskEvent.Read.All at minimum), admin-consented, authenticating with a federated credential or certificate — never a client secret. The watermark pattern keeps the pull incremental and loss-free:

# Incremental risk-detection pull keyed on a persisted watermark
set -euo pipefail
WM_FILE=/var/lib/idp/last_watermark
LAST="$(cat "$WM_FILE" 2>/dev/null || true)"
LAST="${LAST:-2026-06-01T00:00:00Z}"   # first run, or an empty watermark file

az rest --method get \
  --url "https://graph.microsoft.com/v1.0/identityProtection/riskDetections?\$filter=detectedDateTime gt ${LAST}&\$orderby=detectedDateTime asc&\$top=500" \
  > /tmp/idp_batch.json

jq -r '.value[] | [.detectedDateTime, .riskEventType, .riskLevel, .riskState,
  .userPrincipalName, .ipAddress] | @tsv' /tmp/idp_batch.json

# Advance the watermark only after successful ingest, and only when the batch
# returned rows: an empty batch must not overwrite the file with nothing
NEXT="$(jq -r '.value[-1].detectedDateTime // empty' /tmp/idp_batch.json)"
if [ -n "$NEXT" ]; then printf '%s\n' "$NEXT" > "$WM_FILE"; fi
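The same pattern in a longer-lived collector usually needs paging as well. The sketch below is an illustration under assumptions: `fetch_page` is a caller-supplied function that returns one parsed Graph response for a URL, results are requested in ascending `detectedDateTime` order as in the script above, and `pull_since` is a hypothetical name.

```python
def pull_since(watermark, fetch_page, first_url):
    """Follow @odata.nextLink until exhausted; return (detections, new_watermark)."""
    detections = []
    url = first_url
    while url:
        page = fetch_page(url)
        detections.extend(page.get("value", []))
        url = page.get("@odata.nextLink")  # absent on the last page
    if not detections:
        return detections, watermark  # nothing new: keep the old watermark
    # Ascending order is assumed, so the last row carries the newest timestamp.
    return detections, detections[-1]["detectedDateTime"]


# Hypothetical canned pages standing in for Graph responses.
pages = {
    "page1": {"value": [{"detectedDateTime": "2026-06-02T10:00:00Z"}],
              "@odata.nextLink": "page2"},
    "page2": {"value": [{"detectedDateTime": "2026-06-02T11:30:00Z"}]},
    "empty": {"value": []},
}

items, new_watermark = pull_since("2026-06-01T00:00:00Z", pages.__getitem__, "page1")
print(len(items), new_watermark)
```

Persist the returned watermark only after the batch is safely ingested; if ingest fails, rerunning from the old value re-reads the same rows instead of dropping them.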

Streaming into Sentinel: the tables and what lives where

For a SOC, skip the polling and stream. Two integration paths land complementary data:

Path	Mechanism	Tables produced	Contains
Microsoft Entra ID connector (diagnostic settings)	Log categories → Log Analytics	`SigninLogs`, `AADNonInteractiveUserSignInLogs`, `AuditLogs`, `AADUserRiskEvents`, `AADRiskyUsers`, `AADServicePrincipalRiskEvents`, `AADRiskyServicePrincipals`	Raw detections, risk state history, per-sign-in risk fields
Microsoft Entra ID Protection connector (alerts)	Alert feed	`SecurityAlert` (ProductName = the Identity Protection provider)	Curated alerts per detection, incident-ready

Enable the diagnostic categories as code — this is the part teams forget, then wonder why AADUserRiskEvents is empty:

# Entra diagnostic settings are tenant-scoped, so this sketch calls the ARM REST
# endpoint directly; confirm the api-version against current documentation
az rest --method put \
  --url "https://management.azure.com/providers/microsoft.aadiam/diagnosticSettings/entra-to-sentinel?api-version=2017-04-01" \
  --body '{
    "properties": {
      "workspaceId": "<log-analytics-workspace-resource-id>",
      "logs": [
        {"category":"SignInLogs","enabled":true},
        {"category":"NonInteractiveUserSignInLogs","enabled":true},
        {"category":"AuditLogs","enabled":true},
        {"category":"UserRiskEvents","enabled":true},
        {"category":"RiskyUsers","enabled":true},
        {"category":"RiskyServicePrincipals","enabled":true},
        {"category":"ServicePrincipalRiskEvents","enabled":true}
      ]
    }
  }'

The KQL set I consider the minimum viable hunting pack:

// 1. Coverage gap: risky sign-ins that SUCCEEDED without CA applying —
//    each row is both an incident and a policy-design finding
SigninLogs
| where TimeGenerated > ago(24h)
| where RiskLevelDuringSignIn in ("high","medium") and ResultType == "0"
| where ConditionalAccessStatus == "notApplied"
| project TimeGenerated, UserPrincipalName, AppDisplayName, IPAddress,
    Location, RiskLevelDuringSignIn, RiskEventTypes_v2, AuthenticationRequirement

// 2. Leaked credentials -> did anyone sign in successfully AFTER the leak landed
//    and BEFORE remediation? That window is your real exposure.
let leaks = AADUserRiskEvents
    | where TimeGenerated > ago(14d) and RiskEventType == "leakedCredentials"
    | project UserId, leakTime = DetectedDateTime;
SigninLogs
| where TimeGenerated > ago(14d) and ResultType == "0"
| join kind=inner leaks on UserId
| where TimeGenerated > leakTime
| where RiskState != "remediated"
| project TimeGenerated, UserPrincipalName, IPAddress, AppDisplayName, leakTime

// 3. Password-spray campaign scope: from one detection to the whole blast radius
let sprayIPs = AADUserRiskEvents
    | where TimeGenerated > ago(7d) and RiskEventType == "passwordSpray"
    | distinct IpAddress;
SigninLogs
| where TimeGenerated > ago(7d)
| where IPAddress in (sprayIPs)
| summarize attempts = count(), failures = countif(ResultType != "0"),
    successes = countif(ResultType == "0"), targets = dcount(UserPrincipalName) by IPAddress
| sort by successes desc, attempts desc

// 4. Report-only impact sizing for the two policies before enforcement
SigninLogs
| where TimeGenerated > ago(14d)
| mv-expand ca = todynamic(ConditionalAccessPolicies)
| where tostring(ca.displayName) startswith "CA30"
| where tostring(ca.result) startswith "reportOnly"
| summarize signins = count(), users = dcount(UserPrincipalName)
    by policy = tostring(ca.displayName), result = tostring(ca.result)

// 5. False-positive rate by detection type — the tuning scoreboard
AADUserRiskEvents
| where TimeGenerated > ago(30d)
| summarize total = count(),
    dismissed = countif(RiskState in ("dismissed","confirmedSafe")),
    remediated = countif(RiskState == "remediated"),
    confirmedBad = countif(RiskState == "confirmedCompromised")
  by RiskEventType
| extend fpRate = round(100.0 * dismissed / total, 1)
| sort by fpRate desc

Playbooks and the human-in-the-loop line

Wire a Sentinel automation rule on Identity Protection alerts at High severity to a Logic App that executes the containment runbook (confirm compromised → revoke sessions → force reset → notify the SecOps channel and the user’s manager) using the Graph calls from the investigation section under a managed identity. Draw the automation line deliberately:

Signal	Auto-contain?	Rationale
Leaked credentials + subsequent foreign-IP successful sign-in	Yes	Two independent corroborating signals; false-positive odds negligible
AiTM detection	Yes (at minimum auto-revoke sessions)	Session already live; minutes matter
High user risk, leaked credentials only, no suspicious sign-in	Semi — force Policy B by notifying user to re-auth	Self-remediation path exists; destructive action unneeded
Atypical travel alone, any level	No — human triage	Highest FP class; auto-containment here torches trust in the program
Verified threat actor IP	Yes for revoke; human for reset/disable	High fidelity, but account disable of an exec needs a human eye

Defender-side: Identity Protection alerts and risk signals surface in Microsoft Defender XDR, where they correlate with Defender for Endpoint (device compromise → attemptedPrtAccess), Defender for Office 365 (the phish that started the AiTM chain), and Defender for Identity (the on-prem lateral movement after). The practical habit: investigate identity incidents in the Defender portal when you have the suite, because the incident graph stitches the kill chain, and keep Sentinel as the cross-source hunting and retention layer. If you also run PIM, note the compounding control: PIM role activation can require an authentication context enforced by a CA policy that includes sign-in-risk conditions, so a risky session cannot activate Global Admin even with valid credentials.

Architecture at a glance

Picture the system as a control loop wrapped around every authentication. On the left, the signal plane: every sign-in flows through Entra ID’s token service, where real-time detections (anonymous IP, unfamiliar properties, threat-actor IP, token anomalies) score the request as it happens, while offline detectors — the leaked-credential matchers, the geo-velocity model, the spray-pattern miner, Defender for Cloud Apps’ behavioural analytics, Defender XDR’s AiTM identification — continuously push detections in minutes-to-hours after the fact. All of it lands in two aggregate scores: a per-request sign-in risk and a per-identity user risk.

In the middle sits the decision plane: Conditional Access evaluates signInRiskLevels and userRiskLevels alongside every other condition, with your trusted named locations damping the scoring for controlled egress before policies even fire. Policy CA300 converts Medium+ sign-in risk into a fresh MFA challenge (every-time frequency, no cached claims); policy CA301 converts High user risk into MFA plus a secure password change. Both exclude the break-glass group, and CA300 additionally excludes the access-reviewed service-account group. Continuous Access Evaluation back-propagates High-user-risk events into live sessions on CAE-capable services, so enforcement is not waiting for token expiry.

On the right, the response plane closes the loop twice. The fast loop is self-remediation: the user passes MFA or changes their password, the platform flips riskState to remediated, and no human was involved. The slow loop is your SOC: detections stream through diagnostic settings into Sentinel (AADUserRiskEvents, AADRiskyUsers, SigninLogs) and the alert connector, automation rules trigger containment playbooks over Graph (confirmCompromised, session revocation, forced reset), analysts investigate across the three portal blades, and every dismiss / confirm-safe / confirm-compromised verdict flows back into the model as a label — which changes what the signal plane scores tomorrow. The architecture’s defining property is that you are not a consumer of the risk engine; you are a component of it.

Real-world scenario

Meridian AgriFinance, a 12,000-seat lender headquartered in Pune with field offices across three states, ran Entra ID P2 (bundled in their Microsoft 365 E5 step-up) with Identity Protection in the default posture: both legacy toggles on (sign-in risk Medium+ → MFA, user risk High → password change), no named locations, no Sentinel export, risky-users blade unowned. It “worked” — meaning nobody looked at it.

The first symptom was economic, not security: the helpdesk logged 340 password-reset tickets in one month tagged “system forced reset,” and the CISO asked why. Investigation showed the field-force VPN had been migrated to a cloud SWG whose egress sat in Chennai and Mumbai — so every field agent’s morning looked like Pune→Chennai teleportation. Atypical travel and unfamiliar sign-in properties detections had spiked 9×, pushed dozens of users to High weekly, and the legacy user-risk toggle forced resets. Worse: the identity team had responded by bulk-dismissing ~1,100 detections over two months — poisoning the model with false “benign” labels for exactly the sign-in shape a real attacker relaying through cloud infrastructure would produce.

The rebuild took six weeks along the phases in this article. Week 1–2: MFA registration pushed from 91% to 99.2% (the 8% gap was exactly the population that would have become lockout tickets), SSPR writeback verified end-to-end, both SWG egress ranges plus office/VPN CIDRs became trusted named locations, and the same ranges were tagged Corporate in Defender for Cloud Apps. Week 3–4: legacy toggles off; CA300/CA301 deployed in report-only via Terraform. The report-only data was the eye-opener: would-be MFA challenges fell from a projected ~1,900/week (pre-tuning) to ~140/week, and would-be forced resets from ~85/week to ~6/week. Week 5: enforcement for IT and one field region; week 6: everyone. Sentinel got the diagnostic categories and the FP-rate-by-detection-type query became a Monday dashboard.

Fourteen weeks post-enforcement, the program caught its first real incident: a leaked-credentials detection on a credit analyst (an infostealer on a personal device had harvested a reused password), followed nine hours later by an anonymous-IP sign-in attempt from a hosting ASN, which CA300 challenged and the attacker failed. CA301 forced a secure password change at the analyst’s next morning sign-in; the SOC’s Sentinel correlation query had already flagged the pair, and the analyst confirmed the timeline. Total human effort: one 20-minute investigation and one confirm-compromised label. The numbers that sold the program to the board: reset tickets down from 340/month to 11/month, dismissal rate down from 96% of detections to 7%, self-remediation at 94%, and a measured MTTR for High user risk of 6.5 hours, against the industry’s weeks-long credential-exposure dwell times.

Metric	Before	After (steady state)
Forced-reset helpdesk tickets / month	340	11
Detections dismissed (FP rate proxy)	96% (bulk)	7% (investigated)
MFA challenges from risk / week	~1,900 (projected)	~140
Self-remediation rate	Unmeasured (est. ~30%)	94%
MTTR, High user risk	Days–weeks (unowned)	6.5 hours
Real incidents caught by the loop	Unknown	1 in the first 14 weeks, contained pre-impact

Advantages and disadvantages

Advantages	Disadvantages
Adaptive: challenges scale with evidence, so low-risk users see fewer prompts than blanket MFA — less fatigue, better security	Requires P2 for every user in scope — real money at enterprise seat counts
Backed by Microsoft-scale signal (breach corpora, MSTIC intel, cross-tenant patterns) you could never build in-house	The model is a black box: no published weights, no per-detection sensitivity dials — you tune inputs and labels only
Self-remediation closes credential exposure in hours with zero tickets when the plumbing is right	Self-remediation plumbing (MFA/SSPR/writeback/PHS) is a hard prerequisite; missing pieces convert detections into lockouts
Offline detections (leaked creds, AiTM, inbox rules) catch what no sign-in-time control can see	Offline latency means the triggering session is already live — you must pair with CAE + revocation runbooks
Native CA integration: risk is just another condition, composable with device, location, auth strength	False positives are structural (VPN/cloud egress) until named locations are complete — untuned tenants drown
Full API surface (Graph) + Sentinel tables make it automatable end to end	Portal retention is short (sign-in logs 30 days; ~90 days of risk history) — long-term evidence requires export
Your labels improve the model — a mature SOC compounds value	Your sloppy labels degrade it — bulk dismissal actively harms detection quality
Extends to workload identities (risky service principals)	…at additional Workload ID Premium licensing, and with a smaller detection set

The honest summary: Identity Protection is the highest-leverage account-takeover control in the Microsoft stack if you fund P2, finish the remediation plumbing before enforcement, and staff the label-hygiene loop. Deployed as a checkbox, it delivers noise, lockouts, and a false sense of coverage.

Hands-on lab

Goal: in a non-production tenant, stand up the full loop — named location, both risk policies in report-only, a reproducible risk detection, self-remediation, investigation via Graph, and clean teardown. Cost: zero if you use a trial (Entra ID P2 30-day trial, or a Microsoft 365 developer/E5 trial tenant). You need Global Administrator (lab tenant), a test user with Authenticator registered, and the Tor Browser on a lab machine — the cleanest reproducible anonymous IP trigger there is.

Step 1 — Confirm P2 and baseline state.

az rest --method get --url "https://graph.microsoft.com/v1.0/subscribedSkus" \
  --query "value[].{sku:skuPartNumber, enabled:prepaidUnits.enabled}" -o table
# Expect AAD_PREMIUM_P2 (or an E5 SKU containing it) with enabled > 0

az rest --method get \
  --url "https://graph.microsoft.com/v1.0/identityProtection/riskDetections?\$top=1" -o json
# Expect 200 with a value array (possibly empty) — proves the feed and your permissions

Step 2 — Create the test user and register MFA. Create risk.lab@<tenant>, sign in once normally from your usual network, and register Microsoft Authenticator when combined registration prompts. This sign-in also starts building the user’s “familiar” baseline.

Step 3 — Trusted named location for your lab egress (so your own normal sign-ins stay quiet):

Connect-MgGraph -Scopes "Policy.ReadWrite.ConditionalAccess"
$myEgress = (Invoke-RestMethod "https://api.ipify.org?format=json").ip
New-MgIdentityConditionalAccessNamedLocation -BodyParameter @{
  "@odata.type" = "#microsoft.graph.ipNamedLocation"
  displayName   = "NL-Lab-Trusted"
  isTrusted     = $true
  ipRanges      = @(@{ "@odata.type"="#microsoft.graph.iPv4CidrRange"; cidrAddress = "$myEgress/32" })
}

Step 4 — Deploy CA300 and CA301 in report-only using the two az rest bodies from the policy section (keep enabledForReportingButNotEnforced; put your admin account’s object ID in excludeUsers as the lab’s break-glass stand-in). Verify:

Get-MgIdentityConditionalAccessPolicy -Filter "startsWith(displayName,'CA30')" |
  Select-Object DisplayName, State
# Expected: both policies, state = enabledForReportingButNotEnforced

Step 5 — Generate a real detection. On the lab machine, open Tor Browser → https://portal.office.com → sign in as risk.lab. Expect friction (new device + Tor exit). Complete the sign-in if allowed — report-only will not block. Real-time detections typically surface in 5–10 minutes.

Step 6 — Observe the detection in Graph and the portal.

az rest --method get \
  --url "https://graph.microsoft.com/v1.0/identityProtection/riskDetections?\$filter=userPrincipalName eq 'risk.lab@<tenant>'&\$orderby=detectedDateTime desc" \
  --query "value[].{type:riskEventType, level:riskLevel, state:riskState, timing:detectionTimingType, ip:ipAddress}" -o table
# Expected: anonymizedIPAddress (realtime), likely unfamiliarFeatures too, riskState=atRisk

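Portal: Protection → Identity Protection → Risky sign-ins shows the Tor sign-in with Risk level (real-time); the sign-in log’s Conditional Access tab shows CA300 as Report-only: User action required (reportOnlyInterrupted in the log data), meaning it would have demanded MFA. If the sign-in had already satisfied MFA for another reason, expect Report-only: Success instead. That row is your proof the policy logic works before any user feels it.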

Step 7 — Watch self-remediation. Flip CA300 to enabled (lab tenant only), repeat the Tor sign-in, and complete the MFA challenge. Re-query step 6: the sign-in’s riskState moves to remediated with riskDetail userPassedMFADrivenByRiskBasedPolicy. This is the no-ticket loop in miniature.

Step 8 — Exercise the admin actions.

Connect-MgGraph -Scopes "IdentityRiskyUser.ReadWrite.All"
$u = Get-MgRiskyUser -Filter "userPrincipalName eq 'risk.lab@<tenant>'"

Confirm-MgRiskyUserCompromised -UserIds @($u.Id)      # -> riskLevel high, confirmedCompromised
Get-MgRiskyUserHistory -RiskyUserId $u.Id |
  Select-Object RiskLastUpdatedDateTime, RiskState, RiskDetail | Sort-Object RiskLastUpdatedDateTime
Invoke-MgDismissRiskyUser -UserIds @($u.Id)           # verified-clean close-out -> none/dismissed

The history output shows the whole state machine you just drove: atRisk → remediated → confirmedCompromised → dismissed.
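
The same ordering can be reproduced offline from exported history rows. The sketch below assumes each row is a dictionary with riskLastUpdatedDateTime and riskState keys and that timestamps are second-precision UTC strings; real Graph output may carry fractional seconds, which this parser does not handle. The function name and sample rows are hypothetical.

```python
from datetime import datetime


def state_sequence(history):
    """Order history rows by time and collapse consecutive repeats of a state."""
    ordered = sorted(
        history,
        key=lambda row: datetime.strptime(
            row["riskLastUpdatedDateTime"], "%Y-%m-%dT%H:%M:%SZ"
        ),
    )
    states = []
    for row in ordered:
        if not states or states[-1] != row["riskState"]:
            states.append(row["riskState"])
    return states


# Hypothetical rows, deliberately out of order.
rows = [
    {"riskLastUpdatedDateTime": "2024-05-01T09:30:00Z", "riskState": "confirmedCompromised"},
    {"riskLastUpdatedDateTime": "2024-05-01T08:00:00Z", "riskState": "atRisk"},
    {"riskLastUpdatedDateTime": "2024-05-01T09:45:00Z", "riskState": "dismissed"},
    {"riskLastUpdatedDateTime": "2024-05-01T08:20:00Z", "riskState": "remediated"},
]

print(" -> ".join(state_sequence(rows)))
```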

Step 9 — Teardown.

Get-MgIdentityConditionalAccessPolicy -Filter "startsWith(displayName,'CA30')" |
  ForEach-Object { Remove-MgIdentityConditionalAccessPolicy -ConditionalAccessPolicyId $_.Id }
Get-MgIdentityConditionalAccessNamedLocation -Filter "displayName eq 'NL-Lab-Trusted'" |
  ForEach-Object { Remove-MgIdentityConditionalAccessNamedLocation -NamedLocationId $_.Id }
Remove-MgUser -UserId "risk.lab@<tenant>"

Step	What it proved
3	`isTrusted` named locations are code, not clicks
4	Risk policies deploy as Graph JSON in report-only
5–6	Detections are reproducible and queryable; report-only logs the would-be outcome
7	MFA driven by the risk policy remediates sign-in risk automatically
8	The full risk state machine responds to Graph admin actions

Common mistakes & troubleshooting

The playbook — symptom → root cause → confirm → fix. Bookmark this table; the prose above explains, this operates:

#	Symptom	Root cause	Confirm (exact path/cmd)	Fix
1	Users forced to change passwords, then locked out of on-prem resources	Hybrid tenant without password writeback — cloud reset never reaches AD	`Get-ADSyncAADPasswordResetConfiguration` on the Connect server; test SSPR end-to-end	Enable writeback in Entra Connect; or the on-prem-reset-clears-risk setting with PHS
2	Zero leaked-credential detections ever, in a large hybrid tenant	PHS disabled (PTA/federation only) — no hash to match against corpora	Entra Connect → sync features; `Get-MgDirectoryOnPremiseSynchronization` Features	Enable PHS (at least as backup auth)
3	Constant atypical-travel/unfamiliar-properties noise from your own workforce	Corporate/VPN/SWG egress not in trusted named locations	Tuning-discovery KQL (dismissed-risk by IP/ASN, ≥3 users)	Add ranges with `isTrusted=true`; tag Corporate in MDCA too
4	Named locations added but FP volume unchanged	`isTrusted` left false — labelling without scoring effect	`Get-MgIdentityConditionalAccessNamedLocation` — check `IsTrusted`	Set `isTrusted = $true` on the corporate ranges
5	MDCA-sourced detections (`mcasImpossibleTravel`, `newCountry`) persist after Entra tuning	Defender for Cloud Apps has its own IP-range store	MDCA → Settings → IP address ranges — corporate ranges missing	Tag the same ranges as Corporate in MDCA
6	User remediated but still trips the user-risk policy on every sign-in	Left at `confirmedCompromised`/High — nobody closed the loop	`Get-MgRiskyUser -Filter "riskState eq 'confirmedCompromised'"`	Verify clean, then `Invoke-MgDismissRiskyUser`
7	Risky user cannot self-remediate; helpdesk call every time	No MFA method registered — cannot satisfy either policy	`userRegistrationDetails` filter `isMfaCapable eq false`	Registration campaign before enforcement; temp access pass for stragglers
8	High-risk sign-ins succeeding with no challenge	CA policy scope hole: user/app excluded, or `notApplied`	Coverage-gap KQL (#1 in hunting pack); sign-in log CA tab	Close the scope gap; check exclusion-group membership sprawl
9	`passwordChange` policy fails to create via Graph	Control constraints violated: no MFA pair, OR operator, or app scope not All	Graph 400 response detail	`["mfa","passwordChange"]`, operator AND, All resources, no app exclusions
10	Break-glass account challenged for MFA on risky sign-in	Emergency accounts not excluded from the new risk policies	Exclusion-audit PowerShell (policy section)	Exclude the break-glass group from every CA policy; test quarterly
11	Sentinel `AADUserRiskEvents` / `AADRiskyUsers` empty	Diagnostic settings missing the risk log categories	`az rest --method get --url "https://management.azure.com/providers/microsoft.aadiam/diagnosticSettings?api-version=2017-04-01-preview"`	Add `UserRiskEvents`, `RiskyUsers` (+ SP variants) categories
12	Detections visible yesterday, gone today; auditors unhappy	Portal retention limits (sign-in logs 30 days; risk history ~90)	Compare portal vs workspace row counts	Export to Log Analytics/Sentinel; set workspace retention ≥ audit requirement
13	Every detection says “Additional risk detected”	Viewing tenant lacks P2 (or trial expired)	`subscribedSkus` for AAD_PREMIUM_P2	License P2; detail backfills for the retained window
14	Service accounts perpetually at risk, breaking jobs when challenged	User-shaped automation identities in scope of risk policies	Risky-users blade filtered to the svc-naming pattern	Migrate to managed identities/service principals; tightly-scoped exclusion group for the rest
15	Users report MFA prompts they didn’t initiate; no High risk raised	MFA-fatigue attack in progress below detection thresholds	`SigninLogs` ResultType 500121 (denied/timeout MFA) clustered per user	Number matching (default now), report-suspicious-activity education; consider confirm-compromised on the account
16	Legacy Identity Protection policy still on alongside CA versions	Migration never finished — double prompts, muddy impact data	Identity Protection blade → both legacy policies show configured	Disable legacy toggles after CA equivalents prove out in report-only
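
Row 15’s confirmation step (denied or timed-out MFA clustered per user) can be prototyped outside Sentinel. This is a minimal Python sketch, assuming sign-in events have been exported as (user, timestamp, ResultType) tuples. The threshold of 5 failures inside 10 minutes is an arbitrary illustrative choice, not a Microsoft detection threshold, and fatigue_suspects is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timedelta

MFA_FAILED = "500121"  # ResultType cited in the table for failed or denied MFA


def fatigue_suspects(events, threshold=5, window=timedelta(minutes=10)):
    """Map each user to their largest burst of MFA failures if it meets the threshold."""
    failures = defaultdict(list)
    for user, when, result_type in events:
        if result_type == MFA_FAILED:
            failures[user].append(when)

    suspects = {}
    for user, times in failures.items():
        times.sort()
        start = 0
        largest = 0
        # Sliding window over the sorted timestamps.
        for end in range(len(times)):
            while times[end] - times[start] > window:
                start += 1
            largest = max(largest, end - start + 1)
        if largest >= threshold:
            suspects[user] = largest
    return suspects


# Hypothetical events: one burst, one slow trickle.
base = datetime(2024, 5, 6, 6, 0, 0)
events = [("cfo@example.com", base + timedelta(minutes=i), MFA_FAILED) for i in range(5)]
events += [("agent@example.com", base + timedelta(hours=i), MFA_FAILED) for i in range(5)]
events.append(("cfo@example.com", base + timedelta(minutes=6), "0"))

print(fatigue_suspects(events))
```

The slow trickle is ignored on purpose: spaced-out failures look like ordinary user error, while a tight burst is the shape of push bombing.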

Best practices

Plumbing before policy. MFA registration >95%, SSPR verified, writeback (or on-prem-reset setting) tested end-to-end, PHS on — then enforce. Every gap is a future lockout ticket.
Break-glass exclusions first, verified quarterly. Two cloud-only emergency accounts, phishing-resistant methods, excluded from all CA policies, sign-in alerting on them wired before you enable anything risk-based.
CA policies, not legacy toggles, deployed as code (Graph/Terraform) through the same review pipeline as the rest of your CA estate.
Sign-in risk Medium+ → MFA with every-time frequency; user risk High-only → MFA + password change. Resist Medium user-risk enforcement outside high-security enclaves.
Report-only for 2–4 weeks before every change — initial rollout, threshold moves, scope expansions. Re-baseline after any egress architecture change.
Trust every corporate egress range, and only corporate egress. Maintain the named-location inventory as code; mirror it into MDCA IP tags; review on every network change ticket.
Prefer phishing-resistant authentication strength at High sign-in risk over plain MFA once FIDO2/WHfB coverage allows — it is the control that actually defeats AiTM.
Label with discipline. Confirm safe only verified false positives, confirm compromised only verified incidents, dismiss only after investigation, never in bulk. Audit dismissals weekly by analyst.
Close the confirmed-compromised loop: contain → verify → dismiss. A permanent High-risk user is a process failure, not a security posture.
Export everything to Sentinel/Log Analytics on day one — the portal’s retention is an investigation window, not an evidence store.
Run the KPIs weekly: self-remediation rate (>90% target), FP rate by detection type (falling), MTTR for High user risk (<1 business day), high-risk-success-with-notApplied count (zero).
Treat risky workload identities as the next frontier: license Workload ID Premium for your service-principal-heavy estate and put a block policy on high-risk service principals — they have no MFA to fall back on.
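
The weekly KPIs named above reduce to simple ratios once the risk tables are exported. The Python sketch below assumes closed risk events expose their riskDetail value and detections expose (riskEventType, riskState). Treating dismissed and confirmedSafe as the false-positive proxy follows this article’s convention, and all function names and sample rows are hypothetical.

```python
from collections import Counter

# riskDetail values that indicate the user cleared their own risk.
SELF_REMEDIATED = {
    "userPassedMFADrivenByRiskBasedPolicy",
    "userPerformedSecuredPasswordChange",
}
# riskState values used here as a false-positive proxy.
FP_STATES = {"dismissed", "confirmedSafe"}


def self_remediation_rate(closed_risk_details):
    """Share of closed risk events that the user remediated without an admin."""
    if not closed_risk_details:
        return 0.0
    hits = sum(1 for detail in closed_risk_details if detail in SELF_REMEDIATED)
    return hits / len(closed_risk_details)


def fp_rate_by_type(detections):
    """False-positive proxy per detection type from (type, state) pairs."""
    totals, flagged = Counter(), Counter()
    for event_type, state in detections:
        totals[event_type] += 1
        if state in FP_STATES:
            flagged[event_type] += 1
    return {event_type: flagged[event_type] / totals[event_type] for event_type in totals}


def mean_hours_to_remediate(durations_hours):
    """Mean time to remediate, given per-user durations in hours."""
    if not durations_hours:
        return 0.0
    return sum(durations_hours) / len(durations_hours)


closed = [
    "userPassedMFADrivenByRiskBasedPolicy",
    "userPerformedSecuredPasswordChange",
    "adminDismissedAllRiskForUser",
    "userPassedMFADrivenByRiskBasedPolicy",
]
print(self_remediation_rate(closed))
```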

Security notes

Least privilege for the program itself. Investigators get Security Operator (act on risk) or Security Reader (view); policy authors get Conditional Access Administrator; nobody needs Global Admin for daily operations. The Graph automation identity gets exactly IdentityRiskEvent.Read.All (+ IdentityRiskyUser.ReadWrite.All only if the playbook remediates) with a federated credential or certificate — a leaked client secret on an app that can dismiss risk is an attacker’s dream primitive.
Protect the levers. confirmSafe/dismiss can neutralize the control loop: an attacker with a foothold in your SOC tooling can allowlist their own sign-ins. Gate those Graph permissions tightly, alert on anomalous dismissal volume, and require PIM activation for the roles that hold them.
Risk data is sensitive. Detections embed IPs, geolocation, travel patterns, and breach-derived facts about employees. Scope workspace RBAC on the AAD* tables, mind data-residency when routing to a SIEM, and loop in privacy/works-council stakeholders where jurisdiction requires it.
Exclusion groups are attack surface. Anyone who can add members to a risk-policy exclusion group can grant risk immunity. Make those groups role-assignable (blocks non-privileged owners), review membership in your access-reviews program, and alert on membership changes.
Pair risk policies with token-theft controls. Risk detection is reactive to AiTM; phishing-resistant auth strengths, CAE strict enforcement where viable, and device-bound credentials shrink the attack class rather than just detecting it.
Never weaken to “fix” noise. Each link in the anti-pattern chain (exclude the noisy department, trust the ISP’s whole range, drop to High-only permanently) converts a tuning task into a standing bypass. Tune inputs (locations, MDCA tags) and labels, not coverage.
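
The “alert on anomalous dismissal volume” control can start as a plain count per analyst per day. The Python sketch below assumes audit rows have been exported as dictionaries with analyst, day, and verdict keys. Those field names, the verdict strings, and the limit of 20 per day are all hypothetical and should be replaced with whatever your audit export and baseline actually provide.

```python
from collections import Counter

# Verdicts that clear risk and could be abused to silence the loop.
CLEARING_VERDICTS = {"dismiss", "confirmSafe"}


def flag_bulk_dismissers(actions, daily_limit=20):
    """Return sorted (analyst, day) pairs whose clearing verdicts exceed the limit."""
    counts = Counter(
        (action["analyst"], action["day"])
        for action in actions
        if action["verdict"] in CLEARING_VERDICTS
    )
    return sorted(key for key, total in counts.items() if total > daily_limit)


# Hypothetical audit rows.
actions = [{"analyst": "alice", "day": "2024-05-06", "verdict": "dismiss"} for _ in range(25)]
actions += [{"analyst": "bob", "day": "2024-05-06", "verdict": "confirmSafe"} for _ in range(3)]
actions += [{"analyst": "bob", "day": "2024-05-06", "verdict": "confirmCompromised"} for _ in range(30)]

print(flag_bulk_dismissers(actions))
```

Confirm-compromised verdicts are deliberately excluded from the count, since they raise risk and cannot be used to grant an attacker immunity.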

Cost & sizing

What actually drives the bill:

Entra ID P2 is the license gate: list around USD 9 / ₹750–820 per user per month standalone, but nearly everyone lands it via Microsoft 365 E5 or the E5 Security add-on — if you already own E5, the marginal cost of this entire program is engineering time. Scope reality: P2 must cover every user the risk policies apply to, not just admins; under-licensing enforcement scope is a compliance finding waiting to happen.
Workload ID Premium (risky service principals + CA for workload identities) prices per workload identity per month (order of USD 3 / ₹250) — size it to your service-principal estate, not your headcount.
Sentinel/Log Analytics ingestion is the operational cost: analytics-tier ingestion runs roughly USD 4–5 / ₹350–450 per GB. SigninLogs dominates volume — as a planning number, ~1 KB per interactive sign-in event; a 10,000-user org commonly lands in the 1–4 GB/day range for sign-in + audit + risk categories combined (₹12,000–55,000/month before commitment-tier discounts). The risk tables themselves (AADUserRiskEvents, AADRiskyUsers) are tiny — never economize by dropping them; economize with commitment tiers and by routing NonInteractiveUserSignInLogs to cheaper handling if hunting needs allow.
Helpdesk economics is where the program pays: at a conservative ₹400–800 fully-loaded cost per password-reset ticket, the Meridian-style drop (340 → 11 tickets/month, 329 avoided) recovers roughly ₹1.3–2.6 lakh/month, before valuing a single prevented compromise.

Cost line	Driver	Rough figure (INR/month)	Levers
Entra ID P2	Per user in policy scope	₹750–820/user (list; bundled in E5)	E5 bundling; scope honestly
Workload ID Premium	Per service principal protected	~₹250/workload	Protect the crown-jewel SPs first
Log ingestion	GB/day of SigninLogs + audit + risk	₹12k–55k @ 10k users	Commitment tiers; table-level routing
Log retention beyond the interactive period	GB retained past included window	Low (risk tables are small)	Archive tier for compliance years
Playbook/Logic Apps executions	Per-run consumption	Negligible (< ₹1k)	—
Helpdesk offset	Tickets avoided	−₹1.3–2.6 lakh at 12k seats	The line that funds the program

Sizing guidance: pilot the ingestion for two weeks before committing to a tier; risk-table volume scales with detection count (tiny), sign-in volume scales with headcount × workload mix (Office-heavy orgs sign in more). Prices move — treat the numbers as ratios and check the Azure pricing calculator and your EA rates.
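
The arithmetic behind the helpdesk and ingestion lines is worth keeping as a tiny model so it can be rerun with your own rates. The Python sketch below uses the planning figures quoted in this section (340 → 11 tickets, ₹400–800 per ticket, 1–4 GB/day at ₹350–450 per GB) and assumes a 30-day month. The function names are hypothetical.

```python
def helpdesk_offset(tickets_before, tickets_after, cost_low, cost_high):
    """Monthly savings range from avoided tickets, as (low, high) in rupees."""
    avoided = tickets_before - tickets_after
    return avoided * cost_low, avoided * cost_high


def ingestion_cost(gb_per_day, rate_per_gb, days=30):
    """Monthly analytics-tier ingestion cost in rupees, before commitment discounts."""
    return gb_per_day * days * rate_per_gb


low, high = helpdesk_offset(340, 11, 400, 800)
print(f"Helpdesk offset: Rs {low:,} to Rs {high:,} per month")

print(f"Ingestion, low end:  Rs {ingestion_cost(1, 350):,} per month")
print(f"Ingestion, high end: Rs {ingestion_cost(4, 450):,} per month")
```

With these inputs the helpdesk offset works out to ₹131,600–263,200 per month, and ingestion to roughly ₹10,500–54,000 per month.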

Interview & exam questions

1. Distinguish sign-in risk from user risk and give the flagship detection for each. Sign-in risk is the per-authentication probability that the request wasn’t made by the account owner — flagship: anonymous IP or unfamiliar sign-in properties, evaluated in real time. User risk is the per-identity probability of compromise, accumulated across signals — flagship: leaked credentials, where Microsoft matched the user’s credential pair in a breach corpus. They are enforced by different policies with different remediations: MFA for sign-in risk, secure password change for user risk.

2. Why can a leaked-credentials detection never block the sign-in that used the leaked password? Because it is an offline detection: Microsoft discovers the leak in external corpora after the fact, minutes to hours later. The compensating design is the user-risk Conditional Access policy forcing secure password change at the next authentication, plus Continuous Access Evaluation revoking live sessions on CAE-capable services when user risk goes High, plus a revocation runbook for everything else.

3. Why build risk enforcement as Conditional Access policies instead of the built-in Identity Protection policies? The legacy toggles are deprecated and lack everything operational: no report-only mode, no granular scoping or exclusions (break-glass!), no session controls like every-time sign-in frequency, no authentication strengths, one policy per risk type, and poor policy-as-code support. CA risk conditions provide all of it and consolidate enforcement in one engine.

4. What are the platform constraints on the passwordChange grant control? It must be combined with mfa using operator AND (identity proof before rotation — otherwise the attacker holding the leaked password could rotate it), the policy must target all resources with no app exclusions, and it cannot be mixed with other grant controls. Users also need a working change path — SSPR registration and, for hybrid, password writeback.

5. Walk through the correct close-out of a confirmed compromise. Confirm compromised (sets High/confirmedCompromised, feeds a true-positive label, fires the CAE critical event) → revoke sessions/refresh tokens → force credential rotation → blast-radius sweep (inbox rules, consents, devices, role changes) → after verified remediation, dismiss the user risk so they return to none. Skipping the final dismiss leaves them re-tripping the user-risk policy indefinitely; skipping revocation leaves stolen tokens alive.

6. A user’s sign-in shows riskLevelDuringSignIn: low but riskLevelAggregated: high. What happened and why does it matter? Real-time scoring at the moment was low, but offline detections (e.g., the IP was later attributed to a threat actor, or MDCA analytics flagged the session) landed afterwards and raised the final score. It matters because gating decisions made at sign-in time used the low value — so investigation must cover what the now-high-risk session accessed, and CAE/user-risk policy become the enforcement path.

7. How does AiTM phishing defeat classic MFA, how is it detected, and what prevents it? An AiTM kit proxies the real login page, relays credentials and the MFA exchange, and steals the resulting session cookie — the user “passed MFA” for the attacker’s session. Detection is the offline attackerinTheMiddle risk event (Defender/MSTIC-sourced), plus anomalousToken on replay. Prevention is phishing-resistant, origin-bound authentication (FIDO2/passkeys/WHfB) — a relay cannot forward a credential cryptographically bound to the legitimate origin — which is why “require authentication strength” at High sign-in risk beats plain MFA.

8. Hybrid tenant, PTA only, no PHS. What Identity Protection coverage is missing? Leaked-credentials detection for synced users — the match runs against the password hash Microsoft stores, which PTA-only tenants never sync. Also, the on-premises-password-change-resets-risk path is unavailable. Enabling PHS (even as backup) restores both.

9. What is dismissal hygiene and why does bulk dismissal actively harm the tenant? Admin verdicts (dismiss, confirm safe, confirm compromised) are labels consumed by the risk model. Bulk-dismissing noise teaches the model those patterns are benign — including the pattern a real attacker will use — and desensitizes analysts. Correct hygiene: tune the structural cause (named locations), then label only investigated outcomes.

10. How do trusted named locations reduce false positives, and what’s the classic misconfiguration? Ranges marked isTrusted = true feed the risk engine’s familiarity model, damping atypical-travel/unfamiliar-properties scoring for controlled corporate egress (offices, VPN, SWG). The classic miss: creating the named location but leaving isTrusted false (labelling with no scoring effect) — and forgetting that MDCA-sourced detections need the same ranges tagged Corporate inside Defender for Cloud Apps.

11. Which Graph permissions and endpoints would a SOC integration use to pull detections and remediate, and how should it authenticate? GET /identityProtection/riskDetections and /riskyUsers under IdentityRiskEvent.Read.All / IdentityRiskyUser.Read.All; remediation actions POST /riskyUsers/confirmCompromised and /dismiss under IdentityRiskyUser.ReadWrite.All. Application permissions with admin consent, authenticated by federated credential or certificate — never a client secret, since these permissions can silence the control loop.

12. Which Sentinel tables carry Identity Protection data and what belongs in each? Via Entra diagnostic settings: AADUserRiskEvents (individual detections), AADRiskyUsers (risk-state snapshots), plus AADServicePrincipalRiskEvents/AADRiskyServicePrincipals for workload identities; SigninLogs carries per-sign-in risk fields (RiskLevelDuringSignIn, RiskState, RiskEventTypes_v2, CA results). The alert connector adds curated SecurityAlert rows for incident creation.
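
The passwordChange constraints from question 4 lend themselves to a pre-flight check in a policy-as-code pipeline. This is a minimal Python sketch, assuming the policy body is a dictionary shaped like the Graph conditionalAccessPolicy resource (grantControls.builtInControls, grantControls.operator, conditions.applications). The function name and messages are hypothetical, and the check covers only the constraints this article lists.

```python
def password_change_violations(policy):
    """Return a list of constraint violations for a policy using passwordChange."""
    grant = policy.get("grantControls") or {}
    apps = (policy.get("conditions") or {}).get("applications") or {}
    controls = set(grant.get("builtInControls") or [])

    # Policies that do not use passwordChange are out of scope for this check.
    if "passwordChange" not in controls:
        return []

    problems = []
    if controls != {"mfa", "passwordChange"}:
        problems.append("passwordChange must be paired with mfa and no other grant control")
    if grant.get("operator") != "AND":
        problems.append("grant operator must be AND")
    if apps.get("includeApplications") != ["All"]:
        problems.append("policy must target all resources")
    if apps.get("excludeApplications"):
        problems.append("policy must not exclude any application")
    return problems


good = {
    "grantControls": {"operator": "AND", "builtInControls": ["mfa", "passwordChange"]},
    "conditions": {"applications": {"includeApplications": ["All"], "excludeApplications": []}},
}
bad = {
    "grantControls": {"operator": "OR", "builtInControls": ["passwordChange"]},
    "conditions": {"applications": {"includeApplications": ["All"], "excludeApplications": ["some-app-id"]}},
}

print(password_change_violations(good))
for problem in password_change_violations(bad):
    print("-", problem)
```

Running a check like this in CI catches the same class of failure before Graph does.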

These map directly to SC-300 (implement and manage Identity Protection; plan and implement risk policies), AZ-500 (identity and access security controls), SC-200 (investigate identity alerts in Sentinel/Defender), and conceptually to SC-900.

Question theme	Primary cert	Objective area
Risk model, detections, risk policies	SC-300	Plan and implement Entra Identity Protection
Risk-based CA design, auth strengths	SC-300 / AZ-500	Plan and implement Conditional Access
Investigation, Sentinel KQL, playbooks	SC-200	Mitigate identity threats; hunt with KQL
Hybrid mechanics (PHS, writeback)	SC-300	Implement hybrid identity
AiTM, token theft, phishing resistance	SC-300 / SC-200	Attack mitigation and authentication methods

Quick check

1. A leaked-credentials detection fires at 14:00 for a user who signed in successfully at 13:45. Could a sign-in-risk policy have blocked that 13:45 sign-in? What control handles this case?
2. Your user-risk policy requires password change, and hybrid users who comply immediately lock out of file shares. What’s broken?
3. An analyst wants to clear 60 atypical-travel detections from the sales team’s new SD-WAN egress in one action. What should happen instead?
4. Which two admin actions feed the model a label, and what does each do to the user’s risk level?
5. Sign-in risk Medium+ requires MFA in your tenant. Why add “sign-in frequency: every time” to that policy?

Answers

1. No. Leaked credentials is an offline, user-level detection; it cannot gate the sign-in that revealed it. The handling controls are the user-risk policy (secure password change at next sign-in), CAE (High-user-risk critical event revoking live CAE-capable sessions), and the containment runbook (revocation) for everything else.
2. Password writeback is disabled (or broken): the cloud password change remediates risk but never reaches on-prem AD, so Kerberos/NTLM resources still expect the old password. Verify with Get-ADSyncAADPasswordResetConfiguration and an end-to-end test; alternatively enable the on-prem-password-change-resets-risk path with PHS.
3. Fix the structural cause first: add the SD-WAN egress ranges as trusted named locations (isTrusted = true) and tag them Corporate in Defender for Cloud Apps. Then close the existing detections deliberately, confirming safe or dismissing only what was investigated. Never bulk-dismiss, because those verdicts are training labels.
4. Confirm compromised forces user risk to High (confirmedCompromised) and is a true-positive label. Confirm safe / dismiss clears risk to none (confirmedSafe/dismissed) and is a false-positive/benign label. Both shape future scoring tenant-wide.
5. Without it, a risky sign-in can be satisfied by a cached MFA claim from an earlier session, with no fresh proof of presence at the moment of risk. frequencyInterval: everyTime forces re-authentication for each policy evaluation, so the challenge actually tests the human behind the risky request.
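
The structural fix in the third answer depends on knowing which noisy source IPs are not yet covered by a trusted range. A minimal Python sketch of that discovery step follows, using the standard-library ipaddress module and the same “three or more users” cut-off as the tuning-discovery query mentioned in the troubleshooting table. The input shape, function names, and sample addresses (documentation ranges) are hypothetical.

```python
import ipaddress
from collections import defaultdict


def is_trusted(ip, trusted_cidrs):
    """True when the address falls inside any of the trusted CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(
        address in ipaddress.ip_network(cidr, strict=False) for cidr in trusted_cidrs
    )


def tuning_candidates(dismissed, trusted_cidrs, min_users=3):
    """Untrusted source IPs whose dismissed detections span at least min_users users."""
    users_by_ip = defaultdict(set)
    for user, ip in dismissed:
        if not is_trusted(ip, trusted_cidrs):
            users_by_ip[ip].add(user)
    return sorted(ip for ip, users in users_by_ip.items() if len(users) >= min_users)


trusted = ["203.0.113.0/24"]
dismissed = [
    ("a@example.com", "198.51.100.7"),
    ("b@example.com", "198.51.100.7"),
    ("c@example.com", "198.51.100.7"),
    ("d@example.com", "203.0.113.20"),
    ("e@example.com", "192.0.2.9"),
]

print(tuning_candidates(dismissed, trusted))
```

An address surfaced this way is only a candidate: confirm it is egress you control before marking it trusted, since trusting a range you do not own creates the standing bypass the security notes warn about.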

Glossary

Identity Protection — Entra ID’s ML risk engine producing sign-in risk and user risk from Microsoft-scale signals; sensor to Conditional Access’s enforcer.
Sign-in risk — probability a specific authentication wasn’t performed by the account owner; conditions signInRiskLevels in CA.
User risk — probability an identity is compromised, accumulated across signals; conditions userRiskLevels in CA.
Real-time detection — scored during authentication (anonymous IP, unfamiliar properties, threat-actor IP); can gate that sign-in; reporting latency 5–10 minutes.
Offline detection — scored post-authentication (leaked credentials, atypical travel, spray, AiTM, MDCA-sourced); affects subsequent evaluations only.
Risk state (riskState) — lifecycle value: none, atRisk, remediated, dismissed, confirmedSafe, confirmedCompromised.
Risk detail (riskDetail) — why the state last changed, e.g. userPerformedSecuredPasswordChange, adminGeneratedTemporaryPassword.
Leaked credentials — offline user-risk detection matching the user’s credential pair in breach/paste/dark-web corpora; requires PHS for hybrid users.
AiTM (attacker-in-the-middle) — phishing proxy that relays real credentials and MFA to steal the session cookie; detected offline as attackerinTheMiddle; defeated by origin-bound (phishing-resistant) authentication.
Password spray — low-and-slow campaign trying few passwords across many accounts; offline detection passwordSpray.
Anomalous token — token with abnormal lifetime/replay characteristics — the token-theft tripwire; respond with revocation.
Secure password change — MFA-then-password-change flow driven by the user-risk policy; self-remediates user risk (userPerformedSecuredPasswordChange).
Self-remediation — user clearing their own risk via risk-policy MFA or secure password change; the program’s economic core.
Trusted named location — IP ranges with isTrusted = true feeding the risk engine’s familiarity model; the structural false-positive fix.
CAE (Continuous Access Evaluation) — near-real-time session revocation on critical events, including elevation to High user risk, for CAE-capable services.
Report-only — CA state (enabledForReportingButNotEnforced) that logs would-be outcomes without enforcement; the mandatory soak before every risk-policy change.
Workload identity risk — risky-service-principal detection and policy, licensed via Microsoft Entra Workload ID Premium.
AADUserRiskEvents / AADRiskyUsers — the Log Analytics/Sentinel tables carrying detections and risk-state snapshots via Entra diagnostic settings.

Next steps

You now run risk as a control loop — detections in, enforcement and labels out. Build outward:

Next: Designing Conditional Access at Scale: A Persona-Based Policy Framework with Authentication Context and Filters — slot CA300/CA301 into a full persona architecture with naming, What-If change management, and policy-as-code CI/CD.
Related: Engineering Break-Glass Emergency Access Accounts in Entra ID — the exclusion you must engineer before any risk policy goes enforced.
Related: Rolling Out FIDO2 Passwordless Authentication in Entra ID — the phishing-resistant endgame that shrinks the attack classes Identity Protection spends its time detecting.
Related: Microsoft Entra Connect Sync Deep Dive: PHS, PTA, and Seamless SSO — the hybrid plumbing (PHS, writeback) that gates leaked-credential detection and self-remediation.
Related: Troubleshooting Entra MFA Registration and Sign-In Method Failures — fixing the registration gaps that turn risk policies into lockout machines.
Related: KQL Threat Hunting with MITRE ATT&CK, UEBA and Notebooks — grow the five-query hunting pack into a full identity-threat-hunting practice.