Defender XDR Advanced Hunting: Custom Detection Rules and Automatic Attack Disruption

A SIEM is only as good as the questions you ask it. Defender XDR’s advantage over a raw log lake is that endpoint, identity, email, and cloud-app telemetry are already normalized into one schema and pre-correlated into incidents. Advanced hunting is where you turn that schema into detection engineering: write a cross-domain KQL query that follows an attacker across the kill chain, promote it to a scheduled custom detection that takes its own response actions, and let automatic attack disruption contain the worst-case scenarios faster than any human SOC can. This guide is the workflow I run with platform and security teams to get from “interesting query” to “rule that isolates a host at 02:00 without paging anyone.”

Everything here assumes Defender for Endpoint Plan 2 (included in Microsoft 365 E5 / E5 Security). Advanced hunting and custom detections do not exist on Plan 1.

1. The unified advanced hunting schema

The single biggest reason to hunt in Defender XDR rather than in bare Sentinel is that the workloads share a schema. You pivot from a process, to the user who ran it, to the email that delivered the payload, to the SaaS app they then signed in to, without stitching together disparate log sources. The tables you will live in:

Domain	Core tables	Backed by
Device	`DeviceProcessEvents`, `DeviceNetworkEvents`, `DeviceFileEvents`, `DeviceLogonEvents`, `DeviceRegistryEvents`	Defender for Endpoint
Identity	`IdentityLogonEvents`, `IdentityDirectoryEvents`, `IdentityInfo`, `IdentityQueryEvents`	Defender for Identity
Email	`EmailEvents`, `EmailAttachmentInfo`, `EmailUrlInfo`, `EmailPostDeliveryEvents`, `UrlClickEvents`	Defender for Office 365
Cloud apps	`CloudAppEvents`, `OAuthAppInfo`	Defender for Cloud Apps
Correlation	`AlertInfo`, `AlertEvidence`	All workloads (joined on AlertId)

IdentityLogonEvents is the one people misread: it carries both on-prem AD auth (from Defender for Identity sensors) and sign-ins to Microsoft online services seen via Defender for Cloud Apps. The AlertEvidence table is the connective tissue. It lists every file, IP, URL, user, device, and mailbox that a Defender alert touched, keyed by AlertId, which lets you start from a known-bad alert and fan out to everything related.
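The fan-out is a one-hop graph walk over shared entities. The Python sketch below models it offline on a few hypothetical, simplified AlertEvidence rows written as (AlertId, EntityType, Value) tuples. The real table spreads values across typed columns such as AccountUpn, DeviceId, RemoteIP, and SHA1, so treat the row shape and the sample IDs as illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical, simplified AlertEvidence rows: (AlertId, EntityType, Value).
EVIDENCE = [
    ("da001", "User", "alice@contoso.com"),
    ("da001", "Ip", "203.0.113.7"),
    ("da002", "Ip", "203.0.113.7"),
    ("da002", "Machine", "wks-042"),
    ("da003", "Machine", "wks-042"),
    ("da004", "File", "0123abcd"),
]


def related_alerts(seed_alert: str, rows=EVIDENCE) -> set[str]:
    """Return alerts sharing at least one entity with seed_alert (one hop)."""
    alerts_by_entity = defaultdict(set)
    entities_by_alert = defaultdict(set)
    for alert_id, entity_type, value in rows:
        key = (entity_type, value.lower())  # compare entity values case-insensitively
        alerts_by_entity[key].add(alert_id)
        entities_by_alert[alert_id].add(key)

    related = set()
    for entity in entities_by_alert.get(seed_alert, set()):
        related |= alerts_by_entity[entity]
    related.discard(seed_alert)  # the seed is not "related" to itself
    return related


if __name__ == "__main__":
    print(sorted(related_alerts("da001")))  # ['da002']: shares the IP
```

Calling related_alerts on its own output gives the second hop (da002 leads to da003 through the shared machine), the same expansion you would do in KQL with a second join against AlertEvidence.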

Open the hunting console at security.microsoft.com -> Hunting -> Advanced hunting. A first orientation query — what’s actually flowing, and at what volume:

union withsource=TableName DeviceProcessEvents, IdentityLogonEvents, EmailEvents, CloudAppEvents
| where Timestamp > ago(1h)
| summarize Events = count() by TableName
| sort by Events desc

Those per-table counts tell you which tables are cheap to scan and which need tight filters. Of these four, CloudAppEvents and DeviceProcessEvents are usually the highest-volume, and they are the tables that will time out a query if you are sloppy.

2. Writing cross-domain correlation queries

A single-table query is a search. A detection follows the kill chain. The pattern that delivers the most value: deliver (email) -> execute (device) -> persist/move (identity). Here is a hunt for a phishing-delivered payload that actually ran, correlating EmailEvents to DeviceProcessEvents through the recipient’s account.

let lookback = 3d;
// Stage 1: malicious or junked inbound mail that actually reached a mailbox
let suspiciousMail =
    EmailEvents
    | where Timestamp > ago(lookback)
    | where EmailDirection == "Inbound"
    | where DeliveryAction in ("Delivered", "Junked")     // blocked mail never reached the user
    | where ThreatTypes has_any ("Malware", "Phish") or DeliveryAction == "Junked"
    | project MailTime = Timestamp, AccountUpn = tolower(RecipientEmailAddress),
              RecipientEmailAddress, SenderFromAddress, NetworkMessageId, Subject;
// Stage 2: Office apps spawning script engines, filtered before the join
let officeChildProcs =
    DeviceProcessEvents
    | where Timestamp > ago(lookback)
    | where InitiatingProcessFileName in~ ("winword.exe", "excel.exe", "outlook.exe")
    | where FileName in~ ("powershell.exe", "cmd.exe", "wscript.exe", "mshta.exe", "rundll32.exe")
    | project Timestamp, ReportId, DeviceId, DeviceName, AccountObjectId,
              AccountUpn = tolower(AccountUpn), FileName, ProcessCommandLine,
              InitiatingProcessFileName;
// Smaller, more selective mail set on the left; match on the lower-cased UPN
suspiciousMail
| join kind=inner officeChildProcs on AccountUpn
| where Timestamp between (MailTime .. (MailTime + 1h))
| project Timestamp, ReportId, DeviceId, DeviceName, AccountObjectId, AccountUpn,
          RecipientEmailAddress, SenderFromAddress, NetworkMessageId, Subject, MailTime,
          InitiatingProcessFileName, FileName, ProcessCommandLine
| sort by Timestamp desc

The signal is the temporal join: an Office app spawning a scripting engine within an hour of a flagged email to the same user. Neither half is conclusive alone; together they are a high-fidelity “the user opened the lure and it executed.”
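To make the join semantics concrete, here is a small offline model of the same correlation in Python. Mail and process records are matched on a lower-cased account (this assumes the UPN and the SMTP address coincide apart from case), and a pair survives only when the process starts inside the one-hour window after delivery. The record types and sample values are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(hours=1)  # same window as `MailTime .. (MailTime + 1h)`


@dataclass
class Mail:
    mail_time: datetime
    recipient: str
    sender: str


@dataclass
class Proc:
    proc_time: datetime
    account_upn: str
    device: str
    parent: str  # InitiatingProcessFileName
    child: str   # FileName


def correlate(mails: list[Mail], procs: list[Proc]) -> list[tuple[Mail, Proc]]:
    """Inner join on the lower-cased account, then keep pairs inside the window."""
    by_recipient: dict[str, list[Mail]] = {}
    for m in mails:  # index the (smaller) mail side
        by_recipient.setdefault(m.recipient.lower(), []).append(m)
    hits = []
    for p in procs:
        for m in by_recipient.get(p.account_upn.lower(), []):
            # KQL's `between` is inclusive at both ends
            if m.mail_time <= p.proc_time <= m.mail_time + WINDOW:
                hits.append((m, p))
    return hits


if __name__ == "__main__":
    t0 = datetime(2024, 5, 1, 9, 0)
    mails = [Mail(t0, "Alice@contoso.com", "lure@example.net")]
    procs = [
        Proc(t0 + timedelta(minutes=12), "alice@contoso.com", "wks-042", "winword.exe", "powershell.exe"),
        Proc(t0 + timedelta(hours=3), "alice@contoso.com", "wks-042", "excel.exe", "cmd.exe"),
    ]
    for mail, proc in correlate(mails, procs):
        print(proc.device, proc.parent, "->", proc.child, "after", proc.proc_time - mail.mail_time)
```

Only the PowerShell launch twelve minutes after delivery survives; the Excel child three hours later is dropped by the window, which is exactly the fidelity the temporal join buys.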

A second example flags impossible-travel-style credential abuse from the identity tables alone; the accounts it surfaces are the ones you then correlate with on-host activity:

let window = 1d;
IdentityLogonEvents
| where Timestamp > ago(window)
| where ActionType == "LogonSuccess"
| where LogonType == "Interactive" or Protocol == "OAuth2"
| where isnotempty(IPAddress) and isnotempty(AccountUpn) and isnotempty(Location)
| summarize Locations = dcount(Location), LocationSet = make_set(Location, 10),
            IPs = make_set(IPAddress, 10), arg_min(Timestamp, Location)
          by AccountUpn, Hour = bin(Timestamp, 1h)   // named bin stays distinct from arg_min's Timestamp
| where Locations >= 2
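The aggregation is easier to reason about once you see what a bucket is. This Python sketch models it on (timestamp, account, location, IP) tuples with invented values. One design consequence worth knowing: fixed one-hour bins miss two sign-ins that straddle an hour boundary (say 10:50 and 11:10), and the query shares that blind spot.

```python
from collections import defaultdict
from datetime import datetime


def multi_location_hours(events, min_locations: int = 2) -> dict:
    """events: iterable of (timestamp, account_upn, location, ip_address).

    Models `summarize dcount(Location) ... by AccountUpn, bin(Timestamp, 1h)`
    followed by the `>= 2` threshold. Returns {(account, hour_start): sorted
    locations}. KQL's dcount() is an estimate; this counts exactly.
    """
    buckets = defaultdict(set)
    for ts, upn, location, ip in events:
        if not (upn and ip and location):  # the isnotempty() filters
            continue
        hour = ts.replace(minute=0, second=0, microsecond=0)  # bin(Timestamp, 1h)
        buckets[(upn.lower(), hour)].add(location)
    return {
        key: sorted(locs)
        for key, locs in buckets.items()
        if len(locs) >= min_locations
    }


if __name__ == "__main__":
    sample = [
        (datetime(2024, 5, 1, 10, 5), "bob@contoso.com", "Seattle, US", "198.51.100.1"),
        (datetime(2024, 5, 1, 10, 40), "bob@contoso.com", "Lagos, NG", "203.0.113.9"),
        (datetime(2024, 5, 1, 11, 1), "bob@contoso.com", "Seattle, US", "198.51.100.1"),
    ]
    print(multi_location_hours(sample))
```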

When you find something, do not rebuild context by hand. Select a row and use Go hunt to pivot every entity (account, device, IP) into a fresh query, or Link to incident to attach the result rows as evidence on an existing incident.

3. Tuning queries for performance

Custom detections inherit the performance of the query behind them, and a rule that times out simply does not run. Three rules of thumb that keep hunts fast and within quota.

Filter on Timestamp first, and keep the window honest. The query's time filter should stay within the lookback implied by the rule's frequency (an hourly rule evaluates the past 4 hours of data), not reach back 7 days: results outside the lookback are ignored anyway, so rescanning a week every hour only burns quota.

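Filter before you join, and put the smaller, more selective table on the left, as Microsoft's advanced hunting query best practices recommend. Reduce both sides with where and project before joining so you are not shuffling wide rows, and state kind= explicitly: the default flavor, innerunique, keeps only one left-side row per join key.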

Aggregate with summarize and cap your sets. make_set() and make_list() accept a max-size argument — use it. Unbounded sets on high-cardinality columns blow up memory.

DeviceNetworkEvents
| where Timestamp > ago(1h)                                  // 1. time filter first
| where RemotePort in (443, 8443)                            // 2. cheap filters next
| where isnotempty(RemoteUrl)
| summarize ConnCount = count(),
            Hosts = make_set(DeviceName, 50)                 // 3. bounded set
          by RemoteUrl, bin(Timestamp, 10m)
| where ConnCount > 100                                      // 4. threshold last
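Here is an offline Python model of that query, mainly to show what the bounded set does: each (URL, 10-minute bucket) keeps at most 50 distinct host names no matter how many devices connect. The tuple layout is an assumption. Note also that which hosts survive the cap is arbitrary in KQL, whereas this sketch simply keeps the first ones it sees.

```python
from collections import defaultdict
from datetime import datetime


def busy_urls(events, threshold: int = 100, max_hosts: int = 50) -> dict:
    """events: iterable of (timestamp, device_name, remote_url, remote_port).

    Cheap filters first, then 10-minute buckets per URL with a bounded host
    set (like make_set(DeviceName, 50)), and the count threshold applied last.
    """
    counts = defaultdict(int)
    hosts = defaultdict(list)
    for ts, device, url, port in events:
        if port not in (443, 8443) or not url:  # cheap filters
            continue
        bucket = ts.replace(minute=ts.minute - ts.minute % 10, second=0, microsecond=0)
        key = (url, bucket)
        counts[key] += 1
        if device not in hosts[key] and len(hosts[key]) < max_hosts:  # bounded set
            hosts[key].append(device)
    return {k: (counts[k], hosts[k]) for k in counts if counts[k] > threshold}


if __name__ == "__main__":
    t = datetime(2024, 5, 1, 10, 3)
    burst = [(t, f"wks-{i % 60}", "cdn.example.com", 443) for i in range(150)]
    for (url, bucket), (count, host_list) in busy_urls(burst).items():
        print(url, bucket, count, len(host_list))
```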

Two hard limits to design around. A custom detection only ever surfaces the first 150 results per run, so a noisy query silently truncates — aggregate or threshold until each run returns well under that. And the rule’s lookback cannot exceed 30 days. Validate timing with summarize count() by bin(Timestamp, 1h) to confirm your window actually contains the events you expect before you schedule anything.
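Both checks can be scripted against a dry run. The Python sketch below assumes you have exported the result row count and the source-event timestamps for one lookback window. hourly_counts reproduces summarize count() by bin(Timestamp, 1h) but also reports empty hours, which summarize omits entirely (make-series is the KQL way to get those zeros). The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

MAX_RESULTS_PER_RUN = 150  # per-run ceiling described above


def hourly_counts(timestamps, start: datetime, hours: int) -> list[int]:
    """Event count per hour from `start`, including the empty hours that
    `summarize count() by bin(Timestamp, 1h)` would leave out."""
    start = start.replace(minute=0, second=0, microsecond=0)
    counts = Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)
    return [counts[start + timedelta(hours=h)] for h in range(hours)]


def preflight(result_rows: int, timestamps, start: datetime, hours: int) -> list[str]:
    """Return problems found; an empty list means the rule looks safe to schedule."""
    problems = []
    if result_rows >= MAX_RESULTS_PER_RUN:
        # at the ceiling you can no longer tell whether rows were dropped
        problems.append(f"{result_rows} rows: at or over the {MAX_RESULTS_PER_RUN}-row ceiling")
    empty = [h for h, n in enumerate(hourly_counts(timestamps, start, hours)) if n == 0]
    if empty:
        problems.append(f"no source events in hour offsets {empty}")
    return problems


if __name__ == "__main__":
    start = datetime(2024, 5, 1, 8, 0)
    seen = [start + timedelta(minutes=5), start + timedelta(hours=2, minutes=1)]
    print(preflight(12, seen, start, 3))
```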

4. Promoting hunts to custom detection rules

Once a query is reliable and quiet, promote it. From the advanced hunting editor, click Create detection rule. Two non-negotiable requirements:

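The query must return Timestamp plus the columns that uniquely identify the source event (Timestamp, DeviceId, and ReportId from the same event for Defender for Endpoint Device* tables; Timestamp and ReportId for most other tables), and at least one strong identifier for an impacted asset, such as DeviceId, AccountObjectId, AccountSid, AccountUpn, or RecipientEmailAddress. Add SHA1 if you want file actions. A query that summarizes ReportId away cannot be promoted.
The query must reference at least one Defender XDR table for automated response actions to be available. Queries over Sentinel-only data can alert but cannot isolate a device.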
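You can lint a query's output columns before clicking Create detection rule. This Python sketch encodes the column rules as I read them in Microsoft's custom detection documentation: Timestamp always; DeviceId and ReportId from the same event for Defender for Endpoint Device* tables; ReportId for most other tables; and at least one strong identifier for an impacted asset. Alert* and other special-case tables are not modeled, the identifier list may drift, and the table-prefix heuristic is an assumption, so check the current docs before relying on it.

```python
# Strong impacted-asset identifiers (as listed in the docs at time of writing).
STRONG_IDENTIFIERS = {
    "DeviceId", "DeviceName", "RemoteDeviceName",
    "RecipientEmailAddress", "SenderFromAddress", "SenderMailFromAddress",
    "RecipientObjectId", "AccountObjectId", "AccountSid", "AccountUpn",
    "InitiatingProcessAccountSid", "InitiatingProcessAccountUpn",
    "InitiatingProcessAccountObjectId",
}


def missing_requirements(source_table: str, columns: set[str]) -> list[str]:
    """Return unmet column requirements for a custom detection (simplified)."""
    required = {"Timestamp", "ReportId"}
    if source_table.startswith("Device"):  # assumption: Device* == Defender for Endpoint
        required.add("DeviceId")
    missing = [f"missing required column {col}" for col in sorted(required - columns)]
    if not columns & STRONG_IDENTIFIERS:
        missing.append("no strong identifier for an impacted asset")
    return missing


if __name__ == "__main__":
    # A summarized query that kept only IPAddress and aggregates fails both checks.
    print(missing_requirements("IdentityLogonEvents",
                               {"Timestamp", "IPAddress", "FailedAccounts", "Accounts"}))
```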

Frequency options and what they cost:

Frequency	Lookback used	Use for
Continuous (NRT)	events as they are collected and processed	Highest-severity, single-table detections
Every hour	past 4 hours	Most production rules
Every 3 hours	past 12 hours	Low-urgency hunts
Every 12 hours	past 48 hours	Broad-sweep hunts
Every 24 hours	past 30 days	Broad-sweep hunts

Near-real-time rules have constraints (one supported table, so no joins or unions), so reserve them for tight single-table logic. Then wire response actions; this is the payoff. For our phishing-execution rule, isolate the device and collect forensics:

Create detection rule
  Alert:    Title: "Office app spawned script engine after flagged email"
            Severity: High   Category: Execution
            MITRE techniques: T1566 (Phishing), T1059 (Command and Scripting Interpreter)
  Impacted entities:
            Device:  DeviceId
            Mailbox: RecipientEmailAddress
            User:    AccountObjectId
  Actions on devices:
            [x] Isolate device (Full)
            [x] Collect investigation package
            [x] Run antivirus scan
  Actions on users:
            [x] Mark user as compromised   (sets the user's Entra ID risk level to High)
  Frequency: Every hour

Available response actions, by entity:

Device: isolate (full or selective), collect investigation package, run AV scan, restrict app execution.
File: quarantine / block file (by SHA1) across the fleet.
User: mark as compromised (raises the user's Entra ID risk level), disable user (with Defender for Identity), force password reset.
Email: move to junk, delete (soft/hard), via the NetworkMessageId + RecipientEmailAddress pair.
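Because each action binds to particular result columns, a rule can carry an action that never has anything to act on. The Python sketch below flags that case. The column bindings reflect my reading of the custom detection documentation (device actions on DeviceId; mark user as compromised on AccountObjectId, InitiatingProcessAccountObjectId, or RecipientObjectId; file quarantine on SHA1-style columns; email actions on NetworkMessageId plus RecipientEmailAddress), and the action keys are shorthand labels, not API values.

```python
# Assumed column bindings per action. Each tuple is one acceptable way to
# satisfy the action; every column in that tuple must be projected.
ACTION_TARGETS = {
    "isolate_device": [("DeviceId",)],
    "collect_investigation_package": [("DeviceId",)],
    "run_av_scan": [("DeviceId",)],
    "mark_user_compromised": [
        ("AccountObjectId",), ("InitiatingProcessAccountObjectId",), ("RecipientObjectId",),
    ],
    "quarantine_file": [
        ("SHA1",), ("InitiatingProcessSHA1",), ("SHA256",), ("InitiatingProcessSHA256",),
    ],
    "move_email_to_junk": [("NetworkMessageId", "RecipientEmailAddress")],
}


def untargeted_actions(actions: list[str], columns: set[str]) -> list[str]:
    """Return configured actions that no projected column combination can feed."""
    orphans = []
    for action in actions:
        options = ACTION_TARGETS[action]  # KeyError means an unmodeled action
        if not any(set(option) <= columns for option in options):
            orphans.append(action)
    return orphans


if __name__ == "__main__":
    projected = {"Timestamp", "ReportId", "DeviceId", "AccountUpn", "RecipientEmailAddress"}
    print(untargeted_actions(["isolate_device", "mark_user_compromised"], projected))
```

For the phishing-execution rule, a query that projects AccountUpn but not AccountObjectId would leave the user action with no target.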

Start every new rule with no automated actions, severity Informational, running for a week. Read the alert volume, tune the false positives out, then attach Isolate. A rule that auto-isolates on a bad assumption is an outage you wrote yourself.

5. Configuring automatic attack disruption

Custom detections are your logic. Automatic attack disruption is Microsoft’s — a built-in capability that correlates millions of signals across endpoint, identity, email, and SaaS into a single high-confidence incident, identifies the assets the attacker controls, and contains them in real time, independent of your AIR settings. It targets the scenarios where minutes matter: human-operated ransomware, business email compromise (BEC), and adversary-in-the-middle (AiTM).

You do not write the detections; you enable the org-wide response surface and let them fire. The two automated actions it takes:

Device containment: Defender for Endpoint blocks communication between a compromised device and your onboarded devices, choking lateral movement even when the compromised device itself is unmanaged.
Disable user: Defender for Identity suspends a compromised account in Entra ID / AD to stop lateral movement, mailbox abuse, and further sign-ins. This is the BEC kill switch.

Prerequisites that actually gate it: Defender for Endpoint devices in block mode with automated investigation enabled, Defender for Identity deployed with the action account configured, and the relevant workloads (Office 365, Cloud Apps) connected. Confirm and tune in Settings -> Microsoft Defender XDR -> Automatic attack disruption, where you can scope automated response exclusions for sensitive assets (a domain controller you never want auto-contained, a service account you cannot afford to disable).

Disrupted incidents are labelled so the SOC can see the machine acted:

Incident title: BEC financial fraud attack launched from a compromised account (attack disruption)
Status:         Active
Tags:           Attack disruption
Actions taken:  User <upn> disabled (Defender for Identity)
                Device <hostname> contained (Defender for Endpoint)

When an action lands, the analyst’s job is to validate and release (undo containment / re-enable the user) once remediated — the platform does not auto-rollback.

6. Managing AIR and approval levels

Automated investigation and remediation (AIR) is the layer between “alert raised” and “human triages.” When an alert fires, AIR launches an investigation, walks the related entities, reaches verdicts (Malicious / Suspicious / No threat), and proposes remediation. Whether those remediations execute automatically depends on the device group’s automation level, set in Settings -> Endpoints -> Device groups.

Automation level	Behavior
Full	Remediate automatically, no approval; recommended by Microsoft
Semi (require approval for all folders)	Every remediation waits for an analyst
Semi (core folders)	Auto-remediate non-core; approve actions in OS folders
Semi (non-temp folders)	Auto-remediate in temporary folders such as Downloads; approve actions elsewhere
No automated response	Automated investigation does not run, so no remediation is taken or proposed
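The approval behavior in the table reduces to a small decision function. This is a simplified Python sketch: the core-folder test only checks for a path directly under \Windows, which stands in for Microsoft's actual list of OS folders, and the enum names are shorthand for the levels above.

```python
from enum import Enum
from pathlib import PureWindowsPath


class Level(Enum):
    FULL = "Full"
    SEMI_ALL = "Semi (all folders)"
    SEMI_CORE = "Semi (core folders)"
    NONE = "No automated response"


def is_core_folder(path: str) -> bool:
    # Assumption: anything under <drive>:\Windows counts as an OS core folder.
    parts = [p.lower() for p in PureWindowsPath(path).parts]
    return len(parts) > 1 and parts[1] == "windows"


def remediation_mode(level: Level, path: str) -> str:
    """Return 'auto', 'approval', or 'manual' for a remediation on `path`."""
    if level is Level.FULL:
        return "auto"
    if level is Level.SEMI_ALL:
        return "approval"
    if level is Level.SEMI_CORE:
        return "approval" if is_core_folder(path) else "auto"
    return "manual"  # no automated remediation; an analyst acts by hand


if __name__ == "__main__":
    for lvl in Level:
        print(lvl.value, remediation_mode(lvl, r"C:\Windows\System32\evil.dll"))
```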

The mature posture is Full automation on standard workstation groups (Microsoft's recommended level) and Semi (core folders) on servers and Tier-0 device groups, where you want eyes on anything touching system32. Track and approve pending actions in the Action center (security.microsoft.com -> Actions & submissions -> Action center), which is also where you bulk-undo if a custom detection or AIR over-reaches.

7. A reusable hunting library mapped to MITRE ATT&CK

Detection engineering scales only if hunts are version-controlled, tagged to technique, and reviewable — not pasted into the portal and forgotten. Keep them in Git as .kql files with a metadata header, and map every one to ATT&CK so you can reason about coverage instead of counting rules.

detections/
  T1110.003-password-spray-identitylogon.kql
  T1566.001-phish-attachment-exec.kql
  T1486-ransomware-mass-file-rename.kql
  T1528-oauth-consent-grant-abuse.kql

A lightweight, machine-readable header on each file:

// name: Password spray against on-prem AD
// mitre: T1110.003
// tactic: CredentialAccess
// severity: Medium
// frequency: 1h
// entities: AccountSid, IPAddress
// version: 3
IdentityLogonEvents
| where Timestamp > ago(1h)
| where ActionType == "LogonFailed"
| summarize FailedAccounts = dcount(AccountUpn),
            Accounts = make_set(AccountUpn, 25),
            arg_max(Timestamp, ReportId, AccountSid)   // latest failure supplies the alert's event keys
          by IPAddress
| where FailedAccounts >= 10        // one source, many accounts == spray
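Within a single one-hour window, the spray logic reduces to counting distinct accounts per source IP. The Python model below shows why it separates spraying from brute force: one IP failing against a dozen accounts trips it, while one IP hammering a single account does not. The tuple layout is an assumption; KQL's dcount() is an estimate, whereas this sketch counts exactly.

```python
from collections import defaultdict


def spray_sources(failed_logons, min_accounts: int = 10, sample: int = 25) -> dict:
    """failed_logons: iterable of (ip_address, account_upn) for LogonFailed events.

    Models `summarize dcount(AccountUpn), make_set(AccountUpn, 25) by IPAddress`
    followed by `where FailedAccounts >= 10`.
    """
    accounts_by_ip = defaultdict(set)
    for ip, upn in failed_logons:
        accounts_by_ip[ip].add(upn.lower())
    return {
        ip: (len(accts), sorted(accts)[:sample])
        for ip, accts in accounts_by_ip.items()
        if len(accts) >= min_accounts
    }


if __name__ == "__main__":
    spray = [("198.51.100.5", f"user{i}@contoso.com") for i in range(12)]
    brute = [("203.0.113.8", "alice@contoso.com")] * 30
    for ip, (count, _) in spray_sources(spray + brute).items():
        print(ip, count)
```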

Sync the library with the Defender XDR custom detection API (under the microsoft.graph.security endpoints) so a CI pipeline is the source of truth and the portal is just the runtime — review in PRs, deploy on merge.

Verify

Confirm each layer is actually live before you trust it.

Hunt runs and scopes correctly. Run the query in the console; confirm it returns rows and each run is < 150 results and < 10s.

EmailEvents
| where Timestamp > ago(1h)
| summarize EventCount = count() by bin(Timestamp, 10m)

The custom detection exists and last-ran cleanly. Settings -> Detection rules (or Hunting -> Custom detection rules) shows your rule with a recent Last run time and Succeeded status, not “Failed”.
Response actions are attached. Open the rule -> Actions tab and confirm the device/user actions you intended are listed.
Attack disruption is enabled. Settings -> Microsoft Defender XDR -> Automatic attack disruption shows the capability on, and your exclusions (DCs, critical service accounts) are present.
AIR automation level is what you think. Settings -> Endpoints -> Device groups — confirm prod workstation groups are Full and Tier-0 groups are Semi (core folders).
Action center reflects reality. Actions & submissions -> Action center -> History lists recent automated actions; nothing is stuck Pending that should have auto-approved.

Enterprise scenario

A retail platform team running Microsoft 365 E5 across ~14,000 endpoints turned on a custom detection for AiTM session-cookie reuse — sign-in from a new ASN immediately followed by a high-value mailbox rule creation — and wired it to Disable user with Full automation on every device group. The detection was sound. The blast radius was not: on the second night it fired on a shared finance service mailbox during a legitimate quarter-close batch run from a new datacenter egress IP, disabled the account, and broke an automated payments reconciliation job feeding SAP. The on-call SOC analyst could see the disrupted incident but did not have rights to re-enable the identity, so recovery waited on an identity admin escalation — about 40 minutes of failed jobs.

The constraint: they needed aggressive auto-response for real users but could not let it touch a small set of Tier-0 service identities or shared mailboxes. The fix was two-layered. First, they scoped the custom detection to exclude service accounts by filtering on an Entra group, so the rule never raised an actionable alert for those identities. Second, they added those same accounts to the automated response exclusions under automatic attack disruption, so even Microsoft’s built-in BEC disruption would not disable them — defense in depth against both their logic and Microsoft’s.

// Exclude Tier-0 / service identities by Entra group membership before alerting
let excluded = IdentityInfo
    | where Timestamp > ago(1d)
    | where GroupMembership has "SG-NoAutoContain"     // managed Entra group
    | distinct AccountUpn;
IdentityLogonEvents
| where Timestamp > ago(1h)
| where Protocol == "OAuth2" and isnotempty(IPAddress)
| where AccountUpn !in~ (excluded)                     // never auto-disable these
| summarize ISPs = dcount(ISP), arg_min(Timestamp, IPAddress, ReportId, AccountObjectId)
          by AccountUpn, Hour = bin(Timestamp, 1h)
| where ISPs >= 2                                     // ISP change as a proxy for a new ASN
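The ordering is the important part of the first layer: excluded identities are removed before the condition is evaluated, so they can never produce an alert and therefore never reach the Disable user action. The Python sketch below models that with invented account names and ISP labels; in production the exclusion set comes from the Entra group lookup in the query above.

```python
from collections import defaultdict
from datetime import datetime


def accounts_to_alert(events, excluded: set[str], min_isps: int = 2) -> set[str]:
    """events: iterable of (timestamp, account_upn, isp).

    Drops excluded identities first (the `!in~` filter), then flags accounts
    that used at least `min_isps` distinct ISPs within the same clock hour.
    """
    excluded_lc = {a.lower() for a in excluded}  # !in~ is case-insensitive
    isps = defaultdict(set)
    for ts, upn, isp in events:
        if upn.lower() in excluded_lc:
            continue
        hour = ts.replace(minute=0, second=0, microsecond=0)  # bin(Timestamp, 1h)
        isps[(upn.lower(), hour)].add(isp)
    return {upn for (upn, _), seen in isps.items() if len(seen) >= min_isps}


if __name__ == "__main__":
    t = datetime(2024, 3, 29, 2, 0)
    night = [
        (t, "finance-svc@contoso.com", "Contoso DC East"),
        (t.replace(minute=20), "finance-svc@contoso.com", "New DC Egress"),
        (t, "alice@contoso.com", "Home ISP"),
        (t.replace(minute=30), "alice@contoso.com", "VPS Hosting Ltd"),
    ]
    print(accounts_to_alert(night, {"Finance-SVC@contoso.com"}))
```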

The lesson the team baked into their standard: automated response and exclusion lists are the same design decision. Every rule that can disable a user or isolate a device ships with an explicit, Entra-group-driven exclusion of Tier-0 assets, reviewed in the same PR as the detection logic. Aggressive automation is safe only when its boundaries are as deliberate as its triggers.

Written by Vinod
