KQL Detection Engineering - Reference & Challenges

A working reference for hunting, parsing, correlation, and time-series detection in Microsoft Sentinel / ADX, with 20 graded challenges.

Audience: SOC / DFIR · Scope: Sysmon, SecurityEvent, DeviceEvents, custom tables · Light-mode, print-ready, single file.

1. Pipeline model & flow control

KQL is a left-to-right pipeline: each tabular operator consumes the row-set produced by the previous one. Order matters for both correctness and cost - operators that reduce row count or column width should run first.

let, materialize, toscalar

let binds an expression, table, function, or scalar. materialize() caches a tabular subquery so it isn't re-executed each time it's referenced. toscalar() converts a 1×1 result into a scalar usable in where and other operators.

let Lookback = ago(7d);
let AdminUsers = dynamic(["svc-backup", "administrator", "da-bryce"]);
let Recent4625 = materialize(
    SecurityEvent
    | where TimeGenerated > Lookback
    | where EventID == 4625
);
let TotalFails = toscalar(Recent4625 | count);
Recent4625
| summarize n = count() by Account
| extend share = round(100.0 * n / TotalFails, 2)
| order by n desc

Pitfall Without materialize(), a let-bound subquery referenced multiple times is recomputed each time. With it, the result is held in memory once. Use it for any reused subquery, but watch memory: the cached set must fit.

Functions (let-bound)

// A reusable parameterized function
let FailedLogonsByUser = (lookback:timespan, user:string) {
    SecurityEvent
    | where TimeGenerated > ago(lookback)
    | where EventID == 4625
    | where Account =~ user
    | project TimeGenerated, Computer, Account, IpAddress, FailureReason, LogonType
};
FailedLogonsByUser(24h, "CORP\\bryce")

vs. Microsoft docs Microsoft's reference shows let only as a name binding. In practice, treat your let block as a small standard library at the top of every saved hunt - function definitions, named constants (lookback windows, suppression lists), and materialized subqueries - so the body of the query is the actual logic.

2. Filtering & shaping

String predicates, ranked by cost (cheapest first)

Operator	Indexed?	When to reach for it	Example
`==` / `=~`	Yes (exact)	Known full value; `=~` for case-insensitive	`EventID == 4624`
`has` / `has_cs`	Yes (term)	Whole-term match, anywhere in field	`CommandLine has "rundll32"`
`has_any` / `has_all`	Yes	Multi-term OR / AND	`CommandLine has_any (lolbins)`
`startswith` / `endswith`	Partial	Prefix / suffix	`FileName endswith ".dll"`
`contains`	No (scan)	Substring inside a term	`Image contains "syswow"`
`matches regex`	No (scan)	Pattern match - last resort	`cmd matches regex @"\\b[a-f0-9]{32}\\b"`
`in` / `in~` / `!in`	Yes	Membership against literal or dynamic set	`EventID in (4624,4625,4634)`

Rule of thumb has beats contains by an order of magnitude on indexed columns. If you're tempted to write contains "powershell", try has "powershell" first and only fall back to contains when the term is glued to other characters ("powershell.exe" still works with has because it's tokenized; "poweshellxx" would not).

The project family

DeviceProcessEvents
| project        TimeGenerated, DeviceName, ProcessCommandLine, AccountName     // keep only these
| project-keep   "Time*", DeviceName                                            // keep matching wildcards
| project-away   ProcessId, InitiatingProcessId                                 // drop these
| project-rename Host = DeviceName, User = AccountName                         // rename in place
| project-reorder TimeGenerated, Host, User, ProcessCommandLine                // reorder, keep rest

extend - derive columns

SecurityEvent
| where EventID == 4688
| extend
    HourOfDay     = datetime_part("hour", TimeGenerated),
    IsAfterHours  = iff(datetime_part("hour", TimeGenerated) !between (7 .. 19), true, false),
    ParentLeaf    = tostring(split(ParentProcessName, "\\")[-1]),
    ChildLeaf     = tostring(split(NewProcessName,    "\\")[-1]),
    UnusualParent = iff(ChildLeaf =~ "powershell.exe" and ParentLeaf in~ ("winword.exe","excel.exe","outlook.exe"), true, false)
| where UnusualParent

distinct vs. summarize-by

Use distinct when you want unique combinations and nothing else. Use summarize ... by when you also need an aggregate. distinct is implemented as a shuffle and is generally fine, but on huge cardinality it can be expensive - consider summarize hint.shufflekey=Key by Key.

DeviceProcessEvents | distinct DeviceName, AccountName

// equivalent in row count, but you can also keep an aggregate cheaply:
DeviceProcessEvents | summarize firstSeen = min(TimeGenerated) by DeviceName, AccountName

3. Parsing & type coercion

parse - declarative pattern parsing

// Sysmon EID 1 (Process Create) is often shipped as a single Message blob
Event
| where Source == "Microsoft-Windows-Sysmon" and EventID == 1
| parse EventData with * "<Data Name='UtcTime'>" UtcTimeRaw "</Data>"
                          * "<Data Name='Image'>"   Image      "</Data>"
                          * "<Data Name='CommandLine'>" CommandLine "</Data>"
                          * "<Data Name='ParentImage'>"  ParentImage "</Data>"
                          *
| extend UtcTime = todatetime(UtcTimeRaw)
| project UtcTime, Computer, Image, ParentImage, CommandLine

Tip The * in a parse pattern is "skip everything up to the next literal." Use parse-where to discard rows that don't match instead of returning empty strings, and parse kind=regex for regex captures. Use parse-kv for key=value, key=value blobs (firewall logs, Linux audit, etc.).

parse-kv - when log lines are key=value soup

Syslog
| where ProcessName == "sshd"
| parse-kv SyslogMessage as (user:string, rhost:string, port:int) with (pair_delimiter=" ", kv_delimiter="=")
| project TimeGenerated, Computer, user, rhost, port

extract / extract_all - regex captures

DeviceNetworkEvents
| where RemoteUrl matches regex @"https?://[^/]+"
| extend Host = extract(@"https?://([^/:]+)", 1, RemoteUrl)
| extend AllIPs = extract_all(@"\b(?:\d{1,3}\.){3}\d{1,3}\b", RemoteUrl)
| project TimeGenerated, DeviceName, RemoteUrl, Host, AllIPs

parse_json & dynamic paths

DeviceEvents
| where ActionType == "AsrLsassCredentialTheftAudited"
| extend Add = parse_json(AdditionalFields)
| extend
    TargetImage  = tostring(Add.TargetImageFileName),
    SourceCmd    = tostring(Add.SourceCommandLine),
    Hashes       = tostring(Add.SHA256)
| project TimeGenerated, DeviceName, TargetImage, SourceCmd, Hashes

Gotcha Dynamic columns are loosely typed. tostring, tolong, todatetime on JSON-extracted values is not optional - without it, downstream where clauses will silently fail to match because you're comparing string vs dynamic.

split, strcat, replace_string, replace_regex

| extend PathParts  = split(FolderPath, "\\")
| extend Drive      = tostring(PathParts[0])
| extend FileName   = tostring(PathParts[-1])
| extend Cleaned    = replace_regex(CommandLine, @"\s+", " ")
| extend Anonymized = replace_string(Cleaned, "CORP\\", "REDACTED\\")
| extend Joined     = strcat(DeviceName, "::", Image, "::", AccountName)

Type coercion

From → To	Function	Returns null on bad input?
any → datetime	`todatetime()`	Yes
any → timespan	`totimespan()`	Yes
string → int/long/double	`toint() / tolong() / todouble()`	Yes
string → bool	`tobool()`	Yes
any → string	`tostring()`	No (returns "")
any → dynamic	`todynamic() / parse_json()`	Yes
hex string → long	`tolong(strcat("0x", h))` trick	If malformed

4. Time, bins & date math

The time toolkit

// Common building blocks
| extend Bucket5m    = bin(TimeGenerated, 5m)              // floor to 5-minute boundary
| extend AlignedHour = bin_at(TimeGenerated, 1h, datetime(2026-01-01 00:00))  // align to a chosen origin
| extend DayStart    = startofday(TimeGenerated)
| extend DayOfWeek   = dayofweek(TimeGenerated)         // 0d == Sunday, 6d == Saturday
| extend AgeMinutes  = 1.0 * (now() - TimeGenerated) / 1m
| extend Pretty      = format_datetime(TimeGenerated, "yyyy-MM-dd HH:mm:ss.fff")

Sliding windows with bin

// Per-host login attempts in 1-minute buckets, last 24h
SecurityEvent
| where TimeGenerated > ago(24h) and EventID in (4624,4625)
| summarize
    Success = countif(EventID == 4624),
    Failed  = countif(EventID == 4625)
  by Computer, bin(TimeGenerated, 1m)
| order by Computer asc, TimeGenerated asc

Format tokens that bite

Token	Meaning	Token	Meaning
`yyyy`	4-digit year	`HH`	24-hour hour
`MM`	2-digit month	`hh`	12-hour hour
`dd`	day-of-month	`mm`	minute
`fff`	milliseconds	`ss`	second
`ddd`	abbrev. weekday	`tt`	AM/PM

Pitfall Sysmon's UtcTime field is a string, not a datetime. Always wrap it: todatetime(UtcTime). Worse, on some collectors the value is rendered with a trailing space - trim_end() first.

5. Aggregation & `summarize`

Core aggregators

Aggregator	Use	Notes
`count()` / `countif(pred)`	Row counts, conditional counts	`countif` avoids splitting into multiple summarize passes
`dcount()` / `dcountif()`	Approx distinct count (HLL)	4th param sets accuracy 0..4 (3 ≈ 0.8% error, default 1)
`count_distinct()`	Exact distinct	More expensive; prefer `dcount` at scale
`arg_max(x, *)` / `arg_min`	Row with max/min of x; `*` keeps all cols	Replaces `top 1 by` in grouped queries
`make_list(x [, n])`	Ordered list (preserves input order in serialized streams)	Optional cap; combine with `serialize` for guaranteed order
`make_set(x [, n])`	Deduplicated unordered set	Cheaper than `make_list` for cardinality questions
`make_bag(d [, n])`	Merge dynamic property-bags	Pair with `bag_unpack`
`percentile(x, p)` / `percentiles(x, p1, p2, ...)`	Latency analysis, anomaly thresholds	Use `percentiles_array` when feeding to series functions
`hll(x)` / `tdigest(x)`	Composable distinct counts / quantiles	Aggregate per shard, then `hll_merge` / `tdigest_merge`

arg_max - the operator you'll use weekly

// Latest known IP per device (and any other column you want to keep)
DeviceNetworkInfo
| summarize arg_max(TimeGenerated, *) by DeviceName
| project DeviceName, IPAddresses, MacAddress, NetworkAdapterName, TimeGenerated

make_list with order - the serialize trick

// Per-host process tree in order of appearance
DeviceProcessEvents
| where TimeGenerated > ago(1h)
| order by DeviceName asc, TimeGenerated asc
| serialize
| summarize ProcessChain = make_list(FileName, 100) by DeviceName

Why serialize? summarize on a parallelized stream does not guarantee input order inside make_list. order by ... | serialize forces a single ordered stream feeding the aggregator. Without it, the chain may be shuffled.

Pivots with bag_unpack

SecurityEvent
| where TimeGenerated > ago(7d) and EventID in (4624,4625,4634,4672,4688)
| summarize n = count() by Computer, EventID
| extend packed = bag_pack(tostring(EventID), n)
| summarize bag = make_bag(packed) by Computer
| evaluate bag_unpack(bag)

6. Joins, lookups, unions

Join kinds at a glance. innerunique is the default - and the source of most surprise duplicates / missing rows.

The eight kinds, with the trap explained

kind=	What it returns	Default?
`innerunique`	Left de-duplicated on key, then inner-joined to right. Fast, but means duplicate left rows are silently dropped.	Yes - this is the default if you omit `kind=`.
`inner`	Cartesian of all matching pairs. The "SQL inner join" most analysts expect.	No
`leftouter`	All L, matched R or `null`.	No
`rightouter`	All R, matched L or `null`.	No
`fullouter`	Both, with nulls on either side.	No
`leftanti`	L rows with no match in R. Use for "first-time-seen".	No
`rightanti`	R rows with no match in L.	No
`leftsemi` / `rightsemi`	Like inner, but returns only the L (or R) columns. Cheaper than projecting away after a join.	No

Default trap T1 | join T2 on Key is innerunique. If T1 has 100 rows and 10 share a key, you'll get back fewer rows than you'd expect from a SQL inner join. Always specify kind= explicitly for production rules.

The "first time seen" idiom (leftanti)

// Process executions in the last hour that did NOT appear anywhere in the prior 30 days
let Baseline = DeviceProcessEvents
    | where TimeGenerated between (ago(30d) .. ago(1h))
    | distinct DeviceName, FileName, SHA256;
DeviceProcessEvents
| where TimeGenerated > ago(1h)
| join kind=leftanti Baseline on DeviceName, FileName, SHA256
| project TimeGenerated, DeviceName, FileName, SHA256, ProcessCommandLine, AccountName

lookup - optimized one-sided join

let AdminUsers = externaldata(SamAccountName:string, IsAdmin:bool)
    [@"https://watchlist.example/admins.csv"] with(format="csv", ignoreFirstRecord=true);
SecurityEvent
| where EventID == 4624
| lookup kind=leftouter AdminUsers on $left.Account == $right.SamAccountName
| where IsAdmin == true

union - across tables and workspaces

union isfuzzy=true
    (SecurityEvent | where EventID == 4624 | project TimeGenerated, Computer, Account, IpAddress, _Source = "WinSec"),
    (SigninLogs     | project TimeGenerated, Computer = DeviceDetail.deviceId, Account = UserPrincipalName, IpAddress = IPAddress, _Source = "AAD")
| summarize n = count() by _Source, Account
| order by n desc

Hint When one side is small and one is huge, hint the join: | join hint.strategy=broadcast SmallTable on Key. When both are large but well-keyed, hint.shufflekey=Key distributes the work.

7. Multi-value: `mv-expand` vs `mv-apply`

mv-expand explodes one row into many. mv-apply runs a subquery per row over the array and projects the result back - preserving the parent row.

mv-expand

DeviceProcessEvents
| where ProcessCommandLine has "powershell"
| extend Tokens = split(ProcessCommandLine, " ")
| mv-expand Token = Tokens
| where Token matches regex @"^[A-Za-z0-9+/=]{200,}$"          // long base64-ish tokens
| project TimeGenerated, DeviceName, ProcessCommandLine, Token

mv-apply - when you need where/order inside the array

// Per host: keep only DLL load events whose ImageLoaded matches your watch-list
let watch = dynamic(["samlib.dll","vaultcli.dll","wdigest.dll"]);
DeviceImageLoadEvents
| where TimeGenerated > ago(1d)
| summarize Loads = make_list(pack("t", TimeGenerated, "dll", FileName)) by DeviceName, InitiatingProcessFileName
| mv-apply Load = Loads to typeof(dynamic) on (
    where tostring(Load.dll) in~ (watch)
    | summarize Hits = make_list(Load)
  )
| where array_length(Hits) >= 2             // at least 2 of the watched DLLs in this batch
| project DeviceName, InitiatingProcessFileName, Hits

When to choose which Use mv-expand when you want a row per array element and don't care about the parent. Use mv-apply when you want to filter / sort / aggregate inside the array but keep one row per parent. mv-apply is also where you do "first N elements matching predicate," which is otherwise awkward.

pack / bag_unpack - building and flattening property bags

| extend Evt = pack("id", EventID, "src", IpAddress, "acct", Account)
| summarize Events = make_list(Evt) by Computer
// elsewhere, to flatten a single-row property bag back into columns:
| evaluate bag_unpack(Properties)

8. Advanced: `partition`, `evaluate`, `serialize`

partition - run a sub-pipeline per key

// For each device, keep only the top 5 commandlines by frequency in the last day.
DeviceProcessEvents
| where TimeGenerated > ago(1d)
| partition hint.strategy=native by DeviceName (
    summarize Hits = count() by ProcessCommandLine, DeviceName
    | top 5 by Hits
)

strategy	When to use
`hint.strategy=native`	Best when keys partition naturally across the cluster (e.g. tenant id, device id with high cardinality). Cheapest at scale.
`hint.strategy=shuffle`	When data isn't already partitioned by key; forces a redistribute.
`hint.strategy=legacy`	Default; processes per-key serially. Fine for low cardinality.

evaluate plugins - the power tools

Plugin	Purpose
`autocluster`	Finds the smallest set of column patterns that cover most of the rows. Triage gold for noisy alert tables.
`basket`	Like autocluster but exhaustively enumerates frequent itemsets above a min-support threshold.
`diffpatterns`	Compares two sub-populations (e.g. anomaly vs baseline) and surfaces columns that differentiate them.
`pivot`	Excel-style pivot: turn distinct values of a column into columns.
`bag_unpack`	Flattens a dynamic property bag into individual columns.
`narrow`	Inverse of pivot - collapses many columns into `(Column, Value)` pairs.
`infer_storage_schema`	Reads sample data to infer schema for `externaldata`.

// Triage 7 days of process create alerts: what patterns dominate?
DeviceProcessEvents
| where TimeGenerated > ago(7d) and InitiatingProcessFileName =~ "powershell.exe"
| project DeviceName, AccountName, FileName, FolderPath, ProcessCommandLine
| evaluate autocluster(0.5)         // 0.5 = cover 50% of rows with the fewest patterns

// What columns differ between "rare" and "common" parent->child relationships?
DeviceProcessEvents
| where TimeGenerated > ago(7d)
| summarize n = count() by InitiatingProcessFileName, FileName, AccountDomain, FolderPath
| extend bucket = iff(n <= 2, "rare", "common")
| evaluate diffpatterns(bucket, "rare", "common")

serialize and the windowing functions

Once a stream is serialized (single ordered partition), you can use row-aware functions: prev(), next(), row_number(), row_rank_min/dense(), row_cumsum(), row_window_session().

// Inter-event interval per host
DeviceProcessEvents
| where TimeGenerated > ago(1h)
| order by DeviceName asc, TimeGenerated asc
| serialize
    PrevTime = prev(TimeGenerated, 1),
    PrevHost = prev(DeviceName,    1),
    Gap      = iff(DeviceName == prev(DeviceName, 1), TimeGenerated - prev(TimeGenerated, 1), timespan(null))
| where isnotnull(Gap) and Gap < 1s
| project DeviceName, TimeGenerated, FileName, Gap

row_window_session - sessionize without joining

// Group events into sessions per device when gap > 5 minutes
DeviceProcessEvents
| where TimeGenerated > ago(1d)
| order by DeviceName asc, TimeGenerated asc
| extend SessionStart = row_window_session(TimeGenerated, 5m, 5m, DeviceName != prev(DeviceName))
| summarize Events = count(), Files = make_set(FileName, 100) by DeviceName, SessionStart

range - synthesize rows

// Build a continuous 1-min spine to left-join sparse counts against
range ts from ago(1d) to now() step 1m
| join kind=leftouter (
    DeviceNetworkEvents
    | where TimeGenerated > ago(1d)
    | summarize n = count() by ts = bin(TimeGenerated, 1m)
) on ts
| project ts, n = coalesce(n, 0)
| order by ts asc

9. Time series & the `step` parameter

make-series - vectorized time series construction

DeviceNetworkEvents
| where TimeGenerated > ago(7d) and ActionType == "ConnectionSuccess"
| make-series n = count() default=0
    on TimeGenerated from ago(7d) to now() step 15m
    by DeviceName, RemoteIP

step is the bucket size. default=0 fills empty buckets with zero - critical, because gaps in the series will break series_decompose.

Anomalies and forecasts

| extend (anomalies, score, baseline) =
    series_decompose_anomalies(n, 2.0, -1, "linefit")   // thr=2σ, period=auto, fit=linefit
| extend AnomalyAt = series_iif(anomalies != 0, TimeGenerated, datetime(null))
| mv-expand TimeGenerated to typeof(datetime), n to typeof(long), anomalies to typeof(int), score to typeof(double)
| where anomalies != 0

Function	Use
`series_decompose`	Splits series into baseline / seasonal / residual components.
`series_decompose_anomalies`	Above + flags points exceeding σ threshold.
`series_decompose_forecast`	Above + projects forward N points.
`series_outliers`	Tukey-style outlier flagging (no seasonal model).
`series_periods_detect`	Detects dominant periodicities - useful for beaconing.
`series_fit_line` / `series_fit_2lines`	Linear regression / piecewise; returns slope, intercept, RSquare. `2lines` finds the breakpoint.
`series_fir` / `series_iir`	Filters: moving avg / exponential smoothing.
`series_pearson_correlation`	Correlation between two series - find hosts behaving alike.

Beaconing detection (period detection)

DeviceNetworkEvents
| where TimeGenerated > ago(1d) and RemoteIPType == "Public"
| make-series n = count() default=0
    on TimeGenerated from ago(1d) to now() step 1m
    by DeviceName, RemoteIP
| extend (periods, scores) = series_periods_detect(n, 0.05, 1200.0, 5)
| mv-expand period = periods to typeof(double), score = scores to typeof(double)
| where score > 0.8 and period between (5 .. 600)
| project DeviceName, RemoteIP, IntervalMinutes = period, Confidence = score
| order by Confidence desc

Reading the result series_periods_detect(series, min_period_pct, max_period, num_periods) returns the most likely periods (in step-units) and their scores. step 1m means a returned period of 15 equals 15 minutes between callbacks.

10. Detection-engineering patterns

A then B within Δt - the sequence pattern

// Failed logon followed by a successful logon for the same account from the same IP within 5 minutes
let Window = 5m;
let Fails = SecurityEvent | where EventID == 4625
    | project FailTime = TimeGenerated, Account, IpAddress, Computer;
let Wins  = SecurityEvent | where EventID == 4624
    | project WinTime  = TimeGenerated, Account, IpAddress, Computer;
Fails
| join kind=inner Wins on Account, IpAddress, Computer
| where WinTime > FailTime and WinTime - FailTime <= Window
| summarize Fails = count(), FirstFail = min(FailTime), LastWin = max(WinTime) by Account, IpAddress, Computer
| where Fails >= 5

Rare-process-as-service (prevalence math)

let Window = 14d;
let TotalHosts = toscalar(DeviceProcessEvents | where TimeGenerated > ago(Window) | summarize dcount(DeviceName));
DeviceProcessEvents
| where TimeGenerated > ago(Window)
| where InitiatingProcessFileName =~ "services.exe"
| summarize Hosts = dcount(DeviceName), Sample = any(ProcessCommandLine) by FileName, FolderPath, SHA256
| extend Prevalence = 100.0 * Hosts / TotalHosts
| where Prevalence < 1.0 and Hosts between (1 .. 3)
| order by Prevalence asc

Stack counting for triage

DeviceProcessEvents
| where TimeGenerated > ago(1d) and ProcessCommandLine has_any ("-enc","-EncodedCommand","FromBase64String")
| summarize Hits = count(), Hosts = dcount(DeviceName), Sample = any(ProcessCommandLine)
    by InitiatingProcessFileName, FileName, AccountDomain
| order by Hits desc

Cross-process correlation (the WerFault/LSASS shape)

// Sysmon: WerFault.exe spawned with LSASS PID and -pr flag, followed by ProcessAccess to LSASS
let Window = 2m;
let WerSpawn = DeviceProcessEvents
    | where FileName =~ "WerFault.exe" and ProcessCommandLine has "-pr"
    | project SpawnTime = TimeGenerated, DeviceName, WerPid = ProcessId, WerCmd = ProcessCommandLine;
let LsassAccess = DeviceEvents
    | where ActionType == "ProcessAccessByOthers" and FileName =~ "lsass.exe"
    | extend Add = parse_json(AdditionalFields)
    | extend AccessMask = tostring(Add.GrantedAccess), SourcePid = tolong(Add.SourceProcessId)
    | project AccessTime = TimeGenerated, DeviceName, SourcePid, AccessMask;
WerSpawn
| join kind=inner LsassAccess on DeviceName
| where AccessTime between (SpawnTime .. SpawnTime + Window) and SourcePid == WerPid
| project DeviceName, SpawnTime, WerCmd, AccessTime, AccessMask

11. Performance & idioms

Do	Don't
Filter by time first, then by indexed columns, then derive.	Don't `extend` first and filter on the derived column - defeats index use.
Use `has` on tokenized text columns.	Don't reach for `contains` or `matches regex` by reflex.
Use `=~` for case-insensitive equality.	Don't `tolower(col) == "x"` - kills indexing.
Use `countif` / `dcountif` with a predicate.	Don't run two `summarize` branches and join them.
Use `arg_max(t, *)` for "latest row per group."	Don't `top 1 by t desc` inside a `partition` if a single `summarize arg_max` works.
Use `materialize()` on subqueries referenced 2+ times.	Don't materialize huge results "just in case" - memory cost.
For "first-time-seen": `join kind=leftanti` against a baseline.	Don't `!in` against a giant dynamic list.
For unbounded distinct counts: `dcount(x, 3)` (HLL).	Don't reach for `count_distinct()` on hundreds of millions of rows.

The "filter early" rule, illustrated

// Bad: extend first, filter the derived column
DeviceProcessEvents
| extend Lower = tolower(ProcessCommandLine)
| where Lower has "mimikatz"
| where TimeGenerated > ago(1h)

// Good: time first, indexed-token filter, never lowercase
DeviceProcessEvents
| where TimeGenerated > ago(1h)
| where ProcessCommandLine has "mimikatz"

12. Sentinel-specifics

Analytics rule anatomy

Sentinel scheduled rules expect: a query that returns rows; TimeGenerated on each row; entity mappings (Account / Host / IP / FileHash / URL …) referencing column names; optional custom details for alert fields; and a query period ≤ query frequency × N (set in rule UI).

// Skeleton ready to paste into Sentinel "Custom rule wizard"
let Lookback = 1h;
DeviceProcessEvents
| where TimeGenerated > ago(Lookback)
| where FileName =~ "WerFault.exe" and ProcessCommandLine has "-pr"
| project TimeGenerated, DeviceName, AccountName, ProcessCommandLine, ProcessId, SHA256
// Entities to map in UI: Host=DeviceName, Account=AccountName, FileHash=SHA256
// Custom details: ProcessCommandLine, ProcessId

Watchlists & externaldata

// Built-in watchlist
let CrownJewels = _GetWatchlist("CrownJewelHosts") | project Hostname;
DeviceLogonEvents
| where TimeGenerated > ago(1h)
| where DeviceName in (CrownJewels)

// External CSV (e.g. published Tier-0 list)
let Tier0 = externaldata(Sam:string)[@"https://<<ENV>>/tier0.csv"] with(format="csv", ignoreFirstRecord=true);

Sentinel functions you'll actually use

Function	Purpose
`_GetWatchlist("Name")`	Resolves a Sentinel watchlist as a table.
`SecurityAlert` / `SecurityIncident`	Built-in alert & incident tables.
`geo_info_from_ip_address(ip)`	Built-in GeoIP enrichment (country, city, lat/long).
`iff(EventID == 4624, ...)`	Branchy column derivation; nests cleanly.
`_ResourceId`	Hidden column - useful for cross-tenant queries.

[ Slot for screenshot: Sentinel "Analytics rule wizard - Set rule logic" tab showing entity mapping for a custom rule. ]

Replace this slot with: <img src="entity-mapping.png" alt="...">

Challenges (20, easy → expert)

Solutions are collapsed by default. In print they expand automatically. Each solution is one valid approach; there are usually several. Where a challenge depends on a table you don't have (e.g. IdentityInfo is UEBA-licensed), an inline note shows a substitute.

Easy (warm-up: single table, one or two operators)

01 · Easy Firewall-related EventIDs

Query SecurityEvent for the last 24 hours and return every distinct EventID related to Windows Firewall (rule add / change / delete / advanced setting changes). Sort by event count descending.

Show solution

let FwIds = dynamic([4946,4947,4948,4949,4950,4951,4952,4953,4954,4956,4957,4958,5024,5025,5027,5028,5029,5030,5031,5032,5033,5034,5035,5037]);
SecurityEvent
| where TimeGenerated > ago(24h)
| where EventID in (FwIds)
| summarize n = count() by EventID, Activity
| order by n desc

Why it's easy: single where with in, no parsing, no joining. Goal is to internalize "membership filter against a typed dynamic literal."

02 · Easy Parse Sysmon UtcTime to a real datetime

From the raw Event table where Sysmon EID 1 lands as XML in EventData, extract UtcTime, Image, and CommandLine, and return them with a properly typed UtcTime column (not a string).

Show solution

Event
| where Source == "Microsoft-Windows-Sysmon" and EventID == 1
| parse EventData with * "<Data Name='UtcTime'>" UtcTimeRaw "</Data>"
                          * "<Data Name='Image'>" Image "</Data>"
                          * "<Data Name='CommandLine'>" CommandLine "</Data>"
                          *
| extend UtcTime = todatetime(trim(@"\s", UtcTimeRaw))
| project UtcTime, Computer, Image, CommandLine
| order by UtcTime desc

Notes: trim handles the trailing-whitespace artifact from some collectors. If your Sysmon arrives via AMA / DCR, fields are already split out - use the dedicated columns directly.

03 · Easy Distinct processes per host (Sysmon EID 1)

Return one row per (DeviceName, FileName) with first-seen / last-seen and a sample command line, last 7 days.

Show solution

DeviceProcessEvents
| where TimeGenerated > ago(7d)
| summarize
    FirstSeen = min(TimeGenerated),
    LastSeen  = max(TimeGenerated),
    Sample    = any(ProcessCommandLine),
    Hits      = count()
  by DeviceName, FileName
| order by DeviceName asc, Hits desc

04 · Easy Failed logons per account

Count 4625 events per account in the last 24h, exclude machine accounts (ending in $), and only return accounts with 5+ failures.

Show solution

SecurityEvent
| where TimeGenerated > ago(24h) and EventID == 4625
| where Account !endswith "$"
| summarize Failures = count(), Sources = dcount(IpAddress), AnIp = any(IpAddress) by Account
| where Failures >= 5
| order by Failures desc

05 · Easy Outbound connections per host

From DeviceNetworkEvents, list each device's distinct outbound RemotePort count and the top remote IP by connection count, last hour.

Show solution

DeviceNetworkEvents
| where TimeGenerated > ago(1h) and ActionType == "ConnectionSuccess" and RemoteIPType == "Public"
| summarize
    DistinctPorts = dcount(RemotePort),
    TopPeer       = arg_max(tostring(RemoteIP), count()),
    Conns         = count()
  by DeviceName
| order by Conns desc

The arg_max idiom here gives you the most-frequent peer cheaply. If you want a tie-breaker, group differently and use top-nested.

Medium (joins, parsing, sequences)

06 · Medium Failed-then-successful logon, same account, same source IP, within 5 min

Find accounts that had ≥3 failed logons followed by a successful one from the same IP within 5 minutes. Return the account, IP, host, fail count, time-to-success, and the first/last fail timestamps.

Show solution

let Window = 5m;
let Fails = SecurityEvent
    | where TimeGenerated > ago(1d) and EventID == 4625 and Account !endswith "$"
    | project FailTime = TimeGenerated, Account, IpAddress, Computer;
let Wins  = SecurityEvent
    | where TimeGenerated > ago(1d) and EventID == 4624
    | project WinTime  = TimeGenerated, Account, IpAddress, Computer;
Fails
| join kind=inner Wins on Account, IpAddress, Computer
| where WinTime > FailTime and WinTime - FailTime <= Window
| summarize
    FirstFail = min(FailTime),
    LastFail  = max(FailTime),
    SuccessAt = min(WinTime),
    FailCount = count()
  by Account, IpAddress, Computer
| extend TimeToSuccess = SuccessAt - LastFail
| where FailCount >= 3
| order by SuccessAt desc

07 · Medium Rare process by hash prevalence

Across a 14-day baseline, find process executions where the SHA256 has been seen on 3 or fewer hosts (less than 1% of fleet). Show prevalence percentage. Account for the case where a hash is brand new in the last hour.

Show solution

let Window = 14d;
let Recent = 1h;
let TotalHosts = toscalar(
    DeviceProcessEvents | where TimeGenerated > ago(Window) | summarize dcount(DeviceName)
);
let Prevalence = DeviceProcessEvents
    | where TimeGenerated > ago(Window)
    | summarize Hosts = dcount(DeviceName) by SHA256, FileName
    | extend Pct = 100.0 * Hosts / TotalHosts;
DeviceProcessEvents
| where TimeGenerated > ago(Recent)
| join kind=leftouter Prevalence on SHA256, FileName
| extend Hosts = coalesce(Hosts, toint(1)),
         Pct   = coalesce(Pct,   100.0 / TotalHosts)
| where Hosts <= 3 or Pct < 1.0
| project TimeGenerated, DeviceName, AccountName, FileName, FolderPath, ProcessCommandLine, SHA256, Hosts, Pct
| order by Hosts asc, TimeGenerated desc

Trap: a brand-new hash won't appear in the baseline at all, so the leftouter join leaves Hosts null. coalesce patches this.

08 · Medium Beaconing on an outbound destination

Find (DeviceName, RemoteIP) pairs whose outbound connections show a strong periodic signature in the last 24 hours. Return likely beacon interval in minutes and a confidence score.

Show solution

DeviceNetworkEvents
| where TimeGenerated > ago(1d)
    and ActionType == "ConnectionSuccess"
    and RemoteIPType == "Public"
| make-series n = count() default=0
    on TimeGenerated from ago(1d) to now() step 1m
    by DeviceName, RemoteIP
| where array_sum(n) >= 20                                // suppress sparse pairs
| extend (periods, scores) = series_periods_detect(n, 0.05, 1200.0, 3)
| mv-expand period = periods to typeof(double), score = scores to typeof(double)
| where score > 0.85 and period between (2 .. 600)
| project DeviceName, RemoteIP, IntervalMinutes = period, Confidence = score
| order by Confidence desc

09 · Medium Office spawning shell - parent/child anomaly

Detect winword.exe / excel.exe / outlook.exe / powerpnt.exe spawning powershell.exe, cmd.exe, wscript.exe, or cscript.exe in the last day. Decode the command line if it's base64, and join to DeviceLogonEvents to attach the most recent interactive logon for that device.

Show solution

let Office = dynamic(["winword.exe","excel.exe","outlook.exe","powerpnt.exe"]);
let Shells = dynamic(["powershell.exe","cmd.exe","wscript.exe","cscript.exe","pwsh.exe"]);
let RecentLogon = DeviceLogonEvents
    | where TimeGenerated > ago(2d) and LogonType == "Interactive" and ActionType == "LogonSuccess"
    | summarize arg_max(TimeGenerated, AccountName, AccountDomain) by DeviceName;
DeviceProcessEvents
| where TimeGenerated > ago(1d)
| where InitiatingProcessFileName in~ (Office) and FileName in~ (Shells)
| extend EncIdx = indexof(tolower(ProcessCommandLine), "-enc")
| extend B64    = extract(@"(?i)-(?:e|en|enc|encod|encodedcommand)\s+([A-Za-z0-9+/=]+)", 1, ProcessCommandLine)
| extend Decoded = iff(isnotempty(B64), make_string(base64_decode_toarray(B64)), "")
| join kind=leftouter RecentLogon on DeviceName
| project TimeGenerated, DeviceName, AccountName, AccountDomain,
          Parent = InitiatingProcessFileName, Child = FileName,
          ProcessCommandLine, Decoded, LastInteractiveLogon = TimeGenerated1
| order by TimeGenerated desc

The base64 in PowerShell -EncodedCommand is UTF-16LE - for a fully clean decode use extract + a small Python script in your runbook, or base64_decode_tostring() if your cluster supports it. The pattern above is good enough to triage.

10 · Medium Decode PowerShell ScriptBlock (4104) and stack-rank suspicious tokens

From Event table EID 4104 (or DeviceEvents with ActionType=="PowerShellCommand"), pull script blocks containing any of Invoke-Expression, DownloadString, FromBase64String, IEX, Add-Type, WMIObject. Group by host and produce a top-10 of suspicious script samples per host.

Show solution

let Tokens = dynamic(["Invoke-Expression","DownloadString","FromBase64String","IEX","Add-Type","WMIObject","Reflection.Assembly"]);
Event
| where TimeGenerated > ago(1d) and EventLog == "Microsoft-Windows-PowerShell/Operational" and EventID == 4104
| parse EventData with * "<Data Name='ScriptBlockText'>" ScriptBlock "</Data>" *
| where ScriptBlock has_any (Tokens)
| extend Hits = array_length(extract_all(strcat(@"\b(", strcat_array(Tokens, "|"), @")\b"), ScriptBlock))
| summarize Samples = make_list(substring(ScriptBlock, 0, 200), 10), Total = count(), Sus = sum(Hits)
    by Computer
| order by Sus desc

Hard (multi-stage logic, windowing, evaluate plugins)

11 · Hard Lateral movement spread (4624 type 3 fan-out)

Detect a single account that successfully authenticated to ≥5 distinct destination hosts via network logon (LogonType 3) within any 1-hour window. Return account, source IP, distinct destinations, the window start, and the list of destinations in chronological order.

Show solution

SecurityEvent
| where TimeGenerated > ago(1d) and EventID == 4624 and LogonType == 3 and Account !endswith "$"
| project TimeGenerated, Account, IpAddress, Computer
| order by Account asc, TimeGenerated asc
| extend WindowStart = row_window_session(TimeGenerated, 1h, 5m, Account != prev(Account))
| summarize
    Destinations = make_set(Computer, 200),
    Sources      = make_set(IpAddress, 50),
    DestCount    = dcount(Computer),
    First        = min(TimeGenerated),
    Last         = max(TimeGenerated)
  by Account, WindowStart
| where DestCount >= 5
| order by DestCount desc

row_window_session(timestamp, lookback, idle_timeout, reset_predicate) - assigns a session id (the start time) per account, breaking on either an account change or a 5-minute idle gap.

12 · Hard Cluster suspicious commandlines with autocluster

From all powershell.exe executions in the last 24h, surface the top column-pattern clusters that cover at least 70% of rows. Return one row per cluster, the columns that define it, and an example.

Show solution

DeviceProcessEvents
| where TimeGenerated > ago(1d) and FileName =~ "powershell.exe"
| project DeviceName, AccountDomain, AccountName,
          Parent = tostring(split(InitiatingProcessFolderPath, "\\")[-1]),
          CmdHead = substring(ProcessCommandLine, 0, 120)
| evaluate autocluster(0.7, 5, 2)

autocluster(SizeWeight, WeightFactor, NumSeeds). With 0.7 we cover ~70% of rows. Pre-projecting to a small column set and trimming long strings gives the algorithm clean signal - feeding it raw ProcessCommandLine tends to produce one cluster per row.

13 · Hard Diffpatterns: what changed today vs the prior week?

Compare today's DeviceProcessEvents distribution to the prior 7 days. Use diffpatterns to surface the column-value patterns that are over-represented today. Restrict to admin-level accounts.

Show solution

let Admins = dynamic(["administrator","da-bryce","svc-backup"]);  // or _GetWatchlist("AdminAccounts") | project Sam
DeviceProcessEvents
| where TimeGenerated > ago(8d) and AccountName in~ (Admins)
| extend bucket = iff(TimeGenerated > ago(1d), "today", "baseline")
| project bucket, DeviceName, FileName,
          Folder = tostring(split(FolderPath, "\\")[-2]),
          Parent = tostring(split(InitiatingProcessFolderPath, "\\")[-1])
| evaluate diffpatterns(bucket, "baseline", "today", "~", 0.05, 2)

The output columns CountA / CountB / PercentA / PercentB / PercentDiffAB tell you how strongly each pattern is over-represented in today.

14 · Hard Time-series anomaly: per-host outbound bytes

For each device, build a 7-day series of outbound bytes in 1-hour buckets. Flag points where the residual exceeds 3σ AND the device has emitted at least 100MB on that hour.

Show solution

DeviceNetworkEvents
| where TimeGenerated > ago(7d) and ActionType == "ConnectionSuccess"
| extend Bytes = tolong(parse_json(AdditionalFields).BytesSent)
| where isnotnull(Bytes)
| make-series Out = sum(Bytes) default=0
    on TimeGenerated from ago(7d) to now() step 1h
    by DeviceName
| extend (anomalies, score, baseline) = series_decompose_anomalies(Out, 3.0, -1, "linefit")
| mv-expand TimeGenerated to typeof(datetime),
            Out         to typeof(long),
            anomalies   to typeof(int),
            score       to typeof(double),
            baseline    to typeof(double)
| where anomalies != 0 and Out >= 100 * 1024 * 1024
| project TimeGenerated, DeviceName, MB = Out / 1024 / 1024, BaselineMB = round(baseline / 1024 / 1024, 1), Score = score
| order by Score desc

15 · Hard The full one - admin user, evil.exe, then specific DLLs ~10 min later, with N events between, jitter tolerance

On hosts where the logged-on user is in the admin watchlist, find the device where evil.exe ran and was followed (≈10 minutes ± 90 seconds of jitter) by image-loads of all of {samlib.dll, vaultcli.dll, wdigest.dll}, with at least 5 unrelated process events in between. Return the host, the user, the evil.exe instance, the timing, and the matched DLLs.

Show solution

let Window     = 15m;                                  // hard outer bound
let Target     = 10m;                                  // expected delay
let Jitter     = 90s;                                  // tolerance
let InterMin   = 5;                                    // minimum unrelated events between
let Watch      = dynamic(["samlib.dll","vaultcli.dll","wdigest.dll"]);
let Admins     = dynamic(["administrator","da-bryce","svc-backup"]);  // or _GetWatchlist("AdminAccounts")
// 1. Trigger: evil.exe execution
let Evil = DeviceProcessEvents
    | where TimeGenerated > ago(7d)
    | where FileName =~ "evil.exe"
    | where AccountName in~ (Admins)
    | project
        EvilTime   = TimeGenerated,
        DeviceName,
        AccountName,
        EvilPid    = ProcessId,
        EvilHash   = SHA256,
        EvilCmd    = ProcessCommandLine;
// 2. Image loads in the window after each evil.exe
let Loads = DeviceImageLoadEvents
    | where TimeGenerated > ago(7d)
    | where tolower(FileName) in (Watch)
    | project LoadTime = TimeGenerated, DeviceName, LoadedDll = tolower(FileName), LoadingProc = InitiatingProcessFileName, LoadingPid = InitiatingProcessId;
// 3. Inter-event noise: any process event NOT the evil.exe and NOT one of the watch DLLs
let Inter = DeviceProcessEvents
    | where TimeGenerated > ago(7d)
    | project InterTime = TimeGenerated, DeviceName, InterFile = FileName;
// 4. Join evil.exe to its DLL loads in the window
Evil
| join kind=inner Loads on DeviceName
| where LoadTime between (EvilTime + (Target - Jitter) .. EvilTime + (Target + Jitter))
       or LoadTime between (EvilTime .. EvilTime + Window)               // loose path
| summarize
    DllsHit  = make_set(LoadedDll),
    FirstLoad= min(LoadTime),
    LastLoad = max(LoadTime)
  by DeviceName, AccountName, EvilTime, EvilPid, EvilHash, EvilCmd
| where array_length(set_intersect(DllsHit, Watch)) == array_length(Watch)   // require ALL three
| extend Delay = LastLoad - EvilTime
| where Delay between (Target - Jitter .. Target + Jitter)                // jitter check on aggregate
// 5. Confirm InterMin unrelated events occurred between EvilTime and FirstLoad
| join kind=inner (
    Inter
    | summarize Bag = make_list(pack("t", InterTime, "f", InterFile)) by DeviceName
  ) on DeviceName
| mv-apply evt = Bag to typeof(dynamic) on (
    where
        todatetime(evt.t) between (EvilTime .. FirstLoad)
        and tolower(tostring(evt.f)) !in ("evil.exe")
    | summarize Inters = count()
)
| where Inters >= InterMin
| project
    DeviceName, AccountName,
    EvilTime, EvilPid, EvilHash, EvilCmd,
    FirstLoad, LastLoad, Delay,
    DllsHit, InterEventsBetween = Inters
| order by EvilTime desc

Reading the query: (1) materialize the evil.exe trigger restricted to admin users; (2) pull DLL loads in the window; (3) require all three target DLLs in the matched set; (4) verify the aggregate delay falls inside the jitter band - this is what gives you "≈10 min ± 90s"; (5) re-join unrelated process events and use mv-apply with a where on the array to count noise events between the trigger and the first DLL load.
Real-world note: if your environment has tight IR-style memory budgets, replace step (5) with a summarize that pre-buckets per host into a property bag and avoids the second join entirely.

Expert (partition + mv-apply + windowing + cross-table)

16 · Expert Per-host rare-DLL hunt with partition + mv-apply

For each device, find DLLs loaded by lsass.exe that are NOT in the device's own 30-day baseline. Output one row per host listing the new DLLs and their first-seen times. Use partition so the per-host computation is parallelized.

Show solution

let Recent = 1d;
let Baseline = 30d;
DeviceImageLoadEvents
| where TimeGenerated > ago(Baseline)
| where InitiatingProcessFileName =~ "lsass.exe"
| project DeviceName, FileName = tolower(FileName), TimeGenerated
| partition hint.strategy=native by DeviceName (
    summarize
        BaselineDlls = make_set_if(FileName, TimeGenerated between (ago(Baseline) .. ago(Recent))),
        RecentLoads  = make_list_if(pack("t", TimeGenerated, "d", FileName), TimeGenerated > ago(Recent))
        by DeviceName
    | mv-apply rec = RecentLoads to typeof(dynamic) on (
        where tostring(rec.d) !in (BaselineDlls)
        | summarize NewDlls = make_set(tostring(rec.d)),
                    FirstSeen = min(todatetime(rec.t))
      )
)
| where array_length(NewDlls) > 0
| order by FirstSeen desc

17 · Expert Reconstruct a per-host attack timeline (serialize + prev/next)

For a candidate host, produce a chronological narrative of "interesting" events (process create, network connect, image load, registry mod) with the inter-event delta and a chained "last-3" rolling memory of preceding events. Output the timeline as one row per event with columns: t, kind, summary, gap, last3.

Show solution

let Host = "WS-FINANCE-07";
let Window = 2h;
let P = DeviceProcessEvents
    | where DeviceName == Host and TimeGenerated > ago(Window)
    | project t = TimeGenerated, kind = "proc", summary = strcat(InitiatingProcessFileName, " -> ", FileName, " ", substring(ProcessCommandLine, 0, 100));
let N = DeviceNetworkEvents
    | where DeviceName == Host and TimeGenerated > ago(Window) and ActionType == "ConnectionSuccess"
    | project t = TimeGenerated, kind = "net",  summary = strcat(InitiatingProcessFileName, " => ", RemoteIP, ":", RemotePort);
let L = DeviceImageLoadEvents
    | where DeviceName == Host and TimeGenerated > ago(Window)
    | project t = TimeGenerated, kind = "load", summary = strcat(InitiatingProcessFileName, " loads ", FileName);
let R = DeviceRegistryEvents
    | where DeviceName == Host and TimeGenerated > ago(Window)
    | project t = TimeGenerated, kind = "reg",  summary = strcat(ActionType, " ", RegistryKey, "\\", RegistryValueName);
union P, N, L, R
| order by t asc
| serialize
    gap   = t - prev(t, 1),
    last3 = strcat(prev(summary, 3), "\n", prev(summary, 2), "\n", prev(summary, 1))
| project t, kind, summary, gap, last3

18 · Expert Correlate auth events with watchlist + GeoIP and find impossible travel

For each successful interactive logon (4624 type 2 or 10), enrich with GeoIP. Per account, if two consecutive logons are from cities >500 km apart and within a window shorter than realistic travel time (assume max 800 km/h), flag as impossible travel. Bring in a watchlist of high-risk accounts to prioritize.

Show solution

let Window = 7d;
let Risky = externaldata(Sam:string) [@"https://<<ENV>>/risky-users.csv"] with(format="csv", ignoreFirstRecord=true);
SecurityEvent
| where TimeGenerated > ago(Window) and EventID == 4624 and LogonType in (2,10)
| where Account !endswith "$" and isnotempty(IpAddress) and IpAddress !in ("-","::1","127.0.0.1")
| extend Geo = geo_info_from_ip_address(IpAddress)
| extend Lat = toreal(Geo.latitude), Lon = toreal(Geo.longitude), City = tostring(Geo.city), Country = tostring(Geo.country)
| where isnotnull(Lat) and isnotnull(Lon)
| order by Account asc, TimeGenerated asc
| serialize
    PrevTime = prev(TimeGenerated),
    PrevLat  = prev(Lat),
    PrevLon  = prev(Lon),
    PrevCity = prev(City),
    PrevAcct = prev(Account)
| where Account == PrevAcct
| extend KM     = geo_distance_2points(PrevLon, PrevLat, Lon, Lat) / 1000
| extend Hours  = (TimeGenerated - PrevTime) / 1h
| extend NeededKmh = KM / iff(Hours == 0, 0.0001, Hours)
| where KM > 500 and NeededKmh > 800
| join kind=leftouter Risky on $left.Account == $right.Sam
| extend Priority = iff(isnotempty(Sam), "HIGH", "NORMAL")
| project TimeGenerated, Account, From = PrevCity, To = City, KM = round(KM, 0), Gap = TimeGenerated - PrevTime, NeededKmh = round(NeededKmh, 0), Priority
| order by Priority asc, NeededKmh desc

geo_distance_2points takes (lon1, lat1, lon2, lat2) - note longitude first; getting this backwards is a classic bug.

19 · Expert WerFault / LSASS dual-process detection (production rule shape)

Detect the dual-WerFault credential-access pattern: WerFault.exe -pr <PID> spawned with the LSASS PID, followed within 2 minutes by a process-access on lsass.exe from that same WerFault PID with a credential-grade access mask (0x1010, 0x1410, 0x1438, 0x143A). Suppress the legitimate Edge "AppHangXProcB1" branch (parent is a hung browser, not a service-host). Map entities and emit one alert per (Device, EvilSequence).

Show solution

let Window     = 2m;
let CredMasks  = dynamic(["0x1010","0x1410","0x1438","0x143A","0x143a"]);
let EdgeParents = dynamic(["msedge.exe","chrome.exe","firefox.exe"]);
let WerSpawn = DeviceProcessEvents
    | where TimeGenerated > ago(1d)
    | where FileName =~ "WerFault.exe"
    | where ProcessCommandLine has "-pr"
    | extend ParentLeaf = tolower(tostring(split(InitiatingProcessFolderPath, "\\")[-1]))
    | where ParentLeaf !in~ (EdgeParents)                        // suppress AppHangXProcB1 path
    | extend TargetPid = toint(extract(@"-pr\s+(\d+)", 1, ProcessCommandLine))
    | project
        SpawnTime  = TimeGenerated,
        DeviceName,
        WerPid     = ProcessId,
        WerCmd     = ProcessCommandLine,
        TargetPid,
        Parent     = InitiatingProcessFileName,
        Account    = AccountName;
let LsassAccess = DeviceEvents
    | where TimeGenerated > ago(1d)
    | where ActionType == "ProcessAccessByOthers" and FileName =~ "lsass.exe"
    | extend Add = parse_json(AdditionalFields)
    | extend
        AccessMask = tostring(Add.GrantedAccess),
        SourcePid  = toint(Add.SourceProcessId),
        SourceImg  = tostring(Add.SourceImageFileName),
        LsassPid   = toint(Add.TargetProcessId)
    | where AccessMask in~ (CredMasks)
    | project AccessTime = TimeGenerated, DeviceName, SourcePid, SourceImg, LsassPid, AccessMask;
// 1. WerFault was spawned with -pr targeting LSASS
// 2. Same WerFault PID then opens LSASS with cred-grade mask within 2 minutes
WerSpawn
| join kind=inner LsassAccess on DeviceName
| where SourcePid == WerPid                                   // the access came from the WerFault we just saw spawn
| where TargetPid == LsassPid                                 // the -pr target IS the lsass that got accessed
| where AccessTime between (SpawnTime .. SpawnTime + Window)
| summarize
    AccessTimes = make_set(AccessTime, 10),
    Masks       = make_set(AccessMask, 10),
    SourceImgs  = make_set(SourceImg, 5),
    Hits        = count()
  by DeviceName, Account, Parent, SpawnTime, WerPid, TargetPid, WerCmd
| extend
    EvilSequence = strcat(format_datetime(SpawnTime, "yyyy-MM-dd HH:mm:ss"), "|", DeviceName, "|", WerPid, "->", TargetPid)
| project SpawnTime, DeviceName, Account, Parent, WerPid, TargetPid, Masks, AccessTimes, Hits, WerCmd, EvilSequence
// Sentinel entity mapping:  Host=DeviceName  Account=Account  CustomDetails={Masks, WerCmd, EvilSequence}

Built around the case-specific facts you've already validated: WerFault -pr spawned at SYSTEM IL parented by WerSvcGroup (or directly by services.exe) ≠ Edge AppHangXProcB1 path. Tighten EdgeParents to your environment.

20 · Expert Anomaly + characterization in one query (make-series + diffpatterns)

For each device, build a 7-day series of process-create count in 1-hour buckets. Pick the most anomalous hour per device (highest score). Then, using diffpatterns, compare what was happening in that hour versus the rest of the week, and surface the column-value patterns that distinguish the anomaly. Final output: device + anomaly hour + score + the top distinguishing patterns.

Show solution

let Window = 7d;
let Step   = 1h;
// 1. Find each device's worst-anomaly hour
let Anomalies = DeviceProcessEvents
    | where TimeGenerated > ago(Window)
    | make-series n = count() default=0
        on TimeGenerated from ago(Window) to now() step Step
        by DeviceName
    | extend (anomalies, score, baseline) = series_decompose_anomalies(n, 2.5, -1, "linefit")
    | mv-expand ts = TimeGenerated to typeof(datetime),
                 cnt = n        to typeof(long),
                 a   = anomalies to typeof(int),
                 s   = score    to typeof(double)
    | where a != 0 and s > 0
    | summarize arg_max(s, ts, cnt) by DeviceName
    | project DeviceName, AnomalyHour = bin(ts, Step), Score = s, Cnt = cnt;
// 2. For each anomaly, label rows in/out of that hour and run diffpatterns
DeviceProcessEvents
| where TimeGenerated > ago(Window)
| join kind=inner Anomalies on DeviceName
| extend bucket = iff(bin(TimeGenerated, Step) == AnomalyHour, "anomaly", "baseline")
| project DeviceName, bucket,
          Parent = InitiatingProcessFileName,
          Child  = FileName,
          Folder = tostring(split(FolderPath, "\\")[-2]),
          User   = AccountName
| partition hint.strategy=native by DeviceName (
    evaluate diffpatterns(bucket, "baseline", "anomaly", "~", 0.05, 2)
)
| join kind=leftouter Anomalies on DeviceName
| project DeviceName, AnomalyHour, Score, SegmentId, CountA, CountB, PercentA, PercentB, PercentDiffAB, Pattern
| order by Score desc, PercentDiffAB desc

What you get: "On host X at 03:00 UTC the process volume was anomalously high; the patterns that distinguish that hour from the rest of the week are parent=services.exe / child=cmd.exe / user=svc-backup." That's a finished triage finding in one query.

Appendix · Quick reference index

Operators (alphabetical)

as · count · distinct · evaluate · extend · facet · find · fork · getschema · invoke · join · limit/take · lookup · make-series · mv-apply · mv-expand · order/sort · parse · parse-kv · parse-where · partition · print · project · project-away · project-keep · project-rename · project-reorder · range · reduce · render · sample · sample-distinct · search · serialize · summarize · top · top-hitters · top-nested · union · where

High-leverage scalar/aggregate functions

ago · arg_max · arg_min · array_length · array_sum · bag_pack · bag_unpack · base64_decode_toarray · bin · bin_at · case · coalesce · count_distinct · countif · dcount · dcountif · datetime_part · extract · extract_all · format_datetime · geo_distance_2points · geo_info_from_ip_address · iff · iif · indexof · isnotempty · isnotnull · make_bag · make_list · make_list_if · make_set · make_set_if · materialize · mv-expand to typeof · now · pack · parse_json · parse_url · percentile · percentiles · prev · next · replace_regex · replace_string · row_cumsum · row_number · row_rank_dense · row_window_session · series_decompose · series_decompose_anomalies · series_decompose_forecast · series_fit_2lines · series_fit_line · series_pearson_correlation · series_periods_detect · set_difference · set_has_element · set_intersect · set_union · split · startofday · strcat · strcat_array · substring · todatetime · todynamic · toint · tolong · toreal · toscalar · tostring · totimespan · trim

[ Slot for cheat sheet image: KQL operator pipeline diagram (left-to-right). Replace with your own diagram or screenshot. ]

End of document. To regenerate or extend: edit this single .html file. All styling, code samples, diagrams, and image slots are inline - no external dependencies. For air-gapped sharing, copy the file as-is.

1. Pipeline model & flow control

let, materialize, toscalar

Functions (let-bound)

2. Filtering & shaping

String predicates, ranked by cost (cheapest first)

The project family

extend - derive columns

distinct vs. summarize-by

3. Parsing & type coercion

parse - declarative pattern parsing

parse-kv - when log lines are key=value soup

extract / extract_all - regex captures

parse_json & dynamic paths

split, strcat, replace_string, replace_regex

Type coercion

4. Time, bins & date math

The time toolkit

Sliding windows with bin

Format tokens that bite

5. Aggregation & summarize

Core aggregators

arg_max - the operator you'll use weekly

make_list with order - the serialize trick

Pivots with bag_unpack

6. Joins, lookups, unions

The eight kinds, with the trap explained

The "first time seen" idiom (leftanti)

lookup - optimized one-sided join

union - across tables and workspaces

7. Multi-value: mv-expand vs mv-apply

mv-expand

mv-apply - when you need where/order inside the array

pack / bag_unpack - building and flattening property bags

8. Advanced: partition, evaluate, serialize

partition - run a sub-pipeline per key

evaluate plugins - the power tools

serialize and the windowing functions

row_window_session - sessionize without joining

range - synthesize rows

9. Time series & the step parameter

make-series - vectorized time series construction

Anomalies and forecasts

Beaconing detection (period detection)

10. Detection-engineering patterns

A then B within Δt - the sequence pattern

Rare-process-as-service (prevalence math)

Stack counting for triage

Cross-process correlation (the WerFault/LSASS shape)

11. Performance & idioms

The "filter early" rule, illustrated

12. Sentinel-specifics

Analytics rule anatomy

Watchlists & externaldata

Sentinel functions you'll actually use

Challenges (20, easy → expert)

Easy (warm-up: single table, one or two operators)

01 · Easy Firewall-related EventIDs

02 · Easy Parse Sysmon UtcTime to a real datetime

03 · Easy Distinct processes per host (Sysmon EID 1)

04 · Easy Failed logons per account

05 · Easy Outbound connections per host

Medium (joins, parsing, sequences)

06 · Medium Failed-then-successful logon, same account, same source IP, within 5 min

07 · Medium Rare process by hash prevalence

08 · Medium Beaconing on an outbound destination

09 · Medium Office spawning shell - parent/child anomaly

10 · Medium Decode PowerShell ScriptBlock (4104) and stack-rank suspicious tokens

Hard (multi-stage logic, windowing, evaluate plugins)

11 · Hard Lateral movement spread (4624 type 3 fan-out)

12 · Hard Cluster suspicious commandlines with autocluster

13 · Hard Diffpatterns: what changed today vs the prior week?

14 · Hard Time-series anomaly: per-host outbound bytes

15 · Hard The full one - admin user, evil.exe, then specific DLLs ~10 min later, with N events between, jitter tolerance

Expert (partition + mv-apply + windowing + cross-table)

16 · Expert Per-host rare-DLL hunt with partition + mv-apply

17 · Expert Reconstruct a per-host attack timeline (serialize + prev/next)

18 · Expert Correlate auth events with watchlist + GeoIP and find impossible travel

19 · Expert WerFault / LSASS dual-process detection (production rule shape)

20 · Expert Anomaly + characterization in one query (make-series + diffpatterns)

Appendix · Quick reference index

5. Aggregation & `summarize`

7. Multi-value: `mv-expand` vs `mv-apply`

8. Advanced: `partition`, `evaluate`, `serialize`

9. Time series & the `step` parameter