Purple Teaming Operations: Closing the Gap Between Offense and Defense
The average detection coverage rate sits at 24% — and the primary cause is not tool failure but organizational separation. Red teams operate in isolation, blue teams review findings weeks later in a PDF, and the gap between what an adversary actually does and what defenders actually detect remains unmeasured and unmanaged. Purple teaming closes that gap by fusing offense and defense into a single operational cycle where every attack step produces an immediate defensive outcome.
This is part 5 in a series on threat-informed defense. Start with part 1.
The Siloed Team Problem
Part 1 identified siloed purple teams as one of five recurring failure modes in threat-informed defense programs. The pattern is consistent across organizations: a red team executes an adversary emulation, documents findings in a slide deck, delivers it to the blue team in a debrief two weeks later, and moves on. The blue team then tries to reconstruct the attack path from memory and screenshots, writes detection rules based on incomplete context, and tests those rules in isolation. The cycle repeats quarterly — or annually.
Three structural problems emerge from this separation:
| Problem | Mechanism | Impact |
|---|---|---|
| Context decay | Red team context degrades between execution and debrief; blue team lacks real-time visibility into adversary tooling, timing, and variance | Detection rules target the artifact (filename, hash) instead of the behavior (technique, sub-technique) |
| Feedback latency | Weeks elapse between attack execution and detection validation; no opportunity to iterate on a technique in real time | A detection gap exposed in Q1 remains open until Q2 or Q3; mean time to remediate a detection gap averages 87 days (Picus Blue Report 2025) |
| Measurement loss | No per-step observation data; only aggregate pass/fail results recorded | MTTD and MTTR cannot be measured for individual techniques; coverage maps remain estimates |
Purple teaming is not a team structure — it is an operating model that eliminates all three problems by embedding offensive and defensive operators in the same execution loop.
Purple Teaming Defined
A purple team exercise is a structured, time-boxed engagement where offensive and defensive participants execute and observe adversary techniques in real time. The purple team is not a third team. It is the collaboration layer between red and blue — sometimes a dedicated facilitator, sometimes a shared protocol, always a real-time feedback channel.
Three attributes distinguish purple teaming from other security testing:
- Technique-level granularity. The unit of work is a single ATT&CK technique or sub-technique — not a full kill chain. Each step is executed, observed, measured, and iterated before proceeding.
- Immediate feedback. The red operator declares the technique before execution. The blue operator confirms detection (or non-detection) within minutes, not weeks. If detection fails, both sides collaborate on root cause while the telemetry is still in the SIEM.
- Coverage as the output. The deliverable is not a PDF report — it is an updated coverage map with per-technique detection status, gap classification, and a prioritized remediation backlog.
The Purple Team Exercise Lifecycle
A purple team exercise follows a six-phase lifecycle that maps directly to the five-phase threat-informed cycle introduced in Part 1 (Profile → Map → Assess → Emulate → Iterate):
Phase 1: Scope and Threat Profile
Define the threat group or technique set for the exercise. The selection criteria come from the coverage map: prioritize techniques that are (a) high-prevalence in the relevant threat landscape, (b) currently at gap status in the coverage map, or (c) recently added or modified in ATT&CK updates.
Example scope for a financial services organization targeting APT28 (Fancy Bear):
- T1566.002 — Spearphishing Link
- T1195 — Supply Chain Compromise
- T1078 — Valid Accounts
- T1059.001 — PowerShell
- T1053.005 — Scheduled Task
- T1087.001 — Account Discovery: Local Account
- T1046 — Network Service Discovery
- T1070.004 — File Deletion
This scope is narrow enough for a single two-day exercise but broad enough to exercise an attack path with lateral movement and persistence.
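A scope like this is easiest to track when it lives in a machine-readable form rather than a slide. The sketch below shows one minimal way to do that in Python; the `status` values are assumptions for illustration (the real values come from the organization's coverage map), and the field names are not a standard schema.

```python
# Hypothetical machine-readable exercise scope. Technique IDs and names
# match the scope list above; the "status" values are assumed examples,
# standing in for the current coverage-map entry per technique.
EXERCISE_SCOPE = [
    {"id": "T1566.002", "name": "Spearphishing Link", "status": "gap"},
    {"id": "T1195", "name": "Supply Chain Compromise", "status": "gap"},
    {"id": "T1078", "name": "Valid Accounts", "status": "detected"},
    {"id": "T1059.001", "name": "PowerShell", "status": "detected"},
    {"id": "T1053.005", "name": "Scheduled Task", "status": "gap"},
    {"id": "T1087.001", "name": "Account Discovery: Local Account", "status": "gap"},
    {"id": "T1046", "name": "Network Service Discovery", "status": "gap"},
    {"id": "T1070.004", "name": "File Deletion", "status": "detected"},
]

def techniques_at_gap(scope):
    """Return technique IDs still at gap status -- the exercise priorities."""
    return [t["id"] for t in scope if t["status"] == "gap"]
```

Keeping the scope as data means the same list drives the exercise run sheet, the observation log, and the post-exercise coverage map update.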
Phase 2: Pre-Exercise Telemetry Verification
Before executing a single technique, verify that the required telemetry sources are active and flowing. Part 4 established the gap classification triad: telemetry gap, detection gap, tuning gap. Running an emulation against a telemetry gap wastes time — the blue team cannot detect what they cannot see.
The pre-exercise checklist confirms:
| Telemetry Source | Required For | Verification |
|---|---|---|
| Sysmon (EID 1, 7, 10, 11) | T1055 Process Injection, T1059.001 PowerShell | Confirm Sysmon service running; verify EID 1 events arriving in SIEM with CommandLine field populated |
| PowerShell ScriptBlock Logging (EID 4104) | T1059.001 obfuscated commands | Execute test ScriptBlock; confirm 4104 events arrive with full script text |
| Windows Security Event Log (EID 4624, 4625, 4672) | T1078 Valid Accounts, T1087 Account Discovery | Generate test logon events; confirm arrival and field mapping |
| Azure AD / Entra ID Sign-in Logs | T1078.011 Cloud Accounts | Verify log export connector or diagnostic settings forwarding to SIEM |
| EDR telemetry | All endpoint techniques | Confirm agent health and event forwarding with a test process creation |
If a telemetry source is missing, pause the exercise for that technique. Document the gap, record it in the backlog, and proceed to techniques where telemetry is present. This discipline separates purple teaming from ad hoc red teaming — every observation is grounded in confirmed data availability.
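The checklist above can be scripted so Phase 2 produces a recorded result rather than a verbal "looks fine." The sketch below is illustrative only: `search_siem` is a stand-in for whatever query API the SIEM actually exposes (Splunk REST, Elastic DSL, Sentinel KQL), and the query strings are assumed placeholders, not real syntax for any specific product.

```python
from datetime import datetime, timedelta, timezone

# Pre-exercise telemetry checks: (source name, query that should return
# recent events). Query strings are hypothetical pseudo-syntax; replace
# with the SIEM's real query language.
CHECKS = [
    ("sysmon_eid1", 'source="sysmon" EventID=1 CommandLine=*'),
    ("ps_scriptblock", 'source="powershell" EventID=4104'),
    ("win_security", 'source="security" EventID IN (4624, 4625, 4672)'),
]

def verify_telemetry(search_siem, lookback_minutes=60):
    """Classify each source as ok or telemetry_gap before Phase 3 begins.

    `search_siem(query, since)` is an assumed callable returning a hit count.
    """
    since = datetime.now(timezone.utc) - timedelta(minutes=lookback_minutes)
    results = {}
    for name, query in CHECKS:
        hits = search_siem(query, since)
        results[name] = "ok" if hits > 0 else "telemetry_gap"
    return results
```

Any source that comes back `telemetry_gap` is recorded in the backlog and its techniques are skipped, exactly as the discipline above prescribes.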
Phase 3: Execute and Observe (The Core Loop)
The core of a purple team exercise is the execute-observe-classify loop, run once per technique:
- Declare. Red operator announces the technique ID and expected observable artifacts (e.g., "Executing T1059.001 via `Invoke-Expression` with base64-encoded payload; expect Sysmon EID 1 with CommandLine containing `-EncodedCommand`").
- Execute. Red operator runs the technique. Timing is recorded.
- Observe. Blue operator searches SIEM/EDR for expected signals. Timer starts at execution time.
- Classify. Blue operator reports one of four outcomes:
- Detected — alert fired, correct technique mapped
- Detected — No Alert — telemetry present, detection logic exists but threshold or context filter suppressed the alert
- Telemetry Present — No Detection — data is in the SIEM but no rule covers this technique variant
- No Telemetry — required data source not flowing (should have been caught in Phase 2)
- Record MTTD. If detected, measure the time from execution to alert. If not detected, mark MTTD as gap.
- Iterate (optional). If the detection failed and time permits, blue operator drafts a detection rule on the spot. Red operator re-executes the technique to validate. This in-exercise iteration is the highest-value activity in purple teaming — it turns a gap into a closed detection within hours instead of months.
A single technique loop takes 10–30 minutes depending on complexity. A two-day exercise with eight-hour execution windows covers 16–48 technique executions, including re-runs for validation.
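One record per loop iteration is enough to capture everything the later phases need. A minimal sketch, assuming nothing beyond the four classifications and timing described above (the field names are illustrative, not a standard schema):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# The four outcomes of the classify step, as machine-readable labels.
CLASSIFICATIONS = {
    "detected",                  # alert fired, correct technique mapped
    "detected_no_alert",         # telemetry + logic present, alert suppressed
    "telemetry_no_detection",    # data in SIEM, no rule covers the variant
    "no_telemetry",              # data source not flowing
}

@dataclass
class TechniqueRun:
    technique_id: str
    executed_at: datetime
    classification: str
    alerted_at: Optional[datetime] = None  # set only when an alert fired

    def __post_init__(self):
        assert self.classification in CLASSIFICATIONS

    def mttd_seconds(self):
        """Time from execution to alert; None marks an MTTD gap."""
        if self.classification == "detected" and self.alerted_at:
            return (self.alerted_at - self.executed_at).total_seconds()
        return None
```

The `mttd_seconds()` value of `None` corresponds to the "mark MTTD as gap" rule in the loop above.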
Phase 4: Gap Triage and Sprint Planning
After the exercise, every technique has a classification. Translate these into the backlog using the Part 4 gap triage framework:
| Classification | Gap Type | Typical Resolution | Priority Signal |
|---|---|---|---|
| No Telemetry | Telemetry gap | Deploy missing data source (Sysmon, ScriptBlock Logging, CloudTrail) | Always P1 — detection is impossible without data |
| Telemetry Present — No Detection | Detection gap | Author new Sigma rule or SIEM-native detection | P1 if technique is top-10 prevalence; P2 otherwise |
| Detected — No Alert | Tuning gap | Adjust threshold, add context filter, or fix allowlisting error | P2 — tuning is faster than authoring but risks false positives if rushed |
| Detected | None (validated) | N/A — update coverage map to validated status | N/A |
The sprint plan follows the Part 4 emulation-to-sprint loop: receive the exercise report, classify gaps, prioritize by technique prevalence, execute a two-week detection sprint, and re-validate in the next purple team exercise.
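The triage table above is mechanical enough to encode directly. A sketch, where `top10_prevalence` is an assumed input: the set of technique IDs ranked top-10 in the organization's threat profile.

```python
# Classification-to-priority mapping from the triage table above.
# Labels mirror the four classify-step outcomes.
def triage(classification, technique_id, top10_prevalence):
    """Return (gap_type, priority) for one exercised technique."""
    if classification == "no_telemetry":
        return ("telemetry_gap", "P1")  # detection impossible without data
    if classification == "telemetry_no_detection":
        prio = "P1" if technique_id in top10_prevalence else "P2"
        return ("detection_gap", prio)
    if classification == "detected_no_alert":
        return ("tuning_gap", "P2")  # faster than authoring, tune carefully
    return (None, None)  # detected: no gap, update coverage map to validated
```

Running every exercise record through a function like this yields the sprint backlog directly, with no manual re-reading of the exercise notes.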
Phase 5: Coverage Map Update
Every technique exercised gets its status updated in the organization's coverage map. The coverage map — introduced in Part 1 — tracks per-technique status across three states: detected, mitigated, gap. Purple team exercises add a fourth state: validated. A technique marked detected based on rule existence but never exercised is an assumption. A technique marked validated has been exercised against real adversary behavior and confirmed to produce an alert.
The maturity progression from Part 4 maps directly:
- Level 1 — Ad-Hoc: Coverage map is aspirational; no exercises conducted
- Level 2 — Mapped: Coverage map exists; rules written but untested against adversary execution
- Level 3 — Validated: Purple team exercises have confirmed detection for exercised techniques; gaps are documented and prioritized
- Level 4 — Continuously Validated: Purple team exercises run on a cadence (quarterly or per-ATT&CK-update); new techniques are validated within 30 days of mapping
Most organizations sit at Level 2. Moving to Level 3 requires one discipline: running the execute-observe-classify loop on a recurring basis.
Phase 6: Iterate
The fifth phase of the threat-informed cycle is iterate — and it is where purple teaming becomes a continuous practice rather than a periodic event. Three iteration triggers restart the lifecycle:
- ATT&CK update — MITRE releases major ATT&CK updates twice per year (typically April and October). New techniques and sub-techniques invalidate parts of the coverage map. Each update is a trigger for a scoped exercise.
- Threat landscape shift — A new threat group profile relevant to the organization's vertical (e.g., Volt Typhoon for critical infrastructure, Lazarus for financial services) demands a targeted exercise against that group's technique set.
- Detection sprint completion — When a two-week sprint closes gaps from the previous exercise, the next exercise validates those closures. This creates a validate-remediate-revalidate cadence.
Running Purple Team Exercises: The Operational Playbook
Beyond the lifecycle, three operational considerations determine whether purple teaming produces lasting value or becomes another shelfware exercise.
Facilitation and Roles
Every purple team exercise needs a facilitator — someone who is neither executing techniques nor writing detections. The facilitator enforces the loop protocol, records observations and timing, and prevents the two most common exercise failures:
- Scope creep — Part 3 identified this explicitly. A red operator discovers a new attack vector mid-exercise and chases it. The facilitator records the discovery for a future exercise but redirects back to the scoped technique list.
- Debug drift — A detection fails and both operators spend 90 minutes troubleshooting the SIEM query grammar. The facilitator caps debugging at 15 minutes per technique; unresolved failures go to the backlog.
Role mapping:
| Role | Responsibility | Typical Assignment |
|---|---|---|
| Facilitator | Declare-execute-observe protocol enforcement, timer, scope guard, observation logging | Security architect or detection engineering lead (neutral party) |
| Red Operator | Technique execution, artifact declaration, variant testing | Pen test team member or adversary emulation specialist |
| Blue Operator | SIEM/EDR observation, detection classification, on-the-spot rule authoring | Detection engineer or SOC analyst |
| Observer | Shadow and learn; ask questions during debrief | SOC analysts, incident responders, threat intelligence analysts |
Communication Protocol
The declare-execute-observe loop requires a structured communication channel. Two formats work in practice:
- Co-located: A shared screen (SIEM dashboard) and verbal declarations. Red operator announces technique, blue operator queries in real time. Fastest iteration — 5–10 minutes per technique.
- Remote: A dedicated chat channel (Slack, Teams) with a templated message format: `[TECHNIQUE] T1059.001 | [VARIANT] EncodedCommand | [OBSERVABLES] Sysmon EID 1, CommandLine contains -EncodedCommand | [STATUS] Executing`. Blue operator replies with classification. Slightly slower but works across time zones and creates an automatic log.
Avoid email-based coordination — the latency destroys the real-time loop that makes purple teaming effective.
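Because the remote format is templated, the chat log doubles as the observation log. A small formatter/parser pair makes that automatic; the pipe-delimited layout follows the template above, and the field names are assumptions for illustration.

```python
import re

# Declaration template matching the chat-message format described above.
TEMPLATE = "[TECHNIQUE] {tid} | [VARIANT] {variant} | [OBSERVABLES] {obs} | [STATUS] {status}"

def format_declaration(tid, variant, obs, status="Executing"):
    """Build the red operator's declaration message."""
    return TEMPLATE.format(tid=tid, variant=variant, obs=obs, status=status)

def parse_declaration(msg):
    """Pull the fields back out so the observation log builds itself."""
    m = re.match(
        r"\[TECHNIQUE\] (?P<tid>\S+) \| \[VARIANT\] (?P<variant>.+?) \| "
        r"\[OBSERVABLES\] (?P<obs>.+?) \| \[STATUS\] (?P<status>.+)",
        msg,
    )
    return m.groupdict() if m else None
```

A bot that parses each declaration and timestamps the blue operator's reply gives you MTTD measurement for free, with no separate note-taking.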
Tooling Alignment
Offensive and defensive tooling must be aligned before the exercise. Mismatches waste hours:
| Alignment Check | Failure Mode | Resolution |
|---|---|---|
| SIEM field names match expected detection rules | Sigma rule references CommandLine but SIEM maps it as command_line; detection misfires | Run sigma-cli conversion with correct pipeline before exercise |
| EDR agent supports required telemetry | Sysmon EID 10 (ProcessAccess) not forwarded; T1055 detection impossible | Deploy missing Sysmon configuration or EDR sensor update in Phase 2 |
| Red team tooling matches exercise fidelity tier | Atomic Red Team test creates a benign signal (notepad.exe spawning from PowerShell); CALDERA or C2 framework creates authentic adversary tooling; mismatch on fidelity expectations causes confusion | Agree on fidelity tier per technique before exercise: unit test (Atomic), integration test (CALDERA), or end-to-end (C2 + CTID plan) |
| Network architecture reflects production | Exercise runs in flat lab network; production has segmentation, proxy, Zero Trust policies; detection rules validated in lab fail in production | Use production-adjacent environment or canary testing with limited-scope production execution |
Measuring Purple Team Outcomes
Five metrics track the effectiveness of a purple team program over time:
| Metric | Definition | Benchmark (2025) | Target |
|---|---|---|---|
| Detection Coverage Rate | Percentage of exercised techniques that produced an alert | 24% industry average (Picus Blue Report 2025) | >50% after first exercise cycle; >70% after three cycles |
| Mean Time to Detect (MTTD) | Average time from technique execution to alert generation | CrowdStrike 2026: breakout time is 29 minutes (down from 48 min) | MTTD < breakout time for all high-prevalence techniques |
| Gap Closure Rate | Percentage of identified gaps closed within one sprint cycle (2 weeks) | No industry benchmark; informal estimates suggest 30–40% | >60% for P1 gaps; >40% for P2 gaps |
| Revalidation Pass Rate | Percentage of previously validated techniques that still produce alerts in subsequent exercises | No industry benchmark | >90% (regression below this signals rule decay or infrastructure change) |
| Exercise Velocity | Number of technique executions per exercise day | 10–15 techniques/day (co-located with experienced team) | Stable or increasing; declining velocity signals process friction |
The most revealing metric is the revalidation pass rate. A detection rule that passed in March but fails in June is a regression — typically caused by SIEM schema changes, log source outages, or adversary technique variance. Continuous purple teaming catches regression before a real adversary does.
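The two rate metrics fall straight out of the per-technique records. A sketch, assuming each record is a dict carrying at least a `classification` field and, for revalidation runs, a `previously_validated` flag (both field names are illustrative):

```python
def detection_coverage_rate(runs):
    """Share of exercised techniques that produced an alert."""
    if not runs:
        return 0.0
    detected = sum(1 for r in runs if r["classification"] == "detected")
    return detected / len(runs)

def revalidation_pass_rate(runs):
    """Share of previously validated techniques that still alert.

    Returns None when no technique in this exercise was a revalidation.
    """
    revals = [r for r in runs if r.get("previously_validated")]
    if not revals:
        return None
    passed = sum(1 for r in revals if r["classification"] == "detected")
    return passed / len(revals)
```

Tracking both per exercise cycle separates "we found new gaps" (coverage rate) from "old detections regressed" (revalidation pass rate), which call for different responses.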
Exceptions and Limits
Purple teaming is powerful within its domain but carries boundaries worth stating explicitly:
- LotL techniques resist structured loops. Living-off-the-land techniques (T1059.001 PowerShell, T1078 Valid Accounts, T1046 Network Service Discovery) use legitimate admin tools. The declare-execute-observe loop still works, but detection requires behavioral baselining and contextual correlation — not a single-event Sigma rule. Blue operators must assess whether the observed behavior is within baseline or anomalous, which takes longer and introduces subjectivity. Budget 30–45 minutes per LotL technique instead of 10–15.
- Cloud and identity layers have different observation models. Endpoint telemetry (Sysmon, EDR) provides rich per-event data. Cloud audit logs (CloudTrail, Azure Activity, Entra ID sign-in logs) are coarser — they record API calls, not process trees. A purple team exercise targeting T1078.011 Cloud Accounts or T1098 Account Manipulation requires the blue operator to work with log analytics (KQL, Log Analytics) rather than a SIEM correlation engine. The loop structure is the same; the tooling and query patterns differ.
- Zero-day techniques are outside scope by definition. Purple teaming validates detection against known techniques. Novel techniques (those not yet in ATT&CK) require a different discipline — threat hunting informed by anomaly detection and threat intelligence. Part 6 in this series covers AI-augmented approaches to this problem.
- Organizational readiness is a prerequisite, not a result. A team at detection maturity Level 1 (Ad-Hoc) cannot run a productive purple team exercise — they lack the telemetry, detection rules, and coverage map to classify observations. The progression is sequential: map controls to ATT&CK (Level 2), then validate via purple exercises (Level 3). Running purple exercises prematurely produces frustration and no useful output.
- Exercise fatigue is real. Quarterly exercises covering the same technique set produce diminishing returns once coverage exceeds 70%. At that point, the exercise scope should shift to newly added ATT&CK techniques, threat group profiles not yet emulated, or cloud/identity layers not yet exercised. Stagnation kills program momentum.
Honest Assessment
| Dimension | Siloed Red + Blue (Traditional) | Purple Team Operations |
|---|---|---|
| Feedback latency | Weeks to months (report delivery cycle) | Minutes (real-time declare-observe loop) |
| Detection coverage visibility | Estimated from rule counts; unvalidated | Measured per technique; validated against adversary execution |
| Gap classification accuracy | Inferred from post-exercise reconstruction | Classified live with telemetry confirmed in Phase 2 |
| Mean time to close a detection gap | 87 days average (Picus 2025) | Hours for in-exercise iteration; 2 weeks for sprint-triaged gaps |
| Regression detection | None — rules are written and forgotten | Revalidation pass rate catches rule decay proactively |
| Organizational cost | Low per event; high cumulative (repeat testing without progress) | Higher per exercise (dedicated facilitation, 2-day block); lower cumulative (each exercise moves the coverage map forward) |
The trade-off is upfront investment versus compounding returns. The first purple team exercise costs more in coordination and facilitation than a traditional red team engagement. The third exercise costs less than a red team retest — because the coverage map has progressed, the scope narrows to net-new techniques, and the operating rhythm is established.
Actionable Takeaways
- Run the declare-execute-observe loop before building anything new. The quickest win in purple teaming is confirming which existing detections actually fire against adversary execution. The coverage map correction from assumption to validation often reveals that 30–40% of "detected" techniques are actually tuning gaps.
- Appoint a facilitator and protect the scope. Scope creep and debug drift destroy more purple team exercises than any other factor. A neutral facilitator with a timer and a technique list is the single highest-leverage role in the exercise.
- Verify telemetry before executing techniques. Phase 2 is not optional. Running an exercise against missing telemetry generates "No Telemetry" classifications that should have been caught in pre-work. Every "No Telemetry" result is a wasted technique slot.
- Track revalidation pass rate as the leading health indicator. Coverage at a single point in time is a snapshot. Coverage that stays stable across exercise cycles is a program. If revalidation pass rate drops below 90%, investigate immediately — rule decay is often the first sign of a SIEM or EDR configuration change that will affect real incident detection.
- Transition from periodic events to a continuous cadence. One purple team exercise per year is a project. One per quarter is a practice. One per ATT&CK update cycle is a program. The iteration trigger model (ATT&CK update + threat landscape shift + sprint completion) provides a natural cadence that is event-driven rather than calendar-driven.
This is part 5 in a series on threat-informed defense. Start with part 1. Part 6 will cover AI-augmented threat profiling — using large language models and graph analysis to extend threat-informed defense beyond the known technique catalog.