Purple Teaming Operations: Closing the Gap Between Offense and Defense
The average detection coverage rate sits at 24% — and the primary cause is not tool failure but organizational separation. Red teams operate in isolation, blue teams review findings weeks later in a PDF, and the gap between what an adversary actually does and what defenders actually detect remains unmeasured and unmanaged. Purple teaming closes that gap by fusing offense and defense into a single operational cycle where every attack step produces an immediate defensive outcome.
This is part 5 in a series on threat-informed defense. Start with part 1.
The Siloed Team Problem
Part 1 identified siloed purple teams as one of five recurring failure modes in threat-informed defense programs. The pattern is consistent across organizations: a red team executes an adversary emulation, documents findings in a slide deck, delivers it to the blue team in a debrief two weeks later, and moves on. The blue team then tries to reconstruct the attack path from memory and screenshots, writes detection rules based on incomplete context, and tests those rules in isolation. The cycle repeats quarterly — or annually.
Three structural problems emerge from this separation:
| Problem | Mechanism | Impact |
|---|---|---|
| Context decay | Red team context degrades between execution and debrief; blue team lacks real-time visibility into adversary tooling, timing, and variance | Detection rules target the artifact (filename, hash) instead of the behavior (technique, sub-technique) |
| Feedback latency | Weeks elapse between attack execution and detection validation; no opportunity to iterate on a technique in real time | A detection gap exposed in Q1 remains open until Q2 or Q3; mean time to remediate a detection gap averages 87 days (Picus Blue Report 2025) |
| Measurement loss | No per-step observation data; only aggregate pass/fail results recorded | MTTD and MTTR cannot be measured for individual techniques; coverage maps remain estimates |
Purple teaming is not a team structure — it is an operating model that eliminates all three problems by embedding offensive and defensive operators in the same execution loop.
Purple Teaming Defined
A purple team exercise is a structured, time-boxed engagement where offensive and defensive participants execute and observe adversary techniques in real time. The purple team is not a third team. It is the collaboration layer between red and blue — sometimes a dedicated facilitator, sometimes a shared protocol, always a real-time feedback channel.
Three attributes distinguish purple teaming from other security testing:
- Technique-level granularity. The unit of work is a single ATT&CK technique or sub-technique — not a full kill chain. Each step is executed, observed, measured, and iterated before proceeding.
- Immediate feedback. The red operator declares the technique before execution. The blue operator confirms detection (or non-detection) within minutes, not weeks. If detection fails, both sides collaborate on root cause while the telemetry is still in the SIEM.
- Coverage as the output. The deliverable is not a PDF report — it is an updated coverage map with per-technique detection status, gap classification, and a prioritized remediation backlog.
The Purple Team Exercise Lifecycle
A purple team exercise follows a six-phase lifecycle that maps directly to the five-phase threat-informed cycle introduced in Part 1 (Profile → Map → Assess → Emulate → Iterate):
Phase 1: Scope and Threat Profile
Define the threat group or technique set for the exercise. The selection criteria come from the coverage map: prioritize techniques that are (a) high-prevalence in the relevant threat landscape, (b) currently at gap status in the coverage map, or (c) recently added or modified in ATT&CK updates.
Example scope for a financial services organization targeting APT28 (Fancy Bear):
- T1566.002 — Spearphishing Link
- T1195 — Supply Chain Compromise
- T1078 — Valid Accounts
- T1059.001 — PowerShell
- T1053.005 — Scheduled Task
- T1087.001 — Account Discovery: Local Account
- T1046 — Network Service Discovery
- T1070.004 — File Deletion
This scope is narrow enough for a single two-day exercise but broad enough to exercise an attack path with lateral movement and persistence.
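A scope like this is easiest to track when it lives in a machine-readable form rather than a slide. The sketch below shows one minimal way to do that in Python; the `status` values are assumptions for illustration (the real values come from the organization's coverage map), and the field names are not a standard schema.

```python
# Hypothetical machine-readable exercise scope. Technique IDs and names
# match the scope list above; the "status" values are assumed examples,
# standing in for the current coverage-map entry per technique.
EXERCISE_SCOPE = [
    {"id": "T1566.002", "name": "Spearphishing Link", "status": "gap"},
    {"id": "T1195", "name": "Supply Chain Compromise", "status": "gap"},
    {"id": "T1078", "name": "Valid Accounts", "status": "detected"},
    {"id": "T1059.001", "name": "PowerShell", "status": "detected"},
    {"id": "T1053.005", "name": "Scheduled Task", "status": "gap"},
    {"id": "T1087.001", "name": "Account Discovery: Local Account", "status": "gap"},
    {"id": "T1046", "name": "Network Service Discovery", "status": "gap"},
    {"id": "T1070.004", "name": "File Deletion", "status": "detected"},
]

def techniques_at_gap(scope):
    """Return technique IDs still at gap status -- the exercise priorities."""
    return [t["id"] for t in scope if t["status"] == "gap"]
```

Keeping the scope as data means the same list drives the exercise run sheet, the observation log, and the post-exercise coverage map update.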
Phase 2: Pre-Exercise Telemetry Verification
Before executing a single technique, verify that the required telemetry sources are active and flowing. Part 4 established the gap classification triad: telemetry gap, detection gap, tuning gap. Running an emulation against a telemetry gap wastes time — the blue team cannot detect what they cannot see.
The pre-exercise checklist confirms:
| Telemetry Source | Required For | Verification |
|---|---|---|
| Sysmon (EID 1, 7, 10, 11) | T1055 Process Injection, T1059.001 PowerShell | Confirm Sysmon service running; verify EID 1 events arriving in SIEM with CommandLine field populated |
| PowerShell ScriptBlock Logging (EID 4104) | T1059.001 obfuscated commands | Execute test ScriptBlock; confirm 4104 events arrive with full script text |
| Windows Security Event Log (EID 4624, 4625, 4672) | T1078 Valid Accounts, T1087 Account Discovery | Generate test logon events; confirm arrival and field mapping |
| Azure AD / Entra ID Sign-in Logs | T1078.011 Cloud Accounts | Verify log export connector or diagnostic settings forwarding to SIEM |
| EDR telemetry | All endpoint techniques | Confirm agent health and event forwarding with a test process creation |
If a telemetry source is missing, pause the exercise for that technique. Document the gap, record it in the backlog, and proceed to techniques where telemetry is present. This discipline separates purple teaming from ad hoc red teaming — every observation is grounded in confirmed data availability.
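The checklist above can be scripted so Phase 2 produces a recorded result rather than a verbal "looks fine." The sketch below is illustrative only: `search_siem` is a stand-in for whatever query API the SIEM actually exposes (Splunk REST, Elastic DSL, Sentinel KQL), and the query strings are assumed placeholders, not real syntax for any specific product.

```python
from datetime import datetime, timedelta, timezone

# Pre-exercise telemetry checks: (source name, query that should return
# recent events). Query strings are hypothetical pseudo-syntax; replace
# with the SIEM's real query language.
CHECKS = [
    ("sysmon_eid1", 'source="sysmon" EventID=1 CommandLine=*'),
    ("ps_scriptblock", 'source="powershell" EventID=4104'),
    ("win_security", 'source="security" EventID IN (4624, 4625, 4672)'),
]

def verify_telemetry(search_siem, lookback_minutes=60):
    """Classify each source as ok or telemetry_gap before Phase 3 begins.

    `search_siem(query, since)` is an assumed callable returning a hit count.
    """
    since = datetime.now(timezone.utc) - timedelta(minutes=lookback_minutes)
    results = {}
    for name, query in CHECKS:
        hits = search_siem(query, since)
        results[name] = "ok" if hits > 0 else "telemetry_gap"
    return results
```

Any source that comes back `telemetry_gap` is recorded in the backlog and its techniques are skipped, exactly as the discipline above prescribes.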
Phase 3: Execute and Observe (The Core Loop)
The core of a purple team exercise is the execute-observe-classify loop, run once per technique:
- Declare. Red operator announces the technique ID and expected observable artifacts (e.g., "Executing T1059.001 via `Invoke-Expression` with base64-encoded payload; expect Sysmon EID 1 with CommandLine containing `-EncodedCommand`").
- Execute. Red operator runs the technique. Timing is recorded.
- Observe. Blue operator searches SIEM/EDR for expected signals. Timer starts at execution time.
- Classify. Blue operator reports one of four outcomes:
- Detected — alert fired, correct technique mapped
- Detected — No Alert — telemetry present, detection logic exists but threshold or context filter suppressed the alert
- Telemetry Present — No Detection — data is in the SIEM but no rule covers this technique variant
- No Telemetry — required data source not flowing (should have been caught in Phase 2)
- Record MTTD. If detected, measure the time from execution to alert. If not detected, mark MTTD as gap.
- Iterate (optional). If the detection failed and time permits, blue operator drafts a detection rule on the spot. Red operator re-executes the technique to validate. This in-exercise iteration is the highest-value activity in purple teaming — it turns a gap into a closed detection within hours instead of months.
A single technique loop takes 10–30 minutes depending on complexity. A two-day exercise with eight-hour execution windows covers 16–48 technique executions, including re-runs for validation.
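One record per loop iteration is enough to capture everything the later phases need. A minimal sketch, assuming nothing beyond the four classifications and timing described above (the field names are illustrative, not a standard schema):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# The four outcomes of the classify step, as machine-readable labels.
CLASSIFICATIONS = {
    "detected",                  # alert fired, correct technique mapped
    "detected_no_alert",         # telemetry + logic present, alert suppressed
    "telemetry_no_detection",    # data in SIEM, no rule covers the variant
    "no_telemetry",              # data source not flowing
}

@dataclass
class TechniqueRun:
    technique_id: str
    executed_at: datetime
    classification: str
    alerted_at: Optional[datetime] = None  # set only when an alert fired

    def __post_init__(self):
        assert self.classification in CLASSIFICATIONS

    def mttd_seconds(self):
        """Time from execution to alert; None marks an MTTD gap."""
        if self.classification == "detected" and self.alerted_at:
            return (self.alerted_at - self.executed_at).total_seconds()
        return None
```

The `mttd_seconds()` value of `None` corresponds to the "mark MTTD as gap" rule in the loop above.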
Phase 4: Gap Triage and Sprint Planning
After the exercise, every technique has a classification. Translate these into the backlog using the Part 4 gap triage framework:
| Classification | Gap Type | Typical Resolution | Priority Signal |
|---|---|---|---|
| No Telemetry | Telemetry gap | Deploy missing data source (Sysmon, ScriptBlock Logging, CloudTrail) | Always P1 — detection is impossible without data |
| Telemetry Present — No Detection | Detection gap | Author new Sigma rule or SIEM-native detection | P1 if technique is top-10 prevalence; P2 otherwise |
| Detected — No Alert | Tuning gap | Adjust threshold, add context filter, or fix allowlisting error | P2 — tuning is faster than authoring but risks false positives if rushed |
| Detected | None (validated) | N/A — update coverage map to validated status | N/A |
The sprint plan follows the Part 4 emulation-to-sprint loop: receive the exercise report, classify gaps, prioritize by technique prevalence, execute a two-week detection sprint, and re-validate in the next purple team exercise.
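The triage table above is mechanical enough to encode directly. A sketch, where `top10_prevalence` is an assumed input: the set of technique IDs ranked top-10 in the organization's threat profile.

```python
# Classification-to-priority mapping from the triage table above.
# Labels mirror the four classify-step outcomes.
def triage(classification, technique_id, top10_prevalence):
    """Return (gap_type, priority) for one exercised technique."""
    if classification == "no_telemetry":
        return ("telemetry_gap", "P1")  # detection impossible without data
    if classification == "telemetry_no_detection":
        prio = "P1" if technique_id in top10_prevalence else "P2"
        return ("detection_gap", prio)
    if classification == "detected_no_alert":
        return ("tuning_gap", "P2")  # faster than authoring, tune carefully
    return (None, None)  # detected: no gap, update coverage map to validated
```

Running every exercise record through a function like this yields the sprint backlog directly, with no manual re-reading of the exercise notes.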
Phase 5: Coverage Map Update
Every technique exercised gets its status updated in the organization's coverage map. The coverage map — introduced in Part 1 — tracks per-technique status across three states: detected, mitigated, gap. Purple team exercises add a fourth state: validated. A technique marked detected based on rule existence but never exercised is an assumption. A technique marked validated has been exercised against real adversary behavior and confirmed to produce an alert.
The maturity progression from Part 4 maps directly:
- Level 1 — Ad-Hoc: Coverage map is aspirational; no exercises conducted
- Level 2 — Mapped: Coverage map exists; rules written but untested against adversary execution
- Level 3 — Validated: Purple team exercises have confirmed detection for exercised techniques; gaps are documented and prioritized
- Level 4 — Continuously Validated: Purple team exercises run on a cadence (quarterly or per-ATT&CK-update); new techniques are validated within 30 days of mapping
Most organizations sit at Level 2. Moving to Level 3 requires one discipline: running the execute-observe-classify loop on a recurring basis.
Phase 6: Iterate
The fifth phase of the threat-informed cycle is iterate — and it is where purple teaming becomes a continuous practice rather than a periodic event. Three iteration triggers restart the lifecycle:
- ATT&CK update — MITRE releases major ATT&CK updates twice per year (typically April and October). New techniques and sub-techniques invalidate parts of the coverage map. Each update is a trigger for a scoped exercise.
- Threat landscape shift — A new threat group profile relevant to the organization's vertical (e.g., Volt Typhoon for critical infrastructure, Lazarus for financial services) demands a targeted exercise against that group's technique set.
- Detection sprint completion — When a two-week sprint closes gaps from the previous exercise, the next exercise validates those closures. This creates a validate-remediate-revalidate cadence.
Running Purple Team Exercises: The Operational Playbook
Beyond the lifecycle, three operational considerations determine whether purple teaming produces lasting value or becomes another shelfware exercise.
Facilitation and Roles
Every purple team exercise needs a facilitator — someone who is neither executing techniques nor writing detections. The facilitator enforces the loop protocol, records observations and timing, and prevents the two most common exercise failures:
- Scope creep — Part 3 identified this explicitly. A red operator discovers a new attack vector mid-exercise and chases it. The facilitator records the discovery for a future exercise but redirects back to the scoped technique list.
- Debug drift — A detection fails and both operators spend 90 minutes troubleshooting the SIEM query grammar. The facilitator caps debugging at 15 minutes per technique; unresolved failures go to the backlog.
Role mapping:
| Role | Responsibility | Typical Assignment |
|---|---|---|
| Facilitator | Declare-execute-observe protocol enforcement, timer, scope guard, observation logging | Security architect or detection engineering lead (neutral party) |
| Red Operator | Technique execution, artifact declaration, variant testing | Pen test team member or adversary emulation specialist |
| Blue Operator | SIEM/EDR observation, detection classification, on-the-spot rule authoring | Detection engineer or SOC analyst |
| Observer | Shadow and learn; ask questions during debrief | SOC analysts, incident responders, threat intelligence analysts |
Communication Protocol
The declare-execute-observe loop requires a structured communication channel. Two formats work in practice:
- Co-located: A shared screen (SIEM dashboard) and verbal declarations. Red operator announces technique, blue operator queries in real time. Fastest iteration — 5–10 minutes per technique.
- Remote: A dedicated chat channel (Slack, Teams) with a templated message format: `[TECHNIQUE] T1059.001 | [VARIANT] EncodedCommand | [OBSERVABLES] Sysmon EID 1, CommandLine contains -EncodedCommand | [STATUS] Executing`. Blue operator replies with classification. Slightly slower but works across time zones and creates an automatic log.
Avoid email-based coordination — the latency destroys the real-time loop that makes purple teaming effective.
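Because the remote format is templated, the chat log doubles as the observation log. A small formatter/parser pair makes that automatic; the pipe-delimited layout follows the template above, and the field names are assumptions for illustration.

```python
import re

# Declaration template matching the chat-message format described above.
TEMPLATE = "[TECHNIQUE] {tid} | [VARIANT] {variant} | [OBSERVABLES] {obs} | [STATUS] {status}"

def format_declaration(tid, variant, obs, status="Executing"):
    """Build the red operator's declaration message."""
    return TEMPLATE.format(tid=tid, variant=variant, obs=obs, status=status)

def parse_declaration(msg):
    """Pull the fields back out so the observation log builds itself."""
    m = re.match(
        r"\[TECHNIQUE\] (?P<tid>\S+) \| \[VARIANT\] (?P<variant>.+?) \| "
        r"\[OBSERVABLES\] (?P<obs>.+?) \| \[STATUS\] (?P<status>.+)",
        msg,
    )
    return m.groupdict() if m else None
```

A bot that parses each declaration and timestamps the blue operator's reply gives you MTTD measurement for free, with no separate note-taking.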
Tooling Alignment
Offensive and defensive tooling must be aligned before the exercise. Mismatches waste hours:
| Alignment Check | Failure Mode | Resolution |
|---|---|---|
| SIEM field names match expected detection rules | Sigma rule references CommandLine but SIEM maps it as command_line; detection misfires | Run sigma-cli conversion with correct pipeline before exercise |
| EDR agent supports required telemetry | Sysmon EID 10 (ProcessAccess) not forwarded; T1055 detection impossible | Deploy missing Sysmon configuration or EDR sensor update in Phase 2 |
| Red team tooling matches exercise fidelity tier | Atomic Red Team test creates a benign signal (notepad.exe spawning from PowerShell); CALDERA or C2 framework creates authentic adversary tooling; mismatch on fidelity expectations causes confusion | Agree on fidelity tier per technique before exercise: unit test (Atomic), integration test (CALDERA), or end-to-end (C2 + CTID plan) |
| Network architecture reflects production | Exercise runs in flat lab network; production has segmentation, proxy, Zero Trust policies; detection rules validated in lab fail in production | Use production-adjacent environment or canary testing with limited-scope production execution |
Measuring Purple Team Outcomes
Five metrics track the effectiveness of a purple team program over time:
| Metric | Definition | Benchmark (2025) | Target |
|---|---|---|---|
| Detection Coverage Rate | Percentage of exercised techniques that produced an alert | 24% industry average (Picus Blue Report 2025) | >50% after first exercise cycle; >70% after three cycles |
| Mean Time to Detect (MTTD) | Average time from technique execution to alert generation | CrowdStrike 2026: breakout time is 29 minutes (down from 48 min) | MTTD < breakout time for all high-prevalence techniques |
| Gap Closure Rate | Percentage of identified gaps closed within one sprint cycle (2 weeks) | No industry benchmark; informal estimates suggest 30–40% | >60% for P1 gaps; >40% for P2 gaps |
| Revalidation Pass Rate | Percentage of previously validated techniques that still produce alerts in subsequent exercises | No industry benchmark | >90% (regression below this signals rule decay or infrastructure change) |
| Exercise Velocity | Number of technique executions per exercise day | 10–15 techniques/day (co-located with experienced team) | Stable or increasing; declining velocity signals process friction |
The most revealing metric is the revalidation pass rate. A detection rule that passed in March but fails in June is a regression — typically caused by SIEM schema changes, log source outages, or adversary technique variance. Continuous purple teaming catches regression before a real adversary does.
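The two rate metrics fall straight out of the per-technique records. A sketch, assuming each record is a dict carrying at least a `classification` field and, for revalidation runs, a `previously_validated` flag (both field names are illustrative):

```python
def detection_coverage_rate(runs):
    """Share of exercised techniques that produced an alert."""
    if not runs:
        return 0.0
    detected = sum(1 for r in runs if r["classification"] == "detected")
    return detected / len(runs)

def revalidation_pass_rate(runs):
    """Share of previously validated techniques that still alert.

    Returns None when no technique in this exercise was a revalidation.
    """
    revals = [r for r in runs if r.get("previously_validated")]
    if not revals:
        return None
    passed = sum(1 for r in revals if r["classification"] == "detected")
    return passed / len(revals)
```

Tracking both per exercise cycle separates "we found new gaps" (coverage rate) from "old detections regressed" (revalidation pass rate), which call for different responses.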
Exceptions and Limits
Purple teaming is powerful within its domain but carries boundaries worth stating explicitly:
- LotL techniques resist structured loops. Living-off-the-land techniques (T1059.001 PowerShell, T1078 Valid Accounts, T1046 Network Service Discovery) use legitimate admin tools. The declare-execute-observe loop still works, but detection requires behavioral baselining and contextual correlation — not a single-event Sigma rule. Blue operators must assess whether the observed behavior is within baseline or anomalous, which takes longer and introduces subjectivity. Budget 30–45 minutes per LotL technique instead of 10–15.
- Cloud and identity layers have different observation models. Endpoint telemetry (Sysmon, EDR) provides rich per-event data. Cloud audit logs (CloudTrail, Azure Activity, Entra ID sign-in logs) are coarser — they record API calls, not process trees. A purple team exercise targeting T1078.011 Cloud Accounts or T1098 Account Manipulation requires the blue operator to work with log analytics (KQL, Log Analytics) rather than a SIEM correlation engine. The loop structure is the same; the tooling and query patterns differ.
- Zero-day techniques are outside scope by definition. Purple teaming validates detection against known techniques. Novel techniques (those not yet in ATT&CK) require a different discipline — threat hunting informed by anomaly detection and threat intelligence. Part 6 in this series covers AI-augmented approaches to this problem.
- Organizational readiness is a prerequisite, not a result. A team at detection maturity Level 1 (Ad-Hoc) cannot run a productive purple team exercise — they lack the telemetry, detection rules, and coverage map to classify observations. The progression is sequential: map controls to ATT&CK (Level 2), then validate via purple exercises (Level 3). Running purple exercises prematurely produces frustration and no useful output.
- Exercise fatigue is real. Quarterly exercises covering the same technique set produce diminishing returns once coverage exceeds 70%. At that point, the exercise scope should shift to newly added ATT&CK techniques, threat group profiles not yet emulated, or cloud/identity layers not yet exercised. Stagnation kills program momentum.
Honest Assessment
| Dimension | Siloed Red + Blue (Traditional) | Purple Team Operations |
|---|---|---|
| Feedback latency | Weeks to months (report delivery cycle) | Minutes (real-time declare-observe loop) |
| Detection coverage visibility | Estimated from rule counts; unvalidated | Measured per technique; validated against adversary execution |
| Gap classification accuracy | Inferred from post-exercise reconstruction | Classified live with telemetry confirmed in Phase 2 |
| Mean time to close a detection gap | 87 days average (Picus 2025) | Hours for in-exercise iteration; 2 weeks for sprint-triaged gaps |
| Regression detection | None — rules are written and forgotten | Revalidation pass rate catches rule decay proactively |
| Organizational cost | Low per event; high cumulative (repeat testing without progress) | Higher per exercise (dedicated facilitation, 2-day block); lower cumulative (each exercise moves the coverage map forward) |
The trade-off is upfront investment versus compounding returns. The first purple team exercise costs more in coordination and facilitation than a traditional red team engagement. The third exercise costs less than a red team retest — because the coverage map has progressed, the scope narrows to net-new techniques, and the operating rhythm is established.
Actionable Takeaways
- Run the declare-execute-observe loop before building anything new. The quickest win in purple teaming is confirming which existing detections actually fire against adversary execution. The coverage map correction from assumption to validation often reveals that 30–40% of "detected" techniques are actually tuning gaps.
- Appoint a facilitator and protect the scope. Scope creep and debug drift destroy more purple team exercises than any other factor. A neutral facilitator with a timer and a technique list is the single highest-leverage role in the exercise.
- Verify telemetry before executing techniques. Phase 2 is not optional. Running an exercise against missing telemetry generates "No Telemetry" classifications that should have been caught in pre-work. Every "No Telemetry" result is a wasted technique slot.
- Track revalidation pass rate as the leading health indicator. Coverage at a single point in time is a snapshot. Coverage that stays stable across exercise cycles is a program. If revalidation pass rate drops below 90%, investigate immediately — rule decay is often the first sign of a SIEM or EDR configuration change that will affect real incident detection.
- Transition from periodic events to a continuous cadence. One purple team exercise per year is a project. One per quarter is a practice. One per ATT&CK update cycle is a program. The iteration trigger model (ATT&CK update + threat landscape shift + sprint completion) provides a natural cadence that is event-driven rather than calendar-driven.
This is part 5 in a series on threat-informed defense. Start with part 1. Part 6 will cover AI-augmented threat profiling — using large language models and graph analysis to extend threat-informed defense beyond the known technique catalog.