🚨

Incident Response Factory

Automated incident triage, investigation, and remediation pipeline

Operations · 7 stages · 4 specialists · v1.0.0

About

A seven-stage incident response pipeline that takes an alert through triage, log analysis, root cause identification, remediation planning, safety review, human approval, and postmortem documentation. Four specialists (guardian, analyst, engineer, and writer) hand work from stage to stage, and a reflection gate checks the remediation plan for safety and completeness before it reaches human approval.

Input / Output

Input

Incident alert or error report to investigate

alert

Output

Incident report with root cause analysis and remediation steps

min quality: 0.8

Pipeline Stages

triage

Execute

Assess incident severity, affected systems, and blast radius

👤 guardian 🔧 shell, file_read, grep

log analysis

Execute

Analyze logs, metrics, and traces to identify error patterns

👤 analyst 🔧 shell, file_read, grep ← triage

root cause

Execute

Determine root cause from log analysis and system state

👤 analyst 🔧 file_read, grep, shell ← log analysis

remediation plan

Execute

Propose remediation steps with rollback strategy

👤 engineer 🔧 file_read, file_write ← root cause
🔍

review plan

Reflect

Review remediation plan for safety and completeness

← remediation plan · quality ≥ 0.9 · max depth: 2
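The reflect stage can be read as a review-and-revise loop. The sketch below is one possible interpretation, not the factory's actual implementation: it assumes "max depth: 2" means at most two revision rounds, and the `review` and `revise` callables are hypothetical stand-ins for whatever performs the review and rewrites the plan.

```python
from typing import Any, Callable, Tuple


def reflect_gate(
    plan: Any,
    review: Callable[[Any], Tuple[float, str]],   # hypothetical: returns (score, feedback)
    revise: Callable[[Any, str], Any],            # hypothetical: returns a revised plan
    threshold: float = 0.9,                       # "quality >= 0.9" from the stage config
    max_depth: int = 2,                           # assumed to mean max revision rounds
) -> Tuple[Any, float, bool]:
    """Review the plan, revising it until it meets the threshold or depth runs out.

    Returns (final_plan, final_score, passed).
    """
    score, feedback = review(plan)
    depth = 0
    while score < threshold and depth < max_depth:
        plan = revise(plan, feedback)
        depth += 1
        score, feedback = review(plan)
    return plan, score, score >= threshold
```

Under this reading, a plan that still scores below 0.9 after two revisions comes back with `passed` set to `False`, and the caller decides whether to stop the pipeline or escalate.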

approval

Approval

Human approval before executing remediation

← review plan · timeout: 30m
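A human approval step with a 30-minute timeout could be modeled as a blocking wait on a decision queue. This is a minimal sketch under stated assumptions: the page does not say what happens when the timeout expires, so the code simply reports `"timed_out"` and leaves the next step to the caller. The function name and the queue-based delivery of decisions are illustrative, not part of the recipe.

```python
import queue

APPROVAL_TIMEOUT_SECONDS = 30 * 60  # "timeout: 30m" from the stage config


def wait_for_approval(
    decisions: "queue.Queue[bool]",
    timeout: float = APPROVAL_TIMEOUT_SECONDS,
) -> str:
    """Block until a human decision arrives or the timeout expires.

    Returns "approved", "rejected", or "timed_out". Treating a timeout as
    its own outcome (rather than an implicit approval) is an assumption.
    """
    try:
        approved = decisions.get(timeout=timeout)
    except queue.Empty:
        return "timed_out"
    return "approved" if approved else "rejected"
```

Keeping the timeout outcome distinct from a rejection lets the caller avoid running remediation without an explicit human decision.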

postmortem

Execute

Generate incident postmortem with timeline, root cause, and prevention measures

👤 writer 🔧 file_write ← approval
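The seven stages above form a linear dependency chain, which can be captured as plain data. The following is a rough sketch of that structure, not the recipe's real schema: the `Stage` class, its field names, and the `execution_order` helper are hypothetical, while the stage names, specialists, tools, and dependencies are taken from the listing.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    kind: str                              # "execute", "reflect", or "approval"
    specialist: Optional[str] = None       # reflect/approval stages list none
    tools: Tuple[str, ...] = ()
    depends_on: Tuple[str, ...] = ()


STAGES = [
    Stage("triage", "execute", "guardian", ("shell", "file_read", "grep")),
    Stage("log analysis", "execute", "analyst",
          ("shell", "file_read", "grep"), ("triage",)),
    Stage("root cause", "execute", "analyst",
          ("file_read", "grep", "shell"), ("log analysis",)),
    Stage("remediation plan", "execute", "engineer",
          ("file_read", "file_write"), ("root cause",)),
    Stage("review plan", "reflect", depends_on=("remediation plan",)),
    Stage("approval", "approval", depends_on=("review plan",)),
    Stage("postmortem", "execute", "writer", ("file_write",), ("approval",)),
]


def execution_order(stages: List[Stage]) -> List[str]:
    """Return stage names in an order that respects every dependency."""
    graph = {s.name: set(s.depends_on) for s in stages}
    return list(TopologicalSorter(graph).static_order())
```

Calling `execution_order(STAGES)` yields the stages in the order they are listed, since each stage depends only on the one before it.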

Tags

incident · ops · triage · remediation