Runbook: <incident-name>
Trigger
- <what signal indicates this incident>
Impact
- <affected area>
- <severity and user/business impact>
Fast triage
- <first quick check>
- <second quick check>
- <identify failing stage/component>
Most common causes
- <cause 1>
- <cause 2>
- <cause 3>
Recovery steps
- <step 1>
- <step 2>
- <step 3>
Validation (exit criteria)
- <technical verification>
- <quality gate/check command>
- <user-visible verification>
Escalation
- <who to involve if unresolved>
- <maximum time before escalation>
Follow-up
- Policy or architecture gap → create or update an ADR.
- Process gap → update this runbook.
- Add a follow-up task with an owner and a due date.
Incident log (optional)
- Date:
- Owner:
- Summary:
- Preventive actions:
Page history
| Date | Change | Author |
|---|---|---|
| | Added Page history section (repository baseline). | Ivan Boyarkin |