Runbook: <incident-name>

Trigger

  • <what signal indicates this incident>

Impact

  • <affected area>
  • <severity and user/business impact>

Fast triage

  1. <first quick check>
  2. <second quick check>
  3. <identify failing stage/component>

Most common causes

  • <cause 1>
  • <cause 2>
  • <cause 3>

Recovery steps

  1. <step 1>
  2. <step 2>
  3. <step 3>

Validation (exit criteria)

  • <technical verification>
  • <quality gate/check command>
  • <user-visible verification>

Escalation

  • <who to involve if unresolved>
  • <maximum time before escalation>

Follow-up

  • Policy or architecture gap → create or update an ADR.
  • Process gap → update this runbook.
  • Add a follow-up task with owner and date.

Incident log (optional)

  • Date:
  • Owner:
  • Summary:
  • Preventive actions:

Page history

Date Change Author
Added Page history section (repository baseline). Ivan Boyarkin