Incident Templates

Pre-defined templates for common incident types. Each template includes a recommended severity, checklist, and responder roster.
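One way to picture what a template carries is as a small record. The sketch below is illustrative only: the class name `IncidentTemplate`, its field names, and the `hidden_steps` helper are assumptions, not the product's actual schema. It uses the Performance Regression template listed on this page and stores only the checklist steps shown in its preview, since the remaining steps are not listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncidentTemplate:
    """Hypothetical record for one template card; all field names are assumptions."""
    name: str
    category: str
    severity: str
    est_duration: str
    checklist_steps: int      # total step count shown on the card
    checklist_preview: tuple  # only the steps visible in the preview
    responders: tuple

    def hidden_steps(self) -> int:
        # Steps that exist in the checklist but are not shown in the preview.
        return self.checklist_steps - len(self.checklist_preview)


performance_regression = IncidentTemplate(
    name="Performance Regression",
    category="Degradation",
    severity="Medium",
    est_duration="30-60 min",
    checklist_steps=6,
    checklist_preview=(
        "Acknowledge within 15 minutes",
        "Identify affected endpoints/services",
        "Check recent deployments for regressions",
        "Roll back suspect deployment if needed",
    ),
    responders=("Service owner", "Performance engineer"),
)

print(performance_regression.hidden_steps())  # matches the "+ 2 more" line on the card
```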

Templates: 6 (available to use)
Avg checklist steps: 9 (per template)
Critical severity: 3 (templates for major incidents)
Categories: 4 (outage, degradation, security, data)
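The summary figures above can be reproduced from the six template cards on this page. The sketch below is a minimal illustration: the tuple layout and the `summarize` function are hypothetical, and rounding the average to the nearest whole number is an assumption about how the page arrives at 9 (the exact mean is 53 / 6, about 8.83).

```python
# (name, category, severity, checklist steps) as listed on the template cards
TEMPLATES = [
    ("Tier-1 Service Outage", "Outage", "Critical", 10),
    ("Service Degradation", "Degradation", "High", 9),
    ("Security Incident", "Security", "Critical", 10),
    ("Data Incident", "Data", "Critical", 10),
    ("Single-Region Partial Outage", "Outage", "High", 8),
    ("Performance Regression", "Degradation", "Medium", 6),
]


def summarize(templates):
    """Compute the four summary figures from a list of template tuples."""
    total = len(templates)
    mean_steps = sum(steps for _, _, _, steps in templates) / total
    return {
        "templates": total,
        # Assumption: the displayed average is rounded to the nearest integer.
        "avg_checklist_steps": round(mean_steps),
        "critical": sum(1 for _, _, sev, _ in templates if sev == "Critical"),
        "categories": sorted({cat for _, cat, _, _ in templates}),
    }


summary = summarize(TEMPLATES)
print(summary)
```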

Category: Outage | Severity: Critical

Tier-1 Service Outage

Complete outage of a Tier-1 service (auth, payments, gateway) affecting all customers in one or more regions.

Est. Duration: 30-120 min
Checklist: 10 steps
Responders: On-call SRE, Service owner, Engineering manager, Incident commander
Checklist preview:
  • Acknowledge within 2 minutes
  • Page on-call engineer + manager
  • Open war room (Slack + video bridge)
  • Post initial public status update
  • + 6 more
Category: Degradation | Severity: High

Service Degradation

Increased latency, elevated error rate, or partial unavailability of a service. Customers are impacted, but the service is still usable.

Est. Duration: 20-90 min
Checklist: 9 steps
Responders: On-call SRE, Service owner
Checklist preview:
  • Acknowledge within 5 minutes
  • Page on-call engineer
  • Identify scope and impact
  • Check dashboards and recent changes
  • + 5 more
Category: Security | Severity: Critical

Security Incident

Suspected or confirmed security breach, vulnerability exploitation, or unauthorized access. Requires security team engagement.

Est. Duration: Hours to days
Checklist: 10 steps
Responders: Security on-call, CISO, Legal, Incident commander, Comms
Checklist preview:
  • Acknowledge within 2 minutes
  • Page security on-call + CISO
  • Engage legal and compliance
  • Preserve forensic evidence
  • + 6 more
Category: Data | Severity: Critical

Data Incident

Data loss, corruption, or unauthorized data access. Includes production database issues, missing backups, and data leaks.

Est. Duration: 1-8 hours
Checklist: 10 steps
Responders: DBA, Service owner, Data protection officer, Incident commander
Checklist preview:
  • Acknowledge within 5 minutes
  • Page DBA + service owner
  • Stop write traffic if needed
  • Identify scope of data affected
  • + 6 more
Category: Outage | Severity: High

Single-Region Partial Outage

Loss of service in a single region where multi-region failover is available. Customers in the affected region are impacted.

Est. Duration: 30-90 min
Checklist: 8 steps
Responders: On-call SRE, Infra lead, Incident commander
Checklist preview:
  • Acknowledge within 5 minutes
  • Trigger regional failover
  • Verify failover success
  • Notify customers in affected region
  • + 4 more
Category: Degradation | Severity: Medium

Performance Regression

Latency or throughput regression detected via synthetic monitoring or real user monitoring (RUM). May not be customer-impacting yet.

Est. Duration: 30-60 min
Checklist: 6 steps
Responders: Service owner, Performance engineer
Checklist preview:
  • Acknowledge within 15 minutes
  • Identify affected endpoints/services
  • Check recent deployments for regressions
  • Roll back suspect deployment if needed
  • + 2 more
