Incident Templates

Pre-defined templates for common incident types. Each template includes a recommended severity, checklist, and responder roster.
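
As a rough illustration of what one template carries, the sketch below models the Service Degradation template listed further down this page. The class name `IncidentTemplate` and its field names are assumptions made for this example, not the product's actual schema; only the values come from the page. Since the page previews just four of the nine checklist steps, the sketch stores the preview and the total count rather than guessing the hidden steps.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IncidentTemplate:
    """Hypothetical shape of one template; field names are illustrative."""
    name: str
    severity: str
    responders: Tuple[str, ...]
    checklist_preview: Tuple[str, ...]
    total_steps: int

    @property
    def remaining_steps(self) -> int:
        # Steps that exist in the template but are not shown in the preview.
        return self.total_steps - len(self.checklist_preview)


# Values copied from the Service Degradation card on this page.
service_degradation = IncidentTemplate(
    name="Service Degradation",
    severity="High",
    responders=("On-call SRE", "Service owner"),
    checklist_preview=(
        "Acknowledge within 5 minutes",
        "Page on-call engineer",
        "Identify scope and impact",
        "Check dashboards and recent changes",
    ),
    total_steps=9,
)

print(f"+ {service_degradation.remaining_steps} more")  # prints "+ 5 more"
```

Keeping the preview separate from the step count means the "+ N more" label is derived rather than stored, so it cannot drift out of sync with the checklist.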

Tier-1 Service Outage

Complete outage of a Tier-1 service (auth, payments, gateway) affecting all customers in one or more regions.

Est. Duration

30-120 min

Checklist

10 steps

Responders

On-call SRE, Service owner, Engineering manager, Incident commander

Checklist preview

Acknowledge within 2 minutes
Page on-call engineer + manager
Open war room (Slack + video bridge)
Post initial public status update
+ 6 more

Degradation / High

Service Degradation

Increased latency, error rate, or partial unavailability of a service. Customers impacted but service is still usable.

Est. Duration

20-90 min

Checklist

9 steps

Responders

On-call SRE, Service owner

Checklist preview

Acknowledge within 5 minutes
Page on-call engineer
Identify scope and impact
Check dashboards and recent changes
+ 5 more

Security / Critical

Security Incident

Suspected or confirmed security breach, vulnerability exploitation, or unauthorized access. Requires security team engagement.

Est. Duration

Hours to days

Checklist

10 steps

Responders

Security on-call, CISO, Legal, Incident commander, Comms

Checklist preview

Acknowledge within 2 minutes
Page security on-call + CISO
Engage legal and compliance
Preserve forensic evidence
+ 6 more

Data / Critical

Data Incident

Data loss, corruption, or unauthorized data access. Includes production DB issues, missing backups, or data leak.

Est. Duration

1-8 hours

Checklist

10 steps

Responders

DBA, Service owner, Data protection officer, Incident commander

Checklist preview

Acknowledge within 5 minutes
Page DBA + service owner
Stop write traffic if needed
Identify scope of data affected
+ 6 more

Outage / High

Single-Region Partial Outage

Loss of service in a single region with multi-region failover available. Customers in affected region impacted.

Est. Duration

30-90 min

Checklist

8 steps

Responders

On-call SRE, Infra lead, Incident commander

Checklist preview

Acknowledge within 5 minutes
Trigger regional failover
Verify failover success
Notify customers in affected region
+ 4 more

Degradation / Medium

Performance Regression

Latency or throughput regression detected via synthetic monitoring or RUM. May not be customer-impacting yet.

Est. Duration

30-60 min

Checklist

6 steps

Responders

Service owner, Performance engineer

Checklist preview

Acknowledge within 15 minutes
Identify affected endpoints/services
Check recent deployments for regressions
Roll back suspect deployment if needed
+ 2 more
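
Every template above opens with an "Acknowledge within N minutes" step, so the acknowledgement window can be read directly from the checklist text. The following is a minimal sketch of that idea, assuming the step wording stays exactly as shown on this page; the names `ACK_STEPS`, `ack_window`, and `ack_overdue` are hypothetical helpers written for this example, not part of any real tool.

```python
import re
from datetime import datetime, timedelta

# First checklist step of each template, copied from this page.
ACK_STEPS = {
    "Tier-1 Service Outage": "Acknowledge within 2 minutes",
    "Service Degradation": "Acknowledge within 5 minutes",
    "Security Incident": "Acknowledge within 2 minutes",
    "Data Incident": "Acknowledge within 5 minutes",
    "Single-Region Partial Outage": "Acknowledge within 5 minutes",
    "Performance Regression": "Acknowledge within 15 minutes",
}


def ack_window(step: str) -> timedelta:
    """Parse the acknowledgement window out of a checklist step."""
    match = re.fullmatch(r"Acknowledge within (\d+) minutes", step)
    if match is None:
        raise ValueError(f"not an acknowledgement step: {step!r}")
    return timedelta(minutes=int(match.group(1)))


def ack_overdue(template: str, opened_at: datetime, now: datetime) -> bool:
    """Return True if the incident has gone unacknowledged past its window."""
    return now - opened_at > ack_window(ACK_STEPS[template])


opened = datetime(2024, 1, 1, 12, 0)
three_minutes_later = opened + timedelta(minutes=3)

print(ack_overdue("Security Incident", opened, three_minutes_later))       # True
print(ack_overdue("Performance Regression", opened, three_minutes_later))  # False
```

Three minutes without acknowledgement already breaches the 2-minute window of a Security Incident, while a Performance Regression still has 12 minutes left.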
