Incident Templates
Predefined templates for common incident types. Each template includes a recommended severity, a checklist, and a responder roster.
Templates: available to use
Avg Checklist Steps: per template
Critical Severity: templates for major incidents
Categories: outage, degradation, security, data
Tier-1 Service Outage
Complete outage of a Tier-1 service (auth, payments, gateway) affecting all customers in one or more regions.
- Acknowledge within 2 minutes
- Page on-call engineer + manager
- Open war room (Slack + video bridge)
- Post initial public status update
- + 6 more
Service Degradation
Increased latency, elevated error rate, or partial unavailability of a service. Customers are impacted, but the service is still usable.
- Acknowledge within 5 minutes
- Page on-call engineer
- Identify scope and impact
- Check dashboards and recent changes
- + 5 more
Security Incident
Suspected or confirmed security breach, vulnerability exploitation, or unauthorized access. Requires security team engagement.
- Acknowledge within 2 minutes
- Page security on-call + CISO
- Engage legal and compliance
- Preserve forensic evidence
- + 6 more
Data Incident
Data loss, corruption, or unauthorized data access. Includes production DB issues, missing backups, and data leaks.
- Acknowledge within 5 minutes
- Page DBA + service owner
- Stop write traffic if needed
- Identify scope of data affected
- + 6 more
Single-Region Partial Outage
Loss of service in a single region where multi-region failover is available. Customers in the affected region are impacted.
- Acknowledge within 5 minutes
- Trigger regional failover
- Verify failover success
- Notify customers in affected region
- + 4 more
Performance Regression
Latency or throughput regression detected via synthetic monitoring or real-user monitoring (RUM). May not be customer-impacting yet.
- Acknowledge within 15 minutes
- Identify affected endpoints/services
- Check recent deployments for regressions
- Roll back suspect deployment if needed
- + 2 more
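
The templates above could be modeled as plain data, which makes the acknowledgment windows easy to check programmatically. The following is a minimal sketch, not the product's actual schema: `IncidentTemplate`, `ack_deadline`, and `acknowledged_in_time` are hypothetical names, the "+ N more" entries are assumed to count additional checklist steps, and the deadline is assumed to be inclusive.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class IncidentTemplate:
    """Hypothetical model of one template card from this page."""
    name: str
    ack_minutes: int    # acknowledgment window from the first checklist step
    visible_steps: int  # checklist steps shown on the card
    hidden_steps: int   # the "+ N more" count (assumed to be extra steps)

    @property
    def total_steps(self) -> int:
        return self.visible_steps + self.hidden_steps


# Values transcribed from the six cards listed on this page.
TEMPLATES = [
    IncidentTemplate("Tier-1 Service Outage", 2, 4, 6),
    IncidentTemplate("Service Degradation", 5, 4, 5),
    IncidentTemplate("Security Incident", 2, 4, 6),
    IncidentTemplate("Data Incident", 5, 4, 6),
    IncidentTemplate("Single-Region Partial Outage", 5, 4, 4),
    IncidentTemplate("Performance Regression", 15, 4, 2),
]


def ack_deadline(template: IncidentTemplate, opened_at: datetime) -> datetime:
    """Latest time at which an acknowledgment still meets the window."""
    return opened_at + timedelta(minutes=template.ack_minutes)


def acknowledged_in_time(
    template: IncidentTemplate, opened_at: datetime, acked_at: datetime
) -> bool:
    """True if the incident was acknowledged on or before the deadline."""
    return acked_at <= ack_deadline(template, opened_at)
```

For example, an incident opened from the Tier-1 Service Outage template and acknowledged 90 seconds later would pass `acknowledged_in_time`, while one acknowledged after 3 minutes would not.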