Incident Command Center
Centralized war room view of all active incidents, response timelines, and on-call coverage.
3 active incidents · 5 paging alerts · 3 firing
Active Incidents
0 critical
Ack SLA (24h)
Response SLA (24h)
MTTR (30d)
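The dashboard does not say how Ack SLA is calculated. Here is a minimal sketch that assumes it means the share of incidents acknowledged within a target window. It derives time-to-acknowledge from the "auto-created" and "acknowledged" entries in the Response Timeline. The 5-minute `ACK_TARGET` and all function names are hypothetical and do not come from the incident tool.

```python
from datetime import datetime, timedelta

# Timestamp format shown in the Response Timeline, e.g. "7/2/2026, 5:12:00 AM"
TIMELINE_FMT = "%m/%d/%Y, %I:%M:%S %p"

# (auto-created, acknowledged) timestamps copied from the Response Timeline
INCIDENTS = {
    "Checkout API elevated 5xx rate in us-east-1": ("7/2/2026, 5:12:00 AM", "7/2/2026, 5:16:00 AM"),
    "Billing invoice generation backlog": ("7/1/2026, 10:30:00 PM", "7/1/2026, 10:45:00 PM"),
}

# Hypothetical acknowledgement target; the dashboard does not state the real one
ACK_TARGET = timedelta(minutes=5)


def parse_ts(ts: str) -> datetime:
    """Parse a Response Timeline timestamp."""
    return datetime.strptime(ts, TIMELINE_FMT)


def time_to_ack(created: str, acked: str) -> timedelta:
    """Elapsed time between incident creation and acknowledgement."""
    return parse_ts(acked) - parse_ts(created)


def ack_sla(incidents: dict, target: timedelta) -> float:
    """Fraction of incidents acknowledged within `target` (assumed SLA definition)."""
    durations = [time_to_ack(c, a) for c, a in incidents.values()]
    return sum(1 for d in durations if d <= target) / len(durations)


def mean_duration(durations: list) -> timedelta:
    """Arithmetic mean of durations, as used for MTTA/MTTR-style metrics."""
    return sum(durations, timedelta()) / len(durations)
```

For these two incidents, time-to-acknowledge is 4 and 15 minutes. Under the hypothetical 5-minute target, `ack_sla` therefore returns 0.5, and the mean time to acknowledge is 9.5 minutes. An MTTR figure would apply the same `mean_duration` to resolved-minus-created durations once an incident records both timestamps.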
Active War Rooms
Live incident response rooms with commander and responder details
Checkout API elevated 5xx rate in us-east-1
Approximately 3% of checkout attempts in US-East are failing with HTTP 500 errors. Retrying usually succeeds on the second attempt.
Marcus Anderson (05:48 AM): Posted public update: 'We are investigating elevated checkout failures in US-East. Some payment attempts may fail. Please retry.'
Billing invoice generation backlog
Invoice emails delayed by up to 6 hours. No financial impact. Payments continue to process normally.
Caleb Foster (04:12 AM): Status changed to monitoring. Queue depth back under 1k.
Web app slow page loads (LCP regression)
No functional impact, but slower page loads affect SEO ranking and user experience.
Mei Lin (06:20 PM): Identified: new hero image is 4.2MB. Need to add proper sizing + lazy loading.
Response Timeline
Latest activity across all incidents
7/2/2026, 5:48:00 AM · Posted public update: 'We are investigating elevated checkout failures in US-East. Some payment attempts may fail. Please retry.'
7/2/2026, 5:38:00 AM · Linked runbook: Checkout 3DS Timeout
7/2/2026, 5:31:00 AM · Internal note: error rate started climbing 4 minutes after canary rollout began. Pattern matches a known issue with the new 3DS retry logic.
7/2/2026, 5:24:00 AM · Linked deployment DEP-2024-006 (canary v2.4.1-rc1)
7/2/2026, 5:22:00 AM · Added service: checkout-api, billing-service
7/2/2026, 5:18:00 AM · Added Sofia Bianchi (payments on-call) and Priya Raman (reliability on-call) as responders
7/2/2026, 5:16:00 AM · Marcus Anderson acknowledged and took commander role
7/2/2026, 5:14:00 AM · Alert ALR-1248 linked to incident
7/2/2026, 5:12:00 AM · Incident auto-created from alert ALR-1248 (checkout 5xx > 1%)
7/2/2026, 4:12:00 AM · Status changed to monitoring. Queue depth back under 1k.
7/1/2026, 11:02:00 PM · Scaled workers from 4 to 8. Queue depth dropping ~500/min.
7/1/2026, 10:45:00 PM · Caleb Foster acknowledged as commander
7/1/2026, 10:30:00 PM · Incident auto-created from alert ALR-1245 (invoice queue depth > 10k)
7/1/2026, 6:20:00 PM · Identified: new hero image is 4.2MB. Need to add proper sizing + lazy loading.
7/1/2026, 2:18:00 PM · 401 rate back to baseline. Resolving.
7/1/2026, 2:12:00 PM · Purging JWKS cache across all authz-engine instances
7/1/2026, 2:08:00 PM · Identified stale JWKS cache as cause
7/1/2026, 2:03:00 PM · Theo Lambert acknowledged
On-Call Now
Currently paged engineer
Escalation Ladder
Platform - Standard policy
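The steps of the "Platform - Standard policy" ladder are not listed here. The sketch below assumes the common pattern in which an unacknowledged page moves to the next step after a fixed number of minutes. The `Step` type, the role names, and the 5/10/15-minute windows are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    target: str          # who is paged at this step (hypothetical role names)
    escalate_after: int  # minutes without an ack before moving to the next step


# Hypothetical ladder; the real policy's steps and timings are not shown on the dashboard
STANDARD_POLICY = [
    Step("primary on-call", 5),
    Step("secondary on-call", 10),
    Step("platform engineering manager", 15),
]


def current_step(policy: list, minutes_unacked: int) -> Step:
    """Return the step being paged after `minutes_unacked` minutes without an ack."""
    elapsed = 0
    for step in policy:
        elapsed += step.escalate_after
        if minutes_unacked < elapsed:
            return step
    # Past the final step's window: keep paging the last step
    return policy[-1]
```

With this ladder, an alert that stays unacknowledged for 7 minutes would be paging the secondary on-call. Acknowledging at any point stops the walk, and that transition is what the "Awaiting ack or escalation" state tracks.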
Firing Alerts
Awaiting ack or escalation
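The ALR-1248 condition quoted in the timeline (checkout 5xx > 1%) amounts to a ratio check over a window of requests. Here is a minimal sketch of that check. The `AlertRule` class and the request counts are hypothetical, and the real alerting backend's evaluation window and query are not shown on this dashboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    alert_id: str
    threshold: float  # fraction of requests, e.g. 0.01 for 1%

    def is_firing(self, total_requests: int, server_errors: int) -> bool:
        """Fire when the 5xx fraction strictly exceeds the threshold."""
        if total_requests == 0:
            return False  # no traffic in the window, nothing to evaluate
        return server_errors / total_requests > self.threshold


# Threshold taken from the timeline entry "checkout 5xx > 1%"
CHECKOUT_5XX = AlertRule("ALR-1248", threshold=0.01)
```

At the roughly 3% failure rate reported for checkout, for example 300 errors out of 10,000 requests (hypothetical counts), the rule fires. A rate of exactly 1% does not fire, because the condition is strictly greater than the threshold.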