Incident Timeline
Cross-incident chronological timeline of every event across the platform.
22 events
Most recent first
Posted public update: 'We are investigating elevated checkout failures in US-East. Some payment attempts may fail. Please retry.'
Checkout API elevated 5xx rate in us-east-1
Linked runbook: Checkout 3DS Timeout
Checkout API elevated 5xx rate in us-east-1
Internal note: error rate started climbing 4 minutes after canary rollout began. Pattern matches a known issue with the new 3DS retry logic.
Checkout API elevated 5xx rate in us-east-1
Linked deployment DEP-2024-006 (canary v2.4.1-rc1)
Checkout API elevated 5xx rate in us-east-1
Added service: checkout-api, billing-service
Checkout API elevated 5xx rate in us-east-1
Added Sofia Bianchi (payments on-call) and Priya Raman (reliability on-call) as responders
Checkout API elevated 5xx rate in us-east-1
Marcus Anderson acknowledged and took commander role
Checkout API elevated 5xx rate in us-east-1
Alert ALR-1248 linked to incident
Checkout API elevated 5xx rate in us-east-1
Incident auto-created from alert ALR-1248 (checkout 5xx > 1%)
Checkout API elevated 5xx rate in us-east-1
Status changed to monitoring. Queue depth back under 1k.
Billing invoice generation backlog
Scaled workers from 4 to 8. Queue depth dropping ~500/min.
Billing invoice generation backlog
Caleb Foster acknowledged as commander
Billing invoice generation backlog
Incident auto-created from alert ALR-1245 (invoice queue depth > 10k)
Billing invoice generation backlog
Identified: new hero image is 4.2MB. Need to add proper sizing + lazy loading.
Web app slow page loads (LCP regression)
401 rate back to baseline. Resolving.
Auth service token validation errors in EU
Purging JWKS cache across all authz-engine instances
Auth service token validation errors in EU
Identified stale JWKS cache as cause
Auth service token validation errors in EU
Theo Lambert acknowledged
Auth service token validation errors in EU
Incident created from alert ALR-1240 (auth 401 rate spike)
Auth service token validation errors in EU
Linked deployment DEP-2024-008 (v4.2)
Web app slow page loads (LCP regression)
Mei Lin acknowledged
Web app slow page loads (LCP regression)
Incident created from alert ALR-1238 (LCP > 2.5s)
Web app slow page loads (LCP regression)