Engineering Overview
Real-time view of system reliability, active incidents, and team operations across all environments.
3 active incidents requiring attention
High: INC-2841 - Checkout API elevated 5xx rate in us-east-1
Active Incidents
3
-12% vs last week
Firing Alerts
3
+8% vs yesterday
Services Healthy
9/11
2 degraded
MTTR (30d)
18.4 min
-8% improving
SLO Breached
1
2 at risk
Deploy Freq
14.2/day
+4% vs last week
API Latency Percentiles
p50, p95, p99 across all gateway routes
Global Error Rate
5xx rate across all production services
Service Health Distribution
Current health of all monitored services
82% Healthy
Incident Volume (30 days)
Incidents by severity over the last 30 days
Response Performance
MTTR, MTTD, and MTTA over the last 30 days
Active Incidents
3 requiring attention
On-Call Now
Currently paged engineers
Caleb Foster
primary on-call · Platform
Firing Alerts
Highest priority alerts requiring attention
Deployment Frequency (30 days)
Successful deployments and failures per day
Top Services by Request Volume
Requests per second across all production services
Quick Access
Jump to common workflows