Engineering Overview

Real-time view of system reliability, active incidents, and team operations across all environments.

3 active incidents requiring attention

High

INC-2841 - Checkout API elevated 5xx rate in us-east-1

Active Incidents

3
-12% vs last week

Firing Alerts

3
+8% vs yesterday

Services Healthy

9/11

2 degraded

MTTR (30d)

18.4 min
-8% (improving)

SLO Breached

1

2 at risk

Deploy Freq

14.2/day
+4% vs last week

API Latency Percentiles

p50, p95, p99 across all gateway routes

Global Error Rate

5xx rate across all production services

Service Health Distribution

Current health of all monitored services

82% healthy

Incident Volume (30 days)

Incidents by severity over the last 30 days

Response Performance

MTTR, MTTD, and MTTA over the last 30 days

Top Services by Request Volume

Requests per second across all production services
