Reliability Dashboard
Cross-service reliability posture, SLO compliance, error budget health, and uptime trends.
Avg Uptime (30d)
99.928%
+0.04% vs prev 30d
Error Budget Remaining
71%
-6% vs last week
SLO Compliance
67%
6/9 healthy
MTTR (30d)
18.4 min
-8% (improving)
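For context, an uptime percentage can be converted into the downtime it implies. The sketch below does that arithmetic for the 30-day average uptime reported above. The function name `downtime_minutes` is hypothetical, and the calculation assumes the window is exactly 30 full days with uptime measured over the whole window.

```python
def downtime_minutes(uptime_percent: float, window_days: int = 30) -> float:
    """Return the downtime, in minutes, implied by an uptime percentage."""
    total_minutes = window_days * 24 * 60  # 43,200 minutes in a 30-day window
    return (100.0 - uptime_percent) / 100.0 * total_minutes


# 99.928% over 30 days leaves roughly 31 minutes of downtime.
print(round(downtime_minutes(99.928), 1))
```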
SLO Compliance by Service
Current SLI value vs target
Composite Availability
Weighted availability across tier-1 services
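The dashboard does not state how the composite figure is weighted. As a minimal sketch, assuming a plain weighted mean, the snippet below combines the 30-day uptime of the Tier 1 services listed in the Reliability Scorecards table. The equal weights and the `composite_availability` helper are illustrative assumptions, and the input covers only the Tier 1 rows visible on the first page of that table.

```python
def composite_availability(uptimes: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-service uptime percentages."""
    total_weight = sum(weights[name] for name in uptimes)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(uptimes[name] * weights[name] for name in uptimes) / total_weight


# 30-day uptime (%) for the Tier 1 services shown in the scorecard.
tier1_uptime = {
    "api-gateway": 99.992,
    "billing-service": 99.860,
    "auth-service": 99.970,
    "postgres-primary": 99.999,
    "checkout-api": 99.620,
}

# Assumption: every service carries the same weight.
equal_weights = {name: 1.0 for name in tier1_uptime}

print(f"{composite_availability(tier1_uptime, equal_weights):.3f}%")
```

Swapping in traffic-based or revenue-based weights only requires changing the `equal_weights` mapping.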
MTTR / MTTD / MTTA
30-day incident response performance
Availability Calendar - api-gateway
Daily uptime for the last 90 days (green = 100%, red = degraded)
Reliability Scorecards
Per-service reliability posture with tier, SLO, and error budget
| Service | Stack | Tier | Health | Uptime 30d | SLO | Error Budget | p95 | Trend |
|---|---|---|---|---|---|---|---|---|
| api-gateway | chi | Tier 1 | Operational | 99.992% | Healthy | 86% | 38ms | 0.14% |
| billing-service | NestJS | Tier 1 | Degraded | 99.860% | At Risk | 28% | 124ms | 0.05% |
| auth-service | gin | Tier 1 | Operational | 99.970% | Healthy | 92% | 52ms | 0.42% |
| postgres-primary | PostgreSQL 16 | Tier 1 | Operational | 99.999% | Healthy | 100% | 4ms | 1.92% |
| redis-cluster | Redis 7 | Tier 2 | Operational | 99.980% | Healthy | 100% | 1ms | 1.97% |
| checkout-api | Spring Boot | Tier 1 | Partial Outage | 99.620% | Breached | 0% | 312ms | 1.36% |
| kafka-bus | Kafka 3.7 | Tier 2 | Operational | 99.960% | Healthy | 95% | 8ms | 0.95% |
| web-app | Next.js 16 | Tier 2 | Operational | 99.910% | At Risk | 38% | 220ms | 0.34% |
Showing 1-8 of 11
Customer Impact
Current state
Impacted Services
2 / 11
Est. Users Affected
3
per second, approx