API Gateway 5xx Spike
Operational runbook - 3 steps
Last Updated
2026-06-18
Procedure
Follow the steps in order. Mark each step complete as you proceed.
1
Triage
In Grafana, check the rate of 5xx responses per upstream. If only one upstream is failing, route the investigation to the team that owns it.
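The triage decision can be sketched in Python as follows. This is a minimal illustration, not part of the gateway tooling: the `UpstreamStats` type, the 5% threshold, and the returned messages are all assumptions, and the counts would have to be read off the Grafana panel by hand or by whatever export is available.

```python
from dataclasses import dataclass


@dataclass
class UpstreamStats:
    """Response counts for one upstream over the window under review (hypothetical type)."""
    name: str
    total: int       # all responses in the window
    errors_5xx: int  # 5xx responses in the window


def error_rate(stats: UpstreamStats) -> float:
    """Fraction of responses that were 5xx; 0.0 when there was no traffic."""
    return stats.errors_5xx / stats.total if stats.total else 0.0


def failing_upstreams(stats_list, threshold=0.05):
    """Names of upstreams whose 5xx rate exceeds the threshold (0.05 is an assumed value)."""
    return [s.name for s in stats_list if error_rate(s) > threshold]


def triage(stats_list, threshold=0.05) -> str:
    """Apply the runbook rule: a single failing upstream goes to its owning team."""
    failing = failing_upstreams(stats_list, threshold)
    if len(failing) == 1:
        return f"route to owning team of {failing[0]}"
    if not failing:
        return "no upstream above threshold"
    # Assumption: several failing upstreams point at the gateway itself.
    return "multiple upstreams failing: investigate at the gateway"
```

For example, `triage([UpstreamStats("orders", 1000, 120), UpstreamStats("users", 1000, 3)])` singles out the hypothetical `orders` upstream, because only its rate (12%) is above the assumed threshold.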
2
Mitigate
If the upstream is overloaded, enable the circuit breaker through the runtime config. If a bad deploy is suspected, prepare to roll back (step 3).
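For readers unfamiliar with the pattern, here is a minimal sketch of what a circuit breaker does once enabled. It is a generic illustration, not the gateway's actual implementation: the class, its parameter names, and the default values are assumptions, and the real behavior is whatever the runtime config defines.

```python
import time


class CircuitBreaker:
    """Generic circuit breaker sketch (hypothetical; not the gateway's own code)."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial request
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def allow_request(self) -> bool:
        """Closed: always allow. Open: allow a trial only after the timeout has elapsed."""
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self) -> None:
        """A successful call closes the circuit and clears the failure count."""
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        """Count a failure; open (or re-open) the circuit once the threshold is reached."""
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

While the circuit is open, requests to the overloaded upstream are rejected immediately instead of queuing behind it, which gives the upstream room to recover.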
3
Rollback
Run `sg deploy rollback api-gateway --ref <last-good>` to revert to the previous stable release, replacing `<last-good>` with the ref of that release.
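If the rollback is scripted, a small guard against running the command with the placeholder still in place may help. The sketch below only builds the argument list from the command shown above; the helper name `rollback_command` is hypothetical, and nothing here executes `sg` or assumes any flags beyond `--ref`.

```python
def rollback_command(ref: str, service: str = "api-gateway") -> list:
    """Build the argv for the rollback command, refusing an unfilled placeholder."""
    ref = ref.strip()
    if not ref or ref.startswith("<"):
        # Catches an empty value or a literal "<last-good>" copied from the runbook.
        raise ValueError("replace <last-good> with the ref of the last good release")
    return ["sg", "deploy", "rollback", service, "--ref", ref]
```

The returned list can be passed to `subprocess.run` once the operator has confirmed the ref.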
Related Runbooks
4 related
Related Incidents
0 on this service
No related incidents
Related Alerts
3 on this service
ALR-1247 (firing)
API Gateway p99 latency is 218ms (threshold 200ms)
ALR-1246 (firing)
Host db-replica-2 disk usage is 87% (threshold 85%)
ALR-1244 (suppressed)
API Gateway p99 latency is 215ms (threshold 200ms)