NGINX Error Rate Monitoring
5xx alerts that fire in seconds, not minutes.
Per-status-code error rates computed from the access log. Alerts on 5xx spikes, 499 client disconnects, and 502/504 upstream failures. Routes alerts to Slack, Telegram, email, and webhooks.
Error metrics we graph
nginx.http.status.5xx
Server-error rate. Broken down by 500, 502, 503, 504.
nginx.http.status.4xx
Client-error rate. 404, 403, and 499 (NGINX's code for a client that closed the connection before the response was sent) tracked individually.
nginx.http.status.502
Bad-gateway rate — almost always an upstream problem.
nginx.http.status.504
Gateway-timeout rate — slow upstream or proxy_read_timeout too low.
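To illustrate how per-status rates like these can be derived from an access log, here is a minimal Python sketch. It assumes NGINX's default "combined" log format and a batch of lines covering a known time window. The names `STATUS_RE` and `status_rates` are hypothetical and are not part of the product.

```python
import re
from collections import Counter

# Assumption: the default "combined" log format, where the status code
# follows the quoted request line, e.g.
#   ... "GET /api HTTP/1.1" 502 157 "-" "curl/8.0"
STATUS_RE = re.compile(r'" (\d{3}) ')


def status_rates(lines, window_seconds):
    """Return requests per second keyed by status code and status class."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match is None:
            continue  # skip lines that do not look like access-log entries
        code = match.group(1)
        counts[code] += 1               # individual code, e.g. "502"
        counts[code[0] + "xx"] += 1     # status class, e.g. "5xx"
    return {key: n / window_seconds for key, n in counts.items()}
```

Calling `status_rates(lines, 60)` on one minute of log lines yields a dictionary in which the key `"5xx"` would correspond to nginx.http.status.5xx and the key `"502"` to nginx.http.status.502.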
Example alert recipes
- 5xx spike: nginx.http.status.5xx > 5 rps for 2 minutes → Slack
- 502 sustained: nginx.http.status.502 > 1 rps for 5 minutes → Telegram + webhook to PagerDuty
- 499 storm: nginx.http.status.4xx.499 > 20% of total requests → email (often a CDN / client-side issue)
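
The "above a threshold for N minutes" condition in the first two recipes can be sketched as a small stateful check. This is an illustrative Python example under stated assumptions, not the product's actual rule engine: the class name `SustainedThreshold` and its fields are hypothetical, and timestamps are assumed to be in seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedThreshold:
    """Fires only after a value has stayed above `threshold` for `duration` seconds."""
    threshold: float                      # e.g. 5 (requests per second)
    duration: float                       # e.g. 120 (two minutes)
    breach_start: Optional[float] = None  # when the current breach began

    def observe(self, timestamp: float, value: float) -> bool:
        """Record one sample; return True once the breach has lasted `duration`."""
        if value <= self.threshold:
            self.breach_start = None      # any dip resets the timer
            return False
        if self.breach_start is None:
            self.breach_start = timestamp
        return timestamp - self.breach_start >= self.duration


# Hypothetical usage for the "5xx spike" recipe: > 5 rps for 2 minutes.
rule = SustainedThreshold(threshold=5, duration=120)
```

Feeding `rule.observe(t, rate)` one sample per scrape interval returns `True` only once the 5xx rate has stayed above 5 rps for a full 120 seconds. The 499 recipe would compare a ratio (499 count divided by total requests) against 0.2 instead of an absolute rate.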