Understanding Alert Conditions: Spikes, Rates, and Dead Services
Learn when each alert condition type is most useful.
Error Spike
Fires when the absolute error count within a time window exceeds a threshold. Best for catching sudden bursts.
Example: "Alert if more than 10 errors in 5 minutes" catches a deployment that breaks something.
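The example rule above can be sketched as a count over a sliding window. This is a minimal illustration, not the product's implementation: the function name error_spike_fires and its parameters are hypothetical, and the 5-minute window and threshold of 10 are taken from the example.

```python
from datetime import datetime, timedelta

def error_spike_fires(error_times, now, window=timedelta(minutes=5), threshold=10):
    """Return True when more than `threshold` errors fall inside the window ending at `now`.

    `error_times` is an iterable of datetime objects, one per error log (hypothetical input shape).
    """
    cutoff = now - window
    # Count only errors newer than the cutoff and not in the future.
    recent = [t for t in error_times if cutoff < t <= now]
    return len(recent) > threshold

now = datetime(2024, 1, 1, 12, 0, 0)
# Eleven errors spaced ten seconds apart, all inside the last five minutes.
burst = [now - timedelta(seconds=10 * i) for i in range(11)]
print(error_spike_fires(burst, now))
```

Because the comparison is "more than", exactly 10 errors in the window does not fire; the 11th does.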
Error Rate
Fires when errors as a percentage of total logs exceed a threshold. Best for high-traffic services where absolute counts are misleading.
Example: "Alert if error rate exceeds 5%" catches quality degradation even when overall volume is normal.
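To see why a rate can be more informative than a count, here is a small sketch of the percentage check. The function name error_rate_fires and the handling of a window with zero logs are assumptions made for illustration; the 5% threshold comes from the example above.

```python
def error_rate_fires(error_count, total_count, threshold_pct=5.0):
    """Return True when errors make up more than `threshold_pct` percent of all logs."""
    if total_count == 0:
        # Assumption: with no logs at all there is no rate to evaluate, so do not fire.
        return False
    rate = 100.0 * error_count / total_count
    return rate > threshold_pct

# 60 errors out of 10,000 logs is a 0.6% rate: a large count, but a low rate.
print(error_rate_fires(60, 10_000))
# 6 errors out of 100 logs is a 6% rate: a small count, but a high rate.
print(error_rate_fires(6, 100))
```

The first case would trip a count-based rule of "more than 10 errors" while the service is healthy; the second would slip under that rule even though 6% of its logs are errors.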
Log Contains
Fires when the number of logs matching a text pattern exceeds a threshold within a time window. Best for watching specific business events.
Example: "Alert if logs containing 'out of stock' exceed 20 in 1 hour" catches inventory problems.
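A rough sketch of the matching logic for the example above follows. It assumes a case-insensitive substring match, which may differ from how patterns are actually evaluated; the function name log_contains_fires and the (timestamp, message) input shape are hypothetical.

```python
from datetime import datetime, timedelta

def log_contains_fires(logs, pattern, now, window=timedelta(hours=1), threshold=20):
    """Return True when more than `threshold` messages in the window contain `pattern`.

    `logs` is an iterable of (timestamp, message) pairs (hypothetical input shape).
    Matching is a case-insensitive substring test, which is an assumption.
    """
    cutoff = now - window
    needle = pattern.lower()
    matches = sum(
        1
        for ts, message in logs
        if cutoff < ts <= now and needle in message.lower()
    )
    return matches > threshold

now = datetime(2024, 1, 1, 12, 0, 0)
# Twenty-one matching messages, one per minute, all inside the last hour.
logs = [(now - timedelta(minutes=i), "Item 42 is Out of Stock") for i in range(21)]
print(log_contains_fires(logs, "out of stock", now))
```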
New Error
Fires when an error type that has never been seen before appears. Best for catching bugs introduced by new deployments.
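One way to picture this condition is a set of error types already seen, checked on every incoming error. The sketch below assumes an error type can be identified by a plain string such as an exception class name; how errors are actually grouped into types is not specified here, and the class name NewErrorDetector is hypothetical.

```python
class NewErrorDetector:
    """Track error types seen so far and flag the first occurrence of each new one."""

    def __init__(self, known=()):
        # Error types observed before this detector started (hypothetical seed data).
        self.seen = set(known)

    def observe(self, error_type):
        """Return True only the first time `error_type` is observed."""
        if error_type in self.seen:
            return False
        self.seen.add(error_type)
        return True

detector = NewErrorDetector(known={"TimeoutError", "ConnectionResetError"})
print(detector.observe("TimeoutError"))   # already known, so no alert
print(detector.observe("KeyError"))       # first sighting, so alert
print(detector.observe("KeyError"))       # seen a moment ago, so no alert
```

Example: a deployment that introduces a KeyError in a code path that previously never failed would fire once, on the first occurrence.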
5xx Spike
Fires when HTTP 5xx responses exceed a threshold. Requires request logging to be enabled. Best for API services.
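As an illustration, the check amounts to counting status codes in the 500 to 599 range from request logs. The function names and the threshold of 10 below are hypothetical, chosen only to make the sketch concrete.

```python
def count_5xx(status_codes):
    """Count HTTP responses whose status code is in the 5xx (server error) range."""
    return sum(1 for code in status_codes if 500 <= code <= 599)

def five_xx_spike_fires(status_codes, threshold=10):
    """Return True when the number of 5xx responses exceeds `threshold` (hypothetical default)."""
    return count_5xx(status_codes) > threshold

# Status codes taken from one evaluation window of request logs (made-up sample).
window = [200, 200, 404, 500, 502, 503, 200, 301] + [500] * 9
print(count_5xx(window))
print(five_xx_spike_fires(window))
```

Client errors such as 404 are not counted, so a burst of bad requests from one caller does not trigger this condition.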
No Logs (Dead Service)
Fires when no logs are received from a service for a configured period. Best for detecting crashed services or broken pipelines.
Example: "Alert if no logs from api-gateway for 15 minutes" catches silent failures.
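The example above reduces to comparing the time of the most recent log against the current time. In this sketch the function name no_logs_fires is hypothetical, and treating a service that has never logged as silent is an assumption.

```python
from datetime import datetime, timedelta

def no_logs_fires(last_log_time, now, silence=timedelta(minutes=15)):
    """Return True when the service has been silent for at least `silence`.

    `last_log_time` is the timestamp of the most recent log, or None if none was ever received.
    """
    if last_log_time is None:
        # Assumption: a service that has never logged is treated as silent.
        return True
    return now - last_log_time >= silence

now = datetime(2024, 1, 1, 12, 0, 0)
print(no_logs_fires(now - timedelta(minutes=20), now))  # silent for 20 minutes
print(no_logs_fires(now - timedelta(minutes=3), now))   # logged 3 minutes ago
```

Unlike the other conditions, this one fires on the absence of data, so it is the only one that catches a service too broken to report its own errors.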
Set thresholds higher than you think necessary. You can always lower them. Too many false alerts cause your team to ignore all alerts.
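One way to sanity-check a threshold before enabling it is to count how many past windows it would have fired on. The sketch below uses made-up per-window error counts purely for illustration; the function name windows_that_would_fire is hypothetical.

```python
def windows_that_would_fire(counts_per_window, threshold):
    """Count the historical windows whose error count exceeds `threshold`."""
    return sum(1 for count in counts_per_window if count > threshold)

# Made-up error counts for ten consecutive windows (illustrative only, not real data).
history = [2, 0, 3, 12, 1, 4, 7, 0, 25, 3]

for candidate in (5, 10, 20):
    fired = windows_that_would_fire(history, candidate)
    print(f"threshold {candidate}: would have fired {fired} time(s)")
```

In this sample, a threshold of 5 fires three times, 10 fires twice, and 20 fires once. If only the window with 25 errors was a real incident, the higher threshold is the one that avoids false alerts.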