
Understanding Alert Conditions: Spikes, Rates, and Dead Services

Learn when each alert condition type is most useful.

Error Spike

Fires when the absolute error count within a time window exceeds a threshold. Best for catching sudden bursts.

Example: "Alert if more than 10 errors in 5 minutes" catches a deployment that breaks something.
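
The "more than 10 errors in 5 minutes" rule can be pictured as a sliding-window counter. The following is a minimal sketch, not the product's actual implementation; the class name ErrorSpikeCondition and its parameters are hypothetical and exist only for illustration.

```python
from collections import deque


class ErrorSpikeCondition:
    """Hypothetical sliding-window error counter (illustration only)."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()  # timestamps of errors still inside the window

    def record_error(self, timestamp: float) -> bool:
        """Record one error; return True if the condition should fire."""
        self._timestamps.append(timestamp)
        # Drop errors that are older than the window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold


# "Alert if more than 10 errors in 5 minutes"
spike = ErrorSpikeCondition(threshold=10, window_seconds=300)
fired = [spike.record_error(t) for t in range(11)]  # 11 errors within 11 seconds
```

In this sketch the first ten errors do not fire the condition and the eleventh does, because only then does the count exceed the threshold of 10.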

Error Rate

Fires when errors, as a percentage of total logs, exceed a threshold. Best for high-traffic services where absolute counts are misleading.

Example: "Alert if error rate exceeds 5%" catches quality degradation even when overall volume is normal.
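
To see why a percentage can be more informative than a raw count, here is a small sketch of the arithmetic. The function names are hypothetical, and treating a window with no logs as a 0% error rate is an assumption made for this example.

```python
def error_rate_percent(error_count: int, total_count: int) -> float:
    """Errors as a percentage of all logs in the window."""
    if total_count == 0:
        return 0.0  # assumption: no traffic counts as a 0% error rate
    return 100.0 * error_count / total_count


def rate_exceeded(error_count: int, total_count: int, threshold_percent: float) -> bool:
    """Return True if the error rate is above the threshold."""
    return error_rate_percent(error_count, total_count) > threshold_percent


# 50 errors out of 10,000 logs is 0.5%: a busy but healthy service.
busy = rate_exceeded(50, 10_000, threshold_percent=5.0)
# 60 errors out of 1,000 logs is 6%: a degraded service.
degraded = rate_exceeded(60, 1_000, threshold_percent=5.0)
```

Both services would trip an absolute rule such as "more than 10 errors", but only the second one exceeds a 5% error rate.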

Pattern Match

Fires when the number of logs matching a text pattern exceeds a threshold. Best for watching specific business events.

Example: "Alert if logs containing 'out of stock' exceed 20 in 1 hour" catches inventory problems.
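
As a rough illustration of pattern counting, the sketch below counts log messages that contain a phrase. The function name and sample messages are hypothetical, and the case-insensitive substring match is an assumption; the product's matching rules may differ.

```python
from typing import Iterable


def count_matches(messages: Iterable[str], pattern: str) -> int:
    """Count messages containing the pattern (case-insensitive substring match)."""
    needle = pattern.lower()
    return sum(1 for message in messages if needle in message.lower())


# Hypothetical log messages from the last hour.
logs = [
    "Order 1001 failed: item out of stock",
    "Order 1002 shipped",
    "Out of stock: SKU 7",
]
matches = count_matches(logs, "out of stock")
should_fire = matches > 20  # "exceed 20 in 1 hour"
```

Here two of the three messages match, so a threshold of 20 is not exceeded and the condition stays quiet.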

New Error Type

Fires when an error type that has never been seen before appears. Best for catching bugs introduced by new deployments.
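
Detecting a never-before-seen error type amounts to remembering every type observed so far. This is a minimal sketch under that assumption; the class name and the example error type names are hypothetical, and how the product actually identifies an "error type" is not specified here.

```python
class NewErrorTypeCondition:
    """Hypothetical tracker that fires the first time an error type appears."""

    def __init__(self, known_types=()):
        self._seen = set(known_types)

    def observe(self, error_type: str) -> bool:
        """Return True only the first time this error type is observed."""
        if error_type in self._seen:
            return False
        self._seen.add(error_type)
        return True


condition = NewErrorTypeCondition(known_types=["TimeoutError"])
results = [
    condition.observe("TimeoutError"),  # already known
    condition.observe("KeyError"),      # first appearance
    condition.observe("KeyError"),      # repeat of the same type
]
```

Only the first KeyError fires in this sketch; the known type and the repeat do not.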

HTTP 5xx

Fires when the number of HTTP 5xx responses exceeds a threshold. Requires request logging to be enabled. Best for API services.

Dead Service

Fires when no logs are received from a service for a set period. Best for detecting crashed services or broken pipelines.

Example: "Alert if no logs from api-gateway for 15 minutes" catches silent failures.
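
A silence check like the one above can be sketched by comparing each service's most recent log time against the current time. The function name and the "billing" service are hypothetical, used only to illustrate the idea.

```python
from typing import Dict, List


def silent_services(last_seen: Dict[str, float], now: float,
                    max_silence_seconds: float) -> List[str]:
    """Return the services whose most recent log is older than the allowed silence."""
    return sorted(
        name for name, timestamp in last_seen.items()
        if now - timestamp > max_silence_seconds
    )


# Time of the most recent log per service, in seconds.
last_seen = {"api-gateway": 0.0, "billing": 800.0}

# "Alert if no logs from api-gateway for 15 minutes" (900 seconds)
dead = silent_services(last_seen, now=1000.0, max_silence_seconds=15 * 60)
```

In this example api-gateway has been silent for 1,000 seconds and is reported, while billing logged 200 seconds ago and is not. Note that this sketch can only report services that have logged at least once.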

Start conservative

Set thresholds higher than you think necessary. You can always lower them. Too many false alerts cause your team to ignore all alerts.