<b>Q: My site isn't down, it's just painfully slow. Should that even alert?</b>
A: Yes, because slow is the outage your users notice before "down" ever happens. A page taking 12 seconds drives people away as effectively as a 500 error, but a simple up/down check sails right past it.
Add a response-time threshold to your checks. Base the number on your measured baseline, not a guess, and alert only when response time stays above it for several consecutive checks. Requiring consecutive checks means one slow blip won't page you.
Two tiers work well:
— Warn at, say, 2× your normal response time (post to a chat channel)
— Page when it's both slow and sustained, since that usually precedes a full outage
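Here is a minimal Python sketch of the consecutive-check rule combined with the two tiers, assuming one response time per check, in milliseconds. The class name `LatencyAlerter`, the 2× factor, and the streak lengths (3 slow checks to warn, 10 to page) are hypothetical defaults for illustration, not settings from any particular monitoring tool.

```python
# Hypothetical two-tier latency alerter. The baseline, the 2x factor and the
# streak lengths are illustrative assumptions, not values from any tool.
class LatencyAlerter:
    def __init__(self, baseline_ms, factor=2.0, warn_after=3, page_after=10):
        self.threshold_ms = baseline_ms * factor  # "slow" means above this
        self.warn_after = warn_after  # consecutive slow checks before a warning
        self.page_after = page_after  # consecutive slow checks before a page
        self.streak = 0               # current run of consecutive slow checks

    def observe(self, response_ms):
        """Record one check result and return 'ok', 'warn' or 'page'."""
        if response_ms > self.threshold_ms:
            self.streak += 1
        else:
            self.streak = 0  # a check at or under the threshold resets the run
        if self.streak >= self.page_after:
            return "page"
        if self.streak >= self.warn_after:
            return "warn"
        return "ok"


alerter = LatencyAlerter(baseline_ms=400)  # threshold becomes 800 ms
for ms in [350, 900, 420, 950, 1200, 980]:
    # Only the last check returns "warn": it is the third slow check in a row.
    print(ms, alerter.observe(ms))
```

A single normal check resets the streak, which is what keeps an isolated slow blip from alerting. Routing `"warn"` to chat and `"page"` to your on-call tool is left to whatever you already use.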
Likely follow-up: measure at a percentile like p95, not the average. Averages hide the slow tail where your unhappiest users live.
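A quick way to see the difference, sketched in Python with made-up sample data. The `percentile` helper uses the nearest-rank definition, which is only one of several; your monitoring tool may interpolate and report slightly different values.

```python
import math
import statistics


def percentile(samples_ms, pct):
    """Nearest-rank percentile: smallest sample with >= pct% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical window of 100 checks: 90 fast responses and 10 very slow ones.
window = [200] * 90 + [3000] * 10
print("mean:", statistics.mean(window))  # mean is 480 ms: looks healthy
print("p95:", percentile(window, 95))    # p95 is 3000 ms: the slow tail shows up
```

The mean says 480 ms and looks fine, while p95 reveals that one request in ten takes 3 seconds.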
Got a question? Drop it in the comments.
Pingback Clinic
@PingbackClinic
This post was published in the Pingback Clinic Telegram channel. Subscribe here: @PingbackClinic.