<b>Why does your lift test 'reach significance' if you just keep watching it?</b>
The question: you're running an incrementality test, checking the dashboard daily, and one morning it crosses statistical significance. You call it and ship. Did you find a real effect — or did you find the inevitable consequence of looking many times?
What the statistics say: this is <b>peeking</b> (or optional stopping), and it's one of the most common ways attribution experiments lie. A fixed-sample significance test assumes you look <i>once</i>, at a pre-planned sample size. Every additional peek gives the noise another chance to cross the threshold. Check daily for a month, stopping the first time the test turns significant, and your real false-positive rate isn't 5%: simulations typically put it at roughly 20-30%, depending on how often you look. The test didn't get more sensitive; you gave randomness more shots at the goal.
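To make the inflation concrete, here is a minimal simulation sketch of an A/A test (no true effect at all) checked once a day. It rests on simplifying assumptions that are not from the post: each day contributes an independent, standardized noise increment, and the dashboard reports a running z-statistic against the usual two-sided 1.96 cutoff. The function names and defaults are illustrative only.

```python
import math
import random

Z_CRIT = 1.96  # two-sided 5% threshold, valid for ONE planned look


def simulate_aa_test(days, rng):
    """Simulate one A/A test (no true effect) tracked as a running z-statistic.

    Returns (crossed_at_any_daily_look, crossed_at_final_look).
    """
    cumulative = 0.0
    crossed_any = False
    z = 0.0
    for day in range(1, days + 1):
        # Assumption: each day's standardized difference between arms is pure noise.
        cumulative += rng.gauss(0.0, 1.0)
        z = cumulative / math.sqrt(day)
        if abs(z) > Z_CRIT:
            crossed_any = True  # a daily peeker would have called a winner here
    return crossed_any, abs(z) > Z_CRIT


def false_positive_rates(days=30, n_sims=20000, seed=0):
    """Estimate false-positive rates for daily peeking vs. a single final look."""
    rng = random.Random(seed)
    peek_hits = 0
    final_hits = 0
    for _ in range(n_sims):
        any_look, final_look = simulate_aa_test(days, rng)
        peek_hits += any_look
        final_hits += final_look
    return peek_hits / n_sims, final_hits / n_sims


if __name__ == "__main__":
    peeking, single = false_positive_rates()
    print(f"stop at first significant daily peek: {peeking:.1%}")
    print(f"single look on day 30:                {single:.1%}")
```

Under these assumptions the single-look rate should sit near the nominal 5%, while the stop-at-first-crossing rate should come out several times higher. The exact figure depends on the number of looks and on how closely real data match the idealized noise model.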
Why it's epidemic in marketing: ad platforms surface live 'significance' indicators that practically invite peeking, and the pressure to call a winner early is enormous. The result is a literature of 'wins' that don't replicate: noise (a lucky run) mistaken for signal (a real effect).
The nuance: you <i>can</i> look continuously, provided you use methods built for it. Sequential methods (group-sequential designs with adjusted stopping boundaries, always-valid p-values, or Bayesian approaches with a pre-specified prior and stopping rule) account for repeated looks, typically by demanding stronger evidence at each one. The sin isn't watching; it's watching with a one-look test.
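As one illustration of "adjusting the threshold for repeated looks", the sketch below calibrates a single constant z-boundary by Monte Carlo so that the chance of crossing it at <i>any</i> look stays near 5% when there is no effect. This is a Pocock-style group-sequential boundary, chosen here for simplicity; it is not an always-valid p-value and not a full alpha-spending design. It assumes equally spaced looks with equal amounts of new data and an approximately normal test statistic, and all names are hypothetical.

```python
import math
import random


def max_abs_z(looks, rng):
    """Largest |z| seen across equally spaced looks when there is no true effect."""
    cumulative = 0.0
    worst = 0.0
    for look in range(1, looks + 1):
        cumulative += rng.gauss(0.0, 1.0)  # one look's worth of pure noise
        worst = max(worst, abs(cumulative / math.sqrt(look)))
    return worst


def calibrate_constant_boundary(looks, alpha=0.05, n_sims=20000, seed=0):
    """Estimate a constant |z| cutoff whose overall false-positive rate is ~alpha.

    The cutoff is the (1 - alpha) quantile of the simulated maximum |z|
    over all planned looks under the null hypothesis.
    """
    rng = random.Random(seed)
    maxima = sorted(max_abs_z(looks, rng) for _ in range(n_sims))
    index = min(n_sims - 1, math.ceil((1 - alpha) * n_sims))
    return maxima[index]


if __name__ == "__main__":
    for looks in (1, 5, 30):
        cutoff = calibrate_constant_boundary(looks)
        print(f"{looks:>2} planned looks -> stop only if |z| exceeds {cutoff:.2f}")
```

The more looks you plan, the higher the bar at each one. That is the price of watching continuously, and it has to be fixed before the test starts, along with the number of looks.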
What to actually do: pre-register your sample size and analysis date, or switch to a sequential/always-valid framework explicitly. Don't stop on the first green light.
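Pre-registering a sample size means computing it before launch. A rough sketch using the standard normal-approximation formula for comparing two conversion rates is shown here; the baseline rate, the minimum lift worth detecting, and the 80% power target are placeholder assumptions to be replaced with your own planning values.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per arm for a two-sided two-proportion z-test.

    baseline_rate: control conversion rate, e.g. 0.02 for 2%
    relative_lift: smallest relative lift worth detecting, e.g. 0.10 for +10%
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # power requirement
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical planning inputs: 2% baseline, +10% relative lift.
    n = sample_size_per_arm(baseline_rate=0.02, relative_lift=0.10)
    print(f"users needed per arm: {n:,}")
```

Divide the result by expected daily traffic per arm to get the analysis date, write both numbers down, and run the fixed-sample test once when that date arrives.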
<b>Bottom line for practitioners:</b> a lift test you peek at is a slot machine with a significance badge. Fix the stopping rule before you start, or your wins won't survive contact with reality.
Credit Where Due
@CreditWhereDue
This post was published in the Telegram channel Credit Where Due. Subscribe: @CreditWhereDue.