Why most lander A/B "wins" are noise — the statistics every optimizer skips
Deep dive: This is the meta-post. Before trusting any tactic above, you need to trust the tests behind it — and most lander A/B tests are statistically incapable of detecting the effects people claim from them.
The core problem is power. To reliably detect a small lift (say 5-10% relative) on a base rate of a few percent, you need thousands of conversions per variant, which at those rates means tens of thousands to hundreds of thousands of visitors. Most affiliate tests call a winner after a few hundred visits and a handful of conversions. At that volume the confidence interval is so wide that it includes both "no effect" and "the loser is actually better." The declared 20% lift is mostly sampling noise.
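To put rough numbers on the power claim, here is a minimal sketch using the standard normal-approximation sample-size formula for comparing two proportions. The 3% base rate, the 5% two-sided significance level, and 80% power are assumptions chosen for illustration, and `visitors_per_variant` is a hypothetical helper name, not something from a specific tool.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variant(base_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant (two-sided two-proportion z-test)."""
    p1 = base_rate                        # control conversion rate
    p2 = base_rate * (1 + relative_lift)  # variant rate if the lift is real
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Assumed 3% base rate; relative lifts of 5%, 10% and 20%.
for lift in (0.05, 0.10, 0.20):
    n = visitors_per_variant(0.03, lift)
    print(f"{lift:.0%} lift: {n:,} visitors per variant, "
          f"about {round(n * 0.03):,} control conversions")
```

Under these assumptions a 10% relative lift needs on the order of 50,000 visitors per variant, and a 5% lift needs roughly four times that. A test stopped after a few hundred visits is nowhere near either figure.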
Three specific traps compound it. Peeking: checking results repeatedly and stopping when significance appears inflates false positives dramatically. Repeated looks at a fixed-horizon test can push the real error rate from 5% toward 20-30%, which is why peeking is the cardinal A/B sin. Regression to the mean: an early extreme result drifts back toward the true value as data accumulates, so a "huge early win" usually shrinks. Multiple comparisons: test ten things, expect one false "winner" at p
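The peeking trap is easy to check by simulation. The sketch below runs A/A tests, where both arms share the same true conversion rate, so every "significant" result is a false positive. It compares stopping at the first significant look against looking only once at the end. The 5% rate, 200 visitors per arm between looks, 10 looks, and 1,000 simulated tests are illustrative assumptions, and `simulate_aa` is a hypothetical helper name.

```python
import random
from math import sqrt


def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z statistic for two arms of n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:  # no conversions yet in either arm
        return 0.0
    return (conv_b / n - conv_a / n) / se


def simulate_aa(rate=0.05, batch=200, looks=10, runs=1000, z_crit=1.96, seed=1):
    """Return (false-positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    peeking_hits = 0
    final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        crossed = False
        for look in range(1, looks + 1):
            # Both arms draw from the same true rate: any "win" is noise.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            z = z_stat(conv_a, conv_b, look * batch)
            if abs(z) > z_crit:
                crossed = True  # a peeker would have stopped and declared a winner
        peeking_hits += crossed
        final_hits += abs(z) > z_crit  # verdict from the last look only
    return peeking_hits / runs, final_hits / runs


peeking_rate, final_rate = simulate_aa()
print(f"false positives with peeking:    {peeking_rate:.1%}")
print(f"false positives, one final look: {final_rate:.1%}")
```

The single final look should land near the nominal 5%, while the peeking rate should come out several times higher. Adding more looks pushes it further up.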
Above Fold Lab
@AboveFoldLab
This post was published in the Above Fold Lab Telegram channel. Subscribe at @AboveFoldLab.