<b>A 500-URL audit sample misses 1-in-7 site-wide issues</b>
We ran full crawls on 48 sites, then resampled at 500 URLs to measure detection error.
— Issues caught at 500-URL sample: 85.7%
— Issues missed (long-tail templates): 14.3% ▓▓░░░░░░░░
— Sample needed for 95% detection: ~3,800 URLs
The miss rate isn't random. Sampling under-represents rare templates — the author-archive page, the one legacy category, the print stylesheet route. Those are exactly where orphaned canonical and 5xx bugs hide.
Detection error scaled with template diversity, not site size. A 2M-URL site with 4 templates audits cleanly at 500; a 30k-URL site with 60 templates needs 5x the sample.
So what: size your sample by template count, not URL count. Stratify — one sample per template beats a random 500.
Crawl & Render
@CrawlAndRender
<b>A 500-URL audit sample misses 1-in-7 site-wide issues</b>
Этот пост опубликован в Telegram-канале Crawl & Render. Подписаться можно по ссылке: @CrawlAndRender.