<b>Your crawl budget isn't wasted on pages. It's wasted on combinations.</b>
When real crawl waste happens, it's rarely your 500 articles. It's the URL parameters multiplying behind them. One product page with color, size, and sort filters can spawn thousands of crawlable variants, all near-identical, all eating crawl budget over and over.
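To make the multiplication concrete, here is a rough back-of-the-envelope sketch. The filter names and values are made up for illustration, and it assumes each filter is either absent or set to exactly one value. Real sites with multi-select filters or pagination will blow up much faster.

```python
from itertools import combinations
from math import factorial, prod

# Hypothetical filter values for ONE product page (illustrative only).
filters = {
    "color": ["red", "blue", "green", "black", "white"],
    "size": ["xs", "s", "m", "l", "xl"],
    "sort": ["price_asc", "price_desc", "newest", "popular"],
}

def count_fixed_order(filters):
    """URLs when parameters always appear in the same order.

    Each filter contributes len(values) + 1 choices: one per value,
    plus one for "parameter omitted".
    """
    return prod(len(values) + 1 for values in filters.values())

def count_any_order(filters):
    """URLs when the site also emits the same parameters in any order.

    For every subset of k filters, there are prod(sizes) value choices
    and k! possible parameter orderings.
    """
    sizes = [len(values) for values in filters.values()]
    total = 0
    for k in range(len(sizes) + 1):
        for subset in combinations(sizes, k):
            total += prod(subset) * factorial(k)
    return total

print(count_fixed_order(filters))  # 180
print(count_any_order(filters))    # 745
```

That's one page with three small filters. Multiply by every product in the catalog and the "thousands" arrive quickly.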
Google retired the URL Parameters tool in 2022 and basically said: handle it yourself. So you have to. Point rel="canonical" on every variant at the clean URL, avoid linking to parameterized versions internally, and use robots.txt to block genuinely infinite spaces like calendar pages and session-ID URLs.
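One way to get the "clean URL" for a canonical tag or an internal link is to strip the parameters that don't change the content. This is a minimal sketch, not a drop-in solution: the `NON_CANONICAL_PARAMS` set and the `canonical_url` helper are assumptions for illustration, and which parameters are safe to drop depends entirely on your site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters only filter, sort, or track.
# Adjust the set to match what your own site actually uses.
NON_CANONICAL_PARAMS = {
    "color", "size", "sort", "sessionid",
    "utm_source", "utm_medium", "utm_campaign",
}

def canonical_url(url):
    """Return the URL with non-canonical parameters and the fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CANONICAL_PARAMS
    ]
    kept.sort()  # stable parameter order, so ?a=1&b=2 and ?b=2&a=1 collapse
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )

print(canonical_url("https://Example.com/shoes?color=red&size=m&sort=price_asc"))
print(canonical_url("https://example.com/shoes?sort=newest&page=2"))
```

The first call collapses to the bare `/shoes` URL. The second keeps `page=2`, because pagination is not in the set above and usually does change what's on the page.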
The test isn't "how many pages do I have." It's "how many URLs can a crawler reach by clicking around." Those numbers are often wildly different.
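A toy version of that test, assuming a made-up three-page site with a calendar whose "next month" link never runs out. The `PAGES` map, `toy_links`, and `reachable_urls` are all hypothetical; a real check would follow links from your actual HTML or server logs.

```python
from collections import deque

def reachable_urls(start, get_links, limit=10_000):
    """Breadth-first walk from `start`, following get_links(url).

    Stops once `limit` distinct URLs have been seen, so an infinite
    URL space cannot hang the walk.
    """
    seen = {start}
    queue = deque([start])
    while queue and len(seen) < limit:
        url = queue.popleft()
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
                if len(seen) >= limit:
                    break
    return seen

# Hypothetical site: 3 real pages, one of which links into a calendar.
PAGES = {"/": ["/a", "/b", "/calendar?m=0"], "/a": ["/"], "/b": ["/"]}

def toy_links(url):
    """Outgoing links for the toy site; the calendar always has a next month."""
    if url.startswith("/calendar?m="):
        month = int(url.split("=")[1])
        return [f"/calendar?m={month + 1}", "/"]
    return PAGES.get(url, [])

found = reachable_urls("/", toy_links, limit=500)
print(len(PAGES), "pages,", len(found), "reachable URLs (capped)")
```

Three pages of content, and the walk only stops because of the cap. Raise `limit` and the reachable count rises with it.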
Finite content, infinite URLs. That's the actual leak.
Budget Myths
@CrawlBudgetMyths
This post was published in the Budget Myths Telegram channel. Subscribe at @CrawlBudgetMyths.