<b>The crawl-budget tax of large hreflang clusters</b>
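An under-discussed cost: hreflang creates a dense graph of cross-references, and Google has to crawl and reconcile those references before it can trust the cluster. With N locales, each page is expected to list all N alternates, so there are N(N-1)/2 page pairs whose return tags must agree; 40 locales means 780 pairs. For large multilingual sites, this competes for crawl budget and can delay correct per-locale serving.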
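The mechanism: a cluster is only valid once Google has recrawled enough members to confirm reciprocity, because an annotation between two pages is honored only when each page links back to the other. Add a 40th language, and each of its 39 new pairings is confirmed only when the crawler revisits the corresponding existing page and sees the new return tag. Until a meaningful share of those 39 pages has been recrawled, the new locale is only partly confirmed.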
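A minimal Python sketch of that reciprocity check, under a simplified model we are assuming (not Google's actual pipeline): for each URL we record the set of alternates seen on its most recent crawl, and a pair counts as confirmed only when each page lists the other. The <code>confirmed_pairs</code> and <code>confirmation_share</code> helpers and the example.com URLs are hypothetical.

```python
from itertools import combinations

# Hypothetical model: URL -> set of hreflang targets seen on that URL's
# most recent crawl (not what is currently deployed).
Seen = dict[str, set[str]]


def confirmed_pairs(seen: Seen) -> set[frozenset[str]]:
    """Unordered page pairs whose annotations point at each other."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(seen), 2)
        if b in seen[a] and a in seen[b]
    }


def confirmation_share(seen: Seen, page: str) -> float:
    """Fraction of the other members that `page` has a confirmed pair with."""
    others = [p for p in seen if p != page]
    if not others:
        return 0.0
    ok = sum(1 for p in others if p in seen[page] and page in seen[p])
    return ok / len(others)


# 39 existing locales, fully interlinked, plus a newly added 40th.
old = [f"https://example.com/l{i}/" for i in range(39)]
new = "https://example.com/l39/"
cluster = old + [new]

seen: Seen = {p: set(old) for p in old}  # last crawled before the new return tag
seen[new] = set(cluster)                  # the new page already lists all 40

print(confirmation_share(seen, new))  # 0.0: no return tags seen yet

# Recrawl 10 of the existing pages after the return tag was deployed.
for p in old[:10]:
    seen[p] = set(cluster)

print(round(confirmation_share(seen, new), 3))  # 0.256, i.e. 10 of 39
print(len(confirmed_pairs(seen)))               # 751 of 780 possible pairs
```

The point of the model is that nothing about the new page changes between the two prints; only recrawls of the existing members move the number.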
What we observe:
— On sites with constrained crawl budget, newly added locales can take weeks to activate, because the bottleneck is how often the least-crawled cluster members are revisited, not how fast the new page itself is crawled.
— Frequent structural churn (adding/removing locales, changing URL patterns) keeps the cluster in a perpetual semi-confirmed state. Each change resets the reconciliation clock for affected nodes.
— Sitemaps with accurate <code>lastmod</code> values help here: they signal which members changed, which can focus recrawl where it matters.
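To make the <code>lastmod</code> point concrete, here is a hedged Python sketch that emits a sitemap for one hreflang cluster, using the sitemaps.org namespace and the <code>xhtml:link rel="alternate"</code> form Google documents for hreflang in sitemaps. The <code>build_sitemap</code> helper, its inputs, and the URLs are assumptions for illustration. The part that matters for the observation above is that <code>lastmod</code> comes from when a page or its alternate set actually changed, not from the build or deploy time.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SM_NS)         # serialize as the default namespace
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(cluster: dict[str, str], lastmod: dict[str, datetime]) -> str:
    """Hypothetical helper.

    cluster: hreflang code -> URL for every member of one cluster.
    lastmod: URL -> when that page's content or hreflang set last really
             changed (timezone-aware). Do not pass the build/deploy time.
    """
    urlset = ET.Element(f"{{{SM_NS}}}urlset")
    for url in cluster.values():
        entry = ET.SubElement(urlset, f"{{{SM_NS}}}url")
        ET.SubElement(entry, f"{{{SM_NS}}}loc").text = url
        changed = lastmod[url].astimezone(timezone.utc)
        ET.SubElement(entry, f"{{{SM_NS}}}lastmod").text = changed.isoformat(timespec="seconds")
        # Every member lists every alternate, itself included.
        for code, alt in sorted(cluster.items()):
            ET.SubElement(entry, f"{{{XHTML_NS}}}link",
                          rel="alternate", hreflang=code, href=alt)
    return ET.tostring(urlset, encoding="unicode")


cluster = {
    "en": "https://example.com/en/",
    "de": "https://example.com/de/",
    "fr": "https://example.com/fr/",
}
# In this example all three pages changed together (a batched rollout).
changed = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
print(build_sitemap(cluster, {url: changed for url in cluster.values()}))
```

Because each entry repeats the full alternate set, adding a locale legitimately changes every member's entry, which is exactly when their <code>lastmod</code> should move.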
The practical implication: international rollouts should be batched and stable, not drip-fed. Adding 20 markets at once, then leaving the structure alone, confirms faster than adding one per week for 20 weeks.
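The batching argument can be checked with a toy model. This Python sketch assumes, purely for illustration, that each pre-existing member is recrawled on a fixed schedule (first crawl on day <code>phase</code>, then every <code>interval</code> days) and that a structural change is fully confirmed once every member has been recrawled on or after the day of that change. Real recrawl scheduling is adaptive and not observable, so the helpers <code>next_crawl</code> and <code>confirmed_at</code> and the schedule values are hypothetical.

```python
import math

# Hypothetical schedule: URL -> (phase, interval) in days; the page is
# crawled on days phase, phase + interval, phase + 2 * interval, ...
Schedule = dict[str, tuple[int, int]]


def next_crawl(day: int, phase: int, interval: int) -> int:
    """First scheduled crawl on or after `day`."""
    if day <= phase:
        return phase
    return phase + math.ceil((day - phase) / interval) * interval


def confirmed_at(change_day: int, schedule: Schedule) -> int:
    """Day by which every member has been recrawled since the change."""
    return max(next_crawl(change_day, ph, iv) for ph, iv in schedule.values())


# 38 frequently crawled members plus one that is revisited monthly.
schedule: Schedule = {f"https://example.com/l{i}/": (2, 3) for i in range(38)}
schedule["https://example.com/l38/"] = (20, 30)

# Batched: all 20 markets added on day 0.
batched = confirmed_at(0, schedule)
# Drip-fed: one market per week; the final addition (day 133) is the
# one whose confirmation completes the cluster, since each change
# requires a fresh recrawl of every member.
drip_fed = confirmed_at(7 * 19, schedule)
print(batched, drip_fed)  # 20 140
```

Under these assumed numbers, the batched rollout is fully confirmed on day 20, when the slow member is next crawled, while the drip-fed one cannot finish before day 140. Drip-feeding also rewrites every existing page's hreflang block 20 times instead of once. The model ignores the new pages' own crawl schedule and any re-prioritization Google may apply after a <code>lastmod</code> change.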
Caveats: crawl budget is mostly a concern at scale (hundreds of thousands of URLs); small sites rarely hit it. And we can't directly measure Google's internal reconciliation state — we infer it from the lag between deploying tags and seeing correct per-locale serving.
Hreflang Lab
@HreflangLab
This post was published in the Hreflang Lab Telegram channel. You can subscribe here: @HreflangLab.