<b>Sitemap hygiene SOP for dynamic page sets</b>
A sitemap that lists <code>noindex</code> pages, 404s, or redirects sends search engines conflicting signals at scale. Treat it as a managed artifact, regenerated nightly.
The generation contract:
☐ Step 1 — Source of truth is the index ladder, not the route table. Only Rung-1+ pages get listed. Gate: a <code>noindex</code> URL in the sitemap fails CI.
☐ Step 2 — Status pre-check. Sample-fetch entries and exclude any that return a non-200 status. Gate: fail the build if more than 1% of the sample returns errors.
☐ Step 3 — Shard at 45,000 URLs per file (below the sitemap protocol's 50,000-URL per-file limit, leaving headroom) and register every shard in a sitemap index.
☐ Step 4 — Honest <code>lastmod</code> from the record's real update timestamp. Gate: no blanket "today" stamps.
☐ Step 5 — Diff vs. yesterday. Log added/removed URLs. A sudden 10k drop should alert, not ship silently.
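A minimal Python sketch of Step 1, assuming each index-ladder record exposes a rung number and a <code>noindex</code> flag. The <code>Page</code> type, its fields, and all function names are hypothetical. The point is that selection reads only the ladder, while the CI gate cross-checks the result against each page's own directive.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Page:
    """Hypothetical index-ladder record; the field names are assumptions."""
    url: str
    rung: int       # 0 = not eligible; 1 and up = eligible for the sitemap
    noindex: bool   # True if the page itself carries a noindex directive


def select_sitemap_urls(pages: Iterable[Page]) -> List[str]:
    """List only Rung-1+ pages; the route table is never consulted."""
    return [p.url for p in pages if p.rung >= 1]


def find_noindex_leaks(sitemap_urls: Iterable[str], pages: Iterable[Page]) -> List[str]:
    """Return sitemap URLs whose page is marked noindex (should be empty)."""
    noindex_urls = {p.url for p in pages if p.noindex}
    return sorted(set(sitemap_urls) & noindex_urls)


def ci_gate(sitemap_urls: List[str], pages: List[Page]) -> None:
    """Fail the CI run (non-zero exit) if any noindex URL reached the sitemap."""
    leaks = find_noindex_leaks(sitemap_urls, pages)
    if leaks:
        raise SystemExit(f"noindex URLs in sitemap: {leaks}")


pages = [
    Page("https://example.com/a", rung=2, noindex=False),
    Page("https://example.com/b", rung=0, noindex=False),
    Page("https://example.com/c", rung=1, noindex=True),
]
urls = select_sitemap_urls(pages)
print(find_noindex_leaks(urls, pages))  # ['https://example.com/c']
```

Here <code>/c</code> sits on Rung 1 yet is marked <code>noindex</code>, so it is selected and then flagged: the gate exists to catch a ladder that disagrees with the page.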
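Step 2 as a Python sketch. The <code>fetch_status</code> callable is a hypothetical hook (for example, a HEAD request that does not follow redirects), injected so the gate logic stays testable; the sample size and seed are assumptions. The sketch counts every non-200, redirects included, as an error for the 1% gate, which is one reading of the rule.

```python
import random
from typing import Callable, List, Optional, Sequence


def status_precheck(
    urls: Sequence[str],
    fetch_status: Callable[[str], int],
    sample_size: int = 1_000,
    max_error_rate: float = 0.01,
    seed: Optional[int] = None,
) -> List[str]:
    """Fetch a random sample, drop non-200 entries, and gate on the error rate."""
    rng = random.Random(seed)
    sample = rng.sample(list(urls), min(sample_size, len(urls)))
    # Any status other than 200 (404, 301, 500, ...) counts as an error here.
    bad = {u for u in sample if fetch_status(u) != 200}
    error_rate = len(bad) / len(sample) if sample else 0.0
    if error_rate > max_error_rate:
        raise RuntimeError(
            f"status pre-check failed: {error_rate:.2%} of "
            f"{len(sample)} sampled URLs returned non-200"
        )
    # Unsampled URLs were never fetched, so they pass through unchecked.
    return [u for u in urls if u not in bad]


def fake_status(url: str) -> int:
    """Stand-in for a real HTTP check: one URL redirects, the rest are fine."""
    return 301 if url.endswith("/p7") else 200


urls = [f"https://example.com/p{i}" for i in range(200)]
kept = status_precheck(urls, fake_status, sample_size=200, seed=1)
print(len(kept))  # 199
```

Only sampled URLs are actually checked, so unsampled failures slip through; setting <code>sample_size</code> to the full list turns the sample into a full crawl at proportionally higher cost.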
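Step 3 sketched with the Python standard library. The 45,000 shard size comes from the step itself; the <code>sitemap-N.xml</code> file naming and the <code>https://example.com</code> base URL are placeholders. The sitemap protocol also caps each file at 50 MB uncompressed, so the sketch checks that too.

```python
from typing import List, Sequence
from xml.sax.saxutils import escape

SHARD_SIZE = 45_000                 # headroom below the 50,000-URL limit
MAX_BYTES = 50 * 1024 * 1024        # 50 MB uncompressed, per the protocol
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>'


def shard(urls: Sequence[str], size: int = SHARD_SIZE) -> List[Sequence[str]]:
    """Split the URL list into consecutive chunks of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def render_urlset(urls: Sequence[str]) -> str:
    """One shard file; URLs are XML-escaped (e.g. & becomes &amp;)."""
    body = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in urls)
    xml = f'{XML_DECL}<urlset xmlns="{NS}">{body}</urlset>'
    if len(xml.encode("utf-8")) > MAX_BYTES:
        raise ValueError("shard exceeds 50 MB uncompressed; lower SHARD_SIZE")
    return xml


def render_index(base_url: str, shard_count: int) -> str:
    """Sitemap index registering every shard (file naming is hypothetical)."""
    body = "".join(
        f"<sitemap><loc>{escape(base_url)}/sitemap-{n}.xml</loc></sitemap>"
        for n in range(1, shard_count + 1)
    )
    return f'{XML_DECL}<sitemapindex xmlns="{NS}">{body}</sitemapindex>'


urls = [f"https://example.com/p{i}" for i in range(100_000)]
shards = shard(urls)
files = [render_urlset(s) for s in shards]
index = render_index("https://example.com", len(shards))
print([len(s) for s in shards])  # [45000, 45000, 10000]
```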
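Step 4 as a Python sketch. <code>lastmod</code> is formatted as W3C Datetime, the format the sitemap protocol specifies, from each record's real update time. The gate is one possible reading of "no blanket 'today' stamps": it flags URLs whose <code>lastmod</code> claims today although the record did not change today. The record mapping is hypothetical, and "today" is taken in UTC.

```python
from datetime import date, datetime, timezone
from typing import Dict, List


def lastmod_for(updated_at: datetime) -> str:
    """Format the record's real update time as W3C Datetime in UTC."""
    if updated_at.tzinfo is None:
        raise ValueError("updated_at must be timezone-aware")
    return updated_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S+00:00")


def inflated_today_urls(
    lastmods: Dict[str, str],
    updated_at: Dict[str, datetime],
    today: date,
) -> List[str]:
    """URLs whose lastmod claims `today` (UTC) though the record changed earlier."""
    day = today.isoformat()
    return sorted(
        url
        for url, lm in lastmods.items()
        if lm.startswith(day) and updated_at[url].astimezone(timezone.utc).date() != today
    )


records = {
    "https://example.com/a": datetime(2024, 1, 5, 12, 0, tzinfo=timezone.utc),
    "https://example.com/b": datetime(2024, 3, 1, 8, 30, tzinfo=timezone.utc),
}
honest = {url: lastmod_for(ts) for url, ts in records.items()}
blanket = {url: "2024-03-01" for url in records}  # the anti-pattern
print(inflated_today_urls(blanket, records, date(2024, 3, 1)))  # ['https://example.com/a']
```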
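Step 5 sketched in Python with the standard <code>logging</code> module. The 10,000 threshold is the step's own example, applied here to removed URLs; the logger name and the choice to hold the publish rather than page someone are assumptions.

```python
import logging
from typing import Iterable, List, Tuple

log = logging.getLogger("sitemap.diff")  # hypothetical logger name
DROP_ALERT_THRESHOLD = 10_000            # the "sudden 10k drop" from Step 5


def diff_sitemaps(yesterday: Iterable[str], today: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (added, removed) URLs between two nightly runs."""
    prev, curr = set(yesterday), set(today)
    return sorted(curr - prev), sorted(prev - curr)


def review_diff(yesterday: Iterable[str], today: Iterable[str],
                threshold: int = DROP_ALERT_THRESHOLD) -> bool:
    """Log the diff; return True if removals are large enough to block publishing."""
    added, removed = diff_sitemaps(yesterday, today)
    log.info("sitemap diff: +%d added, -%d removed", len(added), len(removed))
    if len(removed) >= threshold:
        log.error("%d URLs dropped since yesterday; holding publish", len(removed))
        return True
    return False


logging.basicConfig(level=logging.INFO)
yesterday_urls = [f"https://example.com/p{i}" for i in range(20_000)]
today_urls = yesterday_urls[:9_000] + ["https://example.com/new"]
blocked = review_diff(yesterday_urls, today_urls)  # 1 added, 11,000 removed
```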
Guardrail: once a month, reconcile the sitemap URL count against the indexed count in Search Console. A widening gap means the upstream firewall is leaking thin pages.
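The monthly reconciliation can be tracked as a gap ratio, (sitemap URLs − indexed URLs) / sitemap URLs, so that months with different sitemap sizes stay comparable. In this Python sketch the indexed counts are assumed to be copied by hand from Search Console, so no API is called; treating three consecutive rising snapshots as "widening" is an assumption, not a rule from the SOP.

```python
from typing import List, Sequence, Tuple

Snapshot = Tuple[str, int, int]  # (month, sitemap URL count, indexed count)


def gap_ratios(snapshots: Sequence[Snapshot]) -> List[float]:
    """Share of sitemap URLs not indexed, per month (oldest first)."""
    return [(listed - indexed) / listed if listed else 0.0
            for _, listed, indexed in snapshots]


def gap_is_widening(snapshots: Sequence[Snapshot], months: int = 3) -> bool:
    """True if the gap ratio rose month over month across the last `months` snapshots."""
    recent = gap_ratios(snapshots)[-months:]
    return len(recent) == months and all(b > a for a, b in zip(recent, recent[1:]))


history = [
    ("2024-01", 100_000, 90_000),
    ("2024-02", 100_000, 85_000),
    ("2024-03", 100_000, 80_000),
]
print(gap_is_widening(history))  # True
```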
Ship gate: don't publish until all boxes are checked.
Scale Engine SOP
@ScaleEngineSOP
This post was published in the Scale Engine SOP Telegram channel. Subscribe via @ScaleEngineSOP.