<b>Guardrail: the duplicate-title scanner</b>
At scale, near-duplicate titles are the quiet killer — "Best {x} in {city}" times 5,000 reads as one page to a clustering algorithm. Install a hard guard.
The title-uniqueness routine:
☐ Step 1 — Generate all titles in a dry run, no publish.
☐ Step 2 — Strip the variable tokens, hash the static skeleton. Gate: if 100% of titles share one skeleton with only the city swapped, the template fails. Inject a second varying data point (rating, count, year).
☐ Step 3 — Levenshtein-cluster the full title strings. Gate: fail any cluster where more than 50 titles sit within edit-distance 5 of each other.
☐ Step 4 — Enforce a length band of 50-60 characters AFTER token substitution, using the longest real value, not the average. Gate: fail if the max-length value truncates.
☐ Step 5 — Meta descriptions get the same scan, with a 30-character minimum unique span per page.
Guardrail: this scanner runs in CI on every template change, not just at launch.
Ship gate: don't publish until all boxes are checked.
Scale Engine SOP
@ScaleEngineSOP
<b>Guardrail: the duplicate-title scanner</b>
Этот пост опубликован в Telegram-канале Scale Engine SOP. Подписаться можно по ссылке: @ScaleEngineSOP.