<b>Where is the "thin content" threshold, really — and is it about length?</b>
Thin-content penalties get blamed on short pages. But I've seen 400-word pages rank and 2,000-word pages get ignored. I tried to locate what actually separates indexed-and-ranking from crawled-and-discarded.
Using Google Search Console (GSC) index-coverage states, I split 5,200 URLs into three buckets: "indexed & ranking," "indexed not ranking," and "crawled - not indexed." I then compared page features across the buckets.
— Median word count barely differed between ranking and not-indexed pages (~900 vs ~780). Length was a weak separator.
— What separated them: unique entity count and unique-information ratio (content not duplicated from siblings/templates). Not-indexed pages were template-similar to dozens of siblings.
— A cluster of near-duplicate "location" or "variant" pages was the dominant pattern in the not-indexed bucket — thin by <i>redundancy</i>, not by length.
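To make "unique-information ratio" concrete, here is a minimal sketch of one way to compute it. It is an illustration, not the exact metric behind the numbers above: it assumes sentence-level matching, and the names `sentences` and `unique_information_ratio` are hypothetical.

```python
import re

def sentences(text):
    """Split text into normalized sentences (lowercased, whitespace collapsed)."""
    parts = re.split(r"[.!?]+", text.lower())
    return [" ".join(p.split()) for p in parts if p.strip()]

def unique_information_ratio(page, siblings):
    """Share of the page's words that sit in sentences no sibling also contains.

    Assumption: a sentence counts as "duplicated" only on an exact match
    after normalization. Template text with a swapped city name would
    still count as unique under this simple rule.
    """
    seen = set()
    for sibling in siblings:
        seen.update(sentences(sibling))

    total_words = 0
    unique_words = 0
    for sentence in sentences(page):
        n = len(sentence.split())
        total_words += n
        if sentence not in seen:
            unique_words += n
    return unique_words / total_words if total_words else 0.0
```

A page that shares its boilerplate with siblings but adds one long sentence of its own scores well above zero; a page that is a sentence-for-sentence clone scores 0.0.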
So "thin" is mostly a <i>uniqueness</i> problem, not a word-count problem. A short page saying something only it says gets indexed; a long page that's the 50th near-clone doesn't.
Programmatic SEO at scale lives or dies here: differentiate each page's information, or watch them fall out of the index regardless of length.
Method note: buckets from GSC Index Coverage; similarity via shingled text comparison among siblings.
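For readers who want to try the similarity step, a shingled comparison can be sketched roughly like this. The shingle size `k=5`, the Jaccard measure, and the function names are assumptions for illustration, not the exact settings used for the numbers in this post.

```python
import re

def shingles(text, k=5):
    """Return the set of overlapping k-word shingles in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short text: treat the whole thing as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def max_sibling_similarity(page, siblings, k=5):
    """Highest shingle similarity between a page and any of its siblings."""
    page_shingles = shingles(page, k)
    return max(
        (jaccard(page_shingles, shingles(s, k)) for s in siblings),
        default=0.0,
    )
```

A page whose maximum sibling similarity sits close to 1.0 is a near-clone by this measure. Where to draw the cutoff is a judgment call that depends on how heavy the shared template is.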
Confidence: medium-high.
The Authority Files
@AuthorityFiles
This post was published in the Telegram channel The Authority Files. Subscribe: @AuthorityFiles.