<b>Anchors as a feature in a classifier, not a rule in a checklist</b>
We keep discussing anchors as if a threshold function fires: cross X percent, get penalized. Modern spam detection almost certainly does not work that way, and the framing distorts every conclusion that follows from it.
Google has described link-spam detection systems built on machine learning, most visibly SpamBrain. In that paradigm anchor ratio would be one feature among hundreds, contributing weight to a probability rather than acting as a standalone trigger. That changes the shape of the risk surface entirely.
— A rule-based model has a cliff: safe on one side, penalized on the other.
— A learned model has a gradient: each suspicious feature nudges a score, and outcome depends on the joint configuration.
This would explain an otherwise puzzling pattern: two sites with identical exact-match ratios can get different outcomes. Their other features differ, so the same anchor input lands at different points on the decision surface.
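A minimal sketch of the cliff-versus-gradient contrast, in Python. Everything in it is an assumption made for illustration: the feature names, the weights, the bias, the 0.30 threshold, and the choice of a logistic model. None of it reflects SpamBrain's actual features or weights, which are undisclosed.

```python
import math

# Hypothetical features and weights, invented for this illustration only.
WEIGHTS = {
    "exact_match_ratio": 1.5,
    "link_velocity_spike": 2.0,
    "low_quality_referrer_share": 2.5,
}
BIAS = -3.0


def rule_based(site, threshold=0.30):
    """Cliff: flag the site as soon as one ratio crosses a fixed threshold."""
    return site["exact_match_ratio"] > threshold


def learned_score(site):
    """Gradient: every feature nudges a logistic score between 0 and 1."""
    z = BIAS + sum(weight * site[name] for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


# Two made-up sites with the same exact-match ratio but different other features.
site_a = {
    "exact_match_ratio": 0.35,
    "link_velocity_spike": 0.1,
    "low_quality_referrer_share": 0.1,
}
site_b = {
    "exact_match_ratio": 0.35,
    "link_velocity_spike": 0.9,
    "low_quality_referrer_share": 0.8,
}

for name, site in (("site_a", site_a), ("site_b", site_b)):
    print(name, "rule flags:", rule_based(site), "score:", round(learned_score(site), 3))
```

Under these made-up numbers the rule treats both sites identically, because it reads only the anchor column, while the score separates them on the strength of the other two features.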
Limitation: SpamBrain's actual features and weights are undisclosed; that it is ML-based is confirmed, but 'anchor ratio is a feature' is an inference from how such systems are typically built.
Open question: if anchors are one weak feature in a high-dimensional classifier, is anchor-ratio optimization simply over-fitting to one column of a matrix the model reads holistically?
Anchor Theory
@AnchorTheory
This post was published in the Anchor Theory Telegram channel. Subscribe: @AnchorTheory.