Logfile Roundup
@LogfileRoundup

<b>Parsing User-Agent strings reliably: sources that save pain</b>
UA strings are a mess of legacy tokens. Don't hand-roll the regex.

→ <b>ua-parser / uap-core (GitHub)</b> — the community regex database behind half the world's UA parsing. Use it instead of inventing patterns.
→ <b>matomo-org/device-detector</b> — excellent bot list; tags Googlebot, Bingbot, GPTBot, and hundreds of scrapers by name.
★ <b>Pick of the week — Google's full crawler UA reference</b> — the authoritative table of every Googlebot variant (Smartphone, Image, Video, AdsBot, GoogleOther). Most people miss that GoogleOther isn't search crawl.
→ <b>Dark Visitors</b> — fresh catalog of AI crawler UAs (GPTBot, ClaudeBot, PerplexityBot) for your robots.txt rules and log labeling.

Takeaway: classify with uap-core or device-detector, then check Google's table to separate search crawl from AdsBot/GoogleOther.
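
A rough sketch of that second step in Python, for log lines already identified as crawler traffic. It only buckets by the product tokens named in this post. The token lists are illustrative and incomplete, the function name and labels are made up for this example, and UA strings can be spoofed, so treat the output as a label rather than proof. General UA parsing still belongs to uap-core or device-detector.

```python
# Illustrative token lists only (an assumption for this sketch).
# The complete, current lists live in Google's crawler reference
# and in catalogs such as Dark Visitors.
RULES = [
    # Non-search Google crawlers are checked first as a precaution.
    ("google-ads", ("AdsBot-Google",)),
    ("google-other", ("GoogleOther",)),
    # "Googlebot" also matches Googlebot-Image and Googlebot-Video.
    ("google-search", ("Googlebot",)),
    ("ai-crawler", ("GPTBot", "ClaudeBot", "PerplexityBot")),
]


def label_crawler(user_agent: str) -> str:
    """Return a coarse label for a crawler User-Agent string.

    Matching is a case-insensitive substring test on product tokens.
    Anything that matches no token is returned as "unlabeled".
    """
    ua = user_agent.lower()
    for label, tokens in RULES:
        if any(token.lower() in ua for token in tokens):
            return label
    return "unlabeled"


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "AdsBot-Google (+http://www.google.com/adsbot.html)",
        "GoogleOther",
    ]
    for sample in samples:
        print(label_crawler(sample), "<-", sample)
```

Keeping the rules in one ordered list means a new token from Google's table or Dark Visitors is a one-line change.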
This post was published in the Logfile Roundup Telegram channel. Subscribe at @LogfileRoundup.