Ask HN: Am I being advertised an ARG via user agent logs?
I'm here looking through logs on my unnamed reverse proxy and CDN service. The crawler bot swarm has been hitting my PHP application like I've upset them, so I'm checking which weird user agent strings are being allowed to connect. There's "Sogou" and "meta-webindexer" and a small number of requests from "SleepBot/1.0".

What's SleepBot?

The ASN is Google and the UA string is: "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; SleepBot/1.0; +http //sleepbot com/) Chrome/131.0.0.0 Safari/537.36" [edited to make link non-clickable]

So I visit the site. It looks like the homepage of an interesting tech and ambient music guy who is still running a Shoutcast online radio stream but otherwise hasn't been seen online in 5 years. The Wayback Machine shows few changes in over a decade. But the resume link brings up a GitHub account with a different URL and username, which reported 1 issue in March of this year. It goes deeper.

What's going on? Is a Google or adjacent employee running a personal scraper, or just using a custom UA string while browsing the web? Did someone make a typo? Or is it some kind of weird security game / ARG and I'm the sap who's taken the bait?
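
For anyone who wants to pull the same kind of token out of their own logs, here is a minimal Python sketch. It assumes you already have the raw User-Agent strings as a list, and it only catches bots that follow the "compatible; Name/version" convention seen in the SleepBot string above, so tokens like "Sogou" or "meta-webindexer" may need their own patterns. The names `bot_name` and `tally_bots` are made up for illustration, not part of any library.

```python
import re
from collections import Counter

# Assumption: the bot declares itself as "compatible; Name/version",
# as the SleepBot UA string quoted above does.
BOT_TOKEN = re.compile(r"compatible;\s*([^;/\s]+)/([\w.]+)")


def bot_name(user_agent):
    """Return 'Name/version' for a self-declared bot, or None if absent."""
    match = BOT_TOKEN.search(user_agent)
    if match is None:
        return None
    return f"{match.group(1)}/{match.group(2)}"


def tally_bots(user_agents):
    """Count requests per self-declared bot token."""
    counts = Counter()
    for ua in user_agents:
        name = bot_name(ua)
        if name is not None:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
        "SleepBot/1.0; +http //sleepbot com/) Chrome/131.0.0.0 Safari/537.36",
    ]
    for name, hits in tally_bots(sample).most_common():
        print(name, hits)
```

A UA string is whatever the client chooses to send, so a tally like this says what a client claims to be, not who is actually behind it. That is why the ASN is worth checking separately.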