Ask HN: Am I being advertised an ARG via user agent logs?

2 points by SpecialistK, 4 days ago
I'm here looking through logs on my unnamed reverse proxy and CDN service. The crawler bot swarm has been hitting my PHP application like I've upset them, so I'm seeing which weird user agent strings are being allowed to connect. There's "Sogou" and "meta-webindexer" and a small number of requests from "SleepBot/1.0".

What's SleepBot?

The ASN is Google and the UA string is: "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; SleepBot/1.0; +http //sleepbot com/) Chrome/131.0.0.0 Safari/537.36" [edited to make link non-clickable]

So I visit the site. And it looks like the homepage of an interesting tech and ambient music guy who is still running a Shoutcast online radio stream but otherwise hasn't been seen online in 5 years. The Wayback Machine shows few changes in over a decade. But the resume link brings up a GitHub account with a different URL and username, which reported 1 issue in March of this year. It goes deeper.

What's going on? Is a Google or adjacent employee running a personal scraper, or just using a custom UA string while browsing the web? Did someone make a typo? Or is it some kind of weird security game / ARG and I'm the sap who's taken the bait?