Ask HN: How do you detect compliance risk in Google search results?

1 point by paolocermelli about 13 hours ago
I’m experimenting with a rules-based approach to classifying Google SERP snippets (neutral / adverse / authority-regulatory) for compliance and due diligence use cases.

One issue I keep running into is false positives from high-authority sources: a single regulator PDF or an old enforcement action can outweigh dozens of neutral results, even when the context has materially changed.

For those working in OSINT, risk, or search analysis: how do you usually distinguish false positives from true adverse signals at scale? Do you weight authority differently, or apply temporal or contextual decay?
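
To make the last question concrete, here is a minimal sketch of what authority weighting combined with temporal decay could look like. Everything in it is an assumption made for illustration, not a validated method: the `Snippet` fields, the two-year half-life, and the choice to treat both the adverse and authority-regulatory labels as risk-bearing are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: a signal loses half of its weight every two years.
HALF_LIFE_DAYS = 730.0

# Assumption: both of these labels count toward the risk share.
RISK_LABELS = {"adverse", "authority-regulatory"}


@dataclass
class Snippet:
    label: str        # "neutral", "adverse", or "authority-regulatory"
    authority: float  # hypothetical source-authority weight in [0, 1]
    published: date   # publication date of the underlying page


def decayed_weight(snippet: Snippet, today: date) -> float:
    """Authority weight scaled by exponential decay on the snippet's age."""
    age_days = max((today - snippet.published).days, 0)
    return snippet.authority * 0.5 ** (age_days / HALF_LIFE_DAYS)


def risk_share(snippets: list[Snippet], today: date) -> float:
    """Fraction of the total decayed weight carried by risk-bearing snippets."""
    total = sum(decayed_weight(s, today) for s in snippets)
    if total == 0:
        return 0.0
    risky = sum(decayed_weight(s, today) for s in snippets if s.label in RISK_LABELS)
    return risky / total


if __name__ == "__main__":
    today = date(2024, 1, 1)
    results = [Snippet("neutral", 0.3, today - timedelta(days=30)) for _ in range(12)]
    # One old enforcement action from a high-authority source.
    results.append(Snippet("authority-regulatory", 1.0, today - timedelta(days=8 * 365)))
    print(f"risk share with decay: {risk_share(results, today):.3f}")
```

With a scheme like this, an eight-year-old regulator document still contributes, but it no longer dominates a page of recent neutral results by authority alone. It does not address contextual change (for example, a resolved or overturned action), which would need a separate signal.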