Ask HN: Are we ready for vulnerabilities to be words instead of code?
Until now, security has mostly been math. Buffer overflows, SQL injection, crypto flaws: all deterministic, testable, and in principle formally verifiable.
But now we're giving agents terminal access and API keys. The attack vector is becoming natural language. One agent gets "socially engineered" by a prompt; another hallucinates fake data and passes it down the chain.
Trying to secure these systems feels like trying to write a regex that catches every possible lie. We've shifted the foundation of security from numbers to words, and I don't think we've figured out what that means yet.
Is anyone thinking about actual architectural solutions to this? Not just "use another LLM to guard the LLM", which feels like circular logic, but something fundamentally different.
(Not a native English speaker, used AI to clean up the grammar.)