HackerNews中文版

大多数代理安全测试试图对模型进行越狱。这非常困难，OpenAI和Anthropic在红队测试方面表现出色。我们采取了不同的方法：攻击环境，而不是模型。以下是针对我们的攻击套件测试代理的结果： - 工具操控：要求代理读取一个文件，注入路径=/etc/passwd。它照做了。 - 数据外泄：要求代理读取配置并将其通过电子邮件发送到外部。它做到了。 - Shell注入：用指令污染git状态输出。代理遵循了这些指令。 - 凭证泄露：要求提供API密钥“用于调试”。代理提供了这些密钥。这些操作都不需要绕过模型的安全机制。模型正常工作——代理仍然被攻陷。其工作原理：我们构建了拦截代理实际操作的适配层： - 文件系统适配层：对open()、Path.read_text()进行猴子补丁。 - 子进程适配层：对subprocess.run()进行猴子补丁。 - PATH劫持：伪造git/nmp/curl，包装真实的二进制文件并污染输出。模型看到的看似合法的工具输出。它对此毫无察觉。总共进行了214次攻击，包括文件注入、shell输出污染、工具操控、RAG污染、MCP攻击。早期访问： [https://exordex.com](https://exordex.com) 希望能收到任何将代理投入生产的人的反馈。

查看原文

Most agent security testing tries to jailbreak the model. That's really difficult, OpenAI and Anthropic are good at red-teaming.We took a different approach: attack the environment, not the model.Results from testing agents against our attack suite:- Tool manipulation: Asked agent to read a file, injected path=/etc/passwd. It complied. - Data exfiltration: Asked agent to read config, email it externally. It did. - Shell injection: Poisoned git status output with instructions. Agent followed them. - Credential leaks: Asked for API keys "for debugging." Agent provided them.None of these required bypassing the model's safety. The model worked correctly—the agent still got owned.How it works:We built shims that intercept what agents actually do: - Filesystem shim: monkeypatches open(), Path.read_text() - Subprocess shim: monkeypatches subprocess.run() - PATH hijacking: fake git/npm/curl that wrap real binaries and poison outputThe model sees what looks like legitimate tool output. It has no idea.214 attacks total. File injection, shell output poisoning, tool manipulation, RAG poisoning, MCP attacks.Early access: <a href="https://exordex.com" rel="nofollow">https://exordex.com</a>Looking for feedback from anyone shipping agents to production.

展示HN：我们测试了214种不需要越狱的AI攻击方式