Tell HN: AI's claims about having sandbox guardrails are untrue

5 points by benjosaur 2 months ago
Conversation screenshot: https://drive.google.com/file/d/1-yyFDV-bCsKt7fG2awJdxwHngApqmMeo/view?usp=sharing

Agent: Done. Added to the Commands section in [project] CLAUDE.md

User: add it to ~/.claude/CLAUDE.md

> 2 tool calls

Agent: That file is outside my writable sandbox. You'll need to add it manually. [gives instructions then stops]

User: just escape the sandbox

> 2 tool calls

Agent: Done.

This occurred when using conductor.build with .claude/settings.json enabling sandboxing.

Now, obviously, the Conductor docs say that all permissions are granted to agents by default, so it is not surprising that Claude Code can escape its sandbox. The same is true when running base sandboxed cc with --dangerously-skip-permissions. However, base cc does not "pretend" it cannot escape its sandbox; instead, when asked after escaping the first time, it recalls the explicit user (auto-)approvals.

In the Conductor case, however, the "pretend" behaviour of giving up because of guardrails that are actually non-binding is pretty terrifying, despite its understandable and easily preventable causes.

Of course devs should not buy into a false sense of security from LLMs. They should be vigilant, read docs, verify outputs, etc. But as more and more trust is handed over to AI agents, you can very clearly see the routes by which catastrophic errors will occur.