Ask HN: "Claws" and human-in-the-loop security
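Hi all,
I built Sandclaw (https://github.com/qix/sandclaw) to test the idea of keeping a human in the loop on every write path the agent might take. The core agent can read anything, but apart from the connection to the LLM provider, that information is securely firewalled off.
Over time I can loosen the firewall to allow safer actions (e.g. adding a task to my todo list is always safe), and perhaps build smarter checks to allow more autonomy. But the policy is `DEFAULT DENY` for sending or deleting emails, browsing untrusted websites, and running shell commands.
Is there any project that actually does this? I've been searching, but all the options seem to focus on letting the agent decide when to ask, which feels... problematic.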
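To make the policy concrete, here is a minimal sketch of a default-deny write gate. It is an illustration only and not Sandclaw's actual code: `Action`, `WriteGate`, `auto_allow`, `ask_human`, and the action kinds such as `"todo.add"` are all hypothetical names I made up for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"


@dataclass(frozen=True)
class Action:
    """A write the agent wants to perform (hypothetical shape)."""
    kind: str      # e.g. "todo.add", "email.send", "shell.run"
    payload: str


def deny_all(action: Action) -> bool:
    """Fallback reviewer: with no human available, nothing is approved."""
    return False


class WriteGate:
    """Sits outside the agent; every write passes through decide()."""

    def __init__(
        self,
        auto_allow: Iterable[str] = (),
        ask_human: Callable[[Action], bool] = deny_all,
    ) -> None:
        # Action kinds judged safe enough to run without review.
        self.auto_allow = frozenset(auto_allow)
        # Callback that prompts a person; True means explicit approval.
        self.ask_human = ask_human

    def decide(self, action: Action) -> Decision:
        if action.kind in self.auto_allow:
            return Decision.ALLOW
        # Default deny: anything not allowlisted needs a human "yes".
        if self.ask_human(action):
            return Decision.ALLOW
        return Decision.DENY


gate = WriteGate(auto_allow={"todo.add"})
todo_result = gate.decide(Action("todo.add", "buy milk"))
email_result = gate.decide(Action("email.send", "to: boss@example.com"))
```

The point of the shape is that the gate, not the agent, decides when a human gets asked. Loosening the firewall over time would mean growing `auto_allow` or swapping in a smarter `ask_human`, while everything unlisted stays denied.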