HackerNews Chinese Edition

We're building AI agents that take real actions: refunds, database writes, API calls. Prompt instructions like "never do X" don't hold up. LLMs ignore them when the context gets long or users push hard.

Curious how others are handling this:
- Hard-coded checks before every action?
- Some middleware layer?
- Just hoping for the best?

We built a control layer for this, with different methods for structured data, unstructured outputs, and guardrails (https://limits.dev). Genuinely want to learn how others approach it.
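For comparison, the first two options (hard-coded checks and a middleware layer) can be combined: every action the model proposes passes through ordinary code that runs policy checks before anything executes. Below is a minimal Python sketch, assuming the agent framework hands you each proposed tool call as a name plus a dict of keyword arguments. `ToolCall`, `PolicyViolation`, `ActionGate`, `max_refund`, `no_deletes`, and the `refund`/`db_write` tools are hypothetical names for illustration only. This is not the limits.dev API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ToolCall:
    """An action proposed by the model (hypothetical shape: tool name + keyword args)."""
    name: str
    args: Dict[str, Any]


class PolicyViolation(Exception):
    """Raised when a proposed action is refused; the tool is never called."""


# A policy is ordinary code: it returns a reason to block the call, or None to allow it.
Policy = Callable[[ToolCall], Optional[str]]


def max_refund(limit: float) -> Policy:
    """Hypothetical policy: block refunds above a fixed amount."""
    def check(call: ToolCall) -> Optional[str]:
        if call.name == "refund" and float(call.args.get("amount", 0)) > limit:
            return f"refund {call.args['amount']} exceeds limit {limit}"
        return None
    return check


def no_deletes(call: ToolCall) -> Optional[str]:
    """Hypothetical policy: refuse DELETE statements. Deliberately naive string check,
    shown only to illustrate a second policy; do not rely on it for real SQL."""
    sql = str(call.args.get("sql", ""))
    if call.name == "db_write" and sql.lstrip().upper().startswith("DELETE"):
        return "DELETE statements are not allowed"
    return None


class ActionGate:
    """Middleware between the model and the real tools: deny unknown tools,
    run every policy, and only then execute the call."""

    def __init__(self, tools: Dict[str, Callable[..., Any]], policies: List[Policy]):
        self.tools = tools
        self.policies = policies

    def execute(self, call: ToolCall) -> Any:
        if call.name not in self.tools:  # deny by default
            raise PolicyViolation(f"unknown tool {call.name!r}")
        for policy in self.policies:
            reason = policy(call)
            if reason is not None:
                raise PolicyViolation(reason)
        return self.tools[call.name](**call.args)


if __name__ == "__main__":
    gate = ActionGate(
        tools={"refund": lambda order_id, amount: f"refunded {amount} on {order_id}"},
        policies=[max_refund(100.0), no_deletes],
    )
    print(gate.execute(ToolCall("refund", {"order_id": "A1", "amount": 40.0})))
    # prints: refunded 40.0 on A1
    try:
        gate.execute(ToolCall("refund", {"order_id": "A2", "amount": 500.0}))
    except PolicyViolation as err:
        print("blocked:", err)  # prints: blocked: refund 500.0 exceeds limit 100.0
```

The point of this shape is that the checks live outside the model and default to deny: a long context or a persistent user can change what the model proposes, but not what the gate lets through.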

Ask HN: How do you control AI agents that take real actions?