


I've been running into an issue with LLM/agent systems where unexpected loops or repeated calls can quickly drive up costs.

Most tools I've seen focus on observability (logs, traces, dashboards), but not actual enforcement at runtime.

Curious how people here are handling this in production:

- Are you enforcing hard limits (budget, rate, etc.) or just monitoring?
- Do you handle this at the app level or via some middleware/proxy?
- Have you built something in-house for this?

Feels like an unsolved problem, especially with agents.

Would love to hear how others are dealing with it.
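
To make "enforcement at runtime" concrete, here is a rough app-level sketch of the kind of guard I mean. It is not taken from any existing tool: `BudgetGuard`, `BudgetExceeded`, and `guarded_call` are hypothetical names, the per-call cost is an estimate the caller has to supply, and the loop detection is just a count of identical prompts. The same check could presumably sit in a middleware/proxy instead.

```python
import hashlib
from collections import Counter


class BudgetExceeded(RuntimeError):
    """Raised when a request would break one of the hard limits."""


class BudgetGuard:
    """Hypothetical per-run guard: caps spend, call count, and identical repeats."""

    def __init__(self, max_cost_usd, max_calls, max_repeats):
        self.max_cost_usd = max_cost_usd
        self.max_calls = max_calls
        self.max_repeats = max_repeats
        self.spent_usd = 0.0
        self.calls = 0
        self._seen = Counter()  # prompt hash -> times already sent

    def check(self, prompt, estimated_cost_usd):
        """Run before each request; raises instead of letting the call go out."""
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if self.calls + 1 > self.max_calls:
            raise BudgetExceeded("call limit reached")
        if self.spent_usd + estimated_cost_usd > self.max_cost_usd:
            raise BudgetExceeded("cost budget reached")
        if self._seen[key] + 1 > self.max_repeats:
            raise BudgetExceeded("identical prompt repeated (possible loop)")
        # Only record the call once every limit has been checked.
        self.calls += 1
        self.spent_usd += estimated_cost_usd
        self._seen[key] += 1


def guarded_call(guard, llm_call, prompt, estimated_cost_usd):
    """Wrap any callable that sends a prompt; `llm_call` is an assumed stand-in."""
    guard.check(prompt, estimated_cost_usd)
    return llm_call(prompt)
```

The point of raising before the request is sent is that a runaway agent loop stops at the limit rather than showing up on a dashboard afterwards. A real version would need actual token pricing and shared state across processes, which this sketch leaves out.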

Ask HN: How are you controlling costs and enforcing limits on LLM calls?