Ask HN: How do you control costs and set limits on LLM calls?

2 points by 8dazo, 6 days ago
I’ve been running into an issue with LLM/agent systems where unexpected loops or repeated calls can quickly drive up costs.

Most tools I’ve seen focus on observability (logs, traces, dashboards), but not actual enforcement at runtime.

Curious how people here are handling this in production:

- Are you enforcing hard limits (budget, rate, etc.) or just monitoring?
- Do you handle this at the app level or via some middleware/proxy?
- Have you built something in-house for this?

Feels like an unsolved problem, especially with agents.

Would love to hear how others are dealing with it.
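
To make "enforcement at runtime" concrete, here is a minimal sketch of an app-level guard that refuses a call instead of just logging it. It is only an illustration, not taken from any existing tool: `BudgetGuard`, `BudgetExceeded`, `guarded_call`, and the limit values are hypothetical names, and `llm_call` stands in for whatever client you actually use.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when a call would break a hard limit."""


class BudgetGuard:
    """Hypothetical app-level guard: hard caps on spend and call rate."""

    def __init__(self, max_cost_usd, max_calls_per_minute, clock=time.monotonic):
        self.max_cost_usd = max_cost_usd
        self.max_calls_per_minute = max_calls_per_minute
        self.clock = clock  # injectable so the guard can be tested
        self.spent_usd = 0.0
        self.call_times = []  # timestamps of calls in the last 60 seconds

    def check(self, estimated_cost_usd):
        """Call before each LLM request; raises instead of only logging."""
        now = self.clock()
        # Keep only the calls inside the sliding 60-second window.
        self.call_times = [t for t in self.call_times if now - t < 60.0]
        if len(self.call_times) >= self.max_calls_per_minute:
            raise BudgetExceeded("rate limit reached")
        if self.spent_usd + estimated_cost_usd > self.max_cost_usd:
            raise BudgetExceeded("budget limit reached")
        self.call_times.append(now)

    def record(self, actual_cost_usd):
        """Call after the request with the cost that was actually incurred."""
        self.spent_usd += actual_cost_usd


def guarded_call(guard, llm_call, estimated_cost_usd):
    """Run llm_call() only if the guard allows it.

    llm_call is any zero-argument function returning (result, cost_usd);
    it is a placeholder for a real client call.
    """
    guard.check(estimated_cost_usd)
    result, cost_usd = llm_call()
    guard.record(cost_usd)
    return result
```

With something like this, a runaway agent loop hits `BudgetExceeded` after a bounded number of calls rather than running until someone notices the dashboard. The same check could presumably live in a middleware/proxy instead, which is the trade-off the second question is asking about.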