For AI developers and AI startups
Running several projects that collectively hit $2k+/mo in API costs across OpenAI, Anthropic, & AWS Bedrock. Started doing monthly audits and found I was overspending by about 60%. Biggest wins so far:

- Model routing cut costs 55% with no quality loss on final output
- Prompt compression saved 70% on my most-called endpoint
- Request deduplication on retries eliminated 15% of wasted calls
- Caching semantically similar queries knocked out another 20-30%
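In case it helps anyone, model routing is conceptually just a gate in front of the API call: cheap model by default, frontier model only when the request looks hard. Minimal Python sketch; the model names, keyword list, and length cutoff are illustrative assumptions, not my actual config:

```python
# Route easy requests to a cheap model, escalate hard ones to a frontier model.
# Model ids and difficulty heuristics below are made up for illustration.

CHEAP_MODEL = "gpt-4o-mini"         # assumed cheap tier
FRONTIER_MODEL = "claude-sonnet-4"  # assumed expensive tier

HARD_KEYWORDS = {"prove", "refactor", "architecture", "legal", "diagnose"}

def route_model(prompt: str, max_cheap_words: int = 300) -> str:
    """Pick a model id from rough difficulty signals (length + keywords)."""
    words = prompt.lower().split()
    looks_hard = (
        len(words) > max_cheap_words  # long, context-heavy request
        or any(w.strip(".,?!") in HARD_KEYWORDS for w in words)
    )
    return FRONTIER_MODEL if looks_hard else CHEAP_MODEL
```

In practice you'd tune the gate against a labeled sample of your own traffic rather than hand-picked keywords, but even a crude gate catches the bulk of the easy volume.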
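The prompt compression win was mostly about not resending the entire conversation on every call. A crude sketch of the idea, using word counts as a stand-in for real token counting (the budget number is arbitrary):

```python
# Trim conversation history to the most recent turns that fit a budget,
# and collapse redundant whitespace while we're at it. Word count is a
# rough proxy for tokens here; a real version would use a tokenizer.

def compress_history(turns: list[str], budget_words: int = 500) -> list[str]:
    """Keep the newest turns that fit the word budget, oldest dropped first."""
    kept, used = [], 0
    for turn in reversed(turns):          # walk newest -> oldest
        n = len(turn.split())
        if used + n > budget_words:
            break                          # budget exhausted, drop the rest
        kept.append(" ".join(turn.split()))  # also squashes repeated whitespace
        used += n
    return list(reversed(kept))            # restore chronological order
```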
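Request deduplication can be as simple as hashing the payload and short-circuiting repeats within a TTL window, which is exactly what kills retry storms. Hypothetical sketch (the `Deduper` class and 30-second window are mine, not from any library):

```python
import hashlib
import json
import time

class Deduper:
    """Collapse identical requests fired within a short window (e.g. client retries)."""

    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self._seen: dict[str, tuple[float, object]] = {}

    def _key(self, model: str, payload: dict) -> str:
        # Canonical JSON so key order in the payload doesn't matter.
        blob = json.dumps({"model": model, "payload": payload}, sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def call(self, model: str, payload: dict, send):
        """Invoke send(model, payload) unless an identical call is still fresh."""
        key = self._key(model, payload)
        now = time.monotonic()
        hit = self._seen.get(key)
        if hit and now - hit[0] < self.ttl:
            return hit[1]                  # reuse the cached response
        result = send(model, payload)
        self._seen[key] = (now, result)
        return result
```

Same idea works one level up with an idempotency key on the HTTP layer if your retry logic lives in a gateway.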
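For semantic caching, the core loop is: embed the query, compare against cached embeddings, and reuse the stored answer when similarity clears a threshold. Toy sketch below using a bag-of-words vector instead of a real embedding model, so the threshold is far cruder than what you'd run in production:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real setup would call an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Linear-scan similarity cache; swap in a vector index for real volume."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []

    def get(self, query: str):
        qv = embed(query)
        best = max(self.entries, key=lambda e: cosine(qv, e[0]), default=None)
        if best and cosine(qv, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query: str, answer: str):
        self.entries.append((embed(query), answer))
```

The threshold is the knob that decides your savings-vs-staleness tradeoff; too low and you serve wrong answers, too high and the cache never hits.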
But I feel like I'm still missing things, especially on the infrastructure side (GPU instance sizing, spot vs. on-demand, etc.). So what tools or approaches are others using? Is anyone doing this systematically, or is everyone just eyeballing their dashboards? Let me know!