For AI developers and AI startups
Running several projects that collectively hit $2k+/mo in API costs across OpenAI, Anthropic, & AWS Bedrock. Started doing monthly audits and found I was overspending by about 60%. Biggest wins so far:

- Model routing cut costs 55% with no quality loss on final output
- Prompt compression saved 70% on my most-called endpoint
- Request deduplication on retries eliminated 15% of wasted calls
- Caching semantically similar queries knocked out another 20-30%
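In case it helps anyone, model routing is conceptually just a gate in front of the API call: cheap model by default, frontier model only when the request looks hard. Minimal Python sketch; the model names, keyword list, and length cutoff are illustrative assumptions, not my actual config:

```python
# Route easy requests to a cheap model, escalate hard ones to a frontier model.
# Model ids and difficulty heuristics below are made up for illustration.

CHEAP_MODEL = "gpt-4o-mini"         # assumed cheap tier
FRONTIER_MODEL = "claude-sonnet-4"  # assumed expensive tier

HARD_KEYWORDS = {"prove", "refactor", "architecture", "legal", "diagnose"}

def route_model(prompt: str, max_cheap_words: int = 300) -> str:
    """Pick a model id from rough difficulty signals (length + keywords)."""
    words = prompt.lower().split()
    looks_hard = (
        len(words) > max_cheap_words  # long, context-heavy request
        or any(w.strip(".,?!") in HARD_KEYWORDS for w in words)
    )
    return FRONTIER_MODEL if looks_hard else CHEAP_MODEL
```

In practice you'd tune the gate against a labeled sample of your own traffic rather than hand-picked keywords, but even a crude gate catches the bulk of the easy volume.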
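The prompt compression win was mostly about not resending the entire conversation on every call. A crude sketch of the idea, using word counts as a stand-in for real token counting (the budget number is arbitrary):

```python
# Trim conversation history to the most recent turns that fit a budget,
# and collapse redundant whitespace while we're at it. Word count is a
# rough proxy for tokens here; a real version would use a tokenizer.

def compress_history(turns: list[str], budget_words: int = 500) -> list[str]:
    """Keep the newest turns that fit the word budget, oldest dropped first."""
    kept, used = [], 0
    for turn in reversed(turns):          # walk newest -> oldest
        n = len(turn.split())
        if used + n > budget_words:
            break                          # budget exhausted, drop the rest
        kept.append(" ".join(turn.split()))  # also squashes repeated whitespace
        used += n
    return list(reversed(kept))            # restore chronological order
```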
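Request deduplication can be as simple as hashing the payload and short-circuiting repeats within a TTL window, which is exactly what kills retry storms. Hypothetical sketch (the `Deduper` class and 30-second window are mine, not from any library):

```python
import hashlib
import json
import time

class Deduper:
    """Collapse identical requests fired within a short window (e.g. client retries)."""

    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self._seen: dict[str, tuple[float, object]] = {}

    def _key(self, model: str, payload: dict) -> str:
        # Canonical JSON so key order in the payload doesn't matter.
        blob = json.dumps({"model": model, "payload": payload}, sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def call(self, model: str, payload: dict, send):
        """Invoke send(model, payload) unless an identical call is still fresh."""
        key = self._key(model, payload)
        now = time.monotonic()
        hit = self._seen.get(key)
        if hit and now - hit[0] < self.ttl:
            return hit[1]                  # reuse the cached response
        result = send(model, payload)
        self._seen[key] = (now, result)
        return result
```

Same idea works one level up with an idempotency key on the HTTP layer if your retry logic lives in a gateway.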
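For semantic caching, the core loop is: embed the query, compare against cached embeddings, and reuse the stored answer when similarity clears a threshold. Toy sketch below using a bag-of-words vector instead of a real embedding model, so the threshold is far cruder than what you'd run in production:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real setup would call an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Linear-scan similarity cache; swap in a vector index for real volume."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []

    def get(self, query: str):
        qv = embed(query)
        best = max(self.entries, key=lambda e: cosine(qv, e[0]), default=None)
        if best and cosine(qv, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query: str, answer: str):
        self.entries.append((embed(query), answer))
```

The threshold is the knob that decides your savings-vs-staleness tradeoff; too low and you serve wrong answers, too high and the cache never hits.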
But I feel like I'm still missing things, especially on the infrastructure side (GPU instance sizing, spot vs. on-demand, etc.). So what tools or approaches are others using? Is anyone doing this systematically, or is everyone just eyeballing their dashboards? Let me know!