Ask HN: What are the pros and cons of switching to self-hosted inference?

1 point by codenski 3 days ago
Management is pushing us toward running open-weight models in-house after some compliance conversations around data privacy. Before we commit, we'd love to hear from people who've made this transition.

Specifically, we're curious about:

1. Did it actually end up cheaper than paying for API access at your request volume?
2. Did you run into any issues managing performance, specifically latency, throughput, and hardware utilization?
3. How do you handle cost visibility and attribution across teams/workloads?

Also super curious about other aspects: what worked, what didn't, and what do you wish you'd known before switching?

Thanks in advance! PS: We're not looking for an absolute truth; we just want to be prepared if the transition happens.