HackerNews (Chinese edition)

I work at a mid-sized startup dealing with latency issues in customer-facing flows that use LLMs. Using OSS-120B seems preferable to 5-mini or Anthropic models in many cases when we need speed, intelligence, and cost control. Is there some catch here beyond needing to acquire higher rate limits?

Why doesn't everyone use Cerebras?