Tell HN: Google has increased the latency of existing finetuned models by 5x
As of five days ago, the latency of our finetuned 2.5 Flash models has suddenly jumped by 5x. For those less familiar, such finetuned models are often used to get close to the performance of a big model at one specific task with much less latency and cost. This means they're usually used for realtime, production use cases that see a lot of use and where you want to respond to the user quickly. Otherwise, finetuning generally isn't worth it. Many spend a few thousand dollars (at a minimum) on finetuning a model for one such task.
Five days ago, Google released Nano Banana Pro (Gemini 3.0 Image Preview) to the world. Since that same day, the latency of our existing finetuned models has quintupled. We've talked with other startups that also use finetuned 2.5 Flash models, and they're seeing exactly the same thing, even those in different regions. Obviously this has a big impact on all of our products.
From Google's side, nothing but silence, and this is paid support we're talking about. The only reply to the initial support ticket was a request for basic information that had already been provided in that ticket or is trivially obvious. Since then, it's been more than 48 hours of nothing.
Of course the timing could be a pure coincidence - though we've never seen any such latency instability before - but we can all see what's most likely here: Nano Banana Pro and Gemini 3 Preview are consuming a huge amount of compute, and Google is simply sacrificing finetuned model output for them. It's impossible to take them seriously for business use after this; who knows what they'll do next time. For all their faults, OpenAI have been a bastion of stability, despite being the most B2C-focused of all the frontier model providers. Google with Vertex claims to be all about enterprise and then breaks its business customers' products to get consumers their Ghibli images 1% faster. They've surely gotten plenty of tickets about this, and given Google's engineering, they must have automated monitoring that catches such a huge latency increase immediately. Temporary outages are understandable and happen everywhere, see AWS and Cloudflare recently, but 5+ days of 5x latency - if they even fix it - is effectively a 5+ day outage of the service.
I'm posting this mostly as a warning to other startups here not to rely on Google Vertex for user-facing model needs going forward.