Ask HN: Are older LLM models getting dumber?

3 points by hmate9 3 months ago
I'm curious whether others have observed this, or whether it's just perception or confirmation bias on my part. I've seen discussion on X suggesting that older models (e.g., Claude 4.5) appear to degrade over time, possibly due to increased quantization, throttling, or other inference-cost optimizations after newer models are released. Is there any concrete evidence of this happening, or technical analysis that supports or disproves it? Or are we mostly seeing subjective evaluation without controlled benchmarks?