


Personally, I've tested ChatGPT, Claude, Deepseek and Gemini. Apart from Gemini, these LLMs are so sycophantic that they're unusable for anything beyond basic questions and coding (Claude).

Gemini feels a bit sycophantic too, but based on my testing, it can be argued that it's being diplomatic while staying objective. At least, that was the case in the small tests I've run (Gemini Pro 2.5). And that's a lot better than the other 3.

What are your experiences? I'm getting a bit sick of this behavior. I haven't had the money and time to test Grok and others.

At least, no LLM would budge when I insisted that 2 + 2 = 5. But give them actually ambiguous stuff and they will bend the knee to even the most silly/obvious/transparent challenges.

Ask HN: Which LLM is competent, relatively objective, and not a sycophant?