
It seems that in the last few years the Turing Test has rapidly fallen off as a useful experiment for testing artificial intelligence. With the right guardrails, the latest frontier LLMs are more than capable of simulating perfectly believable human conversation partners.

Instead, perhaps the next best practical test for separating humans from AIs is essentially, "can it make you laugh?"

The idea here is that LLMs are largely trained to produce statistically likely sentences based on a massive training corpus, and while this allows for incredibly impressive query responses, they are absolutely atrocious at statistically unlikely outcomes, that is, humor.

Humor is often defined by being mentally incongruous, which would necessarily make it statistically unlikely - the very opposite of what LLMs are good at.

Thoughts?

Ask HN: Is "make me laugh" the next Turing Test?