Ask HN: Is "make me laugh" the next Turing Test?
It seems that in the last few years the Turing Test has rapidly lost its value as an experiment for testing artificial intelligence. With the right guardrails, the latest frontier LLMs are more than capable of simulating perfectly believable human conversation partners.
Instead, perhaps the next best practical test for separating humans from AIs is essentially, "can it make you laugh?"
The idea here is that LLMs are largely trained to produce statistically likely sentences based on a massive training corpus. While this allows for incredibly impressive query responses, they are absolutely atrocious at statistically unlikely outcomes, that is, humor.
Humor is often defined by incongruity, which would necessarily make it statistically unlikely: the very opposite of what LLMs are good at.
Thoughts?