Oracles and recursive self-improvement
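I've been thinking about recursive self-improvement, and especially about how likely it is to become important soon. "Soon" means with current LLMs, which might just give up, delete the test set, or otherwise detach from reality. Call the probability of that happening P. You can estimate P by looking at research tasks and how much help they need from humans to stay on task.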
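A minimal sketch of how P might be estimated from a log of research-task runs. Everything here is an assumption for illustration: the `TaskRun` record, the `needed_human_rescue` label, and the choice of a Laplace-smoothed fraction are hypothetical, not an established metric.

```python
from dataclasses import dataclass

@dataclass
class TaskRun:
    name: str
    # Hypothetical label: True if a human had to step in because the model
    # gave up, deleted the test set, or otherwise drifted off task.
    needed_human_rescue: bool

def estimate_p(runs: list[TaskRun]) -> float:
    """Laplace-smoothed fraction of runs that detached without human help."""
    detached = sum(1 for r in runs if r.needed_human_rescue)
    # Add-one smoothing keeps the estimate away from exactly 0 or 1 on small logs.
    return (detached + 1) / (len(runs) + 2)

# Toy log (made-up data, for illustration only).
runs = [
    TaskRun("task-a", False),
    TaskRun("task-b", True),
    TaskRun("task-c", False),
    TaskRun("task-d", False),
]
p_hat = estimate_p(runs)
print(f"estimated P = {p_hat:.3f}")
```

A binary label throws away "how much" help was needed; a graded score per run would be closer to the idea, but needs a scoring scheme that is not specified here.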
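Gödel machines (which prove that the next step is an improvement) try to use maths as an oracle, which relies on the mathematical foundations being true. Other oracles I've theorised could help are oracles from the future that can say whether a change detaches the system from reality or not: a sort of go/no-go signal for potentially detaching changes.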
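Are there any other classes of oracles?
There might be some complex quantum computation that I'm not thinking of, due to classical computer bias.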