We don't need continual learning to reach AGI. Here's what the top labs are actually doing.
Many people think we won't reach AGI, or even ASI, if LLMs don't have something called "continual learning". Basically, continual learning is the ability of an AI to learn on the job, update its neural weights in real time, and get smarter without forgetting everything else (catastrophic forgetting). This is something we humans do every day, almost effortlessly.
What's interesting is that if you look at what the top labs are doing, they've stopped trying to solve the underlying math of real-time weight updates. Instead, they're simply brute-forcing around it. That's exactly why, over the past three months or so, there has been a step-function jump in how good the models have gotten.
Long story short, the gist is that if you combine:
- very long context windows
- reliable summarization
- structured external documentation

you can approximate much of what people mean by continual learning.
How it works: the model does a task and absorbs a massive amount of situational detail. Then, before it "hands off" to the next instance of itself, it writes down two things: short "memories" (always carried forward in the prompt/context) and long-form documentation (stored externally, retrieved only when needed). The next run starts from these notes, so it doesn't have to start from scratch.
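The handoff pattern above could be sketched roughly like this. Everything here is hypothetical: the function names, the file-based store, and the markdown format are illustrative assumptions, not details from the post.

```python
import json
from pathlib import Path

# Hypothetical external store for long-form documentation.
NOTES_DIR = Path("agent_notes")

def hand_off(short_memory: str, long_form_docs: dict[str, str]) -> str:
    """Persist long-form docs externally; return the short memory
    that gets carried forward in the next instance's prompt."""
    NOTES_DIR.mkdir(exist_ok=True)
    for name, body in long_form_docs.items():
        (NOTES_DIR / f"{name}.md").write_text(body)
    return short_memory

def start_next_run(short_memory: str, needed_docs: list[str]) -> str:
    """Build the next instance's starting context: the always-present
    short memory plus only the docs this particular task needs."""
    retrieved = [
        (NOTES_DIR / f"{name}.md").read_text()
        for name in needed_docs
        if (NOTES_DIR / f"{name}.md").exists()
    ]
    return "\n\n".join([short_memory, *retrieved])
```

The key design point is the asymmetry: short memories are unconditionally prepended to every run, while long-form docs only cost context tokens when a task actually retrieves them.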
Through a clever reinforcement learning (RL) loop, they train this behavior directly, without any exotic new theory.
They treat memory-writing as an RL objective: after a run, have the model write memories/docs, then spin up new instances on the same, similar, and dissimilar tasks while feeding those memories back in. This is done by scoring performance across the whole sequence and applying an explicit penalty on memory length, so you don't end up with infinite "notes" that eventually blow past the context window.
Over many iterations, you reward models that (a) write high-signal memories, (b) retrieve the right docs at the right time, and (c) edit/compress stale notes instead of mindlessly accumulating them.
This is pretty wild. Because when you combine it with the current release cadence of frontier labs, where each new model is trained and shipped after major post-training/scaling improvements, even if your deployed instance never updates its weights in real time, it can still "get smarter" when the next version ships, and it can inherit all the accumulated memories/docs from its predecessor.
This is a new force multiplier, another scaling paradigm, and likely what the top labs are doing right now (source: TBA).
Ignoring any black-swan event (unknown unknowns), you get a plausible 2026 trajectory: we're going to see more and more improvements on an accelerating timeline. The top labs are, in effect, using continual learning (a very good approximation of it), and they are directly training that approximation, so it rapidly gets better and better.
Don't believe me? Look at what both OpenAI (https://openai.com/index/introducing-openai-frontier/) and Anthropic (https://resources.anthropic.com/2026-agentic-coding-trends-report) have named as their core focus areas. It's exactly why governments and corporations are bullish on this; there is no wall...