Scaling tool-orchestration data will produce a different kind of intelligence and different LLMs

Author: arkariarn, 8 days ago (original post)
Tl;dr: We are only now going to start scaling long-term external orchestration; everything beforehand was mostly internal problem-solving training, with the occasional tool call. We don't actually know yet what scaling orchestration training produces. It might produce much better tool-using assistants that remain fundamentally reactive to human instructions. Or it might produce something with more emergent autonomy. My gut tells me the second. For the first time, I foresee in the near future (as soon as 2027-2028) a potential for a misaligned takeoff.

A year ago, a friend of mine who studied social science asked my opinion about AI 2027 and the prospect of a misaligned AI takeover. I laughed and said it was quite impossible given how the technology actually worked. An LLM works too stepwise, I told him. There's a prompt, the model predicts the next tokens, and then it "dies." There's no continuity between prompts: it can store some text in a database, but there's no persistent reasoning. It felt obviously safe.

With the agentic developments of the past few months, I'm starting to doubt that earlier understanding.

The first generation of LLMs, up through GPT-4, were essentially sophisticated text autocompleters. They were trained on internet data from web crawls and fine-tuned with RLHF to give them a chatbot flavor. They felt harmless, and they fit the description I gave my friend perfectly. Their capabilities were entirely bounded by the context window and the prompt-answer time window. Prompt in, completion out, done.

The second generation added reasoning capabilities. These models stopped feeling like pure autocompleters: they could search within their stored knowledge, chain thoughts together, and work through problems. The training data changed too: successful reasoning traces got folded back into training. But crucially, they were still bounded by the same constraints. They got more time to think and process, but at the end of the answer they were still mostly gone. The capability was still internal to the model.

Now enter the third generation of agentic LLMs, which really took off as tools like Claude Code became increasingly capable. These don't feel like autocompleters. They don't even feel like reasoners. They're starting to feel like orchestrators. They aren't limited to their internals; they act as a connected system, coordinating tools and external resources to achieve goals.

What scares me most is the new type of training data we're now generating and collecting: successful long-term orchestration traces. These will allow us to scale an orchestration kind of intelligence. This kind of intelligence is not bound to its internals; it becomes an external, symbiotic type of intelligence. We are training models to externalize almost everything, and optimizing them to orchestrate all those externals over long horizons. This feels like optimizing a symbiotic system, very different from the simple internally optimized LLMs of today. The equation of what the LLM is processing really does seem to be changing: the LLM becomes an orchestration engine for externals, which together make up the whole system. We know how reasoning autocompletion scales; we don't know how orchestration engines scale. Different and new emergent capabilities might appear. We are, for the first time, scaling the prefrontal cortex of LLMs.

For the first time, I can genuinely foresee a path to a misaligned takeoff, to say nothing of all the other harm AI can do in the hands of bad actors. It makes me question whether labs should continue down this path. Is it not far safer to keep LLM problem solving mostly internal to its own parameters? Of all the AI companies, shouldn't Anthropic have been less loud with systems like Claude Code? They have been accelerating the most in this new paradigm of what is about to be scaled.
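The contrast the post draws can be made concrete. Below is a minimal sketch, not any real product's implementation: `fake_model`, `read_file`, and the `CALL`/`FINAL` action format are all hypothetical stand-ins. The generation-1/2 pattern is a single stateless call where nothing survives; the generation-3 pattern is a loop that executes tool calls and feeds the results back into the context until the model signals it is done.

```python
# Sketch contrasting one-shot completion with an agentic orchestration loop.
# fake_model is a hypothetical stand-in for a real LLM API call; it "decides"
# purely from what is already in its context.

def fake_model(context: str) -> str:
    if "RESULT:" not in context:
        return "CALL read_file notes.txt"   # model requests a tool call
    return "FINAL: summarized notes"        # model has what it needs; finish

# --- Generations 1/2: stateless. Prompt in, completion out, done. ---
def one_shot(prompt: str) -> str:
    return fake_model(prompt)               # no state survives this call

# --- Generation 3: the loop persists state and coordinates externals. ---
TOOLS = {"read_file": lambda arg: f"contents of {arg}"}  # toy tool registry

def agent_loop(goal: str, max_steps: int = 10) -> str:
    context = goal
    for _ in range(max_steps):
        action = fake_model(context)
        if action.startswith("FINAL:"):
            return action.removeprefix("FINAL:").strip()
        _, tool, arg = action.split(" ", 2)  # parse "CALL read_file notes.txt"
        result = TOOLS[tool](arg)
        context += f"\nRESULT: {result}"     # loop, not model, carries memory
    raise RuntimeError("step budget exhausted")
```

The growing `context` of (goal, action, result, ...) steps is exactly the kind of long-term orchestration trace the post worries about: successful transcripts of these loops are what would get folded back into training.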