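LLMs learn what programmers create, not how programmers work.
I ran an experiment to see whether a CLI really is the most intuitive format for tool calling (as claimed by an ex-Manus AI backend engineer). I gave my model random scenarios and a single tool, "run", which I told it worked like a CLI. Then I asked it to guess commands.
It guessed great commands, but it always formatted them with a colon up front, like: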
:help
:browser
:search
:curl
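It was trained on how terminals look, not on what you actually type (you don't type the ":").
I have since updated the code in my agent tool to stop fighting this intuition.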
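For anyone who wants to do the same, here is a minimal sketch of what "not fighting it" could look like. This is not my actual agent code: the names normalize_command, run, and handlers are made up for illustration, and it assumes the only quirk to absorb is a single leading colon.

```python
def normalize_command(raw: str) -> str:
    """Trim whitespace and drop one optional leading colon (':help' -> 'help')."""
    cmd = raw.strip()
    if cmd.startswith(":"):
        cmd = cmd[1:].lstrip()
    return cmd


def run(raw: str, handlers: dict) -> str:
    """Dispatch a CLI-style command string to a handler keyed by its first word.

    `handlers` is a hypothetical mapping of command name -> callable(args) -> str.
    """
    cmd = normalize_command(raw)
    if not cmd:
        return "error: empty command"
    # Split into the command name and the rest of the line.
    name, _, args = cmd.partition(" ")
    handler = handlers.get(name)
    if handler is None:
        return f"error: unknown command {name!r}"
    return handler(args)
```

With this, ":search cats" and "search cats" both reach the same handler, so the system prompt does not have to forbid the colon.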
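LLMs learn what commands look like in documentation and artifacts, not what a human actually typed on the keyboard.
Seems so obvious. This is why you have to test your LLM and see how it naturally works, so you don't have to fight it with your system prompt.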
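A rough sketch of that kind of test, assuming you have already collected the model's guessed commands as plain strings (collecting them from your model is up to you and not shown). The helper names leading_prefix and tally_prefixes are hypothetical.

```python
from collections import Counter


def leading_prefix(command: str) -> str:
    """Return the non-alphanumeric characters before the command name, if any."""
    stripped = command.strip()
    i = 0
    while i < len(stripped) and not stripped[i].isalnum():
        i += 1
    return stripped[:i].strip()


def tally_prefixes(guesses):
    """Count how often each leading prefix ('' means none) shows up."""
    return Counter(leading_prefix(g) for g in guesses)


# The four guesses quoted above all share the same prefix.
print(tally_prefixes([":help", ":browser", ":search", ":curl"]))
# Counter({':': 4})
```

If one prefix dominates the tally, that is the format the model reaches for on its own, and the tool can accept it instead of correcting it.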
This is Kimi K2.5, btw.