Show HN: I got tired of my LLM's bullshit. So I fixed it.
As a handsome local AI enjoyer™ you’ve probably noticed one of the big flaws with LLMs:

They lie. Confidently. *ALL THE TIME.*

I’m autistic and extremely allergic to vibes-based tooling, so … I built a thing. Maybe it’s useful to you too.

The thing: llama-conductor

llama-conductor is a router that sits between your *frontend* (e.g. OWUI) & *backend* (llama.cpp + llama-swap). Local-first, but it should talk to anything OpenAI-compatible if you point it there (note: experimental, so YMMV).
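If “router” sounds hand-wavy: the skeleton is just an OpenAI-compatible passthrough that gets to inspect traffic on the way through. A toy sketch, not LC’s actual code (the upstream URL/port and the FastAPI/httpx choice are assumptions):

```python
# Toy OpenAI-compatible passthrough. Illustrative only, not LC's code.
# Streaming is omitted; upstream URL and port are assumptions.
import httpx
from fastapi import FastAPI, Request

app = FastAPI()
UPSTREAM = "http://127.0.0.1:8080"  # e.g. llama.cpp / llama-swap

@app.post("/v1/chat/completions")
async def chat(request: Request):
    body = await request.json()
    # A router like LC gets to rewrite `body` here: KB grounding,
    # context caps, memory injection, etc.
    async with httpx.AsyncClient(timeout=None) as client:
        resp = await client.post(f"{UPSTREAM}/v1/chat/completions", json=body)
    return resp.json()
```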
LC is a glass box that makes the stack behave like a *deterministic system*, instead of a drunk telling a story about the fish that got away.

TL;DR: “In God we trust. All others must bring data.”

Three examples:

1. KB mechanics (markdown, JSON, checksums)

You keep “knowledge” as dumb folders on disk. Drop docs (`.txt`, `.md`, `.pdf`) into them. Then:

* `>>attach <kb>` — attaches a KB folder
* `>>summ new` — generates `SUMM_*.md` files with *SHA-256 provenance* baked in + moves the originals to a sub-folder
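The provenance bit is deliberately boring. A rough sketch of the mechanics (illustrative only; the header format, folder layout, and the summarize() stub are assumptions, not LC’s actual code):

```python
# Sketch of a ">>summ new"-style pass: summarize each doc, stamp the
# summary with a SHA-256 of the source bytes, move the source aside.
import hashlib
import shutil
from pathlib import Path

def summarize(text: str) -> str:
    return text[:500]  # stand-in for the actual model call

def summ_new(kb: Path) -> None:
    done = kb / "processed"
    done.mkdir(exist_ok=True)
    for doc in list(kb.glob("*.txt")) + list(kb.glob("*.md")):
        if doc.name.startswith("SUMM_"):
            continue  # don't re-summarize summaries
        data = doc.read_bytes()
        sha = hashlib.sha256(data).hexdigest()  # provenance checksum
        (kb / f"SUMM_{doc.stem}.md").write_text(
            f"<!-- source: {doc.name} | sha256: {sha} -->\n\n"
            + summarize(data.decode("utf-8", errors="replace")),
            encoding="utf-8",
        )
        shutil.move(str(doc), str(done / doc.name))
```

Every summary carries a checksum tying it back to the exact bytes it was made from, so “where did this claim come from” is always answerable.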
Now, when you ask something like:

> “yo, what did the Commodore C64 retail for in 1982?”

…it answers from the attached KBs *only*.

If the fact isn’t there, it tells you - explicitly - instead of winging it. E.g.:

> The provided facts state the Commodore 64 launched at $595 and was reduced to $250, but do not specify a 1982 retail price. The Amiga’s pricing and timeline are also not detailed in the given facts.
>
> Missing information includes the exact 1982 retail price for Commodore’s product line and which specific model(s) were sold then.
>
> Confidence: medium | Source: Mixed

No vibes. Just: here’s what’s in your docs, here’s what’s missing, don’t GIGO yourself into stupid.

Then, if you’re happy with the summary, you can:

* `>>move to vault`
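The Vault is Qdrant under the hood, so moving a summary there boils down to “embed + upsert, with the checksum riding along in the payload”. A sketch under assumptions (in-memory client, toy embed(), made-up collection and payload names):

```python
# Sketch of a move-to-vault step. Illustrative only, not LC's code.
import hashlib
from pathlib import Path

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

def embed(text: str) -> list[float]:
    # Toy stand-in: a real setup would call an embedding model.
    return [b / 255.0 for b in hashlib.sha256(text.encode()).digest()]

client = QdrantClient(":memory:")  # real setup: point at a Qdrant server
client.recreate_collection(
    collection_name="vault",
    vectors_config=VectorParams(size=32, distance=Distance.COSINE),
)

summ = Path("kb/SUMM_c64.md")
text = summ.read_text(encoding="utf-8")
client.upsert(
    collection_name="vault",
    points=[PointStruct(
        id=1,
        vector=embed(text),
        payload={
            "source": summ.name,
            "sha256": hashlib.sha256(text.encode()).hexdigest(),
            "text": text,
        },
    )],
)
```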
2. Mentats: proof-or-refusal mode (Vault-only)

Mentats is the “deep think” pipeline against your *curated* sources.

* no chat history
* no filesystem KBs
* no Vodka
* *Vault-only grounding* (Qdrant)

It runs a triple-pass (thinker → critic → thinker). It’s slow on purpose. You can audit it.
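Conceptually the shape is something like this (sketch only; the prompts and the llm() stub are placeholders, not LC’s internals):

```python
# Conceptual triple-pass: thinker -> critic -> thinker. Illustrative only.
def llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your backend")

def mentats(question: str, facts: list[str]) -> str:
    if not facts:  # nothing relevant in the Vault: refuse, don't improvise
        return "FINAL_ANSWER:\nNo relevant facts in the Vault.\nFACTS_USED: NONE"
    ctx = "\n".join(f"- {f}" for f in facts)
    draft = llm(f"Answer ONLY from these facts:\n{ctx}\n\nQ: {question}")       # thinker
    holes = llm(f"List claims unsupported by the facts:\n{ctx}\n\nA: {draft}")  # critic
    return "FINAL_ANSWER:\n" + llm(                                             # thinker
        f"Rewrite the answer fixing these issues:\n{holes}\n\nA: {draft}\n\nFacts:\n{ctx}"
    )
```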
And if the Vault has nothing relevant? It refuses and tells you to go pound sand:

FINAL_ANSWER:
The provided facts do not contain information about the Acorn computer or its 1995 sale price.

Sources: Vault
FACTS_USED: NONE
[ZARDOZ HATH SPOKEN]

Also yes, it writes a mentats_debug.log. Go look at it any time you want.

The flow is basically:

Attach KBs → SUMM → Move to Vault → Mentats.

No mystery meat. No “trust me bro, embeddings.”

3. Vodka: deterministic memory on a potato budget

Potato PCs have two classic problems: goldfish memory + context bloat that murders your VRAM.

Vodka fixes both without extra model compute.

* `!!` stores facts verbatim (JSON on disk)
* `??` recalls them verbatim (TTL + touch limits so memory doesn’t become landfill)
* *CTC (Cut The Crap)* hard-caps context (last N messages + char cap) so you don’t get VRAM spikes after 400 messages
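The mechanics are small enough to sketch in one file (assumptions everywhere: store name, TTL, touch limit, and exact-key recall are simplifications; the real `??` presumably matches looser than an exact key):

```python
# Sketch of "!!"/"??" verbatim memory with TTL + touch limits, plus a
# CTC-style context trim. Illustrative only, not LC's actual code.
import json
import time
from pathlib import Path

STORE = Path("vodka.json")
TTL = 7 * 24 * 3600   # facts expire after a week (assumption)
MAX_TOUCHES = 20      # ...or after 20 recalls (assumption)

def _load() -> dict:
    return json.loads(STORE.read_text()) if STORE.exists() else {}

def _save(mem: dict) -> None:
    STORE.write_text(json.dumps(mem, indent=2))

def bang(key: str, fact: str) -> None:  # `!!` stores verbatim
    mem = _load()
    mem[key] = {"fact": fact, "ts": time.time(), "touches": 0}
    _save(mem)

def huh(key: str):  # `??` recalls verbatim
    mem = _load()
    entry = mem.get(key)
    if entry is None:
        return None
    if time.time() - entry["ts"] > TTL or entry["touches"] >= MAX_TOUCHES:
        del mem[key]  # expired or worn out: landfill prevention
        _save(mem)
        return None
    entry["touches"] += 1
    _save(mem)
    return entry["fact"]

def ctc_trim(messages: list[dict], last_n: int = 12, cap: int = 8000) -> list[dict]:
    """Cut The Crap: keep the last N messages, newest first, under a char budget."""
    kept, used = [], 0
    for msg in reversed(messages[-last_n:]):
        used += len(msg.get("content", ""))
        if used > cap:
            break
        kept.append(msg)
    return list(reversed(kept))
```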
So instead of:

> “Remember my server is 203.0.113.42” → “Got it!” → [100 msgs later] → “127.0.0.1”

you get:

> `!! my server is 203.0.113.42`
>
> `?? server ip` → *203.0.113.42* (with TTL/touch metadata)

And because context stays bounded: stable KV cache, stable speed, your potato PC stops crying.

---

There’s more (a lot more) in the README, but I’ve already over-autism’ed this post.

TL;DR:

If you want your local LLM to *shut up when it doesn’t know* and *show receipts when it does*, come poke it:

* *Primary (Codeberg):* https://codeberg.org/BobbyLLM/llama-conductor
* *Mirror (GitHub):* https://github.com/BobbyLLM/llama-conductor

PS: Sorry about the AI slop image. I can’t draw for shit.

PPS: A human with ASD wrote this using Notepad++. If the formatting or language is weird, now you know why.