Show HN: I got tired of my LLM's bullshit. So I fixed it.

Author: BobbyLLM · 12 days ago
As a handsome local AI enjoyer™ you've probably noticed one of the big flaws with LLMs:

They lie. Confidently. *ALL THE TIME.*

I'm autistic and extremely allergic to vibes-based tooling, so … I built a thing. Maybe it's useful to you too.

The thing: llama-conductor

llama-conductor is a router that sits between your *frontend* (e.g. OWUI) and *backend* (llama.cpp + llama-swap). It's local-first, but it should talk to anything OpenAI-compatible if you point it there (note: experimental, so YMMV).

LC is a glass box that makes the stack behave like a *deterministic system*, instead of a drunk telling a story about the fish that got away.

TL;DR: "In God we trust. All others must bring data."

Three examples:

1. KB mechanics (markdown, JSON, checksums)

You keep "knowledge" as dumb folders on disk. Drop docs (`.txt`, `.md`, `.pdf`) into them. Then:

* `>>attach <kb>` — attaches a KB folder
* `>>summ new` — generates `SUMM_*.md` files with *SHA-256 provenance* baked in and moves the originals to a sub-folder

Now, when you ask something like:

> "yo, what did the Commodore C64 retail for in 1982?"

…it answers from the attached KBs *only*.

If the fact isn't there, it tells you, explicitly, instead of winging it. E.g.:

> The provided facts state the Commodore 64 launched at $595 and was reduced to $250, but do not specify a 1982 retail price. The Amiga's pricing and timeline are also not detailed in the given facts.
>
> Missing information includes the exact 1982 retail price for Commodore's product line and which specific model(s) were sold then.
>
> Confidence: medium | Source: Mixed

No vibes. Just: here's what's in your docs, here's what's missing, don't GIGO yourself into stupid.

Then, if you're happy with the summary, you can:

* `>>move to vault`

2. Mentats: proof-or-refusal mode (Vault-only)

Mentats is the "deep think" pipeline against your *curated* sources.

* no chat history
* no filesystem KBs
* no Vodka
* *Vault-only grounding* (Qdrant)

It runs a triple pass (thinker → critic → thinker). It's slow on purpose. You can audit it. And if the Vault has nothing relevant? It refuses and tells you to go pound sand:

> FINAL_ANSWER: The provided facts do not contain information about the Acorn computer or its 1995 sale price.
>
> Sources: Vault
> FACTS_USED: NONE
> [ZARDOZ HATH SPOKEN]

Also yes, it writes a mentats_debug.log. Go look at it any time you want.

The flow is basically:

Attach KBs → SUMM → Move to Vault → Mentats.

No mystery meat. No "trust me bro, embeddings."

3. Vodka: deterministic memory on a potato budget

Potato PCs have two classic problems: goldfish memory + context bloat that murders your VRAM.

Vodka fixes both without extra model compute.

* `!!` stores facts verbatim (JSON on disk)
* `??` recalls them verbatim (TTL + touch limits so memory doesn't become a landfill)
* *CTC (Cut The Crap)* hard-caps context (last N messages + char cap) so you don't get VRAM spikes after 400 messages

So instead of:

> "Remember my server is 203.0.113.42" → "Got it!" → [100 msgs later] → "127.0.0.1"

you get:

> `!! my server is 203.0.113.42`
>
> `?? server ip` → *203.0.113.42* (with TTL/touch metadata)
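If you want intuition for what "deterministic memory" means mechanically, here's a minimal sketch of the idea. This is not llama-conductor's actual code; the names and defaults (`VodkaStore`, `ctc_trim`, the TTL/touch values) are made up for illustration. The point is just that `!!`/`??` and CTC are plain bookkeeping: facts stored and recalled verbatim from a JSON file, expired by age or overuse, and the conversation tail hard-capped before it ever reaches the backend.

```python
import json
import time
from pathlib import Path


class VodkaStore:
    """Hypothetical sketch of a verbatim fact store (not the real implementation):
    facts live in a JSON file, and recall enforces a TTL plus a touch limit."""

    def __init__(self, path="vodka_facts.json", ttl_seconds=7 * 24 * 3600, max_touches=20):
        self.path = Path(path)
        self.ttl = ttl_seconds
        self.max_touches = max_touches
        self.facts = json.loads(self.path.read_text()) if self.path.exists() else []

    def store(self, text):
        # the `!!` side: save the text exactly as given, no model involved
        self.facts.append({"text": text, "ts": time.time(), "touches": 0})
        self._flush()

    def recall(self, query):
        # the `??` side: return matching facts verbatim
        now = time.time()
        # drop expired or over-touched facts so memory doesn't become a landfill
        self.facts = [f for f in self.facts
                      if now - f["ts"] < self.ttl and f["touches"] < self.max_touches]
        # naive keyword match, purely for illustration
        hits = [f for f in self.facts
                if any(word in f["text"].lower() for word in query.lower().split())]
        for f in hits:
            f["touches"] += 1  # each lookup counts against the touch limit
        self._flush()
        return [f["text"] for f in hits]

    def _flush(self):
        self.path.write_text(json.dumps(self.facts, indent=2))


def ctc_trim(messages, last_n=12, char_cap=8000):
    """Hypothetical 'Cut The Crap': keep at most the last N messages and stop
    adding once a hard character budget is hit, so context stays bounded."""
    kept, used = [], 0
    for msg in reversed(messages[-last_n:]):
        if used + len(msg["content"]) > char_cap:
            break
        kept.append(msg)
        used += len(msg["content"])
    return list(reversed(kept))
```

No embeddings, no extra model calls: recall is exact string storage and the cap is a slice, which is why prompt size stays predictable.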
Because context stays bounded: stable KV cache, stable speed, your potato PC stops crying.

---

There's more (a lot more) in the README, but I've already over-autism'ed this post.

TL;DR:

If you want your local LLM to *shut up when it doesn't know* and *show receipts when it does*, come poke it:

* *Primary (Codeberg):* https://codeberg.org/BobbyLLM/llama-conductor
* *Mirror (GitHub):* https://github.com/BobbyLLM/llama-conductor

PS: Sorry about the AI slop image. I can't draw for shit.

PPS: A human with ASD wrote this using Notepad++. If the formatting or language is weird, now you know why.