Show HN: I got tired of my LLM's bullshit. So I fixed it.

Author: BobbyLLM · 12 days ago
As a handsome local AI enjoyer™ you've probably noticed one of the big flaws with LLMs:

They lie. Confidently. *ALL THE TIME.*

I'm autistic and extremely allergic to vibes-based tooling, so … I built a thing. Maybe it's useful to you too.

The thing: llama-conductor

llama-conductor is a router that sits between your *frontend* (e.g. OWUI) and *backend* (llama.cpp + llama-swap). It's local-first, but it should talk to anything OpenAI-compatible if you point it there (note: experimental, so YMMV).

LC is a glass box that makes the stack behave like a *deterministic system*, instead of a drunk telling a story about the fish that got away.

TL;DR: "In God we trust. All others must bring data."

Three examples:

1. KB mechanics (markdown, JSON, checksums)

You keep "knowledge" as dumb folders on disk. Drop docs (`.txt`, `.md`, `.pdf`) into them. Then:

* `>>attach <kb>` — attaches a KB folder
* `>>summ new` — generates `SUMM_*.md` files with *SHA-256 provenance* baked in and moves the originals to a sub-folder

Now, when you ask something like:

> "yo, what did the Commodore C64 retail for in 1982?"

…it answers from the attached KBs *only*.

If the fact isn't there, it tells you, explicitly, instead of winging it. E.g.:

> The provided facts state the Commodore 64 launched at $595 and was reduced to $250, but do not specify a 1982 retail price. The Amiga's pricing and timeline are also not detailed in the given facts.
>
> Missing information includes the exact 1982 retail price for Commodore's product line and which specific model(s) were sold then.
>
> Confidence: medium | Source: Mixed

No vibes. Just: here's what's in your docs, here's what's missing, don't GIGO yourself into stupid.

Then, if you're happy with the summary, you can:

* `>>move to vault`

2. Mentats: proof-or-refusal mode (Vault-only)

Mentats is the "deep think" pipeline against your *curated* sources.

* no chat history
* no filesystem KBs
* no Vodka
* *Vault-only grounding* (Qdrant)

It runs a triple pass (thinker → critic → thinker). It's slow on purpose. You can audit it. And if the Vault has nothing relevant? It refuses and tells you to go pound sand:

> FINAL_ANSWER: The provided facts do not contain information about the Acorn computer or its 1995 sale price.
>
> Sources: Vault
> FACTS_USED: NONE
> [ZARDOZ HATH SPOKEN]

Also yes, it writes a mentats_debug.log. Go look at it any time you want.

The flow is basically:

Attach KBs → SUMM → Move to Vault → Mentats.

No mystery meat. No "trust me bro, embeddings."

3. Vodka: deterministic memory on a potato budget

Potato PCs have two classic problems: goldfish memory + context bloat that murders your VRAM.

Vodka fixes both without extra model compute.

* `!!` stores facts verbatim (JSON on disk)
* `??` recalls them verbatim (TTL + touch limits so memory doesn't become a landfill)
* *CTC (Cut The Crap)* hard-caps context (last N messages + char cap) so you don't get VRAM spikes after 400 messages

So instead of:

> "Remember my server is 203.0.113.42" → "Got it!" → [100 msgs later] → "127.0.0.1"

you get:

> `!! my server is 203.0.113.42`
>
> `?? server ip` → *203.0.113.42* (with TTL/touch metadata)
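If you want intuition for what "deterministic memory" means mechanically, here's a minimal sketch of the idea. This is not llama-conductor's actual code; the names and defaults (`VodkaStore`, `ctc_trim`, the TTL/touch values) are made up for illustration. The point is just that `!!`/`??` and CTC are plain bookkeeping: facts stored and recalled verbatim from a JSON file, expired by age or overuse, and the conversation tail hard-capped before it ever reaches the backend.

```python
import json
import time
from pathlib import Path


class VodkaStore:
    """Hypothetical sketch of a verbatim fact store (not the real implementation):
    facts live in a JSON file, and recall enforces a TTL plus a touch limit."""

    def __init__(self, path="vodka_facts.json", ttl_seconds=7 * 24 * 3600, max_touches=20):
        self.path = Path(path)
        self.ttl = ttl_seconds
        self.max_touches = max_touches
        self.facts = json.loads(self.path.read_text()) if self.path.exists() else []

    def store(self, text):
        # the `!!` side: save the text exactly as given, no model involved
        self.facts.append({"text": text, "ts": time.time(), "touches": 0})
        self._flush()

    def recall(self, query):
        # the `??` side: return matching facts verbatim
        now = time.time()
        # drop expired or over-touched facts so memory doesn't become a landfill
        self.facts = [f for f in self.facts
                      if now - f["ts"] < self.ttl and f["touches"] < self.max_touches]
        # naive keyword match, purely for illustration
        hits = [f for f in self.facts
                if any(word in f["text"].lower() for word in query.lower().split())]
        for f in hits:
            f["touches"] += 1  # each lookup counts against the touch limit
        self._flush()
        return [f["text"] for f in hits]

    def _flush(self):
        self.path.write_text(json.dumps(self.facts, indent=2))


def ctc_trim(messages, last_n=12, char_cap=8000):
    """Hypothetical 'Cut The Crap': keep at most the last N messages and stop
    adding once a hard character budget is hit, so context stays bounded."""
    kept, used = [], 0
    for msg in reversed(messages[-last_n:]):
        if used + len(msg["content"]) > char_cap:
            break
        kept.append(msg)
        used += len(msg["content"])
    return list(reversed(kept))
```

No embeddings, no extra model calls: recall is exact string storage and the cap is a slice, which is why prompt size stays predictable.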
Because context stays bounded: stable KV cache, stable speed, your potato PC stops crying.

---

There's more (a lot more) in the README, but I've already over-autism'ed this post.

TL;DR:

If you want your local LLM to *shut up when it doesn't know* and *show receipts when it does*, come poke it:

* *Primary (Codeberg):* https://codeberg.org/BobbyLLM/llama-conductor
* *Mirror (GitHub):* https://github.com/BobbyLLM/llama-conductor

PS: Sorry about the AI slop image. I can't draw for shit.

PPS: A human with ASD wrote this using Notepad++. If the formatting or language is weird, now you know why.