Show HN: Lessons learned from running swarms of Claude Code at scale

7 points | by sermakarevich | 24 days ago
Some time ago I built a simple app to run swarms of coding agents, which I call fleet (https://news.ycombinator.com/item?id=48256389). It is based on centralized beads with a Python orchestrator and can run any coder (Claude, agy, Codex). Recently I added a UI to manage the whole agent lifecycle: adding new tasks, monitoring running ones, and a chat interface built on MCP with a centralized SQLite DB. From the UI I can spawn agents in any directory, define dependencies on other tasks, and specify which coder/model should do the job. Today I can run 10–15 agents concurrently. At that scale you burn through usage limits very fast, so I spent some time investigating where the tokens go and how to maximize efficiency. Here are the lessons learned after a few weeks of running the fleet:

- CLAUDE.md is a terrible abstraction. These files load unconditionally, they often contain descriptions irrelevant to the task at hand, and they stack from your working directory upward. The result is wasted tokens, plus confusion from injecting irrelevant instructions into the session.

- Skills are bad, but not as bad as CLAUDE.md. They use progressive disclosure: only the skill description goes into the session, and Claude loads the full skill text with a tool when it is needed. That is one level better, but it still does not scale. You can't create 10K skills, because the descriptions alone would eat your entire usable context. Claude recently introduced a skills budget that silently drops less frequently used skills from the session entirely. You can still invoke them yourself in an interactive session, but the model can't invoke them in a background session.

- Some plugins may be installed more than once. During cleanup I found that a few of mine were installed in multiple locations, so their duplicated instructions cost double the tokens.

- Attaching plugins to every session is a bad idea at scale. Be precise about which plugins are actually useful and attach them per task.

- Use a hierarchical knowledge base instead of CLAUDE.md / skills / plugins. It gives you real progressive disclosure: keep your instructions and tool descriptions in it and let Claude navigate through it quickly and cheaply.

- System tools consume ~15K tokens (about 7% of the session). You can't manage this: they are always attached, and disabling tools doesn't remove them from the context.

- AskUserQuestion isn't available in background sessions. You need to implement your own tool, MCP- or CLI-based, to give `claude -p` the ability to talk to you.

- You become selective about which model handles each task. Decompose work into harder and simpler subtasks so you can route the simpler ones to weaker, cheaper models and save tokens.

- Your context-switching skill improves over time.

Fleet repo: https://github.com/sermakarevich/fleet
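To make the CLAUDE.md stacking point concrete, here is a minimal Python sketch of how one might audit a task directory before spawning an agent in it. It is not part of fleet. The function names are hypothetical, the "working directory upward" lookup simply mirrors the behavior described in the post rather than anything verified against Claude Code itself, and the 4-characters-per-token figure is a crude assumption, not a real tokenizer.

```python
from pathlib import Path


def stacked_instruction_files(workdir, filename="CLAUDE.md"):
    """Return every `filename` found in workdir and its ancestors, nearest first.

    Assumption: this mirrors the "stack from your working directory upward"
    behavior described in the post; it is not Claude Code's actual loader.
    """
    start = Path(workdir).resolve()
    found = []
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            found.append(candidate)
    return found


def estimate_tokens(text, chars_per_token=4):
    """Rough size estimate. Assumption: ~4 characters per token."""
    return len(text) // chars_per_token


def audit(workdir, filename="CLAUDE.md"):
    """Return (path, estimated_tokens) pairs for the stacked files, nearest first."""
    return [
        (path, estimate_tokens(path.read_text(encoding="utf-8", errors="replace")))
        for path in stacked_instruction_files(workdir, filename)
    ]
```

Calling `audit(".")` from a task directory and summing the second column gives a ballpark for how much context is spent before the agent reads its first task-specific instruction. The numbers are only useful for comparing directories against each other, since the estimate ignores the real tokenizer.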