展示HN:我在过去的三天里不停地运行了三个编码代理。以下是我的做法。
1. 无头模式
无头模式允许您将 AI 作为命令行工具用于自动化和脚本编写。在 Claude Code 中,您可以使用 -p 标志运行它:claude -p;在 Codex 中使用 exec;在 Opencode 中使用 run。
2. 请求人类
在无头模式下,传统的与操作员的通信渠道将无法使用——我们需要实现一个专用工具。以下是如何实现的示例:<a href="https://github.com/sermakarevich/claude/tree/main/mcp/ask_human" rel="nofollow">https://github.com/sermakarevich/claude/tree/main/mcp/ask_human</a>
3. 任务队列
Beads 是一个轻量级的分布式图形问题跟踪器,专为 AI 代理而设计,基于 Dolt。您可以创建任务,定义任务之间的依赖关系,并设置状态、优先级和层级。Beads 有助于防止多个任务被超过一个工作者认领。
4. 工作者工件
我们希望能够监控工作者的状态、当前阶段,并在重启后恢复任务。对于每个任务,我们可以使用 beads 任务 ID 创建一个专用文件夹,并将所需内容放入其中。我放入的内容包括:
- 计划和状态 md
- 知识 md
- events.jsonl
- stderr
工作者在其提示中被指示检查工件是否存在,这使其能够从任务中断的地方继续。
5. 工作者隔离
为了准备运行多个工作者,我们需要对其进行隔离。这里可以使用 Git worktree。我正在测试这种方法:
- 工作者获取任务并实施
- 下一个自动生成的工作者验证任务是否完成,进行测试,合并工作树,关闭工单,并在需要时创建修复工单
6. 多个工作者
为了能够运行多个工作者,我们需要一个简单的调度器。一个无限循环不断检查 beads/config,并在需要时触发新的工作者。
7. 编码无关
工作者基本上可以是任何编码人员。我从 Claude 开始,添加了 Codex 和 Agy,最后添加了 Opencode。
8. 订阅限制
3 个编码代理可以在 30 分钟内消耗 Claude 的 $200 订阅限制,即使您切换到 Sonnet 4.6。API 令牌的成本是订阅令牌的 40 倍——这太贵了。我正在测试的想法是:
- 使用最强大的模型进行分析/设计并添加任务
- 使用本地模型作为工作者
- 使用更强大的模型来验证工作者并添加新任务以修复潜在的错误实现
我正在使用 qwen3.6:36B 本地模型与 Ollama 部署在 2 张 GPU 卡上,总共 36GB,具有 256K 的上下文窗口。这虽然较慢,但是免费的。令人惊讶的是,它的效果比我预期的要好得多。Fable 5 在创建清晰简单的工单方面表现极佳。
我考虑的另一种方法是 Bedrock qwen,按令牌付费,或租用每月 $1400 的 96GB GPU。
我发现同时运行 3 个工作者是最佳选择,即使 Ollama 一次只处理 1 个请求。原因是 ask_human 工具。如果工作者在夜间问我问题——它必须等到早上才能继续。运行三个以上的工作者可以确保 GPU 负载达到 100%。
9. 良好的集成
用户界面 - 用于观察任务/beads/config/chat/analytics
当模型提出问题时,很容易错过。它在用户界面中可见——聊天旁边有一个绿色圆圈,但仅此而已。因此,我添加了 Telegram 集成——现在我可以在 Telegram 上接收工作者的问题并进行回复,获取任务状态,创建新任务等。
我这样做是为了我的概念验证项目:
- 改善车队
- 构建与数据收集和分析相关的应用程序
我看到的是,24x7 的编码人员比我想象的更接近。即使是较弱的模型,在任务简单且定义明确时也能提供良好的结果。构建这些系统所需的所有组件都已具备。
代码库:<a href="https://github.com/sermakarevich/fleet" rel="nofollow">https://github.com/sermakarevich/fleet</a>
查看原文
1. Headless mode<p>Headless mode allows you to use the AI as a command-line utility for automation and scripting. In Claude Code you run it with the -p flag: claude -p, in codex - exec, opencode - run.<p>2. Ask human<p>The traditional communication channel with the operator won't work in headless mode - we need to implement a dedicated tool. Here is an example of how this can be done <a href="https://github.com/sermakarevich/claude/tree/main/mcp/ask_human" rel="nofollow">https://github.com/sermakarevich/claude/tree/main/mcp/ask_hu...</a><p>3. Tasks queue<p>Beads is a lightweight distributed graph issue tracker for AI agents, powered by Dolt. You can create tasks, define dependencies between tasks, and have status, priorities, hierarchy. Beads helps prevent multiple tasks from being claimed by > 1 worker.<p>4. Worker artifacts<p>We want to be able to monitor how a worker is doing, at what stage it is, and resume it after a restart. For every task we can create a dedicated folder using the beads task id and put into it what we need. I put there:
- plan and status md
- knowledge md
- events.jsonl
- stderr<p>The worker is instructed in its prompt to check if artifacts exist, which allows it to proceed from where the job was left.<p>5. Worker isolation<p>To prepare to run multiple workers we need to isolate them. Git worktree can be used here. I am testing this approach:
- worker gets the task and implements it
- the next worker, spawned automatically, validates the task is done, tests it, merges the worktree, closes the ticket and creates another one for a fix if required<p>6. Multiple workers<p>To be able to run multiple workers we need a simple orchestrator. An infinite loop constantly checking beads / config and triggering new workers when required.<p>7. Coder agnostic<p>A worker can be basically any coder. I started with Claude, added Codex and Agy. And last added Opencode.<p>8. Subscription limits.<p>3 coding agents can burn the Claude $200 subscription limit in 30 minutes even if you switch to Sonnet 4.6. API tokens cost x40 compared to tokens in the subscription - this is too expensive. The idea I am testing is:
- use the strongest model possible to analyse/design and add tasks
- use a local model as a worker
- use a stronger model to validate workers and add new tasks to fix potential misimplementations<p>I am using the qwen3.6:36B local model with Ollama, deployed on 2 GPU cards, 36GB in total, with a 256K context window. This is slower, but it is free of charge. And surprisingly it worked, and worked way better than I would expect it to. Fable 5 was extremely great at creating clear and simple tickets until it was.<p>Another approach I was considering is Bedrock qwen, paying per token, or renting a 96GB GPU for $1400 per month.<p>I found that it's optimal to run 3 workers concurrently even though Ollama processes 1 request at a time. The reason is the ask_human tool. If a worker asks me something at night - it has to wait until morning doing nothing. Running three +/- guarantees GPU load at 100%.<p>9. Nice integrations<p>UI - to observe tasks / beads / config / chat / analytics<p>It's easy to miss when a model asks a question. It's visible in the UI - a green circle near chat, but that's it. So I added a Telegram integration - now I receive questions from workers on Telegram and can reply there, get the status of tasks, create new tasks etc.<p>I am doing this for my PoC projects ofc:
- improving fleet
- building a data collection and analysis related app<p>What I am seeing is that 24x7 coders are closer than I thought they are. Even weaker models can deliver good results when the task is simple and well defined. All components for building these systems are there.<p>Repo: <a href="https://github.com/sermakarevich/fleet" rel="nofollow">https://github.com/sermakarevich/fleet</a>