展示HN:一个针对本地AI代理的四层自愈系统(曾经默默失效)

1作者: ramsbaby-dev3 个月前原帖
我在Mac mini上运行一个本地AI助手(OpenClaw)。当它崩溃时,我希望它能够自动修复自己。因此,我建立了一个四层恢复系统: <p>第一级 → launchd自动重启(秒级) 第二级 → 每60秒进行一次看门狗健康检查 第三级 → Claude Code AI诊断与修复(在失败超过30分钟后触发) 第四级 → Discord/Telegram人工升级 <p>这个系统在纸面上看起来很完美。但v3.0.0存在一个关键错误:第二级记录了“升级到第三级...”但实际上从未调用脚本。这个链条在最重要的交接处断开了——默默地持续了几周。 <p>v3.1.0修复了这个问题,并增加了: - 安装程序中的链条验证(每个级别都经过端到端测试) - 优雅的.env加载,以确保第三级不会在没有API密钥的情况下静默崩溃 - 看门狗补救模式,用于错过的时间间隔 <p>GitHub: <a href="https://github.com/Ramsbaby/openclaw-self-healing" rel="nofollow">https://github.com/Ramsbaby/openclaw-self-healing</a> <p>主要教训:“我有监控” ≠ “我的监控链实际上是连接的。” 测试每个组件的启动并不等同于测试系统的端到端功能。
查看原文
I run a local AI assistant (OpenClaw) on a Mac mini. When it crashes, I wanted it to fix itself automatically. So I built a 4-tier recovery system:<p>Level 1 → launchd auto-restart (seconds) Level 2 → watchdog health check every 60s Level 3 → Claude Code AI diagnosis + repair (fires after 30+ min failure) Level 4 → Discord&#x2F;Telegram human escalation<p>The system looked great on paper. But v3.0.0 had a critical bug: Level 2 logged &quot;Escalating to Level 3...&quot; but never actually called the script. The chain was broken at the most important handoff — silently, for weeks.<p>v3.1.0 fixes this plus adds: - Chain verification in the installer (each level is tested end-to-end) - Graceful .env loading so Level 3 doesn&#x27;t die silently without API keys - Watchdog Catch-up mode for missed intervals<p>GitHub: <a href="https:&#x2F;&#x2F;github.com&#x2F;Ramsbaby&#x2F;openclaw-self-healing" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;Ramsbaby&#x2F;openclaw-self-healing</a><p>The meta-lesson: &quot;I have monitoring&quot; ≠ &quot;my monitoring chain is actually connected.&quot; Testing each component starts is not the same as testing the system works end-to-end.