HackerNews中文版

我在Mac mini上运行一个本地AI助手（OpenClaw）。当它崩溃时，我希望它能够自动修复自己。因此，我建立了一个四层恢复系统： 第一级 → launchd自动重启（秒级）第二级 → 每60秒进行一次看门狗健康检查第三级 → Claude Code AI诊断与修复（在失败超过30分钟后触发）第四级 → Discord/Telegram人工升级 这个系统在纸面上看起来很完美。但v3.0.0存在一个关键错误：第二级记录了“升级到第三级...”但实际上从未调用脚本。这个链条在最重要的交接处断开了——默默地持续了几周。 v3.1.0修复了这个问题，并增加了： - 安装程序中的链条验证（每个级别都经过端到端测试） - 优雅的.env加载，以确保第三级不会在没有API密钥的情况下静默崩溃 - 看门狗补救模式，用于错过的时间间隔 GitHub: <a href="https://github.com/Ramsbaby/openclaw-self-healing" rel="nofollow">https://github.com/Ramsbaby/openclaw-self-healing</a> 主要教训：“我有监控” ≠ “我的监控链实际上是连接的。” 测试每个组件的启动并不等同于测试系统的端到端功能。

查看原文

I run a local AI assistant (OpenClaw) on a Mac mini. When it crashes, I wanted it to fix itself automatically. So I built a 4-tier recovery system:Level 1 → launchd auto-restart (seconds) Level 2 → watchdog health check every 60s Level 3 → Claude Code AI diagnosis + repair (fires after 30+ min failure) Level 4 → Discord/Telegram human escalationThe system looked great on paper. But v3.0.0 had a critical bug: Level 2 logged "Escalating to Level 3..." but never actually called the script. The chain was broken at the most important handoff — silently, for weeks.v3.1.0 fixes this plus adds: - Chain verification in the installer (each level is tested end-to-end) - Graceful .env loading so Level 3 doesn't die silently without API keys - Watchdog Catch-up mode for missed intervalsGitHub: <a href="https://github.com/Ramsbaby/openclaw-self-healing" rel="nofollow">https://github.com/Ramsbaby/openclaw-self-healing</a>The meta-lesson: "I have monitoring" ≠ "my monitoring chain is actually connected." Testing each component starts is not the same as testing the system works end-to-end.

展示HN：一个针对本地AI代理的四层自愈系统（曾经默默失效）