Show HN: I stopped counting on my LLM to cooperate

2 points by seanlf, about 13 hours ago
42 validation errors in one run. Claude apologising instead of writing HTML. OAuth tokens expiring mid-digest.

Then I fixed the constraints. Eight days, zero failures, zero intervention.

The secret wasn't better prompts... it was treating the LLM as a constrained function: schema-validated tool calls that reject malformed output and force retries, a two-pass architecture separating editorial judgment from formatting, and boring DevOps (retry logic, rate limiting, structured logging).

The Claude invocation is ~30 lines in a 2000-line system. Most of the work is everything around it.

https://seanfloyd.dev/blog/llm-reliability
https://github.com/SeanLF/claude-rss-news-digest
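To make the "constrained function" idea concrete, here is a minimal Python sketch. It is not the code from the linked repository. It assumes a hypothetical `call_model(prompt) -> str` wrapper around the real Claude call and a hypothetical story schema for the editorial pass. The post describes schema-validated tool calls. With the Anthropic API, tools are declared with a JSON Schema `input_schema`, but this sketch validates plain JSON text with a hand-rolled validator so that it runs without network access. Malformed output is rejected, and the call is retried with exponential backoff and the validation error fed back into the prompt. The second pass here is plain templating, which is also an assumption; the author's formatting pass may work differently.

```python
import html
import json
import time
from typing import Callable

# Hypothetical schema for the editorial pass (not taken from the repo):
# {"stories": [{"headline": str, "summary": str, "url": str}, ...]}
REQUIRED_FIELDS = {"headline": str, "summary": str, "url": str}


class ValidationError(ValueError):
    """Raised when model output does not match the expected schema."""


def validate_stories(payload: object) -> list[dict]:
    """Reject anything that is not a non-empty list of well-formed stories."""
    if not isinstance(payload, dict) or not isinstance(payload.get("stories"), list):
        raise ValidationError("expected an object with a 'stories' list")
    stories = payload["stories"]
    if not stories:
        raise ValidationError("'stories' must not be empty")
    for i, story in enumerate(stories):
        if not isinstance(story, dict):
            raise ValidationError(f"stories[{i}] is not an object")
        for field, typ in REQUIRED_FIELDS.items():
            value = story.get(field)
            if not isinstance(value, typ) or not value.strip():
                raise ValidationError(f"stories[{i}].{field} must be a non-empty {typ.__name__}")
    return stories


def call_with_validation(
    call_model: Callable[[str], str],  # hypothetical stand-in for the Claude call
    prompt: str,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict]:
    """Pass 1: call the model, validate its JSON, and retry on malformed output."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        raw = call_model(prompt + feedback)
        try:
            return validate_stories(json.loads(raw))
        except (json.JSONDecodeError, ValidationError) as err:
            # Tell the model exactly why its previous output was rejected.
            feedback = f"\n\nYour previous output was rejected: {err}. Reply with JSON only."
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff: 1s, 2s, 4s...
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


def render_html(stories: list[dict]) -> str:
    """Pass 2: deterministic formatting only; no editorial decisions happen here."""
    items = "\n".join(
        f'<li><a href="{html.escape(s["url"])}">{html.escape(s["headline"])}</a>'
        f" {html.escape(s['summary'])}</li>"
        for s in stories
    )
    return f"<ul>\n{items}\n</ul>"


if __name__ == "__main__":
    # Fake model: the first reply is an apology (not JSON), the second complies.
    replies = iter([
        "I'm sorry, I can't produce that right now.",
        json.dumps({"stories": [
            {"headline": "Example", "summary": "A test story.", "url": "https://example.com"}
        ]}),
    ])

    def fake_model(prompt: str) -> str:
        return next(replies)

    stories = call_with_validation(fake_model, "Pick today's top stories as JSON.", sleep=lambda s: None)
    print(render_html(stories))
```

Running it rejects the apology, retries once, and prints a one-item `<ul>`. Because the model only returns validated data and the HTML is produced by code, an apology or half-formed answer fails validation and triggers a retry instead of ending up in the digest. Rate limiting and structured logging would wrap `call_model` in the same way and are left out to keep the sketch short.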