Hacker News (Chinese edition)

42 validation errors in one run. Claude apologising instead of writing HTML. OAuth tokens expiring mid-digest.

Then I fixed the constraints. Eight days, zero failures, zero intervention.

The secret wasn't better prompts... it was treating the LLM as a constrained function: schema-validated tool calls that reject malformed output and force retries, two-pass architecture separating editorial judgment from formatting, and boring DevOps (retry logic, rate limiting, structured logging).

The Claude invocation is ~30 lines in a 2000-line system. Most of the work is everything around it.

https://seanfloyd.dev/blog/llm-reliability
https://github.com/SeanLF/claude-rss-news-digest
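For anyone who wants the "constrained function" idea in concrete terms, here is a minimal sketch of validate-then-retry. It is not the code from the linked repo: the field names in `REQUIRED_FIELDS`, the helpers `validate_item`, `call_with_retries` and `make_scripted_model`, and the feedback wording are all assumptions made for illustration. The model is just a callable taking a prompt and returning text, so the loop can be exercised without any API.

```python
from __future__ import annotations

import json
from typing import Callable

# Hypothetical schema for one digest item; the real project may use different fields.
REQUIRED_FIELDS = {"title": str, "url": str, "summary": str}


def validate_item(raw: str) -> tuple[dict | None, list[str]]:
    """Parse raw model output; return (item, []) if valid, else (None, errors)."""
    try:
        item = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(item, dict):
        return None, ["top-level value must be an object"]
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in item:
            errors.append(f"missing field: {name}")
        elif not isinstance(item[name], expected):
            errors.append(f"field {name} must be {expected.__name__}")
    return (None, errors) if errors else (item, [])


def call_with_retries(
    model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call the model until its output validates, feeding the errors back each time."""
    feedback = ""
    errors: list[str] = []
    for _ in range(max_attempts):
        raw = model(prompt + feedback)
        item, errors = validate_item(raw)
        if item is not None:
            return item
        # Reject the output and tell the model exactly what was wrong.
        feedback = "\nPrevious output was rejected: " + "; ".join(errors)
    raise ValueError(f"no valid output after {max_attempts} attempts: {errors}")


def make_scripted_model(replies: list[str]) -> Callable[[str], str]:
    """Stand-in for a real LLM call: returns the given replies in order."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)
```

With a scripted model that first apologises, then omits a field, then complies, `call_with_retries` returns the third reply as a dict. A real setup would swap the stub for the actual API call and would likely let the provider enforce the schema through tool definitions; the loop shape stays the same.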

Show HN: I stopped hoping my LLM would cooperate