Hacker News (Chinese edition)



Agent workflows often involve 10+ API calls to different services (LLMs, data APIs, web scraping). When Layer 7 (the application layer, i.e. these HTTP APIs) is unreliable, workflows either fail outright or trigger retry storms.

Common failure modes I'm thinking about:
- 429 rate limits → agents retry → hammer the API even harder
- Partial outages → synchronized retries across customers
- LangGraph workflows fail mid-execution → how do you resume?

For those running agent systems at scale:
- How do you handle Layer 7 failures?
- Retry coordination? Circuit breakers?
- How do you prevent retry storms against downstream dependencies?
- Do LangGraph workflows handle API failures gracefully?

Curious what the production reality looks like.
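For reference, here is a minimal Python sketch of the two mitigations named in the questions, combined: capped exponential backoff with "full jitter" (a random delay, so many clients hitting the same 429 or outage don't retry in lockstep) and a consecutive-failure circuit breaker (which sheds calls instead of retrying into a dependency that is clearly down). It is not LangGraph's API or any particular library's. `RetryableError`, `CircuitBreaker`, `call_with_retries`, and every default threshold are hypothetical names and values chosen for illustration, and mapping HTTP 429/503 responses and a parsed `Retry-After` value onto `RetryableError` is assumed to happen in the caller.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical error for 429/503-style responses; callers map HTTP errors onto it."""

    def __init__(self, retry_after=None):
        super().__init__("retryable upstream failure")
        self.retry_after = retry_after  # seconds, e.g. parsed from a Retry-After header


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is shed instead of sent."""


class CircuitBreaker:
    """Consecutive-failure breaker: closed -> open -> half-open after reset_timeout."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: once reset_timeout has elapsed, let trial calls through.
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # (Re)open; a failed half-open trial restarts the timeout.
            self.opened_at = self.clock()


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Full jitter: a uniform draw from [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * 2 ** attempt)


def call_with_retries(fn, breaker, max_attempts=5, sleep=time.sleep, rng=random.random):
    """Call fn(), retrying RetryableError with jittered backoff while the breaker allows it."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        if not breaker.allow():
            raise CircuitOpenError("circuit open: shedding load instead of retrying")
        try:
            result = fn()
        except RetryableError as err:
            breaker.record_failure()
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = backoff_delay(attempt, rng=rng)
            if err.retry_after is not None:
                delay = max(delay, err.retry_after)  # never retry sooner than the server asked
            sleep(delay)
        else:
            breaker.record_success()
            return result
```

The breaker is meant to be shared per downstream dependency (for example, one instance per API host within a process), so that failures seen by one agent stop other agents in the same process from piling on. Jitter only de-synchronizes independent clients; coordinating retries across processes or customers would need shared state, which this sketch does not attempt.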

Ask HN: How do you scale agent systems when Layer 7 is unreliable?