


I've been trying to get agentic coding to work, but the dissonance between what I'm seeing online and what I'm able to achieve is doing my head in.

Is there real evidence, beyond hype, that agentic coding produces net-positive results? If any of you have actually got it to work, could you share (in detail) how you did it?

By "getting it to work" I mean:

* creating more value than technical debt, and
* producing code that’s structurally sound enough for someone responsible for the architecture to sign off on.

Lately I’ve seen a push toward minimal or nonexistent code review, with the claim that we should move from “validating architecture” to “validating behavior.” In practice, this seems to mean: don’t look at the code; if tests and CI pass, ship it. I can’t see how this holds up long-term. My expectation is that you end up with "spaghetti" code that works on the happy path but accumulates subtle, hard-to-debug failures over time.

When I tried using Codex on my existing codebases, with or without guardrails, half of my time went into fixing the subtle mistakes it made or the duplication it introduced.

Last weekend I tried building an iOS app for pet feeding reminders from scratch. I instructed Codex to research and propose an architectural blueprint for SwiftUI first. Then, I worked with it to write a spec describing what should be implemented and how.

The first implementation pass was surprisingly good, although it had a number of bugs. Things went downhill fast, however. I spent the rest of my weekend getting Codex to make things work, fix bugs without introducing new ones, and research best practices instead of making stuff up. Although I made it record new guidelines and guardrails as I found them, things didn't improve. In the end I just gave up.

I personally can't accept shipping unreviewed code. It feels wrong. The product has to work, but the code must also be high-quality.

Ask HN: Do you have any evidence that agentic coding works?