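Ask HN: Do you have any evidence that agentic coding works?
I've been trying to get agentic coding to work, but the gap between what I see online and what I'm able to achieve is doing my head in.
Is there real evidence, beyond the hype, that agentic coding produces net-positive results? If any of you have actually got it to work, could you share in detail how you did it?
By "getting it to work" I mean:
* creating more value than technical debt, and
* producing code that's structurally sound enough for someone responsible for the architecture to sign off on.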
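Lately I've seen a push toward minimal or nonexistent code review, with the claim that we should move from "validating architecture" to "validating behavior." In practice, this seems to mean: don't look at the code; if tests and CI pass, ship it. I can't see how this holds up long-term. My expectation is that you end up with spaghetti code that works on the happy path but accumulates subtle, hard-to-debug failures over time.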
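To make the happy-path concern concrete, here is a minimal sketch in Python. Everything in it is made up for illustration (the `FEEDING_TIMES` schedule and the functions `next_feeding_naive` and `next_feeding`); it is not taken from any real project. It shows how a behavior-only check can pass while an untested edge case stays broken.

```python
from datetime import datetime, time, timedelta
from typing import Optional

# Hypothetical schedule: feed at 08:00 and 18:00 every day.
FEEDING_TIMES = [time(8, 0), time(18, 0)]


def next_feeding_naive(now: datetime) -> Optional[datetime]:
    """Happy-path version: only considers the slots left today."""
    for slot in FEEDING_TIMES:
        candidate = datetime.combine(now.date(), slot)
        if candidate > now:
            return candidate
    # After the last slot of the day there is no reminder at all.
    return None


def next_feeding(now: datetime) -> datetime:
    """Same lookup, but rolls over to the first slot of the next day."""
    for slot in FEEDING_TIMES:
        candidate = datetime.combine(now.date(), slot)
        if candidate > now:
            return candidate
    tomorrow = now.date() + timedelta(days=1)
    return datetime.combine(tomorrow, FEEDING_TIMES[0])


# The one test a behavior-only check might contain: both versions pass it.
morning = datetime(2024, 1, 1, 7, 0)
assert next_feeding_naive(morning) == datetime(2024, 1, 1, 8, 0)
assert next_feeding(morning) == datetime(2024, 1, 1, 8, 0)

# The case nobody wrote a test for: 9 pm, after the last slot of the day.
evening = datetime(2024, 1, 1, 21, 0)
assert next_feeding_naive(evening) is None
assert next_feeding(evening) == datetime(2024, 1, 2, 8, 0)
```

With only the morning test in CI, both versions look identical from the outside. The difference shows up only when someone reads the code or happens to hit the evening case.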
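When I tried using Codex on my existing codebases, with or without guardrails, half of my time went into fixing the subtle mistakes it made or the duplication it introduced.
Last weekend I tried building an iOS app for pet feeding reminders from scratch. I instructed Codex to research and propose an architectural blueprint for SwiftUI first. Then I worked with it to write a spec describing what should be implemented and how.
The first implementation pass was surprisingly good, although it had a number of bugs. Things went downhill fast, however. I spent the rest of my weekend trying to get Codex to make things work, fix bugs without introducing new ones, and research best practices instead of making stuff up. Although I had it record new guidelines and guardrails as I found them, things didn't improve. In the end I just gave up.
I personally can't accept shipping unreviewed code. It feels wrong. The product has to work, but the code must also be high-quality.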