Ask HN: How do you measure "AI slop"?
Recently, my employer has been pushing hard for LLM adoption across eng, with an expectation of increased productivity. Eng has followed suit, and as a result I've been getting a lot more PRs that are clearly AI generated: 100-line diffs that could have been 10, missed error cases, broken conventions. It's not just from junior engineers, but often from other senior engineers now.

With our incentive structures, it doesn't seem like there's a great way to prevent this decline in quality. It's been hard for me to quantify _why_ "slop" is bad, but my gut feelings are that:

1. The codebase becomes unreadable to human engineers.

2. Having more bad examples in the codebase creates a negative feedback loop for future LLM changes. (Maybe this is the new norm, but see the next point.)

3. Once enough slop gets in, future incidents/SEVs become increasingly difficult to resolve.

(3) feels like the only reason with tangible business impact, and even if such an incident did occur, I don't know if it would be possible to tie the slow response or lost revenue back to AI slop.

I've seen other posts lamenting the ills of vibe coding, but is there a concrete way to justify code quality in the era of LLMs? My thought is that it might be useful to track some code quality metric like cyclomatic complexity and see if it correlates with regressions over time, but that feels kind of thin (and retroactive).