Hacker News (Chinese edition)



I've tried many different models, and without doubt the code coming out of them differs a lot when it comes to "quality". Some of that is subjective for sure, but there are objective sides to "good" code.

I wish this were a metric in the AI benchmarks so I could choose a model based on it, because honestly it's one of the things I care most about.

Problem: how can you measure such things? What are the metrics?

...maybe there just isn't a way to do it, since that metric isn't in the charts.

Ask HN: Is there a metric for AI code quality?