Hacker News (Chinese edition)



I keep pushing the frontier models to their limits, and I have several projects they still can’t solve that I use to benchmark new models. Every new model makes it easier to “solve” the even harder problems, but I still have the feeling that they rely 99% on my ideas. They just don’t get the ideas, and I have to hold their hand and help them.

Don’t get me wrong: anything that is already close to done they excel at, and they can combine existing techniques. I’m talking about new ideas that models have never seen before.

For example, I have a hobby project that pushes what’s possible with route optimization. Yes, it’s close to SOTA and way more efficient than all (?) other solutions out there (punnerud.github.io/mpee/), but I have to hold the model by the hand and brainstorm ideas on how to compress a matrix.

And it’s not just a one-time thing; it happened something like 40-50 times in a few days.

The 1% is this “new ideas” part. Why can I come up with all of these, and not the model? It’s a really hard eval to create. Right now this project is open; later I am thinking about making a frontier project in the same way, keeping it away from the public and using it as a benchmark. Is that the best way to test for new ideas in models?

Is it just me, or does Mythos seem to be 1% away?