Is it just me, or does Mythos/Fable seem to be missing that 1%?

1 | Author: punnerud | 14 days ago
I keep pushing frontier models to their limits, and I have several projects they still can’t solve that I use to benchmark new models. Every new model makes it easier to “solve” even harder problems, but I still have the feeling that they rely 99% on my ideas. They just don’t get the ideas on their own, and I have to hold their hand and help them.

Don’t get me wrong: they excel at anything that is already close to done, and they can combine existing techniques. I’m talking about new ideas that models have never seen before.

For example, I have a hobby project that pushes what’s possible with route optimization. Yes, it’s close to SOTA and way more efficient than all (?) other solutions out there (punnerud.github.io/mpee/), but I have to hold the model’s hand and brainstorm ideas on how to compress a matrix.

And it’s not just a one-time thing; it happens like 40-50 times in a few days.

The 1% is this “new ideas” part. Why can I come up with all of these, and not the model? It’s a really hard eval to create. Right now this project is open; later I’m thinking about making a frontier project in the same way, keeping it away from the public and using it as a benchmark. Is that the best way to test for new ideas in models?