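Ask HN: Rollout path for a 'transformer alternative'?
I've spent (or wasted) ~1000 hours on a side project with two goals in mind:
1. faster than transformers on CPU;
2. smarter than transformers.
A couple of screenshots below (the black/red parts are censored on purpose... for now):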
https://i.imgur.com/r0equ55.png
https://i.imgur.com/fohRbIr.png
https://i.imgur.com/5Xx1RGX.png
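Summary: what the hell is this?
Two architectures:
1. A linear RNN that solves the long-memory problem in the current front-runner RNN transformer alternatives (RWKV, Mamba). It is also CPU-friendly and written entirely in C from scratch, but not too big: ~4000 lines.
2. Two experimental SNN programs (originally in C, also ported to C# and F#) that turned out better than expected but are, for the time being, dumber than the linear RNN (I need more tests).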
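For readers unfamiliar with the term, a linear RNN can be pictured roughly as below. This is a generic diagonal linear recurrence, not the architecture from this post (its code is not shown here), nor RWKV or Mamba. The names `linear_rnn_step`, `run_sequence`, `decay`, and `gain` are hypothetical, and Python is used only for brevity even though the project itself is in C. The point it illustrates is the CPU-friendliness: each token costs a fixed amount of work and memory, no matter how long the sequence already is.

```python
from typing import List


def linear_rnn_step(
    state: List[float],
    x: List[float],
    decay: List[float],
    gain: List[float],
) -> List[float]:
    """One step of a diagonal linear recurrence: h_t = decay * h_{t-1} + gain * x_t.

    Illustrative only; all parameter names are assumptions, not taken from the post.
    """
    return [d * h + g * xi for h, xi, d, g in zip(state, x, decay, gain)]


def run_sequence(
    inputs: List[List[float]],
    decay: List[float],
    gain: List[float],
) -> List[float]:
    """Fold a whole sequence into one fixed-size state vector."""
    state = [0.0] * len(decay)
    for x in inputs:
        # Constant work per token: only the current state and input are touched.
        state = linear_rnn_step(state, x, decay, gain)
    return state


if __name__ == "__main__":
    # A single input of 1.0 followed by two zeros, with decay 0.5:
    # the state goes 1.0 -> 0.5 -> 0.25.
    print(run_sequence([[1.0], [0.0], [0.0]], decay=[0.5], gain=[1.0]))  # [0.25]
```

In this toy version, an input seen n steps ago is scaled by `decay` raised to the n-th power, which is one simple way to picture why long memory is hard for recurrences of this kind.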
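The question is: what to do with them? Google Gemini Pro 3.1 and Sonnet 4.6 told me to patent it as IP, estimating its value in the many millions. And while this is clearly a mistake, I've already uploaded all the code to Claude/Gemini for analysis; then again, seeing how the project is ~70% vibecoded, I think it would be snobby to act like a gatekeeper.
The thing is: I don't want millions, but at the same time I see several issues with a free open-source rollout:
* Completely unaligned; I don't believe in the "AGI hype", but potential risks may exist, such as in cybersecurity.
* I frankly hate xAI and Musk, and since there are probably only ~20 companies that might be interested in running AI models as a B2C solution, one of them will be xAI.
* Very unorthodox implementation: all in C, with ports to C#/F#. No Python or Rust, so people unfamiliar with these languages in ML will likely run into issues, and I'd have to provide support nonstop, which is time-consuming and, let's face it, unpaid once it's open source.
* It may die completely unheard of somewhere on GitHub even if it has potential; organic traffic rarely works unless you hit the lottery.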
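This is NOT a flex, btw. I'm convinced there are programmers better than me, people who understand ML better than me, and mathematicians better than me, though frankly I possess a special kind of persistence combined with arrogance, which goes a long way in terms of technology/inventions/novelty.
Like I said, this is the result of hundreds of hours of work, spiced up by many years of programming experience in other areas; this wasn't a one-weekend "Claude, give me AGI" kind of shot.
All of the projects compile with zero warnings, logically seem to work, and are visibly faster than transformers, with an obvious ability to generalize and create new/unique content. The missing part is scaling and benchmarking on classic benchmarks.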
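A first step toward the missing benchmarking could be a plain throughput measurement, before moving on to classic benchmarks. Below is a minimal sketch, assuming the model exposes some per-token step function; `tokens_per_second`, `step`, and `warmup` are hypothetical names, and this measures speed only, not output quality.

```python
import time
from typing import Callable


def tokens_per_second(
    step: Callable[[int], None],
    n_tokens: int,
    warmup: int = 10,
) -> float:
    """Time `n_tokens` calls to `step` and return the throughput in tokens/second.

    `step` stands in for whatever processes one token; it is an assumption,
    not an interface from the post.
    """
    # Warm-up calls so one-time setup costs do not skew the measurement.
    for i in range(warmup):
        step(i)

    start = time.perf_counter()
    for i in range(n_tokens):
        step(i)
    elapsed = time.perf_counter() - start

    # Guard against a zero reading on very coarse clocks.
    if elapsed <= 0.0:
        return float("inf")
    return n_tokens / elapsed


if __name__ == "__main__":
    # Dummy workload in place of a real model step.
    print(tokens_per_second(lambda i: sum(range(100)), n_tokens=10_000))
```

Running the same harness against a transformer baseline on the same CPU, with the same sequence lengths, would turn "visibly faster" into a number that others can check.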
What I lack is an understanding of technology adoption. Thanks!