Show HN: SGR – A linear-complexity "Living Cell" that outperforms Transformers
I am developing an architecture called Sparse Gated Resonance (SGR). It is a sequence-modeling approach designed to avoid the quadratic scaling of traditional self-attention. I have been benchmarking a 722k-parameter SGR against a 921k-parameter Transformer on Victor Hugo's "Notre-Dame de Paris" (English).
SGR replaces the attention mechanism with a "Causal Pulse." It uses gated 1D convolutions to generate a navigation vector that resonates against a brain map of character embeddings. This allows the model to maintain a "Living Cell" state that updates with linear complexity.
Full source and implementation: [https://github.com/MrPan2048/GeometricTransformer](https://github.com/MrPan2048/GeometricTransformer)
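For readers who want the gist without opening the repo, here is a minimal PyTorch sketch of the mechanism as described above. It is a simplification, not the actual repo code; the module name `CausalPulse` and the sizes (`d_model`, `kernel_size`) are illustrative assumptions. A gated causal 1D convolution produces a navigation vector per position, and that vector is scored against the character-embedding table (the "brain map") to produce next-character logits:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalPulse(nn.Module):
    """Simplified sketch of the Causal Pulse idea (not the repo code).

    A gated causal 1D convolution builds a navigation vector per position,
    which "resonates" (dot product) against the character-embedding table.
    Module name and sizes here are illustrative.
    """

    def __init__(self, vocab_size: int, d_model: int = 128, kernel_size: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)   # the "brain map"
        self.kernel_size = kernel_size
        self.conv_content = nn.Conv1d(d_model, d_model, kernel_size)
        self.conv_gate = nn.Conv1d(d_model, d_model, kernel_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) character ids
        x = self.embed(tokens).transpose(1, 2)           # (B, d_model, L)
        x = F.pad(x, (self.kernel_size - 1, 0))          # left-pad => strictly causal
        nav = torch.tanh(self.conv_content(x)) * torch.sigmoid(self.conv_gate(x))
        nav = nav.transpose(1, 2)                        # (B, L, d_model) navigation vectors
        # Resonance: similarity of each navigation vector to every character embedding.
        return nav @ self.embed.weight.T                 # (B, L, vocab_size) logits

logits = CausalPulse(vocab_size=96)(torch.randint(0, 96, (2, 32)))
print(logits.shape)  # torch.Size([2, 32, 96])
```

Each position costs a fixed-size convolution plus one product with the embedding table, so the work grows as O(L) in sequence length rather than O(L²).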
Benchmark data (Notre-Dame de Paris):
| Step | Architecture | Loss | Perplexity (PPL) | Entropy | Time |
|------|--------------|------|------------------|---------|------|
| 3900 | SGR | 1.4481 | 4.26 | 1.5476 | 19.0 ms |
| 3900 | STD (Transformer) | 2.0275 | 7.59 | 2.1476 | 40.3 ms |
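For readers comparing columns: the PPL values are simply the exponential of the per-character cross-entropy loss, so the Loss and PPL columns carry the same information.

```python
import math

# Perplexity = exp(cross-entropy loss); both rows of the table check out.
print(math.exp(1.4481))  # ~4.255 (SGR)
print(math.exp(2.0275))  # ~7.595 (STD)
```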
Semantic comparison (generation from the prompt "Quasimodo"):
SGR: "Quasimodo. Then minds that the accasteady which which the"
STD: "Quasimododo ng, o uer tre the todo hemo’He wand at tine."
Technical observations:
Computational efficiency: SGR maintains a significant latency advantage, consistently running at ~19 ms compared to the Transformer's ~40 ms. This confirms the efficiency of the linear pulse over quadratic attention.
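To make the scaling argument concrete, here is an illustrative comparison with assumed sizes (not the benchmark script used for the numbers above): a causal 1D convolution costs roughly O(L·k·d²) in sequence length L, while full self-attention pays O(L²·d) for the score matrix alone.

```python
import time
import torch
import torch.nn as nn

L, d, k = 1024, 128, 4                                           # assumed sizes, for illustration
x = torch.randn(1, L, d)

conv = nn.Conv1d(d, d, k, padding=k - 1)                          # O(L * k * d^2): linear in L
attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)    # O(L^2 * d) for the scores

def avg_ms(fn, reps=20):
    fn()                                                          # warm-up
    start = time.perf_counter()
    for _ in range(reps):
        fn()
    return (time.perf_counter() - start) / reps * 1e3

print("conv1d   :", avg_ms(lambda: conv(x.transpose(1, 2))), "ms")
print("attention:", avg_ms(lambda: attn(x, x, x, need_weights=False)), "ms")
```

Doubling L roughly doubles the convolution's work but quadruples attention's score computation, which is the asymmetry the latency gap reflects.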
Convergence quality: By step 3700, SGR had reached a perplexity (PPL) of 4.46, while the Transformer lagged at 8.36. SGR produces recognizable English phrases and punctuation, while the Transformer still exhibits "stuttering" artifacts (e.g., "Quasimodododod").
Entropy stability: SGR has stabilized at an entropy of ~1.54, which represents the optimal "Mastery Zone" for English text. The Transformer's higher entropy (~2.14) correlates with its lack of structural coherence.
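For clarity on what the entropy column measures, here is a sketch of how such a number is typically computed, assuming it is the mean Shannon entropy (in nats) of the model's predictive distribution at each position:

```python
import torch
import torch.nn.functional as F

def mean_predictive_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Mean Shannon entropy (nats) of softmax(logits) across all positions.

    logits: (batch, seq_len, vocab_size)
    """
    log_p = F.log_softmax(logits, dim=-1)
    per_position = -(log_p.exp() * log_p).sum(dim=-1)    # H = -sum_i p_i * log p_i
    return per_position.mean()

# A uniform guess over ~96 printable characters would sit near log(96) ~ 4.56 nats,
# so values around 1.5 reflect a far more confident (lower-uncertainty) model.
print(mean_predictive_entropy(torch.randn(2, 32, 96)))
```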
I am seeking an endorsement to publish a formal paper on this architecture to arXiv (CS.ML). I believe these results demonstrate that "Living Cell" resonance models can outperform attention in parameter-constrained and latency-sensitive environments. If you are a researcher willing to endorse or review the mathematical formalization, please contact me via GitHub.