Show HN: SGR – a linear-complexity "Living Cell" that outperforms Transformers

Author: MrPan, 12 days ago
I am developing an architecture called Sparse Gated Resonance (SGR). It is a sequence-modeling approach designed to avoid the quadratic scaling of traditional self-attention. I have been benchmarking a 722k-parameter SGR against a 921k-parameter Transformer on Victor Hugo's "Notre-Dame de Paris" (English).

SGR replaces the attention mechanism with a "Causal Pulse." It uses gated 1D convolutions to generate a navigation vector that resonates against a brain-map of character embeddings. This allows the model to maintain a "Living Cell" state that updates with linear complexity (a simplified sketch of this idea appears at the end of the post).

Full source and implementation: [https://github.com/MrPan2048/GeometricTransformer](https://github.com/MrPan2048/GeometricTransformer)

Benchmarking data (*Notre-Dame de Paris*):

| Step | Architecture | Loss | PPL | Entropy | Time |
|------|--------------|--------|------|---------|---------|
| 3900 | SGR | 1.4481 | 4.26 | 1.5476 | 19.0 ms |
| 3900 | STD (Transformer) | 2.0275 | 7.59 | 2.1476 | 40.3 ms |

Semantic comparison (generation from the prompt "Quasimodo"):

SGR: "Quasimodo. Then minds that the accasteady which which the"
STD: "Quasimododo ng, o uer tre the todo hemo'He wand at tine."

Technical observations:

- Computational efficiency: SGR maintains a significant latency advantage, consistently running at ~19 ms compared to the Transformer's ~40 ms. This confirms the efficiency of the linear pulse over quadratic attention.
- Convergence quality: By step 3700, SGR had reached a perplexity (PPL) of 4.46, whereas the Transformer lagged at 8.36. SGR produces recognizable English phrases and punctuation, while the Transformer still exhibits "stuttering" artifacts (e.g., "Quasimodododod").
- Entropy stability: SGR's output entropy has stabilized at ~1.54, which represents the optimal "Mastery Zone" for English text, whereas the Transformer's higher entropy (~2.14) correlates with its lack of structural coherence.

I am seeking an endorsement to publish a formal paper on this architecture to arXiv (cs.LG). I believe these results demonstrate that "Living Cell" resonance models can outperform attention in parameter-constrained and latency-sensitive environments. If you are a researcher willing to endorse or review the mathematical formalization, please contact me via GitHub.
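To make the "Causal Pulse" mechanism more concrete, here is a minimal, illustrative sketch in PyTorch. It is not the code from the repository: it omits the recurrent "Living Cell" state, and the class name `CausalPulseSketch`, the GLU-style gating, and all shapes and hyperparameters are simplifying assumptions, shown only to illustrate a gated causal convolution producing a "navigation" vector that is scored ("resonated") against the character-embedding table.

```python
# Hypothetical sketch of a "Causal Pulse"-style block: a gated, causal 1D
# convolution produces a per-position "navigation" vector, which is then
# scored against the character-embedding table (the "brain map").
# Names, shapes, and hyperparameters are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalPulseSketch(nn.Module):
    def __init__(self, vocab_size: int, d_model: int, kernel_size: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)  # "brain map" of characters
        self.kernel_size = kernel_size
        # Two convolutions: one for content, one for the gate (GLU-style gating).
        self.conv_content = nn.Conv1d(d_model, d_model, kernel_size)
        self.conv_gate = nn.Conv1d(d_model, d_model, kernel_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) integer character ids
        x = self.embed(tokens)                 # (B, T, D)
        x = x.transpose(1, 2)                  # (B, D, T) for Conv1d
        # Left-pad so each position only sees past/current inputs (causal).
        x = F.pad(x, (self.kernel_size - 1, 0))
        nav = self.conv_content(x) * torch.sigmoid(self.conv_gate(x))  # gated conv
        nav = nav.transpose(1, 2)              # (B, T, D) "navigation" vectors
        # "Resonance": score each navigation vector against every character
        # embedding; this is a per-position (D x V) product, linear in T.
        logits = nav @ self.embed.weight.T     # (B, T, vocab_size)
        return logits

# Example: next-character logits for a toy batch.
model = CausalPulseSketch(vocab_size=96, d_model=64)
tokens = torch.randint(0, 96, (2, 128))
print(model(tokens).shape)  # torch.Size([2, 128, 96])
```

The relevant point is the final matrix product: scoring every position against the fixed embedding table costs O(T·D·V), i.e. linear in sequence length T, whereas self-attention's pairwise score matrix costs O(T²·D).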
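As a quick sanity check on the reported metrics (assuming the loss column is mean cross-entropy in nats), perplexity should simply be exp(loss), and the table's values are consistent with that:

```python
import math

# Assuming the reported loss is mean cross-entropy in nats, perplexity = exp(loss).
for arch, loss in [("SGR", 1.4481), ("STD", 2.0275)]:
    print(f"{arch}: PPL = {math.exp(loss):.3f}")
# SGR: PPL = 4.255, STD: PPL = 7.595, i.e. the table's 4.26 and 7.59 up to rounding.
```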