Show HN: Aligning AI with entropy, not "human values" (paper)
Hey HN,
I wrote this short paper because I'm honestly tired of current alignment methods (RLHF). Optimizing for "human preference" just produces models that hallucinate plausibly to please the user (stochastic parrots) instead of staying grounded in reality.
I'm proposing a different framework called LOGOS-ZERO. The idea is to ditch moral guardrails, which are subjective and fluid, and instead anchor the loss function to physical/logical invariants.
Basically:
Thermodynamic loss: treat high entropy/hallucination as "waste". If an action increases systemic disorder, it gets penalized (see the sketch after this list).
Action gating: unlike current models, which must always generate tokens, this architecture simulates in latent space first. If the output is high-entropy or logically inconsistent, it returns a null vector (silence/"no").
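To make those two mechanisms concrete, here's a toy PyTorch sketch of how they fit together. The names (entropy_penalty, act_or_abstain) and the fixed threshold are illustrative assumptions for this post, not values from the paper; Shannon entropy of the output distribution stands in for "systemic disorder":

    import torch
    import torch.nn.functional as F

    # Hypothetical cutoff (in nats); the paper doesn't fix a concrete value.
    ENTROPY_THRESHOLD = 2.0

    def entropy_penalty(logits: torch.Tensor) -> torch.Tensor:
        # "Thermodynamic loss": Shannon entropy of the output distribution,
        # standing in for systemic disorder / waste.
        probs = F.softmax(logits, dim=-1)
        entropy = -(probs * torch.log(probs + 1e-9)).sum(dim=-1)
        return entropy.mean()  # add to the task loss, scaled by some weight

    def act_or_abstain(model, latent: torch.Tensor) -> torch.Tensor:
        # "Action gating": run the forward pass as a simulation first and only
        # commit to output if the result is ordered enough. The logical-
        # consistency check is out of scope for this sketch.
        logits = model(latent)
        if entropy_penalty(logits) > ENTROPY_THRESHOLD:
            return torch.zeros_like(logits)  # the null vector: silence over a guess
        return logits

The point of the gate is that abstention (the zero vector) is always an available action, so the model is never forced to emit its best guess.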
It attempts to solve the grounding problem by making the AI follow the path of least action/entropy rather than just mimicking human speech patterns.
Link to the PDF on Zenodo: https://zenodo.org/records/17976755
Curious to hear your thoughts on the physics mapping; roast it if you want.