Show HN: Aligning AI with entropy instead of "human values" (paper)

Author: NyX_AI_ZERO_DAY · about 18 hours ago
Hey HN,

I wrote this short paper because I'm honestly tired of current alignment methods (RLHF). Optimizing for "human preference" just creates models that hallucinate plausibly to please the user (stochastic parrots) instead of being grounded in reality.

I'm proposing a different framework called LOGOS-ZERO. The idea is to ditch moral guardrails (which are subjective and fluid) and anchor the loss function to physical/logical invariants.

Basically:

Thermodynamic Loss: treat high entropy / hallucination as "waste". If an action increases systemic disorder, it gets penalized.

Action Gating: unlike current models, which must generate tokens, this architecture simulates in latent space first. If the output is high-entropy or logically inconsistent, it returns a null vector (silence / "no").

It attempts to solve the grounding problem by making the AI follow the path of least action/entropy rather than just mimicking human speech patterns.

Link to the PDF on Zenodo: https://zenodo.org/records/17976755

Curious to hear your thoughts on the physics mapping; roast it if you want.
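To make the "thermodynamic loss" idea concrete, here is a minimal sketch of one possible reading: standard cross-entropy on the target token plus a penalty on the Shannon entropy of the model's output distribution, so diffuse ("wasteful") predictions cost more. The function names and the additive weighting are my assumptions, not taken from the paper.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in nats) of a probability distribution.
    High entropy is treated as thermodynamic 'waste' in this sketch."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def thermodynamic_loss(probs, target_index, entropy_weight=0.1):
    """Cross-entropy for the target token plus a weighted entropy penalty.
    entropy_weight is a hypothetical hyperparameter, not from the paper."""
    ce = -math.log(probs[target_index])
    return ce + entropy_weight * shannon_entropy(probs)

# A sharp (confident) distribution is penalized less than a flat one:
sharp = [0.97, 0.01, 0.01, 0.01]
flat = [0.25, 0.25, 0.25, 0.25]
assert thermodynamic_loss(sharp, 0) < thermodynamic_loss(flat, 0)
```

In a real training loop this would operate on logits with an autodiff framework; the pure-Python version is just to show the shape of the objective.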
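And a sketch of how the "action gating" step might look at inference time, under my own reading: score the candidate distribution first, and emit a token only if its entropy is below a threshold, otherwise return a null/silence sentinel. The threshold value and the `None` sentinel are illustrative assumptions.

```python
import math

NULL_ACTION = None  # "silence / no" sentinel; my naming, not the paper's

def gate_action(probs, entropy_threshold=1.0):
    """Simulate-then-gate: return the argmax token index only if the
    distribution is low-entropy; otherwise refuse to act (return None)."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    if entropy > entropy_threshold:
        return NULL_ACTION  # too disordered: stay silent
    return max(range(len(probs)), key=lambda i: probs[i])

# A flat distribution over 4 tokens has entropy ln(4) ~= 1.39 > 1.0,
# so the gate stays silent; a confident one passes through.
assert gate_action([0.25, 0.25, 0.25, 0.25]) is None
assert gate_action([0.97, 0.01, 0.01, 0.01]) == 0
```

Checking logical consistency (the other gating condition in the post) would need a separate verifier; this sketch only covers the entropy side.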