Hacker News (Chinese edition)



Is it more impressive to take longer to answer 2 + 2? It's not. The longer someone takes, the less intelligent we would rate that person.

Somehow, for AI agents, taking longer is getting praise, framed as “maintaining attention over long time horizons.”

Have we collectively gone down to room-temperature IQs since COVID?

Why would the time dimension matter for a tool that is limited by its context window? It doesn't matter whether you fill up the window in 1 second or 60 minutes. Also, it's super easy to game. Insert random lags, reduce tokens/sec, and there you have a model that maintains attention over “long time horizons.”

Maybe more importantly, why do people in this field buy into these easily gameable non-indicators? How did they not develop the instinct to call out metrics like lines of code, number of tokens burned, or time taken to process a task as BS the instant they hear them?

How do they benchmark their code? The longer it runs, the better? Number of CPU cycles spent?
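To make the gaming point concrete, here is a toy sketch in Python. Everything in it is hypothetical: `run_agent` is a made-up stand-in, not any real agent or benchmark API, and the "work" is just counting tokens. It only shows that wall-clock time can be inflated without changing what gets done.

```python
import time


def run_agent(task_tokens, delay_per_token=0.0):
    """Hypothetical stand-in for an agent that processes a fixed number of tokens.

    delay_per_token is an artificial lag (an assumption for illustration);
    it changes nothing about the amount of work performed.
    """
    start = time.perf_counter()
    processed = 0
    for _ in range(task_tokens):
        processed += 1  # the "work" is identical in both runs
        if delay_per_token:
            time.sleep(delay_per_token)  # inserted lag, i.e. fewer tokens/sec
    elapsed = time.perf_counter() - start
    return processed, elapsed


# Same task, same output; only the inserted lag differs.
fast_tokens, fast_time = run_agent(200)
slow_tokens, slow_time = run_agent(200, delay_per_token=0.001)

print(f"fast: {fast_tokens} tokens in {fast_time:.4f}s")
print(f"slow: {slow_tokens} tokens in {slow_time:.4f}s")
```

Both runs process the same 200 tokens, yet the second one "works" for far longer. If elapsed time were the metric, the slower run would score better.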

Ask HN: Why do we care about “extended time horizons” and large language models (LLMs)?