Ask HN: Why do we care about "long time horizons" and LLMs?
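Is it more impressive to take longer to answer 2 + 2? It's not. The longer someone takes, the less intelligent we would rate that person.
Somehow, for AI agents, taking longer earns praise, framed as "maintaining attention over long time horizons."
Have we collectively dropped to room-temperature IQs since COVID?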
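Why would the time dimension matter for a tool that is limited by its context window? It doesn't matter whether you fill the window in 1 second or in 60 minutes. It's also super easy to game: insert random lags, reduce tokens/sec, and there you have a model that maintains attention over "long time horizons."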
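To make the gaming concrete, here is a toy sketch, assuming that elapsed time alone is what gets reported. `run_agent`, `RunResult`, and every number below are made up for illustration, and the clock is simulated rather than measured, so nothing here describes how any real agent or benchmark works.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    answer: int
    tokens_emitted: int
    elapsed_seconds: float  # simulated wall-clock time, not a real measurement


def run_agent(tokens_needed: int, tokens_per_second: float,
              lag_per_token: float = 0.0) -> RunResult:
    """Hypothetical agent that emits a fixed number of tokens to answer 2 + 2.

    The work done (tokens emitted, final answer) is identical on every call;
    only the simulated elapsed time depends on throughput and injected lag.
    """
    elapsed = 0.0
    for _ in range(tokens_needed):
        elapsed += 1.0 / tokens_per_second + lag_per_token
    return RunResult(answer=2 + 2, tokens_emitted=tokens_needed,
                     elapsed_seconds=elapsed)


# Same task, same token count: one run at full speed, one throttled with lag.
honest = run_agent(tokens_needed=100, tokens_per_second=50.0)
gamed = run_agent(tokens_needed=100, tokens_per_second=1.0, lag_per_token=35.0)

for name, run in (("honest", honest), ("gamed", gamed)):
    print(f"{name}: answer={run.answer} tokens={run.tokens_emitted} "
          f"elapsed={run.elapsed_seconds:.1f}s")
```

Both runs produce the same answer from the same number of tokens. Only the elapsed time differs, which is the point: a number that moves without the output moving is not measuring the output.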
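Maybe more importantly, how do people in this field buy into these easily gameable non-indicators so readily? How did they not develop the instinct to call out metrics like lines of code, number of tokens burned, or time taken to process a task as BS the instant they hear them?
How do they benchmark their own code? The longer it runs, the better? Number of CPU cycles spent?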