TL;DR: I trained a classifier that routes each request to the least expensive model and reasoning depth that can complete it. Coupled with additional automated token-efficiency techniques, this has yielded 3x usage for the same spend. For anyone interested in trying it themselves: https://nerfguard.com

Various teammates and I recently switched from Claude Code to Codex. We still bounce between the tools, but Codex's speed and steerability, coupled with its performance gains, were hard to ignore. One downside was that per-token pricing kicked in much sooner. This is happening across the board, but we felt it more acutely in Codex. We're a startup full of people who work around the clock and are obsessed with building, so our daily bill alone was striking.

Luckily, we're going after a big mission, and speed matters far more than marginal token spend at the edges. Still, it got us thinking: our own product has the side effect of decreasing token spend and speeding up agentic workflows by many orders of magnitude, yet we were using these top-tier models for every kind of internal coding task without any of those optimizations. The waste felt pretty ridiculous. The most glaring culprit was that we seemed to be using the max-intelligence model on max reasoning for every task, even when the task clearly didn't require it. As a company that spends a lot of time on cached intelligence, we could also easily see plenty of other low-hanging fruit.

So, on a recent weekend, I quickly built a tool to optimize our usage. At its core is a very fast classifier that maps each request to the least intelligence required for the task, with some token optimizations layered on top. The result is roughly the same quality at several times lower token spend.
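To make the routing idea concrete, here is a minimal sketch in Python, assuming a fixed menu of model/reasoning configurations with known relative costs. Everything in it is hypothetical: the model names (`small-model`, `large-model`), the capability scores and prices, and the keyword heuristic in `classify_difficulty`, which stands in for the trained classifier (the post does not describe its features or architecture). The router simply picks the cheapest configuration whose capability covers the predicted difficulty.

```python
from dataclasses import dataclass


# Hypothetical model/reasoning configurations. Names, capability scores,
# and prices are illustrative placeholders, not real products or rates.
@dataclass(frozen=True)
class Config:
    model: str
    effort: str
    capability: int       # highest difficulty level this config can handle
    cost_per_call: float  # relative cost units


CONFIGS = [
    Config("small-model", "low", 1, 1.0),
    Config("small-model", "high", 2, 2.5),
    Config("large-model", "low", 2, 4.0),
    Config("large-model", "high", 3, 10.0),
]

# Stand-in for the trained classifier (hypothetical): a keyword heuristic
# mapping a request to a difficulty level from 1 (trivial) to 3 (hard).
HARD_HINTS = ("architecture", "refactor", "concurrency", "race condition")
MEDIUM_HINTS = ("bug", "test", "implement")


def classify_difficulty(request: str) -> int:
    text = request.lower()
    if any(hint in text for hint in HARD_HINTS):
        return 3
    if any(hint in text for hint in MEDIUM_HINTS):
        return 2
    return 1


def route(request: str) -> Config:
    """Return the cheapest configuration able to handle the request's difficulty."""
    level = classify_difficulty(request)
    eligible = [c for c in CONFIGS if c.capability >= level]
    return min(eligible, key=lambda c: c.cost_per_call)


if __name__ == "__main__":
    requests = [
        "Rename this variable",
        "Fix the bug in the date parser",
        "Refactor the scheduler to remove the race condition",
    ]
    routed_cost = 0.0
    for r in requests:
        cfg = route(r)
        routed_cost += cfg.cost_per_call
        print(f"{r!r} -> {cfg.model} ({cfg.effort})")
    # Baseline: every request goes to the most capable (and most expensive) config.
    max_cfg = max(CONFIGS, key=lambda c: c.capability)
    baseline_cost = max_cfg.cost_per_call * len(requests)
    # Same number of requests, so the cost ratio equals the usage multiplier at equal spend.
    print(f"usage multiplier at equal spend: {baseline_cost / routed_cost:.1f}x")
```

With these placeholder prices, the three sample requests cost 13.5 units instead of the 30 units spent by sending everything to `large-model` on high reasoning, so the script reports about 2.2x usage for the same spend. That figure comes purely from the made-up numbers and says nothing about Nerfguard's actual results.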
But even more exciting for us: properly bin-packing intelligence and reasoning levels meant our speed also went up considerably. This wasn't negligible. We've observed up to 3x savings, plus hours per person per day that we would otherwise have spent waiting on tool turns and coding-agent responses.

For us, that means improved engineering velocity and significantly higher usage for the same spend. It also means more usage before getting throttled.

As I told friends about this, they also wanted to use it to maximize the usage they could get out of their coding-agent plans. Engineers at many of the most cutting-edge AI companies now use this tool to optimize their token utilization in this way, not just to save money but to maximize output. Turns out the best way to avoid getting nerfed by Claude is to intentionally and selectively nerf yourself. We decided to release it for the rest of the builder community as well. You can now turn on Nerfguard for yourself and start getting more usage today.

Show HN: I intentionally nerfed our coding agent