Show HN: Autofix Bot – A hybrid static analysis + AI code review agent
Hi there, HN! We're Jai and Sanket from DeepSource (YC W20), and today we're launching Autofix Bot, a hybrid static analysis + AI agent purpose-built for in-the-loop use with AI coding agents.
AI coding agents have made code generation nearly free, which shifts the bottleneck to code review. Static-only analysis with a fixed set of checkers isn't enough, and LLM-only review has several limitations: it is non-deterministic across runs, has low recall on security issues, is expensive at scale, and tends to get "distracted".
We've spent the last six years building a deterministic, static-analysis-only code review product. Earlier this year, we started thinking about the problem from the ground up and realized that static analysis solves key blind spots of LLM-only reviews. Over the past six months, we built a new "hybrid" agent loop that uses static analysis and frontier AI agents together to outperform both static-only and LLM-only tools at finding and fixing code quality and security issues. Today, we're opening it up publicly.
Here's how the hybrid architecture works (a rough sketch in code follows the list):
- Static pass: 5,000+ deterministic checkers (code quality, security, performance) establish a high-precision baseline. A sub-agent suppresses context-specific false positives.
- AI review: The agent reviews the code with the static findings as anchors. It has access to the AST, data-flow graphs, control-flow graphs, and import graphs as tools, not just grep and the usual shell commands.
- Remediation: Sub-agents generate fixes. A static harness validates every edit before a clean git patch is emitted.
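To make the three stages concrete, here is a rough, purely illustrative sketch of a loop with that shape. Every name in it (`run_static_checkers`, `llm_review`, and so on) is a placeholder invented for this example, not the real Autofix Bot API:

```python
# Illustrative sketch of a hybrid static + LLM review loop.
# All function and type names here are placeholders, NOT the Autofix Bot API.
from dataclasses import dataclass

@dataclass
class Finding:
    file: str
    line: int
    rule: str
    message: str

def run_static_checkers(repo: str) -> list[Finding]:
    """Stand-in for the deterministic checker pass (5,000+ rules in the real tool)."""
    return [Finding("app.py", 42, "sql-injection", "unsanitized query parameter")]

def suppress_false_positives(findings: list[Finding]) -> list[Finding]:
    """Stand-in for the sub-agent that drops context-specific false positives."""
    return findings

def llm_review(repo: str, anchors: list[Finding]) -> list[Finding]:
    """Stand-in for the LLM review that uses static findings as anchors and has
    AST / data-flow / control-flow / import-graph tools available."""
    return anchors

def generate_and_validate_fix(issue: Finding) -> str | None:
    """Stand-in for fix generation plus static re-validation of every edit."""
    return f"--- a/{issue.file}\n+++ b/{issue.file}\n# fix for {issue.rule}"

def hybrid_review(repo: str) -> list[str]:
    findings = suppress_false_positives(run_static_checkers(repo))  # 1. static pass
    issues = llm_review(repo, anchors=findings)                      # 2. AI review
    patches = [p for p in (generate_and_validate_fix(i) for i in issues) if p]
    return patches                                                   # 3. clean git patches

print(hybrid_review("."))
```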
Static analysis solves key LLM problems: non-determinism across runs, low recall on security issues (LLMs get distracted by style), and cost (static narrowing reduces prompt size and tool calls).
On the OpenSSF CVE Benchmark [1] (200+ real JS/TS vulnerabilities), we hit 81.2% accuracy and 80.0% F1, vs. Cursor Bugbot (74.5% accuracy, 77.42% F1), Claude Code (71.5% accuracy, 62.99% F1), CodeRabbit (59.4% accuracy, 36.19% F1), and Semgrep CE (56.9% accuracy, 38.26% F1). On secrets detection, we hit 92.8% F1, vs. Gitleaks (75.6%), detect-secrets (64.1%), and TruffleHog (41.2%). We use our open-source classification model for this. [2]
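For readers less familiar with the metric: F1 is the harmonic mean of precision and recall. A tiny self-contained helper (with made-up counts, not the benchmark's data) shows how it is computed:

```python
# F1 is the harmonic mean of precision and recall, computed from raw
# true/false positive and false negative counts. The counts below are
# invented for illustration and are not taken from the benchmark results.
def f1_score(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(tp=160, fp=40, fn=40), 3))  # 0.8, i.e. an 80.0% F1
```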
Full methodology and details on how we evaluated each tool: [https://autofix.bot/benchmarks](https://autofix.bot/benchmarks)
You can use Autofix Bot interactively on any repository via our TUI, as a plugin in Claude Code, or with our MCP on any compatible AI client (like OpenAI Codex). [3] We're building specifically for AI-coding-agent-first workflows, so you can ask your agent to run Autofix Bot at every checkpoint autonomously.
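For example, a single line in your agent's instruction file (a CLAUDE.md or equivalent) along the lines of "after every checkpoint, run Autofix Bot and resolve any findings it reports before continuing" is one way to wire this into an autonomous loop; the exact wording is up to you.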
Give us a shot today: [https://autofix.bot](https://autofix.bot). We'd love to hear your feedback!
---
[1] [https://github.com/ossf-cve-benchmark/ossf-cve-benchmark](https://github.com/ossf-cve-benchmark/ossf-cve-benchmark)
[2] [https://huggingface.co/deepsource/Narada-3.2-3B-v1](https://huggingface.co/deepsource/Narada-3.2-3B-v1)
[3] [https://autofix.bot/manual/#terminal-ui](https://autofix.bot/manual/#terminal-ui)