I accidentally made probabilistic programming 30-200x faster.

I'm a web dev contractor who stumbled onto GPU-native probabilistic programming while working on an unrelated hobby project.

By "GPU-native" I mean the entire inference algorithm runs inside GPU kernels with no CPU coordination - no Python overhead, no kernel launch latency between steps.
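Since I'm not sharing the implementation yet, here's a minimal sketch (written for this post, not my actual code) of what "the whole loop in one kernel" means: random-walk Metropolis with one chain per thread, on an assumed standard-normal target, with the step size, chain count, and kernel name all picked arbitrarily for illustration. The point is the single launch - the host never intervenes between steps.

    #include <cuda_runtime.h>
    #include <curand_kernel.h>

    // Log-density of the (assumed) target: a standard normal.
    __device__ float log_target(float x) { return -0.5f * x * x; }

    // One chain per thread; the entire Metropolis loop lives inside
    // this single kernel, so there are no per-step launches.
    __global__ void metropolis_fused(float *samples, int n_steps,
                                     int n_chains, unsigned long long seed) {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        if (tid >= n_chains) return;

        curandState rng;
        curand_init(seed, tid, 0, &rng);

        float x  = curand_normal(&rng);   // arbitrary initialization
        float lp = log_target(x);

        for (int t = 0; t < n_steps; ++t) {          // no host round-trips
            float prop    = x + 0.5f * curand_normal(&rng);  // step size assumed
            float lp_prop = log_target(prop);
            if (logf(curand_uniform(&rng)) < lp_prop - lp) {
                x  = prop;                           // accept proposal
                lp = lp_prop;
            }
            samples[t * n_chains + tid] = x;         // coalesced store of draw t
        }
    }

    int main() {
        const int n_chains = 4096, n_steps = 1000;
        float *samples;
        cudaMalloc(&samples, sizeof(float) * n_chains * n_steps);
        // One launch for the whole run, vs. one launch per step.
        metropolis_fused<<<(n_chains + 255) / 256, 256>>>(samples, n_steps,
                                                          n_chains, 42ull);
        cudaDeviceSynchronize();
        cudaFree(samples);
        return 0;
    }

In the conventional style, each of those loop iterations would be its own kernel launch plus a host synchronization; at microseconds per launch, that overhead dominates for small models, which is the latency I'm avoiding.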
I benchmarked against NumPyro, JAX, and GPyTorch on 15 different inference algorithms. I don't have a statistics background, so I made sure to track the quality metrics that experts care about.

My R-hat values are 0.9999-1.0003 (should be ~1.0), and ESS/second is up to 600x better on HMC. Some quality metrics favor the baseline implementations - I'm not claiming this beats everything on every dimension, just that it's significantly faster with comparable quality.
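For anyone else without a stats background: R-hat (Gelman-Rubin) compares between-chain and within-chain variance, and approaches 1 as the chains converge to the same distribution. In textbook form, for M chains of N draws each, with chain means \bar\theta_m, grand mean \bar\theta, and per-chain sample variances s_m^2:

    W = \frac{1}{M} \sum_{m=1}^{M} s_m^2,
    \qquad
    B = \frac{N}{M-1} \sum_{m=1}^{M} (\bar\theta_m - \bar\theta)^2,
    \qquad
    \hat{R} = \sqrt{\frac{\tfrac{N-1}{N} W + \tfrac{1}{N} B}{W}}

    \mathrm{ESS} = \frac{MN}{1 + 2 \sum_{t=1}^{\infty} \rho_t}

where \rho_t is the lag-t autocorrelation; ESS/second is just ESS divided by wall-clock sampling time. Note these are the classic definitions - the libraries I benchmarked against may use refinements like split-R-hat with rank normalization, so the exact estimators can differ.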
Tested on an RTX 4060 Laptop GPU.

Full benchmark results:
https://github.com/Aeowulf/nativeppl-results

Not sharing implementation details yet as I'm still figuring out what to make of this discovery. But I'd appreciate feedback on:

- Are these benchmarks meaningful/fair?
- What other algorithms or problem sizes should I test?
- Is there a market for faster probabilistic inference?