Show HN: DevIndex – Ranking 50,000 GitHub developers with a static JSON file
Hey HN,
I've always been frustrated by the lack of an accurate ranking of top open-source contributors on GitHub. The available lists either cap out early or are highly localized, completely missing developers with tens or even hundreds of thousands of contributions.
So I built DevIndex to rank the top 50,000 most active developers globally by their lifetime contributions.
From an engineering perspective, the constraint I imposed was: *no backend API.* I wanted to host this entirely on GitHub Pages, for free, which means the browser has to handle all 50,000 data-rich records directly.
Here is how we made it work:
1. *The Autonomous Data Factory (backend):*
Because GitHub's API has no "lifetime contributions" endpoint, we built a Node.js pipeline running on GitHub Actions. It uses a "Network Walker" spider to traverse the social graph (to break out of algorithmic filter bubbles) and an Updater that chunks GraphQL queries to prevent 502 timeouts. The pipeline continuously updates a single `users.jsonl` file.
```
*Privacy note:* We use a "Stealth Star" architecture for opt-outs. If a developer stars our opt-out repo, the pipeline cryptographically verifies them, instantly purges their data, and blocklists them. No emails required.
```
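The query chunking the Updater does can be sketched in a few lines. This is an illustrative stand-in, not the actual pipeline code: `chunk`, `buildBatchQuery`, and the batch size are our names and numbers; only `user(login:)` and `contributionsCollection.totalCommitContributions` are real GitHub GraphQL fields.

```javascript
// Split the full login list into batches small enough that one aliased
// GraphQL request finishes before the gateway returns a 502.
function chunk(items, size) {
    const batches = [];
    for (let i = 0; i < items.length; i += size) {
        batches.push(items.slice(i, i + size));
    }
    return batches;
}

// One aliased query per batch: a single request fetches many users' totals.
function buildBatchQuery(logins) {
    const fields = logins.map((login, i) =>
        `u${i}: user(login: "${login}") { contributionsCollection { totalCommitContributions } }`
    ).join('\n');
    return `query {\n${fields}\n}`;
}

const batches = chunk(['alice', 'bob', 'carol'], 2);
const query   = buildBatchQuery(batches[0]);
```

Tuning the batch size is the whole trick: bigger batches mean fewer requests, but past a point the aggregation work on GitHub's side starts timing out.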
2. *Engine-Level Streaming (O(1) memory parsing):*
You can't `JSON.parse()` a 23 MB JSONL file without freezing the UI. We built a Stream Proxy using `ReadableStream` and `TextDecoderStream` to parse the NDJSON incrementally, rendering the first 500 users instantly while the rest load in the background.
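The core of the incremental parse fits in a small class. This is a stand-alone sketch (the class name is ours, not the Neo.mjs API): in the real app the decoded text chunks arrive from `fetch()` via `ReadableStream` + `TextDecoderStream`; here we feed them by hand to show that records survive arbitrary chunk splits, and that only the current partial line is ever buffered.

```javascript
// Minimal incremental NDJSON parser: memory stays O(1) because the buffer
// only ever holds the trailing partial line, never the whole file.
class NDJSONParser {
    constructor(onRecord) {
        this.buffer   = '';
        this.onRecord = onRecord;
    }
    push(chunk) {
        this.buffer += chunk;
        const lines = this.buffer.split('\n');
        this.buffer = lines.pop(); // keep the trailing partial line
        for (const line of lines) {
            if (line.trim()) this.onRecord(JSON.parse(line));
        }
    }
    flush() { // call once the stream is done
        if (this.buffer.trim()) this.onRecord(JSON.parse(this.buffer));
        this.buffer = '';
    }
}

const records = [];
const parser  = new NDJSONParser(rec => records.push(rec));

// Chunk boundaries fall mid-record on purpose:
parser.push('{"login":"alice","rank":1}\n{"login":"b');
parser.push('ob","rank":2}\n');
parser.flush();
```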
3. *Turbo Mode & Virtual Fields:*
Instantiating 50,000 JS objects crushes memory. The store holds raw POJOs exactly as parsed. Complex calculated fields (like "Total Commits 2024") use prototype-based getters dynamically generated by a RecordFactory. Adding 60 new data columns adds 0 bytes of memory overhead per record.
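The zero-overhead claim follows from where the getters live: on one shared prototype rather than on each record. A minimal sketch, with illustrative field names rather than the actual RecordFactory API:

```javascript
// Derived columns are defined once as getters on a shared prototype; records
// stay raw POJOs and only gain a prototype link, so adding columns costs
// 0 bytes per record.
function createRecordPrototype(virtualFields) {
    const proto = {};
    for (const [name, fn] of Object.entries(virtualFields)) {
        Object.defineProperty(proto, name, {
            get() { return fn(this); } // computed on access, never stored
        });
    }
    return proto;
}

const proto = createRecordPrototype({
    totalCommits2024: rec => rec.monthlyCommits2024.reduce((a, b) => a + b, 0)
});

// A raw record exactly as parsed from the NDJSON stream:
const record = Object.setPrototypeOf(
    {login: 'alice', monthlyCommits2024: [10, 20, 30]},
    proto
);
```

The trade-off is recompute-on-access instead of store-once, which is the right deal when most of the 50k records are never scrolled into view.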
4. *The "Fixed-DOM-Order" Grid:*
We had to rewrite our underlying UI engine (Neo.mjs). Traditional VDOMs die on massive lists because scrolling triggers thousands of `insertBefore`/`removeChild` mutations. We implemented a strict DOM pool: the VDOM array length never changes, and rows leaving the viewport are recycled in place via hardware-accelerated CSS `translate3d`. A 60 fps vertical scroll across 50,000 records generates zero structural DOM mutations.
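The recycling scheme reduces to pure layout math. The constants and function below are our sketch, not Neo.mjs internals: each record maps to a stable pool slot, so scrolling only rewrites transforms and never reorders the DOM.

```javascript
// A constant pool of rows covers the viewport plus overscan; scrolling moves
// rows with translate3d instead of inserting/removing DOM nodes.
const ROW_HEIGHT = 32; // px, illustrative
const POOL_SIZE  = 40; // visible rows + overscan, illustrative

function layoutPool(scrollTop, totalRecords) {
    const first = Math.min(
        Math.floor(scrollTop / ROW_HEIGHT),
        Math.max(totalRecords - POOL_SIZE, 0)
    );
    const slots = new Array(POOL_SIZE);
    for (let offset = 0; offset < POOL_SIZE; offset++) {
        const recordIndex = first + offset;
        // record → slot mapping is stable (index mod pool size), so the
        // DOM order of the pooled rows never changes while scrolling:
        slots[recordIndex % POOL_SIZE] = {
            recordIndex,
            transform: `translate3d(0, ${recordIndex * ROW_HEIGHT}px, 0)`
        };
    }
    return slots;
}

const slots = layoutPool(64, 50000); // scrolled 2 rows down
```

Because each update is transform-only, the browser can composite it on the GPU without layout or paint of the rest of the grid.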
5. *Quintuple-Threaded Architecture:*
To keep sorting fast and render "living sparklines" in the cells, we aggressively split the workload across workers. The main thread *only* applies DOM updates. The App Worker handles the 50k dataset, streaming, and VDOM generation. A dedicated Canvas Worker renders the sparklines independently at 60 fps using `OffscreenCanvas`.
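The canvas handoff follows the standard `transferControlToOffscreen()` pattern; the worker file name and message shape in the comment below are our assumptions. The sparkline math itself is a pure function, which is exactly what makes it cheap to run off-thread:

```javascript
// Main-thread side of the handoff (illustrative message shape):
//
//   const offscreen = cellCanvas.transferControlToOffscreen();
//   canvasWorker.postMessage({cmd: 'register', canvas: offscreen}, [offscreen]);
//
// Inside the worker, drawing is ordinary 2D canvas work. A pure helper like
// this scales a commit series into y-coordinates for a fixed-height sparkline
// (y grows downward in canvas space, so larger values map to smaller y):
function sparklinePoints(values, height) {
    const max = Math.max(...values, 1); // avoid dividing by zero
    return values.map(v => height - Math.round((v / max) * height));
}

const points = sparklinePoints([0, 5, 10], 20);
```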
The entire backend pipeline, streaming UI, and core engine rewrite were completed in one month by myself and my AI agent.
Live app (see where you rank): [https://neomjs.com/apps/devindex](https://neomjs.com/apps/devindex)
Code / 26 architectural guides: [https://github.com/neomjs/neo/tree/dev/apps/devindex](https://github.com/neomjs/neo/tree/dev/apps/devindex)
Would love to hear feedback on the architecture, especially from anyone who has tackled "fat client" scaling issues or massive GraphQL aggregation!