Show HN: DevIndex – Ranking 50,000 GitHub developers with a static JSON file
Hey HN,
I've always been frustrated by the lack of an accurate ranking of top open-source contributors on GitHub. The available lists either cap out early or are highly localized, completely missing developers with tens or even hundreds of thousands of contributions.
So I built DevIndex to rank the top 50,000 most active developers globally by their lifetime contributions.
From an engineering perspective, the constraint I imposed was: *no backend API.* I wanted to host this entirely on GitHub Pages, for free, which means the browser has to handle all 50,000 data-rich records directly.
Here is how we made it work:
1. *The Autonomous Data Factory (backend):*
Because GitHub's API has no "lifetime contributions" endpoint, we built a Node.js pipeline running on GitHub Actions. It uses a "Network Walker" spider to traverse the social graph (to break out of algorithmic filter bubbles) and an Updater that chunks GraphQL queries to prevent 502 timeouts. The pipeline continuously updates a single `users.jsonl` file.
```
*Privacy note:* We use a "Stealth Star" architecture for opt-outs. If a developer stars our opt-out repo, the pipeline cryptographically verifies them, instantly purges their data, and blocklists them. No emails required.
```
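The query chunking the Updater does can be sketched in a few lines. This is an illustrative stand-in, not the actual pipeline code: `chunk`, `buildBatchQuery`, and the batch size are our names and numbers; only `user(login:)` and `contributionsCollection.totalCommitContributions` are real GitHub GraphQL fields.

```javascript
// Split the full login list into batches small enough that one aliased
// GraphQL request finishes before the gateway returns a 502.
function chunk(items, size) {
    const batches = [];
    for (let i = 0; i < items.length; i += size) {
        batches.push(items.slice(i, i + size));
    }
    return batches;
}

// One aliased query per batch: a single request fetches many users' totals.
function buildBatchQuery(logins) {
    const fields = logins.map((login, i) =>
        `u${i}: user(login: "${login}") { contributionsCollection { totalCommitContributions } }`
    ).join('\n');
    return `query {\n${fields}\n}`;
}

const batches = chunk(['alice', 'bob', 'carol'], 2);
const query   = buildBatchQuery(batches[0]);
```

Tuning the batch size is the whole trick: bigger batches mean fewer requests, but past a point the aggregation work on GitHub's side starts timing out.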
2. *Engine-Level Streaming (O(1) memory parsing):*
You can't `JSON.parse()` a 23 MB JSONL file without freezing the UI. We built a Stream Proxy using `ReadableStream` and `TextDecoderStream` to parse the NDJSON incrementally, rendering the first 500 users instantly while the rest load in the background.
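The core of the incremental parse fits in a small class. This is a stand-alone sketch (the class name is ours, not the Neo.mjs API): in the real app the decoded text chunks arrive from `fetch()` via `ReadableStream` + `TextDecoderStream`; here we feed them by hand to show that records survive arbitrary chunk splits, and that only the current partial line is ever buffered.

```javascript
// Minimal incremental NDJSON parser: memory stays O(1) because the buffer
// only ever holds the trailing partial line, never the whole file.
class NDJSONParser {
    constructor(onRecord) {
        this.buffer   = '';
        this.onRecord = onRecord;
    }
    push(chunk) {
        this.buffer += chunk;
        const lines = this.buffer.split('\n');
        this.buffer = lines.pop(); // keep the trailing partial line
        for (const line of lines) {
            if (line.trim()) this.onRecord(JSON.parse(line));
        }
    }
    flush() { // call once the stream is done
        if (this.buffer.trim()) this.onRecord(JSON.parse(this.buffer));
        this.buffer = '';
    }
}

const records = [];
const parser  = new NDJSONParser(rec => records.push(rec));

// Chunk boundaries fall mid-record on purpose:
parser.push('{"login":"alice","rank":1}\n{"login":"b');
parser.push('ob","rank":2}\n');
parser.flush();
```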
3. *Turbo Mode & Virtual Fields:*
Instantiating 50,000 JS objects crushes memory. The store holds raw POJOs exactly as parsed. Complex calculated fields (like "Total Commits 2024") use prototype-based getters dynamically generated by a RecordFactory. Adding 60 new data columns adds 0 bytes of memory overhead per record.
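The zero-overhead claim follows from where the getters live: on one shared prototype rather than on each record. A minimal sketch, with illustrative field names rather than the actual RecordFactory API:

```javascript
// Derived columns are defined once as getters on a shared prototype; records
// stay raw POJOs and only gain a prototype link, so adding columns costs
// 0 bytes per record.
function createRecordPrototype(virtualFields) {
    const proto = {};
    for (const [name, fn] of Object.entries(virtualFields)) {
        Object.defineProperty(proto, name, {
            get() { return fn(this); } // computed on access, never stored
        });
    }
    return proto;
}

const proto = createRecordPrototype({
    totalCommits2024: rec => rec.monthlyCommits2024.reduce((a, b) => a + b, 0)
});

// A raw record exactly as parsed from the NDJSON stream:
const record = Object.setPrototypeOf(
    {login: 'alice', monthlyCommits2024: [10, 20, 30]},
    proto
);
```

The trade-off is recompute-on-access instead of store-once, which is the right deal when most of the 50k records are never scrolled into view.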
4. *The "Fixed-DOM-Order" Grid:*
We had to rewrite our underlying UI engine (Neo.mjs). Traditional VDOMs die on massive lists because scrolling triggers thousands of `insertBefore`/`removeChild` mutations. We implemented a strict DOM pool: the VDOM array length never changes, and rows leaving the viewport are recycled in place via hardware-accelerated CSS `translate3d`. A 60 fps vertical scroll across 50,000 records generates zero structural DOM mutations.
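The recycling scheme reduces to pure layout math. The constants and function below are our sketch, not Neo.mjs internals: each record maps to a stable pool slot, so scrolling only rewrites transforms and never reorders the DOM.

```javascript
// A constant pool of rows covers the viewport plus overscan; scrolling moves
// rows with translate3d instead of inserting/removing DOM nodes.
const ROW_HEIGHT = 32; // px, illustrative
const POOL_SIZE  = 40; // visible rows + overscan, illustrative

function layoutPool(scrollTop, totalRecords) {
    const first = Math.min(
        Math.floor(scrollTop / ROW_HEIGHT),
        Math.max(totalRecords - POOL_SIZE, 0)
    );
    const slots = new Array(POOL_SIZE);
    for (let offset = 0; offset < POOL_SIZE; offset++) {
        const recordIndex = first + offset;
        // record → slot mapping is stable (index mod pool size), so the
        // DOM order of the pooled rows never changes while scrolling:
        slots[recordIndex % POOL_SIZE] = {
            recordIndex,
            transform: `translate3d(0, ${recordIndex * ROW_HEIGHT}px, 0)`
        };
    }
    return slots;
}

const slots = layoutPool(64, 50000); // scrolled 2 rows down
```

Because each update is transform-only, the browser can composite it on the GPU without layout or paint of the rest of the grid.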
5. *Quintuple-Threaded Architecture:*
To keep sorting fast and render "living sparklines" in the cells, we aggressively split the workload across workers. The main thread *only* applies DOM updates. The App Worker handles the 50k dataset, streaming, and VDOM generation. A dedicated Canvas Worker renders the sparklines independently at 60 fps using `OffscreenCanvas`.
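The canvas handoff follows the standard `transferControlToOffscreen()` pattern; the worker file name and message shape in the comment below are our assumptions. The sparkline math itself is a pure function, which is exactly what makes it cheap to run off-thread:

```javascript
// Main-thread side of the handoff (illustrative message shape):
//
//   const offscreen = cellCanvas.transferControlToOffscreen();
//   canvasWorker.postMessage({cmd: 'register', canvas: offscreen}, [offscreen]);
//
// Inside the worker, drawing is ordinary 2D canvas work. A pure helper like
// this scales a commit series into y-coordinates for a fixed-height sparkline
// (y grows downward in canvas space, so larger values map to smaller y):
function sparklinePoints(values, height) {
    const max = Math.max(...values, 1); // avoid dividing by zero
    return values.map(v => height - Math.round((v / max) * height));
}

const points = sparklinePoints([0, 5, 10], 20);
```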
The entire backend pipeline, streaming UI, and core engine rewrite were completed in one month by myself and my AI agent.
Live app (see where you rank): [https://neomjs.com/apps/devindex](https://neomjs.com/apps/devindex)
Code / 26 architectural guides: [https://github.com/neomjs/neo/tree/dev/apps/devindex](https://github.com/neomjs/neo/tree/dev/apps/devindex)
Would love to hear feedback on the architecture, especially from anyone who has tackled "fat client" scaling issues or massive GraphQL aggregation!