Hacker News Chinese Edition

I built a TypeScript iterator library (iterflow) and wanted to measure the actual heap difference between lazy and eager pipelines. This is the benchmark writeup.

The pipelines

Eager - standard array chaining:

    const data = Array.from(generateRows(500_000));
    const results = data
      .filter(r => r.active && r.value > threshold)
      .map(r => ({ id: r.id, score: r.value * 1.5 }))
      .slice(0, 10_000);

Each step produces a new intermediate array. .filter() allocates one, .map() allocates another, .slice() then discards most of both.

Lazy - same pipeline via iterflow:

    import { iter } from '@mathscapes/iterflow';
    const results = iter(generateRows(500_000))
      .filter(r => r.active && r.value > threshold)
      .map(r => ({ id: r.id, score: r.value * 1.5 }))
      .take(10_000)
      .toArray();

generateRows is a generator, yields one row at a time. Nothing is materialized until .toArray() pulls values through the chain. No intermediate arrays.

Results

Dataset: 500,000 rows
Pipeline: filter(active && value > 5000) → map(score) → take(10,000)

    native array (.filter → .map → .slice)   15.4 MB (min 15.2 MB, max 16.2 MB)
    iterflow (.filter → .map → .take)         5.8 MB (min 5.8 MB, max 5.8 MB)

Methodology

- Metric: heapUsed delta before and after the pipeline, not total process memory
- Both pipelines start from the same generator source - the delta measures pipeline allocations only, not source data
- --expose-gc with explicit gc() calls forced between every run
- One warm-up run discarded before measurement
- Median of 5 runs reported
- The native array run materializes the full 500k dataset into data before the pipeline runs. That allocation is not included in the delta - both approaches are measured on the same footing.

A few notes on the library

- iter() is a wrapper around ES2015 generators and the iterator protocol - no magic, just a fluent API so the call site looks identical to array chaining
- .sum() and .mean() are typed to Iterflow<number> only - calling them on a non-numeric iterator is a compile error
- Has some streaming statistical operations (.streamingMean(), .ewma(), .windowedMin()) for running aggregations without a separate accumulator
- Zero runtime dependencies

https://www.npmjs.com/package/@mathscapes/iterflow
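
For anyone wondering what "a wrapper around ES2015 generators and the iterator protocol" can look like, here is a minimal sketch of the technique. It is not iterflow's source code: LazySeq, filterGen, mapGen, takeGen and demoRows are hypothetical names made up for illustration, and the real library may be structured quite differently. The point is only that each stage is a generator pulling from the previous one, so no intermediate array is ever built.

```typescript
// Hypothetical sketch of a lazy, generator-backed pipeline (NOT iterflow's code).
function* filterGen<T>(source: Iterable<T>, keep: (value: T) => boolean): Generator<T> {
  for (const value of source) {
    if (keep(value)) yield value;
  }
}

function* mapGen<T, U>(source: Iterable<T>, fn: (value: T) => U): Generator<U> {
  for (const value of source) yield fn(value);
}

function* takeGen<T>(source: Iterable<T>, limit: number): Generator<T> {
  if (limit <= 0) return;
  let taken = 0;
  for (const value of source) {
    yield value;
    taken += 1;
    if (taken >= limit) return; // stop before pulling another item upstream
  }
}

// Fluent wrapper so the call site reads like array chaining.
class LazySeq<T> {
  private readonly source: Iterable<T>;
  constructor(source: Iterable<T>) {
    this.source = source;
  }
  filter(keep: (value: T) => boolean): LazySeq<T> {
    return new LazySeq(filterGen(this.source, keep));
  }
  map<U>(fn: (value: T) => U): LazySeq<U> {
    return new LazySeq(mapGen(this.source, fn));
  }
  take(limit: number): LazySeq<T> {
    return new LazySeq(takeGen(this.source, limit));
  }
  toArray(): T[] {
    return Array.from(this.source); // the only place values are materialized
  }
}

// Hypothetical stand-in for the post's generateRows, with a counter
// that records how many rows the source was actually asked for.
interface DemoRow { id: number; active: boolean; value: number; }
let produced = 0;
function* demoRows(count: number): Generator<DemoRow> {
  for (let id = 0; id < count; id++) {
    produced += 1;
    yield { id, active: id % 2 === 0, value: id };
  }
}

const firstThree = new LazySeq(demoRows(500_000))
  .filter(r => r.active && r.value > 100)
  .map(r => ({ id: r.id, score: r.value * 1.5 }))
  .take(3)
  .toArray();

console.log(firstThree, produced); // produced is 107, not 500,000
```

In this toy run the source generator is asked for only 107 rows, because take() ends the chain as soon as it has three results. An eager .filter() would have walked all 500,000 rows first.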
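
The methodology bullets can be sketched as a small harness. The post does not include the benchmark script, so treat this as a guess at its shape rather than the author's code: HeapProbe, measureHeapDelta and nodeProbe are hypothetical names, and the exact ordering of gc() calls and heapUsed reads is an assumption. The heap reader is passed in so the logic can be exercised without a real heap. The Node wiring at the bottom relies on process.memoryUsage().heapUsed and on the global gc() that Node only provides when started with --expose-gc.

```typescript
// Hypothetical harness: warm-up run discarded, gc() before each run,
// heapUsed delta per run, median/min/max over the measured runs.
interface HeapProbe {
  heapUsed(): number; // bytes of heap currently in use
  collect(): void;    // force a garbage collection
}

function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function measureHeapDelta(
  runPipeline: () => unknown,
  probe: HeapProbe,
  runs: number = 5,
): { median: number; min: number; max: number } {
  let keepAlive: unknown = null; // holds the result so it is still allocated when we read the heap

  const sample = (): number => {
    keepAlive = null;
    probe.collect(); // start each run from a collected heap
    const before = probe.heapUsed();
    keepAlive = runPipeline();
    const after = probe.heapUsed();
    return after - before;
  };

  sample(); // warm-up run, discarded
  const deltas: number[] = [];
  for (let i = 0; i < runs; i++) deltas.push(sample());
  keepAlive = null;

  return { median: median(deltas), min: Math.min(...deltas), max: Math.max(...deltas) };
}

// Node wiring. The `any` cast only keeps this sketch independent of Node's
// type declarations; `gc` is undefined unless Node runs with --expose-gc.
const runtime = globalThis as any;
const nodeProbe: HeapProbe = {
  heapUsed: () => runtime.process.memoryUsage().heapUsed,
  collect: () => {
    if (typeof runtime.gc === "function") runtime.gc();
  },
};

// Usage (Node, hypothetical pipeline function):
//   const stats = measureHeapDelta(() => runEagerPipeline(data), nodeProbe);
```

The result is in bytes, so divide by 1024 * 1024 to compare against the MB figures above. The sketch does not force a collection before the second heapUsed read, so garbage left over from the pipeline may be counted along with the live result. Whether the original benchmark does the same is not stated in the post.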

Benchmark results: lazy iteration vs. array chaining on 500,000 rows