Show HN: SPL – a multi-language pipeline and your own mini-FaaS on a single machine
Hello, Hackers!
I'd like to share a pet project my teammate and I have been working on. The core idea is to build a multi-language computational graph that also lets you quickly deploy a mini-FaaS (Function as a Service) platform on your local machine. In other words, you can easily mix and match code from various sources (and even different third-party tools) using a local framework and server. We're calling this project SPL.
How did this idea come about?
While working on a complex model, we realized we needed to combine code and utilities written in different languages, pulled from earlier projects. We had separate DB queries, several fundamentally different methods of preprocessing large datasets, a two-stage training process, plus final evaluation and validation of the resulting model.
We considered well-known tools like Airflow, Dagster, and Prefect. However, they felt a bit heavy for some simpler scenarios, and they weren't ideal for rapid prototyping. Besides, part of our dataset required lower-level processing in C++ rather than standard Python. That's how the idea for a pet project arose: something that would let us seamlessly bring together code that otherwise wouldn't play nicely, and also let us share our work within the team. Essentially, we had a few key goals:
1. Build a connected computational graph made up of functions or utilities, regardless of their language or dependencies.
2. Support both local and remote execution of these graphs (so teams can share their work).
3. Make it possible to run only part of a graph, keeping the state and results of previous steps, to simplify testing new approaches.
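Goal 3 (re-running only part of a graph while keeping earlier results) is essentially step-level memoization. The post doesn't show SPL's actual mechanism, so the sketch below is only an illustration of the idea in plain Python: a step wrapper that fingerprints its inputs and skips recomputation when nothing changed.

```python
import hashlib
import json

class CachedStep:
    """A pipeline step that skips recomputation when its inputs are unchanged."""

    def __init__(self, fn):
        self.fn = fn
        self.cache = {}  # input fingerprint -> stored result

    def __call__(self, *args):
        # Fingerprint the inputs; JSON keeps the demo simple
        # (a real system would also hash the code and upstream artifacts).
        key = hashlib.sha256(json.dumps(args, sort_keys=True).encode()).hexdigest()
        if key not in self.cache:
            self.cache[key] = self.fn(*args)
        return self.cache[key]

calls = []

@CachedStep
def preprocess(rows):
    calls.append("preprocess")  # track how often the body actually runs
    return [r * 2 for r in rows]

preprocess([1, 2, 3])  # computed
preprocess([1, 2, 3])  # served from cache
print(len(calls))      # the function body ran only once
```

With steps cached this way, editing one node and re-running the graph only re-executes the nodes downstream of the change.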
Some implementation details
A computational graph is a directed, connected graph with nodes (functions or utilities) and links between them (inputs and outputs). Each node takes input parameters, performs a specific task, and sends the result along to the next node.
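That structure can be sketched in a few lines of plain Python (this is an illustration of the concept, not SPL's API): nodes are named functions, links are references to upstream nodes, and execution resolves each node's inputs before running it.

```python
class Node:
    """One unit of work: a function plus the names of the nodes feeding it."""
    def __init__(self, name, fn, inputs=()):
        self.name, self.fn, self.inputs = name, fn, tuple(inputs)

def run_graph(nodes):
    """Evaluate every node, resolving upstream dependencies recursively."""
    by_name = {n.name: n for n in nodes}
    results = {}

    def evaluate(name):
        if name not in results:
            node = by_name[name]
            args = [evaluate(dep) for dep in node.inputs]  # inputs first
            results[name] = node.fn(*args)
        return results[name]

    for n in nodes:
        evaluate(n.name)
    return results

graph = [
    Node("load", lambda: [3, 1, 2]),
    Node("sort", sorted, inputs=["load"]),
    Node("top", lambda xs: xs[-1], inputs=["sort"]),
]
print(run_graph(graph)["top"])  # 3
```

Because every intermediate result lands in `results`, partial inspection and reuse of upstream outputs falls out naturally.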
Our approach pairs a language-specific framework, for actually running the code, with a server that handles the FaaS side, orchestrates the nodes in the graph, and takes care of passing artifacts around correctly.
We chose to build the first version of the SPL framework for Python, since that's the language we use most. The end result will be a library that lets you intuitively create computational graphs right from a Python notebook.
To manage the graph itself (adding or removing nodes, saving results, and running only certain parts), we settled on a mechanic similar to PyTorch, where you build a model by sequentially adding layers to `nn.Sequential()`.
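For readers unfamiliar with that mechanic: `nn.Sequential` is a container whose children run in order, each feeding the next, and which can be extended incrementally. The dependency-free sketch below mimics it with plain functions; it shows the mechanic being borrowed, not SPL's builder itself.

```python
class Sequential:
    """A Sequential-style container: steps run in order, each output feeding
    the next step's input (the mechanic nn.Sequential uses for layers)."""
    def __init__(self, *steps):
        self.steps = list(steps)

    def add(self, step):
        # Graphs can grow incrementally, the way layers are appended in PyTorch.
        self.steps.append(step)
        return self

    def __call__(self, x):
        for step in self.steps:
            x = step(x)
        return x

pipeline = Sequential(
    lambda rows: [r for r in rows if r is not None],  # clean
    lambda rows: sorted(rows),                        # order
)
pipeline.add(lambda rows: rows[:2])                   # extend later
print(pipeline([5, None, 1, 4]))  # [1, 4]
```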
A pocket-sized FaaS
One of the coolest features, in our opinion, is that you can quickly spin up your own mini-FaaS right on your machine. If your computer has internet access, your functions and graphs become instantly available to other users.
Right now, the SPL server supports:
- An HTTP API for remotely executing functions.
- Import and export of graphs in JSON format.
- Task coordination and distributed result caching.
- A simple web interface for viewing and editing graphs.
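The post doesn't show SPL's wire format, but JSON import/export of a graph might look roughly like the round-trip below. Every field name here (`nodes`, `runtime`, `inputs`) is an illustrative assumption, not SPL's actual schema.

```python
import json

# Hypothetical graph description; the shape is an assumption for illustration.
graph = {
    "name": "train-eval",
    "nodes": [
        {"id": "load",  "runtime": "python", "inputs": []},
        {"id": "prep",  "runtime": "cpp",    "inputs": ["load"]},
        {"id": "train", "runtime": "python", "inputs": ["prep"]},
    ],
}

exported = json.dumps(graph, indent=2)  # what an export endpoint could return
restored = json.loads(exported)         # what an import endpoint could accept
print(restored == graph)                # round-trip preserves the structure
```

A declarative description like this is what makes the other listed features possible: the web UI can render it, and the server can schedule nodes from it without importing the user's code.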
Possible use cases for SPL:
1. Local development: Build graphs and functions you can reuse across different projects without constantly copying code.
2. Production usage: Keep business logic and infrastructure separate, with easy zero-downtime updates.
3. Personal FaaS (including a function marketplace): Potentially publish your work for others, including monetization, delivering only results instead of the entire codebase.
4. Visualizing business processes: The server supports graph rendering and displays input and output ports, which can be handy for high-level project management.
Why am I writing this post?
We'd really love to hear what you think:
- Have you faced similar challenges?
- Would such a tool be useful for you?
- What features would you like to see in a project like this?
We're excited to discuss these ideas in the comments!