Show HN: Cactus – Ollama for smartphones
Hey HN, Henry and Roman here - we've been building a cross-platform framework for deploying LLMs, VLMs, embedding models and TTS models locally on smartphones.
Ollama enables deploying LLMs locally on laptops and edge servers; Cactus enables deploying on phones. Deploying directly on phones makes it possible to build AI apps and agents capable of phone use without breaking privacy, and supports real-time, low-latency inference; we have seen personalised RAG pipelines for users and more.
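For context on what an on-device RAG pipeline involves: a local embedding model turns the user's data into vectors, and retrieval is similarity ranking over them, all without leaving the phone. A minimal sketch of the retrieval core (`cosine` and `retrieve` are illustrative helpers, not Cactus API calls; in practice the embeddings would come from the on-device embedding model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, corpus, top_k=3):
    """Rank locally stored (text, embedding) pairs by similarity to the query."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

The retrieved snippets are then prepended to the prompt for the local LLM, so both halves of the pipeline stay on the device.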
Apple and Google recently moved into local AI models with the launch of Apple Foundation Frameworks and Google AI Edge respectively. However, both are platform-specific and only support each company's own models. To this end, Cactus:
- Is available in Flutter, React-Native and Kotlin Multiplatform for cross-platform developers, since most apps are built with these today.
- Supports any GGUF model you can find on Huggingface: Qwen, Gemma, Llama, DeepSeek, Phi, Mistral, SmolLM, SmolVLM, InternVLM, Jan Nano, etc.
- Accommodates models from FP32 down to 2-bit quantization, for better efficiency and less device strain.
- Ships MCP tool-calling so models can be performant and truly helpful (set reminders, search the gallery, reply to messages) and more.
- Falls back to big cloud models for complex, constrained or large-context tasks, ensuring robustness and high availability.
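To put the quantization point in numbers: weight memory scales linearly with bits per weight, so dropping from FP32 to a ~2-bit scheme is roughly a 12-13x reduction. A back-of-the-envelope sketch (the 1.7B parameter count and the 2.5 effective bits/weight figure are illustrative assumptions, not Cactus measurements):

```python
def model_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-only memory footprint in GiB: params * bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / (1024 ** 3)

# Illustrative 1.7B-parameter model:
fp32_gib = model_size_gib(1.7e9, 32)  # full precision: ~6.3 GiB
q2_gib = model_size_gib(1.7e9, 2.5)   # "2-bit" quants land near ~2.5 effective bits/weight: ~0.5 GiB
```

That difference is what makes phone deployment practical: the quantized model fits comfortably in a phone's RAM budget, while the FP32 version does not.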
It's completely open source. We'd love to have more people try it out and tell us how to make it great!
Repo: [https://github.com/cactus-compute/cactus](https://github.com/cactus-compute/cactus)