Mobile-MCP: Letting LLMs Autonomously Discover the Capabilities of Android Apps
Hi all,
We've been thinking about a core limitation in current mobile AI assistants:
Most systems (e.g., Apple Intelligence, Google Assistant-style integrations) rely on predefined schemas and coordinated APIs. Apps must explicitly implement the assistant's specification. This limits extensibility and keeps the ecosystem tightly controlled.
GUI-based agents (e.g., AppAgent, AutoDroid, droidrun), on the other hand, rely on screenshots and accessibility services, which gives them broad power but weak capability boundaries.
So we built Mobile-MCP, an Android-native realization of the Model Context Protocol (MCP) built on the Intent framework.
The key idea:
- Apps declare MCP-style capabilities (with natural-language descriptions) in their manifest; see the sketch after this list.
- An LLM-based assistant can autonomously discover all exposed capabilities on-device via the PackageManager.
- The LLM selects which API to call and generates parameters from the natural-language descriptions.
- Invocation happens through standard Android service binding / Intents.
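To make the declaration concrete, here is a minimal sketch of what a capability entry in an app's AndroidManifest.xml could look like. The action string, meta-data key, and service name are hypothetical placeholders for illustration, not the identifiers the spec actually defines:

```xml
<!-- Hypothetical sketch: the action string and meta-data key below are
     placeholders, not the identifiers defined by the Mobile-MCP spec. -->
<service
    android:name=".ForecastToolService"
    android:exported="true">
    <intent-filter>
        <action android:name="org.mobilemcp.action.TOOL" />
    </intent-filter>
    <meta-data
        android:name="mobile_mcp.capability"
        android:value="get_forecast(city: string, days: int): returns a weather forecast as JSON" />
</service>
```

The natural-language description in the meta-data value is what the assistant's LLM later reasons over when matching a user request to a capability.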
Unlike Apple/Android-style coordinated integrations:
- No predefined action domains.
- No centralized per-assistant schema.
- No per-assistant custom integration required.
- Tools can be added dynamically and evolve independently.
The assistant needs no prior knowledge of specific apps: it discovers and reasons over capabilities at runtime.
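To give a sense of how runtime discovery and invocation could look, here is a minimal Kotlin sketch. It assumes the hypothetical action string and meta-data key from the manifest sketch above; the real identifiers and invocation contract are defined in the spec:

```kotlin
import android.content.Context
import android.content.Intent
import android.content.pm.PackageManager

// Hypothetical identifiers for illustration; the real ones live in the spec.
const val ACTION_MCP_TOOL = "org.mobilemcp.action.TOOL"
const val META_CAPABILITY = "mobile_mcp.capability"

// One discovered capability: the service component plus its
// natural-language description.
data class Capability(
    val packageName: String,
    val serviceName: String,
    val description: String,
)

// Enumerate every service on the device that advertises an MCP-style
// capability, collecting the natural-language descriptions the LLM
// will reason over.
fun discoverCapabilities(context: Context): List<Capability> =
    context.packageManager
        .queryIntentServices(Intent(ACTION_MCP_TOOL), PackageManager.GET_META_DATA)
        .mapNotNull { info ->
            info.serviceInfo.metaData?.getString(META_CAPABILITY)?.let { desc ->
                Capability(info.serviceInfo.packageName, info.serviceInfo.name, desc)
            }
        }

// Once the LLM has chosen a capability and generated parameters, invoke
// it with an explicit Intent (startService here; the post also mentions
// service binding for calls that need to return results).
fun invokeCapability(context: Context, cap: Capability, params: Map<String, String>) {
    val intent = Intent(ACTION_MCP_TOOL).apply {
        setClassName(cap.packageName, cap.serviceName)
        params.forEach { (key, value) -> putExtra(key, value) }
    }
    context.startService(intent)
}
```

One practical caveat: since Android 11, package visibility filtering means the assistant would also need a <queries> declaration in its own manifest to enumerate other apps' services this way.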
We've built a working prototype and released the spec and a demo:
GitHub: https://github.com/system-pclub/mobile-mcp
Spec: https://github.com/system-pclub/mobile-mcp/blob/main/spec/mobile-mcp_spec_v1.md
Demo: https://www.youtube.com/watch?v=Bc2LG3sR1NY&feature=youtu.be
Paper: https://github.com/system-pclub/mobile-mcp/blob/main/paper/mobile_mcp.pdf
Curious what people think:
Is OS-native capability broadcasting plus LLM reasoning a more scalable path than fixed assistant schemas or GUI automation?
We'd love feedback from folks working on mobile agents, security, MCP tooling, or Android system design.