Hacker News (Chinese edition)

hCaptcha Challenger harnesses the spatial chain-of-thought (SCoT) reasoning capabilities of multimodal large language models (MLLMs) to construct an agentic workflow framework. This architecture empowers autonomous agents to perform zero-shot adaptation on diverse spatial-visual tasks through dynamic problem-solving workflows, eliminating the requirement for task-specific fine-tuning or additional training parameters.

Solving hCaptcha challenges with multimodal large language models