Qwen3-Max-Thinking Drops:36T 代币
阿里巴巴正式推出了Qwen3-Max-Thinking,这是一款万亿参数的MoE旗舰级大语言模型,预训练于36万亿个标记——是Qwen 2.5语料库的两倍——并且在19个权威基准测试中,已经与GPT-5.2-Thinking、Claude-Opus-4.5和Gemini 3 Pro等顶级模型相匹敌或超越。其两个核心技术突破真正使其脱颖而出。
首先是自适应工具调用:无需手动提示,它能够根据任务需求自主调用搜索引擎、记忆工具和代码解释器。这减少了幻觉现象并提升了实时问题解决能力;例如,编码任务会触发自动错误修正循环,而研究任务则将搜索与上下文综合结合。其次是测试时扩展(TTS):通过迭代洞察来优化推理,超越了标准并行采样,在关键基准测试中取得了可测量的提升——GPQA从90.3提升至92.8,LiveCodeBench v6从88.0跃升至91.4,而IMO-AnswerBench则从89.5上升至91.5。
值得注意的是,其预览版本在AIME 25和HMMT 25等艰难的数学竞赛中甚至达到了100%的准确率。该模型在网页和桌面演示中运行流畅,其API已准备好投入生产,并具备可调的思维预算(默认最高可达80K个标记),以平衡深度和速度。这不仅仅是一次增量更新——这是一次飞跃,缩小了现实世界学术和工程任务中推理与工具集成的差距。
了解更多信息,请访问:https://chat.qwen.ai
查看原文
Alibaba has officially launched Qwen3-Max-Thinking, a trillion-parameter MoE flagship LLM pretrained on 36T tokens—double the corpus of Qwen 2.5—and it’s already matching or outperforming top-tier models like GPT-5.2-Thinking, Claude-Opus-4.5, and Gemini 3 Pro across 19 authoritative benchmarks. Its two core technical breakthroughs are what truly set it apart.<p>First, Adaptive Tool Calling: No manual prompts are needed—it autonomously invokes search engines, memory tools, and code interpreters based on task demands. This cuts down on hallucinations and boosts real-time problem-solving; for instance, coding tasks trigger automatic error correction loops, while research tasks combine search with context synthesis. Second, Test-Time Scaling (TTS): It outperforms standard parallel sampling by refining reasoning through iterative insights, with measurable jumps in key benchmarks—GPQA rose from 90.3 to 92.8, LiveCodeBench v6 hit 91.4 from 88.0, and IMO-AnswerBench climbed to 91.5 from 89.5.<p>Notably, its preview version even achieved 100% accuracy in tough math contests like AIME 25 and HMMT 25. The model runs smoothly on web/desktop demos, and its API is production-ready with adjustable thinking budgets (up to 80K tokens by default) to balance depth and speed. This isn’t just an incremental update—it’s a leap that closes the gap in reasoning and tool integration for real-world academic and engineering tasks.<p>Check it out: https://chat.qwen.ai/