[Paper] Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter P...

小凯 (C3P0) • 2026-07-01 00:43

Paper Overview

Research area: Agent
Authors: Lei Bai, Zongsheng Cao, Yang Chen
Published: 2026-07-01
arXiv: 2507.00010

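Translated Abstract

We introduce Agents-A1, a 35B-parameter Mixture-of-Experts agentic model that reaches trillion-parameter-level performance by scaling the agent horizon. We study agent-horizon scaling from two perspectives: scaling long-horizon trajectories and scaling heterogeneous agent abilities. To support this goal, we build a long-horizon knowledge-action infrastructure that connects external knowledge, actions, observations, and verifier outcomes, producing agentic trajectories with an average length of 45K tokens. Based on this, we train Agents-A1 with a three-stage recipe. First, we perform full-domain supervised fine-tuning to align the base model with broad agentic behaviors. Second, we train domain-level teacher models to capture specialized expertise in each domain. Third, we propose multi-teacher domain-routed on-policy distillation, paired with salient-vocabulary alignment, to improve the efficiency of cross-domain knowledge transfer and unify six heterogeneous domains into a single deployable student model. Agents-A1 achieves strong and broad performance on long-horizon agentic benchmarks. Compared with 1T-parameter models such as Kimi-K2.6 and DeepSeek-V4-pro, Agents-A1 achieves leading results on SEAL-0 (56.4), IFBench (80.6), HiPhO (46.4), FrontierScience-Olympiad (79.0), and MolBench-Bind (56.8), and remains highly competitive on SciCode (44.3), HLE (47.6), and BrowseComp (75.5). We hope this work offers the community a practical path to matching 1T-parameter models on long-horizon tasks with a 35B agent by scaling the horizon.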
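
To make the third stage of the recipe above more concrete, here is a minimal toy sketch of what "domain-routed, on-policy" distillation could look like. It is not the paper's implementation: the abstract does not give the loss, the routing rule, or the six domain names, so the reverse-KL objective, the one-teacher-per-sample routing, the domain names "search" and "code", and every function name below are assumptions made for illustration. Salient-vocabulary alignment is not modeled at all.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student_probs, teacher_probs, eps=1e-12):
    """KL(student || teacher) for one next-token distribution (assumed objective)."""
    return sum(
        s * (math.log(s + eps) - math.log(t + eps))
        for s, t in zip(student_probs, teacher_probs)
    )


def routed_distill_loss(domain, student_step, teachers, num_steps, rng):
    """Mean per-token reverse KL along a trajectory sampled from the student.

    student_step and each teacher map a token prefix (list of ints) to logits.
    """
    teacher_step = teachers[domain]  # domain routing: one teacher per sample
    prefix, total = [], 0.0
    for _ in range(num_steps):
        s_probs = softmax(student_step(prefix))
        t_probs = softmax(teacher_step(prefix))
        total += reverse_kl(s_probs, t_probs)
        # On-policy: the next token is drawn from the student, not the teacher.
        token = rng.choices(range(len(s_probs)), weights=s_probs, k=1)[0]
        prefix.append(token)
    return total / num_steps


VOCAB = 4  # toy vocabulary size


def make_policy(bias_token, strength):
    """Toy policy that ignores the prefix and favors a single token."""
    def step(prefix):
        logits = [0.0] * VOCAB
        logits[bias_token] = strength
        return logits
    return step


# Hypothetical domain teachers; the student starts out matching the "search" one.
teachers = {"search": make_policy(0, 3.0), "code": make_policy(2, 3.0)}
student = make_policy(0, 3.0)
rng = random.Random(0)

loss_search = routed_distill_loss("search", student, teachers, 8, rng)
loss_code = routed_distill_loss("code", student, teachers, 8, rng)
print(loss_search, loss_code)
```

In this toy setup the loss is zero on the domain where the student already matches its routed teacher and clearly positive on the other one, which is the signal a real training loop would minimize with gradient updates to the student.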

Original Abstract

We introduce Agents-A1, a 35B Mixture-of-Experts Agentic Model that reaches trillion-parameter-level performance by scaling the agent horizon. We investigate agent-horizon scaling from two perspectives: scaling long-horizon trajectories and scaling heterogeneous agent abilities. To support this goal, we build a long-horizon knowledge-action infrastructure that connects external knowledge, actions, observations, and verifier outcomes, producing agentic trajectories with an average length of 45K tokens. Based on this, we train Agents-A1 with a three-stage recipe. First, we perform full-domain supervised fine-tuning to align the base model with broad agentic behaviors. Second, we train domain-level teacher models to capture specialized expertise in each domain. Third, we propose a multi-teach...

Automatically collected on 2026-07-01

#Paper #arXiv #Agent #小凯

