Paper Summary
Research area: ML  Authors: Xuhao Hu, Xi Zhang, Haiyang Xu, Kyle Qiao, Jingyi Yang, Xuanjing Huang, Jing Shao, Ming Yan, Jieping Ye  Published: 2026-05-12  arXiv: 2605.12481
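Chinese Abstract (English translation)
Computer Use Agents (CUAs) can act through atomic GUI actions (such as click and type) and through high-level tool calls (such as API-based file operations). This hybrid action space, however, often leaves them uncertain about when to continue with GUI actions and when to switch to tools, which leads to suboptimal execution paths. The difficulty has three sources: high-quality interleaved GUI-Tool trajectories are scarce, collecting real tool trajectories is costly and brittle, and there is no trajectory-level supervision for GUI-Tool path selection. In this paper, we propose ToolCUA, an end-to-end agent that learns optimal GUI-Tool path selection through a staged training paradigm. We first introduce an Interleaved GUI-Tool Trajectory Scaling Pipeline, which repurposes abundant static GUI trajectories and synthesizes a grounded tool library, yielding diverse GUI-Tool trajectories without manual engineering or real tool-trajectory collection. We then perform Tool-Guided GUI RFT, combining a warm-up SFT stage with single-turn RL to improve decisions at critical GUI-Tool switching points. Finally, we optimize ToolCUA with online Agentic RL in a high-fidelity GUI-Tool environment, guided by a tool-efficient path reward that encourages appropriate tool use and shorter execution paths. Experiments on OSWorld-MCP show that ToolCUA reaches 46.85% accuracy, a relative improvement of about 66% over the baseline, establishing a new SOTA among models of comparable scale. It also improves by 3.9% over the GUI-only setting, demonstrating effective GUI-Tool orchestration.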
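The abstract describes the tool-efficient path reward only at a high level: it should encourage appropriate tool use and shorter execution paths. The minimal Python sketch below shows one way such a reward could be shaped. It assumes a binary task-success signal, a fixed step budget, and a small flat bonus for using at least one tool. The names `Step`, `path_reward`, `max_steps`, `length_weight`, and `tool_bonus`, and all numeric values, are hypothetical. This is an illustration, not the paper's reward formula.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Step:
    kind: str  # hypothetical label: "gui" or "tool"
    name: str  # e.g. "click", "type", "write_file" (illustrative only)


def path_reward(
    steps: Sequence[Step],
    success: bool,
    max_steps: int = 15,
    length_weight: float = 0.5,
    tool_bonus: float = 0.1,
) -> float:
    """Hypothetical tool-efficient path reward (not the paper's formula)."""
    # Failed episodes earn nothing, so neither short paths nor tool calls
    # are rewarded unless the task is actually completed.
    if not success:
        return 0.0
    # Unused fraction of the step budget, clamped to [0, 1]:
    # shorter successful paths receive a larger bonus.
    efficiency = max(0.0, 1.0 - len(steps) / max_steps)
    # Small flat bonus if the successful path used at least one tool call.
    used_tool = any(step.kind == "tool" for step in steps)
    return 1.0 + length_weight * efficiency + (tool_bonus if used_tool else 0.0)


if __name__ == "__main__":
    gui_only = [Step("gui", "click"), Step("gui", "type")] * 3  # 6 GUI steps
    with_tool = [Step("gui", "click"), Step("tool", "write_file")]  # 2 steps
    print(path_reward(gui_only, success=True))
    print(path_reward(with_tool, success=True))
    print(path_reward(with_tool, success=False))
```

Every bonus in this sketch depends on task success, which is one simple way to stop an agent from calling tools for their own sake: a tool-heavy path that fails still scores zero. Under these assumed weights, a successful 2-step path that uses a tool scores higher than a successful 6-step GUI-only path.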
Original Abstract
Computer Use Agents (CUAs) can act through both atomic GUI actions, such as click and type, and high-level tool calls, such as API-based file operations, but this hybrid action space often leaves them uncertain about when to continue with GUI actions or switch to tools, leading to suboptimal execution paths. This difficulty stems from the scarcity of high-quality interleaved GUI-Tool trajectories, the cost and brittleness of collecting real tool trajectories, and the lack of trajectory-level supervision for GUI-Tool path selection. In this paper, we propose ToolCUA, an end-to-end agent designed to learn optimal GUI-Tool path selection through a staged training paradigm. We first introduce an Interleaved GUI-Tool Trajectory Scaling Pipeline that repurposes abundant static GUI trajectories a...
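A small data model makes the hybrid action space in the abstract more concrete. The minimal Python sketch below assumes two hypothetical action types: `GuiAction`, for atomic operations such as click or type, and `ToolCall`, for high-level calls such as a file-writing API. It returns the indices in an interleaved trajectory where the agent moves from one modality to the other, which are the points where it decided between continuing with GUI actions and switching to a tool. All class, function, tool, and file names (`switch_points`, `write_file`, `report.txt`) are illustrative and do not describe ToolCUA's actual interface.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class GuiAction:
    """Hypothetical atomic GUI action, e.g. a click or a typing event."""
    op: str
    args: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ToolCall:
    """Hypothetical high-level tool call, e.g. an API-based file operation."""
    tool: str
    args: Dict[str, Any] = field(default_factory=dict)


Action = Union[GuiAction, ToolCall]


def switch_points(trajectory: List[Action]) -> List[int]:
    """Return indices i where step i uses a different modality than step i-1."""
    return [
        i
        for i in range(1, len(trajectory))
        if type(trajectory[i]) is not type(trajectory[i - 1])
    ]


if __name__ == "__main__":
    traj: List[Action] = [
        GuiAction("click", {"x": 120, "y": 40}),
        GuiAction("type", {"text": "quarterly report"}),
        ToolCall("write_file", {"path": "report.txt", "content": "..."}),
        GuiAction("click", {"x": 300, "y": 200}),
    ]
    print(switch_points(traj))  # [2, 3]
```

An empty result means the trajectory never changes modality, as in a purely GUI-based run.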
--- *Auto-collected on 2026-05-14*
#Paper #arXiv #ML #小凯