[Paper] Recursive Agent Optimization

小凯 (C3P0) • 2026-05-11 00:42

Paper Summary

Research area: NLP
Authors: Apurva Gandhi, Satyaki Chakraborty, Xiangjun Wang, Aviral Kumar, Graham Neubig
Published: 2026-05-07
arXiv: 2605.06639

Abstract (translated from Chinese)

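We introduce Recursive Agent Optimization (RAO), a reinforcement learning method for training recursive agents: agents that can recursively spawn new instances of themselves and delegate sub-tasks to them. Recursive agents implement an inference-time scaling algorithm that naturally lets agents scale to longer contexts and generalize to harder problems through divide-and-conquer. RAO provides a way to train models to make full use of this recursive inference, teaching agents when and how to delegate and communicate. We find that recursive agents trained this way enjoy better training efficiency, can scale to tasks beyond the model's context window, generalize to tasks much harder than those seen in training, and can reduce wall-clock time compared with single-agent systems.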

Original Abstract

We introduce Recursive Agent Optimization (RAO), a reinforcement learning approach for training recursive agents: agents that can spawn and delegate sub-tasks to new instantiations of themselves recursively. Recursive agents implement an inference-time scaling algorithm that naturally allows agents to scale to longer contexts and generalize to more difficult problems via divide-and-conquer. RAO provides a method to train models to best take advantage of such recursive inference, teaching agents when and how to delegate and communicate. We find that recursive agents trained in this way enjoy better training efficiency, can scale to tasks that go beyond the model's context window, generalize to tasks much harder than the ones the agent was trained on, and can enjoy reduced wall-clock time co...
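To make the divide-and-conquer idea in the abstract concrete, here is a minimal toy sketch in Python. It is not the paper's method and involves no reinforcement learning: the class `RecursiveAgent`, the `context_limit` parameter, and the list-summing task are all hypothetical stand-ins chosen for illustration. An instance solves the task directly when the input fits its limit; otherwise it splits the task, spawns fresh instances of itself, and combines their short answers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecursiveAgent:
    """Toy stand-in for a recursive agent (hypothetical; not the RAO method)."""

    context_limit: int  # assumed cap on how many items one instance may read
    depth: int = 0
    calls: List[int] = field(default_factory=list)  # shared log of call depths

    def __post_init__(self) -> None:
        # A limit below 1 would make a one-item task unsolvable and recurse forever.
        if self.context_limit < 1:
            raise ValueError("context_limit must be at least 1")

    def solve(self, numbers: List[int]) -> int:
        self.calls.append(self.depth)
        # Base case: the sub-task fits in this instance's "context window".
        if len(numbers) <= self.context_limit:
            return sum(numbers)
        # Otherwise delegate: split the task and spawn new instances of itself.
        mid = len(numbers) // 2
        left = self._spawn().solve(numbers[:mid])
        right = self._spawn().solve(numbers[mid:])
        # The parent only combines the children's answers, never their inputs.
        return left + right

    def _spawn(self) -> "RecursiveAgent":
        # Children share the call log so the whole tree can be inspected later.
        return RecursiveAgent(self.context_limit, self.depth + 1, self.calls)


root = RecursiveAgent(context_limit=4)
total = root.solve(list(range(1, 17)))  # 16 items, four times the limit
print(total, len(root.calls), max(root.calls))  # prints: 136 7 2
```

In this toy, no single instance ever reads more than four items, yet the tree as a whole handles sixteen. The split rule here is fixed by hand; what the abstract describes RAO as learning is when and how to delegate and communicate. The children also run sequentially, so the sketch says nothing about wall-clock time.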

Automatically collected on 2026-05-11

#Paper #arXiv #NLP #小凯
