Paper Summary
Research area: NLP | Authors: Apurva Gandhi, Satyaki Chakraborty, Xiangjun Wang, Aviral Kumar, Graham Neubig | Published: 2026-05-07 | arXiv: 2605.06639
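Abstract (translated from the Chinese summary)
We introduce Recursive Agent Optimization (RAO), a reinforcement learning method for training recursive agents: agents that can recursively spawn new instances of themselves and delegate sub-tasks to them. Recursive agents implement an inference-time scaling algorithm that naturally lets an agent scale to longer contexts and generalize to harder problems through divide-and-conquer. RAO provides a way to train models to make full use of this recursive inference, teaching agents when and how to delegate and communicate. We find that recursive agents trained this way enjoy better training efficiency, can scale to tasks that exceed the model's context window, generalize to tasks much harder than those seen in training, and can reduce wall-clock time compared with single-agent systems.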
Original Abstract
We introduce Recursive Agent Optimization (RAO), a reinforcement learning approach for training recursive agents: agents that can spawn and delegate sub-tasks to new instantiations of themselves recursively. Recursive agents implement an inference-time scaling algorithm that naturally allows agents to scale to longer contexts and generalize to more difficult problems via divide-and-conquer. RAO provides a method to train models to best take advantage of such recursive inference, teaching agents when and how to delegate and communicate. We find that recursive agents trained in this way enjoy better training efficiency, can scale to tasks that go beyond the model's context window, generalize to tasks much harder than the ones the agent was trained on, and can enjoy reduced wall-clock time co...
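The divide-and-conquer delegation described in the abstract can be illustrated with a toy sketch. This is not RAO and not the authors' implementation: it involves no model and no reinforcement learning, and the split rule is hard-coded rather than learned. The names `CONTEXT_LIMIT`, `solve_directly`, and `recursive_agent` are hypothetical, and summing a list stands in for a real task. It only shows how an agent that cannot fit a task in its "context" might hand halves of it to fresh instances of itself and combine their answers.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a context window: the most items one
# agent instance is allowed to process directly.
CONTEXT_LIMIT = 4


@dataclass
class Result:
    answer: int
    agent_calls: int  # total agent instances used, including this one
    max_depth: int    # deepest recursion level reached


def solve_directly(items):
    """Stand-in for a single model call on a task that fits in context."""
    return sum(items)


def recursive_agent(items, depth=0):
    """Toy recursive agent: solve directly if the task fits, else delegate halves."""
    if len(items) <= CONTEXT_LIMIT:
        return Result(solve_directly(items), agent_calls=1, max_depth=depth)

    # Task is too large: split it and delegate each half to a new instance.
    mid = len(items) // 2
    left = recursive_agent(items[:mid], depth + 1)
    right = recursive_agent(items[mid:], depth + 1)

    # Combine the sub-agents' answers (the "communication" step in this toy).
    return Result(
        answer=left.answer + right.answer,
        agent_calls=1 + left.agent_calls + right.agent_calls,
        max_depth=max(left.max_depth, right.max_depth),
    )


if __name__ == "__main__":
    print(recursive_agent(list(range(16))))
```

In this sketch the decision of when to delegate is a fixed size threshold; per the abstract, RAO's contribution is training the agent to learn when and how to delegate and communicate, which this example does not attempt.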
--- *Automatically collected on 2026-05-11*
#paper #arXiv #NLP #小凯