Paper Summary
Research area: NLP | Authors: Junhao Shen, Teng Zhang, Xiaoyan Zhao | Published: 2025-05-09 | arXiv: 2505.07240
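Abstract (translated from the Chinese summary)
Large language model agents increasingly rely on external skills to solve complex tasks. Skills act as modular units that extend an agent's capabilities beyond what parametric memory alone can support. Existing methods assume that external skills either accumulate as persistent guidance or are internalized into the policy, which eventually leads to zero-skill inference. The authors argue that this assumption is too restrictive: because parametric capacity is limited and the marginal contribution of each skill is uneven, the optimal active skill set is non-monotonic and depends on both task and training stage. The paper proposes SLIM, a dynamic skill lifecycle management framework for agentic reinforcement learning (RL) that treats the active external skill set as a dynamic optimization variable updated jointly with policy learning. Specifically, SLIM estimates the marginal external contribution of each active skill through leave-one-skill-out validation, then applies three lifecycle operations: retaining high-value skills, retiring skills whose contribution has become negligible after sufficient exposure, and expanding the skill library when persistent failures reveal a gap in capability coverage. Experiments show that SLIM outperforms the best baseline by an average of 7.1 percentage points on ALFWorld and SearchQA. The results further indicate that policy learning and external skill retention are not mutually exclusive: some skills are absorbed into the policy while others continue to provide external value, which supports SLIM as a more general paradigm for skill-based agentic RL.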
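The lifecycle loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no thresholds or interfaces, so the names `leave_one_out` and `lifecycle_step`, the `evaluate` and `propose_skill` callbacks, and the parameters `eps`, `min_exposure`, and `patience` are all hypothetical, and the joint policy update is omitted entirely. The sketch only shows how leave-one-skill-out scores could drive the retain, retire, and expand decisions.

```python
from typing import Callable, Dict, FrozenSet, Optional, Set

# Hypothetical interface: maps a set of active skill names to a validation score.
Evaluator = Callable[[FrozenSet[str]], float]


def leave_one_out(active: Set[str], evaluate: Evaluator) -> Dict[str, float]:
    """Marginal contribution of each skill: score(all) - score(all minus that skill)."""
    full_score = evaluate(frozenset(active))
    return {s: full_score - evaluate(frozenset(active - {s})) for s in active}


def lifecycle_step(
    active: Set[str],
    exposure: Dict[str, int],
    evaluate: Evaluator,
    failure_streak: int,
    propose_skill: Callable[[], Optional[str]],
    eps: float = 0.01,      # assumed cutoff for a "negligible" contribution
    min_exposure: int = 3,  # assumed number of rounds counted as "sufficient exposure"
    patience: int = 5,      # assumed failure count treated as "persistent failure"
) -> Set[str]:
    """Apply one round of retain / retire / expand and return the next active set."""
    contributions = leave_one_out(active, evaluate)
    next_active: Set[str] = set()
    for skill in active:
        # Each round in which a skill is active counts as one unit of exposure.
        exposure[skill] = exposure.get(skill, 0) + 1
        negligible = contributions[skill] <= eps
        if negligible and exposure[skill] >= min_exposure:
            continue  # retire: low value even after sufficient exposure
        next_active.add(skill)  # retain: still valuable, or not yet exposed enough
    if failure_streak >= patience:
        # expand: persistent failures suggest a gap in capability coverage
        new_skill = propose_skill()
        if new_skill is not None:
            next_active.add(new_skill)
    return next_active
```

In this sketch a skill is retired only when both conditions hold (negligible contribution and enough exposure), which mirrors the abstract's wording that retirement happens "after sufficient exposure"; how SLIM actually measures exposure or generates new skills is not stated in the abstract.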
Original Abstract
Large language model agents increasingly rely on external skills to solve complex tasks, where skills act as modular units that extend their capabilities beyond what parametric memory alone supports. Existing methods assume external skills either accumulate as persistent guidance or are internalized into the policy, eventually leading to zero-skill inference. We argue this assumption is overly restrictive, since with limited parametric capacity and uneven marginal contribution across skills, the optimal active skill set is non-monotonic, task- and stage-dependent. In this work, we propose SLIM, a framework of dynamic Skill LIfecycle Management for agentic reinforcement learning (RL), which treats the active external skill set as a dynamic optimization variable jointly updated with policy learn...
--- *Automatically collected on 2026-05-13*
#paper #arXiv #NLP #小凯