[Paper] An Agency-Transferring Model-Free Policy Enhancement Technique

Paper overview

Field: ML | Authors: Anton Bolychev, Georgiy Malaniya, Sinan Ibrahim | Published: 2025-06-06 | arXiv: 2506.04858

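Abstract (translated from Chinese)

Training reinforcement learning (RL) policies from scratch is costly: it requires careful reward and environment design, extensive tuning, and substantial computation. Yet many control problems already have a functional but suboptimal baseline policy available. This paper proposes a method for embedding such a baseline into the RL training process, improving efficiency relative to training from scratch while producing a learning policy that outperforms the baseline. At each step, the method arbitrates between the baseline policy and a trainable learning policy, initially relying strongly on the baseline and then progressively transferring agency (decision authority) to the learning policy. By the end of training, the learning policy is a standalone neural network that runs without support from the baseline policy. The paper formalizes what it means for a baseline policy to be functional: under this policy, the agent reaches a goal set with high probability and remains in it. The proposed arbitration mechanism is designed to exploit this property so that a high goal-reaching rate is achieved from early in training. The theoretical analysis gives a formal account of this behavior under the stated assumptions and extends it to the final baseline-free stage, deriving an explicit lower bound on the goal-reaching probability of the standalone learning policy. Experiments on continuous-control benchmarks show that the method attains returns comparable to or higher than those of competing methods while maintaining the highest goal-reaching rate throughout training.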
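The per-step arbitration described above can be illustrated with a minimal Python sketch. The abstract does not state the paper's actual arbitration rule, so everything here is an assumption made for illustration: the names `baseline_probability` and `arbitrate`, the linear decay schedule, and the `p_start` parameter are hypothetical. The sketch only shows the general pattern of deferring mostly to the baseline early on and handing every decision to the learner by the end.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]
Action = List[float]
Policy = Callable[[State], Action]


def baseline_probability(step: int, total_steps: int, p_start: float = 0.95) -> float:
    """Hypothetical schedule: probability of deferring to the baseline.

    Decays linearly from p_start at step 0 to 0.0 at total_steps and stays
    at 0.0 afterwards, so the final stage is baseline-free.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    remaining = max(0.0, 1.0 - step / total_steps)
    return p_start * remaining


def arbitrate(
    state: State,
    baseline: Policy,
    learner: Policy,
    step: int,
    total_steps: int,
    rng: random.Random,
    p_start: float = 0.95,
) -> Tuple[Action, bool]:
    """Pick the acting policy for one step; return (action, used_baseline)."""
    # Assumed rule: a Bernoulli switch. The paper's mechanism may differ.
    if rng.random() < baseline_probability(step, total_steps, p_start):
        return baseline(state), True
    return learner(state), False


if __name__ == "__main__":
    rng = random.Random(0)
    baseline: Policy = lambda s: [-0.5 * s[0]]  # toy hand-written controller
    learner: Policy = lambda s: [0.0]           # stand-in for a neural network
    total = 1000
    for checkpoint in (0, 500, 1000):
        uses = sum(
            arbitrate([1.0], baseline, learner, checkpoint, total, rng)[1]
            for _ in range(200)
        )
        print(f"step {checkpoint}: baseline chosen in {uses} of 200 draws")
```

In a real training loop, the transitions collected under either policy would feed the learner's update. How the paper weights or schedules that transfer is not specified in the abstract.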

Original abstract

Training reinforcement learning (RL) policies from scratch is costly: it requires careful reward and environment design, extensive tuning, and substantial computation. Yet many control problems already have a functional but suboptimal policy available as a baseline. This paper proposes a method for embedding such a baseline into the RL training process, simultaneously improving training efficiency relative to from-scratch methods and producing a learning policy that outperforms the baseline. At each step, the method arbitrates between the baseline policy and a trainable learning policy, initially relying strongly on the baseline policy and then progressively transferring agency to the learning policy. By the end of training, the learning policy is a standalone neural network that operates ...

--- *Automatically collected on 2026-06-10*

#paper #arXiv #ML #小凯
