Paper Summary
Research area: Robotics
Authors: Lakshita Dodeja, Ondrej Biza, Shivam Vats, Stephen Hart, Stefanie Tellex, Robin Walters, Karl Schmeckpeper, Thomas Weng
Published: 2026-05-06
arXiv: 2605.05172
Abstract
Behavior Cloning (BC) has emerged as a highly effective paradigm for robot learning. However, BC lacks a self-guided mechanism for online improvement after demonstrations have been collected. Existing offline-to-online learning methods often cause policies to replace previously learned good actions due to a distribution mismatch between offline data and online learning. In this work, we propose Q2RL, Q-Estimation and Q-Gating from BC for Reinforcement Learning, an algorithm for efficient offline-to-online learning. Our method consists of two parts: (1) Q-Estimation extracts a Q-function from a BC policy using a few interaction steps with the environment, followed by online RL with (2) Q-Gating, which switches between BC and RL policy actions based on their respective Q-values to collect samples for RL policy training. Across manipulation tasks from D4RL and robomimic benchmarks, Q2RL outperforms SOTA offline-to-online learning baselines on success rate and time to convergence. Q2RL is efficient enough to be applied in an on-robot RL setting, learning robust policies for contact-rich and high precision manipulation tasks such as pipe assembly and kitting, in 1-2 hours of online interaction, achieving success rates of up to 100% and up to 3.75x improvement against the original BC policy. Code and video are available at https://pages.rai-inst.com/q2rl_website/.
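The Q-Gating step in the abstract picks, at each environment step, whichever of the BC action or the RL action has the higher Q-value, and uses that action to collect training samples for the RL policy. Below is a minimal sketch under stated assumptions. The abstract does not give implementation details, so the names `q_gate`, `collect_episode`, `bc_policy`, `rl_policy` and `q_fn` are hypothetical. The sketch assumes deterministic policies and a single critic `q_fn` that stands in for the Q-function produced by Q-Estimation. It also uses a hand-written one-step critic and toy 1-D dynamics in place of a learned critic and a real robot environment.

```python
from typing import Callable, List, Tuple

# Hypothetical type aliases for a 1-D toy problem (assumption, not from the paper).
Policy = Callable[[float], float]
QFunction = Callable[[float, float], float]


def q_gate(state: float, bc_policy: Policy, rl_policy: Policy,
           q_fn: QFunction) -> Tuple[float, str]:
    """Return whichever proposed action has the higher estimated Q-value."""
    a_bc = bc_policy(state)
    a_rl = rl_policy(state)
    # Ties fall back to the BC action, so demonstrated behaviour is kept unless
    # the critic strictly prefers the RL action (a design assumption here).
    if q_fn(state, a_rl) > q_fn(state, a_bc):
        return a_rl, "rl"
    return a_bc, "bc"


def collect_episode(s0: float, horizon: int, bc_policy: Policy,
                    rl_policy: Policy, q_fn: QFunction) -> List[tuple]:
    """Roll out gated actions in a toy environment and return the transitions
    (s, a, r, s_next, source) that would be added to the RL replay buffer."""
    transitions = []
    s = s0
    for _ in range(horizon):
        a, source = q_gate(s, bc_policy, rl_policy, q_fn)
        s_next = s + a                 # toy dynamics: the action is a displacement
        r = -(s_next - 1.0) ** 2       # toy reward: stay close to the goal at 1.0
        transitions.append((s, a, r, s_next, source))
        s = s_next
    return transitions


# Toy stand-ins (hypothetical): a cautious BC policy, an RL policy,
# and a hand-written one-step critic instead of a learned Q-function.
bc = lambda s: 0.5 * (1.0 - s)         # moves halfway to the goal
rl = lambda s: 1.0 - s                 # moves straight to the goal
q = lambda s, a: -(s + a - 1.0) ** 2   # negative squared distance after acting

print(q_gate(0.0, bc, rl, q))  # → (1.0, 'rl')
```

If the RL policy proposes a worse action (for example `lambda s: -1.0`), the gate returns the BC action `(0.5, 'bc')` instead. Breaking ties toward BC is one way to reflect the abstract's concern about replacing previously learned good actions. The paper may resolve ties, stochastic actions, or critic updates differently.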
Automatically collected on 2026-05-08
#Paper #arXiv #Robotics #小凯