[Paper] UniIntervene: Agentic Intervention for Efficient Real-World Reinf...

Paper Overview

Field: ML | Authors: Haoyuan Deng, Yitong Gao, Yudong Lin, Haichao Liu, Zhenyu Wu, Ziwei Wang | Published: 2026-06-10 | arXiv: 2606.12372

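Abstract (translated from the Chinese)

Human-in-the-loop reinforcement learning (HiL-RL) has become an effective paradigm for real-world robotic manipulation, enabling online policy improvement through human guidance. Current HiL-RL frameworks remain intervention-intensive, however: they rely on frequent human corrections to redirect the policy out of unproductive exploration, which incurs high labor cost and limits real-world scalability. To address this, we propose UniIntervene, an agentic intervention model that detects unproductive exploration and autonomously recovers the policy toward high-value states, taking over the bulk of interventions from human operators. Specifically, UniIntervene first performs future-conditioned action-value estimation, predicting the latent consequence of the current action and evaluating the value it induces, which yields a more stable progress signal. Building on this, a temporal value-risk critic aggregates recent value dynamics and triggers an intervention when the estimated value shows sustained stagnation or degradation. When an intervention is needed, UniIntervene retrieves a high-value recovery goal from a memory of past intervention episodes and produces executable corrective actions through a goal-conditioned recovery policy. In this way, UniIntervene turns intervention from passive human correction into a value-aware recovery process, enabling efficient real-world RL. Extensive experiments on diverse real-world manipulation tasks show that UniIntervene improves the average success rate by 8.6% while reducing human interventions by 57% relative to a state-of-the-art HiL-RL baseline.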
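The detect-then-recover loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no architecture or hyperparameters, so `StagnationTrigger`, `retrieve_recovery_goal`, the window size, the gain threshold, and the nearest-neighbor retrieval rule are all hypothetical stand-ins. The learned temporal value-risk critic is replaced here by a simple fixed-window rule, and the learned memory retrieval by a Euclidean nearest-neighbor lookup.

```python
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class StagnationTrigger:
    """Hypothetical stand-in for the temporal value-risk critic.

    Keeps the last `window` value estimates and fires when the value
    has not risen by at least `min_gain` across that window.
    """
    window: int = 5          # assumed history length
    min_gain: float = 0.01   # assumed minimum value gain over the window
    values: deque = field(default_factory=deque)

    def update(self, value: float) -> bool:
        """Record one value estimate; return True if intervention should fire."""
        self.values.append(value)
        if len(self.values) > self.window:
            self.values.popleft()
        if len(self.values) < self.window:
            return False  # not enough history to judge stagnation
        # Stagnation or degradation: newest value barely above (or below) oldest.
        return self.values[-1] - self.values[0] < self.min_gain


def retrieve_recovery_goal(state, memory, k=2):
    """Return the highest-value goal among the k stored states nearest to `state`.

    `memory` is a list of (goal_state, value) pairs, assumed to come from
    past intervention episodes. The distance metric and k are assumptions.
    """
    nearest = sorted(memory, key=lambda item: math.dist(state, item[0]))[:k]
    return max(nearest, key=lambda item: item[1])[0]


# Toy usage: value estimates rise, then plateau.
trigger = StagnationTrigger(window=3, min_gain=0.05)
fired = [trigger.update(v) for v in [0.1, 0.2, 0.3, 0.31, 0.30]]

memory = [((0.0, 0.0), 0.2), ((1.0, 0.0), 0.9), ((5.0, 5.0), 1.0)]
goal = retrieve_recovery_goal((0.2, 0.0), memory)
```

In this sketch the trigger fires only on the last step, once the three most recent estimates no longer show a gain of 0.05, and the retrieved goal is the higher-value of the two nearby stored states. In the paper's setting the retrieved goal would then be passed to a goal-conditioned recovery policy, which is omitted here.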

Original Abstract

Human-in-the-loop reinforcement learning (HiL-RL) has emerged as an effective paradigm for real-world robotic manipulation, enabling online policy improvement with human guidance. However, current HiL-RL frameworks remain intervention-intensive, relying on frequent human corrections to redirect the policy out of unproductive exploration, which incurs high labor cost and limits real-world scalability. To address this, we propose UniIntervene, an agentic intervention model that detects unproductive exploration and autonomously recovers the policy toward high-value states, taking over the bulk of interventions from human operators. Specifically, UniIntervene first performs future-conditioned action-value estimation, predicting the latent consequence of the current action and evaluating its in...

--- *Automatically collected on 2026-06-12*

#Paper #arXiv #ML #小凯
