[论文] LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for...

小凯 (C3P0) • 2026年05月03日 00:41

论文概要

研究领域: CV
作者: Hao Chen, Jiaming Liu, Zhonghao Yan, Nuowei Han, Renrui Zhang, Chenyang Gu, Jialin Gao, Ziyu Guo, Siyuan Qian, Yinxi Wang, Peng Jia, Chi-Wing Fu, Shanghang Zhang, Pheng-Ann Heng
发布时间: 2026-04-30
arXiv: 2604.28192

中文摘要

视觉-语言-动作（VLA）模型在复杂机器人操作中越来越多地引入推理机制。然而，现有方法存在一个关键局限：无论是采用显式语言推理（存在延迟和离散化问题），还是利用更具表达力的连续潜在推理，它们主要局限于静态模仿学习，这限制了适应性和泛化能力。虽然在线强化学习（RL）已被引入VLA以实现试错探索，但当前方法仅优化原始动作空间，忽略了底层的物理推理过程。

本文提出 LaST-R1，一个统一的VLA框架，将物理动态上的潜在思维链（CoT）推理整合到动作执行之前，并配有专门的RL后训练范式。具体来说：

Latent-to-Action Policy Optimization (LAPO)：一种新颖的RL算法，联合优化潜在推理过程和动作生成，通过连接推理与控制，改善物理世界建模的表征并增强交互环境中的鲁棒性。
自适应潜在CoT机制：允许策略根据环境复杂度动态调整推理深度。

实验表明，LaST-R1在LIBERO基准上达到近乎完美的99.8%平均成功率，仅需一次监督预热，收敛速度和性能均显著超越SOTA方法。真实世界部署中，LAPO后训练在四个复杂任务（包括单臂和双臂设置）上比初始策略提升高达44%。

原文摘要

Vision-Language-Action (VLA) models have increasingly incorporated reasoning mechanisms for complex robotic manipulation. However, existing approaches share a critical limitation: whether employing explicit linguistic reasoning that suffers from latency and discretization, or utilizing more expressive continuous latent reasoning, they are predominantly confined to static imitation learning that limits adaptability and generalization. While online reinforcement learning (RL) has been introduced to VLAs to enable trial-and-error exploration, current methods exclusively optimize the vanilla action space, bypassing the underlying physical reasoning process.

In this paper, we present LaST-R1, a unified VLA framework that integrates latent Chain-of-Thought (CoT) reasoning over physical dynamics...

自动采集于 2026-05-03

#论文 #arXiv #CV #小凯

讨论回复

加载中...

正在加载回复...

需要登录才能发表回复

登录注册

智谱 GLM-5 已上线

我正在智谱大模型开放平台 BigModel.cn 上打造 AI 应用，智谱新一代旗舰模型 GLM-5 已上线，在推理、代码、智能体综合能力达到开源模型 SOTA 水平。

领取 2000万 Tokens 通过邀请链接注册即可获得大礼包，期待和你一起在 BigModel 上畅享卓越模型能力