## Paper Summary
**Research Area**: CV
**Authors**: Peiyan Li, Yixiang Chen, Yuan Xu, et al.
**Published**: 2026-04-03
**arXiv**: [2604.03181](https://arxiv.org/abs/2604.03181)
## Abstract (translated from Chinese)
Robotic manipulation requires understanding both the 3D spatial structure of the environment and its temporal evolution, yet most existing policies overlook one or both. To address this, we introduce MV-VDP, a multi-view video diffusion policy that jointly models the 3D spatio-temporal state of the environment. The core idea is to simultaneously predict multi-view heatmap videos and RGB videos. Extensive experiments show that MV-VDP can successfully perform complex real-world tasks using only ten demonstration trajectories.
## Original Abstract
Robotic manipulation requires understanding both the 3D spatial structure of the environment and its temporal evolution, yet most existing policies overlook one or both. They typically rely on 2D visual observations and backbones pretrained on static image--text pairs, resulting in high data requirements and limited understanding of environment dynamics. To address this, we introduce MV-VDP, a multi-view video diffusion policy that jointly models the 3D spatio-temporal state of the environment. The core idea is to simultaneously predict multi-view heatmap videos and RGB videos, which 1) align the representation format of video pretraining with action finetuning, and 2) specify not only what actions the robot should take, but also how the environment is expected to evolve in response to tho...
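The abstract describes jointly predicting multi-view heatmap videos and RGB videos with a diffusion policy. As a toy illustration only (not the paper's implementation), the sketch below stacks the two modalities along the channel axis per camera view and runs a simplified DDPM-style denoising loop over the joint tensor; all shapes, the `toy_denoiser`, and the update rule are assumptions made for the example.

```python
import numpy as np

# Hypothetical sizes: V camera views, T video frames, H x W resolution.
V, T, H, W = 3, 8, 32, 32
C_RGB, C_HEAT = 3, 1  # RGB channels plus one action-heatmap channel per view

rng = np.random.default_rng(0)

# Joint state the policy denoises: RGB video and heatmap video stacked
# along the channel axis, per view -> shape (V, T, C_RGB + C_HEAT, H, W).
x_t = rng.standard_normal((V, T, C_RGB + C_HEAT, H, W))

def toy_denoiser(x, step):
    """Stand-in for the learned noise-prediction network (not the paper's model)."""
    return 0.1 * x  # pretend the predicted noise is proportional to the input

def ddpm_step(x, step, alpha=0.99):
    """One simplified DDPM-style update applied jointly to both modalities."""
    eps_hat = toy_denoiser(x, step)
    return (x - (1 - alpha) * eps_hat) / np.sqrt(alpha)

# Reverse diffusion: both modalities are refined together at every step,
# so the predicted actions (heatmaps) stay consistent with the predicted
# environment evolution (RGB frames).
for step in reversed(range(4)):
    x_t = ddpm_step(x_t, step)

rgb_video = x_t[:, :, :C_RGB]      # (V, T, 3, H, W): predicted appearance
heatmap_video = x_t[:, :, C_RGB:]  # (V, T, 1, H, W): decode to robot actions
print(rgb_video.shape, heatmap_video.shape)
```

The point of the stacked layout is that a single denoiser sees both streams, which is one plausible way to realize the abstract's claim of aligning video pretraining with action finetuning in the same representation format.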
---
*Auto-collected on 2026-04-06*
#Paper #arXiv #CV #小凯