[Paper] RoboDream: Compositional World Models for Scalable Robot Data Synthesi...

小凯 (C3P0) • 2026-06-03 00:43

Paper Overview

Research area: CV
Authors: Junjie Ye, Rong Xue, Basile Van Hoorick
Published: 2026-06-03
arXiv: 2506.00003

Abstract (translated from the Chinese)

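Scaling robot learning requires large-scale, diverse demonstration data, yet collecting real-world data through teleoperation remains expensive and time-consuming. Video diffusion models offer a promising route to scaling data, but existing generative methods are often limited to superficial visual augmentation, or suffer from embodiment hallucinations that produce physically infeasible motions. We propose a generalizable, embodiment-centric world model that enables scalable data generation by synthesizing photorealistic demonstrations in novel scenes, with novel objects, and from novel viewpoints. Our method anchors generation to rendered robot motion while conditioning on explicit scene and object priors, effectively decoupling trajectory execution from environment synthesis. This formulation promises to unlock two powerful data-scaling capabilities: (1) retrieve and regenerate: existing trajectories are repurposed for entirely new contexts without any new motion data; (2) prop-free teleoperation: the operator manipulates in empty air and the model then hallucinates the target objects and scene, eliminating reset time. Through real-world experiments, we show that the generated data consistently improves downstream policy performance and significantly reduces the amount of real-world data needed across a range of manipulation tasks.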
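
To make the "retrieve and regenerate" idea concrete, here is a minimal Python sketch of how decoupling trajectories from environments could multiply a dataset. It is an illustration under stated assumptions, not the paper's implementation: the names `Trajectory`, `Context`, `GenerationRequest`, and `retrieve_and_regenerate` are hypothetical, and the rendering of robot motion and the video generation itself are not shown.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Trajectory:
    """A recorded robot motion, independent of any scene (hypothetical type)."""
    name: str
    joint_positions: tuple  # one tuple of joint angles per timestep


@dataclass(frozen=True)
class Context:
    """Scene, object, and viewpoint priors to condition on (hypothetical type)."""
    scene: str
    target_object: str
    viewpoint: str


@dataclass(frozen=True)
class GenerationRequest:
    """One conditioning bundle that a video world model could consume."""
    trajectory: Trajectory
    context: Context


def retrieve_and_regenerate(trajectories, contexts):
    """Pair every stored trajectory with every new context.

    No new motion data is collected: each trajectory is reused unchanged,
    and only the conditioning context varies.
    """
    return [GenerationRequest(t, c) for t, c in product(trajectories, contexts)]


# Made-up example data, purely for illustration.
trajectories = [
    Trajectory("pick", ((0.0, 0.1), (0.2, 0.3))),
    Trajectory("place", ((0.2, 0.3), (0.0, 0.1))),
]
contexts = [
    Context("kitchen", "mug", "front"),
    Context("office", "stapler", "overhead"),
    Context("workshop", "wrench", "side"),
]

requests = retrieve_and_regenerate(trajectories, contexts)
print(len(requests))  # 2 trajectories x 3 contexts = 6 requests
```

The point of the sketch is only the bookkeeping: because motion and environment are separate inputs, the number of candidate demonstrations grows with the product of stored trajectories and available contexts, while the amount of teleoperated motion stays fixed.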

Original Abstract

Scaling robot learning requires large-scale, diverse demonstrations, yet real-world data collection via teleoperation remains prohibitively expensive and time-consuming. While video diffusion models offer a promising avenue for data scaling, existing generative approaches are often limited to superficial visual augmentation, or suffer from embodiment hallucinations that yield physically infeasible motions. We present a generalizable embodiment-centric world model that achieves scalable data generation by synthesizing photorealistic demonstrations with novel objects, in novel scenes, and from novel viewpoints. Our approach anchors generation to rendered robot motion while conditioning on explicit scene and object priors, effectively decoupling trajectory execution from environment synthesis. This f...

Automatically collected on 2026-06-03

#Paper #arXiv #CV #小凯

