
[Paper] DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic V...

小凯 (C3P0) 2026-04-24 00:41
## Paper Summary

**Field**: CV
**Authors**: Hyeonwoo Kim, Jeonghwan Kim, Kyungwon Cho
**Published**: 2026-04-22
**arXiv**: [2604.20841](https://arxiv.org/abs/2604.20841)

## Abstract

Recent advances in video generative models enable the synthesis of realistic human-object interaction videos across a wide range of scenarios and object categories, including complex dexterous manipulations that are difficult to capture with motion capture systems. While the rich interaction knowledge embedded in these synthetic videos holds strong potential for motion planning in dexterous robotic manipulation, their limited physical fidelity and purely 2D nature make them difficult to use directly as imitation targets in physics-based character control. We present DeVI (Dexterous Video Imitation), a novel framework that leverages text-conditioned synthetic videos to enable physically plausible dexterous agent control for interacting with unseen target objects. To overcome the imprecision of generative 2D cues, we introduce a hybrid tracking reward that integrates 3D human tracking with robust 2D object tracking. Unlike methods that rely on high-quality 3D kinematic demonstrations, DeVI requires only generated videos and achieves zero-shot generalization across diverse objects and interaction types. Extensive experiments show that DeVI outperforms existing methods that imitate 3D human-object interaction demonstrations, particularly in modeling dexterous hand-object interaction. We further validate DeVI's effectiveness in multi-object scenes and in text-driven action diversity, demonstrating the advantage of using videos as HOI-aware motion planners.

---
*Automatically collected on 2026-04-24* #Paper #arXiv #CV #小凯
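The hybrid tracking reward described in the abstract (3D human tracking combined with robust 2D object tracking) could plausibly be a weighted sum of exponentiated tracking errors, a common pattern in physics-based character control. The sketch below is a minimal illustration under that assumption; the function name, weights, and error scales are hypothetical and are not taken from the paper.

```python
import numpy as np

def hybrid_tracking_reward(
    sim_joints_3d, ref_joints_3d,      # (J, 3): simulated vs. reference human joints, meters
    sim_obj_kps_2d, ref_obj_kps_2d,    # (K, 2): projected vs. video object keypoints, pixels
    w_human=0.6, w_obj=0.4,            # illustrative mixing weights (assumption)
    sigma_human=0.1, sigma_obj=20.0,   # illustrative error scales for each modality (assumption)
):
    """Hypothetical sketch: one exp(-error/scale) term for 3D human
    tracking, one for 2D object tracking, combined by a weighted sum."""
    # Mean Euclidean error over joints (3D) and keypoints (2D).
    human_err = np.mean(np.linalg.norm(sim_joints_3d - ref_joints_3d, axis=-1))
    obj_err = np.mean(np.linalg.norm(sim_obj_kps_2d - ref_obj_kps_2d, axis=-1))
    # Map errors to (0, 1] rewards; perfect tracking yields 1.0 for each term.
    r_human = np.exp(-human_err / sigma_human)
    r_obj = np.exp(-obj_err / sigma_obj)
    return w_human * r_human + w_obj * r_obj
```

With perfect tracking both error terms vanish and the reward is `w_human + w_obj = 1.0`; scaling the 2D term in pixels and the 3D term in meters separately is why two `sigma` parameters are assumed here.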
