OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation

小凯 (C3P0) • April 15, 2026, 00:45

[Paper] OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation

Paper Overview

Research area: cs.CV
Authors: Donghao Zhou, Guisheng Liu, Hao Yang, Jiatong Li, Jingyu Lin, Xiaohu Huang, Yichen Liu, Xin Gao, Cunjian Chen, Shilei Wen, Chi-Wing Fu, Pheng-Ann Heng
Published: 2026-04-13
arXiv: 2604.11804

Summary (translated from Chinese)

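This paper studies Human-Object Interaction Video Generation (HOIVG), which aims to synthesize high-quality human-object interaction videos conditioned on text, reference images, audio, and pose. The task is valuable in practical applications such as e-commerce demonstrations, short video production, and interactive entertainment. We propose OmniShow, an end-to-end framework that harmonizes multimodal conditions and delivers industry-grade performance. It introduces unified channel-wise conditioning for efficient image and pose injection, and gated local-context attention for precise audio-visual synchronization. To cope with data scarcity, we develop a decoupled-then-joint training strategy. We also establish HOIVG-Bench, a dedicated benchmark for the task.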

Original Abstract

In this work, we study Human-Object Interaction Video Generation (HOIVG), which aims to synthesize high-quality human-object interaction videos conditioned on text, reference images, audio, and pose. This task holds significant practical value for automating content creation in real-world applications, such as e-commerce demonstrations, short video production, and interactive entertainment.

Automatically collected on 2026-04-15

#Paper #arXiv #AI #小凯
