Unleashing Guidance Without Classifiers for Human-Object Interaction Animation

Paper Overview

Research area: Computer Vision
Authors: Ziyin Wang, Sirui Xu, Chuan Guo, Bing Zhou, Jiangshan Gong, Jian Wang, Yu-Xiong Wang, Liang-Yan Gui
Published: 2026-03-26
arXiv: 2603.25734v1

Abstract (translated from the Chinese summary)

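Generating realistic human-object interaction (HOI) animations remains challenging because it requires jointly modeling dynamic human actions and diverse object geometries. Prior diffusion-based approaches often rely on hand-crafted contact priors or human-imposed kinematic constraints to improve contact quality. We propose LIGHT, a data-driven alternative in which guidance emerges from the denoising pace itself, reducing dependence on manually designed priors. Building on diffusion forcing, we factor the representation into modality-specific components and assign each component an individualized noise level with an asynchronous denoising schedule. In this paradigm, cleaner components guide noisier ones through cross-attention, yielding guidance without auxiliary classifiers. We find that this data-driven guidance is inherently contact-aware and can be strengthened by augmenting the training data with a broad range of synthetic object geometries, which encourages contact semantics to be invariant to geometric diversity. Extensive experiments show that pace-induced guidance captures the benefits of contact priors more effectively than conventional classifier-free guidance, while achieving higher contact fidelity, more realistic HOI generation, and stronger generalization to unseen objects and tasks.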
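The abstract does not specify the actual schedule, so the following is only a minimal Python sketch of the general diffusion-forcing idea that each component carries its own noise level. It assumes a DDPM-style linear beta schedule, two hypothetical modalities (`human` and `object`), and a fixed per-modality offset as the asynchronous schedule, so that at intermediate steps the human stream is cleaner than the object stream. The helper names `linear_alpha_bar`, `async_schedule`, and `add_noise`, the offset of 10, and the feature sizes are all hypothetical and are not taken from LIGHT.

```python
import numpy as np

def linear_alpha_bar(T: int) -> np.ndarray:
    """alpha_bar[t] for t = 0..T under a linear beta schedule; alpha_bar[0] = 1 means clean."""
    betas = np.linspace(1e-4, 0.02, T)
    return np.concatenate([[1.0], np.cumprod(1.0 - betas)])

def async_schedule(T: int, offsets: dict) -> list:
    """Per-step noise level for each modality (hypothetical offset scheme).

    A modality with a larger offset starts denoising later, so at every step it is
    at least as noisy as a modality with a smaller offset; all reach level 0 at the end.
    """
    span = T + max(offsets.values())
    return [
        {m: int(np.clip(T - (k - off), 0, T)) for m, off in offsets.items()}
        for k in range(span + 1)
    ]

def add_noise(x0: np.ndarray, t: int, alpha_bar: np.ndarray, rng) -> np.ndarray:
    """Forward-diffuse a clean component x0 to its own noise level t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

# Hypothetical setup: the human stream leads, the object stream lags by 10 levels.
T = 50
alpha_bar = linear_alpha_bar(T)
schedule = async_schedule(T, {"human": 0, "object": 10})
rng = np.random.default_rng(0)
clean = {
    "human": rng.standard_normal((60, 24)),   # 60 frames, 24-dim pose features (hypothetical)
    "object": rng.standard_normal((60, 9)),   # 60 frames, 9-dim object pose (hypothetical)
}
# At step 20 the human is at level 30 and the object at level 40, so the human is cleaner.
noisy_at_step_20 = {m: add_noise(clean[m], schedule[20][m], alpha_bar, rng) for m in clean}
```

A fixed offset is the simplest way to make one component consistently cleaner than another; the paper may use a different per-component schedule.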

Original Abstract

Generating realistic human-object interaction (HOI) animations remains challenging because it requires jointly modeling dynamic human actions and diverse object geometries. Prior diffusion-based approaches often rely on hand-crafted contact priors or human-imposed kinematic constraints to improve contact quality. We propose LIGHT, a data-driven alternative in which guidance emerges from the denoising pace itself, reducing dependence on manually designed priors. Building on diffusion forcing, we factor the representation into modality-specific components and assign individualized noise levels with asynchronous denoising schedules. In this paradigm, cleaner components guide noisier ones through cross-attention, yielding guidance without auxiliary classifiers. We find that this data-driven gu...
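The "cleaner components guide noisier ones through cross-attention" idea can be illustrated with a minimal NumPy sketch. It assumes single-head scaled dot-product cross-attention in which the noisier modality's tokens are the queries and the cleaner modality's tokens supply keys and values. The helper names, the residual update, the random weights, and the tie rule (no directed guidance when both noise levels are equal) are hypothetical choices for illustration; the abstract does not describe LIGHT's architecture at this level of detail.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, wq, wk, wv):
    """Single-head scaled dot-product attention: each query token attends over context tokens."""
    q, k, v = queries @ wq, context @ wk, context @ wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_query, n_context), rows sum to 1
    return weights @ v, weights

def pace_guided_step(human, obj, t_human, t_obj, wq, wk, wv):
    """Add a cross-attention residual to whichever modality is noisier (higher level)."""
    if t_human == t_obj:
        return human, obj, None  # hypothetical rule: no directed guidance at equal pace
    if t_human > t_obj:
        delta, w = cross_attention(human, obj, wq, wk, wv)  # noisier human queries cleaner object
        return human + delta, obj, w
    delta, w = cross_attention(obj, human, wq, wk, wv)      # noisier object queries cleaner human
    return human, obj + delta, w

# Usage with random features; d = 8 keeps the residual shapes compatible.
rng = np.random.default_rng(0)
d = 8
human = rng.standard_normal((16, d))  # 16 frames of hypothetical human features
obj = rng.standard_normal((16, d))    # 16 frames of hypothetical object features
wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
# The human is cleaner (level 10) than the object (level 40), so only the object is updated.
new_human, new_obj, attn = pace_guided_step(human, obj, 10, 40, wq, wk, wv)
```

In this sketch the direction of attention is decided purely by the current noise levels, so no separate classifier network is involved.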

--- *Auto-collected on 2026-03-28*

#Paper #arXiv #ComputerVision #小凯