[Paper] Lip Forcing: Few-Step Autoregressive Diffusion for Real-time Lip Synch...

Paper Overview

Field: CV
Authors: Paul Hyunbin Cho, Jinhyuk Jang, SeokYoung Lee, Joungbin Lee, Siyoon Jin, Heeseong Shin, Jung Yi, Yunjin Park, Chulmin Park, Seungryong Kim
Published: 2026-06-09
arXiv: 2606.11180

Summary

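Lip Forcing is, to the authors' knowledge, the first autoregressive diffusion method for video-to-video (V2V) lip synchronization. It distills a 14B audio-conditioned bidirectional video diffusion teacher into causal student models. At inference the students use only two denoising steps and no inference-time CFG, which enables real-time lip synchronization. The 1.3B student reaches 31 FPS, 17.6x faster than a bidirectional model of the same scale; the 14B student is 39.8x faster than the teacher. First-frame latency is reported as sub-millisecond.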
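To put the throughput figures in context, the short Python sketch below back-computes the baseline frame rate implied by the 1.3B student's numbers. It assumes the 17.6x speedup is a plain ratio of frames per second, which the summary does not state explicitly; the helper names are illustrative only.

```python
def implied_baseline_fps(student_fps: float, speedup: float) -> float:
    """Baseline FPS implied by a student FPS and a speedup factor.

    Assumption: speedup = student_fps / baseline_fps (a throughput ratio).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return student_fps / speedup


def per_frame_ms(fps: float) -> float:
    """Average time budget per frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Figures quoted in the summary for the 1.3B student.
student_fps = 31.0
speedup = 17.6

baseline_fps = implied_baseline_fps(student_fps, speedup)
print(f"implied bidirectional baseline: {baseline_fps:.2f} FPS")
print(f"student time per frame: {per_frame_ms(student_fps):.1f} ms")
print(f"baseline time per frame: {per_frame_ms(baseline_fps):.1f} ms")
```

Under that assumption the same-scale bidirectional model would run at under 2 FPS, which is why the student's 31 FPS clears typical real-time video rates while the baseline does not.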

Original Abstract

Diffusion-based lip synchronization models achieve strong visual quality and audio-visual alignment, but full-sequence bidirectional attention and many denoising steps make them impractical for real-time inference. We present Lip Forcing, to our knowledge the first autoregressive diffusion method for video-to-video (V2V) lip synchronization, which distills a 14B audio-conditioned bidirectional video diffusion teacher into causal students. At inference, the students generate each chunk in only two denoising steps without inference-time CFG, enabling real-time lip synchronization. A lip-sync-specific teacher-trajectory analysis reveals a CFG fidelity-sync tradeoff: no-CFG predictions favor reference fidelity, whereas CFG-guided predictions favor synchronization within a mid-trajectory band. ...
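The control flow described in the abstract (causal, chunk-by-chunk generation with two denoising steps per chunk and no CFG branch) can be sketched roughly as follows. This is a toy illustration, not the paper's implementation: `toy_denoiser`, the timestep schedule `(1.0, 0.5)`, the linear re-noising rule, and the use of plain float lists as "latents" are all assumptions made for the example, and the reference-video conditioning of a real V2V model is omitted.

```python
import random
from typing import Callable, List, Sequence

Chunk = List[float]  # toy stand-in for one chunk of video latents
Denoiser = Callable[[Chunk, Chunk, List[Chunk], float], Chunk]


def toy_denoiser(noisy: Chunk, audio: Chunk, context: List[Chunk], t: float) -> Chunk:
    """Hypothetical placeholder for a causal student network.

    A real model would predict a clean chunk from the noisy input, the
    audio features, and previously generated chunks. Here we simply return
    the audio features so the loop is runnable.
    """
    return list(audio)


def generate(audio_chunks: Sequence[Chunk], denoise: Denoiser,
             steps: Sequence[float] = (1.0, 0.5), seed: int = 0) -> List[Chunk]:
    """Generate chunks one at a time, each with len(steps) denoiser calls."""
    rng = random.Random(seed)
    history: List[Chunk] = []  # causal context: only past chunks are visible
    for audio in audio_chunks:
        x = [rng.gauss(0.0, 1.0) for _ in audio]  # start from pure noise
        for i, t in enumerate(steps):
            # Single conditional forward pass: no CFG, so no second
            # unconditional pass is needed at inference time.
            x0 = denoise(x, audio, history, t)
            t_next = steps[i + 1] if i + 1 < len(steps) else 0.0
            noise = [rng.gauss(0.0, 1.0) for _ in audio]
            # Re-noise the clean prediction to the next (lower) noise level;
            # at t_next == 0 this is just the clean prediction.
            x = [(1.0 - t_next) * c + t_next * n for c, n in zip(x0, noise)]
        history.append(x)
    return history


audio = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
video = generate(audio, toy_denoiser)
print(len(video), "chunks generated")
```

Because each chunk depends only on earlier chunks, the first chunk can be emitted before the rest of the audio is processed, which is the property that makes streaming output possible in this kind of design.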

--- *Automatically collected on 2026-06-11*

#Paper #arXiv #CV #小凯

