[Paper] Lip Forcing: Few-Step Autoregressive Diffusion for Real-time Lip Synch...
Paper Overview
Field: CV
Authors: Paul Hyunbin Cho, Jinhyuk Jang, SeokYoung Lee, Joungbin Lee, Siyoon Jin, Heeseong Shin, Jung Yi, Yunjin Park, Chulmin Park, Seungryong Kim
Published: 2026-06-09
arXiv: 2606.11180
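Summary
Lip Forcing is, to the authors' knowledge, the first autoregressive diffusion method for video-to-video (V2V) lip synchronization. It distills a 14B audio-conditioned bidirectional video diffusion teacher into causal student models. At inference the students use only two denoising steps and no inference-time CFG, which enables real-time lip synchronization. The 1.3B student reaches 31 FPS, 17.6x faster than a bidirectional model of the same scale, and the 14B student is 39.8x faster than the teacher. First-frame latency is sub-millisecond.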
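To make the quoted speedup concrete, the short sketch below back-computes the throughput of the same-scale bidirectional baseline from the two figures given in the summary (31 FPS and 17.6x). It assumes the speedup is a plain ratio of frames per second, which the summary does not state explicitly; the function names are illustrative and not taken from the paper.

```python
# Figures quoted in the summary; everything derived below is an estimate.
STUDENT_FPS_1_3B = 31.0   # reported throughput of the 1.3B student
SPEEDUP_1_3B = 17.6       # reported speedup over the same-scale bidirectional model


def implied_baseline_fps(student_fps: float, speedup: float) -> float:
    """Baseline throughput, ASSUMING speedup = student_fps / baseline_fps."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return student_fps / speedup


def per_frame_time_ms(fps: float) -> float:
    """Average steady-state time per frame in milliseconds at a given FPS."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    baseline = implied_baseline_fps(STUDENT_FPS_1_3B, SPEEDUP_1_3B)
    print(f"implied baseline throughput: {baseline:.2f} FPS")
    print(f"student time per frame: {per_frame_time_ms(STUDENT_FPS_1_3B):.1f} ms")
```

Under that assumption the baseline would run at roughly 1.76 FPS, well below typical video playback rates, while 31 FPS clears common 24 to 30 FPS targets.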
Original Abstract
Diffusion-based lip synchronization models achieve strong visual quality and audio-visual alignment, but full-sequence bidirectional attention and many denoising steps make them impractical for real-time inference. We present Lip Forcing, to our knowledge the first autoregressive diffusion method for video-to-video (V2V) lip synchronization, which distills a 14B audio-conditioned bidirectional video diffusion teacher into causal students. At inference, the students generate each chunk in only two denoising steps without inference-time CFG, enabling real-time lip synchronization. A lip-sync-specific teacher-trajectory analysis reveals a CFG fidelity-sync tradeoff: no-CFG predictions favor reference fidelity, whereas CFG-guided predictions favor synchronization within a mid-trajectory band. ...
--- *Automatically collected on 2026-06-11*
#Paper #arXiv #CV #小凯