Paper Summary
Research area: CV · Authors: Yanzuo Lu, Ronglai Zuo, Jiankang Deng · Published: 2026-05-16 · arXiv: 2505.08629
Translated Abstract
Causal autoregressive video diffusion models support real-time streaming generation by extrapolating future chunks from previously generated content. Distilling such generators from high-fidelity bidirectional teachers yields competitive few-step models, yet a persistent gap between the history distributions encountered during training and those arising at inference constrains generation quality over long horizons. We introduce the Real-time Autoregressive Video Extrapolation Network (RAVEN), a training-time test framework that repacks each self-rollout into an interleaved sequence of clean historical endpoints and noisy denoising states. This formulation aligns training attention with inference-time extrapolation and allows downstream chunk losses to supervise the history representations on which future predictions depend. We further propose Consistency Model Group Relative Policy Optimization (CM-GRPO), which reformulates consistency sampling steps as conditional Gaussian transitions and applies online reinforcement learning (RL) directly to that kernel, avoiding the Euler-Maruyama auxiliary process adopted in prior flow-model RL formulations. Experiments show that RAVEN surpasses recent causal video distillation baselines on quality, semantic, and dynamic-degree evaluations, and that combining CM-GRPO with RAVEN yields further gains.
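The "repacking" idea in the abstract can be sketched in a few lines: a self-rollout of N chunks is rearranged so that each clean, already-denoised history endpoint is followed by the noisy state of the chunk being denoised, mirroring the attention pattern seen at inference time. The function name and chunk representation below are illustrative assumptions, not the paper's actual implementation.

```python
def repack_rollout(clean_chunks, noisy_chunks):
    """Interleave clean history endpoints with noisy denoising states.

    clean_chunks[i]: the denoised endpoint of chunk i (usable as history).
    noisy_chunks[i]: the noisy state of chunk i during denoising.
    Returns [(kind, chunk), ...] in the interleaved training order.
    """
    assert len(clean_chunks) == len(noisy_chunks)
    seq = []
    for clean, noisy in zip(clean_chunks, noisy_chunks):
        seq.append(("clean", clean))  # history endpoint attended to as context
        seq.append(("noisy", noisy))  # state whose denoising loss is computed
    return seq

# Toy usage: 3 chunks identified by index only.
packed = repack_rollout([0, 1, 2], [0, 1, 2])
```

With real latents in place of the integer placeholders, the downstream chunk losses on each "noisy" entry back-propagate through the preceding "clean" entries, which is how the history representations receive supervision.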
Original Abstract
Causal autoregressive video diffusion models support real-time streaming generation by extrapolating future chunks from previously generated content. Distilling such generators from high-fidelity bidirectional teachers yields competitive few-step models, yet a persistent gap between the history distributions encountered during training and those arising at inference constrains generation quality over long horizons. We introduce the Real-time Autoregressive Video Extrapolation Network (RAVEN), a training-time test framework that repacks each self-rollout into an interleaved sequence of clean historical endpoints and noisy denoising states. This formulation aligns training attention with inference-time extrapolation and allows downstream chunk losses to supervise the history representations on which future predictions depend. We further propose Consistency Model Group Relative Policy Optimization (CM-GRPO), which reformulates consistency sampling steps as conditional Gaussian transitions and applies online reinforcement learning (RL) directly to that kernel, avoiding the Euler-Maruyama auxiliary process adopted in prior flow-model RL formulations. Experiments show that RAVEN surpasses recent causal video distillation baselines on quality, semantic, and dynamic-degree evaluations, and that combining CM-GRPO with RAVEN yields further gains.
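The two ingredients CM-GRPO combines are both standard and easy to sketch: (1) once a consistency sampling step is treated as a conditional Gaussian transition x' ~ N(mu, sigma^2), its log-density is available in closed form for a policy-gradient objective; (2) GRPO replaces a learned critic with group-relative advantages, normalizing rewards within a group of rollouts. The scalar Gaussian, the helper names, and the toy rewards below are illustrative assumptions, not the paper's code.

```python
import math

def gaussian_logprob(x_next, mu, sigma):
    """Log-density of a scalar conditional Gaussian transition N(mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) \
           - (x_next - mu) ** 2 / (2 * sigma ** 2)

def group_relative_advantages(rewards):
    """GRPO-style advantage: (reward - group mean) / group std."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var) + 1e-8  # epsilon guards against zero-variance groups
    return [(r - mean) / std for r in rewards]

# Toy group of 4 rollout rewards; advantages are mean-zero by construction,
# so better-than-average rollouts get positive weight on their log-probs.
adv = group_relative_advantages([1.0, 2.0, 3.0, 4.0])
```

In a full implementation each rollout's surrogate loss would be roughly `-advantage * sum_of_step_logprobs`, with `gaussian_logprob` evaluated at every consistency step; applying RL to this kernel directly is what lets CM-GRPO skip the Euler-Maruyama auxiliary process mentioned in the abstract.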
--- *Auto-collected on 2026-05-16*
#paper #arXiv #CV #小凯