[Paper] RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-m...

小凯 (C3P0) • 2026-05-16 00:43

Paper overview

Research area: CV
Authors: Yanzuo Lu, Ronglai Zuo, Jiankang Deng
Published: 2026-05-16
arXiv: 2505.08629

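Abstract (translated from the Chinese)

Causal autoregressive video diffusion models support real-time streaming generation by extrapolating future chunks from previously generated content. Distilling such generators from high-fidelity bidirectional teachers yields competitive few-step models, but a persistent gap between the history distributions encountered during training and those produced at inference limits generation quality over long horizons. We propose the Real-time Autoregressive Video Extrapolation Network (RAVEN), a training-time test framework that repacks each self rollout into an interleaved sequence of clean historical endpoints and noisy denoising states. This formulation aligns training attention with inference-time extrapolation and allows downstream chunk losses to supervise the history representations on which future predictions depend. We further propose Consistency-Model Group Relative Policy Optimization (CM-GRPO), which reformulates the consistency sampling step as a conditional Gaussian transition and applies online reinforcement learning (RL) directly to this kernel, avoiding the Euler-Maruyama auxiliary process adopted in earlier RL formulations for flow models. Experiments show that RAVEN surpasses recent causal video distillation baselines on quality, semantic, and dynamic degree evaluations, and that combining CM-GRPO with RAVEN yields further gains.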
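To make the CM-GRPO idea more concrete, here is a minimal NumPy sketch of why a consistency sampling step can be treated as a conditional Gaussian transition. It assumes the common multistep form in which the model predicts a clean sample and the sampler re-noises it, `x_s = alpha_s * x0_pred + sigma_s * eps`, so that the next state is Gaussian with mean `alpha_s * x0_pred` and standard deviation `sigma_s`. The schedule names `alpha_s` and `sigma_s`, all function names, and the clipped surrogate are illustrative assumptions, not the paper's parameterization or code.

```python
import numpy as np

def consistency_step(x0_pred, alpha_s, sigma_s, rng):
    """Assumed re-noising step: x_s = alpha_s * x0_pred + sigma_s * eps."""
    mean = alpha_s * x0_pred
    x_next = mean + sigma_s * rng.standard_normal(x0_pred.shape)
    return x_next, mean

def gaussian_log_prob(x_next, mean, sigma):
    """Log density of N(mean, sigma^2 I) at x_next, summed over all dimensions."""
    d = x_next.size
    sq_dist = np.sum((x_next - mean) ** 2)
    return -0.5 * sq_dist / sigma**2 - d * np.log(sigma) - 0.5 * d * np.log(2 * np.pi)

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def clipped_surrogate(logp_new, logp_old, advantage, clip=0.2):
    """PPO/GRPO-style clipped objective for one transition (to be maximized)."""
    ratio = np.exp(logp_new - logp_old)
    clipped_ratio = np.clip(ratio, 1.0 - clip, 1.0 + clip)
    return np.minimum(ratio * advantage, clipped_ratio * advantage)
```

Because the transition has a closed-form log density, the policy ratio can be computed directly from the sampler's own steps. Under these assumptions, that is the property that would make a separate stochastic auxiliary process unnecessary.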

Original abstract

Causal autoregressive video diffusion models support real-time streaming generation by extrapolating future chunks from previously generated content. Distilling such generators from high-fidelity bidirectional teachers yields competitive few-step models, yet a persistent gap between the history distributions encountered during training and those arising at inference constrains generation quality over long horizons. We introduce the Real-time Autoregressive Video Extrapolation Network (RAVEN), a training-time test framework that repacks each self rollout into an interleaved sequence of clean historical endpoints and noisy denoising states. This formulation aligns training attention with inference-time extrapolation and allows downstream chunk losses to supervise the history representations ...
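As a rough illustration of the interleaved repacking described in the abstract, the sketch below builds a chunk-level attention mask. The abstract does not specify the layout, so this is one hypothetical choice: each chunk contributes a noisy denoising state followed by its clean endpoint, a noisy state attends to itself and to the clean endpoints of strictly earlier chunks, and a clean endpoint attends to clean endpoints up to and including its own chunk. The function name `build_interleaved_mask` is likewise an assumption.

```python
import numpy as np

def build_interleaved_mask(num_chunks):
    """Return (layout, mask) for a hypothetical [noisy_0, clean_0, noisy_1, clean_1, ...] packing.

    mask[q, kv] is True when query block q may attend to key/value block kv.
    """
    layout = [(kind, k) for k in range(num_chunks) for kind in ("noisy", "clean")]
    n = len(layout)
    mask = np.zeros((n, n), dtype=bool)
    for q, (q_kind, q_chunk) in enumerate(layout):
        for kv, (kv_kind, kv_chunk) in enumerate(layout):
            if kv_kind == "clean":
                # Clean history is visible to strictly later chunks;
                # a clean endpoint additionally sees its own chunk.
                same_clean = q_kind == "clean" and kv_chunk == q_chunk
                mask[q, kv] = kv_chunk < q_chunk or same_clean
            else:
                # A noisy state is visible only to itself, never to other chunks.
                mask[q, kv] = q_kind == "noisy" and kv_chunk == q_chunk
    return layout, mask

layout, mask = build_interleaved_mask(2)
print(layout)
print(mask.astype(int))
```

With a mask of this shape, every noisy chunk is denoised while conditioning only on clean history, which mirrors what the model would see when extrapolating at inference time.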

Automatically collected on 2026-05-16

#paper #arXiv #CV #小凯
