[Paper] Video-Mirai: Autoregressive Video Diffusion Models Need Foresight

Paper Overview

Field: CV | Authors: Yonghao Yu, Lang Huang, Runyi Li, Zerun Wang, Toshihiko Yamasaki | Published: 2026-06-02 | arXiv: 2606.03971

Abstract (translated from the Chinese version)

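Causal video generators must predict from the past, but they need not learn only from the past. In streaming autoregressive video diffusion, every emitted segment becomes a commitment that future segments must preserve. Standard training, however, only requires each causal state to explain the present. This produces what we call a representation-level planning gap: a state that fits the current segment may discard the identity, layout, and motion information needed for a consistent future. We introduce Video-Mirai, a training-only method that closes this gap without changing causal inference: the generator rolls out causally, a frozen foresight encoder reads the completed rollout non-causally, and a lightweight predictor distills the resulting stop-gradient targets into the causal states. Future frames supervise the representations, never the generator's inputs. At inference time, the encoder and predictor are discarded, preserving the original architecture, per-step FLOPs, and KV-cache behavior.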

Original Abstract

Causal video generators must predict from the past, but they need not learn only from it. In streaming autoregressive video diffusion, each emitted segment becomes a commitment that future segments must preserve. Standard training, however, only asks each causal state to explain the present. This creates what we call a representation-level planning gap: states that fit the current segment may discard identity, layout, and motion information needed for a consistent future. We introduce Video-Mirai, a training-only method that closes this gap without changing causal inference: the generator rolls out causally, a frozen foresight encoder reads the completed rollout non-causally, and a lightweight predictor distills the resulting stopped-gradient targets into causal states. Future frames super...
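
The training recipe in the abstract (causal rollout, frozen non-causal foresight encoder, lightweight predictor, stop-gradient targets) can be illustrated with a deliberately tiny toy. This is a minimal sketch under stated assumptions, not the authors' implementation: frames are scalars, the generator is reduced to a causal state h_t = gain * mean(x_1..x_t), the frozen foresight encoder is assumed to be the mean of the current and all future frames, the predictor is a single learnable `scale`, the auxiliary loss is assumed to be mean squared error, and the generator's own diffusion objective is omitted. All function and parameter names are hypothetical. What it mirrors from the abstract: targets are computed from the completed rollout, enter only the loss as constants (the stop-gradient), and never feed back into the causal states.

```python
"""Toy foresight-distillation sketch (hypothetical names, assumed loss)."""

from typing import List, Tuple


def causal_means(frames: List[float]) -> List[float]:
    """m_t = mean(x_1..x_t): only past and present frames are visible."""
    means, total = [], 0.0
    for t, x in enumerate(frames, start=1):
        total += x
        means.append(total / t)
    return means


def infer_states(frames: List[float], gain: float) -> List[float]:
    """Causal states h_t = gain * m_t. This is the only path kept at inference."""
    return [gain * m for m in causal_means(frames)]


def foresight_targets(frames: List[float]) -> List[float]:
    """Frozen non-causal 'encoder' (assumed form): z_t = mean(x_t..x_T).

    It reads the completed rollout, has no trainable parameters, and its
    outputs are used as plain constants, which acts as the stop-gradient.
    """
    T = len(frames)
    return [sum(frames[t:]) / (T - t) for t in range(T)]


def foresight_loss_and_grads(
    frames: List[float], gain: float, scale: float
) -> Tuple[float, float, float]:
    """Assumed loss L = mean_t (scale * h_t - sg(z_t))^2, plus dL/dgain and dL/dscale."""
    m = causal_means(frames)
    z = foresight_targets(frames)  # stop-gradient targets: never differentiated
    T = len(frames)
    loss = d_gain = d_scale = 0.0
    for m_t, z_t in zip(m, z):
        h_t = gain * m_t  # causal state (generator side)
        r = scale * h_t - z_t  # predictor output minus target
        loss += r * r / T
        d_scale += 2.0 * r * h_t / T  # gradient reaches the predictor...
        d_gain += 2.0 * r * scale * m_t / T  # ...and the causal state, but not z
    return loss, d_gain, d_scale


def train(
    frames: List[float],
    gain: float = 0.5,
    scale: float = 0.5,
    lr: float = 0.02,
    steps: int = 300,
) -> Tuple[float, float, List[float]]:
    """Plain gradient descent on the auxiliary loss; returns params and loss history."""
    history = []
    for _ in range(steps):
        loss, d_gain, d_scale = foresight_loss_and_grads(frames, gain, scale)
        history.append(loss)
        gain -= lr * d_gain
        scale -= lr * d_scale
    return gain, scale, history


if __name__ == "__main__":
    rollout = [1.0, 2.0, 3.0, 4.0]  # stand-in for one causally generated rollout
    gain, scale, history = train(rollout)
    print(f"loss {history[0]:.3f} -> {history[-1]:.3f}")
    print("inference states:", infer_states(rollout, gain))
```

`infer_states` uses only the causal path, echoing the abstract's point that the encoder and predictor are dropped at inference; in this toy the predictor's `scale` is simply unused there. Gradients are written out by hand to keep the sketch dependency-free. In a real model the rollout itself depends on the generator, so an autodiff framework's stop-gradient primitive (for example PyTorch's `Tensor.detach`) would be needed to keep gradients out of the targets.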

--- *Auto-collected on 2026-06-04*

#Paper #arXiv #CV #小凯
