Paper summary
Research area: Computer vision (CV)
Authors: Yonghao Yu, Lang Huang, Runyi Li, Zerun Wang, Toshihiko Yamasaki
Published: 2026-06-02
arXiv: 2606.03971
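Chinese abstract (translated)
Causal video generators must predict from the past, but they need not learn only from it. In streaming autoregressive video diffusion, each emitted segment becomes a commitment that future segments must preserve. Standard training, however, only asks each causal state to explain the present. This creates what the authors call a representation-level planning gap: a state that fits the current segment may discard the identity, layout, and motion information needed for a consistent future. The authors introduce Video-Mirai, a training-only method that closes this gap without changing causal inference: the generator rolls out causally, a frozen foresight encoder reads the completed rollout non-causally, and a lightweight predictor distills the resulting stopped-gradient targets into the causal states. Future frames supervise representations and are never fed to the generator as inputs. At inference time, the encoder and predictor are discarded, preserving the original architecture, per-step FLOPs, and KV-cache behavior.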
Original abstract
Causal video generators must predict from the past, but they need not learn only from it. In streaming autoregressive video diffusion, each emitted segment becomes a commitment that future segments must preserve. Standard training, however, only asks each causal state to explain the present. This creates what we call a representation-level planning gap: states that fit the current segment may discard identity, layout, and motion information needed for a consistent future. We introduce Video-Mirai, a training-only method that closes this gap without changing causal inference: the generator rolls out causally, a frozen foresight encoder reads the completed rollout non-causally, and a lightweight predictor distills the resulting stopped-gradient targets into causal states. Future frames super...
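The training scheme described in the abstract can be illustrated with a deliberately tiny toy. This is a sketch under stated assumptions, not the authors' implementation: segments are scalars, the "causal state" is a running mean, the "frozen foresight encoder" is a whole-rollout mean, and the "predictor" is a single scalar weight. All function names (`causal_states`, `foresight_targets`, `foresight_loss`, `train_predictor`) are hypothetical. The points it tries to show are that targets may look at the whole rollout while states may not, and that targets are treated as constants, which plays the role of the stop-gradient.

```python
from typing import List, Tuple


def causal_states(segments: List[float]) -> List[float]:
    """Toy causal state: state t is the running mean of segments 0..t only."""
    states, total = [], 0.0
    for t, x in enumerate(segments):
        total += x
        states.append(total / (t + 1))
    return states


def foresight_targets(segments: List[float]) -> List[float]:
    """Toy frozen foresight encoder: every position reads the whole rollout."""
    full_mean = sum(segments) / len(segments)
    return [full_mean for _ in segments]


def foresight_loss(
    states: List[float], targets: List[float], w: float
) -> Tuple[float, float, List[float]]:
    """MSE between the predictor output w * state and the fixed targets.

    Gradients are returned for the predictor weight and for the causal
    states. No gradient is computed for the targets: they are constants,
    which is the toy stand-in for stop-gradient.
    """
    n = len(states)
    errors = [w * h - z for h, z in zip(states, targets)]
    loss = sum(e * e for e in errors) / n
    grad_w = sum(2.0 * e * h for e, h in zip(errors, states)) / n
    grad_states = [2.0 * e * w / n for e in errors]
    return loss, grad_w, grad_states


def train_predictor(segments: List[float], steps: int = 200, lr: float = 0.05) -> float:
    """Gradient descent on the predictor weight only (assumption: a real
    setup would also update the generator through grad_states)."""
    states = causal_states(segments)
    targets = foresight_targets(segments)
    w = 0.0
    for _ in range(steps):
        _, grad_w, _ = foresight_loss(states, targets, w)
        w -= lr * grad_w
    return w


rollout = [1.0, 2.0, 3.0, 4.0]
states = causal_states(rollout)        # used in training and at inference
targets = foresight_targets(rollout)   # used in training only
initial_loss, _, _ = foresight_loss(states, targets, 0.0)
w = train_predictor(rollout)
final_loss, _, _ = foresight_loss(states, targets, w)
```

At inference only `causal_states` would be called, so the foresight encoder and the predictor add no per-step cost, which mirrors the abstract's claim that both are discarded after training.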
Auto-collected on 2026-06-04
#paper #arXiv #CV #小凯