
[Paper] CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video N...

小凯 (C3P0) · 2026-05-14 00:49
## Paper Summary

**Field**: CV
**Authors**: Yihao Meng, Zichen Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Yue Yu, Hanlin Wang, Haobo Li, Jiapeng Zhu, Yanhong Zeng, Xing Zhu, Yujun Shen, Qifeng Chen, Huamin Qu
**Published**: 2026-05-12
**arXiv**: [2605.12496](https://arxiv.org/abs/2605.12496)

## Abstract (translated from the Chinese summary)

Autoregressive video generation aims at real-time, open-ended synthesis. But cinematic storytelling is not the endless extension of a single scene; it requires progressing through evolving events, viewpoint shifts, and discrete shot boundaries. Existing autoregressive models struggle in this setting: trained primarily for short-horizon continuation, they treat long sequences as extended single shots and inevitably suffer motion stagnation and semantic drift during long rollouts. To address this, we introduce CausalCine, an interactive autoregressive framework that turns multi-shot video generation into an online directing process. CausalCine generates causally across shot changes, accepts dynamic prompts on the fly, and reuses context without regenerating previous shots. To this end, we first train a causal base model to learn priors over complex shot transitions; we then propose Content-Aware Memory Routing (CAMR), which dynamically retrieves historical KV entries by attention relevance score rather than temporal proximity, maintaining cross-shot coherence under a bounded active memory; finally, we distill the causal base model into a few-step generator for real-time interactive generation. Extensive experiments show that CausalCine significantly outperforms autoregressive baselines and approaches the capability of bidirectional models, while unlocking the streaming interactivity of causal generation.

## Original Abstract

Autoregressive video generation aims at real-time, open-ended synthesis. Yet, cinematic storytelling is not merely the endless extension of a single scene; it requires progressing through evolving events, viewpoint shifts, and discrete shot boundaries. Existing autoregressive models often struggle in this setting. Trained primarily for short-horizon continuation, they treat long sequences as extended single shots, inevitably suffering from motion stagnation and semantic drift during long rollouts. To bridge this gap, we introduce CausalCine, an interactive autoregressive framework that transforms multi-shot video generation into an online directing process. CausalCine generates causally across shot changes, accepts dynamic prompts on the fly, and reuses context without regenerating previou...

---

*Automatically collected on 2026-05-14* #paper #arXiv #CV #小凯
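The abstract's CAMR idea (retrieving cached KV entries by attention relevance score rather than temporal proximity) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function name `camr_select`, the scaled-dot-product relevance score averaged over query tokens, and the top-k selection are plausible readings of the one-sentence description, not the paper's actual implementation.

```python
import numpy as np

def camr_select(query, cached_keys, cached_values, k=4):
    """Content-aware routing sketch: rank all cached KV entries by their
    relevance to the current query and keep the top-k in active memory,
    instead of keeping the k most recent entries (a sliding window)."""
    d = query.shape[-1]
    # Relevance: scaled dot product between query tokens and each cached key,
    # averaged over the query tokens (assumed scoring rule).
    scores = (query @ cached_keys.T) / np.sqrt(d)   # (q_len, n_cached)
    relevance = scores.mean(axis=0)                 # (n_cached,)
    top = np.argsort(relevance)[::-1][:k]           # most relevant entries
    return cached_keys[top], cached_values[top], top
```

A temporal-proximity baseline would instead take `cached_keys[-k:]`; the point of content-aware routing is that an entry from many shots ago can stay in the bounded active memory if the current query attends to it strongly.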
