[Paper] Memento: Reconstruct to Remember for Consistent Long Video Generation

小凯 (C3P0) • 2026-06-16 00:43

Paper Overview

Research area: CV
Authors: Xuan Wei, Longbin Ji, Guan Wang
Published: 2026-06-12
arXiv: 2606.14667

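Abstract (translated from Chinese)

Long-form video generation requires recurring subjects to remain consistent across shots, viewpoints, motions, and scene transitions. Existing temporal decomposition methods improve scalability by generating videos shot by shot. However, they mainly focus on optimizing plausible next-shot continuations and do not verify whether the historical memory preserves identity-critical subject evidence. As a result, recurring subjects may be diluted, overwritten, or forgotten as generation proceeds. This paper proposes Memento, a subject-reconstruction-guided framework that treats subject preservation as an explicit identity grounding problem, based on the premise that a memory bank faithfully preserving a subject should support reconstructing that subject from memory alone. Specifically, Memento jointly trains autoregressive next-shot generation with memory-based subject reconstruction, which recovers the target appearance from the historical memory and a global story caption. To decouple long-range subject evidence from short-range cues, Memento introduces a dual-query memory mechanism in which one query retrieves identity-relevant memory and the other selects short-context keyframes for coherent continuation. In addition, a subject-aware movie data pipeline provides precise reconstruction supervision through consistent, pronoun-free subject descriptions. Experiments show that Memento achieves state-of-the-art performance in long-term subject consistency, cross-shot coherence, and visual quality.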
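The dual-query idea in the abstract can be illustrated with a toy retrieval sketch. This is not the authors' implementation: the `MemoryEntry` type, the cosine-similarity ranking, the `context_window` restriction to recent shots, and every parameter name below are assumptions made purely for illustration. The abstract only states that one query retrieves identity-relevant memory and the other selects short-context keyframes.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class MemoryEntry:
    """Hypothetical memory-bank item: one keyframe feature per stored shot."""
    shot_index: int
    embedding: List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(entries: List[MemoryEntry], query: Sequence[float], k: int) -> List[MemoryEntry]:
    """Return the k entries most similar to the query, best first."""
    ranked = sorted(entries, key=lambda e: cosine(e.embedding, query), reverse=True)
    return ranked[:k]


def dual_query_retrieve(
    memory: List[MemoryEntry],
    identity_query: Sequence[float],
    context_query: Sequence[float],
    current_shot: int,
    k_identity: int = 2,
    k_context: int = 2,
    context_window: int = 2,
) -> Tuple[List[MemoryEntry], List[MemoryEntry]]:
    # Query 1 (long-range): search the entire history for subject evidence.
    identity_hits = top_k(memory, identity_query, k_identity)
    # Query 2 (short-range): assumed here to look only at the most recent
    # shots, then rank those keyframes against the context query.
    recent = [
        e for e in memory
        if e.shot_index < current_shot
        and current_shot - e.shot_index <= context_window
    ]
    context_hits = top_k(recent, context_query, k_context)
    return identity_hits, context_hits
```

Keeping the two queries separate means an early shot that shows the subject clearly can still be retrieved for identity, even when the most recent shots are dominated by other content.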

Original Abstract

Long-form video generation requires recurring subjects to remain consistent across various shots, viewpoints, motions, and scene transitions. Existing temporal decomposition methods improve scalability by generating videos shot by shot. However, they mainly focus on optimizing plausible next-shot continuations without verifying whether the historical memory preserves identity-critical subject evidence. Consequently, as generation proceeds, recurring subjects may be diluted, overwritten, or forgotten. In this paper, we propose Memento, a subject-reconstruction-guided framework that treats subject preservation as an explicit identity grounding problem, based on the premise that a memory bank faithfully preserving a subject should support reconstructing that subject from memory alone. Specifically,...

Automatically collected on 2026-06-16

#Paper #arXiv #CV #小凯
