Paper Summary
Research area: CV
Authors: Gangwei Xu, Qihang Zhang, Jiaming Zhou, Xing Zhu, Yujun Shen, Xin Yang, Yinghao Xu
Published: 2026-06-09
arXiv: 2606.11187
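Abstract (translated from Chinese)
Next Forcing proposes a multi-chunk prediction (MCP) framework for causal world modeling that delivers faster training, higher accuracy, and accelerated inference. Inspired by multi-token prediction in LLMs, it uses lightweight auxiliary MCP modules to simultaneously denoise video chunks at multiple future time scales. The MCP modules form a causal chain across prediction depths and predict future dynamics from intermediate features fused across multiple layers of the main model. On RoboTwin it reaches SOTA (94.1/93.5%) with a 2x inference speedup.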
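The training objective described above can be illustrated with a toy sketch. This is not the authors' implementation: the linear heads, the MSE denoising loss, the `aux_weight` value, and every name below (`mcp_loss`, `main_head`, `mcp_heads`, `mix`, `out`) are assumptions made for illustration. It only shows the general shape of the idea: one loss on the current chunk, plus auxiliary losses on future chunks whose predictions are chained across depths.

```python
def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def mcp_loss(fused_features, targets, main_head, mcp_heads, aux_weight=0.5):
    """Toy multi-chunk prediction loss (hypothetical formulation).

    fused_features: stand-in for the main model's fused intermediate features.
    targets[0]:     denoising target for the current chunk.
    targets[k]:     denoising target for the k-th future chunk (k >= 1).
    mcp_heads:      one dict per prediction depth with "mix" and "out" matrices.
    """
    # Main objective: denoise the current chunk.
    loss = mse(linear(main_head, fused_features), targets[0])

    # Auxiliary objectives: each depth consumes the previous depth's hidden
    # state, so the modules form a chain across prediction depths.
    hidden = fused_features
    for depth, head in enumerate(mcp_heads, start=1):
        hidden = linear(head["mix"], hidden)
        pred = linear(head["out"], hidden)
        loss += aux_weight * mse(pred, targets[depth])
    return loss


identity = [[1.0, 0.0], [0.0, 1.0]]
heads = [{"mix": identity, "out": identity} for _ in range(2)]
features = [1.0, 2.0]
targets = [[1.0, 2.0], [2.0, 2.0], [1.0, 2.0]]
total = mcp_loss(features, targets, identity, heads)
```

In this toy setup only the first future chunk is mispredicted, so `total` consists solely of that weighted auxiliary term. A real model would replace the linear maps with the video denoising network and its MCP modules.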
Original Abstract
Autoregressive video generation has emerged as a powerful paradigm for World Action Models (WAMs). However, existing approaches suffer from slow training convergence and limited converged accuracy, particularly at high frame rates, as the training supervision is confined to the current chunk without explicit signals about future dynamics; they also suffer from slow inference due to iterative video denoising. In this paper, we present Next Forcing, a multi-chunk prediction (MCP) framework for causal world modeling that enables faster training, higher accuracy, and accelerated inference. Inspired by multi-token prediction in large language models, Next Forcing introduces an MCP training objective that augments the main model with lightweight auxiliary MCP modules to simultaneously denoise vi...
Automatically collected on 2026-06-11
#Paper #arXiv #CV #小凯