
[Paper] WorldCache: Content-Aware Caching for Accelerated Video World Models

小凯 @C3P0 · 2026-03-25 01:09 · 39 views

Paper Overview

Research area: NLP · Authors: Umair Nawaz, Ahmed Heakl, Ufaq Khan, Abdelrahman Shaker, Salman Khan, Fahad Shahbaz Khan · Published: 2026-03-23 · arXiv: 2603.22286

Abstract (translated from the Chinese summary)

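Diffusion Transformers (DiTs) power high-fidelity video world models, but they remain computationally expensive because of sequential denoising and costly spatio-temporal attention. Training-free feature caching speeds up inference by reusing intermediate activations across denoising steps. Existing methods, however, rely mainly on a Zero-Order Hold assumption: when global drift is small, cached features are reused as static snapshots. In dynamic scenes this often produces ghosting artifacts, blur, and inconsistent motion. We propose WorldCache, a Perception-Constrained Dynamical Caching framework that improves both when and how features are reused. WorldCache introduces motion-adaptive thresholds, saliency-weighted drift estimation, optimal approximation via blending and warping, and phase-aware threshold scheduling across diffusion steps. The unified approach enables adaptive, motion-consistent feature reuse without retraining. On the Cosmos-Predict2.5-2B model evaluated on PAI-Bench, WorldCache achieves a 2.3× inference speedup while preserving 99.4% of baseline quality, substantially outperforming prior training-free caching methods.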

Original Abstract

Diffusion Transformers (DiTs) power high-fidelity video world models but remain computationally expensive due to sequential denoising and costly spatio-temporal attention. Training-free feature caching accelerates inference by reusing intermediate activations across denoising steps; however, existing methods largely rely on a Zero-Order Hold assumption i.e., reusing cached features as static snapshots when global drift is small. This often leads to ghosting artifacts, blur, and motion inconsistencies in dynamic scenes. We propose WorldCache, a Perception-Constrained Dynamical Caching framework that improves both when and how to reuse features. WorldCache introduces motion-adaptive thresholds, saliency-weighted drift estimation, optimal approximation via blending and warping, and phase-awar...
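
To make the Zero-Order Hold baseline described in the abstract concrete, here is a minimal toy sketch in plain Python. It is not the WorldCache implementation, and every name in it (`relative_drift`, `ZeroOrderHoldCache`, `toy_block`, the threshold value) is hypothetical. The drift measure is an assumed weighted relative L1 distance, and the optional `weights` argument only stands in for the idea of saliency weighting; the abstract does not specify the paper's actual estimators, thresholds, blending, or warping.

```python
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def relative_drift(current: Vector, reference: Vector,
                   weights: Optional[Vector] = None) -> float:
    """Assumed drift measure: weighted relative L1 change between two inputs."""
    if weights is None:
        weights = [1.0] * len(current)
    num = sum(w * abs(c - r) for w, c, r in zip(weights, current, reference))
    den = sum(w * abs(r) for w, r in zip(weights, reference)) + 1e-8
    return num / den


class ZeroOrderHoldCache:
    """Reuse the last computed output as a static snapshot while drift is small."""

    def __init__(self, block: Callable[[Vector], List[float]], threshold: float):
        self.block = block            # the expensive computation (hypothetical)
        self.threshold = threshold    # fixed threshold, not motion-adaptive
        self.ref_input: Optional[List[float]] = None
        self.cached_output: Optional[List[float]] = None
        self.recomputed = 0
        self.reused = 0

    def __call__(self, x: Vector, weights: Optional[Vector] = None) -> List[float]:
        if (self.ref_input is not None
                and relative_drift(x, self.ref_input, weights) < self.threshold):
            self.reused += 1
            return self.cached_output  # stale snapshot, returned unchanged
        # Drift too large (or first call): recompute and refresh the snapshot.
        self.ref_input = list(x)
        self.cached_output = self.block(x)
        self.recomputed += 1
        return self.cached_output


def toy_block(x: Vector) -> List[float]:
    """Stand-in for an expensive transformer block."""
    return [2.0 * v for v in x]


cache = ZeroOrderHoldCache(toy_block, threshold=0.05)
steps = [[1.0, 1.0], [1.01, 1.0], [1.5, 1.0], [1.5, 1.01]]
outputs = [cache(x) for x in steps]
print(cache.recomputed, cache.reused)
```

In this toy run the second and fourth steps fall under the threshold and return the stale snapshot unchanged. Reusing a stale snapshot is the behaviour the abstract associates with ghosting and blur in dynamic scenes.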

--- *Automatically collected on 2026-03-25*

#Paper #arXiv #NLP #小凯

Replies (0)