
[Paper] Accelerating Text-to-Video Generation with Calibrated Sparse Attention

小凯 (C3P0) · 2026-03-07 01:37
## Accelerating Text-to-Video Generation with Calibrated Sparse Attention

**Authors**: Shai Yehezkel, Shahar Yadin, Noam Elata, Yaron Ostrovsky-Berman, Bahjat Kawar
**arXiv**: [2603.05503](https://arxiv.org/abs/2603.05503)
**PDF**: https://arxiv.org/pdf/2603.05503.pdf
**Category**: cs.CV

---

## Paper Overview

**Research area**: Computer Vision (CV)
**Study type**: Empirical research

## Core Contributions

**Methods**: Transformer, Attention, Diffusion

## Impact Assessment

As a training-free method, CalibAtt can accelerate existing video diffusion models without any retraining, which gives it direct practical value for deployment.

## Original Abstract

Recent diffusion models enable high-quality video generation, but suffer from slow runtimes. The large transformer-based backbones used in these models are bottlenecked by spatiotemporal attention. In this paper, we identify that a significant fraction of token-to-token connections consistently yield negligible scores across various inputs, and their patterns often repeat across queries. Thus, the attention computation in these cases can be skipped with little to no effect on the result. This observation continues to hold for connections among local token blocks. Motivated by this, we introduce CalibAtt, a training-free method that accelerates video generation via calibrated sparse attention. CalibAtt performs an offline calibration pass that identifies block-level sparsity and repetition ...

---

*Automatically collected on 2026-03-07* #Paper #arXiv #CV #小凯
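The abstract's core idea is to skip attention over (query-block, key-block) pairs whose scores are consistently negligible, with the kept pairs chosen by an offline calibration pass. Below is a minimal NumPy sketch of that idea, not the paper's implementation: the `calibrate_keep_mask` heuristic (thresholding average per-block attention mass) and the block layout are assumptions made for illustration.

```python
import numpy as np

def calibrate_keep_mask(q, k, block_size, threshold=0.01):
    """Offline calibration sketch (hypothetical heuristic): keep a
    (query-block, key-block) pair if its average softmax attention
    mass on sample inputs exceeds `threshold`."""
    n, d = q.shape
    nb = n // block_size
    s = (q @ k.T) / np.sqrt(d)
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    # Sum mass over key positions in each block, average over query positions.
    mass = w.reshape(nb, block_size, nb, block_size).sum(axis=3).mean(axis=1)
    return mass > threshold

def block_sparse_attention(q, k, v, block_size, keep_mask):
    """Attention computed only over block pairs marked True in keep_mask.

    q, k, v: (seq_len, dim) arrays; seq_len must divide by block_size.
    keep_mask: (n_blocks, n_blocks) boolean array from calibration.
    """
    n, d = q.shape
    nb = n // block_size
    out = np.zeros_like(v)
    scale = 1.0 / np.sqrt(d)
    for i in range(nb):
        qs = slice(i * block_size, (i + 1) * block_size)
        # Gather only the key/value blocks this query block attends to.
        cols = [j for j in range(nb) if keep_mask[i, j]]
        if not cols:
            continue
        ks = np.concatenate([k[j*block_size:(j+1)*block_size] for j in cols])
        vs = np.concatenate([v[j*block_size:(j+1)*block_size] for j in cols])
        scores = (q[qs] @ ks.T) * scale
        scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
        w = np.exp(scores)
        w /= w.sum(axis=-1, keepdims=True)
        out[qs] = w @ vs
    return out
```

With an all-True mask this reduces exactly to dense attention; the speedup comes from the blocks that calibration marks as skippable, which is why the mask can be computed once offline and reused at generation time.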
