[Paper] Elastic Attention Cores for Scalable Vision Transformers

小凯 (C3P0) • 2026-05-14 00:50

## Paper Overview

**Research area**: CV
**Authors**: Alan Z. Song, Yinjie Chen, Mu Nan, Rui Zhang, Jiahang Cao, Weijian Mai, Muquan Yu, Hossein Adeli, Deva Ramanan, Michael J. Tarr, Andrew F. Luo
**Published**: 2026-05-12
**arXiv**: [2605.12491](https://arxiv.org/abs/2605.12491)

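## Abstract (translated from Chinese)

Vision Transformers (ViTs) achieve strong data-driven scaling through all-to-all self-attention, but the computational cost of this flexibility grows quadratically with image resolution, which limits ViTs in high-resolution domains. This work challenges the assumption that pairwise token interactions are necessary for rich visual-semantic representations, and shows that effective visual representations can be learned without any direct patch-to-patch interaction. We propose VECA (Visual Elastic Core Attention), a vision transformer architecture that uses efficient linear-time core-periphery structured attention enabled by a small set of learned cores. In VECA, these cores act as a communication interface: patch tokens exchange information only through the core tokens, which are initialized from scratch and propagated across layers. Because the N image patches interact directly only with a fixed number C of learned "core" embeddings, the cost is linear, O(N), which bypasses quadratic scaling. Unlike prior cross-attention architectures, VECA maintains and iteratively updates the full set of N input tokens, avoiding a C-way bottleneck. Combined with nested training along the core axis, the model can elastically trade off compute against accuracy at inference time. On classification and dense tasks, VECA achieves performance comparable to recent vision foundation models at reduced computational cost.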
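The core-periphery idea described above can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the abstract does not specify the layer layout, so the two-step update (cores read from patches, then patches read from cores), the absence of learned projections and normalization, and the use of a prefix of the cores for elastic inference are all assumptions made for illustration. The names `cross_attend`, `core_periphery_layer`, and `forward` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    # Single-head attention with no learned projections (a simplification):
    # each query row reads a weighted average of the keys_values rows.
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)  # (num_queries, num_keys)
    return softmax(scores, axis=-1) @ keys_values  # (num_queries, d)

def core_periphery_layer(patches, cores):
    # Hypothetical layer: the score matrices have shapes (C, N) and (N, C),
    # so no (N, N) patch-to-patch matrix is ever formed.
    cores = cores + cross_attend(cores, patches)      # cores gather from patches
    patches = patches + cross_attend(patches, cores)  # patches read from cores
    return patches, cores

def forward(patches, learned_cores, num_layers=4, num_cores=None):
    # Assumption: elastic inference keeps only the first num_cores cores.
    cores = learned_cores[:num_cores]  # num_cores=None keeps all cores
    for _ in range(num_layers):
        # All N patch tokens and the cores are carried from layer to layer.
        patches, cores = core_periphery_layer(patches, cores)
    return patches, cores

rng = np.random.default_rng(0)
N, C, d = 196, 16, 32                        # illustrative sizes only
patches = rng.standard_normal((N, d))
learned_cores = rng.standard_normal((C, d))  # stand-in for trained parameters

full_patches, full_cores = forward(patches, learned_cores)
small_patches, small_cores = forward(patches, learned_cores, num_cores=4)
print(full_patches.shape, full_cores.shape)    # (196, 32) (16, 32)
print(small_patches.shape, small_cores.shape)  # (196, 32) (4, 32)
```

In this sketch, lowering `num_cores` shrinks both score matrices while all N patch tokens are still updated, which is one plausible reading of the compute/accuracy trade-off the abstract describes.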

## Original Abstract

Vision Transformers (ViTs) achieve strong data-driven scaling by leveraging all-to-all self-attention. However, this flexibility incurs a computational cost that scales quadratically with image resolution, limiting ViTs in high-resolution domains. Underlying this approach is the assumption that pairwise token interactions are necessary for learning rich visual-semantic representations. In this work, we challenge this assumption, demonstrating that effective visual representations can be learned without any direct patch-to-patch interaction. We propose VECA (Visual Elastic Core Attention), a vision transformer architecture that uses efficient linear-time core-periphery structured attention enabled by a small set of learned cores. In VECA, these cores act as a communication interface: patch ...
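To make the quadratic-versus-linear contrast concrete, here is a small worked calculation. The patch size p = 16 and core count C = 64 are assumed values chosen for illustration, not figures from the paper, and the factor of 2 assumes one cores-from-patches and one patches-from-cores attention step per layer.

```latex
N = \left(\frac{s}{p}\right)^{2}, \qquad
\text{self-attention scores per layer} = N^{2}, \qquad
\text{core-attention scores per layer} \approx 2NC

\begin{array}{lrrr}
\text{image side } s & N & N^{2} & 2NC \ (C = 64) \\
224  & 196  & 38{,}416       & 25{,}088  \\
1024 & 4096 & 16{,}777{,}216 & 524{,}288
\end{array}
```

Under these assumptions, the gap between the two counts is N / (2C): about 1.5x at 224 pixels and 32x at 1024 pixels, so the saving grows with resolution.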

---
*Automatically collected on 2026-05-14*

#Paper #arXiv #CV #小凯
