[Paper] DnA: Denoising Attention for Visual Tasks

小凯 (C3P0) • 2026-06-27 00:47

Paper Overview

Research area: Computer Vision
Authors: Ron Campos, Subhajit Maity, Xin Li
Published: 2026-06-27
arXiv: 2606.27372

Chinese Abstract (translated)

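Softmax activation is the de facto standard for attention-based models in visual perception tasks. However, standard softmax can produce noisy attention patterns that dilute relevant features and degrade performance. In this paper, we propose Denoising Attention (DnA), in which, first, a positive query identifies which image features belong to the correct class, and a negative query identifies closely related but irrelevant image features. DnA then projects these interactions into two distinct subspaces with larger principal angles, promoting subspace separation and better discriminability. Using a ViT-B backbone, our proposed DnA achieves an absolute gain of 0.8% over the baseline on ImageNet-1K. We further demonstrate improvements on several visual understanding tasks, including video understanding with video transformers (1.8%) and video LLMs (0.5%). Our extensive empirical analysis justifies the design choice of two interacting subspaces and the denoising effect of DnA.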
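
Why can standard softmax "dilute relevant features"? Softmax assigns every key a strictly positive weight, so background tokens always absorb some attention mass. The toy sketch below is an illustration only, not code from the paper: one "relevant" key gets an assumed score of 4.0, every other key gets a score of 0, and the weight on the relevant key is tracked as the token count grows. The value 197 is used only as an example (it is the token count of ViT-B/16 at 224×224, i.e. 196 patches plus a CLS token); the abstract does not state the paper's exact configuration.

```python
import numpy as np

def softmax(scores: np.ndarray) -> np.ndarray:
    # Shift by the max for numerical stability; the result is unchanged.
    e = np.exp(scores - scores.max())
    return e / e.sum()

def relevant_weight(n_tokens: int, relevant_score: float = 4.0) -> float:
    """Softmax weight on one relevant key when the other n_tokens - 1 keys
    all score 0 (an idealized, assumed 'irrelevant' background)."""
    scores = np.zeros(n_tokens)
    scores[0] = relevant_score
    return float(softmax(scores)[0])

# Compare a tiny sequence with longer, ViT-sized ones (sizes are illustrative).
for n in (2, 16, 197):
    w = relevant_weight(n)
    print(f"n={n:3d}  relevant={w:.3f}  spread over others={1 - w:.3f}")
```

In closed form the relevant weight is e^s / (e^s + n - 1), so with s = 4 it falls from about 0.98 at n = 2 to about 0.22 at n = 197, even though no background key ever scores higher than the relevant one. This is the kind of noise the abstract says DnA is designed to suppress.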

Original Abstract

The softmax activation in multihead attention (MHA) is the de facto standard for attention-based models in visual perception tasks. However, standard softmax can produce noisy attention patterns that dilute relevant features and degrade its performance. In this paper, we propose Denoising Attention or DnA, in which, first, a positive query identifies which image features belong to the correct class, and a negative query identifies closely associated but irrelevant image features. DnA then projects these interactions into two distinct subspaces with larger principal angles, promoting subspace separation and improved discriminability. Using a ViT-B backbone, our proposed DnA achieves an absolute gain of 0.8% on ImageNet-1K compared to the baseline. We further show improvements across...
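
The abstract says DnA projects the positive- and negative-query interactions into two subspaces "with larger principal angles". For readers unfamiliar with the term: if Q_a and Q_b are orthonormal bases of two subspaces, the singular values of Q_a^T Q_b are the cosines of their principal angles. Below is a minimal NumPy sketch of that standard QR-then-SVD computation. It is a generic illustration, not the paper's code; the abstract does not describe how DnA encourages larger angles.

```python
import numpy as np

def principal_angles(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Principal angles (radians, ascending) between the column spaces of
    A (d x p) and B (d x q); columns of each are assumed linearly independent.
    Returns min(p, q) angles."""
    # Orthonormal bases for each column space via reduced QR.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb (descending) are the cosines of the angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip round-off outside [-1, 1]; arccos of descending cosines is ascending.
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Two planes in R^3 sharing the x-axis: one spans {x, y}, the other {x, z}.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
print(np.degrees(principal_angles(A, B)))
```

The shared x-axis contributes an angle of 0, and the remaining directions (y and z) are orthogonal, contributing 90 degrees. Identical subspaces give all-zero angles; larger angles mean less overlap, which is the subspace separation the abstract credits for better discriminability.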

Automatically collected on 2026-06-27

#Paper #arXiv #cs.CV #小凯
