## The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks
**Authors**: Shangwen Sun, Alfredo Canziani, Yann LeCun, Jiachen Zhu
**arXiv**: [2603.05498](https://arxiv.org/abs/2603.05498)
**PDF**: https://arxiv.org/pdf/2603.05498.pdf
**Categories**: cs.AI, cs.CL
---
## Paper Overview
**Research Area**: Natural Language Processing (NLP)
**Study Type**: Empirical study
## Core Contributions
**Methods**: Transformer, Attention
## Impact Assessment
By showing that massive activations and attention sinks are architecturally coupled yet functionally distinct, the study clarifies two phenomena that prior work had conflated, and it is likely to influence related research on Transformer interpretability.
## Original Abstract
We study two recurring phenomena in Transformer language models: massive activations, in which a small number of tokens exhibit extreme outliers in a few channels, and attention sinks, in which certain tokens attract disproportionate attention mass regardless of semantic relevance. Prior work observes that these phenomena frequently co-occur and often involve the same tokens, but their functional roles and causal relationship remain unclear. Through systematic experiments, we show that the co-occurrence is largely an architectural artifact of modern Transformer design, and that the two phenomena serve related but distinct functions. Massive activations operate globally: they induce near-constant hidden representations that persist across layers, effectively functioning as implicit paramete...
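To make the two phenomena concrete, below is a minimal diagnostic sketch, not taken from the paper, of how one might surface them in a HuggingFace causal LM. The model choice (`gpt2`), the 100x max/median magnitude threshold for flagging massive activations, and treating token 0 as the sink position are all illustrative assumptions rather than the paper's definitions.

```python
# Illustrative sketch: flag massive activations and measure attention-sink
# mass in a small causal LM. Thresholds and model choice are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM exposing hidden states and attentions
tok = AutoTokenizer.from_pretrained(model_name)
# "eager" attention so attention weights are returned on recent transformers
model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager")
model.eval()

inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True, output_attentions=True)

# Massive activations: per layer, compare the largest |activation| to the
# median |activation|. A huge ratio concentrated on a handful of
# (token, channel) pairs matches the signature described in the abstract.
for i, h in enumerate(out.hidden_states):
    a = h[0].abs()                       # (seq_len, hidden_dim)
    ratio = (a.max() / a.median()).item()
    if ratio > 100:                      # assumed threshold, for illustration
        tok_idx, chan = divmod(a.argmax().item(), a.shape[1])
        print(f"layer {i}: max/median = {ratio:.0f} at token {tok_idx}, channel {chan}")

# Attention sinks: average attention mass each query places on token 0,
# the position that typically absorbs disproportionate attention.
for i, attn in enumerate(out.attentions):  # each: (batch, heads, seq, seq)
    sink_mass = attn[0, :, :, 0].mean().item()
    print(f"layer {i}: mean attention on token 0 = {sink_mass:.2f}")
```

Running this on a prompt of ordinary text typically shows the sink mass on token 0 growing in middle and late layers; whether the activation-ratio check fires depends on the model family and threshold chosen.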
---
*Auto-collected on 2026-03-07*
#paper #arXiv #NLP #小凯