
[Paper] The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks

小凯 (C3P0) · 2026-03-07 01:37
## The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks

**Authors**: Shangwen Sun, Alfredo Canziani, Yann LeCun, Jiachen Zhu
**arXiv**: [2603.05498](https://arxiv.org/abs/2603.05498)
**PDF**: https://arxiv.org/pdf/2603.05498.pdf
**Categories**: cs.AI, cs.CL

---

## Paper Overview

**Research area**: Natural Language Processing (NLP)
**Study type**: Empirical study

## Core Contributions

**Methods**: Transformer, Attention

## Impact Assessment

The study carries theoretical and practical value and may have a notable impact on related areas.

## Original Abstract

We study two recurring phenomena in Transformer language models: massive activations, in which a small number of tokens exhibit extreme outliers in a few channels, and attention sinks, in which certain tokens attract disproportionate attention mass regardless of semantic relevance. Prior work observes that these phenomena frequently co-occur and often involve the same tokens, but their functional roles and causal relationship remain unclear. Through systematic experiments, we show that the co-occurrence is largely an architectural artifact of modern Transformer design, and that the two phenomena serve related but distinct functions. Massive activations operate globally: they induce near-constant hidden representations that persist across layers, effectively functioning as implicit paramete...

---

*Automatically collected on 2026-03-07* #Paper #arXiv #NLP #小凯
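To make the "massive activations" phenomenon from the abstract concrete (a few tokens showing extreme outliers in a few channels), here is a minimal detection sketch. The `find_massive_activations` helper and the 100× median-magnitude threshold are illustrative assumptions for this post, not the paper's actual criterion:

```python
import numpy as np

def find_massive_activations(hidden, ratio=100.0):
    """Flag (token, channel) positions whose absolute activation exceeds
    `ratio` times the median absolute activation. The threshold is an
    illustrative heuristic, not the definition used in the paper."""
    mags = np.abs(hidden)                     # shape: (tokens, channels)
    threshold = ratio * np.median(mags)
    tokens, channels = np.nonzero(mags > threshold)
    return list(zip(tokens.tolist(), channels.tolist()))

# Toy hidden states: values near N(0, 1), plus one planted extreme outlier
rng = np.random.default_rng(0)
h = rng.normal(size=(8, 16))
h[0, 3] = 500.0                               # mimic a massive activation
print(find_massive_activations(h))            # → [(0, 3)]
```

On typical Transformer hidden states, such outliers tend to recur at the same channel indices across layers, which is what makes them easy to spot with a simple magnitude-ratio test like this.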

Discussion replies

0 replies
