## The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks
**Authors**: Shangwen Sun, Alfredo Canziani, Yann LeCun, Jiachen Zhu
**arXiv**: [2603.05498](https://arxiv.org/abs/2603.05498)
**PDF**: https://arxiv.org/pdf/2603.05498.pdf
**Categories**: cs.AI, cs.CL
---
## Paper Overview
**Research Area**: Natural Language Processing (NLP)
**Study Type**: Empirical study
## Core Contributions
**Methods**: Transformer, Attention
## Impact Assessment
By showing that massive activations and attention sinks are architecturally coupled yet functionally distinct, the study clarifies two phenomena that prior work had conflated, and it is likely to influence related research on Transformer interpretability.
## Original Abstract
We study two recurring phenomena in Transformer language models: massive activations, in which a small number of tokens exhibit extreme outliers in a few channels, and attention sinks, in which certain tokens attract disproportionate attention mass regardless of semantic relevance. Prior work observes that these phenomena frequently co-occur and often involve the same tokens, but their functional roles and causal relationship remain unclear. Through systematic experiments, we show that the co-occurrence is largely an architectural artifact of modern Transformer design, and that the two phenomena serve related but distinct functions. Massive activations operate globally: they induce near-constant hidden representations that persist across layers, effectively functioning as implicit paramete...
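To make the two phenomena concrete, below is a minimal diagnostic sketch, not taken from the paper, of how one might surface them in a HuggingFace causal LM. The model choice (`gpt2`), the 100x max/median magnitude threshold for flagging massive activations, and treating token 0 as the sink position are all illustrative assumptions rather than the paper's definitions.

```python
# Illustrative sketch: flag massive activations and measure attention-sink
# mass in a small causal LM. Thresholds and model choice are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM exposing hidden states and attentions
tok = AutoTokenizer.from_pretrained(model_name)
# "eager" attention so attention weights are returned on recent transformers
model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager")
model.eval()

inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True, output_attentions=True)

# Massive activations: per layer, compare the largest |activation| to the
# median |activation|. A huge ratio concentrated on a handful of
# (token, channel) pairs matches the signature described in the abstract.
for i, h in enumerate(out.hidden_states):
    a = h[0].abs()                       # (seq_len, hidden_dim)
    ratio = (a.max() / a.median()).item()
    if ratio > 100:                      # assumed threshold, for illustration
        tok_idx, chan = divmod(a.argmax().item(), a.shape[1])
        print(f"layer {i}: max/median = {ratio:.0f} at token {tok_idx}, channel {chan}")

# Attention sinks: average attention mass each query places on token 0,
# the position that typically absorbs disproportionate attention.
for i, attn in enumerate(out.attentions):  # each: (batch, heads, seq, seq)
    sink_mass = attn[0, :, :, 0].mean().item()
    print(f"layer {i}: mean attention on token 0 = {sink_mass:.2f}")
```

Running this on a prompt of ordinary text typically shows the sink mass on token 0 growing in middle and late layers; whether the activation-ratio check fires depends on the model family and threshold chosen.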
---
*Auto-collected on 2026-03-07*
#paper #arXiv #NLP #小凯