## Paper Summary
**Research Area**: ML
**Author**: Saleh Sargolzaei
**Published**: 2026-04-03
**arXiv**: [2604.03190](https://arxiv.org/abs/2604.03190)
## Translated Abstract
Transformer attention computes a single softmax-weighted average over the values -- a one-pass estimate that cannot correct its own errors. We introduce gradient-boosted attention, which applies the principle of gradient boosting within a single attention layer: a second attention pass, with its own learned projections, attends to the first pass's prediction error and applies a gated correction. On a 10M-token subset of WikiText-103, gradient-boosted attention reaches a perplexity of 67.9, versus 72.2 for standard attention.
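The two-pass construction described above is compact enough to sketch. Below is a minimal, hypothetical PyTorch rendering, assuming the layer input serves as the reconstruction target (so the first pass's error is `x - y1`) and a learned per-dimension sigmoid gate acts as the shrinkage; the abstract does not pin down these details, and this is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradientBoostedAttention(nn.Module):
    """Sketch of a two-pass attention layer with a gated correction.

    Hypothetical reconstruction from the abstract: pass 1 is a standard
    attention "base learner"; pass 2, with its own learned projections,
    attends to pass 1's prediction error; a per-dimension sigmoid gate
    plays the role of gradient boosting's shrinkage parameter.
    """

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.h, self.dk = n_heads, d_model // n_heads
        self.qkv1 = nn.Linear(d_model, 3 * d_model)  # pass-1 projections
        self.out1 = nn.Linear(d_model, d_model)
        self.qkv2 = nn.Linear(d_model, 3 * d_model)  # pass-2 projections (its own)
        self.out2 = nn.Linear(d_model, d_model)
        self.gate = nn.Parameter(torch.zeros(d_model))  # sigmoid(0) = 0.5 at init

    def _attend(self, x, qkv, out):
        B, T, D = x.shape
        q, k, v = qkv(x).chunk(3, dim=-1)
        q, k, v = [t.view(B, T, self.h, self.dk).transpose(1, 2) for t in (q, k, v)]
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return out(y.transpose(1, 2).reshape(B, T, D))

    def forward(self, x):
        y1 = self._attend(x, self.qkv1, self.out1)      # one-pass estimate
        resid = x - y1                                  # ASSUMED error signal: input - estimate
        y2 = self._attend(resid, self.qkv2, self.out2)  # second pass attends to the error
        return y1 + torch.sigmoid(self.gate) * y2       # gated (shrunk) correction


# Usage: a drop-in causal self-attention block over (batch, seq, d_model) inputs.
layer = GradientBoostedAttention(d_model=256, n_heads=4)
y = layer(torch.randn(2, 128, 256))
```

The gate starts at 0.5 per dimension (sigmoid of zero); as in gradient boosting, smaller gate values shrink the second base learner's contribution.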
## Original Abstract
Transformer attention computes a single softmax-weighted average over values -- a one-pass estimate that cannot correct its own errors. We introduce *gradient-boosted attention*, which applies the principle of gradient boosting *within* a single attention layer: a second attention pass, with its own learned projections, attends to the prediction error of the first and applies a gated correction. Under a squared reconstruction objective, the construction maps onto Friedman's gradient boosting machine, with each attention pass as a base learner and the per-dimension gate as the shrinkage parameter. We show that a single Hopfield-style update erases all query information orthogonal to the stored-pattern subspace, and that further iteration under local contraction can collapse distin...
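The Hopfield claim in the abstract admits a one-line justification, assuming the update takes the standard modern-Hopfield form (notation here is an assumption, not necessarily the paper's):

```latex
q^{+} = X^{\top}\,\mathrm{softmax}(\beta X q)
```

Since the softmax output is a probability vector $p$, the updated query $q^{+} = \sum_i p_i x_i$ is a convex combination of the stored patterns (the rows of $X$) and therefore lies entirely in their span; any component of $q$ orthogonal to that subspace is erased after a single update.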
---
*Automatically collected on 2026-04-06*
#Paper #arXiv #ML #小凯