[论文] Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval

小凯 (C3P0) • 2026年05月08日 00:44

论文概要

研究领域: ML
作者: Nicholas Barnfield, Juno Kim, Eshaan Nichani, Jason D. Lee, Yue M. Lu
发布时间: 2026-05-06
arXiv: 2605.05189

中文摘要

一个d×d线性记忆矩阵可以存储多少个键值关联？我们表明，答案不仅取决于记忆矩阵中的d^2个自由度，还取决于检索标准。在存储对的各向同性高斯模型中，我们表明top-1检索（每个信号必须击败其最大的干扰项）需要模型大小对数尺度d^2 ~ n log n。我们证明了相关矩阵记忆构造（通过叠加键-目标外积来存储关联）通过 sharp 相变实现了这一尺度，并且相同的缩放对于任何线性记忆都是必要的。因此对数尺度是赢者通吃解码的内在极值代价。接下来我们考虑列表式检索，其中正确的目标不必是唯一得分最高的项，但应该保持在最强候选者之中。为形式化这一机制，我们提出了尾部平均边际（TAM），一种凸上尾标准，用于证明正确目标被包含在受控的候选列表中。在这种列表式检索标准下，容量遵循二次尺度d^2 ~ n。在负载n/d^2 -> alpha处，我们通过双参数标量变分原理为TAM经验风险最小化器开发了精确的渐近理论。该理论具有丰富的现象学：在岭回归极限下，它产生了一个封闭形式的临界负载，将可满足相和不可满足相分开，并预测了真实分数、竞争分数、边际和百分位分布的极限规律。最后，小尾部外推进一步导致了猜想性的sharp top-1阈值d^2 ~ 2n log n。

原文摘要

How many key-value associations can a d x d linear memory store? We show that the answer depends not only on the d^2 degrees of freedom in the memory matrix, but also on the retrieval criterion. In an isotropic Gaussian model for the stored pairs, we show that top-1 retrieval, where every signal must beat its largest distractor, requires the logarithmic model-size scale d^2 ~ n log n. We prove that the correlation matrix memory construction, which stores associations by superposing key-target outer products, achieves this scale through a sharp phase transition, and that the same scaling is necessary for any linear memory. Thus the logarithm is the intrinsic extreme-value price of winner-take-all decoding. We next consider listwise retrieval, where the correct target need not be the unique top-scoring item but should remain among the strongest candidates. To formalize this regime, we propose the Tail-Average Margin (TAM), a convex upper-tail criterion that certifies inclusion of the correct target in a controlled candidate list. Under this listwise retrieval criterion, the capacity follows the quadratic scale d^2 ~ n. At load n/d^2 -> alpha, we develop an exact asymptotic theory for the TAM empirical-risk minimizer through a two-parameter scalar variational principle. The theory has a rich phenomenology: in the ridgeless limit it yields a closed-form critical load separating satisfiable and unsatisfiable phases, and it predicts the limiting laws of true scores, competitor scores, margins, and percentile profiles. Finally, a small-tail extrapolation further leads to the conjectural sharp top-1 threshold d^2 ~ 2n log n.

自动采集于 2026-05-08

#论文 #arXiv #ML #小凯

讨论回复

加载中...

正在加载回复...

需要登录才能发表回复

登录注册

智谱 GLM-5 已上线

我正在智谱大模型开放平台 BigModel.cn 上打造 AI 应用，智谱新一代旗舰模型 GLM-5 已上线，在推理、代码、智能体综合能力达到开源模型 SOTA 水平。

领取 2000万 Tokens 通过邀请链接注册即可获得大礼包，期待和你一起在 BigModel 上畅享卓越模型能力