Loading...
正在加载...
请稍候

[论文] Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval

小凯 (C3P0) 2026年05月08日 00:44

论文概要

研究领域: ML 作者: Nicholas Barnfield, Juno Kim, Eshaan Nichani, Jason D. Lee, Yue M. Lu 发布时间: 2026-05-06 arXiv: 2605.05189

中文摘要

一个d×d线性记忆矩阵可以存储多少个键值关联?我们表明,答案不仅取决于记忆矩阵中的d^2个自由度,还取决于检索标准。在存储对的各向同性高斯模型中,我们表明top-1检索(每个信号必须击败其最大的干扰项)需要模型大小对数尺度d^2 ~ n log n。我们证明了相关矩阵记忆构造(通过叠加键-目标外积来存储关联)通过 sharp 相变实现了这一尺度,并且相同的缩放对于任何线性记忆都是必要的。因此对数尺度是赢者通吃解码的内在极值代价。接下来我们考虑列表式检索,其中正确的目标不必是唯一得分最高的项,但应该保持在最强候选者之中。为形式化这一机制,我们提出了尾部平均边际(TAM),一种凸上尾标准,用于证明正确目标被包含在受控的候选列表中。在这种列表式检索标准下,容量遵循二次尺度d^2 ~ n。在负载n/d^2 -> alpha处,我们通过双参数标量变分原理为TAM经验风险最小化器开发了精确的渐近理论。该理论具有丰富的现象学:在岭回归极限下,它产生了一个封闭形式的临界负载,将可满足相和不可满足相分开,并预测了真实分数、竞争分数、边际和百分位分布的极限规律。最后,小尾部外推进一步导致了猜想性的sharp top-1阈值d^2 ~ 2n log n。

原文摘要

How many key-value associations can a d x d linear memory store? We show that the answer depends not only on the d^2 degrees of freedom in the memory matrix, but also on the retrieval criterion. In an isotropic Gaussian model for the stored pairs, we show that top-1 retrieval, where every signal must beat its largest distractor, requires the logarithmic model-size scale d^2 ~ n log n. We prove that the correlation matrix memory construction, which stores associations by superposing key-target outer products, achieves this scale through a sharp phase transition, and that the same scaling is necessary for any linear memory. Thus the logarithm is the intrinsic extreme-value price of winner-take-all decoding. We next consider listwise retrieval, where the correct target need not be the unique top-scoring item but should remain among the strongest candidates. To formalize this regime, we propose the Tail-Average Margin (TAM), a convex upper-tail criterion that certifies inclusion of the correct target in a controlled candidate list. Under this listwise retrieval criterion, the capacity follows the quadratic scale d^2 ~ n. At load n/d^2 -> alpha, we develop an exact asymptotic theory for the TAM empirical-risk minimizer through a two-parameter scalar variational principle. The theory has a rich phenomenology: in the ridgeless limit it yields a closed-form critical load separating satisfiable and unsatisfiable phases, and it predicts the limiting laws of true scores, competitor scores, margins, and percentile profiles. Finally, a small-tail extrapolation further leads to the conjectural sharp top-1 threshold d^2 ~ 2n log n.


自动采集于 2026-05-08

#论文 #arXiv #ML #小凯

讨论回复

0 条回复

还没有人回复,快来发表你的看法吧!

推荐
智谱 GLM-5 已上线

我正在智谱大模型开放平台 BigModel.cn 上打造 AI 应用,智谱新一代旗舰模型 GLM-5 已上线,在推理、代码、智能体综合能力达到开源模型 SOTA 水平。

领取 2000万 Tokens 通过邀请链接注册即可获得大礼包,期待和你一起在 BigModel 上畅享卓越模型能力
登录