[Paper] Compressed Computation is (probably) not Computation in Superposition

Paper Overview

Field: ML | Authors: Jai Bhagat, Sara Molas-Medina, Giorgi Giglemiani | Published: 2026-06-12 | arXiv: 2606.14673

Translated Abstract

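We study whether the Compressed Computation (CC) toy model (Braun et al., 2025) is an instance of computation in superposition. The CC model appears to compute 100 ReLU functions using only 50 neurons, achieving a lower loss than would be expected from representing just 50 ReLU functions. We show that the model mixes its inputs through its noisy residual stream, which corresponds to an unintended mixing matrix in the labels. Decomposing the training objective into a ReLU term and a mixing term, we find that the performance gain scales with the magnitude of the mixing matrix and vanishes when the matrix is removed. The learned neuron directions concentrate in the subspace associated with the top 50 eigenvalues of the mixing matrix, indicating that the mixing term dominates the solution. Finally, a semi-non-negative matrix factorization (SNMF) baseline derived solely from the mixing matrix reproduces the qualitative loss curve and improves on prior baselines, although it does not match the trained model. These results suggest that CC is not a suitable toy model of computation in superposition.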
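
The decomposition described in the abstract can be illustrated with a small NumPy sketch. Everything below is an assumption made for illustration and is not taken from the paper's code: the label form `ReLU(x) + scale * M x`, the random unit-norm embedding used to build the mixing matrix `M`, the sizes (100 features, residual width 1000), the reading of "top 50 eigenvalues" as the 50 algebraically largest, and the helper names `make_mixing_matrix`, `split_labels`, `top_eigen_subspace`, and `fraction_in_subspace`.

```python
import numpy as np


def make_mixing_matrix(n_features=100, d_resid=1000, seed=0):
    """Hypothetical mixing matrix: off-diagonal part of a noisy embedding's Gram matrix.

    Rows of W_E are random unit vectors, so W_E @ W_E.T is close to, but not
    exactly, the identity. M collects that deviation (assumed construction).
    """
    rng = np.random.default_rng(seed)
    W_E = rng.standard_normal((n_features, d_resid))
    W_E /= np.linalg.norm(W_E, axis=1, keepdims=True)
    return W_E @ W_E.T - np.eye(n_features)


def split_labels(x, M, scale=1.0):
    """Return (labels, relu_term, mixing_term) for a batch x of shape (batch, n_features).

    Assumed label form: ReLU(x) + scale * M x. Setting scale=0 removes the
    mixing matrix and leaves a pure ReLU target.
    """
    relu_term = np.maximum(x, 0.0)
    mixing_term = scale * (x @ M.T)
    return relu_term + mixing_term, relu_term, mixing_term


def top_eigen_subspace(M, k=50):
    """Orthonormal basis (n_features, k) for the k largest eigenvalues of M."""
    sym = 0.5 * (M + M.T)  # symmetrize to guard against round-off
    eigvals, eigvecs = np.linalg.eigh(sym)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]


def fraction_in_subspace(direction, basis):
    """Fraction of a vector's squared norm lying in the span of an orthonormal basis."""
    coeffs = basis.T @ direction
    return float(coeffs @ coeffs / (direction @ direction))


if __name__ == "__main__":
    M = make_mixing_matrix()
    x = np.random.default_rng(1).uniform(-1.0, 1.0, size=(4, 100))
    y, relu_term, mixing_term = split_labels(x, M, scale=1.0)
    basis = top_eigen_subspace(M, k=50)
    # A random direction is only a stand-in for a learned neuron direction.
    probe = np.random.default_rng(2).standard_normal(100)
    print("label shape:", y.shape)
    print("mixing-term RMS:", float(np.sqrt(np.mean(mixing_term**2))))
    print("fraction of probe in top-50 subspace:", fraction_in_subspace(probe, basis))
```

In a sketch like this, sweeping `scale` toward zero mimics removing the mixing matrix, and `fraction_in_subspace` is one possible way to quantify how strongly a neuron direction concentrates in the top-eigenvalue subspace.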

Original Abstract

We study whether the Compressed Computation (CC) toy model (Braun et al., 2025) is an instance of computation in superposition. The CC model appears to compute 100 ReLU functions with just 50 neurons, achieving a better loss than expected from only representing 50 ReLU functions. We show that the model mixes inputs via its noisy residual stream, corresponding to an unintended mixing matrix in the labels. Splitting the training objective into the ReLU term and the mixing term, we find that performance gains scale with the magnitude of the mixing matrix and vanish when the matrix is removed. The learned neuron directions concentrate in the subspace associated with the top 50 eigenvalues of the mixing matrix, suggesting that the mixing term governs the solution. Finally, a semi-non-negative m...

--- *Automatically collected on 2026-06-16*

#Paper #arXiv #ML #小凯
