[Paper] Beyond the Hard Budget: Sparsity Regularizers for More Interpretable T...

Paper Overview

Field: ML
Authors: Nathanael Jacquier, Maria Vakalopoulou, Mahdi S. Hosseini
Published: 2026-06-25
arXiv: 2606.27321

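Chinese Abstract (translated)

Sparse autoencoders (SAEs) have become a leading tool for interpreting the representations of vision foundation models, decomposing their polysemantic activations into a larger set of sparse, more monosemantic features. The Top-k SAE, the current standard variant, enforces sparsity architecturally through its activation function, keeping only the k most active latents per input. Because it was designed precisely to avoid the ℓ1 penalty used by earlier SAEs and its known drawbacks, it has never been combined with an explicit sparsity regularizer, even though it has limitations of its own: the budget k is fixed regardless of input complexity, and the model tends to overfit to the training value of k. We introduce two sparsity regularizers compatible with the Top-k architecture, both acting on the activations before Top-k selection: one applies an ℓ1 penalty to the unselected (off-support) latents, and the other uses a scale-invariant ℓ1/ℓ2 ratio penalty to concentrate the code on fewer effective latents. Both penalties are applied only to batch-active latents, that is, latents selected by the Top-k operator at least once within the batch. Experiments across two datasets, three vision foundation models, and a range of k values show that both regularizers consistently improve monosemanticity without sacrificing reconstruction quality. The ℓ1/ℓ2 penalty further concentrates information into fewer latents, making reconstruction more robust to the choice of k at inference time and improving small-budget linear probing. Our core finding is that hard architectural sparsity and soft sparsity regularization are complementary rather than mutually exclusive.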
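The two penalties described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the function names (`topk_mask`, `sparsity_penalties`), the use of absolute values on raw pre-activations, the averaging over the batch, and the `eps` stabilizer are all assumptions made for illustration.

```python
import numpy as np


def topk_mask(pre_acts, k):
    """Boolean mask marking the k largest pre-activations in each row."""
    # argpartition leaves the indices of the k largest entries in the last k slots
    idx = np.argpartition(pre_acts, -k, axis=1)[:, -k:]
    mask = np.zeros(pre_acts.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return mask


def sparsity_penalties(pre_acts, k, eps=1e-8):
    """Return (off_support_l1, l1_over_l2), both restricted to batch-active latents.

    pre_acts: array of shape (batch, n_latents) holding activations BEFORE Top-k.
    Assumption: penalties are averaged over the batch; the paper may normalize differently.
    """
    mask = topk_mask(pre_acts, k)

    # Batch-active latents: selected by Top-k for at least one input in the batch
    batch_active = mask.any(axis=0)
    z = pre_acts[:, batch_active]
    selected = mask[:, batch_active]

    # Off-support l1: absolute pre-activation mass on latents Top-k did NOT select
    off_support_l1 = (np.abs(z) * (~selected)).sum(axis=1).mean()

    # l1/l2 ratio: unchanged when z is rescaled, smaller when mass sits on fewer latents
    l1 = np.abs(z).sum(axis=1)
    l2 = np.sqrt((z ** 2).sum(axis=1))
    l1_over_l2 = (l1 / (l2 + eps)).mean()

    return off_support_l1, l1_over_l2


if __name__ == "__main__":
    # Toy batch: 2 inputs, 4 latents, budget k = 1
    x = np.array([[3.0, 1.0, 0.5, 0.0],
                  [0.0, 2.0, 4.0, 0.0]])
    off, ratio = sparsity_penalties(x, k=1)
    print("off-support l1:", off)
    print("l1/l2 ratio:", ratio)
```

In a training loop, either term would presumably be added to the reconstruction loss with a weighting coefficient; that coefficient and the choice of framework are left out here because the abstract does not specify them. Rescaling the input leaves the ℓ1/ℓ2 value essentially unchanged while the off-support ℓ1 value scales linearly, which is the scale-invariance property the abstract highlights.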

Original Abstract

Sparse autoencoders (SAEs) have become a leading tool for interpreting the representations of vision foundation models, decomposing their polysemantic activations into a larger set of sparse, more monosemantic features. The Top-k SAE, a now-standard variant, enforces sparsity architecturally through its activation function, retaining only the k most active latents per input. Because it was designed precisely to avoid the l1 penalty used by earlier SAEs and its known drawbacks, it has not been combined with an explicit sparsity regularizer, despite retaining limitations of its own, such as a budget k that is fixed regardless of input complexity and a tendency to overfit to the training value of k. We introduce two sparsity regularizers compatible with the Top-k architecture, both acting on ...

--- *Automatically collected on 2026-06-28*

#Paper #arXiv #ML #小凯
