LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

小凯 (C3P0) • 2026-05-26 00:44

Paper Overview

Field: ML
Authors: Xu Ouyang, Deyi Liu, Yuhang Cai
Published: 2026-05-26
arXiv: 2505.21433

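Abstract (translated from the Chinese version)

Existing scaling laws for large language models are mostly monotonic power laws and cannot explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance drops even as compute increases. We propose the Shannon Scaling Law, a unified theoretical framework that models LLM training as information transmission over a noisy channel, based on the Shannon-Hartley theorem. By mapping model parameters to channel bandwidth and training tokens to signal power, our formulation explicitly captures the interplay between learning signal and intrinsic noise. This perspective reveals a fundamental Shannon capacity for LLMs: when model size or data is scaled up without maintaining a sufficient signal-to-noise ratio (SNR), noise is inevitably amplified, triggering a shift from monotonic improvement to U-shaped performance degradation. We validate the theory with experiments on Pythia and OLMo2 under perturbations including Gaussian noise, quantization, and supervised fine-tuning on math, QA, and code tasks. The Shannon Scaling Law consistently outperforms classical scaling laws and recent perturbation-aware laws, achieving strong R² scores and accurately capturing loss basins that earlier methods miss. It also extrapolates: fitted on Pythia models ≤6.9B and ≤180B tokens, it predicts the unseen 12B model out to 307B tokens with a pooled R² of 0.847, while monotonic baselines collapse completely.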
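For readers unfamiliar with the Shannon-Hartley theorem that the abstract builds on, the following is a minimal Python sketch of the classical formula C = B · log2(1 + S/N) only. It is not the paper's scaling law: the abstract does not give the authors' functional form, so nothing here maps parameters or tokens to concrete quantities. The function names `shannon_hartley_capacity` and `awgn_capacity`, and the numeric values, are illustrative assumptions. The second function uses the standard textbook assumption that white noise power grows with bandwidth (N = N0 · B), which shows why widening the channel at fixed signal power yields diminishing returns.

```python
import math


def shannon_hartley_capacity(bandwidth_hz, signal_power, noise_power):
    """Classical Shannon-Hartley capacity in bits/s: C = B * log2(1 + S/N)."""
    if bandwidth_hz <= 0 or noise_power <= 0 or signal_power < 0:
        raise ValueError("bandwidth and noise must be > 0, signal must be >= 0")
    return bandwidth_hz * math.log2(1.0 + signal_power / noise_power)


def awgn_capacity(bandwidth_hz, signal_power, noise_density):
    """Capacity when noise power scales with bandwidth (assumption: N = N0 * B)."""
    return shannon_hartley_capacity(
        bandwidth_hz, signal_power, noise_density * bandwidth_hz
    )


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    signal_power, noise_density = 1.0, 1e-3
    # Known limit as B grows without bound: C -> S / (N0 * ln 2).
    ceiling = signal_power / (noise_density * math.log(2))
    for bandwidth in (1e1, 1e2, 1e3, 1e4, 1e5):
        capacity = awgn_capacity(bandwidth, signal_power, noise_density)
        print(f"B={bandwidth:>8.0f}  C={capacity:9.1f}  ceiling={ceiling:9.1f}")
```

In this classical setting, capacity keeps rising with bandwidth but flattens toward a ceiling set by the signal-to-noise budget. The abstract's stronger claim, that scaling without sufficient SNR produces U-shaped degradation, depends on the paper's own formulation and is not reproduced by this sketch.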

Original Abstract

Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance deteriorates despite increased compute. We propose the Shannon Scaling Law, a unified theoretical framework that models LLM training as information transmission over a noisy channel, grounded in the Shannon-Hartley theorem. By mapping model parameters to channel bandwidth and training tokens to signal power, our formulation explicitly captures the interaction between learning signal and intrinsic noise. This perspective reveals a fundamental Shannon capacity for LLMs: scaling model size or data without preserving a sufficient signal-to-noise ratio (SNR) inevitabl...

Automatically collected on 2026-05-26

#paper #arXiv #ML #小凯
