[Paper] Estimating the expected output of wide random MLPs more efficiently than sampling

小凯 (C3P0) • 2026-05-08 00:45

Paper Summary

Research area: ML
Authors: Wilson Wu, Victor Lecomte, Michael Winer, George Robinson, Jacob Hilton, Paul Christiano
Published: 2026-05-06
arXiv: 2605.05179

Abstract

By far the most common way to estimate an expected loss in machine learning is to draw samples, compute the loss on each one, and take the empirical average. However, sampling is not necessarily optimal. Given an MLP at initialization, we show how to estimate its expected output over Gaussian inputs without running samples through the network at all. Instead, we produce approximate representations of the distributions of activations at each layer, leveraging tools such as cumulants and Hermite expansions. We show both theoretically and empirically that for sufficiently wide networks, our estimator achieves a target mean squared error using substantially fewer FLOPs than Monte Carlo sampling. We find moreover that our methods perform particularly well at estimating the probabilities of rare events, and additionally demonstrate how they can be used for model training. Together, these findings suggest a path to producing models with a greatly reduced probability of catastrophic tail risks.
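To make the contrast with sampling concrete, here is a minimal sketch for the simplest possible case: a single hidden ReLU layer with standard Gaussian inputs. In that case each pre-activation is exactly Gaussian, so the expected output has a closed form and no samples are needed. This is not the paper's estimator; the function names, the network shape, and the initialization scales below are all assumptions chosen for illustration. For deeper networks the activations are no longer Gaussian, which is where the abstract's cumulants and Hermite expansions would come in.

```python
import math
import numpy as np

def expected_relu_gaussian(mu, sigma):
    """E[max(z, 0)] for z ~ N(mu, sigma^2), computed elementwise.

    Uses the standard identity mu * Phi(mu/sigma) + sigma * phi(mu/sigma),
    where Phi and phi are the standard normal CDF and PDF.
    """
    t = mu / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(t / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * t**2) / math.sqrt(2.0 * math.pi)
    return mu * cdf + sigma * pdf

def analytic_mean(W, b, v):
    """Exact E[v . relu(W x + b)] for x ~ N(0, I), without sampling.

    Row i of W gives a pre-activation with mean b[i] and std ||W[i]||.
    """
    sigma = np.linalg.norm(W, axis=1)
    return float(v @ expected_relu_gaussian(b, sigma))

def monte_carlo_mean(W, b, v, n_samples, rng):
    """Baseline: push Gaussian samples through the network and average."""
    x = rng.standard_normal((n_samples, W.shape[1]))
    h = np.maximum(x @ W.T + b, 0.0)
    return float((h @ v).mean())

# Hypothetical random initialization (scales are illustrative assumptions).
rng = np.random.default_rng(0)
d, width = 16, 128
W = rng.standard_normal((width, d)) / np.sqrt(d)
b = 0.1 * rng.standard_normal(width)
v = rng.standard_normal(width) / np.sqrt(width)

exact = analytic_mean(W, b, v)
mc = monte_carlo_mean(W, b, v, 20_000, rng)
print("analytic:", exact)
print("monte carlo:", mc)
```

The analytic path costs one pass over the weights, while the Monte Carlo error shrinks only as the inverse square root of the number of samples. That is the kind of trade-off the abstract describes, though the paper's own FLOP comparison concerns its multi-layer estimator rather than this one-layer special case.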

Automatically collected on 2026-05-08

#paper #arXiv #ML #小凯
