Paper Summary
Research area: ML · Authors: Yury Gorishniy, Ivan Rubachev, Dmitrii Feoktistov · Published: 2025-04-17 · arXiv: 2504.13081
Summary (translated)
MLP is a heavily used backbone in modern deep learning architectures for supervised learning on tabular data, and AdamW is the go-to optimizer for training tabular DL models. Unlike architecture design, however, the choice of optimizer for tabular DL has not been studied systematically, even though new optimizers show promise in other domains. To fill this gap, we benchmark N optimizers on N datasets for training MLP-based models in the standard supervised learning setting under a shared experiment protocol. Our main finding is that the Muon optimizer consistently outperforms AdamW and should therefore be a strong and practical choice for practitioners and researchers, provided the associated training-efficiency overhead is affordable. Additionally, we find that an exponential moving average of model weights is a simple yet effective technique that improves AdamW's performance on a plain MLP, although its effect is less consistent across model variants.
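The weight-EMA technique mentioned in the summary can be sketched in a few lines. The helper name and decay value below are illustrative assumptions, not taken from the paper; the idea is simply to maintain a slow-moving average of the trained weights and evaluate with that average instead of the raw weights:

```python
import numpy as np

def ema_update(ema_weights, weights, decay=0.999):
    """One EMA step over a dict of parameter arrays (hypothetical helper):
    ema <- decay * ema + (1 - decay) * current. Applied after each
    optimizer step; the EMA copy is used for evaluation."""
    for name in ema_weights:
        ema_weights[name] = decay * ema_weights[name] + (1.0 - decay) * weights[name]
    return ema_weights

# Toy usage: the EMA copy trails the live weights and smooths out noise.
weights = {"w": np.ones(4)}
ema = {name: np.zeros_like(p) for name, p in weights.items()}
for _ in range(50):
    # (a real loop would take an optimizer step on `weights` here)
    ema = ema_update(ema, weights, decay=0.9)
```

With `decay=0.9` and constant weights of 1, the EMA converges geometrically toward 1 (after 50 steps it is within about 0.5%).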
Original Abstract
MLP is a heavily used backbone in modern deep learning (DL) architectures for supervised learning on tabular data, and AdamW is the go-to optimizer used to train tabular DL models. Unlike architecture design, however, the choice of optimizer for tabular DL has not been examined systematically, despite new optimizers showing promise in other domains. To fill this gap, we benchmark N optimizers on N tabular datasets for training MLP-based models in the standard supervised learning setting under a shared experiment protocol. Our main finding is that the Muon optimizer consistently outperforms AdamW, and thus should be considered a strong and practical choice for practitioners and researchers, if the associated training efficiency overhead is affordable. Additionally, we find exponential moving average of model weights to be a simple yet effective technique that improves the performance of AdamW on a plain MLP, though its effect is less consistent across model variants.
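For context, the core of the Muon optimizer (per its open-source reference implementation, not detailed in this abstract) replaces the raw momentum update of each 2D weight matrix with an approximately orthogonalized version, computed by a quintic Newton-Schulz iteration. A minimal numpy sketch, where the coefficients and hyperparameters are assumptions taken from that public implementation rather than from the paper:

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately orthogonalize G: drive all singular values toward 1.
    Coefficients are from the public Muon reference code (an assumption here)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + 1e-7)  # Frobenius-normalize so singular values <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:                      # iterate on the "wide" orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(W, grad, momentum, lr=0.02, beta=0.95):
    """One illustrative Muon-style update for a single 2D weight matrix."""
    momentum = beta * momentum + grad
    W = W - lr * newton_schulz_orthogonalize(momentum)
    return W, momentum
```

In practice Muon is applied only to hidden 2D weight matrices, with embeddings, biases, and output heads handled by AdamW; the extra matrix multiplications in the iteration are the "training efficiency overhead" the abstract refers to.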
--- *Collected automatically on 2026-04-18*
#paper #arXiv #ML #小凯