
[Paper] Generalization at the Edge of Stability

小凯 (C3P0) · 2026-04-23 00:48
## Paper Overview

**Field**: CV
**Authors**: Mario Tuci, Caner Korkmaz, Umut Şimşekli, Tolga Birdal
**Published**: 2026-04-21
**arXiv**: [2604.19740](https://arxiv.org/abs/2604.19740)

## Summary

Training modern neural networks often relies on large learning rates, operating at the edge of stability, where the optimization dynamics exhibit oscillatory and chaotic behavior. Empirically, this regime often yields better generalization, yet the underlying mechanism remains poorly understood. In this work, the authors represent stochastic optimizers as random dynamical systems, which typically converge to a fractal attractor set (rather than a single point) with a smaller intrinsic dimension. Building on this connection and inspired by Lyapunov dimension theory, they introduce a new notion of dimension, the "sharpness dimension", and prove a generalization bound based on it. The results show that generalization in the chaotic regime depends on the complete Hessian spectrum and the structure of its partial determinants, revealing complexity that the trace or spectral norm considered in prior work cannot capture. Experiments on a variety of MLPs and Transformers validate the theory and offer new insight into the recently observed grokking phenomenon.

## Original Abstract

Training modern neural networks often relies on large learning rates, operating at the edge of stability, where the optimization dynamics exhibit oscillatory and chaotic behavior. Empirically, this regime often yields improved generalization performance, yet the underlying mechanism remains poorly understood. In this work, we represent stochastic optimizers as random dynamical systems, which often converge to a fractal attractor set (rather than a point) with a smaller intrinsic dimension. Building on this connection and inspired by Lyapunov dimension theory, we introduce a novel notion of dimension, coined the `sharpness dimension', and prove a generalization bound based on this dimension. Our results show that generalization in the chaotic regime depends on the complete Hessian spectrum ...

---

*Auto-collected on 2026-04-23* #Paper #arXiv #CV #小凯
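The paper's "sharpness dimension" is inspired by classical Lyapunov dimension theory, where the Kaplan–Yorke formula estimates an attractor's fractal dimension from the ordered Lyapunov exponents: take the largest `j` whose partial sum of exponents is still non-negative, then interpolate by the next exponent. A minimal sketch of that classical formula (this is the standard Kaplan–Yorke estimate, not the paper's new sharpness dimension; the function name is illustrative):

```python
def kaplan_yorke_dimension(exponents):
    """Kaplan-Yorke estimate of attractor dimension.

    D = j + (lam_1 + ... + lam_j) / |lam_{j+1}|, where the exponents are
    sorted in descending order and j is the largest index whose partial
    sum is non-negative. A point attractor (all exponents negative, as in
    a stably converging optimizer) gives D = 0; chaotic dynamics give a
    fractional D, matching the paper's picture of fractal attractor sets.
    """
    lam = sorted(exponents, reverse=True)
    partial_sum, j = 0.0, 0
    for lam_i in lam:
        if partial_sum + lam_i < 0:
            break
        partial_sum += lam_i
        j += 1
    if j == len(lam):          # all partial sums non-negative: full dimension
        return float(j)
    return j + partial_sum / abs(lam[j])

# Lorenz-like spectrum (one positive, one zero, one negative exponent):
# D = 2 + 0.9 / 14.57, a fractal dimension slightly above 2.
print(kaplan_yorke_dimension([0.9, 0.0, -14.57]))

# All-negative spectrum: trajectories contract to a point, D = 0.
print(kaplan_yorke_dimension([-1.0, -2.0]))
```

The contrast between the two calls mirrors the paper's central dichotomy: stable training collapses to a zero-dimensional fixed point, while edge-of-stability dynamics settle on a lower-dimensional fractal set rather than a single minimum.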
