[Paper] Accelerated Decentralized Stochastic Gradient Descent for Strongly Con...

Paper Overview

Field: ML | Authors: Ming Sun, Kun Yuan | Published: 2025-06-11 | arXiv: 2506.08635

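Abstract (translated from Chinese)

Decentralized stochastic optimization is a fundamental paradigm for large-scale learning over networks, where agents communicate only with their neighbors and no central coordinator is required. For strongly convex problems, communication efficiency is mainly determined by the condition number κ = L/μ and the network spectral gap 1-β. Although deterministic decentralized methods can simultaneously achieve the accelerated √κ and 1/√(1-β) dependences, no existing stochastic method attains both improvements at once. This paper proposes Multi-Gossip Accelerated DSGD (MG-ADSGD), a decentralized stochastic algorithm that combines Nesterov-type primal-dual extrapolation with multi-round fast gossip averaging. The key idea is to couple the gossip depth with the mini-batch size, so that additional communication rounds simultaneously improve consensus accuracy and reduce gradient variance. The authors prove that MG-ADSGD has communication complexity Õ(σ²/(μnε)·log(1/ε) + √(κ/(1-β))·log(1/ε)), where ε is the target accuracy, n is the number of nodes, and σ² is the gradient variance. To the best of the authors' knowledge, this is the best communication complexity currently attainable for decentralized stochastic strongly convex optimization (ignoring logarithmic factors that do not depend on ε).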
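To get a feel for how the two terms of the stated bound trade off, the sketch below evaluates them numerically. It is illustrative only: the constants and logarithmic factors hidden by Õ are set to 1 (an assumption), the function name `comm_complexity_terms` is hypothetical, and the parameter values in the usage lines are made up rather than taken from the paper.

```python
import math


def comm_complexity_terms(sigma2, mu, L, n, beta, eps):
    """Evaluate the two terms of the stated communication bound.

    Assumption: every constant hidden by the O-tilde notation is set to 1,
    so only the relative size of the two terms is meaningful.
    """
    kappa = L / mu                       # condition number kappa = L / mu
    log_term = math.log(1.0 / eps)       # the log(1/eps) factor shared by both terms
    # Statistical term: sigma^2 / (mu * n * eps) * log(1/eps)
    stochastic = sigma2 / (mu * n * eps) * log_term
    # Accelerated term: sqrt(kappa / (1 - beta)) * log(1/eps)
    deterministic = math.sqrt(kappa / (1.0 - beta)) * log_term
    return stochastic, deterministic


# Hypothetical setting: 10 nodes, kappa = 100, spectral gap 1 - beta = 0.01.
s_term, d_term = comm_complexity_terms(
    sigma2=1.0, mu=1.0, L=100.0, n=10, beta=0.99, eps=1e-2
)
print(f"stochastic term: {s_term:.2f}, accelerated term: {d_term:.2f}")
```

As ε shrinks, the first term grows like 1/ε and eventually dominates, while the second term depends on the problem and the network only through √(κ/(1-β)).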

Original Abstract

Decentralized stochastic optimization is a fundamental paradigm for large-scale learning over networks, where agents communicate only with their neighbors and no central coordinator is required. For strongly convex problems, communication efficiency is mainly determined by the condition number κ = L/μ and the network spectral gap 1-β. Although deterministic decentralized methods can simultaneously achieve accelerated √κ and 1/√(1-β) dependences, no existing stochastic method attains both improvements at once. In this paper, we propose Multi-Gossip Accelerated DSGD (MG-ADSGD), a decentralized stochastic algorithm that combines Nesterov-type primal-dual extrapolation with multi-round fast gossip averaging. The key idea is to couple the gossip depth with the mini-batch size so that additional ...
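The role of multi-round gossip can be illustrated with a minimal sketch. It is not the paper's algorithm: it uses plain repeated gossip on a ring with equal weights of 1/3 (an assumption) instead of the fast gossip scheme the abstract mentions, and the helper names `ring_gossip` and `consensus_error` are hypothetical. It only shows that more rounds bring the agents closer to the network average while leaving that average unchanged.

```python
def ring_gossip(values, rounds):
    """Run `rounds` of plain gossip averaging on a ring topology.

    Assumption: each agent mixes its own value with its two ring neighbours
    using equal weights 1/3, which gives a symmetric, doubly stochastic
    mixing matrix. This is ordinary gossip, not an accelerated variant.
    """
    n = len(values)
    x = list(values)
    for _ in range(rounds):
        # Every agent only reads the values held by its immediate neighbours.
        x = [(x[(i - 1) % n] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]
    return x


def consensus_error(x):
    """Largest deviation of any agent's value from the network average."""
    mean = sum(x) / len(x)
    return max(abs(v - mean) for v in x)


# Eight agents start with different local values; compare gossip depths.
start = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
for depth in (0, 5, 20):
    print(depth, consensus_error(ring_gossip(start, depth)))
```

The rate at which the error shrinks per round is governed by the second-largest eigenvalue magnitude β of the mixing matrix, which is why the spectral gap 1-β appears in the communication complexity.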

--- *Automatically collected on 2026-06-09*

#Paper #arXiv #ML #小凯
