Paper Overview
Research area: NLP
Authors: Songhao Wu, Ang Lv, Ruobing Xie, Yankai Lin
Published: 2026-06-10
arXiv: 2606.12397
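Chinese Abstract (translated)
The router is the cornerstone component of Mixture-of-Experts (MoE) models. Acting as proxies for the experts, the rows of the router matrix compute their similarity to the MoE input to determine which subset of experts is activated. Ideally, each router row condenses its expert's matrix into a single representative vector, so that the row's dot product with a token better reflects token-expert affinity. However, no design principle enforces this condensation. In this paper, we propose aligning each router row with the principal singular direction of its associated expert, because this direction provides the most expressive mathematical description of a matrix. Based on this principle, we redesign the router with Manifold Power Iteration (MPI). Specifically, MPI introduces a Power-then-Retract paradigm: a power-iteration step is applied to the router weights, followed by a retraction that enforces a norm constraint, which ensures efficiency and stability. Theoretically, we prove that MPI drives each router row to converge to the principal singular direction of its associated expert. Empirically, we pretrain MoE models at scales from 1B to 11B parameters and confirm that this alignment yields more effective MoE models.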
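The Power-then-Retract idea can be illustrated with a small plain-Python sketch. This is not the authors' implementation; it rests on our own assumptions: each expert is represented by one matrix `W` of shape (d_ff, d) (for example, its input projection), the router row lives in the d-dimensional token space so its target is the top right singular vector of `W`, the power step multiplies by WᵀW, and the retraction rescales the row to a fixed norm `radius`. The helper names (`power_then_retract`, `mpi_router`, `radius`) are hypothetical, and how MPI is interleaved with gradient-based training is not described in the abstract.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major; an expert matrix has shape (d_ff, d)


def matvec(W: Matrix, v: Vector) -> Vector:
    """Compute W @ v, mapping a d-dim vector to d_ff dims."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matvec_t(W: Matrix, u: Vector) -> Vector:
    """Compute W^T @ u, mapping a d_ff-dim vector back to d dims."""
    d = len(W[0])
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(d)]


def retract(v: Vector, radius: float = 1.0) -> Vector:
    """Rescale v onto the sphere of the given radius (the assumed norm constraint)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [radius * x / norm for x in v]


def power_then_retract(r: Vector, W: Matrix, radius: float = 1.0) -> Vector:
    """Hypothetical MPI step: one power-iteration step with W^T W, then retraction."""
    return retract(matvec_t(W, matvec(W, r)), radius)


def mpi_router(router: Matrix, experts: List[Matrix], steps: int,
               radius: float = 1.0) -> Matrix:
    """Apply `steps` Power-then-Retract updates to each router row with its own expert."""
    new_rows = []
    for r, W in zip(router, experts):
        for _ in range(steps):
            r = power_then_retract(r, W, radius)
        new_rows.append(r)
    return new_rows


if __name__ == "__main__":
    experts = [
        [[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]],  # top right singular vector: [1, 0]
        [[1.0, 2.0], [2.0, 1.0]],              # top right singular vector: [1, 1] / sqrt(2)
    ]
    router = [[1.0, 1.0], [1.0, 0.0]]
    print(mpi_router(router, experts, steps=20))
```

Running it prints two rows close to [1, 0] and [0.707, 0.707], the top right singular vectors of the two toy experts. Because the retraction fixes the row norm, only the row's direction changes from step to step, which is what lets the dot product with a token act as a direction-based affinity score in this sketch.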
Original Abstract
The router is the cornerstone component of Mixture-of-Experts models. Serving as expert proxies, the rows of the router matrix compute their similarity to the MoE inputs to determine which subset of experts is activated. Ideally, each router row is designed to encode the expert matrix into a representative vector, such that its dot product with a token better reflects token-expert affinity. However, no design principle exists to enforce this condensation. In this paper, we propose to align each router row with the principal singular direction of the associated expert, as this direction provides the most expressive mathematical description of a matrix. Based on this principle, we propose a router redesign with Manifold Power Iteration (MPI). Specifically, it introduces a 'Power-t...
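For reference, the standard linear-algebra facts behind "principal singular direction" and "most expressive description" are summarized below. The notation is ours, not the paper's, and it assumes the router row is compared against an expert matrix W in the token space, with a retraction radius c and a strict gap σ₁ > σ₂: the top right singular vector v₁ maximizes ‖Wv‖ over unit vectors, σ₁u₁v₁ᵀ is the best rank-1 approximation of W (Eckart–Young), and a normalized power step converges to ±c·v₁ whenever the initial row has a nonzero component along v₁.

```latex
W = \sum_{k} \sigma_k\, u_k v_k^\top, \qquad \sigma_1 > \sigma_2 \ge \cdots \ge 0

v_1 = \arg\max_{\|v\|_2 = 1} \|W v\|_2, \qquad
\sigma_1 u_1 v_1^\top = \arg\min_{\operatorname{rank}(A) \le 1} \|W - A\|_F \quad \text{(Eckart--Young)}

r^{(t+1)} = c\,\frac{W^\top W\, r^{(t)}}{\bigl\|W^\top W\, r^{(t)}\bigr\|_2}
\;\Longrightarrow\;
r^{(t)} \to \operatorname{sign}\!\bigl(v_1^\top r^{(0)}\bigr)\, c\, v_1,
\qquad \tan\angle\bigl(r^{(t)}, v_1\bigr) = O\!\bigl((\sigma_2/\sigma_1)^{2t}\bigr)
```

The exponent 2t appears because each step multiplies by WᵀW, whose eigenvalues are the squared singular values σₖ².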
Automatically collected on 2026-06-12
#Paper #arXiv #NLP #小凯