Paper Overview
Research area: ML
Authors: Fatema Siddika, Md Anwar Hossen, Tanwi Mallick
Published: 2025-06-11
arXiv: 2506.08637
Abstract (Chinese version, translated)
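Continual learning in large language models is hindered by the plasticity-stability dilemma: acquiring new capabilities often causes catastrophic forgetting of previously learned knowledge. Existing methods typically treat all parameters uniformly and cannot distinguish task-specific knowledge from shared capabilities. This paper proposes Mixture of Sparse Experts for Task Agnostic Continual Learning (SETA). SETA resolves the plasticity-stability conflict by adaptively decomposing the model into sparse subspaces that serve as task-specific expert modules. In a standard update, all tasks compete for the same parameters. SETA instead separates knowledge into unique experts, which isolate task-specific patterns, and shared experts, which capture common features. Two mechanisms maintain this structure: adaptive elastic anchoring and routing-aware regularization. Together they protect shared knowledge at both the weight level and the routing level, and they let a single unified gating network retrieve the correct combination of experts automatically at inference time. Experiments on cross-domain benchmarks show that SETA outperforms existing continual-learning baselines overall. On LLaMA-2 7B and Qwen3-4B it shows particularly strong retention of early-task knowledge and improved backward transfer.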
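The summary says shared knowledge is protected "at both the weight level and the routing level". The abstract does not give SETA's formulas, so the code below is only a minimal sketch of what such a two-level regularizer can look like, not the paper's method. It makes two assumptions. First, the weight-level term is an EWC-style quadratic anchor, (λ/2)·Σᵢ Fᵢ(θᵢ − θᵢ*)², where Fᵢ is a per-parameter importance score. Second, the routing-level term is a KL divergence between the old and new gate distributions over experts. All function names and the `anchor_strength`/`route_strength` weights are hypothetical.

```python
import math


def elastic_anchor_penalty(params, anchors, importance, strength):
    """Hypothetical EWC-style anchor: strength/2 * sum_i F_i * (theta_i - theta*_i)^2.

    Parameters with high importance F_i are pulled strongly back toward the
    values (anchors) they had after earlier tasks; unimportant ones stay plastic.
    """
    return 0.5 * strength * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchors, importance)
    )


def routing_kl(old_probs, new_probs, eps=1e-12):
    """KL(old || new) between gate distributions over experts for one input.

    Penalizes the gate for routing an old input to a different expert mix
    than it used before (assumed form of a routing-aware regularizer).
    """
    return sum(p * math.log((p + eps) / (q + eps)) for p, q in zip(old_probs, new_probs))


def total_loss(task_loss, params, anchors, importance, old_gate, new_gate,
               anchor_strength=1.0, route_strength=1.0):
    """Task loss plus the weight-level and routing-level stability terms."""
    return (task_loss
            + elastic_anchor_penalty(params, anchors, importance, anchor_strength)
            + route_strength * routing_kl(old_gate, new_gate))


# Usage: the second parameter drifted by 1.0 but has low importance (0.5).
loss = total_loss(1.0, [1.0, 2.0], [1.0, 1.0], [5.0, 0.5],
                  [0.5, 0.5], [0.5, 0.5], anchor_strength=2.0)
print(loss)  # 1.5 = task loss 1.0 + anchor 0.5 + routing 0.0
```

The importance weights decide where plasticity is allowed. A parameter that matters to earlier tasks becomes expensive to move, while low-importance parameters remain free to adapt to the new task.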
Original Abstract
Continual learning in Large Language Models (LLMs) is hindered by the plasticity-stability dilemma, where acquiring new capabilities often leads to catastrophic forgetting of previous knowledge. Existing methods typically treat parameters uniformly, failing to distinguish between specific task knowledge and shared capabilities. We introduce Mixture of Sparse Experts for Task Agnostic Continual Learning (SETA), a framework that resolves the plasticity-stability conflict through adaptive sparse subspace decomposition into task-specific expert modules. Unlike standard updates, where tasks compete for the same parameters, SETA separates knowledge into unique experts, designed to isolate task-specific patterns, and shared experts, responsible for capturing common features. This structure is mai...
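The split into shared experts and unique experts selected by a gate can be shown with a toy forward pass. This is a hypothetical sketch, not SETA's architecture. Its assumptions are: each expert is a linear map to a scalar, shared experts are always active, and a softmax gate with top-k selection picks among the unique experts. The names `moe_forward`, `shared_experts`, `unique_experts`, `gate_weights` and `top_k` are illustrative only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def moe_forward(x, shared_experts, unique_experts, gate_weights, top_k=1):
    """Toy mixture: shared experts always contribute; unique experts are gated.

    Returns the combined output and the indices of the selected unique experts.
    """
    # Shared experts: common features, applied to every input.
    shared_out = sum(dot(w, x) for w in shared_experts)
    # Gate: one logit per unique expert, softmax, keep the top-k.
    probs = softmax([dot(g, x) for g in gate_weights])
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)  # renormalize over the selected experts
    routed_out = sum(probs[i] / norm * dot(unique_experts[i], x) for i in top)
    return shared_out + routed_out, top


shared = [[0.5, 0.5]]
unique = [[3.0, 0.0], [0.0, -1.0]]  # one expert per (toy) task
gate = [[2.0, 0.0], [0.0, 2.0]]
print(moe_forward([1.0, 0.0], shared, unique, gate))  # (3.5, [0])
print(moe_forward([0.0, 1.0], shared, unique, gate))  # (-0.5, [1])
```

Each input is routed to the unique expert that matches it, and the shared expert contributes to both. No task label is supplied at inference time, which is the sense in which the gate makes the mixture task-agnostic.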
Automatically collected on 2026-06-09
#Paper #arXiv #ML #小凯