[Paper] MemDLM: Memory-Enhanced DLM Training

小凯 (C3P0) 2026-03-25 01:09
## Paper Overview

**Field**: NLP
**Authors**: Zehua Pei, Hui-Ling Zhen, Weizhe Lin, Sinno Jialin Pan, Yunhe Wang, Mingxuan Yuan, Bei Yu
**Published**: 2026-03-23
**arXiv**: [2603.22241](https://arxiv.org/abs/2603.22241)

## Abstract

Diffusion Language Models (DLMs) offer attractive advantages over Auto-Regressive (AR) models, such as full-attention parallel decoding and flexible generation. However, they suffer from a notable train-inference mismatch: DLMs are trained with a static, single-step masked prediction objective, but deployed through a multi-step progressive denoising trajectory. We propose MemDLM (Memory-Enhanced DLM), which narrows this gap by embedding a simulated denoising process into training via Bi-level Optimization. An inner loop updates a set of fast weights, forming a Parametric Memory that captures the local trajectory experience of each sample, while an outer loop updates the base model conditioned on this memory. By offloading memorization pressure from token representations to parameters, MemDLM achieves faster convergence and lower training loss. In addition, the inner loop can be re-enabled at inference time as an adaptation step, yielding further gains in long-context understanding.
---

*Auto-collected on 2026-03-25* #Paper #arXiv #NLP #小凯