Paper Overview
Research area: NLP
Authors: Aidar Myrzakhan, Tianyi Li, Bowei Guo, Shengkun Tang, Zhiqiang Shen
Published: 2026-02-19
arXiv: 2602.17664
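Summary (translated from Chinese)
Diffusion language models (DLMs) incur high inference cost because of iterative denoising, so efficient pruning is needed. Existing pruning heuristics are mostly inherited from autoregressive (AR) LLMs and usually preserve attention sink tokens, because sinks in AR models act as stable global anchors. This paper finds that the assumption does not hold for DLMs: over the full generation trajectory, the attention-sink position shows substantially higher variance (measured by how the dominant sink positions move across timesteps), indicating that sinks in DLMs are often transient and structurally less critical than in AR models. Based on this finding, the authors propose Sink-Aware Pruning, which automatically identifies and prunes unstable sinks in DLMs. Without retraining, the method achieves a better quality-efficiency trade-off under matched compute and outperforms strong prior baselines.
Original Abstract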
Diffusion Language Models (DLMs) incur high inference cost due to iterative denoising, motivating efficient pruning. Existing pruning heuristics, largely inherited from autoregressive (AR) LLMs, typically preserve attention sink tokens because AR sinks serve as stable global anchors. We show that this assumption does not hold for DLMs: the attention-sink position exhibits substantially higher variance over the full generation trajectory (measured by how the dominant sink locations shift across timesteps), indicating that sinks are often transient and less structurally essential than in AR models. Based on this observation, we propose Sink-Aware Pruning, which automatically identifies and prunes unstable sinks in DLMs (prior studies usually keep sinks for AR LLMs). Without retraining, our method achieves a better quality-efficiency trade-off and outperforms strong prior pruning baselines under matched compute.
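The abstract states the measurement in a single clause, so the snippet below is only one possible reading of it, not the paper's actual metric. It assumes the "dominant sink" at a timestep is the key position receiving the most total attention, and it uses the population variance of that index across timesteps as the instability score. The function names (`dominant_sink`, `sink_position_variance`) and the toy attention maps are hypothetical and made up for illustration.

```python
from statistics import pvariance


def dominant_sink(attn):
    """Return the key position that receives the most total attention.

    attn is a list of rows; attn[q][k] is the weight query q places on key k.
    Treating the argmax of received attention as "the sink" is an assumption.
    """
    n_keys = len(attn[0])
    received = [sum(row[k] for row in attn) for k in range(n_keys)]
    return max(range(n_keys), key=received.__getitem__)


def sink_position_variance(attn_per_step):
    """Dominant sink index per timestep, plus the variance of those indices."""
    positions = [dominant_sink(attn) for attn in attn_per_step]
    return positions, pvariance(positions)


# Toy attention maps (invented): 3 tokens, 3 denoising timesteps.
stable = [[[0.8, 0.1, 0.1]] * 3] * 3      # sink stays at position 0
drifting = [
    [[0.8, 0.1, 0.1]] * 3,                # sink at position 0
    [[0.1, 0.1, 0.8]] * 3,                # sink at position 2
    [[0.1, 0.8, 0.1]] * 3,                # sink at position 1
]

stable_positions, stable_var = sink_position_variance(stable)
drift_positions, drift_var = sink_position_variance(drifting)
print(stable_positions, stable_var)
print(drift_positions, drift_var)
```

Under this reading, a sink that stays put scores zero and a sink that moves every step scores higher, which is the contrast the abstract draws between AR models and DLMs. How the paper turns such a score into a pruning decision is not described in the abstract and is not sketched here.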
Automatically collected on 2026-06-24
#paper #arXiv #NLP #小凯