Paper Summary
Research area: NLP Authors: Wenxuan Ye, Yangyang Zhang, Xueli An Published: 2025-04-30 arXiv: 2504.20801
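Abstract (translated from Chinese)
Small language models (SLMs) are computationally efficient and scale well in deployment, but their reasoning ability often lags behind that of large language models (LLMs). To narrow this gap, current approaches call an LLM to generate tokens at points of reasoning divergence, but these external calls add substantial latency and cost. The alternative, standard distillation, is limited by capacity: SLMs struggle to accurately mimic the LLM's complex generative distribution. We resolve this dilemma by identifying "local sufficiency": at divergence points, the LLM's preferred token consistently lies within the SLM's top-K next-token predictions, even when it is not the SLM's top-1 choice. We therefore propose SELECT TO THINK (S2T), which reframes the LLM's role from open-ended generation to selection among the SLM's candidates, reducing the supervision signal to a discrete ranking over candidates. Building on this, we introduce S2T-LOCAL, which distills the selection logic into the SLM so that it can re-rank autonomously at inference time without relying on an LLM. Experiments show that the top-8 candidates of a 1.5B SLM capture the 32B LLM's choice with a 95% hit rate. Translating this potential into performance, S2T-LOCAL improves greedy decoding by an average of 24.1% across benchmarks, matching 8-path self-consistency at the compute cost of a single trajectory.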
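To make the "local sufficiency" idea concrete, here is a minimal sketch in Python. It is not the paper's implementation: the function names (`top_k_candidates`, `hit_rate`, `select_token`), the toy token scores, and the `selector` callback (a stand-in for either the LLM or a distilled selection head) are all assumptions made for illustration. It shows two things: how a top-K hit rate could be measured, and how decoding changes when a selector picks among the SLM's own candidates instead of generating freely.

```python
from typing import Callable, Dict, List, Sequence


def top_k_candidates(slm_scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring tokens under the SLM, best first."""
    return sorted(slm_scores, key=slm_scores.get, reverse=True)[:k]


def hit_rate(
    slm_steps: Sequence[Dict[str, float]],
    llm_choices: Sequence[str],
    k: int,
) -> float:
    """Fraction of steps where the LLM's preferred token is in the SLM's top-k."""
    hits = sum(
        choice in top_k_candidates(scores, k)
        for scores, choice in zip(slm_steps, llm_choices)
    )
    return hits / len(llm_choices)


def select_token(
    slm_scores: Dict[str, float],
    k: int,
    selector: Callable[[List[str]], int],
) -> str:
    """Pick a token by letting `selector` choose an index among the top-k candidates.

    `selector` is hypothetical: it could wrap an LLM call (S2T-style selection)
    or a selection head distilled into the SLM (S2T-LOCAL-style re-ranking).
    """
    candidates = top_k_candidates(slm_scores, k)
    return candidates[selector(candidates)]


# Toy data (invented for illustration): SLM scores at two divergence points
# and the token a larger model would have preferred at each point.
steps = [
    {"so": 0.5, "thus": 0.3, "but": 0.2},
    {"4": 0.6, "5": 0.3, "6": 0.1},
]
llm_choices = ["thus", "6"]

print(hit_rate(steps, llm_choices, k=2))  # 0.5: "thus" is in the top-2, "6" is not
print(hit_rate(steps, llm_choices, k=3))  # 1.0: both preferred tokens are in the top-3

# Greedy decoding would emit "so"; a selector can recover "thus" from the top-2.
print(select_token(steps[0], k=2, selector=lambda cands: cands.index("thus")))
```

The point of the sketch is that the selector only returns an index into a short candidate list, which is the "discrete ranking over candidates" supervision signal the abstract describes, rather than a full distribution over the vocabulary.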
Original Abstract
Small language models (SLMs) offer computational efficiency for scalable deployment, yet they often fall short of the reasoning power exhibited by their larger counterparts (LLMs). To mitigate this gap, current approaches invoke an LLM to generate tokens at points of reasoning divergence, but these external calls introduce substantial latency and costs. Alternatively, standard distillation is often hindered by the capacity limitation, as SLMs struggle to accurately mimic the LLM's complex generative distribution. We address this dilemma by identifying local sufficiency: at divergence points, the LLM's preferred token consistently resides within the SLM's top-K next-token predictions, even when failing to emerge as the SLM top-1 choice. We therefore propose SELECT TO THINK (S2T), which refr...
--- *Automatically collected on 2026-05-01*
#paper #arXiv #NLP #小凯