[Paper] A Unifying Lens on Supervised Fine-Tuning Through Target Distribution ...
Paper Overview
Research area: NLP Authors: Tong Xie, Yuanhao Ban, Yunqi Hong, Sohyun An, Yihang Chen, Cho-Jui Hsieh Published: 2026-06-09 arXiv: 2606.11189
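Summary
Supervised fine-tuning (SFT) typically maximizes the likelihood of every token in a demonstrated trajectory, but an observed token can be non-unique, noisy, or misaligned with the model prior. This paper reinterprets SFT as target distribution design: it analyzes the token-level target that the loss drives the model to match. It proposes the Q-target framework, which decomposes SFT supervision into two choices: how strongly to rely on the observed token, and how to allocate the remaining probability mass over alternatives. The framework unifies many existing SFT variants as implicit choices of the target distribution Q. The proposed Target-SFT consistently outperforms baselines across 10 reasoning dataset-model settings.
Original Abstract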
Supervised fine-tuning (SFT) typically maximizes the likelihood of every token in a demonstrated trajectory. However, an observed token can be non-unique, noisy, or misaligned with the model prior. Strictly fitting toward this one-hot target may be suboptimal, especially when the pretrained model encodes a rich knowledge prior. In this work, we reinterpret SFT as target distribution design: instead of studying only the loss objective, we analyze the token-level target that the loss drives the model to match. We introduce the Q-target framework, which decomposes SFT supervision into two explicit choices: (1) how strongly to rely on the observed token, and (2) how to allocate the remaining probability mass over alternatives. This perspective unifies many existing SFT variants as implicit cho...
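To make the two choices concrete, here is a minimal sketch, not the paper's Target-SFT method. It assumes the target takes a simple mixture form: weight `alpha` on the observed token, with the remaining `1 - alpha` spread over the other tokens in proportion to some weights. The names `q_target`, `alpha`, and `alt_weights`, the toy logits, and the mixture form itself are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def q_target(observed, alpha, alt_weights):
    """Hypothetical token-level target (an assumption, not the paper's definition).

    Choice 1: `alpha` is the mass placed on the observed token.
    Choice 2: `alt_weights` decides how the remaining 1 - alpha is shared
              among the alternative tokens.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Only alternatives share the leftover mass, so drop the observed token's weight.
    weights = [0.0 if i == observed else max(w, 0.0) for i, w in enumerate(alt_weights)]
    total = sum(weights)
    if total == 0.0:
        # No usable alternatives: fall back to the one-hot target.
        return [1.0 if i == observed else 0.0 for i in range(len(alt_weights))]
    return [
        alpha if i == observed else (1.0 - alpha) * w / total
        for i, w in enumerate(weights)
    ]


def cross_entropy(q, p):
    """Cross-entropy H(q, p) between a target q and model probabilities p."""
    return -sum(qi * math.log(pi) for qi, pi in zip(q, p) if qi > 0.0)


# Toy example: a 4-token vocabulary, with token 1 observed in the demonstration.
logits = [2.0, 1.0, 0.0, -1.0]
p = softmax(logits)
observed = 1

one_hot = q_target(observed, 1.0, [1.0] * 4)       # standard SFT target
uniform_rest = q_target(observed, 0.9, [1.0] * 4)  # leftover mass spread evenly
prior_rest = q_target(observed, 0.9, p)            # leftover mass follows the model's own probabilities

for name, q in [("one_hot", one_hot), ("uniform_rest", uniform_rest), ("prior_rest", prior_rest)]:
    print(name, [round(x, 4) for x in q], round(cross_entropy(q, p), 4))
```

With `alpha = 1.0` the target collapses to the usual one-hot label, so the loss equals the standard negative log-likelihood of the observed token. Lowering `alpha` and changing `alt_weights` gives different soft targets; the uniform case resembles label smoothing restricted to the alternative tokens.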
--- *Automatically collected on 2026-06-11*
#Paper #arXiv #NLP #小凯