[Paper] A Unifying Lens on Supervised Fine-Tuning Through Target Distribution ...

Paper Overview

Research area: NLP Authors: Tong Xie, Yuanhao Ban, Yunqi Hong, Sohyun An, Yihang Chen, Cho-Jui Hsieh Published: 2026-06-09 arXiv: 2606.11189

Summary (translated from Chinese)

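Supervised fine-tuning (SFT) typically maximizes the likelihood of every token in a demonstrated trajectory, but an observed token may be non-unique, noisy, or misaligned with the model prior. This paper reinterprets SFT as target distribution design: it analyzes the token-level target that the loss drives the model to match. It proposes the Q-target framework, which decomposes SFT supervision into two choices: how strongly to rely on the observed token, and how to allocate the remaining probability mass over alternatives. The framework unifies many existing SFT variants as implicit choices of the target distribution Q. The proposed Target-SFT consistently outperforms baselines across 10 reasoning dataset-model settings.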

Original Abstract

Supervised fine-tuning (SFT) typically maximizes the likelihood of every token in a demonstrated trajectory. However, an observed token can be non-unique, noisy, or misaligned with the model prior. Strictly fitting toward this one-hot target may be suboptimal, especially when the pretrained model encodes a rich knowledge prior. In this work, we reinterpret SFT as target distribution design: instead of studying only the loss objective, we analyze the token-level target that the loss drives the model to match. We introduce the Q-target framework, which decomposes SFT supervision into two explicit choices: (1) how strongly to rely on the observed token, and (2) how to allocate the remaining probability mass over alternatives. This perspective unifies many existing SFT variants as implicit cho...
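The two choices described in the abstract can be illustrated with a small sketch. This is not the paper's Target-SFT, whose exact form the truncated abstract does not give. It assumes one simple parameterization: a weight `alpha` on the observed token, with the remaining `1 - alpha` spread over the other tokens in proportion to a weight vector `alt_weights`. The names `q_target`, `alpha`, and `alt_weights` are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def q_target(observed, alpha, alt_weights):
    """Hypothetical token-level target Q (an assumption, not the paper's definition).

    Choice 1: put mass `alpha` on the observed token.
    Choice 2: spread the remaining 1 - alpha over the other tokens,
              in proportion to `alt_weights` (the observed entry is ignored).
    """
    rest = [w if i != observed else 0.0 for i, w in enumerate(alt_weights)]
    total = sum(rest)
    q = [(1.0 - alpha) * w / total for w in rest]
    q[observed] = alpha
    return q

def cross_entropy(q, logits):
    """Cross-entropy between target q and the model distribution softmax(logits)."""
    p = softmax(logits)
    return -sum(qi * math.log(pi) for qi, pi in zip(q, p) if qi > 0.0)

# Toy 4-token vocabulary; token 1 is the observed (demonstrated) token.
logits = [2.0, 1.0, 0.1, -1.0]
observed = 1
vocab = len(logits)

# alpha = 1 recovers the usual one-hot SFT target.
one_hot = q_target(observed, 1.0, [1.0] * vocab)
# Uniform alternatives: a label-smoothing-like target.
smoothed = q_target(observed, 0.9, [1.0] * vocab)
# Alternatives weighted by the model's own probabilities (a prior-shaped target).
prior_shaped = q_target(observed, 0.9, softmax(logits))

for name, q in [("one_hot", one_hot), ("smoothed", smoothed), ("prior_shaped", prior_shaped)]:
    print(name, [round(x, 4) for x in q], round(cross_entropy(q, logits), 4))
```

Under this assumed parameterization, changing `alpha` alone controls how strongly the target trusts the demonstration, while changing `alt_weights` alone controls where the leftover mass goes, which mirrors the two-way decomposition the abstract describes.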

--- *Automatically collected on 2026-06-11*

#Paper #arXiv #NLP #小凯
