Paper Overview
Research field: ML
Authors: Erica Stutz, Giacomo Marino, Daniella Meeker
Published: 2026-05-17
arXiv: 2505.12350
Abstract (translated from the Chinese summary)
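Generative models are typically trained with a next-token prediction objective, yet many downstream applications require the ability to estimate or control sequence-level properties. Next-token prediction can lead to overfitting of local patterns and underfitting of global structure during training, and it requires substantial downstream modifications or expensive sampling to guide or predict the global attributes of generated samples at inference time. In this paper, we introduce Conditional Attribute Transformers, a new method for jointly estimating the next-token probability and the value of an attribute conditional on each potential next-token selection. The framework enables three key capabilities in a single forward pass, without modifying the input sequence: (1) per-token credit assignment across the entire sequence, by identifying how each token in the sequence is associated with the attribute value; (2) counterfactual analysis, by quantifying the difference in the attribute under alternative next-token choices; and (3) steerable generation, by decoding sequences according to a combination of next-token and attribute likelihoods. Our method achieves state-of-the-art performance on sparse-reward tasks, improves next-token prediction at sufficiently large model scales, estimates attribute probabilities orders of magnitude faster than sampling, and can guide the decoding of autoregressive sequence models across a range of language tasks.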
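The three capabilities can be illustrated with a minimal sketch, assuming the model emits, at every position t, next-token logits plus one binary-attribute logit per candidate next token (the attribute estimate "if the next token were v"). The array layout, the function names (`credit_assignment`, `counterfactual`, `guided_choice`), the expected-value baseline used for credit, and the `weight` knob are all hypothetical choices made for illustration; this is not the authors' implementation, and the toy numbers are made up.

```python
import math
from typing import List, Sequence

# Hypothetical model output layout (an assumption, not the paper's API):
#   token_logits[t][v] : unnormalized score for next token v at position t
#   attr_logits[t][v]  : logit of P(attribute holds | prefix, next token = v)
# A binary attribute is assumed for simplicity.


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def log_sigmoid(x: float) -> float:
    # Stable log(sigmoid(x)); avoids log(0) for very negative x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def log_softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_z for x in xs]


def credit_assignment(token_logits, attr_logits, tokens) -> List[float]:
    """Credit for token t = P(attr | chosen token) minus the expected P(attr)
    over all candidate next tokens (an assumed baseline)."""
    credits = []
    for t, tok in enumerate(tokens):
        p_next = [math.exp(lp) for lp in log_softmax(token_logits[t])]
        p_attr = [sigmoid(a) for a in attr_logits[t]]
        baseline = sum(p * a for p, a in zip(p_next, p_attr))
        credits.append(p_attr[tok] - baseline)
    return credits


def counterfactual(attr_logits, t: int, chosen: int, alternative: int) -> float:
    """Change in P(attr) if position t had used `alternative` instead of `chosen`."""
    return sigmoid(attr_logits[t][alternative]) - sigmoid(attr_logits[t][chosen])


def guided_choice(token_logits_t, attr_logits_t, weight: float = 1.0) -> int:
    """Greedy decoding step: argmax over v of log p(v) + weight * log P(attr | v)."""
    log_p = log_softmax(token_logits_t)
    scores = [lp + weight * log_sigmoid(a) for lp, a in zip(log_p, attr_logits_t)]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example: vocabulary of 3 tokens, a 2-token sequence (made-up numbers).
token_logits = [[2.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
attr_logits = [[-1.0, 0.0, 3.0], [0.0, 2.0, -2.0]]
tokens = [0, 1]

print("credits:", credit_assignment(token_logits, attr_logits, tokens))
print("swap token 0 -> 2 at t=0:", counterfactual(attr_logits, 0, 0, 2))
print("unguided pick:", guided_choice(token_logits[0], attr_logits[0], weight=0.0))
print("guided pick:", guided_choice(token_logits[0], attr_logits[0], weight=3.0))
```

With these numbers the first token receives negative credit (it lowers the attribute estimate relative to the expectation over candidates) and the second receives positive credit, and raising `weight` from 0 to 3 moves the greedy pick from token 0 to token 2. With `weight = 1` the score is log p(v) + log P(attr | v), which by Bayes' rule is proportional to the posterior over next tokens given that the attribute holds; larger weights push harder toward the attribute.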
Original Abstract
Generative models are often trained with a next-token prediction objective, yet many downstream applications require the ability to estimate or control sequence-level properties. Next-token prediction can lead to overfitting of local patterns during training, underfitting of global structure, and requires significant downstream modifications or expensive sampling to guide or predict the global attributes of generated samples at inference time. Here, we introduce Conditional Attribute Transformers, a novel method for jointly estimating the next-token probability and the value of an attribute conditional on each potential next token selection. This framework enables three critical capabilities within a single forward pass, without modification of the input sequence: (1) per-token credit assi...
--- *Automatically collected on 2026-05-18*
#Paper #arXiv #ML #小凯