Paper Overview
Research area: ML
Authors: Evgenii Kortukov, Piotr Komorowski, Florian Klein, Paula Engl, Gabriele Sarti, Seong Joon Oh, Sebastian Lapuschkin, Wojciech Samek
Published: 2026-06-09
arXiv: 2606.11172
Summary (translated from Chinese)
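Test-time control of large reasoning models (LRMs) works by intervening on their hidden representations, but it can degrade output quality. The paper finds that existing methods rely on internal features that detect behavior in already generated text, and that these features predict future behavior poorly. The authors instead train activation probes that predict the likelihood of future behaviors from intermediate reasoning steps (accuracy 64%-91%). Building on these probes, they introduce Future Probe Controlled Generation (FPCG), which samples several candidate sentences and keeps the one with the best future-behavior likelihood, steering the model with almost no loss in output quality.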
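The summary describes FPCG only at the level of "sample several candidate sentences, keep the one with the best future-behavior likelihood". A minimal sketch of that loop, under stated assumptions: `fpcg_generate`, `toy_sampler`, and `toy_score` are hypothetical names; in practice the scorer would presumably run the LRM on the extended prefix and apply a trained probe to its hidden state, whereas here it is a keyword heuristic standing in for the probe. The candidate count, stopping rule, and tie-breaking are assumptions, not details from the paper.

```python
import itertools
from typing import Callable


def fpcg_generate(
    prompt: str,
    sample_sentence: Callable[[str], str],
    behavior_score: Callable[[str], float],
    n_candidates: int = 4,
    max_sentences: int = 8,
    stop_token: str = "<eos>",
) -> str:
    """Sentence-level best-of-n steering (hypothetical sketch of the FPCG idea).

    At each step, sample n candidate next sentences, score each extended prefix
    with a future-behavior scorer (a probe in the real method), and keep the
    highest-scoring candidate.
    """
    text = prompt
    for _ in range(max_sentences):
        candidates = [sample_sentence(text) for _ in range(n_candidates)]
        # max() keeps the first candidate among ties.
        best = max(candidates, key=lambda c: behavior_score(text + c))
        text += best
        if best.endswith(stop_token):
            break
    return text


# Toy stand-ins (hypothetical): a sampler that cycles through fixed sentences, and a
# scorer that rewards one verification step and then prefers finishing.
pool = [" Let me verify this.", " I'll just guess.", " Answer: 51.<eos>"]
_cycle = itertools.cycle(pool)


def toy_sampler(prefix: str) -> str:
    return next(_cycle)


def toy_score(text: str) -> float:
    return min(text.count("verify"), 1) + 0.5 * text.endswith("<eos>")


out = fpcg_generate("Q: what is 17*3?", toy_sampler, toy_score)
print(out)  # prints "Q: what is 17*3? Let me verify this. Answer: 51.<eos>"
```

Nothing inside the model's forward pass is modified in this scheme; the model's own samples are only filtered. The cost is roughly `n_candidates` sampled sentences (and scorer calls) per kept sentence.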
Original Abstract
Deployed large reasoning models (LRMs) often behave unexpectedly. Test-time steering controls LRM outputs by intervening on their hidden representations, but it can degrade output quality. We argue that prior steering work implicitly relies on internal features that detect behavior in already generated text. We show that these detection features are poor predictors of future behavioral outcomes, and thus not the natural intervention target. Instead, we train activation probes to predict future behavior likelihoods from intermediate reasoning steps. These probes predict the most likely behavior with 64%-91% accuracy, revealing a separate type of internal prediction features. Building on these prediction features, we introduce a text-level steering method, Future Probe Controlled Generation....
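The abstract does not say how the probes are parameterized or what their training targets look like. One plausible reading, assumed here rather than taken from the paper, is a linear softmax probe on a single hidden-state vector at each reasoning step, trained with cross-entropy against the empirical behavior frequencies observed when several continuations are sampled from that step. `LinearProbe`, `soft_label`, and `top1_accuracy` are hypothetical names, and the 2-D "activations" are toy data, not model states. Top-1 accuracy mirrors the abstract's "predict the most likely behavior" metric.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label(outcomes: Sequence[int], n_behaviors: int) -> List[float]:
    """Empirical future-behavior distribution from the labels of sampled continuations."""
    counts = [0] * n_behaviors
    for o in outcomes:
        counts[o] += 1
    return [c / len(outcomes) for c in counts]


class LinearProbe:
    """Hypothetical linear softmax probe: activation vector -> distribution over future behaviors."""

    def __init__(self, dim: int, n_behaviors: int, seed: int = 0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(n_behaviors)]
        self.b = [0.0] * n_behaviors

    def predict_proba(self, h: Sequence[float]) -> List[float]:
        logits = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(self.W, self.b)]
        return softmax(logits)

    def fit(self, H, Y, lr: float = 0.5, epochs: int = 200) -> "LinearProbe":
        """Full-batch gradient descent on cross-entropy against soft targets Y."""
        n, dim, k = len(H), len(H[0]), len(self.b)
        for _ in range(epochs):
            gW = [[0.0] * dim for _ in range(k)]
            gb = [0.0] * k
            for h, y in zip(H, Y):
                p = self.predict_proba(h)
                for c in range(k):
                    err = p[c] - y[c]  # d(cross-entropy)/d(logit_c)
                    gb[c] += err / n
                    for j in range(dim):
                        gW[c][j] += err * h[j] / n
            for c in range(k):
                self.b[c] -= lr * gb[c]
                for j in range(dim):
                    self.W[c][j] -= lr * gW[c][j]
        return self


def top1_accuracy(probe: LinearProbe, H, Y) -> float:
    """Fraction of steps where the probe's most likely behavior matches the empirical one."""
    hits = 0
    for h, y in zip(H, Y):
        p = probe.predict_proba(h)
        hits += p.index(max(p)) == y.index(max(y))
    return hits / len(H)


# Toy data: 2-D "activations" at four reasoning steps; behavior 0 or 1 per sampled rollout.
H = [[1.0, 0.5], [1.0, -0.5], [-1.0, 0.5], [-1.0, -0.5]]
rollouts = [[0, 0, 0, 1], [0, 0, 1, 0], [1, 1, 1, 0], [1, 1, 0, 1]]
Y = [soft_label(r, 2) for r in rollouts]
probe = LinearProbe(dim=2, n_behaviors=2).fit(H, Y)
print(top1_accuracy(probe, H, Y))  # prints 1.0
```

In a real setup, `H` would hold hidden states extracted from the LRM (thousands of dimensions) and a vectorized library would replace the pure-Python loops; the sketch only illustrates the soft-label objective.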
Automatically collected on 2026-06-11
#Paper #arXiv #ML #小凯