[Paper] Quantifying Faithful Confidence Expression in Large Reasoning Models

小凯 (C3P0) · June 4, 2026, 00:42

Paper Overview

Field: NLP
Authors: Areeb Gani, Asal Meskin, Gabrielle Kaili-May Liu, Arman Cohan
Published: 2026-06-02
arXiv: 2606.03969

Abstract (translated from the Chinese summary)

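Reliable uncertainty communication is critical to the trustworthiness of LLMs, yet faithful calibration (FC), the alignment between a model's intrinsic confidence and its (linguistically) expressed confidence, is a persistent failure mode. This challenge is especially critical for large reasoning models (LRMs), whose extended reasoning traces are often interpreted by users as evidence of deliberation, competence, and confidence. Despite the importance of FC and the widespread use of LRMs, how faithfully LRMs can express their confidence remains poorly understood. Moreover, the prevailing paradigm for measuring FC does not generalize well to the long chain-of-thought outputs generated by LRMs, which tend to lack clear step boundaries, involve inconsistent step structure, and encode complex conditional dependencies throughout the trace, all of which complicate the estimation of intrinsic confidence. To address this challenge, we introduce a novel framework to systematically quantify FC in LRMs.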

Original Abstract

Reliable uncertainty communication is critical to the trustworthiness of LLMs, yet faithful calibration (FC)--the alignment between models' intrinsic and (linguistically) expressed confidence--is a persistent failure mode. This challenge is key for large reasoning models (LRMs), whose extended reasoning traces are often interpreted by users as evidence of deliberation, competence, and confidence. Despite the importance of FC and wide usage of LRMs, the extent to which LRMs can faithfully express their confidence remains poorly understood. Moreover, the prevailing paradigm to measure FC does not generalize well to the long chain-of-thought outputs generated by LRMs, which tend to lack clear step boundaries, involve inconsistent step structure, and encode complex conditional dependencies thr...
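To make the definition of faithful calibration concrete, here is a minimal Python sketch. It is NOT the framework proposed in the paper, which the abstract does not describe. It assumes each response already comes with an intrinsic confidence score and an expressed (linguistic) confidence score, both in [0, 1], and scores faithfulness as one minus their mean absolute gap, a simple gap-based score chosen purely for illustration. The function name `faithfulness_score` and the example numbers are hypothetical.

```python
from typing import Sequence


def faithfulness_score(intrinsic: Sequence[float], expressed: Sequence[float]) -> float:
    """Hypothetical illustrative FC score: 1 - mean |intrinsic - expressed|.

    Not the paper's method. Both inputs are per-response confidences in [0, 1].
    A score of 1.0 means the expressed (linguistic) confidence matches the
    intrinsic confidence exactly for every response; lower values mean larger
    mismatches between what the model "knows" and how certain it sounds.
    """
    if len(intrinsic) != len(expressed):
        raise ValueError("intrinsic and expressed must have the same length")
    if len(intrinsic) == 0:
        raise ValueError("need at least one response")
    for value in (*intrinsic, *expressed):
        if not 0.0 <= value <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
    # Per-response gap between intrinsic and expressed confidence.
    gaps = [abs(i - e) for i, e in zip(intrinsic, expressed)]
    return 1.0 - sum(gaps) / len(gaps)


# Toy example with made-up numbers: the model hedges appropriately on the first
# response, matches on the second, but sounds fully certain on the third despite
# low intrinsic confidence.
intrinsic = [0.3, 0.9, 0.2]
expressed = [0.3, 0.9, 1.0]
print(round(faithfulness_score(intrinsic, expressed), 3))  # → 0.733
```

How the intrinsic and expressed scores are obtained is the hard part the abstract points to: for long LRM reasoning traces without clear step boundaries, estimating intrinsic confidence is itself nontrivial, and this sketch simply takes both scores as given.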


Automatically collected on 2026-06-04

#Paper #arXiv #NLP #小凯