## Paper Summary
**Research Area**: NLP
**Author**: Mina Gabriel
**Published**: 2026-05-06
**arXiv**: [2605.05166](https://arxiv.org/abs/2605.05166)
## Abstract
Self-consistency detects hallucinations by generating multiple sampled answers to a question and measuring agreement, but this requires repeated decoding and can be sensitive to lexical variation. Semantic self-consistency improves this by clustering sampled answers by meaning using natural language inference, but it adds both sampling cost and external inference overhead. We show that first-token confidence, phi_first, computed from the normalized entropy of the top-K logits at the first content-bearing answer token of a single greedy decode, matches or modestly exceeds semantic self-consistency on closed-book short-answer factual question answering. Across three 7-8B instruction-tuned models and two benchmarks, phi_first achieves a mean AUROC of 0.820, compared with 0.793 for semantic agreement and 0.791 for standard surface-form self-consistency. A subsumption test shows that phi_first is moderately to strongly correlated with semantic agreement, and combining the two signals yields only a small AUROC improvement over phi_first alone. These results suggest that much of the uncertainty information captured by multi-sample agreement is already available in the model's initial token distribution. We argue that phi_first should be reported as a default low-cost baseline before invoking sampling-based uncertainty estimation.
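The abstract defines phi_first as a score computed from the normalized entropy of the top-K logits at the first content-bearing answer token of a single greedy decode. Below is a minimal sketch of one plausible reading of that definition, assuming phi_first = 1 - H/log(K) over the softmax of the top-K logits; the paper's exact K, normalization, and rule for locating the first content-bearing token are not given in the abstract, so treat all of these as assumptions.

```python
import math

def phi_first(topk_logits: list[float]) -> float:
    """First-token confidence sketch: 1 minus the normalized entropy of
    the softmax over the top-K logits at the first content-bearing
    answer token. Assumes K > 1; the paper's exact definition may differ.
    """
    # Numerically stable softmax restricted to the top-K logits.
    m = max(topk_logits)
    exps = [math.exp(x - m) for x in topk_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Shannon entropy, divided by its maximum log(K) so it lies in [0, 1].
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    h_norm = entropy / math.log(len(topk_logits))
    # Assumed convention: higher confidence means lower normalized entropy.
    return 1.0 - h_norm

# A sharply peaked top-5 distribution yields confidence near 1;
# a flat one yields confidence 0.
print(phi_first([10.0, 2.0, 1.5, 1.0, 0.5]))  # ~0.99
print(phi_first([1.0, 1.0, 1.0, 1.0, 1.0]))   # 0.0
```

Because this uses only one greedy decode and logits the model already produces, it avoids the repeated sampling and external NLI calls that surface-form and semantic self-consistency require, which is the cost argument the abstract makes.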
---
*Collected automatically on 2026-05-08*
#paper #arXiv #NLP #小凯