Paper Summary
Research area: NLP
Authors: Sean Wu, Fredrik K. Gustafsson, Edward Phillips, et al.
Published: 2026-04-03
arXiv: 2604.03216
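Summary (translated from Chinese)
Large language models often produce confident but incorrect answers in situations where they should abstain. To address this gap, we introduce the Behavioral Alignment Score (BAS), a decision-theoretic metric for evaluating how well LLM confidence supports abstention-aware decision making. BAS is derived from an explicit answer-or-abstain utility model and aggregates realized utility across a range of risk thresholds, yielding a decision-level reliability measure that depends on both the magnitude and the ordering of confidence. We show theoretically that truthful confidence estimates uniquely maximize expected BAS utility.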
Original Abstract
Large language models (LLMs) often produce confident but incorrect answers in settings where abstention would be safer. Standard evaluation protocols, however, require a response and do not account for how confidence should guide decisions under different risk preferences. To address this gap, we introduce the Behavioral Alignment Score (BAS), a decision-theoretic metric for evaluating how well LLM confidence supports abstention-aware decision making. BAS is derived from an explicit answer-or-abstain utility model and aggregates realized utility across a continuum of risk thresholds, yielding a measure of decision-level reliability that depends on both the magnitude and ordering of confidence. We show theoretically that truthful confidence estimates uniquely maximize expected BAS utility, ...
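To make the answer-or-abstain idea concrete, here is a minimal sketch of a threshold-averaged utility. It is not the paper's definition of BAS: the payoff (+1 for a correct answer, -t/(1-t) for a wrong one, 0 for abstaining), the uniform grid of thresholds, and the function names `realized_utility` and `threshold_averaged_utility` are all assumptions chosen for illustration. Under this assumed payoff, answering has non-negative expected utility exactly when the true probability of being correct is at least t, which is why thresholding a truthful confidence is the best decision at every risk level.

```python
def realized_utility(confidence, correct, t):
    """Utility of one prediction at risk threshold t (assumed payoff, not the paper's)."""
    if confidence < t:
        return 0.0  # abstain: no gain, no loss
    # Answer: +1 if correct, otherwise a penalty that grows with the threshold.
    return 1.0 if correct else -t / (1.0 - t)


def threshold_averaged_utility(confidences, labels, num_thresholds=99):
    """Average realized utility over an evenly spaced grid of thresholds in (0, 1)."""
    thresholds = [(k + 1) / (num_thresholds + 1) for k in range(num_thresholds)]
    total = 0.0
    for t in thresholds:
        for confidence, correct in zip(confidences, labels):
            total += realized_utility(confidence, correct, t)
    return total / (len(thresholds) * len(confidences))


# Hypothetical toy data: two correct and two incorrect answers.
labels = [True, True, False, False]
truthful = threshold_averaged_utility([0.9, 0.9, 0.1, 0.1], labels)
overconfident = threshold_averaged_utility([0.9, 0.9, 0.9, 0.9], labels)
print(truthful, overconfident)
```

In this toy setup the overconfident scores are penalized because the wrong answers keep being given at high thresholds, where the assumed penalty is large. This mirrors the abstract's point that such a measure depends on both the magnitude and the ordering of confidence.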
Automatically collected on 2026-04-06
#paper #arXiv #NLP #小凯