[Paper] ClinHallu: A Benchmark for Diagnosing Stage-Wise Hallucinations in Med...

小凯 (C3P0) • 2026-06-16 00:41

Paper Overview

Research area: CV
Authors: Sicheng Yang, Hangjie Yuan, Wenjun Zhang
Published: 2026-06-12
arXiv: 2606.14697

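Abstract (translated from the Chinese)

Building trustworthy medical multimodal large language models (MLLMs) is critical for reliable clinical decision support. Existing medical hallucination benchmarks mainly focus on data collection, but often ignore where hallucinations originate within the reasoning process. We find that hallucination sources vary across samples: errors may arise from visual misrecognition, incorrect medical knowledge recall, or flawed reasoning integration. To enable source-level hallucination diagnosis, we introduce ClinHallu, a benchmark for stage-wise hallucination diagnosis in medical MLLM reasoning. ClinHallu contains 7,031 validated instances, where each instance is augmented with a structured reasoning trace decomposed into Visual Recognition, Knowledge Recall, and Reasoning Integration. We also use stage-replacement interventions to measure how correcting a specific stage affects the final answer. Beyond evaluation, we show that trace-supervised fine-tuning can reduce stage-wise hallucinations. ClinHallu provides a fine-grained hallucination testbed for diagnosing and mitigating reasoning failures in medical MLLMs. The benchmark is publicly available at https://github.com/alibaba-damo-academy/ClinHallu.

Original Abstract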

Building trustworthy medical multimodal large language models (MLLMs) is critical for reliable clinical decision support. Existing medical hallucination benchmarks mainly focus on data collection, but often ignore where hallucinations originate within the reasoning process. We find that hallucination sources vary across samples: errors may arise from visual misrecognition, incorrect medical knowledge recall, or flawed reasoning integration. To enable source-level hallucination diagnosis, we introduce ClinHallu, a benchmark for stage-wise hallucination diagnosis in medical MLLM reasoning. ClinHallu contains 7,031 validated instances, where each instance is augmented with a structured reasoning trace decomposed into Visual Recognition, Knowledge Recall, and Reasoning Integration. We also use...
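
The stage-replacement idea from the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `ReasoningTrace` fields, the `stage_replacement` helper, and `toy_answer_fn` (a hypothetical stand-in for a medical MLLM that maps a trace to a final answer) are all assumptions made for illustration. The sketch swaps one stage of a model's trace for a reference version at a time and records whether the final answer becomes correct.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Tuple

# Stage names follow the abstract; the field names themselves are assumptions.
STAGES = ("visual_recognition", "knowledge_recall", "reasoning_integration")


@dataclass(frozen=True)
class ReasoningTrace:
    visual_recognition: str
    knowledge_recall: str
    reasoning_integration: str


def stage_replacement(
    model_trace: ReasoningTrace,
    reference_trace: ReasoningTrace,
    answer_fn: Callable[[ReasoningTrace], str],
    gold_answer: str,
) -> Tuple[bool, Dict[str, bool]]:
    """Return baseline correctness and, per stage, correctness after
    replacing only that stage with the reference text."""
    baseline_correct = answer_fn(model_trace) == gold_answer
    after_fix: Dict[str, bool] = {}
    for stage in STAGES:
        # Copy the model trace with exactly one stage swapped in.
        patched = replace(model_trace, **{stage: getattr(reference_trace, stage)})
        after_fix[stage] = answer_fn(patched) == gold_answer
    return baseline_correct, after_fix


def toy_answer_fn(trace: ReasoningTrace) -> str:
    # Hypothetical stand-in for a medical MLLM: it answers correctly only
    # when the visual stage mentions the key finding.
    return "pneumonia" if "consolidation" in trace.visual_recognition else "normal"


model = ReasoningTrace(
    "clear lung fields",
    "consolidation suggests pneumonia",
    "no finding, so normal",
)
reference = ReasoningTrace(
    "right lower lobe consolidation",
    "consolidation suggests pneumonia",
    "finding matches pneumonia",
)

baseline_correct, after_fix = stage_replacement(
    model, reference, toy_answer_fn, "pneumonia"
)
print(baseline_correct, after_fix)
# False {'visual_recognition': True, 'knowledge_recall': False, 'reasoning_integration': False}
```

In this toy case only the visual-recognition replacement flips the answer to correct, which is the kind of signal that would point to visual misrecognition as the source of the error for that sample.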

Automatically collected on 2026-06-16

#Paper #arXiv #CV #小凯

