When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure
Paper Overview
Research area: NLP; Authors: Boyu Xiao, Xiuqi Tian, Xuwen Song; Published: 2026-05-26; arXiv: 2505.21637
Abstract
Despite strong medical benchmark accuracy, LLMs can exhibit severe multi-turn sycophancy in clinical dialogue, abandoning initial correct diagnosis under escalating pressure. We propose Med-Stress, a targeted stress test framework that evaluates belief stability under escalating pressure. Across nine frontier large language models (LLMs), we find a clear dissociation between medical knowledge and robustness: high initial diagnostic capability does not imply high belief stability, yielding large knowledge-robustness gaps for several LLMs. To mitigate this failure mode, we propose a lightweight inference-time defense, RBED (Role-Based Epistemic Defense), and R-FT (Resilience-oriented Fine-Tuning), a training-time approach that internalizes evidence-based resistance to pressure. Experiments show that R-FT nearly eliminates belief change and substantially improves robustness.

*Automatically collected on 2026-05-27*
#paper #arXiv #NLP #LLM #clinical-pressure #小凯