## Paper Summary
**Research Area**: NLP
**作者**: Mariano Barone, Francesco Di Serio, Roberto Moio
**Published**: 2026-04-22
**arXiv**: [2604.20791](https://arxiv.org/abs/2604.20791)
## Abstract
Large Language Models (LLMs) are increasingly deployed in healthcare, yet their communicative alignment with clinical standards remains insufficiently quantified. We conduct a multidimensional evaluation of general-purpose and domain-specialized LLMs across structured medical explanations and real-world physician-patient interactions, analyzing semantic fidelity, readability, and affective resonance. Baseline models amplify affective polarity relative to physicians (Very Negative: 43.14-45.10% vs. 37.25%) and, in larger architectures such as GPT-5 and Claude, produce substantially higher linguistic complexity (FKGL up to 16.91-17.60 vs. 11.47-12.50 in physician-authored responses). Empathy-oriented prompting reduces extreme negativity and lowers grade-level complexity (up to -6.87 FKGL points for GPT-5), but does not significantly increase semantic fidelity. Collaborative rewriting yields the strongest overall alignment: the rewriting configuration achieves the highest semantic similarity to physician answers (up to 0.93 on average) while consistently improving readability and reducing affective extremity. A dual-stakeholder evaluation shows that no model surpasses physicians on cognitive criteria, while patients consistently prefer the rewritten variants for clarity and emotional tone. These findings suggest that LLMs function most effectively as collaborative communication enhancers rather than as substitutes for clinical expertise.
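The readability comparisons above use the Flesch-Kincaid Grade Level (FKGL), which estimates the U.S. school grade needed to read a text. A minimal sketch of the standard formula is below; the vowel-group syllable counter is a crude heuristic of ours, not the paper's method (published tools such as `textstat` use more careful syllabification):

```python
import re

def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    def syllables(word: str) -> int:
        # Crude heuristic: count vowel groups, drop one for a trailing 'e'.
        groups = re.findall(r"[aeiouy]+", word.lower())
        n = len(groups)
        if word.lower().endswith("e") and n > 1:
            n -= 1
        return max(n, 1)

    total_syllables = sum(syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * total_syllables / len(words)
            - 15.59)

# Short, plain sentences score far lower than dense clinical jargon.
simple = "The doctor will see you now. Please take a seat."
jargon = ("Pharmacological contraindications necessitate comprehensive "
          "longitudinal evaluation of therapeutic interventions.")
print(f"simple: {fkgl(simple):.2f}, jargon: {fkgl(jargon):.2f}")
```

A gap like the one reported in the paper (FKGL ~17 vs. ~12) roughly corresponds to graduate-level prose versus language accessible to a high-school reader.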
---
*Automatically collected on 2026-04-24*
#paper #arXiv #NLP #小凯