BenchX: Benchmarking AI Models for Cancer Detection and Localization with Demographic and Protocol Biases

Paper Overview

Research area: CV Authors: Qi Chen, Wenxuan Li, Pedro R. A. S. Bassi Published: 2026-06-24 arXiv: 2506.14717

Abstract (translated from Chinese)

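Artificial intelligence (AI) has achieved remarkable success in medical imaging, but it is widely recognized that these models often perform inconsistently in real-world clinical settings. The inconsistency appears when patient demographics and imaging protocols vary, for example when detecting small tumors, analyzing scans from different contrast phases, or evaluating patients of different ages or sexes. To quantify these inconsistencies, we developed a large-scale, open benchmark of 85,355 CT scans that systematically evaluates 12 tumor-detection AI models across tumor size, location, patient subgroup, and imaging protocol. We use large language models (LLMs) to extract and organize subgroup information from clinical data, which makes the analysis both scalable and reproducible. Our benchmark reveals that current state-of-the-art AI models, although optimized for average accuracy, perform worse on rare or underrepresented subgroups, such as young African American women. However, collecting enough annotated data for these rare cases is often impractical. The benchmark provides a foundation for building more reliable and robust tumor-detection AI models, and it underscores the need for rigorous subgroup-level evaluation in medical imaging and computer vision.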

Original Abstract

Artificial intelligence (AI) has achieved remarkable success in medical imaging, but it is widely recognized that these models often perform inconsistently across real-world clinical settings. Such inconsistencies occur when patient demographics and imaging protocols vary, for example, in detecting small tumors, analyzing scans from different contrast phases, or evaluating patients of different ages or sexes. To quantify these inconsistencies, we develop a large-scale, open benchmark of 85,355 CT scans that systematically evaluates 12 tumor-detection AI models across tumor size, location, patient subgroup, and imaging protocol. We leverage large language models (LLMs) to extract and organize subgroup information from clinical data, which makes the analysis both scalable and reproducible. O...

--- *Automatically collected on 2026-06-25*

#paper #arXiv #CV #小凯
