[Paper] How Seemingly Inconsequential Design Choices Dictate Performance ...
Paper Overview
Research field: CV
Authors: Kian R. Weihrauch, Thomas A. Buckley, William Lotter, Arjun K. Manrai
Published: 2026-06-10
arXiv: 2606.12407
Translated Summary
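General-purpose large language models (LLMs) are routinely used as baselines when evaluating specialized pathology models on whole-slide images (WSIs). WSIs exceed the context limits of current models, so LLM baselines typically split them into small, high-magnification patches. Each patch is processed independently and the results are combined by majority voting. Seemingly minor design choices such as patch size, patch count, and magnification have not been systematically evaluated. Generalist LLMs have consistently underperformed specialized systems, reinforcing the view that pathology tasks involving WSIs require domain-specific training or architectural adaptation. This paper runs a systematic factorial analysis of four input design factors: inference mode, patch size, magnification, and patch count. The authors show that prior studies overstated the gap between specialized models and generalist LLMs by choosing suboptimal input configurations. On the MultiPathQA benchmark, switching to a single balanced configuration (large, low-magnification patches processed jointly) improves GPT-5 on two tasks:
- cancer-type classification (TCGA), from 15.1% to 39.5%;
- organ classification (GTEx), from 38.1% to 62.9%.

Per-task optimization brings further gains, up to 43.9% (TCGA) and 71.6% (GTEx). The same configuration generalizes to two additional models and to the fully held-out CPTAC cohort. Without any task-specific tuning, it improves Gemini 3 Flash by 23.4 percentage points.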
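The summary contrasts two ideas: the common baseline that classifies patches independently and combines them by majority vote, and a full factorial sweep over the four input factors. The sketch below illustrates both under stated assumptions. It is not the authors' code. The factor levels in `FACTORS` and the helper names `factorial_grid` and `majority_vote` are hypothetical, because the summary names the factors but not the values tested. Per-patch labels are treated as plain strings.

```python
from collections import Counter
from itertools import product

# Hypothetical factor levels: the paper's four factors are named in the
# summary, but these specific values are illustrative placeholders only.
FACTORS = {
    "inference_mode": ["independent_majority_vote", "joint"],
    "patch_size": [224, 1024],       # pixels (hypothetical)
    "magnification": ["5x", "20x"],  # hypothetical
    "patch_count": [4, 16],          # hypothetical
}


def factorial_grid(factors):
    """Enumerate every combination of factor levels (a full factorial design)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def majority_vote(patch_predictions):
    """Combine per-patch labels into one slide-level label.

    Counter.most_common orders equal counts by first occurrence, so ties
    go to the label that appeared earliest in the list.
    """
    if not patch_predictions:
        raise ValueError("need at least one patch prediction")
    return Counter(patch_predictions).most_common(1)[0][0]


if __name__ == "__main__":
    grid = factorial_grid(FACTORS)
    print(len(grid))  # 2 * 2 * 2 * 2 = 16 configurations
    print(majority_vote(["LUAD", "LUSC", "LUAD", "BRCA"]))  # LUAD
```

In the joint inference mode, the model would instead receive all patches in a single request and return one label, so no vote is needed. How each configuration is scored against the benchmark is left out of the sketch.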
Original Abstract
General-purpose large language models (LLMs) are routinely used as baselines when evaluating specialized pathology models on whole-slide images (WSIs). Because WSIs exceed contemporary model context limits, LLM baselines routinely use small, high-magnification patches processed independently via majority voting, without systematic evaluation of seemingly inconsequential design choices such as patch size, patch count, and magnification. Generalist LLMs have consistently underperformed specialized systems, reinforcing the perception that domain-specific training or architectural adaptation is necessary for pathology tasks involving WSIs. Here, we conduct a systematic factorial analysis of four input design factors: inference mode, patch size, magnification, and patch count. We demonstrate th...
--- *Automatically collected on 2026-06-12*
#Paper #arXiv #CV #小凯