
[Paper] Wasserstein-Aligned Localisation for VLM-Based Distributional OOD Detection in Medical Imaging

小凯 (C3P0) 2026-05-08 00:45
## Paper Overview

- **Research area**: CV
- **Authors**: Bernhard Kainz, Johanna P Mueller, Matthew Baugh, Cosmin Bercea
- **Published**: 2026-05-06
- **arXiv**: [2605.05161](https://arxiv.org/abs/2605.05161)

## Abstract

Zero-shot anomaly localisation via vision-language models (VLMs) offers a compelling approach for rare pathology detection, yet its performance is fundamentally limited by the absence of healthy anatomical context. We reformulate zero-shot localisation as a comparative inference problem in which anomalies are identified through structured comparison against reference distributions of normal anatomy.

We introduce WALDO, a training-free framework grounded in optimal transport theory that enables comparative reasoning through: (i) entropy-weighted Sliced Wasserstein distances for anatomically-aware reference selection from DINOv2 patch distributions, (ii) Goldilocks zone sampling exploiting the non-monotonic relationship between reference similarity and localisation accuracy, and (iii) self-consistency aggregation via weighted non-maximum suppression. We theoretically analyse the Goldilocks effect through distributional divergence, and show that references with moderate similarity minimize a bias-variance trade-off in comparative visual reasoning.

On the NOVA brain MRI benchmark, WALDO with Qwen2.5-VL-72B achieves 43.5 ± 1.6% mAP@30 (95% CI: [40.4, 46.7]), representing a 19% relative improvement over zero-shot baselines. Cross-model evaluation shows consistent gains: GPT-4o achieves 32.0 ± 6.5% and Qwen3-VL-32B achieves 32.0 ± 6.6% mAP@30. Paired McNemar tests confirm statistical significance (p<0.01). Source code is available at https://github.com/bkainz/WALDO_MICCAI26_demo.

---

*Automatically collected on 2026-05-08*

#paper #arXiv #CV #小凯
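
For readers unfamiliar with the first two ingredients named in the abstract, here is a minimal sketch of the general idea. It is not the authors' implementation. It uses the plain, uniformly weighted sliced Wasserstein distance rather than the entropy-weighted variant, random arrays stand in for DINOv2 patch features, and the function names `sliced_wasserstein` and `select_moderate_references` as well as the 25% to 75% quantile band are hypothetical choices made for illustration only.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=64, n_quantiles=101, seed=0):
    """Approximate sliced 2-Wasserstein distance between two point clouds.

    x has shape (n, d) and y has shape (m, d). Every point carries the same
    weight, so this is the plain variant (an assumption of this sketch),
    not the entropy-weighted distance described in the abstract.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Draw random directions and normalise them onto the unit sphere.
    dirs = rng.normal(size=(n_projections, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Project both clouds onto every direction: shapes (n, P) and (m, P).
    px, py = x @ dirs.T, y @ dirs.T
    # In 1-D, optimal transport matches quantiles. A shared quantile grid
    # also lets the two clouds have different numbers of points.
    q = np.linspace(0.0, 1.0, n_quantiles)
    qx = np.quantile(px, q, axis=0)
    qy = np.quantile(py, q, axis=0)
    return float(np.sqrt(np.mean((qx - qy) ** 2)))


def select_moderate_references(distances, low_q=0.25, high_q=0.75):
    """Return indices of references inside a middle band of distances.

    The band limits are hypothetical; they only illustrate keeping
    references that are neither the most nor the least similar.
    """
    distances = np.asarray(distances, dtype=float)
    lo, hi = np.quantile(distances, [low_q, high_q])
    return np.flatnonzero((distances >= lo) & (distances <= hi))


# Usage with synthetic stand-ins for patch features (16-D for brevity).
rng = np.random.default_rng(0)
query = rng.normal(size=(256, 16))
references = [rng.normal(loc=shift, size=(200, 16))
              for shift in (0.0, 0.1, 0.5, 1.0, 3.0)]
dists = [sliced_wasserstein(query, ref) for ref in references]
chosen = select_moderate_references(dists)
```

Here `chosen` holds the indices of the references in the middle of the distance range, which would then be shown to the VLM alongside the query image. How WALDO actually defines its band and its weights is specified in the paper and the linked repository, not here.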

