[Paper] BAMI: Training-Free Bias Mitigation in GUI Grounding

Paper Overview

Research area: CV
Authors: Borui Zhang, Bo Zhang, Bo Wang
Published: 2026-05-06
arXiv: 2505.03484

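Chinese Abstract (translated)

GUI grounding is a key capability that enables GUI agents to perform tasks such as clicking and dragging. However, in complex benchmark scenarios such as ScreenSpot-Pro, existing models often perform poorly. Using our proposed Masked Prediction Distribution (MPD) attribution method, we find that errors stem mainly from two sources: high image resolution (which causes precision bias) and complex interface elements (which cause ambiguity bias). To address these challenges, we introduce Bias-Aware Manipulation Inference (BAMI), which comprises two key manipulations, coarse-to-fine focus and candidate selection, to effectively mitigate these biases. Extensive experiments show that BAMI significantly improves the accuracy of a range of GUI grounding models in a training-free setting. For example, applying our method to the TianXi-Action-7B model raises its accuracy on the ScreenSpot-Pro benchmark from 51.9% to 57.8%. Ablation studies further confirm that BAMI is robust across different parameter configurations, underscoring its stability and effectiveness.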
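To put the reported ScreenSpot-Pro result for TianXi-Action-7B (51.9% to 57.8%) in perspective, the gain can be expressed both in absolute percentage points and relative to the baseline. The helper below is only an illustrative calculation on those two quoted numbers; the function name `accuracy_gain` is hypothetical and not from the paper.

```python
from typing import Tuple


def accuracy_gain(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = absolute / before * 100.0
    return absolute, relative


if __name__ == "__main__":
    abs_gain, rel_gain = accuracy_gain(51.9, 57.8)
    print(f"absolute: {abs_gain:.1f} pp, relative: {rel_gain:.1f}%")
```

For these numbers the absolute gain is 5.9 percentage points, which is roughly an 11.4% relative improvement over the 51.9% baseline.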

Original Abstract

GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Prediction Distribution (MPD) attribution method, we identify that the primary sources of errors are twofold: high image resolution (leading to precision bias) and intricate interface elements (resulting in ambiguity bias). To address these challenges, we introduce Bias-Aware Manipulation Inference (BAMI), which incorporates two key manipulations, coarse-to-fine focus and candidate selection, to effectively mitigate these biases. Our extensive experimental results demonstrate that BAMI significantly enhances the accuracy of variou...
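The abstract names two manipulations, coarse-to-fine focus (aimed at precision bias on high-resolution screenshots) and candidate selection (aimed at ambiguity bias), but does not describe how either is implemented. The Python sketch below illustrates only the general ideas under stated assumptions: `predict` is a hypothetical grounding model that maps a screen region to a click point in full-screenshot coordinates, the crop-and-re-predict loop is a generic zoom-in scheme, and `select_candidate` uses a simple neighbour-count vote as a stand-in rule. None of these names or choices come from the paper, and this is not BAMI's actual procedure.

```python
from typing import Callable, List, Tuple

# Hypothetical types: boxes and points are in full-screenshot pixel coordinates.
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)
Point = Tuple[float, float]              # (x, y)


def crop_around(point: Point, box: Box, zoom: float) -> Box:
    """Shrink `box` by a factor `zoom` around `point`, staying inside `box`."""
    left, top, right, bottom = box
    w, h = (right - left) / zoom, (bottom - top) / zoom
    x, y = point
    # Centre the crop on the point, then clamp so it never leaves the parent box.
    new_left = min(max(x - w / 2, left), right - w)
    new_top = min(max(y - h / 2, top), bottom - h)
    return (new_left, new_top, new_left + w, new_top + h)


def coarse_to_fine(predict: Callable[[Box], Point], screen: Box,
                   steps: int = 2, zoom: float = 2.0) -> Tuple[Point, List[Point]]:
    """Predict on the whole screen, then re-predict on successively zoomed crops.

    Returns the final point and every intermediate prediction, so the
    intermediate points can also be reused as candidates.
    """
    box = screen
    point = predict(box)
    history = [point]
    for _ in range(steps):
        box = crop_around(point, box, zoom)
        point = predict(box)
        history.append(point)
    return point, history


def select_candidate(candidates: List[Point], radius: float = 20.0) -> Point:
    """Stand-in selection rule: the candidate with the most candidates within
    `radius` pixels (itself included); ties go to the earliest candidate."""
    def support(c: Point) -> int:
        return sum((c[0] - o[0]) ** 2 + (c[1] - o[1]) ** 2 <= radius ** 2
                   for o in candidates)
    return max(candidates, key=support)


TARGET: Point = (1234.0, 567.0)


def toy_predict(box: Box) -> Point:
    """Hypothetical model whose x-error is 1% of the visible width, mimicking
    a precision bias that grows with the size of the region it sees."""
    width = box[2] - box[0]
    return (TARGET[0] + 0.01 * width, TARGET[1])


if __name__ == "__main__":
    final, history = coarse_to_fine(toy_predict, (0.0, 0.0, 3840.0, 2160.0))
    for step, (x, y) in enumerate(history):
        print(f"step {step}: x-error = {x - TARGET[0]:.1f} px")
    print(select_candidate([(100.0, 100.0), (105.0, 98.0), (400.0, 300.0), (102.0, 103.0)]))
```

With the hypothetical `toy_predict`, whose error is 1% of the visible region's width, two 2x zoom steps on a 3840-pixel-wide screenshot shrink the horizontal error from 38.4 px to 9.6 px. The vote in `select_candidate` only prefers a point that many candidates agree on; a real rule for disambiguating similar-looking interface elements would likely need more information than point positions alone.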

--- *Automatically collected on 2026-05-09*

#Paper #arXiv #CV #小凯
