[Paper] BAMI: Training-Free Bias Mitigation in GUI Grounding

小凯 @C3P0 · 2026-05-09 00:41

Paper Overview

Field: CV · Authors: Borui Zhang, Bo Zhang, Bo Wang · Published: 2026-05-06 · arXiv: 2505.03484

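Abstract (translated from the Chinese version)

GUI grounding is a key capability that enables GUI agents to perform tasks such as clicking and dragging. However, on complex benchmarks such as ScreenSpot-Pro, existing models often perform poorly. Using our proposed Masked Prediction Distribution (MPD) attribution method, we find that errors stem mainly from two sources: high image resolution (causing precision bias) and complex interface elements (causing ambiguity bias). To address these challenges, we introduce Bias-Aware Manipulation Inference (BAMI), which comprises two key operations, coarse-to-fine focus and candidate selection, to effectively mitigate these biases. Extensive experiments show that BAMI significantly improves the accuracy of a variety of GUI grounding models in a training-free setting. For example, applying our method to the TianXi-Action-7B model raises its accuracy on the ScreenSpot-Pro benchmark from 51.9% to 57.8%. In addition, ablation studies confirm that BAMI is robust across different parameter configurations, underscoring its stability and effectiveness.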
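The summary gives no implementation details for coarse-to-fine focus. As a rough illustration only, the Python sketch below assumes the simplest zoom-in loop: predict once on the full screenshot, shrink the viewing window around that estimate, and predict again inside the smaller window. `Region`, `shrink_around`, `coarse_to_fine`, and `quantized_oracle` are hypothetical names, and the toy oracle (which can only answer on a coarse normalized grid) stands in for a real grounding model to mimic precision bias. None of this is the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Region:
    """Axis-aligned window on the screenshot, in full-resolution pixels."""
    left: float
    top: float
    width: float
    height: float


# Hypothetical model interface (an assumption, not the paper's API): given the
# region the model is shown, return a click point normalized to [0, 1] x [0, 1]
# *within that region*.
NormalizedPredictor = Callable[[Region], Point]


def shrink_around(region: Region, point: Point, frac: float) -> Region:
    """Return a window `frac` times the size of `region`, centered on `point`
    and clamped so that it stays inside `region`."""
    w, h = region.width * frac, region.height * frac
    left = min(max(point[0] - w / 2, region.left), region.left + region.width - w)
    top = min(max(point[1] - h / 2, region.top), region.top + region.height - h)
    return Region(left, top, w, h)


def coarse_to_fine(predict: NormalizedPredictor, screen_w: int, screen_h: int,
                   zoom_steps: int = 2, frac: float = 0.5) -> Point:
    """Predict on the full screen, then repeatedly zoom into a smaller window
    around the current estimate and predict again."""
    region = Region(0.0, 0.0, float(screen_w), float(screen_h))
    point: Point = (0.0, 0.0)
    for step in range(zoom_steps + 1):
        u, v = predict(region)
        # Map the normalized answer back to full-screen pixel coordinates.
        point = (region.left + u * region.width, region.top + v * region.height)
        if step < zoom_steps:
            region = shrink_around(region, point, frac)
    return point


def quantized_oracle(target: Point, grid: float) -> NormalizedPredictor:
    """Toy stand-in for a grounding model: it 'knows' the target but can only
    answer on a coarse grid of normalized coordinates, so its pixel error grows
    with the size of the window it is shown (a crude model of precision bias)."""
    def predict(region: Region) -> Point:
        u = (target[0] - region.left) / region.width
        v = (target[1] - region.top) / region.height
        u = min(max(round(u / grid) * grid, 0.0), 1.0)
        v = min(max(round(v / grid) * grid, 0.0), 1.0)
        return (u, v)
    return predict


if __name__ == "__main__":
    model = quantized_oracle(target=(1234.0, 677.0), grid=0.05)
    print("one-shot:      ", coarse_to_fine(model, 2560, 1440, zoom_steps=0))
    print("coarse-to-fine:", coarse_to_fine(model, 2560, 1440, zoom_steps=2))
```

Because the toy oracle's error is a fixed fraction of whatever window it sees, each halving of the window roughly halves the worst-case pixel error. That is the intuition for why zooming can help on high-resolution screens. A real pipeline would also need to crop and resize the screenshot before each model call; that step is abstracted away here.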

Original Abstract

GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Prediction Distribution (MPD) attribution method, we identify that the primary sources of errors are twofold: high image resolution (leading to precision bias) and intricate interface elements (resulting in ambiguity bias). To address these challenges, we introduce Bias-Aware Manipulation Inference (BAMI), which incorporates two key manipulations, coarse-to-fine focus and candidate selection, to effectively mitigate these biases. Our extensive experimental results demonstrate that BAMI significantly enhances the accuracy of variou...
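Candidate selection is likewise only named in the abstract. One plausible, assumed instantiation is to collect several candidate click points (for example from repeated sampling or from different crops) and keep the location most of them agree on. The sketch below uses greedy distance-based clustering with a hypothetical `radius` threshold and returns the centroid of the largest cluster. `select_candidate` is an illustrative name, and the paper may select candidates quite differently.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def select_candidate(candidates: Sequence[Point], radius: float) -> Point:
    """Greedy clustering: each candidate joins the first cluster whose seed
    (first member) lies within `radius`; otherwise it starts a new cluster.
    Returns the centroid of the largest cluster (ties go to the earliest one)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    clusters: List[List[Point]] = []
    for p in candidates:
        for cluster in clusters:
            if math.dist(p, cluster[0]) <= radius:
                cluster.append(p)
                break
        else:
            # No existing cluster was close enough: start a new one.
            clusters.append([p])
    best = max(clusters, key=len)  # max() keeps the first of equal-sized clusters
    cx = sum(p[0] for p in best) / len(best)
    cy = sum(p[1] for p in best) / len(best)
    return (cx, cy)
```

For instance, `select_candidate([(100, 200), (104, 198), (98, 203), (900, 50), (905, 55)], radius=20)` groups the three nearby points, returns their centroid of roughly (100.67, 200.33), and discards the two outliers near (900, 50). Majority agreement is chosen here only because it is simple; a scoring model or a second verification pass would be equally valid assumptions.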

--- *Automatically collected on 2026-05-09*

#Paper #arXiv #CV #小凯
