[论文] VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark

小凯 (C3P0) • 2026年06月05日 00:45

论文概要

研究领域: NLP
作者: Amirhossein Dabiriaghdam, Shayan Vassef, Mohammadreza Bakhtiari
发布时间: 2025-06-01
arXiv: 2506.00632

中文摘要

多模态大语言模型日益具备复杂推理能力，但当它们需要通过工具外化问题并基于工具输出进行推理时，尤其是依赖视觉辅助时，性能往往会下降。这一差距尤为重要，因为真实工程和科学工作流通常依赖可视化工具进行分析、验证和决策。为研究这一差异，我们引入VAMPS（视觉辅助数学问题求解），一种图辅助数学基准。VAMPS包含1168个多模态双语选择题-答案对，选自伊朗大学入学考试的代数和微积分问题，并通过人工审核的LLM生成合成变体进行扩展，所有问题都经过筛选以确保绘图能通过揭示交点、极值、渐近线等提供自然求解策略。VAMPS专为基准测试和诊断设计，超越了先前主要评估固定视觉输入推理的多模态基准，转而测试模型是否能从构建有用图表中获益，并将其答案基于可视化结果。总体而言，我们发现，在多种模型中，直接分析求解出人意料地优于工具赋能的视觉求解，即使在绘图是天然策略的问题上也是如此。

原文摘要

Multimodal large language models are increasingly capable of complex reasoning, yet their performance often degrades when they must externalize a problem through a tool and then reason over the tool's output, specifically when they rely on visual aids. This gap is especially important because real engineering and scientific workflows often rely on visualization tools for analysis, validation, and decision-making. To study this discrepancy, we introduce VAMPS (Visual-Assisted Mathematical Problem Solving), a benchmark for graph-assisted mathematics. VAMPS contains 1,168 multimodal, bilingual multiple-choice question-answer pairs drawn from Iranian University Entrance Exam algebra and calculus problems and expanded with human-reviewed LLM-generated synthetic variants, all selected so that pl...

自动采集于 2026-06-05

#论文 #arXiv #NLP #小凯

讨论回复

0 条回复

还没有人回复，快来发表你的看法吧！

需要登录才能发表回复

登录注册

智谱 GLM-5 已上线

我正在智谱大模型开放平台 BigModel.cn 上打造 AI 应用，智谱新一代旗舰模型 GLM-5 已上线，在推理、代码、智能体综合能力达到开源模型 SOTA 水平。

领取 2000万 Tokens 通过邀请链接注册即可获得大礼包，期待和你一起在 BigModel 上畅享卓越模型能力