[Paper] DV-World: Benchmarking Data Visualization Agents in Real-World Scenari...

Paper Overview

Research area: NLP
Authors: Jinxiang Meng, Shaoping Huang, Fangyu Lei, etc.
Published: 2026-04-29
arXiv: 2504.21252

Summary (translated from the Chinese abstract)

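Real-world data visualization (DV) requires native environmental grounding, cross-platform evolution, and proactive intent alignment. Existing benchmarks, however, tend to be confined to code sandboxes, to support only single-language creation tasks, and to assume that user intent is perfectly specified. To bridge these gaps, the authors introduce DV-World, a benchmark of 260 tasks for evaluating DV agents across real-world professional lifecycles. DV-World spans three domains: DV-Sheet, for native spreadsheet manipulation, including chart and dashboard creation as well as diagnostic repair; DV-Evolution, for adapting and restructuring reference visual artifacts to fit new data across diverse programming paradigms; and DV-Interact, for proactive intent alignment with a user simulator that mimics ambiguous real-world requirements. The hybrid evaluation framework combines table-value alignment, which checks numerical precision, with MLLM-as-a-Judge, which performs rubric-based semantic and visual assessment. In the experiments, state-of-the-art models score below 50% overall, exposing key deficiencies in handling the complex challenges of real-world data visualization.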
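To make the table-value alignment idea concrete, here is a minimal sketch of how a numeric check of this kind could be scored. It is an illustration only, not the paper's implementation: the function name `table_value_alignment`, the `(series, category) -> value` table format, and the relative tolerance are all assumptions made for this example.

```python
import math


def table_value_alignment(reference, predicted, rel_tol=1e-6):
    """Return the fraction of reference cells matched by the predicted table.

    Both tables are dicts mapping (series, category) -> number. This format
    is a hypothetical choice for the sketch, not the benchmark's own schema.
    """
    if not reference:
        return 1.0  # nothing to check, so treat as fully aligned
    matched = sum(
        1
        for key, ref_value in reference.items()
        if key in predicted
        and math.isclose(predicted[key], ref_value, rel_tol=rel_tol)
    )
    return matched / len(reference)


# Hypothetical data: values extracted from a reference chart and an agent's chart.
reference = {("sales", "Q1"): 120.0, ("sales", "Q2"): 135.5, ("sales", "Q3"): 150.0}
predicted = {("sales", "Q1"): 120.0, ("sales", "Q2"): 135.5, ("sales", "Q3"): 149.0}

score = table_value_alignment(reference, predicted)
print(f"alignment score: {score:.3f}")
```

In this example two of the three reference cells match, so the score is 2/3. A check like this covers only numerical fidelity; layout, labeling, and readability would still need the rubric-based judgment that the abstract assigns to MLLM-as-a-Judge.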

Original Abstract

Real-world data visualization (DV) requires native environmental grounding, cross-platform evolution, and proactive intent alignment. Yet, existing benchmarks often suffer from code-sandbox confinement, single-language creation-only tasks, and assumption of perfect intent. To bridge these gaps, we introduce DV-World, a benchmark of 260 tasks designed to evaluate DV agents across real-world professional lifecycles. DV-World spans three domains: DV-Sheet for native spreadsheet manipulation including chart and dashboard creation as well as diagnostic repair; DV-Evolution for adapting and restructuring reference visual artifacts to fit new data across diverse programming paradigms; and DV-Interact for proactive intent alignment with a user simulator that mimics real-world ambiguous requirements...

--- *Automatically collected on 2026-04-30*

#Paper #arXiv #NLP #小凯
