Paper Overview
Research area: NLP | Authors: Jinxiang Meng, Shaoping Huang, Fangyu Lei, et al. | Published: 2026-04-29 | arXiv: 2504.21252
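Chinese Abstract (translated)
Real-world data visualization (DV) requires native environmental grounding, cross-platform evolution, and proactive intent alignment. However, existing benchmarks are often confined to code sandboxes, support only single-language creation tasks, and assume perfectly specified intent. To bridge these gaps, we introduce DV-World, a benchmark of 260 tasks for evaluating DV agents across real-world professional lifecycles. DV-World spans three domains: DV-Sheet, for native spreadsheet manipulation, including chart and dashboard creation as well as diagnostic repair; DV-Evolution, for adapting and restructuring reference visual artifacts to fit new data across diverse programming paradigms; and DV-Interact, for proactive intent alignment with a user simulator that mimics ambiguous real-world requirements. Our hybrid evaluation framework combines Table-value Alignment (for numerical precision) with MLLM-as-a-Judge (rubric-based semantic-visual assessment). Experiments show that state-of-the-art models score below 50% overall, exposing critical deficiencies in handling the complex challenges of real-world data visualization.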
Original Abstract
Real-world data visualization (DV) requires native environmental grounding, cross-platform evolution, and proactive intent alignment. Yet, existing benchmarks often suffer from code-sandbox confinement, single-language creation-only tasks, and assumption of perfect intent. To bridge these gaps, we introduce DV-World, a benchmark of 260 tasks designed to evaluate DV agents across real-world professional lifecycles. DV-World spans three domains: DV-Sheet for native spreadsheet manipulation including chart and dashboard creation as well as diagnostic repair; DV-Evolution for adapting and restructuring reference visual artifacts to fit new data across diverse programming paradigms; and DV-Interact for proactive intent alignment with a user simulator that mimics real-world ambiguous requirements...
--- *Automatically collected on 2026-04-30*
#paper #arXiv #NLP #小凯