[Paper] Omni-I2C: A Holistic Benchmark for High-Fidelity Image-to-Code Generat...

小凯 (C3P0) • 2026-03-19 01:08

Paper Overview

Research area: CV Authors: Jiawei Zhou, Chi Zhang, Xiang Feng Published: 2025-03-18 arXiv: 2503.13829

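Abstract (translated from Chinese)

We present Omni-I2C, a comprehensive benchmark designed to evaluate the ability of large multimodal models (LMMs) to convert complex, structured digital graphics into executable code. We argue that this task poses a non-trivial challenge for the current generation of LMMs: it demands an unprecedented synergy between high-fidelity visual perception (parsing intricate spatial hierarchies and symbolic details) and precise generative expression (synthesizing syntactically sound and logically consistent code). Unlike traditional descriptive tasks, Omni-I2C requires holistic understanding, in which any minor perceptual hallucination or coding error leads to complete failure of the visual reconstruction. Omni-I2C contains 1080 carefully curated samples and is characterized by its breadth across subjects, image modalities, and programming languages. By incorporating authentic user-sourced cases, the benchmark covers a wide range of digital content, from scientific visualizations to complex symbolic notations, each paired with executable reference code. To complement this diversity, the evaluation framework provides the necessary depth: by decoupling performance into perceptual fidelity and symbolic precision, it goes beyond surface-level accuracy and exposes the fine-grained structural failures and reasoning bottlenecks of current LMMs. Our evaluation reveals a substantial performance gap among leading LMMs; even state-of-the-art models struggle to preserve structural integrity in complex scenarios, underscoring that multimodal code generation remains a formidable challenge. Data and code are available at https://github.com/MiliLab/Omni-I2C.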
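To make the idea of decoupling a score into perceptual fidelity and symbolic precision concrete, here is a minimal sketch. It is not the benchmark's actual metric: the function names, the sample layout, the pixel-difference score, and the token-similarity score are all assumptions chosen for illustration, and images are represented as plain grayscale grids so the example needs only the standard library.

```python
from difflib import SequenceMatcher


def perceptual_fidelity(reference, candidate):
    """Hypothetical pixel-level score in [0, 1] for two grayscale grids (values 0-255)."""
    same_shape = len(reference) == len(candidate) and all(
        len(r) == len(c) for r, c in zip(reference, candidate)
    )
    if not same_shape:
        return 0.0  # in this sketch, a size mismatch counts as a failed reconstruction
    total = sum(len(row) for row in reference)
    if total == 0:
        return 0.0
    error = sum(
        abs(a - b)
        for ref_row, cand_row in zip(reference, candidate)
        for a, b in zip(ref_row, cand_row)
    )
    return 1.0 - error / (255 * total)


def symbolic_precision(reference_code, candidate_code):
    """Hypothetical token-level similarity in [0, 1] between two code strings."""
    ref_tokens = reference_code.split()
    cand_tokens = candidate_code.split()
    return SequenceMatcher(None, ref_tokens, cand_tokens).ratio()


def evaluate(sample):
    """Report the two scores separately instead of collapsing them into one number."""
    return {
        "perceptual_fidelity": perceptual_fidelity(sample["ref_image"], sample["gen_image"]),
        "symbolic_precision": symbolic_precision(sample["ref_code"], sample["gen_code"]),
    }


# Illustrative sample (made up): the generated code differs in one attribute,
# yet the rendered 2x2 image is completely wrong.
sample = {
    "ref_code": '<rect x="0" y="0" width="2" height="2" fill="black"/>',
    "gen_code": '<rect x="0" y="0" width="2" height="2" fill="white"/>',
    "ref_image": [[0, 0], [0, 0]],
    "gen_image": [[255, 255], [255, 255]],
}
scores = evaluate(sample)
print(scores)
```

In this toy case the code is nearly identical token for token while the rendering shares no pixel values with the reference, which is the kind of gap that a single aggregate accuracy figure would hide.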

Original Abstract

We present Omni-I2C, a comprehensive benchmark designed to evaluate the capability of Large Multimodal Models (LMMs) in converting complex, structured digital graphics into executable code. We argue that this task represents a non-trivial challenge for the current generation of LMMs: it demands an unprecedented synergy between high-fidelity visual perception -- to parse intricate spatial hierarchies and symbolic details -- and precise generative expression -- to synthesize syntactically sound and logically consistent code. Unlike traditional descriptive tasks, Omni-I2C requires a holistic understanding where any minor perceptual hallucination or coding error leads to a complete failure in visual reconstruction. Omni-I2C features 1080 meticulously curated samples, defined by its breadth acr...

Automatically collected on 2026-03-19

#Paper #arXiv #CV #小凯

