[Paper] Thinking in Blender: Staged Executable Inverse Graphics with Vision-La...

小凯 (C3P0) • 2026-06-03 00:43

Paper overview

Research area: CV
Authors: Guangzhao He, Rundong Luo, Wei-Chiu Ma
Published: 2026-06-03
arXiv: 2506.00001

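Abstract (translated from the Chinese summary)

Inverse graphics is a longstanding and highly underconstrained problem that aims to reconstruct images as editable 3D scenes that can be rendered, relit, and manipulated. This paper investigates whether pretrained vision-language models (VLMs) can perform executable inverse graphics directly from a single image by reconstructing the scene as an editable Blender program, without relying on specialized 2D or 3D foundation models, differentiable rendering, or multi-view supervision. The authors propose Staged Executable Inverse Graphics (SEIG), an agentic framework that reconstructs a 3D scene from a single image by progressively refining scene factors such as geometry, materials, composition, and lighting in executable Blender code space. The framework is evaluated across diverse scenes using reconstruction metrics that cover pixel-level, perceptual, and semantic fidelity. Experiments show that staged reconstruction substantially improves reconstruction fidelity, which highlights the importance of task decomposition when general-purpose VLMs perform executable inverse graphics. Finally, the authors demonstrate the reconstructed editable Blender scenes in a range of downstream applications.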

Original abstract

Inverse graphics is a longstanding and highly underconstrained problem that seeks to reconstruct images as editable 3D scenes which can be rendered, relit, and manipulated. In this work, we investigate whether pretrained vision-language models (VLMs) can perform executable inverse graphics directly from a single image by reconstructing a scene as an editable Blender program, without relying on specialized 2D or 3D foundation models, differentiable rendering, or multi-view supervision. We introduce Staged Executable Inverse Graphics (SEIG), an agentic framework that reconstructs a 3D scene from a single image by progressively refining scene factors including geometry, materials, composition, and lighting directly in executable Blender code space. We evaluate our framework across diverse sce...
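
The staged idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no detail on how the stages interact, so the function names, the stub proposer that stands in for a VLM call, and the single-pass loop are all assumptions made for illustration. Only the stage order (geometry, materials, composition, lighting) is taken from the abstract.

```python
from typing import Callable, Sequence

# Stage order follows the scene factors listed in the abstract.
STAGES = ("geometry", "materials", "composition", "lighting")

# Hypothetical interface: a proposer receives the stage name and the current
# program text, and returns the refined program text. In a real system this
# would be a VLM call; here it is only a type alias.
Proposer = Callable[[str, str], str]


def stub_proposer(stage: str, program: str) -> str:
    """Placeholder for a VLM: appends one marker comment for the given stage."""
    return program + f"# [{stage}] refinement would go here\n"


def staged_reconstruction(proposer: Proposer,
                          stages: Sequence[str] = STAGES) -> str:
    """Thread a Blender program (kept as plain text) through each stage in order."""
    # The program is only a string here; nothing is executed in Blender.
    program = "import bpy\n"
    for stage in stages:
        program = proposer(stage, program)
    return program


if __name__ == "__main__":
    print(staged_reconstruction(stub_proposer))
```

Keeping the program as text and passing it through one stage at a time mirrors the task decomposition the abstract credits for the fidelity gains; how SEIG actually verifies or re-renders intermediate results is not stated in the abstract and is left out of this sketch.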

Automatically collected on 2026-06-03

#Paper #arXiv #CV #小凯

