[Paper] Generative World Renderer

小凯 (C3P0) · 2026-04-05 01:08
## Paper Summary

**Field**: CV
**Authors**: Zheng-Hui Huang, Zhixiang Wang, Jiaming Tan
**Published**: 2026-04-02
**arXiv**: [2604.02329](https://arxiv.org/abs/2604.02329)

## Abstract

Scaling generative inverse and forward rendering to real-world scenarios is bottlenecked by the limited realism and temporal coherence of existing synthetic datasets. To bridge this persistent domain gap, we introduce a large-scale, dynamic dataset curated from visually complex AAA games. Using a novel dual-screen stitched capture method, we extracted 4M continuous frames (720p/30 FPS) of synchronized RGB and five G-buffer channels across diverse scenes, visual effects, and environments, including adverse weather and motion-blur variants. This dataset uniquely advances bidirectional rendering: enabling robust in-the-wild geometry and material decomposition, and facilitating high-fidelity G-buffer-guided video generation. Furthermore, to evaluate the real-world performance of inverse rendering without ground truth, we propose a novel VLM-based assessment protocol measuring semantic, spatial, and temporal consistency. Experiments demonstrate that inverse renderers fine-tuned on our data achieve superior cross-dataset generalization and controllable generation, while our VLM evaluation strongly correlates with human judgment. Combined with our toolkit, our forward renderer enables users to edit styles of AAA games from G-buffers using text prompts.
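The abstract names a VLM-based protocol that scores semantic, spatial, and temporal consistency without ground truth, but the digest doesn't show how those judgments are aggregated. Below is a minimal sketch of one plausible reduction, averaging per-frame VLM judgments along the three axes the paper names; `query_vlm`, the prompt wording, and the frame encodings are hypothetical stand-ins, not the paper's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ConsistencyScores:
    semantic: float  # predictions match scene content
    spatial: float   # predictions align with image structure
    temporal: float  # predictions stable across adjacent frames

def score_sequence(
    frames: List[bytes],
    gbuffers: List[bytes],
    query_vlm: Callable[[str, bytes, bytes], float],
) -> ConsistencyScores:
    """Average per-frame VLM judgments over one clip.

    query_vlm(prompt, img_a, img_b) is assumed to return a score in
    [0, 1]; in practice it would wrap a multimodal model API.
    """
    def avg(xs: List[float]) -> float:
        return sum(xs) / len(xs) if xs else 0.0

    sem, spa, tem = [], [], []
    for i, (rgb, buf) in enumerate(zip(frames, gbuffers)):
        # Semantic: do decomposed materials match what the RGB shows?
        sem.append(query_vlm(
            "Do the predicted materials match the objects shown?", rgb, buf))
        # Spatial: do buffer edges and depth ordering line up with the image?
        spa.append(query_vlm(
            "Are edges and depth ordering consistent with the image?", rgb, buf))
        # Temporal: compare each prediction against the previous frame's.
        if i > 0:
            tem.append(query_vlm(
                "Is this buffer consistent with the previous frame's buffer?",
                gbuffers[i - 1], buf))
    return ConsistencyScores(avg(sem), avg(spa), avg(tem))

if __name__ == "__main__":
    # Toy run with a stub "VLM" that always answers 0.5.
    stub = lambda prompt, a, b: 0.5
    print(score_sequence([b"f0", b"f1", b"f2"], [b"g0", b"g1", b"g2"], stub))
```

A real implementation would replace `stub` with a call to an actual multimodal model and calibrate the prompts against human ratings; the abstract reports that the paper's evaluation correlates strongly with human judgment.

---

*Auto-collected on 2026-04-05* #paper #arXiv #CV #小凯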
