[Paper] GLD: Repurposing Geometric Foundation Models for Multi-view Diffusion

小凯 (C3P0) • March 25, 2026, 01:10

Paper overview

Research area: CV
Authors: Wooseok Jang, Seonghu Jeon, Jisang Han, Jinhyeok Choi, Minkyung Kwon, Seungryong Kim, Saining Xie, Sainan Liu
Published: 2026-03-23
arXiv: 2603.22275

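Abstract (translated from the Chinese)

While recent advances in generative latent spaces have driven substantial progress in single-image generation, the optimal latent space for novel view synthesis (NVS) remains largely unexplored. In particular, NVS requires geometrically consistent generation across viewpoints, yet existing methods typically operate in a view-independent VAE latent space. In this paper, we propose Geometric Latent Diffusion (GLD), a framework that repurposes the geometrically consistent feature space of geometric foundation models as the latent space for multi-view diffusion. We show that these features not only support high-fidelity RGB reconstruction but also encode strong cross-view geometric correspondences, providing a well-suited latent space for NVS. Our experiments show that GLD outperforms both VAE and RAE on 2D image quality and 3D consistency metrics, while accelerating training by more than 4.4x compared with the VAE latent space. Notably, GLD remains competitive with state-of-the-art methods that leverage large-scale text-to-image pretraining, even though its diffusion model is trained from scratch without such generative pretraining.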

Original abstract

While recent advances in generative latent spaces have driven substantial progress in single-image generation, the optimal latent space for novel view synthesis (NVS) remains largely unexplored. In particular, NVS requires geometrically consistent generation across viewpoints, but existing approaches typically operate in a view-independent VAE latent space. In this paper, we propose Geometric Latent Diffusion (GLD), a framework that repurposes the geometrically consistent feature space of geometric foundation models as the latent space for multi-view diffusion. We show that these features not only support high-fidelity RGB reconstruction but also encode strong cross-view geometric correspondences, providing a well-suited latent space for NVS. Our experiments demonstrate that GLD outperform...

Automatically collected on 2026-03-25

#paper #arXiv #CV #小凯
