[Paper] GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation

Paper Overview

Research area: CV | Authors: Nicolas von Lützow, Barbara Rössle, Katharina Schmid | Published: 2025-03-30 | arXiv: 2503.23749

Abstract (translated from the Chinese)

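Recent advances in 3D generative modeling rely mainly on diffusion or flow-matching methods. In contrast, we explore a fully autoregressive alternative and introduce GaussianGPT, a Transformer-based model that directly generates 3D Gaussians through next-token prediction, enabling complete 3D scene generation. We first compress the Gaussian primitives into a discrete latent grid using a sparse 3D convolutional autoencoder with vector quantization. The resulting tokens are serialized and modeled by a causal Transformer with 3D rotary positional embeddings, which allows spatial structure and appearance to be generated sequentially. Unlike diffusion-based methods that refine the scene as a whole, our approach builds scenes step by step and naturally supports completion, outpainting, temperature-controlled sampling, and flexible generation horizons. This formulation exploits the compositional inductive biases and scalability of autoregressive modeling while operating on an explicit representation compatible with modern neural rendering pipelines, positioning autoregressive Transformers as a complementary paradigm for controllable, context-aware 3D generation.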
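The tokenization step described above (vector quantization followed by serialization) can be illustrated with a small sketch. This is not the paper's implementation: the toy codebook, the two-dimensional latents, the helper names `quantize` and `serialize`, and the (z, y, x) ordering are all assumptions chosen for illustration, and the sparse 3D convolutional encoder is left out entirely.

```python
def quantize(vectors, codebook):
    """Map each latent vector to the index of its nearest codebook entry (squared L2)."""
    indices = []
    for v in vectors:
        best = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(v, codebook[k])),
        )
        indices.append(best)
    return indices


def serialize(sparse_grid):
    """Flatten a sparse voxel grid {(x, y, z): token} into a 1D token sequence.

    The (z, y, x) sort order is an assumption; the abstract does not state
    which traversal order the authors use.
    """
    coords = sorted(sparse_grid, key=lambda c: (c[2], c[1], c[0]))
    return coords, [sparse_grid[c] for c in coords]


# Hypothetical toy data: three occupied voxels with 2D latent vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = {
    (1, 0, 0): [0.9, 1.2],
    (0, 0, 0): [0.1, -0.1],
    (0, 1, 1): [-0.8, 1.7],
}

voxels = list(latents)
token_ids = quantize([latents[c] for c in voxels], codebook)
grid = dict(zip(voxels, token_ids))
ordered_coords, sequence = serialize(grid)
print(ordered_coords, sequence)
```

Only occupied voxels appear in the sequence, so the coordinates have to travel alongside the tokens; in the model described here, that positional information is what the 3D rotary embedding would consume.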

Original Abstract

Most recent advances in 3D generative modeling rely on diffusion or flow-matching formulations. We instead explore a fully autoregressive alternative and introduce GaussianGPT, a transformer-based model that directly generates 3D Gaussians via next-token prediction, thus facilitating full 3D scene generation. We first compress Gaussian primitives into a discrete latent grid using a sparse 3D convolutional autoencoder with vector quantization. The resulting tokens are serialized and modeled using a causal transformer with 3D rotary positional embedding, enabling sequential generation of spatial structure and appearance. Unlike diffusion-based methods that refine scenes holistically, our formulation constructs scenes step-by-step, naturally supporting completion, outpainting, controllable sampling via temperature, and flexible generation horizons. This formulation leverages the compositional inductive biases and scalability of autoregressive modeling while...
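To make the step-by-step generation concrete, here is a minimal sketch of temperature-controlled next-token sampling with an optional prefix, which is how completion falls out of an autoregressive model. The callable `next_logits` is a hypothetical stand-in for the causal transformer, and `toy_logits` is invented purely so the example runs; neither comes from the paper.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate(next_logits, prefix, steps, temperature, seed=0):
    """Append `steps` sampled tokens to `prefix`, one token at a time.

    A non-empty prefix corresponds to completion: the given tokens are kept
    fixed and the model only continues the sequence.
    """
    rng = random.Random(seed)
    tokens = list(prefix)
    for _ in range(steps):
        probs = softmax_with_temperature(next_logits(tokens), temperature)
        tokens.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return tokens


def toy_logits(tokens):
    """Hypothetical model: fixed logits over a 3-token vocabulary, ignoring context."""
    return [0.0, 1.0, 2.0]


sharp = softmax_with_temperature(toy_logits([]), 0.1)
flat = softmax_with_temperature(toy_logits([]), 1.0)
completed = generate(toy_logits, prefix=[2, 0], steps=5, temperature=0.8)
print(sharp, flat, completed)
```

Because generation is just a loop over next-token predictions, the generation horizon is set by `steps` rather than fixed in advance, which is the flexibility the abstract refers to.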

--- *Automatically collected on 2026-03-31*

#Paper #arXiv #CV #小凯
