## Paper Summary
**Research Area**: AI
**Authors**: Tao Xie, Peishan Yang, Yudong Jin
**Published**: 2025-04-10
**arXiv**: [2504.07077](https://arxiv.org/abs/2504.07077)
## Abstract
This paper addresses the task of large-scale 3D scene reconstruction from long video sequences. Recent feed-forward reconstruction models have shown promising results by directly regressing 3D geometry from RGB images without explicit 3D priors or geometric constraints. However, these methods often struggle to maintain reconstruction accuracy and consistency over long sequences due to limited memory capacity and the inability to effectively capture global contextual cues. In contrast, humans can naturally exploit the global understanding of the scene to inform local perception. Motivated by this, we propose a novel neural global context representation that efficiently compresses and retains long-range scene information, enabling the model to leverage extensive contextual cues for enhanced reconstruction accuracy and consistency. The context representation is realized through a set of lightweight neural sub-networks that are rapidly adapted during test time via self-supervised objectives, which substantially increases memory capacity without incurring significant computational overhead. The experiments on multiple large-scale benchmarks, including the KITTI Odometry and Oxford Spires datasets, demonstrate the effectiveness of our approach in handling ultra-large scenes, achieving leading pose accuracy and state-of-the-art 3D reconstruction accuracy while maintaining efficiency.
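The core mechanism the abstract describes — lightweight sub-networks whose weights act as a compressed memory of the scene, fitted at test time with a self-supervised objective — can be illustrated with a minimal sketch. The code below is not the paper's implementation; `ContextMemory`, `adapt_step`, the MLP architecture, and all dimensions are hypothetical placeholders chosen only to show the test-time-adaptation pattern.

```python
# Minimal sketch (not the authors' code) of a neural memory adapted at
# test time. The sub-network's weights are the memory: "writing" new
# long-range scene information means taking a few self-supervised
# gradient steps on recent observations, so capacity grows without
# storing raw frames.

import torch
import torch.nn as nn


class ContextMemory(nn.Module):
    """Tiny MLP mapping query embeddings to global-context features.

    Hypothetical stand-in for one of the paper's lightweight sub-networks.
    """

    def __init__(self, in_dim: int = 64, hidden: int = 128, out_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, queries: torch.Tensor) -> torch.Tensor:
        return self.net(queries)


def adapt_step(memory: ContextMemory,
               queries: torch.Tensor,
               targets: torch.Tensor,
               lr: float = 1e-3,
               n_steps: int = 5) -> None:
    """Test-time adaptation: a few gradient steps on a self-supervised
    reconstruction loss, absorbing new frames into the memory weights."""
    opt = torch.optim.Adam(memory.parameters(), lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(memory(queries), targets)
        loss.backward()
        opt.step()


if __name__ == "__main__":
    mem = ContextMemory()
    # Stand-ins for per-frame feature embeddings from a backbone encoder.
    q = torch.randn(256, 64)
    t = torch.randn(256, 64)
    adapt_step(mem, q, t)
    # Later frames read global context by querying the adapted memory.
    context = mem(q[:8])
    print(context.shape)  # torch.Size([8, 64])
```

Under these assumptions, the memory footprint stays fixed at the sub-network's parameter count regardless of sequence length, which is what lets the approach scale to the long video sequences the abstract targets.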
---
*Automatically collected on 2025-04-11*
#Paper #arXiv #AI #小凯