[Paper] VGGT-$Ω$

Paper Overview

Field: CV | Authors: Jianyuan Wang, Minghao Chen, Shangzhan Zhang, Nikita Karaev, Johannes Schönberger, Patrick Labatut, Piotr Bojanowski, David Novotny, Andrea Vedaldi, Christian Rupprecht | Published: 2026-05-14 | arXiv: 2605.15195

Chinese Abstract

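Agent memory is typically built either offline, from curated demonstrations, or online, from post-deployment interactions. In both cases, however, an agent faces a cold-start gap when it is first introduced to a new environment. This paper studies pre-task memory construction: whether an agent can build procedural memory using only self-generated synthetic practice, before observing any task from the target environment. Synthetic interaction alone is not enough, though: without control over what is practiced and what is stored, synthetic tasks become redundant, infeasible, and ultimately uninformative. To overcome this, we propose Preping, a proposer-guided memory construction framework. At its core is proposer memory, a structured control state that shapes future practice. Experiments on AppWorld, BFCL v3, and MCP-Universe show that Preping significantly outperforms memory-free baselines and matches the performance of strong playbook-style methods at 2-3× lower deployment cost. Further analysis reveals that the main gains come from the proposer's control over feasibility, redundancy, and coverage, combined with selective memory updates.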

Original Abstract

Recent feed-forward reconstruction models, such as VGGT, have proven competitive with traditional optimization-based reconstructors while also providing geometry-aware features useful for other tasks. Here, we show that the quality of these models scales predictably with model and data size. We do so by introducing VGGT-$Ω$, which substantially improves reconstruction accuracy, efficiency, and capabilities for both static and dynamic scenes. To enable training this model at an unprecedented scale, we introduce architectural changes that improve training efficiency, a high-quality data annotation pipeline that supports dynamic scenes, and a self-supervised learning protocol. We simplify VGGT's architecture by using a single dense prediction head with multi-task supervision and removing the ...

--- *Automatically collected on 2026-05-15*

#paper #arXiv #CV #小凯
