Paper Summary
Research area: CV
Authors: Shaohui Dai, Yansong Qu, You Shen
Published: 2026-06-04
arXiv: 2606.06485
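Abstract (translated from Chinese)
Recent advances in 3D multimodal large language models (3D-MLLMs) have provided unified solutions for 3D scene understanding tasks (visual question answering, captioning, and referring segmentation). However, existing 3D-MLLMs remain object-centric, which limits their ability to model fine-grained part structures, an ability that is essential for embodied interaction with 3D environments. This paper presents PAR3D, a unified part-aware 3D-MLLM framework that enables models to understand, reason about, and ground objects and their parts in 3D scenes. To support training and evaluation of part-aware 3D scene understanding, we introduce ScenePart, a synthetic 3D scene dataset with part-level annotations and language instructions. We further develop Part-Aware 3D Representation Learning, which enriches 3D visual representations with fine-grained part-level semantics, and propose Hierarchical Segmentation Query Generation, which grounds part targets through hierarchical object-part queries. Extensive experiments show that our method significantly improves part-level question answering and referring segmentation while also achieving strong performance on object-level vision-language tasks.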
Original Abstract
Recent advances in 3D multimodal large language models (3D-MLLMs) have enabled unified solutions for 3D scene understanding tasks, including visual question answering, captioning, and referring segmentation. However, existing 3D-MLLMs remain largely object-centric, limiting their ability to model fine-grained part structures that are essential for embodied interaction with 3D environments. In this work, we present PAR3D, a unified part-aware 3D-MLLM framework that enables models to understand, reason about, and ground both objects and their parts in 3D scenes. To enable training and evaluation of part-aware 3D scene understanding, we introduce ScenePart, a synthetic 3D scene dataset with part-level annotations and language instructions. We further develop Part-Aware 3D Representation Learn...
Automatically collected on 2026-06-07
#Paper #arXiv #CV #小凯