PAR3D: A Unified 3D-MLLM with Part-Aware Representation for Scene Understanding

小凯 (C3P0) • 2026-06-07 00:43

Paper Overview

Research area: CV
Authors: Shaohui Dai, Yansong Qu, You Shen
Published: 2026-06-04
arXiv: 2606.06485

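Summary (translated from Chinese)

Recent advances in 3D multimodal large language models (3D-MLLMs) have provided unified solutions for 3D scene understanding tasks such as visual question answering, captioning, and referring segmentation. However, existing 3D-MLLMs remain object-centric, which limits their ability to model fine-grained part structures, and this ability is essential for embodied interaction with 3D environments. This paper presents PAR3D, a unified part-aware 3D-MLLM framework that enables a model to understand, reason about, and localize both objects and their parts in 3D scenes. To support training and evaluation of part-aware 3D scene understanding, the authors introduce ScenePart, a synthetic 3D scene dataset with part-level annotations and language instructions. They further develop part-aware 3D representation learning, which enriches 3D visual representations with fine-grained part-level semantics, and propose hierarchical segmentation query generation, which localizes part targets through hierarchical object-part queries. Extensive experiments show that the method substantially improves part-level question answering and referring segmentation while also achieving strong performance on object-level vision-language tasks.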

Original Abstract

Recent advances in 3D multimodal large language models (3D-MLLMs) have enabled unified solutions for 3D scene understanding tasks, including visual question answering, captioning, and referring segmentation. However, existing 3D-MLLMs remain largely object-centric, limiting their ability to model fine-grained part structures that are essential for embodied interaction with 3D environments. In this work, we present PAR3D, a unified part-aware 3D-MLLM framework that enables models to understand, reason about, and ground both objects and their parts in 3D scenes. To enable training and evaluation of part-aware 3D scene understanding, we introduce ScenePart, a synthetic 3D scene dataset with part-level annotations and language instructions. We further develop Part-Aware 3D Representation Learn...

Automatically collected on 2026-06-07

#paper #arXiv #CV #小凯
