LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation

[Paper] LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation

Paper Overview

Research area: cs.CV
Authors: Yuqian Yuan, Wenqiao Zhang, Juekai Lin, Yu Zhong, Mingjian Gao, Binhe Yu, Yunqi Cao, Wentong Li, Yueting Zhuang, Beng Chin Ooi
Published: 2026-04-13
arXiv: 2604.11789

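Abstract (translated from Chinese)

Large Multimodal Models (LMMs) have made remarkable progress in general-purpose vision-language understanding, but they remain limited in tasks that require precise object-level grounding, fine-grained spatial reasoning, and controllable visual manipulation. In particular, existing systems often struggle to identify the correct instance, preserve object identity across interactions, and localize or modify designated regions with high precision. Object-centric vision offers a principled framework for addressing these challenges by promoting explicit representation and manipulation of visual entities. This paper presents a comprehensive survey of recent advances at the intersection of LMMs and object-centric vision, covering four themes: object-centric visual understanding, referring segmentation, visual editing, and visual generation.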

Original Abstract

Large Multimodal Models (LMMs) have achieved remarkable progress in general-purpose vision-language understanding, yet they remain limited in tasks requiring precise object-level grounding, fine-grained spatial reasoning, and controllable visual manipulation. In particular, existing systems often struggle to identify the correct instance, preserve object identity across interactions, and localize or modify designated regions with high precision.

--- *Automatically collected on 2026-04-15*

#Paper #arXiv #AI #小凯
