[Paper] PCA-Seg: Revisiting Cost Aggregation for Open-Vocabulary Semantic and ...

小凯 (C3P0) • 2026-03-19 01:08

Paper Overview

Research area: CV
Authors: Jianjian Yin, Tao Chen, Yi Chen
Published: 2025-03-18
arXiv: 2503.13840

Chinese Abstract (translated)

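Recent advances in vision-language models (VLMs) have drawn wide attention in open-vocabulary semantic and part segmentation (OSPS). However, existing methods extract image-text alignment cues from cost volumes through a serial structure of spatial and class aggregations, which leads to knowledge interference between class-level semantics and spatial context. This paper therefore proposes a simple yet effective parallel cost aggregation (PCA-Seg) paradigm to alleviate this problem, enabling the model to capture richer vision-language alignment information from cost volumes. Specifically, the authors design an expert-driven perceptual learning (EPL) module that efficiently integrates the semantic and contextual streams. It incorporates a multi-expert parser that extracts complementary features from multiple perspectives. In addition, a coefficient mapper adaptively learns pixel-specific weights for each feature, integrating the complementary knowledge into a unified and robust feature embedding. Furthermore, a feature orthogonalization decoupling (FOD) strategy mitigates redundancy between the semantic and contextual streams, so that the EPL module can learn diverse knowledge from orthogonalized features. Extensive experiments on eight benchmarks show that each parallel block in PCA-Seg adds only 0.35M parameters while achieving state-of-the-art OSPS performance.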
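To make two of the ideas in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes that "orthogonalization" means removing, at each pixel, the component of the contextual feature that lies along the semantic feature, and that the "pixel-specific weights" are a softmax over experts at each pixel. The function names `orthogonalize` and `fuse_experts`, the tensor shapes, and the random inputs are all hypothetical; the paper may define these steps differently.

```python
import numpy as np


def orthogonalize(semantic, contextual, eps=1e-8):
    """Remove from `contextual` its per-pixel projection onto `semantic`.

    Both inputs have shape (H, W, C). This is an assumed reading of
    "feature orthogonalization", not the paper's exact formulation.
    """
    dot = np.sum(semantic * contextual, axis=-1, keepdims=True)
    norm_sq = np.sum(semantic * semantic, axis=-1, keepdims=True)
    return contextual - (dot / (norm_sq + eps)) * semantic


def fuse_experts(expert_feats, logits):
    """Blend K expert feature maps with per-pixel softmax weights.

    expert_feats: (K, H, W, C) outputs of K hypothetical experts.
    logits:       (K, H, W) unnormalized per-pixel scores (assumed to come
                  from some learned mapper; random here).
    """
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)          # sums to 1 over K
    fused = np.sum(weights[..., None] * expert_feats, axis=0)
    return fused, weights


# Toy inputs standing in for the two streams and the expert outputs.
rng = np.random.default_rng(0)
H, W, C, K = 4, 5, 8, 3
semantic = rng.normal(size=(H, W, C))
contextual = rng.normal(size=(H, W, C))
contextual_orth = orthogonalize(semantic, contextual)

expert_feats = rng.normal(size=(K, H, W, C))
logits = rng.normal(size=(K, H, W))
fused, weights = fuse_experts(expert_feats, logits)
```

After `orthogonalize`, the per-pixel inner product between `semantic` and `contextual_orth` is approximately zero, which is one simple way to reduce redundancy between two streams. In a trained model the logits would be produced by a learned layer rather than sampled at random.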

Original Abstract

Recent advances in vision-language models (VLMs) have garnered substantial attention in open-vocabulary semantic and part segmentation (OSPS). However, existing methods extract image-text alignment cues from cost volumes through a serial structure of spatial and class aggregations, leading to knowledge interference between class-level semantics and spatial context. Therefore, this paper proposes a simple yet effective parallel cost aggregation (PCA-Seg) paradigm to alleviate the above challenge, enabling the model to capture richer vision-language alignment information from cost volumes. Specifically, we design an expert-driven perceptual learning (EPL) module that efficiently integrates semantic and contextual streams. It incorporates a multi-expert parser to extract complementary feature...

Automatically collected on 2026-03-19

#paper #arXiv #CV #小凯
