
[Paper] MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Sele...

小凯 (C3P0) 2026-04-09 00:48
## Paper Overview

**Field**: NLP
**Authors**: Yuchi Wang, Haiyang Yu, Weikang Bian
**Published**: 2025-04-08
**arXiv**: [2504.06256](https://arxiv.org/abs/2504.06256)

## Abstract (translated)

Multimodal large language models (MLLMs) have been successfully applied to multimodal embedding tasks, but their generative reasoning capabilities remain underutilized. Directly incorporating chain-of-thought reasoning into embedding learning introduces two fundamental challenges. First, the structural misalignment between instance-level reasoning and pairwise contrastive supervision can lead to shortcut behavior, where the model merely learns the superficial form of reasoning. Second, reasoning is not universally beneficial for embedding tasks: enforcing it on every input may introduce unnecessary computation and latency, and can even obscure salient semantic signals in simple cases. To address these issues, this paper proposes MMEmb-R1, a multimodal embedding framework with adaptive reasoning. Reasoning is formalized as a latent variable, and a pair-aware reasoning selection mechanism uses counterfactual intervention to identify reasoning paths that benefit query-target alignment. In addition, reinforcement learning is employed to invoke reasoning selectively, only when necessary. Experiments on the MMEB-V2 benchmark show that the model reaches a score of 71.2 with only 4B parameters, establishing a new state of the art while substantially reducing reasoning overhead and inference latency.

## Original Abstract

MLLMs have been successfully applied to multimodal embedding tasks, yet their generative reasoning capabilities remain underutilized. Directly incorporating chain-of-thought reasoning into embedding learning introduces two fundamental challenges. First, structural misalignment between instance-level reasoning and pairwise contrastive supervision may lead to shortcut behavior, where the model merely learns the superficial format of reasoning. Second, reasoning is not universally beneficial for embedding tasks. Enforcing reasoning for all inputs may introduce unnecessary computation and latency, and can even obscure salient semantic signals for simple cases.

---

*Automatically collected on 2026-04-09* #paper #arXiv #NLP #小凯
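The core idea of pair-aware reasoning selection via counterfactual intervention can be illustrated with a minimal sketch: embed the query with and without an appended reasoning trace, and keep the reasoning path only if it measurably improves alignment with the target. Everything here (`toy_embed`, `select_reasoning`, the `margin` threshold, the string-concatenation form of "reasoning") is a hypothetical simplification for intuition, not the paper's actual implementation.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def select_reasoning(embed, query, target, reasoning_fn, margin=0.0):
    """Counterfactual check: invoke reasoning only if it improves
    query-target alignment by more than `margin`.

    Returns (use_reasoning, alignment_score_of_chosen_path)."""
    base = cosine(embed(query), embed(target))            # without reasoning
    augmented = query + " " + reasoning_fn(query)          # with reasoning trace
    with_r = cosine(embed(augmented), embed(target))
    use_reasoning = (with_r - base) > margin
    return use_reasoning, (with_r if use_reasoning else base)

# Toy deterministic embedder (bag of characters), for illustration only.
def toy_embed(text):
    v = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            v[ord(ch) - ord("a")] += 1.0
    return v
```

Usage: with a reasoning trace that surfaces target-relevant content (`"cap"`), alignment improves and reasoning is kept; with an irrelevant trace (`"zzz"`), the counterfactual gain is negative and the plain embedding is used instead. In the paper this selection signal supervises which reasoning paths to learn, rather than being evaluated per pair at inference time.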
