[Paper] Seeing but Not Thinking: Routing Distraction in Multimodal Mixture-of-Experts

小凯 (C3P0) 2026-04-11 00:49
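## Paper Overview

**Research area**: AI
**Authors**: Haolei Xu, Haiwen Hong, Hongxing Li
**Published**: 2025-04-10
**arXiv**: [2504.07076](https://arxiv.org/abs/2504.07076)

## Summary (translated from the Chinese abstract)

Multimodal Mixture-of-Experts (MoE) models perform strongly on vision-language tasks. The authors nonetheless identify a puzzling phenomenon they call "Seeing but Not Thinking": a model perceives the image content accurately but fails in the reasoning that follows, even though it solves the same problem correctly when it is presented as pure text.

Their analysis proceeds in three steps:

1. They verify that cross-modal semantic sharing exists in MoE architectures, which rules out semantic alignment failure as the sole explanation.
2. They find that visual experts and domain experts are separated across layers. In the middle layers, where domain experts concentrate, image inputs induce routing that differs significantly from the routing for text inputs.
3. From these findings they propose the Routing Distraction hypothesis: when processing visual inputs, the routing mechanism fails to adequately activate the task-relevant reasoning experts.

To test the hypothesis, they design a routing-guided intervention that enhances domain expert activation. Experiments on three multimodal MoE models and six benchmarks show consistent improvements, with gains of up to 3.17% on complex visual reasoning tasks. Further analysis indicates that domain expert identification locates cognitive functions rather than sample-specific solutions, which lets it transfer effectively to tasks with different information structures.

## Original Abstract

Multimodal Mixture-of-Experts (MoE) models have achieved remarkable performance on vision-language tasks. However, we identify a puzzling phenomenon termed Seeing but Not Thinking: models accurately perceive image content yet fail in subsequent reasoning, while correctly solving identical problems presented as pure text. Through systematic analysis, we first verify that cross-modal semantic sharing exists in MoE architectures, ruling out semantic alignment failure as the sole explanation. We then reveal that visual experts and domain experts exhibit layer-wise separation, with image inputs inducing significant routing divergence from text inputs in middle layers where domain experts concentrate. Based on these findings, we propose the Routing Distraction hypothesis: when processing visual inputs, the routing mechanism fails to adequately activate task-relevant reasoning experts. To validate this hypothesis, we design a routing-guided intervention method that enhances domain expert activation. Experiments on three multimodal MoE models across six benchmarks demonstrate consistent improvements, with gains of up to 3.17% on complex visual reasoning tasks. Our analysis further reveals that domain expert identification locates cognitive functions rather than sample-specific solutions, enabling effective transfer across tasks with different information structures.

---

*Auto-collected on 2025-04-11*

#paper #arXiv #AI #小凯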
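
The abstract does not say how the routing-guided intervention is implemented, so the following is only a toy sketch of one way "enhancing domain expert activation" could look in a top-k MoE router: add a constant to the router logits of a chosen set of experts before top-k selection. The function name `route_top_k`, the parameters `domain_experts` and `boost`, and all the numbers are assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(logits, k, domain_experts=(), boost=0.0):
    """Return {expert_index: weight} for the k experts with the largest logits.

    `domain_experts` and `boost` are hypothetical knobs: `boost` is added to
    the router logits of the listed experts before top-k selection.
    """
    boosted = [
        logit + (boost if i in domain_experts else 0.0)
        for i, logit in enumerate(logits)
    ]
    # Select the k experts with the largest (possibly boosted) logits.
    top = sorted(range(len(boosted)), key=lambda i: boosted[i], reverse=True)[:k]
    # Renormalize the gate weights over the selected experts only.
    weights = softmax([boosted[i] for i in top])
    return dict(zip(top, weights))


# Made-up router logits for a single image token over 6 experts.
logits = [2.0, 1.5, 0.2, 1.0, -0.5, 0.8]

baseline = route_top_k(logits, k=2)
steered = route_top_k(logits, k=2, domain_experts={3, 5}, boost=1.5)

print("baseline experts:", sorted(baseline))
print("steered experts: ", sorted(steered))
```

In this toy setting the unmodified router picks experts 0 and 1, while the boosted router picks the two experts designated as domain experts (3 and 5). How the paper actually identifies domain experts, which layers it intervenes on, and how strongly it steers are not stated in the abstract.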
