[Paper] When to Align, When to Predict: A Phase Diagram for Multimodal Learnin...
Paper Overview
Field: ML  Authors: Ilay Kamai, Hugues Van Assel, Aviv Regev, Hagai B. Perets, Randall Balestriero  Published: 2026-06-09  arXiv: 2606.11190
Summary
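Cross-modal alignment (CA) and cross-modal prediction (CP) are the dominant paradigms for multimodal representation learning, but the field lacks a systematic understanding of when each succeeds, when each fails, and when cross-modal training helps at all. This paper proposes a unified linear framework and, under a spiked signal-plus-noise model, derives separation ratios for both objectives. These ratios reveal complementary failure modes: alignment fails when the noise is strongly correlated across modalities, while prediction is limited by the quality of the source modality. The resulting phase diagram divides multimodal problems into four regions: both work, only CA works, only CP works, and neither works. Experiments on stereo vision, image-text pairs, and astrophysical data confirm these predictions in nonlinear settings, including cases where cross-modal training hurts.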
Original Abstract
Cross-modal alignment (CA) and cross-modal prediction (CP) are the dominant paradigms for multimodal representation learning, yet there is no systematic understanding of when each succeeds, when each fails, and when cross-modal training helps at all -- a gap that leaves practitioners, especially in scientific domains like biomedicine or astrophysics, with heterogeneous instruments and multiple levels of organization and measurement, unable to diagnose why standard methods underperform the best single modality. We develop a unified linear framework that addresses both questions. Under a spiked signal-plus-noise model with structured cross-modal nuisance correlation, we derive separation ratios for both objectives that expose complementary failure modes: alignment whitens each modality and f...
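The truncated abstract does not give the paper's exact model or its separation ratios, so the following is only a toy sketch of the two failure modes, not the authors' framework. It assumes a rank-1 spiked model with one shared signal coordinate and one shared nuisance coordinate, uses linear CCA as a stand-in for CA (it whitens each modality) and rank-1 reduced-rank regression as a stand-in for CP. All function names (`make_pair`, `alignment_encoder`, `prediction_encoder`, `signal_recovery`) and parameter values are hypothetical, chosen for illustration.

```python
import numpy as np


def inv_sqrt(c):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def make_pair(n, signal_x, signal_y, nuisance, dim=5, seed=0):
    """Hypothetical toy spiked model: shared signal s on coordinate 0,
    shared nuisance z on coordinate 1, unit isotropic noise everywhere."""
    rng = np.random.default_rng(seed)
    s = rng.standard_normal(n)
    z = rng.standard_normal(n)
    x = rng.standard_normal((n, dim))
    y = rng.standard_normal((n, dim))
    x[:, 0] += signal_x * s
    y[:, 0] += signal_y * s
    x[:, 1] += nuisance * z  # same z in both modalities: correlated nuisance
    y[:, 1] += nuisance * z
    return x, y, s


def covariances(x, y):
    """Empirical covariances C_xx, C_yy and cross-covariance C_xy."""
    n = len(x)
    xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)
    return xc.T @ xc / n, yc.T @ yc / n, xc.T @ yc / n


def alignment_encoder(x, y):
    """CA stand-in: rank-1 linear CCA. Both modalities are whitened, so
    directions are ranked by cross-modal correlation, not by variance."""
    cxx, cyy, cxy = covariances(x, y)
    wx = inv_sqrt(cxx)
    u, _, _ = np.linalg.svd(wx @ cxy @ inv_sqrt(cyy))
    return wx @ u[:, 0]


def prediction_encoder(x, y):
    """CP stand-in: rank-1 reduced-rank regression of y on x. Directions
    are ranked by how much variance of y they predict from x."""
    cxx, _, cxy = covariances(x, y)
    b = np.linalg.solve(cxx, cxy)  # least-squares map x -> y
    m = cxy.T @ b                  # covariance of the predicted y
    _, vecs = np.linalg.eigh((m + m.T) / 2)
    return b @ vecs[:, -1]         # x-side weights for the top predicted direction


def signal_recovery(x, w, s):
    """|correlation| between the 1-D embedding x @ w and the true signal s."""
    return abs(np.corrcoef(x @ w, s)[0, 1])


for sx, sy, nu in [(1.0, 5.0, 3.0), (1.0, 5.0, 0.0), (0.2, 5.0, 0.0)]:
    x, y, s = make_pair(20_000, sx, sy, nu)
    ca = signal_recovery(x, alignment_encoder(x, y), s)
    cp = signal_recovery(x, prediction_encoder(x, y), s)
    print(f"signal_x={sx} signal_y={sy} nuisance={nu}: CA={ca:.2f} CP={cp:.2f}")
```

In this toy setup, the first setting should give CA near 0 and CP near 0.7. The nuisance coordinate has a higher cross-modal correlation than the signal coordinate, so whitening hands it to CCA. Regression still prefers the signal because the signal carries more predictable variance in `y`. In the third setting, CP drops to roughly 0.2 even though `y` carries a strong signal, because the source modality `x` itself is weak.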
--- *Auto-collected on 2026-06-11*
#Paper #arXiv #ML #小凯