R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning

Paper Overview

Research area: AI
Authors: Zirui Zhang, Haoyu Dong, Kexin Pei, Chengzhi Mao
Published: 2026-03-26
arXiv: 2603.25720

Translated Abstract

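Robust perception and reasoning require consistency across sensory modalities. However, current multimodal models often violate this principle, producing contradictory predictions for visual and textual representations of the same concept. Rather than masking these failures with standard voting mechanisms (which can amplify systematic biases), we show that cross-modal inconsistency provides a rich and natural signal for learning. We introduce RC2, a reinforcement learning framework that resolves internal conflicts by enforcing cross-modal cycle consistency. By requiring the model to perform backward inference, switch modalities, and reliably reconstruct the answer through forward inference, we obtain a dense, label-free reward. This cyclic constraint encourages the model to align its internal representations autonomously. Optimizing this structure mitigates modality-specific errors and improves reasoning accuracy by up to 7.6 percentage points. Our results suggest that advanced reasoning emerges not only from scaling data but also from enforcing a structurally consistent understanding of the world.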

Original Abstract

Robust perception and reasoning require consistency across sensory modalities. Yet current multimodal models often violate this principle, yielding contradictory predictions for visual and textual representations of the same concept. Rather than masking these failures with standard voting mechanisms, which can amplify systematic biases, we show that cross-modal inconsistency provides a rich and natural signal for learning. We introduce RC2, a reinforcement learning framework that resolves internal conflicts by enforcing cross-modal cycle consistency. By requiring a model to perform backward inference, switch modalities, and reliably reconstruct the answer through forward inference, we obtain a dense, label-free reward. This cyclic constraint encourages the model to align its internal repre...
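The cycle described in the abstract (backward inference, a modality switch, then forward inference that must reconstruct the answer) can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `cycle_reward`, `other_modality`, `toy_backward`, and `toy_forward` are hypothetical, the two modalities are reduced to the strings "image" and "text", and the reward is a binary exact-match score, whereas the paper's actual reward may be shaped differently.

```python
from typing import Callable

# Hypothetical interfaces (assumptions, not the paper's API):
# backward(answer, modality) -> a query in `modality` whose answer should be `answer`
# forward(query, modality)   -> the model's answer to `query`
BackwardFn = Callable[[str, str], str]
ForwardFn = Callable[[str, str], str]


def other_modality(modality: str) -> str:
    """Switch between the two toy modalities used in this sketch."""
    return "text" if modality == "image" else "image"


def cycle_reward(answer: str, source_modality: str,
                 backward: BackwardFn, forward: ForwardFn) -> float:
    """Return 1.0 if the answer survives a cross-modal cycle, else 0.0.

    No ground-truth label is needed: the candidate answer itself is the
    reference that the forward pass has to reconstruct.
    """
    target = other_modality(source_modality)       # switch modalities
    query = backward(answer, target)               # backward inference
    reconstructed = forward(query, target)         # forward inference
    same = reconstructed.strip().lower() == answer.strip().lower()
    return 1.0 if same else 0.0


# Deterministic stand-ins for model calls, used only to exercise the cycle.
def toy_backward(answer: str, modality: str) -> str:
    return f"[{modality}] which number equals {answer}?"


def toy_forward(query: str, modality: str) -> str:
    return query.rsplit(" ", 1)[-1].rstrip("?")


consistent = cycle_reward("42", "image", toy_backward, toy_forward)
inconsistent = cycle_reward("42", "image", toy_backward, lambda q, m: "7")
```

In a reinforcement learning loop, a score like this would stand in for a labeled reward: a model whose predictions disagree across modalities receives a low score and is pushed toward answers that it can reconstruct after the switch.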

--- *Automatically collected on 2026-03-28*

#paper #arXiv #AI #小凯
