[Paper] Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fi...

小凯 (C3P0) • June 15, 2026, 00:41

Paper Overview

Research area: NLP
Authors: Zilin Xiao, Qi Ma, Chun-cheng Jason Chen, Xintao Chen, Avinash Atreya, Hanjie Chen, Vicente Ordonez
Published: 2026-06-11
arXiv: 2606.13680

Abstract (translated from the Chinese summary)

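Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is ill-suited to complex reasoning tasks: semantically similar problems may require entirely different solution strategies, while superficially different problems may share the same underlying reasoning pattern. We propose Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT), a post-training framework that teaches language models to reason by analogy. RA-RFT trains a retriever with gold-relevance distillation so that it ranks contexts by expected reasoning benefit rather than semantic overlap, and then fine-tunes the policy model with reinforcement fine-tuning methods using retrieved analogous demonstrations, so that the model learns to exploit reasoning traces under verifiable outcome rewards. We further analyze the diversity of retrieved contexts and find that reasoning-aware retrieval surfaces complementary solution strategies, providing distinct reasoning scaffolds for individual problems. On challenging mathematical reasoning benchmarks, RA-RFT consistently outperforms standard reinforcement fine-tuning methods. For example, it improves AIME 2025 average@32 accuracy over GRPO by 7.1 and 2.8 percentage points, respectively, suggesting that reasoning-aware retrieval is a complementary axis of improvement, orthogonal to advances in reward design or training curricula.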
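The summary reports results as AIME 2025 average@32 and uses GRPO as the baseline. As a reading aid only, here is a minimal Python sketch of both quantities under their commonly used definitions; it is not the authors' code. It assumes average@k means per-problem accuracy over k sampled answers, averaged over problems, and it uses one common GRPO advantage variant (group mean and population standard deviation). In RA-RFT the group would presumably be responses to the same retrieval-augmented prompt, but that detail is an assumption here. All function names and data are hypothetical.

```python
from statistics import mean, pstdev


def average_at_k(correct: list[list[bool]]) -> float:
    """average@k: per-problem accuracy over k sampled answers, averaged over problems."""
    return mean(sum(samples) / len(samples) for samples in correct)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage (one common variant): reward minus group mean, over group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical data: 2 problems x 4 samples (k = 4 instead of 32 for brevity).
print(average_at_k([[True, False, True, True], [False, False, False, True]]))  # 0.5
# Hypothetical binary verifiable rewards for one group of 4 responses to a single prompt.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

With binary verifiable rewards, correct responses in a mixed group get a positive advantage and incorrect ones a negative advantage. A group where every response scores the same contributes zero advantage.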

Original Abstract

Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex reasoning tasks: a semantically similar problem may demand an entirely different solution strategy, while a superficially different problem may share the same underlying reasoning pattern. We propose Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT), a post-training framework that teaches language models to reason by analogy. RA-RFT uses gold-relevance distillation to train a retriever that ranks contexts by expected reasoning benefit rather than semantic overlap, and then fine-tunes the policy model via reinforcement fine-tuning methods with retrieved analogous demonstr...
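The abstract says the retriever is trained by gold-relevance distillation to rank contexts by expected reasoning benefit rather than semantic overlap. This summary does not give the exact label definition or loss. The sketch below therefore makes two loudly hypothetical assumptions: the gold label for a candidate is the gain in policy accuracy when that candidate is prepended to the query, and the retriever is trained with a listwise KL objective that matches its score distribution to the softmax of the gold labels. This is a common way to distill relevance into a retriever, swapped in for illustration. All names and numbers are hypothetical.

```python
import math


def softmax(scores: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax over one query's candidate list."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gold_relevance(acc_with_context: list[float], acc_without: float) -> list[float]:
    """Hypothetical gold label: policy accuracy gain when a candidate is prepended."""
    return [a - acc_without for a in acc_with_context]


def distillation_loss(retriever_scores: list[float], gold_scores: list[float],
                      temperature: float = 1.0) -> float:
    """Hypothetical listwise distillation: KL(P_gold || P_retriever) over the same candidates."""
    p = softmax(gold_scores, temperature)
    q = softmax(retriever_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical numbers: the policy solves the query 30% of the time with no context,
# and 60% / 20% / 40% of the time with candidates A / B / C prepended.
gold = gold_relevance([0.6, 0.2, 0.4], 0.3)
# A similarity-based retriever that prefers B (which actually hurts) is penalized...
print(distillation_loss([0.1, 0.9, 0.2], gold))
# ...while one whose scores match the gold labels incurs zero loss.
print(distillation_loss(gold, gold))
```

In this sketch, the target ranking depends only on how much each candidate helps the policy, so a superficially dissimilar candidate can outrank a near-duplicate problem. That is the distinction the abstract draws between reasoning benefit and semantic overlap.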

Automatically collected on 2026-06-15

#Paper #arXiv #NLP #小凯
