[Paper] MARS: Margin-Aware Reward-Modeling with Self-Refinement

Paper Overview

Research area: ML | Authors: Payel Bhattacharjee, Osvaldo Simeone, Ravi Tandon | Published: 2026-02-19 | arXiv: 2602.17658

Abstract (translated from the Chinese)

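Reward models are a core component of modern alignment pipelines, including RLHF and RLAIF. However, training reliable reward models depends heavily on human-annotated preference data, which is expensive and limited in quantity. This paper proposes MARS (Margin-Aware Reward-Modeling with Self-Refinement), an adaptive, margin-based data augmentation and sampling strategy that specifically targets the reward model's ambiguous regions and failure modes. MARS concentrates augmentation on low-margin (ambiguous) preference pairs, where the reward model is most uncertain, and iteratively refines the training distribution through hard-sample augmentation. Theoretical analysis shows that this strategy increases the average curvature of the loss function and improves its condition number, and empirical results show consistent gains over a uniform-augmentation baseline for robust reward modeling.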

Original Abstract

Reward modeling is a core component of modern alignment pipelines including RLHF and RLAIF. However, training reliable reward models relies heavily on human-labeled preference data, which is costly and limited. We propose MARS, an adaptive, margin-aware augmentation and sampling strategy that explicitly targets ambiguous and failure modes of the reward model. MARS concentrates augmentation on low-margin (ambiguous) preference pairs where the reward model is most uncertain, and iteratively refines the training distribution via hard-sample augmentation. We provide theoretical guarantees showing that this strategy increases the average curvature of the loss function and improves conditioning, along with empirical results demonstrating consistent gains over uniform augmentation for robust reward modeling.
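
To make the idea of concentrating augmentation on low-margin pairs concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not give the sampling rule, so the softmax over `-|margin|`, the `temperature` parameter, and all function names below are assumptions chosen for illustration. The only standard ingredient is the Bradley-Terry pairwise loss `-log sigmoid(margin)`, which is commonly used for reward modeling.

```python
import math
import random


def bt_loss(margin):
    """Bradley-Terry pairwise loss, -log(sigmoid(margin)), computed stably.

    `margin` is r(chosen) - r(rejected) under the current reward model.
    """
    # -log(sigmoid(m)) = softplus(-m) = max(-m, 0) + log(1 + exp(-|m|))
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def augmentation_weights(margins, temperature=1.0):
    """Hypothetical weighting rule: softmax over -|margin| / temperature.

    Pairs whose margin is close to zero (ambiguous pairs) receive a larger
    share of the augmentation budget. This rule is an assumption, not the
    paper's formula.
    """
    scores = [-abs(m) / temperature for m in margins]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_pairs_to_augment(margins, budget, temperature=1.0, seed=0):
    """Draw `budget` pair indices (with replacement) using the weights above."""
    rng = random.Random(seed)
    weights = augmentation_weights(margins, temperature)
    return rng.choices(range(len(margins)), weights=weights, k=budget)


if __name__ == "__main__":
    # Illustrative margins only: one ambiguous pair, one confident pair,
    # and one confidently misranked pair.
    margins = [0.1, 2.0, -3.0]
    print(augmentation_weights(margins))
    print(sample_pairs_to_augment(margins, budget=5))
```

In a self-refinement loop of the kind the abstract describes, one would presumably recompute the margins after each round of training on the augmented data and resample. One intuition for the curvature claim, offered here as a reading aid and not as the paper's argument: the second derivative of `-log sigmoid(m)` with respect to `m` is `sigmoid(m) * (1 - sigmoid(m))`, which peaks at `m = 0`, so pairs near zero margin contribute the most curvature per sample. A signed variant that weights by `-margin` instead of `-|margin|` would additionally upweight misranked pairs.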

--- *Automatically collected on 2026-06-24*

#paper #arXiv #ML #小凯
