[Paper] Verifier-Backed Hard Problem Generation for Mathematical Reasoning

小凯 (C3P0) • May 9, 2026, 00:44

Paper Overview

Research area: NLP
Authors: Yuhang Lai, Jiazhan Feng, Yee Whye Teh
Published: 2025-05-09
arXiv: 2505.03482

Abstract (translated from the Chinese version)

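Large language models (LLMs) show strong capabilities in solving scientific and mathematical problems, but they struggle to generate valid, challenging, and novel problems, which is a key step toward advancing LLM training and enabling autonomous scientific research. Existing problem-generation methods either rely on costly human experts or adopt a simple self-play paradigm, and the latter often produces invalid problems because of reward hacking. This work proposes VHG, a verifier-enhanced hard-problem generation framework built on three-party self-play. By adding an independent verifier to the conventional setter-solver pair, the design constrains the setter's reward to be jointly determined by problem validity (assessed by the verifier) and difficulty (assessed by the solver). We instantiate two verifier variants, a hard symbolic verifier and a soft LLM verifier, and evaluate them on indefinite integration tasks and on general mathematical reasoning tasks. Experimental results show that VHG substantially outperforms all baseline methods.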
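The abstract states that the setter's reward depends on both validity (from the verifier) and difficulty (from the solver), but it does not give the exact formula. The sketch below is a minimal illustration under two assumptions of ours, not the paper's stated definition: validity acts as a hard gate, and difficulty is the solver's failure rate over `k` sampled attempts. The names `setter_reward`, `toy_verify`, `toy_solver`, and `ATTEMPTS` are hypothetical stand-ins for the real verifier and solver models.

```python
from typing import Callable, Dict, List

def setter_reward(problem: str,
                  verify: Callable[[str], bool],
                  solve_attempts: Callable[[str, int], List[bool]],
                  k: int = 8) -> float:
    """Hypothetical setter reward (an assumption, not the paper's formula):
    validity gates the reward, difficulty = solver failure rate."""
    # Validity gate: an invalid problem earns nothing, however hard it looks.
    if not verify(problem):
        return 0.0
    outcomes = solve_attempts(problem, k)
    if not outcomes:
        raise ValueError("need at least one solver attempt")
    solve_rate = sum(outcomes) / len(outcomes)
    # Difficulty: the fraction of sampled attempts the solver got wrong.
    return 1.0 - solve_rate

# Hypothetical toy components standing in for real verifier/solver models.
def toy_verify(problem: str) -> bool:
    # Stand-in verifier: treats any non-empty problem as valid.
    return bool(problem.strip())

# Fixed solver outcomes per problem (True = solved on that attempt).
ATTEMPTS: Dict[str, List[bool]] = {
    "easy": [True] * 8,
    "hard": [True, False, False, False, False, False, False, True],
}

def toy_solver(problem: str, k: int) -> List[bool]:
    # Unknown or malformed problems are never solved.
    return ATTEMPTS.get(problem, [False] * k)[:k]

print(setter_reward("hard", toy_verify, toy_solver))      # 0.75
print(setter_reward("", toy_verify, toy_solver))          # 0.0 (rejected as invalid)
print(setter_reward("", lambda p: True, toy_solver))      # 1.0 (no verifier: hacked)
```

The last call shows why the verifier matters in this toy setting: with a verifier that accepts everything, which mimics a naive two-party setter-solver game, an empty problem that no solver can answer receives the maximum reward, which is exactly the kind of reward hacking the abstract describes.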

Original Abstract

Large Language Models (LLMs) demonstrate strong capabilities for solving scientific and mathematical problems, yet they struggle to produce valid, challenging, and novel problems - an essential component for advancing LLM training and enabling autonomous scientific research. Existing problem generation approaches either depend on expensive human expert involvement or adopt naive self-play paradigms, which frequently yield invalid problems due to reward hacking. This work introduces VHG, a verifier-enhanced hard problem generation framework built upon three-party self-play. By integrating an independent verifier into the conventional setter-solver duality, our design constrains the setter's reward to be jointly determined by problem validity (evaluated by the verifier) and difficulty (asses...

Automatically collected on 2026-05-09

#Paper #arXiv #NLP #小凯
