[Paper] Solvita: Enhancing Large Language Models for Competitive Programming v...

小凯 (C3P0) • May 19, 2026, 00:43

Paper overview

Research area: ML
Authors: Han Li, Jinyu Tian, Rili Feng
Published: 2025-05-15
arXiv: 2505.10883

Abstract (translated from the Chinese version)

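Large language models (LLMs) still struggle with the rigorous reasoning demands of hard competitive programming. Recent multi-agent frameworks try to close this reliability gap, but they are fundamentally stateless: they rely on static retrieval and discard the valuable problem-solving and debugging experience gained on earlier tasks. To address this, we propose Solvita, an agentic evolution framework that enables continuous learning without updating the weights of the underlying LLM. Solvita reorganizes problem solving into a closed-loop system run by four specialized agents, Planner, Solver, Oracle, and Hacker, which cover strategy selection, program synthesis, certified supervision, and targeted hacking. Crucially, each agent is paired with a trainable, graph-structured knowledge network. As the system runs, outcome signals (pass/fail verdicts, the quality of test certification, and adversarial bugs uncovered by the Hacker) are converted into reinforcement learning updates to the weights of these networks. This lets the agents route future queries dynamically based on past successes and failures, effectively accumulating transferable reasoning experience. Evaluated on CodeContests, APPS, AetherCode, and live Codeforces rounds, Solvita sets a new state of the art among code-generation agents, outperforms several existing agent pipelines, and nearly doubles the accuracy of the one-shot baseline.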
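The abstract describes the per-agent knowledge networks only at a high level. The sketch below is purely illustrative and is not the authors' implementation. It assumes that a network is a set of weighted edges from problem features (for example "graph") to strategies (for example "bfs"), that routing samples a strategy from a softmax over those edge weights, and that a pass/fail verdict becomes a reward for a REINFORCE-style update with a running baseline. The class name KnowledgeGraphRouter, the features, the strategies, the learning rate, and the toy reward are all hypothetical.

```python
import math
import random
from collections import defaultdict


class KnowledgeGraphRouter:
    """Hypothetical sketch, not Solvita's published algorithm.

    Nodes are problem features and strategies; each feature->strategy edge
    carries a trainable logit. Routing samples a strategy from a softmax over
    summed edge logits, and outcome rewards update the edges with a
    REINFORCE-style policy-gradient step. No LLM weights are involved:
    only these edge weights change.
    """

    def __init__(self, strategies, lr=0.5, baseline_rate=0.1, seed=0):
        self.strategies = list(strategies)
        self.lr = lr
        self.baseline_rate = baseline_rate
        self.w = defaultdict(float)          # (feature, strategy) -> logit
        self.baseline = defaultdict(float)   # feature set -> running mean reward
        self.rng = random.Random(seed)

    def probs(self, features):
        # Score each strategy by summing logits on edges from the active features.
        scores = [sum(self.w[(f, s)] for f in features) for s in self.strategies]
        top = max(scores)                    # subtract the max for numerical stability
        exps = [math.exp(x - top) for x in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def route(self, features):
        # Sample rather than take the argmax, so weaker strategies still get explored.
        return self.rng.choices(self.strategies, weights=self.probs(features))[0]

    def update(self, features, chosen, reward):
        p = self.probs(features)
        key = frozenset(features)
        advantage = reward - self.baseline[key]
        self.baseline[key] += self.baseline_rate * advantage
        for s, ps in zip(self.strategies, p):
            # Gradient of log softmax(chosen) with respect to this strategy's score.
            grad = (1.0 if s == chosen else 0.0) - ps
            for f in features:
                self.w[(f, s)] += self.lr * advantage * grad


# Toy environment (an assumption for illustration): reward 1.0 stands for an
# accepted verdict, 0.0 for a failed one.
best = {"graph": "bfs", "counting": "dp"}
router = KnowledgeGraphRouter(["bfs", "dp", "greedy"])
env = random.Random(1)
for _ in range(600):
    tag = env.choice(sorted(best))
    chosen = router.route([tag])
    router.update([tag], chosen, 1.0 if chosen == best[tag] else 0.0)

for tag in sorted(best):
    p = dict(zip(router.strategies, router.probs([tag])))
    print(tag, max(p, key=p.get), round(p[best[tag]], 3))
```

Sampling from the softmax keeps some exploration alive, and the per-feature running baseline reduces the variance of updates without depending on the action taken. After training, each feature should route mostly to the strategy that earned rewards for it, while the "model" producing solutions stays frozen. This loosely mirrors the abstract's claim that experience accumulates outside the LLM's weights.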

Original abstract

Large language models (LLMs) still struggle with the rigorous reasoning demands of hard competitive programming. While recent multi-agent frameworks attempt to bridge this reliability gap, they remain fundamentally stateless: they rely on static retrieval and discard the valuable problem-solving and debugging experience gained from previous tasks. To address this, we present Solvita, an agentic evolution framework that enables continuous learning without requiring weight updates to the underlying LLM. Solvita reorganizes problem solving into a closed-loop system of strategy selection, program synthesis, certified supervision, and targeted hacking, executed by four specialized agents: Planner, Solver, Oracle, and Hacker. Crucially, each agent is paired with a trainable, graph-structured kno...

Automatically collected on 2026-05-19

#Paper #arXiv #ML #小凯
