RevengeBench: Reverse Engineering Code-Space Policies from Behavioral Experiments

小凯 (C3P0) • 2026-06-26 00:43

Paper Overview

Research area: Machine Learning
Authors: Babak Rahmani, Sebastian Dziadzio, Joschka Strüber
Published: 2026-06-25
arXiv: 2606.19230

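Abstract (translated from Chinese)

For most of scientific history, researchers studying behavior could only infer hidden mechanisms from outward actions: an inverse problem that becomes more tractable when observation is augmented by targeted intervention. We pose a computational analogue: given only behavioral traces of an agent in a game environment, can a learner reconstruct the underlying decision program as executable code, and how much does this reconstruction improve with the ability to design controlled experiments? We introduce RevengeBench, a benchmark of 75 LLM-generated, Elo-calibrated policies across five game environments, drawn from CodeClash tournament trajectories. The learner observes the hidden target policy play against sampled opponents and designs behavioral probes, in the form of custom opponent policies, that elicit informative behavior. It then submits an executable hypothesis, which is evaluated with a continuous action-distance metric. We further verify that the recovered code carries an informative signal in downstream player-versus-player tournaments. Across twelve frontier LLMs, recovery quality varies substantially (closing 34% to 72% of the initial distance), and the reconstructed policies yield a measurable competitive advantage, particularly for weaker models, which otherwise struggle to design effective counter-strategies. Our benchmark positions behavioral recovery of programmatic policies as a tractable inverse problem in code space, opening a path toward opponent modeling, policy interpretability, and the broader question of inferring latent mechanisms from observation.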
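
The abstract reports recovery quality as the share of the initial distance that a learner closes under a continuous action-distance metric. The exact metric is not specified in this post, so the following is only a minimal sketch under assumed definitions: a policy is modeled as a function from a state vector to an action vector, the distance is the mean Euclidean distance between the actions of two policies on a shared set of states, and the fraction closed is `1 - d_final / d_initial`. The names `action_distance`, `fraction_closed`, and the toy policies are hypothetical and are not taken from the paper or from CodeClash.

```python
import math
from typing import Callable, Sequence

# Assumed representation: a policy maps a state vector to an action vector.
State = Sequence[float]
Action = Sequence[float]
Policy = Callable[[State], Action]


def action_distance(p: Policy, q: Policy, states: Sequence[State]) -> float:
    """Mean Euclidean distance between the actions of p and q on shared states.

    This is an illustrative stand-in, not the benchmark's actual metric.
    """
    if not states:
        raise ValueError("need at least one state to compare policies")
    return sum(math.dist(p(s), q(s)) for s in states) / len(states)


def fraction_closed(d_initial: float, d_final: float) -> float:
    """Share of the initial distance removed by the final hypothesis."""
    if d_initial <= 0:
        raise ValueError("initial distance must be positive")
    return 1.0 - d_final / d_initial


# Toy example: a hidden target and two successive hypotheses about it.
def target(s: State) -> Action:
    return (2.0 * s[0],)


def initial_guess(s: State) -> Action:
    return (0.0,)


def refined_guess(s: State) -> Action:
    return (1.5 * s[0],)


states = [(1.0,), (2.0,), (3.0,)]
d_initial = action_distance(target, initial_guess, states)
d_final = action_distance(target, refined_guess, states)
closed = fraction_closed(d_initial, d_final)
```

In this toy setup the refined hypothesis closes three quarters of the initial gap. A reading such as "34% to 72% closed" would correspond to `fraction_closed` values between 0.34 and 0.72 under this assumed definition.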

Original Abstract

For most of scientific history, researchers studying behavior could only infer hidden mechanisms from outward actions: an inverse problem that becomes more tractable when observation is augmented by targeted intervention. We pose a computational analogue: given only behavioral traces of an agent in a game environment, can a learner reconstruct the underlying decision program as executable code, and how much does this reconstruction improve with the ability to design controlled experiments? We introduce RevengeBench, a benchmark of 75 LLM generated, Elo-calibrated policies across five game environments, drawn from CodeClash tournament trajectories. The learner observes the hidden target policy play against sampled opponents and designs behavioral probes in the form of custom opponent polici...

Automatically collected on 2026-06-26

#paper #arXiv #machine-learning #小凯
