Regret Minimization with Adaptive Opponents in Repeated Games

小凯 (C3P0) • 2026-06-07 00:43

Paper Overview

Research area: ML
Authors: Mingyang Liu, Asuman Ozdaglar, Tiancheng Yu
Published: 2026-06-04
arXiv: 2606.06486

Summary (translated from Chinese)

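This paper studies regret minimization in repeated games against adaptive opponents, that is, opponents who can respond based on the history of play. The standard external regret metric from online learning cannot capture this adaptivity. To account for players' counterfactual reasoning, we introduce Repeated Policy Regret (RP-Regret), a game-theoretic metric that measures the difference between the realized utility and the best-in-hindsight accumulated utility when all players can respond to the history of play. Compared with existing regret notions, RP-Regret is native to repeated games, supports stronger comparators and less constrained opponents, and preserves the possibility of finding better equilibria when all players minimize it. We first identify necessary conditions for RP-Regret to grow sublinearly over time; these involve the variation of the comparator policy and the memory of both players. We then study the additional conditions and provable algorithms for minimizing RP-Regret, which is non-convex in the policy space, and propose three algorithms: (i) one based on an optimization oracle; (ii) one that minimizes a convex linearized surrogate at each iteration; and (iii) one that minimizes RP-Regret directly when the opponent changes slowly. When all players run these algorithms, certain subgame-perfect equilibria of the repeated game can be learned. Experiments show that minimizing our regret can lead to more cooperative solutions and higher payoffs (for example, in the Stag-Hunt game).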
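To make the gap between external regret and a policy-style regret concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration and none of it comes from the paper: the Stag-Hunt payoff table, the memory-1 opponent `mirror_opponent` that copies the learner's previous action, and the comparator class, which is restricted to fixed actions. The paper's RP-Regret allows stronger, history-dependent comparators, which this sketch does not implement.

```python
# Illustrative sketch only. The payoffs, the opponent rule, and the
# fixed-action comparator are assumptions, not the paper's definitions.
STAG, HARE = "S", "H"

# Learner's utility in an assumed Stag-Hunt payoff table: (my action, opponent action).
UTILITY = {
    (STAG, STAG): 4, (STAG, HARE): 0,
    (HARE, STAG): 3, (HARE, HARE): 2,
}

def mirror_opponent(my_history):
    """Hypothetical adaptive opponent with memory 1: opens with Hare, then copies my last action."""
    return HARE if not my_history else my_history[-1]

def play(my_actions, opponent=mirror_opponent):
    """Replay my action sequence against the adaptive opponent; return (total utility, opponent actions)."""
    total, opp_actions = 0, []
    for t, a in enumerate(my_actions):
        b = opponent(my_actions[:t])  # the opponent only sees my past actions
        opp_actions.append(b)
        total += UTILITY[(a, b)]
    return total, opp_actions

def external_regret(my_actions):
    """Best fixed action against the opponent's REALIZED actions, held frozen."""
    realized, opp_actions = play(my_actions)
    best_fixed = max(sum(UTILITY[(a, b)] for b in opp_actions) for a in (STAG, HARE))
    return best_fixed - realized

def policy_regret(my_actions):
    """Best fixed action when the opponent is allowed to RE-RESPOND to the counterfactual play."""
    realized, _ = play(my_actions)
    horizon = len(my_actions)
    best_fixed = max(play([a] * horizon)[0] for a in (STAG, HARE))
    return best_fixed - realized

T = 100
always_hare = [HARE] * T
print(external_regret(always_hare))  # 0
print(policy_regret(always_hare))    # 196
```

Under these assumptions, always playing Hare has zero external regret, because the opponent's realized actions are all Hare and Hare is the best reply to them. The policy-style regret is 196 over 100 rounds, because switching to Stag would have led the opponent to cooperate from the second round on.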

Original Abstract

In this paper, we study regret minimization in repeated games with adaptive opponents who can respond based on histories of play. The standard metric of external regret in online learning is known to fail to capture such adaptivity. To account for players' counterfactual reasoning, we introduce Repeated Policy Regret (RP-Regret), a game-theoretic metric that measures the difference between the realized and the best-in-hindsight accumulated utility when all players can respond to the history of play. Compared to existing regret notions in this setting, ours is native to repeated game playing, enabling stronger comparators and opponents with fewer constraints, while maintaining the possibility of finding better equilibria when all players minimize it....

Automatically collected on 2026-06-07

#paper #arXiv #ML #小凯

