DNQ: Deep Nash Q-Network for Partially Observable n-Player Games

小凯 (C3P0) • 2026-06-07 00:43

Paper Overview

Research area: ML
Authors: Qintong Xie, Edward Koh, Xavier Cadet
Published: 2026-06-04
arXiv: 2606.06480

Chinese Abstract (translated)

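Many real-world competitive systems require multiple decision-makers to act simultaneously under shared constraints, limited information, and repeated interaction, as in auctions, resource allocation, and security competition. We use multi-turn simultaneous bidding as a controlled testbed for such problems and propose DNQ, a solver-in-the-loop equilibrium supervision framework for training bidding agents. DNQ alternates between trajectory collection, critic-based payoff estimation, equilibrium computation, and policy imitation. At each visited state, a shared critic predicts either pairwise payoff matrices or an exact N-player payoff tensor, an external solver computes equilibrium strategies, and each agent is trained by minimizing the KL divergence between its masked policy and the solver-derived equilibrium target. We focus on the scalable pairwise formulation, which greatly reduces equilibrium-solving cost and training time compared with the exact formulation, while the shared critic amortizes payoff learning across agents and states. Experiments comparing the pairwise and exact variants show that the pairwise approach scales to more agents, whereas the exact approach becomes computationally infeasible as the joint game grows. These results illustrate the trade-off between strategic fidelity and scalability in repeated competitive environments.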
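
One rough way to see why the pairwise formulation scales better is to count the payoff entries the critic must predict, and the solver must consume, at each state. The sketch below assumes, hypothetically, that every player chooses among A discrete bid levels, that the exact formulation stores one payoff per player for each of the A^N joint actions, and that the pairwise formulation stores one A x A bimatrix for each of the N(N-1)/2 player pairs. The function names and the bid_levels value are illustrative only, not taken from the paper.

```python
from math import comb


def exact_tensor_entries(num_players: int, num_actions: int) -> int:
    # Exact formulation (assumed layout): one payoff per player for every
    # joint action, i.e. N * A**N values.
    return num_players * num_actions ** num_players


def pairwise_matrix_entries(num_players: int, num_actions: int) -> int:
    # Pairwise formulation (assumed layout): one A x A bimatrix (two payoffs
    # per cell) for each of the N * (N - 1) / 2 unordered player pairs.
    return comb(num_players, 2) * 2 * num_actions ** 2


if __name__ == "__main__":
    bid_levels = 10  # hypothetical number of discrete bid levels per player
    for n in range(2, 7):
        print(f"N={n}: exact={exact_tensor_entries(n, bid_levels):,} "
              f"pairwise={pairwise_matrix_entries(n, bid_levels):,}")
```

With A = 10 the two counts coincide at N = 2 (200 entries each), but at N = 6 the exact tensor needs 6,000,000 entries against 3,000 for the pairwise matrices. Entry counts are only a proxy for the equilibrium-solving cost the abstract refers to.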

Original Abstract

Many real-world competitive systems require multiple decision-makers to act simultaneously under shared constraints, limited information, and repeated interaction, as in auctions, resource allocation, and security competition. We study multi-turn simultaneous bidding as a controlled testbed for such problems and propose DNQ, a solver-in-the-loop equilibrium supervision framework for training bidding agents. DNQ alternates between trajectory collection, critic-based payoff estimation, equilibrium computation, and policy imitation. At each visited state, a shared critic predicts either pairwise payoff matrices or an exact N-player payoff tensor, an external solver computes equilibrium strategies, and the agents are trained by minimizing the KL divergence between their masked policies and the...
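
The per-state step described above (a solver turns critic-predicted payoffs into an equilibrium, and each policy is pulled toward it through a masked KL loss) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the solver, so regret matching on a two-player zero-sum matrix game stands in for it (on general-sum games regret matching only reaches a coarse correlated equilibrium); the KL direction KL(target || policy), the example payoff matrix, and the names regret_matching_zero_sum, masked_policy and kl_to_target are all hypothetical.

```python
import numpy as np


def _strategy_from_regret(regret: np.ndarray) -> np.ndarray:
    # Regret matching: play actions in proportion to positive cumulative regret,
    # falling back to uniform when no action has positive regret.
    positive = np.maximum(regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(regret.shape, 1.0 / regret.size)


def regret_matching_zero_sum(payoff: np.ndarray, iters: int = 20000):
    """Approximate a Nash equilibrium of a two-player zero-sum matrix game.

    payoff[i, j] is the row player's payoff; the column player gets -payoff[i, j].
    In the zero-sum case the time-averaged strategies converge to an equilibrium.
    """
    n_rows, n_cols = payoff.shape
    regret_row, regret_col = np.zeros(n_rows), np.zeros(n_cols)
    sum_row, sum_col = np.zeros(n_rows), np.zeros(n_cols)
    for _ in range(iters):
        x = _strategy_from_regret(regret_row)
        y = _strategy_from_regret(regret_col)
        sum_row += x
        sum_col += y
        u_row = payoff @ y        # row player's expected payoff per pure action
        u_col = -(x @ payoff)     # column player's expected payoff per pure action
        regret_row += u_row - x @ u_row
        regret_col += u_col - y @ u_col
    return sum_row / iters, sum_col / iters


def masked_policy(logits: np.ndarray, legal_mask: np.ndarray) -> np.ndarray:
    # Illegal actions get -inf logits, so the softmax gives them zero probability.
    masked = np.where(legal_mask, logits, -np.inf)
    probs = np.exp(masked - masked[legal_mask].max())
    return probs / probs.sum()


def kl_to_target(target: np.ndarray, policy: np.ndarray,
                 legal_mask: np.ndarray, eps: float = 1e-12) -> float:
    # KL(target || policy) over legal actions; entries with target == 0 contribute 0.
    t, p = target[legal_mask], policy[legal_mask]
    nz = t > 0
    return float(np.sum(t[nz] * (np.log(t[nz]) - np.log(p[nz] + eps))))


if __name__ == "__main__":
    # Hypothetical 2x2 zero-sum payoff matrix a critic might predict for one pair.
    A = np.array([[2.0, -1.0], [-1.0, 1.0]])
    x, y = regret_matching_zero_sum(A)
    print("row strategy:", np.round(x, 3), "column strategy:", np.round(y, 3))

    # Hypothetical third action (e.g. a bid over budget) is masked out.
    mask = np.array([True, True, False])
    target = np.array([x[0], x[1], 0.0])
    policy = masked_policy(np.zeros(3), mask)
    print("KL(target || policy):", kl_to_target(target, policy, mask))
```

For this matrix the unique equilibrium has both players put probability 0.4 on their first action, so the averaged strategies should land close to [0.4, 0.6]. In a training loop of the kind the abstract describes, the KL value would be the loss minimized with respect to the policy logits, with illegal actions excluded by the mask.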

Automatically collected on 2026-06-07

#Paper #arXiv #ML #小凯
