Paper Overview
Research area: ML
Authors: Qintong Xie, Edward Koh, Xavier Cadet
Published: 2026-06-04
arXiv: 2606.06480
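Abstract (translated from Chinese)
Many real-world competitive systems require multiple decision-makers to act simultaneously under shared constraints, limited information, and repeated interaction, as in auctions, resource allocation, and security competition. We study multi-turn simultaneous bidding as a controlled testbed for such problems and propose DNQ, a solver-in-the-loop equilibrium supervision framework for training bidding agents. DNQ alternates between trajectory collection, critic-based payoff estimation, equilibrium computation, and policy imitation. At each visited state, a shared critic predicts either pairwise payoff matrices or an exact N-player payoff tensor, an external solver computes equilibrium strategies, and the agents are trained by minimizing the KL divergence between their masked policies and the solver-derived equilibrium targets. We focus on the scalable pairwise formulation, which substantially reduces equilibrium-solving cost and training time relative to the exact formulation, while the shared critic amortizes payoff learning across agents and states. Experiments compare the pairwise and exact variants and show that the pairwise approach scales to more agents, whereas the exact approach becomes computationally infeasible as the joint game grows. These results illustrate the trade-off between strategic fidelity and scalability in repeated competitive environments.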
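The policy-imitation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `masked_softmax` and `kl_to_target`, the KL direction (target relative to policy), and the hand-written equilibrium target are all assumptions made for illustration. In DNQ the target would come from the external solver applied to the critic's predicted payoffs, which this sketch does not model.

```python
import numpy as np


def masked_softmax(logits, mask):
    """Softmax restricted to legal actions; masked actions get probability 0.

    Hypothetical helper: assumes at least one action is legal.
    """
    logits = np.where(mask, logits, -np.inf)
    shifted = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)            # exp(-inf) evaluates to exactly 0.0
    return exp / exp.sum()


def kl_to_target(target, policy, eps=1e-12):
    """KL(target || policy), summed over actions where the target is positive.

    The direction of the divergence is an assumption; the abstract does not
    state which direction is used.
    """
    support = target > 0
    t = target[support]
    p = policy[support]
    return float(np.sum(t * (np.log(t) - np.log(p + eps))))


# Four bid levels, of which the last is illegal in this state (assumed mask).
mask = np.array([True, True, True, False])

# Stand-in for a solver-derived equilibrium: uniform over the legal bids.
target = np.array([1 / 3, 1 / 3, 1 / 3, 0.0])

# An agent whose logits favor the first bid and (before masking) the illegal one.
logits = np.array([2.0, 0.0, 0.0, 5.0])

policy = masked_softmax(logits, mask)
loss = kl_to_target(target, policy)
print("masked policy:", policy)
print("imitation loss:", loss)
```

Masking before normalization means the illegal bid never receives probability mass, so the loss only measures how far the agent's distribution over legal bids is from the target. A policy that already matches the target yields a loss of approximately zero.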
Original Abstract
Many real-world competitive systems require multiple decision-makers to act simultaneously under shared constraints, limited information, and repeated interaction, as in auctions, resource allocation, and security competition. We study multi-turn simultaneous bidding as a controlled testbed for such problems and propose DNQ, a solver-in-the-loop equilibrium supervision framework for training bidding agents. DNQ alternates between trajectory collection, critic-based payoff estimation, equilibrium computation, and policy imitation. At each visited state, a shared critic predicts either pairwise payoff matrices or an exact N-player payoff tensor, an external solver computes equilibrium strategies, and the agents are trained by minimizing the KL divergence between their masked policies and the...
Automatically collected on 2026-06-07
#paper #arXiv #ML #小凯