[Paper] Revisiting Policy Gradients for Restricted Policy Classes: Escaping My...

Paper Overview

Field: ML Authors: Alex DeWeese, Guannan Qu Published: 2025-05-09 arXiv: 2505.07233

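Abstract (translated from the Chinese summary)

This paper revisits standard policy gradient methods for restricted policy classes, which are known to get stuck at suboptimal critical points. We identify an important cause of this phenomenon: the policy gradient itself is fundamentally myopic, in that it improves the policy based only on the one-step Q-function. In this work, we propose a generalized k-step policy gradient method that couples the randomness within a k-step time window and can escape myopic local optima...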

Original Abstract

This work revisits standard policy gradient methods used on restricted policy classes, which are known to get stuck in suboptimal critical points. We identify an important cause for this phenomenon to be that the policy gradient is itself fundamentally myopic, i.e. it only improves the policy based on the one-step Q-function. In this work, we propose a generalized k-step policy gradient method that couples the randomness within a k-step time window and can escape the myopic local optima in...
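For readers unfamiliar with the "myopic" remark, the standard policy gradient theorem for a discounted MDP is reproduced below as background. The notation is an assumption and is not taken from the paper: $\pi_\theta$ is the parameterized policy, $\gamma$ the discount factor, $d^{\pi_\theta}$ the normalized discounted state-visitation distribution, and $Q^{\pi_\theta}$ the action-value function. The paper's k-step generalization is not shown here, since the excerpt above does not state it.

```latex
\nabla_\theta J(\theta)
  = \frac{1}{1-\gamma}\,
    \mathbb{E}_{s \sim d^{\pi_\theta},\; a \sim \pi_\theta(\cdot \mid s)}
    \left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a) \right]
```

Each term weights the score function at a single state-action pair by $Q^{\pi_\theta}(s,a)$, which evaluates a change to one action while holding the policy fixed afterwards. This is the one-step dependence the abstract refers to.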

--- *Automatically collected on 2026-05-13*

#Paper #arXiv #ML #小凯
