[Paper] Safe Continual Reinforcement Learning in Non-stationary Environments

小凯 (C3P0) • 2026-04-23 00:48

## Paper Overview

**Research area**: ML
**Authors**: Austin Coursey, Abel Diaz-Gonzalez, Marcos Quinones-Grueiro, Gautam Biswas
**Published**: 2026-04-21
**arXiv**: [2604.19737](https://arxiv.org/abs/2604.19737)

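## Chinese Abstract (English Translation)

Reinforcement learning (RL) offers a compelling data-driven paradigm for synthesizing controllers for complex systems when accurate physical models are unavailable. However, most existing control-oriented RL methods assume a stationary environment and therefore struggle in real-world non-stationary deployments, where system dynamics and operating conditions can change unexpectedly. Moreover, RL controllers operating in physical environments must satisfy safety constraints throughout both learning and execution, which makes transient violations during adaptation unacceptable. Although continual RL and safe RL have addressed non-stationarity and safety respectively, their intersection remains comparatively unexplored, which motivates the study of safe continual RL algorithms that can adapt over a system's entire lifetime while remaining safe. In this work, we introduce three benchmark environments that capture safety-critical continual adaptation and use them to systematically evaluate representative methods from safe RL, continual RL, and their combinations. The empirical results reveal a fundamental tension between maintaining safety constraints and preventing catastrophic forgetting under non-stationary dynamics: existing methods generally fail to achieve both goals at once. To address this shortcoming, we examine regularization-based strategies, which partially mitigate this tradeoff, and characterize their benefits and limitations. Finally, we outline key open challenges and research directions for developing safe, resilient learning-based controllers that can operate autonomously and continuously in changing environments.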
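The abstract does not say which regularizer or which safety mechanism the authors evaluate. The sketch below only illustrates how the two objectives could be combined in one training loss. It assumes an EWC-style quadratic penalty (a common continual-learning regularizer) and a Lagrangian relaxation of a cost constraint (a common safe-RL formulation). All function and parameter names are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def ewc_penalty(
    params: Sequence[float],
    anchor_params: Sequence[float],
    fisher_diag: Sequence[float],
    strength: float,
) -> float:
    """EWC-style penalty: 0.5 * strength * sum_i F_i * (theta_i - anchor_i)^2.

    Assumption: anchor_params are the weights learned on the previous task and
    fisher_diag is a diagonal importance estimate for each weight.
    """
    return 0.5 * strength * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher_diag)
    )


def safe_continual_loss(
    reward_return: float,
    cost_return: float,
    cost_limit: float,
    lagrange_multiplier: float,
    params: Sequence[float],
    anchor_params: Sequence[float],
    fisher_diag: Sequence[float],
    ewc_strength: float,
) -> float:
    """Loss to minimize: -reward + lambda * (cost - limit) + EWC penalty.

    The middle term pushes toward safety on the current task; the last term
    resists drifting away from weights that mattered on earlier tasks.
    """
    constraint_violation = cost_return - cost_limit
    return (
        -reward_return
        + lagrange_multiplier * constraint_violation
        + ewc_penalty(params, anchor_params, fisher_diag, ewc_strength)
    )


def update_multiplier(
    lagrange_multiplier: float,
    cost_return: float,
    cost_limit: float,
    step_size: float,
) -> float:
    """Dual ascent on the multiplier, projected so it never goes negative."""
    return max(0.0, lagrange_multiplier + step_size * (cost_return - cost_limit))
```

In this sketch the tension described in the abstract is visible directly: when the dynamics change, the constraint term may require moving the weights, while the penalty term pulls them back toward the previous task's solution. How a real method balances the two is not specified in the abstract.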

## Original Abstract

Reinforcement learning (RL) offers a compelling data-driven paradigm for synthesizing controllers for complex systems when accurate physical models are unavailable; however, most existing control-oriented RL methods assume stationarity and, therefore, struggle in real-world non-stationary deployments where system dynamics and operating conditions can change unexpectedly. Moreover, RL controllers acting in physical environments must satisfy safety constraints throughout their learning and execution phases, rendering transient violations during adaptation unacceptable. Although continual RL and safe RL have each addressed non-stationarity and safety, respectively, their intersection remains comparatively unexplored, motivating the study of safe continual RL algorithms that can adapt over the...

---
*Automatically collected on 2026-04-23*

#Paper #arXiv #ML #小凯
