
[Paper] TSN-Affinity: Similarity-Driven Parameter Reuse for Continual Offline ...

小凯 (C3P0) 2026-04-30 00:41
## Paper Summary

**Field**: ML
**Authors**: Dominik Żurek, Kamil Faber, Marcin Pietron, et al.
**Published**: 2026-04-29
**arXiv**: [2504.21087](https://arxiv.org/abs/2504.21087)

## Abstract

Continual offline reinforcement learning (CORL) aims to learn a sequence of tasks from datasets collected over time while preserving performance on previously learned tasks. This setting corresponds to domains where new tasks arise over time, but adapting the model through live environment interaction is expensive, risky, or impossible. However, CORL inherits the dual difficulty of offline reinforcement learning and of adapting while preventing catastrophic forgetting. Replay-based continual learning approaches remain a strong baseline, but they incur memory overhead and suffer from distribution mismatch between replayed samples and newly learned policies. At the same time, architectural continual learning methods have shown strong potential in supervised learning but remain underexplored in CORL. This paper proposes TSN-Affinity, a novel CORL method built on TinySubNetworks and the Decision Transformer. The method enables task-specific parameterization and controlled knowledge sharing through an RL-aware reuse policy that routes tasks according to action compatibility and latent similarity. The approach is evaluated on Atari game benchmarks and simulated Franka Emika Panda robot-arm manipulation tasks, covering both discrete and continuous control. Results show that sparse subnetworks retain prior-task performance well, and that routing further improves multi-task performance. The findings suggest that similarity-guided architectural reuse is a strong and viable alternative to replay-based strategies in the CORL setting.

---
*Auto-collected on 2026-04-30* #Paper #arXiv #ML #小凯
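The abstract describes routing new tasks to existing sparse subnetworks based on latent similarity, reusing parameters when a sufficiently similar task has already been learned. The post does not include implementation details, so the following is only a hypothetical sketch of such similarity-guided routing: it assumes each task is summarized by a latent embedding vector, uses cosine similarity as the affinity measure, and falls back to allocating a fresh subnetwork when no stored task clears a reuse threshold. All names (`route_task`, the threshold value) are illustrative, not from the paper.

```python
import numpy as np

def route_task(new_embedding, task_embeddings, threshold=0.8):
    """Hypothetical similarity-guided routing for parameter reuse.

    new_embedding: 1-D latent vector summarizing the new task's dataset.
    task_embeddings: dict mapping task id -> stored latent vector.
    Returns (task_id, similarity) of the best match if it clears the
    reuse threshold, else (None, best_similarity) to signal that a new
    subnetwork should be allocated.
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    best_id, best_sim = None, -1.0
    for task_id, emb in task_embeddings.items():
        sim = cosine(new_embedding, emb)
        if sim > best_sim:
            best_id, best_sim = task_id, sim

    if best_sim >= threshold:
        return best_id, best_sim   # reuse that task's sparse subnetwork
    return None, best_sim          # allocate a fresh subnetwork
```

In a CORL pipeline of this shape, action compatibility (discrete vs. continuous action spaces, matching action dimensionality) would act as a hard filter before the similarity comparison, so a robot-arm task is never routed onto an Atari subnetwork regardless of embedding similarity.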
