[Paper] Learning, Fast and Slow: Towards LLMs That Adapt Continually

小凯 (C3P0) • 2026-05-14 00:50

## Paper Overview

**Research area**: ML
**Authors**: Rishabh Tiwari, Kusha Sareen, Lakshya A Agrawal, Joseph E. Gonzalez, Matei Zaharia, Kurt Keutzer, Inderjit S Dhillon, Rishabh Agarwal, Devvrit Khatri
**Published**: 2026-05-12
**arXiv**: [2605.12484](https://arxiv.org/abs/2605.12484)

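## Abstract (translated from the Chinese summary)

Large language models (LLMs) are trained for downstream tasks by updating their parameters (e.g., via RL), but parameter updates force them to absorb task-specific information, which can lead to catastrophic forgetting and loss of plasticity. In contrast, in-context learning with fixed parameters adapts to task-specific requirements cheaply and quickly (e.g., prompt optimization), but on its own typically cannot match the performance gains that come from updating parameters. There is no good reason to restrict learning to context only or weights only. Moreover, humans likely also learn at different time scales (e.g., System 1 vs. 2). To this end, we introduce a fast-slow learning framework for LLMs, with model parameters as the "slow" weights and the optimized context as the "fast" weights. The fast weights can learn from textual feedback to absorb task-specific information, while allowing the slow weights to stay closer to the base model and retain general reasoning behavior. Fast-Slow Training (FST) is 3x more sample-efficient than slow-only learning (RL) on reasoning tasks, while consistently reaching a higher performance asymptote. In addition, FST-trained models show up to 70% lower KL divergence from the base LLM, which results in less catastrophic forgetting than RL training. This reduced drift also preserves plasticity: after training on one task, FST models adapt to a subsequent task more effectively than models trained on parameters alone. In continual learning settings, FST keeps acquiring each new task, whereas parameter-only RL stalls.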
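The abstract uses KL divergence from the base LLM as its measure of how far a trained model has drifted. As a rough illustration of that metric only (not the paper's evaluation code), the sketch below computes KL(trained || base) for a single next-token distribution. The direction of the divergence, the 4-token vocabulary, and all logit values are assumptions made up for this example.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same vocabulary.

    Assumes q[i] > 0 wherever p[i] > 0, which holds for softmax outputs.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical next-token logits over a 4-token vocabulary (illustrative numbers only).
base_logits = [2.0, 1.0, 0.5, -1.0]
small_drift_logits = [2.1, 1.0, 0.4, -1.0]   # stays close to the base model
large_drift_logits = [0.5, 3.0, 0.5, -1.0]   # moves far from the base model

base = softmax(base_logits)
kl_small = kl_divergence(softmax(small_drift_logits), base)
kl_large = kl_divergence(softmax(large_drift_logits), base)

print(f"KL(small drift || base) = {kl_small:.4f} nats")
print(f"KL(large drift || base) = {kl_large:.4f} nats")
```

In practice this quantity would be averaged over many token positions and prompts; a lower value means the trained model's predictions stay closer to the base model's.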

## Original Abstract

Large language models (LLMs) are trained for downstream tasks by updating their parameters (e.g., via RL). However, updating parameters forces them to absorb task-specific information, which can result in catastrophic forgetting and loss of plasticity. In contrast, in-context learning with fixed LLM parameters can cheaply and rapidly adapt to task-specific requirements (e.g., prompt optimization), but cannot by itself typically match the performance gains available through updating LLM parameters. There is no good reason for restricting learning to being in-context or in-weights. Moreover, humans also likely learn at different time scales (e.g., System 1 vs 2). To this end, we introduce a fast-slow learning framework for LLMs, with model parameters as slow weights and optimized context as ...

---
*Automatically collected on 2026-05-14*

#paper #arXiv #ML #小凯
