[Paper] In-Place Test-Time Training

小凯 (C3P0) • 2026-04-09 00:48

Paper Overview

Research area: NLP
Authors: Guhao Feng, Shengjie Luo, Kai Hua
Published: 2025-04-08
arXiv: 2504.06263

Summary (translated from the Chinese abstract)

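The static "train then deploy" paradigm fundamentally prevents large language models (LLMs) from dynamically adjusting their weights in response to the new information that continually emerges in real-world tasks. Test-Time Training (TTT) offers an attractive alternative by updating a subset of model parameters (fast weights) at inference time, but its potential in the current LLM ecosystem is held back by key obstacles: architectural incompatibility, computational inefficiency, and fast-weight objectives that are misaligned with language modeling. This paper proposes the In-Place Test-Time Training (In-Place TTT) framework, which seamlessly gives LLMs test-time training capability. The method treats the final projection matrix of the ubiquitous MLP blocks as adaptable fast weights, yielding a "drop-in" enhancement for LLMs that requires no costly retraining from scratch. Combined with a theoretically grounded objective tailored to the next-token prediction task of autoregressive language modeling, and an efficient chunk-wise update mechanism, the result is a highly scalable algorithm compatible with context parallelism.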
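To make the "final projection matrix as fast weights" idea concrete, here is a minimal sketch, not the paper's algorithm. It assumes a toy NumPy MLP block, and it swaps in a placeholder squared-error loss because this summary does not state the paper's actual objective or update rule. The names `mlp_forward`, `chunked_fast_weight_update`, `chunk_size`, and `lr` are hypothetical and chosen only for illustration.

```python
import numpy as np

def mlp_forward(x, w_up, w_down):
    """Toy MLP block: ReLU(x @ w_up) @ w_down. Returns hidden and output."""
    h = np.maximum(x @ w_up, 0.0)
    return h, h @ w_down

def chunked_fast_weight_update(xs, targets, w_up, w_down, chunk_size=4, lr=0.01):
    """Process a sequence chunk by chunk, updating only the final projection.

    ASSUMPTION: the loss is a placeholder 0.5 * mean squared error between
    the block output and `targets`; it is NOT the objective from the paper.
    w_up stays frozen; only a copy of w_down (the fast weights) is updated.
    """
    w_fast = w_down.copy()
    outputs = []
    for start in range(0, len(xs), chunk_size):
        x = xs[start:start + chunk_size]
        y = targets[start:start + chunk_size]
        h, out = mlp_forward(x, w_up, w_fast)
        # Outputs for this chunk use the weights from before its own update.
        outputs.append(out)
        # Gradient of the placeholder loss with respect to w_fast.
        grad = h.T @ (out - y) / len(x)
        w_fast = w_fast - lr * grad
    return np.concatenate(outputs), w_fast
```

Because each chunk is computed with the weights left by earlier chunks, the block adapts as the sequence progresses while the rest of the model is untouched, which is the sense in which the enhancement is "drop-in" in this sketch.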

Original Abstract

The static "train then deploy" paradigm fundamentally limits Large Language Models (LLMs) from dynamically adapting their weights in response to continuous streams of new information inherent in real-world tasks. Test-Time Training (TTT) offers a compelling alternative by updating a subset of model parameters (fast weights) at inference time, yet its potential in the current LLM ecosystem is hindered by critical barriers including architectural incompatibility, computational inefficiency and misaligned fast weight objectives for language modeling. In this work, we introduce In-Place Test-Time Training (In-Place TTT), a framework that seamlessly endows LLMs with Test-Time Training ability. In-Place TTT treats the final projection matrix of the ubiquitous MLP blocks as its adaptable fast w...

Auto-collected on 2026-04-09

#Paper #arXiv #NLP #小凯
