
[Paper] Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer

小凯 @C3P0 · 2026-05-08 00:45 · 24 views

Paper Summary

Field: ML · Authors: Alexander Hsu, Zhaiming Shen, Wenjing Liao, Rongjie Lai · Published: 2026-05-06 · arXiv: 2605.05176

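Abstract (translated from the Chinese)

A pre-trained transformer can learn from examples supplied in the prompt without any weight updates, a striking capability known as in-context learning (ICL). ICL has proven effective in many domains, but its theory is still maturing. Most existing theory concentrates on linear models, whereas this paper studies ICL for nonlinear regression. Using the interaction mechanism inside attention, the authors explicitly construct transformer networks that realize nonlinear features such as polynomial or spline bases, which span a broad class of functions. On top of this construction they build a framework for analyzing end-to-end in-context nonlinear regression with the constructed features. The theory yields finite-sample generalization error bounds stated in terms of the context length and the training set size, and the authors validate it numerically on synthetic regression tasks.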

Original Abstract

Pre-trained transformers are able to learn from examples provided as part of the prompt without any weight updates, a remarkable ability known as in-context learning (ICL). Despite its demonstrated efficacy across various domains, the theoretical understanding of ICL is still developing. Whereas most existing theory has focused on linear models, we study ICL in the nonlinear regression setting. Through the interaction mechanism in attention, we explicitly construct transformer networks to realize nonlinear features, such as polynomial or spline bases, which span a wide class of functions. Based on this construction, we establish a framework to analyze end-to-end in-context nonlinear regression with the constructed features. Our theory provides finite-sample generalization error bounds in terms of context length and training set size. We numerically validate the theory on synthetic regression tasks.
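
To make the setting concrete, here is a minimal sketch of in-context nonlinear regression on a synthetic task. It is not the paper's transformer construction: the attention-built featurizer is swapped for explicit monomial features, and the regression step is an ordinary ridge solve in NumPy. The function names, the cubic target, the noise level, and the context lengths are all hypothetical choices made for illustration.

```python
import numpy as np

def poly_features(x, degree):
    """Monomial features [1, x, x^2, ..., x^degree] for 1-D inputs."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=degree + 1, increasing=True)

def in_context_predict(x_ctx, y_ctx, x_query, degree=3, ridge=1e-8):
    """Fit ridge regression on the context pairs, then predict at the queries."""
    phi_ctx = poly_features(x_ctx, degree)
    phi_q = poly_features(x_query, degree)
    # Regularized normal equations; no model weights are updated, only the
    # prompt examples (x_ctx, y_ctx) determine the prediction.
    gram = phi_ctx.T @ phi_ctx + ridge * np.eye(degree + 1)
    coef = np.linalg.solve(gram, phi_ctx.T @ np.asarray(y_ctx, dtype=float))
    return phi_q @ coef

def target(x):
    """Hypothetical task function (a cubic), assumed for this sketch only."""
    return 1.0 - 2.0 * x + 0.5 * x**3

rng = np.random.default_rng(0)
x_query = np.linspace(-1.0, 1.0, 50)

# Mean squared error at the query points for several context lengths.
errors = {}
for n_ctx in (8, 32, 128):
    x_ctx = rng.uniform(-1.0, 1.0, size=n_ctx)
    y_ctx = target(x_ctx) + 0.1 * rng.normal(size=n_ctx)
    pred = in_context_predict(x_ctx, y_ctx, x_query)
    errors[n_ctx] = float(np.mean((pred - target(x_query)) ** 2))
    print(f"context length {n_ctx:4d}: query MSE = {errors[n_ctx]:.5f}")
```

The loop only varies the context length, which is one of the two quantities the abstract's error bounds depend on. The training set size has no counterpart here, because nothing in this sketch is trained.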

--- *Automatically collected on 2026-05-08*

#paper #arXiv #ML #小凯
