## Paper Overview
**Research area**: NLP
**Authors**: Pranava Madhyastha, Dagmar Adamcova
**Published**: 2026-04-22
**arXiv**: [2604.20789](https://arxiv.org/abs/2604.20789)
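## Abstract (translated from Chinese)
We study the integration of human working-memory constraints into the Transformer architecture and implement several cognitively inspired attention variants, including fixed-width-window and temporal-decay attention mechanisms. Our modified GPT-2 models are trained from scratch on developmentally plausible datasets (10M and 100M words). Performance is evaluated on a grammatical judgment task (BLiMP) and on alignment with human reading-time data. The results indicate that these cognitively inspired constraints, particularly fixed-width attention, can significantly improve grammatical accuracy, especially when training data is scarce. The constrained models also tend to align more strongly with human processing metrics. These findings suggest that such constraints can serve as a beneficial inductive bias, steering models toward more robust linguistic representations, particularly in data-limited settings.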
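
To make the two attention variants named in the abstract more concrete, here is a minimal sketch of how a fixed-width window and a temporal decay could each be expressed as an additive bias on causal attention scores. The abstract does not give the paper's actual formulation, so everything here is an assumption: the function names, the `window` and `decay_rate` parameters, and the choice of an exponential decay in distance are illustrative only.

```python
import math

def fixed_window_bias(seq_len, window):
    """Additive bias: 0.0 where query i may attend to key j, -inf elsewhere.
    Causal, and limited to the `window` most recent tokens (current one included).
    ASSUMPTION: a hard cutoff; the paper's exact windowing is not stated."""
    return [[0.0 if 0 <= i - j < window else -math.inf for j in range(seq_len)]
            for i in range(seq_len)]

def temporal_decay_bias(seq_len, decay_rate):
    """Additive bias of -decay_rate * distance for past keys, -inf for future keys.
    After softmax this scales each weight by exp(-decay_rate * distance).
    ASSUMPTION: exponential decay; the paper may use a different decay function."""
    return [[-decay_rate * (i - j) if j <= i else -math.inf for j in range(seq_len)]
            for i in range(seq_len)]

def softmax(row):
    m = max(row)                            # finite: the diagonal is always allowed
    exps = [math.exp(x - m) for x in row]   # math.exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(scores, bias):
    """Add the bias to raw attention scores row by row, then normalise each row."""
    return [softmax([s + b for s, b in zip(s_row, b_row)])
            for s_row, b_row in zip(scores, bias)]

# Uniform raw scores isolate the effect of each constraint.
scores = [[0.0] * 4 for _ in range(4)]
windowed = attention_weights(scores, fixed_window_bias(4, window=2))
decayed = attention_weights(scores, temporal_decay_bias(4, decay_rate=math.log(2)))
print(windowed[3])  # the last token attends only to itself and its predecessor
print(decayed[3])   # weights halve with each additional step back in time
```

With uniform scores, the windowed variant gives zero weight to everything older than the window, while the decay variant keeps all past tokens visible but progressively down-weights them.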
## Original Abstract
We investigate the integration of human-like working memory constraints into the Transformer architecture and implement several cognitively inspired attention variants, including fixed-width windows based and temporal decay based attention mechanisms. Our modified GPT-2 models are trained from scratch on developmentally plausible datasets (10M and 100M words). Performance is evaluated on grammatical judgment tasks (BLiMP) and alignment with human reading time data. Our results indicate that these cognitively-inspired constraints, particularly fixed-width attention, can significantly improve grammatical accuracy especially when training data is scarce. These constrained models also tend to show a stronger alignment with human processing metrics. The findings suggest that such constraints ma...
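
For readers unfamiliar with BLiMP, the benchmark consists of minimal pairs of sentences, and a model counts as correct on a pair when it assigns higher probability to the grammatical sentence. The sketch below shows that scoring rule only. The `blimp_accuracy` helper, the example pairs, and the log-probability values are all hypothetical stand-ins for a real language model and real benchmark items; none of them come from the paper.

```python
def blimp_accuracy(pairs, sentence_logprob):
    """Fraction of minimal pairs where the grammatical sentence scores higher.

    pairs: iterable of (grammatical, ungrammatical) sentence strings.
    sentence_logprob: callable returning a sentence's total log-probability
                      (in practice, summed token log-probs from the model).
    """
    pairs = list(pairs)
    correct = sum(1 for good, bad in pairs
                  if sentence_logprob(good) > sentence_logprob(bad))
    return correct / len(pairs)

# HYPOTHETICAL log-probabilities standing in for a trained model's scores.
TOY_LOGPROBS = {
    "the cats sleep": -12.1,
    "the cats sleeps": -15.4,
    "the dog barks": -11.0,
    "the dog bark": -10.2,   # toy model prefers the ungrammatical form here
}

PAIRS = [
    ("the cats sleep", "the cats sleeps"),
    ("the dog barks", "the dog bark"),
]

print(blimp_accuracy(PAIRS, TOY_LOGPROBS.__getitem__))  # 0.5
```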
---
*Automatically collected on 2026-04-24*
#paper #arXiv #NLP #小凯