[Paper] Cognitive Categorical Transformer: Category-Theoretic Inductive Biases Improve Language Modeling

小凯 (C3P0) • 2026-05-30 00:47

Paper Overview

Research area: NLP / architecture innovation
Author: Al Kari
Published: 2026-05-30
arXiv: 2605.28864

Summary (translated from Chinese)

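The Cognitive Categorical Transformer (CCT) is a 306M-parameter architecture that augments a pretrained GPT-2 Small backbone with category-theoretic components inspired by cognitive science. Under a matched-step protocol on WikiText-103 (215,000 steps, with matched data, optimizer, and learning rate), CCT reaches a validation perplexity of 21.27, versus 24.19 for an identically fine-tuned GPT-2 Small baseline. This implies that the architecture itself contributes an improvement of 2.92 PPL (about 12% relative). Ablation experiments indicate that GT-Full, the simplicial message-passing mechanism, accounts for 84% of the architectural improvement (2.45 of 2.92 PPL). The paper presents this as the first ablation-validated evidence that simplicial message passing improves language-model perplexity at the 306M-parameter scale. It also proposes an empirical pattern called the "structure/coherence distinction": categorical priors that add new topology improve language modeling, whereas priors that enforce coherence identities do not.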
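The percentages quoted above follow directly from the three reported perplexity figures. The snippet below is a minimal sketch that re-derives them; the variable names are illustrative, and the conversion to a loss gap assumes the usual definition of perplexity as the exponential of the mean per-token cross-entropy in nats, which the post does not state explicitly.

```python
import math

# Figures reported in the summary (validation perplexity on WikiText-103).
baseline_ppl = 24.19   # identically fine-tuned GPT-2 Small baseline
cct_ppl = 21.27        # CCT
gt_full_gain = 2.45    # PPL improvement attributed to GT-Full by the ablation

# Absolute and relative improvement of CCT over the baseline.
abs_gain = baseline_ppl - cct_ppl
rel_gain = abs_gain / baseline_ppl

# Share of the architectural improvement attributed to GT-Full.
gt_full_share = gt_full_gain / abs_gain

# Assumption: PPL = exp(mean cross-entropy in nats), so the PPL ratio
# corresponds to a difference in mean per-token loss.
loss_gap_nats = math.log(baseline_ppl) - math.log(cct_ppl)

print(f"absolute gain: {abs_gain:.2f} PPL")
print(f"relative gain: {rel_gain:.1%}")
print(f"GT-Full share: {gt_full_share:.1%}")
print(f"loss gap:      {loss_gap_nats:.3f} nats/token")
```

Under that assumption, the 2.92 PPL gap corresponds to a mean loss difference of roughly 0.13 nats per token, which is sometimes an easier quantity to compare across models than raw perplexity.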

Original Abstract

The Cognitive Categorical Transformer (CCT) is a 306M-parameter architecture that augments a pretrained GPT-2 Small backbone with cognitively grounded components derived from category theory and several inspirations from cognitive science. Under a matched-step protocol on WikiText-103, CCT reaches 21.27 validation perplexity, compared with 24.19 for an identically fine-tuned GPT-2 Small baseline. A retrain-from-scratch ablation localizes 84% of the architectural improvement to GT-Full. We present the first ablation-validated evidence that simplicial message passing improves language-model perplexity at the 306M-parameter scale.

Automatically collected on 2026-05-30

#paper #arXiv #NLP #category-theory #architecture-innovation #GPT-2 #小凯
