Psychological Concept Neurons: Can Neural Control Bias Probing and Shift Generation in LLMs?

Paper Overview

Research area: cs.CL
Authors: Yuto Harada, Hiro Taiyo Hamada
Published: 2026-04-13
arXiv: 2604.11802

Abstract (translated from Chinese)

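Using psychological constructs such as the Big Five, large language models (LLMs) can imitate specific personality traits and predict a user's personality. Although LLMs can exhibit behavior consistent with these constructs, it remains unclear how the constructs are represented inside the model and how those representations relate to behavioral outputs. This paper analyzes how and where internal representations of the Big Five concepts form, and uses interventions to test how those representations relate to behavioral outputs. It finds that Big Five information becomes decodable rapidly in the early layers, while concept-selective neurons are most prevalent in the middle layers. Intervening on these neurons consistently shifts probe readouts toward the target concept, but the effect on label generation is weaker, indicating a gap between representational control and behavioral control.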
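The probe-and-intervene idea described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the activations are synthetic rather than taken from a real LLM, the function names (`make_activations`, `fit_linear_probe`, `find_selective_neurons`, `intervene`) are hypothetical, and the least-squares probe and mean-difference selectivity score are simple stand-ins, not the paper's actual method. The sketch covers only the probe-readout side; it does not model label generation, which is where the paper reports weaker effects.

```python
import numpy as np

def make_activations(n=400, d=64, selective=(3, 17, 42), seed=0):
    """Synthetic 'layer activations': only a few neurons carry the concept."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)          # 0 = other concept, 1 = target concept
    acts = rng.normal(size=(n, d))               # background noise on every neuron
    acts[:, list(selective)] += 2.0 * labels[:, None]  # signal on selective neurons
    return acts, labels

def fit_linear_probe(acts, labels):
    """Least-squares linear probe with a bias term (targets are -1 / +1)."""
    X = np.hstack([acts, np.ones((len(acts), 1))])
    y = 2.0 * labels - 1.0
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def probe_predict(w, acts):
    """Return 1 where the probe reads out the target concept, else 0."""
    X = np.hstack([acts, np.ones((len(acts), 1))])
    return (X @ w > 0).astype(int)

def find_selective_neurons(acts, labels, k=3):
    """Rank neurons by standardized mean difference between the two classes."""
    mu1 = acts[labels == 1].mean(axis=0)
    mu0 = acts[labels == 0].mean(axis=0)
    score = np.abs(mu1 - mu0) / (acts.std(axis=0) + 1e-8)
    return np.sort(np.argsort(score)[-k:])

def intervene(acts, neurons, target_values):
    """Overwrite the chosen neurons with fixed target values (a simple clamp)."""
    out = acts.copy()
    out[:, neurons] = target_values
    return out

acts, labels = make_activations()
w = fit_linear_probe(acts, labels)
neurons = find_selective_neurons(acts, labels)

# Clamp the selective neurons of non-target samples to the target-class mean.
target_mean = acts[labels == 1][:, neurons].mean(axis=0)
non_target = acts[labels == 0]
steered = intervene(non_target, neurons, target_mean)

before = probe_predict(w, non_target).mean()  # share read out as target before
after = probe_predict(w, steered).mean()      # share read out as target after
print(f"selective neurons: {neurons.tolist()}")
print(f"probe reads 'target' on non-target samples: {before:.2f} -> {after:.2f}")
```

In this toy setting, clamping a handful of neurons is enough to move the probe's readout, which mirrors the representational side of the finding. Whether the model's generated output follows is a separate question that a probe alone cannot answer.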

Original Abstract

Using psychological constructs such as the Big Five, large language models (LLMs) can imitate specific personality profiles and predict a user's personality. While LLMs can exhibit behaviors consistent with these constructs, it remains unclear where and how they are represented inside the model and how they relate to behavioral outputs.

--- *Automatically collected on 2026-04-15*

#paper #arXiv #AI #小凯
