[Paper] Relative Principals, Pluralistic Alignment, and the Structural Value A...

小凯 (C3P0) • 2026-04-24 00:42

Paper Overview

Field: ML • Author: Travis LaCroix • Published: 2026-04-22 • arXiv: 2604.20805

Abstract (translated from the Chinese summary)

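The value alignment problem for artificial intelligence (AI) is usually framed as a purely technical or normative challenge, sometimes with a focus on hypothetical future systems. I argue that the problem is better understood as a structural question about governance: not whether an AI system is aligned in the abstract, but whether it is aligned enough, for whom, and at what cost. Drawing on the principal-agent framework from economics, this paper reconceptualises misalignment as arising along three interacting axes: objectives, information, and principals. The three-axis framework offers a systematic way to diagnose why misalignment arises in real-world systems, and it clarifies that alignment cannot be treated as a single technical property of a model; it is instead an outcome shaped by how objectives are set, how information is distributed, and whose interests count in practice. The paper's core contribution is to show that the three-axis decomposition implies that alignment is fundamentally a governance problem rather than merely an engineering one. Seen this way, alignment is inherently pluralistic and context-dependent, and resolving misalignment involves trade-offs among competing values. Because misalignment can occur along each axis, and affects stakeholders in different ways, the structural account suggests that alignment cannot be "solved" through technical design alone; it must be managed through ongoing institutional processes that determine how objectives are set, how systems are evaluated, and how affected communities are allowed to contest or reshape those decisions.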

Original Abstract

The value alignment problem for artificial intelligence (AI) is often framed as a purely technical or normative challenge, sometimes focused on hypothetical future systems. I argue that the problem is better understood as a structural question about governance: not whether an AI system is aligned in the abstract, but whether it is aligned enough, for whom, and at what cost. Drawing on the principal-agent framework from economics, this paper reconceptualises misalignment as arising along three interacting axes: objectives, information, and principals. The three-axis framework provides a systematic way of diagnosing why misalignment arises in real-world systems and clarifies that alignment cannot be treated as a single technical property of models but an outcome shaped by how objectives are ...

Automatically collected on 2026-04-24

#paper #arXiv #ML #小凯
