[Paper] Democratic ICAI: Debating Our Way to Steering Principles from Preferen...
Paper overview
Research area: ML Authors: Kevin Kingslin, Anish Natekar, Ashutosh Ranjan Published: 2026-06-26 arXiv: 2606.28294
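Abstract (translated from the Chinese summary)
Preference-based alignment often struggles to capture the reasoning that underlies human judgments. Inverse Constitutional AI (ICAI) improves the interpretability of decisions by summarizing preferences into natural-language principles, but its single-pass explanations miss much of the nuance involved in complex decisions. We introduce Democratic ICAI, a new method that gathers multiple competing rationales through structured persona debate, giving a broader and more expressive account of the factors that influence each comparison. From these richer signals, we derive clearer and more comprehensive steering principles and use them to guide decision modeling through LLM-based and decision-tree-based judges. On the creative preference benchmarks MuCE-Pref and LiTBench, experiments across multiple creative task categories show that Democratic ICAI yields a more faithful preference structure. Relative to deliberative prompting and principle-based baselines, it improves average preference prediction across tasks while producing constitutions that LLM annotators prefer.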
Original abstract
Preference-based alignment often struggles to capture the reasoning that underlies human judgments. Many evaluations rely on multiple interacting criteria, yet pairwise labels reveal only the final choice rather than the considerations that shape preferences. Inverse Constitutional AI (ICAI) improves interpretability in decision making by summarizing preferences into natural-language principles, but its single-pass explanations miss much of the nuance involved in complex decisions. We introduce Democratic ICAI, a novel approach that gathers multiple competing rationales through structured persona debate, offering a broader and more expressive account of the factors influencing each comparison. From these richer signals, we derive clearer and more comprehensive steering principles and use t...
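The summary above mentions using the derived principles to drive a decision-tree-based judge. The sketch below is only an illustration of that general idea, not the authors' implementation: it assumes each principle casts a vote of +1 (favors response A), -1 (favors response B), or 0 (silent) on a comparison, and fits a small hand-written tree over those votes. The principle names, the vote encoding, the toy data, and the helpers `gini`, `build_tree`, and `predict` are all hypothetical.

```python
from collections import Counter

# Hypothetical principles; the paper's actual constitution is not shown here.
PRINCIPLES = ["vivid imagery", "coherent plot", "original premise"]

# Toy data invented for illustration. Each row holds one vote per principle:
# +1 if the principle favors response A, -1 if it favors B, 0 if it is silent.
VOTES = [
    (+1, +1, 0), (+1, -1, +1), (-1, +1, -1), (-1, -1, 0),
    (0, +1, +1), (0, -1, -1), (+1, +1, -1), (-1, -1, +1),
]
# The annotator's preferred response for each comparison.
LABELS = ["A", "A", "B", "B", "A", "B", "A", "B"]


def gini(labels):
    """Gini impurity of a list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, depth=2):
    """Greedily split on the principle whose votes best separate the labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    best = None
    for i in range(len(rows[0])):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[i], []).append(label)
        # Weighted impurity after splitting on principle i.
        score = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        if best is None or score < best[0]:
            best = (score, i)
    score, i = best
    if score >= gini(labels):  # no principle improves on the current node
        return majority
    children = {}
    for vote in set(row[i] for row in rows):
        subset = [(r, l) for r, l in zip(rows, labels) if r[i] == vote]
        children[vote] = build_tree(
            [r for r, _ in subset], [l for _, l in subset], depth - 1
        )
    return {"principle": i, "children": children, "default": majority}


def predict(tree, row):
    """Walk the tree; fall back to the node's majority label for unseen votes."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["principle"]], tree["default"])
    return tree


tree = build_tree(VOTES, LABELS)
print("root principle:", PRINCIPLES[tree["principle"]])
print("prediction:", predict(tree, (0, -1, +1)))
```

One reason a tree is attractive in this setting is that every prediction can be read back as a short chain of principle votes, which keeps the judge inspectable. How the paper actually encodes principles as features is not stated in the text above.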
--- *Automatically collected on 2026-06-30*
#paper #arXiv #ML #小凯