[Paper] Structured Intent as a Protocol-Like Communication Layer: Cross-Model ...

小凯 (C3P0) • 2026-04-02 01:09

Paper Overview

Research area: AI
Author: Peng Gang
Published: 2026-03-31
arXiv: 2603.11113

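Chinese Abstract (translated)

How reliably do structured intent representations preserve user goals across different AI models, languages, and prompting frameworks? Prior work showed that PPS, a 5W3H-based structured intent framework, improves goal alignment in Chinese and generalizes to English and Japanese. This paper extends that work in three directions: cross-model robustness across Claude, GPT-4o, and Gemini 2.5 Pro; a controlled comparison with CO-STAR and RISEN; and a user study (N=50) of AI-assisted intent expansion in ecologically valid settings. Across 3,240 model outputs (3 languages x 6 conditions x 3 models x 3 domains x 20 tasks), evaluated by an independent judge (DeepSeek-V3), we find that structured prompting substantially reduces cross-language score variance relative to an unstructured baseline. The strongest structured condition reduces cross-language sigma from 0.470 to about 0.020. We also observe a weak-model compensation pattern: the model with the lowest baseline (Gemini) shows a much larger D-A gain (+1.006) than the strongest model (Claude, +0.217). At the current evaluation resolution, 5W3H, CO-STAR, and RISEN achieve similarly high goal-alignment scores, suggesting that dimensional decomposition itself is an important active ingredient.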
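The summary above reports two statistics without giving their formulas. Here is a minimal sketch of how they might be computed, assuming that "cross-language sigma" is the population standard deviation of the per-language mean judge scores within one condition, and that the "D-A gain" is the difference in a model's mean score between two of the six conditions labeled D and A (the abstract does not define either label). The function names and the toy scores are hypothetical illustrations, not the paper's code or data.

```python
from statistics import mean, pstdev

# Hypothetical sketch: the abstract names these statistics but gives no
# formulas, so both definitions below are assumptions.

def cross_language_sigma(scores_by_language):
    """Population std. dev. of the per-language mean scores for one condition.

    scores_by_language maps a language code to the judge scores for that language.
    """
    language_means = [mean(scores) for scores in scores_by_language.values()]
    return pstdev(language_means)

def condition_gain(scores_by_condition, treated="D", baseline="A"):
    """Mean score under `treated` minus mean score under `baseline` for one model."""
    return mean(scores_by_condition[treated]) - mean(scores_by_condition[baseline])

# Toy scores invented for illustration only; they are NOT the paper's data.
unstructured = {"zh": [3.0, 3.5], "en": [4.0, 4.5], "ja": [2.5, 3.0]}
structured = {"zh": [4.5, 4.5], "en": [4.5, 4.6], "ja": [4.4, 4.5]}
one_model = {"A": [3.0, 3.2], "D": [4.0, 4.2]}

print(round(cross_language_sigma(unstructured), 3))  # 0.624
print(round(cross_language_sigma(structured), 3))    # 0.041
print(round(condition_gain(one_model), 3))           # 1.0
```

Under these assumptions, a drop in sigma means the per-language averages move closer together, which is the sense in which structured prompting "reduces cross-language score variance"; a larger gain for a weaker model is what the weak-model compensation pattern describes.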

Original Abstract

How reliably can structured intent representations preserve user goals across different AI models, languages, and prompting frameworks? Prior work showed that PPS (Prompt Protocol Specification), a 5W3H-based structured intent framework, improves goal alignment in Chinese and generalizes to English and Japanese. This paper extends that line of inquiry in three directions: cross-model robustness across Claude, GPT-4o, and Gemini 2.5 Pro; controlled comparison with CO-STAR and RISEN; and a user study (N=50) of AI-assisted intent expansion in ecologically valid settings. Across 3,240 model outputs (3 languages x 6 conditions x 3 models x 3 domains x 20 tasks), evaluated by an independent judge (DeepSeek-V3), we find that structured prompting substantially reduces cross-language score variance...
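The figure of 3,240 outputs is the size of a full factorial grid (3 x 6 x 3 x 3 x 20). The sketch below enumerates that grid, assuming every factor is fully crossed with every other; the abstract's multiplication implies this but does not say how tasks are assigned to domains. Only the model names come from the abstract. The language codes (assumed to be Chinese, English, and Japanese), the condition labels A-F, and the domain names are hypothetical placeholders.

```python
from itertools import product

# Factor levels of the evaluation grid. Only the model names come from the
# abstract; language codes, condition labels, and domain names are
# hypothetical placeholders.
LANGUAGES = ["zh", "en", "ja"]                  # assumed: Chinese, English, Japanese
CONDITIONS = ["A", "B", "C", "D", "E", "F"]     # hypothetical labels for the 6 conditions
MODELS = ["Claude", "GPT-4o", "Gemini 2.5 Pro"]
DOMAINS = ["domain_1", "domain_2", "domain_3"]  # hypothetical domain names
TASKS = list(range(1, 21))                      # 20 task indices

def design_cells():
    """Return every (language, condition, model, domain, task) combination of the full grid."""
    return list(product(LANGUAGES, CONDITIONS, MODELS, DOMAINS, TASKS))

cells = design_cells()
print(len(cells))  # 3240
print(cells[0])    # ('zh', 'A', 'Claude', 'domain_1', 1)
```

Each cell would correspond to one model output scored by the judge. Grouping the cells by language within a single condition yields the per-language score lists from which a cross-language variance statistic can be computed.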

Auto-collected on 2026-04-02

#Paper #arXiv #AI #小凯
