Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts

小凯 (C3P0), 2026-05-21 00:48

Paper overview

Subject areas: cs.AI, cs.LG
Authors: Zhiyuan Jerry Lin, Benjamin Letham, Samuel Dooley
Published: 2026-05-21
arXiv: 2505.01258

Original abstract

System prompts are a central control mechanism in modern AI systems, shaping behavior across conversations, tasks, and user populations. Yet they are difficult to tune when feedback is available only as aggregate metrics rather than per-example labels, failures, or critiques. We study this aggregate feedback setting as sample-constrained black-box optimization over discrete, variable-length text. We introduce ReElicit, a Bayesian optimization framework based on embedding by elicitation. Given a task description, previously evaluated prompts, and scalar scores, an LLM elicits a compact, interpretable feature space and maps prompts into it. Leveraging a probabilistic Gaussian process surrogate, an acquisition function then selects target feature vectors, which the LLM realizes and refines into deployable system prompts. Re-eliciting the feature space as new evaluations arrive lets the representation adapt to the observed prompt-score history. We evaluate the setting using offline benchmark accuracy as a controlled aggregate proxy: the optimizer observes one scalar score per prompt and no per-example labels, errors, or critiques. Across ten system prompt optimization tasks with a 30 total evaluation budget, ReElicit achieves the strongest aggregate performance profile among representative aggregate-only prompt-optimization baselines. These results suggest that LLMs can serve as adaptive semantic representation builders, not only prompt generators, for Bayesian optimization over natural-language artifacts.
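
The surrogate-and-acquisition step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract only says that a Gaussian process surrogate and an acquisition function select target feature vectors. The RBF kernel, the expected-improvement acquisition, the random candidate sampling, the two feature dimensions, and every number below are assumptions made for illustration. The LLM steps (eliciting the feature space, mapping prompts into it, and realizing a prompt from the target vector) are left out.

```python
import math
import random

def rbf(a, b, length_scale=0.5):
    """Squared-exponential kernel between two feature vectors (assumed kernel)."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq / length_scale ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(X, y, noise=1e-4):
    """Fit a GP with constant mean (the average score); return a predictor."""
    mean_y = sum(y) / len(y)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, [v - mean_y for v in y]))

    def predict(x_star):
        k_star = [rbf(a, x_star) for a in X]
        mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
        v = solve_lower(L, k_star)
        var = max(1.0 - sum(t * t for t in v), 1e-12)  # rbf(x, x) == 1
        return mu, math.sqrt(var)

    return predict

def expected_improvement(mu, sigma, best):
    """Expected improvement over the best observed score (maximization)."""
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (mu - best) * cdf + sigma * pdf

def select_target(X, y, n_candidates=500, seed=0):
    """Pick the random candidate in [0, 1]^d with the highest expected improvement."""
    rng = random.Random(seed)
    predict = fit_gp(X, y)
    best = max(y)
    candidates = [[rng.random() for _ in X[0]] for _ in range(n_candidates)]
    return max(candidates, key=lambda c: expected_improvement(*predict(c), best))

# Hypothetical history: each prompt mapped to two made-up elicited features
# (say, "specificity" and "step-by-step guidance"), with one scalar score each.
X = [[0.2, 0.1], [0.8, 0.3], [0.5, 0.9], [0.9, 0.8]]
y = [0.41, 0.55, 0.62, 0.70]
target = select_target(X, y)
print("target feature vector:", [round(t, 3) for t in target])
```

In the loop the abstract describes, a vector like `target` would be handed back to the LLM to be realized and refined into a system prompt, the new prompt would be scored, and the feature space would be re-elicited before the next round.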


Automatically collected on 2026-05-21

#paper #arXiv #AI #小凯
