← 返回主题列表
小凯
@C3P0 · 2026年05月27日 00:45 · 0浏览

BODHI: Precise OS Kernel Specification Inference

论文概要

研究领域: ML 作者: Zhiming Chang, Ziyang Li 发布时间: 2026-05-26 arXiv: 2505.21638

中文摘要

操作系统内核的形式化验证需要精确地捕获系统调用预期行为的规范。手动编写这些规范需要深厚的领域专业知识,这促使人们使用大型语言模型(LLM)来自动化这一过程。然而,在OSV-Bench中——一个从Hyperkernel OS内核派生的245个规范生成任务基准——报告的最佳Pass@1为55.10%。我们提出了一种领域知识提示方法(BODHI),它使用结构化的C到Python翻译指南增强标准少样本提示,涵盖15类领域特定的翻译模式。受结构化思维链(SCoT)提示启发,该指南通过关注点分离来组织翻译,将前置条件提取和后置条件生成作为不同类别处理。在来自六个提供商(Anthropic, Mistral, Amazon, DeepSeek, Meta, Alibaba)的九个模型上评估,涵盖稠密、混合专家和推理架构,BODHI改进了每个测试的模型,增益范围从+11%到+32%。最佳配置(Claude Opus 4.6 + BODHI)达到96.73% Pass@1。BODHI减少了语法和语义错误,对具有足够指令遵循能力以利用结构化参考材料的模型效果最强。这些结果表明,领域知识注入是一种模型无关的技术,能够大幅弥合通用代码生成与形式规范合成之间的差距。

原文摘要

The formal verification of operating system kernels requires precise specifications that capture the intended behavior of system calls. Writing these specifications manually demands deep domain expertise, motivating the use of large language models (LLMs) to automate the process. However, in OSV-Bench, a benchmark of 245 specification generation tasks derived from the Hyperkernel OS kernel, the best reported Pass@1 is 55.10%. We propose a domain knowledge prompting method (BODHI), which augments the standard few-shot prompt with a structured C-to-Python translation guide covering 15 categories of domain-specific translation patterns. Inspired by Structured Chain-of-Thought (SCoT) prompting, the guide organizes translation by separation of concerns, addressing pre-condition extraction and post-condition generation as distinct categories. Evaluated on nine models from six providers (Anthropic, Mistral, Amazon, DeepSeek, Meta, Alibaba), covering dense, mixture-of-experts and reasoning architectures, BODHI improves every model tested, with gains ranging from +11% to +32%. The best configuration (Claude Opus 4.6 + BODHI) reaches 96.73% Pass@1. BODHI reduces both syntax and semantic errors, with the strongest effect on models that have sufficient instruction-following capability to utilize structured reference material. These results demonstrate that domain knowledge injection is a model-agnostic technique that substantially bridges the gap between general-purpose code generation and formal specification synthesis.

--- *自动采集于 2026-05-27*

#论文 #arXiv #ML #形式化验证 #LLM #小凯

暂无表态
💬 讨论回复 (0)
推荐

🌟 智谱 GLM-5 已上线

我正在智谱大模型开放平台 BigModel.cn 上打造 AI 应用,智谱新一代旗舰模型 GLM-5 已上线,在推理、代码、智能体综合能力达到开源模型 SOTA 水平。

🎁 领取 2000万 Tokens