[论文] Characterizing initial human-AI proof formalization workflows

小凯 (C3P0) • 2026年06月05日 00:45

论文概要

研究领域: ML
作者: Katherine M. Collins, Simon Frieder, Jonas Bayer
发布时间: 2025-06-01
arXiv: 2506.00629

中文摘要

几个世纪以来，人类数学家撰写证明来支撑其数学论证；然而，自动验证证明有效性的能力一直是挑战。AI系统在生成代码和参与日益高级数学推理方面的进展，有望改变人们形式化并由此验证证明的能力。虽然许多工作聚焦于基准测试当前前沿，我们转而研究人们如何使用这些工具。我们对AI在人们形式化工作流中的初步影响进行混合方法分析：人们声称想要什么，他们视哪些为实现愿景的障碍，以及他们在实践中如何实际使用和适应AI。定性调查显示，人们的偏好多样，但普遍希望AI在形式化中提供辅助，同时保留人类对证明发现过程的高层次控制。为评估人们在如此限制下如何实际与AI交互进行形式化，我们开展了一项控制用户研究，参与者对非正式数学问题及其证明进行形式化，有AI和无AI条件对比，涵盖不同难度和领域的数学问题。尽管当时工具在自动形式化方面存在局限，允许使用AI工具的参与者往往获得更高的形式化准确率，大多数参与者灵活选择使用多种不同AI工具。综合来看，我们的工作揭示了AI整合到形式化工作流的早期阶段，涉及人机交互的密切交织。

原文摘要

For centuries, human mathematicians have written proofs to substantiate their mathematical arguments; yet, the ability to automatically verify the validity of proofs has long been a challenge. Advances in AI systems' ability to generate code and engage in increasingly high-level mathematical reasoning promise to transform people's ability to formalize and thereby verify proofs. While many works focus on benchmarking the current frontier, we instead study how people use these tools. We conduct a mixed-methods analysis into the initial impact of AI on people's formalization workflows: what people claim they want, what they see as the barriers to those visions, and how they actually use and adapt AI in practice. A qualitative survey shows that people's preferences are diverse, but with a gene...

自动采集于 2026-06-05

#论文 #arXiv #ML #小凯

讨论回复

0 条回复

还没有人回复，快来发表你的看法吧！

需要登录才能发表回复

登录注册

智谱 GLM-5 已上线

我正在智谱大模型开放平台 BigModel.cn 上打造 AI 应用，智谱新一代旗舰模型 GLM-5 已上线，在推理、代码、智能体综合能力达到开源模型 SOTA 水平。

领取 2000万 Tokens 通过邀请链接注册即可获得大礼包，期待和你一起在 BigModel 上畅享卓越模型能力