[Paper] TAHOE: Text-to-SQL with Automated Hint Optimization from Experien...

Paper Overview

Field: ML | Authors: Zhiyi Chen, Jie Song, Peng Li | Published: 2026-06-10 | arXiv: 2606.12387

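Abstract (translated from the Chinese)

Large language models have democratized database access through Text-to-SQL, but the transition from prototype to production remains difficult. Real deployments must handle strict SQL dialects, large-scale schemas, and evolving user preferences, while supervised fine-tuning is costly and rigid and agentic test-time scaling is expensive. We present Tahoe, a system that treats prompt optimization as a dynamic data management problem. Tahoe uses an error-driven hint learning pipeline spanning development and deployment to consolidate debugging traces into a structured Hint Bank. Compiler feedback is distilled into reusable Syntax Hints for dialect-specific rules, while execution and user feedback are converted into Semantic Hints for schema- and user-specific logic. Tahoe further introduces a strategy layer that models conflicting user intents as competing strategies under a shared natural-language trigger, with recency signals and post-learning attribution statistics that summarize empirical success, harm, inertness, and support. At inference time, Tahoe retrieves relevant hints and guides the LLM through logical planning before SQL synthesis. We implement and evaluate the development-stage workflow and leave deployment-time human-feedback updates to future work. On Spider 2.0-Snow, Tahoe substantially improves Text-to-SQL without updating model parameters. On 113 supervised Spider 2.0-Snow-0212 examples with GPT-5.5, Tahoe raises the pass rate from 61.95% to 79.42% and pass-at-4 from 72.57% to 87.61%, achieves a 100% Snowflake syntax pass rate, and reduces the average number of compiler-feedback critique rounds per sampled candidate from 2.79 to 0.12. The same Hint Bank also transfers to weaker backbones, including a 19.7 percentage-point pass-rate gain on Doubao-2.0-lite.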
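To put the reported GPT-5.5 figures side by side, the short Python sketch below computes the absolute gains in percentage points and the relative drop in critique rounds. It uses only the numbers quoted in the abstract; the names `reported`, `gain_pp`, and `critique_rounds_reduction` are chosen here for illustration and do not come from the paper.

```python
# Figures quoted in the abstract (GPT-5.5, 113 Spider 2.0-Snow-0212 examples).
# Each entry is (baseline %, with Tahoe %).
reported = {
    "pass_rate": (61.95, 79.42),
    "pass_at_4": (72.57, 87.61),
}


def gain_pp(before: float, after: float) -> float:
    """Absolute improvement in percentage points, rounded to two decimals."""
    return round(after - before, 2)


gains = {name: gain_pp(before, after) for name, (before, after) in reported.items()}

# Average compiler-feedback critique rounds per sampled candidate: 2.79 -> 0.12.
critique_rounds_reduction = round(1 - 0.12 / 2.79, 3)

for name, value in gains.items():
    print(f"{name}: +{value} percentage points")
print(f"critique rounds: {critique_rounds_reduction:.1%} fewer")
```

On these inputs the pass rate rises by 17.47 points, pass-at-4 by 15.04 points, and critique rounds fall by roughly 95.7% relative to the baseline.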

Original Abstract

Large Language Models (LLMs) have democratized database access through Text-to-SQL, but moving from prototypes to production remains difficult. Real deployments must handle strict SQL dialects, massive schemas, and evolving user preferences, while supervised fine-tuning is costly and rigid and agentic test-time scaling is expensive. We present Tahoe, a system that treats prompt optimization as a dynamic data management problem. Tahoe uses an error-driven hint learning pipeline across Development and Deployment to consolidate debugging traces into a structured Hint Bank. Compiler feedback is distilled into reusable Syntax Hints for dialect-specific rules, while execution and user feedback are converted into Semantic Hints for schema- and user-specific logic. Tahoe further introduces a Strat...
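As a rough illustration of the Hint Bank idea described above, here is a minimal Python sketch that stores Syntax and Semantic Hints and retrieves the ones whose trigger shares words with a question. Everything in it is an assumption made for illustration: the `Hint` and `HintBank` classes, the word-overlap scoring, and the sample hint texts are hypothetical, and the paper's actual storage format and retrieval method are not specified in this excerpt.

```python
import re
from dataclasses import dataclass, field


def tokens(text: str) -> set:
    """Lowercase alphanumeric word set (a toy stand-in for real retrieval)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class Hint:
    kind: str     # "syntax" (dialect rules) or "semantic" (schema/user logic)
    trigger: str  # natural-language description of when the hint applies
    text: str     # guidance to place in the prompt


@dataclass
class HintBank:
    hints: list = field(default_factory=list)

    def add(self, hint: Hint) -> None:
        # Skip exact duplicates so repeated debugging traces do not bloat the bank.
        if hint not in self.hints:
            self.hints.append(hint)

    def retrieve(self, question: str, k: int = 2) -> list:
        # Score each hint by word overlap between the question and its trigger.
        q = tokens(question)
        scored = [(len(q & tokens(h.trigger)), i, h) for i, h in enumerate(self.hints)]
        # Keep hints with at least one shared word; ties keep insertion order.
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
        return [h for _, _, h in ranked[:k]]


bank = HintBank()
# Illustrative hint texts only; not taken from the paper.
bank.add(Hint("syntax", "mixed case column names",
              "Wrap mixed-case identifiers in double quotes."))
bank.add(Hint("semantic", "monthly revenue per customer",
              "In this schema, revenue is net of refunds."))

selected = bank.retrieve("What is the monthly revenue per customer?")
prompt_prefix = "\n".join(f"[{h.kind}] {h.text}" for h in selected)
```

A production system would likely swap the word-overlap score for embedding-based retrieval; the sketch only shows how the two hint kinds can share one store and one lookup path.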

--- *Automatically collected on 2026-06-12*

#Paper #arXiv #ML #小凯
