Data Intelligence Agents: Interpreting, Modeling, and Querying Enterprise Data via Autonomous Coding Agents
Paper Overview
Research area: ML
Authors: Anoushka Vyas, Aarushi Dhanuka, Sina Khoshfetrat Pakazad
Published: 2026-06-19
arXiv: 2506.14970
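Summary
Production data integration is bottlenecked by repeated, lossy handoffs among data owners, engineers, and analysts, who must work together to discover, structure, and query enterprise data. The paper presents DIA (Data Intelligence Agents), a system of three agents (Data Interpreter, Schema Creator, and Query Generator) that compresses this workflow by treating autonomous coding agents (ACAs) as a first-class abstraction.
Unlike conventional approaches, the agents do not simply emit text: they generate, execute, validate, and repair concrete artifacts, draw on a shared memory to reuse experience, and surface each artifact for review by domain experts. DIA is deployed in production for enterprise customers.
The authors study the Query Generator in depth and evaluate it in fully autonomous mode on seven SQL benchmarks covering four task categories and four dialects. It matches or surpasses the best published results on all seven, showing that an execution-grounded architecture built on ACAs and a shared memory generalizes across the data intelligence workload, with adaptation confined to natural-language instructions.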
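To make the "generate, execute, validate, and repair" idea concrete, here is a minimal sketch of such a loop against an in-memory SQLite database. It is not the paper's implementation: the names `run_with_repair`, `toy_generate`, and the `memory` dictionary are hypothetical, and `toy_generate` is a hard-coded stand-in for a real model call. The sketch only illustrates how execution errors can be fed back to the generator and how a validated query can be cached for reuse.

```python
import sqlite3
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical generator signature: (question, feedback from the last failure) -> SQL text.
Generator = Callable[[str, Optional[str]], str]


def run_with_repair(
    conn: sqlite3.Connection,
    question: str,
    generate: Generator,
    memory: Dict[str, str],
    max_attempts: int = 3,
) -> Tuple[str, List[tuple]]:
    """Generate SQL, execute it, and retry with the error message on failure."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        # Reuse a previously validated query if one exists and nothing has failed yet.
        if feedback is None and question in memory:
            sql = memory[question]
        else:
            sql = generate(question, feedback)
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Execution failed: drop any cached copy and pass the error back.
            memory.pop(question, None)
            feedback = f"{sql!r} failed: {exc}"
            continue
        memory[question] = sql  # remember the query that actually ran
        return sql, rows
    raise RuntimeError(f"no valid query after {max_attempts} attempts: {feedback}")


def toy_generate(question: str, feedback: Optional[str]) -> str:
    """Stand-in for a model call (assumption): the first attempt names a missing table."""
    if feedback is None:
        return "SELECT COUNT(*) FROM order_items"
    return "SELECT COUNT(*) FROM orders"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (25.5,)])

memory: Dict[str, str] = {}
sql, rows = run_with_repair(conn, "How many orders are there?", toy_generate, memory)
print(sql, rows)
```

In this toy run the first query fails because `order_items` does not exist, the error text is passed back as feedback, and the second attempt succeeds and is stored in `memory`. A real system would also validate the result itself, not just whether the query ran.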
Original Abstract
Production data integration is bottlenecked by repeated, lossy handoffs between data owners, engineers, and analysts who must collaboratively discover, structure, and query enterprise data. We present Data Intelligence Agents (DIA), a system of three agents (Data Interpreter, Schema Creator, and Query Generator) that compresses this workflow by treating autonomous coding agents (ACAs) as a first-class abstraction: rather than emitting text, the agents generate, execute, validate, and repair concrete artifacts, draw on a shared memory for experience reuse, and surface each for review by domain experts. DIA is deployed in production for enterprise customers. We study the Query Generator in depth and evaluate it in fully autonomous mode across seven SQL benchmarks spanning four task categories and four dialects. It matches or surpasses the best published results on all seven, demonstrating that an architecture grounded in execution, built on ACAs and a shared memory, generalizes across the data intelligence workload with adaptation confined to natural-language instructions.
--- *Automatically collected on 2026-06-19*
#paper #arXiv #ML #小凯