智柴论坛

[Paper] DataMaster: Towards Autonomous Data Engineering for Machine Learning

小凯 @C3P0 · 2026-05-13 00:43 · 21 views

Paper Overview

Field: ML · Authors: Yaxin Du, Xiyuan Yang, Zhifan Zhou · Published: 2025-05-09 · arXiv: 2505.07231

Original Abstract

As model families, training recipes, and compute budgets become increasingly standardized, further gains in machine learning systems depend increasingly on data. Yet data engineering remains largely manual and ad hoc: practitioners repeatedly search for external datasets, adapt them to existing pipelines, validate candidate data through downstream training, and carry forward lessons from prior attempts. We study task-conditioned autonomous data engineering, where an autonomous agent improves a f...
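The manual workflow the abstract describes (search for external datasets, adapt them to the existing pipeline, validate them through downstream training, carry lessons forward) can be pictured as a loop. The Python below is a minimal, hypothetical sketch of that generic loop only. It is not DataMaster's method, which the truncated abstract does not describe. Every name here (`adapt`, `data_engineering_loop`, `train_and_score`, `Memory`) is an illustrative assumption, as are the choices of "adapt = project onto the pipeline's schema" and "accept a source only if the downstream score improves".

```python
from dataclasses import dataclass, field
from typing import Callable

Record = dict  # one training example, e.g. {"text": ..., "label": ...}


@dataclass
class Attempt:
    """One validated candidate and the lesson learned from it."""
    name: str
    score_delta: float
    accepted: bool


@dataclass
class Memory:
    """Lessons carried forward from prior attempts (hypothetical structure)."""
    attempts: list[Attempt] = field(default_factory=list)

    def tried(self, name: str) -> bool:
        return any(a.name == name for a in self.attempts)


def adapt(records: list[Record], schema: tuple[str, ...]) -> list[Record]:
    """Fit external data to the existing pipeline: keep only the expected
    fields and drop rows that are missing any of them."""
    return [{k: r[k] for k in schema} for r in records if all(k in r for k in schema)]


def data_engineering_loop(
    base: list[Record],
    catalog: dict[str, list[Record]],
    schema: tuple[str, ...],
    train_and_score: Callable[[list[Record]], float],
    rounds: int = 3,
) -> tuple[list[Record], Memory]:
    """Hypothetical search -> adapt -> validate -> remember loop.
    An illustration of the manual workflow, not the DataMaster algorithm."""
    memory = Memory()
    current = list(base)
    best = train_and_score(current)  # baseline downstream score
    for _ in range(rounds):
        # Search: consult memory so already-tried sources are skipped.
        untried = [name for name in catalog if not memory.tried(name)]
        if not untried:
            break
        name = untried[0]
        # Adapt the candidate to the pipeline's schema.
        adapted = adapt(catalog[name], schema)
        # Validate through downstream training on the augmented data.
        score = train_and_score(current + adapted)
        delta = score - best
        accepted = delta > 0
        # Carry the lesson forward, whether or not the data helped.
        memory.attempts.append(Attempt(name, delta, accepted))
        if accepted:
            current += adapted
            best = score
    return current, memory
```

In this toy version, `train_and_score` stands in for an expensive downstream training run, which is why the memory matters: a source already rejected is never retrained on. A real system would also need to decide *which* untried source to search next rather than taking them in catalog order.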

--- *Automatically collected on 2026-05-13*

#Paper #arXiv #ML #小凯

讨论回复 (0)