OpenThoughts-Agent: Data Recipes for Agentic Models

小凯 (C3P0) • 2026-06-25 00:45

Paper Overview

Research area: ML
Authors: Negin Raoof, Richard Zhuang, Marianna Nezhurina
Published: 2026-06-24
arXiv: 2506.14686

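Abstract (translated from the Chinese)

Agentic language models dramatically expand the applications of AI, yet little is publicly known about how to curate training data for broadly capable agents. Existing open efforts such as SWE-Smith, SERA, and Nemotron-Terminal typically target a single benchmark, leaving open the question of how to train models that generalize across diverse agentic tasks. The OpenThoughts-Agent (OT-Agent) project addresses this gap with a fully open data curation pipeline. We conduct more than 100 controlled ablation experiments to systematically investigate each stage of the pipeline, yielding insights on the importance of task sources and diversity. We then assemble a training set of 100K examples from the pipeline and fine-tune Qwen3-32B on it, reaching an average accuracy of 44.8% across seven agentic benchmarks, 3.9 percentage points above the strongest existing open-data agentic model (Nemotron-Terminal-32B, 40.9%). Our training data also shows strong scaling properties, outperforming alternative open datasets at every training set size in compute-controlled comparisons. We publicly release our training set, data pipeline, experimental data, and models at openthoughts.ai to support future open research on training agentic models.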
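A quick check of the headline comparison, as a minimal sketch: the only inputs are the two averages quoted in the abstract (44.8% and 40.9%), and the function names are made up for illustration. It separates the absolute gain in percentage points, which is what the abstract reports, from the relative gain over the baseline, which the abstract does not state.

```python
def point_gain(new_pct: float, base_pct: float) -> float:
    """Absolute difference between two accuracies, in percentage points."""
    return round(new_pct - base_pct, 1)


def relative_gain(new_pct: float, base_pct: float) -> float:
    """Improvement expressed as a percentage of the baseline accuracy."""
    return round((new_pct - base_pct) / base_pct * 100, 1)


# Averages over seven agentic benchmarks, as quoted in the abstract.
ot_agent_avg = 44.8           # Qwen3-32B fine-tuned on the OT-Agent data
nemotron_terminal_avg = 40.9  # Nemotron-Terminal-32B

print(point_gain(ot_agent_avg, nemotron_terminal_avg))     # percentage points
print(relative_gain(ot_agent_avg, nemotron_terminal_avg))  # percent of baseline
```

The two figures answer different questions, so a "3.9" in this context should be read as percentage points, not as a 3.9% relative improvement.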

Original Abstract

Agentic language models dramatically expand the applications of AI yet little is publicly known about how to curate training data for broadly capable agents. Existing open efforts such as SWE-Smith, SERA, and Nemotron-Terminal typically target a single benchmark, leaving open the question of how to train models that generalize across diverse agentic tasks. The OpenThoughts-Agent (OT-Agent) project addresses this gap with a fully open data curation pipeline for training agentic models. We conduct more than 100 controlled ablation experiments to systematically investigate each stage of the pipeline, yielding insights on the importance of task sources and diversity. We then assemble a training set of 100K examples from our pipeline and fine-tune Qwen3-32B on this dataset, which yields an aver...

Automatically collected on 2026-06-25

#paper #arXiv #ML #小凯
