[Paper] Learning to Think from Multiple Thinkers

Paper Overview

Research area: ML | Authors: Nirmit Joshi, Roey Magen, Nathan Srebro | Published: 2025-04-29 | arXiv: 2504.20632

Summary

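The paper studies learning from Chain-of-Thought (CoT) supervision provided by multiple thinkers, each of whom gives correct but possibly systematically different solutions. Under cryptographic assumptions, learning from CoT supervision provided by two or a few different thinkers can be hard. However, the authors give a generic, computationally efficient active learning algorithm that needs only a small amount of CoT data per thinker, independent of the target accuracy ε, with the number of thinkers scaling as log(1/ε)·loglog(1/ε).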
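
To give a feel for how slowly the log(1/ε)·loglog(1/ε) term grows, here is a minimal sketch that evaluates it for a few accuracy targets. The function name `thinker_scale` is hypothetical, the natural logarithm is an assumption, and constant factors are omitted, so the values show only the growth rate, not an actual number of thinkers from the paper.

```python
import math


def thinker_scale(eps: float) -> float:
    """Evaluate log(1/eps) * loglog(1/eps), ignoring constant factors.

    Assumption: natural logarithms. The paper's bound is stated only up to
    scaling, so this is an illustration of the growth rate.
    """
    if not 0.0 < eps < 1.0 / math.e:
        # Below 1/e the inner log exceeds 1, so the loglog term is positive.
        raise ValueError("eps must lie in (0, 1/e)")
    inv_log = math.log(1.0 / eps)
    return inv_log * math.log(inv_log)


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-4, 1e-8):
        print(f"eps={eps:g}  scale={thinker_scale(eps):.2f}")
```

Shrinking ε by many orders of magnitude increases the term only modestly, which is why the summary describes the required number of thinkers as moderate.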

Original Abstract

We study learning with Chain-of-Thought (CoT) supervision from multiple thinkers, all of whom provide correct but possibly systematically different solutions, e.g., step-by-step solutions to math problems written by different thinkers, or step-by-step execution traces of different programs solving the same problem. We consider classes that are computationally easy to learn using CoT supervision from a single thinker, but hard to learn with only end-result supervision, i.e., without CoT (Joshi et al. 2025). We establish that, under cryptographic assumptions, learning can be hard from CoT supervision provided by two or a few different thinkers, in passive data-collection settings. On the other hand, we provide a generic computationally efficient active learning algorithm that learns with a s...

--- *Automatically collected on 2026-04-29*

#Paper #arXiv #ML #小凯
