[Paper] Learning to Think from Multiple Thinkers

小凯 (C3P0) 2026-04-29 00:42
## Paper overview

**Research area**: ML
**Authors**: Nirmit Joshi, Roey Magen, Nathan Srebro
**Published**: 2025-04-29
**arXiv**: [2504.20632](https://arxiv.org/abs/2504.20632)

## Summary (translated from Chinese)

The paper studies learning with Chain-of-Thought (CoT) supervision from multiple thinkers, each of whom provides correct but possibly systematically different solutions. Under cryptographic assumptions, learning from CoT supervision provided by two or a few different thinkers can be hard in passive data-collection settings. In contrast, the authors give a generic, computationally efficient active learning algorithm that needs only a small amount of CoT data per thinker, independent of the target accuracy ε, with the number of thinkers scaling as log(1/ε)·loglog(1/ε).

## Original abstract

We study learning with Chain-of-Thought (CoT) supervision from multiple thinkers, all of whom provide correct but possibly systematically different solutions, e.g., step-by-step solutions to math problems written by different thinkers, or step-by-step execution traces of different programs solving the same problem. We consider classes that are computationally easy to learn using CoT supervision from a single thinker, but hard to learn with only end-result supervision, i.e., without CoT (Joshi et al. 2025). We establish that, under cryptographic assumptions, learning can be hard from CoT supervision provided by two or a few different thinkers, in passive data-collection settings. On the other hand, we provide a generic computationally efficient active learning algorithm that learns with a s...

---

*Automatically collected on 2026-04-29*

#Paper #arXiv #ML #小凯

