## Paper Overview
**Field**: ML
**Authors**: Nirmit Joshi, Roey Magen, Nathan Srebro
**Published**: 2025-04-29
**arXiv**: [2504.20632](https://arxiv.org/abs/2504.20632)
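## Summary
The paper studies learning with chain-of-thought (CoT) supervision from multiple thinkers. Each thinker provides correct but possibly systematically different solutions. Under cryptographic assumptions, learning from CoT supervision provided by two or a few different thinkers can be hard in passive data-collection settings. In contrast, the authors give a generic, computationally efficient active learning algorithm. It needs only a small amount of CoT data from each thinker, independent of the target accuracy ε, while the number of thinkers scales as log(1/ε)·loglog(1/ε).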
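To see how slowly the required number of thinkers grows with accuracy, the sketch below evaluates only the growth term log(1/ε)·loglog(1/ε). It is an illustration, not the paper's bound. The summary gives no constant factors or logarithm base, so the use of the natural log and the helper name `thinker_growth` are assumptions made here.

```python
import math


def thinker_growth(eps: float) -> float:
    """Return log(1/eps) * log(log(1/eps)), the growth term only.

    Assumptions (not from the paper): natural logarithm, constant factors
    dropped. Requires eps < 1/e so that log(log(1/eps)) is positive.
    """
    if not 0.0 < eps < 1.0 / math.e:
        raise ValueError("eps must lie in (0, 1/e)")
    inv = math.log(1.0 / eps)          # log(1/eps)
    return inv * math.log(inv)         # log(1/eps) * loglog(1/eps)


if __name__ == "__main__":
    # Compare the growth term across several target accuracies.
    for eps in (1e-1, 1e-2, 1e-3, 1e-6):
        print(f"eps={eps:g}  growth term={thinker_growth(eps):.2f}")
```

The term grows only slightly faster than log(1/ε). Shrinking ε by several orders of magnitude therefore raises it by a modest multiplicative factor. The per-thinker CoT requirement stays independent of ε.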
## Original Abstract
We study learning with Chain-of-Thought (CoT) supervision from multiple thinkers, all of whom provide correct but possibly systematically different solutions, e.g., step-by-step solutions to math problems written by different thinkers, or step-by-step execution traces of different programs solving the same problem. We consider classes that are computationally easy to learn using CoT supervision from a single thinker, but hard to learn with only end-result supervision, i.e., without CoT (Joshi et al. 2025). We establish that, under cryptographic assumptions, learning can be hard from CoT supervision provided by two or a few different thinkers, in passive data-collection settings. On the other hand, we provide a generic computationally efficient active learning algorithm that learns with a s...
---
*Automatically collected on 2026-04-29*
#Paper #arXiv #ML #小凯