[Paper] Solving Physics Olympiad via Reinforcement Learning on Physics Simulators
## Paper Overview
**Research areas**: cs.LG, cs.AI, cs.CV, cs.RO
**Authors**: Mihir Prabhudesai, Aryan Satpathy, Yangmin Li, Zheyang Qin, Nikash Bhardwaj, Amir Zadeh, Chuan Li, Katerina Fragkiadaki, Deepak Pathak
**Published**: 2026-04-13
**arXiv**: [2604.11805](https://arxiv.org/abs/2604.11805)
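## Abstract Summary
With the advent of DeepSeek-R1, LLM reasoning capabilities have advanced markedly. Much of that progress, however, depends on the abundance of internet question-answer (QA) pairs, which is a major bottleneck going forward. This work shows that physics simulators can serve as a powerful alternative source of supervision for training LLMs in physical reasoning. We generate random scenes in a physics engine, create synthetic QA pairs from the simulated interactions, and train LLMs with reinforcement learning. The models exhibit zero-shot sim-to-real transfer on real-world physics benchmarks: training only on synthetic simulation data improves performance on IPhO (International Physics Olympiad) problems by 5-10 percentage points.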
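The pipeline described above (random scene, simulation, synthetic QA pair, verifiable reward) can be sketched in a few lines. This is only a toy illustration, not the authors' implementation: the post does not say which physics engine, scene distribution, question templates, or reward the paper uses. Here a hand-written projectile integrator stands in for a real physics engine, and `sample_scene`, `simulate_range`, `make_qa_pair`, and `reward` are all hypothetical names. The reinforcement learning loop itself is omitted; the `reward` function only shows the kind of scalar signal such a loop could consume.

```python
import math
import random
import re

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)


def sample_scene(rng):
    """Draw random launch parameters for a projectile on flat ground."""
    return {
        "speed": round(rng.uniform(5.0, 30.0), 2),      # m/s
        "angle_deg": round(rng.uniform(15.0, 75.0), 2),  # degrees above horizontal
    }


def simulate_range(scene, dt=1e-4):
    """Toy stand-in for a physics engine: integrate until the projectile lands."""
    theta = math.radians(scene["angle_deg"])
    vx = scene["speed"] * math.cos(theta)
    vy = scene["speed"] * math.sin(theta)
    x, y = 0.0, 0.0
    while True:
        vy_next = vy - G * dt        # semi-implicit Euler: update velocity first
        y_next = y + vy_next * dt
        if y_next < 0.0:
            # Interpolate linearly within the last step to find the landing point.
            frac = y / (y - y_next)
            return x + frac * vx * dt
        x, y, vy = x + vx * dt, y_next, vy_next


def make_qa_pair(scene):
    """Turn one simulated scene into a question and a numeric ground-truth answer."""
    question = (
        f"A ball is launched from flat ground at {scene['speed']} m/s, "
        f"{scene['angle_deg']} degrees above the horizontal. Ignoring air "
        "resistance, how far from the launch point does it land, in metres?"
    )
    return {"question": question, "answer": simulate_range(scene)}


def reward(model_output, answer, rel_tol=0.05):
    """Return 1.0 if the last number in the model's text is within rel_tol of the answer."""
    numbers = re.findall(r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?", model_output)
    if not numbers:
        return 0.0
    predicted = float(numbers[-1])
    return 1.0 if abs(predicted - answer) <= rel_tol * abs(answer) else 0.0
```

In use, one would call `make_qa_pair(sample_scene(random.Random(0)))` repeatedly to build a dataset, then score each sampled model completion with `reward`. Because the ground truth comes from the simulation rather than from human-written solutions, the supply of training questions is limited only by compute, which is the point the summary makes about simulators as a supervision source.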
## Original Abstract
We have witnessed remarkable advances in LLM reasoning capabilities with the advent of DeepSeek-R1. However, much of this progress has been fueled by the abundance of internet question-answer (QA) pairs, a major bottleneck going forward, since such data is limited in scale and concentrated mainly in domains like mathematics. In this work, we show that physics simulators can serve as a powerful alternative source of supervision for training LLMs for physical reasoning.
---
*Automatically collected on 2026-04-15*
#paper #arXiv #AI #小凯