## Paper Overview
**Research Area**: NLP
**Authors**: Zhaofeng Wu, Shiqi Wang, Boya Peng
**Published**: 2026-04-22
**arXiv**: [2604.20835](https://arxiv.org/abs/2604.20835)
## Abstract (Translated)
Modern language models exhibit strong coding abilities in common programming languages such as C++ and Python, but their performance in low-resource programming languages is often limited by the availability of training data. However, most programming skills are universal across languages, so capability acquired in one language should transfer to others. This work proposes the task of zero-shot cross-programming-language transfer for code reinforcement learning (RL). We find that, for Llama-3.1, RL training on a source programming language fails to improve, and sometimes even degrades, performance on other target programming languages. To address this, we hypothesize that effective RL transfer requires a generalizable supervised fine-tuning (SFT) initialization beforehand. We therefore propose Parallel-SFT, an SFT strategy that incorporates "parallel programs" (functionally equivalent code written in multiple programming languages) into the data mixture. Experiments show that this improves transferability: when we apply RL to the Parallel-SFT model, we observe better generalization to unseen programming languages. An analysis of the model's internal representations reveals that Parallel-SFT yields a more function-centric latent space in which equivalent programs across programming languages cluster more tightly, which we hypothesize contributes to the improved transferability.
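To make the "parallel programs" idea concrete, here is a minimal, purely illustrative sketch of assembling such a data group for SFT. The function name `make_parallel_examples`, the prompt template, and the data layout are assumptions for illustration only, not the paper's actual pipeline:

```python
# Hypothetical sketch: build one group of "parallel" SFT examples --
# functionally equivalent solutions to the same task in several PLs.
# All names and templates here are illustrative, not from the paper.
from dataclasses import dataclass


@dataclass
class SFTExample:
    prompt: str
    completion: str


def make_parallel_examples(task: str, solutions: dict[str, str]) -> list[SFTExample]:
    """Produce one SFT example per programming language for the same task."""
    return [
        SFTExample(
            prompt=f"Solve the following task in {pl}:\n{task}",
            completion=code,
        )
        for pl, code in solutions.items()
    ]


# Three functionally equivalent programs form one parallel group.
parallel_group = make_parallel_examples(
    "Return the sum of a list of integers.",
    {
        "Python": "def total(xs):\n    return sum(xs)",
        "C++": "int total(const std::vector<int>& xs) {\n"
               "  int s = 0;\n  for (int x : xs) s += x;\n  return s;\n}",
        "Lua": "function total(xs)\n  local s = 0\n"
               "  for _, x in ipairs(xs) do s = s + x end\n  return s\nend",
    },
)

print(len(parallel_group))  # → 3
```

In this sketch, such groups would simply be mixed into the ordinary SFT data; the paper's hypothesis is that seeing the same functionality expressed in several languages encourages a function-centric latent space.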
## Original Abstract
Modern language models demonstrate impressive coding capabilities in common programming languages (PLs), such as C++ and Python, but their performance in lower-resource PLs is often limited by training data availability. In principle, however, most programming skills are universal across PLs, so the capability acquired in one PL should transfer to others. In this work, we propose the task of zero-shot cross-programming-language transfer for code RL. We find that, for Llama-3.1, RL training for code generation in a source PL fails to improve, and sometimes even degrades, the performance on other target PLs. To address this, we hypothesize that effective RL transfer requires a generalizable SFT initialization before RL. We thus propose Parallel-SFT, an SFT strategy that incorporates "paralle...
---
*Auto-collected on 2026-04-24*
#paper #arXiv #NLP #小凯