## Paper Overview
**Research area**: CV
**Authors**: Weijie Wang, Xiaoxuan He, Youping Gu
**Published**: 2025-04-29
**arXiv**: [2504.20698](https://arxiv.org/abs/2504.20698)
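## Summary
World-R1 is a framework that aligns video generation with 3D constraints through reinforcement learning. Existing methods inject 3D priors by modifying the architecture, which often incurs high computational cost and limits scalability. World-R1 instead introduces a dedicated text-only dataset for world simulation and optimizes the model with Flow-GRPO, using feedback from pre-trained 3D foundation models and vision-language models to enforce structural consistency without changing the underlying architecture. It also adopts a periodic decoupled training strategy to balance rigid geometric consistency against the fluidity of dynamic scenes. Experiments show that the method significantly improves 3D consistency while preserving the base model's original visual quality.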
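The reward-driven optimization described above can be illustrated with a minimal sketch of the group-relative advantage step that GRPO-style methods rely on. This is not the authors' implementation: the function names, the linear blend of a geometry score with a vision-language score, the weight `geometry_weight`, the use of the population standard deviation, and all numeric scores are assumptions made purely for illustration.

```python
import statistics


def combine_rewards(geometry_score, vlm_score, geometry_weight=0.5):
    """Blend two reward signals into one scalar.

    Hypothetical: the source does not say how the 3D-model and VLM
    feedback are combined; a linear blend is assumed here.
    """
    return geometry_weight * geometry_score + (1.0 - geometry_weight) * vlm_score


def group_relative_advantages(rewards, eps=1e-8):
    """Center and scale rewards within one prompt's group of samples.

    Each sample is judged relative to its siblings, so no learned
    value function is needed. Population std is an assumption.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical videos sampled for the same text prompt,
# each scored as (geometry score, VLM score). Values are made up.
scores = [(0.9, 0.7), (0.4, 0.8), (0.6, 0.6), (0.2, 0.5)]
rewards = [combine_rewards(g, v) for g, v in scores]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.3f}  advantage={advantage:+.3f}")
```

Samples with above-average reward in their group receive a positive advantage and are reinforced, while below-average samples are discouraged. Because the signal enters only through the reward, the generator's architecture stays untouched, which is the property the summary emphasizes.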
## Original Abstract
Recent video foundation models demonstrate impressive visual synthesis but frequently suffer from geometric inconsistencies. While existing methods attempt to inject 3D priors via architectural modifications, they often incur high computational costs and limit scalability. We propose World-R1, a framework that aligns video generation with 3D constraints through reinforcement learning. To facilitate this alignment, we introduce a specialized pure text dataset tailored for world simulation. Utilizing Flow-GRPO, we optimize the model using feedback from pre-trained 3D foundation models and vision-language models to enforce structural coherence without altering the underlying architecture. We further employ a periodic decoupled training strategy to balance rigid geometric consistency with dyna...
---
*Automatically collected on 2026-04-29*
#Paper #arXiv #CV #小凯