World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
Authors
Weijie Wang | Xiaoxuan He | Youping Gu | Yifan Yang | Zeyu Zhang | Yefei He | Yanbo Ding | Xirui Hu | Donny Y. Chen | Zhiyuan He | Yuqing Yang | Bohan Zhuang
Journal
No journal information available
Year
2026
Category
Country
-
📝 Abstract
Recent video foundation models demonstrate impressive visual synthesis but frequently suffer from geometric inconsistencies. While existing methods attempt to inject 3D priors via architectural modifications, they often incur high computational costs and limit scalability. We propose World-R1, a framework that aligns video generation with 3D constraints through reinforcement learning. To facilitate this alignment, we introduce a specialized pure text dataset tailored for world simulation. Utilizing Flow-GRPO, we optimize the model using feedback from pre-trained 3D foundation models and vision-language models to enforce structural coherence without altering the underlying architecture. We further employ a periodic decoupled training strategy to balance rigid geometric consistency with dynamic scene fluidity. Extensive evaluations reveal that our approach significantly enhances 3D consistency while preserving the original visual quality of the foundation model, effectively bridging the gap between video generation and scalable world simulation.
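The abstract describes optimizing the video model with Flow-GRPO using reward feedback from pre-trained 3D foundation models and vision-language models. As a minimal illustration of the group-relative advantage computation at the core of GRPO-style methods, here is a hedged Python sketch; the reward values, group size, weighting, and the two reward sources (a 3D-consistency score and a VLM score) are illustrative assumptions, not details taken from the paper.

```python
# Sketch of group-relative advantage computation (GRPO-style).
# For each prompt, a group of videos is sampled; each video's reward is
# normalized against the group's mean and standard deviation.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sample's reward against its group's statistics."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical per-video scores: a geometric-consistency reward
# (e.g. from a 3D foundation model) and a semantic reward (e.g. from a VLM),
# combined with an assumed 50/50 weighting.
geo_scores = [0.8, 0.6, 0.9, 0.5]
vlm_scores = [0.7, 0.9, 0.8, 0.6]
rewards = [0.5 * g + 0.5 * v for g, v in zip(geo_scores, vlm_scores)]

advantages = group_relative_advantages(rewards)
# Videos scoring above the group mean get positive advantages and are
# reinforced; those below the mean are discouraged.
```

By construction the advantages of a group sum to (approximately) zero, so the update only rewards samples relative to their peers rather than on an absolute scale.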
📊 Article Statistics
Basic Stats
Views: 43
Downloads: 0
Citations: 29