Context Unrolling in Omni Models
Authors
Ceyuan Yang | Zhijie Lin | Yang Zhao | Fei Xiao | Hao He | Qi Zhao | Chaorui Deng | Kunchang Li | Zihan Ding | Yuwei Guo | Fuyun Wang | Fangqi Zhu | Xiaonan Nie | Shenhan Zhu | Shanchuan Lin | Hongsheng Li | Weilin Huang | Guang Shi | Haoqi Fan
Journal
No journal information available.
Year
2026
Category
Country
-
📝 Abstract
We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D geometry, and hidden representations. We find that such training enables Context Unrolling, where the model explicitly reasons across multiple modal representations before producing predictions. This process enables the model to aggregate complementary information across heterogeneous modalities, facilitating a more faithful approximation of the shared multimodal knowledge manifold and improving downstream reasoning fidelity. As a result, Omni achieves strong performance on both multimodal generation and understanding benchmarks, while demonstrating advanced multimodal reasoning capabilities, including in-context generation of text, image, video, and 3D geometry.
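The abstract does not specify how Context Unrolling is implemented, so the following is only a minimal conceptual sketch of the idea it describes: tokens from several modalities share one context, and the model appends intermediate "reasoning" states to that context before emitting a prediction. The function name unroll_context, the modality names, and the attention-pool step are all hypothetical illustrations, not the paper's method.

import numpy as np

def unroll_context(modal_embeddings, num_steps=3):
    """Aggregate heterogeneous modality tokens into one shared context.

    modal_embeddings: dict mapping modality name -> (n_tokens, dim) array.
    Returns a (n_tokens + num_steps, dim) array: the original tokens plus
    one appended intermediate token per unroll step.
    """
    # Start from the raw multimodal context: concatenate all modality tokens.
    context = np.concatenate(list(modal_embeddings.values()), axis=0)
    dim = context.shape[-1]
    for _ in range(num_steps):
        # Hypothetical unroll step: self-attend over the current context,
        # pool the result, and append it as a new intermediate token.
        scores = context @ context.T / np.sqrt(dim)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        pooled = (weights @ context).mean(axis=0, keepdims=True)
        context = np.concatenate([context, pooled], axis=0)
    return context

# Usage: text, image, and 3D-geometry tokens share one context before prediction.
rng = np.random.default_rng(0)
ctx = unroll_context({
    "text": rng.normal(size=(5, 64)),
    "image": rng.normal(size=(8, 64)),
    "geometry": rng.normal(size=(4, 64)),
})
print(ctx.shape)  # (20, 64): 17 modality tokens + 3 unrolled reasoning tokens

The sketch illustrates the aggregation claim in the abstract: each unroll step lets information from one modality's tokens influence the pooled state that later tokens condition on.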
📊 Article Statistics
Basic Stats
Views: 146
Downloads: 0
Citations: 22
[Charts omitted: Citation Trend, Country Distribution, Institution Distribution, Monthly Views]