OmniDiT: Extending Diffusion Transformer to Omni-VTON Framework
Authors
Weixuan Zeng|Pengcheng Wei|Huaiqing Wang|Boheng Zhang|Jia Sun|Dewen Fan|Lin HE|Long Chen|Qianqian Gan|Fan Yang|Tingting Gao
Journal
No journal information available
Year
2026
Category
Country
China
📝 Abstract
Despite the rapid advancement of Virtual Try-On (VTON) and Try-Off (VTOFF) technologies, existing VTON methods struggle with fine-grained detail preservation, generalization to complex scenes, complicated pipelines, and inefficient inference. To tackle these problems, we propose OmniDiT, an omni Virtual Try-On framework based on the Diffusion Transformer that unifies the try-on and try-off tasks in a single model. Specifically, we first establish a self-evolving data curation pipeline to continuously produce data, and construct a large-scale VTON dataset, Omni-TryOn, containing over 380k diverse, high-quality garment-model-tryon image pairs with detailed text prompts. We then employ token concatenation and design an adaptive position encoding to effectively incorporate multiple reference conditions. To relieve the bottleneck of long-sequence computation, we are the first to introduce Shifted Window Attention into the diffusion model, achieving linear complexity. To remedy the performance degradation caused by local window attention, we use multi-timestep prediction and an alignment loss to improve generation fidelity. Experiments show that, across various complex scenes, our method achieves the best performance on both the model-free VTON and VTOFF tasks and performance comparable to current SOTA methods on the model-based VTON task.
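The abstract credits Shifted Window Attention with reducing long-sequence attention to linear complexity. The paper's own implementation is not given here, so the following is only a minimal 1-D, single-head NumPy sketch of the Swin-style idea: attention is computed inside fixed-size windows (cost O(seq_len × window) rather than O(seq_len²)), and a cyclic shift between layers lets information cross window borders.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(x, window=4, shift=0):
    """Self-attention restricted to non-overlapping windows (Swin-style sketch).

    x: (seq_len, dim) token sequence; seq_len must be divisible by `window`.
    A nonzero `shift` cyclically rolls tokens before windowing (and rolls
    back after), so alternating shifted/unshifted layers exchange information
    across window borders. Each window attends only to its own `window`
    tokens, so the total cost is linear in seq_len for a fixed window size.
    """
    n, d = x.shape
    if shift:
        x = np.roll(x, -shift, axis=0)
    out = np.empty_like(x)
    for start in range(0, n, window):
        w = x[start:start + window]               # (window, dim)
        scores = w @ w.T / np.sqrt(d)             # (window, window) only
        out[start:start + window] = softmax(scores) @ w
    if shift:
        out = np.roll(out, shift, axis=0)
    return out
```

Stacking `window_attention(x, shift=0)` and `window_attention(x, shift=window // 2)` in alternating layers is what gives the "shifted" scheme its cross-window connectivity; this sketch omits the learned Q/K/V projections, masking of wrapped-around tokens, and multi-head structure a real layer would have.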
📊 Article Statistics
Basic Stats
Views: 95 · Downloads: 0 · Citations: 5
Charts: Citation Trend · Country Distribution · Institution Distribution · Monthly Views