LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectori...
👁 163 📚 14
Bidirectional Cross-Modal Prompting for Event-Frame Asymmetric Stereo
👁 94 📚 27
One Token per Highly Selective Frame: Towards Extreme Compression for Long Video Understanding
👁 50 📚 2
Lyra 2.0: Explorable Generative 3D Worlds
👁 199 📚 2
Who Handles Orientation? Investigating Invariance in Feature Matching
👁 172 📚 29
Tango: Taming Visual Signals for Efficient Video Large Language Models
👁 52 📚 13
When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models
👁 84 📚 8
ETCH-X: Robustify Expressive Body Fitting to Clothed Humans with Composable Datasets
👁 182 📚 24
GaussiAnimate: Reconstruct and Rig Animatable Categories with Level of Dynamics
👁 44 📚 9
MoRight: Motion Control Done Right
👁 20 📚 17
Action Images: End-to-End Policy Learning via Multiview Video Generation
👁 44 📚 15
Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision
👁 153 📚 21
CoME-VL: Scaling Complementary Multi-Encoder Vision-Language Learning
👁 183 📚 14
Generative World Renderer
👁 109 📚 16
EventHub: Data Factory for Generalizable Event-Based Stereo Networks without Active Sensors
👁 88 📚 5
HippoCamp: Benchmarking Contextual Agents on Personal Computers
👁 48 📚 19
OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation
👁 64 📚 14
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
👁 180 📚 26
OccAny: Generalized Unconstrained Urban 3D Occupancy
👁 43 📚 14
SegMaFormer: A Hybrid State-Space and Transformer Model for Efficient Segmentation
👁 88 📚 29