F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World
Authors
Ziyin Zhang, Zihan Liao, Hang Yu, Peng Di, Rui Wang
Journal
No journal information available
Year
2026
Category
Country
China
Abstract
We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on previously underserved mid- and low-resource languages. By integrating a two-stage LLM-based embedding training pipeline with matryoshka learning, model pruning, and knowledge distillation techniques, we present models that are far more efficient than previous LLM-based embedding models while retaining competitive performance. Extensive evaluations confirm that F2LLM-v2-14B ranks first on 11 MTEB benchmarks, while the smaller models in the family also set a new state of the art for resource-constrained applications. To facilitate open-source embedding model research, we release all models, data, code, and intermediate checkpoints.
Article Statistics
Views: 332
Downloads: 0
Citations: 40