Kavli Affiliate: Feng Wang
| First 5 Authors: Saba Sturua, Isabelle Mohr, Mohammad Kalim Akram, Michael Günther, Bo Wang
| Summary:
We introduce jina-embeddings-v3, a novel text embedding model with 570
million parameters that achieves state-of-the-art performance on multilingual data
and long-context retrieval tasks, supporting context lengths of up to 8192
tokens. The model includes a set of task-specific Low-Rank Adaptation (LoRA)
adapters to generate high-quality embeddings for query-document retrieval,
clustering, classification, and text matching. Evaluation on the MTEB benchmark
shows that jina-embeddings-v3 outperforms the latest proprietary embeddings
from OpenAI and Cohere on English tasks, and surpasses
multilingual-e5-large-instruct across all multilingual tasks. With
a default output dimension of 1024, users can flexibly reduce the embedding
dimensions to as low as 32 without compromising performance, enabled by
Matryoshka Representation Learning.
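As a hedged illustration of the Matryoshka-style dimension reduction mentioned above: a minimal sketch, assuming embeddings arrive as unit-normalized 1024-dimensional vectors; the random vectors and the helper name truncate_matryoshka are stand-ins, not the model's actual API.

```python
import numpy as np

def truncate_matryoshka(embedding: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize to unit length,
    as Matryoshka Representation Learning permits."""
    truncated = embedding[:dim]
    return truncated / np.linalg.norm(truncated)

# Stand-ins for two 1024-dim embeddings from the model (random here).
rng = np.random.default_rng(0)
a = rng.standard_normal(1024)
b = rng.standard_normal(1024)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

# Cosine similarity at the full 1024 dims vs. truncated to 32 dims.
full_sim = float(a @ b)
small_sim = float(truncate_matryoshka(a, 32) @ truncate_matryoshka(b, 32))
print(f"similarity @1024: {full_sim:.4f}, @32: {small_sim:.4f}")
```

With embeddings trained this way, the leading components carry most of the task-relevant signal, so slicing and re-normalizing trades index size for little accuracy loss.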
| Search Query: ArXiv Query: search_query=au:"Feng Wang"&id_list=&start=0&max_results=3