Kavli Affiliate: Yi Zhou | First 5 Authors: Zirui Li, Siwei Wu, Xingyu Wang, Yi Zhou, Yizhi Li | Summary: The rapid advancement of unsupervised representation learning and large-scale pre-trained vision-language models has significantly improved cross-modal retrieval tasks. However, existing multi-modal information retrieval (MMIR) studies lack a comprehensive exploration of document-level retrieval and suffer from […]
DocMMIR: A Framework for Document Multi-modal Information Retrieval