Kavli Affiliate: Yi Zhou | First 5 Authors: Zirui Li | Summary: The rapid advancement of unsupervised representation learning and large-scale pre-trained vision-language models has significantly improved cross-modal retrieval tasks. However, existing multi-modal information retrieval (MMIR) studies lack a comprehensive exploration of document-level retrieval and suffer from the absence of cross-domain […]
DocMMIR: A Framework for Document Multi-modal Information Retrieval