Kavli Affiliate: Jing Wang | First 5 Authors: Ruiming Chen, Junming Yang, Shiyu Xia, Xu Yang, Jing Wang | Summary: CLIP (Contrastive Language-Image Pre-training) has attracted widespread attention for its generalizable multimodal knowledge, which is significant for downstream tasks. However, the computational overhead of its large number of parameters and large-scale pre-training poses challenges of […]
Extracting Multimodal Learngene in CLIP: Unveiling the Multimodal Generalizable Knowledge