Kavli Affiliate: Yi Zhou
| First 5 Authors: Hong Li, Xingyu Li, Pengbo Hu, Yinuo Lei, Chunxiao Li
| Summary:
While the field of multi-modal learning continues to grow rapidly, recent
studies have made the deficiencies of the standard joint training paradigm
clear. These studies attribute the sub-optimal performance of jointly trained
models to the phenomenon of modality competition. Existing works attempt to
improve the jointly trained model by modulating the training process. Despite
their effectiveness, those methods apply only to late-fusion models. More
importantly, the mechanism of modality competition remains unexplored. In this
paper, we first propose an adaptive gradient modulation method that can boost
the performance of multi-modal models with various fusion strategies. Extensive
experiments show that our method surpasses all existing modulation methods.
Furthermore, to gain a quantitative understanding of modality competition and
of the mechanism behind the effectiveness of our modulation method, we
introduce a novel metric to measure competition strength. This metric is built
on the mono-modal concept, a function designed to represent the
competition-less state of a modality. Through systematic investigation, our
results confirm the intuition that the modulation encourages the model to rely
on the more informative modality. In addition, we find that the jointly trained
model typically has a preferred modality, on which the competition is weaker
than on the other modalities. However, this preferred modality need not
dominate the others. Our code will be available at
https://github.com/lihong2303/AGM_ICCV2023.
| Search Query: ArXiv Query: search_query=au:"Yi Zhou"&id_list=&start=0&max_results=3
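
The summary above describes modulating the training process by adjusting per-modality gradients so that the model does not over-rely on a single modality. As a rough illustration of that general idea (not the paper's AGM algorithm; the model, the scoring rule, and all names below are hypothetical), here is a minimal sketch of per-modality gradient scaling in a simple late-fusion classifier: the gradients flowing into the stronger modality's encoder are damped so the weaker modality can catch up.

```python
# Generic sketch of per-modality gradient modulation in a late-fusion model.
# NOT the paper's adaptive gradient modulation method; the ratio-based
# coefficient rule below is an illustrative assumption only.
import torch
import torch.nn as nn


class LateFusionModel(nn.Module):
    """Two mono-modal encoders whose features are concatenated for a shared head."""

    def __init__(self, dim_a, dim_b, hidden, num_classes):
        super().__init__()
        self.enc_a = nn.Sequential(nn.Linear(dim_a, hidden), nn.ReLU())
        self.enc_b = nn.Sequential(nn.Linear(dim_b, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x_a, x_b):
        return self.head(torch.cat([self.enc_a(x_a), self.enc_b(x_b)], dim=-1))


def modulation_coeffs(score_a, score_b, temperature=1.0):
    """Map per-modality utility scores (e.g. recent accuracies) to gradient scales.

    The higher-scoring modality gets its gradients damped (coefficient < 1),
    the lower-scoring one is left untouched. The exact rule is a placeholder.
    """
    ratio = (score_a + 1e-8) / (score_b + 1e-8)
    coeff_a = 1.0 / (1.0 + temperature * max(ratio - 1.0, 0.0))
    coeff_b = 1.0 / (1.0 + temperature * max(1.0 / ratio - 1.0, 0.0))
    return coeff_a, coeff_b


def training_step(model, optimizer, x_a, x_b, y, score_a, score_b):
    """One joint-training step with per-encoder gradient scaling."""
    criterion = nn.CrossEntropyLoss()
    optimizer.zero_grad()
    loss = criterion(model(x_a, x_b), y)
    loss.backward()

    # Scale each encoder's gradients by its modulation coefficient.
    coeff_a, coeff_b = modulation_coeffs(score_a, score_b)
    for p in model.enc_a.parameters():
        if p.grad is not None:
            p.grad.mul_(coeff_a)
    for p in model.enc_b.parameters():
        if p.grad is not None:
            p.grad.mul_(coeff_b)

    optimizer.step()
    return loss.item()
```

The design point this sketch is meant to convey is that modulation acts only on the gradient flow into each modality's encoder while the forward pass and fusion head are unchanged, which is what allows such schemes to be framed independently of a specific fusion strategy. For the actual method, metric, and fusion variants, see the repository linked in the summary.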