DIM: Dynamic Integration of Multimodal Entity Linking with Large Language Model

Kavli Affiliate: Zhuo Li

| First 5 Authors: Shezheng Song, Shasha Li, Jie Yu, Shan Zhao, Xiaopeng Li

| Summary:

Our study addresses Multimodal Entity Linking, which aligns a mention in
multimodal information with its entity in a knowledge base. Existing methods
still face challenges such as ambiguous entity representations and limited use
of image information. We therefore propose dynamic entity extraction using
ChatGPT, which dynamically extracts entities and enhances the datasets. We also
propose DIM (Dynamically Integrate Multimodal information with knowledge
base), a method that exploits the visual understanding capability of Large
Language Models (LLMs). An LLM such as BLIP-2 extracts entity-relevant
information from the image, which can improve the extraction of entity
features and their linking to the dynamic entity representations provided by
ChatGPT. Experiments show that DIM outperforms most existing methods on the
three original datasets and achieves state-of-the-art (SOTA) results on the
dynamically enhanced datasets (Wiki+, Rich+, Diverse+). For reproducibility,
our code and collected datasets are released on
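As a rough illustration of why image information can help resolve an ambiguous mention, the sketch below ranks candidate entities by cosine similarity between a fused mention representation and each entity's representation. It is not the DIM implementation: the weighted-average fusion, the `alpha` weight, the helper names (`cosine`, `fuse`, `link_mention`), and the three-dimensional toy vectors are all hypothetical stand-ins for features that would in practice come from an image model such as BLIP-2 and from ChatGPT-generated entity descriptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fuse(text_vec: Vector, image_vec: Vector, alpha: float = 0.5) -> List[float]:
    """Hypothetical fusion: alpha * text + (1 - alpha) * image, element-wise."""
    if len(text_vec) != len(image_vec):
        raise ValueError("text and image vectors must have the same length")
    return [alpha * t + (1.0 - alpha) * i for t, i in zip(text_vec, image_vec)]


def link_mention(
    text_vec: Vector,
    image_vec: Vector,
    entities: Dict[str, Vector],
    alpha: float = 0.5,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Rank candidate entities by similarity to the fused mention; return best and ranking."""
    query = fuse(text_vec, image_vec, alpha)
    ranked = sorted(
        ((name, cosine(query, vec)) for name, vec in entities.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[0][0], ranked


if __name__ == "__main__":
    # Toy vectors (made up): dimensions loosely mean [geography, basketball, other].
    entities = {
        "Jordan (country)": [1.0, 0.0, 0.0],
        "Michael Jordan": [0.0, 1.0, 0.0],
    }
    mention_text = [1.0, 1.0, 0.0]  # the word "Jordan" alone is ambiguous
    image_info = [0.0, 1.0, 0.0]    # stand-in for image-derived info, e.g. a basketball scene
    best, ranking = link_mention(mention_text, image_info, entities)
    print(best)  # prints: Michael Jordan
```

With `alpha=1.0` (text only) the ambiguous mention scores identically against both candidates; mixing in the image-derived vector breaks the tie. A real system would use learned encoders and a trained linking model rather than hand-written vectors and a fixed weight.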
