Kavli Affiliate: Ke Wang
| First 5 Authors: Hanchao Liu, Wenyuan Xue, Yifei Chen, Dapeng Chen, Xiutian Zhao
| Summary:
The recent development of Large Vision-Language Models (LVLMs) has attracted
growing attention within the AI landscape for their practical implementation
potential. However, “hallucination”, or more specifically, the misalignment
between factual visual content and the corresponding textual generation, poses a
significant challenge to utilizing LVLMs. In this comprehensive survey, we
dissect LVLM-related hallucinations in an attempt to establish an overview and
facilitate future mitigation. Our scrutiny starts with a clarification of the
concept of hallucinations in LVLMs, presenting a variety of hallucination
symptoms and highlighting the unique challenges inherent in LVLM
hallucinations. Subsequently, we outline the benchmarks and methodologies
tailored specifically for evaluating hallucinations unique to LVLMs.
Additionally, we investigate the root causes of these hallucinations, drawing
insights from both the training data and the model components. We also
critically review existing methods for mitigating
hallucinations. The open questions and future directions pertaining to
hallucinations within LVLMs are discussed to conclude this survey.
| Search Query: ArXiv Query: search_query=au:"Ke Wang"&id_list=&start=0&max_results=3