Kavli Affiliate: Wei Gao
| First 5 Authors: Bowen Qu, Haohui Li, Wei Gao
| Summary:
AI-Generated Images (AGIs) are inherently multimodal. Unlike
traditional image quality assessment (IQA) of natural scenes, AGI quality
assessment (AGIQA) must account for the correspondence between an image and
its textual prompt. This correspondence is folded into the ground-truth score,
which confounds unimodal IQA methods. To solve this problem, we introduce IP-IQA (AGIs
Quality Assessment via Image and Prompt), a multimodal framework for AGIQA that
incorporates both the image and its corresponding prompt. Specifically, we propose a novel
incremental pretraining task named Image2Prompt for better understanding of
AGIs and their corresponding textual prompts. An effective and efficient
image-prompt fusion module and a novel special [QA] token are also
applied. Both are plug-and-play and help the image and its corresponding
prompt work together. Experiments demonstrate that IP-IQA achieves
state-of-the-art results on the AGIQA-1k and AGIQA-3k datasets. Code will be available.
| Search Query: ArXiv Query: search_query=au:"Wei Gao"&id_list=&start=0&max_results=3