Semantic-Guided Generative Image Augmentation Method with Diffusion Models for Image Classification

Kavli Affiliate: Feng Wang

| First 5 Authors: Bohan Li, Xiao Xu, Xinghao Wang, Yutai Hou, Yunlong Feng

| Summary:

Existing image augmentation methods fall into two categories:
perturbation-based methods and generative methods. Perturbation-based methods
apply pre-defined perturbations to augment an original image, but only locally
vary the image, thus lacking image diversity. In contrast, generative methods
bring more image diversity in the augmented images but may not preserve
semantic consistency, thus incorrectly changing the essential semantics of the
original image. To balance image diversity and semantic consistency in
augmented images, we propose SGID, a Semantic-guided Generative Image
augmentation method with Diffusion models for image classification.
Specifically, SGID employs diffusion models to generate augmented images with
good image diversity. More importantly, SGID takes image labels and captions as
guidance to maintain semantic consistency between the augmented and original
images. Experimental results show that SGID outperforms the best augmentation
baseline by 1.72% on ResNet-50 (from scratch), 0.33% on ViT (ImageNet-21k), and
0.14% on CLIP-ViT (LAION-2B). Moreover, SGID can be combined with other image
augmentation baselines and further improves the overall performance. We
demonstrate the semantic consistency and image diversity of SGID through
quantitative human and automated evaluations, as well as qualitative case
studies.
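
The label-and-caption guidance described above can be illustrated with a small sketch. It assumes the guidance reaches a text-conditioned diffusion model as a single text prompt, which the summary does not confirm. The names `Sample`, `build_guidance_prompt`, and `augmentation_requests`, as well as the prompt template, are hypothetical and are not taken from the paper. The sketch only prepares the inputs; the diffusion model call itself is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Sample:
    label: str    # class label of the original image
    caption: str  # caption describing the original image


def build_guidance_prompt(sample: Sample) -> str:
    """Combine label and caption into one text prompt (hypothetical template)."""
    label = sample.label.strip()
    caption = sample.caption.strip().rstrip(".")
    if not caption:
        # Fall back to the label alone when no caption is available.
        return f"a photo of a {label}"
    return f"a photo of a {label}, {caption}"


def augmentation_requests(
    sample: Sample, num_variants: int = 3, base_seed: int = 0
) -> List[Dict[str, Union[str, int]]]:
    """Prepare one request per augmented image.

    Every request shares the same prompt, so the semantic guidance stays
    fixed, while the seed changes so that a diffusion model would produce
    a different image for each request.
    """
    prompt = build_guidance_prompt(sample)
    return [{"prompt": prompt, "seed": base_seed + i} for i in range(num_variants)]


if __name__ == "__main__":
    sample = Sample(label="golden retriever", caption="a dog running on a beach.")
    for request in augmentation_requests(sample):
        print(request)
```

In this sketch the prompt carries the semantics that should stay fixed, and the varying seed is the source of diversity; each request would then be passed, together with the original image, to an image-to-image diffusion pipeline of the reader's choice.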

| Search Query: ArXiv Query: search_query=au:"Feng Wang"&id_list=&start=0&max_results=3
