Synthetic Over-sampling for Imbalanced Node Classification with Graph Neural Networks

Kavli Affiliate: Xiang Zhang

| First 5 Authors: Tianxiang Zhao, Xiang Zhang, Suhang Wang

| Summary:

In recent years, graph neural networks (GNNs) have achieved state-of-the-art
performance for node classification. However, most existing GNNs suffer
from the graph imbalance problem. In many real-world scenarios, node classes
are imbalanced, with a few majority classes making up most of the graph.
The message propagation mechanism in GNNs further amplifies the dominance
of those majority classes, resulting in sub-optimal classification performance.
In this work, we seek to address this problem by generating pseudo instances of
minority classes to balance the training data, extending previous
over-sampling-based techniques. This task is non-trivial, as those techniques
assume that instances are independent, whereas nodes in a graph are linked
by edges, and neglecting this relational information complicates the
over-sampling process. Furthermore, node classification is typically
semi-supervised, with only a few labeled nodes, which provides insufficient
supervision for generating minority instances. Low-quality generated nodes
would harm the trained classifier. We address these difficulties by synthesizing
new nodes in a constructed embedding space, which encodes both node attributes
and topology information. Furthermore, an edge generator is trained
simultaneously to model the graph structure and provide relations for new
samples. To further improve data efficiency, we also explore synthesizing
mixed “in-between” nodes to utilize nodes from the majority class in this
over-sampling process. Experiments on real-world datasets validate the
effectiveness of our proposed framework.
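
The two steps described above, synthesizing minority nodes in an embedding space and predicting edges for them, can be illustrated with a minimal sketch. The abstract does not specify the interpolation rule or the form of the edge generator, so everything here is an assumption: the sketch uses SMOTE-style interpolation between a labeled minority node and its nearest same-class neighbour, and scores edges with a sigmoid over a weighted inner product. The function names `oversample_minority` and `predict_edges`, the `weight` matrix, and the `threshold` value are hypothetical, and the embeddings are random stand-ins for the output of a trained encoder.

```python
import numpy as np


def oversample_minority(emb, labels, minority_class, n_new, rng):
    """Create n_new synthetic embeddings for one minority class.

    Assumption: SMOTE-style interpolation between a labeled minority node
    and its nearest same-class neighbour in embedding space.
    """
    idx = np.flatnonzero(labels == minority_class)
    if idx.size < 2:
        raise ValueError("need at least two labeled minority nodes")
    pts = emb[idx]
    # Pairwise Euclidean distances among the minority embeddings.
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a node is not its own neighbour
    nearest = dists.argmin(axis=1)
    # Pick a random seed node per synthetic sample, then interpolate.
    seeds = rng.integers(0, idx.size, size=n_new)
    delta = rng.random((n_new, 1))
    return (1.0 - delta) * pts[seeds] + delta * pts[nearest[seeds]]


def predict_edges(new_emb, emb, weight, threshold=0.5):
    """Link synthetic nodes to existing nodes.

    Assumption: edge probability is sigmoid(h_new^T W h_old); in a real
    system W would be learned from the observed adjacency matrix.
    """
    logits = new_emb @ weight @ emb.T
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs > threshold


rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 4))                 # stand-in node embeddings
labels = np.array([0, 0, 0, 0, 0, 0, 1, 1])  # class 1 is the minority
new_emb = oversample_minority(emb, labels, minority_class=1, n_new=4, rng=rng)
adj_new = predict_edges(new_emb, emb, weight=np.eye(4))
```

Here `new_emb` holds four synthetic minority embeddings and `adj_new` is a boolean matrix whose entry `(i, j)` marks a predicted edge from synthetic node `i` to existing node `j`. Interpolating in the embedding space rather than the raw feature space is what lets the synthetic nodes reflect both attributes and topology, as the abstract describes.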

| Search Query: ArXiv Query: search_query=au:"Xiang Zhang"&id_list=&start=0&max_results=10