Abstract
In graph neural network applications, GraphSAGE performs inductive learning and has been widely applied to important tasks such as node classification. Because GraphSAGE obtains a node's embedding by applying an aggregation function to its neighbors' features, the node's subgraph directly affects classification performance. In many practical applications, the uneven class distribution of nodes makes it difficult for graph neural networks to fully learn the topology and attributes of the minority class, which limits classification performance. To address imbalanced node classification in GraphSAGE, we propose a new graph over-sampling algorithm called subgraph generation by conditional generative adversarial network (SG-CGAN). SG-CGAN learns hidden-layer representations of nodes through GraphSAGE and trains a conditional generative adversarial network (CGAN) on the nodes' hidden vectors and their associated subgraphs. Synthetic hidden vectors are then generated and used as CGAN input to produce subgraphs for the minority class, and GraphSAGE is retrained with the synthetic subgraphs added. In experiments on five graph datasets with first-order neighbors, the average improvements in ACC, macro-F1, and micro-F1 over training without synthetic data were \(1.25\%\), \(4.44\%\), and \(1.59\%\), respectively. In the second-order neighbor experiments, the corresponding improvements were \(0.75\%\), \(3.58\%\), and \(2.1\%\), verifying the effectiveness of the data generated by SG-CGAN.
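To make the role of neighbor aggregation concrete, the following is a minimal NumPy sketch of a single GraphSAGE layer with a mean aggregator, plus a hypothetical helper `add_synthetic_subgraph` that appends one synthetic minority node and its synthetic first-order neighbors before retraining. It is not the authors' implementation (see the repository under Code availability). The synthetic features are random placeholders standing in for CGAN output, and marking the synthetic neighbors with the label -1 (unlabelled, excluded from the loss) is an assumption.

```python
import numpy as np


def sage_mean_layer(x, adj_list, w_self, w_neigh):
    """One GraphSAGE layer with a mean aggregator.

    x: (num_nodes, in_dim) node features
    adj_list: adj_list[v] lists the neighbour indices of node v
    w_self, w_neigh: (in_dim, out_dim) weights for the node and its neighbourhood
    """
    out = np.zeros((x.shape[0], w_self.shape[1]))
    for v, neigh in enumerate(adj_list):
        # Mean of neighbour features; a node without neighbours gets zeros.
        h_neigh = x[neigh].mean(axis=0) if len(neigh) > 0 else np.zeros(x.shape[1])
        # Equivalent to concat(x_v, h_neigh) @ W with W split into two blocks.
        out[v] = np.maximum(x[v] @ w_self + h_neigh @ w_neigh, 0.0)  # ReLU
    # L2-normalise each embedding (all-zero rows are left as zeros).
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.where(norms == 0.0, 1.0, norms)


def add_synthetic_subgraph(x, adj_list, labels, center_feat, neigh_feats, minority_label):
    """Hypothetical helper: append a synthetic minority node and its neighbours.

    center_feat: (in_dim,) features of the synthetic centre node
    neigh_feats: (k, in_dim) features of its synthetic first-order neighbours
    In SG-CGAN these would come from the trained CGAN; here they are plain inputs.
    """
    n, k = x.shape[0], neigh_feats.shape[0]
    center = n
    neigh_ids = list(range(n + 1, n + 1 + k))
    new_x = np.vstack([x, center_feat[None, :], neigh_feats])
    new_adj = [list(a) for a in adj_list]
    new_adj.append(neigh_ids)                      # centre -> synthetic neighbours
    new_adj.extend([[center] for _ in neigh_ids])  # undirected back-edges
    # Only the centre carries the minority label; -1 marks unlabelled nodes
    # that a training loop would mask out (an assumption of this sketch).
    new_labels = np.concatenate([labels, [minority_label], np.full(k, -1)])
    return new_x, new_adj, new_labels


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
adj = [[1, 2], [0], [0, 3], [2]]
labels = np.array([0, 0, 0, 1])  # node 3 is the lone minority node
w_self, w_neigh = rng.normal(size=(3, 2)), rng.normal(size=(3, 2))

x2, adj2, y2 = add_synthetic_subgraph(
    x, adj, labels, center_feat=rng.normal(size=3),
    neigh_feats=rng.normal(size=(2, 3)), minority_label=1)
z = sage_mean_layer(x2, adj2, w_self, w_neigh)
print(z.shape, y2.tolist())  # (7, 2) [0, 0, 0, 1, 1, -1, -1]
```

In this sketch the synthetic subgraph is not wired into the original graph, so the synthetic centre node's embedding depends only on its generated neighbors. That is the sense in which the generated subgraph, rather than the centre node's features alone, shapes what the aggregator sees.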
Data availability
This publication is supported by multiple datasets, which are openly available at locations cited in the reference section.
Code availability
The source code used in this work is available on GitHub (https://github.com/KaiHuangMO/SGCGAN.git).
References
Abedin MZ, Guotai C, Hajek P et al (2023) Combining weighted SMOTE with ensemble learning for the class-imbalanced prediction of small business credit risk. Complex Intell Syst 9(4):3559–3579
Ando S, Huang CY (2017) Deep over-sampling framework for classifying imbalanced data. In: Ceci M, Hollmén J, Todorovski L et al (eds) Machine learning and knowledge discovery in databases. Springer International Publishing, Cham, pp 770–785
Bao Y, Yang S (2023) Two novel SMOTE methods for solving imbalanced classification problems. IEEE Access 11:5816–5823
Barua S, Islam MM, Yao X et al (2012) MWMOTE: majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans Knowl Data Eng 26(2):405–425
Chawla NV, Bowyer KW, Hall LO et al (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
Ding H, Sun Y, Wang Z et al (2023) A GAN and ensemble learning-based hybrid approach for imbalanced data classification. Inf Process Manag 60(2):103235
Dong Y, Xiao H, Dong Y (2022) SA-CGAN: an oversampling method based on single attribute guided conditional GAN for multi-class imbalanced learning. Neurocomputing 472:326–337
Douzas G, Bacao F (2018) Effective data generation for imbalanced learning using conditional generative adversarial networks. Expert Syst Appl 91:464–471
El Alaoui D, Riffi J, Sabri A et al (2022) Deep GraphSAGE-based recommendation system: jumping knowledge connections with ordinal aggregation network. Neural Comput Appl 34(14):11679–11690
Elreedy D, Atiya AF, Kamalov F (2023) A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning. Mach Learn. https://doi.org/10.1007/s10994-022-06296-4
Fan SKS, Tsai DM, Yeh PC (2023) Effective variational-autoencoder-based generative models for highly imbalanced fault detection data in semiconductor manufacturing. IEEE Trans Semicond Manuf 36(2):205–214
Fey M, Lenssen JE (2019) Fast graph representation learning with PyTorch Geometric. arXiv preprint arXiv:1903.02428
Fu S, Tian Y, Tang J et al (2023) Cost-sensitive learning with modified stein loss function. Neurocomputing 525:57–75
Goodfellow I, Pouget-Abadie J, Mirza M et al (2020) Generative adversarial networks. Commun ACM 63(11):139–144
Guan H, Zhao L, Dong X et al (2023) Extended natural neighborhood for SMOTE and its variants in imbalanced classification. Eng Appl Artif Intell 124:106570
Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. Adv Neural Inf Process Syst 30
Han Q, Liu H, Huang M et al (2023) Heart disease prediction based on MWMOTE and Res-BiGRU models. In: 2023 IEEE 6th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), IEEE, pp 563–569
Hu Y, Qu A, Work D (2022) Detecting extreme traffic events via a context augmented graph autoencoder. ACM Trans Intell Syst Technol (TIST) 13(6):1–23
Huang G, Jafari AH (2023) Enhanced balancing GAN: minority-class image generation. Neural Comput Appl 35(7):5145–5154
Huang K, Wang X (2022) Ada-INCVAE: improved data generation using variational autoencoder for imbalanced classification. Appl Intell 52(3):2838–2853
Isola P, Zhu JY, Zhou T et al (2016) Image-to-image translation with conditional adversarial networks. In: IEEE Conference on Computer Vision & Pattern Recognition
Juan X, Zhou F, Wang W et al (2023) INS-GNN: improving graph imbalance learning with self-supervision. Inf Sci 637:118935
Lehne B, Schlitt T (2009) Protein-protein interaction databases: keeping up with growing interactomes. Hum Genom 3(3):1–7
Lo WW, Layeghy S, Sarhan M et al (2022) E-GraphSAGE: a graph neural network based intrusion detection system for IoT. In: NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium, IEEE, pp 1–9
Lu C, Reddy CK, Wang P et al (2023) Multi-label clinical time-series generation via conditional GAN. IEEE Trans Knowl Data Eng
Mernyei P, Cangea C (2020) Wiki-CS: a Wikipedia-based benchmark for graph neural networks. arXiv preprint arXiv:2007.02901
Namata G, London B, Getoor L et al (2012) Query-driven active surveying for collective classification. In: 10th International Workshop on Mining and Learning with Graphs, p 1
Perozzi B, Al-Rfou R, Skiena S (2014) DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 701–710
Qu L, Zhu H, Zheng R et al (2021) ImGAGN: imbalanced network embedding via generative adversarial graph networks. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp 1390–1398
Rafatirad S, Homayoun H, Chen Z et al (2022) Graph learning. In: Machine learning for computer scientists and data analysts. Springer, pp 277–304
Ren Z, Zhu Y, Liu Z et al (2023) Few-shot GAN: improving the performance of intelligent fault diagnosis in severe data imbalance. IEEE Trans Instrum Meas 72:1–4
Sen P, Namata G, Bilgic M et al (2008) Collective classification in network data. AI Mag 29(3):93–93
Shi M, Ding C, Wang R et al (2023) Graph embedding deep broad learning system for data imbalance fault diagnosis of rotating machinery. Reliab Eng Syst Saf 240:109601
Sun Z, Zhang H, Bai J et al (2023) A discriminatively deep fusion approach with improved conditional GAN (IM-CGAN) for facial expression recognition. Pattern Recogn 135:109157
Thakur PS, Jadeja M, Chouhan SS (2024) CBReT: a cluster-based resampling technique for dealing with imbalanced data in code smell prediction. Knowl-Based Syst 286:111390
Tomek I (2007) An experiment with the edited nearest-neighbor rule. IEEE Trans Syst Man Cybern SMC-6(6):448–452
Velickovic P, Cucurull G, Casanova A et al (2017) Graph attention networks. arXiv preprint arXiv:1710.10903
Wang H, Li P, Lang X et al (2023) FTGAN: a novel GAN-based data augmentation method coupled time-frequency domain for imbalanced bearing fault diagnosis. IEEE Trans Instrum Meas 72:1–14
Welling M, Kipf TN (2016) Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations (ICLR 2017)
Wu L, Lin H, Gao Z et al (2021) GraphMixup: improving class-imbalanced node classification on graphs by self-supervised context prediction. arXiv preprint arXiv:2106.11133
Xia F, Sun K, Yu S et al (2021) Graph learning: a survey. IEEE Trans Artif Intell 2(2):109–127
Xie X, Liu H, Zeng S et al (2021) A novel progressively undersampling method based on the density peaks sequence for imbalanced data. Knowl-Based Syst 213:106689
Yan M, Li N (2023) Borderline-margin loss based deep metric learning framework for imbalanced data. Appl Intell 53(2):1487–1504
Zhao T, Zhang X, Wang S (2021a) GraphSMOTE: imbalanced node classification on graphs with graph neural networks. In: Proceedings of the 14th ACM international conference on web search and data mining, pp 833–841
Zhao Y, Hao K, Tang XS et al (2021) A conditional variational autoencoder based self-transferred algorithm for imbalanced classification. Knowl-Based Syst 218:106756
Zhu Z, Xing H, Xu Y (2023) Balanced neighbor exploration for semi-supervised node classification on imbalanced graph data. Inf Sci 631:31–44
Funding
Kai Huang reports financial support from the Natural Science Foundation of Xiamen, China (3502Z202372018) and the Department of Education of Fujian Province, China (JAT232012).
Ethics declarations
Conflict of interest
The authors have no conflicts of interest relevant to the content of this article to declare.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Huang, K., Chen, C. Subgraph generation applied in GraphSAGE deal with imbalanced node classification. Soft Comput 28, 10727–10740 (2024). https://doi.org/10.1007/s00500-024-09797-7
DOI: https://doi.org/10.1007/s00500-024-09797-7