Abstract
Increasing the number of minority samples through data generation can effectively improve a classifier's performance on the minority class in imbalanced problems. In this paper, we propose an effective data generation algorithm for minority samples, called the Adaptive Increase dimension of Variational AutoEncoder (ADA-INCVAE). Complementary to prior studies, we conduct a theoretical study of posterior collapse in VAEs from the perspective of multi-task learning. Building on this theoretical result, we propose a novel training method that increases the dimension of the data to avoid posterior collapse. To restrict the range of the synthetic data for different minority samples, we propose an adaptive reconstruction loss weight based on the distance distribution of majority samples around each minority sample. In the data generation stage, the generation proportion of each minority sample is determined by the local information of the minority class. Experimental results on 12 imbalanced datasets show that the algorithm helps classifiers effectively improve F1-measure and G-mean, verifying the effectiveness of the synthetic data generated by ADA-INCVAE.
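To make the two data-dependent mechanisms named in the abstract concrete, the sketch below illustrates (a) a per-sample adaptive reconstruction weight derived from distances to nearby majority samples, and (b) a local-information-based generation proportion. Both rules here (inverse mean distance to the k nearest majority neighbours, and an ADASYN-style majority-neighbour ratio) are illustrative assumptions, not the paper's exact formulas; function names are hypothetical.

```python
# Minimal sketch, assuming a standard Gaussian VAE trained on tabular data.
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.neighbors import NearestNeighbors

def adaptive_recon_weights(minority_x, majority_x, k=5):
    """Assumed weighting rule: minority samples crowded by majority
    neighbours get a larger reconstruction weight, pulling reconstructions
    closer to them and restricting the range of the synthetic data."""
    nn = NearestNeighbors(n_neighbors=k).fit(majority_x)
    dist, _ = nn.kneighbors(minority_x)          # (n_min, k) distances
    mean_dist = dist.mean(axis=1)
    w = mean_dist.max() / (mean_dist + 1e-8)     # closer majority -> larger weight
    w = w / w.mean()                             # normalise to mean 1
    return torch.as_tensor(w, dtype=torch.float32)

def generation_proportions(minority_x, majority_x, k=5):
    """ADASYN-style proportions (an assumption): minority points with more
    majority-class neighbours receive a larger share of synthetic samples."""
    all_x = np.vstack([minority_x, majority_x])
    nn = NearestNeighbors(n_neighbors=k + 1).fit(all_x)
    _, idx = nn.kneighbors(minority_x)
    # Neighbour indices >= len(minority_x) belong to the majority class;
    # column 0 is the query point itself, so it is skipped.
    maj_ratio = (idx[:, 1:] >= len(minority_x)).mean(axis=1)
    if maj_ratio.sum() == 0:
        return np.full(len(minority_x), 1.0 / len(minority_x))
    return maj_ratio / maj_ratio.sum()

def weighted_vae_loss(x, x_hat, mu, logvar, w):
    """Standard Gaussian-VAE ELBO with a per-sample weight on the
    reconstruction term; the KL term is left unweighted."""
    recon = F.mse_loss(x_hat, x, reduction="none").sum(dim=1)          # per sample
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
    return (w * recon + kl).mean()
```

In this reading, the weights would be computed once from the training set and passed into each training step, while the proportions decide how many latent samples to decode around each minority point at generation time.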
Cite this article
Huang, K., Wang, X. ADA-INCVAE: Improved data generation using variational autoencoder for imbalanced classification. Appl Intell 52, 2838–2853 (2022). https://doi.org/10.1007/s10489-021-02566-1