research-article

Constructing small sample datasets with game mixed sampling and improved genetic algorithm

Published: 04 June 2024

Abstract

Classifying imbalanced data is an increasingly common problem. Existing methods have made notable progress on imbalanced data, but two challenges persist: datasets are often excessively large, and they contain low-quality data. We propose a hybrid approach that combines game mixed sampling with an improved genetic algorithm to address both issues in imbalanced data classification. In the game mixed sampling module, we apply a game-based strategy to identify the best hybrid sampling method and sampling ratio for the current dataset, with the aim of producing a diverse dataset that covers the original data comprehensively. In the improved genetic algorithm module, we integrate a group of classifiers, encode the performance metrics obtained on the sampled data, and use them to evaluate the fitness of each data point comprehensively. Selection operations preserve the population data of many high-quality individuals; crossover and mutation operations then explore the search space, and a sliding standard deviation determines the minimum stable population size. On real-world credit card fraud data, which are highly imbalanced, the combined approach achieves small dataset sizes and high evaluation indices, outperforms the existing methods it is compared with, and demonstrates the effectiveness of game mixed sampling and the improved genetic algorithm.
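The abstract describes the improved genetic algorithm only at a high level, so the following is a minimal sketch of one plausible reading, not the authors' implementation. Assumptions: each individual is a 0/1 mask over the sampled data points (1 = keep the point); the classifier group is replaced by plain scoring functions (in practice each would train a classifier on the kept points and return a metric such as recall or precision); the encoded fitness is the mean score minus a penalty on subset size; selection is tournament selection with elitism, crossover is one-point, and mutation is bit-flip; and "minimum stable population size" is read as the subset size at which the best fitness stops changing, detected when its sliding standard deviation over `window` generations falls to `tol` or below. All names (`run_ga`, `composite_fitness`, `sliding_std`, `toy_recall`, `toy_precision`) and parameter values are hypothetical.

```python
import random
import statistics
from typing import Callable, List, Sequence

# Hypothetical encoding: an individual is a 0/1 mask over the sampled data
# points (1 = keep the point in the final small dataset).
Individual = List[int]
# A scorer stands in for one classifier/metric pair in the classifier group
# and returns a score in [0, 1] for the subset selected by an individual.
Scorer = Callable[[Individual], float]


def composite_fitness(ind: Individual, scorers: Sequence[Scorer],
                      size_penalty: float) -> float:
    """Mean score over the scorer group, minus a penalty on subset size."""
    if not any(ind):
        return 0.0  # an empty subset cannot train anything
    mean_score = sum(score(ind) for score in scorers) / len(scorers)
    return mean_score - size_penalty * sum(ind) / len(ind)


def tournament(pop: List[Individual], fits: List[float], k: int,
               rng: random.Random) -> Individual:
    """Selection: return the fittest of k randomly drawn individuals."""
    winner = max(rng.sample(range(len(pop)), k), key=lambda i: fits[i])
    return pop[winner]


def crossover(a: Individual, b: Individual, rng: random.Random) -> Individual:
    """One-point crossover of two parent masks."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(ind: Individual, rate: float, rng: random.Random) -> Individual:
    """Bit-flip mutation: flip each gene independently with probability rate."""
    return [1 - g if rng.random() < rate else g for g in ind]


def sliding_std(values: Sequence[float], window: int) -> List[float]:
    """Population standard deviation of every full window of `values`."""
    return [statistics.pstdev(values[i - window + 1:i + 1])
            for i in range(window - 1, len(values))]


def run_ga(n_points: int, scorers: Sequence[Scorer], pop_size: int = 20,
           generations: int = 60, elite: int = 2, mut_rate: float = 0.02,
           size_penalty: float = 0.1, window: int = 5, tol: float = 1e-3,
           seed: int = 0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_points)] for _ in range(pop_size)]
    best_history: List[float] = []  # best fitness per generation
    size_history: List[int] = []    # subset size of the best individual
    for _ in range(generations):
        fits = [composite_fitness(ind, scorers, size_penalty) for ind in pop]
        order = sorted(range(pop_size), key=lambda i: fits[i], reverse=True)
        best = pop[order[0]]
        best_history.append(fits[order[0]])
        size_history.append(sum(best))
        # Stop once the best fitness has been flat over the last `window`
        # generations, i.e. its sliding standard deviation is <= tol.
        if len(best_history) >= window and sliding_std(best_history, window)[-1] <= tol:
            break
        next_pop = [pop[i][:] for i in order[:elite]]  # elitism
        while len(next_pop) < pop_size:
            a = tournament(pop, fits, 3, rng)
            b = tournament(pop, fits, 3, rng)
            next_pop.append(mutate(crossover(a, b, rng), mut_rate, rng))
        pop = next_pop
    return best, best_history, size_history


# Toy usage: pretend every third of 30 points is "informative".
GOOD = set(range(0, 30, 3))


def toy_recall(ind: Individual) -> float:
    return sum(ind[i] for i in GOOD) / len(GOOD)


def toy_precision(ind: Individual) -> float:
    return sum(ind[i] for i in GOOD) / sum(ind)


best, history, sizes = run_ga(30, [toy_recall, toy_precision], seed=1)
print(f"generations run: {len(history)}, kept {sizes[-1]} of 30 points, "
      f"best fitness {history[-1]:.3f}")
```

Because elitism copies the top individuals unchanged, the best fitness never decreases, so a flat window means the search has stalled. Larger `window` or smaller `tol` values reduce the risk of stopping on a temporary plateau at the cost of more generations.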

Kong M, Li R, Wang J, et al. CFTNet: a robust credit card fraud detection model enhanced by counterfactual data augmentation Neural Comput Appl 2024


Published In

The Journal of Supercomputing, Volume 80, Issue 14
Sep 2024
1621 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 04 June 2024
Accepted: 21 May 2024

Author Tags

  1. Imbalanced dataset
  2. Game mixed sampling
  3. Improved genetic algorithms
  4. Performance metric coding
  5. Small dataset size
  6. High evaluation index
