DOI: 10.1145/3607947.3607993
research-article

An Improved Oversampling Algorithms based on Informative Sample Selection Strategy Solving Imbalance

Published: 28 September 2023

Abstract

Imbalanced data has been a persistent focus of classification research. The term describes a scenario in which data samples are unevenly distributed across classes, so that one or more classes in the dataset are underrepresented. This imbalance degrades the performance of conventional learning models trained on such datasets. Numerous data pre-processing strategies have been developed to counter class imbalance, but a key problem remains: finding appropriate samples from which to create synthetic data. In this study, we propose Informative Sample Selection (ISS), an efficient oversampling method for imbalanced classification. The main goal of ISS is to identify informative samples in the minority class that can be used to generate synthetic data. We conducted experiments on 22 imbalanced datasets to evaluate the proposed model, comparing ISS against several widely used techniques: SMOTE, Borderline-SMOTE, ADASYN, Safe-Level-SMOTE, and random oversampling (ROS). Performance was measured with AUC and F-Measure. The results of our tests show that ISS outperforms these approaches, indicating significant progress in addressing the challenges that imbalanced data poses for classification.
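The abstract does not specify how ISS chooses its samples, so the sketch below shows only the generic SMOTE-style interpolation step used by the SMOTE family of baselines named above: each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority-class neighbours. The function name `smote_like_oversample` and the `base_idx` argument are hypothetical; `base_idx` only marks where a sample-selection strategy would restrict which minority samples serve as interpolation bases. This is a simplified illustration assuming NumPy and Euclidean distance, not the ISS algorithm and not any library's implementation.

```python
import numpy as np


def smote_like_oversample(X_min, n_synthetic, k=5, base_idx=None, seed=0):
    """SMOTE-style interpolation between minority-class samples.

    X_min       : (n, d) array of minority-class samples
    n_synthetic : number of synthetic samples to create
    k           : nearest neighbours considered per base sample
    base_idx    : hypothetical hook; indices of minority samples allowed
                  as interpolation bases (default: all of them)
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    bases = np.arange(n) if base_idx is None else np.asarray(base_idx)

    # Pairwise squared Euclidean distances among minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]

    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for s in range(n_synthetic):
        i = bases[rng.integers(bases.size)]  # pick a base sample
        j = neighbours[i, rng.integers(k)]   # pick one of its k nearest neighbours
        gap = rng.random()                   # interpolation factor in [0, 1)
        synthetic[s] = X_min[i] + gap * (X_min[j] - X_min[i])
    return synthetic
```

For example, with `X_min = [[0, 0], [1, 0], [5, 5]]`, `k=1`, and `base_idx=[0]`, every synthetic point lies on the segment from `(0, 0)` towards `(1, 0)`. Random oversampling (ROS), by contrast, simply duplicates randomly chosen minority samples; interpolation avoids exact copies but can still place points in regions shared with the majority class, which is one reason the choice of base samples matters.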
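The two evaluation measures named in the abstract can be computed as in the sketch below. It assumes that F-Measure means the balanced F1 score with the minority class as the positive label (the abstract does not state the weighting), and it computes AUC in its rank-based form: the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half. The function names are illustrative only.

```python
import numpy as np


def f_measure(y_true, y_pred, positive=1):
    """Balanced F1 score for the `positive` (assumed minority) class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    if tp == 0:
        return 0.0  # with no true positives, F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


def roc_auc(y_true, scores, positive=1):
    """Rank-based AUC: P(score of a random positive > score of a random
    negative), ties counted as 1/2 (Mann-Whitney U / (n_pos * n_neg))."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == positive]
    neg = scores[y_true != positive]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Compare every positive score with every negative score.
    wins = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((wins + 0.5 * ties) / (pos.size * neg.size))
```

For example, predictions `[1, 0, 1, 0, 0]` against labels `[1, 1, 0, 0, 0]` give one true positive, one false positive, and one false negative, so precision = recall = 0.5 and F1 = 0.5. Both measures focus on the minority class or on ranking quality, unlike plain accuracy, which a classifier can inflate on imbalanced data by always predicting the majority class.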

References

[1]
Bewoor, L.A.; Chandra Prakash, V.; Sapkal, S.U. Evolutionary hybrid particle swarm optimization algorithm for solving NP-hard no-wait flow shop scheduling problems. Algorithms 2017, 10, 121.
[2]
Wang, S.; Yao, X. Using class imbalance learning for software defect prediction. IEEE Transactions on Reliability 2013, 62, 434–443.
[3]
Liu, S.; Zhang, J.; Xiang, Y.; Zhou, W.; Xiang, D. A study of data pre-processing techniques for imbalanced biomedical data classification. arXiv preprint arXiv:1911.00996 2019.
[4]
Searle, S.R.; Searle, S. Linear models for unbalanced data; Wiley: New York, 1987.
[5]
Sáez, J.A.; Luengo, J.; Stefanowski, J.; Herrera, F. SMOTE–IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. Information Sciences 2015, 291, 184–203.
[6]
Amarendra, C.; Reddy, K.H. PSO Algorithm Support Switching Pulse Sequence ISVM for Six-Phase Matrix Converter-Fed Drives. In Smart Intelligent Computing and Applications; Springer, 2019; pp. 559–569.
[7]
Namassivaya, N.; Pal, S.; Ratnam, D.V. Modelling of FPGA-Particle Swarm Optimized GNSS Receiver for Satellite Applications. Wireless Personal Communications 2019, 106, 879–895.
[8]
Potharaju, S.P.; Sreedevi, M. A Novel LtR and RtL Framework for Subset Feature Selection (Reduction) for Improving the Classification Accuracy. In Progress in Advanced Computing and Intelligent Engineering; Springer, 2019; pp. 215–224.
[9]
Thirugnanasambandam, K.; Prakash, S.; Subramanian, V.; Pothula, S.; Thirumal, V. Reinforced cuckoo search algorithm-based multimodal optimization. Applied Intelligence 2019, 49, 2059–2083.
[10]
Sultanpure, K.A.; Reddy, L.S.S. Job Scheduling for Energy Efficiency Using Artificial Bee Colony through Virtualization. International Journal of Intelligent Engineering and Systems 2018, 11, 138–148.
[11]
Douzas, G.; Bacao, F.; Last, F. Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE. Information Sciences 2018, 465, 1–20.
[12]
Bekkar, M.; Djemaa, H.K.; Alitouche, T.A. Evaluation measures for models assessment over imbalanced data sets. J Inf Eng Appl 2013, 3.
[13]
Chung, D.; Kim, H. Accurate ensemble pruning with PL-bagging. Computational Statistics & Data Analysis 2015, 83, 1–13.
[14]
Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 2002, 16, 321–357.
[15]
Han, H.; Wang, W.Y.; Mao, B.H. Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. International Conference on Intelligent Computing. Springer, 2005, pp. 878–887.
[16]
Bunkhumpornpat, C.; Sinapiromsaran, K.; Lursinsap, C. Safe-Level-SMOTE: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 2009, pp. 475–482.
[17]
He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). IEEE, 2008, pp. 1322–1328.
[18]
Barua, S.; Islam, M.M.; Yao, X.; Murase, K. MWMOTE–majority weighted minority oversampling technique for imbalanced data set learning. IEEE Transactions on Knowledge and Data Engineering 2012, 26, 405–425.
[19]
Ofek, N.; Rokach, L.; Stern, R.; Shabtai, A. Fast-CBUS: A fast clustering-based undersampling method for addressing the class imbalance problem. Neurocomputing 2017, 243, 88–102.
[20]
Guzmán-Ponce, A.; Sánchez, J.S.; Valdovinos, R.M.; Marcial-Romero, J.R. DBIG-US: A two-stage under-sampling algorithm to face the class imbalance problem. Expert Systems with Applications 2021, 168, 114301.
[21]
Bej, S.; Davtyan, N.; Wolfien, M.; Nassar, M.; Wolkenhauer, O. LoRAS: An oversampling approach for imbalanced datasets. Machine Learning 2021, 110, 279–301.
[22]
Tang, B.; He, H. KernelADASYN: Kernel based adaptive synthetic data generation for imbalanced learning. 2015 IEEE Congress on Evolutionary Computation (CEC). IEEE, 2015, pp. 664–671.
[23]
Hossin, M.; Sulaiman, M. A review on evaluation metrics for data classification evaluations. International Journal of Data Mining & Knowledge Management Process 2015, 5, 1.
[24]
Bradley, A.P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 1997, 30, 1145–1159.
[25]
Vuttipittayamongkol, P.; Elyan, E.; Petrovski, A.; Jayne, C. Overlap-based under-sampling for improving imbalanced data classification. International Conference on Intelligent Data Engineering and Automated Learning. Springer, 2018, pp. 689–697.
[26]
Bunkhumpornpat, C.; Sinapiromsaran, K.; Lursinsap, C. DBSMOTE: Density-based synthetic minority over-sampling technique. Applied Intelligence 2012, 36(3), 664–684.
[27]
Bunkhumpornpat, C.; Sinapiromsaran, K. DBMUTE: Density-based majority under-sampling technique. Knowledge and Information Systems 2017, 50(3), 827–850.
[28]
Nekooeimehr, I.; Lai-Yuen, S.K. Adaptive semi-unsupervised weighted oversampling (A-SUWO) for imbalanced datasets. Expert Systems with Applications 2016, 46, 405–416.
[29]
Sáez, J.A.; Luengo, J.; Stefanowski, J.; Herrera, F. SMOTE–IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. Information Sciences 2015, 291, 184–203.

Published In

ACM Other conferences
IC3-2023: Proceedings of the 2023 Fifteenth International Conference on Contemporary Computing
August 2023
783 pages
ISBN: 9798400700224
DOI: 10.1145/3607947
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

IC3 2023

Article Metrics

  • Total Citations: 0
  • Total Downloads: 24
  • Downloads (Last 12 months): 15
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 23 Dec 2024
