Minimizing Classification Errors in Imbalanced Dataset Using Means of Sampling

Ijaz Khan¹⁶,¹⁷,
Abdul Rahim Ahmad¹⁸,
Nafaa Jabeur¹⁹ &
…
Mohammed Najah Mahdi²⁰

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13051)

Included in the following conference series:

International Visual Informatics Conference


Abstract

Classification, a significant application of machine learning, assigns each instance of a dataset to one of a set of predefined classes. Problems occur when the number of instances is not uniform across classes. An exceptionally uneven class distribution gives rise to the class imbalance problem, which tends to degrade the overall performance of the classifier. A set of data-level algorithms is available to adjust the class distribution. Class imbalance emerges frequently in datasets from educational domains, where students with unsatisfactory performance are generally far fewer than students with satisfactory outcomes. This paper applies a set of data-level sampling algorithms to a dataset taken from an educational domain and underlines the consequences of classifying with an imbalanced dataset. The research confirms that a classification model achieving high accuracy may still be ineffective at correctly identifying instances of the minority class. Classification with an imbalanced dataset may produce low recall, precision and F-Measure for classes with fewer instances. The performance of the classification model improves when data-level algorithms are applied; in particular, the results highlight the superiority of oversampling algorithms over undersampling algorithms.
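
The effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not name the sampling algorithms or the dataset, so plain random oversampling and random undersampling are swapped in here as the simplest data-level techniques. The toy labels ("pass", "fail"), the 95/5 split and all function names are assumptions made for illustration only.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f_measure) for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 95 "pass" students and 5 "fail" students.
y_true = ["pass"] * 95 + ["fail"] * 5
# A hypothetical classifier that labels nearly everyone "pass":
# it identifies only 1 of the 5 "fail" students.
y_pred = ["pass"] * 95 + ["fail"] + ["pass"] * 4

acc = accuracy(y_true, y_pred)
fail_precision, fail_recall, fail_f = per_class_metrics(y_true, y_pred, "fail")

# Rebalance the (hypothetical) training rows before fitting a classifier.
rows = list(range(len(y_true)))
_, over_labels = random_oversample(rows, y_true)
_, under_labels = random_undersample(rows, y_true)
```

In this toy case the accuracy is 0.96 while the recall for the "fail" class is only 0.2, which is the kind of gap the abstract warns about. Oversampling yields 95 instances per class and keeps every original row, whereas undersampling yields 5 per class and discards 90 majority rows; that loss of information is one commonly cited reason undersampling can underperform, though the abstract itself does not state the cause.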



Author information

Authors and Affiliations

College of Graduate Studies, Universiti Tenaga Nasional, Kajang, Malaysia
Ijaz Khan
Information Technology Department, Buraimi University College, Al-Buraimi, Oman
Ijaz Khan
College of Computing and Informatics, Universiti Tenaga Nasional, Kajang, Malaysia
Abdul Rahim Ahmad
Computer Science Department, German University of Technology, Muscat, Oman
Nafaa Jabeur
Institute of Informatics and Computing in Energy, Universiti Tenaga Nasional, Kajang, Malaysia
Mohammed Najah Mahdi


Corresponding author

Correspondence to Ijaz Khan.

Editor information

Editors and Affiliations

Universiti Tenaga Nasional, Selangor, Malaysia
Halimah Badioze Zaman
Dublin City University, Dublin, Ireland
Alan F. Smeaton
National Central University, Jhongli, Taiwan
Timothy K. Shih
Queen Mary University of London, London, UK
Sergio Velastin
Toyo University, Tokyo, Japan
Tada Terutoshi
University of Southern Denmark, Odense, Denmark
Bo Nørregaard Jørgensen
Universiti Tenaga Nasional, Selangor, Malaysia
Hazleen Aris
Universiti Tenaga Nasional, Selangor, Malaysia
Nazrita Ibrahim


About this paper

Cite this paper

Khan, I., Ahmad, A.R., Jabeur, N., Mahdi, M.N. (2021). Minimizing Classification Errors in Imbalanced Dataset Using Means of Sampling. In: Badioze Zaman, H., et al. Advances in Visual Informatics. IVIC 2021. Lecture Notes in Computer Science, vol 13051. Springer, Cham. https://doi.org/10.1007/978-3-030-90235-3_38


DOI: https://doi.org/10.1007/978-3-030-90235-3_38
Published: 16 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90234-6
Online ISBN: 978-3-030-90235-3
eBook Packages: Computer Science, Computer Science (R0)
