research-article

Mitigating false negatives in imbalanced datasets: An ensemble approach

Published: 01 March 2025

Highlights

Addressing imbalanced data in ML poses challenges due to class disproportion.
In some imbalanced datasets, false negatives are more costly than false positives.
This work introduces the MinFNR algorithm to minimize False Negative Rates (FNR).
The new algorithm strategically combines data, algorithmic, and hybrid approaches.
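
For reference, the False Negative Rate named in the highlights is usually defined as below. This is the standard textbook definition rather than a formula quoted from the paper; the symbols FN (missed positives) and TP (detected positives) are notation assumed here.

```latex
\[
  \mathrm{FNR} \;=\; \frac{FN}{FN + TP} \;=\; 1 - \mathrm{Recall}
\]
% Illustrative numbers only: with 100 actual positives, of which 20 are missed,
% FN = 20 and TP = 80, so FNR = 20 / (20 + 80) = 0.20.
```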

Abstract

Imbalanced datasets present a challenge in machine learning, especially in binary classification, where one class can greatly outnumber the other. This imbalance often leads models to favor the majority class, which yields poor predictions for the minority class and, in particular, a high number of false negatives. In response, this work introduces the MinFNR ensemble algorithm, designed to minimize the False Negative Rate (FNR) in imbalanced datasets. The new approach strategically combines data-level, algorithmic-level, and hybrid-level approaches to enhance overall predictive capability, while limiting computational cost through a Set Covering Problem (SCP) formulation. In a comprehensive evaluation on diverse datasets, MinFNR consistently outperforms individual algorithms, showing its potential for applications where the cost of false negatives is substantial, such as fraud detection and medical diagnosis. This work also contributes to ongoing efforts to improve the reliability and effectiveness of machine learning algorithms in real-world imbalanced scenarios.
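
To illustrate how a set-covering view can keep an ensemble small, the sketch below applies the classic greedy set-cover heuristic to pick classifiers until every minority-class sample is detected by at least one of them. It is not the authors' MinFNR implementation: the function `greedy_cover`, the model names, and the toy detection sets are all hypothetical, and the abstract does not say which SCP solver the paper uses.

```python
from typing import Dict, List, Set


def greedy_cover(universe: Set[int], subsets: Dict[str, Set[int]]) -> List[str]:
    """Greedy set-cover heuristic: repeatedly pick the subset that
    covers the largest number of still-uncovered items."""
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # Pick the classifier that detects the most still-missed positives.
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        gain = subsets[best] & uncovered
        if not gain:
            break  # the remaining positives are missed by every classifier
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical toy data: for each model, the indices of minority-class
# (positive) samples that it classifies correctly.
detected = {
    "model_a": {0, 1, 2},
    "model_b": {2, 3},
    "model_c": {3, 4, 5},
    "model_d": {1, 4},
}
positives = {0, 1, 2, 3, 4, 5}

selected = greedy_cover(positives, detected)

# Under an "any selected model says positive" rule, the positives still
# missed are those outside the union of the selected models' detections.
covered = set().union(*(detected[name] for name in selected))
ensemble_fnr = len(positives - covered) / len(positives)
print(selected, ensemble_fnr)
```

In this toy case two of the four models already detect every positive, so the other two add cost without lowering the FNR. A rule that flags a sample when any selected model does will typically raise false positives, a trade-off this sketch does not measure.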

Published In

Expert Systems with Applications: An International Journal, Volume 262, Issue C
Mar 2025
1584 pages

Publisher

Pergamon Press, Inc.

United States

Author Tags

  1. Imbalanced dataset
  2. False negative rate
  3. Ensemble algorithms
  4. Fraud detection
  5. Set covering problem

Qualifiers

  • Research-article
