
Dynamic Classification Ensembles for Handling Imbalanced Multiclass Drifted Data Streams

Published: 01 June 2024

Abstract

Machine learning models often encounter significant difficulties when dealing with multiclass imbalanced data streams in nonstationary environments. These challenges can lead to biased and unreliable predictions, which ultimately degrade overall model performance. To address these issues, we propose an approach that integrates dynamic ensemble selection with an adaptive oversampling technique for managing imbalanced multiclass data streams, a concept drift detector for recognizing stream changes, and the k-nearest neighbor (KNN) algorithm for tackling class overlap. The primary objective is to improve the classification of imbalanced multiclass drifted data streams. The adaptive oversampling method generates synthetic samples to mitigate the effects of class imbalance, using KNN to ensure that the generated samples do not fall into overlapping regions. As new data arrive, a drift detector decides whether to retain the existing classifiers or train a new one. Dynamic ensemble selection (DES) then chooses the most appropriate classifier for each incoming sample, optimizing classification performance. The proposed method offers an effective solution for accurate and resilient classification of imbalanced multiclass drifted data streams. To evaluate its effectiveness, we conducted experiments on a variety of datasets, including benchmark datasets, real application stream datasets, and synthetic data streams. The experimental results demonstrate the superiority of our contribution in addressing the challenges posed by imbalanced multiclass drifted data streams.
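To make the pipeline concrete, the sketch below shows, in plain Python with NumPy and scikit-learn, one way the three components described in the abstract can fit together. It is a minimal illustration under stated assumptions, not the authors' implementation: the toy chunk generator, the 0.5 neighborhood-purity threshold for accepting synthetic samples, the accuracy-drop drift test, and the bounded pool size of 5 are all illustrative choices, and the paper's actual drift detector and DES strategy differ.

```python
# Minimal sketch (NOT the authors' code) of: (1) KNN-checked oversampling,
# (2) a crude accuracy-drop drift test, and (3) per-instance dynamic
# classifier selection by local accuracy. Thresholds and the toy stream
# are illustrative assumptions.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

def knn_checked_oversample(X, y, cls, n_new, k=5, max_tries=1000):
    """SMOTE-style interpolation for class `cls`; a synthetic sample is kept
    only if at least half of its k nearest neighbours in the chunk also
    belong to `cls`, so it does not land in a class-overlap region."""
    X_min = X[y == cls]
    nn_all = NearestNeighbors(n_neighbors=k).fit(X)
    nn_min = NearestNeighbors(n_neighbors=min(k, len(X_min))).fit(X_min)
    new = []
    for _ in range(max_tries):
        if len(new) == n_new:
            break
        a = X_min[rng.integers(len(X_min))]
        # nearest minority neighbours of a (drop a itself at index 0)
        nbrs = nn_min.kneighbors(a.reshape(1, -1), return_distance=False)[0][1:]
        cand = a + rng.random() * (X_min[rng.choice(nbrs)] - a)
        idx = nn_all.kneighbors(cand.reshape(1, -1), return_distance=False)[0]
        if np.mean(y[idx] == cls) >= 0.5:          # reject overlapping samples
            new.append(cand)
    if not new:
        return X, y
    return np.vstack([X] + new), np.concatenate([y, np.full(len(new), cls)])

def select_classifier(pool, X_ref, y_ref, x, k=7):
    """Dynamic selection: return the pool member with the highest accuracy
    on the k reference samples nearest to x (overall local accuracy)."""
    nn = NearestNeighbors(n_neighbors=min(k, len(X_ref))).fit(X_ref)
    idx = nn.kneighbors(x.reshape(1, -1), return_distance=False)[0]
    return pool[int(np.argmax([c.score(X_ref[idx], y_ref[idx]) for c in pool]))]

def make_chunk(n, shift):
    """Toy imbalanced 3-class chunk; `shift` moves the class means (drift)."""
    y = rng.choice(3, size=n, p=[0.7, 0.2, 0.1])
    X = rng.normal(loc=y[:, None] * (1.0 + shift), scale=0.7, size=(n, 2))
    return X, y

pool, prev_acc = [], None
for t in range(10):                                 # abrupt drift at t = 5
    X_c, y_c = make_chunk(300, shift=0.0 if t < 5 else 2.0)
    if pool:                                        # test-then-train protocol
        preds = np.array([select_classifier(pool, X_ref, y_ref, x)
                          .predict(x.reshape(1, -1))[0] for x in X_c])
        acc = float(np.mean(preds == y_c))
        print(f"chunk {t}: accuracy {acc:.2f}, pool size {len(pool)}")
        if prev_acc is not None and acc < prev_acc - 0.15:  # crude drift flag
            pool = []                               # drop stale classifiers
        prev_acc = acc
    counts = np.bincount(y_c, minlength=3)          # rebalance before training
    X_b, y_b = X_c, y_c
    for cls in np.where((counts > 1) & (counts < counts.max()))[0]:
        X_b, y_b = knn_checked_oversample(X_b, y_b, int(cls),
                                          int(counts.max() - counts[cls]))
    pool.append(DecisionTreeClassifier(max_depth=5).fit(X_b, y_b))
    pool = pool[-5:]                                # bounded pool size
    X_ref, y_ref = X_c, y_c                         # DES reference set
```

The design points mirrored from the abstract are that a candidate synthetic sample is rejected when its KNN neighborhood is dominated by other classes, that a detected drift triggers replacement rather than retention of the pool, and that classifier selection consults only the local region of the most recent reference chunk.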



Published In

Information Sciences: an International Journal, Volume 670, Issue C (June 2024), 882 pages

Publisher

Elsevier Science Inc., United States


Qualifiers

  • Research-article
