
A resampling ensemble algorithm for classification of imbalance problems

Published: 01 November 2014

Abstract

This paper develops a resampling ensemble algorithm for classification problems on imbalanced datasets. In the method, small classes are oversampled and large classes are undersampled, with the resampling scale determined by the ratio of the smallest class size to the largest class size. Multiple machine learning methods are then selected to construct the ensemble. Numerical results show that the algorithm's performance depends strongly on the ratio of the number of minority-class samples to the number of attributes: when this ratio is less than 3, performance is greatly hindered. Experimental results also show that an ensemble of different types of methods can improve performance effectively.
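The general idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the resampling formula or name the base learners, so the target class size used here (the geometric mean of the smallest and largest class sizes, which depends only on their ratio and the largest size), the two toy classifiers, the majority vote, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def resample_balanced(X, y, rng):
    """Oversample small classes and undersample large ones to one common size.

    ASSUMPTION: target = n_max * sqrt(n_min / n_max), the geometric mean of the
    smallest and largest class sizes. The abstract only says the scale is set
    by the min/max ratio; this particular formula is hypothetical.
    """
    classes, counts = np.unique(y, return_counts=True)
    ratio = counts.min() / counts.max()
    target = max(1, int(round(counts.max() * np.sqrt(ratio))))
    picked = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        # Draw with replacement only when the class has to grow.
        picked.append(rng.choice(members, size=target, replace=bool(n < target)))
    idx = np.concatenate(picked)
    return X[idx], y[idx]


class NearestCentroid:
    """Toy base learner: predict the class whose mean is closest."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]


class OneNearestNeighbor:
    """Toy base learner: predict the label of the closest training sample."""

    def fit(self, X, y):
        self.X_, self.y_ = X, y
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.X_[None, :, :], axis=2)
        return self.y_[d.argmin(axis=1)]


def fit_ensemble(X, y, learner_types, n_rounds, seed=0):
    """Train every learner type on a fresh resample in each round."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_rounds):
        for make in learner_types:
            Xr, yr = resample_balanced(X, y, rng)
            members.append(make().fit(Xr, yr))
    return members


def predict_ensemble(members, X):
    """Combine member predictions by majority vote (an assumed combiner)."""
    votes = np.stack([m.predict(X) for m in members])  # (n_members, n_samples)
    out = []
    for col in votes.T:
        labels, counts = np.unique(col, return_counts=True)
        out.append(labels[counts.argmax()])
    return np.array(out)


def minority_to_attribute_ratio(X, y):
    """Smallest class size divided by the number of attributes (columns)."""
    _, counts = np.unique(y, return_counts=True)
    return counts.min() / X.shape[1]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(4, 1, (10, 2))])
    y = np.array([0] * 90 + [1] * 10)
    members = fit_ensemble(X, y, [NearestCentroid, OneNearestNeighbor], n_rounds=5)
    print(minority_to_attribute_ratio(X, y))
    print(predict_ensemble(members, np.array([[-1.0, -1.0], [5.0, 5.0]])))
```

A caller could use `minority_to_attribute_ratio` as a quick check against the threshold of 3 reported in the abstract before relying on such an ensemble. Mixing the two learner types in `fit_ensemble` mirrors the abstract's point about combining different kinds of methods.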

    Published In

    Neurocomputing  Volume 143, Issue
    November, 2014
    362 pages

    Publisher

    Elsevier Science Publishers B. V.

    Netherlands

    Author Tags

    1. Ensemble Learning
    2. Imbalanced classification
    3. Resampling scale

    Qualifiers

    • Article

    Cited By

    • (2024) Exploratory Analysis of Methods, Techniques, and Metrics to Handle Class Imbalance Problem. Procedia Computer Science 235:C (863-877). DOI: 10.1016/j.procs.2024.04.082. Online publication date: 24-Jul-2024
    • (2024) An extended belief rule-based system with hybrid sampling strategy for imbalanced rule base. Information Sciences: an International Journal 684:C. DOI: 10.1016/j.ins.2024.121288. Online publication date: 1-Dec-2024
    • (2023) Hyperparameter optimized classification pipeline for handling unbalanced urban and rural energy consumption patterns. Expert Systems with Applications: An International Journal 214:C. DOI: 10.1016/j.eswa.2022.119127. Online publication date: 15-Mar-2023
    • (2022) A resistance outlier sampling algorithm for imbalanced data prediction. Intelligent Data Analysis 26:3 (583-598). DOI: 10.3233/IDA-211519. Online publication date: 1-Jan-2022
    • (2022) Balanced and Accurate Pseudo-Labels for Semi-Supervised Image Classification. ACM Transactions on Multimedia Computing, Communications, and Applications 18:3s (1-18). DOI: 10.1145/3506711. Online publication date: 31-Oct-2022
    • (2022) Gaussian Distribution Based Oversampling for Imbalanced Data Classification. IEEE Transactions on Knowledge and Data Engineering 34:2 (667-679). DOI: 10.1109/TKDE.2020.2985965. Online publication date: 1-Feb-2022
    • (2022) Assembly quality evaluation for linear axis of machine tool using data-driven modeling approach. Journal of Intelligent Manufacturing 33:3 (753-769). DOI: 10.1007/s10845-020-01666-y. Online publication date: 1-Mar-2022
    • (2021) Bagging k-dependence Bayesian network classifiers. Intelligent Data Analysis 25:3 (641-667). DOI: 10.3233/IDA-205125. Online publication date: 1-Jan-2021
    • (2021) Sample and feature selecting based ensemble learning for imbalanced problems. Applied Soft Computing 113:PA. DOI: 10.1016/j.asoc.2021.107884. Online publication date: 1-Dec-2021
    • (2021) Entropy-based hybrid sampling ensemble learning for imbalanced data. International Journal of Intelligent Systems 36:7 (3039-3067). DOI: 10.1002/int.22388. Online publication date: 28-May-2021
