research-article

Iterative minority oversampling and its ensemble for ordinal imbalanced datasets

Published: 01 February 2024

Abstract

Ordinal classification of imbalanced datasets is a challenging problem that arises in many real-world applications. The main difficulty is to account for the class ordering and the imbalanced class distribution simultaneously. Although classic synthetic oversampling techniques can improve the identification of minority classes, they can easily damage the class ordering when synthetic instances fall in the regions of non-adjacent classes. In this paper, we propose a powerful method for handling the imbalance problem in ordinal classification, namely the Iterative Minority oversampling technique for imbalanced Ordinal Classification (IMOC). Concretely, we first develop an iterative identification procedure to select the minority instance that is hardest to learn. Then, a weighted oversampling probability distribution that respects the ordinal nature of the classes is used to generate synthetic minority instances and balance the skewed distribution. Furthermore, two novel ensemble versions are developed to boost the capability of IMOC. To verify the effectiveness and robustness of the proposed methods, an extensive experimental study is carried out on a large number of datasets from real-world applications. The experimental results, supported by appropriate statistical tests, indicate that the proposed methods outperform state-of-the-art algorithms in terms of the most frequently used performance measures.
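
The abstract does not say how hardness is measured or how the oversampling weights are defined, so the following is only a minimal sketch of the general idea and not the IMOC algorithm itself. It assumes that classes are encoded as consecutive integer ranks, that "hardness" can be approximated by the share of an instance's nearest neighbours that carry a different label, and that interpolation partners from an adjacent rank receive half the weight of same-class partners. The function names `hardness_scores` and `ordinal_oversample`, the neighbourhood size `k`, and the weights 1.0 and 0.5 are all hypothetical choices made for illustration.

```python
import numpy as np


def hardness_scores(X, y, target, k=5):
    """Return indices of target-class instances and, for each, the fraction
    of its k nearest neighbours with a different label (higher = harder).
    This hardness proxy is an assumption, not the paper's definition."""
    idx = np.flatnonzero(y == target)
    # Euclidean distances from every target instance to every instance.
    d = np.linalg.norm(X[idx, None, :] - X[None, :, :], axis=2)
    d[np.arange(len(idx)), idx] = np.inf  # exclude each point itself
    nn = np.argsort(d, axis=1)[:, :k]
    return idx, (y[nn] != target).mean(axis=1)


def ordinal_oversample(X, y, target, n_new, k=5, seed=0):
    """Generate n_new synthetic instances for the class with rank `target`.

    Seeds are drawn with probability proportional to hardness. Partners are
    limited to the target class and its adjacent ranks, so no synthetic point
    is interpolated towards a non-adjacent class. Assumes at least two
    candidate instances exist.
    """
    rng = np.random.default_rng(seed)
    idx, hard = hardness_scores(X, y, target, k)
    seed_p = (hard + 1e-3) / (hard + 1e-3).sum()  # keep every seed possible

    rank_gap = np.abs(y - target)
    cand = np.flatnonzero(rank_gap <= 1)          # same or adjacent rank only
    weight = np.where(rank_gap[cand] == 0, 1.0, 0.5)

    synth = np.empty((n_new, X.shape[1]))
    for i in range(n_new):
        s = rng.choice(idx, p=seed_p)
        w = weight.copy()
        w[cand == s] = 0.0                        # a seed never pairs with itself
        partner = rng.choice(cand, p=w / w.sum())
        # With an adjacent-rank partner, stay in the seed's half of the segment.
        max_step = 1.0 if y[partner] == target else 0.5
        synth[i] = X[s] + rng.uniform(0, max_step) * (X[partner] - X[s])
    return synth, np.full(n_new, target)
```

Calling `ordinal_oversample(X, y, target=0, n_new=25)` on a feature matrix `X` and an integer rank vector `y` returns 25 synthetic rows labelled 0. Because every synthetic point is a convex combination of a seed and a partner from the same or an adjacent rank, it cannot land beyond the region spanned by those classes, which is the property that plain SMOTE-style interpolation does not guarantee for ordinal data.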

Published In

Engineering Applications of Artificial Intelligence, Volume 127, Issue PA
Jan 2024
1599 pages

Publisher

Pergamon Press, Inc.
United States

Author Tags

1. Ordinal classification
2. Imbalanced datasets
3. Oversampling
4. Ensemble learning
5. Synthetic instances generation

Qualifiers

• Research-article
