Handling data irregularities in classification: Foundations, trends, and future challenges

Published: 01 September 2018

Highlights

Data irregularities can significantly degrade the performance of classifiers.
We present a comprehensive taxonomy and survey of various data irregularities.
We discuss prominent methods to handle distribution-based and feature-based irregularities.
We highlight the co-occurrences and interrelations among different irregularities.
We unearth a number of promising future research avenues.

Abstract

Most traditional pattern classifiers assume their input data to be well-behaved: the classes share similar underlying distributions and balanced sizes, every data instance carries a full set of observed features, and so on. Practical datasets, however, exhibit various forms of irregularities that are often sufficient to confuse a classifier, degrading its ability to learn from the data. In this article, we provide a bird's-eye view of such data irregularities, beginning with a taxonomy and characterization of the various distribution-based and feature-based irregularities. Subsequently, we discuss notable and recent approaches for making existing stand-alone as well as ensemble classifiers robust against such irregularities. We also discuss the interrelations and co-occurrences of the data irregularities, including class imbalance, small disjuncts, class skew, missing features, and absent (non-existing or undefined) features. Finally, we uncover a number of interesting future research avenues that are equally relevant to classical and deep machine learning paradigms.

Published In

Pattern Recognition, Volume 81, Issue C, September 2018, 694 pages

Publisher

Elsevier Science Inc., United States

Author Tags

1. Data irregularities
2. Class imbalance
3. Small disjuncts
4. Class-distribution skew
5. Missing features
6. Absent features

Qualifiers

• Research-article
