Learning from class-imbalanced data

Published: 01 May 2017

Abstract

527 articles related to imbalanced data and rare events are reviewed. The reviewed papers are examined from both technical and practical perspectives. Existing methods and their corresponding statistics are summarized under a new taxonomy. 162 application papers are categorized into 13 domains and introduced. Some open questions are discussed at the end of the manuscript.

Rare events, especially those that could negatively impact society, often require human decision-making responses. In the data mining and machine learning communities, detecting rare events can be viewed as a prediction task. Because these events are rarely observed in daily life, the prediction task suffers from a lack of balanced data. In this paper, we provide an in-depth review of rare event detection from an imbalanced learning perspective. Five hundred and seventeen related papers published in the past decade were collected for the study. Initial statistics suggest that rare event detection and imbalanced learning are of concern across a wide range of research areas, from management science to engineering. We reviewed all collected papers from both a technical and a practical point of view. The modeling methods discussed include data preprocessing, classification algorithms and model evaluation. For applications, we first provide a comprehensive taxonomy of the existing application domains of imbalanced learning, and then detail the applications in each category. Finally, suggestions from the reviewed papers are combined with our own experience and judgment to offer further research directions for the imbalanced learning and rare event detection fields.

[174]
A. Moreo, A. Esuli, F. Sebastiani, Distributional Random Oversampling for Imbalanced Text Classification, in: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, ACM, 2016.
[175]
H. Motoda, H. Liu, Feature selection, extraction and construction, Institute of Information and Computing Machinery, Taiwan, 2002.
[176]
J. Nagi, K. Yap, S. Tiong, S. Ahmed, A. Mohammad, Detection of abnormalities and electricity theft using genetic support vector machines, in: TENCON 2008-2008 IEEE Region 10 Conference, IEEE, 2008.
[177]
K. Napierala, J. Stefanowski, Types of minority class examples and their influence on learning classifiers from imbalanced data, Journal of Intelligent Information Systems (2015) 1-35.
[178]
K. Napieraa, J. Stefanowski, Addressing imbalanced data with argument based rule learning, Expert Systems with Applications, 42 (2015) 9468-9481.
[179]
J. Natwichai, X. Li, M. Orlowska, Hiding classification rules for data sharing with privacy preservation, Springer, 2005.
[180]
I. Nekooeimehr, S.K. Lai-Yuen, Adaptive semi-unsupervised weighted oversampling (A-SUWO) for imbalanced datasets, Expert Systems with Applications, 46 (2016) 405-416.
[181]
W.W. Ng, G. Zeng, J. Zhang, D.S. Yeung, W. Pedrycz, Dual autoencoders features for imbalance classification problem, Pattern Recognition, 60 (2016) 875-889.
[182]
K.E. Niehaus, I.A. Clark, C. Bourne, C.E. Mackay, E.A. Holmes, S.M. Smith, MVPA to enhance the study of rare cognitive events: An investigation of experimental PTSD, in: Pattern Recognition in Neuroimaging, 2014 International Workshop on, IEEE, 2014.
[183]
S. Oh, M.S. Lee, B.-T. Zhang, Ensemble learning with active example selection for imbalanced biomedical data classification, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 8 (2011) 316-325.
[184]
S.-H. Oh, Error back-propagation algorithm for classification of imbalanced data, Neurocomputing, 74 (2011) 1058-1061.
[185]
D. Olszewski, A probabilistic approach to fraud detection in telecommunications, Knowledge-Based Systems, 26 (2012) 246-258.
[186]
P.-F. Pai, M.-F. Hsu, M.-C. Wang, A support vector machine-based model for detecting top management fraud, Knowledge-Based Systems, 24 (2011) 314-321.
[187]
J. Pan, Q. Fan, S. Pankanti, H. Trinh, P. Gabbur, S. Miyazawa, Soft margin keyframe comparison: Enhancing precision of fraud detection in retail surveillance, in: Applications of Computer Vision (WACV), 2011 IEEE Workshop on, IEEE, 2011.
[188]
S. Panigrahi, A. Kundu, S. Sural, A.K. Majumdar, Credit card fraud detection: A fusion approach using DempsterShafer theory and Bayesian learning, Information Fusion, 10 (2009) 354-363.
[189]
Y. Park, J. Ghosh, Ensembles of $({alpha}) $-Trees for Imbalanced Classification Problems, Knowledge and Data Engineering, IEEE Transactions on, 26 (2014) 131-143.
[190]
Peng, J. Yang, W. Li, D. Zhao, O. Zaiane, Ensemble-based hybrid probabilistic sampling for imbalanced data learning in lung nodule CAD, Computerized Medical Imaging and Graphics, 38 (2014) 137-150.
[191]
M.D. Prez-Godoy, A. Fernndez, A.J. Rivera, M.J. del Jesus, Analysis of an evolutionary RBFN design algorithm, CO 2 RBFN, for imbalanced data sets, Pattern Recognition Letters, 31 (2010) 2375-2388.
[192]
P. Phoungphol, Y. Zhang, Y. Zhao, Robust multiclass classification for learning from imbalanced biomedical data, Tsinghua Science and technology, 17 (2012) 619-628.
[193]
J.D. Prusa, T.M. Khoshgoftaar, N. Seliya, Enhancing Ensemble Learners with Data Sampling on High-Dimensional Imbalanced Tweet Sentiment Data, in: The Twenty-Ninth International Flairs Conference, 2016.
[194]
V. Raj, S. Magg, S. Wermter, Towards effective classification of imbalanced data with convolutional neural networks, in: IAPR Workshop on Artificial Neural Networks in Pattern Recognition, Springer, 2016.
[195]
E. Ramentol, S. Vluymans, N. Verbiest, Y. Caballero, R. Bello, C. Cornelis, IFROWANN: imbalanced fuzzy-rough ordered weighted average nearest neighbor classification, Fuzzy Systems, IEEE Transactions on, 23 (2015) 1622-1637.
[196]
L.M. Raposo, M.B. Arruda, R.M. de Brindeiro, F.F. Nobre, Lopinavir Resistance Classification with Imbalanced Data Using Probabilistic Neural Networks, Journal of medical systems, 40 (2016) 1-7.
[197]
A. Razavian, H. Azizpour, J. Sullivan, S. Carlsson, CNN features off-the-shelf: an astounding baseline for recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014.
[198]
F. Ren, P. Cao, W. Li, D. Zhao, O. Zaiane, Ensemble based adaptive over-sampling method for imbalanced data learning in computer aided detection of microaneurysm, Computerized Medical Imaging and Graphics (2016).
[199]
Y. Ren, Y. Wang, X. Wu, G. Yu, C. Ding, Influential factors of red-light running at signalized intersection and prediction using a rare events logistic regression model, Accident Analysis & Prevention, 95 (2016) 266-273.
[200]
A.M. Richardson, B.A. Lidbury, Infection status outcome, machine learning method and virus type interact to affect the optimised prediction of hepatitis virus immunoassay results from routine pathology laboratory assays in unbalanced data, BMC bioinformatics, 14 (2013) 1.
[201]
D. Rodriguez, I. Herraiz, R. Harrison, J. Dolado, J.C. Riquelme, Preliminary comparison of techniques for dealing with imbalance in software defect prediction, in: Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering, ACM, 2014.
[202]
Y. Saeys, I. Inza, P. Larraaga, A review of feature selection techniques in bioinformatics, bioinformatics, 23 (2007) 2507-2517.
[203]
J.A. Sez, J. Luengo, J. Stefanowski, F. Herrera, SMOTEIPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering, Information Sciences, 291 (2015) 184-203.
[204]
Y. Sahin, S. Bulkan, E. Duman, A cost-sensitive decision tree approach for fraud detection, Expert Systems with Applications, 40 (2013) 5916-5923.
[205]
J.A. Sanz, D. Bernardo, F. Herrera, H. Bustince, H. Hagras, A compact evolutionary interval-valued fuzzy rule-based classification system for the modeling and prediction of real-world financial applications with imbalanced data, Fuzzy Systems, IEEE Transactions on, 23 (2015) 973-990.
[206]
R.E. Schapire, Y. Singer, Improved boosting algorithms using confidence-rated predictions, Machine learning, 37 (1999) 297-336.
[207]
C. Seiffert, T.M. Khoshgoftaar, J. Van Hulse, A. Napolitano, RUSBoost: A hybrid approach to alleviating class imbalance, Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on, 40 (2010) 185-197.
[208]
Y.-H. Shao, W.-J. Chen, J.-J. Zhang, Z. Wang, N.-Y. Deng, An efficient weighted Lagrangian twin support vector machine for imbalanced data classification, Pattern Recognition, 47 (2014) 3158-3167.
[209]
J. Song, X. Huang, S. Qin, Q. Song, A bi-directional sampling based on K-means method for imbalance text classification, in: Computer and Information Science (ICIS), 2016 IEEE/ACIS 15th International Conference on, IEEE, 2016.
[210]
L. Song, D. Li, X. Zeng, Y. Wu, L. Guo, Q. Zou, nDNA-prot: identification of DNA-binding proteins based on unbalanced classification, BMC bioinformatics, 15 (2014) 1.
[211]
C.-T. Su, Y.-H. Hsiao, An evaluation of the robustness of MTS for imbalanced data, IEEE Transactions on Knowledge and Data Engineering, 19 (2007) 1321.
[212]
S. Subudhi, S. Panigrahi, Quarter-Sphere Support Vector Machine for Fraud Detection in Mobile Telecommunication Networks, Procedia Computer Science, 48 (2015) 353-359.
[213]
M.A.H.F.N.S.R.A.J. Sultana, Enhancing the performance of decision tree: A research study of dealing with unbalanced data, in: Digital Information Management (ICDIM), 2012 Seventh International Conference on, Macau, 2012.
[214]
L. Sun, J. Mathew, K. Dhiraj, P. Saraju, Algorithms for rare event analysis in nano-CMOS circuits using statistical blockade, in: SoC Design Conference (ISOCC), 2010 International, IEEE, 2010.
[215]
Y. Sun, M.S. Kamel, Y. Wang, Boosting for learning multiple classes with imbalanced class distribution, in: Data Mining, 2006. ICDM'06. Sixth International Conference on, IEEE, 2006.
[216]
Y. Sun, M.S. Kamel, A.K. Wong, Y. Wang, Cost-sensitive boosting for classification of imbalanced data, Pattern Recognition, 40 (2007) 3358-3378.
[217]
Y. Sun, A.K. Wong, M.S. Kamel, Classification of imbalanced data: A review, International Journal of Pattern Recognition and Artificial Intelligence, 23 (2009) 687-719.
[218]
Z. Sun, Q. Song, X. Zhu, H. Sun, B. Xu, Y. Zhou, A novel ensemble method for classifying imbalanced data, Pattern Recognition, 48 (2015) 1623-1637.
[219]
M.A. Tahir, J. Kittler, K. Mikolajczyk, F. Yan, A multiple expert approach to the class imbalance problem using inverse random under sampling, Springer, 2009.
[220]
M. Tajik, S. Movasagh, M.A. Shoorehdeli, I. Yousefi, Gas turbine shaft unbalance fault detection by using vibration data and neural networks, in: Robotics and Mechatronics (ICROM), 2015 3rd RSI International Conference on, IEEE, 2015.
[221]
M. Tan, L. Tan, S. Dara, C. Mayeux, Online defect prediction for imbalanced data, in: Proceedings of the 37th International Conference on Software Engineering-, Volume 2, IEEE Press, 2015.
[222]
S.C. Tan, J. Watada, Z. Ibrahim, M. Khalid, Evolutionary fuzzy ARTMAP neural networks for classification of semiconductor defects, Neural Networks and Learning Systems, IEEE Transactions on, 26 (2015) 933-950.
[223]
M. Taneja, K. Garg, A. Purwar, S. Sharma, Prediction of click frauds in mobile advertising, in: Contemporary Computing (IC3), 2015 Eighth International Conference on, IEEE, 2015.
[224]
J. Tian, H. Gu, W. Liu, Imbalanced classification using support vector machine ensemble, Neural Computing and Applications, 20 (2011) 203-209.
[225]
I. Tomek, A generalization of the k-NN rule, Systems, Man and Cybernetics, IEEE Transactions on (1976) 121-126.
[226]
K.N. Topouzelis, Oil spill detection by SAR images: dark formation detection, feature extraction and classification algorithms, Sensors, 8 (2008) 6642-6659.
[227]
T.B. Trafalis, I. Adrianto, M.B. Richman, S. Lakshmivarahan, Machine-learning classifiers for imbalanced tornado data, Computational Management Science, 11 (2014) 403-418.
[228]
C.H. Tsai, L.C. Chang, H.C. Chiang, Forecasting of ozone episode days by cost-sensitive neural network methods, Science of the Total Environment, 407 (2009) 2124-2135.
[229]
S. Vajda, G.A. Fink, Strategies for training robust neural network based digit recognizers on unbalanced data sets, in: Frontiers in Handwriting Recognition (ICFHR), 2010 International Conference on, IEEE, 2010.
[230]
K.S. Vani, T. Sravani, Multiclass unbalanced protein data classification using sequence features, in: Computational Intelligence in Bioinformatics and Computational Biology, 2014 IEEE Conference on, IEEE, 2014.
[231]
W. Verbeke, K. Dejaeger, D. Martens, J. Hur, B. Baesens, New insights into churn prediction in the telecommunication sector: A profit driven data mining approach, European Journal of Operational Research, 218 (2012) 211-229.
[232]
V. Vigneron, H. Chen, A multi-scale seriation algorithm for clustering sparse imbalanced data: application to spike sorting, Pattern Analysis and Applications (2015) 1-19.
[233]
S. Vluymans, D.S. Tarrag, Y. Saeys, C. Cornelis, F. Herrera, Fuzzy rough classifiers for class imbalanced multi-instance data, Pattern Recognition (2015).
[234]
N.H. Vo, Y. Won, Classification of unbalanced medical data with weighted regularized least squares, in: Frontiers in the Convergence of Bioscience and Information Technologies, 2007. FBIT 2007, IEEE, 2007.
[235]
T. Voigt, R. Fried, M. Backes, W. Rhode, Threshold optimization for classification in imbalanced data in a problem of gamma-ray astronomy, Advances in Data Analysis and Classification, 8 (2014) 195-216.
[236]
C.-M. Vong, W.-F. Ip, C.-C. Chiu, P.-K. Wong, Imbalanced Learning for Air Pollution by Meta-Cognitive Online Sequential Extreme Learning Machine, Cognitive Computation, 7 (2015) 381-391.
[237]
A.A. Vorobeva, Examining the performance of classification algorithms for imbalanced data sets in web author identification, in: Open Innovations Association and Seminar on Information Security and Protection of Information Technology (FRUCT-ISPIT), 2016 18th Conference of, FRUCT, 2016.
[238]
X. Wan, J. Liu, W.K. Cheung, T. Tong, Learning to improve medical decision making from imbalanced data without a priori cost, BMC medical informatics and decision making, 14 (2014) 1.
[239]
B.X. Wang, N. Japkowicz, Boosting support vector machines for imbalanced data sets, Knowledge and Information Systems, 25 (2010) 1-20.
[240]
J. Wang, P. Zhao, S.C. Hoi, Cost-sensitive online classification, IEEE Transactions on Knowledge and Data Engineering, 26 (2014) 2425-2438.
[241]
S. Wang, H. Chen, X. Yao, Negative correlation learning for classification ensembles, in: Neural Networks (IJCNN), The 2010 International Joint Conference on, IEEE, 2010.
[242]
S. Wang, L.L. Minku, X. Yao, A learning framework for online class imbalance learning, in: Computational Intelligence and Ensemble Learning (CIEL), 2013 IEEE Symposium on, IEEE, 2013.
[243]
S. Wang, L.L. Minku, X. Yao, A multi-objective ensemble method for online class imbalance learning, in: Neural Networks (IJCNN), 2014 International Joint Conference on, IEEE, 2014.
[244]
S. Wang, L.L. Minku, X. Yao, Resampling-based ensemble methods for online class imbalance learning, Knowledge and Data Engineering, IEEE Transactions on, 27 (2015) 1356-1368.
[245]
S. Wang, X. Yao, Diversity analysis on imbalanced data sets by using ensemble models, in: Computational Intelligence and Data Mining, 2009. CIDM'09. IEEE Symposium on, IEEE, 2009.
[246]
S. Wang, X. Yao, Multiclass imbalance problems: Analysis and potential solutions, Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, 42 (2012) 1119-1130.
[247]
S. Wang, X. Yao, Using class imbalance learning for software defect prediction, Reliability, IEEE Transactions on, 62 (2013) 434-443.
[248]
Y. Wang, X. Li, X. Ding, Probabilistic framework of visual anomaly detection for unbalanced data, Neurocomputing (2016).
[249]
Y. Wang, Y. Tian, L. Su, X. Fang, Z. Xia, T. Huang, Detecting Rare Actions and Events from Surveillance Big Data with Bag of Dynamic Trajectories, in: Multimedia Big Data (BigMM), 2015 IEEE International Conference on, IEEE, 2015.
[250]
Z. Wang, J. Xin, S. Tian, G. Yu, Distributed Weighted Extreme Learning Machine for Big Imbalanced Data Learning, in: Proceedings of ELM-2015, Volume 1, Springer, 2016, pp. 319-332.
[251]
M. Wasikowski, X.-w. Chen, Combating the small sample class imbalance problem using feature selection, Knowledge and Data Engineering, IEEE Transactions on, 22 (2010) 1388-1400.
[252]
M.-H. Wei, C.-H. Cheng, C.-S. Huang, P.-C. Chiang, Discovering medical quality of total hip arthroplasty by rough set classifier with imbalanced class, Quality & Quantity, 47 (2013) 1761-1779.
[253]
W. Wei, J. Li, L. Cao, Y. Ou, J. Chen, Effective detection of sophisticated online banking fraud on extremely imbalanced data, World Wide Web, 16 (2013) 449-475.
[254]
G.M. Weiss, Mining with rarity: a unifying framework, ACM SIGKDD Explorations Newsletter, 6 (2004) 7-19.
[255]
G.M. Weiss, H. Hirsh, Learning to predict extremely rare events, in: AAAI workshop on learning from imbalanced data sets, 2000.
[256]
H. Wen, S. Ge, S. Chen, H. Wang, L. Sun, Abnormal event detection via adaptive cascade dictionary learning, in: Image Processing (ICIP), 2015 IEEE International Conference on, IEEE, 2015.
[257]
S. Wilk, J. Stefanowski, S. Wojciechowski, K.J. Farion, W. Michalowski, Application of Preprocessing Methods to Imbalanced Clinical Data: An Experimental Study, Springer, 2016.
[258]
D. Wu, Z. Wang, Y. Chen, H. Zhao, Mixed-kernel based weighted extreme learning machine for inertial sensor based human activity recognition with imbalanced dataset, Neurocomputing, 190 (2016) 35-49.
[259]
X. Wu, S. Meng, E-commerce customer churn prediction based on improved SMOTE and AdaBoost, in: Service Systems and Service Management (ICSSSM), 2016 13th International Conference on, IEEE, 2016.
[260]
W. Xiao, J. Zhang, Y. Li, W. Yang, Imbalanced Extreme Learning Machine for Classification with Imbalanced Data Distributions, in: Proceedings of ELM-2015, Volume 2, Springer, 2016, pp. 503-514.
[261]
W. Xin, L. Yi-ping, J. Ting, G. Hui, L. Sheng, Z. Xiao-wei, A new classification method for LIDAR data based on unbalanced support vector machine, in: Image and Data Fusion (ISIDF), 2011 International Symposium on, IEEE, 2011.
[262]
W. Xiong, B. Li, L. He, M. Chen, J. Chen, Collaborative web service QoS prediction on unbalanced data distribution, in: Web Services (ICWS), 2014 IEEE International Conference on, IEEE, 2014.
[263]
J. Xu, S. Denman, C. Fookes, S. Sridharan, Detecting rare events using KullbackLeibler divergence: A weakly supervised approach, Expert Systems with Applications, 54 (2016) 13-28.
[264]
J. Xu, S. Denman, V. Reddy, C. Fookes, S. Sridharan, Real-time video event detection in crowded scenes using MPEG derived features: A multiple instance learning approach, Pattern Recognition Letters, 44 (2014) 113-125.
[265]
L. Xu, M.-Y. Chow, L.S. Taylor, Power distribution fault cause identification with imbalanced data using the data mining-based fuzzy classification e-algorithm, Power Systems, IEEE Transactions on, 22 (2007) 164-171.
[266]
L. Xu, M.-Y. Chow, J. Timmis, L.S. Taylor, Power distribution outage cause identification with imbalanced data using artificial immune recognition system (AIRS) algorithm, Power Systems, IEEE Transactions on, 22 (2007) 198-204.
[267]
Y. Xu, Z. Yang, Y. Zhang, X. Pan, L. Wang, A maximum margin and minimum volume hyper-spheres machine with pinball loss for imbalanced data classification, Knowledge-Based Systems (2015).
[268]
Y. Qing, W. B., Z. Peilan, C. Xiang, Z. Meng, W. Yang, The prediction method of material consumption for electric power production based on PCBoost and SVM, in: 2015 8th International Congress on Image and Signal Processing (CISP), 2015, pp. 1256-1260.
[269]
J. Yang, J. Zhou, Z. Zhu, X. Ma, Z. Ji, Iterative ensemble feature selection for multiclass classification of imbalanced microarray data, Journal of Biological Research-Thessaloniki, 23 (2016) 13.
[270]
P. Yang, L. Xu, B.B. Zhou, Z. Zhang, A.Y. Zomaya, A particle swarm based hybrid system for imbalanced medical data sampling, BMC genomics, 10 (2009) 1.
[271]
X. Yang, D. Lo, Q. Huang, X. Xia, J. Sun, Automated Identification of High Impact Bug Reports Leveraging Imbalanced Learning Strategies, in: Computer Software and Applications Conference (COMPSAC), 2016 IEEE 40th Annual, IEEE, 2016.
[272]
C.-W. Yeh, D.-C. Li, L.-S. Lin, T.-I. Tsai, A Learning Approach with Under-and Over-Sampling for Imbalanced Data Sets, in: Advanced Applied Informatics (IIAI-AAI), 2016 5th IIAI International Congress on, IEEE, 2016.
[273]
W. Yi, The Cascade Decision-Tree Improvement Algorithm Based on Unbalanced Data Set, in: 2010 International Conference on Communications and Mobile Computing, IEEE, 2010.
[274]
H. Yu, C. Mu, C. Sun, W. Yang, X. Yang, X. Zuo, Support vector machine-based optimized decision threshold adjustment strategy for classifying imbalanced data, Knowledge-Based Systems, 76 (2015) 67-78.
[275]
H. Yu, J. Ni, Y. Dan, S. Xu, Mining and integrating reliable decision rules for imbalanced cancer gene expression data sets, Tsinghua Science and technology, 17 (2012) 666-673.
[276]
H. Yu, C. Sun, X. Yang, W. Yang, J. Shen, Y. Qi, ODOC-ELM: Optimal decision outputs compensation-based extreme learning machine for classifying imbalanced data, Knowledge-Based Systems, 92 (2016) 55-70.
[277]
J. Yun, J. Ha, J.-S. Lee, Automatic Determination of Neighborhood Size in SMOTE, in: Proceedings of the 10th International Conference on Ubiquitous Information Management and Communication, ACM, 2016.
[278]
A. Zakaryazad, E. Duman, A profit-driven Artificial Neural Network (ANN) with applications to fraud detection and direct marketing, Neurocomputing, 175 (2016) 121-131.
[279]
J. Zhai, S. Zhang, C. Wang, The classification of imbalanced large data sets based on MapReduce and ensemble of ELM classifiers, International Journal of Machine Learning and Cybernetics (2015) 1-9.
[280]
B. Zhang, Y. Zhou, C. Faloutsos, Toward a comprehensive model in internet auction fraud detection, in: Hawaii International Conference on System Sciences, Proceedings of the 41st Annual, IEEE, 2008.
[281]
C. Zhang, W. Gao, J. Song, J. Jiang, An imbalanced data classification algorithm of improved autoencoder neural network, in: 2016 Eighth International Conference on Advanced Computational Intelligence (ICACI), IEEE, 2016.
[282]
D. Zhang, J. Ma, J. Yi, X. Niu, X. Xu, An ensemble method for unbalanced sentiment classification, in: Natural Computation (ICNC), 2015 11th International Conference on, IEEE, 2015.
[283]
K. Zhang, A. Li, B. Song, Fraud Detection in Tax Declaration Using Ensemble ISGNN, in: Computer Science and Information Engineering, 2009 WRI World Congress on, IEEE, 2009.
[284]
N. Zhang, Cost-sensitive spectral clustering for photo-thermal infrared imaging data, in: International Conference on Information Science & Technology, 2016.
[285]
X. Zhang, B. Wang, X. Chen, Intelligent fault diagnosis of roller bearings with multivariable ensemble-based incremental support vector machine, Knowledge-Based Systems, 89 (2015) 56-85.
[286]
X. Zhang, Z. Yang, L. Shangguan, Y. Liu, L. Chen, Boosting mobile Apps under imbalanced sensing data, Mobile Computing, IEEE Transactions on, 14 (2015) 1151-1161.
[287]
Zhang, X., Y. Zhuang, H. Hu and W. Wang (2015d). "3-D Laser-Based Multiclass and Multiview Object Detection in Cluttered Indoor Scenes."
[288]
Y. Zhang, P. Fu, W. Liu, G. Chen, Imbalanced data classification based on scaling kernel-based support vector machine, Neural Computing and Applications, 25 (2014) 927-935.
[289]
Y. Zhang, D. Zhang, G. Mi, D. Ma, G. Li, Y. Guo, Using ensemble methods to deal with imbalanced data in predicting proteinprotein interactions, Computational Biology and Chemistry, 36 (2012) 36-41.
[290]
Z. Zhang, B. Krawczyk, S. Garca, A. Rosales-Prez, F. Herrera, Empowering one-vs-one decomposition with ensemble learning for multi-class imbalanced data, Knowledge-Based Systems (2016).
[291]
X.M. Zhao, X. Li, L. Chen, K. Aihara, Protein classification with imbalanced data, Proteins: Structure, function, and bioinformatics, 70 (2008) 1125-1132.
[292]
Z. Zhao, P. Zhong, Y. Zhao, Learning SVM with weighted maximum margin criterion for classification of imbalanced data, Mathematical and Computer Modelling, 54 (2011) 1093-1099.
[293]
W. Zhong, B. Raahemi, J. Liu, Classifying peer-to-peer applications using imbalanced concept-adapting very fast decision tree on IP data stream, Peer-to-Peer Networking and Applications, 6 (2013) 233-246.
[294]
L. Zhou, Performance of corporate bankruptcy prediction models on imbalanced dataset: The effect of sampling methods, Knowledge-Based Systems, 41 (2013) 16-25.
[295]
Machine Learning, in: Machine Learning, Tsinghua University press, 2016.
[296]
Z.-H. Zhou, X.-Y. Liu, Training cost-sensitive neural networks with methods addressing the class imbalance problem, Knowledge and Data Engineering, IEEE Transactions on, 18 (2006) 63-77.
[297]
X. Zhu, A.B. Goldberg, Introduction to semi-supervised learning, Synthesis lectures on artificial intelligence and machine learning, 3 (2009) 1-130.
[298]
M. Ziba, J.M. Tomczak, Boosted SVM with active learning strategy for imbalanced data, Soft Computing, 19 (2015) 3357-3368.
[299]
M. Ziba, J.M. Tomczak, M. Lubicz, J. witek, Boosted SVM for extracting rules from imbalanced data in application to prediction of the post-operative life expectancy in the lung cancer patients, Applied Soft Computing, 14 (2014) 99-108.
[300]
Q. Zou, S. Xie, Z. Lin, M. Wu, Y. Ju, Finding the Best Classification Threshold in Imbalanced Classification, Big Data Research (2016).

Published In

Expert Systems with Applications: An International Journal  Volume 73, Issue C
May 2017
98 pages

Publisher

Pergamon Press, Inc.

United States

Publication History

Published: 01 May 2017

Author Tags

  1. Data mining
  2. Imbalanced data
  3. Machine learning
  4. Rare events
