Abstract
In this paper, we propose a novel training paradigm that combines two learning strategies: cost-sensitive learning and self-paced learning. The approach can be applied to decision problems in which highly imbalanced data is used during the training process. The main idea behind the proposed method is to start the learning process with a large number of minority examples and only the easiest majority examples, and then gradually turn to more difficult cases. We examine the quality of this training paradigm in comparison with other learning schemes for a neural network model, using a set of highly imbalanced benchmark datasets.
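The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_examples`, the per-class thresholds `lam_minority` and `lam_majority`, the hard 0/1 weighting rule, and the growth factor are all assumptions made for illustration. Losses are held fixed here, whereas a real training loop would recompute them after every model update.

```python
def select_examples(losses, labels, lam_minority, lam_majority):
    """Return binary weights v in {0,1}^n; 1 keeps an example for this round.

    Assumption: an example counts as 'easy' when its loss is below the
    threshold of its class, and the minority class (label 1) gets the
    looser threshold so that most minority examples are kept from the start.
    """
    return [
        int(loss < (lam_minority if label == 1 else lam_majority))
        for loss, label in zip(losses, labels)
    ]


# Toy data: two minority (1) and four majority (0) examples.
losses = [0.2, 1.5, 0.1, 0.9, 2.0, 0.4]
labels = [1, 1, 0, 0, 0, 0]

lam_minority, lam_majority = 2.0, 0.3
schedule = []
for _ in range(3):
    schedule.append(select_examples(losses, labels, lam_minority, lam_majority))
    lam_majority *= 2.5  # hypothetical pace: admit harder majority examples

for round_index, weights in enumerate(schedule, start=1):
    print(round_index, weights)
```

In this toy run both minority examples are selected in every round, while majority examples enter one by one as the majority threshold grows.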
Notes
- 1.
We consider a two-class imbalanced data problem in which the minority class is taken to be the positive class and the majority class the negative class.
- 2.
The absolute value can be omitted since \(\mathbf {v} \in \{0,1\}^n\), so \(|v_i| = v_i\) for every \(i\).
- 3.
We can always set aside about 10–20% of the data for validation.
- 4.
AUC is defined as the arithmetic mean of the True Positive Rate (TPR, also called Sensitivity) and the True Negative Rate (TNR, also called Specificity), \(AUC = \frac{TPR+TNR}{2}\), where \(TPR = \frac{TP}{TP+FN}\), \(TNR = \frac{TN}{TN+FP}\), and TP, TN, FP, FN are the elements of the confusion matrix. The AUC can be written in this form when the classifier outputs class labels rather than probabilities at test time. In that case the ROC curve is represented by a single point located at (FPR, TPR), and the area under the piecewise-linear curve through (0, 0), (FPR, TPR) and (1, 1) is \(AUC=\frac{1+TPR-FPR}{2}\). Substituting \(TNR=1-FPR\) gives \(AUC=\frac{1}{2}(TPR+TNR)\). In our opinion this way of calculating AUC is better suited to imbalanced data problems, because it evaluates the actual class predictions rather than the ordering of the examples used for evaluation.
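The identity in note 4 can be checked numerically with a short sketch. The function name `single_point_auc` and the confusion-matrix counts below are hypothetical and chosen only for illustration; the code assumes both classes are present, so neither denominator is zero.

```python
def single_point_auc(tp, tn, fp, fn):
    """Compute the single-point AUC of note 4 in both equivalent forms."""
    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # specificity
    fpr = fp / (fp + tn)  # equals 1 - TNR
    auc_from_rates = (tpr + tnr) / 2
    auc_from_roc = (1 + tpr - fpr) / 2
    return auc_from_rates, auc_from_roc


# Hypothetical imbalanced test set: 10 positives, 100 negatives.
rates_form, roc_form = single_point_auc(tp=8, tn=90, fp=10, fn=2)
print(rates_form, roc_form)

# A classifier that labels every example as negative.
trivial_auc, _ = single_point_auc(tp=0, tn=100, fp=0, fn=10)
trivial_accuracy = (0 + 100) / 110
print(trivial_auc, trivial_accuracy)
```

The second case shows why the measure suits imbalanced data: the all-negative classifier reaches about 91% accuracy but scores only 0.5 on this AUC.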
Acknowledgments
The research conducted by the authors has been partially co-financed by the Ministry of Science and Higher Education, Republic of Poland, namely, Maciej Zięba: grant No. B50083/W8/K3, Jakub M. Tomczak: grant No. B50106/W8/K3.
Copyright information
© 2016 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zięba, M., Tomczak, J.M., Świątek, J. (2016). Self-paced Learning for Imbalanced Data. In: Nguyen, N.T., Trawiński, B., Fujita, H., Hong, TP. (eds) Intelligent Information and Database Systems. ACIIDS 2016. Lecture Notes in Computer Science(), vol 9621. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49381-6_54
DOI: https://doi.org/10.1007/978-3-662-49381-6_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49380-9
Online ISBN: 978-3-662-49381-6
eBook Packages: Computer Science, Computer Science (R0)