Abstract
Semi-supervised learning has attracted much attention in the machine learning field over the past decades, and a number of algorithms have been proposed to improve performance by exploiting unlabeled data. However, unlabeled data may hurt the performance of semi-supervised learning in some cases. It is therefore natural to seek a reasonable strategy for exploiting unlabeled data safely. To address the problem, we introduce a safe semi-supervised learning algorithm based on an analysis of the different characteristics of unlabeled data in supervised and semi-supervised learning. Our intuition is that unlabeled data may often be risky in the semi-supervised setting and that the degree of risk differs from sample to sample. Hence, we assign a risk degree to each unlabeled sample, and the risk degrees serve as a sieve that determines how the unlabeled data are exploited. Unlabeled data with high risk should be exploited by supervised learning, and the rest should be used for semi-supervised learning. In particular, we utilize kernel minimum squared error (KMSE) and Laplacian regularized KMSE for supervised and semi-supervised learning, respectively. Experimental results on several benchmark datasets show that the performance of our algorithm is never inferior to that of KMSE, and indicate the effectiveness and efficiency of our algorithm.
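The sieve idea described in the abstract can be illustrated with a rough sketch. The abstract does not define the risk degree or give the exact KMSE formulations, so everything below is an assumption made for illustration: KMSE is approximated by regularized kernel least squares, the Laplacian regularized variant follows the general manifold-regularization form, and the risk degree is a hypothetical placeholder (the disagreement between the two predictors). The function names, the RBF kernel, and the parameters `gamma`, `mu`, `lam`, and `threshold` are likewise invented here and are not taken from the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def fit_kmse(X_lab, y, gamma=1.0, mu=1e-2):
    # Supervised stand-in for KMSE: solve (K + mu*I) alpha = y.
    K = rbf_kernel(X_lab, X_lab, gamma)
    return np.linalg.solve(K + mu * np.eye(len(y)), y)

def fit_lap_kmse(X_lab, y, X_unl, gamma=1.0, mu=1e-2, lam=1e-2):
    # Semi-supervised stand-in: adds a graph Laplacian smoothness penalty
    # over labeled and unlabeled points (constants absorbed into lam).
    X = np.vstack([X_lab, X_unl])
    n, l = len(X), len(y)
    K = rbf_kernel(X, X, gamma)
    W = K.copy()
    np.fill_diagonal(W, 0.0)            # dense graph weights (assumption)
    L = np.diag(W.sum(1)) - W           # unnormalized graph Laplacian
    J = np.zeros((n, n))
    J[:l, :l] = np.eye(l)               # selects the labeled rows
    y_full = np.concatenate([y, np.zeros(n - l)])
    alpha = np.linalg.solve(J @ K + mu * np.eye(n) + lam * L @ K, y_full)
    return X, alpha

def safe_predict(X_lab, y, X_unl, gamma=1.0, threshold=0.5):
    a_sup = fit_kmse(X_lab, y, gamma)
    X_all, a_semi = fit_lap_kmse(X_lab, y, X_unl, gamma)
    f_sup = rbf_kernel(X_unl, X_lab, gamma) @ a_sup
    f_semi = rbf_kernel(X_unl, X_all, gamma) @ a_semi
    # Hypothetical risk degree: disagreement between the two predictors.
    risk = np.abs(f_sup - f_semi)
    # High-risk points fall back to the supervised output; the rest use
    # the semi-supervised output.
    out = np.where(risk > threshold, f_sup, f_semi)
    return np.sign(out), risk

rng = np.random.default_rng(0)
X_lab = np.array([[-2.0, 0.0], [2.0, 0.0]])
y = np.array([-1.0, 1.0])
X_unl = np.vstack([rng.normal([-2.0, 0.0], 0.3, (10, 2)),
                   rng.normal([2.0, 0.0], 0.3, (10, 2))])
pred, risk = safe_predict(X_lab, y, X_unl)
```

In this sketch the fallback to the supervised predictor is what would keep the combined output from drifting far from the supervised baseline on points where the two models disagree; the paper's actual risk measure and combination rule may differ.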
Acknowledgments
This work is supported by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LY14F030023, and by the Natural Science Foundation of China under Grant Nos. 61172134, 61201302 and 61372023.
Cite this article
Gan, H., Luo, Z., Meng, M. et al. A risk degree-based safe semi-supervised learning algorithm. Int. J. Mach. Learn. & Cyber. 7, 85–94 (2016). https://doi.org/10.1007/s13042-015-0416-8
DOI: https://doi.org/10.1007/s13042-015-0416-8