Abstract
In many real-world tasks, unlabeled examples are abundant but labeled training examples are scarce, because labeling requires human effort and expertise. Consequently, semi-supervised learning, which tries to exploit unlabeled examples to improve learning performance, has become a hot topic. Disagreement-based semi-supervised learning is an interesting paradigm in which multiple learners are trained for the task and the disagreements among them are exploited during the semi-supervised learning process. This survey article provides an introduction to research advances in this paradigm.
Cite this article
Zhou, ZH., Li, M. Semi-supervised learning by disagreement. Knowl Inf Syst 24, 415–439 (2010). https://doi.org/10.1007/s10115-009-0209-z
DOI: https://doi.org/10.1007/s10115-009-0209-z