Semi-supervised learning by disagreement

Zhi-Hua Zhou¹ & Ming Li¹

Abstract

In many real-world tasks, unlabeled examples are abundant but labeled training examples are scarce, because labeling requires human effort and expertise. Semi-supervised learning, which tries to exploit unlabeled examples to improve learning performance, has therefore become a hot topic. Disagreement-based semi-supervised learning is an interesting paradigm in which multiple learners are trained for the task and the disagreements among them are exploited during the semi-supervised learning process. This survey article provides an introduction to research advances in this paradigm.
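To make the paradigm concrete, the following is a minimal co-training-style sketch, not the algorithm of any particular paper covered by the survey. It assumes each example has two one-dimensional "views", uses a hypothetical `CentroidClassifier` as the base learner, and lets each learner label the unlabeled example it is most confident about and hand it to the other learner. All names and the toy data are illustrative assumptions.

```python
class CentroidClassifier:
    """Hypothetical base learner: nearest centroid on one 1-D view."""

    def fit(self, xs, ys):
        # One centroid per class: the mean of that class's feature values.
        self.centroids = {}
        for label in set(ys):
            vals = [x for x, y in zip(xs, ys) if y == label]
            self.centroids[label] = sum(vals) / len(vals)
        return self

    def predict_with_confidence(self, x):
        # Confidence is the gap between the two closest centroids.
        dists = sorted((abs(x - c), label) for label, c in self.centroids.items())
        margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
        return dists[0][1], margin


def fit_view(learner, train, view):
    """Fit a learner on one view of a list of ((v1, v2), y) pairs."""
    learner.fit([x[view] for x, _ in train], [y for _, y in train])


def co_train(labeled, unlabeled, rounds=5):
    """labeled: list of ((v1, v2), y); unlabeled: list of (v1, v2)."""
    train = [list(labeled), list(labeled)]  # one training set per learner
    pool = list(unlabeled)
    learners = [CentroidClassifier(), CentroidClassifier()]
    for _ in range(rounds):
        for i in (0, 1):
            fit_view(learners[i], train[i], i)
        for i in (0, 1):
            if not pool:
                break
            # Learner i picks the pool example it is most confident about ...
            best = max(pool, key=lambda x: learners[i].predict_with_confidence(x[i])[1])
            label, _ = learners[i].predict_with_confidence(best[i])
            pool.remove(best)
            # ... and teaches it to the other learner.
            train[1 - i].append((best, label))
    # Final refit so both learners see everything they were taught.
    for i in (0, 1):
        fit_view(learners[i], train[i], i)
    return learners


# Toy data (assumed): class 0 lies near 0 and class 1 near 10 in both views.
labeled = [((0.0, 1.0), 0), ((10.0, 9.0), 1)]
unlabeled = [(1.0, 0.5), (9.0, 10.5), (2.0, 1.5), (8.0, 8.5)]
learners = co_train(labeled, unlabeled)
```

The two learners keep separate training sets, so they can differ in what they have learned; that difference is what lets one learner supply labels the other would not have produced on its own. If the learners become identical, this exchange no longer adds information.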

Author information

Authors and Affiliations

National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210093, China
Zhi-Hua Zhou & Ming Li

Authors

Zhi-Hua Zhou
Ming Li

Corresponding author

Correspondence to Zhi-Hua Zhou.

About this article

Cite this article

Zhou, ZH., Li, M. Semi-supervised learning by disagreement. Knowl Inf Syst 24, 415–439 (2010). https://doi.org/10.1007/s10115-009-0209-z

Received: 16 October 2008
Revised: 16 March 2009
Accepted: 03 April 2009
Published: 19 May 2009
Issue Date: September 2010
DOI: https://doi.org/10.1007/s10115-009-0209-z
