Abstract
Most computational models of supervised learning rely only on labeled training examples and ignore the possible role of unlabeled data. This is true both for cognitive science models of learning such as SOAR [Newell 1990] and ACT-R [Anderson et al. 1995], and for machine learning and data mining algorithms such as decision tree learning and inductive logic programming (see, e.g., [Mitchell 1997]). In this paper we consider the potential role of unlabeled data in supervised learning. We present an algorithm and experimental results demonstrating that unlabeled data can significantly improve learning accuracy in certain practical problems. We then identify the abstract problem structure that enables the algorithm to successfully utilize this unlabeled data, and prove that unlabeled data will boost learning accuracy for problems in this class. The problem class we identify includes problems where the features describing each example are redundantly sufficient for classifying it, a notion we make precise in this paper. This class includes many natural learning problems faced by humans, such as learning a semantic lexicon over noun phrases in natural language, and learning to recognize objects from multiple sensor inputs. We argue that models of human and animal learning should more seriously consider the potential role of unlabeled data, and that many natural learning problems fit the class we identify.
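The algorithm alluded to in the abstract is the co-training algorithm of Blum and Mitchell [1998] (cited in the references below), which exploits two redundantly sufficient feature sets, or "views", of each example: classifiers trained on the separate views label unlabeled examples for one another. The following is only a minimal sketch of that idea, not the paper's implementation; it assumes NumPy arrays holding the two views, scikit-learn-style classifiers, and an illustrative confidence-based selection rule, and all variable names and parameters are ours.

```python
# Minimal co-training sketch (after Blum & Mitchell [1998]).
# Assumptions: X1/X2 and U1/U2 are NumPy arrays giving two "views" of the
# labeled and unlabeled examples; each view alone suffices to classify.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def cotrain(X1, X2, y, U1, U2, rounds=10, per_round=5):
    """Two view-specific classifiers grow a shared labeled pool by
    labeling unlabeled examples for each other."""
    clf1, clf2 = GaussianNB(), GaussianNB()
    L1, L2, yl = X1.copy(), X2.copy(), y.copy()
    U1, U2 = U1.copy(), U2.copy()
    for _ in range(rounds):
        if len(U1) == 0:
            break
        clf1.fit(L1, yl)
        clf2.fit(L2, yl)
        # Each classifier nominates the unlabeled examples it is most
        # confident about (simplification: the original algorithm adds a
        # fixed number of positives and negatives per classifier).
        picked = set()
        for clf, U in ((clf1, U1), (clf2, U2)):
            conf = clf.predict_proba(U).max(axis=1)
            picked.update(int(i) for i in np.argsort(-conf)[:per_round])
        idx = np.array(sorted(picked))
        # Label the nominated examples (here with classifier 1's predictions;
        # with redundantly sufficient views the two should largely agree).
        L1 = np.vstack([L1, U1[idx]])
        L2 = np.vstack([L2, U2[idx]])
        yl = np.concatenate([yl, clf1.predict(U1[idx])])
        keep = np.setdiff1d(np.arange(len(U1)), idx)
        U1, U2 = U1[keep], U2[keep]
    return clf1.fit(L1, yl), clf2.fit(L2, yl)
```

In the web-page classification experiments of Blum and Mitchell [1998], the two views were the words appearing on a page and the words in hyperlinks pointing to it, each modeled with a naive Bayes classifier; GaussianNB above stands in purely for illustration.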
References
Anderson et al. [1995], Production system models of complex cognition. In Proceedings of the Seventeenth Annual Conference of the Cognitive Science Society (pp. 9-12). Hillsdale, NJ: Lawrence Erlbaum Associates.
Blum and Mitchell [1998], Combining labeled and unlabeled data with co-training. In Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT-98). Available at http://www.cs.cmu.edu/~webkb/~webkb.
Craven et al. [1998], Learning to extract symbolic knowledge from the World Wide Web. In Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98). Available at http://www.cs.cmu.edu.
de Sa [1994], Learning classification with unlabeled data. NIPS-6, 1994.
de Sa and Ballard [1998], Category learning through multi-modality sensing. Neural Computation 10(5), 1998.
Mitchell [1997], Machine learning. New York: McGraw Hill, 1997. See http://www.cs.cmu.edu/~webkb/~webkb.
Newell [1990], Unified theories of cognition. Cambridge, MA: Harvard University Press, 1990.
Riloff and Jones [1999], Learning dictionaries for information extraction by multi-level bootstrapping. AAAI-99. Available at http://www.cs.cmu.edu/~webkb/~webkb.
Yarowsky [1995], Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the ACL, pp. 189-196.