DOI: 10.1145/2783258.2783409

Reducing the Unlabeled Sample Complexity of Semi-Supervised Multi-View Learning

Published: 10 August 2015

Abstract

In semi-supervised multi-view learning, the unlabeled sample complexity (u.s.c.) specifies the size of the unlabeled training sample that guarantees a desired learning error. In this paper, we improve the state-of-the-art u.s.c. from O(1/ε) to O(log 1/ε) for small error ε, under mild conditions. To obtain the improved result, we first prove a connection between the generalization error of a classifier and its incompatibility, which measures how well the classifier fits the sample distribution. We then prove that, given a sufficiently large unlabeled sample, one can find classifiers with low incompatibility. Combining the two observations, we prove a probably approximately correct (PAC) style learning bound for semi-supervised multi-view learning. We empirically verify our theory on real-world data sets by designing two proof-of-concept multi-view learning algorithms, one based on active view sensing and the other based on online co-regularization.
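
The online co-regularization idea mentioned in the abstract can be illustrated with a small sketch. The Python snippet below is not the authors' implementation; the class name, the perceptron-style labeled update, the squared-disagreement penalty, and all parameters (d1, d2, eta, lam) are illustrative assumptions. It only shows the mechanism: a pair of linear classifiers, one per view, where labeled examples fix the decision rule and unlabeled examples merely reduce the cross-view disagreement that the paper calls incompatibility.

    import numpy as np

    class OnlineCoRegularizer:
        """Two linear classifiers, one per view, trained from a mixed stream
        of labeled and unlabeled two-view examples (illustrative sketch)."""

        def __init__(self, d1, d2, eta=0.1, lam=1.0):
            self.w1 = np.zeros(d1)   # weight vector for view 1
            self.w2 = np.zeros(d2)   # weight vector for view 2
            self.eta = eta           # step size
            self.lam = lam           # weight of the agreement (co-regularization) term

        def update_labeled(self, x1, x2, y):
            # Perceptron-style step on each view; y is in {-1, +1}.
            if y * self.w1.dot(x1) <= 0:
                self.w1 += self.eta * y * x1
            if y * self.w2.dot(x2) <= 0:
                self.w2 += self.eta * y * x2

        def update_unlabeled(self, x1, x2):
            # Gradient step on 0.5 * (w1.x1 - w2.x2)^2: unlabeled data only
            # pushes the two views toward agreement (low incompatibility).
            diff = self.w1.dot(x1) - self.w2.dot(x2)
            self.w1 -= self.eta * self.lam * diff * x1
            self.w2 += self.eta * self.lam * diff * x2

        def predict(self, x1, x2):
            # Average the two views' scores and take the sign.
            return np.sign(self.w1.dot(x1) + self.w2.dot(x2))

Under the paper's conditions, driving this disagreement term down with a sufficiently large unlabeled sample is what permits the improvement of the unlabeled sample complexity from O(1/ε) to O(log 1/ε); the sketch illustrates the update rule only, not the bound.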

Supplementary Material

MP4 File (p627.mp4)

      Published In

      KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
      August 2015
      2378 pages
      ISBN:9781450336642
      DOI:10.1145/2783258
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 10 August 2015

      Author Tags

      1. multi-view learning
      2. sample complexity
      3. semi-supervised learning

      Qualifiers

      • Research-article

      Conference

      KDD '15

      Acceptance Rates

      KDD '15 paper acceptance rate: 160 of 819 submissions (20%)
      Overall acceptance rate: 1,133 of 8,635 submissions (13%)

      Article Metrics

      • Downloads (last 12 months): 14
      • Downloads (last 6 weeks): 1
      Reflects downloads up to 21 Dec 2024

      Cited By

      • (2020) Active Query of Private Demographic Data for Learning Fair Models. Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 2129-2132. DOI: 10.1145/3340531.3412074. Online publication date: 19-Oct-2020.
      • (2020) Joint Multi-source Reduction. Machine Learning and Knowledge Discovery in Databases, pp. 293-309. DOI: 10.1007/978-3-030-46150-8_18. Online publication date: 30-Apr-2020.
      • (2018) Progressive Semisupervised Learning of Multiple Classifiers. IEEE Transactions on Cybernetics, 48(2), pp. 689-702. DOI: 10.1109/TCYB.2017.2651114. Online publication date: Feb-2018.
      • (2017) Learning Social Circles in Ego-Networks Based on Multi-View Network Structure. IEEE Transactions on Knowledge and Data Engineering, 29(8), pp. 1681-1694. DOI: 10.1109/TKDE.2017.2685385. Online publication date: 1-Aug-2017.
      • (2017) Robust Multi-view Subspace Learning. Robust Representation for Data Analytics, pp. 73-93. DOI: 10.1007/978-3-319-60176-2_5. Online publication date: 11-Aug-2017.
      • (2017) Fundamentals of Robust Representations. Robust Representation for Data Analytics, pp. 9-16. DOI: 10.1007/978-3-319-60176-2_2. Online publication date: 11-Aug-2017.
      • (2016) Multi-View Time Series Classification. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 989-998. DOI: 10.1145/2983323.2983780. Online publication date: 24-Oct-2016.
      • (2016) A disagreement-based active matrix completion approach with provable guarantee. 2016 International Joint Conference on Neural Networks (IJCNN), pp. 4082-4088. DOI: 10.1109/IJCNN.2016.7727731. Online publication date: Jul-2016.
