DOI: 10.1145/2783258.2783409

Reducing the Unlabeled Sample Complexity of Semi-Supervised Multi-View Learning

Published: 10 August 2015

Abstract

In semi-supervised multi-view learning, the unlabeled sample complexity (u.s.c.) specifies the size of the unlabeled training sample that guarantees a desired learning error. In this paper, we improve the state-of-the-art u.s.c. from O(1/ε) to O(log 1/ε) for small error ε, under mild conditions. To obtain the improved result, we first prove a connection between the generalization error of a classifier and its incompatibility, which measures how well the classifier fits the sample distribution. We then prove that, given a sufficiently large unlabeled sample, one can find classifiers with low incompatibility. Combining the two observations, we prove a probably approximately correct (PAC) style learning bound for semi-supervised multi-view learning. We empirically verify our theory on real-world data sets by designing two proof-of-concept multi-view learning algorithms, one based on active view sensing and the other based on online co-regularization.
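
The online co-regularization idea mentioned in the abstract can be illustrated with a small sketch. The Python snippet below is not the authors' implementation; the class name, the perceptron-style labeled update, the squared-disagreement penalty, and all parameters (d1, d2, eta, lam) are illustrative assumptions. It only shows the mechanism: a pair of linear classifiers, one per view, where labeled examples fix the decision rule and unlabeled examples merely reduce the cross-view disagreement that the paper calls incompatibility.

    import numpy as np

    class OnlineCoRegularizer:
        """Two linear classifiers, one per view, trained from a mixed stream
        of labeled and unlabeled two-view examples (illustrative sketch)."""

        def __init__(self, d1, d2, eta=0.1, lam=1.0):
            self.w1 = np.zeros(d1)   # weight vector for view 1
            self.w2 = np.zeros(d2)   # weight vector for view 2
            self.eta = eta           # step size
            self.lam = lam           # weight of the agreement (co-regularization) term

        def update_labeled(self, x1, x2, y):
            # Perceptron-style step on each view; y is in {-1, +1}.
            if y * self.w1.dot(x1) <= 0:
                self.w1 += self.eta * y * x1
            if y * self.w2.dot(x2) <= 0:
                self.w2 += self.eta * y * x2

        def update_unlabeled(self, x1, x2):
            # Gradient step on 0.5 * (w1.x1 - w2.x2)^2: unlabeled data only
            # pushes the two views toward agreement (low incompatibility).
            diff = self.w1.dot(x1) - self.w2.dot(x2)
            self.w1 -= self.eta * self.lam * diff * x1
            self.w2 += self.eta * self.lam * diff * x2

        def predict(self, x1, x2):
            # Average the two views' scores and take the sign.
            return np.sign(self.w1.dot(x1) + self.w2.dot(x2))

Under the paper's conditions, driving this disagreement term down with a sufficiently large unlabeled sample is what permits the improvement of the unlabeled sample complexity from O(1/ε) to O(log 1/ε); the sketch illustrates the update rule only, not the bound.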

Supplementary Material

MP4 File (p627.mp4)

      Published In

      KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
      August 2015
      2378 pages
      ISBN:9781450336642
      DOI:10.1145/2783258
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 10 August 2015

      Author Tags

      1. multi-view learning
      2. sample complexity
      3. semi-supervised learning

      Qualifiers

      • Research-article

      Conference

      KDD '15

      Acceptance Rates

      KDD '15 paper acceptance rate: 160 of 819 submissions (20%)
      Overall acceptance rate: 1,133 of 8,635 submissions (13%)

      Article Metrics

      • Downloads (last 12 months): 14
      • Downloads (last 6 weeks): 1
      Reflects downloads up to 21 Dec 2024

      Cited By

      • (2020) Active Query of Private Demographic Data for Learning Fair Models. Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 2129-2132. DOI: 10.1145/3340531.3412074. Online publication date: 19-Oct-2020.
      • (2020) Joint Multi-source Reduction. Machine Learning and Knowledge Discovery in Databases, pp. 293-309. DOI: 10.1007/978-3-030-46150-8_18. Online publication date: 30-Apr-2020.
      • (2018) Progressive Semisupervised Learning of Multiple Classifiers. IEEE Transactions on Cybernetics, 48(2), pp. 689-702. DOI: 10.1109/TCYB.2017.2651114. Online publication date: Feb-2018.
      • (2017) Learning Social Circles in Ego-Networks Based on Multi-View Network Structure. IEEE Transactions on Knowledge and Data Engineering, 29(8), pp. 1681-1694. DOI: 10.1109/TKDE.2017.2685385. Online publication date: 1-Aug-2017.
      • (2017) Robust Multi-view Subspace Learning. Robust Representation for Data Analytics, pp. 73-93. DOI: 10.1007/978-3-319-60176-2_5. Online publication date: 11-Aug-2017.
      • (2017) Fundamentals of Robust Representations. Robust Representation for Data Analytics, pp. 9-16. DOI: 10.1007/978-3-319-60176-2_2. Online publication date: 11-Aug-2017.
      • (2016) Multi-View Time Series Classification. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 989-998. DOI: 10.1145/2983323.2983780. Online publication date: 24-Oct-2016.
      • (2016) A disagreement-based active matrix completion approach with provable guarantee. 2016 International Joint Conference on Neural Networks (IJCNN), pp. 4082-4088. DOI: 10.1109/IJCNN.2016.7727731. Online publication date: Jul-2016.
