DOI: 10.5555/2984093.2984152

Semi-supervised learning in gigantic image collections

Published: 07 December 2009

Abstract

With the advent of the Internet it is now possible to collect hundreds of millions of images. These images come with varying degrees of label information. "Clean labels" can be manually obtained on a small fraction, "noisy labels" may be extracted automatically from surrounding text, while for most images there are no labels at all. Semi-supervised learning is a principled framework for combining these different label sources. However, it scales polynomially with the number of images, making it impractical for use on gigantic collections with hundreds of millions of images and thousands of classes. In this paper we show how to utilize recent results in machine learning to obtain highly efficient approximations for semi-supervised learning that are linear in the number of images. Specifically, we use the convergence of the eigenvectors of the normalized graph Laplacian to eigenfunctions of weighted Laplace-Beltrami operators. Our algorithm enables us to apply semi-supervised learning to a database of 80 million images gathered from the Internet.
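For orientation, the kind of graph-based semi-supervised learning the abstract refers to can be sketched at small scale as follows. This is a minimal illustration, not the paper's algorithm: it builds a dense affinity matrix and computes exact eigenvectors of the normalized graph Laplacian, which is precisely the costly step the abstract says is replaced by eigenfunction approximations. The function name `laplacian_ssl`, the Gaussian affinity, and the parameters `k`, `eps`, and `lam` are assumptions chosen for illustration.

```python
import numpy as np

def laplacian_ssl(X, y, labeled, k=2, eps=1.0, lam=100.0):
    """Score points using the k smoothest eigenvectors of the normalized graph Laplacian.

    X: (n, d) feature array; y: (n,) targets in {-1, +1}, only read where labeled;
    labeled: (n,) boolean mask. Returns one real-valued score per point.
    All names and defaults are illustrative assumptions.
    """
    # Dense Gaussian affinities: O(n^2) memory, so only viable for small n.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * eps ** 2))
    degree = W.sum(axis=1)

    # Normalized graph Laplacian: I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; keep the k smallest.
    evals, evecs = np.linalg.eigh(L)
    U, sigma = evecs[:, :k], evals[:k]

    # Restrict f = U @ alpha and minimize
    #   alpha^T diag(sigma) alpha + lam * sum over labeled i of (f_i - y_i)^2.
    weights = lam * labeled.astype(float)
    A = np.diag(sigma) + U.T @ (weights[:, None] * U)
    b = U.T @ (weights * y)
    alpha = np.linalg.solve(A, b)
    return U @ alpha
```

Called on two well-separated clusters with a single labeled point in each, the sign of the returned scores should assign every unlabeled point to the class of its cluster. The cost of the affinity matrix and the eigendecomposition grows faster than linearly in the number of points, which illustrates why this exact form does not reach collections of the size the abstract describes.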



Published In

NIPS'09: Proceedings of the 23rd International Conference on Neural Information Processing Systems
December 2009
2348 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States


Cited By

  • (2024) One-Bit Supervision for Image Classification: Problem, Solution, and Beyond. ACM Transactions on Multimedia Computing, Communications, and Applications 20:4 (1-22). DOI: 10.1145/3633779. Online publication date: 11-Jan-2024
  • (2021) Fast and Accurate Anchor Graph-based Label Prediction. Proceedings of the 30th ACM International Conference on Information & Knowledge Management (504-513). DOI: 10.1145/3459637.3482258. Online publication date: 26-Oct-2021
  • (2018) Lightweight label propagation for large-scale network data. Proceedings of the 27th International Joint Conference on Artificial Intelligence (3421-3427). DOI: 10.5555/3304222.3304243. Online publication date: 13-Jul-2018
  • (2018) Semi-Supervised Image Classification With Self-Paced Cross-Task Networks. IEEE Transactions on Multimedia 20:4 (851-865). DOI: 10.1109/TMM.2017.2758522. Online publication date: 1-Apr-2018
  • (2018) Centroid Neural Network with Pairwise Constraints for Semi-supervised Learning. Neural Processing Letters 48:3 (1721-1747). DOI: 10.1007/s11063-018-9794-8. Online publication date: 1-Dec-2018
  • (2018) A Deep Neural Network Based on ELM for Semi-supervised Learning of Image Classification. Neural Processing Letters 48:1 (375-388). DOI: 10.1007/s11063-017-9709-0. Online publication date: 1-Aug-2018
  • (2017) Toward robustness against label noise in training deep discriminative neural networks. Proceedings of the 31st International Conference on Neural Information Processing Systems (5601-5610). DOI: 10.5555/3295222.3295311. Online publication date: 4-Dec-2017
  • (2017) Optimized learning instance-based image retrieval. Multimedia Tools and Applications 76:15 (16749-16766). DOI: 10.5555/3124201.3124238. Online publication date: 1-Aug-2017
  • (2017) An Evaluation of Large-scale Methods for Image Instance and Class Discovery. Proceedings of the on Thematic Workshops of ACM Multimedia 2017 (1-9). DOI: 10.1145/3126686.3126711. Online publication date: 23-Oct-2017
  • (2017) Adaptive Knowledge Propagation in Web Ontologies. ACM Transactions on the Web 12:1 (1-28). DOI: 10.1145/3105961. Online publication date: 21-Aug-2017
