
Large-scale deep unsupervised learning using graphics processors

Published: 14 June 2009

Abstract

The promise of unsupervised learning methods lies in their potential to use vast amounts of unlabeled data to learn complex, highly nonlinear models with millions of free parameters. We consider two well-known unsupervised learning models, deep belief networks (DBNs) and sparse coding, that have recently been applied to a flurry of machine learning applications (Hinton & Salakhutdinov, 2006; Raina et al., 2007). Unfortunately, current learning algorithms for both models are too slow for large-scale applications, forcing researchers to focus on smaller-scale models, or to use fewer training examples.
In this paper, we suggest massively parallel methods to help resolve these problems. We argue that modern graphics processors far surpass the computational capabilities of multicore CPUs, and have the potential to revolutionize the applicability of deep unsupervised learning methods. We develop general principles for massively parallelizing unsupervised learning tasks using graphics processors. We show that these principles can be applied to successfully scale up learning algorithms for both DBNs and sparse coding. Our implementation of DBN learning is up to 70 times faster than a dual-core CPU implementation for large models. For example, we are able to reduce the time required to learn a four-layer DBN with 100 million free parameters from several weeks to around a single day. For sparse coding, we develop a simple, inherently parallel algorithm that leads to a 5- to 15-fold speedup over previous methods.
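
The abstract summarizes rather than spells out the computation, so the sketch below illustrates why both workloads map well onto a GPU: one contrastive-divergence (CD-1) update for an RBM layer reduces to a handful of dense matrix multiplies, and sparse-coding activations can be computed with an update that touches every coordinate of every example at once. This is a minimal sketch in JAX, used here only as a stand-in for the paper's CUDA implementation; all function names, shapes, and the choice of ISTA for the sparse-coding step are illustrative assumptions, not the authors' algorithms.

```python
import jax
import jax.numpy as jnp


def cd1_update(key, W, b_vis, b_hid, v0, lr=1e-3):
    """One CD-1 step for a binary RBM (one DBN layer), written as dense
    matrix products -- the kind of batched work a GPU accelerates.
    Hypothetical shapes: v0 is (batch, n_visible), W is (n_visible, n_hidden)."""
    k1, _ = jax.random.split(key)
    # Positive phase: hidden probabilities given the data, plus a sample.
    h0_prob = jax.nn.sigmoid(v0 @ W + b_hid)
    h0 = jax.random.bernoulli(k1, h0_prob).astype(v0.dtype)
    # Negative phase: one Gibbs step (reconstruct visibles, re-infer hiddens).
    v1_prob = jax.nn.sigmoid(h0 @ W.T + b_vis)
    h1_prob = jax.nn.sigmoid(v1_prob @ W + b_hid)
    # Contrastive-divergence gradient estimate and parameter update.
    dW = (v0.T @ h0_prob - v1_prob.T @ h1_prob) / v0.shape[0]
    return (W + lr * dW,
            b_vis + lr * jnp.mean(v0 - v1_prob, axis=0),
            b_hid + lr * jnp.mean(h0_prob - h1_prob, axis=0))


def sparse_codes_ista(X, B, beta, n_steps=100):
    """Sparse-coding activations by proximal gradient (ISTA), shown only as
    an example of an *inherently parallel* solver: every coordinate of every
    example is updated simultaneously via two matrix multiplies and an
    element-wise soft-threshold. This is NOT the paper's algorithm.
    Hypothetical shapes: X is (batch, n_features), B is (n_features, n_bases)."""
    A = jnp.zeros((X.shape[0], B.shape[1]))
    L = jnp.linalg.norm(B, ord=2) ** 2   # Lipschitz constant of the smooth term
    for _ in range(n_steps):
        grad = (A @ B.T - X) @ B         # gradient of 0.5 * ||X - A B^T||^2
        A = A - grad / L
        A = jnp.sign(A) * jnp.maximum(jnp.abs(A) - beta / L, 0.0)
    return A
```

On a GPU, each of these lines is a single large batched operation over the whole minibatch, which is exactly the regime where, as the abstract argues, graphics processors far outpace multicore CPUs.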

References

[1]
Andrew, G., & Gao, J. (2007). Scalable training of L1-regularized log-linear models. International Conference on Machine Learning (pp. 33--40).
[2]
Banko, M., & Brill, E. (2001). Scaling to very very large corpora for natural language disambiguation. Annual Meeting of the Association for Computational Linguistics (pp. 26--33).
[3]
Bengio, Y. (2007). Speeding up stochastic gradient descent. Neural Information Processing Systems Workshop on Efficient Machine Learning.
[4]
Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2006). Greedy layer-wise training of deep networks. Neural Information Processing Systems (pp. 153--160).
[5]
Bradley, D., & Bagnell, J. A. (2008). Differentiable sparse coding. Neural Information Processing Systems (pp. 113--120).
[6]
Brants, T., Popat, A. C., Xu, P., Och, F. J., & Dean, J. (2007). Large language models in machine translation. Conference on Empirical Methods in Natural Language Processing (EMNLP-CoNLL).
[7]
Catanzaro, B. C., Sundaram, N., & Keutzer, K. (2008). Fast support vector machine training and classification on graphics processors. International Conference on Machine Learning (pp. 104--111).
[8]
Chellapilla, K., Puri, S., & Simard, P. (2006). High performance convolutional neural networks for document processing. International Workshop on Frontiers in Handwriting Recognition.
[9]
Chu, C. T., Kim, S. K., Lin, Y. A., Yu, Y., Bradski, G. R., Ng, A. Y., & Olukotun, K. (2006). Map-reduce for machine learning on multicore. Neural Information Processing Systems (pp. 281--288).
[10]
Dean, J., & Ghemawat, S. (2004). Mapreduce: Simplified data processing on large clusters. Operating System Design and Implementation (pp. 137--150).
[11]
Desjardins, G., & Bengio, Y. (2008). Empirical evaluation of convolutional RBMs for vision. Tech Report.
[12]
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. Ann. Stat., 32, 407.
[13]
Frank, D. (2002). Power-constrained CMOS scaling limits. IBM Jour. of Res. and Devel., 46, 235--244.
[14]
Friedman, J., Hastie, T., Höfling, H., & Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. App. Stat., 2, 302--332.
[15]
Gelsinger, P. (2001). Microprocessors for the new millennium: Challenges, opportunities and new frontiers. ISSCC Tech. Digest, 22--25.
[16]
Goto, K., & Van De Geijn, R. (2008). High-performance implementation of the level-3 BLAS. ACM Trans. Math. Softw., 35, 1--14.
[17]
Harris, M. (2008). Many-core GPU computing with NVIDIA CUDA. Int. Conf. Supercomputing (p. 1).
[18]
Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14, 1771--1800.
[19]
Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527--1554.
[20]
Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313, 504--507.
[21]
Kavukcuoglu, K., Ranzato, M., & LeCun, Y. (2008). Fast inference in sparse coding algorithms with applications to object recognition. NYU Tech Report.
[22]
Lee, H., Battle, A., Raina, R., & Ng, A. Y. (2006). Efficient sparse coding algorithms. Neural Information Processing Systems (pp. 801--808).
[23]
Lee, H., Ekanadham, C., & Ng, A. Y. (2007). Sparse deep belief net model for visual area V2. Neural Information Processing Systems (pp. 873--880).
[24]
Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. International Conference on Machine Learning (to appear).
[25]
Murray, J. F., & Kreutz-Delgado, K. (2006). Learning sparse overcomplete codes for images. J. VLSI Signal Processing Systems, 45, 97--110.
[26]
Ng, A. Y. (2004). Feature selection, L1 vs. L2 regularization, and rotational invariance. International Conference on Machine Learning (pp. 78--85).
[27]
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607--609.
[28]
Raina, R., Battle, A., Lee, H., Packer, B., & Ng, A. Y. (2007). Self-taught learning: Transfer learning from unlabeled data. International Conference on Machine Learning (pp. 759--766).
[29]
Ranzato, M. A., & Szummer, M. (2008). Semi-supervised learning of compact document representations with deep networks. International Conference on Machine Learning (pp. 792--799).
[30]
Salakhutdinov, R., & Hinton, G. (2007). Semantic Hashing. SIGIR Workshop on Information Retrieval and Applications of Graphical Models.
[31]
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B., 58, 267--288.
[32]
van Hateren, J. H., & van der Schaaf, A. (1997). Independent component filters of natural images compared with simple cells in primary visual cortex. Royal Soc. Lond. B, 265, 359--366.
[33]
Whaley, R. C., Petitet, A., & Dongarra, J. J. (2001). Automated empirical optimization of software and the ATLAS project. Parallel Computing, 27, 3--35.

Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009, 1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374
Publisher: Association for Computing Machinery, New York, NY, United States
Sponsors: NSF, Microsoft Research, MITACS
Acceptance rate: 140 of 548 submissions (26%)
