
Large-scale deep unsupervised learning using graphics processors

Published: 14 June 2009

Abstract

The promise of unsupervised learning methods lies in their potential to use vast amounts of unlabeled data to learn complex, highly nonlinear models with millions of free parameters. We consider two well-known unsupervised learning models, deep belief networks (DBNs) and sparse coding, that have recently been applied to a flurry of machine learning applications (Hinton & Salakhutdinov, 2006; Raina et al., 2007). Unfortunately, current learning algorithms for both models are too slow for large-scale applications, forcing researchers to focus on smaller-scale models, or to use fewer training examples.
In this paper, we suggest massively parallel methods to help resolve these problems. We argue that modern graphics processors far surpass the computational capabilities of multicore CPUs, and have the potential to revolutionize the applicability of deep unsupervised learning methods. We develop general principles for massively parallelizing unsupervised learning tasks using graphics processors. We show that these principles can be applied to successfully scale up learning algorithms for both DBNs and sparse coding. Our implementation of DBN learning is up to 70 times faster than a dual-core CPU implementation for large models. For example, we are able to reduce the time required to learn a four-layer DBN with 100 million free parameters from several weeks to around a single day. For sparse coding, we develop a simple, inherently parallel algorithm that leads to a 5- to 15-fold speedup over previous methods.
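
The abstract summarizes rather than spells out the computation, so the sketch below illustrates why both workloads map well onto a GPU: one contrastive-divergence (CD-1) update for an RBM layer reduces to a handful of dense matrix multiplies, and sparse-coding activations can be computed with an update that touches every coordinate of every example at once. This is a minimal sketch in JAX, used here only as a stand-in for the paper's CUDA implementation; all function names, shapes, and the choice of ISTA for the sparse-coding step are illustrative assumptions, not the authors' algorithms.

```python
import jax
import jax.numpy as jnp


def cd1_update(key, W, b_vis, b_hid, v0, lr=1e-3):
    """One CD-1 step for a binary RBM (one DBN layer), written as dense
    matrix products -- the kind of batched work a GPU accelerates.
    Hypothetical shapes: v0 is (batch, n_visible), W is (n_visible, n_hidden)."""
    k1, _ = jax.random.split(key)
    # Positive phase: hidden probabilities given the data, plus a sample.
    h0_prob = jax.nn.sigmoid(v0 @ W + b_hid)
    h0 = jax.random.bernoulli(k1, h0_prob).astype(v0.dtype)
    # Negative phase: one Gibbs step (reconstruct visibles, re-infer hiddens).
    v1_prob = jax.nn.sigmoid(h0 @ W.T + b_vis)
    h1_prob = jax.nn.sigmoid(v1_prob @ W + b_hid)
    # Contrastive-divergence gradient estimate and parameter update.
    dW = (v0.T @ h0_prob - v1_prob.T @ h1_prob) / v0.shape[0]
    return (W + lr * dW,
            b_vis + lr * jnp.mean(v0 - v1_prob, axis=0),
            b_hid + lr * jnp.mean(h0_prob - h1_prob, axis=0))


def sparse_codes_ista(X, B, beta, n_steps=100):
    """Sparse-coding activations by proximal gradient (ISTA), shown only as
    an example of an *inherently parallel* solver: every coordinate of every
    example is updated simultaneously via two matrix multiplies and an
    element-wise soft-threshold. This is NOT the paper's algorithm.
    Hypothetical shapes: X is (batch, n_features), B is (n_features, n_bases)."""
    A = jnp.zeros((X.shape[0], B.shape[1]))
    L = jnp.linalg.norm(B, ord=2) ** 2   # Lipschitz constant of the smooth term
    for _ in range(n_steps):
        grad = (A @ B.T - X) @ B         # gradient of 0.5 * ||X - A B^T||^2
        A = A - grad / L
        A = jnp.sign(A) * jnp.maximum(jnp.abs(A) - beta / L, 0.0)
    return A
```

On a GPU, each of these lines is a single large batched operation over the whole minibatch, which is exactly the regime where, as the abstract argues, graphics processors far outpace multicore CPUs.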

References

[1]
Andrew, G., & Gao, J. (2007). Scalable training of L1-regularized log-linear models. International Conference on Machine Learning (pp. 33--40).
[2]
Banko, M., & Brill, E. (2001). Scaling to very very large corpora for natural language disambiguation. Annual Meeting of the Association for Computational Linguistics (pp. 26--33).
[3]
Bengio, Y. (2007). Speeding up stochastic gradient descent. Neural Information Processing Systems Workshop on Efficient Machine Learning.
[4]
Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2006). Greedy layer-wise training of deep networks. Neural Information Processing Systems (pp. 153--160).
[5]
Bradley, D., & Bagnell, J. A. (2008). Differentiable sparse coding. Neural Information Processing Systems (pp. 113--120).
[6]
Brants, T., Popat, A. C., Xu, P., Och, F. J., & Dean, J. (2007). Large language models in machine translation. Conference on Empirical Methods in Natural Language Processing (EMNLP-CoNLL).
[7]
Catanzaro, B. C., Sundaram, N., & Keutzer, K. (2008). Fast support vector machine training and classification on graphics processors. International Conference on Machine Learning (pp. 104--111).
[8]
Chellapilla, K., Puri, S., & Simard, P. (2006). High performance convolutional neural networks for document processing. International Workshop on Frontiers in Handwriting Recognition.
[9]
Chu, C. T., Kim, S. K., Lin, Y. A., Yu, Y., Bradski, G. R., Ng, A. Y., & Olukotun, K. (2006). Map-reduce for machine learning on multicore. Neural Information Processing Systems (pp. 281--288).
[10]
Dean, J., & Ghemawat, S. (2004). Mapreduce: Simplified data processing on large clusters. Operating System Design and Implementation (pp. 137--150).
[11]
Desjardins, G., & Bengio, Y. (2008). Empirical evaluation of convolutional RBMs for vision. Tech Report.
[12]
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. Ann. Stat., 32, 407.
[13]
Frank, D. (2002). Power-constrained CMOS scaling limits. IBM Jour. of Res. and Devel., 46, 235--244.
[14]
Friedman, J., Hastie, T., Höfling, H., & Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. App. Stat., 2, 302--332.
[15]
Gelsinger, P. (2001). Microprocessors for the new millennium: Challenges, opportunities and new frontiers. ISSCC Tech. Digest, 22--25.
[16]
Goto, K., & Van De Geijn, R. (2008). High-performance implementation of the level-3 BLAS. ACM Trans. Math. Softw., 35, 1--14.
[17]
Harris, M. (2008). Many-core GPU computing with NVIDIA CUDA. Int. Conf. Supercomputing (p. 1).
[18]
Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14, 1771--1800.
[19]
Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527--1554.
[20]
Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313, 504--507.
[21]
Kavukcuoglu, K., Ranzato, M., & LeCun, Y. (2008). Fast inference in sparse coding algorithms with applications to object recognition. NYU Tech Report.
[22]
Lee, H., Battle, A., Raina, R., & Ng, A. Y. (2006). Efficient sparse coding algorithms. Neural Information Processing Systems (pp. 801--808).
[23]
Lee, H., Ekanadham, C., & Ng, A. Y. (2007). Sparse deep belief net model for visual area V2. Neural Information Processing Systems (pp. 873--880).
[24]
Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. International Conference on Machine Learning (to appear).
[25]
Murray, J. F., & Kreutz-Delgado, K. (2006). Learning sparse overcomplete codes for images. J. VLSI Signal Processing Systems, 45, 97--110.
[26]
Ng, A. Y. (2004). Feature selection, L1 vs. L2 regularization, and rotational invariance. International Conference on Machine Learning (pp. 78--85).
[27]
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607--609.
[28]
Raina, R., Battle, A., Lee, H., Packer, B., & Ng, A. Y. (2007). Self-taught learning: Transfer learning from unlabeled data. International Conference on Machine Learning (pp. 759--766).
[29]
Ranzato, M. A., & Szummer, M. (2008). Semi-supervised learning of compact document representations with deep networks. International Conference on Machine Learning (pp. 792--799).
[30]
Salakhutdinov, R., & Hinton, G. (2007). Semantic Hashing. SIGIR Workshop on Information Retrieval and Applications of Graphical Models.
[31]
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B., 58, 267--288.
[32]
van Hateren, J. H., & van der Schaaf, A. (1997). Independent component filters of natural images compared with simple cells in primary visual cortex. Royal Soc. Lond. B, 265, 359--366.
[33]
Whaley, R. C., Petitet, A., & Dongarra, J. J. (2001). Automated empirical optimization of software and the ATLAS project. Parallel Computing, 27, 3--35.

Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009, 1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374
Publisher: Association for Computing Machinery, New York, NY, United States
Sponsors: NSF, Microsoft Research, MITACS
Acceptance rate: 140 of 548 submissions (26%)
