Article
Free access

98¢/Mflops/s ultra-large-scale neural-network training on a PIII cluster

Published: 01 November 2000

Abstract

Artificial neural networks with millions of adjustable parameters and a similar number of training examples are a potential solution for difficult, large-scale pattern recognition problems in areas such as speech and face recognition, classification of large volumes of web data, and finance. The bottleneck is that neural network training involves iterative gradient descent and is extremely computationally intensive. In this paper we present a technique for distributed training of Ultra Large Scale Neural Networks (ULSNN) on Bunyip, a Linux-based cluster of 196 Pentium III processors. To illustrate ULSNN training we describe an experiment in which a neural network with 1.73 million adjustable parameters was trained to recognize machine-printed Japanese characters from a database containing 9 million training patterns. The training runs with an average performance of 163.3 Gflops/s (single precision). With a machine cost of $150,913, this yields a price/performance ratio of 92.4¢/Mflops/s (single precision).
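As a sanity check on the quoted ratio, dividing the stated machine cost by the sustained single-precision throughput (163.3 Gflops/s = 163,300 Mflops/s) reproduces the abstract's figure:

\[
\frac{\$150{,}913}{163{,}300\ \text{Mflops/s}} \approx \$0.924\ \text{per Mflops/s} = 92.4\text{¢/Mflops/s}.
\]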
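The abstract describes distributed gradient-descent training across the cluster's 196 processors but does not detail the communication pattern here. The sketch below shows one standard way such training is organized: each node computes a gradient over its own shard of the training patterns, the per-node gradients are summed with an MPI all-reduce, and every node applies the same weight update. This is an illustrative assumption about the scheme, not the authors' code; compute_local_gradient, the learning rate, and the epoch count are hypothetical placeholders, and single-precision floats are used to match the Gflops/s figures quoted in the abstract.

/*
 * Minimal data-parallel training sketch (assumed scheme, not the paper's
 * actual implementation).  Each MPI rank keeps a full copy of the ~1.73M
 * weights, computes a gradient over its local shard of the 9M training
 * patterns, and all ranks sum their gradients with MPI_Allreduce before
 * applying an identical gradient-descent step.
 */
#include <mpi.h>
#include <stdlib.h>

#define NPARAMS 1730000   /* ~1.73 million adjustable parameters */

/* Hypothetical placeholder: in a real run this would back-propagate over
 * this rank's shard of the training set and fill g with the local gradient. */
static void compute_local_gradient(const float *w, float *g,
                                   int rank, int nprocs)
{
    for (int i = 0; i < NPARAMS; i++)
        g[i] = 0.0f;      /* stub so the sketch compiles and runs */
    (void)w; (void)rank; (void)nprocs;
}

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    float *w    = calloc(NPARAMS, sizeof(float));  /* replicated weights */
    float *g    = malloc(NPARAMS * sizeof(float)); /* local gradient     */
    float *gsum = malloc(NPARAMS * sizeof(float)); /* summed gradient    */
    const float lr = 0.01f;                        /* assumed step size  */

    for (int epoch = 0; epoch < 10; epoch++) {
        compute_local_gradient(w, g, rank, nprocs);

        /* Sum per-rank gradients so every rank sees the gradient over the
         * full training set, then take the same descent step everywhere. */
        MPI_Allreduce(g, gsum, NPARAMS, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
        for (int i = 0; i < NPARAMS; i++)
            w[i] -= lr * gsum[i];
    }

    free(w); free(g); free(gsum);
    MPI_Finalize();
    return 0;
}

In a scheme like this, nearly all of the compute time sits inside the back-propagation step, which is dominated by dense matrix arithmetic; that is where an aggregate single-precision figure such as the 163.3 Gflops/s quoted above would be measured.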


Cited By

  • (2003) Communication performance issues for two cluster computers. Proceedings of the 26th Australasian Computer Science Conference, Volume 16, pp. 171-180. DOI: 10.5555/783106.783126. Online publication date: 1 Feb 2003.


Information

Published In

SC '00: Proceedings of the 2000 ACM/IEEE conference on Supercomputing
November 2000
889 pages
ISBN: 0780398025

In-Cooperation

  • SIAM: Society for Industrial and Applied Mathematics

Publisher

IEEE Computer Society

United States

Publication History

Published: 01 November 2000

Author Tags

  1. Linux cluster
  2. matrix-multiply
  3. neural-network

Qualifiers

  • Article

Conference

SC '00

Acceptance Rates

SC '00 Paper Acceptance Rate: 62 of 179 submissions, 35%
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%


Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 25
  • Downloads (Last 6 weeks): 4

Reflects downloads up to 06 Jan 2025

