Article
Free access

98¢/Mflops/s ultra-large-scale neural-network training on a PIII cluster

Published: 01 November 2000

Abstract

Artificial neural networks with millions of adjustable parameters and a similar number of training examples are a potential solution for difficult, large-scale pattern recognition problems in areas such as speech and face recognition, classification of large volumes of web data, and finance. The bottleneck is that neural network training involves iterative gradient descent and is extremely computationally intensive. In this paper we present a technique for distributed training of Ultra Large Scale Neural Networks (ULSNN) on Bunyip, a Linux-based cluster of 196 Pentium III processors. To illustrate ULSNN training we describe an experiment in which a neural network with 1.73 million adjustable parameters was trained to recognize machine-printed Japanese characters from a database containing 9 million training patterns. The training runs with an average performance of 163.3 Gflops/s (single precision). With a machine cost of $150,913, this yields a price/performance ratio of 92.4¢/Mflops/s (single precision).
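As a sanity check on the quoted ratio, dividing the stated machine cost by the sustained single-precision throughput (163.3 Gflops/s = 163,300 Mflops/s) reproduces the abstract's figure:

\[
\frac{\$150{,}913}{163{,}300\ \text{Mflops/s}} \approx \$0.924\ \text{per Mflops/s} = 92.4\text{¢/Mflops/s}.
\]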
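The abstract describes distributed gradient-descent training across the cluster's 196 processors but does not detail the communication pattern here. The sketch below shows one standard way such training is organized: each node computes a gradient over its own shard of the training patterns, the per-node gradients are summed with an MPI all-reduce, and every node applies the same weight update. This is an illustrative assumption about the scheme, not the authors' code; compute_local_gradient, the learning rate, and the epoch count are hypothetical placeholders, and single-precision floats are used to match the Gflops/s figures quoted in the abstract.

/*
 * Minimal data-parallel training sketch (assumed scheme, not the paper's
 * actual implementation).  Each MPI rank keeps a full copy of the ~1.73M
 * weights, computes a gradient over its local shard of the 9M training
 * patterns, and all ranks sum their gradients with MPI_Allreduce before
 * applying an identical gradient-descent step.
 */
#include <mpi.h>
#include <stdlib.h>

#define NPARAMS 1730000   /* ~1.73 million adjustable parameters */

/* Hypothetical placeholder: in a real run this would back-propagate over
 * this rank's shard of the training set and fill g with the local gradient. */
static void compute_local_gradient(const float *w, float *g,
                                   int rank, int nprocs)
{
    for (int i = 0; i < NPARAMS; i++)
        g[i] = 0.0f;      /* stub so the sketch compiles and runs */
    (void)w; (void)rank; (void)nprocs;
}

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    float *w    = calloc(NPARAMS, sizeof(float));  /* replicated weights */
    float *g    = malloc(NPARAMS * sizeof(float)); /* local gradient     */
    float *gsum = malloc(NPARAMS * sizeof(float)); /* summed gradient    */
    const float lr = 0.01f;                        /* assumed step size  */

    for (int epoch = 0; epoch < 10; epoch++) {
        compute_local_gradient(w, g, rank, nprocs);

        /* Sum per-rank gradients so every rank sees the gradient over the
         * full training set, then take the same descent step everywhere. */
        MPI_Allreduce(g, gsum, NPARAMS, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
        for (int i = 0; i < NPARAMS; i++)
            w[i] -= lr * gsum[i];
    }

    free(w); free(g); free(gsum);
    MPI_Finalize();
    return 0;
}

In a scheme like this, nearly all of the compute time sits inside the back-propagation step, which is dominated by dense matrix arithmetic; that is where an aggregate single-precision figure such as the 163.3 Gflops/s quoted above would be measured.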


Cited By

  • (2003) Communication performance issues for two cluster computers. Proceedings of the 26th Australasian Computer Science Conference, Volume 16, pp. 171-180. DOI: 10.5555/783106.783126. Online publication date: 1 Feb 2003.


Information

Published In

SC '00: Proceedings of the 2000 ACM/IEEE conference on Supercomputing
November 2000
889 pages
ISBN: 0780398025

In-Cooperation

  • SIAM: Society for Industrial and Applied Mathematics

Publisher

IEEE Computer Society

United States

Publication History

Published: 01 November 2000

Author Tags

  1. Linux cluster
  2. matrix-multiply
  3. neural-network

Qualifiers

  • Article

Conference

SC '00

Acceptance Rates

SC '00 Paper Acceptance Rate: 62 of 179 submissions, 35%
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%


Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 25
  • Downloads (Last 6 weeks): 4

Reflects downloads up to 06 Jan 2025

