DOI: 10.1145/1601896.1601950
research-article

Architecture for dense matrix multiplication on a high-performance reconfigurable system

Published: 31 August 2009

Abstract

The recent evolution of programmable logic devices such as FPGAs (Field Programmable Gate Arrays), together with the growing demand for performance improvements in scientific computing applications, has attracted the attention of supercomputer vendors. They have been developing hybrid platforms that link general-purpose processors with FPGA-based co-processors, aiming to accelerate computation.
In this work we present the analysis and development of an important scientific computing operation: matrix multiplication, targeting the commercial hybrid platform RASC (Reconfigurable Application-Specific Computing), developed by Silicon Graphics.
The proposed architecture aims to outperform conventional architectures while dissipating less power. To achieve this goal, we investigated the opportunities for parallelism and data reuse intrinsic to the algorithm. Based on this investigation, we propose a case study that uses the resources available in the target platform to exploit these features.
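The abstract does not describe the architecture itself, so the following is only a minimal software sketch of the general idea of data reuse in matrix multiplication, not the authors' design. It compares a naive triple loop with a tiled (blocked) version that copies small tiles into local buffers, a hypothetical stand-in for on-chip memory such as BRAMs, and counts how many operand values are fetched from the "external" matrices. The function names, the `tile` parameter, and the load-counting model are all assumptions made for illustration.

```python
def matmul_naive(a, b):
    """Multiply a (n x m) by b (m x p); every operand use is counted as an external load."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    loads = 0
    for i in range(n):
        for j in range(p):
            acc = 0.0
            for k in range(m):
                acc += a[i][k] * b[k][j]  # one multiply-accumulate (MAC)
                loads += 2                # a[i][k] and b[k][j] fetched each time
            c[i][j] = acc
    return c, loads


def matmul_blocked(a, b, tile):
    """Tiled multiplication: each tile is loaded once and reused for a whole block of MACs."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    loads = 0
    for i0 in range(0, n, tile):
        for j0 in range(0, p, tile):
            for k0 in range(0, m, tile):
                # Copy tiles into local buffers (assumed stand-in for on-chip memory).
                a_tile = [row[k0:k0 + tile] for row in a[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in b[k0:k0 + tile]]
                loads += sum(len(r) for r in a_tile) + sum(len(r) for r in b_tile)
                # All MACs below read only the local buffers; in hardware the
                # (i, j) pairs are independent and could run on parallel MAC units.
                for i, a_row in enumerate(a_tile):
                    for j in range(len(b_tile[0])):
                        acc = c[i0 + i][j0 + j]
                        for k, a_val in enumerate(a_row):
                            acc += a_val * b_tile[k][j]
                        c[i0 + i][j0 + j] = acc
    return c, loads


if __name__ == "__main__":
    size = 4
    a = [[float(i * size + j) for j in range(size)] for i in range(size)]
    b = [[float(i + j) for j in range(size)] for i in range(size)]
    c_naive, loads_naive = matmul_naive(a, b)
    c_blocked, loads_blocked = matmul_blocked(a, b, tile=2)
    print(c_naive == c_blocked, loads_naive, loads_blocked)
```

Under this counting model, two 4x4 matrices need 128 external loads with the naive loop and 64 with 2x2 tiles; more generally, for square matrices of size n with a tile size that divides n, the count drops from 2n^3 to 2n^3/tile. Larger tiles reuse each fetched value more often, at the cost of more local buffer space.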


Cited By

  • (2011) An FPGA-Based Accelerator to Speed-Up Matrix Multiplication of Floating Point Operations. Proceedings of the 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, pp. 306-309. DOI: 10.1109/IPDPS.2011.165. Online publication date: 16-May-2011.
  • (2010) A high performance full pipelined arquitecture of MLP Neural Networks in FPGA. 2010 17th IEEE International Conference on Electronics, Circuits and Systems, pp. 742-745. DOI: 10.1109/ICECS.2010.5724619. Online publication date: Dec-2010.

Published In

SBCCI '09: Proceedings of the 22nd Annual Symposium on Integrated Circuits and System Design: Chip on the Dunes
August 2009
325 pages
ISBN: 9781605587059
DOI: 10.1145/1601896
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. BRAMs (RAM blocks)
  2. FPGA (field programmable gate array)
  3. MAC (multiplier unit)
  4. RASC (reconfigurable application-specific computing)
  5. data reuse
  6. matrix multiplication
  7. parallelism
  8. performance

Qualifiers

  • Research-article

Conference

SBCCI '09
Acceptance Rates

SBCCI '09 Paper Acceptance Rate 50 of 119 submissions, 42%;
Overall Acceptance Rate 133 of 347 submissions, 38%
