Article

Designing Scalable FPGA-Based Reduction Circuits Using Pipelined Floating-Point Cores

Authors:

Gerald R. Morris

Viktor K. Prasanna

IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04

Page 147.1

Published: 04 April 2005

Abstract

The use of pipelined floating-point arithmetic cores to create high-performance FPGA-based computational kernels has introduced a new class of problems that do not exist when using single-cycle arithmetic cores. In particular, the data hazards associated with pipelined floating-point reduction circuits can limit the scalability or severely reduce the performance of an otherwise high-performance computational kernel. The inability to efficiently execute the reduction in hardware coupled with memory bandwidth issues may even negate the performance gains derived from hardware acceleration of the kernel. In this paper we introduce a method for developing scalable floating-point reduction circuits that run in optimal time while requiring only Θ(lg(n)) space and a single pipelined floating-point unit. Using a Xilinx Virtex-II Pro as the target device, we implement reference instances of our reduction method and present the FPGA design statistics supporting our scalability claims.
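To make the data hazard described in the abstract concrete, the following Python sketch models a pipelined adder with an assumed latency `alpha` that accepts one new addition per cycle, with one input arriving per cycle. `naive_reduce` feeds every input into a single running sum, so each addition must wait for the previous result to leave the pipeline. `greedy_reduce` instead adds any two operands that are ready. Both function names, the cycle model, and the greedy scheduler are hypothetical illustrations for a single input set; they are not the reduction circuit described in the paper.

```python
from collections import deque


def naive_reduce(xs, alpha):
    """Accumulate xs into one running sum on an adder of latency `alpha`.

    Input i arrives at cycle i. Every addition needs the previous result,
    so each one waits `alpha` cycles for the pipeline (the data hazard).
    Returns (sum, cycle at which the final sum is available).
    """
    if not xs:
        raise ValueError("need at least one input")
    acc, ready = xs[0], 0
    for i, x in enumerate(xs[1:], start=1):
        issue = max(ready, i)  # wait for both the input and the previous sum
        acc = acc + x
        ready = issue + alpha
    return acc, ready


def greedy_reduce(xs, alpha):
    """Hypothetical scheduler: add any two ready operands, one issue per cycle.

    Returns (sum, cycle at which the final sum is available,
    peak number of operands waiting in the buffer).
    """
    if not xs:
        raise ValueError("need at least one input")
    if alpha < 1:
        raise ValueError("pipeline latency must be at least one cycle")
    pool = []            # operands waiting for an adder slot
    in_flight = deque()  # (finish_cycle, value) in issue order
    i, t, peak = 0, 0, 0
    while True:
        # A result issued at cycle t - alpha leaves the pipeline now.
        while in_flight and in_flight[0][0] == t:
            pool.append(in_flight.popleft()[1])
        # At most one new input arrives per cycle.
        if i < len(xs):
            pool.append(xs[i])
            i += 1
        peak = max(peak, len(pool))
        # Finished: no inputs left, pipeline empty, one value remains.
        if i == len(xs) and not in_flight and len(pool) == 1:
            return pool[0], t, peak
        # Issue at most one addition per cycle (one pipelined adder).
        if len(pool) >= 2:
            a, b = pool.pop(), pool.pop()
            in_flight.append((t + alpha, a + b))
        t += 1


if __name__ == "__main__":
    xs = [float(k) for k in range(1, 101)]
    print(naive_reduce(xs, alpha=14))   # (5050.0, 1387)
    print(greedy_reduce(xs, alpha=14))  # same sum, far fewer cycles
```

In this model every issued addition consumes two waiting operands, so the buffer never holds more than three values at once; the latency cost shifts to the end, where the partial sums still in the pipeline must be combined. Reordering the additions also changes floating-point rounding relative to the sequential sum, which is why the example uses integer-valued inputs whose partial sums are exact.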

Index Terms

1. Computer systems organization
  1. Architectures
    1. Serial architectures
      1. Pipeline computing
2. Hardware
  1. Integrated circuits
    1. Logic circuits
      1. Arithmetic and datapath circuits
  2. Very large scale integration design
    1. Application-specific VLSI designs

Information

Published In

IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04

April 2005

ISBN: 0769523129

Publisher

IEEE Computer Society

United States

Qualifiers

Article

Bibliometrics

Total Citations: 9
Total Downloads: 0
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0

Reflects downloads up to 05 Mar 2025

Cited By

Jin Z, Finkel H (2018). A Case Study of Integer Sum Reduction using Atomics. Proceedings of the 9th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, pp. 1-7. DOI: 10.1145/3241793.3241809. Online publication date: 20-Jun-2018.
https://dl.acm.org/doi/10.1145/3241793.3241809

Grigoras P, Burovskiy P, Luk W, Chen D, Greene J (2016). CASK. Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 179-184. DOI: 10.1145/2847263.2847338. Online publication date: 21-Feb-2016.
https://dl.acm.org/doi/10.1145/2847263.2847338

Wilson D, Stitt G (2016). The Unified Accumulator Architecture. ACM Transactions on Reconfigurable Technology and Systems 9(3), pp. 1-23. DOI: 10.1145/2809432. Online publication date: 20-May-2016.
https://dl.acm.org/doi/10.1145/2809432

Mish S, Zenor J, Crosbie R, Vakilzadian H, Crosbie R, Huntsinger R, Cooper K (2013). An efficient FPGA matrix multiplier for linear system simulation. Proceedings of the 2013 Grand Challenges on Modeling and Simulation Conference, pp. 1-5. DOI: 10.5555/2557668.2557671. Online publication date: 7-Jul-2013.
https://dl.acm.org/doi/10.5555/2557668.2557671

Wang X, Leeser M (2010). VFloat. ACM Transactions on Reconfigurable Technology and Systems 3(3), pp. 1-34. DOI: 10.1145/1839480.1839486. Online publication date: 1-Sep-2010.
https://dl.acm.org/doi/10.1145/1839480.1839486

Zhuo L, Morris G, Prasanna V (2007). High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs. IEEE Transactions on Parallel and Distributed Systems 18(10), pp. 1377-1392. DOI: 10.1109/TPDS.2007.1068. Online publication date: 1-Oct-2007.
https://dl.acm.org/doi/10.1109/TPDS.2007.1068

Kwatra A, Prasanna V, Singh M (2006). Accelerating DTI tractography using FPGAs. Proceedings of the 20th International Conference on Parallel and Distributed Processing, p. 194. DOI: 10.5555/1898953.1899141. Online publication date: 25-Apr-2006.
https://dl.acm.org/doi/10.5555/1898953.1899141

Zhuo L, Prasanna V, Schmit H, Wilton S (2005). Sparse Matrix-Vector multiplication on FPGAs. Proceedings of the 2005 ACM/SIGDA 13th International Symposium on Field-Programmable Gate Arrays, pp. 63-74. DOI: 10.1145/1046192.1046202. Online publication date: 20-Feb-2005.
https://dl.acm.org/doi/10.1145/1046192.1046202

Zhuo L, Prasanna V, Kramer W (2005). High Performance Linear Algebra Operations on Reconfigurable Systems. Proceedings of the 2005 ACM/IEEE Conference on Supercomputing. DOI: 10.1109/SC.2005.31. Online publication date: 12-Nov-2005.
https://dl.acm.org/doi/10.1109/SC.2005.31

View Options

View options

Figures

Tables

Media

View Table of Conten