DOI: 10.5555/1053730.1054524
Article

Designing Scalable FPGA-Based Reduction Circuits Using Pipelined Floating-Point Cores

Published: 04 April 2005

Abstract

The use of pipelined floating-point arithmetic cores to create high-performance FPGA-based computational kernels has introduced a new class of problems that do not exist when using single-cycle arithmetic cores. In particular, the data hazards associated with pipelined floating-point reduction circuits can limit the scalability or severely reduce the performance of an otherwise high-performance computational kernel. The inability to efficiently execute the reduction in hardware, coupled with memory bandwidth issues, may even negate the performance gains derived from hardware acceleration of the kernel. In this paper, we introduce a method for developing scalable floating-point reduction circuits that run in optimal time while requiring only Θ(lg(n)) space and a single pipelined floating-point unit. Using a Xilinx Virtex-II Pro as the target device, we implement reference instances of our reduction method and present the FPGA design statistics supporting our scalability claims.
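
The data hazard mentioned in the abstract can be illustrated with a toy software model. The sketch below is not the paper's circuit. It only assumes an adder whose result appears a fixed number of cycles after its operands enter, with one new input fed in per cycle. The names `naive_pipelined_accumulate` and `depth` are hypothetical and chosen for illustration.

```python
from collections import deque


def naive_pipelined_accumulate(values, depth):
    """Toy model of naive accumulation through a pipelined adder.

    Assumption: the adder has `depth` pipeline stages, so a result leaves
    the pipeline `depth` cycles after its operands entered. Each cycle,
    the new input is added to whatever result leaves the pipeline in that
    same cycle (0.0 until the pipeline has filled).
    """
    # In-flight results, oldest on the left.
    pipeline = deque([0.0] * depth)
    for x in values:
        feedback = pipeline.popleft()   # result leaving the adder this cycle
        pipeline.append(feedback + x)   # new addition enters the pipeline
    # The stream has ended, but `depth` partial sums are still in flight.
    return list(pipeline)


partials = naive_pipelined_accumulate([float(i) for i in range(1, 9)], depth=4)
print(partials)       # [6.0, 8.0, 10.0, 12.0]
print(sum(partials))  # 36.0
```

With a single-cycle adder (`depth=1`) the model returns one value, the full sum. With a deeper pipeline the running total splits into `depth` interleaved partial sums, and these must still be combined after the input ends. Doing that final combination without stalling the pipeline or replicating adders is the kind of problem a reduction circuit has to solve.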

Published In

IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
April 2005
ISBN:0769523129

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Cited By

  • (2018) A Case Study of Integer Sum Reduction using Atomics. Proceedings of the 9th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, pp. 1-7. DOI: 10.1145/3241793.3241809. Online publication date: 20-Jun-2018.
  • (2016) CASK. Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 179-184. DOI: 10.1145/2847263.2847338. Online publication date: 21-Feb-2016.
  • (2016) The Unified Accumulator Architecture. ACM Transactions on Reconfigurable Technology and Systems 9:3, pp. 1-23. DOI: 10.1145/2809432. Online publication date: 20-May-2016.
  • (2013) An efficient FPGA matrix multiplier for linear system simulation. Proceedings of the 2013 Grand Challenges on Modeling and Simulation Conference, pp. 1-5. DOI: 10.5555/2557668.2557671. Online publication date: 7-Jul-2013.
  • (2010) VFloat. ACM Transactions on Reconfigurable Technology and Systems 3:3, pp. 1-34. DOI: 10.1145/1839480.1839486. Online publication date: 1-Sep-2010.
  • (2007) High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs. IEEE Transactions on Parallel and Distributed Systems 18:10, pp. 1377-1392. DOI: 10.1109/TPDS.2007.1068. Online publication date: 1-Oct-2007.
  • (2006) Accelerating DTI tractography using FPGAs. Proceedings of the 20th international conference on Parallel and distributed processing, p. 194. DOI: 10.5555/1898953.1899141. Online publication date: 25-Apr-2006.
  • (2005) Sparse Matrix-Vector multiplication on FPGAs. Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays, pp. 63-74. DOI: 10.1145/1046192.1046202. Online publication date: 20-Feb-2005.
  • (2005) High Performance Linear Algebra Operations on Reconfigurable Systems. Proceedings of the 2005 ACM/IEEE conference on Supercomputing. DOI: 10.1109/SC.2005.31. Online publication date: 12-Nov-2005.