research-article

Area-efficient arithmetic expression evaluation using deeply pipelined floating-point cores

Authors:

Ronald Scrofano,

Viktor K. Prasanna

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume 16, Issue 2

Pages 167 - 176

https://doi.org/10.1109/TVLSI.2007.912038

Published: 01 February 2008

Abstract

Recently, it has become possible to implement floating-point cores on field-programmable gate arrays (FPGAs) to provide acceleration for the myriad applications that require high-performance floating-point arithmetic. To achieve high clock rates, floating-point cores for FPGAs must be deeply pipelined. This deep pipelining makes it difficult to reuse the same floating-point core for a series of dependent computations. However, floating-point cores use a great deal of area, so it is important to use as few of them in an architecture as possible. In this paper, we describe area-efficient architectures and algorithms for arithmetic expression evaluation. Such expression evaluation is necessary in applications from a wide variety of fields, including scientific computing and cognition. The proposed designs effectively hide the pipeline latency of the floating-point cores and use at most two floating-point cores for each type of operator in the expression. While best-suited for particular classes of expressions, the proposed designs can evaluate general expressions as well. Additionally, multiple expressions can be evaluated without reconfiguration. Experimental results show that the areas of our designs increase linearly with the number of types of operations in the expression and that our designs occupy less area and achieve higher throughput than designs generated by a commercial hardware compiler.
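The difficulty the abstract describes, reusing one deeply pipelined core for dependent computations, can be illustrated with a small cycle-count model. The sketch below is not the architecture proposed in the paper; it is a hypothetical simulation in which a single core accepts one operation per cycle, each result emerges `latency` cycles after issue, and several independent chains of dependent operations share the core. The function name, the first-ready scheduling policy, and the latency of 10 cycles are all assumptions chosen for illustration.

```python
def simulate_shared_core(num_chains: int, ops_per_chain: int, latency: int) -> int:
    """Return the cycle at which the last result leaves a single pipelined core.

    Each chain is a series of dependent operations: its next operation can
    only issue once its previous result has left the pipeline. The core
    accepts at most one operation per cycle (hypothetical model).
    """
    remaining = [ops_per_chain] * num_chains
    ready_at = [0] * num_chains  # earliest cycle each chain may issue again
    total_ops = num_chains * ops_per_chain
    issued = 0
    cycle = 0
    finish = 0
    while issued < total_ops:
        # Issue from the first chain that still has work and whose operand is ready.
        for chain in range(num_chains):
            if remaining[chain] > 0 and ready_at[chain] <= cycle:
                remaining[chain] -= 1
                ready_at[chain] = cycle + latency
                finish = max(finish, ready_at[chain])
                issued += 1
                break
        cycle += 1
    return finish


if __name__ == "__main__":
    latency = 10  # assumed pipeline depth, for illustration only
    # One chain of 4 dependent operations: the pipeline sits mostly idle.
    print(simulate_shared_core(1, 4, latency))
    # Ten independent chains interleaved on the same core: the pipeline stays full.
    print(simulate_shared_core(10, 4, latency))
```

In this model a single dependent chain pays the full latency for every operation, whereas interleaving as many independent chains as there are pipeline stages lets the core issue on every cycle. That is the general sense in which a design can hide pipeline latency without adding more cores.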


Cited By

(2018) Throughput enhancement of SISO parallel LTE turbo decoders using floating point turbo decoding algorithm. International Journal of Wireless and Mobile Computing, 15:1 (58-66). DOI: 10.5555/3282783.3282791. Online publication date: 1-Jan-2018
https://dl.acm.org/doi/10.5555/3282783.3282791
Martinez R., Torres D., Madrigal M., Maximov S. (2009) Parallel processors architecture in FPGA for the solution of linear equations systems. Proceedings of the 8th WSEAS international conference on System science and simulation in engineering (119-124). DOI: 10.5555/1938841.1938866. Online publication date: 17-Oct-2009
https://dl.acm.org/doi/10.5555/1938841.1938866
Martinez R., Torres D., Madrigal M., Maximov S. (2009) Parallel architecture for the solution of linear equations systems based on division free Gaussian elimination method implemented in FPGA. WSEAS Transactions on Circuits and Systems, 8:10 (832-842). DOI: 10.5555/1718026.1718030. Online publication date: 1-Oct-2009
https://dl.acm.org/doi/10.5555/1718026.1718030

Index Terms

Area-efficient arithmetic expression evaluation using deeply pipelined floating-point cores
1. Hardware



Information & Contributors

Information

Published In

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Volume 16, Issue 2

February 2008

104 pages

ISSN:1063-8210

Copyright © 2008.

Publisher

IEEE Educational Activities Department

United States


