research-article

Area-efficient arithmetic expression evaluation using deeply pipelined floating-point cores

Published: 01 February 2008

Abstract

Recently, it has become possible to implement floating-point cores on field-programmable gate arrays (FPGAs) to provide acceleration for the myriad applications that require high-performance floating-point arithmetic. To achieve high clock rates, floating-point cores for FPGAs must be deeply pipelined. This deep pipelining makes it difficult to reuse the same floating-point core for a series of dependent computations. However, floating-point cores use a great deal of area, so it is important to use as few of them in an architecture as possible. In this paper, we describe area-efficient architectures and algorithms for arithmetic expression evaluation. Such expression evaluation is necessary in applications from a wide variety of fields, including scientific computing and cognition. The proposed designs effectively hide the pipeline latency of the floating-point cores and use at most two floating-point cores for each type of operator in the expression. While best suited for particular classes of expressions, the proposed designs can evaluate general expressions as well. Additionally, multiple expressions can be evaluated without reconfiguration. Experimental results show that the areas of our designs increase linearly with the number of types of operations in the expression and that our designs occupy less area and achieve higher throughput than designs generated by a commercial hardware compiler.
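
To make the latency problem concrete, the following is a minimal sketch of a toy cycle-count model, not the architecture proposed in the paper. It assumes a single adder that accepts one operation per cycle and delivers each result a fixed number of cycles later. The names `chain_cycles` and `greedy_sum` and the greedy issue policy are illustrative assumptions. The sketch compares a fully dependent chain of additions against a schedule that issues any two available operands each cycle.

```python
from collections import deque


def chain_cycles(n, latency):
    """Cycles to sum n values as one dependent chain.

    Each addition must wait for the previous result, so the pipeline
    holds only one operation at a time.
    """
    return max(n - 1, 0) * latency


def greedy_sum(values, latency):
    """Toy model of one pipelined adder (hypothetical, for illustration).

    Each cycle, results that have completed are returned to the pool of
    ready operands; if two operands are ready, one addition is issued.
    Returns (total, cycle in which the final result emerges).
    """
    ready = deque(values)
    if not ready:
        raise ValueError("need at least one value")
    in_flight = deque()  # (completion_cycle, result), in completion order
    cycle = 0
    while len(ready) + len(in_flight) > 1:
        # Retire every result whose latency has elapsed.
        while in_flight and in_flight[0][0] <= cycle:
            ready.append(in_flight.popleft()[1])
        # Issue at most one addition per cycle.
        if len(ready) >= 2:
            a = ready.popleft()
            b = ready.popleft()
            in_flight.append((cycle + latency, a + b))
        cycle += 1
    if in_flight:
        done, total = in_flight[0]
        return total, done
    return ready[0], 0


if __name__ == "__main__":
    n, latency = 8, 3
    total, cycles = greedy_sum(range(1, n + 1), latency)
    print("chain schedule cycles:", chain_cycles(n, latency))
    print("greedy schedule cycles:", cycles, "sum:", total)
```

In this model the greedy schedule finishes sooner because independent additions overlap in the pipeline. Note that reordering floating-point additions can change rounding, so the sketch uses integers to keep the sums exact.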

Cited By

  • (2018) Throughput enhancement of SISO parallel LTE turbo decoders using floating point turbo decoding algorithm. International Journal of Wireless and Mobile Computing 15:1 (58-66). DOI: 10.5555/3282783.3282791. Online publication date: 1-Jan-2018
  • (2009) Parallel processors architecture in FPGA for the solution of linear equations systems. Proceedings of the 8th WSEAS international conference on System science and simulation in engineering (119-124). DOI: 10.5555/1938841.1938866. Online publication date: 17-Oct-2009
  • (2009) Parallel architecture for the solution of linear equations systems based on division free Gaussian elimination method implemented in FPGA. WSEAS Transactions on Circuits and Systems 8:10 (832-842). DOI: 10.5555/1718026.1718030. Online publication date: 1-Oct-2009

Published In

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume 16, Issue 2
February 2008
104 pages

Publisher

IEEE Educational Activities Department

United States

Author Tags

  1. Expression evaluation
  2. pipeline arithmetic
