10.5555/1356802.1356841
research-article

Efficient synthesis of compressor trees on FPGAs

Published: 21 January 2008

Abstract

FPGA performance currently lags on arithmetic circuits. Summing k > 2 integer values is a computationally intensive operation in applications such as digital signal and video processing. In ASIC design, compressor trees, such as Wallace and Dadda trees, are used for parallel accumulation; however, the LUT structure and fast carry chains of modern FPGAs favor trees of carry-propagate adders (CPAs), which are a poor choice in ASIC design. This paper presents the first method to successfully synthesize compressor trees on LUT-based FPGAs. In particular, we have found that generalized parallel counters (GPCs) map well to FPGA LUTs; the heuristic presented here constructs a compressor tree from a library of GPCs that can be implemented efficiently on the target FPGA. Compared to the ternary adder trees produced by commercial synthesis tools, our heuristic reduces combinational delay by 27.5% on average, at the cost of a tolerable average area increase of 5.7%.
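The abstract does not describe the heuristic itself, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' algorithm. A GPC written (k_{m-1}, ..., k_0; n) counts k_i input bits of weight 2^i and emits an n-bit result, where n is the bit length of the largest possible sum k_0 + 2k_1 + ... ; when it has at most six inputs, each of its n output bits is a function of at most six variables and so fits in one 6-input LUT. The GPC library `GPC_LIBRARY`, the greedy rule (apply the GPC that removes the most bits), and the stopping point of at most three bits per column, leaving the final addition to a ternary adder, are all hypothetical choices made for illustration; the helper names are likewise invented here.

```python
# Hypothetical GPC library (an assumption, not the paper's library). Each entry
# lists input counts per column, least-significant column first, so (5, 1) is
# the (1,5;3) counter: five bits of weight 1 plus one bit of weight 2.
GPC_LIBRARY = [(6,), (5, 1), (3, 2), (3,)]


def gpc_outputs(counts):
    """Output width n of a GPC: bit length of its largest possible sum."""
    max_value = sum(k << j for j, k in enumerate(counts))
    return max_value.bit_length()


def build_heap(operands, width):
    """Bit heap for a multi-operand sum: column i holds bit i of every operand."""
    return [[(x >> i) & 1 for x in operands] for i in range(width)]


def heap_value(heap):
    """Integer value represented by a bit heap."""
    return sum(sum(col) << i for i, col in enumerate(heap))


def compress_stage(heap, library=GPC_LIBRARY, target=3):
    """One greedy stage: GPCs read bits of this stage and write the next one."""
    cols = [list(col) for col in heap]
    nxt = [[] for _ in range(len(cols) + 3)]  # room for carries out of the top
    for c in range(len(cols)):
        while len(cols[c]) > target:
            # GPCs whose inputs fit in the still-unused bits from column c up.
            fits = [g for g in library
                    if c + len(g) <= len(cols)
                    and all(len(cols[c + j]) >= k for j, k in enumerate(g))]
            # Greedy rule (assumed): pick the GPC that removes the most bits.
            best = max(fits, key=lambda g: sum(g) - gpc_outputs(g))
            value = 0
            for j, k in enumerate(best):
                taken, cols[c + j] = cols[c + j][:k], cols[c + j][k:]
                value += sum(taken) << j
            # The n-bit count lands in columns c, c+1, ..., c+n-1.
            for t in range(gpc_outputs(best)):
                nxt[c + t].append((value >> t) & 1)
    for c, col in enumerate(cols):
        nxt[c].extend(col)  # unused bits pass straight through
    while nxt and not nxt[-1]:
        nxt.pop()  # drop empty top columns
    return nxt


def compress(heap, target=3):
    """Run stages until no column holds more than `target` bits.

    `target` must be at least 2 so that the (3;2) counter always fits.
    """
    stages = 0
    while max((len(col) for col in heap), default=0) > target:
        heap = compress_stage(heap, target=target)
        stages += 1
    return heap, stages
```

For example, `compress(build_heap([13, 200, 77, 255, 31, 9, 128, 64, 3], 8))` returns a heap of at most three rows whose `heap_value` still equals the sum of the nine operands, together with the number of stages used. Every GPC in the assumed library has more inputs than outputs, so each stage strictly shrinks the heap and the loop terminates; the stage count is only a rough proxy for logic depth, not the delay model used in the paper.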


Published In

ASP-DAC '08: Proceedings of the 2008 Asia and South Pacific Design Automation Conference
January 2008
812 pages
ISBN:9781424419227

Publisher

IEEE Computer Society Press

Washington, DC, United States

Conference

ASP-DAC '08

Acceptance Rates

ASP-DAC '08 paper acceptance rate: 122 of 350 submissions (35%)
Overall acceptance rate: 466 of 1,454 submissions (32%)


Bibliometrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 1

Reflects downloads up to 07 Jan 2025.

Cited By

  • (2021) Efficient LUT-based FPGA Accelerator Design for Universal Quantized CNN Inference. 2021 2nd Asia Service Sciences and Software Engineering Conference, pp. 108-115. 10.1145/3456126.3456140. Online publication date: 24-Feb-2021.
  • (2020) LUXOR. Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 161-171. 10.1145/3373087.3375303. Online publication date: 23-Feb-2020.
  • (2019) Reconfigurable Convolutional Kernels for Neural Networks on FPGAs. Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 43-52. 10.1145/3289602.3293905. Online publication date: 20-Feb-2019.
  • (2017) Fast Multiplier Generator for FPGAs with LUT based Partial Product Generation and Column/row Compression. Integration, the VLSI Journal, vol. 57:C, pp. 147-157. 10.1016/j.vlsi.2016.12.012. Online publication date: 1-Mar-2017.
  • (2016) Comment on "High Efficiency Generalized Parallel Counters for Look-Up Table Based FPGAs". International Journal of Reconfigurable Computing, vol. 2016, p. 1. 10.1155/2016/3015403. Online publication date: 1-Sep-2016.
  • (2016) High efficiency generalized parallel counters for look-up table based FPGAs. International Journal of Reconfigurable Computing, vol. 2015, p. 5. 10.1155/2015/518272. Online publication date: 1-Jan-2016.
  • (2011) Power and delay aware synthesis of multi-operand adders targeting LUT-based FPGAs. Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design, pp. 217-222. 10.5555/2016802.2016854. Online publication date: 1-Aug-2011.
  • (2011) Compressor tree synthesis on commercial high-performance FPGAs. ACM Transactions on Reconfigurable Technology and Systems, vol. 4:4, pp. 1-19. 10.1145/2068716.2068725. Online publication date: 28-Dec-2011.
  • (2010) Multi-operand adder synthesis on FPGAs using generalized parallel counters. Proceedings of the 2010 Asia and South Pacific Design Automation Conference, pp. 337-342. 10.5555/1899721.1899793. Online publication date: 18-Jan-2010.
  • (2009) Efficient scheme for implementing large size signed multipliers using multigranular embedded DSP blocks in FPGAs. International Journal of Reconfigurable Computing, vol. 2009, pp. 1-11. 10.1155/2009/145130. Online publication date: 1-Jan-2009.
