Resource-constrained multiprocessor synthesis for floating-point applications on FPGAs

Authors:

Pallav Gupta

ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 16, Issue 4

Article No.: 41, Pages 1 - 29

https://doi.org/10.1145/2003695.2003701

Published: 27 October 2011

Abstract

Although state-of-the-art field-programmable gate arrays offer exciting new opportunities in exploring low-cost high-performance architectures for data-intensive scientific applications, they also present serious challenges. Multiprocessor-on-programmable-chip, which integrates software programmability and hardware reconfiguration, provides substantial flexibility that results in shorter design cycles, higher performance, and lower cost. In this article, we present an application-specific design methodology for multiprocessor-on-programmable-chip architectures that target applications involving large matrices and floating-point operations. Given an application with specific energy-performance and resource constraints, our methodology aims to customize the architecture to match the diverse computation and communication requirements of the application tasks. Graph-based analysis of the application drives system synthesis that employs a precharacterized, parameterized hardware component library of functional units. Extensive experimental results for three diverse applications are presented to demonstrate the efficacy of our design methodology.

Information & Contributors

Information

Published In

ACM Transactions on Design Automation of Electronic Systems Volume 16, Issue 4

October 2011

326 pages

ISSN:1084-4309

EISSN:1557-7309

DOI:10.1145/2003695

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 27 October 2011

Accepted: 01 June 2011

Revised: 01 January 2011

Received: 01 April 2010

Published in TODAES Volume 16, Issue 4

Qualifiers

Research-article
Research
Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 7
Total Downloads: 266
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 1

Reflects downloads up to 01 Jan 2025

Citations

Cited By

Jiang W, Jiang K, Zhang X, Ma Y (2018). Energy Optimization of Security-Critical Real-Time Applications with Guaranteed Security Protection. Journal of Systems Architecture: the EUROMICRO Journal 61:7 (282-292). Online publication date: 29-Dec-2018.
https://dl.acm.org/doi/10.1016/j.sysarc.2015.05.005
Awan M, Petters S (2018). Race-to-halt energy saving strategies. Journal of Systems Architecture: the EUROMICRO Journal 60:10 (796-815). Online publication date: 29-Dec-2018.
https://dl.acm.org/doi/10.1016/j.sysarc.2014.10.001
Sandhu F, Selamat H, Alavi S, Behtaji Siahkal Mahalleh V (2017). FPGA-Based Implementation of Kalman Filter for Real-Time Estimation of Tire Velocity and Acceleration. IEEE Sensors Journal 17:17 (5749-5758). Online publication date: 1-Sep-2017.
https://doi.org/10.1109/JSEN.2017.2726529
Jiang W, Pop P, Jiang K (2017). Design optimization for security- and safety-critical distributed real-time applications. Microprocessors & Microsystems 52:C (401-415). Online publication date: 1-Jul-2017.
https://dl.acm.org/doi/10.1016/j.micpro.2016.08.002
Patan A, Santoro A, Conca P, Carapezza G, Magna A, Romano V, Nicosia G (2017). Multi-objective optimization and analysis for the design space exploration of analog circuits and solar cells. Engineering Applications of Artificial Intelligence 62:C (373-383). Online publication date: 1-Jun-2017.
https://dl.acm.org/doi/10.1016/j.engappai.2016.08.010
Park H, Malik A, Salcic Z (2015). Scheduling Globally Asynchronous Locally Synchronous Programs for Guaranteed Response Times. ACM Transactions on Design Automation of Electronic Systems 20:3 (1-25). Online publication date: 24-Jun-2015.
https://dl.acm.org/doi/10.1145/2740961
Sheikh H, Tan H, Ahmad I, Ranka S, Bv P (2012). Energy- and performance-aware scheduling of tasks on parallel and distributed systems. ACM Journal on Emerging Technologies in Computing Systems (JETC) 8:4 (1-37). Online publication date: 30-Nov-2012.
https://dl.acm.org/doi/10.1145/2367736.2367743
