research-article

Resource-constrained multiprocessor synthesis for floating-point applications on FPGAs

Published: 27 October 2011

Abstract

Although state-of-the-art field-programmable gate arrays offer exciting new opportunities in exploring low-cost high-performance architectures for data-intensive scientific applications, they also present serious challenges. Multiprocessor-on-programmable-chip, which integrates software programmability and hardware reconfiguration, provides substantial flexibility that results in shorter design cycles, higher performance, and lower cost. In this article, we present an application-specific design methodology for multiprocessor-on-programmable-chip architectures that target applications involving large matrices and floating-point operations. Given an application with specific energy-performance and resource constraints, our methodology aims to customize the architecture to match the diverse computation and communication requirements of the application tasks. Graph-based analysis of the application drives system synthesis that employs a precharacterized, parameterized hardware component library of functional units. Extensive experimental results for three diverse applications are presented to demonstrate the efficacy of our design methodology.

Information & Contributors

Information

Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 16, Issue 4
October 2011
326 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/2003695
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 October 2011
Accepted: 01 June 2011
Revised: 01 January 2011
Received: 01 April 2010
Published in TODAES Volume 16, Issue 4

Author Tags

  1. FPGA design and synthesis
  2. Multiprocessor-on-programmable-chip
  3. heterogeneous multiprocessors
  4. mixed-mode parallel processing
  5. resource-constrained optimization

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 1

Reflects downloads up to 01 Jan 2025

Citations

Cited By

  • (2018) Energy Optimization of Security-Critical Real-Time Applications with Guaranteed Security Protection. Journal of Systems Architecture: the EUROMICRO Journal 61:7 (282-292). DOI: 10.1016/j.sysarc.2015.05.005. Online publication date: 29-Dec-2018
  • (2018) Race-to-halt energy saving strategies. Journal of Systems Architecture: the EUROMICRO Journal 60:10 (796-815). DOI: 10.1016/j.sysarc.2014.10.001. Online publication date: 29-Dec-2018
  • (2017) FPGA-Based Implementation of Kalman Filter for Real-Time Estimation of Tire Velocity and Acceleration. IEEE Sensors Journal 17:17 (5749-5758). DOI: 10.1109/JSEN.2017.2726529. Online publication date: 1-Sep-2017
  • (2017) Design optimization for security- and safety-critical distributed real-time applications. Microprocessors & Microsystems 52:C (401-415). DOI: 10.1016/j.micpro.2016.08.002. Online publication date: 1-Jul-2017
  • (2017) Multi-objective optimization and analysis for the design space exploration of analog circuits and solar cells. Engineering Applications of Artificial Intelligence 62:C (373-383). DOI: 10.1016/j.engappai.2016.08.010. Online publication date: 1-Jun-2017
  • (2015) Scheduling Globally Asynchronous Locally Synchronous Programs for Guaranteed Response Times. ACM Transactions on Design Automation of Electronic Systems 20:3 (1-25). DOI: 10.1145/2740961. Online publication date: 24-Jun-2015
  • (2012) Energy- and performance-aware scheduling of tasks on parallel and distributed systems. ACM Journal on Emerging Technologies in Computing Systems (JETC) 8:4 (1-37). DOI: 10.1145/2367736.2367743. Online publication date: 30-Nov-2012
