research-article

Reconfiguration and Communication-Aware Task Scheduling for High-Performance Reconfigurable Computing

Authors:

Miaoqing Huang,

Vikram K. Narayana,

Harald Simmler,

Olivier Serres,

Tarek El-Ghazawi

ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 3, Issue 4

Article No.: 20, Pages 1 - 25

https://doi.org/10.1145/1862648.1862650

Published: 01 November 2010

Abstract

High-performance reconfigurable computing accelerates significant portions of an application using reconfigurable hardware. When the hardware tasks of an application cannot all fit in an FPGA simultaneously, the task graph must be partitioned and scheduled into multiple FPGA configurations in a way that minimizes the total execution time. This article proposes the Reduced Data Movement Scheduling (RDMS) algorithm, which aims to improve the overall performance of hardware tasks by taking into account the reconfiguration time, the data dependencies between tasks, intertask communication, and task resource utilization. The proposed algorithm uses dynamic programming. A mathematical analysis of the algorithm shows that, in the worst case, the execution time exceeds that of the optimal solution by a factor of at most around 1.6. Simulations on randomly generated task graphs indicate that the RDMS algorithm can reduce interconfiguration communication time by 11% and 44%, respectively, compared with two other approaches that consider only data dependency and hardware resource utilization. The practicality of the proposed algorithm, as well as its efficiency relative to other approaches, is demonstrated by simulating a task graph from a real-life application, N-body simulation, along with bandwidth constraints and FPGA parameters taken from existing high-performance reconfigurable computers. Experiments on the SRC-6 are carried out to validate the approach.
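To make the partitioning problem in the abstract concrete, the following is a minimal sketch and not the RDMS algorithm itself, whose details are not given on this page. It assumes a simplified model: each task has an integer area, a task becomes eligible only after all of its predecessors were placed in earlier configurations, and each configuration is filled by a 0/1 knapsack dynamic program that maximizes a caller-supplied weight. The names `knapsack_select`, `partition_into_configurations`, `area`, `weight`, and the example task graph are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def knapsack_select(candidates: List[str], area: Dict[str, int],
                    weight: Dict[str, int], capacity: int) -> List[str]:
    """0/1 knapsack DP: pick the candidates with maximum total weight
    whose total area does not exceed the FPGA capacity."""
    # best[c] = (best total weight, chosen tasks) using at most area c
    best: List[Tuple[int, Tuple[str, ...]]] = [(0, ())] * (capacity + 1)
    for task in candidates:
        a, w = area[task], weight[task]
        # Iterate capacities downward so each task is used at most once.
        for c in range(capacity, a - 1, -1):
            value = best[c - a][0] + w
            if value > best[c][0]:
                best[c] = (value, best[c - a][1] + (task,))
    return list(best[capacity][1])


def partition_into_configurations(tasks: List[str],
                                  deps: Dict[str, Set[str]],
                                  area: Dict[str, int],
                                  weight: Dict[str, int],
                                  capacity: int) -> List[List[str]]:
    """Greedily build configurations one at a time (simplified model).

    Assumption: a task is eligible only when every predecessor was
    placed in an EARLIER configuration."""
    done: Set[str] = set()
    remaining: Set[str] = set(tasks)
    configurations: List[List[str]] = []
    while remaining:
        ready = sorted(t for t in remaining if deps.get(t, set()) <= done)
        chosen = knapsack_select(ready, area, weight, capacity)
        if not chosen:
            # Happens for a cyclic graph, an oversized task, or zero weights.
            raise ValueError("no eligible task fits in a configuration")
        configurations.append(chosen)
        done |= set(chosen)
        remaining -= set(chosen)
    return configurations


# Hypothetical example: four tasks, FPGA capacity of 7 area units.
example_area = {"A": 4, "B": 3, "C": 3, "D": 5}
example_deps = {"C": {"A"}, "D": {"B", "C"}}
configs = partition_into_configurations(
    ["A", "B", "C", "D"], example_deps, example_area,
    weight=example_area,  # here: maximize area utilization
    capacity=7)
print(configs)
```

Using the task area as the weight makes each configuration as full as possible. A communication-aware variant in the spirit of the abstract would presumably fold intertask data volume and reconfiguration time into the objective, but how RDMS does this is not stated here.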

Published In

ACM Transactions on Reconfigurable Technology and Systems Volume 3, Issue 4

November 2010

240 pages

ISSN:1936-7406

EISSN:1936-7414

DOI:10.1145/1862648

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 November 2010

Accepted: 01 August 2009

Revised: 01 July 2009

Received: 01 March 2009

Published in TRETS Volume 3, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

Division of Industrial Innovation and Partnerships

Bibliometrics

Total Citations: 20
Total Downloads: 486
Downloads (Last 12 months): 13
Downloads (Last 6 weeks): 1

Reflects downloads up to 03 Mar 2025

Cited By

Vivas, A., Tchernykh, A., and Castro, H. 2024. Trends, Approaches, and Gaps in Scientific Workflow Scheduling: A Systematic Review. IEEE Access 12, 182203--182231. Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3509218

Tianyang, L., Fan, Z., Wei, G., Mingqian, S., and Li, C. 2021. A Survey: FPGA-Based Dynamic Scheduling of Hardware Tasks. Chinese Journal of Electronics 30, 6, 991--1007. Online publication date: Nov-2021.
https://doi.org/10.1049/cje.2021.07.021

2019. Using the loop chain abstraction to schedule across loops in existing code. International Journal of High Performance Computing and Networking 13, 1, 86--104. Online publication date: 1-Jan-2019.
https://dl.acm.org/doi/10.5555/3302714.3302720

Koraei, M., Jahre, M., and Fatemi, S. 2017. DTP. In Proceedings of the 8th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies. 1--11. Online publication date: 7-Jun-2017.
https://dl.acm.org/doi/10.1145/3120895.3120901

Yoosefi, A. and Naji, H. 2017. A Clustering Algorithm for Communication-Aware Scheduling of Task Graphs on Multi-Core Reconfigurable Systems. IEEE Transactions on Parallel and Distributed Systems 28, 10, 2718--2732. Online publication date: 7-Sep-2017.
https://dl.acm.org/doi/10.1109/TPDS.2017.2703123

Wang, J., Wu, W., Qin, Z., and Zhao, D. 2017. A Floorplanning Algorithm for Partially Reconfigurable FPGA in Wireless Sensor Network. In Security, Privacy, and Anonymity in Computation, Communication, and Storage. 667--679. Online publication date: 9-Dec-2017.
https://doi.org/10.1007/978-3-319-72395-2_60

Bertolacci, I., Strout, M., Guzik, S., Riley, J., Olschanowsky, C., Chandrasekaran, S., and Juckeland, G. 2016. Identifying and scheduling loop chains using directives. In Proceedings of the Third International Workshop on Accelerator Programming Using Directives. 57--67. Online publication date: 13-Nov-2016.
https://dl.acm.org/doi/10.5555/3019120.3019126

Bertolacci, I., Strout, M., Guzik, S., Riley, J., and Olschanowsky, C. 2016. Identifying and Scheduling Loop Chains Using Directives. In 2016 Third Workshop on Accelerator Programming Using Directives (WACCPD). 57--67. Online publication date: Nov-2016.
https://doi.org/10.1109/WACCPD.2016.010

Kao, C. 2015. Performance-Oriented Partitioning for Task Scheduling of Parallel Reconfigurable Architectures. IEEE Transactions on Parallel and Distributed Systems 26, 3, 858--867. Online publication date: Mar-2015.
https://doi.org/10.1109/TPDS.2014.2312924

Nilakantan, S., Battle, S., and Hempstead, M. 2013. Metrics for Early-Stage Modeling of Many-Accelerator Architectures. IEEE Computer Architecture Letters 12, 1, 25--28. Online publication date: 1-Jan-2013.
https://dl.acm.org/doi/10.1109/L-CA.2012.9
