
Reconfiguration and Communication-Aware Task Scheduling for High-Performance Reconfigurable Computing

Published: 01 November 2010

Abstract

High-performance reconfigurable computing accelerates significant portions of an application using reconfigurable hardware. When the hardware tasks of an application cannot all fit in an FPGA simultaneously, the task graph must be partitioned and scheduled into multiple FPGA configurations in a way that minimizes the total execution time. This article proposes the Reduced Data Movement Scheduling (RDMS) algorithm, which aims to improve the overall performance of hardware tasks by taking into account reconfiguration time, data dependencies between tasks, intertask communication, and task resource utilization. The proposed algorithm is based on dynamic programming. A mathematical analysis shows that, in the worst case, the execution time of the resulting schedule exceeds that of the optimal solution by a factor of at most about 1.6. Simulations on randomly generated task graphs indicate that RDMS reduces interconfiguration communication time by 11% and 44%, respectively, compared with two other approaches that consider only data dependency and hardware resource utilization. The practicality and efficiency of the proposed algorithm relative to other approaches are demonstrated by simulating a task graph from a real-life application, N-body simulation, using bandwidth constraints and FPGA parameters from existing high-performance reconfigurable computers. Experiments on the SRC-6 validate the approach.
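
To make the partitioning problem concrete, the following is a minimal sketch, not the RDMS algorithm itself. It assumes each task has an integer resource requirement and that a task may enter a configuration only after all of its predecessors have run in earlier configurations. It uses a subset-sum style dynamic program to fill each configuration as fully as possible, so it considers only data dependency and resource utilization and ignores the communication and reconfiguration costs that RDMS additionally accounts for. The names `pack_one_configuration`, `schedule`, `area`, `deps`, and `capacity` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Optional, Set


def pack_one_configuration(area: Dict[str, int], ready: List[str],
                           capacity: int) -> List[str]:
    """Choose ready tasks whose total area is as close to capacity as possible."""
    # best[c] holds one task list with total area exactly c, or None if unreachable.
    best: List[Optional[List[str]]] = [None] * (capacity + 1)
    best[0] = []
    for task in ready:
        a = area[task]
        # Iterate downward so each task is used at most once (0/1 subset sum).
        for c in range(capacity, a - 1, -1):
            if best[c - a] is not None and best[c] is None:
                best[c] = best[c - a] + [task]
    # Return the fullest reachable packing.
    for c in range(capacity, -1, -1):
        if best[c] is not None:
            return best[c]
    return []


def schedule(area: Dict[str, int], deps: Dict[str, Set[str]],
             capacity: int) -> List[List[str]]:
    """Split a task graph into successive FPGA configurations (illustrative only)."""
    done: Set[str] = set()
    remaining: Set[str] = set(area)
    configs: List[List[str]] = []
    while remaining:
        # Assumption: a task is ready only when every predecessor has already run.
        ready = sorted(t for t in remaining if deps.get(t, set()) <= done)
        chosen = pack_one_configuration(area, ready, capacity)
        if not chosen:
            raise ValueError("no ready task fits in the FPGA (or the graph is cyclic)")
        configs.append(chosen)
        done |= set(chosen)
        remaining -= set(chosen)
    return configs


if __name__ == "__main__":
    area = {"a": 4, "b": 3, "c": 5, "d": 2}   # hypothetical resource units
    deps = {"c": {"a"}, "d": {"b"}}           # c depends on a, d depends on b
    print(schedule(area, deps, capacity=8))
```

A communication-aware scheduler would replace the "fullest packing" objective with one that also weighs the data exchanged between configurations; the sketch only shows where a dynamic program fits into the partitioning loop.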




Information & Contributors

Information

Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 3, Issue 4
November 2010
240 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/1862648
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 November 2010
Accepted: 01 August 2009
Revised: 01 July 2009
Received: 01 March 2009
Published in TRETS Volume 3, Issue 4


Author Tags

  1. Hardware task scheduling
  2. reconfigurable computing

Qualifiers

  • Research-article
  • Research
  • Refereed


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 13
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 03 Mar 2025

Citations

Cited By

  • (2024) Trends, Approaches, and Gaps in Scientific Workflow Scheduling: A Systematic Review. IEEE Access 12, 182203-182231. DOI: 10.1109/ACCESS.2024.3509218. Online publication date: 2024
  • (2021) A Survey: FPGA‐Based Dynamic Scheduling of Hardware Tasks. Chinese Journal of Electronics 30:6, 991-1007. DOI: 10.1049/cje.2021.07.021. Online publication date: Nov-2021
  • (2019) Using the loop chain abstraction to schedule across loops in existing code. International Journal of High Performance Computing and Networking 13:1, 86-104. DOI: 10.5555/3302714.3302720. Online publication date: 1-Jan-2019
  • (2017) DTP. Proceedings of the 8th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, 1-11. DOI: 10.1145/3120895.3120901. Online publication date: 7-Jun-2017
  • (2017) A Clustering Algorithm for Communication-Aware Scheduling of Task Graphs on Multi-Core Reconfigurable Systems. IEEE Transactions on Parallel and Distributed Systems 28:10, 2718-2732. DOI: 10.1109/TPDS.2017.2703123. Online publication date: 7-Sep-2017
  • (2017) A Floorplanning Algorithm for Partially Reconfigurable FPGA in Wireless Sensor Network. Security, Privacy, and Anonymity in Computation, Communication, and Storage, 667-679. DOI: 10.1007/978-3-319-72395-2_60. Online publication date: 9-Dec-2017
  • (2016) Identifying and scheduling loop chains using directives. Proceedings of the Third International Workshop on Accelerator Programming Using Directives, 57-67. DOI: 10.5555/3019120.3019126. Online publication date: 13-Nov-2016
  • (2016) Identifying and Scheduling Loop Chains Using Directives. 2016 Third Workshop on Accelerator Programming Using Directives (WACCPD), 57-67. DOI: 10.1109/WACCPD.2016.010. Online publication date: Nov-2016
  • (2015) Performance-Oriented Partitioning for Task Scheduling of Parallel Reconfigurable Architectures. IEEE Transactions on Parallel and Distributed Systems 26:3, 858-867. DOI: 10.1109/TPDS.2014.2312924. Online publication date: Mar-2015
  • (2013) Metrics for Early-Stage Modeling of Many-Accelerator Architectures. IEEE Computer Architecture Letters 12:1, 25-28. DOI: 10.1109/L-CA.2012.9. Online publication date: 1-Jan-2013
