DOI: 10.1145/3508352.3549409
research-article

SHAPE: Scheduling of Fixed-Priority Tasks on Heterogeneous Architectures with Multiple CPUs and Many PEs

Published: 22 December 2022

Abstract

Despite being employed in burgeoning efforts to accelerate artificial intelligence, heterogeneous architectures have yet to be managed well under strict timing constraints. Multi-segment self-suspension (MSSS), a classic task model, has been proposed for general I/O-intensive systems and computation offloading. However, applying this model directly to heterogeneous architectures with multiple CPUs and many processing elements (PEs) introduces substantial pessimism. In this paper, we present SHAPE, a real-time scheduling approach for general heterogeneous architectures that significantly improves schedulability and utilization. We first build a general task execution pattern for a heterogeneous architecture that integrates multiple CPU cores and many PEs, such as GPU streaming multiprocessors and FPGA IP cores. Following this execution pattern, we present a real-time scheduling strategy and the corresponding schedulability analysis. In comprehensive experiments on unified and versatile tasks, SHAPE improves schedulability by 11.1% to 100% over state-of-the-art scheduling algorithms. Moreover, experiments on NVIDIA GPU systems indicate that the proposed scheduling reduces pessimism by up to 70.9%. Because SHAPE targets general heterogeneous architectures, it can be applied directly to off-the-shelf heterogeneous computing systems with guaranteed deadlines and improved schedulability.
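The abstract attributes much of the difficulty to the pessimism that arises when the MSSS model is applied directly. As a hedged illustration of one well-known source of such pessimism, the sketch below implements the classic suspension-oblivious fixed-priority response-time analysis on a single CPU, which charges every suspension interval (for example, time spent waiting for a GPU kernel or an FPGA IP core) as if it were CPU execution. This is a textbook baseline, not SHAPE's scheduling strategy or analysis; the class MSSSTask, the function oblivious_response_time, the integer time units, and the implicit-deadline assumption (deadline equals period) are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from math import ceil
from typing import List, Optional


@dataclass
class MSSSTask:
    """Hypothetical MSSS task: CPU segments alternate with suspension segments
    (e.g. time spent waiting for an offloaded GPU or FPGA job)."""
    name: str
    cpu_segments: List[int]  # C_{i,1}, ..., C_{i,m}
    suspensions: List[int]   # S_{i,1}, ..., S_{i,m-1}
    period: int              # T_i, also used as the implicit deadline D_i (assumption)

    def __post_init__(self) -> None:
        if len(self.suspensions) != len(self.cpu_segments) - 1:
            raise ValueError("an MSSS task with m CPU segments has m-1 suspensions")

    @property
    def inflated_wcet(self) -> int:
        # Suspension-oblivious: every suspension is charged as if it were CPU time.
        return sum(self.cpu_segments) + sum(self.suspensions)


def oblivious_response_time(tasks: List[MSSSTask], i: int) -> Optional[int]:
    """Fixed-priority response-time analysis on one CPU with suspensions
    inflated into execution time. tasks[0] has the highest priority.
    Returns the response-time bound of tasks[i], or None if it exceeds the deadline."""
    task = tasks[i]
    r = task.inflated_wcet
    while True:
        # Interference from every higher-priority task released within a window of length r.
        interference = sum(ceil(r / hp.period) * hp.inflated_wcet for hp in tasks[:i])
        r_next = task.inflated_wcet + interference
        if r_next > task.period:
            return None  # the test fails: deadline cannot be guaranteed
        if r_next == r:
            return r     # fixed point reached
        r = r_next


if __name__ == "__main__":
    # Hypothetical task set, listed from highest to lowest priority.
    tasks = [
        MSSSTask("t1", cpu_segments=[1, 1], suspensions=[2], period=10),
        MSSSTask("t2", cpu_segments=[2, 1], suspensions=[3], period=20),
        MSSSTask("t3", cpu_segments=[2, 2], suspensions=[4], period=20),
    ]
    for idx, t in enumerate(tasks):
        print(t.name, oblivious_response_time(tasks, idx))  # t1 4, t2 10, t3 None
```

In this hypothetical task set, the tasks' CPU demand is only 2/10 + 3/20 + 4/20 = 0.55 of the processor, yet the inflated utilization is 4/10 + 6/20 + 8/20 = 1.1, so the analysis rejects the lowest-priority task. Analyses that account for suspensions more precisely, such as the one the abstract describes for SHAPE, aim to recover part of that lost capacity.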


Cited By

  • (2023) A Robust Scheduling Algorithm for Overload-Tolerant Real-Time Systems. In 2023 IEEE 26th International Symposium on Real-Time Distributed Computing (ISORC), pages 1-10. DOI: 10.1109/ISORC58943.2023.00013. Online publication date: May 2023.


    Published In

    ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design
    October 2022
    1467 pages
    ISBN: 9781450392174
    DOI: 10.1145/3508352
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Sponsors

    In-Cooperation

    • IEEE-EDS: Electronic Devices Society
    • IEEE CAS
    • IEEE CEDA

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. heterogeneous computing
    2. real-time scheduling

    Qualifiers

    • Research-article


    Conference

    ICCAD '22
    Sponsor:
    ICCAD '22: IEEE/ACM International Conference on Computer-Aided Design
    October 30 - November 3, 2022
    San Diego, California

    Acceptance Rates

    Overall Acceptance Rate 457 of 1,762 submissions, 26%


    Article Metrics

    • Downloads (last 12 months): 49
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 11 December 2024
