research-article

SHAPE: Scheduling of Fixed-Priority Tasks on Heterogeneous Architectures with Multiple CPUs and Many PEs

Authors: An Zou

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design

Article No.: 110, Pages 1 - 9

https://doi.org/10.1145/3508352.3549409

Published: 22 December 2022

Abstract

Although heterogeneous architectures are increasingly employed to accelerate artificial intelligence, they have yet to be well managed under strict timing constraints. As a classic task model, multi-segment self-suspension (MSSS) has been proposed for general I/O-intensive systems and computation offloading. However, directly applying this model to heterogeneous architectures with multiple CPUs and many processing units (PEs) suffers from tremendous pessimism. In this paper, we present SHAPE, a real-time scheduling approach for general heterogeneous architectures that achieves significantly higher schedulability and an improved utilization rate. We start by building the general task execution pattern on a heterogeneous architecture that integrates multiple CPU cores and many PEs, such as GPU streaming multiprocessors and FPGA IP cores. Following this task execution pattern, we present a real-time scheduling strategy and the corresponding schedulability analysis. Compared with state-of-the-art scheduling algorithms in comprehensive experiments on unified and versatile tasks, SHAPE improves schedulability by 11.1% - 100%. Moreover, experiments on NVIDIA GPU systems further indicate that the proposed scheduling reduces pessimism by up to 70.9%. Since we target general heterogeneous architectures, SHAPE can be directly applied to off-the-shelf heterogeneous computing systems with guaranteed deadlines and improved schedulability.
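To make the pessimism mentioned in the abstract concrete, the sketch below models a multi-segment self-suspension task and applies the classic suspension-oblivious response-time analysis on a single CPU under fixed-priority scheduling. This is a textbook baseline, not the SHAPE analysis, which this page does not describe. The class name `MSSSTask`, the function `oblivious_response_time`, and the task parameters are all hypothetical and chosen only for illustration. The analysis treats every suspension (for example, time spent waiting on a PE) as if it were CPU execution, which is safe but can badly overestimate response times.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MSSSTask:
    """Hypothetical multi-segment self-suspension task (illustrative only)."""
    name: str
    exec_segments: List[int]  # worst-case CPU time of each computation segment
    susp_segments: List[int]  # worst-case length of each suspension in between
    period: int
    deadline: int             # assumed constrained: deadline <= period

    @property
    def total_exec(self) -> int:
        return sum(self.exec_segments)

    @property
    def total_susp(self) -> int:
        return sum(self.susp_segments)


def oblivious_response_time(task: MSSSTask,
                            higher_prio: List[MSSSTask]) -> Optional[int]:
    """Suspension-oblivious response-time bound on one CPU.

    Every suspension is counted as execution, then the standard
    fixed-point iteration R = C + sum(ceil(R / T_j) * C_j) is applied.
    Returns None when the bound exceeds the deadline.
    """
    own = task.total_exec + task.total_susp
    r = own
    while r <= task.deadline:
        interference = sum(
            -(-r // hp.period) * (hp.total_exec + hp.total_susp)  # integer ceil
            for hp in higher_prio
        )
        nxt = own + interference
        if nxt == r:
            return r  # fixed point reached
        r = nxt
    return None


# Hypothetical task set, listed from highest to lowest priority.
t1 = MSSSTask("t1", [1, 1], [2], period=10, deadline=10)
t2 = MSSSTask("t2", [2, 1], [3], period=20, deadline=20)
t3 = MSSSTask("t3", [3, 2], [4], period=40, deadline=40)

print(oblivious_response_time(t1, []))        # 4
print(oblivious_response_time(t2, [t1]))      # 10
print(oblivious_response_time(t3, [t1, t2]))  # 37
```

In this toy set, `t3` needs only 5 units of CPU time, yet its bound is 37 because higher-priority suspensions are charged as interference even though the CPU is idle during them. Tighter analyses exploit the segment structure instead of flattening it.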


Cited By

Avan A, Azim A, Mahmoud Q. (2023). A Robust Scheduling Algorithm for Overload-Tolerant Real-Time Systems. 2023 IEEE 26th International Symposium on Real-Time Distributed Computing (ISORC), pages 1-10. Online publication date: May 2023.
https://doi.org/10.1109/ISORC58943.2023.00013

Index Terms

SHAPE: Scheduling of Fixed-Priority Tasks on Heterogeneous Architectures with Multiple CPUs and Many PEs
1. Computer systems organization
  1. Real-time systems


Information & Contributors

Information

Published In

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design

October 2022

1467 pages

ISBN:9781450392174

DOI:10.1145/3508352

Conference Chair: Tulika Mitra (National University of Singapore)

Program Chairs: Evangeline Young (The Chinese University of Hong Kong), Jinjun Xiong (University at Buffalo (UB))

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

In-Cooperation

IEEE-EDS: Electronic Devices Society
IEEE CAS
IEEE CEDA

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Shanghai Chenguang Program
Huawei Technologies
NSFC

Conference

ICCAD '22: IEEE/ACM International Conference on Computer-Aided Design

Sponsor: SIGDA

October 30 - November 3, 2022

San Diego, California

Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%

Bibliometrics

Total Citations: 1
Total Downloads: 197
Downloads (Last 12 months): 54
Downloads (Last 6 weeks): 11

Reflects downloads up to 05 Mar 2025
