DOI: 10.1145/3386263.3406946
research-article

Multi-task Scheduling for PIM-based Heterogeneous Computing System

Published: 07 September 2020

Abstract

Processing-in-Memory (PIM), or Near-Data Processing, has been recognized as the most promising solution to the ever-worsening memory wall, especially with the rise of memory-intensive scale-out workloads such as graph computing and data analytics. However, as future computing systems become increasingly likely to adopt PIM architectures as storage and processing components, there is little research on general scheduling frameworks for such emerging heterogeneous systems, beyond ad-hoc task-partitioning methods tied to specialized PIM designs. This work is the first to propose a formal model that quantitatively describes the multi-task scheduling problem on a PIM+CPU platform without loss of generality, together with an optimized task mapping-and-scheduling algorithm that improves hardware utilization in these novel heterogeneous systems. The proposed scheduling framework accounts for the differences in data-access bandwidth and processing capability between the CPU and PIM devices, as well as the effects of task mapping on bandwidth contention, data-communication intensity, and hardware utilization for concurrent workloads. Experimental results show that, compared with a conventional scheduling algorithm for heterogeneous systems, the proposed method improves system performance by over 10% and energy efficiency by almost 10% for multi-core scale-out applications.
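To make the CPU-versus-PIM mapping trade-off concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' model or algorithm: it swaps in a generic roofline-style cost estimate (a task's time on a device is the larger of its compute time and its memory-transfer time) and a greedy earliest-finish-time list scheduler for independent tasks. The names Device, Task, exec_time and schedule, the task names, and all throughput and bandwidth figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Device:
    name: str
    ops_per_s: float    # hypothetical peak compute throughput (ops/s)
    bytes_per_s: float  # hypothetical memory bandwidth seen by the device (B/s)

@dataclass(frozen=True)
class Task:
    name: str
    ops: float     # arithmetic work of the task (ops)
    nbytes: float  # memory traffic of the task (bytes)

def exec_time(task: Task, dev: Device) -> float:
    """Roofline-style estimate: the task is bound by compute or by memory traffic."""
    return max(task.ops / dev.ops_per_s, task.nbytes / dev.bytes_per_s)

def schedule(tasks, devices):
    """Greedy earliest-finish-time mapping of independent tasks.

    Tasks are taken largest-first (by best-case time on any device), and each
    is placed on the device where it would finish earliest, given the work
    already queued there. Every device runs its queue serially, so this sketch
    ignores bandwidth contention and CPU-PIM data transfers.
    """
    ready = {d.name: 0.0 for d in devices}  # time at which each device becomes free
    plan = []
    order = sorted(tasks,
                   key=lambda t: min(exec_time(t, d) for d in devices),
                   reverse=True)
    for t in order:
        best = min(devices, key=lambda d: ready[d.name] + exec_time(t, d))
        start = ready[best.name]
        finish = start + exec_time(t, best)
        ready[best.name] = finish
        plan.append((t.name, best.name, start, finish))
    return plan, max(ready.values())

if __name__ == "__main__":
    # Hypothetical devices: the CPU has more compute, PIM has more bandwidth.
    cpu = Device("CPU", ops_per_s=100e9, bytes_per_s=20e9)
    pim = Device("PIM", ops_per_s=10e9, bytes_per_s=200e9)
    tasks = [
        Task("bfs", ops=1e9, nbytes=40e9),    # memory-bound, graph-like
        Task("gemm", ops=200e9, nbytes=2e9),  # compute-bound
        Task("scan", ops=1e9, nbytes=10e9),   # memory-bound
    ]
    plan, makespan = schedule(tasks, [cpu, pim])
    for name, dev, start, finish in plan:
        print(f"{name:5s} -> {dev}  [{start:.2f}, {finish:.2f}] s")
    print(f"makespan = {makespan:.2f} s")
```

With these made-up numbers, the two memory-bound tasks (bfs, scan) are mapped to the PIM device and the compute-bound gemm stays on the CPU, for a makespan of 2.0 s versus 4.5 s if all three ran on the CPU. The sketch deliberately leaves out bandwidth contention among concurrent tasks and CPU-PIM data communication, both of which the abstract says the proposed framework takes into account.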

Supplementary Material

MP4 File (3386263.3406946.mp4)
Presentation video

Information

Published In

ACM Other conferences
GLSVLSI '20: Proceedings of the 2020 on Great Lakes Symposium on VLSI
September 2020
597 pages
ISBN:9781450379441
DOI:10.1145/3386263
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. heterogeneous (hybrid) systems
  2. process in memory
  3. task scheduling

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China

Conference

GLSVLSI '20
GLSVLSI '20: Great Lakes Symposium on VLSI 2020
September 7 - 9, 2020
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 312 of 1,156 submissions, 27%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 328
  • Downloads (Last 12 months): 54
  • Downloads (Last 6 weeks): 7
Reflects downloads up to 10 Dec 2024
