More Web Proxy on the site http://driver.im/

research-article

Open access

Adaptive workload-aware task scheduling for single-ISA asymmetric multicore architectures

Authors:

Minyi GuoAuthors Info & Claims

ACM Transactions on Architecture and Code Optimization (TACO), Volume 11, Issue 1

Article No.: 8, Pages 1 - 25

https://doi.org/10.1145/2579674

Published: 01 February 2014 Publication History

Abstract

Single-ISA Asymmetric Multicore (AMC) architectures have shown high performance as well as power efficiency. However, current parallel programming environments do not perform well on AMC because they are designed for symmetric multicore architectures in which all cores provide equal performance. Their random task scheduling policies can result in unbalanced workloads in AMC and severely degrade the performance of parallel applications. To balance the workloads of parallel applications in AMC, this article proposes an adaptive Workload-Aware Task Scheduler (WATS) that consists of a history-based task allocator and a preference-based task scheduler. The history-based task allocator is based on a near-optimal, static task allocation using the historical statistics collected during the execution of a parallel application. The preference-based task scheduler, which schedules tasks based on a preference list, can dynamically adjust the workloads in AMC if the task allocation is less optimal due to approximation in the history-based task allocator. Experimental results show that WATS can improve both the performance and energy efficiency of task-based applications, with the performance gain up to 66.1% compared with traditional task schedulers.

References

[1]

E. Ayguadé, N. Copty, A. Duran, J. Hoeflinger, Y. Lin, F. Massaioli, X. Teruel, P. Unnikrishnan, and G. Zhang. 2009. The design of openmp tasks. IEEE Transactions on Parallel and Distributed Systems 20, 3, 404--418.

Digital Library

[2]

S. Balakrishnan, R. Rajwar, M. Upton, and K. Lai. 2005. The impact of performance asymmetry in emerging multicore architectures. In Proceedings of the 32nd Annual International Symposium on Computer Architecture. IEEE, 506--517.

Digital Library

[3]

M. A. Bender and M. O. Rabin. 2000. Scheduling Cilk multithreaded parallel programs on processors of different speeds. In Proceedings of the 12th Annual ACM Symposium on Parallel Algorithms and Architectures. ACM, 13--21.

Digital Library

[4]

M. Bhadauria and S. A. McKee. 2010. An approach to resource-aware co-scheduling for cmps. In Proceedings of the 24th ACM International Conference on Supercomputing. ACM, 189--199.

Digital Library

[5]

C. Bienia, S. Kumar, J. P. Singh, and K. Li. 2008. The PARSEC benchmark suite: Characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques. ACM, 72--81.

Digital Library

[6]

R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. 1996. Cilk: An efficient multithreaded runtime system. Journal of Parallel and Distributed Computing 37, 1f, 55--69.

Digital Library

[7]

Q. Chen, Y. Chen, Z. Huang, and M. Guo. 2012a. WATS: Workload-Aware Task Scheduling in asymmetric multi-core architectures. In Proceedings of the 26th International Parallel and Distributed Processing Symposium. IEEE, 249--260.

Digital Library

[8]

Q. Chen, M. Guo, Q. Deng, L. Zheng, S. Guo, and Y. Shen. 2013b. HAT: History-based auto-tuning MapReduce in heterogeneous environments. Journal of Supercomputing, 1--17.

Digital Library

[9]

Q. Chen, M. Guo, and Z. Huang. 2012b. CATS: Cache Aware Task-Stealing based on online profiling in multi-socket multi-core architectures. In Proceedings of the 26th International Conference on Supercomputing. IEEE, 163--172.

Digital Library

[10]

Q. Chen, M. Guo, and Z. Huang. 2013a. Adaptive cache aware bi-tier work-stealing in multi-socket multi-core architectures. IEEE Transactions on Parallel and Distributed Systems 24, 12, 2334--2343.

Digital Library

[11]

Q. Chen, Z. Huang, M. Guo, and J. Zhou. 2011. CAB: Cache-Aware Bi-tier task-stealing in multi-socket multi-core architecture. In Proceedings of the International Conference on Parallel Processing. IEEE.

Digital Library

[12]

M. De Vuyst, R. Kumar, and D. M. Tullsen. 2006. Exploiting unbalanced thread scheduling for energy and performance on a CMP of SMT processors. In Proceedings of the International Parallel and Distributed Processing Symposium. IEEE.

Digital Library

[13]

A. El-Moursy, R. Garg, D. H. Albonesi, and S. Dwarkadas. 2006. Compatible phase co-scheduling on a CMP of multi-threaded processors. In Proceedings of the International Parallel and Distributed Processing Symposium. IEEE.

Digital Library

[14]

M. Frigo, C. E. Leiserson, and K. H. Randall. 1998. The implementation of the Cilk-5 multithreaded language. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 212--223.

Digital Library

[15]

S. Ghiasi, T. Keller, and F. Rawson. 2005. Scheduling for heterogeneous processors in server systems. In Proceedings of the 2nd Conference on Computing Frontiers. ACM, 199--210.

Digital Library

[16]

Y. Guo, R. Barik, R. Raman, and V. Sarkar. 2009. Work-first and help-first scheduling policies for async-finish task parallelism. In Proceedings of the International Parallel and Distributed Processing Symposium. IEEE.

Digital Library

[17]

Y. Guo, J. Zhao, V. Cave, and V. Sarkar. 2010. SLAW: A scalable locality-aware adaptive work-stealing scheduler. In Proceedings of the International Parallel and Distributed Processing Symposium. IEEE.

[18]

M. D. Hill and M. R. Marty. 2008. Amdahl's law in the multicore era. Computer 41, 7, 33--38.

Digital Library

[19]

D. S. Hochbaum and D. B. Shmoys. 1988. A polynomial approximation scheme for scheduling on uniform processors: Using the dual approximation approach. SIAM Journal on Computing 17, 539.

Digital Library

[20]

S. Hofmeyr, C. Iancu, and F. Blagojević. 2010. Load balancing on speed. In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 147--158.

Digital Library

[21]

J. A. Joao, M. Aater Suleman, O. Mutlu, and Y. N. Patt. 2012. Bottleneck identification and scheduling in multithreaded applications. In Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems. 223--234.

Digital Library

[22]

D. Koufaty, D. Reddy, and S. Hahn. 2010. Bias scheduling in heterogeneous multi-core architectures. In Proceedings of the 5th European Conference on Computer Systems. ACM, 125--138.

Digital Library

[23]

R. Kumar, D. M. Tullsen, N. P. Jouppi, and P. Ranganathan. 2005. Heterogeneous chip multiprocessors. Computer 38, 11, 32--38.

Digital Library

[24]

R. Kumar, D. M. Tullsen, P. Ranganathan, N. P. Jouppi, and K. I. Farkas. 2004. Single-ISA heterogeneous multi-core architectures for multithreaded workload performance. In Proceedings of the 31st Annual International Symposium on Computer Architecture. IEEE.

Digital Library

[25]

N. B. Lakshminarayana, J. Lee, and H. Kim. 2009. Age based scheduling for asymmetric multiprocessors. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis. ACM, 25.

Digital Library

[26]

J. K. Lee and J. Palsberg. 2010. Featherweight X10: A core calculus for async-finish parallelism. In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 25--36.

Digital Library

[27]

T. Li, D. Baumberger, D. A. Koufaty, and S. Hahn. 2007. Efficient operating system scheduling for performance-asymmetric multi-core architectures. In Proceedings of the ACM/IEEE Conference on Supercomputing. ACM.

Digital Library

[28]

J. W. S. Liu and C. L. Liu. 1974. Bounds on Scheduling Algorithms for Heterogeneous Computing Systems. Department of Computer Science, University of Illinois at Urbana--Champaign.

[29]

M. Mahoney. 2013. Data Compression Programs. http://mattmahoney.net/dc/.

[30]

C. Maia, L. Nogueira, and L. M. Pinho. 2013. Scheduling parallel real-time tasks using a fixed-priority work-stealing algorithm on multiprocessors. In Proceedings of the 8th International Symposium on Industrial Embedded Systems. IEEE.

[31]

S. Mattheis, T. Schuele, A. Raabe, T. Henties, and U. Gleim. 2012. Work stealing strategies for parallel stream processing in soft real-time systems. In Architecture of Computing Systems. Springer, 172--183.

Digital Library

[32]

S. Miguet and J.-M. Pierson. 1997. Heuristics for 1D rectilinear partitioning as a low cost and high quality answer to dynamic load balancing. In Proceedings of HPCN Europe. 550--564.

Digital Library

[33]

A. Navarro, R. Asenjo, S. Tabik, and C. Caşcaval. 2009. Load balancing using work-stealing for pipeline parallelism in emerging applications. In Proceedings of the 23rd International Conference on Supercomputing. ACM, 517--518.

Digital Library

[34]

A. Patel, F. Afram, S. Chen, and K. Ghose. 2011. MARSS: A full system simulator for multicore x86 CPUs. In Proceedings of the 48th Design Automation Conference. ACM, 1050--1055.

Digital Library

[35]

J. Reinders. 2007. Intel Threading Building Blocks. O'Reilly.

Digital Library

[36]

A. L. Rosenberg and R. C. Chiang. 2010. Toward understanding heterogeneity in computing. In Proceedings of the International Parallel and Distributed Processing Symposium. IEEE, 1--10.

[37]

J. C. Saez, A. Fedorova, M. Prieto, and H. Vegas. 2010a. Operating system support for mitigating software scalability bottlenecks on asymmetric multicore processors. In Proceedings of the 7th ACM International Conference on Computing Frontiers. ACM, 31--40.

Digital Library

[38]

J. C. Saez, M. Prieto, A. Fedorova, and S. Blagodurov. 2010b. A comprehensive scheduler for asymmetric multicore systems. In Proceedings of the 5th European Conference on Computer Systems. ACM, 139--152.

Digital Library

[39]

D. Shelepov, J. C. Saez Alcaide, S. Jeffery, A. Fedorova, N. Perez, Z. F. Huang, S. Blagodurov, and V. Kumar. 2009. HASS: A scheduler for heterogeneous multicore systems. ACM SIGOPS Operating Systems Review 43, 2, 66--75.

Digital Library

[40]

M. A. Suleman, O. Mutlu, M. K. Qureshi, and Y. N. Patt. 2009. Accelerating critical section execution with asymmetric multi-core architectures. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 253--264.

Digital Library

[41]

K. Van Craeynest, A. Jaleel, L. Eeckhout, P. Narvaez, and J. Emer. 2012. Scheduling heterogeneous multi-cores through performance impact estimation (PIE). In Proceedings of the 39th International Symposium on Computer Architecture. IEEE, 213--224.

Digital Library

[42]

L. Zheng, Y. Lu, M. Ding, Y. Shen, M. Guo, and S. Guo. 2011. Architecture-based performance evaluation of genetic algorithms on multi/many-core systems. In Proceedings of the 14th International Conference on Computational Science and Engineering. IEEE, 321--334.

Digital Library

Cited By

Huang TLin DLin CLin Y(2022)Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing SystemIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.310425533:6(1303-1320)Online publication date: 1-Jun-2022
https://doi.org/10.1109/TPDS.2021.3104255
Gu LHu JZeng DGuo SJin H(2020)Service Function Chain Deployment and Network Flow Scheduling in Geo-Distributed Data CentersIEEE Transactions on Network Science and Engineering10.1109/TNSE.2020.29973767:4(2587-2597)Online publication date: 1-Oct-2020
https://doi.org/10.1109/TNSE.2020.2997376
Salami BNoori HNaghibzadeh M(2020)Fairness-Aware Energy Efficient Scheduling on Heterogeneous Multi-Core ProcessorsIEEE Transactions on Computers10.1109/TC.2020.2984607(1-1)Online publication date: 2020
https://doi.org/10.1109/TC.2020.2984607
Show More Cited By

Index Terms

Adaptive workload-aware task scheduling for single-ISA asymmetric multicore architectures
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management

Recommendations

Improving makespan in dynamic task scheduling for cloud robotic systems with time window constraints
Abstract
A scheduling method in a robotic network cloud system with minimal makespan is beneficial as the system can complete all the tasks assigned to it in the fastest way. Time window constraints on tasks are a natural way to order tasks. The makespan ...
Task scheduling using Ant Colony Optimization in multicore architectures: a survey

The problem of determining a set of real-time tasks that can be assigned to the multiprocessors and finding a feasible solution of scheduling these tasks among the multiprocessors is a challenging issue and known to be NP-complete. Many applications ...
Dynamic Scheduling Real-Time Task Using Primary-Backup Overloading Strategy for Multiprocessor Systems

The scheduling of real-time tasks with fault-tolerant requirements has been an important problem in multiprocessor systems. The primary-backup (PB) approach is often used as a fault-tolerant technique to guarantee the deadlines of tasks despite the ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Architecture and Code Optimization

ACM Transactions on Architecture and Code Optimization Volume 11, Issue 1

February 2014

373 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/2591460

Issue’s Table of Contents

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 February 2014

Accepted: 01 November 2013

Revised: 01 November 2013

Received: 01 June 2013

Published in TACO Volume 11, Issue 1

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Funding Sources

Shanghai Excellent Academic Leaders Plan (No. 11XD1402900)
National Natural Science Foundation of China
Program for Changjiang Scholars and Innovative Research Team in University (IRT1158, PCSIRT) China
Ministry of Science and Technology of the People's Republic of China

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

23
Total Citations
View Citations
732
Total Downloads

Downloads (Last 12 months)75
Downloads (Last 6 weeks)8

Reflects downloads up to 03 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Huang TLin DLin CLin Y(2022)Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing SystemIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.310425533:6(1303-1320)Online publication date: 1-Jun-2022
https://doi.org/10.1109/TPDS.2021.3104255
Gu LHu JZeng DGuo SJin H(2020)Service Function Chain Deployment and Network Flow Scheduling in Geo-Distributed Data CentersIEEE Transactions on Network Science and Engineering10.1109/TNSE.2020.29973767:4(2587-2597)Online publication date: 1-Oct-2020
https://doi.org/10.1109/TNSE.2020.2997376
Salami BNoori HNaghibzadeh M(2020)Fairness-Aware Energy Efficient Scheduling on Heterogeneous Multi-Core ProcessorsIEEE Transactions on Computers10.1109/TC.2020.2984607(1-1)Online publication date: 2020
https://doi.org/10.1109/TC.2020.2984607
Bagchi SAggarwal VChaterji SDouglis FGamal AHan JJ. Henz BHoffmann HJana SKulkarni MLin FMarais KMittal PMou SQiu XScutari G(2020)Vision Paper: Grand Challenges in Resilience: Autonomous System Resilience through Design and Runtime MeasuresIEEE Open Journal of the Computer Society10.1109/OJCS.2020.30068071(155-172)Online publication date: 2020
https://doi.org/10.1109/OJCS.2020.3006807
Fang JYu TWei Z(2019)Heterogeneous Multiprocessor Matching Degree Scheduling Algorithm Based on OpenCL FrameworkIOP Conference Series: Materials Science and Engineering10.1088/1757-899X/490/4/042045490(042045)Online publication date: 10-Apr-2019
https://doi.org/10.1088/1757-899X/490/4/042045
Zhao HChen QQiu YWu MShen YLeng JLi CGuo M(2018)Bandwidth and Locality Aware Task-stealing for Manycore Architectures with Bandwidth-Asymmetric MemoryACM Transactions on Architecture and Code Optimization10.1145/329105815:4(1-26)Online publication date: 8-Dec-2018
https://dl.acm.org/doi/10.1145/3291058
Chen QGuo M(2018)Contention and Locality-Aware Work-Stealing for Iterative Applications in Multi-Socket ComputersIEEE Transactions on Computers10.1109/TC.2017.278393267:6(784-798)Online publication date: 1-Jun-2018
https://doi.org/10.1109/TC.2017.2783932
Zhao SHao CJian ZLi M(2018)Energy-Efficient Phase-Aware Load Balancing on Asymmetric Multicore Processors2018 IEEE 4th International Conference on Computer and Communications (ICCC)10.1109/CompComm.2018.8780697(2575-2579)Online publication date: Dec-2018
https://doi.org/10.1109/CompComm.2018.8780697
Jin SWang ZChen QGuo M(2017)Preemption-Aware Kernel Scheduling for GPUs2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC)10.1109/ISPA/IUCC.2017.00087(525-532)Online publication date: Dec-2017
https://doi.org/10.1109/ISPA/IUCC.2017.00087
Lu YChen QShen YGuo M(2017)Electro: Toward QoS-Aware Power Management for Latency-Critical Applications2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC)10.1109/ISPA/IUCC.2017.00040(221-228)Online publication date: Dec-2017
https://doi.org/10.1109/ISPA/IUCC.2017.00040
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents