[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access

Adaptive workload-aware task scheduling for single-ISA asymmetric multicore architectures

Published: 01 February 2014 Publication History

Abstract

Single-ISA Asymmetric Multicore (AMC) architectures have shown high performance as well as power efficiency. However, current parallel programming environments do not perform well on AMC because they are designed for symmetric multicore architectures in which all cores provide equal performance. Their random task scheduling policies can result in unbalanced workloads in AMC and severely degrade the performance of parallel applications. To balance the workloads of parallel applications in AMC, this article proposes an adaptive Workload-Aware Task Scheduler (WATS) that consists of a history-based task allocator and a preference-based task scheduler. The history-based task allocator is based on a near-optimal, static task allocation using the historical statistics collected during the execution of a parallel application. The preference-based task scheduler, which schedules tasks based on a preference list, can dynamically adjust the workloads in AMC if the task allocation is less optimal due to approximation in the history-based task allocator. Experimental results show that WATS can improve both the performance and energy efficiency of task-based applications, with the performance gain up to 66.1% compared with traditional task schedulers.

References

[1]
E. Ayguadé, N. Copty, A. Duran, J. Hoeflinger, Y. Lin, F. Massaioli, X. Teruel, P. Unnikrishnan, and G. Zhang. 2009. The design of openmp tasks. IEEE Transactions on Parallel and Distributed Systems 20, 3, 404--418.
[2]
S. Balakrishnan, R. Rajwar, M. Upton, and K. Lai. 2005. The impact of performance asymmetry in emerging multicore architectures. In Proceedings of the 32nd Annual International Symposium on Computer Architecture. IEEE, 506--517.
[3]
M. A. Bender and M. O. Rabin. 2000. Scheduling Cilk multithreaded parallel programs on processors of different speeds. In Proceedings of the 12th Annual ACM Symposium on Parallel Algorithms and Architectures. ACM, 13--21.
[4]
M. Bhadauria and S. A. McKee. 2010. An approach to resource-aware co-scheduling for cmps. In Proceedings of the 24th ACM International Conference on Supercomputing. ACM, 189--199.
[5]
C. Bienia, S. Kumar, J. P. Singh, and K. Li. 2008. The PARSEC benchmark suite: Characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques. ACM, 72--81.
[6]
R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. 1996. Cilk: An efficient multithreaded runtime system. Journal of Parallel and Distributed Computing 37, 1f, 55--69.
[7]
Q. Chen, Y. Chen, Z. Huang, and M. Guo. 2012a. WATS: Workload-Aware Task Scheduling in asymmetric multi-core architectures. In Proceedings of the 26th International Parallel and Distributed Processing Symposium. IEEE, 249--260.
[8]
Q. Chen, M. Guo, Q. Deng, L. Zheng, S. Guo, and Y. Shen. 2013b. HAT: History-based auto-tuning MapReduce in heterogeneous environments. Journal of Supercomputing, 1--17.
[9]
Q. Chen, M. Guo, and Z. Huang. 2012b. CATS: Cache Aware Task-Stealing based on online profiling in multi-socket multi-core architectures. In Proceedings of the 26th International Conference on Supercomputing. IEEE, 163--172.
[10]
Q. Chen, M. Guo, and Z. Huang. 2013a. Adaptive cache aware bi-tier work-stealing in multi-socket multi-core architectures. IEEE Transactions on Parallel and Distributed Systems 24, 12, 2334--2343.
[11]
Q. Chen, Z. Huang, M. Guo, and J. Zhou. 2011. CAB: Cache-Aware Bi-tier task-stealing in multi-socket multi-core architecture. In Proceedings of the International Conference on Parallel Processing. IEEE.
[12]
M. De Vuyst, R. Kumar, and D. M. Tullsen. 2006. Exploiting unbalanced thread scheduling for energy and performance on a CMP of SMT processors. In Proceedings of the International Parallel and Distributed Processing Symposium. IEEE.
[13]
A. El-Moursy, R. Garg, D. H. Albonesi, and S. Dwarkadas. 2006. Compatible phase co-scheduling on a CMP of multi-threaded processors. In Proceedings of the International Parallel and Distributed Processing Symposium. IEEE.
[14]
M. Frigo, C. E. Leiserson, and K. H. Randall. 1998. The implementation of the Cilk-5 multithreaded language. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 212--223.
[15]
S. Ghiasi, T. Keller, and F. Rawson. 2005. Scheduling for heterogeneous processors in server systems. In Proceedings of the 2nd Conference on Computing Frontiers. ACM, 199--210.
[16]
Y. Guo, R. Barik, R. Raman, and V. Sarkar. 2009. Work-first and help-first scheduling policies for async-finish task parallelism. In Proceedings of the International Parallel and Distributed Processing Symposium. IEEE.
[17]
Y. Guo, J. Zhao, V. Cave, and V. Sarkar. 2010. SLAW: A scalable locality-aware adaptive work-stealing scheduler. In Proceedings of the International Parallel and Distributed Processing Symposium. IEEE.
[18]
M. D. Hill and M. R. Marty. 2008. Amdahl's law in the multicore era. Computer 41, 7, 33--38.
[19]
D. S. Hochbaum and D. B. Shmoys. 1988. A polynomial approximation scheme for scheduling on uniform processors: Using the dual approximation approach. SIAM Journal on Computing 17, 539.
[20]
S. Hofmeyr, C. Iancu, and F. Blagojević. 2010. Load balancing on speed. In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 147--158.
[21]
J. A. Joao, M. Aater Suleman, O. Mutlu, and Y. N. Patt. 2012. Bottleneck identification and scheduling in multithreaded applications. In Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems. 223--234.
[22]
D. Koufaty, D. Reddy, and S. Hahn. 2010. Bias scheduling in heterogeneous multi-core architectures. In Proceedings of the 5th European Conference on Computer Systems. ACM, 125--138.
[23]
R. Kumar, D. M. Tullsen, N. P. Jouppi, and P. Ranganathan. 2005. Heterogeneous chip multiprocessors. Computer 38, 11, 32--38.
[24]
R. Kumar, D. M. Tullsen, P. Ranganathan, N. P. Jouppi, and K. I. Farkas. 2004. Single-ISA heterogeneous multi-core architectures for multithreaded workload performance. In Proceedings of the 31st Annual International Symposium on Computer Architecture. IEEE.
[25]
N. B. Lakshminarayana, J. Lee, and H. Kim. 2009. Age based scheduling for asymmetric multiprocessors. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis. ACM, 25.
[26]
J. K. Lee and J. Palsberg. 2010. Featherweight X10: A core calculus for async-finish parallelism. In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 25--36.
[27]
T. Li, D. Baumberger, D. A. Koufaty, and S. Hahn. 2007. Efficient operating system scheduling for performance-asymmetric multi-core architectures. In Proceedings of the ACM/IEEE Conference on Supercomputing. ACM.
[28]
J. W. S. Liu and C. L. Liu. 1974. Bounds on Scheduling Algorithms for Heterogeneous Computing Systems. Department of Computer Science, University of Illinois at Urbana--Champaign.
[29]
M. Mahoney. 2013. Data Compression Programs. http://mattmahoney.net/dc/.
[30]
C. Maia, L. Nogueira, and L. M. Pinho. 2013. Scheduling parallel real-time tasks using a fixed-priority work-stealing algorithm on multiprocessors. In Proceedings of the 8th International Symposium on Industrial Embedded Systems. IEEE.
[31]
S. Mattheis, T. Schuele, A. Raabe, T. Henties, and U. Gleim. 2012. Work stealing strategies for parallel stream processing in soft real-time systems. In Architecture of Computing Systems. Springer, 172--183.
[32]
S. Miguet and J.-M. Pierson. 1997. Heuristics for 1D rectilinear partitioning as a low cost and high quality answer to dynamic load balancing. In Proceedings of HPCN Europe. 550--564.
[33]
A. Navarro, R. Asenjo, S. Tabik, and C. Caşcaval. 2009. Load balancing using work-stealing for pipeline parallelism in emerging applications. In Proceedings of the 23rd International Conference on Supercomputing. ACM, 517--518.
[34]
A. Patel, F. Afram, S. Chen, and K. Ghose. 2011. MARSS: A full system simulator for multicore x86 CPUs. In Proceedings of the 48th Design Automation Conference. ACM, 1050--1055.
[35]
J. Reinders. 2007. Intel Threading Building Blocks. O'Reilly.
[36]
A. L. Rosenberg and R. C. Chiang. 2010. Toward understanding heterogeneity in computing. In Proceedings of the International Parallel and Distributed Processing Symposium. IEEE, 1--10.
[37]
J. C. Saez, A. Fedorova, M. Prieto, and H. Vegas. 2010a. Operating system support for mitigating software scalability bottlenecks on asymmetric multicore processors. In Proceedings of the 7th ACM International Conference on Computing Frontiers. ACM, 31--40.
[38]
J. C. Saez, M. Prieto, A. Fedorova, and S. Blagodurov. 2010b. A comprehensive scheduler for asymmetric multicore systems. In Proceedings of the 5th European Conference on Computer Systems. ACM, 139--152.
[39]
D. Shelepov, J. C. Saez Alcaide, S. Jeffery, A. Fedorova, N. Perez, Z. F. Huang, S. Blagodurov, and V. Kumar. 2009. HASS: A scheduler for heterogeneous multicore systems. ACM SIGOPS Operating Systems Review 43, 2, 66--75.
[40]
M. A. Suleman, O. Mutlu, M. K. Qureshi, and Y. N. Patt. 2009. Accelerating critical section execution with asymmetric multi-core architectures. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 253--264.
[41]
K. Van Craeynest, A. Jaleel, L. Eeckhout, P. Narvaez, and J. Emer. 2012. Scheduling heterogeneous multi-cores through performance impact estimation (PIE). In Proceedings of the 39th International Symposium on Computer Architecture. IEEE, 213--224.
[42]
L. Zheng, Y. Lu, M. Ding, Y. Shen, M. Guo, and S. Guo. 2011. Architecture-based performance evaluation of genetic algorithms on multi/many-core systems. In Proceedings of the 14th International Conference on Computational Science and Engineering. IEEE, 321--334.

Cited By

View all
  • (2022)Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing SystemIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.310425533:6(1303-1320)Online publication date: 1-Jun-2022
  • (2020)Service Function Chain Deployment and Network Flow Scheduling in Geo-Distributed Data CentersIEEE Transactions on Network Science and Engineering10.1109/TNSE.2020.29973767:4(2587-2597)Online publication date: 1-Oct-2020
  • (2020)Fairness-Aware Energy Efficient Scheduling on Heterogeneous Multi-Core ProcessorsIEEE Transactions on Computers10.1109/TC.2020.2984607(1-1)Online publication date: 2020
  • Show More Cited By

Index Terms

  1. Adaptive workload-aware task scheduling for single-ISA asymmetric multicore architectures

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Architecture and Code Optimization
    ACM Transactions on Architecture and Code Optimization  Volume 11, Issue 1
    February 2014
    373 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/2591460
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 February 2014
    Accepted: 01 November 2013
    Revised: 01 November 2013
    Received: 01 June 2013
    Published in TACO Volume 11, Issue 1

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Task grouping
    2. dynamic task scheduling
    3. history-based task allocation

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)75
    • Downloads (Last 6 weeks)8
    Reflects downloads up to 03 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2022)Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing SystemIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.310425533:6(1303-1320)Online publication date: 1-Jun-2022
    • (2020)Service Function Chain Deployment and Network Flow Scheduling in Geo-Distributed Data CentersIEEE Transactions on Network Science and Engineering10.1109/TNSE.2020.29973767:4(2587-2597)Online publication date: 1-Oct-2020
    • (2020)Fairness-Aware Energy Efficient Scheduling on Heterogeneous Multi-Core ProcessorsIEEE Transactions on Computers10.1109/TC.2020.2984607(1-1)Online publication date: 2020
    • (2020)Vision Paper: Grand Challenges in Resilience: Autonomous System Resilience through Design and Runtime MeasuresIEEE Open Journal of the Computer Society10.1109/OJCS.2020.30068071(155-172)Online publication date: 2020
    • (2019)Heterogeneous Multiprocessor Matching Degree Scheduling Algorithm Based on OpenCL FrameworkIOP Conference Series: Materials Science and Engineering10.1088/1757-899X/490/4/042045490(042045)Online publication date: 10-Apr-2019
    • (2018)Bandwidth and Locality Aware Task-stealing for Manycore Architectures with Bandwidth-Asymmetric MemoryACM Transactions on Architecture and Code Optimization10.1145/329105815:4(1-26)Online publication date: 8-Dec-2018
    • (2018)Contention and Locality-Aware Work-Stealing for Iterative Applications in Multi-Socket ComputersIEEE Transactions on Computers10.1109/TC.2017.278393267:6(784-798)Online publication date: 1-Jun-2018
    • (2018)Energy-Efficient Phase-Aware Load Balancing on Asymmetric Multicore Processors2018 IEEE 4th International Conference on Computer and Communications (ICCC)10.1109/CompComm.2018.8780697(2575-2579)Online publication date: Dec-2018
    • (2017)Preemption-Aware Kernel Scheduling for GPUs2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC)10.1109/ISPA/IUCC.2017.00087(525-532)Online publication date: Dec-2017
    • (2017)Electro: Toward QoS-Aware Power Management for Latency-Critical Applications2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC)10.1109/ISPA/IUCC.2017.00040(221-228)Online publication date: Dec-2017
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media