Abstract
Job scheduling for MapReduce is an active research topic, especially in heterogeneous datacenters, where high energy consumption and operating costs are key challenges. Most previous work considers scheduling optimization for a single job only. In this paper, we take multiple MapReduce jobs as the research object and focus on the goal of “jointly optimizing the scheduling time, job costs and energy consumption.” To that end, we develop an energy- and locality-efficient MapReduce multi-job scheduling algorithm for the heterogeneous datacenter. First, we use the rack as the basic unit of resource in job scheduling to reduce data communication between jobs and to facilitate energy savings. Second, according to the capacity of each heterogeneous rack, we design a multi-job pre-mapping method that optimizes the execution order of jobs and jointly optimizes the scheduling time, job costs and energy consumption. Based on this pre-mapping method, we can assign a job to virtual machines on the same rack, so as to minimize the number of online racks. This centralized mapping strategy helps save energy and reduce the data transmission of jobs. Third, the map and reduce tasks of a job are divided into multiple task groups for parallel execution, thereby further reducing data communication and energy consumption. Finally, extensive experimental results demonstrate the advantages of our algorithm.
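The rack-level idea in the abstract (keep each job on a single rack and bring as few racks online as possible) can be illustrated with a small sketch. This is not the paper's pre-mapping method: it swaps in a plain greedy heuristic (largest job first, tightest fit), and the `Rack` class, the `pre_map` function, and the use of VM slot counts as the only capacity measure are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Rack:
    name: str
    capacity: int  # hypothetical: number of VM slots this rack offers
    jobs: list = field(default_factory=list)  # (job name, demand) pairs placed here

    @property
    def free(self) -> int:
        # Slots still available after the jobs already placed on this rack.
        return self.capacity - sum(demand for _, demand in self.jobs)


def pre_map(jobs: dict, racks: list) -> dict:
    """Greedy sketch: place each job wholly on one rack, reusing online racks first."""
    placement = {}
    # Handle the largest jobs first so they are not squeezed out by small ones.
    for job, demand in sorted(jobs.items(), key=lambda kv: -kv[1]):
        online = [r for r in racks if r.jobs and r.free >= demand]
        offline = [r for r in racks if not r.jobs and r.free >= demand]
        # Prefer a rack that is already online; only then wake an idle one.
        candidates = online or offline
        if not candidates:
            raise ValueError(f"no single rack can host job {job!r}")
        # Tightest fit keeps larger racks free for larger jobs.
        target = min(candidates, key=lambda r: r.free)
        target.jobs.append((job, demand))
        placement[job] = target.name
    return placement


racks = [Rack("r1", 8), Rack("r2", 4), Rack("r3", 6)]
placement = pre_map({"a": 4, "b": 3, "c": 2, "d": 1}, racks)
print(placement)
print("online racks:", sorted(set(placement.values())))
```

In this toy run the four jobs fit on two of the three racks, so the third can stay powered down. A real scheduler would also need to weigh execution time and cost, which this sketch ignores.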
Acknowledgements
This work was supported by the Science Research Project of Education Department of Hunan Province (18C0296); the Open Project of State Key Laboratory of Advanced Design and Manufacturing for Vehicle Body (31715010); Hunan Provincial Natural Science Foundation of China (2018JJ2134); Hunan Provincial Young Talents Project (2018RS3095); and Ph.D. research startup foundation of Hunan University of Science and Technology (E51863).
About this article
Cite this article
Chen, L., Liu, ZH. Energy- and locality-efficient multi-job scheduling based on MapReduce for heterogeneous datacenter. SOCA 13, 297–308 (2019). https://doi.org/10.1007/s11761-019-00273-x
DOI: https://doi.org/10.1007/s11761-019-00273-x