Abstract
MapReduce, a popular programming model for processing data-intensive tasks, has achieved great success in a wide range of applications such as search indexing, social network mining, collaborative recommendation, and spam detection. However, the ability of MapReduce is limited in two respects by its default schedulers. First, it does not support concurrent services sharing a cloud datacenter and second, it fails to guarantee response time for deadline-constrained services. This paper proposes the Paused Rate Monotonic (PRM) algorithm for scheduling hard real-time tasks on a MapReduce-based cloud. The scheduling performance is analyzed theoretically. We prove a bound on cluster utilization, which can be used as a sufficient condition to test whether a given task set can be scheduled. Both the theoretical analysis and experimental evaluation show that the PRM algorithm outperforms traditional real-time ones by improving the probability that a real-time task set can be scheduled on a MapReduce-based cloud.
Similar content being viewed by others
References
Berliska J, Drozdowski M (2011) Scheduling divisible MapReduce computations. J Paral Distr Comput 71(3):450–459
Bini E, Buttazzo GC (2005) Measuring the performance of schedulability tests. Real Time Syst 30(1–2):129–154
Chang H, Kodialam M, Kompella R, Lakshman TV, Lee M, Mukherjee S (2011) Scheduling in mapreduce-like systems for fast completion time. In: Proceedings of the IEEE International Conference on Computer Communications, INFOCOM’11, pp 3074–3082
Chen F, Kodialam M, Lakshman T (2012) Joint scheduling of processing and shuffle phases in mapreduce systems. In: Proceedings of the IEEE International Conference on Computer Communications, INFOCOM’12, pp 1143–1151
Chen Q, Guo M, Deng Q, Zheng L, Guo S, Shen Y (2011) Hat: history-based auto-tuning mapreduce in heterogeneous environments. J Supercomput 64(3): 1038–1054
Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
Dong X, Wang Y, Liao H (2011) Scheduling mixed real-time and non-real-time applications in mapreduce environment. In: Proceedings of the IEEE 17th International Conference on Parallel and Distributed Systems, ICPADS’11, pp 9–16
Ferguson A, Bodik P, Kandula S, Boutin E, Fonseca R (2012) Jockey: guaranteed job latency in data parallel clusters. In: Proceedings of the 7th ACM European Conference on Computer Systems, Eurosys’12, pp 99–112
He C, Lu Y, Swanson D (2013) Real-time scheduling in mapreduce clusters. In: Proceedings of IEEE the 15th International Conference on High Performance Computing and Communications, HPCC’13, pp 1536–1544
Herodotou H, Babu S (2013) A what-if engine for cost-based MapReduce optimization. IEEE Data Eng Bull 36(1):5–14
Kc K and Anyanwu K (2010). Scheduling hadoop jobs to meet deadlines. In: Proceeding of IEEE Second International Conference on Cloud Computing Technology and Science, CloudCom’10, pp 388–392
Lee K-H, Lee Y-J, Choi H, Chung YD, Moon B (2012) Parallel data processing with MapReduce: a survey. SIGMOD Record 40(4):11–20
Liu CL, Layland JW (1973) Scheduling algorithms for multiprogramming in a hard real-time environment. J Assoc Comput Mach 20(1):46–61
Morton K, Balazinska M, Grossman D (2010) Paratimer: a progress indicator for mapreduce dags. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD ’10, pp 507–518
Moseley B, Dasgupta A, Kumar R, Sarlós T (2011) On scheduling in map-reduce and flow-shops. In: Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’11, pp 289–298
Myung J, Lee SG (2013) Exploiting inter-operation parallelism for matrix chain multiplication using MapReduce. J Supercomput 66(1):594–609
Phan LT, Zhang Z, Loo BT, Lee I (2010) Real time MapReduce scheduling. Computer and Information Science, University of Pennsylvania, Pennsylvania
Polo J, Carrera D, Becerra Y, Torres J, Ayguade E, Steinder M, Whalley I (2010). Performance-driven task co-scheduling for mapreduce environments. In: Proceeding of 11th IEEE Network Operations and Management Symposium, NOMS’11, pp 373–380
Sandholm T and Lai K (2010) Dynamic proportional share scheduling in hadoop. In: Proceedings of the 15th International Conference on Job scheduling Strategies for Parallel Processing, JSSPP’10, pp 110–131
Serrano D, Bouchenak S, Kouki Y, Ledoux T, Lejeune J, Sopena J, Arantes L, Sens P et al. (2013) Towards qos-oriented sla guarantees for online cloud services. In: Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid, Computing, CCGRID2013, pp 10–18
Teng F, Yang H, Li T, Yang Y, Li Z (2013) Scheduling real-time workflow on mapreduce-based cloud. In: Proceedings of the Third International Conference on Innovative Computing Technology, INTECH’13, pp 117–122
Teng F, Yu L, Magoulès F (2011) Simmapreduce: a simulator for modeling mapreduce framework. In: Proceeding of International Conference on Multimedia and Ubiquitous, Engineering, MUE’11, pp 277–282
Verma A, Cherkasova L, Campbell RH (2011) Aria: automatic resource inference and allocation for mapreduce environments. In: Proceedings of the 8th ACM International Conference on Autonomic Computing, ICAC ’11, pp 235–244
Wang W-J, Chang Y-S, Lo W-T, Lee Y-K (2013) Adaptive scheduling for parallel tasks with qos satisfaction for hybrid cloud environments. J Supercomput 66(2):783–811
Yu L, Magoulès F (2009) Service scheduling and rescheduling in an applications integration framework. Adv Eng Softw 40(9):941–946
Zaharia M, Borthakur D, Sarma JS, Elmeleegy K, Shenker S, Stoica I (2009) Job scheduling for multi-user MapReduce clusters. Department of Electrical Engineering and Computer Sciences, University of California, Berkeley
Zaharia M, Borthakur D, Sen Sarma J, Elmeleegy K, Shenker S, Stoica I (2010) Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. In: Proceedings of the 5th European Conference on Computer systems, EuroSys ’10, pp 265–278
Zaharia M, Konwinski A, Joseph AD, Katz R, Stoica I (2008) Improving mapreduce performance in heterogeneous environments. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI’08, pp 29–42
Author information
Authors and Affiliations
Corresponding author
Additional information
This work was partially supported by the National Natural Science Foundation of China (No. 61202043), the Fundamental Research Funds for the Central Universities (No. SWJTU12CX098), and the IRT SystemX (Pôle de Compétitivité Systematic).
Rights and permissions
About this article
Cite this article
Teng, F., Magoulès, F., Yu, L. et al. A novel real-time scheduling algorithm and performance analysis of a MapReduce-based cloud. J Supercomput 69, 739–765 (2014). https://doi.org/10.1007/s11227-014-1115-z
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-014-1115-z