[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article

A novel real-time scheduling algorithm and performance analysis of a MapReduce-based cloud

Published: 01 August 2014 Publication History

Abstract

MapReduce, a popular programming model for processing data-intensive tasks, has achieved great success in a wide range of applications such as search indexing, social network mining, collaborative recommendation, and spam detection. However, the ability of MapReduce is limited in two respects by its default schedulers. First, it does not support concurrent services sharing a cloud datacenter and second, it fails to guarantee response time for deadline-constrained services. This paper proposes the Paused Rate Monotonic (PRM) algorithm for scheduling hard real-time tasks on a MapReduce-based cloud. The scheduling performance is analyzed theoretically. We prove a bound on cluster utilization, which can be used as a sufficient condition to test whether a given task set can be scheduled. Both the theoretical analysis and experimental evaluation show that the PRM algorithm outperforms traditional real-time ones by improving the probability that a real-time task set can be scheduled on a MapReduce-based cloud.

References

[1]
Berliska J, Drozdowski M (2011) Scheduling divisible MapReduce computations. J Paral Distr Comput 71(3):450---459
[2]
Bini E, Buttazzo GC (2005) Measuring the performance of schedulability tests. Real Time Syst 30(1---2):129---154
[3]
Chang H, Kodialam M, Kompella R, Lakshman TV, Lee M, Mukherjee S (2011) Scheduling in mapreduce-like systems for fast completion time. In: Proceedings of the IEEE International Conference on Computer Communications, INFOCOM'11, pp 3074---3082
[4]
Chen F, Kodialam M, Lakshman T (2012) Joint scheduling of processing and shuffle phases in mapreduce systems. In: Proceedings of the IEEE International Conference on Computer Communications, INFOCOM'12, pp 1143---1151
[5]
Chen Q, Guo M, Deng Q, Zheng L, Guo S, Shen Y (2011) Hat: history-based auto-tuning mapreduce in heterogeneous environments. J Supercomput 64(3): 1038---1054
[6]
Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107---113
[7]
Dong X, Wang Y, Liao H (2011) Scheduling mixed real-time and non-real-time applications in mapreduce environment. In: Proceedings of the IEEE 17th International Conference on Parallel and Distributed Systems, ICPADS'11, pp 9---16
[8]
Ferguson A, Bodik P, Kandula S, Boutin E, Fonseca R (2012) Jockey: guaranteed job latency in data parallel clusters. In: Proceedings of the 7th ACM European Conference on Computer Systems, Eurosys'12, pp 99---112
[9]
He C, Lu Y, Swanson D (2013) Real-time scheduling in mapreduce clusters. In: Proceedings of IEEE the 15th International Conference on High Performance Computing and Communications, HPCC'13, pp 1536---1544
[10]
Herodotou H, Babu S (2013) A what-if engine for cost-based MapReduce optimization. IEEE Data Eng Bull 36(1):5---14
[11]
Kc K and Anyanwu K (2010). Scheduling hadoop jobs to meet deadlines. In: Proceeding of IEEE Second International Conference on Cloud Computing Technology and Science, CloudCom'10, pp 388---392
[12]
Lee K-H, Lee Y-J, Choi H, Chung YD, Moon B (2012) Parallel data processing with MapReduce: a survey. SIGMOD Record 40(4):11---20
[13]
Liu CL, Layland JW (1973) Scheduling algorithms for multiprogramming in a hard real-time environment. J Assoc Comput Mach 20(1):46---61
[14]
Morton K, Balazinska M, Grossman D (2010) Paratimer: a progress indicator for mapreduce dags. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD '10, pp 507---518
[15]
Moseley B, Dasgupta A, Kumar R, Sarlós T (2011) On scheduling in map-reduce and flow-shops. In: Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '11, pp 289---298
[16]
Myung J, Lee SG (2013) Exploiting inter-operation parallelism for matrix chain multiplication using MapReduce. J Supercomput 66(1):594---609
[17]
Phan LT, Zhang Z, Loo BT, Lee I (2010) Real time MapReduce scheduling. Computer and Information Science, University of Pennsylvania, Pennsylvania
[18]
Polo J, Carrera D, Becerra Y, Torres J, Ayguade E, Steinder M, Whalley I (2010). Performance-driven task co-scheduling for mapreduce environments. In: Proceeding of 11th IEEE Network Operations and Management Symposium, NOMS'11, pp 373---380
[19]
Sandholm T and Lai K (2010) Dynamic proportional share scheduling in hadoop. In: Proceedings of the 15th International Conference on Job scheduling Strategies for Parallel Processing, JSSPP'10, pp 110---131
[20]
Serrano D, Bouchenak S, Kouki Y, Ledoux T, Lejeune J, Sopena J, Arantes L, Sens P et al. (2013) Towards qos-oriented sla guarantees for online cloud services. In: Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid, Computing, CCGRID2013, pp 10---18
[21]
Teng F, Yang H, Li T, Yang Y, Li Z (2013) Scheduling real-time workflow on mapreduce-based cloud. In: Proceedings of the Third International Conference on Innovative Computing Technology, INTECH'13, pp 117---122
[22]
Teng F, Yu L, Magoulès F (2011) Simmapreduce: a simulator for modeling mapreduce framework. In: Proceeding of International Conference on Multimedia and Ubiquitous, Engineering, MUE'11, pp 277---282
[23]
Verma A, Cherkasova L, Campbell RH (2011) Aria: automatic resource inference and allocation for mapreduce environments. In: Proceedings of the 8th ACM International Conference on Autonomic Computing, ICAC '11, pp 235---244
[24]
Wang W-J, Chang Y-S, Lo W-T, Lee Y-K (2013) Adaptive scheduling for parallel tasks with qos satisfaction for hybrid cloud environments. J Supercomput 66(2):783---811
[25]
Yu L, Magoulès F (2009) Service scheduling and rescheduling in an applications integration framework. Adv Eng Softw 40(9):941---946
[26]
Zaharia M, Borthakur D, Sarma JS, Elmeleegy K, Shenker S, Stoica I (2009) Job scheduling for multi-user MapReduce clusters. Department of Electrical Engineering and Computer Sciences, University of California, Berkeley
[27]
Zaharia M, Borthakur D, Sen Sarma J, Elmeleegy K, Shenker S, Stoica I (2010) Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. In: Proceedings of the 5th European Conference on Computer systems, EuroSys '10, pp 265---278
[28]
Zaharia M, Konwinski A, Joseph AD, Katz R, Stoica I (2008) Improving mapreduce performance in heterogeneous environments. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI'08, pp 29---42

Cited By

View all
  • (2023)MapReduce scheduling algorithms in Hadoop: a systematic studyJournal of Cloud Computing: Advances, Systems and Applications10.1186/s13677-023-00520-912:1Online publication date: 10-Oct-2023
  • (2020)Allocating MapReduce workflows with deadlines to heterogeneous servers in a cloud data centerService Oriented Computing and Applications10.1007/s11761-020-00290-114:2(101-118)Online publication date: 1-Jun-2020
  • (2019)A systematic literature review on MapReduce scheduling methodsIntelligent Decision Technologies10.3233/IDT-19036313:1(1-21)Online publication date: 1-Jan-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image The Journal of Supercomputing
The Journal of Supercomputing  Volume 69, Issue 2
August 2014
497 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 August 2014

Author Tags

  1. Cloud computing
  2. Cluster utilization
  3. MapReduce
  4. Real-time scheduling
  5. Schedulability bound

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 17 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2023)MapReduce scheduling algorithms in Hadoop: a systematic studyJournal of Cloud Computing: Advances, Systems and Applications10.1186/s13677-023-00520-912:1Online publication date: 10-Oct-2023
  • (2020)Allocating MapReduce workflows with deadlines to heterogeneous servers in a cloud data centerService Oriented Computing and Applications10.1007/s11761-020-00290-114:2(101-118)Online publication date: 1-Jun-2020
  • (2019)A systematic literature review on MapReduce scheduling methodsIntelligent Decision Technologies10.3233/IDT-19036313:1(1-21)Online publication date: 1-Jan-2019
  • (2018)Multi-queue scheduling of heterogeneous jobs in hybrid geo-distributed cloud environmentThe Journal of Supercomputing10.5555/3288339.328838074:10(5263-5292)Online publication date: 1-Oct-2018
  • (2017)A new method for the informed discovery of resources in the grid system using particle swarm optimization algorithm (RDT_PSO)The Journal of Supercomputing10.1007/s11227-017-2090-y73:12(5354-5377)Online publication date: 1-Dec-2017
  • (2015)Improving the predictability of distributed stream processorsFuture Generation Computer Systems10.1016/j.future.2015.03.02352:C(22-36)Online publication date: 1-Nov-2015
  • (2015)Context-aware scheduling in MapReduceConcurrency and Computation: Practice & Experience10.1002/cpe.357827:17(5332-5349)Online publication date: 10-Dec-2015

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media