Abstract
Job scheduling is one of the critical issues in MapReduce processing and directly affects the performance of the Hadoop framework. Delay scheduling introduces a small delay during job scheduling to improve data locality. The delay scheduler may scan a job more than once before a deadline is reached, after which the job is scheduled regardless of locality. This repeated scanning places extra overhead on the scheduler. Moreover, a higher-priority job may get delayed. We propose an algorithm in which this load is distributed among the individual nodes. Our algorithm directs the scheduler to launch a high-priority job on a free node. The node then either executes the job locally or schedules it on some other node, depending on where the data is available. Experimental results show that the proposed algorithm outperforms Hadoop and records lower execution time.
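The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Job` fields, the `blocks_on` map from node to locally stored blocks, the `max_skips` threshold (a skip count standing in for the deadline), and both function names are assumptions made for illustration only. The first function models delay scheduling as described above; the second models the idea of handing the highest-priority job to the free node, which then runs it locally or forwards it to a node that holds the data.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int   # larger value = more urgent (assumed convention)
    block: str      # identifier of the input block the job reads (hypothetical)
    skips: int = 0  # how many times delay scheduling has passed over this job


def delay_schedule(jobs, free_node, blocks_on, max_skips):
    """Delay scheduling: skip a non-local job until it has been skipped max_skips times."""
    for job in jobs:
        if job.block in blocks_on[free_node]:
            return job, "local"
        if job.skips >= max_skips:
            return job, "non-local"
        job.skips += 1  # the scheduler will scan this job again later
    return None, "idle"


def node_side_schedule(jobs, free_node, blocks_on):
    """Give the highest-priority job to the free node, which decides where it runs."""
    if not jobs:
        return None, None
    job = max(jobs, key=lambda j: j.priority)
    if job.block in blocks_on[free_node]:
        return job, free_node
    # Assumed behaviour: forward to the first node that holds the data,
    # falling back to the free node if no node holds it.
    holders = [n for n, blocks in blocks_on.items() if job.block in blocks]
    return job, (holders[0] if holders else free_node)


blocks_on = {"n1": {"b1"}, "n2": {"b2"}}
jobs = [Job("high", 5, "b2"), Job("low", 1, "b1")]
print(delay_schedule(jobs, "n1", blocks_on, max_skips=2))
print(node_side_schedule(jobs, "n1", blocks_on))
```

In this toy run, delay scheduling skips the high-priority job once and launches the low-priority one, while the node-side variant picks the high-priority job immediately and routes it to `n2`. The sketch ignores whether the target node has a free slot, which a real scheduler would have to check.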
Acknowledgements
This research work is supported by the Department of Computer Science & Engineering, Indian School of Mines, Dhanbad, India.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Sethi, K.K., Ramesh, D. (2016). Delay Scheduling with Reduced Workload on JobTracker in Hadoop. In: Snášel, V., Abraham, A., Krömer, P., Pant, M., Muda, A. (eds) Innovations in Bio-Inspired Computing and Applications. Advances in Intelligent Systems and Computing, vol 424. Springer, Cham. https://doi.org/10.1007/978-3-319-28031-8_32
DOI: https://doi.org/10.1007/978-3-319-28031-8_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28030-1
Online ISBN: 978-3-319-28031-8
eBook Packages: Engineering (R0)