Abstract
The MapReduce framework is used to distribute and parallelize large-scale data processing. It breaks a job into several map and reduce tasks and assigns them to different nodes. Poor performance of a node in executing a task can prolong the execution of the whole job; such a task is called a straggler task. Detecting nodes with weak capability and assigning their tasks to other nodes is called speculative execution. This research proposes a dynamic framework, SEWANN, to find straggler tasks in heterogeneous environments. SEWANN uses a neural network to estimate the weights of the task execution stages and, from them, to estimate task execution time accurately. Reducing the error in the estimated remaining execution time increases the efficiency of big data processing, which is the main purpose of this research. First, the proposed method was implemented in the open-source Hadoop software, and both estimated and actual weights were calculated. SEWANN outperformed the baseline methods SVR, Decision Trees, ESAMR, and LATE by 99%, 81%, 85%, and 99%, respectively. Second, SEWANN improved task execution time by 15% compared to the baseline method ESAMR and by 24% compared to LATE.
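To illustrate why stage weights matter for the remaining-time estimate, the following is a minimal Python sketch, not the SEWANN implementation. It assumes a task passes through a fixed sequence of stages, that each stage has a weight (the fraction of total run time it is expected to take), and that the remaining time is extrapolated from the weighted progress so far, in the style of progress-rate estimators such as LATE. The function names, the three-stage layout, and the weight values are hypothetical; in SEWANN the weights would come from the neural network rather than being hand-picked.

```python
from typing import Sequence


def progress_score(weights: Sequence[float], stage_index: int,
                   stage_progress: float) -> float:
    """Weighted progress in [0, 1].

    Finished stages count with their full weight; the current stage
    counts with its weight scaled by its own progress (0..1).
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("stage weights must sum to 1")
    if not 0 <= stage_index < len(weights):
        raise ValueError("stage_index out of range")
    if not 0.0 <= stage_progress <= 1.0:
        raise ValueError("stage_progress must be in [0, 1]")
    finished = sum(weights[:stage_index])
    return finished + weights[stage_index] * stage_progress


def remaining_time(weights: Sequence[float], stage_index: int,
                   stage_progress: float, elapsed: float) -> float:
    """Extrapolate the remaining run time from the progress rate so far.

    Assumption: the task keeps advancing at its average rate
    (score / elapsed), so remaining = elapsed * (1 - score) / score.
    """
    score = progress_score(weights, stage_index, stage_progress)
    if score <= 0.0:
        return float("inf")  # no progress yet, nothing to extrapolate from
    return elapsed * (1.0 - score) / score


# Hypothetical task: halfway through the second of three stages after 30 s.
equal_weights = (1 / 3, 1 / 3, 1 / 3)   # fixed, equal weights
tuned_weights = (0.25, 0.25, 0.5)       # illustrative "estimated" weights

estimate_equal = remaining_time(equal_weights, 1, 0.5, elapsed=30.0)
estimate_tuned = remaining_time(tuned_weights, 1, 0.5, elapsed=30.0)
print(estimate_equal, estimate_tuned)
```

With the same observed progress, the two weight vectors give clearly different remaining-time estimates, which is why an error in the stage weights translates directly into an error in deciding which tasks are stragglers.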
Cite this article
Farhang, M., Safi-Esfahani, F. Recognizing MapReduce Straggler Tasks in Big Data Infrastructures Using Artificial Neural Networks. J Grid Computing 18, 879–901 (2020). https://doi.org/10.1007/s10723-020-09514-2
DOI: https://doi.org/10.1007/s10723-020-09514-2