Recognizing MapReduce Straggler Tasks in Big Data Infrastructures Using Artificial Neural Networks

Journal of Grid Computing

Abstract

The MapReduce framework is used to distribute and parallelize large-scale data processing. It breaks a job into several MapReduce tasks and assigns them to different nodes. When a node performs poorly on a task, that task, called a straggler task, can prolong the execution of the whole job. Detecting nodes with weak capability and assigning their tasks to other nodes is called speculative execution. This research proposes a dynamic framework, SEWANN, for finding straggler tasks in heterogeneous environments. SEWANN uses a neural network to estimate the stage weights of task execution and, from those weights, to estimate task execution time accurately. Reducing the error in the estimated remaining execution time increases the efficiency of big data processing, which is the main purpose of this research. First, the proposed method was implemented in the open-source Hadoop software, and both estimated and actual weights were calculated. SEWANN outperformed the baseline methods SVR, decision trees, ESAMR, and LATE by 99%, 81%, 85%, and 99%, respectively. Second, SEWANN improved task execution time by 15% compared to ESAMR and by 24% compared to LATE.
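
The role of stage weights in estimating remaining execution time can be illustrated with a minimal sketch. It is not the SEWANN implementation: the weights are simply passed in (SEWANN would obtain them from its neural network), the remaining-time formula is the progress-rate heuristic used by LATE-style schedulers, and the names `TaskSnapshot`, `progress_score`, `estimated_time_left`, and `pick_straggler` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TaskSnapshot:
    """Hypothetical view of a running task; field names are illustrative."""
    task_id: str
    stage_index: int        # index of the stage currently executing
    stage_fraction: float   # fraction of the current stage completed, in [0, 1]
    elapsed_seconds: float  # time since the task started


def progress_score(task: TaskSnapshot, weights: Sequence[float]) -> float:
    """Weighted progress in [0, 1]; weights are assumed to sum to 1."""
    finished = sum(weights[:task.stage_index])
    return finished + weights[task.stage_index] * task.stage_fraction


def estimated_time_left(task: TaskSnapshot, weights: Sequence[float]) -> float:
    """Progress-rate heuristic: remaining work divided by observed rate."""
    score = progress_score(task, weights)
    if score <= 0.0 or task.elapsed_seconds <= 0.0:
        return float("inf")  # no progress observed yet, so no estimate
    rate = score / task.elapsed_seconds
    return (1.0 - score) / rate


def pick_straggler(tasks: Sequence[TaskSnapshot],
                   weights: Sequence[float]) -> str:
    """Return the id of the task with the longest estimated time left."""
    return max(tasks, key=lambda t: estimated_time_left(t, weights)).task_id


if __name__ == "__main__":
    # A task halfway through its second of three stages after 60 seconds.
    task = TaskSnapshot("r1", stage_index=1, stage_fraction=0.5,
                        elapsed_seconds=60.0)
    equal = (1 / 3, 1 / 3, 1 / 3)   # fixed, equal stage weights
    skewed = (0.6, 0.1, 0.3)        # assumed weights for a copy-heavy task
    print(estimated_time_left(task, equal))
    print(estimated_time_left(task, skewed))
```

The same snapshot yields a noticeably different estimate under the two weight vectors, which is why an inaccurate weight estimate can cause the wrong task to be selected for speculative execution.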

Author information

Corresponding author

Correspondence to Faramarz Safi-Esfahani.

About this article

Cite this article

Farhang, M., Safi-Esfahani, F. Recognizing MapReduce Straggler Tasks in Big Data Infrastructures Using Artificial Neural Networks. J Grid Computing 18, 879–901 (2020). https://doi.org/10.1007/s10723-020-09514-2

  • DOI: https://doi.org/10.1007/s10723-020-09514-2
