
HAT: history-based auto-tuning MapReduce in heterogeneous environments

Published: 01 June 2013

Abstract

In the MapReduce model, a job is divided into a series of map tasks and reduce tasks. A few slow tasks can seriously prolong the execution time of the job, especially in heterogeneous environments. To finish the slow tasks as soon as possible, current MapReduce schedulers launch a backup task on another node for each slow task. However, traditional MapReduce schedulers cannot detect slow tasks correctly because they cannot estimate the progress of tasks accurately (Hadoop home page http://hadoop.apache.org/, 2011; Zaharia et al. in 8th USENIX symposium on operating systems design and implementation, ACM, New York, pp 29-42, 2008). To solve this problem, this paper proposes a History-based Auto-Tuning (HAT) MapReduce scheduler, which calculates the progress of tasks accurately and adapts automatically to a continuously varying environment. HAT tunes the weight of each phase of a map task and a reduce task according to the values observed in historical tasks, and uses these weights to calculate the progress of current tasks. Based on this accurately calculated progress, HAT estimates the remaining time of tasks and launches backup tasks for the tasks with the longest remaining time. Experimental results show that HAT can significantly improve the performance of MapReduce applications, by up to 37% compared with Hadoop and by up to 16% compared with the LATE scheduler.
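The abstract's core loop (learn phase weights from finished tasks, turn them into a progress score, estimate the time left, and back up the task that will finish last) can be sketched in Python. This is a minimal sketch under stated assumptions, not the paper's implementation: the averaging rule in `tune_weights`, the `RunningTask` record, and all function names are hypothetical, and the remaining-time formula is the LATE-style estimate (1 - score) / (score / elapsed). HAT's actual tuning rule and backup heuristics may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence


def tune_weights(history: Sequence[Sequence[float]]) -> List[float]:
    """Derive per-phase weights from finished (history) tasks.

    Each item in `history` holds one completed task's per-phase durations,
    e.g. (copy, sort, reduce) seconds for a reduce task. ASSUMPTION: each
    weight is the mean fraction of time spent in that phase; HAT's exact
    update rule may differ.
    """
    if not history:
        raise ValueError("need at least one history task")
    n_phases = len(history[0])
    sums = [0.0] * n_phases
    for durations in history:
        total = sum(durations)
        for i, d in enumerate(durations):
            sums[i] += d / total
    return [s / len(history) for s in sums]


def progress_score(weights: Sequence[float], phase: int, phase_progress: float) -> float:
    """Finished phases count fully; the running phase counts in proportion."""
    return sum(weights[:phase]) + weights[phase] * phase_progress


@dataclass
class RunningTask:  # hypothetical task record, not a Hadoop class
    task_id: str
    phase: int             # index of the phase currently executing
    phase_progress: float  # fraction of the current phase completed, in [0, 1]
    elapsed: float         # seconds since the task started


def remaining_time(task: RunningTask, weights: Sequence[float]) -> float:
    """LATE-style estimate: time left = (1 - score) / (score / elapsed)."""
    score = progress_score(weights, task.phase, task.phase_progress)
    if score <= 0.0 or task.elapsed <= 0.0:
        return float("inf")  # no progress or elapsed time observed yet
    rate = score / task.elapsed
    return (1.0 - score) / rate


def pick_backup(tasks: Sequence[RunningTask], weights: Sequence[float]) -> str:
    """Choose the running task with the longest estimated remaining time."""
    return max(tasks, key=lambda t: remaining_time(t, weights)).task_id


if __name__ == "__main__":
    # Hypothetical (copy, sort, reduce) durations of two finished reduce tasks.
    weights = tune_weights([[60.0, 10.0, 30.0], [50.0, 10.0, 40.0]])
    running = [RunningTask("r1", 1, 0.5, 60.0), RunningTask("r2", 0, 0.5, 60.0)]
    print(pick_backup(running, weights))  # prints "r2"
```

With these made-up durations the copy phase receives a weight of about 0.55 instead of the fixed 1/3 that Hadoop's default estimator assigns to each reduce phase. Task r2, still copying after 60 s, is therefore estimated at roughly 158 s remaining versus 40 s for r1, and it is the one chosen for a backup.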

References

[1] Aboulnaga A, Wang Z, Zhang ZY (2009) Packing the most onto your cloud. In: Proceedings of the first international workshop on Cloud data management. ACM, New York, pp 25-28.
[2] Barroso LA, Dean J, Hölzle U (2003) Web search for a planet: the Google cluster architecture. IEEE Micro 23(2):22-28.
[3] Buyya R, Yeo CS, Venugopal S, Broberg J, Brandic I (2009) Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility. Future Gener Comput Syst 25(6):599-616.
[4] Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, Chandra T, Fikes A, Gruber RE (2006) Bigtable: a distributed storage system for structured data. In: Proceedings of the 7th USENIX symposium on operating systems design and implementation (OSDI 2006).
[5] Chen R, Chen H, Zang B (2010) Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling. In: Proceedings of the 19th international conference on parallel architectures and compilation techniques. ACM, New York, pp 523-534.
[6] De Kruijf M, Sankaralingam K (2010) MapReduce for the cell broadband engine architecture. IBM J Res Dev 53(5):10.
[7] Dean J, Ghemawat S (2010) MapReduce: a flexible data processing tool. Commun ACM 53(1):72-77.
[8] Dean J, Ghemawat S (2004) MapReduce: simplified data processing on large clusters. In: OSDI 2004: proceedings of the 6th symposium on operating system design and implementation. ACM Press, New York, pp 137-150.
[9] Elespuru P, Shakya S, Mishra S (2009) MapReduce system over heterogeneous mobile devices. In: Software technologies for embedded and ubiquitous systems, pp 168-179.
[10] Fang W, He B, Luo Q, Govindaraju NK (2010) Mars: accelerating MapReduce with graphics processors. IEEE Trans Parallel Distrib Syst.
[11] Fischer MJ, Su X, Yin Y (2010) Assigning tasks for efficiency in Hadoop. In: Proceedings of the 22nd ACM symposium on parallelism in algorithms and architectures. ACM, New York, pp 30-39.
[12] Ghemawat S, Gobioff H, Leung S-T (2003) The Google file system. In: SOSP 2003: proceedings of the 9th ACM symposium on operating systems principles. ACM, New York, pp 29-43.
[13] Hadoop (2011) Hadoop home page. http://hadoop.apache.org/
[14] Jiang W, Ravi VT, Agrawal G (2010) A map-reduce system with an alternate API for multi-core environments. In: 2010 10th IEEE/ACM international conference on cluster, cloud and grid computing. IEEE Press, New York, pp 84-93.
[15] Morton K, Balazinska M, Grossman D (2010) ParaTimer: a progress indicator for MapReduce DAGs. In: Proceedings of the 2010 international conference on management of data. ACM, New York, pp 507-518.
[16] Polo J, Carrera D, Becerra Y, Torres J, Ayguadé E, Steinder M, Whalley I (2010) Performance management of accelerated MapReduce workloads in heterogeneous clusters. In: 39th international conference on parallel processing (ICPP 2010). San Diego, CA, USA.
[17] Rafique MM, Rose B, Butt AR, Nikolopoulos DS (2009) CellMR: a framework for supporting MapReduce on asymmetric cell-based clusters. In: IEEE international symposium on parallel & distributed processing (IPDPS 2009). IEEE Press, New York, pp 1-12.
[18] Ranger C, Raghuraman R, Penmetsa A, Bradski G, Kozyrakis C (2007) Evaluating MapReduce for multi-core and multiprocessor systems. In: HPCA 2007: proceedings of the 2007 IEEE 13th international symposium on high performance computer architecture. IEEE Computer Society, Washington, DC, pp 13-24.
[19] Sandholm T, Lai K (2010) Dynamic proportional share scheduling in Hadoop. In: Job scheduling strategies for parallel processing. Springer, Berlin, pp 110-131.
[20] Schatz MC (2009) CloudBurst: highly sensitive read mapping with MapReduce. Bioinformatics 25(11):1363.
[21] Shan Y, Wang B, Yan J, Wang Y, Xu N, Yang H (2010) FPMR: MapReduce framework on FPGA. In: Proceedings of the 18th annual ACM/SIGDA international symposium on field programmable gate arrays. ACM, New York, pp 93-102.
[22] Tian C, Zhou H, He Y, Zha L (2009) A dynamic MapReduce scheduler for heterogeneous workloads. In: Proceedings of the 2009 eighth international conference on grid and cooperative computing. IEEE Computer Society, Los Alamitos, pp 218-224.
[23] Vaquero LM, Rodero-Merino L, Caceres J, Lindner M (2008) A break in the clouds: towards a cloud definition. Comput Commun Rev 39(1):50-55.
[24] Varia J (2008) Cloud architectures. White paper of Amazon. jineshvaria.s3.amazonaws.com/public/cloudarchitectures-varia.pdf
[25] Yahoo (2011) Yahoo! Hadoop tutorial. http://developer.yahoo.com/hadoop/tutorial/
[26] Yoo RM, Romano A, Kozyrakis C (2009) Phoenix rebirth: scalable MapReduce on a large-scale shared-memory system. In: IEEE international symposium on workload characterization (IISWC 2009). IEEE Press, New York, pp 198-207.
[27] Zaharia M, Borthakur D, Sarma JS, Elmeleegy K, Shenker S, Stoica I (2009) Job scheduling for multi-user MapReduce clusters. Technical report UCB/EECS-2009-55, University of California at Berkeley.
[28] Zaharia M, Konwinski A, Joseph AD, Katz R, Stoica I (2008) Improving MapReduce performance in heterogeneous environments. In: 8th USENIX symposium on operating systems design and implementation. ACM, New York, pp 29-42.



Published In

The Journal of Supercomputing, Volume 64, Issue 3
June 2013
514 pages

Publisher

Kluwer Academic Publishers

United States


Author Tags

  1. Heterogeneous environments
  2. History-based auto-tuning
  3. MapReduce
  4. Scheduling algorithm


Cited By

  • (2022) Early straggler tasks detection by recurrent neural network in a heterogeneous environment. Applied Intelligence 53:7 (7369-7389). doi:10.1007/s10489-022-03837-1. Online publication date: 22-Jul-2022
  • (2018) A case study of spark resource configuration and management for image processing applications. Proceedings of the 28th Annual International Conference on Computer Science and Software Engineering (18-29). doi:10.5555/3291291.3291295. Online publication date: 29-Oct-2018
  • (2018) rTuner. Proceedings of the 10th International Conference on Computer Modeling and Simulation (176-183). doi:10.1145/3177457.3191710. Online publication date: 8-Jan-2018
  • (2017) Empirical Study of Job Scheduling Algorithms in Hadoop MapReduce. Cybernetics and Information Technologies 17:1 (146-163). doi:10.1515/cait-2017-0012. Online publication date: 1-Mar-2017
  • (2017) Performance Improvement of MapReduce for Heterogeneous Clusters Based on Efficient Locality and Replica Aware Scheduling (ELRAS) Strategy. Wireless Personal Communications: An International Journal 95:3 (2709-2733). doi:10.1007/s11277-017-3953-5. Online publication date: 1-Aug-2017
  • (2016) MapReduce Parallel Programming Model. International Journal of Parallel Programming 44:4 (832-866). doi:10.1007/s10766-015-0395-0. Online publication date: 1-Aug-2016
  • (2015) Adoop. Proceedings of the 25th Annual International Conference on Computer Science and Software Engineering (26-34). doi:10.5555/2886444.2886449. Online publication date: 2-Nov-2015
  • (2015) Classification Framework of MapReduce Scheduling Algorithms. ACM Computing Surveys 47:3 (1-38). doi:10.1145/2693315. Online publication date: 16-Apr-2015
  • (2014) Adaptive workload-aware task scheduling for single-ISA asymmetric multicore architectures. ACM Transactions on Architecture and Code Optimization 11:1 (1-25). doi:10.1145/2579674. Online publication date: 1-Feb-2014
  • (2014) A novel real-time scheduling algorithm and performance analysis of a MapReduce-based cloud. The Journal of Supercomputing 69:2 (739-765). doi:10.1007/s11227-014-1115-z. Online publication date: 1-Aug-2014
