DOI: 10.1145/2638404.2737594
Article

Impact of Data Transfer to Hadoop Job Performance: Architectural Analysis and Experiment

Published: 28 March 2014

Abstract

Hadoop is a distributed processing platform for analyzing large amounts of data. It uses the MapReduce framework for distributed parallel processing, in which each reduce computation uses all the relevant map results produced on other nodes and therefore requires massive data transfer between nodes. This paper explores how this interdependence between nodes affects job performance in a Hadoop cluster and clarifies the mechanism by which job performance deteriorates. For this purpose, we built two kinds of experimental Hadoop clusters, one using real machines in our laboratory and one using virtual machines on Amazon EC2, and tracked the progress of tasks that proceed in parallel. We found that a delay in one task, caused by congestion in disk I/O or data transfer, propagates to other tasks and significantly degrades overall job performance. Furthermore, we found that speculative task execution has adverse effects when the task delay is caused by disk I/O.
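
The dependency described in the abstract can be illustrated with a toy model. This is only a sketch under simplifying assumptions and is not the authors' experimental method: all map tasks start at time 0 in parallel, every reducer needs output from every mapper, and transfer time is ignored. The function names and the task durations are hypothetical.

```python
def reduce_finish_times(map_times, reduce_times):
    """Return the finish time of each reducer in a toy MapReduce job.

    Assumption: a reducer cannot complete before it has fetched output
    from every mapper, so it is gated by the slowest map task.
    """
    shuffle_ready = max(map_times)  # time at which the last map output exists
    return [shuffle_ready + r for r in reduce_times]


def makespan(map_times, reduce_times):
    """Overall job completion time in the toy model."""
    return max(reduce_finish_times(map_times, reduce_times))


# Hypothetical durations in arbitrary time units.
baseline = makespan([10, 10, 10, 10], [5, 5])
one_slow_map = makespan([10, 10, 10, 25], [5, 5])
print(baseline, one_slow_map)  # prints "15 30"
```

In this model a single map task that runs 15 units longer delays every reducer by the same 15 units, which is the kind of delay propagation the abstract reports.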

Cited By

  • (2015) System Status Aware Hadoop Scheduling Methods for Job Performance Improvement. IEICE Transactions on Information and Systems, E98.D:7 (1275-1285). DOI: 10.1587/transinf.2014EDP7385. Online publication date: 2015

Published In

ACM Other conferences
ACMSE '14: Proceedings of the 2014 ACM Southeast Conference
March 2014
265 pages
ISBN:9781450329231
DOI:10.1145/2638404
  • Conference Chair: Ken Hoganson
  • Program Chair: Selena He
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Data-intensive applications
  2. MapReduce
  3. Task Scheduling

Qualifiers

  • Article

Conference

ACM SE '14: ACM Southeast Regional Conference 2014
March 28 - 29, 2014
Kennesaw, Georgia

Acceptance Rates

Overall Acceptance Rate 502 of 1,023 submissions, 49%


