DOI: 10.1145/2534645.2534651
research-article

On the core affinity and file upload performance of Hadoop

Published: 18 November 2013

Abstract

The MapReduce programming model was introduced for big-data processing, where data nodes perform both data storage and computation. We therefore need to understand the different resource requirements of data-storage and computation tasks and schedule them efficiently over multi-core processors. Core affinity defines the mapping between a set of cores and a given task. It can be chosen based on the resource requirements of a task, because this choice largely affects the efficiency of computation, memory, and I/O resource utilization. In this paper, we analyze the impact of core affinity on the file upload performance of the Hadoop Distributed File System (HDFS). Our study provides insight into process scheduling issues on big-data processing systems. We also suggest a framework for dynamic core affinity based on our observations and show that a preliminary implementation can improve throughput by more than 40% compared with the default Linux system.
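
To make the notion of core affinity concrete, the following is a minimal sketch of pinning a process to a set of cores on Linux using Python's standard `os.sched_setaffinity`. It is not the framework proposed in the paper; the helper name `pin_to_cores` is hypothetical, and the choice of which core to pin to is an arbitrary assumption made for illustration.

```python
import os


def pin_to_cores(cores, pid=0):
    """Restrict process `pid` (0 means the calling process) to `cores`.

    Hypothetical helper for illustration only; Linux-specific.
    Returns the affinity set in effect after the call.
    """
    allowed = os.sched_getaffinity(pid)   # cores the process may currently use
    requested = set(cores) & allowed      # drop cores that are not available to us
    if not requested:
        raise ValueError("none of the requested cores are available")
    os.sched_setaffinity(pid, requested)  # apply the new core affinity
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    original = os.sched_getaffinity(0)
    first = min(original)                 # arbitrary choice: lowest-numbered core
    print("pinned to:", pin_to_cores([first]))
    os.sched_setaffinity(0, original)     # restore the original affinity
```

A dynamic scheme of the kind the abstract describes would presumably adjust such a mapping at run time according to each task's resource needs; the sketch only shows the static building block.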

Cited By

  • (2018) A Survey of End-System Optimizations for High-Speed Networks. ACM Computing Surveys 51:3 (1-36). DOI: 10.1145/3184899. Online publication date: 16-Jul-2018
  • (2014) Dynamic core affinity for high-performance file upload on Hadoop Distributed File System. Parallel Computing 40:10 (722-737). DOI: 10.1016/j.parco.2014.07.005. Online publication date: 1-Dec-2014

Published In

DISCS-2013: Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems
November 2013
66 pages
ISBN: 9781450325066
DOI: 10.1145/2534645
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Hadoop distributed file system
  2. affinity
  3. big-data
  4. multi-core
  5. process scheduling

Qualifiers

  • Research-article

Conference

SC13

Acceptance Rates

DISCS-2013 Paper Acceptance Rate 10 of 19 submissions, 53%;
Overall Acceptance Rate 19 of 34 submissions, 56%
