research-article

Analysis of the effect of core affinity on high-throughput flows

Authors:

Nathan Hanford,

Matthew Farrens,

Brian Tierney

NDM '14: Proceedings of the Fourth International Workshop on Network-Aware Data Management

Pages 9 - 15

Published: 16 November 2014

Abstract

Network throughput is scaling up to higher data rates while end-system processors are scaling out to multiple cores. To optimize high-speed data transfer into multicore end systems, techniques such as network adapter offloads and performance tuning have received a great deal of attention. Furthermore, several methods of multithreading the network receive process have been proposed. However, attention thus far has focused on how to set the tuning parameters and which offloads to select for higher performance, and little has been done to understand why the settings do (or do not) work. In this paper we build on previous research to track down the source(s) of the end-system bottleneck for high-speed TCP flows. For the purposes of this paper, we consider protocol processing efficiency to be the amount of system resources used (such as CPU and cache) per unit of achieved throughput (in Gbps). The amounts of the various system resources consumed are measured using low-level system event counters. Affinitization, or core binding, is the decision about which processor cores on an end system are responsible for interrupt, network, and application processing. We conclude that affinitization has a significant impact on protocol processing efficiency, and that the performance bottleneck of the network receive process changes drastically across three distinct affinitization scenarios.
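As a rough illustration of the two ideas defined in the abstract, the sketch below binds a process to a single core and normalizes raw event counts by achieved throughput. It is not the authors' tooling: the function names `pin_to_core` and `cost_per_gbps`, the counter names, and the sample numbers are all hypothetical, and the pinning call assumes a Linux host where Python exposes `os.sched_setaffinity`.

```python
import os


def pin_to_core(core_id):
    """Bind the calling process to one core; return True if the call was made.

    Assumption: Linux. os.sched_setaffinity is not available on every
    platform, and it raises OSError if core_id is not a usable CPU.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False  # platform does not expose this call
    os.sched_setaffinity(0, {core_id})  # pid 0 means the calling process
    return True


def cost_per_gbps(counters, throughput_gbps):
    """Divide each raw event count by the achieved throughput in Gbps.

    A lower value means fewer resources were spent per unit of throughput,
    which matches the abstract's notion of protocol processing efficiency.
    """
    if throughput_gbps <= 0:
        raise ValueError("throughput_gbps must be positive")
    return {name: value / throughput_gbps for name, value in counters.items()}


# Hypothetical counter readings for one transfer (not measured data).
sample = {"cpu_cycles": 4.0e9, "llc_misses": 2.0e6}
normalized = cost_per_gbps(sample, throughput_gbps=20.0)
```

Comparing `normalized` across runs that differ only in which cores handle interrupt and application processing is one way to see whether a given binding spends more or fewer resources per Gbps.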

Cited By

N. Hanford, V. Ahuja, M. Farrens, B. Tierney, and D. Ghosal (2018). A Survey of End-System Optimizations for High-Speed Networks. ACM Computing Surveys 51:3 (1-36). DOI: 10.1145/3184899. Online publication date: 16-Jul-2018.
https://dl.acm.org/doi/10.1145/3184899

M. Rashti, G. Sabin, and R. Kettimuthu (2016). Long-haul secure data transfer using hardware-assisted GridFTP. Future Generation Computer Systems 56:C (265-276). DOI: 10.1016/j.future.2015.09.014. Online publication date: 1-Mar-2016.
https://dl.acm.org/doi/10.1016/j.future.2015.09.014

Information & Contributors

Information

Published In

NDM '14: Proceedings of the Fourth International Workshop on Network-Aware Data Management

November 2014

37 pages

ISBN:9781479970193

General Chairs:
Mehmet Balman (VMWare Inc. & Lawrence Berkeley National Laboratory),
Surendra Byna (Lawrence Berkeley National Laboratory),
Brian L. Tierney (Energy Sciences Network & Lawrence Berkeley National Laboratory)

Sponsors

SIGHPC: ACM Special Interest Group on High Performance Computing
IEEE: IEEE Computer Society Technical Committee on Design Automation
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

IEEE Press

Conference

SC '14

SC '14: International Conference for High Performance Computing, Networking, Storage and Analysis

November 16 - 21, 2014

New Orleans, Louisiana

Acceptance Rates

Overall Acceptance Rate 14 of 23 submissions, 61%
