DOI: 10.1145/2396556.2396564
research-article

Cache-aware affinitization on commodity multicores for high-speed network flows

Published: 29 October 2012

Abstract

For a given TCP or UDP flow, protocol processing of incoming packets is performed on the core that receives the interrupt, while the user-space application that consumes the data may run on the same core or on a different one. If the cores differ, additional costs from context switches, cache misses, and the movement of data between the cores' caches may occur. The magnitude of this cost depends on the processor affinity of the user-space process relative to the network stack. In this paper we present a prototype implementation of a tool that lets application processing and protocol processing occur on cores that share the lowest cache level. The Cache-Aware Affinity Daemon (CAAD) analyzes the topology of the die and the characteristics of the NIC, and conveys information to the sender that allows the entire end-to-end path of each new flow to be managed and controlled. This is done in a lightweight manner for both uni- and bi-directional flows. Measurements show that, for bulk data transfers on commodity multicore machines, CAAD improves overall TCP throughput by as much as 31% and reduces the cache miss rate by as much as 37.5%. Combined with CAAD, GridFTP reduces the download time for large file transfers by up to 18%.
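The local placement step described above (running the consuming application on a core that shares the lowest cache level with the core servicing the NIC interrupt) can be sketched on Linux as follows. This is a minimal illustration under stated assumptions, not CAAD's implementation: it assumes the kernel exposes cache topology under `/sys/devices/system/cpu/cpuN/cache/indexK/` (files `level`, `type`, `shared_cpu_list`), that the interrupt core is already known (for example from `/proc/irq/<n>/smp_affinity_list`), and that "lowest cache level" means the lowest-level data or unified cache shared with at least one other core. All function names and the tie-breaking policy are hypothetical, and the sender-side coordination and NIC analysis that CAAD also performs are not modeled.

```python
import os
from pathlib import Path

# Assumption: standard Linux sysfs layout for per-CPU cache topology.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_cpu_list(text: str) -> set[int]:
    """Parse a kernel CPU list such as '0-3,8,10-11' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def read_caches(cpu: int, root: Path = SYSFS_CPU) -> list[tuple[int, str, str]]:
    """Return (level, type, shared_cpu_list) for every cache visible to `cpu`."""
    caches = []
    for index in sorted((root / f"cpu{cpu}" / "cache").glob("index*")):
        caches.append((
            int((index / "level").read_text()),
            (index / "type").read_text().strip(),
            (index / "shared_cpu_list").read_text().strip(),
        ))
    return caches


def lowest_shared_cache_peers(cpu: int, caches: list[tuple[int, str, str]]) -> set[int]:
    """Other CPUs sharing the lowest-level data/unified cache that `cpu`
    shares with anyone; an empty set if every cache is private."""
    for level, ctype, shared in sorted(caches):
        if ctype == "Instruction":
            continue  # instruction caches never hold packet or socket data
        peers = parse_cpu_list(shared) - {cpu}
        if peers:
            return peers
    return set()


def pick_app_core(irq_cpu: int, caches: list[tuple[int, str, str]]) -> int:
    """Hypothetical policy: lowest-numbered cache peer of the IRQ core,
    falling back to the IRQ core itself when no cache is shared."""
    peers = lowest_shared_cache_peers(irq_cpu, caches)
    return min(peers) if peers else irq_cpu


def pin_caller_near_irq(irq_cpu: int) -> int:
    """Pin the caller to a core sharing cache with the interrupt core (Linux only)."""
    core = pick_app_core(irq_cpu, read_caches(irq_cpu))
    os.sched_setaffinity(0, {core})  # pid 0 refers to the caller
    return core
```

For example, on a host whose sysfs reports a private L1 for CPU 0, an L2 shared by CPUs 0-1, and an L3 shared by CPUs 0-3, `pick_app_core(0, read_caches(0))` selects CPU 1. Keeping the topology parsing separate from the selection policy makes the policy easy to test without a real sysfs tree, and makes it straightforward to swap in a different placement rule.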

        Information

        Published In

        ANCS '12: Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems
        October 2012, 270 pages
        ISBN: 9781450316859
        DOI: 10.1145/2396556
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. cache affinity
        2. high-speed networks
        3. processor affinity
        4. receive livelock

        Qualifiers

        • Research-article

        Conference

        ANCS '12

        Acceptance Rates

        Overall Acceptance Rate 88 of 314 submissions, 28%

        Bibliometrics

        • Downloads (last 12 months): 5
        • Downloads (last 6 weeks): 1
        Reflects downloads up to 14 Dec 2024

        Cited By

        • (2021) NUMA-aware I/O System Call Steering. 2021 IEEE International Conference on Cluster Computing (CLUSTER), pp. 805-806. DOI: 10.1109/Cluster48925.2021.00077. Online publication date: Sep-2021
        • (2019) Advising Big Data Transfer Over Dedicated Connections Based on Profiling Optimization. IEEE/ACM Transactions on Networking 27(6), pp. 2280-2293. DOI: 10.1109/TNET.2019.2943884. Online publication date: Dec-2019
        • (2018) Stochastic Approximation-Based Transport Profiling for Big Data Movement Over Dedicated Connections. In Stochastic Methods for Estimation and Problem Solving in Engineering, pp. 113-138. DOI: 10.4018/978-1-5225-5045-7.ch005. Online publication date: 2018
        • (2018) A Survey of End-System Optimizations for High-Speed Networks. ACM Computing Surveys 51(3), pp. 1-36. DOI: 10.1145/3184899. Online publication date: 16-Jul-2018
        • (2017) Protocol-Aware Packet Scheduling Algorithm for Multi-Protocol Processing in Multi-Core MPL Architecture. IEICE Transactions on Information and Systems E100.D(12), pp. 2837-2846. DOI: 10.1587/transinf.2017PAP0016. Online publication date: 2017
        • (2017) Data Transfer Advisor with Transport Profiling Optimization. 2017 IEEE 42nd Conference on Local Computer Networks (LCN), pp. 269-277. DOI: 10.1109/LCN.2017.23. Online publication date: Oct-2017
        • (2016) A Technique for Improving Lifetime of Non-Volatile Caches Using Write-Minimization. Journal of Low Power Electronics and Applications 6(1), article 1. DOI: 10.3390/jlpea6010001. Online publication date: 18-Jan-2016
        • (2016) Event-Driven Approach for Flow-to-Core Mapping by NICs in Multicore Systems. IEEE Communications Letters 20(5), pp. 882-885. DOI: 10.1109/LCOMM.2016.2538763. Online publication date: May-2016
        • (2016) Profiling Optimization for Big Data Transfer over Dedicated Channels. 2016 25th International Conference on Computer Communication and Networks (ICCCN), pp. 1-9. DOI: 10.1109/ICCCN.2016.7568562. Online publication date: Aug-2016
        • (2016) Improving network performance on multicore systems. Future Generation Computer Systems 56(C), pp. 277-283. DOI: 10.1016/j.future.2015.09.012. Online publication date: 1-Mar-2016
