DOI: 10.1145/3627703.3629591
research-article

Transparent Multicore Scaling of Single-Threaded Network Functions

Published: 22 April 2024

Abstract

This paper presents NFOS, a programming model, runtime, and profiler for productively developing software network functions (NFs) that scale on multicore machines. Writing shared-state concurrent systems that are both correct and scalable is still a serious challenge, which is why NFOS insulates developers from writing concurrent code.
In the NFOS programming model, developers write their NF as a sequential program, concerning themselves with the NF logic instead of parallelism and shared-state synchronization. The NFOS abstractions are both familiar to the NF programmer and convey to the NFOS runtime crucial information that enables it to correctly execute the NF's packet processing in parallel on multiple cores. Paired with NFOS's domain-specific concurrent data structures, this parallelism scales the NF transparently, obviating the need for developers to write concurrent code. We show that serial, stateful NFs run atop NFOS achieve scalability on par with their concurrent, hand-optimized counterparts in Cisco VPP [8].
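The idea of keeping the NF logic sequential while a runtime spreads packets across cores can be illustrated with a small sketch. This is not the NFOS API: `handle_packet`, `run_sequential`, and `run_partitioned` are hypothetical names, and the "runtime" here uses plain flow-hash partitioning as a stand-in technique, simulating workers in a single thread rather than running them on separate cores.

```python
from collections import defaultdict

# Hypothetical sequential NF logic: count packets per flow.
# The handler knows nothing about workers or synchronization.
def handle_packet(state, packet):
    flow = (packet["src"], packet["dst"])
    state[flow] += 1
    return state[flow]

def run_sequential(packets):
    # Baseline: one state table, packets processed in arrival order.
    state = defaultdict(int)
    for packet in packets:
        handle_packet(state, packet)
    return dict(state)

def run_partitioned(packets, num_workers):
    # Stand-in runtime (an assumption, not how NFOS works): pin each flow
    # to one simulated worker so that no per-flow state is ever shared.
    shards = [defaultdict(int) for _ in range(num_workers)]
    for packet in packets:
        flow = (packet["src"], packet["dst"])
        worker = hash(flow) % num_workers
        handle_packet(shards[worker], packet)
    # Flows are disjoint across shards, so merging is a plain union.
    merged = {}
    for shard in shards:
        merged.update(shard)
    return merged

packets = [
    {"src": "10.0.0.1", "dst": "10.0.0.2"},
    {"src": "10.0.0.3", "dst": "10.0.0.4"},
    {"src": "10.0.0.1", "dst": "10.0.0.2"},
]
```

In this sketch the unchanged `handle_packet` produces the same per-flow counts under both drivers, which is the property a transparent runtime has to preserve. State shared across flows would not partition this cleanly.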
Some scalability bottlenecks are inherent to the NF's semantics, and thus cannot be resolved while preserving those semantics. NFOS identifies the root causes of such bottlenecks and provides scalability recipes that guide developers in relaxing the NF's semantics to eliminate these bottlenecks. We present examples where such NFOS-guided relaxation of NF semantics further improves scalability by 2x to 91x.
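As a generic illustration of trading semantics for scalability (not a recipe taken from the paper), consider a global packet counter. An exact counter serializes every update on one lock; a relaxed version gives each worker its own slot and sums the slots on read, so a read taken while workers are still running may be slightly stale. The class and function names below are hypothetical, and because of Python's global interpreter lock the sketch shows the structure of the relaxation rather than a real speedup.

```python
import threading

class ExactCounter:
    """Every update takes the same lock, so reads are always exact."""
    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def add(self, worker_id, n=1):
        with self._lock:
            self._value += n

    def read(self):
        with self._lock:
            return self._value

class ShardedCounter:
    """Relaxed semantics: one slot per worker, summed on read."""
    def __init__(self, num_workers):
        self._slots = [0] * num_workers

    def add(self, worker_id, n=1):
        # Only worker_id ever writes this slot, so no lock is needed.
        self._slots[worker_id] += n

    def read(self):
        # May miss in-flight updates while workers are still running.
        return sum(self._slots)

def run(counter, num_workers, per_worker):
    def work(worker_id):
        for _ in range(per_worker):
            counter.add(worker_id)
    threads = [threading.Thread(target=work, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # All workers have finished, so both counters are exact here.
    return counter.read()
```

Once all workers have joined, both counters report the same total; the difference is only in what a concurrent reader is allowed to observe.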

References

[1]
The CAIDA UCSD Anonymized Internet Traces - 2016. https://www.caida.org/catalog/datasets/passive_dataset. [Last accessed on 2023-10-29].
[2]
DPDK Release 20.11. https://doc.dpdk.org/guides-20.11/rel_notes/release_20_11.html. [Last accessed on 2023-10-29].
[3]
Fix of VPP NAT Race Condition on Address Mappings. https://gerrit.fd.io/r/c/vpp/+/31174. [Last accessed on 2023-10-29].
[4]
HTTP Caching. https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching. [Last accessed on 2023-10-29].
[5]
Juniper Networks vSRX Virtual Firewall Datasheet. https://www.juniper.net/us/en/products/security/srx-series/vsrx-virtual-firewall-datasheet.html. [Last accessed on 2023-10-29].
[6]
netElastic Systems Carrier Grade NAT (CGNAT). https://netelastic.com/products/carrier-grade-nat-cgnat/. [Last accessed on 2023-10-29].
[7]
NFF-Go. https://github.com/aregm/nff-go. [Last accessed on 2023-10-29].
[8]
Vector Packet Processing (VPP). https://github.com/FDio/vpp/tree/v21.01. [Last accessed on 2023-10-29].
[9]
The Year of 100GbE in Data Center Networks. https://www.datacenterknowledge.com/networks/year-100gbe-data-center-networks. [Last accessed on 2023-10-29].
[10]
Utpal Banerjee, Rudolf Eigenmann, Alexandra Nicolau, and David A. Padua. Automatic Program Parallelization. Proceedings of the IEEE, 81(2), 1993.
[11]
Tom Barbette, Georgios P Katsikas, Gerald Q Maguire Jr, and Dejan Kostić. RSS++: Load and State-Aware Receive Side Scaling. In Intl. Conf. on Emerging Networking Experiments and Technologies (CoNEXT), 2019.
[12]
Theophilus Benson, Aditya Akella, and David A. Maltz. Network Traffic Characteristics of Data Centers in the Wild. In ACM Internet Measurement Conf. (IMC), 2010.
[13]
Bo Han, Vijay Gopalakrishnan, Lusheng Ji, and Seungjoon Lee. Network Function Virtualization: Challenges and Opportunities for Innovations. IEEE Communications Magazine, 53, 2015.
[14]
Michael D. Bond, Katherine E. Coons, and Kathryn S. McKinley. PACER: Proportional Detection of Data Races. In Intl. Conf. on Programming Language Design and Implementation (PLDI), 2010.
[15]
Kevin Borders, Jonathan Springer, and Matthew Burnside. Chimera: A Declarative Language for Streaming Network Traffic Analysis. In USENIX Security Symp., 2012.
[16]
Irina Calciu, Siddhartha Sen, Mahesh Balakrishnan, and Marcos K Aguilera. Black-Box Concurrent Data Structures for NUMA Architectures. In Intl. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2017.
[17]
LAN/MAN Standards Committee. IEEE Standard for Local and Metropolitan Area Network-Bridges and Bridged Networks. IEEE Std 802.1Q-2018 (Revision of IEEE Std 802.1Q-2014), 2018.
[18]
Charlie Curtsinger and Emery D Berger. Coz: Finding Code that Counts with Causal Profiling. In ACM Symp. on Operating Systems Principles (SOSP), 2015.
[19]
Arnaldo Carvalho de Melo. The New Linux Perf Tools. http://vger.kernel.org/~acme/perf/lk2010-perf-paper.pdf. [Last accessed on 2023-10-29].
[20]
Mihai Dobrescu, Norbert Egi, Katerina Argyraki, Byung-Gon Chun, Kevin Fall, Gianluca Iannaccone, Allan Knies, Maziar Manesh, and Sylvia Ratnasamy. RouteBricks: Exploiting Parallelism To Scale Software Routers. In ACM Symp. on Operating Systems Principles (SOSP), 2009.
[21]
DPDK: Data Plane Development Kit. https://dpdk.org. [Last accessed on 2023-10-29].
[22]
Daniel E. Eisenbud, Cheng Yi, Carlo Contavalli, Cody Smith, Roman Kononov, Eric Mann-Hielscher, Ardas Cilingiroglu, Bin Cheyney, Wentao Shang, and Jinnah Dylan Hosein. Maglev: A Fast and Reliable Software Network Load Balancer. In Symp. on Networked Systems Design and Implementation (NSDI), 2016.
[23]
Paul Emmerich, Sebastian Gallenmüller, Daniel Raumer, Florian Wohlfart, and Georg Carle. MoonGen: A Scriptable High-Speed Packet Generator. In ACM Internet Measurement Conf. (IMC), 2015.
[24]
Aaron Gember-Jacobson, Raajay Viswanathan, Chaithan Prakash, Robert Grandl, Junaid Khalid, Sourav Das, and Aditya Akella. OpenNF: Enabling Innovation in Network Function Control. ACM SIGCOMM Computer Communication Review, 44(4), 2014.
[25]
Cary G. Gray and David R. Cheriton. Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency. In ACM Symp. on Operating Systems Principles (SOSP), 1989.
[26]
Manish Gupta, Sayak Mukhopadhyay, and Navin Sinha. Automatic Parallelization of Recursive Procedures. Intl. Journal of Parallel Programming, 28, 2000.
[27]
Sangjin Han, Keon Jang, Aurojit Panda, Shoumik Palkar, Dongsu Han, and Sylvia Ratnasamy. SoftNIC: A Software NIC to Augment Hardware. Technical Report UCB/EECS-2015-155, 2015.
[28]
Maurice Herlihy and J. Eliot B. Moss. Transactional Memory: Architectural Support for Lock-Free Data Structures. In Intl. Symp. on Computer Architecture (ISCA), 1993.
[29]
Evolved Packet Core (EPC) for Communications Service Providers. https://networkbuilders.intel.com/docs/networkbuilders/Evolved-packet-core-EPC-for-communications-service-providers-ra.pdf. [Last accessed on 2023-10-29].
[30]
Muhammad Asim Jamshed, Jihyung Lee, Sangwoo Moon, Insu Yun, Deokjin Kim, Sungryoul Lee, Yung Yi, and KyoungSoo Park. Kargus: A Highly-Scalable Software-Based Intrusion Detection System. In ACM Conf. on Computer and Communications Security (CCS), 2012.
[31]
Muhammad Asim Jamshed, YoungGyoun Moon, Donghwi Kim, Dongsu Han, and KyoungSoo Park. mOS: A Reusable Networking Stack for Flow Monitoring Middleboxes. In Symp. on Networked Systems Design and Implementation (NSDI), 2017.
[32]
Cullen Jennings and Francois Audet. Network Address Translation (NAT) Behavioral Requirements for Unicast UDP. RFC 4787, Internet Engineering Task Force, 2007.
[33]
Murad Kablan, Blake Caldwell, Richard Han, Hani Jamjoom, and Eric Keller. Stateless Network Functions. In ACM SIGCOMM Workshop on Hot Topics in Middleboxes and Network Function Virtualization, 2015.
[34]
Charlie Kaufman, Paul Hoffman, Yoav Nir, Pasi Eronen, and Tero Kivinen. Internet Key Exchange Protocol Version 2 (IKEv2). RFC 7296, Internet Engineering Task Force, 2014.
[35]
Junaid Khalid, Aaron Gember-Jacobson, Roney Michael, Anubhavnidhi Abhashkumar, and Aditya Akella. Paving the Way for NFV: Simplifying Middlebox Modifications Using StateAlyzr. In Symp. on Networked Systems Design and Implementation (NSDI), 2016.
[36]
Jaeho Kim, Ajit Mathew, Sanidhya Kashyap, Madhava Krishnan Ramanathan, and Changwoo Min. MV-RLU: Scaling Read-Log-Update with Multi-Versioning. In Intl. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2019.
[37]
Eddie Kohler, Robert Morris, Benjie Chen, John Jannotti, and M. Frans Kaashoek. The Click Modular Router. ACM Transactions on Computer Systems (TOCS), 18(3), 2000.
[38]
Bohuslav Krena, Zdenek Letko, Rachel Tzoref, Shmuel Ur, and Tomás Vojnar. Healing Data Races On-the-Fly. In Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging (PADTAD), 2007.
[39]
Zdenek Letko, Tomás Vojnar, and Bohuslav Krena. AtomRace: Data Race and Atomicity Violation Detector and Healer. In Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging, 2008.
[40]
Guangpu Li, Dongjie Chen, Shan Lu, Madanlal Musuvathi, and Suman Nath. SherLock: Unsupervised Synchronization-Operation Inference. In Intl. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2021.
[41]
Shan Lu, Soyeon Park, Eunsoo Seo, and Yuanyuan Zhou. Learning from Mistakes - A Comprehensive Study on Real World Concurrency Bug Characteristics. In Intl. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2008.
[42]
Brandon Lucia, Joseph Devietti, Karin Strauss, and Luis Ceze. Atom-Aid: Detecting and Surviving Atomicity Violations. In Intl. Symp. on Computer Architecture (ISCA), 2008.
[43]
Joao Martins, Mohamed Ahmed, Costin Raiciu, Vladimir Olteanu, Michio Honda, Roberto Bifulco, and Felipe Huici. ClickOS and the Art of Network Function Virtualization. In Symp. on Networked Systems Design and Implementation (NSDI), 2014.
[44]
Paul E McKenney and John D Slingwine. Read-Copy Update: Using Execution History to Solve Concurrency Problems. In Parallel and Distributed Computing and Systems, 1998.
[45]
Moonpol. https://github.com/erkinkirdan/moonpol. [Last accessed on 2023-10-29].
[46]
Satish Narayanasamy, Zhenghao Wang, Jordan Tigani, Andrew Edwards, and Brad Calder. Automatically Classifying Benign and Harmful Data Races Using Replay Analysis. In Intl. Conf. on Programming Language Design and Implementation (PLDI), 2007.
[47]
Aurojit Panda, Sangjin Han, Keon Jang, Melvin Walls, Sylvia Ratnasamy, and Scott Shenker. NetBricks: Taking the V out of NFV. In Symp. on Operating Systems Design and Implementation (OSDI), 2016.
[48]
Francisco Pereira, Fernando M. V. Ramos, and Luis Pedrosa. Automatic Parallelization of Software Network Functions. In Symp. on Networked Systems Design and Implementation (NSDI), 2024.
[49]
Shriram Rajagopalan, Dan Williams, Hani Jamjoom, and Andrew Warfield. Split/Merge: System Support for Elastic Execution in Virtual Middleboxes. In Symp. on Networked Systems Design and Implementation (NSDI), 2013.
[50]
Introduction to Receive Side Scaling. https://docs.microsoft.com/en-us/windows-hardware/drivers/network/introduction-to-receive-side-scaling. [Last accessed on 2023-10-29].
[51]
Stuart E Schechter, Jaeyeon Jung, and Arthur W Berger. Fast Detection of Scanning Worm Infections. In Recent Advances in Intrusion Detection, 2004.
[52]
Tomer Shanny and Adam Morrison. Occualizer: Optimistic Concurrent Search Trees From Sequential Code. In Symp. on Operating Systems Design and Implementation (OSDI), 2022.
[53]
Nir Shavit and Dan Touitou. Software Transactional Memory. In Symp. on Principles of Distributed Computing, 1995.
[54]
Pyda Srisuresh and Kjeld B. Egevang. Traditional IP Network Address Translator. RFC 3022, Internet Engineering Task Force, 2001.
[55]
Mohammad Mejbah ul Alam, Tongping Liu, Guangming Zeng, and Abdullah Muzahid. SyncPerf: Categorizing, Detecting, and Diagnosing Synchronization Performance Bugs. In ACM EuroSys European Conf. on Computer Systems (EUROSYS), 2017.
[56]
Hans Vandierendonck, Sean Rul, and Koen De Bosschere. The Paralax Infrastructure: Automatic Parallelization with a Helping Hand. In Intl. Conf. on Parallel Architectures and Compilation Techniques, 2010.
[57]
Kaushik Veeraraghavan, Peter M. Chen, Jason Flinn, and Satish Narayanasamy. Detecting and Surviving Data Races Using Complementary Schedules. In ACM Symp. on Operating Systems Principles (SOSP), 2011.
[58]
Haris Volos, Andres Jaan Tack, Michael M. Swift, and Shan Lu. Applying Transactional Memory to Concurrency Bugs. In Intl. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2012.
[59]
The Vector Packet Processing (VPP) Platform. https://wiki.fd.io/view/VPP/What_is_VPP%3f. [Last accessed on 2023-10-29].
[60]
Intel VTune Performance Analyzer. https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html. [Last accessed on 2023-10-29].
[61]
Shinae Woo, Justine Sherry, Sangjin Han, Sue Moon, Sylvia Ratnasamy, and Scott Shenker. Elastic Scaling of Stateful Network Functions. In Symp. on Networked Systems Design and Implementation (NSDI), 2018.
[62]
Zhengming Yi, Yiping Yao, and Kai Chen. A Universal Construction to Implement Concurrent Data Structure for NUMA-Multicore. In Intl. Conf. on Parallel Processing, 2021.
[63]
Tingting Yu and Michael Pradel. SyncProf: Detecting, Localizing, and Optimizing Synchronization Bottlenecks. In Intl. Symp. on Software Testing and Analysis (ISSTA), 2016.
[64]
Arseniy Zaostrovnykh, Solal Pirelli, Rishabh R. Iyer, Matteo Rizzo, Luis Pedrosa, Katerina J. Argyraki, and George Candea. Verifying Software Network Functions with No Verification Expertise. In ACM Symp. on Operating Systems Principles (SOSP), 2019.
[65]
Minjia Zhang, Jipeng Huang, Man Cao, and Michael D Bond. Low-Overhead Software Transactional Memory with Progress Guarantees and Strong Semantics. In Symp. on Principles and Practice of Parallel Computing (PPoPP), 2015.
[66]
Zhipeng Zhao, Hugo Sadok, Nirav Atre, James C Hoe, Vyas Sekar, and Justine Sherry. Achieving 100Gbps Intrusion Prevention on a Single Server. In Symp. on Operating Systems Design and Implementation (OSDI), 2020.


Information & Contributors

Information

Published In

EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems
April 2024
1245 pages
ISBN:9798400704376
DOI:10.1145/3627703
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Concurrency
  2. Network functions
  3. Performance profiling and debugging
  4. Transparent scaling

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EuroSys '24

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 191
  • Downloads (Last 12 months): 191
  • Downloads (Last 6 weeks): 18

Reflects downloads up to 06 Jan 2025
