DOI: 10.1145/2741948.2741969
research-article

NBA (network balancing act): a high-performance packet processing framework for heterogeneous processors

Published: 17 April 2015

Abstract

We present the NBA framework, which extends the architecture of the Click modular router to exploit modern hardware. NBA adapts to different hardware configurations and comes close to their maximum performance without manual optimization. It takes advantage of existing performance-enhancing techniques such as batch processing, NUMA-aware memory management, and receive-side scaling with multi-queue network cards. Its abstraction resembles Click but also hides the details of architecture-specific optimization, batch processing that handles the path diversity of individual packets, CPU/GPU load balancing, and complex hardware resource mappings due to multi-core CPUs and multi-queue network cards. We have implemented four sample applications: an IPv4 and an IPv6 router, an IPsec encryption gateway, and an intrusion detection system (IDS) with Aho-Corasick and regular expression matching. The IPv4 and IPv6 routers reach line rate on a commodity 80 Gbps machine, and the IPsec gateway and the IDS both exceed 30 Gbps. We also show that our adaptive CPU/GPU load balancer reaches near-optimal throughput in various combinations of sample applications and traffic conditions.
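
The abstract does not say how the adaptive CPU/GPU load balancer picks its operating point. As a rough illustration only, and not the paper's algorithm, the sketch below shows one plausible approach: hill climbing on the fraction of packets offloaded to the GPU, guided by measured throughput. The names `balance_offload_fraction` and `toy_throughput`, the step size, and the capacity numbers are all assumptions made for this example.

```python
from typing import Callable


def toy_throughput(gpu_fraction: float, cpu_cap: float = 10.0, gpu_cap: float = 30.0) -> float:
    """Hypothetical model: total rate is capped by whichever processor saturates first.

    With total rate T, the CPU handles (1 - f) * T and the GPU handles f * T,
    so T = min(cpu_cap / (1 - f), gpu_cap / f).
    """
    cpu_limit = cpu_cap / (1.0 - gpu_fraction) if gpu_fraction < 1.0 else float("inf")
    gpu_limit = gpu_cap / gpu_fraction if gpu_fraction > 0.0 else float("inf")
    return min(cpu_limit, gpu_limit)


def balance_offload_fraction(
    measure_throughput: Callable[[float], float],
    start: float = 0.5,
    step: float = 0.05,
    rounds: int = 50,
) -> float:
    """Hill-climb the GPU offload fraction (0.0 = CPU only, 1.0 = GPU only)."""
    fraction = start
    direction = 1.0  # +1.0 shifts load toward the GPU, -1.0 toward the CPU
    best = measure_throughput(fraction)
    for _ in range(rounds):
        # Probe one step away in the current direction, clamped to [0, 1].
        candidate = min(1.0, max(0.0, fraction + direction * step))
        observed = measure_throughput(candidate)
        if observed > best:
            # The probe improved throughput: adopt it and keep going.
            fraction, best = candidate, observed
        else:
            # No improvement: stay put and probe the other direction next.
            direction = -direction
    return fraction


if __name__ == "__main__":
    print(round(balance_offload_fraction(toy_throughput), 3))
```

In this toy model the best split is `gpu_cap / (cpu_cap + gpu_cap)`, which is 0.75 for the default capacities, and the climb settles at approximately that value. A real system would measure noisy throughput over a time window rather than call a deterministic function, so it would need smoothing and a way to keep re-probing as traffic changes.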

Supplementary Material

MP4 File (a22-sidebyside.mp4)

Published In

EuroSys '15: Proceedings of the Tenth European Conference on Computer Systems
April 2015
503 pages
ISBN: 9781450332385
DOI: 10.1145/2741948
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

  • Ministry of Future Creation and Science

Conference

EuroSys '15
EuroSys '15: Tenth EuroSys Conference 2015
April 21 - 24, 2015
Bordeaux, France

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%

Cited By

  • (2024) MTDA: Efficient and Fair DPU Offloading Method for Multiple Tenants. IEEE Transactions on Services Computing (1-14). DOI: 10.1109/TSC.2024.3433588. Online publication date: 2024
  • (2023) Towards a Machine Learning-Assisted Kernel with LAKE. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (846-861). DOI: 10.1145/3575693.3575697. Online publication date: 27-Jan-2023
  • (2023) Enabling Efficient Spatio-Temporal GPU Sharing for Network Function Virtualization. IEEE Transactions on Computers 72:10 (2963-2977). DOI: 10.1109/TC.2023.3278541. Online publication date: Oct-2023
  • (2022) The Diversification and Enhancement of an IDS Scheme for the Cybersecurity Needs of Modern Supply Chains. Electronics 11:13 (1944). DOI: 10.3390/electronics11131944. Online publication date: 22-Jun-2022
  • (2022) Quadrant. Proceedings of the 13th Symposium on Cloud Computing (493-509). DOI: 10.1145/3542929.3563471. Online publication date: 7-Nov-2022
  • (2022) The Best of Many Worlds: Scheduling Machine Learning Inference on CPU-GPU Integrated Architectures. 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (55-64). DOI: 10.1109/IPDPSW55747.2022.00017. Online publication date: May-2022
  • (2021) Acceleration of Intrusion Detection in Encrypted Network Traffic Using Heterogeneous Hardware. Sensors 21:4 (1140). DOI: 10.3390/s21041140. Online publication date: 6-Feb-2021
  • (2021) Metron. ACM Transactions on Computer Systems 38:1-2 (1-45). DOI: 10.1145/3465628. Online publication date: 8-Jul-2021
  • (2020) Batchy. Proceedings of the 17th Usenix Conference on Networked Systems Design and Implementation (633-650). DOI: 10.5555/3388242.3388289. Online publication date: 25-Feb-2020
  • (2020) GSLICE. Proceedings of the 11th ACM Symposium on Cloud Computing (492-506). DOI: 10.1145/3419111.3421284. Online publication date: 12-Oct-2020
