DOI: 10.1145/3005745.3005751
Research article, Public Access

CLOVE: How I learned to stop worrying about the core and love the edge

Published: 09 November 2016

Abstract

Multi-tenant datacenters predominantly use equal-cost multipath (ECMP) routing to distribute traffic over multiple network paths. However, ECMP's static hashing causes uneven load balancing and flow collisions, leading to low throughput and high latency. Recently proposed load-balancing alternatives perform better but are impractical, as they require either changing the tenant VM network stacks (e.g., MPTCP) or replacing all the network switches (e.g., CONGA).
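To make the ECMP problem concrete, the following Python sketch hashes each flow's 5-tuple onto one of a fixed number of equal-cost paths. It is illustrative only: real switches use vendor-specific hardware hash functions (SHA-256 here is an assumed stand-in), and the addresses, ports, and path count are hypothetical. Because the choice depends only on header fields and never on current load, large flows can land on the same path while another path sits idle.

```python
import hashlib
from collections import Counter


def ecmp_path(five_tuple, num_paths):
    """Map a flow to a path index by hashing its 5-tuple.

    The result depends only on header fields, never on current link load,
    so every packet of a flow takes the same path regardless of congestion.
    SHA-256 is an assumed stand-in for a switch's hardware hash.
    """
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths


# Hypothetical workload: 8 TCP flows (protocol 6) between two hosts,
# differing only in source port, spread over 4 equal-cost uplinks.
flows = [("10.0.0.1", "10.0.1.1", 6, 40000 + i, 80) for i in range(8)]
paths = [ecmp_path(flow, 4) for flow in flows]
load = Counter(paths)  # path index -> number of flows hashed onto it
print(load)
```

Printing `load` will typically show some uplinks carrying more flows than others, and nothing in the hash reacts to that imbalance. Note that changing only the source port changes the path, which is the kind of header-field lever an edge hypervisor can use without touching the switches.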
In this paper, we argue that the end-host hypervisor provides a sweet spot for implementing a spectrum of load-balancing algorithms that are fine-grained, congestion-aware, and reactive to network dynamics at round-trip timescales. We propose CLOVE, a scalable hypervisor-based load balancer that requires no changes to guest VMs or to physical network switches. CLOVE uses standard ECMP in the physical network, learns about equal-cost network paths using a traceroute mechanism, and learns the congestion state along these paths using standard switch features such as ECN. It then manipulates packet header fields in the hypervisor virtual switch to route traffic over less congested paths. We introduce several variants of CLOVE that differ in how they learn about congestion in the physical network. Using extensive simulations, we show that CLOVE captures some 80% of the performance gain of best-of-breed hardware-based load-balancing algorithms without the need for expensive hardware replacement.
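The mechanism described in the abstract can be sketched at the hypervisor as two pieces: path discovery (probe with different header values and keep one value per distinct traceroute path) and congestion-aware selection (shift weight away from a path that reports ECN marks). This Python sketch is not the authors' implementation. It assumes that the manipulated header field is the outer encapsulation source port, that paths are re-chosen at flowlet granularity, and that ports are sampled by weight; the class name `CloveSketch`, the probed port range, the flowlet gap, the one-third weight cut, and the `fake_probe` topology are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class CloveSketch:
    """Hypothetical hypervisor-side load balancer in the spirit of CLOVE."""
    flowlet_gap: float = 0.0005   # assumed idle gap (seconds) that starts a new flowlet
    decrease: float = 1 / 3       # assumed fraction of weight removed per ECN report
    weights: dict = field(default_factory=dict)     # outer source port -> path weight
    flow_state: dict = field(default_factory=dict)  # flow id -> (port, last packet time)

    def learn_paths(self, probe, first_port=49152, count=64):
        """Probe candidate outer source ports; keep one port per distinct path.

        probe(port) stands in for a traceroute sent with that source port and
        returns the list of hops it observed.
        """
        port_for_path = {}
        for port in range(first_port, first_port + count):
            port_for_path.setdefault(tuple(probe(port)), port)
        ports = list(port_for_path.values())
        self.weights = {p: 1.0 / len(ports) for p in ports}

    def on_ecn_feedback(self, port):
        """Move part of a congested path's weight evenly onto the other paths."""
        if port not in self.weights or len(self.weights) < 2:
            return
        cut = self.weights[port] * self.decrease
        self.weights[port] -= cut
        others = [p for p in self.weights if p != port]
        for p in others:
            self.weights[p] += cut / len(others)

    def pick_port(self, flow_id, now):
        """Reuse the flow's port within a flowlet; otherwise sample by weight."""
        state = self.flow_state.get(flow_id)
        if state is not None and now - state[1] < self.flowlet_gap:
            port = state[0]
        else:
            ports = list(self.weights)
            port = random.choices(ports, weights=[self.weights[p] for p in ports])[0]
        self.flow_state[flow_id] = (port, now)
        return port


def fake_probe(port):
    # Hypothetical leaf-spine fabric: ECMP sends each source port via spine (port % 4).
    return ["leaf1", f"spine{port % 4}", "leaf2"]


lb = CloveSketch()
lb.learn_paths(fake_probe)        # four ports, one per spine, weight 0.25 each
lb.on_ecn_feedback(49152)         # ECN reported on the path through spine0
p1 = lb.pick_port("flowA", now=0.0)
p2 = lb.pick_port("flowA", now=0.0001)  # within the flowlet gap: same port
print(lb.weights, p1, p2)
```

Keeping a flow on one port until an idle gap appears avoids reordering packets within a burst, while still letting the next burst move away from a congested path. Weighted random choice is used here for brevity; a deterministic weighted round-robin would spread flowlets more evenly over short intervals.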

References

[1] M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat, "Hedera: Dynamic flow scheduling for data center networks," NSDI, 2010.
[2] T. Benson, A. Anand, A. Akella, and M. Zhang, "MicroTE: Fine grained traffic engineering for data centers," ACM CoNEXT, 2011.
[3] C.-Y. Hong, S. Kandula, R. Mahajan, M. Zhang, V. Gill, M. Nanduri, and R. Wattenhofer, "Achieving high utilization with software-driven WAN," ACM SIGCOMM CCR, vol. 43, no. 4, pp. 15-26, 2013.
[4] J. Perry, A. Ousterhout, H. Balakrishnan, D. Shah, and H. Fugal, "Fastpass: A centralized zero-queue datacenter network," ACM SIGCOMM, 2014.
[5] D. Wischik, C. Raiciu, A. Greenhalgh, and M. Handley, "Design, implementation and evaluation of congestion control for multipath TCP," NSDI, 2011.
[6] M. Alizadeh, T. Edsall, S. Dharmapurikar, R. Vaidyanathan, K. Chu, A. Fingerhut, F. Matus, R. Pan, N. Yadav, G. Varghese, et al., "CONGA: Distributed congestion-aware load balancing for datacenters," ACM SIGCOMM, 2014.
[7] N. Katta, M. Hira, C. Kim, A. Sivaraman, and J. Rexford, "HULA: Scalable load balancing using programmable data planes," ACM SOSR, 2016.
[8] J. Cao, R. Xia, P. Yang, C. Guo, G. Lu, L. Yuan, Y. Zheng, H. Wu, Y. Xiong, and D. Maltz, "Per-packet load-balanced, low-latency routing for Clos-based data center networks," ACM CoNEXT, 2013.
[9] S. Kandula, D. Katabi, S. Sinha, and A. Berger, "Dynamic load balancing without packet reordering," ACM SIGCOMM CCR, vol. 37, no. 2, pp. 51-62, 2007.
[10] S. Sen, D. Shue, S. Ihm, and M. J. Freedman, "Scalable, optimal flow routing in datacenters via local link balancing," ACM CoNEXT, 2013.
[11] S. Ghorbani, B. Godfrey, Y. Ganjali, and A. Firoozshahian, "Micro load balancing in data centers with DRILL," ACM HotNets, 2015.
[12] E. Zahavi, I. Keslassy, and A. Kolodny, "Distributed adaptive routing convergence to non-blocking DCN routing assignments," IEEE JSAC, 2014.
[13] W. Cui and C. Qian, "DiFS: Distributed flow scheduling for adaptive routing in hierarchical data center networks," ACM/IEEE ANCS, 2014.
[14] X. Wu and X. Yang, "DARD: Distributed adaptive routing for datacenter networks," IEEE ICDCS, 2012.
[15] A. Kabbani, B. Vamanan, J. Hasan, and F. Duchene, "FlowBender: Flow-level adaptive routing for improved latency and throughput in datacenter networks," ACM CoNEXT, 2014.
[16] K. He, E. Rozner, K. Agarwal, W. Felter, J. Carter, and A. Akella, "Presto: Edge-based load balancing for fast datacenter networks," ACM SIGCOMM, 2015.
[17] S. Guenender, K. Barabash, Y. Ben-Itzhak, A. Levin, E. Raichstein, and L. Schour, "NoEncap: Overlay network virtualization with no encapsulation overheads," ACM SOSR, 2015.
[18] C. Kim, A. Sivaraman, N. Katta, A. Bas, A. Dixit, and L. J. Wobker, "In-band network telemetry via programmable dataplanes," demo paper, ACM SIGCOMM, 2015.
[19] N. Dukkipati and N. McKeown, "Why flow-completion time is the right metric for congestion control," ACM SIGCOMM CCR, vol. 36, pp. 59-62, Jan. 2006.
[20] B. Augustin, X. Cuvellier, B. Orgogozo, F. Viger, T. Friedman, M. Latapy, C. Magnien, and R. Teixeira, "Avoiding traceroute anomalies with Paris traceroute," IMC, 2006.
[21] Cisco, "ACI Fabric Fundamentals," http://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/1-x/aci-fundamentals/b_ACI-Fundamentals/b_ACI_Fundamentals_BigBook_chapter_0100.html.
[22] "A stateless transport tunneling protocol for network virtualization," IETF Internet-Draft draft-davie-stt-01, 2012, https://tools.ietf.org/html/draft-davie-stt-01.
[23] S. Kandula, D. Katabi, B. Davie, and A. Charny, "Walking the tightrope: Responsive yet stable traffic engineering," ACM SIGCOMM, 2005.
[24] M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan, "Data center TCP (DCTCP)," ACM SIGCOMM, 2010.
[25] T. Issariyakul and E. Hossain, Introduction to Network Simulator NS2, 1st ed. Springer Publishing Company, Incorporated, 2010.



    Published In
    HotNets '16: Proceedings of the 15th ACM Workshop on Hot Topics in Networks
    November 2016
    217 pages
    ISBN:9781450346610
    DOI:10.1145/3005745
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Conference

    HotNets-XV

    Acceptance Rates

    HotNets '16 paper acceptance rate: 30 of 108 submissions, 28%
    Overall acceptance rate: 110 of 460 submissions, 24%

    Article Metrics

    • Downloads (last 12 months): 122
    • Downloads (last 6 weeks): 11
    Reflects downloads up to 05 Jan 2025.

    Cited By
    • (2024) "A high-performance design, implementation, deployment, and evaluation of the slim fly network," Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, pp. 1025-1044. DOI: 10.5555/3691825.3691882. Online publication date: 16-Apr-2024.
    • (2024) "LEFT: LightwEight and FasT packet Reordering for RDMA," Proceedings of the 8th Asia-Pacific Workshop on Networking, pp. 67-73. DOI: 10.1145/3663408.3663418. Online publication date: 3-Aug-2024.
    • (2024) "Alibaba HPN: A Data Center Network for Large Language Model Training," Proceedings of the ACM SIGCOMM 2024 Conference, pp. 691-706. DOI: 10.1145/3651890.3672265. Online publication date: 4-Aug-2024.
    • (2024) "BurstBalancer: Do Less, Better Balance for Large-Scale Data Center Traffic," IEEE Transactions on Parallel and Distributed Systems, vol. 35, no. 6, pp. 932-949. DOI: 10.1109/TPDS.2023.3295454. Online publication date: Jun-2024.
    • (2024) "Anole: Scheduling Flows for Fast Datacenter Networks With Packet Re-Prioritization," IEEE Transactions on Cloud Computing, vol. 12, no. 2, pp. 550-562. DOI: 10.1109/TCC.2024.3376716. Online publication date: Apr-2024.
    • (2023) "Mistill: Distilling Distributed Network Protocols From Examples," IEEE Transactions on Network and Service Management, vol. 20, no. 4, pp. 4110-4125. DOI: 10.1109/TNSM.2023.3263529. Online publication date: Dec-2023.
    • (2023) "Robot-Network Co-optimization Using Deep Reinforcement Learning," 2023 IEEE 20th Consumer Communications & Networking Conference (CCNC), pp. 281-286. DOI: 10.1109/CCNC51644.2023.10060010. Online publication date: 8-Jan-2023.
    • (2022) "BULB: Lightweight and Automated Load Balancing for Fast Datacenter Networks," Proceedings of the 51st International Conference on Parallel Processing, pp. 1-11. DOI: 10.1145/3545008.3545021. Online publication date: 29-Aug-2022.
    • (2022) "Meet: Rack-Level Pooling Based Load Balancing in Datacenter Networks," IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 12, pp. 3628-3639. DOI: 10.1109/TPDS.2022.3162297. Online publication date: 1-Dec-2022.
    • (2022) "NetKernel: Making Network Stack Part of the Virtualized Infrastructure," IEEE/ACM Transactions on Networking, vol. 30, no. 3, pp. 999-1013. DOI: 10.1109/TNET.2021.3129806. Online publication date: Jun-2022.