DOI: 10.1145/2619239.2626314
research-article

Traffic engineering with forward fault correction

Published: 17 August 2014

Abstract

Faults such as link failures and high switch configuration delays can cause heavy congestion and packet loss. Because it takes time to detect and react to faults, these conditions can last long, even tens of seconds. We propose forward fault correction (FFC), a proactive approach to handling faults. FFC spreads network traffic such that freedom from congestion is guaranteed under arbitrary combinations of up to k faults. We show how FFC can be practically realized by compactly encoding the constraints that arise from this large number of possible faults and solving them efficiently using sorting networks. Experiments with data from real networks show that, with negligible loss in overall network throughput, FFC can reduce data loss by a factor of 7 to 130 in well-provisioned networks, and reduce the loss of high-priority traffic to almost zero in well-utilized networks.
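
The step from "every combination of up to k faults" to a compact check can be illustrated with a small sketch. It is not the paper's formulation. It assumes, purely for illustration, that each potential fault adds an independent, non-negative amount of extra load to a single link, so the worst case over all fault sets of size at most k is the sum of the k largest extra loads. The k largest values are pulled out with k bubble passes of compare-exchange operations, which is a simple partial sorting network. All function and variable names are hypothetical.

```python
from itertools import combinations


def compare_exchange(values, i, j):
    """Comparator of a sorting network: afterwards values[i] <= values[j]."""
    if values[i] > values[j]:
        values[i], values[j] = values[j], values[i]


def sum_of_k_largest(extra_loads, k):
    """Sum of the k largest entries, found with k bubble passes of comparators."""
    vals = list(extra_loads)
    n = len(vals)
    k = min(k, n)
    for p in range(k):
        # Pass p moves the largest remaining value to position n - 1 - p.
        for i in range(n - 1 - p):
            compare_exchange(vals, i, i + 1)
    return sum(vals[n - k:])


def worst_case_brute_force(extra_loads, k):
    """Largest total extra load over every fault set of size at most k."""
    worst = 0.0
    for size in range(k + 1):
        for subset in combinations(extra_loads, size):
            worst = max(worst, sum(subset))
    return worst


def is_congestion_free(base_load, capacity, extra_loads, k):
    """Assumed model: the link is safe if base load plus worst-case extra fits."""
    return base_load + sum_of_k_largest(extra_loads, k) <= capacity


if __name__ == "__main__":
    extra = [3.0, 1.0, 4.0, 1.5, 2.0]  # hypothetical extra load per fault
    print(sum_of_k_largest(extra, 2), worst_case_brute_force(extra, 2))
    print(is_congestion_free(2.0, 10.0, extra, 2))
```

The brute-force function enumerates every fault set and grows combinatorially with k, while the comparator-based version needs on the order of n·k compare-exchange steps for n potential faults. This gap is the motivation for a compact encoding.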




      Published In

      SIGCOMM '14: Proceedings of the 2014 ACM conference on SIGCOMM
      August 2014
      662 pages
      ISBN: 9781450328364
      DOI: 10.1145/2619239

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. congestion-free
      2. fault tolerance
      3. traffic engineering

      Qualifiers

      • Research-article

      Conference

      SIGCOMM'14: ACM SIGCOMM 2014 Conference
      August 17 - 22, 2014
      Chicago, Illinois, USA

      Acceptance Rates

      SIGCOMM '14 Paper Acceptance Rate 45 of 242 submissions, 19%;
      Overall Acceptance Rate 462 of 3,389 submissions, 14%

      Article Metrics

      • Downloads (last 12 months): 232
      • Downloads (last 6 weeks): 31
      Reflects downloads up to 06 Jan 2025

      Cited By

      • (2025) FRRL: A reinforcement learning approach for link failure recovery in a hybrid SDN. Journal of Network and Computer Applications 234, article 104054. Online publication date: Feb-2025. DOI: 10.1016/j.jnca.2024.104054
      • (2024) Solving max-min fair resource allocations quickly on large graphs. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, pp. 1937-1958. Online publication date: 16-Apr-2024. DOI: 10.5555/3691825.3691931
      • (2024) Reasoning about network traffic load property at production scale. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, pp. 1063-1081. Online publication date: 16-Apr-2024. DOI: 10.5555/3691825.3691884
      • (2024) Measurement-Noise Filtering for Automatic Discovery of Flow Splitting Ratios in ISP Networks. Formal Aspects of Computing 36:4, pp. 1-18. Online publication date: 15-Oct-2024. DOI: 10.1145/3700602
      • (2024) FIGRET: Fine-Grained Robustness-Enhanced Traffic Engineering. Proceedings of the ACM SIGCOMM 2024 Conference, pp. 117-135. Online publication date: 4-Aug-2024. DOI: 10.1145/3651890.3672258
      • (2024) A General and Efficient Approach to Verifying Traffic Load Properties under Arbitrary k Failures. Proceedings of the ACM SIGCOMM 2024 Conference, pp. 228-243. Online publication date: 4-Aug-2024. DOI: 10.1145/3651890.3672246
      • (2024) MegaTE: Extending WAN Traffic Engineering to Millions of Endpoints in Virtualized Cloud. Proceedings of the ACM SIGCOMM 2024 Conference, pp. 103-116. Online publication date: 4-Aug-2024. DOI: 10.1145/3651890.3672242
      • (2024) Transferable Neural WAN TE for Changing Topologies. Proceedings of the ACM SIGCOMM 2024 Conference, pp. 86-102. Online publication date: 4-Aug-2024. DOI: 10.1145/3651890.3672237
      • (2024) Distributed Traffic Engineering in Hybrid Software Defined Networks: A Multi-Agent Reinforcement Learning Framework. IEEE Transactions on Network and Service Management 21:6, pp. 6759-6769. Online publication date: Dec-2024. DOI: 10.1109/TNSM.2024.3454282
      • (2024) Improving Scalability in Traffic Engineering via Optical Topology Programming. IEEE Transactions on Network and Service Management 21:2, pp. 1581-1600. Online publication date: Apr-2024. DOI: 10.1109/TNSM.2023.3335898
