DOI: 10.1145/3603269.3604853
research-article
Open access

Masking Corruption Packet Losses in Datacenter Networks with Link-local Retransmission

Published: 01 September 2023

Abstract

Packet loss due to link corruption is a major problem in large warehouse-scale datacenters. The current state-of-the-art approach of disabling corrupting links is not adequate because, in practice, not all corrupting links can be disabled due to capacity constraints. In this paper, we show that it is feasible to implement link-local retransmission at sub-RTT timescales to completely mask corruption packet losses from the transport endpoints. Our system, LinkGuardian, employs a range of techniques to (i) keep the packet buffer requirement low, (ii) recover from tail packet losses without employing timeouts, and (iii) preserve packet ordering. We implement LinkGuardian on the Intel Tofino switch and show that for a 100G link with a loss rate of 10⁻³, LinkGuardian can reduce the loss rate by up to 6 orders of magnitude while incurring only an 8% reduction in effective link speed. By eliminating tail packet losses, LinkGuardian improves the 99.9th percentile flow completion time (FCT) for TCP and RDMA by 51x and 66x, respectively. Finally, we also show that in the context of datacenter networks, simple out-of-order retransmission is often sufficient to significantly mitigate the impact of corruption packet loss for short TCP flows.
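
A back-of-the-envelope model can help build intuition for why link-local retransmission is able to cut the loss rate by several orders of magnitude. The sketch below is not taken from the paper: it assumes that every transmission attempt on the link is corrupted independently with probability p, so a packet is lost only when the original and every retransmission are all corrupted. The function names are hypothetical and chosen only for illustration.

```python
import math


def residual_loss_rate(p: float, retransmissions: int) -> float:
    """Loss rate seen by the endpoints after link-local retransmission.

    Assumption (not from the paper): each attempt is lost independently
    with probability p, so all (retransmissions + 1) attempts must fail.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if retransmissions < 0:
        raise ValueError("retransmissions must be non-negative")
    return p ** (retransmissions + 1)


def orders_of_magnitude_gained(p: float, retransmissions: int) -> float:
    """How many orders of magnitude the loss rate drops in this model."""
    return math.log10(p / residual_loss_rate(p, retransmissions))


def expected_transmissions(p: float, retransmissions: int) -> float:
    """Mean number of attempts per packet: 1 + p + p^2 + ... + p^n."""
    return sum(p ** k for k in range(retransmissions + 1))


if __name__ == "__main__":
    p = 1e-3  # link loss rate quoted in the abstract
    for n in range(3):
        print(
            f"retransmissions={n} "
            f"residual={residual_loss_rate(p, n):.3e} "
            f"gain={orders_of_magnitude_gained(p, n):.1f} orders "
            f"attempts/packet={expected_transmissions(p, n):.6f}"
        )
```

Under this independence assumption, a loss rate of 10⁻³ falls by three orders of magnitude per allowed retransmission, so two retransmissions would correspond to a six-order reduction, while the retransmitted packets themselves add only about 0.1% extra traffic. The model deliberately ignores buffering, tail-loss detection, and packet ordering, and it does not explain the measured 8% reduction in effective link speed.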

Information

Published In

ACM SIGCOMM '23: Proceedings of the ACM SIGCOMM 2023 Conference
September 2023
1217 pages
ISBN: 9798400702365
DOI: 10.1145/3603269
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. packet corruption
  2. link failures
  3. optical links
  4. link-local retransmission
  5. programmable switches
  6. in-network packet loss recovery

Qualifiers

  • Research-article

Conference

ACM SIGCOMM '23: ACM SIGCOMM 2023 Conference
September 10, 2023
New York, NY, USA

Acceptance Rates

Overall Acceptance Rate 462 of 3,389 submissions, 14%

Article Metrics

  • Downloads (last 12 months): 1,087
  • Downloads (last 6 weeks): 95

Reflects downloads up to 04 Jan 2025

Cited By

  • (2024) Congestion Control Mechanism Based on Backpressure Feedback in Data Center Networks. Future Internet 16:4 (131). DOI: 10.3390/fi16040131. Online publication date: 15-Apr-2024
  • (2024) Toward Enhanced Reliability: An Efficient Method for Link-Local Retransmission in a Programmable Data Plane. Electronics 14:1 (131). DOI: 10.3390/electronics14010131. Online publication date: 31-Dec-2024
  • (2024) INT-MC: Low-Overhead In-Band Network-Wide Telemetry Based on Matrix Completion. Proceedings of the ACM on Measurement and Analysis of Computing Systems 8:3 (1-30). DOI: 10.1145/3700433. Online publication date: 10-Dec-2024
  • (2024) SatGuard: Concealing Endless and Bursty Packet Losses in LEO Satellite Networks for Delay-Sensitive Web Applications. Proceedings of the ACM Web Conference 2024 (3053-3063). DOI: 10.1145/3589334.3645639. Online publication date: 13-May-2024
  • (2024) Real-Time In-Band Network Link Loss Detection With Programmable Data Plane. 2024 16th International Conference on Knowledge and Smart Technology (KST) (167-172). DOI: 10.1109/KST61284.2024.10499673. Online publication date: 28-Feb-2024
  • (2024) In-Band Locating High Delay Variance Links with Programmable Data Plane. 2024 Tenth International Conference on Communications and Electronics (ICCE) (66-71). DOI: 10.1109/ICCE62051.2024.10634729. Online publication date: 31-Jul-2024
