DOI: 10.1145/3143361.3143382
research-article

Tagger: Practical PFC Deadlock Prevention in Data Center Networks

Published: 28 November 2017

Abstract

Remote Direct Memory Access over Converged Ethernet (RoCE) deployments are vulnerable to deadlocks induced by Priority Flow Control (PFC). Prior solutions for deadlock prevention either require significant changes to routing protocols, or require excessive buffers in the switches. In this paper, we propose Tagger, a scheme for deadlock prevention. It does not require any changes to the routing protocol, and needs only modest buffers. Tagger is based on the insight that given a set of expected lossless routes, a simple tagging scheme can be developed to ensure that no deadlock will occur under any failure conditions. Packets that do not travel on these lossless routes may be dropped under extreme conditions. We design such a scheme, prove that it prevents deadlock, and implement it efficiently on commodity hardware.
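
The abstract does not spell out the tagging rules, so the following is only a minimal sketch of how a scheme of this kind could behave, under assumptions that are not in the text above: the topology is layered (Clos-like), the expected lossless routes are up-down paths, and a packet that turns from down to up (a "bounce") has left those routes. The names `LOSSY`, `retag`, `tags_along`, and `max_lossless_tag` are hypothetical. The intuition is that traffic within a single tag only ever follows up-down paths, and tags never decrease, so buffer dependencies cannot form a cycle among the lossless classes.

```python
LOSSY = 0  # hypothetical tag value: packet uses a lossy queue and may be dropped


def retag(tag, came_from_above, going_up, max_lossless_tag):
    """Return the tag a switch writes before forwarding a packet.

    Assumption: expected lossless routes are up-down paths in a layered
    topology, so a down-to-up turn ("bounce") means the packet left them.
    """
    if tag == LOSSY:
        return LOSSY  # once lossy, always lossy
    if not (came_from_above and going_up):
        return tag  # still on an up-down segment: keep the tag
    # Bounce: move to the next lossless tag, or fall back to lossy.
    return tag + 1 if tag < max_lossless_tag else LOSSY


def tags_along(path_layers, max_lossless_tag=1):
    """Tags carried on each link of a path given as a list of switch layers."""
    tags = [1]  # every packet starts in the first lossless class
    for prev, here, nxt in zip(path_layers, path_layers[1:], path_layers[2:]):
        tags.append(retag(tags[-1], prev > here, nxt > here, max_lossless_tag))
    return tags


print(tags_along([0, 1, 2, 1, 0]))              # [1, 1, 1, 1]
print(tags_along([0, 1, 2, 1, 2, 1, 0]))        # [1, 1, 1, 0, 0, 0]
print(tags_along([0, 1, 2, 1, 2, 1, 0], 2))     # [1, 1, 1, 2, 2, 2]
```

In this sketch, the first path is an expected up-down route and stays lossless throughout. The second path bounces once (for example after a rerouting caused by a link failure); with a single lossless tag it is demoted to the lossy class, while allowing a second lossless tag keeps it lossless at the cost of one more priority queue.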


      Published In

      CoNEXT '17: Proceedings of the 13th International Conference on emerging Networking EXperiments and Technologies
      November 2017
      492 pages
      ISBN: 9781450354226
      DOI: 10.1145/3143361
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Data Center Networks
      2. Deadlock Prevention
      3. RDMA
      4. Tag

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      CoNEXT '17

      Acceptance Rates

      Overall Acceptance Rate 198 of 789 submissions, 25%

      Citations

      Cited By

      • (2024) PB-FS: Postcard-Based Fast Start. 2024 IFIP Networking Conference (IFIP Networking), pp. 86-94. DOI: 10.23919/IFIPNetworking62109.2024.10619894. Online publication date: 3-Jun-2024
      • (2024) HF^2T: Host-Based Flowlet Fine-Tuning for RDMA Load Balancing. Proceedings of the 8th Asia-Pacific Workshop on Networking, pp. 9-15. DOI: 10.1145/3663408.3663410. Online publication date: 3-Aug-2024
      • (2024) Re-Architecting Buffer Management in Lossless Ethernet. IEEE/ACM Transactions on Networking, 32:6, pp. 4749-4764. DOI: 10.1109/TNET.2024.3430989. Online publication date: Dec-2024
      • (2024) FlowSail: Fine-Grained and Practical Flow Control for Datacenter Networks. IEEE/ACM Transactions on Networking, 32:5, pp. 3916-3928. DOI: 10.1109/TNET.2024.3406613. Online publication date: Oct-2024
      • (2024) Load Balancing With Multi-Level Signals for Lossless Datacenter Networks. IEEE/ACM Transactions on Networking, 32:3, pp. 2736-2748. DOI: 10.1109/TNET.2024.3366336. Online publication date: Jun-2024
      • (2024) RDMA Transports in Datacenter Networks: Survey. IEEE Network, 38:6, pp. 380-387. DOI: 10.1109/MNET.2024.3397781. Online publication date: Nov-2024
      • (2024) LHCC: Low-Latency and Hi-Precision Congestion Control in RDMA Datacenter Networks. 2024 IEEE/ACM 32nd International Symposium on Quality of Service (IWQoS), pp. 1-10. DOI: 10.1109/IWQoS61813.2024.10682889. Online publication date: 19-Jun-2024
      • (2024) Cepheus: Accelerating Datacenter Applications with High-Performance RoCE-Capable Multicast. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 908-921. DOI: 10.1109/HPCA57654.2024.00074. Online publication date: 2-Mar-2024
      • (2024) MPTD: optimizing multi-path transport with dynamic target delay in datacenters. Cluster Computing, 27:8, pp. 11455-11469. DOI: 10.1007/s10586-024-04470-y. Online publication date: 27-May-2024
      • (2023) Scaling Switch-driven Flow Control with Aquarius. Proceedings of the 7th Asia-Pacific Workshop on Networking, pp. 81-87. DOI: 10.1145/3600061.3600066. Online publication date: 29-Jun-2023
