DOI: 10.1145/2807591.2807600
research-article

Network endpoint congestion control for fine-grained communication

Published: 15 November 2015

Abstract

Endpoint congestion in HPC networks creates tree saturation that is detrimental to performance. It can be alleviated by reducing the injection rate of traffic sources, but this requires a fast reaction time to avoid congestion buildup. Congestion control becomes more challenging as application communication shifts from the traditional two-sided model to potentially fine-grained, one-sided communication, as embodied by various global address space programming models. Existing hardware solutions, such as Explicit Congestion Notification (ECN) and the Speculative Reservation Protocol (SRP), either react too slowly or incur too much overhead for small messages.
In this study we present two new endpoint congestion-control protocols, Small-Message SRP (SMSRP) and Last-Hop Reservation Protocol (LHRP), both designed specifically for small messages. Experiments show that they respond quickly to endpoint congestion and prevent tree saturation in the network. Under congestion-free traffic conditions, the new protocols generate minimal overhead, with performance comparable to networks with no endpoint congestion control.
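
The abstract's claim that reducing the injection rate of traffic sources relieves endpoint congestion can be illustrated with a toy fluid model. The sketch below is not SMSRP, LHRP, ECN, or SRP; it only assumes several sources sending to one endpoint that drains at a fixed rate. The names `simulate_backlog` and `fair_rate` and all parameter values are hypothetical, chosen for illustration.

```python
def simulate_backlog(num_sources, injection_rate, ejection_rate=1.0, steps=100):
    """Toy fluid model of packets queued in the network for one hot endpoint.

    Each step, every source injects `injection_rate` packets toward the same
    endpoint, and the endpoint drains at most `ejection_rate` packets.
    Returns the backlog after each step.
    """
    backlog = 0.0
    history = []
    for _ in range(steps):
        backlog += num_sources * injection_rate  # offered load this step
        backlog -= min(backlog, ejection_rate)   # endpoint drains what it can
        history.append(backlog)
    return history


def fair_rate(num_sources, ejection_rate=1.0):
    """Per-source injection rate that exactly matches the endpoint's capacity."""
    return ejection_rate / num_sources


# Assumed scenario: 8 sources, each injecting at half the endpoint's capacity.
unthrottled = simulate_backlog(num_sources=8, injection_rate=0.5)
# Same sources, throttled so their combined rate equals the ejection rate.
throttled = simulate_backlog(num_sources=8, injection_rate=fair_rate(8))

print("final backlog, unthrottled:", unthrottled[-1])
print("final backlog, throttled:  ", throttled[-1])
```

In this model the unthrottled backlog grows every step because the offered load exceeds what the endpoint can eject, which is the queue buildup that backs up into the network as tree saturation. Once each source is limited to its share of the ejection bandwidth, the backlog stays at zero. The model reacts instantly; the reaction time the abstract highlights is precisely what it leaves out.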

Cited By

  • (2024) COER: A Network Interface Offloading Architecture for RDMA and Congestion Control Protocol Codesign. ACM Transactions on Architecture and Code Optimization 21:3 (1-26). DOI: 10.1145/3660525. Online publication date: 22-Apr-2024
  • (2023) All-to-All Broadcast Algorithm in Galaxyfly Networks. Mathematics 11:11 (2459). DOI: 10.3390/math11112459. Online publication date: 26-May-2023
  • (2023) FinePack: Transparently Improving the Efficiency of Fine-Grained Transfers in Multi-GPU Systems. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (516-529). DOI: 10.1109/HPCA56546.2023.10070949. Online publication date: Feb-2023

      Published In

      SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
      November 2015
      985 pages
      ISBN:9781450337236
      DOI:10.1145/2807591
      • General Chair: Jackie Kern
      • Program Chair: Jeffrey S. Vetter
      © 2015 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • Research-article

      Conference

      SC15
      Acceptance Rates

      SC '15 Paper Acceptance Rate 79 of 358 submissions, 22%;
      Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 42
      • Downloads (Last 6 weeks): 4
      Reflects downloads up to 10 Dec 2024

      Citations

      Cited By

      • (2022) DC4: Reconstructing Data-Credit-Coupled Congestion Control for Data Centers. Proceedings of the 51st International Conference on Parallel Processing (1-11). DOI: 10.1145/3545008.3545023. Online publication date: 29-Aug-2022
      • (2022) MUA-Router: Maximizing the Utility-of-Allocation for On-chip Pipelining Routers. ACM Transactions on Architecture and Code Optimization 19:3 (1-23). DOI: 10.1145/3519027. Online publication date: 4-May-2022
      • (2022) Fast Convergence to Fairness for Reduced Long Flow Tail Latency in Datacenter Networks. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (1007-1017). DOI: 10.1109/IPDPS53621.2022.00102. Online publication date: May-2022
      • (2022) FastCredit: Expediting credit-based congestion control in datacenters. Computer Networks 214 (109126). DOI: 10.1016/j.comnet.2022.109126. Online publication date: Sep-2022
      • (2022) Revisiting network congestion avoidance through adaptive packet-chaining reservation. Computer Networks: The International Journal of Computer and Telecommunications Networking 212:C. DOI: 10.1016/j.comnet.2022.109008. Online publication date: 20-Jul-2022
      • (2021) Receiver-Driven Congestion Control for InfiniBand. Proceedings of the 50th International Conference on Parallel Processing (1-10). DOI: 10.1145/3472456.3472466. Online publication date: 9-Aug-2021
      • (2021) Delay sensitivity-driven congestion mitigation for HPC systems. Proceedings of the 35th ACM International Conference on Supercomputing (342-353). DOI: 10.1145/3447818.3460362. Online publication date: 3-Jun-2021
