DOI: 10.1145/3626183.3659975
Research article, open access

PolarStar: Expanding the Horizon of Diameter-3 Networks

Published: 17 June 2024

Abstract

We present PolarStar, a novel family of diameter-3 network topologies derived from the star product of low-diameter factor graphs.
PolarStar yields the largest known diameter-3 network topologies for almost all radixes, making it the most scalable known diameter-3 network. Compared with current state-of-the-art diameter-3 networks, PolarStar achieves a 1.3x geometric-mean increase in scale over Bundlefly, 1.9x over Dragonfly, and 6.7x over 3-D HyperX. PolarStar also has many other desirable properties, including a modular layout, large bisection, high resilience to link failures, and a large number of feasible configurations for every radix.
We give a detailed evaluation with simulations of synthetic and real-world traffic patterns and show that PolarStar exhibits comparable or better performance than current diameter-3 networks.
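The abstract describes PolarStar as a star product of low-diameter factor graphs. As a rough illustration of the underlying idea, the sketch below implements the generic star product of two graphs: vertices are pairs (x, y), with intra-cluster edges inherited from the second factor and inter-cluster edges from the first factor, routed through a per-edge bijection. The factor graphs (two triangles), the identity bijection, and the helper names here are illustrative assumptions, not the paper's actual construction; PolarStar uses specific factor graphs and bijections to reach diameter 3.

```python
# Hedged sketch of a generic star product of two graphs; NOT the paper's
# exact construction. With the identity bijection this degenerates into a
# Cartesian-product-like graph, which is enough to show the mechanics.
from itertools import product
from collections import deque

def star_product(G1, G2, bijection=lambda x, xp, y: y):
    """G1 = (V1, E1), G2 = (V2, E2). Vertices of the product are pairs
    (x, y). Edges:
      - (x, y)-(x, y')   for every edge (y, y') of G2 (intra-cluster);
      - (x, y)-(x', f(y)) for every edge (x, x') of G1 (inter-cluster),
        where f is a per-edge bijection on V2 (identity by default)."""
    V1, E1 = G1
    V2, E2 = G2
    V = list(product(V1, V2))
    E = set()
    for x in V1:                      # intra-cluster edges: copy of G2 per x
        for (y, yp) in E2:
            E.add(frozenset({(x, y), (x, yp)}))
    for (x, xp) in E1:                # inter-cluster edges: matched by bijection
        for y in V2:
            E.add(frozenset({(x, y), (xp, bijection(x, xp, y))}))
    return V, E

def diameter(V, E):
    """Diameter via BFS from every vertex (fine for toy graphs)."""
    adj = {v: [] for v in V}
    for e in E:
        a, b = tuple(e)
        adj[a].append(b)
        adj[b].append(a)
    best = 0
    for s in V:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        best = max(best, max(dist.values()))
    return best

# Toy example: star product of a triangle with a triangle.
K3 = ([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
V, E = star_product(K3, K3)
print(len(V), diameter(V, E))  # 9 vertices, diameter 2 for this toy case
```

Note how the product multiplies scale (|V1| x |V2| vertices) while the diameter stays near the sum of the factor diameters; choosing factor graphs and bijections well is what lets constructions like PolarStar hit diameter 3 at large scale.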


Cited By

  • (2024) UpDown: A Novel Architecture for Unlimited Memory Parallelism. Proceedings of the International Symposium on Memory Systems, 10.1145/3695794.3695801, pp. 61-77. Online publication date: 30 Sep 2024.


Published In

SPAA '24: Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures
June 2024, 510 pages
ISBN: 9798400704161
DOI: 10.1145/3626183
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. extremal graph theory
  2. high-performance networks
  3. network architecture
  4. network evaluation
  5. network topology
  6. networks


Acceptance Rates

Overall acceptance rate: 447 of 1,461 submissions (31%)


Article Metrics

  • Downloads (last 12 months): 216
  • Downloads (last 6 weeks): 44

Reflects downloads up to 23 Dec 2024
