research-article

NVIDIA Grace Superchip Early Evaluation for HPC Applications

Authors:

Fabio Banchelli,

Joan Vinyals-Ylla-Catala,

Josep Pocurull,

Marta Garcia-Gasulla,

Filippo Mantovani

HPCAsia '24 Workshops: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops

Pages 45 - 54

https://doi.org/10.1145/3636480.3637284

Published: 11 January 2024

Abstract

Arm-based systems have been a reality in HPC for more than a decade. However, a new chip entering the market always implies challenges, not only at the ISA level, but also with regard to the SoC integration, the memory subsystem, the board integration, the node interconnection, and finally the OS and all layers of the system software (compiler and libraries). Guided by the procurement of an NVIDIA Grace HPC cluster within the deployment of MareNostrum 5, and emulating the approach of a scientist who needs to migrate their scientific research to a new HPC system, we evaluated five complex scientific applications on engineering sample nodes of the NVIDIA Grace CPU Superchip and the NVIDIA Grace Hopper Superchip (CPU-only). We report intra-node and inter-node scalability and early performance results showing a speed-up between 1.3× and 4.28× for all codes when compared to the current generation of MareNostrum 4, powered by Intel Skylake CPUs.

Cited By

Kang Y, Ghosh S, Kandemir M, Márquez A (2024). Studying CPU and memory utilization of applications on Fujitsu A64FX and Nvidia Grace Superchip. Proceedings of the International Symposium on Memory Systems, pp. 198-207. DOI: 10.1145/3695794.3695813. Online publication date: 30-Sep-2024.
https://dl.acm.org/doi/10.1145/3695794.3695813

Schieffer G, Wahlgren J, Ren J, Faj J, Peng I (2024). Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper. Proceedings of the 53rd International Conference on Parallel Processing, pp. 199-209. DOI: 10.1145/3673038.3673110. Online publication date: 12-Aug-2024.
https://dl.acm.org/doi/10.1145/3673038.3673110

Jesus R, Weiland M (2024). Evaluating and optimising compiler code generation for NVIDIA Grace. Proceedings of the 53rd International Conference on Parallel Processing, pp. 691-700. DOI: 10.1145/3673038.3673104. Online publication date: 12-Aug-2024.
https://dl.acm.org/doi/10.1145/3673038.3673104

Published In

HPCAsia '24 Workshops: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops

January 2024

134 pages

ISBN: 9798400716522

DOI: 10.1145/3636480

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

HORIZON EUROPE Framework Programme
European High Performance Computing Joint Undertaking

Conference

HPCAsiaWS 2024

HPCAsiaWS 2024: International Conference on High Performance Computing in Asia-Pacific Region Workshops

January 25 - 27, 2024

Nagoya, Japan

Acceptance Rates

Overall Acceptance Rate 69 of 143 submissions, 48%

Article Metrics

Total Citations: 3
Total Downloads: 711
Downloads (Last 12 months): 711
Downloads (Last 6 weeks): 23

Reflects downloads up to 11 Dec 2024
