DOI: 10.1145/3343211.3343214

Persistent coarrays: integrating MPI storage windows in Coarray Fortran

Published: 11 September 2019

Abstract

The integration of novel hardware and software components in HPC systems is expected to considerably worsen the Mean Time Between Failures (MTBF) of scientific applications, while simultaneously increasing the programming complexity of these clusters. In this work, we present the initial steps towards the integration of transparent resilience support inside Coarray Fortran. In particular, we propose persistent coarrays, an extension of OpenCoarrays that integrates MPI storage windows to leverage its transport layer and seamlessly map coarrays to files on storage. Preliminary results indicate that our approach provides clear benefits on representative workloads, while requiring only minimal source-code changes.
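
As context for the abstract: OpenCoarrays implements coarray communication on top of MPI one-sided windows, and the MPI storage windows mechanism lets such a window be backed by a file instead of DRAM, which is how a coarray can be made persistent. The C sketch below illustrates only that underlying mechanism; it is a minimal sketch, not the paper's implementation. MPI_Win_allocate, MPI_Info, and the passive-target calls are standard MPI-3, while the "alloc_type"/"storage_alloc_filename" info keys and the file name "coarray.win" follow the MPI storage windows proposal and are assumptions here; a plain MPI library will ignore unrecognized hints and return an ordinary memory-backed window.

    /*
     * Minimal sketch (not the paper's implementation): allocating a
     * storage-backed MPI window that could hold one image's slice of a
     * persistent coarray.  MPI_Win_allocate and the passive-target calls
     * are standard MPI-3; the "alloc_type" and "storage_alloc_filename"
     * info keys follow the MPI storage windows proposal and are assumed
     * extensions -- a plain MPI library ignores unknown keys and returns
     * an ordinary memory-backed window.
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Aint i;
        const MPI_Aint count = 1024;              /* local elements       */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Hint that the window memory should be mapped to a file.       */
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "alloc_type", "storage");          /* assumed */
        MPI_Info_set(info, "storage_alloc_filename", "coarray.win");

        double *base = NULL;
        MPI_Win  win;
        MPI_Win_allocate(count * sizeof(double), sizeof(double),
                         info, MPI_COMM_WORLD, &base, &win);

        /* Local updates inside a passive-target epoch; remote images
           would use MPI_Put/MPI_Get on the same window, which is how
           OpenCoarrays already routes coarray accesses over MPI.        */
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank, 0, win);
        for (i = 0; i < count; ++i)
            base[i] = (double)rank;
        MPI_Win_unlock(rank, win);

        /* Flushing the mapped region to the file is assumed to be done
           by the storage-window implementation (e.g., at MPI_Win_free). */
        MPI_Win_free(&win);
        MPI_Info_free(&info);

        if (rank == 0)
            printf("window freed; data assumed persisted to coarray.win\n");

        MPI_Finalize();
        return 0;
    }

In the persistent-coarray setting described in the abstract, the intent is that this mapping happens inside the OpenCoarrays transport layer, so Fortran code keeps declaring and using coarrays as usual while their backing windows live on storage.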

Cited By

  • (2023) Deepfakes Spark Implementation for Big Data Analytics. Handbook of Research on Advanced Practical Approaches to Deepfake Detection and Applications. DOI: 10.4018/978-1-6684-6060-3.ch003, pp. 32-43. Online publication date: 3-Jan-2023
  • (2020) OpenSHMEM I/O Extensions for Fine-Grained Access to Persistent Memory Storage. Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI. DOI: 10.1007/978-3-030-63393-6_21, pp. 318-333. Online publication date: 18-Dec-2020
  • (2019) uMMAP-IO: User-Level Memory-Mapped I/O for HPC. 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC). DOI: 10.1109/HiPC.2019.00051, pp. 363-372. Online publication date: Dec-2019

Published In

EuroMPI '19: Proceedings of the 26th European MPI Users' Group Meeting
September 2019
134 pages
ISBN: 9781450371759
DOI: 10.1145/3343211

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 September 2019

Author Tags

  1. MPI storage windows
  2. coarray fortran
  3. persistent coarrays

Qualifiers

  • Research-article

Funding Sources

  • Horizon 2020 - Sage2
  • National Energy Research Scientific Computing Center (NERSC)

Conference

EuroMPI 2019
EuroMPI 2019: 26th European MPI Users' Group Meeting
September 11 - 13, 2019
Zürich, Switzerland

Acceptance Rates

EuroMPI '19 Paper Acceptance Rate: 13 of 26 submissions, 50%
Overall Acceptance Rate: 66 of 139 submissions, 47%
