Scalable Fault Tolerant MPI: Extending the Recovery Algorithm

Graham E. Fagg,
Thara Angskun,
George Bosilca,
Jelena Pjesivac-Grbovic &
Jack J. Dongarra

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 3666)

Included in the following conference series:

European Parallel Virtual Machine / Message Passing Interface Users’ Group Meeting


Abstract

Fault Tolerant MPI (FT-MPI) [6] was designed to give applications different methods for handling process failures beyond simple checkpoint/restart schemes. The initial implementation of FT-MPI included a robust, heavyweight system state recovery algorithm that was designed to manage the membership of MPI communicators during multiple failures. The algorithm and its implementation, although robust, were very conservative, and this affected their scalability both on very large clusters and on distributed systems. This paper details the FT-MPI recovery algorithm and our initial experiments with new recovery algorithms that aim to be both scalable and latency tolerant. Our conclusions show that using topology-aware collective communication together with distributed consensus algorithms produces the best results.


References

1. Agbaria, A., Friedman, R.: Starfish: Fault-tolerant dynamic MPI programs on clusters of workstations. In: 8th IEEE International Symposium on High Performance Distributed Computing (1999)
2. Batchu, R., Neelamegam, J., Cui, Z., Beddhua, M., Skjellum, A., Dandass, Y., Apte, M.: MPI/FT™: Architecture and taxonomies for fault-tolerant, message-passing middleware for performance-portable parallel computing. In: Proceedings of the 1st IEEE International Symposium of Cluster Computing and the Grid held in Melbourne, Australia (2001)
3. Beck, M., Dongarra, J.J., Fagg, G.E., Geist, G.A., Gray, P., Kohl, J., Migliardi, M., Moore, K., Moore, T., Papadopoulos, P., Scott, S.L., Sunderam, V.: HARNESS: a next generation distributed virtual machine. Future Generation Computer Systems 15 (1999)
4. Bosilca, G., Bouteiller, A., Cappello, F., Djilali, S., Fédak, G., Germain, C., Hérault, T., Lemarinier, P., Lodygensky, O., Magniette, F., Néri, V., Selikhov, A.: MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes. In: SuperComputing, Baltimore, USA (November 2002)
5. Burns, G., Daoud, R.: Robust MPI message delivery through guaranteed resources. In: MPI Developers Conference (June 1995)
6. Fagg, G.E., Bukovsky, A., Dongarra, J.J.: HARNESS and fault tolerant MPI. Parallel Computing 27, 1479–1496 (2001)
7. Fagg, G.E., Moore, K., Dongarra, J.J.: Scalable networked information processing environment (SNIPE). Future Generation Computer Systems 15, 571–582 (1999)
8. Graham, R.L., Choi, S.-E., Daniel, D.J., Desai, N.N., Minnich, R.G., Rasmussen, C.E., Risinger, L.D., Sukalski, M.W.: A network-failure-tolerant message-passing system for terascale clusters. In: ICS, New York, USA, June 22-26 (2002)
9. Louca, S., Neophytou, N., Lachanas, A., Evripidou, P.: MPI-FT: Portable fault tolerance scheme for MPI. In: Parallel Processing Letters, vol. 10(4), pp. 371–382. World Scientific Publishing Company, Singapore (2000)
10. Message Passing Interface Forum. MPI: A Message Passing Interface Standard (June 1995), http://www.mpi-forum.org/
11. Message Passing Interface Forum. MPI-2: Extensions to the Message Passing Interface (July 1997), http://www.mpi-forum.org/
12. Stellner, G.: CoCheck: Checkpointing and process migration for MPI. In: Proceedings of the 10th International Parallel Processing Symposium (IPPS 1996), Honolulu, Hawaii (1996)
13. Vadhiyar, S.S., Fagg, G.E., Dongarra, J.J.: Performance modeling for self-adapting collective communications for MPI. In: LACSI Symposium, Eldorado Hotel, Santa Fe, NM, October 15-18. Springer, Heidelberg (2001)
14. Sankaran, S., Squyres, J.M., Barrett, B., Lumsdaine, A., Duell, J., Hargrove, P., Roman, E.: The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing. In: LACSI Symposium, Santa Fe, NM (October 2003)
15. Gabriel, E., Fagg, G.E., Bosilca, G., Angskun, T., Dongarra, J.J., Squyres, J.M., Sahay, V., Kambadur, P., Barrett, B., Lumsdaine, A., Castain, R.H., Daniel, D.J., Graham, R.L., Woodall, T.S.: Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation. In: Proceedings 11th European PVM/MPI Users’ Group Meeting, Budapest, Hungary (2004)
16. Kielmann, T., Hofman, R.F.H., Bal, H.E., Plaat, A., Bhoedjang, R.A.F.: MagPIe: MPI’s collective communication operations for clustered wide area systems. In: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 1999), May 1999, vol. 34(8), pp. 131–140 (1999)
17. Tanenbaum, A.S., van Steen, M.: Distributed Systems: Principles and Paradigms. Prentice Hall, Englewood Cliffs (2002)


Author information

Authors and Affiliations

Dept. of Computer Science, The University of Tennessee, 1122 Volunteer Blvd., Suite 413, Knoxville, TN, 37996-3450, USA
Graham E. Fagg, Thara Angskun, George Bosilca, Jelena Pjesivac-Grbovic & Jack J. Dongarra


Editor information

Editors and Affiliations

Dipartimento di Ingegneria dell’ Informazione, Second University of Naples - Italy, Real Casa dell’Annunziata - via Roma, 29, 81031, Aversa, CE, Italy
Beniamino Di Martino
GUP, Institute of Graphics and Parallel Processing, Johannes Kepler University, Altenbergerstraße 69, A-4040, Linz, Austria
Dieter Kranzlmüller
Computer Science Department, University of Tennessee, 37996-3450, Knoxville, TN, USA
Jack Dongarra


About this paper

Cite this paper

Fagg, G.E., Angskun, T., Bosilca, G., Pjesivac-Grbovic, J., Dongarra, J.J. (2005). Scalable Fault Tolerant MPI: Extending the Recovery Algorithm. In: Di Martino, B., Kranzlmüller, D., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2005. Lecture Notes in Computer Science, vol 3666. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557265_13


DOI: https://doi.org/10.1007/11557265_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29009-4
Online ISBN: 978-3-540-31943-6
eBook Packages: Computer Science, Computer Science (R0)
