Abstract
This work presents new algorithms for the “Do-All” problem, which consists of performing t tasks reliably in a synchronous message-passing system of p fault-prone processors. The algorithms are based on an aggressive coordination paradigm in which multiple coordinators may be active as the result of failures. The first algorithm tolerates f < p stop-failures and does not allow restarts. Its available processor steps complexity is S = O((t + p log p / log log p) · log f), and its message complexity is M = O(t + p log p / log log p + f · p). Unlike prior solutions, our algorithm uses redundant broadcasts when it encounters failures, and for large f it has better S complexity. This algorithm serves as the basis for a second algorithm, which tolerates any pattern of stop-failures and restarts. This second algorithm is the first solution for the Do-All problem that efficiently deals with processor restarts. Its available processor steps complexity is S = O((t + p log p + f) · min{log p, log f}), and its message complexity is M = O(t + p log p + f · p), where f is the number of failures.
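To make the bounds in the abstract more concrete, the sketch below evaluates the four complexity expressions numerically. It is only an illustration under stated assumptions, not anything taken from the paper: the hidden constants of the O-notation are set to 1, logarithms are taken base 2, and the function names (s_bound_no_restarts, m_bound_no_restarts, s_bound_restarts, m_bound_restarts) are hypothetical. Because the expressions are asymptotic, the resulting numbers indicate growth trends only and should not be read as predicted step or message counts.

```python
import math

# Illustrative evaluation of the asymptotic bounds stated in the abstract.
# Assumptions (not from the paper): O-constants set to 1, logarithms base 2.
# All function names here are hypothetical.


def _lg(x: float) -> float:
    return math.log2(x)


def _check(t: int, p: int, f: int, f_below_p: bool = False) -> None:
    # log log p must be positive (p >= 4) and log f must be positive (f >= 2).
    if t < 0 or p < 4 or f < 2:
        raise ValueError("expected t >= 0, p >= 4, f >= 2")
    # The first algorithm (no restarts) tolerates only f < p stop-failures.
    if f_below_p and f >= p:
        raise ValueError("the no-restart algorithm assumes f < p")


def s_bound_no_restarts(t: int, p: int, f: int) -> float:
    """S = (t + p log p / log log p) * log f."""
    _check(t, p, f, f_below_p=True)
    return (t + p * _lg(p) / _lg(_lg(p))) * _lg(f)


def m_bound_no_restarts(t: int, p: int, f: int) -> float:
    """M = t + p log p / log log p + f * p."""
    _check(t, p, f, f_below_p=True)
    return t + p * _lg(p) / _lg(_lg(p)) + f * p


def s_bound_restarts(t: int, p: int, f: int) -> float:
    """S = (t + p log p + f) * min(log p, log f); here f may exceed p."""
    _check(t, p, f)
    return (t + p * _lg(p) + f) * min(_lg(p), _lg(f))


def m_bound_restarts(t: int, p: int, f: int) -> float:
    """M = t + p log p + f * p."""
    _check(t, p, f)
    return t + p * _lg(p) + f * p


if __name__ == "__main__":
    t, p = 1024, 1024
    for f in (16, 512):
        print(f, s_bound_no_restarts(t, p, f), s_bound_restarts(t, p, f))
    # With restarts, f may exceed p; the min(...) factor then stays at log p.
    print(4096, s_bound_restarts(t, p, 4096))
```

In the restartable setting the number of failures f is not bounded by p, and once f ≥ p the factor min{log p, log f} stays at log p, so under these assumptions f enters the S bound only additively from that point on.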
This work was supported by the following contracts: ARPA N00014-92-J-4033 and F19628-95-C-0118, NSF 922124-CCR, ONR-AFOSR F49620-94-1-01997, and DFG-Graduiertenkolleg “Parallele Rechnernetzwerke in der Produktionstechnik” ME 872/4-1, DFG-SFB 376 “Massive Parallelität: Algorithmen, Entwurfsmethoden, Anwendungen”. The third author's research was done substantially at the Massachusetts Institute of Technology. The first and third authors' research was partly done while visiting the Heinz Nixdorf Institut, Universität-GH Paderborn.
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chlebus, B.S., De Prisco, R., Shvartsman, A.A. (1997). Performing tasks on restartable message-passing processors. In: Mavronicolas, M., Tsigas, P. (eds) Distributed Algorithms. WDAG 1997. Lecture Notes in Computer Science, vol 1320. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0030678
DOI: https://doi.org/10.1007/BFb0030678
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63575-8
Online ISBN: 978-3-540-69600-1
eBook Packages: Springer Book Archive