Abstract
Modern cluster file systems such as PVFS that stripe files across multiple nodes have been shown to provide high aggregate I/O bandwidth, but they are prone to data loss because the failure of a single disk or server affects the whole file system. To address this problem, a number of distributed data redundancy schemes have been proposed that represent different trade-offs among performance, storage efficiency, and level of fault tolerance. However, the actual dependability of an enhanced striped file system is determined by more than the redundancy scheme adopted; it also depends on other factors such as the type of fault detection mechanism and the nature and speed of recovery. In this paper we address the question of how to assess the dependability of CSAR, a version of PVFS augmented with a RAID5 distributed redundancy scheme that we described in previous work.
This work has been partially supported by the Consorzio Interuniversitario Nazionale per l’Informatica (CINI), by the Italian Ministry for Education, University, and Research (MIUR) in the framework of the FIRB Project "Middleware for advanced services over large-scale, wired-wireless distributed systems (WEB-MINDS)", by the National Partnership for Advanced Computational Infrastructure, by the Ohio Supercomputer Center through grants PAS0036 and PAS0121, and by NSF grant CNS-0403342. M.L. is partially supported by NSF DBI-0317335. Support from Hewlett-Packard is also gratefully acknowledged.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cotroneo, D., Paolillo, G., Russo, S., Lauria, M. (2005). CSAR-2: A Case Study of Parallel File System Dependability Analysis. In: Yang, L.T., Rana, O.F., Di Martino, B., Dongarra, J. (eds) High Performance Computing and Communications. HPCC 2005. Lecture Notes in Computer Science, vol 3726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557654_23
DOI: https://doi.org/10.1007/11557654_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29031-5
Online ISBN: 978-3-540-32079-1
eBook Packages: Computer Science, Computer Science (R0)