Abstract
In data grid environments, many large-scale scientific experiments and simulations generate very large amounts of data in distributed storage, spanning thousands of files and data sets. In such environments, two requirements are extremely challenging to meet: replication for fast data sharing among a community of researchers, and high-performance I/O for storing and efficiently accessing data on heterogeneous resources. Several data replication techniques have been developed to support high-performance access to remotely produced scientific data. However, most of those techniques were implemented under the assumption that the replicated data is read-only, that is, never modified once it has been generated. Furthermore, those techniques focus mainly on network performance and ignore the I/O overhead incurred during data generation and replication. We have developed a software system, called Grid Environment-based Data Management System (GEDAS), that provides a high-level, user-friendly interface while keeping data replicas consistent across grid communities. We describe the design and implementation of GEDAS and present performance results on a Linux cluster.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
No, J., Park, H. (2005). GEDAS: A Data Management System for Data Grid Environments. In: Sunderam, V.S., van Albada, G.D., Sloot, P.M.A., Dongarra, J.J. (eds) Computational Science – ICCS 2005. ICCS 2005. Lecture Notes in Computer Science, vol 3514. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11428831_60
DOI: https://doi.org/10.1007/11428831_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26032-5
Online ISBN: 978-3-540-32111-8
eBook Packages: Computer Science (R0)