Abstract
AbFS is a distributed file system that makes it possible to efficiently share the inexpensive devices attached to the commodity computers of a cluster. Its implementation offers high-performance metadata management by combining hashing and tables at several levels with hierarchical structures and caches, and by keeping the attributes and the namespace in the same structure. No additional layers are needed for caching because AbFS reuses the Linux metadata caches (the inode and dentry caches). Along with a description of the proposed metadata management implementation and a comparison with other implementations, this work presents experimental results, obtained with a prototype written from scratch at kernel level, that evaluate its performance. The results show that the proposed implementation is capable of managing files and directories with high performance.
Acknowledgements
The authors would like to thank FCSCL (Fundación Centro de Supercomputación de Castilla y León, Spain) for providing access to a cluster of its Calendula supercomputer. This work was partially funded by project IPT-2011-1728-430000.
Cite this article
Díaz, A.F., Anguita, M., Camacho, H.E. et al. Two-level Hash/Table approach for metadata management in distributed file systems. J Supercomput 64, 144–155 (2013). https://doi.org/10.1007/s11227-012-0801-y
DOI: https://doi.org/10.1007/s11227-012-0801-y