Research article · DOI: 10.1145/3126908.3126928

LocoFS: a loosely-coupled metadata service for distributed file systems

Published: 12 November 2017

Abstract

Key-value stores provide a scalable metadata service for distributed file systems. However, file system metadata is organized as a directory tree, which does not fit the key-value access pattern and thereby limits performance. To address this issue, we propose LocoFS, a distributed file system with a loosely-coupled metadata service that bridges the performance gap between file system metadata and key-value stores. LocoFS decouples the dependencies between different kinds of metadata using two techniques. First, it decouples directory content from directory structure, organizing file and directory index nodes in a flat space while reversely indexing the directory entries. Second, it decouples the file metadata itself to further improve key-value access performance. Evaluations show that LocoFS with eight nodes boosts metadata throughput by 5 times, reaching 93% of the throughput of a single-node key-value store, compared to 18% for the state-of-the-art IndexFS.
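The two decoupling techniques described above can be sketched in a few lines. The following is a minimal toy illustration, not the authors' implementation: all key layouts and function names are assumptions, and an in-memory dict stands in for a real key-value store such as LevelDB.

```python
# Toy sketch of a loosely-coupled metadata layout (illustrative only).
# Inodes live in a flat key-value space keyed by UUID; the directory
# entry is stored with the *child* and reverse-indexed by (parent, name);
# file metadata is split into two small values.
import uuid

kv = {}  # flat key-value namespace standing in for the KV store

def make_dir(parent_id, name):
    """Create a directory inode in the flat space, reverse-indexed
    under its parent rather than embedded in the parent's content."""
    d_id = uuid.uuid4().hex
    kv[("dirent", parent_id, name)] = d_id                 # reverse index
    kv[("dinode", d_id)] = {"name": name, "parent": parent_id}
    return d_id

def create_file(parent_id, name):
    """Create a file whose metadata is decoupled into two KV entries,
    so each operation reads/writes only the small value it needs."""
    f_id = uuid.uuid4().hex
    kv[("dirent", parent_id, name)] = f_id
    kv[("f_access", f_id)] = {"mode": 0o644, "uid": 0, "gid": 0}   # access metadata
    kv[("f_content", f_id)] = {"size": 0, "blocks": []}            # content metadata
    return f_id

def lookup(parent_id, name):
    """Resolving one path component is a single reverse-index KV get."""
    return kv.get(("dirent", parent_id, name))

root = "root"
home = make_dir(root, "home")
f = create_file(home, "a.txt")
assert lookup(home, "a.txt") == f
```

In this layout, because children are keyed by their parent's UUID rather than by full pathname, renaming a directory touches only its own dirent and inode, not the keys of everything beneath it; that is the kind of cross-metadata dependency a loosely-coupled design avoids.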




Information

Published In

SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2017 · 801 pages
ISBN: 9781450351140
DOI: 10.1145/3126908
General Chair: Bernd Mohr · Program Chair: Padma Raghavan

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. distributed architectures
  2. distributed storage
  3. file systems management
  4. key-value stores

Qualifiers

  • Research-article

Conference

SC '17

Acceptance Rates

SC '17 paper acceptance rate: 61 of 327 submissions, 19%
Overall acceptance rate: 1,516 of 6,373 submissions, 24%


Cited By

  • (2024) LoADM: Load-Aware Directory Migration Policy in Distributed File Systems. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1-6. DOI: 10.23919/DATE58400.2024.10546634
  • (2024) DPC: DPU-accelerated High-Performance File System Client. Proceedings of the 53rd International Conference on Parallel Processing, pages 63-72. DOI: 10.1145/3673038.3673123
  • (2024) Exploring the Asynchrony of Slow Memory Filesystem with EasyIO. Proceedings of the Nineteenth European Conference on Computer Systems, pages 624-640. DOI: 10.1145/3627703.3629586
  • (2023) λFS: A Scalable and Elastic Distributed File System Metadata Service using Serverless Functions. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, pages 394-411. DOI: 10.1145/3623278.3624765
  • (2023) Smash: Flexible, Fast, and Resource-efficient Placement and Lookup of Distributed Storage. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 7(2):1-22. DOI: 10.1145/3589977
  • (2023) CFS: Scaling Metadata Service for Distributed File System via Pruned Scope of Critical Sections. Proceedings of the Eighteenth European Conference on Computer Systems, pages 331-346. DOI: 10.1145/3552326.3587443
  • (2023) PetaKV: Building Efficient Key-Value Store for File System Metadata on Persistent Memory. IEEE Transactions on Parallel and Distributed Systems, 34(3):843-855. DOI: 10.1109/TPDS.2022.3232382
  • (2023) Low-Latency and Scalable Full-path Indexing Metadata Service for Distributed File Systems. 2023 IEEE 41st International Conference on Computer Design (ICCD), pages 283-290. DOI: 10.1109/ICCD58817.2023.00051
  • (2023) KV-CSD: A Hardware-Accelerated Key-Value Store for Data-Intensive Applications. 2023 IEEE International Conference on Cluster Computing (CLUSTER), pages 132-144. DOI: 10.1109/CLUSTER52292.2023.00019
  • (2023) CLMS: Configurable and Lightweight Metadata Service for Parallel File Systems on NVMe SSDs. Advanced Parallel Processing Technologies, pages 101-112. DOI: 10.1007/978-981-99-7872-4_6
