The Composite-File File System: Decoupling One-to-One Mapping of Files and Metadata for Better Performance

Published: 02 March 2020

Abstract

Traditional file systems are typically designed and implemented around a one-to-one mapping of logical files to their physical metadata representations. File system optimizations generally follow this rigid mapping and therefore miss an entire class of optimization opportunities.
We designed, implemented, and evaluated a composite-file file system, which allows many-to-one mappings of files to metadata. Through exploring different mapping strategies, our empirical evaluation shows up to a 27% performance improvement under web server and software development workloads, for both disks and SSDs. This result demonstrates that our approach of relaxing file-to-metadata mapping is promising.
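
To make the many-to-one idea concrete, the following is a toy in-memory sketch in which several logical files share a single metadata record and are told apart by their offset within one shared data region. It is only an illustration under stated assumptions: the names `Inode`, `CompositeFS`, `create_composite`, and `inode_of` are hypothetical, and the abstract does not say that the authors' system is structured this way.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    """One metadata record shared by every subfile of a composite file (hypothetical)."""
    number: int
    data: bytearray = field(default_factory=bytearray)
    # subfile name -> (offset, length) inside the shared data region
    extents: dict = field(default_factory=dict)


class CompositeFS:
    """Toy model of a many-to-one mapping of file names to metadata."""

    def __init__(self):
        self._next_ino = 1
        self._inodes = {}       # inode number -> Inode
        self._name_to_ino = {}  # many file names -> one inode number

    def create_composite(self, files):
        """Pack several logical files behind a single inode and return its number."""
        inode = Inode(self._next_ino)
        self._next_ino += 1
        for name, payload in files.items():
            # Record where this subfile starts before appending its bytes.
            inode.extents[name] = (len(inode.data), len(payload))
            inode.data += payload
            self._name_to_ino[name] = inode.number
        self._inodes[inode.number] = inode
        return inode.number

    def inode_of(self, name):
        """Return the inode number that backs a logical file name."""
        return self._name_to_ino[name]

    def read(self, name):
        """Read one subfile; only one metadata record is consulted per composite."""
        inode = self._inodes[self._name_to_ino[name]]
        offset, length = inode.extents[name]
        return bytes(inode.data[offset:offset + length])


fs = CompositeFS()
ino = fs.create_composite({"index.html": b"<html>", "style.css": b"body{}"})
print(fs.inode_of("index.html") == fs.inode_of("style.css"))  # both names share one inode
print(fs.read("style.css"))
```

In this sketch, files that tend to be accessed together (for example, a web page and its stylesheet) sit behind one metadata record, so a single metadata lookup serves the whole group. Which files to group is the mapping strategy that the abstract says the evaluation explores.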

    Published In

    ACM Transactions on Storage  Volume 16, Issue 1
    ATC 2019 Special Section and Regular Papers
    February 2020
    155 pages
    ISSN:1553-3077
    EISSN:1553-3093
    DOI:10.1145/3386184
    Editor: Sam H. Noh

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 02 March 2020
    Accepted: 01 October 2019
    Revised: 01 July 2019
    Received: 01 April 2019
    Published in TOS Volume 16, Issue 1

    Author Tags

    1. Design
    2. file systems
    3. metadata
    4. performance

    Qualifiers

    • Short-paper
    • Research
    • Refereed
