DOI: 10.5555/3489146.3489147
Research article

Libnvmmio: reconstructing software IO path with failure-atomic memory-mapped interface

Published: 15 July 2020

Abstract

Fast non-volatile memory (NVM) technology changes the landscape of file systems. A series of research efforts has sought to overcome the traditional file system designs that limit NVM performance. This research has proposed NVM-optimized file systems that leverage the favorable features of NVM: byte-addressability, low latency, and high scalability. The work tailors the file system stack to reduce the software overhead of using fast NVM. As a further step, NVM IO systems use the memory-mapped interface to fully capture the performance of NVM. However, the memory-mapped interface makes it difficult to manage the consistency semantics of NVM, because application developers must deal with the low-level details themselves. In this work, we propose Libnvmmio, an extended user-level memory-mapped IO, which provides failure-atomicity and frees developers from crash-consistency headaches. Libnvmmio reconstructs a common data IO path with memory-mapped IO, providing better performance and scalability than state-of-the-art NVM file systems. On a number of microbenchmarks, Libnvmmio achieves up to 2.2× higher throughput and 13× better scalability than file accesses via system calls to the underlying file systems. For SQLite, Libnvmmio improves the performance of Mobibench and TPC-C by up to 93% and 27%, respectively. For MongoDB, it achieves up to a 42% throughput increase on write-intensive YCSB workloads.
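
To make the crash-consistency problem concrete, the following is a minimal sketch of a failure-atomic update over a memory-mapped file. It is not Libnvmmio's API or mechanism, which the abstract does not describe. It assumes plain undo logging on an ordinary file, and the names `atomic_update`, `recover`, and the log layout are hypothetical. Real NVM code would rely on cache-line flushes and fences rather than `fsync`, and a complete implementation would also sync the directory entry of the log.

```python
import mmap
import os
import struct

# Hypothetical log header: (offset, length) of the region being overwritten.
HEADER = struct.Struct("<QQ")


def atomic_update(data_path, log_path, offset, new_bytes):
    """Overwrite a region of data_path through mmap, undo-logging it first.

    Assumes the region lies entirely inside the existing file.
    """
    with open(data_path, "r+b") as f, mmap.mmap(f.fileno(), 0) as m:
        old = m[offset:offset + len(new_bytes)]
        # 1. Persist the undo record before touching the mapped data.
        with open(log_path, "wb") as log:
            log.write(HEADER.pack(offset, len(old)) + old)
            log.flush()
            os.fsync(log.fileno())
        # 2. Update in place through the mapping, then flush the mapping.
        m[offset:offset + len(new_bytes)] = new_bytes
        m.flush()
    # 3. Commit: removing the log marks the update as complete.
    os.remove(log_path)


def recover(data_path, log_path):
    """Roll back an interrupted update. Returns True if a log was found."""
    if not os.path.exists(log_path):
        return False
    with open(log_path, "rb") as log:
        raw = log.read()
    if len(raw) >= HEADER.size:
        offset, length = HEADER.unpack_from(raw)
        old = raw[HEADER.size:HEADER.size + length]
        # A truncated log means the data was never modified, so only a
        # complete record is replayed.
        if len(old) == length:
            with open(data_path, "r+b") as f, mmap.mmap(f.fileno(), 0) as m:
                m[offset:offset + length] = old
                m.flush()
    os.remove(log_path)
    return True
```

An application would call `recover` once at startup before mapping the file, so that a crash between steps 1 and 3 leaves either the old or the new contents, never a mixture. This bookkeeping is the kind of low-level detail the abstract says a failure-atomic interface is meant to hide from developers.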


Cited By

  • (2021) SPMFS: A Scalable Persistent Memory File System on Optane Persistent Memory. 50th International Conference on Parallel Processing, pages 1-10. doi:10.1145/3472456.3472503. Online publication date: 9-Aug-2021


Published In

USENIX ATC'20: Proceedings of the 2020 USENIX Conference on Usenix Annual Technical Conference
July 2020
957 pages
ISBN: 978-1-939133-14-4

Sponsors

  • VMware
  • Facebook
  • Microsoft
  • ORACLE
  • Google Inc.

Publisher

USENIX Association
United States
