[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Revisiting the design of LSM-tree Based OLTP storage engine with persistent memory

Published: 01 June 2021 Publication History

Abstract

The recent byte-addressable and large-capacity commercialized persistent memory (PM) is promising to drive database as a service (DBaaS) into unchartered territories. This paper investigates how to leverage PMs to revisit the conventional LSM-tree based OLTP storage engines designed for DRAM-SSD hierarchy for DBaaS instances. Specifically we (1) propose a light-weight PM allocator named Hal-loc customized for LSM-tree, (2) build a high-performance Semi-persistent Memtable utilizing the persistent in-memory writes of PM, (3) design a concurrent commit algorithm named Reorder Ring to aschieve log-free transaction processing for OLTP workloads and (4) present a Global Index as the new globally sorted persistent level with non-blocking in-memory compaction. The design of Reorder Ring and Semi-persistent Memtable achieves fast writes without synchronized logging overheads and achieves near instant recovery time. Moreover, the design of Semi-persistent Memtable and Global Index with in-memory compaction enables the byte-addressable persistent levels in PM, which significantly reduces the read and write amplification as well as the background compaction overheads. The overall evaluation shows that the performance of our proposal over PM-SSD hierarchy outperforms the baseline by up to 3.8x in YCSB benchmark and by 2x in TPC-C benchmark.

References

[1]
Intel 2015. Intel and Micron Produce Breakthrough Memory Technology. Intel. Retrieved May 29, 2021 from https://newsroom.intel.com/news-releases/intel-and-micron-produce-breakthrough-memory-technology
[2]
Intel 2019. The Challenge of Keeping Up with Data. Intel. Retrieved May 29, 2021 from https://www.intel.com/content/www/us/en/products/docs/memory-storage/optane-persistent-memory/optane-dc-persistent-memory-brief.html
[3]
Alibaba Cloud 2021. Enhanced SSDs, Alibaba Cloud. Alibaba Cloud. Retrieved May 29,2021 from https://www.alibabacloud.com/help/doc-detail/122389.html
[4]
Apache 2021. HBase, a distributed, scalable, big data store. Apache. Retrieved May 29, 2021 from https://github.com/google/leveldb
[5]
Intel 2021. PMDK: Persistent Memory Programming. Intel. Retrieved May 29, 2021 from https://pmem.io/pmdk/
[6]
Facebook 2021. Rocksdb, a persistent key-value store for fast storage enviroments. Facebook. Retrieved May 29, 2021 from https://rocksdb.org/
[7]
Joy Arulraj and Andrew Pavlo. 2017. How to build a non-volatile memory database management system. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Vol. Part F127746. Association for Computing Machinery, New York, New York, USA, 1753--1758.
[8]
Oana Balmau, Diego Didona, Rachid Guerraoui, Willy Zwaenepoel, Huapeng Yuan, Aashray Arora, Karan Gupta, and Pavan Konka. 2017. {TRIAD}: Creating Synergies Between Memory, Disk and Log in Log Structured Key-Value Stores. In 2017 {USENIX} Annual Technical Conference ({USENIX}{ATC} 17). 363--375.
[9]
Matias Bjørling. 2019. From open-channel SSDs to zoned namespaces. In Linux Storage and Filesystems Conference (Vault 19). 1.
[10]
Edward Bortnikov, Anastasia Braginsky, Eshcar Hillel, Idit Keidar, and Gali Sheffi. 2018. Accordion: Better memory organization for LSM key value stores. Proceedings of the VLDB Endowment 11, 12 (2018), 1863--1875.
[11]
Wentao Cai, Haosen Wen, H. Alan Beadle, Chris Kjellqvist, Mohammad Hedayati, and Michael L. Scott. 2020. Understanding and optimizing persistent memory allocation. International Symposium on Memory Management, ISMM (2020), 60--73. arXiv:2003.06718
[12]
Christopher Cantalupo, Vishwanath Venkatesan, Jeff R. Hammond, Krzysztof Czury lo, and Simon Hammond. 2015. User Extensible Heap Manager for Heterogeneous Memory Platforms and Mixed Memory Policies. Intel. Retrieved May 29, 2021 from http://memkind.github.io/memkind/
[13]
Shimin Chen and Qin Jin. 2015. Persistent b+-trees in non-volatile main memory. Proceedings of the VLDB Endowment 8,7 (2015),786--797.
[14]
Youmin Chen, Youyou Lu, Fan Yang, Qing Wang, Yang Wang, and Jiwu Shu. 2020. FlatStore: An efficient log-structured key-value storage engine for persistent memory. International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS (2020), 1077--1091.
[15]
Joel Coburn, Adrian M Caulfield, Ameen Akel, Laura M Grupp, Rajesh K Gupta, Ranjit Jhala, and Steven Swanson. 2011. NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories. ACM SIGARCH Computer Architecture News 39,1 (2011), 105--118.
[16]
Jeremy Condit, Edmund B Nightingale, Christopher Frost, Engin Ipek, Benjamin Lee, Doug Burger, and Derrick Coetzee. 2009. Better I/O through byte-addressable, persistent memory. In Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles. 133--146.
[17]
Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with ycsb. In Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC 2010, Indianapolis, Indiana, USA, June 10--11, 2010.
[18]
Niv Dayan and Stratos Idreos. 2018. Dostoevsky: Better space-time trade-offs for LSM-tree based key-value stores via adaptive removal of superfluous merging. In Proceedings of the 2018 International Conference on Management of Data. 505--520.
[19]
Niv Dayan and Stratos Idreos. 2019. The log-structured merge-bush & the wacky continuum. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Vol. 4. Association for Computing Machinery, New York, New York, USA, 449--466.
[20]
Alexander Driskill-Smith. 2010. Latest advances and future prospects of STT-RAM. In Non-Volatile Memories Workshop. 11--13.
[21]
Assaf Eisenman, Darryl Gardner, Islam AbdelRahman, Jens Axboe, Siying Dong, Kim Hazelwood, Chris Petersen, Asaf Cidon, and Sachin Katti. 2018. Reducing DRAM Footprint with NVM in Facebook. Proceedings of the 13th EuroSys Conference, EuroSys2018 2018-Janua (2018).
[22]
Jason Evans. 2006. A scalable concurrent malloc implementation for FreeBSD. In Proceedings of the bsdcan conference, ottawa, canada.
[23]
Keir Fraser. 2004. Practical lock-freedom. Technical Report. University of Cambridge, Computer Laboratory.
[24]
Sanjay Ghemawat and JeffDean. 2011. LevelDB. Google. Retrieved May 29, 2021 from https://github.com/google/leveldb
[25]
Theo Haerder and Andreas Reuter. 1983. Principles of transaction-oriented database recovery. ACM Computing Surveys (CSUR) 15, 4 (dec 1983), 287--317.
[26]
Gui Huang, Xuntao Cheng, Jianying Wang, Yujie Wang, Dengcheng He, Tieying Zhang, Feifei Li, Sheng Wang, Wei Cao, and Qiang Li. 2019. X-engine: An optimized storage engine for large-scale e-commerce transaction processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data. Association for Computing Machinery, 651--665.
[27]
Deukyeon Hwang, Wook-Hee Kim, Youjip Won, and Beomseok Nam. 2018. Endurable Transient Inconsistency in Byte- Addressable Persistent B+-Tree. 187--200 pages. https://www.usenix.org/conference/fast18/presentation/hwang
[28]
Joseph Izraelevitz, Terence Kelly, and Aasheesh Kolli. 2016. Failure-atomic persistent memory updates via JUSTDO logging. ACM SIGARCH Computer Architecture News 44,2 (2016), 427--442.
[29]
Joseph Izraelevitz, Jian Yang, Lu Zhang, Juno Kim, Xiao Liu, Amirsaman Memaripour, Yun Joon Soh, Zixuan Wang, Yi Xu, Subramanya R Dulloor, et al. 2019. Basic performance measurements of the intel optane DC persistent memory module. arXiv preprint arXiv:1903.05714 (2019).
[30]
Olzhas Kaiyrakhmet, Songyi Lee, Beomseok Nam, Sam H. Noh, and Young ri Choi. 2019. SLM-DB: Single-level key-value store with persistent memory. Proceedings of the 17th USENIX Conference on File and Storage Technologies, FAST 2019 (2019), 191--205.
[31]
Sudarsun Kannan, Nitish Bhat, Ada Gavrilovska, Georgia Tech, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau. 2018. Redesigning LSMs for Nonvolatile Memory with NoveLSM. In USENIX Annual Technical Conference. 993--1005. https://www.usenix.org/conference/atc18/presentation/kannan
[32]
Jihwan Lee, Won Gi Choi, Doyoung Kim, Hanseung Sung, and Sanghyun Park. 2020. TLSM: Tiered Log-Structured Merge-Tree Utilizing Non-Volatile Memory. IEEE Access 8 (2020), 100948--100962.
[33]
Viktor Leis, Alfons Kemper, and Thomas Neumann. 2013. The adaptive radixtree: ARTful indexing for main-memory databases. In 2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, 38--49.
[34]
Viktor Leis, Florian Scheibner, Alfons Kemper, and Thomas Neumann. 2016. The ART of practical synchronization. In Proceedings of the 12th International Workshop on Data Management on New Hardware. 1--8.
[35]
Lucas Lersch, Xiangpeng Hao, Ismail Oukid, Tianzheng Wang, and Thomas Willhalm. 2019. Evaluating persistent memory range indexes. Proceedings of the VLDB Endowment 13, 4 (2019), 574--587.
[36]
Jianhong Li, Andrew Pavlo, and Siying Dong. 2017. NVMRocks: RocksDB on non-volatile memory systems.
[37]
Jihang Liu, Shimin Chen, and Lujun Wang. 2019. LB + -Trees : Optimizing Persistent Index Performance on 3DXPoint Memory. 13, 7 (2019), 1078--1090.
[38]
Chen Luo and Michael J. Carey. 2020. LSM-based storage techniques: a survey. VLDB Journal 29, 1 (2020), 393--418.
[39]
Yoshinori Matsunobu, Siying Dong, and Herman Lee. 2020. MyRocks : LSM-Tree Database Storage Engine Serving Facebook's Social Graph. Proceedings of the VLDB Endowment 13, 12 (2020), 3217--3230.
[40]
C. Mohan, Don Haderle, Bruce Lindsay, Hamid Pirahesh, and Peter Schwarz. 1992. ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. ACM Transactions on Database Systems (TODS) 17, 1 (jan 1992), 94--162.
[41]
Iulian Moraru, David G. Andersen, Michael Kaminsky, Niraj Tolia, Parthasarathy Ranganathan,and Nathan Binkert. 2013. Consistent, durable, and safe memory management for byte-addressable non volatile main memory. In Proceedings of the 1st ACM SIGOPS Conference on Timely Results in Operating Systems, TRIOS 2013. Association for Computing Machinery, Inc, New York, New York, USA, 1--17.
[42]
Patrick O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O'Neil. 1996. The log-structured merge-tree (LSM-tree). Acta Informatica 33, 4 (1996), 351--385.
[43]
Ismail Oukid, Daniel Booss, Adrien Lespinasse, Wolfgang Lehner, Thomas Willhalm, and Grégoire Gomes. 2017. Memory management techniques for large-scale persistent-main-memory systems. Proceedings of the VLDB Endowment 10, 11 (2017), 1166--1177.
[44]
Ismail Oukid, Johan Lasperas, Anisoara Nica, Thomas Willhalm, and Wolfgang Lehner. 2016. FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory. In Proceedings of the 2016 International Conference on Management of Data (San Francisco, California, USA) (SIGMOD '16). Association for Computing Machinery, New York, NY, USA, 371--386.
[45]
John D Valois. 1994. Implementing lock-free queues. In Proceedings of the seventh international conference on Parallel and Distributed Computing Systems. 64--69.
[46]
Alexander Van Renen, Lukas Vogel, Viktor Leis, Thomas Neumann, and Alfons Kemper. 2019. Persistent memory I/O primitives. Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (2019). arXiv:1904.01614
[47]
Haris Volos, Andres Jaan Tack, and Michael M Swift. 2011. Mnemosyne: Lightweight persistent memory. ACM SIGARCH Computer Architecture News 39, 1 (2011), 91--104.
[48]
Daniel Waddington, Mark Kunitomi, Clem Dickey, Samyukta Rao, Amir Abboud, and Jantz Tran. 2019. Evaluation of intel 3D-xpoint NVDIMM technology for memory-intensive genomic workloads. In Proceedings of the International Symposium on Memory Systems. 277--287.
[49]
Tianzheng Wang, Justin Levandoski, and Per-Ake Larson. 2018. Easy lock-free indexing in non-volatile memory. In 2018 IEEE 34th International Conference on Data Engineering (ICDE). IEEE, 461--472.
[50]
William Wang and Stephan Diestelhorst. 2018. Quantify the performance overheads of PMDK. ACM International Conference Proceeding Series (2018), 7--9.
[51]
Ziqi Wang, Andrew Pavlo, Hyeontaek Lim, Viktor Leis, Huanchen Zhang, Michael Kaminsky, and David G. Andersen. 2018. Building a BW-tree takes more than just BuzzWords. Proceedings of the ACM SIGMOD International Conference on Management of Data 1 (2018), 473--488.
[52]
Jian Xu, Juno Kim, Amirsaman Memaripour, and Steven Swanson. 2019. Finding and Fixing Performance Pathologies in Persistent Memory Software Stacks. International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS (2019), 427--439.
[53]
Jian Yang, Juno Kim, Morteza Hoseinzadeh, Joseph Izraelevitz, and Steve Swanson. 2020. An Empirical Guide to the Behavior and Use of Scalable Persistent Memory. In 18th USENIX Conference on File and Storage Technologies (FAST 20). USENIX Association, Santa Clara, CA, 169--182. https://www.usenix.org/conference/fast20/presentation/yang
[54]
Ting Yao, Yiwen Zhang, Jiguang Wan, Qiu Cui, Liu Tang, Hong Jiang, Changsheng Xie, and Xubin He. 2020. MatrixKV: Reducing Write Stalls and Write Amplification in LSM-tree Based KV Stores with Matrix Container in NVM. In 2020 USENIX Annual Technical Conference (USENIX ATC 20). USENIX Association, 17--31. https://www.usenix.org/conference/atc20/presentation/yao

Cited By

View all
  • (2024)FluidKV: Seamlessly Bridging the Gap between Indexing Performance and Memory-Footprint on Ultra-Fast StorageProceedings of the VLDB Endowment10.14778/3648160.364817717:6(1377-1390)Online publication date: 1-Feb-2024
  • (2024)Structural Designs Meet Optimality: Exploring Optimized LSM-tree Structures in a Colossal Configuration SpaceProceedings of the ACM on Management of Data10.1145/36549782:3(1-26)Online publication date: 30-May-2024
  • (2024)Optimizing LSM-based indexes for disaggregated memoryThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00863-y33:6(1813-1836)Online publication date: 1-Nov-2024
  • Show More Cited By

Index Terms

  1. Revisiting the design of LSM-tree Based OLTP storage engine with persistent memory
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Please enable JavaScript to view thecomments powered by Disqus.

        Information & Contributors

        Information

        Published In

        cover image Proceedings of the VLDB Endowment
        Proceedings of the VLDB Endowment  Volume 14, Issue 10
        June 2021
        219 pages
        ISSN:2150-8097
        Issue’s Table of Contents

        Publisher

        VLDB Endowment

        Publication History

        Published: 01 June 2021
        Published in PVLDB Volume 14, Issue 10

        Qualifiers

        • Research-article

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)77
        • Downloads (Last 6 weeks)9
        Reflects downloads up to 14 Dec 2024

        Other Metrics

        Citations

        Cited By

        View all
        • (2024)FluidKV: Seamlessly Bridging the Gap between Indexing Performance and Memory-Footprint on Ultra-Fast StorageProceedings of the VLDB Endowment10.14778/3648160.364817717:6(1377-1390)Online publication date: 1-Feb-2024
        • (2024)Structural Designs Meet Optimality: Exploring Optimized LSM-tree Structures in a Colossal Configuration SpaceProceedings of the ACM on Management of Data10.1145/36549782:3(1-26)Online publication date: 30-May-2024
        • (2024)Optimizing LSM-based indexes for disaggregated memoryThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00863-y33:6(1813-1836)Online publication date: 1-Nov-2024
        • (2023)Perseid: A Secondary Indexing Mechanism for LSM-Based Storage SystemsACM Transactions on Storage10.1145/363328520:2(1-28)Online publication date: 17-Nov-2023
        • (2023)PMLDS: An LSM-Tree Direct Managed Storage for Key-Value Stores on Byte-Addressable DevicesProceedings of the 52nd International Conference on Parallel Processing10.1145/3605573.3605629(223-232)Online publication date: 7-Aug-2023
        • (2023)Falcon: Fast OLTP Engine for Persistent Cache and Non-Volatile MemoryProceedings of the 29th Symposium on Operating Systems Principles10.1145/3600006.3613141(531-544)Online publication date: 23-Oct-2023
        • (2023)FlatLSM: Write-Optimized LSM-Tree for PM-Based KV StoresACM Transactions on Storage10.1145/357985519:2(1-26)Online publication date: 21-Jan-2023
        • (2023)TurboHash: A Hash Table for Key-value Store on Persistent MemoryProceedings of the 16th ACM International Conference on Systems and Storage10.1145/3579370.3594766(35-48)Online publication date: 5-Jun-2023
        • (2022)Tair-PMemProceedings of the VLDB Endowment10.14778/3554821.355482715:12(3346-3358)Online publication date: 29-Sep-2022
        • (2022)PlushProceedings of the VLDB Endowment10.14778/3551793.355183915:11(2895-2907)Online publication date: 1-Jul-2022
        • Show More Cited By

        View Options

        Login options

        Full Access

        View options

        PDF

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader

        Media

        Figures

        Other

        Tables

        Share

        Share

        Share this Publication link

        Share on social media