[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3267809.3267824acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

An Efficient Memory-Mapped Key-Value Store for Flash Storage

Published: 11 October 2018 Publication History

Abstract

Persistent key-value stores have emerged as a main component in the data access path of modern data processing systems. However, they exhibit high CPU and I/O overhead. Today, due to power limitations it is important to reduce CPU overheads for data processing.
In this paper, we propose Kreon, a key-value store that targets servers with flash-based storage, where CPU overhead and I/O amplification are more significant bottlenecks compared to I/O randomness. We first observe that two significant sources of overhead in state-of-the-art key-value stores are: (a) The use of compaction in LSM-Trees that constantly perform merging and sorting of large data segments and (b) the use of an I/O cache to access devices, which incurs overhead even for data that reside in memory. To avoid these, Kreon performs data movement from level to level by performing partial instead of full data reorganization via the use of a full index per-level. In addition, Kreon uses memory-mapped I/O via a custom kernel path with Copy-On-Write.
We implement Kreon as well as our custom memory-mapped I/O path in Linux and we evaluate Kreon using commodity SSDs with both small and large datasets (up to 6 billion keys). For a large dataset that stresses I/O, Kreon reduces CPU cycles/op by up to 5.8x, reduces I/O amplification for inserts by up to 4.61x, and increases insert ops/s by up to 5.3x, compared to RocksDB, a state-of-the-art key-value store that is broadly used today.

References

[1]
Apache. 2018. HBase. https://hbase.apache.org/.
[2]
Jens Axboe. 2017. Flexible I/O Tester. https://github.com/axboe.
[3]
Rudolf Bayer and Edward McCreight. 2002. Organization and maintenance of large ordered indexes. Springer.
[4]
Philip Bohannon, Peter McIlroy, and Rajeev Rastogi. 2001. Main-memory Index Structures with Fixed-size Partial Keys. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data (SIGMOD '01). ACM, New York, NY, USA, 163--174.
[5]
Randal Burns and Wayne Hineman. 2001. A bit-parallel search algorithm for allocating free space. In Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 2001. Proceedings. Ninth International Symposium on. IEEE, 302--310.
[6]
Bernard Chazelle and Leonidas J Guibas. 1986. Fractional cascading: I. A data structuring technique. Algorithmica 1, 1 (1986), 133--162.
[7]
Yanpei Chen, Sara Alspaugh, and Randy Katz. 2012. Interactive analytical processing in big data systems: A cross-industry study of mapreduce workloads. Proceedings of the VLDB Endowment 5, 12 (2012), 1802--1813.
[8]
Jungsik Choi, Jiwon Kim, and Hwansoo Han. 2017. Efficient Memory Mapped File I/O for In-Memory File Systems. In 9th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 17). USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/hotstorage17/program/presentation/choi
[9]
Brian F. Cooper. 2018. Core Workloads. https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads.
[10]
Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking Cloud Serving Systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing (SoCC '10). ACM, New York, NY, USA, 143--154.
[11]
Niv Dayan, Manos Athanassoulis, and Stratos Idreos. 2017. Monkey: Optimal Navigable Key-Value Store. In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD '17). ACM, New York, NY, USA, 79--94.
[12]
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. 2007. Dynamo: amazon's highly available key-value store. ACM SIGOPS operating systems review 41, 6 (2007), 205--220.
[13]
Siying Dong, Mark Callaghan, Leonidas Galanis, Dhruba Borthakur, Tony Savor, and Michael Strum. 2017. Optimizing Space Amplification in RocksDB. In CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research, Chaminade, CA, USA, January 8-11, 2017, Online Proceedings. www.cidrdb.org. http://cidrdb.org/cidr2017/papers/p82-dong-cidr17.pdf
[14]
Brian Essen, Henry Hsieh, Sasha Ames, Roger Pearce, and Maya Gokhale. 2015. DI-MMAP-a Scalable Memory-map Runtime for Out-of-core Data-intensive Applications. Cluster Computing 18, 1 (March 2015), 15--28.
[15]
Jason Evans. 2018. jemalloc. http://jemalloc.net/.
[16]
Facebook. 2015. RocksDB Performance Benchmarks. https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks.
[17]
Facebook. 2018. RocksDB. http://rocksdb.org/.
[18]
Google. 2018. LevelDB. http://leveldb.org/.
[19]
Goetz Graefe. 2004. Write-optimized B-trees. In Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30 (VLDB '04). VLDB Endowment, 672--683. http://dl.acm.org/citation.cfm?id=1316689.1316748
[20]
Brendan Gregg. 2016. The Flame Graph. Queue 14, 2, Article 10 (March 2016), 20 pages.
[21]
Stavros Harizopoulos, Daniel J. Abadi, Samuel Madden, and Michael Stonebraker. 2008. OLTP Through the Looking Glass, and What We Found There. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD '08). ACM, New York, NY, USA, 981--992.
[22]
Stratos Idreos, Fabian Groffen, Niels Nes, Stefan Manegold, Sjoerd Mullender, Martin Kersten, et al. 2012. MonetDB: Two decades of research in column-oriented database architectures. A Quarterly Bulletin of the IEEE Computer Society Technical Committee on Database Engineering 35, 1 (2012), 40--45.
[23]
William Jannen, Jun Yuan, Yang Zhan, Amogh Akshintala, John Esmet, Yizheng Jiao, Ankur Mittal, Prashant Pandey, Phaneendra Reddy, Leif Walsh, Michael Bender, Martin Farach-Colton, Rob Johnson, Bradley C. Kuszmaul, and Donald E. Porter. 2015. BetrFS: A Right-Optimized Write-Optimized File System. In 13th USENIX Conference on File and Storage Technologies (FAST 15). USENIX Association, Santa Clara, CA, 301---315. https://www.usenix.org/conference/fast15/technical-sessions/presentation/jannen
[24]
B Kuszmaul. 2014. A comparison of fractal trees to log-structured merge (LSM) trees. White Paper (2014).
[25]
Chunbo Lai, Song Jiang, Liqiong Yang, Shiding Lin, Guangyu Sun, Zhenyu Hou, Can Cui, and Jason Cong. 2015. Atlas: Baidu's key-value storage system for cloud data. In MSST. IEEE Computer Society, 1--14. http://dblp.uni-trier.de/db/conf/mss/msst2015. html#LaiJYLSHCC15
[26]
Leslie Lamport. 1977. Concurrent reading and writing. Commun. ACM 20, 11 (1977), 806--811.
[27]
Y. Li, B. He, Q. Luo, and K. Yi. 2009. Tree Indexing on Flash Disks. In 2009 IEEE 25th International Conference on Data Engineering. 1303--1306.
[28]
Pejman Lotfi-Kamran, Boris Grot, Michael Ferdman, Stavros Volos, Onur Kocberber, Javier Picorel, Almutaz Adileh, Djordje Jevdjic, Sachin Idgunji, Emre Ozer, and Babak Falsafi. 2012. Scale-out Processors. In Proceedings of the 39th Annual International Symposium on Computer Architecture (ISCA '12). IEEE Computer Society, Washington, DC, USA, 500--511. http://dl.acm.org/citation.cfm?id=2337159.2337217
[29]
Lanyue Lu, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2016. WiscKey: Separating Keys from Values in SSD-conscious Storage. In 14th USENIX Conference on File and Storage Technologies (FAST 16). USENIX Association, Santa Clara, CA, 133---148. https://www.usenix.org/conference/fast16/technical-sessions/presentation/lu
[30]
Michael A. Olson, Keith Bostic, and Margo Seltzer. 1999. Berkeley DB. In Proceedings of the Annual Conference on USENIX Annual Technical Conference (ATEC '99). USENIX Association, Berkeley, CA, USA, 43--43. http://dl.acm.org/citation.cfm?id=1268708.1268751
[31]
Patrick O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O'Neil. 1996. The log-structured merge-tree (LSM-tree). Acta Informatica 33, 4 (1996), 351--385.
[32]
Anastasios Papagiannis, Giorgos Saloustros, Pilar González-Férez, and Angelos Bilas. 2016. Tucana: Design and Implementation of a Fast and Efficient Scale-up Key-value Store. In 2016 USENIX Annual Technical Conference (USENIX ATC 16). USENIX Association, Denver, CO, 537--550. https://www.usenix.org/conference/atc16/technical-sessions/presentation/papagiannis
[33]
Pandian Raju, Rohan Kadekodi, Vijay Chidambaram, and Ittai Abraham. 2017. PebblesDB: Building Key-Value Stores Using Fragmented Log-Structured Merge Trees. In Proceedings of the 26th Symposium on Operating Systems Principles (SOSP '17). ACM, New York, NY, USA, 497--514.
[34]
Jinglei Ren. 2016. YCSB-C. https://github.com/basicthinker/YCSB-C.
[35]
Ohad Rodeh. 2008. B-trees, Shadowing, and Clones. Trans. Storage 3, 4, Article 2 (Feb. 2008), 27 pages.
[36]
Allen Samuels. 2018. The Consequences of Infinite Storage Bandwidth. https://goo.gl/Xfo7Lu.
[37]
Russell Sears and Raghu Ramakrishnan. 2012. bLSM: A General Purpose Log Structured Merge Tree. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD '12). ACM, New York, NY, USA, 217--228.
[38]
Pradeep J. Shetty, Richard P. Spillane, Ravikant R. Malpani, Binesh Andrews, Justin Seyster, and Erez Zadok. 2013. Building Workload-Independent Storage with VT-Trees. In Presented as part of the 11th USENIX Conference on File and Storage Technologies (FAST 13). USENIX, San Jose, CA, 17---30. https://www.usenix.org/conference/fast13/technical-sessions/presentation/shetty
[39]
Nae Young Song, Yongseok Son, Hyuck Han, and Heon Young Yeom. 2016. Efficient Memory-Mapped I/O on Fast Storage Device. Trans. Storage 12, 4, Article 19 (May 2016), 27 pages.
[40]
INC TOKUTEK. 2013. TokuDB: MySQL Performance, MariaDB Performance.
[41]
Xingbo Wu, Yuehai Xu, Zili Shao, and Song Jiang. 2015. LSM-trie: An LSM-tree-based Ultra-Large Key-Value Store for Small Data Items. In 2015 USENIX Annual Technical Conference (USENIX ATC 15). USENIX Association, Santa Clara, CA, 71--82. https://www.usenix.org/conference/atc15/technical-session/presentation/wu

Cited By

View all
  • (2024)TeRMProceedings of the 22nd USENIX Conference on File and Storage Technologies10.5555/3650697.3650698(1-16)Online publication date: 27-Feb-2024
  • (2024)Index Shipping for Efficient Replication in LSM Key-Value Stores with Hybrid KV PlacementACM Transactions on Storage10.1145/365867220:3(1-23)Online publication date: 16-Apr-2024
  • (2024)MemSnap μCheckpoints: A Data Single Level Store for Fearless PersistenceProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3620666.3651334(622-638)Online publication date: 27-Apr-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SoCC '18: Proceedings of the ACM Symposium on Cloud Computing
October 2018
546 pages
ISBN:9781450360111
DOI:10.1145/3267809
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 October 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Copy-On-Write
  2. Key-Value Stores
  3. LSM-Tree
  4. Memory-Mapped I/O
  5. SSD
  6. mmap

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SoCC '18
Sponsor:
SoCC '18: ACM Symposium on Cloud Computing
October 11 - 13, 2018
CA, Carlsbad, USA

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)52
  • Downloads (Last 6 weeks)8
Reflects downloads up to 10 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)TeRMProceedings of the 22nd USENIX Conference on File and Storage Technologies10.5555/3650697.3650698(1-16)Online publication date: 27-Feb-2024
  • (2024)Index Shipping for Efficient Replication in LSM Key-Value Stores with Hybrid KV PlacementACM Transactions on Storage10.1145/365867220:3(1-23)Online publication date: 16-Apr-2024
  • (2024)MemSnap μCheckpoints: A Data Single Level Store for Fearless PersistenceProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3620666.3651334(622-638)Online publication date: 27-Apr-2024
  • (2024)HPDK: A Hybrid PM-DRAM Key-Value Store for High I/O ThroughputIEEE Transactions on Computers10.1109/TC.2024.337791473:6(1575-1587)Online publication date: 18-Mar-2024
  • (2023)MyWAL: performance optimization by removing redundant input/output stack in key-value storeMyWAL: 一种基于精简输入输出堆栈的键值存储系统性能优化方案Frontiers of Information Technology & Electronic Engineering10.1631/FITEE.220049624:7(980-993)Online publication date: 28-Jul-2023
  • (2023)TriCache: A User-Transparent Block Cache Enabling High-Performance Out-of-Core Processing with In-Memory ProgramsACM Transactions on Storage10.1145/358313919:2(1-30)Online publication date: 22-Mar-2023
  • (2023)MoonKV: Optimizing Update-intensive Workloads for NVM-based Key-value Stores2023 IEEE International Conference on Data Mining (ICDM)10.1109/ICDM58522.2023.00057(478-487)Online publication date: 1-Dec-2023
  • (2022)TebisProceedings of the Seventeenth European Conference on Computer Systems10.1145/3492321.3519572(85-98)Online publication date: 28-Mar-2022
  • (2022)The Concurrent Learned Indexes for Multicore Data StorageACM Transactions on Storage10.1145/347828918:1(1-35)Online publication date: 29-Jan-2022
  • (2022)Enabling Efficient Slab-based Allocator on Fast NVMe SSD2022 Asia Conference on Algorithms, Computing and Machine Learning (CACML)10.1109/CACML55074.2022.00067(361-366)Online publication date: Mar-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media