[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3318464.3383133acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
short-paper

Key-Value Storage Engines

Published: 31 May 2020 Publication History

Abstract

Key-value stores are everywhere. They power a diverse set of data-driven applications across both industry and science. Key-value stores are used as stand-alone NoSQL systems but they are also used as a part of more complex pipelines and systems such as machine learning and relational systems. In this tutorial, we survey the state-of-the-art approaches on how the core storage engine of a key-value store system is designed. We focus on several critical components of the engine, starting with the core data structures to lay out data across the memory hierarchy. We also discuss design issues related to caching, timestamps, concurrency control, updates, shifting workloads, as well as mixed workloads with both analytical and transactional characteristics. We cover designs that are read-optimized, write-optimized as well as hybrids. We draw examples from several state-of-the-art systems but we also put everything together in a general framework which allows us to model storage engine designs under a single unified model and reason about the expected behavior of diverse designs. In addition, we show that given the vast number of possible storage engine designs and their complexity, there is a need to be able to describe and communicate design decisions at a high level descriptive language and we present a first version of such a language. We then use that framework to present several open challenges in the field, especially in terms of supporting increasingly more diverse and dynamic applications in the era of data science and AI, including neural networks, graphs, and data versioning.

References

[1]
Nitin Agrawal, Vijayan Prabhakaran, Ted Wobber, John D. Davis, Mark Manasse, and Rina Panigrahy. 2008. Design Tradeoffs for SSD Performance. In Proceedings of the USENIX Annual Technical Conference (ATC). 57--70. http://research.microsoft.com/pubs/63596/usenix-08-ssd.pdf http://dl.acm.org/citation.cfm?id=1404019
[2]
Jung-Sang Ahn, Chiyoung Seo, Ravi Mayuram, Rahim Yaseen, Jin-Soo Kim, and Seungryoul Maeng. 2016. ForestDB: A Fast Key-Value Storage System for Variable-Length String Keys. IEEE Transactions on Computers (TC), Vol. 65, 3 (2016), 902--915. https://doi.org/10.1109/TC.2015.2435779
[3]
Dana Van Aken, Andrew Pavlo, Geoffrey J Gordon, and Bohan Zhang. 2017. Automatic Database Management System Tuning Through Large-scale Machine Learning. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1009--1024. https://doi.org/10.1145/3035918.3064029
[4]
Ioannis Alagiannis, Stratos Idreos, and Anastasia Ailamaki. 2014. H2O: A Hands-free Adaptive Store. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1103--1114. https://doi.org/10.1145/2588555.2610502
[5]
Victor Alvarez, Felix Martin Schuhknecht, Jens Dittrich, and Stefan Richter. 2014. Main Memory Adaptive Indexing for Multi-Core Systems. In Proceedings of the International Workshop on Data Management on New Hardware (DAMON). 3:1---3:10. https://doi.org/10.1145/2619228.2619231
[6]
Michael R. Anderson, Dolan Antenucci, Victor Bittorf, Matthew Burgess, Michael J. Cafarella, Arun Kumar, Feng Niu, Yongjoo Park, Christopher Ré, and Ce Zhang. 2013. Brainwash: A Data System for Feature Engineering. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR). http://web.eecs.umich.edu/ mrander/pubs/mythical_man.pdf http://cidrdb.org/cidr2013/Papers/CIDR13_Paper82.pdf
[7]
Apache. [n. d.]. Accumulo. https://accumulo.apache.org/ ([n. d.]).
[8]
Apple. 2018. FoundationDB. https://github.com/apple/foundationdb (2018).
[9]
Timothy G. Armstrong, Vamsi Ponnekanti, Dhruba Borthakur, and Mark Callaghan. 2013. LinkBench: a Database Benchmark Based on the Facebook Social Graph. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1185--1196. https://doi.org/10.1145/2463676.2465296
[10]
Joy Arulraj, Andrew Pavlo, and Prashanth Menon. 2016. Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads. In Proceedings of the ACM SIGMOD International Conference on Management of Data. https://doi.org/10.1145/2882903.2915231
[11]
Manos Athanassoulis, Michael S. Kester, Lukas M. Maas, Radu Stoica, Stratos Idreos, Anastasia Ailamaki, and Mark Callaghan. 2016. Designing Access Methods: The RUM Conjecture. In Proceedings of the International Conference on Extending Database Technology (EDBT). 461--466. http://dx.doi.org/10.5441/002/edbt.2016.42
[12]
Shivnath Babu, Nedyalko Borisov, Songyun Duan, Herodotos Herodotou, and Vamsidhar Thummala. 2009. Automated Experiment-Driven Management of (Database) Systems. In Proceedings of the Workshop on Hot Topics in Operating Systems (HotOS). http://www.usenix.org/events/hotos09/tech/full_papers/babu/babu.pdf
[13]
Rudolf Bayer and Edward M. McCreight. 1970. Organization and Maintenance of Large Ordered Indices. In Proceedings of the ACM SIGFIDET Workshop on Data Description and Access, Vol. 1. 107--141. https://doi.org/10.1007/BF00288683
[14]
Yingyi Bu, Vinayak R. Borkar, Jianfeng Jia, Michael J. Carey, and Tyson Condie. 2014. Pregelix: Big(ger) Graph Analytics on a Dataflow Engine. Proceedings of the VLDB Endowment, Vol. 8, 2 (2014), 161--172. https://doi.org/10.14778/2735471.2735477
[15]
Zhao Cao, Shimin Chen, Feifei Li, Min Wang, and Xiaoyang Sean Wang. 2013. LogKV: Exploiting Key-Value Stores for Log Processing. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR). http://cidrdb.org/cidr2013/Papers/CIDR13_Paper46.pdf
[16]
Badrish Chandramouli, Guna Prasaad, Donald Kossmann, Justin J Levandoski, James Hunter, and Mike Barnett. 2018. FASTER: A Concurrent Key-Value Store with In-Place Updates. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 275--290. https://doi.org/10.1145/3183713.3196898
[17]
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. 2006. Bigtable: A Distributed Storage System for Structured Data. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI). 205--218. http://dl.acm.org/citation.cfm?id=1267308.1267323
[18]
Benoit Dageville, Thierry Cruanes, Marcin Zukowski, Vadim Antonov, Artin Avanes, Jon Bock, Jonathan Claybaugh, Daniel Engovatov, Martin Hentschel, Jiansheng Huang, Allison W Lee, Ashish Motivala, Abdul Q Munir, Steven Pelley, Peter Povinec, Greg Rahn, Spyridon Triantafyllis, and Philipp Unterbrunner. 2016. The Snowflake Elastic Data Warehouse. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 215--226. https://doi.org/10.1145/2882903.2903741
[19]
Niv Dayan, Manos Athanassoulis, and Stratos Idreos. 2017. Monkey: Optimal Navigable Key-Value Store. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 79--94. https://doi.org/10.1145/3035918.3064054
[20]
Niv Dayan, Manos Athanassoulis, and Stratos Idreos. 2018. Optimal Bloom Filters and Adaptive Merging for LSM-Trees. ACM Transactions on Database Systems (TODS), Vol. 43, 4 (2018), 16:1--16:48.
[21]
Niv Dayan, Philippe Bonnet, and Stratos Idreos. 2016. GeckoFTL: Scalable Flash Translation Techniques For Very Large Flash Devices. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 327--342. https://doi.org/10.1145/2882903.2915219
[22]
Niv Dayan and Stratos Idreos. 2018. Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 505--520. https://doi.org/10.1145/3183713.3196927
[23]
Niv Dayan and Stratos Idreos. 2019. The Log-Structured Merge-Bush & the Wacky Continuum. In Proceedings of the ACM SIGMOD International Conference on Management of Data .
[24]
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. 2007. Dynamo: Amazon's Highly Available Key-value Store. ACM SIGOPS Operating Systems Review, Vol. 41, 6 (2007), 205--220. https://doi.org/10.1145/1323293.1294281
[25]
Jens Dittrich and Alekh Jindal. 2011. Towards a One Size Fits All Database Architecture. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR). 195--198.
[26]
Siying Dong, Mark Callaghan, Leonidas Galanis, Dhruba Borthakur, Tony Savor, and Michael Strum. 2017. Optimizing Space Amplification in RocksDB. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR). http://cidrdb.org/cidr2017/papers/p82-dong-cidr17.pdf
[27]
Facebook. [n. d.]. RocksDB. https://github.com/facebook/rocksdb ([n. d.]).
[28]
Michael J Franklin. 1993. Caching and Memory Management in Client-Server Database Systems. Ph.D. Dissertation. University of Wisconsin-Madison.
[29]
Guy Golan-Gueta, Edward Bortnikov, Eshcar Hillel, and Idit Keidar. 2015. Scaling Concurrent Log-Structured Data Stores. In Proceedings of the ACM European Conference on Computer Systems (EuroSys). 32:1--32:14. https://doi.org/10.1145/2741948.2741973
[30]
Google. [n. d.]. LevelDB. https://github.com/google/leveldb/ ([n. d.]).
[31]
Goetz Graefe, Felix Halim, Stratos Idreos, Harumi Kuno, and Stefan Manegold. 2012. Concurrency control for adaptive indexing. Proceedings of the VLDB Endowment, Vol. 5, 7 (2012), 656--667. http://dl.acm.org/citation.cfm?id=2180918
[32]
Richard A Hankins and Jignesh M Patel. 2003. Data Morphing: An Adaptive, Cache-Conscious Storage Technique. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 417--428. http://www.vldb.org/conf/2003/papers/S13P03.pdf
[33]
HBase. 2013. Online reference. http://hbase.apache.org/ (2013).
[34]
Max Heimel, Martin Kiefer, and Volker Markl. 2015. Self-Tuning, GPU-Accelerated Kernel Density Models for Multidimensional Selectivity Estimation. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1477--1492. https://doi.org/10.1145/2723372.2749438
[35]
Stratos Idreos, Niv Dayan, Wilson Qin, Mali Akmanalp, Sophie Hilgard, Andrew Ross, James Lennon, Varun Jain, Harshita Gupta, David Li, and Zichen Zhu. 2019. Design Continuums and the Path Toward Self-Designing Key-Value Stores that Know and Learn. In Biennial Conference on Innovative Data Systems Research (CIDR) .
[36]
Stratos Idreos, Martin L. Kersten, and Stefan Manegold. 2007. Database Cracking. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR) .
[37]
Stratos Idreos, Martin L. Kersten, and Stefan Manegold. 2009. Self-organizing Tuple Reconstruction in Column-Stores. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 297--308. https://doi.org/10.1145/1559845.1559878
[38]
Stratos Idreos, Lukas M Maas, and Mike S Kester. 2017. Evolutionary Data Systems. CoRR, Vol. abs/1706.0 (2017). arxiv: 1706.05714
[39]
Stratos Idreos, Kostas Zoumpatianos, Manos Athanassoulis, Niv Dayan, Brian Hentschel, Michael S. Kester, Demi Guo, Lukas M. Maas, Wilson Qin, Abdul Wasay, and Yiyou Sun. 2018a. The Periodic Table of Data Structures. IEEE Data Engineering Bulletin, Vol. 41, 3 (2018), 64--75. http://sites.computer.org/debull/A18sept/p64.pdf
[40]
Stratos Idreos, Kostas Zoumpatianos, Brian Hentschel, Michael S Kester, and Demi Guo. 2018b. The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 535--550. https://doi.org/10.1145/3183713.3199671
[41]
Oliver Kennedy and Lukasz Ziarek. 2015. Just-In-Time Data Structures. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR). http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper9.pdf
[42]
Haridimos Kondylakis, Niv Dayan, Kostas Zoumpatianos, and Themis Palpanas. 2018. Coconut: A Scalable Bottom-Up Approach for Building Data Series Indexes. VLDB, Vol. 11, 6 (2018), 677--690. http://www.vldb.org/pvldb/vol11/p677-kondylakis.pdf
[43]
Haridimos Kondylakis, Niv Dayan, Kostas Zoumpatianos, and Themis Palpanas. 2019. Coconut Palm: Static and Streaming Data Series Exploration Now in your Palm. In SIGMOD .
[44]
Donald Kossman. 2018. Systems Research - Fueling Future Disruptions. In Keynote talk at the Microsoft Research Faculty Summit. Redmond, WA, USA. https://www.microsoft.com/en-us/research/video/systems-research-fueling-future-disruptions/
[45]
Avinash Lakshman and Prashant Malik. 2010. Cassandra - A Decentralized Structured Storage System. ACM SIGOPS Operating Systems Review, Vol. 44, 2 (2010), 35--40. http://dl.acm.org/citation.cfm?id=1773912.1773922
[46]
Hyeontaek Lim, Dongsu Han, David G Andersen, and Michael Kaminsky. 2014. MICA: A Holistic Approach to Fast In-Memory Key-Value Storage. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI). 429--444. https://www.usenix.org/conference/nsdi14/technical-sessions/presentation/lim
[47]
LinkedIn. [n. d.]. Voldemort. http://www.project-voldemort.com ([n. d.]).
[48]
Zezhou Liu and Stratos Idreos. 2016. Main Memory Adaptive Denormalization. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 2253--2254. https://doi.org/10.1145/2882903.2914835
[49]
Lanyue Lu, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2016. WiscKey: Separating Keys from Values in SSD-conscious Storage. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST). 133--148. https://www.usenix.org/conference/fast16/technical-sessions/presentation/lu
[50]
Tim Mattson, Beverly Sanders, and Berna Massingill. 2004. Patterns for Parallel Programming .Addison-Wesley Professional.
[51]
Memcached. [n. d.]. Reference. http://memcached.org/ ( [n. d.]).
[52]
MongoDB. [n. d.]. Online reference. http://www.mongodb.com/ ( [n. d.]).
[53]
Michael A. Olson, Keith Bostic, and Margo I. Seltzer. 1999. Berkeley DB. In Proceedings of the USENIX Annual Technical Conference (ATC). 183--191. http://www.usenix.org/events/usenix99/olson.html
[54]
Patrick E. O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth J. O'Neil. 1996. The log-structured merge-tree (LSM-tree). Acta Informatica, Vol. 33, 4 (1996), 351--385. http://dl.acm.org/citation.cfm?id=230823.230826
[55]
Eleni Petraki, Stratos Idreos, and Stefan Manegold. 2015. Holistic Indexing in Main-memory Column-stores. In Proceedings of the ACM SIGMOD International Conference on Management of Data .
[56]
Holger Pirk, Eleni Petraki, Stratos Idreos, Stefan Manegold, and Martin L. Kersten. 2014. Database cracking: fancy scan, not poor man's sort!. In Proceedings of the International Workshop on Data Management on New Hardware (DAMON). 1--8. https://doi.org/10.1145/2619228.2619232
[57]
Redis. [n. d.]. Online reference. http://redis.io/ ([n. d.]).
[58]
Kai Ren, Qing Zheng, Joy Arulraj, and Garth Gibson. 2017. SlimDB: A Space-Efficient Key-Value Storage Engine For Semi-Sorted Data. Proceedings of the VLDB Endowment, Vol. 10, 13 (2017), 2037--2048. http://www.vldb.org/pvldb/vol10/p2037-ren.pdf
[59]
Stephen M. Rumble, Ankita Kejriwal, and John K. Ousterhout. 2014. Log-structured memory for DRAM-based storage. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST). 1--16. https://www.usenix.org/conference/fast14/technical-sessions/presentation/rumble
[60]
Felix Martin Schuhknecht, Alekh Jindal, and Jens Dittrich. 2013. The Uncracked Pieces in Database Cracking. Proceedings of the VLDB Endowment, Vol. 7, 2 (2013), 97--108. http://www.vldb.org/pvldb/vol7/p97-schuhknecht.pdf
[61]
Russell Sears and Raghu Ramakrishnan. 2012. bLSM: A General Purpose Log Structured Merge Tree. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 217--228. https://doi.org/10.1145/2213836.2213862
[62]
Justin Sheehy and David Smith. 2010. Bitcask: A Log-Structured Hash Table for Fast Key/Value Data. Basho White Paper (2010).
[63]
Daniel Dominic Sleator and Robert Endre Tarjan. 1985. Self-Adjusting Binary Search Trees. J. ACM, Vol. 32, 3 (1985), 652--686. https://doi.org/10.1145/3828.3835
[64]
Spotify. 2014. Sparkey. https://github.com/spotify/sparkey (2014).
[65]
Michael Stonebraker and Ugur Cetintemel. 2005. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the IEEE International Conference on Data Engineering (ICDE). 2--11. https://doi.org/10.1109/ICDE.2005.1
[66]
WiredTiger. [n. d.]. Source Code. https://github.com/wiredtiger/wiredtiger ([n. d.]).
[67]
Huanchen Zhang, Hyeontaek Lim, Viktor Leis, David G Andersen, Michael Kaminsky, Kimberly Keeton, and Andrew Pavlo. 2018. SuRF: Practical Range Query Filtering with Fast Succinct Tries. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 323--336. https://doi.org/10.1145/3183713.3196931
[68]
Kostas Zoumpatianos, Stratos Idreos, and Themis Palpanas. 2014. Indexing for interactive exploration of big data series. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1555--1566. https://doi.org/10.1145/2588555.2610498

Cited By

View all
  • (2025)An update-intensive LSM-based R-tree indexThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00876-734:1Online publication date: 1-Jan-2025
  • (2024)Wayfinder: Speeding up Key-Value Separation by Avoiding I/O Based IndirectionProceedings of the 2nd Workshop on Simplicity in Management of Data10.1145/3663351.3663880(1-4)Online publication date: 14-Jun-2024
  • (2024)Anatomy of the LSM Memory BufferProceedings of the Tenth International Workshop on Testing Database Systems10.1145/3662165.3662766(23-29)Online publication date: 9-Jun-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN:9781450367356
DOI:10.1145/3318464
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 May 2020

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. key-value stores
  2. self-designing systems

Qualifiers

  • Short-paper

Funding Sources

  • USA Department of Energy

Conference

SIGMOD/PODS '20
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)125
  • Downloads (Last 6 weeks)16
Reflects downloads up to 01 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2025)An update-intensive LSM-based R-tree indexThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00876-734:1Online publication date: 1-Jan-2025
  • (2024)Wayfinder: Speeding up Key-Value Separation by Avoiding I/O Based IndirectionProceedings of the 2nd Workshop on Simplicity in Management of Data10.1145/3663351.3663880(1-4)Online publication date: 14-Jun-2024
  • (2024)Anatomy of the LSM Memory BufferProceedings of the Tenth International Workshop on Testing Database Systems10.1145/3662165.3662766(23-29)Online publication date: 9-Jun-2024
  • (2024)Benchmarking Learned and LSM Indexes for Data SortednessProceedings of the Tenth International Workshop on Testing Database Systems10.1145/3662165.3662764(16-22)Online publication date: 9-Jun-2024
  • (2024)Limousine: Blending Learned and Classical Indexes to Self-Design Larger-than-Memory Cloud Storage EnginesProceedings of the ACM on Management of Data10.1145/36393022:1(1-28)Online publication date: 26-Mar-2024
  • (2024)Tiered Storage in Modern Key-Value Stores: Performance, Storage-Efficiency, and Cost-Efficiency Considerations2024 IEEE International Conference on Big Data and Smart Computing (BigComp)10.1109/BigComp60711.2024.00032(151-158)Online publication date: 18-Feb-2024
  • (2024)Storage Management with Multi-Version Partitioned BTreesInformation Systems10.1016/j.is.2024.102403125:COnline publication date: 1-Nov-2024
  • (2024)REXIO: Indexing for Low Write Amplification by Reducing Extra I/Os in Key-Value Store Under Mixed Read/Write WorkloadsWeb Information Systems Engineering – WISE 202410.1007/978-981-96-0579-8_18(245-258)Online publication date: 29-Nov-2024
  • (2023)The LSM Design Space and its Read Optimizations2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00273(3578-3584)Online publication date: Apr-2023
  • (2023)Indexing for Near-Sorted Data2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00117(1475-1488)Online publication date: Apr-2023
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media