[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2168836.2168855acmconferencesArticle/Chapter ViewAbstractPublication PageseurosysConference Proceedingsconference-collections
research-article

Cache craftiness for fast multicore key-value storage

Published: 10 April 2012 Publication History

Abstract

We present Masstree, a fast key-value database designed for SMP machines. Masstree keeps all data in memory. Its main data structure is a trie-like concatenation of B+-trees, each of which handles a fixed-length slice of a variable-length key. This structure effectively handles arbitrary-length possiblybinary keys, including keys with long shared prefixes. +-tree fanout was chosen to minimize total DRAM delay when descending the tree and prefetching each tree node. Lookups use optimistic concurrency control, a read-copy-update-like technique, and do not write shared data structures; updates lock only affected nodes. Logging and checkpointing provide consistency and durability. Though some of these ideas appear elsewhere, Masstree is the first to combine them. We discuss design variants and their consequences.
On a 16-core machine, with logging enabled and queries arriving over a network, Masstree executes more than six million simple queries per second. This performance is comparable to that of memcached, a non-persistent hash table server, and higher (often much higher) than that of VoltDB, MongoDB, and Redis.

References

[1]
Sharding for startups. http://www.startuplessonslearned.com/2009/01/sharding-for-startups.html.
[2]
MongoDB. http://mongodb.com.
[3]
Redis. http://redis.io.
[4]
Cassandra @ Twitter: An interview with Ryan King. http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king.
[5]
VoltDB, the NewSQL database for high velocity applications. http://voltdb.com.
[6]
R. Bayer and E. McCreight. Organization and maintenance of large ordered indices. In Proc. 1970 ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control, SIGFIDET '70, pages 107--141.
[7]
R. Bayer and K. Unterauer. Prefix B-Trees. ACM Transactions on Database Systems, 2(1):11--26, Mar. 1977.
[8]
P. Bohannon, P. McIlroy, and R. Rastogi. Main-memory index structures with fixed-size partial keys. SIGMOD Record, 30: 163--174, May 2001.
[9]
N. G. Bronson, J. Casper, H. Chafi, and K. Olukotun. A practical concurrent binary search tree. In Proc. 15th ACM PPoPP Symposium, Bangalore, India, 2010.
[10]
S. K. Cha and C. Song. P*TIME: Highly scalable OLTP DBMS for managing update-intensive stream workload. In Proc. 30th VLDB Conference, pages 1033--1044, 2004.
[11]
S. K. Cha, S. Hwang, K. Kim, and K. Kwon. Cacheconscious concurrency control of main-memory indexes on shared-memory multiprocessor systems. In Proc. 27th VLDB Conference, 2001.
[12]
F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems, 26:4:1--4:26, June 2008.
[13]
S. Chen, P. B. Gibbons, and T. C. Mowry. Improving index performance through prefetching. In Proc. 2001 SIGMOD Conference, pages 235--246.
[14]
J. Cieslewicz and K. A. Ross. Data partitioning on chip multiprocessors. In Proc. 4th International Workshop on Data Management on New Hardware, DaMoN '08, pages 25--34, New York, NY, USA, 2008.
[15]
J. Cieslewicz, K. A. Ross, K. Satsumi, and Y. Ye. Automatic contention detection and amelioration for data-intensive operations. In Proc. 2010 SIGMOD Conference, pages 483--494.
[16]
B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with YCSB. In Proc. 1st ACM Symposium on Cloud Computing, SoCC '10, pages 143--154, New York, NY, USA, 2010.
[17]
G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available keyvalue store. In Proc. 21st ACM SOSP, pages 205--220, 2007.
[18]
B. Fitzpatrick. LiveJournal's backend--a history of scaling. http://www.danga.com/words/2005_oscon/oscon-2005.pdf.
[19]
K. Fraser. Practical lock-freedom. Technical Report UCAMCL-TR-579, University of Cambridge Computer Laboratory, 2004.
[20]
E. Fredkin. Trie memory. Communications of the ACM, 3: 490--499, September 1960.
[21]
N. Hardavellas, I. Pandis, R. Johnson, N. G. Mancheril, A. Ailamaki, and B. Falsafi. Database servers on chip multiprocessors: Limitations and opportunities. In 3rd Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, Califormnia, USA, January 2007.
[22]
K. A. Hua and C. Lee. Handling data skew in multiprocessor database computers using partition tuning. In Proc. 17th VLDB Conference, pages 525--535, 1991.
[23]
J. Hugg. Key-value benchmarking. http://voltdb.com/company/blog/key-value-benchmarking.
[24]
R. Johnson, I. Pandis, N. Hardavellas, A. Ailamaki, and B. Falsafi. Shore-MT: A scalable storage manager for the multicore era. In Proc. 12th International Conference on Extending Database Technology: Advances in Database Technology, pages 24--35, New York, NY, USA, 2009.
[25]
R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. P. C. Jones, S. Madden, M. Stonebraker, Y. Zhang, J. Hugg, and D. J. Abadi. H-Store: A highperformance, distributed main memory transaction processing system. Proc. VLDB Endowment, 1:1496--1499, August 2008.
[26]
A. Lakshman and P. Malik. Cassandra: A decentralized structured storage system. ACM SIGOPS Operating System Review, 44:35--40, April 2010.
[27]
P. L. Lehman and S. B. Yao. Efficient locking for concurrent operations on B-trees. ACM Transactions on Database Systems, 6(4):650--670, 1981.
[28]
P. E. McKenney, D. Sarma, A. Arcangeli, A. Kleen, O. Krieger, and R. Russell. Read-copy update. In Proc. 2002 Ottawa Linux Symposium, pages 338--367, 2002.
[29]
C. Nyberg, T. Barclay, Z. Cvetanovic, J. Gray, and D. Lomet. AlphaSort: A cache-sensitive parallel external sort. The VLDB Journal, 4(4):603--627, 1995.
[30]
J. Rao and K. A. Ross. Making B+-trees cache conscious in main memory. SIGMOD Record, 29:475--486, May 2000.
[31]
K. A. Ross. Optimizing read convoys in main-memory query processing. In Proc. 6th International Workshop on Data Management on New Hardware, DaMoN '10, pages 27--33, New York, NY, USA, 2010. ACM.
[32]
S. Schneider, C. D. Antonopoulos, and D. S. Nikolopoulos. Scalable locality-conscious multithreaded memory allocation. In Proc. 5th International Symposium on Memory Management, ISMM '06, pages 84--94. ACM, 2006.
[33]
S. Sen and R. E. Tarjan. Deletion without rebalancing in balanced binary trees. In Proc. 21st SODA, pages 1490--1499, 2010.
[34]
J. Sewall, J. Chhugani, C. Kim, N. Satish, and P. Dubey. PALM: Parallel architecture-friendly latch-free modifications to B+ trees on many-core processors. Proc. VLDB Endowment, 4(11):795--806, August 2011.
[35]
M. Stonebraker, S. Madden, J. D. Abadi, S. Harizopoulos, N. Hachem, and P. Helland. The end of an architectural era: (it's time for a complete rewrite). In Proc. 33rd VLDB Conference, pages 1150--1160, 2007.

Cited By

View all
  • (2024)DEX: Scalable Range Indexing on Disaggregated MemoryProceedings of the VLDB Endowment10.14778/3675034.367505017:10(2603-2616)Online publication date: 1-Jun-2024
  • (2024)FluidKV: Seamlessly Bridging the Gap between Indexing Performance and Memory-Footprint on Ultra-Fast StorageProceedings of the VLDB Endowment10.14778/3648160.364817717:6(1377-1390)Online publication date: 1-Feb-2024
  • (2024)ZipCache: A DRAM/SSD Cache with Built-in Transparent CompressionProceedings of the International Symposium on Memory Systems10.1145/3695794.3695805(116-128)Online publication date: 30-Sep-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
EuroSys '12: Proceedings of the 7th ACM european conference on Computer Systems
April 2012
394 pages
ISBN:9781450312233
DOI:10.1145/2168836
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 April 2012

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. in-memory
  2. key-value
  3. multicore
  4. persistent

Qualifiers

  • Research-article

Conference

EuroSys '12
Sponsor:
EuroSys '12: Seventh EuroSys Conference 2012
April 10 - 13, 2012
Bern, Switzerland

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%

Upcoming Conference

EuroSys '25
Twentieth European Conference on Computer Systems
March 30 - April 3, 2025
Rotterdam , Netherlands

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)154
  • Downloads (Last 6 weeks)20
Reflects downloads up to 13 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)DEX: Scalable Range Indexing on Disaggregated MemoryProceedings of the VLDB Endowment10.14778/3675034.367505017:10(2603-2616)Online publication date: 1-Jun-2024
  • (2024)FluidKV: Seamlessly Bridging the Gap between Indexing Performance and Memory-Footprint on Ultra-Fast StorageProceedings of the VLDB Endowment10.14778/3648160.364817717:6(1377-1390)Online publication date: 1-Feb-2024
  • (2024)ZipCache: A DRAM/SSD Cache with Built-in Transparent CompressionProceedings of the International Symposium on Memory Systems10.1145/3695794.3695805(116-128)Online publication date: 30-Sep-2024
  • (2024)SLIPP: A Space-Efficient Learned Index for String KeysProceedings of the 2024 6th International Conference on Big-data Service and Intelligent Computation10.1145/3686540.3686550(69-77)Online publication date: 29-May-2024
  • (2024)BP-tree: Overcoming the Point-Range Operation Tradeoff for In-Memory B-trees (Abstract)Proceedings of the 2024 ACM Workshop on Highlights of Parallel Computing10.1145/3670684.3673412(29-30)Online publication date: 17-Jun-2024
  • (2024)A Memory-Disaggregated Radix TreeACM Transactions on Storage10.1145/366428920:3(1-41)Online publication date: 6-Jun-2024
  • (2024)How to Be Fast and Not Furious: Looking Under the Hood of CPU Cache PrefetchingProceedings of the 20th International Workshop on Data Management on New Hardware10.1145/3662010.3663451(1-10)Online publication date: 10-Jun-2024
  • (2024)Hyper: A High-Performance and Memory-Efficient Learned Index via Hybrid ConstructionProceedings of the ACM on Management of Data10.1145/36549482:3(1-26)Online publication date: 30-May-2024
  • (2024)Compositional Verification of Concurrent C Programs with Search Structure TemplatesProceedings of the 13th ACM SIGPLAN International Conference on Certified Programs and Proofs10.1145/3636501.3636940(60-74)Online publication date: 9-Jan-2024
  • (2024)An LSM Tree Augmented with B+ Tree on Nonvolatile MemoryACM Transactions on Storage10.1145/363347520:1(1-24)Online publication date: 30-Jan-2024
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media