[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3332466.3374547acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
research-article

XIndex: a scalable learned index for multicore data storage

Published: 19 February 2020 Publication History

Abstract

We present XIndex, a concurrent ordered index designed for fast queries. Similar to a recent proposal of the learned index, XIndex uses learned models to optimize index efficiency. Comparing with the learned index, XIndex is able to effectively handle concurrent writes without affecting the query performance by leveraging fine-grained synchronization and a new compaction scheme, Two-Phase Compaction. Furthermore, XIndex adapts its structure according to run-time workload characteristics to support dynamic workload. We demonstrate the advantages of XIndex with both YCSB and TPC-C (KV), a TPC-C variant for key-value stores. XIndex achieves up to 3.2X and 4.4X performance improvement comparing with Masstree and Wormhole, respectively, on a 24-core machine, and it is open-sourced1.

References

[1]
Timo Bingmann. 2008. STX B+ tree C++ template classes.
[2]
Robert Birma, Eva Zangerle, Martin Pichl, Günther Specht, and Viktor Leis. 2018. HOT: a height optimized Trie index for main-memory database systems. In Proceedings of the 2018 International Conference on Management of Data. ACM, 521--534.
[3]
Nathan G Bronson, Jared Casper, Hassan Chafi, and Kunle Olukotun. 2010. A practical concurrent binary search tree. In ACM Sigplan Notices, Vol. 45. ACM, 257--268.
[4]
Sang Kyun Cha, Sangyong Hwang, Kihong Kim, and Keunjoo Kwon. 2001. Cache-conscious concurrency control of main-memory indexes on shared-memory multiprocessor systems. In VLDB, Vol. 1. 181--190.
[5]
Austin T Clements, M Frans Kaashoek, and Nickolai Zeldovich. 2012. Scalable address spaces using RCU balanced trees. ACM SIGPLAN Notices 47, 4 (2012), 199--210.
[6]
OpenStreetMap contributors, [n. d.]. OpenStreetMap database, https://aws.amazon.com/public-datasets/osm. Accessed: 2019-4-24.
[7]
Justin DeBrabant, Andrew Pavlo, Stephen Tu, Michael Stonebraker, and Stan Zdonik. 2013. Anti-caching: A new approach to database management system architecture. Proceedings of the VLDB Endowment 6, 14 (2013), 1942--1953.
[8]
Cristian Diaconu, Craig Freedman, Erik Ismert, Per-Ake Larson, Pravin Mittal, Ryan Stonecipher, Nitin Verma, and Mike Zwilling. 2013. Hekaton: SQL server's memory-optimized OLTP engine. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. ACM, 1243--1254.
[9]
Jialin Ding, Umar Farooq Minhas, Hantian Zhang, Yinan Li, Chi Wang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, and David Lomet. 2019. ALEX: An Updatable Adaptive Learned Index. arXiv preprint arXiv:1905.08898 (2019).
[10]
Ahmed Eldawy, Justin Levandoski, and Per-Åke Larson. 2014. Trekking through Siberia: Managing cold data in a memory-optimized database. Proceedings of the VLDB Endowment 7, 11 (2014), 931--942.
[11]
Paolo Ferragina and Giorgio Vinciguerra. 2019. The PGM-index: a multicriteria, compressed and learned approach to data indexing. arXiv:cs.DS/1910.06169 https://arxiv.org/abs/1910.06169
[12]
Alex Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, and Tim Kraska. 2019. FITing-Tree: A Data-aware Index Structure. In Proceedings of the 2019 International Conference on Management of Data. ACM, 1189--1206.
[13]
Maurice P Herlihy and Jeannette M Wing. 1990. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems (TOPLAS) 12, 3 (1990), 463--492.
[14]
Tim Kraska, Mohammad Alizadeh, Alex Beutel, Ed H Chi, Jialin Ding, Ani Kristo, Guillaume Leclerc, Samuel Madden, Hongzi Mao, and Vikram Nathan. 2019. Sagedb: A learned database system.
[15]
Tim Kraska, Alex Beutel, Ed H Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The case for learned index structures. In Proceedings of the 2018 International Conference on Management of Data. ACM, 489--504.
[16]
Justin J Levandoski, Per-Åke Larson, and Radu Stoica. 2013. Identifying hot and cold data in main-memory databases. In 2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, 26--37.
[17]
Justin J Levandoski, David B Lomet, and Sudipta Sengupta. 2013. The Bw-Tree: A B-tree for new hardware platforms. In 2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, 302--313.
[18]
Pengfei Li, Yu Hua, Pengfei Zuo, and Jingnan Jia. 2019. A Scalable Learned Index Scheme in Storage Systems. arXiv preprint arXiv:1905.06256 (2019).
[19]
Lanyue Lu, Thanumalayan Sankaranarayana Pillai, Hariharan Gopalakrishnan, Andrea C Arpaci-Dusseau, and Remzi H Arpaci-Dusseau. 2017. Wisckey: Separating keys from values in ssd-conscious storage. ACM Transactions on Storage (TOS) 13, 1 (2017), 5.
[20]
Yandong Mao, Eddie Kohler, and Robert Tappan Morris. 2012. Cache craftiness for fast multicore key-value storage. In Proceedings of the 7th ACM european conference on Computer Systems. ACM, 183--196.
[21]
Paul E McKenney, Jonathan Appavoo, Andi Kleen, Orran Krieger, Rusty Russell, Dipankar Sarma, and Maneesh Soni. 2001. Read-copy update. In AUUG Conference Proceedings. AUUG, Inc., 175.
[22]
Chris Nyberg, Tom Barclay, Zarka Cvetanovic, Jim Gray, and Dave Lomet. 1994. AlphaSort: A RISC machine sort. In ACM SIGMOD Record, Vol. 23. ACM, 233--242.
[23]
Karl Schnaitter, Serge Abiteboul, Tova Milo, and Neoklis Polyzotis. 2007. On-line index selection for shifting workloads. In Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering Workshop. IEEE Computer Society, 459--468.
[24]
Xingbo Wu, Fan Ni, and Song Jiang. 2019. Wormhole: A Fast Ordered Index for In-memory Data Management. In Proceedings of the Fourteenth EuroSys Conference 2019. ACM, 18.
[25]
Huanchen Zhang, David G Andersen, Andrew Pavlo, Michael Kaminsky, Lin Ma, and Rui Shen. 2016. Reducing the storage overhead of main-memory OLTP databases with hybrid indexes. In Proceedings of the 2016 International Conference on Management of Data. ACM, 1567--1581.

Cited By

View all
  • (2024)LITS: An Optimized Learned Index for StringsProceedings of the VLDB Endowment10.14778/3681954.368201017:11(3415-3427)Online publication date: 1-Jul-2024
  • (2024)Accelerating String-Key Learned Index Structures via Memoization-Based Incremental TrainingProceedings of the VLDB Endowment10.14778/3659437.365943917:8(1802-1815)Online publication date: 1-Apr-2024
  • (2024)Advocating for Key-Value Stores with Workload Pattern Aware Dynamic CompactionProceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems10.1145/3655038.3665955(124-130)Online publication date: 8-Jul-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PPoPP '20: Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2020
454 pages
ISBN:9781450368186
DOI:10.1145/3332466
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication Notes

Badge change: Article originally badged under Version 1.0 guidelines https://www.acm.org/publications/policies/artifact-review-badging

Publication History

Published: 19 February 2020

Permissions

Request permissions for this article.

Check for updates

Badges

Qualifiers

  • Research-article

Funding Sources

  • China National Natural Science Foundation

Conference

PPoPP '20

Acceptance Rates

PPoPP '20 Paper Acceptance Rate 28 of 121 submissions, 23%;
Overall Acceptance Rate 230 of 1,014 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)213
  • Downloads (Last 6 weeks)14
Reflects downloads up to 12 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)LITS: An Optimized Learned Index for StringsProceedings of the VLDB Endowment10.14778/3681954.368201017:11(3415-3427)Online publication date: 1-Jul-2024
  • (2024)Accelerating String-Key Learned Index Structures via Memoization-Based Incremental TrainingProceedings of the VLDB Endowment10.14778/3659437.365943917:8(1802-1815)Online publication date: 1-Apr-2024
  • (2024)Advocating for Key-Value Stores with Workload Pattern Aware Dynamic CompactionProceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems10.1145/3655038.3665955(124-130)Online publication date: 8-Jul-2024
  • (2024)Making In-Memory Learned Indexes Efficient on DiskProceedings of the ACM on Management of Data10.1145/36549542:3(1-26)Online publication date: 30-May-2024
  • (2024)Hyper: A High-Performance and Memory-Efficient Learned Index via Hybrid ConstructionProceedings of the ACM on Management of Data10.1145/36549482:3(1-26)Online publication date: 30-May-2024
  • (2024)Can Learned Indexes be Built Efficiently? A Deep Dive into Sampling Trade-offsProceedings of the ACM on Management of Data10.1145/36549192:3(1-25)Online publication date: 30-May-2024
  • (2024)SWIX: A Memory-efficient Sliding Window Learned IndexProceedings of the ACM on Management of Data10.1145/36392962:1(1-26)Online publication date: 26-Mar-2024
  • (2024)LSGraph: A Locality-centric High-performance Streaming Graph EngineProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3650076(33-49)Online publication date: 22-Apr-2024
  • (2024)AStore: Uniformed Adaptive Learned Index and Cache for RDMA-enabled Key-Value StoreIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.3355100(1-18)Online publication date: 2024
  • (2024)When Learned Indexes Meet Persistent Memory: The Analysis and the OptimizationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.334282536:12(9517-9531)Online publication date: Dec-2024
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media