[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3514221.3520255acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
abstract

Tuning Hierarchical Learned Indexes on Disk and Beyond

Published: 11 June 2022 Publication History

Abstract

Entry retrieval-a process to retrieve rows whose field(s) associates with the given key(s)-is one of the core operations in databases. Classical indexes such as B-tree [2, 4] and skip list recursively partition the key space into a hierarchical structure. Such a structure retrieves an entry by traversing the path of partitions that enclose the given key, effectively reducing uncertainty of key's position as the traversal proceeds. Recently, index designers have developed interests in learned indexes, a concept introduced by [13]. Using patterns in key-position pairs, a learned model can provide a significantly higher information gain about the key's position than a pessimistic partitioning index can [8]. Many works have successfully outperformed classical indexes by multiple factors in latency and memory usage. Overall, they incorporate various combinations of models, partitioning, error corrections (a.k.a. last mile search), mutability, and tunable parameters [6, 9, 10, 12, 16] among many other works.

References

[1]
Hussam Abu-Libdeh, Deniz Altinbüken, Alex Beutel, Ed H. Chi, Lyric Doshi, Tim Kraska, Xiaozhou, Li, Andy Ly, and Christopher Olston. 2020. Learned Indexes for a Google-scale Disk-based Database. arxiv: 2012.12501 [cs.DB]
[2]
R. Bayer and E. M. McCreight. 1972. Organization and maintenance of large ordered indexes. Acta Informatica, Vol. 1 (1972), 173--189. Issue 3. https://doi.org/10.1007/BF00288683
[3]
Jon Louis Bentley and Andrew Chi-Chih Yao. 1976. An almost optimal algorithm for unbounded searching. Inform. Process. Lett., Vol. 5, 3 (1976), 82--87. https://doi.org/10.1016/0020-0190(76)90071--5
[4]
Douglas Comer. 1979. UBIQUITOUS B-TREE. Comput Surv, Vol. 11 (1979). Issue 2.
[5]
Yifan Dai, Yien Xu, Aishwarya Ganesan, Ramnatthan Alagappan, Brian Kroth, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau. 2020. From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 155--171. https://www.usenix.org/conference/osdi20/presentation/dai
[6]
Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David Lomet, and Tim Kraska. 2020. ALEX: An Updatable Adaptive Learned Index. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (Portland, OR, USA) (SIGMOD '20). Association for Computing Machinery, New York, NY, USA, 969--984. https://doi.org/10.1145/3318464.3389711
[7]
Siying Dong, Andrew Kryczka, Yanqin Jin, and Michael Stumm. 2021. RocksDB: Evolution of Development Priorities in a Key-Value Store Serving Large-Scale Applications. ACM Trans. Storage, Vol. 17, 4, Article 26 (Oct. 2021), 32 pages. https://doi.org/10.1145/3483840
[8]
Paolo Ferragina, Fabrizio Lillo, and Giorgio Vinciguerra. 2020. Why Are Learned Indexes So Effective?. In Proceedings of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 119), Hal Daumé III and Aarti Singh (Eds.). PMLR, 3123--3132. https://proceedings.mlr.press/v119/ferragina20a.html
[9]
Paolo Ferragina and Giorgio Vinciguerra. 2020. The PGM-Index: A Fully-Dynamic Compressed Learned Index with Provable Worst-Case Bounds. Proc. VLDB Endow., Vol. 13, 8 (April 2020), 1162--1175. https://doi.org/10.14778/3389133.3389135
[10]
Alex Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, and Tim Kraska. 2019. FITing-Tree: A Data-Aware Index Structure. In Proceedings of the 2019 International Conference on Management of Data (Amsterdam, Netherlands) (SIGMOD '19). Association for Computing Machinery, New York, NY, USA, 1189--1206. https://doi.org/10.1145/3299869.3319860
[11]
Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, and Thomas Neumann. 2019. SOSD: A Benchmark for Learned Indexes. NeurIPS Workshop on Machine Learning for Systems (2019).
[12]
Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, and Thomas Neumann. 2020. RadixSpline: A Single-Pass Learned Index. In Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (Portland, Oregon) (aiDM '20). Association for Computing Machinery, New York, NY, USA, Article 5, 5 pages. https://doi.org/10.1145/3401071.3401659
[13]
Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The case for learned index structures. Proceedings of the ACM SIGMOD International Conference on Management of Data. https://doi.org/10.1145/3183713.3196909
[14]
Ryan Marcus, Andreas Kipf, Alexander van Renen, Mihail Stoian, Sanchit Misra, Alfons Kemper, Thomas Neumann, and Tim Kraska. 2020. Benchmarking Learned Indexes. Proc. VLDB Endow., Vol. 14, 1 (2020), 1--13.
[15]
PostgreSQL. [n.,d.]. PostgreSQL: The World's Most Advanced Open Source Relational Database. https://www.postgresql.org. [Online; accessed November-12--2021].
[16]
Chuzhe Tang, Youyun Wang, Zhiyuan Dong, Gansen Hu, Zhaoguo Wang, Minjie Wang, and Haibo Chen. 2020. XIndex: A Scalable Learned Index for Multicore Data Storage. In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (San Diego, California) (PPoPP '20). Association for Computing Machinery, New York, NY, USA, 308--320. https://doi.org/10.1145/3332466.3374547

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
June 2022
2597 pages
ISBN:9781450392495
DOI:10.1145/3514221
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2022

Check for updates

Author Tags

  1. automatic tuning
  2. bandwidth
  3. data access methods
  4. external memory
  5. index
  6. key-value database
  7. latency
  8. learned index
  9. optimization
  10. point lookup
  11. storage
  12. storage-aware

Qualifiers

  • Abstract

Conference

SIGMOD/PODS '22
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)51
  • Downloads (Last 6 weeks)7
Reflects downloads up to 09 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media