[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3492321.3519572acmconferencesArticle/Chapter ViewAbstractPublication PageseurosysConference Proceedingsconference-collections
research-article

Tebis: index shipping for efficient replication in LSM key-value stores

Published: 28 March 2022 Publication History

Abstract

Key-value (KV) stores based on LSM tree have become a foundational layer in the storage stack of datacenters and cloud services. Current approaches for achieving reliability and availability favor reducing network traffic and send to replicas only new KV pairs. As a result, they perform costly compactions to reorganize data in both the primary and backup nodes, which increases device I/O traffic and CPU overhead, and eventually hurts overall system performance. In this paper we describe Tebis, an efficient LSM-based KV store that reduces I/O amplification and CPU overhead for maintaining the replica index. We use a primary-backup replication scheme that performs compactions only on the primary nodes and sends pre-built indexes to backup nodes, avoiding all compactions in backup nodes. Our approach includes an efficient mechanism to deal with pointer translation across nodes in the pre-built region index. Our results show that Tebis reduces pressure on backup nodes compared to performing full compactions: Throughput is increased by 1.1 -- 1.48×, CPU efficiency is increased by 1.06 -- 1.54×, and I/O amplification is reduced by 1.13 -- 1.81×, without increasing server to server network traffic excessively (by up to 1.09 -- 1.82×).

References

[1]
Marcos K. Aguilera, Naama Ben-David, Rachid Guerraoui, Virendra J. Marathe, Athanasios Xygkis, and Igor Zablotchi. 2020. Microsecond Consensus for Microsecond Applications. USENIX Association, USA.
[2]
Apache. 2018. HBase. https://hbase.apache.org/.
[3]
INFINIBAND TRADE ASSOCIATION. 2015. IB Specification Vol 1, 03,2015. Release-1.3. (2015).
[4]
Aurelius. 2012. TitanDB. Retrieved September 30, 2021 from http://titan.thinkaurelius.com/
[5]
Nikos Batsaras, Giorgos Saloustros, Anastasios Papagiannis, Panagiota Fatourou, and Angelos Bilas. 2020. VAT: Asymptotic Cost Analysis for Multi-Level Key-Value Stores. arXiv:2003.00103 [cs.DC]
[6]
Laurent Bindschaedler, Ashvin Goel, and Willy Zwaenepoel. 2020. Hailstorm: Disaggregated Compute and Storage for Distributed LSM-Based Databases. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (Lausanne, Switzerland) (ASPLOS '20). Association for Computing Machinery, New York, NY, USA, 301--316.
[7]
Dhruba Borthakur et al. 2008. HDFS architecture guide. Hadoop apache project 53, 1--13 (2008), 2.
[8]
Navin Budhiraja, Keith Marzullo, Fred B. Schneider, and Sam Toueg. 1993. Distributed Systems (2Nd Ed.). ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, Chapter The Primary-backup Approach, 199--216. http://dl.acm.org/citation.cfm?id=302430.302438
[9]
Zhichao Cao, Siying Dong, Sagar Vemuri, and David H.C. Du. 2020. Characterizing, Modeling, and Benchmarking RocksDB Key-Value Workloads at Facebook. In 18th USENIX Conference on File and Storage Technologies (FAST '20). USENIX Association, Santa Clara, CA, 209--223.
[10]
Helen H. W. Chan, Yongkun Li, Patrick P. C. Lee, and Yinlong Xu. 2018. HashKV: Enabling Efficient Updates in KV Storage via Hashing. In Proceedings of the 2018 USENIX Conference on Usenix Annual Technical Conference (Boston, MA, USA) (USENIX ATC '18). USENIX Association, Berkeley, CA, USA, 1007--1019. http://dl.acm.org/citation.cfm?id=3277355.3277451
[11]
Kristina Chodorow. 2013. MongoDB: The Definitive Guide (second ed.). O'Reilly Media. http://amazon.com/o/ASIN/1449344682/
[12]
Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking Cloud Serving Systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing (Indianapolis, Indiana, USA) (SoCC '10). ACM, New York, NY, USA, 143--154.
[13]
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. 2007. Dynamo: amazon's highly available key-value store. ACM SIGOPS operating systems review 41, 6 (2007), 205--220.
[14]
Siying Dong, Mark Callaghan, Leonidas Galanis, Dhruba Borthakur, Tony Savor, and Michael Strum. 2017. Optimizing Space Amplification in RocksDB. In CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research, Chaminade, CA, USA, January 8--11, 2017, Online Proceedings. www.cidrdb.org. http://cidrdb.org/cidr2017/papers/p82-dong-cidr17.pdf
[15]
Aleksandar Dragojević, Dushyanth Narayanan, Orion Hodson, and Miguel Castro. 2014. FaRM: Fast Remote Memory. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation. 401--414.
[16]
Facebook. 2018. BlobDB. http://rocksdb.org/. Accessed: March 15, 2022.
[17]
Facebook. 2018. RocksDB. http://rocksdb.org/.
[18]
FORTH. 2021. Kreon. https://github.com/CARV-ICS-FORTH/kreon.
[19]
Yixiao Gao, Qiang Li, Lingbo Tang, Yongqing Xi, Pengcheng Zhang, Wenwen Peng, Bo Li, Yaohui Wu, Shaozong Liu, Lei Yan, Fei Feng, Yan Zhuang, Fan Liu, Pan Liu, Xingkui Liu, Zhongjie Wu, Junping Wu, Zheng Cao, Chen Tian, Jinbo Wu, Jiaji Zhu, Haiyong Wang, Dennis Cai, and Jiesheng Wu. 2021. When Cloud Storage Meets RDMA. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21). USENIX Association, 519--533. https://www.usenix.org/conference/nsdi21/presentation/gao
[20]
Panagiotis Garefalakis, Panagiotis Papadopoulos, and Kostas Magoutis. 2014. ACaZoo: A Distributed Key-Value Store Based on Replicated LSM-Trees. In 2014 IEEE 33rd International Symposium on Reliable Distributed Systems. 211--220.
[21]
Haoyu Huang and Shahram Ghandeharizadeh. 2021. Nova-LSM: A Distributed, Component-Based LSM-Tree Key-Value Store. Association for Computing Machinery, New York, NY, USA, 749--763.
[22]
Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed. 2010. ZooKeeper: Wait-free Coordination for Internet-scale Systems. In Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference (Boston, MA) (USENIXATC'10). USENIX Association, Berkeley, CA, USA, 11--11. http://dl.acm.org/citation.cfm?id=1855840.1855851
[23]
H. V. Jagadish, P. P. S. Narayan, S. Seshadri, S. Sudarshan, and Rama Kanneganti. 1997. Incremental Organization for Data Recording and Warehousing. In Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB '97). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 16--25.
[24]
Anuj Kalia, Michael Kaminsky, and David G. Andersen. 2014. Using RDMA Efficiently for Key-value Services. In Proceedings of the 2014 ACM Conference on SIGCOMM (Chicago, Illinois, USA) (SIGCOMM '14). ACM, New York, NY, USA, 295--306.
[25]
Anuj Kalia, Michael Kaminsky, and David G. Andersen. 2016. Design Guidelines for High Performance RDMA Systems. In Proceedings of the 2016 USENIX Conference on Usenix Annual Technical Conference. 437--450.
[26]
Chunbo Lai, Song Jiang, Liqiong Yang, Shiding Lin, Guangyu Sun, Zhenyu Hou, Can Cui, and Jason Cong. 2015. Atlas: Baidu's key-value storage system for cloud data. In MSST. IEEE Computer Society, 1--14. http://dblp.uni-trier.de/db/conf/mss/msst2015.html#LaiJYLSHCC15
[27]
Avinash Lakshman and Prashant Malik. 2010. Cassandra: A Decentralized Structured Storage System. SIGOPS Oper. Syst. Rev. 44, 2 (April 2010), 35--40.
[28]
Yongkun Li, Zhen Liu, Patrick P. C. Lee, Jiayu Wu, Yinlong Xu, Yi Wu, Liu Tang, Qi Liu, and Qiu Cui. 2021. Differentiated Key-Value Storage Management for Balanced I/O Performance. In 2021 USENIX Annual Technical Conference (USENIX ATC '21). USENIX Association, 673--687.
[29]
Lanyue Lu, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2016. WiscKey: Separating Keys from Values in SSD-conscious Storage. In 14th USENIX Conference on File and Storage Technologies (FAST 16). USENIX Association, Santa Clara, CA, 133--148. https://www.usenix.org/conference/fast16/technical-sessions/presentation/lu
[30]
Yoshinori Matsunobu, Siying Dong, and Herman Lee. 2020. MyRocks: LSM-Tree Database Storage Engine Serving Facebook's Social Graph. Proc. VLDB Endow. 13, 12 (Aug. 2020), 3217--3230.
[31]
Christopher Mitchell, Yifeng Geng, and Jinyang Li. 2013. Using Onesided RDMA Reads to Build a Fast, CPU-efficient Key-value Store. In Proceedings of the 2013 USENIX Conference on Annual Technical Conference (San Jose, CA) (USENIX ATC'13). USENIX Association, Berkeley, CA, USA, 103--114. http://dl.acm.org/citation.cfm?id=2535461.2535475
[32]
Patrick O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O'Neil. 1996. The Log-structured Merge-tree (LSM-tree). Acta Inf. 33, 4 (June 1996), 351--385.
[33]
John Ousterhout, Arjun Gopalan, Ashish Gupta, Ankita Kejriwal, Collin Lee, Behnam Montazeri, Diego Ongaro, Seo Jin Park, Henry Qin, Mendel Rosenblum, Stephen Rumble, Ryan Stutsman, and Stephen Yang. 2015. The RAMCloud Storage System. ACM Trans. Comput. Syst. 33, 3, Article 7 (aug 2015), 55 pages.
[34]
Anastasios Papagiannis, Giorgos Saloustros, Pilar González-Férez, and Angelos Bilas. 2018. An Efficient Memory-Mapped Key-Value Store for Flash Storage. In Proceedings of the ACM Symposium on Cloud Computing (Carlsbad, CA, USA) (SoCC '18). ACM, New York, NY, USA, 490--502.
[35]
Marius Poke and Torsten Hoefler. 2015. DARE: High-Performance State Machine Replication on RDMA Networks. In Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing (Portland, Oregon, USA) (HPDC '15). Association for Computing Machinery, New York, NY, USA, 107--118.
[36]
Jinglei Ren. 2016. YCSB-C. https://github.com/basicthinker/YCSB-C.
[37]
Russell Sears, Mark Callaghan, and Eric Brewer. 2008. Rose: Compressed, Log-Structured Replication. Proc. VLDB Endow. 1, 1 (Aug. 2008), 526--537.
[38]
Chen Shen, Youyou Lu, Fei Li, Weidong Liu, and Jiwu Shu. 2020. NovKV: Efficient Garbage Collection for Key-Value Separated LSM-Stores. (Oct. 2020), 8 pages.
[39]
Arjun Singhvi, Aditya Akella, Maggie Anderson, Rob Cauble, Harshad Deshmukh, Dan Gibson, Milo M. K. Martin, Amanda Strominger, Thomas F. Wenisch, and Amin Vahdat. 2021. CliqueMap: Productionizing an RMA-Based Distributed Caching System. In Proceedings of the 2021 ACM SIGCOMM 2021 Conference (Virtual Event, USA) (SIGCOMM '21). Association for Computing Machinery, New York, NY, USA, 93--105.
[40]
Yacine Taleb, Ryan Stutsman, Gabriel Antoniu, and Toni Cortes. 2018. Tailwind: Fast and Atomic RDMA-based Replication. In Proceedings of the 2018 USENIX Conference on Usenix Annual Technical Conference (Boston, MA, USA) (USENIX ATC '18). USENIX Association, Berkeley, CA, USA, 851--863. http://dl.acm.org/citation.cfm?id=3277355.3277438
[41]
Shin-Yeh Tsai and Yiying Zhang. 2017. LITE Kernel RDMA Support for Datacenter Applications. In Proceedings of the 26th Symposium on Operating Systems Principles (Shanghai, China) (SOSP '17). Association for Computing Machinery, New York, NY, USA, 306--324.
[42]
Michalis Vardoulakis, Giorgos Saloustros, Pilar González-Férez, and Angelos Bilas. 2022. Tebis software.
[43]
Cheng Wang, Jianyu Jiang, Xusheng Chen, Ning Yi, and Heming Cui. 2017. APUS: Fast and Scalable Paxos on RDMA. In Proceedings of the 2017 Symposium on Cloud Computing (Santa Clara, California) (SoCC '17). Association for Computing Machinery, New York, NY, USA, 94--107.
[44]
Xingda Wei, Jiaxin Shi, Yanzhe Chen, Rong Chen, and Haibo Chen. 2015. Fast In-memory Transaction Processing Using RDMA and HTM. In Proceedings of the 25th Symposium on Operating Systems Principles. 87--104.
[45]
Giorgos Xanthakis, Giorgos Saloustros, Nikos Batsaras, Papagiannis Anastasios, and Angelos Bilas. 2021. Parallax: Hybrib Key-Value Placement in LSM-based Key-Value Stores. In Proceedings of the ACM Symposium on Cloud Computing (Hybrid Event) (SoCC '21). ACM, New York, NY, USA.
[46]
Qiang Zhang, Yongkun Li, Patrick P. C. Lee, Yinlong Xu, and Si Wu. 2022. DEPART: Replica Decoupling for Distributed Key-Value Storage. In 20th USENIX Conference on File and Storage Technologies (FAST'22). USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/fast22/presentation/zhang-qiang

Cited By

View all
  • (2024)Index Shipping for Efficient Replication in LSM Key-Value Stores with Hybrid KV PlacementACM Transactions on Storage10.1145/365867220:3(1-23)Online publication date: 11-Jun-2024
  • (2024)TEngine: A Native Distributed Table Storage Engine2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00290(3782-3795)Online publication date: 13-May-2024
  • (2023)Exploiting Hybrid Index Scheme for RDMA-based Key-Value StoresProceedings of the 16th ACM International Conference on Systems and Storage10.1145/3579370.3594768(49-59)Online publication date: 5-Jun-2023

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
EuroSys '22: Proceedings of the Seventeenth European Conference on Computer Systems
March 2022
783 pages
ISBN:9781450391627
DOI:10.1145/3492321
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 March 2022

Permissions

Request permissions for this article.

Check for updates

Badges

Author Tags

  1. B+ tree
  2. LSM tree
  3. RDMA
  4. flash
  5. key value stores

Qualifiers

  • Research-article

Funding Sources

Conference

EuroSys '22
Sponsor:

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%

Upcoming Conference

EuroSys '25
Twentieth European Conference on Computer Systems
March 30 - April 3, 2025
Rotterdam , Netherlands

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)113
  • Downloads (Last 6 weeks)9
Reflects downloads up to 18 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Index Shipping for Efficient Replication in LSM Key-Value Stores with Hybrid KV PlacementACM Transactions on Storage10.1145/365867220:3(1-23)Online publication date: 11-Jun-2024
  • (2024)TEngine: A Native Distributed Table Storage Engine2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00290(3782-3795)Online publication date: 13-May-2024
  • (2023)Exploiting Hybrid Index Scheme for RDMA-based Key-Value StoresProceedings of the 16th ACM International Conference on Systems and Storage10.1145/3579370.3594768(49-59)Online publication date: 5-Jun-2023

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media