[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2807591.2807614acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article

HydraDB: a resilient RDMA-driven key-value middleware for in-memory cluster computing

Published: 15 November 2015 Publication History

Abstract

In this paper, we describe our experiences and lessons learned from building a general-purpose in-memory key-value middleware, called HydraDB. HydraDB synthesizes a collection of state-of-the-art techniques, including continuous fault-tolerance, Remote Direct Memory Access (RDMA), as well as awareness for multicore systems, etc, to deliver a high-throughput, low-latency access service in a reliable manner for cluster computing applications.
The uniqueness of HydraDB mainly lies in its design commitment to fully exploit the RDMA protocol to comprehensively optimize various aspects of a general-purpose key-value store, including latency-critical operations, read enhancement, and data replications for high-availability service, etc. At the same time, HydraDB strives to efficiently utilize multicore systems to prevent data manipulation on the servers from curbing the potential of RDMA.
Many teams in our organization have adopted HydraDB to improve the execution of their cluster computing frameworks, including Hadoop, Spark, Sensemaking analytics, and Call Record Processing. In addition, our performance evaluation with a variety of YCSB workloads also shows that HydraDB can substantially outperform several existing in-memory key-value stores by an order of magnitude. Our detailed performance evaluation further corroborates our design choices.

References

[1]
Apache Hadoop Project. http://hadoop.apache.org/.
[2]
Apache Spark. http://spark.apache.org/.
[3]
IBM InfoSphere Sensemaking. http://www-01.ibm.com/software/data/entity-analytics-solutions/.
[4]
Memcached. http://memcached.org/.
[5]
Protobuf. https://code.google.com/p/protobuf/.
[6]
Redis. http://redis.io/.
[7]
Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. Workload analysis of a large-scale key-value store. In Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '12, pages 53--64, New York, NY, USA, 2012. ACM.
[8]
Meeyoung Cha, Haewoon Kwak, Pablo Rodriguez, Yong-Yeol Ahn, and Sue Moon. I tube, you tube, everybody tubes: Analyzing the world's largest user generated content video system. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, IMC '07, pages 1--14, New York, NY, USA, 2007. ACM.
[9]
Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking cloud serving systems with ycsb. In Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC '10, pages 143--154, New York, NY, USA, 2010. ACM.
[10]
Aleksandar Dragojević, Dushyanth Narayanan, Orion Hodson, and Miguel Castro. Farm: Fast remote memory. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, NSDI'14, pages 401--414, Berkeley, CA, USA, 2014. USENIX Association.
[11]
Bin Fan, David G. Andersen, and Michael Kaminsky. Memc3: Compact and concurrent memcache with dumber caching and smarter hashing. In Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation, nsdi'13, pages 371--384, Berkeley, CA, USA, 2013. USENIX Association.
[12]
Maurice Herlihy, Nir Shavit, and Moran Tzafrir. Hopscotch hashing. In Proceedings of the 22Nd International Symposium on Distributed Computing, DISC '08, pages 350--364, Berlin, Heidelberg, 2008. Springer-Verlag.
[13]
Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed. Zookeeper: Wait-free coordination for internet-scale systems. In Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference, USENIXATC'10, pages 11--11, Berkeley, CA, USA, 2010. USENIX Association.
[14]
Lim Hyeontaek, Han Dongsu, David G. Andersen, and Michael Kaminsky. Mica: A holistic approach to fast in-memory key-value storage. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, nsdi'14, Berkeley, CA, USA, 2014. USENIX Association.
[15]
J. Chu and V. Kashyap. Transmission of IP over InfiniBand(IPoIB). http://tools.ietf.org/html/rfc4391, 2006.
[16]
Jithin Jose, Hari Subramoni, Krishna Kandalla, Md. Wasi-ur Rahman, Hao Wang, Sundeep Narravula, and Dhabaleswar K. Panda. Scalable memcached design for infiniband clusters using hybrid transports. In Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (Ccgrid 2012), CCGRID '12, pages 236--243, Washington, DC, USA, 2012. IEEE Computer Society.
[17]
Jithin Jose, Hari Subramoni, Miao Luo, Minjia Zhang, Jian Huang, Md. Wasi-ur Rahman, Nusrat S. Islam, Xiangyong Ouyang, Hao Wang, Sayantan Sur, and Dhabaleswar K. Panda. Memcached design on high performance rdma capable interconnects. In Proceedings of the 2011 International Conference on Parallel Processing, ICPP '11, pages 743--752, Washington, DC, USA, 2011. IEEE Computer Society.
[18]
Anuj Kalia, Michael Kaminsky, and David G. Andersen. Using rdma efficiently for key-value services. In Proceedings of the 2014 ACM Conference on SIGCOMM, SIGCOMM '14, pages 295--306, New York, NY, USA, 2014. ACM.
[19]
David Karger, Eric Lehman, Tom Leighton, Rina Panigrahy, Matthew Levine, and Daniel Lewin. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In Proceedings of the Twenty-ninth Annual ACM Symposium on Theory of Computing, STOC '97, pages 654--663, New York, NY, USA, 1997. ACM.
[20]
Jiuxing Liu, Jiesheng Wu, and Dhabaleswar K. Panda. High performance rdma-based mpi implementation over infiniband. Int. J. Parallel Program., 32(3):167--198, June 2004.
[21]
Yandong Mao, Eddie Kohler, and Robert Tappan Morris. Cache craftiness for fast multicore key-value storage. In Proceedings of the 7th ACM European Conference on Computer Systems, EuroSys '12, pages 183--196, New York, NY, USA, 2012. ACM.
[22]
Maged M. Michael. High performance dynamic lock-free hash tables and list-based sets. In Proceedings of the Fourteenth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA '02, pages 73--82, New York, NY, USA, 2002. ACM.
[23]
Christopher Mitchell, Yifeng Geng, and Jinyang Li. Using one-sided rdma reads to build a fast, cpu-efficient key-value store. In Proceedings of the 2013 USENIX Conference on Annual Technical Conference, USENIX ATC'13, pages 103--114, Berkeley, CA, USA, 2013. USENIX Association.
[24]
Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung, and Venkateshwaran Venkataramani. Scaling memcache at facebook. In Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation, nsdi'13, pages 385--398, Berkeley, CA, USA, 2013. USENIX Association.
[25]
Diego Ongaro, Stephen M. Rumble, Ryan Stutsman, John Ousterhout, and Mendel Rosenblum. Fast crash recovery in ramcloud. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP '11, pages 29--41, New York, NY, USA, 2011. ACM.
[26]
John Ousterhout, Parag Agrawal, David Erickson, Christos Kozyrakis, Jacob Leverich, David Mazières, Subhasish Mitra, Aravind Narayanan, Guru Parulkar, Mendel Rosenblum, Stephen M. Rumble, Eric Stratmann, and Ryan Stutsman. The case for ramclouds: Scalable high-performance storage entirely in dram. SIGOPS Oper. Syst. Rev., 43(4):92--105, January 2010.
[27]
Rasmus Pagh and Flemming Friche Rodler. Cuckoo hashing. J. Algorithms, 51(2):122--144, May 2004.
[28]
Mike Svoboda and Diego Zamboni. Leveraging in-memory key value stores for large-scale operations. https://www.usenix.org/conference/lisa13/leveraging-memory-key-value-stores-large-scale-operations, November 2013.
[29]
Robbert van Renesse and Fred B. Schneider. Chain replication for supporting high throughput and availability. In Proceedings of the 6th Conference on Symposium on Opearting Systems Design & Implementation - Volume 6, OSDI'04, pages 7--7, Berkeley, CA, USA, 2004. USENIX Association.
[30]
Yandong Wang, Robin Goldstone, Weikuan Yu, and Teng Wang. Characterization and optimization of memory-resident mapreduce on HPC systems. In 2014 IEEE 28th International Parallel and Distributed Processing Symposium, Phoenix, AZ, USA, May 19-23, 2014, pages 799--808, 2014.
[31]
Yandong Wang, Xiaoqiao Meng, Li Zhang, and Jian Tan. C-hint: An effective and reliable cache management for rdma-accelerated key-value stores. In Proceedings of the ACM Symposium on Cloud Computing, SOCC '14, pages 23:1--23:13, New York, NY, USA, 2014. ACM.
[32]
Yandong Wang, Xinyu Que, Weikuan Yu, Dror Goldenberg, and Dhiraj Sehgal. Hadoop acceleration through network levitated merge. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pages 57:1--57:10, New York, NY, USA, 2011. ACM.
[33]
Yandong Wang, Cong Xu, Xiaobing Li, and Weikuan Yu. Jvm-bypass for efficient hadoop shuffling. In Parallel Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on, pages 569--578, May 2013.

Cited By

View all
  • (2024)AStore: Uniformed Adaptive Learned Index and Cache for RDMA-enabled Key-Value StoreIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.3355100(1-18)Online publication date: 2024
  • (2023)Design Guidelines for Correct, Efficient, and Scalable Synchronization using One-Sided RDMAProceedings of the ACM on Management of Data10.1145/35892761:2(1-26)Online publication date: 20-Jun-2023
  • (2023)The Graph Database Interface: Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thousands of CoresProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3581784.3607068(1-18)Online publication date: 12-Nov-2023
  • Show More Cited By

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2015
985 pages
ISBN:9781450337236
DOI:10.1145/2807591
  • General Chair:
  • Jackie Kern,
  • Program Chair:
  • Jeffrey S. Vetter
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 November 2015

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article

Conference

SC15
Sponsor:

Acceptance Rates

SC '15 Paper Acceptance Rate 79 of 358 submissions, 22%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)29
  • Downloads (Last 6 weeks)3
Reflects downloads up to 15 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)AStore: Uniformed Adaptive Learned Index and Cache for RDMA-enabled Key-Value StoreIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.3355100(1-18)Online publication date: 2024
  • (2023)Design Guidelines for Correct, Efficient, and Scalable Synchronization using One-Sided RDMAProceedings of the ACM on Management of Data10.1145/35892761:2(1-26)Online publication date: 20-Jun-2023
  • (2023)The Graph Database Interface: Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thousands of CoresProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3581784.3607068(1-18)Online publication date: 12-Nov-2023
  • (2023)Exploiting Hybrid Index Scheme for RDMA-based Key-Value StoresProceedings of the 16th ACM International Conference on Systems and Storage10.1145/3579370.3594768(49-59)Online publication date: 5-Jun-2023
  • (2023)DevIOus: Device-Driven Side-Channel Attacks on the IOMMU2023 IEEE Symposium on Security and Privacy (SP)10.1109/SP46215.2023.10179283(2288-2305)Online publication date: May-2023
  • (2023)BT-Duper: A Binomial-Tree Based Data Replication Offloading Method with Native RDMA Primitives2023 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)10.1109/ISPA-BDCloud-SocialCom-SustainCom59178.2023.00153(915-922)Online publication date: 21-Dec-2023
  • (2022)An RDMA-enabled In-memory Computing Platform for R-tree on ClustersACM Transactions on Spatial Algorithms and Systems10.1145/35035138:2(1-26)Online publication date: 12-Feb-2022
  • (2022)A Survey of Storage Systems in the RDMA EraIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2022.318865633:12(4395-4409)Online publication date: 1-Dec-2022
  • (2022)Analyzing In-Memory NoSQL LandscapeIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2020.300290834:4(1628-1643)Online publication date: 1-Apr-2022
  • (2022)FNotify: A Low-Latency and Scalable Publish/Subscribe System using RDMA2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys)10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00074(327-336)Online publication date: Dec-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media