DOI: 10.1145/3452296.3472934
research-article
Open access

CliqueMap: productionizing an RMA-based distributed caching system

Published: 09 August 2021

Abstract

Distributed in-memory caching is a key component of modern Internet services. Such caches are often accessed via remote procedure call (RPC), as RPC frameworks provide rich support for productionization, including protocol versioning, memory efficiency, auto-scaling, and hitless upgrades. However, full-featured RPC limits performance and scalability as it incurs high latencies and CPU overheads. Remote Memory Access (RMA) offers a promising alternative, but meeting productionization requirements can be a significant challenge with RMA-based systems due to limited programmability and narrow RMA primitives.
This paper describes the design, implementation, and experience derived from CliqueMap, a hybrid RMA/RPC caching system. CliqueMap has been in production use in Google's datacenters for over three years, currently serves more than 1 PB of DRAM, and underlies several end-user visible services. CliqueMap makes use of performant and efficient RMAs on the critical serving path and judiciously applies RPCs toward other functionality. The design embraces lightweight replication, client-based quoruming, self-validating server responses, per-operation client-side retries, and co-design with the network layers. These foci lead to a system resilient to the rigors of production and frequent post-deployment evolution.
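As a rough illustration of how three of the listed design elements (client-based quoruming, self-validating responses, and per-operation client-side retries) might fit together, consider the toy sketch below. It is not CliqueMap's actual protocol or data layout: the `Entry` record, the CRC32 checksum, the majority rule, and the `quorum_get` helper are all assumptions made for this example, and each replica is modeled as a plain Python callable standing in for a one-sided remote read.

```python
import zlib
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Entry:
    """Hypothetical stored record; the checksum covers key, version, and value."""
    key: bytes
    version: int
    value: bytes
    checksum: int


def _checksum(key: bytes, version: int, value: bytes) -> int:
    # CRC32 is an assumption for this sketch; any integrity code would do.
    return zlib.crc32(key + version.to_bytes(8, "big") + value)


def make_entry(key: bytes, version: int, value: bytes) -> Entry:
    return Entry(key, version, value, _checksum(key, version, value))


def is_valid(entry: Optional[Entry], key: bytes) -> bool:
    """Self-validation: the client checks a response without server help."""
    return (
        entry is not None
        and entry.key == key
        and entry.checksum == _checksum(entry.key, entry.version, entry.value)
    )


Replica = Callable[[bytes], Optional[Entry]]


def quorum_get(replicas: Sequence[Replica], key: bytes,
               max_attempts: int = 3) -> Optional[bytes]:
    """Read every replica; return a value once a majority agree on it.

    Invalid (for example, torn) responses are discarded, and the whole
    operation is retried by the client up to max_attempts times.
    """
    quorum = len(replicas) // 2 + 1
    for _ in range(max_attempts):
        responses = [read(key) for read in replicas]
        valid = [e for e in responses if is_valid(e, key)]
        votes = Counter((e.version, e.value) for e in valid)
        if votes:
            (_version, value), count = votes.most_common(1)[0]
            if count >= quorum:
                return value
    return None  # no quorum reached; the caller may treat this as a miss
```

In this sketch, a read that races with an update shows up as a checksum mismatch, so the client simply discards that response and retries instead of coordinating with the server. With three replicas, one stale or corrupted copy does not prevent the other two from forming a quorum.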

Supplementary Material

bronson-public-review (556-public-review.pdf)
CliqueMap: Productionizing an RMA-Based Distributed Caching System: Public Review
MP4 File (video-presentation.mp4)
Conference Presentation Video
MP4 File (video-long.mp4)
Long Version Video


Published In

SIGCOMM '21: Proceedings of the 2021 ACM SIGCOMM 2021 Conference
August 2021
868 pages
ISBN: 9781450383837
DOI: 10.1145/3452296
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. key-value caching system
  2. remote memory access
  3. remote procedure call

Qualifiers

  • Research-article

Conference

SIGCOMM '21: ACM SIGCOMM 2021 Conference
August 23 - 27, 2021
Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 462 of 3,389 submissions, 14%

Article Metrics

  • Downloads (last 12 months): 438
  • Downloads (last 6 weeks): 69

Reflects downloads up to 15 Jan 2025.

Cited By

  • (2024) SIEVE is simpler than LRU. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 1229-1246. DOI: 10.5555/3691825.3691893. Online publication date: 16-Apr-2024
  • (2024) A Survey of RDMA Distributed Storage. Proceedings of the 2024 5th International Conference on Computing, Networks and Internet of Things, 534-539. DOI: 10.1145/3670105.3670199. Online publication date: 24-May-2024
  • (2024) Index Shipping for Efficient Replication in LSM Key-Value Stores with Hybrid KV Placement. ACM Transactions on Storage 20:3, 1-23. DOI: 10.1145/3658672. Online publication date: 11-Jun-2024
  • (2024) Brief Announcement: ROMe: Wait-free Objects for RDMA. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, 371-373. DOI: 10.1145/3626183.3660262. Online publication date: 17-Jun-2024
  • (2023) Design Guidelines for Correct, Efficient, and Scalable Synchronization using One-Sided RDMA. Proceedings of the ACM on Management of Data 1:2, 1-26. DOI: 10.1145/3589276. Online publication date: 20-Jun-2023
  • (2023) Honeycomb: Ordered Key-Value Store Acceleration on an FPGA-Based SmartNIC. IEEE Transactions on Computers 73:3, 857-871. DOI: 10.1109/TC.2023.3345173. Online publication date: 20-Dec-2023
  • (2023) iWriter: An Offloading Method for Indirectly Writing Remote Data. 2023 IEEE International Performance, Computing, and Communications Conference (IPCCC), 132-139. DOI: 10.1109/IPCCC59175.2023.10253850. Online publication date: 17-Nov-2023
  • (2022) DINOMO. Proceedings of the VLDB Endowment 15:13, 4023-4037. DOI: 10.14778/3565838.3565854. Online publication date: 1-Sep-2022
  • (2022) Clio: a hardware-software co-designed disaggregated memory system. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 417-433. DOI: 10.1145/3503222.3507762. Online publication date: 28-Feb-2022
  • (2022) MicroStream: A Distributed In-memory Caching Service For Data Production. 2022 IEEE International Conference on Joint Cloud Computing (JCC), 17-22. DOI: 10.1109/JCC56315.2022.00010. Online publication date: Aug-2022
