DOI:10.1145/3464298.3493393
research-article

RamCast: RDMA-based atomic multicast

Published: 02 December 2021

Abstract

Atomic multicast is a group communication abstraction useful in the design of highly available and scalable systems. It allows messages to be addressed to a subset of the processes in the system reliably and consistently. Many atomic multicast algorithms have been designed for the message-passing system model. This paper presents RamCast, the first atomic multicast protocol for the shared-memory system model. We design RamCast by leveraging Remote Direct Memory Access (RDMA) technology and by carefully combining techniques from message-passing and shared-memory systems. We show experimentally that RamCast outperforms current state-of-the-art atomic multicast protocols, increasing throughput by up to 3.7x and reducing latency by up to 28x.
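
To make the "consistently" requirement above more concrete, here is a minimal Python sketch of an ordering check over delivery logs. It is not RamCast and is not taken from the paper. It assumes the usual formulation in which the "delivered before" relation, merged across all processes, must be acyclic. The function name `delivery_order_is_acyclic`, the log format, and the process and message names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def delivery_order_is_acyclic(logs):
    """Check the ordering property on a set of delivery logs.

    `logs` maps a process name to the list of message ids that process
    delivered, in delivery order (hypothetical format, each id at most
    once per log). Returns True if the merged "delivered before"
    relation has no cycle.
    """
    successors = defaultdict(set)
    messages = set()
    for delivered in logs.values():
        messages.update(delivered)
        # Every earlier message in a log precedes every later one.
        for i, earlier in enumerate(delivered):
            for later in delivered[i + 1:]:
                successors[earlier].add(later)

    # Kahn's algorithm: repeatedly remove messages with no predecessor.
    indegree = {m: 0 for m in messages}
    for earlier in list(successors):
        for later in successors[earlier]:
            indegree[later] += 1
    ready = [m for m in messages if indegree[m] == 0]
    removed = 0
    while ready:
        m = ready.pop()
        removed += 1
        for later in successors.get(m, ()):
            indegree[later] -= 1
            if indegree[later] == 0:
                ready.append(later)
    # If some message was never removed, it sits on a cycle.
    return removed == len(messages)


# Each process only delivers the messages addressed to it.
consistent = {"p1": ["m1", "m2"], "p2": ["m1", "m2", "m3"], "p3": ["m3"]}
# p1 and p2 disagree on the order of m1 and m2.
swapped = {"p1": ["m1", "m2"], "p2": ["m2", "m1"]}
# No two processes disagree directly, yet the global order is cyclic.
cyclic = {"p1": ["a", "b"], "p2": ["b", "c"], "p3": ["c", "a"]}

print(delivery_order_is_acyclic(consistent))
print(delivery_order_is_acyclic(swapped))
print(delivery_order_is_acyclic(cyclic))
```

The third log set shows why a pairwise comparison between processes would not be enough under this assumption: each pair of processes agrees on the messages they share, but the three logs together cannot be explained by any single global order.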

Published In

Middleware '21: Proceedings of the 22nd International Middleware Conference
December 2021
398 pages
ISBN:9781450385343
DOI:10.1145/3464298

Sponsors

In-Cooperation

  • USENIX Assoc
  • IFIP

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. RDMA
  2. atomic multicast
  3. group communication

Qualifiers

  • Research-article

Funding Sources

  • Swiss National Science Foundation

Conference

Middleware '21: 22nd International Middleware Conference
December 6 - 10, 2021
Québec City, Canada

Acceptance Rates

Overall Acceptance Rate 203 of 948 submissions, 21%

Article Metrics

  • Downloads (Last 12 months): 34
  • Downloads (Last 6 weeks): 5

Reflects downloads up to 14 Dec 2024

Cited By

  • (2024) Zero-sided RDMA: Network-driven Data Shuffling for Disaggregated Heterogeneous Cloud DBMSs. Proceedings of the ACM on Management of Data 2:1 (1-28). DOI: 10.1145/3639291. Online publication date: 26-Mar-2024
  • (2024) P4ce: Consensus over RDMA at Line Speed. 2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS) (508-519). DOI: 10.1109/ICDCS60910.2024.00054. Online publication date: 23-Jul-2024
  • (2024) Cepheus: Accelerating Datacenter Applications with High-Performance RoCE-Capable Multicast. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (908-921). DOI: 10.1109/HPCA57654.2024.00074. Online publication date: 2-Mar-2024
  • (2023) FlexCast. Proceedings of the 24th International Middleware Conference (288-300). DOI: 10.1145/3590140.3629122. Online publication date: 27-Nov-2023
  • (2023) MC-RDMA: Improving Replication Performance of RDMA-based Distributed Systems with Reliable Multicast Support. 2023 IEEE 31st International Conference on Network Protocols (ICNP) (1-11). DOI: 10.1109/ICNP59255.2023.10355619. Online publication date: 10-Oct-2023
  • (2023) Heron: Scalable State Machine Replication on Shared Memory. 2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) (138-150). DOI: 10.1109/DSN58367.2023.00025. Online publication date: Jun-2023
