More Web Proxy on the site http://driver.im/

research-article

Atum: Scalable Group Communication Using Volatile Groups

Authors:

Rachid Guerraoui,

Anne-Marie Kermarrec,

Matej Pavlovic,

Dragos-Adrian SeredinschiAuthors Info & Claims

Middleware '16: Proceedings of the 17th International Middleware Conference

Article No.: 19, Pages 1 - 14

https://doi.org/10.1145/2988336.2988356

Published: 28 November 2016 Publication History

Abstract

This paper presents Atum, a group communication middleware for a large, dynamic, and hostile environment. At the heart of Atum lies the novel concept of volatile groups: small, dynamic groups of nodes, each executing a state machine replication protocol, organized in a flexible overlay. Using volatile groups, Atum scatters faulty nodes evenly among groups, and then masks each individual fault inside its group. To broadcast messages among volatile groups, Atum runs a gossip protocol across the overlay.

We report on our synchronous and asynchronous (eventually synchronous) implementations of Atum, as well as on three representative applications that we build on top of it: A publish/subscribe platform, a file sharing service, and a data streaming system. We show that (a) Atum can grow at an exponential rate beyond 1000 nodes and disseminate messages in polylogarithmic time (conveying good scalability); (b) it smoothly copes with 18% of nodes churning every minute; and (c) it is impervious to arbitrary faults, suffering no performance decay despite 5.8% Byzantine nodes in a system of 850 nodes.

References

[1]

https://www.sqlite.org/.

[2]

https://aws.amazon.com/ec2/instance-types/.

[3]

Amazon S3 Availability Event: July 20, 2008. http://status.aws.amazon.com/s3-20080720.html.

[4]

A. Adya, W. J. Bolosky, M. Castro, G. Cermak, R. Chaiken, J. R. Douceur, J. Howell, J. R. Lorch, M. Theimer, and R. P. Wattenhofer. FARSITE: Federated, available, and reliable storage for an incompletely trusted environment. ACM SIGOPS Operating Systems Review, 36(SI), 2002.

Digital Library

[5]

A. Adya, G. Cooper, D. Myers, and M. Piatek. Thialfi: a Client Notification Service for Internet-Scale Applications. In SOSP, 2011.

Digital Library

[6]

M. Allman, V. Paxson, and E. Blanton. TCP Congestion Control. RFC 5681 (Draft Standard), Sept. 2009.

[7]

P.-L. Aublin, S. B. Mokhtar, and V. Quéma. Rbft: Redundant byzantine fault tolerance. In ICDCS, 2013.

Digital Library

[8]

B. Awerbuch and C. Scheideler. Towards Scalable and Robust Overlay Networks. IPTPS, 2007.

[9]

B. Awerbuch and C. Scheideler. Towards a Scalable and Robust DHT. Theory of Computing Systems, 45(2), 2009.

Digital Library

[10]

Z. Bar-Yossef, R. Friedman, and G. Kliot. RaWMS - Random Walk Based Lightweight Membership Service for Wireless Ad Hoc Networks. ACM Trans. Comput. Syst., 26(2), 2008.

Digital Library

[11]

L. A. Barroso, J. Clidaras, and U. Hölzle. The datacenter as a computer: an introduction to the design of warehouse-scale machines. Synthesis Lectures on Computer Architecture, 8(3):1--154, 2013.

Digital Library

[12]

A. Bessani, J. Sousa, and E. Alchieri. State Machine Replication for the Masses with BFT-SMART. In DSN, 2014.

Digital Library

[13]

C. E. Bezerra, F. Pedone, and R. V. Renesse. Scalable state-machine replication. In DSN, 2014.

Digital Library

[14]

K. P. Birman. Replication and Fault-tolerance in the ISIS System. In SOSP, 1985.

Digital Library

[15]

K. P. Birman. The process group approach to reliable distributed computing. Communications of the ACM, 36, 1993.

Digital Library

[16]

G. Bracha and S. Toueg. Asynchronous consensus and broadcast protocols. Journal of the ACM, 32(4), 1985.

Digital Library

[17]

M. Castro, P. Druschel, A. Ganesh, A. Rowstron, and D. S. Wallach. Secure routing for structured peer-to-peer overlay networks. ACM SIGOPS Operating Systems Review, 36(SI), 2002.

Digital Library

[18]

M. Castro, P. Druschel, A.-M. Kermarrec, A. Nandi, A. Rowstron, and A. Singh. Splitstream: high-bandwidth multicast in cooperative environments. In ACM SIGOPS Operating Systems Review, volume 37, pages 298--313, 2003.

Digital Library

[19]

M. Castro, P. Druschel, A.-M. Kermarrec, and A. I. Rowstron. Scribe: A large-scale and decentralized application-level multicast infrastructure. Selected Areas in Communications, IEEE Journal on, 20(8), 2002.

Digital Library

[20]

M. Castro and B. Liskov. Practical byzantine fault tolerance and proactive recovery. ACM Transactions on Computer Systems, 20(4), 2002.

Digital Library

[21]

F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A Distributed Storage System for Structured Data. ACM TOCS, 26(2), 2008.

Digital Library

[22]

Y. Chen, R. Griffith, J. Liu, R. H. Katz, and A. D. Joseph. Understanding TCP Incast Throughput Collapse in Datacenter Networks. In WREN, 2009.

Digital Library

[23]

G. V. Chockler, I. Keidar, and R. Vitenberg. Group communication specifications: A comprehensive study. ACM Comput. Surv., 33(4), 2001.

Digital Library

[24]

M. Costa, J. Crowcroft, M. Castro, A. Rowstron, L. Zhou, L. Zhang, and P. Barham. Vigilante: End-to-end Containment of Internet Worms. In SOSP, 2005.

Digital Library

[25]

J. Cowling, D. Myers, B. Liskov, R. Rodrigues, and L. Shrira. HQ replication: A hybrid quorum protocol for Byzantine fault tolerance. In OSDI, 2006.

Digital Library

[26]

B. Cully, G. Lefebvre, D. Meyer, M. Feeley, N. Hutchinson, and A. Warfield. Remus: High availability via asynchronous virtual machine replication. In NSDI, 2008.

Digital Library

[27]

G. Danezis, C. Lesniewski-laas, M. F. Kaashoek, and R. Anderson. Sybil-resistant dht routing. In In ESORICS. Springer, 2005.

Digital Library

[28]

J. Dean. Designs, lessons and advice from building large distributed systems. Keynote from LADIS, 2009.

[29]

A. J. Demers, D. H. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. E. Sturgis, D. C. Swinehart, and D. B. Terry. Epidemic algorithms for replicated database maintenance. ACM SIGOPS Operating Systems Review, 22(1), 1988.

Digital Library

[30]

D. Dolev, E. Hoch, and R. Renesse. Self-stabilizing and byzantine-tolerant overlay network. In OPODIS, 2007.

Digital Library

[31]

D. Dolev and D. Malki. The Transis approach to high availability cluster communication. Communications of the ACM, 39(4), 1996.

Digital Library

[32]

D. Dolev and H. R. Strong. Authenticated algorithms for byzantine agreement. SIAM J. Comput., 12(4), 1983.

[33]

J. R. Douceur. The sybil attack. In IPTPS. Springer, 2002.

Digital Library

[34]

P. T. Eugster, P. A. Felber, R. Guerraoui, and A.-M. Kermarrec. The many faces of publish/subscribe. ACM Computing Surveys (CSUR), 35(2), 2003.

Digital Library

[35]

C. Fetzer and F. Cristian. A fail-aware membership service. In Reliable Distributed Systems, 1997.

Digital Library

[36]

A. Fiat, J. Saia, and M. Young. Making Chord Robust to Byzantine Attacks. Algorithms--ESA, 2005.

Digital Library

[37]

M. J. Fischer, N. A. Lynch, and M. S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM (JACM), 32(2):374--382, 1985.

Digital Library

[38]

R. Geambasu, J. Falkner, P. Gardner, T. Kohno, A. Krishnamurthy, and H. M. Levy. Experiences building security applications on DHTs. Technical report, Technical report, UW-CSE-09-09-01, 2009.

[39]

H. Geng and R. van Renesse. Sprinkler - Reliable Broadcast for Geographically Dispersed Datacenters. In Middleware 2013. 2013.

[40]

S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google File System. In SOSP, 2003.

Digital Library

[41]

L. Glendenning, I. Beschastnikh, A. Krishnamurthy, and T. Anderson. Scalable consistency in Scatter. In SOSP, 2011.

Digital Library

[42]

R. Guerraoui, F. Huc, and A.-M. Kermarrec. Highly dynamic distributed computing with byzantine failures. In PODC, 2013.

Digital Library

[43]

M. A. Hiltunen and R. D. Schlichting. The cactus approach to building configurable middleware services. In Proceedings of the Workshop on Dependable System Middleware and Group Communication (DSMGC 2000), 2000.

[44]

H. Johansen, A. Allavena, and R. Van Renesse. Fireflies: scalable support for intrusion-tolerant network overlays. ACM SIGOPS Operating Systems Review, 40(4), 2006.

Digital Library

[45]

M. F. Kaashoek and A. S. Tanenbaum. Group communication in the Amoeba distributed operating system. In ICDCS, 1991.

[46]

A. Kate, Y. Huang, and I. Goldberg. Distributed key generation in the wild. IACR Cryptology ePrint Archive, 2012.

[47]

D. Kostić, A. Rodriguez, J. Albrecht, and A. Vahdat. Bullet: High bandwidth data dissemination using an overlay mesh. In SOSP, 2003.

Digital Library

[48]

R. Kotla, L. Alvisi, M. Dahlin, A. Clement, and E. Wong. Zyzzyva: speculative byzantine fault tolerance. In SOSP, Oct. 2007.

Digital Library

[49]

J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao. Oceanstore: An architecture for global-scale persistent storage. SIGPLAN Not., 2000.

Digital Library

[50]

L. Lamport. The implementation of reliable distributed multiprocess systems. Computer Networks, 2:95--114, 1978.

[51]

C. Law and K.-Y. Siu. Distributed construction of random expander networks. In INFOCOM, 2003.

[52]

C. Lesniewski-Lass and M. F. Kaashoek. Whanau: A sybil-proof distributed hash table. In NSDI, 2010.

Digital Library

[53]

H. C. Li, A. Clement, M. Marchetti, M. Kapritsos, L. Robison, L. Alvisi, and M. Dahlin. FlightPath: Obedience vs. Choice in Cooperative Services. In OSDI, 2008.

Digital Library

[54]

Z. Liu, R. Yuan, Z. Li, H. Li, and G. Chen. Survive under high churn in structured P2P systems: evaluation and strategy. In ICCS, 2006.

Digital Library

[55]

J. R. Lorch, A. Adya, W. J. Bolosky, R. Chaiken, J. R. Douceur, and J. Howell. The smart way to migrate replicated stateful services. In EuroSys, 2006.

Digital Library

[56]

S. Nakamoto. Bitcoin: A peer-to-peer electronic cash system.

[57]

E. B. Nightingale, J. R. Douceur, and V. Orgovan. Cycles, Cells and Platters: An Empirical Analysis of Hardware Failures on a Million Consumer PCs. In EuroSys, 2011.

Digital Library

[58]

D. Ongaro and J. Ousterhout. In search of an understandable consensus algorithm. USENIX ATC, 2014.

Digital Library

[59]

D. Oppenheimer, A. Ganapathi, and D. A. Patterson. Why do Internet services fail, and what can be done about it? In USITS, 2003.

Digital Library

[60]

S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz. Handling churn in a DHT. In USENIX, 2004.

Digital Library

[61]

S. C. Rhea, P. R. Eaton, D. Geels, H. Weatherspoon, B. Y. Zhao, and J. Kubiatowicz. Pond: The OceanStore prototype. In FAST, volume 3, 2003.

Digital Library

[62]

R. Rodrigues and B. Liskov. Rosebud: A scalable byzantine-fault-tolerant storage architecture. Technical Report MIT-LCS-TR-932 and MIT-CSAIL-TR-2003-035, 2003.

[63]

R. Rodrigues, B. Liskov, and L. Shrira. The design of a robust peer-to-peer system. In ACM SIGOPS European workshop: beyond the PC - EW10, 2002.

Digital Library

[64]

R. Roverso, J. Dowling, and M. Jelasity. Through the wormhole: Low cost, fresh peer sampling for the internet. In Peer-to-Peer Computing (P2P), 2013.

[65]

C. Scheideler. How to spread adversarial nodes?: Rotate! In STOC, 2005.

Digital Library

[66]

F. B. Schneider. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Comput. Surv., 22(4), 1990.

Digital Library

[67]

B. Schroeder, S. Damouras, and P. Gill. Understanding latent sector errors and how to protect against them. In FAST, 2010.

Digital Library

[68]

A. Singla, C.-Y. Hong, L. Popa, and P. B. Godfrey. Jellyfish: Networking data centers randomly. In NSDI, 2012.

Digital Library

[69]

I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In SIGCOMM, 2001.

Digital Library

[70]

R. Van Renesse, K. P. Birman, and S. Maffeis. Horus: A Flexible Group Communication System. Communications of the ACM, 39(4), 1996.

Digital Library

[71]

S. Voulgaris and M. van Steen. Middleware, 2013.

[72]

L. Vu, I. Gupta, J. Liang, and K. Nahrstedt. Measurement and modeling of a large-scale overlay for multimedia streaming. In QSHINE, 2007.

Digital Library

[73]

H. Weatherspoon, P. Eaton, B.-G. Chun, and J. Kubiatowicz. Antiquity: exploiting a secure log for wide-area distributed storage. ACM SIGOPS Operating Systems Review, 41(3), 2007.

Digital Library

[74]

M. Young, A. Kate, I. Goldberg, and M. Karsten. Practical Robust Communication in DHTs Tolerating a Byzantine Adversary. In ICDCS, 2010.

Digital Library

[75]

B. Y. Zhao, L. Huang, J. Stribling, S. C. Rhea, A. D. Joseph, and J. D. Kubiatowicz. Tapestry: A resilient global-scale overlay for service deployment. IEEE JSAC, 22(1), 2004.

Digital Library

Cited By

van der Linde ALeitão JPreguiça N(2020)Practical client-side replicationProceedings of the VLDB Endowment10.14778/3407790.340784713:12(2590-2605)Online publication date: 14-Sep-2020
https://dl.acm.org/doi/10.14778/3407790.3407847
Collins DGuerraoui RKomatovic JKuznetsov PMonti MPavlovic MPignolet YSeredinschi DTonkikh AXygkis A(2020)Online Payments by Merely Broadcasting Messages2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)10.1109/DSN48063.2020.00023(26-38)Online publication date: Jun-2020
https://doi.org/10.1109/DSN48063.2020.00023
Sadi MAmad MBadache N(2019)HiCo-MoG: Hierarchical consensus-based group membership service in mobile Ad Hoc networksJournal of High Speed Networks10.3233/JHS-19060925:2(155-172)Online publication date: 21-Jun-2019
https://doi.org/10.3233/JHS-190609
Show More Cited By

Atum: Scalable Group Communication Using Volatile Groups
1. Computer systems organization
  1. Architectures
2. Software and its engineering
  1. Software organization and properties
    1. Software system structures

Recommendations

Multi-Threshold Byzantine Fault Tolerance
CCS '21: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security

Classic Byzantine fault tolerant (BFT) protocols are designed for a specific timing model, most often one of the following: synchronous, asynchronous or partially synchronous. It is well known that the timing model and fault tolerance threshold present ...
Araneola: A scalable reliable multicast system for dynamic environments

This paper presents Araneola (Araneola means ''little spider'' in Latin.), a scalable reliable application-level multicast system for highly dynamic wide-area environments. Araneola supports multi-point to multi-point reliable communication in a fully ...
Efficient and Adaptive Epidemic-Style Protocols for Reliable and Scalable Multicast

Epidemic-style (gossip-based) techniques have recently emerged as a class of scalable and reliable protocols for peer-to-peer multicast dissemination in large process groups. However, popular implementations of epidemic-style dissemination suffer from ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

Middleware '16: Proceedings of the 17th International Middleware Conference

November 2016

280 pages

ISBN:9781450343008

DOI:10.1145/2988336

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

ACM: Association for Computing Machinery
USENIX Assoc: USENIX Assoc
IFIP

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 November 2016

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Conference

Middleware '16

Sponsor:

ACM
USENIX Assoc

Middleware '16: 17th International Middleware Conference

December 12 - 16, 2016

Trento, Italy

Acceptance Rates

Overall Acceptance Rate 203 of 948 submissions, 21%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

4
Total Citations
View Citations
153
Total Downloads

Downloads (Last 12 months)1
Downloads (Last 6 weeks)1

Reflects downloads up to 30 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

van der Linde ALeitão JPreguiça N(2020)Practical client-side replicationProceedings of the VLDB Endowment10.14778/3407790.340784713:12(2590-2605)Online publication date: 14-Sep-2020
https://dl.acm.org/doi/10.14778/3407790.3407847
Collins DGuerraoui RKomatovic JKuznetsov PMonti MPavlovic MPignolet YSeredinschi DTonkikh AXygkis A(2020)Online Payments by Merely Broadcasting Messages2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)10.1109/DSN48063.2020.00023(26-38)Online publication date: Jun-2020
https://doi.org/10.1109/DSN48063.2020.00023
Sadi MAmad MBadache N(2019)HiCo-MoG: Hierarchical consensus-based group membership service in mobile Ad Hoc networksJournal of High Speed Networks10.3233/JHS-19060925:2(155-172)Online publication date: 21-Jun-2019
https://doi.org/10.3233/JHS-190609
Deyerl CDistler T(2019)In Search of a Scalable Raft-based Replication ArchitectureProceedings of the 6th Workshop on Principles and Practice of Consistency for Distributed Data10.1145/3301419.3323968(1-7)Online publication date: 25-Mar-2019
https://dl.acm.org/doi/10.1145/3301419.3323968

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents