[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2988336.2988356acmconferencesArticle/Chapter ViewAbstractPublication PagesmiddlewareConference Proceedingsconference-collections
research-article

Atum: Scalable Group Communication Using Volatile Groups

Published: 28 November 2016 Publication History

Abstract

This paper presents Atum, a group communication middleware for a large, dynamic, and hostile environment. At the heart of Atum lies the novel concept of volatile groups: small, dynamic groups of nodes, each executing a state machine replication protocol, organized in a flexible overlay. Using volatile groups, Atum scatters faulty nodes evenly among groups, and then masks each individual fault inside its group. To broadcast messages among volatile groups, Atum runs a gossip protocol across the overlay.
We report on our synchronous and asynchronous (eventually synchronous) implementations of Atum, as well as on three representative applications that we build on top of it: A publish/subscribe platform, a file sharing service, and a data streaming system. We show that (a) Atum can grow at an exponential rate beyond 1000 nodes and disseminate messages in polylogarithmic time (conveying good scalability); (b) it smoothly copes with 18% of nodes churning every minute; and (c) it is impervious to arbitrary faults, suffering no performance decay despite 5.8% Byzantine nodes in a system of 850 nodes.

References

[1]
https://www.sqlite.org/.
[2]
https://aws.amazon.com/ec2/instance-types/.
[3]
Amazon S3 Availability Event: July 20, 2008. http://status.aws.amazon.com/s3-20080720.html.
[4]
A. Adya, W. J. Bolosky, M. Castro, G. Cermak, R. Chaiken, J. R. Douceur, J. Howell, J. R. Lorch, M. Theimer, and R. P. Wattenhofer. FARSITE: Federated, available, and reliable storage for an incompletely trusted environment. ACM SIGOPS Operating Systems Review, 36(SI), 2002.
[5]
A. Adya, G. Cooper, D. Myers, and M. Piatek. Thialfi: a Client Notification Service for Internet-Scale Applications. In SOSP, 2011.
[6]
M. Allman, V. Paxson, and E. Blanton. TCP Congestion Control. RFC 5681 (Draft Standard), Sept. 2009.
[7]
P.-L. Aublin, S. B. Mokhtar, and V. Quéma. Rbft: Redundant byzantine fault tolerance. In ICDCS, 2013.
[8]
B. Awerbuch and C. Scheideler. Towards Scalable and Robust Overlay Networks. IPTPS, 2007.
[9]
B. Awerbuch and C. Scheideler. Towards a Scalable and Robust DHT. Theory of Computing Systems, 45(2), 2009.
[10]
Z. Bar-Yossef, R. Friedman, and G. Kliot. RaWMS - Random Walk Based Lightweight Membership Service for Wireless Ad Hoc Networks. ACM Trans. Comput. Syst., 26(2), 2008.
[11]
L. A. Barroso, J. Clidaras, and U. Hölzle. The datacenter as a computer: an introduction to the design of warehouse-scale machines. Synthesis Lectures on Computer Architecture, 8(3):1--154, 2013.
[12]
A. Bessani, J. Sousa, and E. Alchieri. State Machine Replication for the Masses with BFT-SMART. In DSN, 2014.
[13]
C. E. Bezerra, F. Pedone, and R. V. Renesse. Scalable state-machine replication. In DSN, 2014.
[14]
K. P. Birman. Replication and Fault-tolerance in the ISIS System. In SOSP, 1985.
[15]
K. P. Birman. The process group approach to reliable distributed computing. Communications of the ACM, 36, 1993.
[16]
G. Bracha and S. Toueg. Asynchronous consensus and broadcast protocols. Journal of the ACM, 32(4), 1985.
[17]
M. Castro, P. Druschel, A. Ganesh, A. Rowstron, and D. S. Wallach. Secure routing for structured peer-to-peer overlay networks. ACM SIGOPS Operating Systems Review, 36(SI), 2002.
[18]
M. Castro, P. Druschel, A.-M. Kermarrec, A. Nandi, A. Rowstron, and A. Singh. Splitstream: high-bandwidth multicast in cooperative environments. In ACM SIGOPS Operating Systems Review, volume 37, pages 298--313, 2003.
[19]
M. Castro, P. Druschel, A.-M. Kermarrec, and A. I. Rowstron. Scribe: A large-scale and decentralized application-level multicast infrastructure. Selected Areas in Communications, IEEE Journal on, 20(8), 2002.
[20]
M. Castro and B. Liskov. Practical byzantine fault tolerance and proactive recovery. ACM Transactions on Computer Systems, 20(4), 2002.
[21]
F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A Distributed Storage System for Structured Data. ACM TOCS, 26(2), 2008.
[22]
Y. Chen, R. Griffith, J. Liu, R. H. Katz, and A. D. Joseph. Understanding TCP Incast Throughput Collapse in Datacenter Networks. In WREN, 2009.
[23]
G. V. Chockler, I. Keidar, and R. Vitenberg. Group communication specifications: A comprehensive study. ACM Comput. Surv., 33(4), 2001.
[24]
M. Costa, J. Crowcroft, M. Castro, A. Rowstron, L. Zhou, L. Zhang, and P. Barham. Vigilante: End-to-end Containment of Internet Worms. In SOSP, 2005.
[25]
J. Cowling, D. Myers, B. Liskov, R. Rodrigues, and L. Shrira. HQ replication: A hybrid quorum protocol for Byzantine fault tolerance. In OSDI, 2006.
[26]
B. Cully, G. Lefebvre, D. Meyer, M. Feeley, N. Hutchinson, and A. Warfield. Remus: High availability via asynchronous virtual machine replication. In NSDI, 2008.
[27]
G. Danezis, C. Lesniewski-laas, M. F. Kaashoek, and R. Anderson. Sybil-resistant dht routing. In In ESORICS. Springer, 2005.
[28]
J. Dean. Designs, lessons and advice from building large distributed systems. Keynote from LADIS, 2009.
[29]
A. J. Demers, D. H. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. E. Sturgis, D. C. Swinehart, and D. B. Terry. Epidemic algorithms for replicated database maintenance. ACM SIGOPS Operating Systems Review, 22(1), 1988.
[30]
D. Dolev, E. Hoch, and R. Renesse. Self-stabilizing and byzantine-tolerant overlay network. In OPODIS, 2007.
[31]
D. Dolev and D. Malki. The Transis approach to high availability cluster communication. Communications of the ACM, 39(4), 1996.
[32]
D. Dolev and H. R. Strong. Authenticated algorithms for byzantine agreement. SIAM J. Comput., 12(4), 1983.
[33]
J. R. Douceur. The sybil attack. In IPTPS. Springer, 2002.
[34]
P. T. Eugster, P. A. Felber, R. Guerraoui, and A.-M. Kermarrec. The many faces of publish/subscribe. ACM Computing Surveys (CSUR), 35(2), 2003.
[35]
C. Fetzer and F. Cristian. A fail-aware membership service. In Reliable Distributed Systems, 1997.
[36]
A. Fiat, J. Saia, and M. Young. Making Chord Robust to Byzantine Attacks. Algorithms--ESA, 2005.
[37]
M. J. Fischer, N. A. Lynch, and M. S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM (JACM), 32(2):374--382, 1985.
[38]
R. Geambasu, J. Falkner, P. Gardner, T. Kohno, A. Krishnamurthy, and H. M. Levy. Experiences building security applications on DHTs. Technical report, Technical report, UW-CSE-09-09-01, 2009.
[39]
H. Geng and R. van Renesse. Sprinkler - Reliable Broadcast for Geographically Dispersed Datacenters. In Middleware 2013. 2013.
[40]
S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google File System. In SOSP, 2003.
[41]
L. Glendenning, I. Beschastnikh, A. Krishnamurthy, and T. Anderson. Scalable consistency in Scatter. In SOSP, 2011.
[42]
R. Guerraoui, F. Huc, and A.-M. Kermarrec. Highly dynamic distributed computing with byzantine failures. In PODC, 2013.
[43]
M. A. Hiltunen and R. D. Schlichting. The cactus approach to building configurable middleware services. In Proceedings of the Workshop on Dependable System Middleware and Group Communication (DSMGC 2000), 2000.
[44]
H. Johansen, A. Allavena, and R. Van Renesse. Fireflies: scalable support for intrusion-tolerant network overlays. ACM SIGOPS Operating Systems Review, 40(4), 2006.
[45]
M. F. Kaashoek and A. S. Tanenbaum. Group communication in the Amoeba distributed operating system. In ICDCS, 1991.
[46]
A. Kate, Y. Huang, and I. Goldberg. Distributed key generation in the wild. IACR Cryptology ePrint Archive, 2012.
[47]
D. Kostić, A. Rodriguez, J. Albrecht, and A. Vahdat. Bullet: High bandwidth data dissemination using an overlay mesh. In SOSP, 2003.
[48]
R. Kotla, L. Alvisi, M. Dahlin, A. Clement, and E. Wong. Zyzzyva: speculative byzantine fault tolerance. In SOSP, Oct. 2007.
[49]
J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao. Oceanstore: An architecture for global-scale persistent storage. SIGPLAN Not., 2000.
[50]
L. Lamport. The implementation of reliable distributed multiprocess systems. Computer Networks, 2:95--114, 1978.
[51]
C. Law and K.-Y. Siu. Distributed construction of random expander networks. In INFOCOM, 2003.
[52]
C. Lesniewski-Lass and M. F. Kaashoek. Whanau: A sybil-proof distributed hash table. In NSDI, 2010.
[53]
H. C. Li, A. Clement, M. Marchetti, M. Kapritsos, L. Robison, L. Alvisi, and M. Dahlin. FlightPath: Obedience vs. Choice in Cooperative Services. In OSDI, 2008.
[54]
Z. Liu, R. Yuan, Z. Li, H. Li, and G. Chen. Survive under high churn in structured P2P systems: evaluation and strategy. In ICCS, 2006.
[55]
J. R. Lorch, A. Adya, W. J. Bolosky, R. Chaiken, J. R. Douceur, and J. Howell. The smart way to migrate replicated stateful services. In EuroSys, 2006.
[56]
S. Nakamoto. Bitcoin: A peer-to-peer electronic cash system.
[57]
E. B. Nightingale, J. R. Douceur, and V. Orgovan. Cycles, Cells and Platters: An Empirical Analysis of Hardware Failures on a Million Consumer PCs. In EuroSys, 2011.
[58]
D. Ongaro and J. Ousterhout. In search of an understandable consensus algorithm. USENIX ATC, 2014.
[59]
D. Oppenheimer, A. Ganapathi, and D. A. Patterson. Why do Internet services fail, and what can be done about it? In USITS, 2003.
[60]
S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz. Handling churn in a DHT. In USENIX, 2004.
[61]
S. C. Rhea, P. R. Eaton, D. Geels, H. Weatherspoon, B. Y. Zhao, and J. Kubiatowicz. Pond: The OceanStore prototype. In FAST, volume 3, 2003.
[62]
R. Rodrigues and B. Liskov. Rosebud: A scalable byzantine-fault-tolerant storage architecture. Technical Report MIT-LCS-TR-932 and MIT-CSAIL-TR-2003-035, 2003.
[63]
R. Rodrigues, B. Liskov, and L. Shrira. The design of a robust peer-to-peer system. In ACM SIGOPS European workshop: beyond the PC - EW10, 2002.
[64]
R. Roverso, J. Dowling, and M. Jelasity. Through the wormhole: Low cost, fresh peer sampling for the internet. In Peer-to-Peer Computing (P2P), 2013.
[65]
C. Scheideler. How to spread adversarial nodes?: Rotate! In STOC, 2005.
[66]
F. B. Schneider. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Comput. Surv., 22(4), 1990.
[67]
B. Schroeder, S. Damouras, and P. Gill. Understanding latent sector errors and how to protect against them. In FAST, 2010.
[68]
A. Singla, C.-Y. Hong, L. Popa, and P. B. Godfrey. Jellyfish: Networking data centers randomly. In NSDI, 2012.
[69]
I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In SIGCOMM, 2001.
[70]
R. Van Renesse, K. P. Birman, and S. Maffeis. Horus: A Flexible Group Communication System. Communications of the ACM, 39(4), 1996.
[71]
S. Voulgaris and M. van Steen. Middleware, 2013.
[72]
L. Vu, I. Gupta, J. Liang, and K. Nahrstedt. Measurement and modeling of a large-scale overlay for multimedia streaming. In QSHINE, 2007.
[73]
H. Weatherspoon, P. Eaton, B.-G. Chun, and J. Kubiatowicz. Antiquity: exploiting a secure log for wide-area distributed storage. ACM SIGOPS Operating Systems Review, 41(3), 2007.
[74]
M. Young, A. Kate, I. Goldberg, and M. Karsten. Practical Robust Communication in DHTs Tolerating a Byzantine Adversary. In ICDCS, 2010.
[75]
B. Y. Zhao, L. Huang, J. Stribling, S. C. Rhea, A. D. Joseph, and J. D. Kubiatowicz. Tapestry: A resilient global-scale overlay for service deployment. IEEE JSAC, 22(1), 2004.

Cited By

View all
  • (2020)Practical client-side replicationProceedings of the VLDB Endowment10.14778/3407790.340784713:12(2590-2605)Online publication date: 14-Sep-2020
  • (2020)Online Payments by Merely Broadcasting Messages2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)10.1109/DSN48063.2020.00023(26-38)Online publication date: Jun-2020
  • (2019)HiCo-MoG: Hierarchical consensus-based group membership service in mobile Ad Hoc networksJournal of High Speed Networks10.3233/JHS-19060925:2(155-172)Online publication date: 21-Jun-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
Middleware '16: Proceedings of the 17th International Middleware Conference
November 2016
280 pages
ISBN:9781450343008
DOI:10.1145/2988336
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 November 2016

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Byzantine fault tolerance
  2. Distributed systems
  3. Gossip
  4. Group communication

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

Conference

Middleware '16
Sponsor:
  • ACM
  • USENIX Assoc

Acceptance Rates

Overall Acceptance Rate 203 of 948 submissions, 21%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1
  • Downloads (Last 6 weeks)1
Reflects downloads up to 30 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2020)Practical client-side replicationProceedings of the VLDB Endowment10.14778/3407790.340784713:12(2590-2605)Online publication date: 14-Sep-2020
  • (2020)Online Payments by Merely Broadcasting Messages2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)10.1109/DSN48063.2020.00023(26-38)Online publication date: Jun-2020
  • (2019)HiCo-MoG: Hierarchical consensus-based group membership service in mobile Ad Hoc networksJournal of High Speed Networks10.3233/JHS-19060925:2(155-172)Online publication date: 21-Jun-2019
  • (2019)In Search of a Scalable Raft-based Replication ArchitectureProceedings of the 6th Workshop on Principles and Practice of Consistency for Distributed Data10.1145/3301419.3323968(1-7)Online publication date: 25-Mar-2019

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media