[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.5555/2789770.2789774guideproceedingsArticle/Chapter ViewAbstractPublication PagesnsdiConference Proceedingsconference-collections
Article

Designing distributed systems using approximate synchrony in data center networks

Published: 04 May 2015 Publication History

Abstract

Distributed systems are traditionally designed independently from the underlying network, making worst-case assumptions (e.g., complete asynchrony) about its behavior. However, many of today's distributed applications are deployed in data centers, where the network is more reliable, predictable, and extensible. In these environments, it is possible to co-design distributed systems with their network layer, and doing so can offer substantial benefits.
This paper explores network-level mechanisms for providing Mostly-Ordered Multicast (MOM): a best-effort ordering property for concurrent multicast operations. Using this primitive, we design Speculative Paxos, a state machine replication protocol that relies on the network to order requests in the normal case. This approach leads to substantial performance benefits: under realistic data center conditions, Speculative Paxos can provide 40% lower latency and 2.6× higher throughput than the standard Paxos protocol. It offers lower latency than a latency-optimized protocol (Fast Paxos) with the same throughput as a throughput-optimized protocol (batching).

References

[1]
M. Al-Fares, A. Loukissas, and A. Vahdat. A scalable, commodity data center network architecture. In Proceedings of ACM SIGCOMM 2008, Seattle, WA, USA, Aug. 2008.
[2]
T. Benson, A. Akella, and D. A. Maltz. Network traffic characteristics of data centers in the wild. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC '10), Nov. 2010.
[3]
K. P. Birman and T. A. Joseph. Exploiting virtual synchrony in distributed systems. In Proceedings of the 11th ACM Symposium on Operating Systems Principles (SOSP '87), Austin, TX, USA, Oct. 1987.
[4]
M. Burrows. The Chubby lock service for loosely-coupled distributed systems. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI '06), Seattle, WA, USA, Nov. 2006.
[5]
L. Camargos, R. Schmidt, and F. Pedone. Multicoordinated Paxos. Technical report, University of Lugano Faculty of Informatics, 2007/02, Jan. 2007.
[6]
M. Castro and B. Liskov. Practical Byzantine fault tolerance. In Proceedings of the 3rd USENIX Symposium on Operating Systems Design and Implementation (OSDI '99), New Orleans, LA, USA, Feb. 1999.
[7]
T. D. Chandra, V. Hadzilacos, and S. Toueg. The weakest failure detector for solving consensus. Journal of the ACM, 43(4):685-722, July 1996.
[8]
Cisco data center infrastructure design guide 2.5. https://www.cisco.com/application/pdf/en/us/guest/netsol/ns107/c649/ccmigration_ 09186a008073377d.pdf.
[9]
J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford. Spanner: Google's globally-distributed database. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI '12), Hollywood, CA, USA, Oct. 2012.
[10]
J. Cowling and B. Liskov. Granola: Low-overhead distributed transaction coordination. In Proceedings of the 2012 USENIX Annual Technical Conference, Boston, MA, USA, June 2012.
[11]
C. Dwork, N. Lynch, and L. Stockmeyer. Consensus in the presence of partial synchrony. Journal of the ACM, 35(2):228-323, Apr. 1988.
[12]
D. Estrin, D. Farinacci, A. Helmy, D. Thaler, S. Deering, M. Handley, V. Jacobson, C. Liu, P. Sharma, and L. Wei. Protocol independent multicast-sparse mode (PIM-SM): Protocol specification. RFC 2117, June 1997. https://tools.ietf.org/html/rfc2117.
[13]
M. J. Fischer, N. A. Lynch, and M. S. Patterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374-382, Apr. 1985.
[14]
P. Gill, N. Jain, and N. Nagappan. Understanding network failures in data centers: Measurement, analysis, and implications. In Proceedings of ACM SIGCOMM 2011, Toronto, ON, Canada, Aug. 2011.
[15]
A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. VL2: A scalable and flexible data center network. In Proceedings of ACM SIGCOMM 2009, Barcelona, Spain, Aug. 2009.
[16]
M. P. Herlihy and J. M. Wing. Linearizabiliy: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems, 12(3):463-492, July 1990.
[17]
P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: Wait-free coordination for Internetscale systems. In Proceedings of the 2010 USENIX Annual Technical Conference, Boston, MA, USA, June 2010.
[18]
IEEE 802.1 Data Center Bridging. http://www. ieee802.org/1/pages/dcbridges.html.
[19]
F. Junqueira, B. Reed, and M. Serafini. Zab: High-performance broadcast for primary-backup systems. In Proceedings of the 41st IEEE/IFIP International Conference on Dependable Systems and Networks (DSN '11), Hong Kong, China, June 2011.
[20]
M. Kapritsos, Y. Wang, V. Quema, A. Clement, L. Alvisi, and M. Dahlin. All about Eve: Execute-verify replication for multi-core servers. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI '12), Hollywood, CA, USA, Oct. 2012.
[21]
B. Kemme, F. Pedone, G. Alonso, and A. Schiper. Processing transactions over optimistic atomic broadcast protocols. In Proceedings of the 13th International Symposium on Distributed Computing (DISC '99), Bratislava, Slovakia, Sept. 1999.
[22]
R. Kotla, L. Alvisi, M. Dahlin, A. Clement, and E. Wong. Zyzzyva: Speculative Byzantine fault tolerance. In Proceedings of the 21th ACM Symposium on Operating Systems Principles (SOSP '07), Stevenson, WA, USA, Oct. 2007.
[23]
L. Lamport. Time, clocks, and ordering of events in a distributed system. Communications of the ACM, 21(7):558-565, July 1978.
[24]
L. Lamport. The part-time parliament. ACM Transactions on Computer Systems, 16(2):133-169, May 1998.
[25]
L. Lamport. Paxos made simple. ACM SIGACT News, 32(4):18-25, Dec. 2001.
[26]
L. Lamport. Generalized consensus and Paxos. Technical Report MSR-TR-2005-33, Microsoft Research, Mar. 2005.
[27]
L. Lamport. Fast Paxos. Distributed Computing, 19(2):79-103, Oct. 2006.
[28]
L. Lamport. Lower bounds for asynchronous consensus. Distributed Computing, 19(2):104-125, Oct. 2006.
[29]
B. Liskov and J. Cowling. Viewstamped replication revisited. Technical Report MIT-CSAIL-TR-2012- 021, MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA, July 2012.
[30]
V. Liu, D. Halperin, A. Krishnamurthy, and T. Anderson. F10: A fault-tolerant engineered network. In Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI '13), Lombard, IL, USA, Apr. 2013.
[31]
N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner. OpenFlow: enabling innovation in campus networks. ACM SIGCOMM Computer Communication Review, 38(2):69-74, Apr. 2008.
[32]
I. Moraru, D. G. Andersen, and M. Kaminsky. There is more consensus in egalitarian parliaments. In Proceedings of the 25rd ACM Symposium on Operating Systems Principles (SOSP '13), Farmington, PA, USA, Nov. 2013.
[33]
R. N. Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat. PortLand: A scalable faulttolerant layer 2 data center network fabric. In Proceedings of ACM SIGCOMM 2009, Barcelona, Spain, Aug. 2009.
[34]
E. B. Nightingale, P. M. Chen, and J. Flinn. Speculative execution in a distributed file system. In Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP '05), Brighton, United Kingdom, Oct. 2005.
[35]
B. M. Oki and B. H. Liskov. Viewstamped replication: A new primary copy method to support highly-available distributed systems. In Proceedings of the 7th ACM Symposium on Principles of Distributed Computing (PODC '88), Toronto, Ontario, Canada, Aug. 1988.
[36]
F. Pedone and A. Schiper. Optimistic atomic broadcast. In Proceedings of the 12th International Symposium on Distributed Computing (DISC '98), Andros, Greece, Sept. 1998.
[37]
S. M. Rumble, D. Ongaro, R. Stutsman, M. Rosenblum, and J. K. Ousterhout. It's time for low latency. In Proceedings of the 13th Workshop on Hot Topics in Operating Systems (HotOS '11), Napa, CA, USA, May 2011.
[38]
F. B. Schneider. Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Computing Surveys, 22(4):299-319, Dec. 1990.
[39]
M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland. The end of an architectural era: (it's time for a complete rewrite). In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB '07), Vienna, Austria, Sept. 2007.
[40]
P. Urbán, X. Défago, and A. Schiper. Chasing the FLP impossibility result in a LAN: or, how robust can a fault tolerant server be? In Proceedings of the 20th IEEE Symposium on Reliable Distributed Systems (SRDS '01), New Orleans, LA USA, Oct. 2001.
[41]
B. Wester, J. Cowling, E. Nightingale, P. M. Chen, J. Flinn, and B. Liskov. Tolerating latency in replicated state machines through client speculation. In Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation (NSDI '09), Boston, MA, USA, Apr. 2009.

Cited By

View all
  • (2024)IONIAProceedings of the 22nd USENIX Conference on File and Storage Technologies10.5555/3650697.3650711(225-242)Online publication date: 27-Feb-2024
  • (2024)LazyLog: A New Shared Log Abstraction for Low-Latency ApplicationsProceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles10.1145/3694715.3695983(296-312)Online publication date: 4-Nov-2024
  • (2023)NeoBFT: Accelerating Byzantine Fault Tolerance Using Authenticated In-Network OrderingProceedings of the ACM SIGCOMM 2023 Conference10.1145/3603269.3604874(239-254)Online publication date: 10-Sep-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Guide Proceedings
NSDI'15: Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation
May 2015
620 pages
ISBN:9781931971218

Sponsors

  • VMware
  • NSF: National Science Foundation
  • Google Inc.
  • Microsoft Reasearch: Microsoft Reasearch
  • CISCO

Publisher

USENIX Association

United States

Publication History

Published: 04 May 2015

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 14 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)IONIAProceedings of the 22nd USENIX Conference on File and Storage Technologies10.5555/3650697.3650711(225-242)Online publication date: 27-Feb-2024
  • (2024)LazyLog: A New Shared Log Abstraction for Low-Latency ApplicationsProceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles10.1145/3694715.3695983(296-312)Online publication date: 4-Nov-2024
  • (2023)NeoBFT: Accelerating Byzantine Fault Tolerance Using Authenticated In-Network OrderingProceedings of the ACM SIGCOMM 2023 Conference10.1145/3603269.3604874(239-254)Online publication date: 10-Sep-2023
  • (2023)FlexLog: A Shared Log for Stateful Serverless ComputingProceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing10.1145/3588195.3592993(195-209)Online publication date: 7-Aug-2023
  • (2022)NezhaProceedings of the VLDB Endowment10.14778/3574245.357425016:4(629-642)Online publication date: 1-Dec-2022
  • (2022)From luna to solarProceedings of the ACM SIGCOMM 2022 Conference10.1145/3544216.3544238(753-766)Online publication date: 22-Aug-2022
  • (2021)Scaling replicated state machines with compartmentalizationProceedings of the VLDB Endowment10.14778/3476249.347627314:11(2203-2215)Online publication date: 1-Jul-2021
  • (2021)MimicNetProceedings of the 2021 ACM SIGCOMM 2021 Conference10.1145/3452296.3472926(287-304)Online publication date: 9-Aug-2021
  • (2021)Strong and Efficient Consistency with Consistency-aware DurabilityACM Transactions on Storage10.1145/342313817:1(1-27)Online publication date: 18-Jan-2021
  • (2020)FLAIRProceedings of the 17th Usenix Conference on Networked Systems Design and Implementation10.5555/3388242.3388295(723-738)Online publication date: 25-Feb-2020
  • Show More Cited By

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media