DOI: 10.1145/3663338.3665826
research-article
Open access

Tracing the Latencies of Ares: A DSM Case Study

Published: 20 June 2024

Abstract

Distributed tracing is a method used to monitor applications by tracking and visualizing requests as they move across the components and services of a distributed system. Although it is widely adopted in major cloud-computing applications, distributed tracing has not, to the best of our knowledge, been employed in Distributed Shared Memory (DSM) emulations. In such emulations, a set of networked nodes (servers) typically maintains copies of the memory data, and a set of clients (readers/writers) accesses the data by sending messages to the servers. The main challenge in this environment is to maintain the consistency of the data despite asynchrony and failures. Traditionally, the latency of operations in DSM implementations has been evaluated through simple log-based strategies that provide only a high-level performance analysis.
This paper introduces distributed tracing to DSM in an attempt to provide a fine-grained performance analysis that helps to identify performance bottlenecks. To this end, we use Ares as a case study. Ares is a crash-tolerant DSM algorithm that provides atomic consistency and supports dynamic participation of networked nodes. Our approach employs a set of tracing tools: OpenTelemetry for code instrumentation, Jaeger for telemetry data collection, and Grafana for visualization.
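
To make the idea of span-based tracing concrete, the following is a minimal, standard-library-only sketch, not the instrumentation used in the paper. It assumes a toy in-process tracer (`ToyTracer`, a hypothetical name that only imitates the nested-span idea and is not the OpenTelemetry API) and a simplified majority-quorum write over in-memory replicas, loosely modeled on a generic two-phase atomic-register write rather than on the actual Ares code. `Replica`, `traced_write`, and the span names are likewise assumptions made for illustration.

```python
import itertools
import time
from contextlib import contextmanager


class ToyTracer:
    """Hypothetical in-process span recorder (NOT the OpenTelemetry API)."""

    def __init__(self):
        self.spans = []    # finished spans, in completion order
        self._stack = []   # ids of currently open spans, innermost last
        self._ids = itertools.count(1)

    @contextmanager
    def span(self, name):
        span_id = next(self._ids)
        parent_id = self._stack[-1] if self._stack else None
        self._stack.append(span_id)
        start = time.perf_counter()
        try:
            yield span_id
        finally:
            duration = time.perf_counter() - start
            self._stack.pop()
            self.spans.append({"id": span_id, "parent": parent_id,
                               "name": name, "duration_s": duration})


class Replica:
    """Toy server holding one (tag, value) pair; no network, no failures."""

    def __init__(self):
        self.tag = 0
        self.value = None

    def query(self):
        return self.tag

    def update(self, tag, value):
        # Keep only the pair with the highest tag.
        if tag > self.tag:
            self.tag, self.value = tag, value


def traced_write(tracer, replicas, value):
    """Simplified two-phase write; each phase is recorded as a child span."""
    majority = len(replicas) // 2 + 1
    quorum = replicas[:majority]   # assumption: the first majority always replies
    with tracer.span("write"):
        with tracer.span("write.get-tag"):
            max_tag = max(r.query() for r in quorum)
        with tracer.span("write.put-data"):
            new_tag = max_tag + 1
            for r in quorum:
                r.update(new_tag, value)
    return new_tag


if __name__ == "__main__":
    tracer = ToyTracer()
    servers = [Replica() for _ in range(5)]
    traced_write(tracer, servers, "hello")
    for s in tracer.spans:
        print(f'{s["name"]:<16} parent={s["parent"]} {s["duration_s"] * 1e6:.1f} us')
```

Each operation yields one root span with a child span per phase, so the time spent in each phase can be read off separately. In a real deployment the spans would instead be created through an instrumentation library and exported to a collector, and the phases would involve message exchanges with remote servers.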

Cited By

  • (2024) Ares II: Tracing the Flaws of a (Storage) God. 2024 43rd International Symposium on Reliable Distributed Systems (SRDS), pp. 187-197. DOI: 10.1109/SRDS64841.2024.00027. Online publication date: 30-Sep-2024

Published In

ApPLIED'24: Proceedings of the 2024 Workshop on Advanced Tools, Programming Languages, and PLatforms for Implementing and Evaluating algorithms for Distributed systems
June 2024
95 pages
ISBN: 9798400706707
DOI: 10.1145/3663338
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. distributed tracing
  2. distributed shared storage
  3. bottlenecks
  4. strong consistency
  5. reconfiguration

Qualifiers

  • Research-article

Conference

ApPLIED'24
Acceptance Rates

Overall Acceptance Rate 3 of 4 submissions, 75%
