DOI: 10.1145/1555349.1555360
research-article

Reference-driven performance anomaly identification

Published: 15 June 2009

Abstract

Complex system software admits a wide variety of execution conditions, defined by system configurations and workload properties. This paper explores a principled use of reference executions (executions whose conditions are similar to those of the target execution) to help identify the symptoms and causes of performance anomalies. First, to identify anomaly symptoms, we construct change profiles that probabilistically characterize the expected performance deviations between target and reference executions. By synthesizing several single-parameter change profiles, we can scalably identify anomalous reference-to-target performance changes in a complex system with multiple execution parameters. Second, to narrow the scope of root-cause analysis, we select as anomaly-related those low-level system metrics that manifest very differently in target and reference executions. Our anomaly identification approach requires little expert knowledge or detailed modeling of system internals, so it can be deployed easily. Using empirical case studies on the Linux I/O subsystem and a J2EE-based distributed online service, we demonstrate that our approach effectively identifies performance anomalies over a wide range of execution conditions and across multiple system software versions. In particular, we discovered five previously unknown causes of performance anomalies in the Linux 2.6.23 kernel. Additionally, our preliminary results suggest that online anomaly detection and system reconfiguration may help complex online systems evade performance anomalies.
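The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it (`ChangeProfile`, `synthesize`, `anomaly_score`, `emd_1d`, `rank_metrics`) is hypothetical. It assumes that a single-parameter change profile is an empirical distribution of log performance ratios (target over reference, higher-is-better performance such as throughput), that multi-parameter synthesis can treat per-parameter effects as independent and multiplicative, and that a normalized one-dimensional earth mover's distance is one plausible way to measure how "very differently" a low-level metric manifests.

```python
import bisect
import math
import random


class ChangeProfile:
    """Hypothetical single-parameter change profile.

    Holds the empirical distribution of log(target_perf / reference_perf)
    collected from execution pairs that differ in only one parameter.
    Performance is assumed to be higher-is-better (e.g., throughput).
    """

    def __init__(self, log_ratios):
        self.samples = sorted(log_ratios)

    def tail_prob(self, log_ratio):
        # Fraction of profiled changes that were at least as bad as this one.
        return bisect.bisect_right(self.samples, log_ratio) / len(self.samples)


def synthesize(profiles, n=10_000, seed=0):
    """Combine single-parameter profiles for a multi-parameter change.

    Assumption: per-parameter effects are independent and multiplicative,
    so log-ratios add; the sum's distribution is estimated by Monte Carlo.
    """
    rng = random.Random(seed)
    sums = [sum(rng.choice(p.samples) for p in profiles) for _ in range(n)]
    return ChangeProfile(sums)


def anomaly_score(profile, reference_perf, target_perf):
    # A small value means the observed drop is unlikely under the profile.
    return profile.tail_prob(math.log(target_perf / reference_perf))


def emd_1d(a, b):
    """Earth mover's distance between two 1-D samples: integral of |F_a - F_b|."""
    a_s, b_s = sorted(a), sorted(b)
    xs = sorted(set(a_s) | set(b_s))
    total = 0.0
    for left, right in zip(xs, xs[1:]):
        # Both empirical CDFs are constant on [left, right).
        fa = bisect.bisect_right(a_s, left) / len(a_s)
        fb = bisect.bisect_right(b_s, left) / len(b_s)
        total += abs(fa - fb) * (right - left)
    return total


def rank_metrics(target, reference):
    """Rank low-level metrics by how differently they behave in two runs.

    target/reference map a metric name to a list of sampled values. Each
    distance is divided by the pooled value range so that metrics measured
    in different units can be compared.
    """
    scores = {}
    for name, t in target.items():
        r = reference[name]
        span = max(t + r) - min(t + r)
        scores[name] = emd_1d(t, r) / span if span > 0 else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical profile for one parameter: changes of about +/-8%.
    sched = ChangeProfile([math.log(x) for x in (0.92, 0.97, 1.0, 1.03, 1.08)])
    # A 45% throughput drop never appeared while profiling this parameter.
    print(anomaly_score(sched, 100.0, 55.0))  # prints 0.0
```

In this sketch a low `anomaly_score` flags a symptom, and `rank_metrics` then orders low-level metrics so that the most divergent ones are inspected first during root-cause analysis.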

Published In

ACM Conferences
SIGMETRICS '09: Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
June 2009
336 pages
ISBN: 9781605585116
DOI: 10.1145/1555349
  • ACM SIGMETRICS Performance Evaluation Review, Volume 37, Issue 1
    SIGMETRICS '09
    June 2009
    320 pages
    ISSN: 0163-5999
    DOI: 10.1145/2492101
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. operating system
  2. performance anomaly

Conference

SIGMETRICS '09

Acceptance Rates

Overall Acceptance Rate 459 of 2,691 submissions, 17%

Article Metrics

  • Downloads (last 12 months): 10
  • Downloads (last 6 weeks): 3
Reflects downloads up to 01 Jan 2025

Cited By

  • (2023) FSFP: A Fine-Grained Online Service System Performance Fault Prediction Method Based on Cross-attention. 2023 30th Asia-Pacific Software Engineering Conference (APSEC), pp. 81-90. DOI: 10.1109/APSEC60848.2023.00018. Online publication date: 4 Dec 2023.
  • (2021) Predicting Performance Anomalies in Software Systems at Run-time. ACM Transactions on Software Engineering and Methodology 30(3), pp. 1-33. DOI: 10.1145/3440757. Online publication date: 23 Apr 2021.
  • (2019) Hytrace: A Hybrid Approach to Performance Bug Diagnosis in Production Cloud Infrastructures. IEEE Transactions on Parallel and Distributed Systems 30(1), pp. 107-118. DOI: 10.1109/TPDS.2018.2858800. Online publication date: 1 Jan 2019.
  • (2018) Differential energy profiling. Proceedings of the 13th USENIX conference on Operating Systems Design and Implementation, pp. 511-526. DOI: 10.5555/3291168.3291206. Online publication date: 8 Oct 2018.
  • (2018) Hang doctor. Proceedings of the Thirteenth EuroSys Conference, pp. 1-15. DOI: 10.1145/3190508.3190525. Online publication date: 23 Apr 2018.
  • (2018) Anomaly Detection in Complex Real World Application Systems. IEEE Transactions on Network and Service Management 15(1), pp. 83-96. DOI: 10.1109/TNSM.2017.2771403. Online publication date: Mar 2018.
  • (2017) Statistical Analysis of Latency Through Semantic Profiling. Proceedings of the Twelfth European Conference on Computer Systems, pp. 64-79. DOI: 10.1145/3064176.3064179. Online publication date: 23 Apr 2017.
  • (2017) Lightweight and Adaptive Service API Performance Monitoring in Highly Dynamic Cloud Environment. 2017 IEEE International Conference on Services Computing (SCC), pp. 35-43. DOI: 10.1109/SCC.2017.80. Online publication date: Jun 2017.
  • (2016) The Good, the Bad, and the Differences. Proceedings of the 2016 ACM SIGCOMM Conference, pp. 115-128. DOI: 10.1145/2934872.2934910. Online publication date: 22 Aug 2016.
  • (2016) A Scalable, Non-Parametric Method for Detecting Performance Anomaly in Large Scale Computing. IEEE Transactions on Parallel and Distributed Systems 27(7), pp. 1902-1914. DOI: 10.1109/TPDS.2015.2475741. Online publication date: 1 Jul 2016.