DOI: 10.1145/1827418.1827461
research-article

HOLMES: an event-driven solution to monitor data centers through continuous queries and machine learning

Published: 12 July 2010

Abstract

Supervisory processes are fundamental to data center operations that strive for fault resilience: any downtime can directly affect the business's income and, almost certainly, its reputation. Current monitoring tools rely on experts to configure constant thresholds on single streams, which is not appropriate for dynamic systems and is insufficient to capture complex patterns. We present HOLMES, a system built to help data center experts anticipate failures by combining Event Driven Architecture, Complex Event Processing (CEP), and an unsupervised machine learning algorithm. Based on rules created by the users, the system continuously checks for known problems. For unknown problems, we leverage the CEP engine to aggregate and join streams of real-time data and feed normalized input to FRAHST, our machine learning algorithm, which detects anomalous patterns across multivariate numerical streams. We describe how the UI module also operates within the publish/subscribe paradigm to enhance situational awareness. The system was very well received and was successfully implemented at one of the largest Internet Service Providers in South America.
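The stream-preparation step mentioned in the abstract (aggregating and joining real-time streams into normalized multivariate input) can be illustrated with a minimal Python sketch. This is not the HOLMES implementation: the function names, the `(timestamp, metric, value)` event shape, the 60-second window, and the batch z-score normalization are all assumptions made for illustration. A CEP engine would express the aggregation and join as continuous queries and would maintain running statistics rather than recomputing them over a batch. The anomaly detector itself (FRAHST) is not shown.

```python
from collections import defaultdict
from statistics import mean, pstdev


def aggregate(events, window_s=60):
    """Average each metric per fixed time window.

    `events` is an iterable of (timestamp_seconds, metric_name, value)
    tuples; this shape is a hypothetical stand-in for monitoring events.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for ts, metric, value in events:
        buckets[ts // window_s][metric].append(value)
    return {w: {m: mean(vs) for m, vs in ms.items()} for w, ms in buckets.items()}


def join_and_normalize(windows, metrics):
    """Join metrics on the window key and z-score each metric.

    Only windows that contain every requested metric are kept (an inner
    join). Returns a list of (window, vector) pairs ordered by window.
    """
    complete = sorted(w for w, ms in windows.items() if all(m in ms for m in metrics))
    if not complete:
        return []
    stats = {}
    for m in metrics:
        column = [windows[w][m] for w in complete]
        # Fall back to 1.0 so a constant stream does not divide by zero.
        stats[m] = (mean(column), pstdev(column) or 1.0)
    return [
        (w, [(windows[w][m] - stats[m][0]) / stats[m][1] for m in metrics])
        for w in complete
    ]


# Hypothetical sample: CPU and memory readings from one host.
events = [
    (0, "cpu", 10.0), (30, "cpu", 20.0), (10, "mem", 100.0),
    (60, "cpu", 45.0), (70, "mem", 300.0),
    (120, "cpu", 30.0),  # no "mem" reading in this window, so it is dropped
]
vectors = join_and_normalize(aggregate(events), ["cpu", "mem"])
print(vectors)
```

Each resulting vector places the metrics on a comparable scale, which is the kind of normalized multivariate input an anomaly detector over numerical streams would consume.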


Published In

DEBS '10: Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems
July 2010
303 pages
ISBN: 9781605589275
DOI: 10.1145/1827418

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. anomaly detection
  2. complex event processing
  3. data streams
  4. messaging-oriented middleware
  5. monitoring

Qualifiers

  • Research-article

Conference

DEBS '10

Acceptance Rates

Overall Acceptance Rate 145 of 583 submissions, 25%

Cited By

  • (2021) Failures Forecast in Monitoring Datacenter Infrastructure Through Machine Learning Techniques: A Systematic Review. Computational Science and Its Applications – ICCSA 2021, pp. 27-42. DOI: 10.1007/978-3-030-87013-3_3. Online publication date: 10-Sep-2021
  • (2019) Infrastructure fault detection and prediction in edge cloud environments. Proceedings of the 4th ACM/IEEE Symposium on Edge Computing, pp. 222-235. DOI: 10.1145/3318216.3363305. Online publication date: 7-Nov-2019
  • (2015) Malicious virtual machines detection through a clustering approach. 2015 International Conference on Cloud Technologies and Applications (CloudTech), pp. 1-8. DOI: 10.1109/CloudTech.2015.7336986. Online publication date: Jun-2015
  • (2014) CEP4CMA: Multi-layer Cloud Performance Monitoring and Analysis via Complex Event Processing. Networked Systems, pp. 138-152. DOI: 10.1007/978-3-319-09581-3_10. Online publication date: 3-Aug-2014
  • (2013) A Dynamic Complex Event Processing Architecture for Cloud Monitoring and Analysis. Proceedings of the 2013 IEEE International Conference on Cloud Computing Technology and Science - Volume 02, pp. 270-275. DOI: 10.1109/CloudCom.2013.146. Online publication date: 2-Dec-2013
  • (2012) Predictive complex event processing. Proceedings of the Fifth Balkan Conference in Informatics, pp. 26-31. DOI: 10.1145/2371316.2371323. Online publication date: 16-Sep-2012
