Sequential Feature Explanations for Anomaly Detection

Authors:

Md Amran Siddiqui,

Thomas G. Dietterich,

Weng-Keen Wong

ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 13, Issue 1

Article No.: 1, Pages 1 - 22

https://doi.org/10.1145/3230666

Published: 09 January 2019

Abstract

In many applications, an anomaly detection system presents the most anomalous data instance to a human analyst, who then must determine whether the instance is truly of interest (e.g., a threat in a security setting). Unfortunately, most anomaly detectors provide no explanation about why an instance was considered anomalous, leaving the analyst with no guidance about where to begin the investigation. To address this issue, we study the problems of computing and evaluating sequential feature explanations (SFEs) for anomaly detectors. An SFE of an anomaly is a sequence of features, which are presented to the analyst one at a time (in order) until the information contained in the highlighted features is enough for the analyst to make a confident judgement about the anomaly. Since analyst effort is related to the amount of information that they consider in an investigation, an explanation’s quality is related to the number of features that must be revealed to attain confidence. In this article, we first formulate the problem of optimizing SFEs for a particular density-based anomaly detector. We then present both greedy algorithms and an optimal algorithm, based on branch-and-bound search, for optimizing SFEs. Finally, we provide a large-scale quantitative evaluation of these algorithms using a novel framework for evaluating explanations. The results show that our algorithms are quite effective and that our best greedy algorithm is competitive with optimal solutions.
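
The abstract does not spell out the algorithms, so the following is only a minimal sketch of what a greedy SFE could look like, not the authors' method. It assumes a product-Gaussian kernel density estimate as a stand-in for the density-based detector, and it greedily adds the feature that makes the marginal density of the selected features as low as possible. The names `marginal_log_density`, `greedy_sfe`, and `features_needed`, the `bandwidth` parameter, and the toy data are all hypothetical.

```python
import math


def marginal_log_density(x, data, subset, bandwidth=1.0):
    """Log-density of x restricted to the feature indices in `subset`.

    Assumption: a product-Gaussian kernel density estimate over `data`
    stands in for the article's density-based anomaly detector.
    """
    if not subset:
        return 0.0  # empty marginal: density 1, log-density 0
    log_norm = math.log(bandwidth * math.sqrt(2.0 * math.pi))
    log_terms = []
    for row in data:
        total = 0.0
        for j in subset:
            z = (x[j] - row[j]) / bandwidth
            total += -0.5 * z * z - log_norm
        log_terms.append(total)
    # Log-mean-exp over the kernels, for numerical stability.
    peak = max(log_terms)
    mean = sum(math.exp(t - peak) for t in log_terms) / len(log_terms)
    return peak + math.log(mean)


def greedy_sfe(x, data, bandwidth=1.0):
    """Order features so that each added feature minimizes the marginal
    density of the features selected so far (greedy forward selection)."""
    remaining = list(range(len(x)))
    order = []
    while remaining:
        best = min(
            remaining,
            key=lambda j: marginal_log_density(x, data, order + [j], bandwidth),
        )
        order.append(best)
        remaining.remove(best)
    return order


def features_needed(order, confident):
    """Smallest prefix length of `order` for which the (hypothetical)
    analyst model `confident(prefix)` returns True, or None if never."""
    for k in range(1, len(order) + 1):
        if confident(order[:k]):
            return k
    return None


# Toy data (made up): nominal points near the origin, and a query point
# that is unusual only in feature 1.
data = [
    [0.0, 0.0, 0.0],
    [0.5, -0.5, 0.2],
    [-0.3, 0.4, -0.1],
    [0.2, 0.1, 0.3],
]
x = [0.1, 8.0, 0.0]
order = greedy_sfe(x, data)
print(order)  # feature 1 comes first
```

In this sketch, `features_needed` mirrors the quality measure described in the abstract: the fewer features an analyst must see before becoming confident, the better the explanation. The `confident` callable is a placeholder for whatever analyst model an evaluation would use.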

Index Terms

Sequential Feature Explanations for Anomaly Detection
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Anomaly detection

Information & Contributors

Information

Published In

ACM Transactions on Knowledge Discovery from Data Volume 13, Issue 1

February 2019

340 pages

ISSN:1556-4681

EISSN:1556-472X

DOI:10.1145/3301280

Editors: Charu Aggarwal (IBM T. J. Watson Research, USA), Xindong Wu (University of Louisiana at Lafayette, USA)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 January 2019

Accepted: 01 April 2018

Revised: 01 April 2018

Received: 01 December 2016

Published in TKDD Volume 13, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

DARPA
Future of Life Institute

Bibliometrics

Total Citations: 29
Total Downloads: 2,511
Downloads (Last 12 months): 300
Downloads (Last 6 weeks): 32

Reflects downloads up to 03 Jan 2025

Citations

Cited By

Antwarg Friedman L, Galed C, Rokach L, Shapira B (2024). Evaluating Anomaly Explanations Using Ground Truth. AI 5:4 (2375-2392). Online publication date: 15-Nov-2024
https://doi.org/10.3390/ai5040117
Ali Tousi S, DeSouza G (2024). Outlier Interpretation Using Regularized Auto Encoders and Genetic Algorithm. 2024 IEEE Congress on Evolutionary Computation (CEC) (1-8). Online publication date: 30-Jun-2024
https://doi.org/10.1109/CEC60901.2024.10612022
Gao Y, Lin Q, Ye S, Cheng Y, Zhang T, Liang B, Lu W (2024). Outlier detection in temporal and spatial sequences via correlation analysis based on graph neural networks. Displays 84 (102775). Online publication date: Sep-2024
https://doi.org/10.1016/j.displa.2024.102775
Panjei E, Gruenwald L (2024). Discovering outlying attributes of outliers in data streams. Data & Knowledge Engineering 154 (102349). Online publication date: Nov-2024
https://doi.org/10.1016/j.datak.2024.102349
Škvára V, Šmídl V, Pevný T (2024). Anomaly detection in multifactor data. Neural Computing and Applications 36:34 (21561-21580). Online publication date: 1-Dec-2024
https://dl.acm.org/doi/10.1007/s00521-024-10291-2
Li Z, Zhu Y, Van Leeuwen M (2023). A Survey on Explainable Anomaly Detection. ACM Transactions on Knowledge Discovery from Data 18:1 (1-54). Online publication date: 6-Sep-2023
https://dl.acm.org/doi/10.1145/3609333
Panjei E, Gruenwald L (2023). EXOS: Explaining Outliers in Data Streams. Big Data Analytics and Knowledge Discovery (25-41). Online publication date: 28-Aug-2023
https://dl.acm.org/doi/10.1007/978-3-031-39831-5_3
Ding X, Wang H, Li G, Li H, Li Y, Liu Y (2022). IoT data cleaning techniques: A survey. Intelligent and Converged Networks 3:4 (325-339). Online publication date: Dec-2022
https://doi.org/10.23919/ICN.2022.0026
Ghalehtaki R, Ebrahimzadeh A, Wuhib F, Glitho R (2022). An Unsupervised Machine Learning-based Method for Detection and Explanation of Anomalies in Cloud Environments. 2022 25th Conference on Innovation in Clouds, Internet and Networks (ICIN) (24-31). Online publication date: 7-Mar-2022
https://doi.org/10.1109/ICIN53892.2022.9758126
Mokoena T, Celik T, Marivate V (2022). Why is this an anomaly? Explaining anomalies using sequential explanations. Pattern Recognition 121:C. Online publication date: 1-Jan-2022
https://dl.acm.org/doi/10.1016/j.patcog.2021.108227