column

Theoretical Foundations and Algorithms for Outlier Ensembles

Published: 29 September 2015

Abstract

Ensemble analysis has recently been studied in the context of outlier detection. In this paper, we investigate the theoretical underpinnings of outlier ensemble analysis. In spite of the significant differences between the classification and outlier analysis problems, we show that their theoretical foundations are quite similar in terms of the bias-variance trade-off. We explain existing algorithms within this traditional framework and clarify misconceptions about the reasoning behind these methods. We propose more effective variants of subsampling and feature bagging. We also examine the impact of the combination function, in particular the trade-offs between the average and maximization functions, and use these insights to propose new combination functions that are robust in many settings.
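The abstract names three ingredients of an outlier ensemble: subsampling, feature bagging, and the choice between average and maximization combination. The sketch below shows how these pieces fit together in a generic ensemble; it is an illustration only, not the improved variants or new combination functions proposed in the paper. Assumptions (none of these come from the abstract): a k-nearest-neighbor distance as the base detector, z-score standardization of each component's scores before combining, feature subsets of between floor(d/2) and d-1 features in the style of Lazarevic and Kumar's feature bagging, and the hypothetical function names `knn_distance`, `zscore`, and `ensemble_scores` with illustrative parameter defaults.

```python
import math
import random
import statistics


def knn_distance(i, data, reference, k, feats):
    """Distance from point i to its k-th nearest neighbour among the
    reference indices (excluding i itself), using only the features in feats."""
    p = [data[i][f] for f in feats]
    dists = sorted(
        math.dist(p, [data[j][f] for f in feats]) for j in reference if j != i
    )
    return dists[min(k, len(dists)) - 1]


def zscore(scores):
    """Standardize one component's scores so that components are comparable."""
    mu = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mu) / sd if sd > 0 else 0.0 for s in scores]


def ensemble_scores(data, n_components=25, sample_size=20, k=3, seed=0):
    """Hypothetical helper: return (average, maximum) combined outlier
    scores for every point in data (a list of equal-length numeric lists)."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    components = []
    for _ in range(n_components):
        # Subsampling: the detector's reference set is a random subset of
        # points, but every point in the full data set is scored against it.
        sample = rng.sample(range(n), min(sample_size, n))
        # Feature bagging (assumed scheme): a random subset of between
        # floor(d/2) and d-1 features.
        n_feats = rng.randint(max(1, d // 2), max(1, d - 1))
        feats = rng.sample(range(d), n_feats)
        raw = [knn_distance(i, data, sample, k, feats) for i in range(n)]
        components.append(zscore(raw))
    per_point = list(zip(*components))  # one tuple of component scores per point
    avg = [statistics.fmean(scores) for scores in per_point]
    mx = [max(scores) for scores in per_point]
    return avg, mx


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(40)]
    data.append([8.0, 8.0, 8.0])  # an obvious outlier at index 40
    avg, mx = ensemble_scores(data)
    print("top by average:", max(range(len(data)), key=avg.__getitem__))
    print("top by maximum:", max(range(len(data)), key=mx.__getitem__))
```

Averaging is the usual variance-reducing choice, since noise in individual components tends to cancel out. Taking the maximum instead emphasizes the component in which a point looks most anomalous, which can help when only a few components expose an outlier but also makes the result more sensitive to a single noisy component.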


Cited By

  • (2025) Iterative target updation based boosting ensembles for outlier detection. Pattern Recognition 158 (111023). DOI: 10.1016/j.patcog.2024.111023. Online publication date: Feb-2025.
  • (2024) Greedy Ensemble Hyperspectral Anomaly Detection. Journal of Imaging 10:6 (131). DOI: 10.3390/jimaging10060131. Online publication date: 28-May-2024.
  • (2024) Outlier detection of clustered functional data with image and signal processing applications by archetype analysis. PLOS ONE 19:11 (e0311418). DOI: 10.1371/journal.pone.0311418. Online publication date: 25-Nov-2024.


Published In

ACM SIGKDD Explorations Newsletter, Volume 17, Issue 1
June 2015
50 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/2830544

Publisher

Association for Computing Machinery

New York, NY, United States




