[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Streaming anomaly detection using randomized matrix sketching

Published: 01 November 2015 Publication History

Abstract

Data is continuously being generated from sources such as machines, network traffic, application logs, etc. Timely and accurate detection of anomalies in massive data streams has important applications such as in preventing machine failures, intrusion detection, and dynamic load balancing. In this paper, we introduce a novel (unsupervised) anomaly detection framework which can be used to detect anomalies in a streaming fashion by making only one pass over the data while utilizing limited storage. We adapt ideas from matrix sketching to maintain, in a streaming model, a set of few orthogonal vectors that form a good approximate basis for all the observed data. Using this constructed orthogonal basis, anomalies in new incoming data are detected based on a simple reconstruction error test. We theoretically prove that our algorithm compares favorably with an offline approach based on expensive global singular value decomposition (SVD) updates. Additionally, we apply ideas from randomized low-rank matrix approximations to further speedup the algorithm. The experimental results show the effectiveness and efficiency of our approach over other popular scalable anomaly detection approaches.

References

[1]
C. Aggarwal. Outlier Analysis. Springer, 2013.
[2]
T. Ahmed, M. Coates, and A. Lakhina. Multivariate Online Anomaly Detection using Kernel Recursive Least Squares. In INFOCOM, 2007.
[3]
C. G. Baker, K. A. Gallivan, and P. V. Dooren. Low-Rank Incremental Methods for Computing Dominant Singular Subspaces. Linear Algebra and its Applications, 2012.
[4]
L. Balzano, R. Nowak, and B. Recht. Online Identification and Tracking of Subspaces from Highly Incomplete Information. In Annual Allerton Conference, 2010.
[5]
M. M. Breunig, H. P. Kriegel, R. T. Ng, and J. Sander. LOF: Identifying Density-based Local Outliers. SIGMOD, 29(2), 2000.
[6]
P. Businger. Updating a Singular Value Decomposition. BIT, 10(3):376--385, 1970.
[7]
R. Caruana, T. Joachims, and L. Backstrom. KDD-Cup 2004: Results and Analysis. JMLR, 6(2):95--108, 2004.
[8]
Y. Chahlaoui, K. Gallivan, and P. Van Dooren. Recursive Calculation of Dominant Singular Subspaces. SIMAX, 25(2), 2003.
[9]
V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Computing Surveys, 2009.
[10]
X. Chen, W. Li, and W. Xu. Perturbation Analysis of the Eigenvector Matrix and Singular Vector Matrices. Taiwanese Journal of Mathematics, 16(1), 2012.
[11]
M. Gabel, A. Schuster, and D. Keren. Communication-efficient Distributed Variance Monitoring and Outlier Detection for Multivariate Time Series. In IPDPS, 2014.
[12]
M. Ghashami, A. Desai, and J. M. Phillips. Improved Practical Matrix Sketching with Guarantees. In ESA 2014.
[13]
M. Ghashami, E. Liberty, J. M. Phillips, and D. P. Woodruff. Frequent Directions: Simple and Deterministic Matrix Sketching. CoRR, abs/1501.01711, 2015.
[14]
M. Ghashami and J. M. Phillips. Relative Errors for Deterministic Low-Rank Matrix Approximations. In SODA, pages 707--717, 2014.
[15]
G. H. Golub and C. F. Van Loan. Matrix Computations, volume 3. JHU Press, 2012.
[16]
N. Halko, P. G. Martinsson, and J. A. Tropp. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. SIAM Review, 53(2), 2011.
[17]
P. M. Hall, A. D. Marshall, and R. R. Martin. Incremental Eigenanalysis for Classification. In BMVC, 1998.
[18]
S. Hido, Y. Tsuboi, H. Kashima, M. Sugiyama, and T. Kanamori. Statistical Outlier Detection using Direct Density Ratio Estimation. KAIS, 26(2), 2011.
[19]
H. Huang, H. Qin, S. Yoo, and D. Yu. Local anomaly descriptor: a robust unsupervised algorithm for anomaly detection based on diffusion space. CIKM, 2012.
[20]
H. Huang, H. Qin, S. Yoo, and D. Yu. A new anomaly detection algorithm based on quantum mechanics. ICDM, 2012.
[21]
H. Huang, H. Qin, S. Yoo, and D. Yu. Physics-Based Anomaly Detection Defined on Manifold Space. TKDD, 9(2), 2014.
[22]
L. Huang, X. Nguyen, M. Garofalakis, J. M. Hellerstein, M. I. Jordan, A. D. Joseph, and N. Taft. Communication-efficient Online Detection of Network-wide Anomalies. In INFOCOM, 2007.
[23]
L. Huang, X. Nguyen, M. Garofalakis, M. I. Jordan, A. Joseph, and N. Taft. In-network PCA and Anomaly Detection. In NIPS, pages 617--624, 2006.
[24]
S. Kasiviswanathan, H. Wang, A. Banerjee, and P. Melville. Online L1-Dictionary Learning with Application to Novel Document Detection. NIPS, 2012.
[25]
A. Lakhina, M. Crovella, and C. Diot. Characterization of Network-wide Anomalies in Traffic Flows. In SIGCOMM, 2004.
[26]
A. Levey and M. Lindenbaum. Sequential Karhunen-Loeve Basis Extraction and its Application to Images. IEEE TIP, 9(8), 2000.
[27]
D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. RCV1: A New Benchmark Collection for Text Categorization Research. ACM SIGKDD Explorations Newsletter, 5, 2004.
[28]
E. Liberty. Simple and Deterministic Matrix Sketching. In ACM SIGKDD, pages 581--588, 2013.
[29]
M. Lichman. UCI machine learning repository, 2013.
[30]
F. T. Liu, K. M. Ting, and Z. H. Zhou. Isolation Forest. IEEE ICDM, pages 413--422, 2008.
[31]
S. Liu, M. Yamada, N. Collier, and M. Sugiyama. Change-point Detection in Time-series Data by Relative Density-ratio Estimation. Neural Networks, 43:72--83, 2013.
[32]
J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online Learning for Matrix Factorization and Sparse Coding. JMLR, 11, 2010.
[33]
M. Markou and S. Singh. Novelty Detection: A Review--part 1: Statistical Approaches. Signal Processing, 83(12), 2003.
[34]
J. Misra and D. Gries. Finding Repeated Elements. Science of Computer Programming, 2(2):143--152, 1982.
[35]
S. Papadimitriou, J. Sun, and C. Faloutsos. Streaming Pattern Discovery in Multiple Time-series. In VLDB, 2005.
[36]
K. M. Ting, G. T. Zhou, F. T. Liu, and J. S. Tan. Mass Estimation and its Applications. ACM SIGKDD, 2010.
[37]
A. Tsymbal. The Problem of Concept Drift: Definitions and Related Work. Tech Report, 2004.
[38]
A. V. Uzilov, J. M. Keegan, and D. H. Mathews. Detection of Non-coding RNAs on the Basis of Predicted Secondary Structure Formation Free Energy Change. BMC Bioinformatics, 2006.

Cited By

View all
  • (2024)Online adaptive anomaly thresholding with confidence sequencesProceedings of the 41st International Conference on Machine Learning10.5555/3692070.3693987(47105-47132)Online publication date: 21-Jul-2024
  • (2024)Energy efficient streaming time series classification with attentive power iterationProceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence10.1609/aaai.v38i11.29151(12574-12582)Online publication date: 20-Feb-2024
  • (2024)Sustainable and Explainable Neural Network for Real-Time Time Series ClassificationPattern Recognition10.1007/978-3-031-78169-8_26(391-405)Online publication date: 1-Dec-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 9, Issue 3
November 2015
144 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 November 2015
Published in PVLDB Volume 9, Issue 3

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)26
  • Downloads (Last 6 weeks)4
Reflects downloads up to 05 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Online adaptive anomaly thresholding with confidence sequencesProceedings of the 41st International Conference on Machine Learning10.5555/3692070.3693987(47105-47132)Online publication date: 21-Jul-2024
  • (2024)Energy efficient streaming time series classification with attentive power iterationProceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence10.1609/aaai.v38i11.29151(12574-12582)Online publication date: 20-Feb-2024
  • (2024)Sustainable and Explainable Neural Network for Real-Time Time Series ClassificationPattern Recognition10.1007/978-3-031-78169-8_26(391-405)Online publication date: 1-Dec-2024
  • (2023)Online robust non-stationary estimationProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3668319(50506-50544)Online publication date: 10-Dec-2023
  • (2023)METER: A Dynamic Concept Adaptation Framework for Online Anomaly DetectionProceedings of the VLDB Endowment10.14778/3636218.363623317:4(794-807)Online publication date: 1-Dec-2023
  • (2023)MicroView: Cloud-Native Observability with Temporal PrecisionProceedings of the on CoNEXT Student Workshop 202310.1145/3630202.3630233(7-8)Online publication date: 8-Dec-2023
  • (2022)Anomaly Detection and Failure Root Cause Analysis in (Micro) Service-Based Cloud Applications: A SurveyACM Computing Surveys10.1145/350129755:3(1-39)Online publication date: 3-Feb-2022
  • (2021)Near optimal frequent directions for sketching dense and sparse matricesThe Journal of Machine Learning Research10.5555/3322706.336199720:1(2018-2040)Online publication date: 9-Mar-2021
  • (2021)Explainable anomaly detection on high-dimensional time series dataProceedings of the 15th ACM International Conference on Distributed and Event-based Systems10.1145/3465480.3468292(2-14)Online publication date: 28-Jun-2021
  • (2021)Using Dirichlet Marked Hawkes Processes for Insider Threat DetectionDigital Threats: Research and Practice10.1145/34579083:1(1-19)Online publication date: 22-Oct-2021
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media