research-article

Real-time distance-based outlier detection in data streams

Published: 01 October 2020

Abstract

Real-time outlier detection in data streams has drawn considerable attention because many applications must detect abnormal behavior as soon as it occurs. On edge devices, the continuous arrival and departure of streaming data makes fast, real-time processing challenging because memory and CPU are limited. Existing methods are slow and memory-inefficient because they mostly focus on quickly detecting inliers and pay less attention to speeding up neighbor searches for outlier candidates. In this study, we propose a new algorithm, CPOD, that improves the efficiency of outlier detection while reducing its memory requirements. CPOD uses a unique data structure called a "core point" with multi-distance indexing both to identify inliers quickly and to reduce the neighbor search space for outlier candidates. On six real-world datasets and one synthetic dataset, we show that CPOD is, on average, 10, 19, and 73 times faster than M_MCOD, NETS, and MCOD, respectively, while consuming little memory.
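For orientation, the problem setting can be illustrated with a naive baseline. The abstract does not spell out the outlier definition, so the sketch below assumes the commonly used one: a point is an outlier if fewer than k other points in the current sliding window lie within distance R of it. This is not CPOD and uses no core points or indexing; the function name `detect_outliers` and all of its parameters are hypothetical. It only shows the quadratic neighbor search that such algorithms aim to avoid.

```python
from collections import deque
from math import dist  # Euclidean distance, Python 3.8+


def detect_outliers(stream, window_size, slide, radius, min_neighbors):
    """Naive sliding-window distance-based outlier detection.

    Yields (position, outliers) each time the window is full and has
    advanced by `slide` points. A point is reported as an outlier if it
    has fewer than `min_neighbors` other points within `radius` in the
    current window (assumed definition, not taken from the abstract).
    """
    window = deque(maxlen=window_size)  # oldest points expire automatically
    for i, point in enumerate(stream, start=1):
        window.append(point)
        if i >= window_size and (i - window_size) % slide == 0:
            pts = list(window)
            outliers = []
            for a, p in enumerate(pts):
                count = 0
                for b, q in enumerate(pts):
                    if a != b and dist(p, q) <= radius:
                        count += 1
                        if count >= min_neighbors:
                            break  # enough neighbors: p is an inlier
                if count < min_neighbors:
                    outliers.append(p)
            yield i, outliers


if __name__ == "__main__":
    # Toy one-dimensional stream; the values are made up for illustration.
    data = [(0.0,), (0.1,), (0.2,), (5.0,), (0.15,), (0.05,)]
    for pos, found in detect_outliers(data, window_size=4, slide=1,
                                      radius=0.5, min_neighbors=2):
        print(pos, found)
```

Every evaluation here compares each point against the whole window, which costs time quadratic in the window size. That cost is what motivates the faster inlier identification and reduced neighbor search spaces described in the abstract.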


Published In

Proceedings of the VLDB Endowment, Volume 14, Issue 2
October 2020
167 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 October 2020
Published in PVLDB Volume 14, Issue 2

Qualifiers

  • Research-article
