Distance-based outlier detection in data streams

Published: 01 August 2016

Abstract

Continuous outlier detection in data streams has important applications in fraud detection, network security, and public health. The arrival and departure of data objects in a streaming manner pose new challenges for outlier detection algorithms, especially for time and space efficiency. In the past decade, several studies have addressed the problem of distance-based outlier detection in data streams (DODDS), which adopts an unsupervised definition and makes no distributional assumptions about data values. Our work is motivated by the lack of comparative evaluation among the state-of-the-art algorithms using the same datasets on the same platform. We systematically evaluate the most recent algorithms for DODDS under various stream settings and outlier rates. Our extensive results show that in most settings, the MCOD algorithm offers superior performance among all the algorithms, including the most recent algorithm, Thresh_LEAP.
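To make the problem setting concrete, the following is a minimal sketch of a naive baseline. It is not MCOD, Thresh_LEAP, or any other algorithm evaluated in the paper. It assumes the commonly used distance-based definition, which the abstract does not spell out: an object is an outlier if fewer than k other objects in the current window lie within distance R of it. It also assumes a count-based sliding window and Euclidean distance. The names window_outliers, detect_stream, radius, k, window_size, and slide are hypothetical and chosen for illustration only.

```python
from collections import deque
from math import dist  # Euclidean distance, available in Python 3.8+


def window_outliers(window, radius, k):
    """Return the points in `window` with fewer than k neighbors within `radius`.

    Assumed definition: a neighbor is any *other* point in the window at
    Euclidean distance <= radius. This is a brute-force O(n^2) scan.
    """
    points = list(window)
    outliers = []
    for i, p in enumerate(points):
        neighbors = sum(
            1 for j, q in enumerate(points) if i != j and dist(p, q) <= radius
        )
        if neighbors < k:
            outliers.append(p)
    return outliers


def detect_stream(stream, window_size, slide, radius, k):
    """Yield (position, outliers) each time the count-based window slides.

    The deque drops the oldest point automatically once it is full, which
    models the departure of expired objects from the window.
    """
    window = deque(maxlen=window_size)
    for n, point in enumerate(stream, start=1):
        window.append(point)
        # Report once the first window is full, then every `slide` arrivals.
        if n >= window_size and (n - window_size) % slide == 0:
            yield n, window_outliers(window, radius, k)


if __name__ == "__main__":
    # Toy one-dimensional stream; 5.0 is far from every other value.
    stream = [(0.0,), (0.1,), (0.2,), (5.0,), (0.3,), (0.15,)]
    for position, outliers in detect_stream(stream, 4, 1, radius=0.5, k=2):
        print(position, outliers)
```

This baseline recomputes every neighbor count from scratch on each slide, so its cost grows quadratically with the window size. That recomputation is the time and space cost that incremental streaming algorithms aim to avoid.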

Published In

Proceedings of the VLDB Endowment  Volume 9, Issue 12
August 2016
345 pages
ISSN:2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 August 2016
Published in PVLDB Volume 9, Issue 12

Qualifiers

  • Research-article
