Abstract
Outlier detection research has been seeing many new algorithms every year that often appear to be only slightly different from existing methods along with some experiments that show them to “clearly outperform” the others. However, few approaches come along with a clear analysis of existing methods and a solid theoretical differentiation. Here, we provide a formalized method of analysis to allow for a theoretical comparison and generalization of many existing methods. Our unified view improves understanding of the shared properties and of the differences of outlier detection models. By abstracting the notion of locality from the classic distance-based notion, our framework facilitates the construction of abstract methods for many special data types that are usually handled with specialized algorithms. In particular, spatial neighborhood can be seen as a special case of locality. Here we therefore compare and generalize approaches to spatial outlier detection in a detailed manner. We also discuss temporal data like video streams, or graph data such as community networks. Since we reproduce results of specialized approaches with our general framework, and even improve upon them, our framework provides reasonable baselines to evaluate the true merits of specialized approaches. At the same time, seeing spatial outlier detection as a special case of local outlier detection, opens up new potentials for analysis and advancement of methods.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
This adaptation could also be considered “Schönfinkeling” (or, tastier, “Currying”) of the original DB-outlier model.
Available at the US Census Bureau, United States Department of Commerce. http://www.census.gov/.
Available at the Arizona State University GeoDa Center, http://geodacenter.asu.edu/.
Available at Statistisches Bundesamt Deutschland, http://www.destatis.de.
Xapian search engine library, http://xapian.org/, GPL.
References
Achtert E, Hettab A, Kriegel HP, Schubert E, Zimek A (2011) Spatial outlier detection: data, algorithms, visualizations. In: Proceedings of the 12th international symposium on spatial and temporal databases (SSTD), Minneapolis, MN, pp 512–516. doi:10.1007/978-3-642-22922-0_41
Achtert E, Goldhofer S, Kriegel HP, Schubert E, Zimek A (2012) Evaluation of clusterings—metrics and visual support. In: Proceedings of the 28th international conference on data engineering (ICDE), Washington, DC, pp 1285–1288. doi:10.1109/ICDE.2012.128
Aggarwal CC, Yu PS (2001) Outlier detection for high dimensional data. In: Proceedings of the ACM international conference on management of data (SIGMOD), Santa Barbara, CA, pp 37–46
Aggarwal CC, Yu PS (2008) Outlier detection with uncertain data. In: Proceedings of the 8th SIAM international conference on data mining (SDM), Atlanta, GA, pp 483–493
Agyemang M, Barker K, Alhajj R (2006) A comprehensive survey of numeric and symbolic outlier mining techniques. Intell Data Anal 10:521–538
Angiulli F, Fassetti F (2009) DOLPHIN: an efficient algorithm for mining distance-based outliers in very large datasets. ACM Trans Knowl Discov Data 3(1):4:1–4:57. doi:10.1145/1497577.1497581
Angiulli F, Pizzuti C (2002) Fast outlier detection in high dimensional spaces. In: Proceedings of the 6th European conference on principles of data mining and knowledge discovery (PKDD), Helsinki, Finland, pp 15–26. doi:10.1007/3-540-45681-3_2
Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: Proceedings of the ACM international conference on management of data (SIGMOD), Philadelphia, PA, pp 49–60
Anselin L (1995) Local indicators of spatial association—LISA. Geogr Anal 27(2):93–115
Barnett V, Lewis T (1994) Outliers in statistical data, 3rd edn. Wiley, New York
Bay SD, Schwabacher M (2003) Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In: Proceedings of the 9th ACM international conference on knowledge discovery and data mining (SIGKDD), Washington, DC, pp 29–38. doi:10.1145/956750.956758
Breunig MM, Kriegel HP, Ng R, Sander J (2000) LOF: identifying density-based local outliers. In: Proceedings of the ACM international conference on management of data (SIGMOD), Dallas, TX, pp 93–104
Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):Article 15, 1–58. doi:10.1145/1541880.1541882
Chandola V, Banerjee A, Kumar V (2012) Anomaly detection for discrete sequences: a survey. IEEE Trans Knowl Data Eng 24(5):823–839
Chawla S, Sun P (2006) SLOM: a new measure for local spatial outliers. Knowl Inf Syst 9(4):412–429. doi:10.1007/s10115-005-0200-2
Chen F, Lu CT, Boedihardjo AP (2010) GLS-SOD: a generalized local statistical approach for spatial outlier detection. In: Proceedings of the 16th ACM international conference on knowledge discovery and data mining (SIGKDD), Washington, DC, pp 1069–1078. doi:10.1145/1835804.1835939
de Vries T, Chawla S, Houle ME (2010) Finding local anomalies in very high dimensional space. In: Proceedings of the 10th IEEE international conference on data mining (ICDM), Sydney, Australia, pp 128–137. doi:10.1109/ICDM.2010.151
Gao J, Tan PN (2006) Converting output scores from outlier detection algorithms into probability estimates. In: Proceedings of the 6th IEEE international conference on data mining (ICDM), Hong Kong, China, pp 212–221. doi:10.1109/ICDM.2006.43
Gao J, Liang F, Fan W, Wang C, Sun Y, Han J (2010) On community outliers and their efficient detection in information networks. In: Proceedings of the 16th ACM international conference on knowledge discovery and data mining (SIGKDD), Washington, DC, pp 813–822. doi:10.1145/1835804.1835907
Geary RC (1954) The contiguity ratio and statistical mapping. Inc Stat 5(3):115–146
Hadi AS, Rahmatullah Imon AHM, Werner M (2009) Detection of outliers. Wiley Interdiscip Rev Comput Stat 1(1):57–70
Hodge VJ, Austin J (2004) A survey of outlier detection methodologies. Artif Intell Rev 22:85–126. doi:10.1023/B:AIRE.0000045502.10941.a9
Hong SB, Nah W, Baek JH (2003) Abrupt shot change detection using multiple features and classification tree. In: Proceedings of the 4th international conference on intelligent data engineering and automated learning (IDEAL), Hong Kong, China, pp 553–560. doi:10.1007/978-3-540-45080-1_76
Jagadish HV, Koudas N, Muthukrishnan S (1999) Mining deviants in a time series database. In: Proceedings of the 25th international conference on very large databases (VLDB), Edinburgh, Scotland, pp 102–113
Jin W, Tung A, Han J (2001) Mining top-n local outliers in large databases. In: Proceedings of the 7th ACM international conference on knowledge discovery and data mining (SIGKDD), San Francisco, CA, pp 293–298. doi:10.1145/502512.502554
Jin W, Tung AKH, Han J, Wang W (2006) Ranking outliers using symmetric neighborhood relationship. In: Proceedings of the 10th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Singapore, pp 577–593. doi:10.1007/11731139_68
Keller F, Müller E, Böhm K (2012) HiCS: high contrast subspaces for density-based outlier ranking. In: Proceedings of the 28th international conference on data engineering (ICDE), Washington, DC
Kim HH, Kim YH (2010) Toward a conceptual framework of key-frame extraction and storyboard display for video summarization. J Am Soc Inf Sc Technol 61(5):927–939. doi:10.1002/asi.21317
Knorr EM, Ng RT (1997) A unified notion of outliers: properties and computation. In: Proceedings of the 3rd ACM international conference on knowledge discovery and data mining (KDD), Newport Beach, CA, pp 219–222
Knorr EM, Ng RT (1998) Algorithms for mining distance-based outliers in large datasets. In: Proceedings of the 24th international conference on very large databases (VLDB), New York City, NY, pp 392–403
Knorr EM, Ng RT, Tucanov V (2000) Distance-based outliers: algorithms and applications. VLDB J 8(3–4):237–253
Kollios G, Gunopulos D, Koudas N, Berchthold S (2003) Efficient biased sampling for approximate clustering and outlier detection in large datasets. IEEE Trans Knowl Data Eng 15(5):1170–1187. doi:10.1109/TKDE.2003.1232271
Kou Y, Lu CT, Chen D (2006) Spatial weighted outlier detection. In: Proceedings of the 6th SIAM international conference on data mining (SDM), Bethesda, MD
Kou Y, Lu CT, Dos Santos RF (2007) Spatial outlier detection: a graph-based approach. In: Proceedings of the 19th IEEE international conference on tools with artificial intelligence (ICTAI), Patras, Greece, pp 281–288. doi:10.1109/ICTAI.2007.169
Kriegel HP, Schubert M, Zimek A (2008) Angle-based outlier detection in high-dimensional data. In: Proceedings of the 14th ACM international conference on knowledge discovery and data mining (SIGKDD), Las Vegas, NV, pp 444–452. doi:10.1145/1401890.1401946
Kriegel HP, Kröger P, Schubert E, Zimek A (2009a) LoOP: local outlier probabilities. In: Proceedings of the 18th ACM conference on information and knowledge management (CIKM), Hong Kong, China, pp 1649–1652. doi:10.1145/1645953.1646195
Kriegel HP, Kröger P, Schubert E, Zimek A (2009b) Outlier detection in axis-parallel subspaces of high dimensional data. In: Proceedings of the 13th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Bangkok. Thailand, pp 831–838. doi:10.1007/978-3-642-01307-2_86
Kriegel HP, Kröger P, Schubert E, Zimek A (2011) Interpreting and unifying outlier scores. In: Proceedings of the 11th SIAM international conference on data mining (SDM), Mesa, AZ, pp 13–24
Kriegel HP, Kröger P, Schubert E, Zimek A (2012) Outlier detection in arbitrarily oriented subspaces. In: Proceedings of the 12th IEEE international conference on data mining (ICDM), Brussels, Belgium
Lazarevic A, Kumar V (2005) Feature bagging for outlier detection. In: Proceedings of the 11th ACM international conference on knowledge discovery and data mining (SIGKDD), Chicago, IL, pp 157–166. doi:10.1145/1081870.1081891
Lee W, Kim H, Kang H, Lee J, Kim Y, Jeon S (2004) Video cataloging system for real-time scene change detection of news video. In: Proceedings of Teh 10th international workshop on combinatorial image analysis (IWCIA), Auckland, New Zealand, pp 705–715
Lee JG, Han J, Li X (2008) Trajectory outlier detection: a partition-and-detect framework. In: Proceedings of the 24th international conference on data engineering (ICDE), Cancun, Mexico, pp 140–149. doi:10.1109/ICDE.2008.4497422
Liu X, Lu CT, Chen F (2010) Spatial outlier detection: Random walk based approaches. In: Proceedings of the 18th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS), San Jose, CA, pp 370–379, doi:10.1145/1869790.1869841
Lu CT, Chen D, Kou Y (2003) Algorithms for spatial outlier detection. In: Proceedings of the 3rd IEEE international conference on data mining (ICDM), Melbourne, FL, pp 597–600. doi:10.1109/ICDM.2003.1250986
Money AG, Agius H (2008) Video summarisation: a conceptual framework and survey of the state of the art. J Vis Commun Image Represent 19(2):121–143. doi:10.1016/j.jvcir.2007.04.002
Moran P (1950) Notes on continuous stochastic phenomena. Biometrika 37(1/2):17–23
Müller E, Assent I, Steinhausen U, Seidl T (2008) OutRank: ranking outliers in high dimensional data. In: Proceedings of the 24th international conference on data engineering (ICDE) workshop on ranking in databases (DBRank), Cancun, Mexico, pp 600–603. doi:10.1109/ICDEW.2008.4498387
Müller E, Schiffer M, Gerwert P, Hannen M, Jansen T, Seidl T (2010a) SOREX: subspace outlier ranking exploration toolkit. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML PKDD), Barcelona, Spain, pp 607–610. doi:10.1007/978-3-642-15939-8_44
Müller E, Schiffer M, Seidl T (2010b) Adaptive outlierness for subspace outlier ranking. In: Proceedings of the 19th ACM conference on information and knowledge management (CIKM), Toronto, ON, Canada, pp 1629–1632. doi:10.1145/1871437.1871690
Müller E, Schiffer M, Seidl T (2011) Statistical selection of relevant subspace projections for outlier ranking. In: Proceedings of the 27th international conference on data engineering (ICDE), Hannover, Germany, pp 434–445. doi:10.1109/ICDE.2011.5767916
Nguyen HV, Ang HH, Gopalkrishnan V (2010) Mining outliers with ensemble of heterogeneous detectors on random subspaces. In: Proceedings of the 15th international conference on database systems for advanced applications (DASFAA), Tsukuba, Japan, pp 368–383. doi:10.1007/978-3-642-12026-8_29
Nguyen HV, Gopalkrishnan V, Assent I (2011) An unbiased distance-based outlier detection approach for high-dimensional data. In: Proceedings of the 16th international conference on database systems for advanced applications (DASFAA), Hong Kong, China, pp 138–152. doi:10.1007/978-3-642-20149-3_12
Orair GH, Teixeira C, Wang Y, Meira W Jr, Parthasarathy S (2010) Distance-based outlier detection: consolidation and renewed bearing. Proc VLDB Endow 3(2):1469–1480
Papadimitriou S, Kitagawa H, Gibbons P, Faloutsos C (2003) LOCI: fast outlier detection using the local correlation integral. In: Proceedings of the 19th international conference on data engineering (ICDE), Bangalore, India, pp 315–326. doi:10.1109/ICDE.2003.1260802
Pei Y, Zaïane O, Gao Y (2006) An efficient reference-based approach to outlier detection in large datasets. In: Proceedings of the 6th IEEE international conference on data mining (ICDM), Hong Kong, China, pp 478–487. doi:10.1109/ICDM.2006.17
Pickup L, Zisserman A (2009) Automatic retrieval of visual continuity errors in movies. In: Proceedings of the 8th ACM international conference on image and video retrieval (CIVR), Santorini, Greece. doi:10.1145/1646396.1646406
Pokrajac D, Lazarevic A, Latecki LJ (2007) Incremental local outlier detection for data streams. In: Proceedings of the IEEE symposium on computational intelligence and data mining (CIDM), Honolulu, HI, pp 504–515. doi:10.1109/CIDM.2007.368917
Ramaswamy S, Rastogi R, Shim K (2000) Efficient algorithms for mining outliers from large data sets. In: Proceedings of the ACM international conference on management of data (SIGMOD), Dallas, TX, pp 427–438
Roddick JF, Spiliopoulou M (1999) A bibliography of temporal, spatial and spatio-temporal data mining research. ACM SIGKDD Explor 1(1):34–38. doi:10.1145/846170.846173
Schubert E, Wojdanowski R, Zimek A, Kriegel HP (2012) On evaluation of outlier rankings and outlier scores. In: Proceedings of the 12th SIAM international conference on data mining (SDM), Anaheim, CA, pp 1047–1058
Shekhar S, Lu CT, Zhang P (2003) A unified approach to detecting spatial outliers. GeoInformatica 7(2):139–166. doi:10.1023/A:1023455925009
Su X, Tsai CL (2011) Outlier detection. Wiley Interdiscip Rev Data Min Knowl Discov 1(3):261–268. doi:10.1002/widm.19
Sun P, Chawla S (2004) On local spatial outliers. In: Proceedings of the 4th IEEE international conference on data mining (ICDM), Brighton, UK, pp 209–216. doi:10.1109/ICDM.2004.10097
Takeuchi J, Yamanishi K (2006) A unifying framework for detecting outliers and change points from time series. IEEE Trans Knowl Data Eng 18(4):482–492. doi:10.1109/TKDE.2006.1599387
Tan PN, Steinbach M, Kumar V (2006) Introduction to Data Mining. Addison Wesley, Upper Saddle River
Tang J, Chen Z, Fu AWC, Cheung DW (2002) Enhancing effectiveness of outlier detections for low density patterns. In: Proceedings of the 6th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Taipei, Taiwan, pp 535–548. doi:10.1007/3-540-47887-6_53
Vu NH, Gopalkrishnan V (2009) Efficient pruning schemes for distance-based outlier detection. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML PKDD), Bled, Slovenia, pp 160–175. doi:10.1007/978-3-642-04174-7_11
Yamanishi K, Takeuchi JI, Williams G, Milne P (2004) On-line unsupervised outlier detection using finite mixture with discounting learning algorithms. Data Min Knowl Discov 8:275–300. doi:10.1023/B:DAMI.0000023676.72185.7c
Yu JX, Qian W, Lu H, Zhou A (2006) Finding centric local outliers in categorical/numerical spaces. Knowl Inf Syst 9(3):309–338. doi:10.1007/s10115-005-0197-6
Zhang J, Lou M, Ling TW, Wang H (2004) HOS-miner: a system for detecting outlying subspaces of high-dimensional data. In: Proceedings of the 30th international conference on very large databases (VLDB), Toronto, Canada, pp 1265–1268
Zhang K, Hutter M, Jin H (2009) A new local distance-based outlier detection approach for scattered real-world data. In: Proceedings of the 13th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Bangkok, Thailand, pp 813–822. doi:10.1007/978-3-642-01307-2_84
Zimek A, Schubert E, Kriegel HP (2012) A survey on unsupervised outlier detection in high-dimensional numerical data. Stat Anal Data Min 5(5):363–387. doi:10.1002/sam.11161
Acknowledgments
Arthur Zimek is partially supported by NSERC.
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: M. J. Zaki.
Rights and permissions
About this article
Cite this article
Schubert, E., Zimek, A. & Kriegel, HP. Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection. Data Min Knowl Disc 28, 190–237 (2014). https://doi.org/10.1007/s10618-012-0300-z
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-012-0300-z