[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ Skip to main content
Log in

Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

Outlier detection research has been seeing many new algorithms every year that often appear to be only slightly different from existing methods along with some experiments that show them to “clearly outperform” the others. However, few approaches come along with a clear analysis of existing methods and a solid theoretical differentiation. Here, we provide a formalized method of analysis to allow for a theoretical comparison and generalization of many existing methods. Our unified view improves understanding of the shared properties and of the differences of outlier detection models. By abstracting the notion of locality from the classic distance-based notion, our framework facilitates the construction of abstract methods for many special data types that are usually handled with specialized algorithms. In particular, spatial neighborhood can be seen as a special case of locality. Here we therefore compare and generalize approaches to spatial outlier detection in a detailed manner. We also discuss temporal data like video streams, or graph data such as community networks. Since we reproduce results of specialized approaches with our general framework, and even improve upon them, our framework provides reasonable baselines to evaluate the true merits of specialized approaches. At the same time, seeing spatial outlier detection as a special case of local outlier detection, opens up new potentials for analysis and advancement of methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

Notes

  1. This adaptation could also be considered “Schönfinkeling” (or, tastier, “Currying”) of the original DB-outlier model.

  2. Available at the US Census Bureau, United States Department of Commerce. http://www.census.gov/.

  3. Available at the Arizona State University GeoDa Center, http://geodacenter.asu.edu/.

  4. Available at Statistisches Bundesamt Deutschland, http://www.destatis.de.

  5. http://www.dblp.org/db/.

  6. Xapian search engine library, http://xapian.org/, GPL.

References

  • Achtert E, Hettab A, Kriegel HP, Schubert E, Zimek A (2011) Spatial outlier detection: data, algorithms, visualizations. In: Proceedings of the 12th international symposium on spatial and temporal databases (SSTD), Minneapolis, MN, pp 512–516. doi:10.1007/978-3-642-22922-0_41

  • Achtert E, Goldhofer S, Kriegel HP, Schubert E, Zimek A (2012) Evaluation of clusterings—metrics and visual support. In: Proceedings of the 28th international conference on data engineering (ICDE), Washington, DC, pp 1285–1288. doi:10.1109/ICDE.2012.128

  • Aggarwal CC, Yu PS (2001) Outlier detection for high dimensional data. In: Proceedings of the ACM international conference on management of data (SIGMOD), Santa Barbara, CA, pp 37–46

  • Aggarwal CC, Yu PS (2008) Outlier detection with uncertain data. In: Proceedings of the 8th SIAM international conference on data mining (SDM), Atlanta, GA, pp 483–493

  • Agyemang M, Barker K, Alhajj R (2006) A comprehensive survey of numeric and symbolic outlier mining techniques. Intell Data Anal 10:521–538

    Google Scholar 

  • Angiulli F, Fassetti F (2009) DOLPHIN: an efficient algorithm for mining distance-based outliers in very large datasets. ACM Trans Knowl Discov Data 3(1):4:1–4:57. doi:10.1145/1497577.1497581

    Article  Google Scholar 

  • Angiulli F, Pizzuti C (2002) Fast outlier detection in high dimensional spaces. In: Proceedings of the 6th European conference on principles of data mining and knowledge discovery (PKDD), Helsinki, Finland, pp 15–26. doi:10.1007/3-540-45681-3_2

  • Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: Proceedings of the ACM international conference on management of data (SIGMOD), Philadelphia, PA, pp 49–60

  • Anselin L (1995) Local indicators of spatial association—LISA. Geogr Anal 27(2):93–115

    Article  Google Scholar 

  • Barnett V, Lewis T (1994) Outliers in statistical data, 3rd edn. Wiley, New York

    MATH  Google Scholar 

  • Bay SD, Schwabacher M (2003) Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In: Proceedings of the 9th ACM international conference on knowledge discovery and data mining (SIGKDD), Washington, DC, pp 29–38. doi:10.1145/956750.956758

  • Breunig MM, Kriegel HP, Ng R, Sander J (2000) LOF: identifying density-based local outliers. In: Proceedings of the ACM international conference on management of data (SIGMOD), Dallas, TX, pp 93–104

  • Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):Article 15, 1–58. doi:10.1145/1541880.1541882

  • Chandola V, Banerjee A, Kumar V (2012) Anomaly detection for discrete sequences: a survey. IEEE Trans Knowl Data Eng 24(5):823–839

    Article  Google Scholar 

  • Chawla S, Sun P (2006) SLOM: a new measure for local spatial outliers. Knowl Inf Syst 9(4):412–429. doi:10.1007/s10115-005-0200-2

    Article  Google Scholar 

  • Chen F, Lu CT, Boedihardjo AP (2010) GLS-SOD: a generalized local statistical approach for spatial outlier detection. In: Proceedings of the 16th ACM international conference on knowledge discovery and data mining (SIGKDD), Washington, DC, pp 1069–1078. doi:10.1145/1835804.1835939

  • de Vries T, Chawla S, Houle ME (2010) Finding local anomalies in very high dimensional space. In: Proceedings of the 10th IEEE international conference on data mining (ICDM), Sydney, Australia, pp 128–137. doi:10.1109/ICDM.2010.151

  • Gao J, Tan PN (2006) Converting output scores from outlier detection algorithms into probability estimates. In: Proceedings of the 6th IEEE international conference on data mining (ICDM), Hong Kong, China, pp 212–221. doi:10.1109/ICDM.2006.43

  • Gao J, Liang F, Fan W, Wang C, Sun Y, Han J (2010) On community outliers and their efficient detection in information networks. In: Proceedings of the 16th ACM international conference on knowledge discovery and data mining (SIGKDD), Washington, DC, pp 813–822. doi:10.1145/1835804.1835907

  • Geary RC (1954) The contiguity ratio and statistical mapping. Inc Stat 5(3):115–146

    Google Scholar 

  • Hadi AS, Rahmatullah Imon AHM, Werner M (2009) Detection of outliers. Wiley Interdiscip Rev Comput Stat 1(1):57–70

    Article  Google Scholar 

  • Hodge VJ, Austin J (2004) A survey of outlier detection methodologies. Artif Intell Rev 22:85–126. doi:10.1023/B:AIRE.0000045502.10941.a9

    Article  MATH  Google Scholar 

  • Hong SB, Nah W, Baek JH (2003) Abrupt shot change detection using multiple features and classification tree. In: Proceedings of the 4th international conference on intelligent data engineering and automated learning (IDEAL), Hong Kong, China, pp 553–560. doi:10.1007/978-3-540-45080-1_76

  • Jagadish HV, Koudas N, Muthukrishnan S (1999) Mining deviants in a time series database. In: Proceedings of the 25th international conference on very large databases (VLDB), Edinburgh, Scotland, pp 102–113

  • Jin W, Tung A, Han J (2001) Mining top-n local outliers in large databases. In: Proceedings of the 7th ACM international conference on knowledge discovery and data mining (SIGKDD), San Francisco, CA, pp 293–298. doi:10.1145/502512.502554

  • Jin W, Tung AKH, Han J, Wang W (2006) Ranking outliers using symmetric neighborhood relationship. In: Proceedings of the 10th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Singapore, pp 577–593. doi:10.1007/11731139_68

  • Keller F, Müller E, Böhm K (2012) HiCS: high contrast subspaces for density-based outlier ranking. In: Proceedings of the 28th international conference on data engineering (ICDE), Washington, DC

  • Kim HH, Kim YH (2010) Toward a conceptual framework of key-frame extraction and storyboard display for video summarization. J Am Soc Inf Sc Technol 61(5):927–939. doi:10.1002/asi.21317

    Article  Google Scholar 

  • Knorr EM, Ng RT (1997) A unified notion of outliers: properties and computation. In: Proceedings of the 3rd ACM international conference on knowledge discovery and data mining (KDD), Newport Beach, CA, pp 219–222

  • Knorr EM, Ng RT (1998) Algorithms for mining distance-based outliers in large datasets. In: Proceedings of the 24th international conference on very large databases (VLDB), New York City, NY, pp 392–403

  • Knorr EM, Ng RT, Tucanov V (2000) Distance-based outliers: algorithms and applications. VLDB J 8(3–4):237–253

    Article  Google Scholar 

  • Kollios G, Gunopulos D, Koudas N, Berchthold S (2003) Efficient biased sampling for approximate clustering and outlier detection in large datasets. IEEE Trans Knowl Data Eng 15(5):1170–1187. doi:10.1109/TKDE.2003.1232271

    Article  Google Scholar 

  • Kou Y, Lu CT, Chen D (2006) Spatial weighted outlier detection. In: Proceedings of the 6th SIAM international conference on data mining (SDM), Bethesda, MD

  • Kou Y, Lu CT, Dos Santos RF (2007) Spatial outlier detection: a graph-based approach. In: Proceedings of the 19th IEEE international conference on tools with artificial intelligence (ICTAI), Patras, Greece, pp 281–288. doi:10.1109/ICTAI.2007.169

  • Kriegel HP, Schubert M, Zimek A (2008) Angle-based outlier detection in high-dimensional data. In: Proceedings of the 14th ACM international conference on knowledge discovery and data mining (SIGKDD), Las Vegas, NV, pp 444–452. doi:10.1145/1401890.1401946

  • Kriegel HP, Kröger P, Schubert E, Zimek A (2009a) LoOP: local outlier probabilities. In: Proceedings of the 18th ACM conference on information and knowledge management (CIKM), Hong Kong, China, pp 1649–1652. doi:10.1145/1645953.1646195

  • Kriegel HP, Kröger P, Schubert E, Zimek A (2009b) Outlier detection in axis-parallel subspaces of high dimensional data. In: Proceedings of the 13th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Bangkok. Thailand, pp 831–838. doi:10.1007/978-3-642-01307-2_86

  • Kriegel HP, Kröger P, Schubert E, Zimek A (2011) Interpreting and unifying outlier scores. In: Proceedings of the 11th SIAM international conference on data mining (SDM), Mesa, AZ, pp 13–24

  • Kriegel HP, Kröger P, Schubert E, Zimek A (2012) Outlier detection in arbitrarily oriented subspaces. In: Proceedings of the 12th IEEE international conference on data mining (ICDM), Brussels, Belgium

  • Lazarevic A, Kumar V (2005) Feature bagging for outlier detection. In: Proceedings of the 11th ACM international conference on knowledge discovery and data mining (SIGKDD), Chicago, IL, pp 157–166. doi:10.1145/1081870.1081891

  • Lee W, Kim H, Kang H, Lee J, Kim Y, Jeon S (2004) Video cataloging system for real-time scene change detection of news video. In: Proceedings of Teh 10th international workshop on combinatorial image analysis (IWCIA), Auckland, New Zealand, pp 705–715

  • Lee JG, Han J, Li X (2008) Trajectory outlier detection: a partition-and-detect framework. In: Proceedings of the 24th international conference on data engineering (ICDE), Cancun, Mexico, pp 140–149. doi:10.1109/ICDE.2008.4497422

  • Liu X, Lu CT, Chen F (2010) Spatial outlier detection: Random walk based approaches. In: Proceedings of the 18th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS), San Jose, CA, pp 370–379, doi:10.1145/1869790.1869841

  • Lu CT, Chen D, Kou Y (2003) Algorithms for spatial outlier detection. In: Proceedings of the 3rd IEEE international conference on data mining (ICDM), Melbourne, FL, pp 597–600. doi:10.1109/ICDM.2003.1250986

  • Money AG, Agius H (2008) Video summarisation: a conceptual framework and survey of the state of the art. J Vis Commun Image Represent 19(2):121–143. doi:10.1016/j.jvcir.2007.04.002

    Article  Google Scholar 

  • Moran P (1950) Notes on continuous stochastic phenomena. Biometrika 37(1/2):17–23

    Article  MATH  MathSciNet  Google Scholar 

  • Müller E, Assent I, Steinhausen U, Seidl T (2008) OutRank: ranking outliers in high dimensional data. In: Proceedings of the 24th international conference on data engineering (ICDE) workshop on ranking in databases (DBRank), Cancun, Mexico, pp 600–603. doi:10.1109/ICDEW.2008.4498387

  • Müller E, Schiffer M, Gerwert P, Hannen M, Jansen T, Seidl T (2010a) SOREX: subspace outlier ranking exploration toolkit. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML PKDD), Barcelona, Spain, pp 607–610. doi:10.1007/978-3-642-15939-8_44

  • Müller E, Schiffer M, Seidl T (2010b) Adaptive outlierness for subspace outlier ranking. In: Proceedings of the 19th ACM conference on information and knowledge management (CIKM), Toronto, ON, Canada, pp 1629–1632. doi:10.1145/1871437.1871690

  • Müller E, Schiffer M, Seidl T (2011) Statistical selection of relevant subspace projections for outlier ranking. In: Proceedings of the 27th international conference on data engineering (ICDE), Hannover, Germany, pp 434–445. doi:10.1109/ICDE.2011.5767916

  • Nguyen HV, Ang HH, Gopalkrishnan V (2010) Mining outliers with ensemble of heterogeneous detectors on random subspaces. In: Proceedings of the 15th international conference on database systems for advanced applications (DASFAA), Tsukuba, Japan, pp 368–383. doi:10.1007/978-3-642-12026-8_29

  • Nguyen HV, Gopalkrishnan V, Assent I (2011) An unbiased distance-based outlier detection approach for high-dimensional data. In: Proceedings of the 16th international conference on database systems for advanced applications (DASFAA), Hong Kong, China, pp 138–152. doi:10.1007/978-3-642-20149-3_12

  • Orair GH, Teixeira C, Wang Y, Meira W Jr, Parthasarathy S (2010) Distance-based outlier detection: consolidation and renewed bearing. Proc VLDB Endow 3(2):1469–1480

    Google Scholar 

  • Papadimitriou S, Kitagawa H, Gibbons P, Faloutsos C (2003) LOCI: fast outlier detection using the local correlation integral. In: Proceedings of the 19th international conference on data engineering (ICDE), Bangalore, India, pp 315–326. doi:10.1109/ICDE.2003.1260802

  • Pei Y, Zaïane O, Gao Y (2006) An efficient reference-based approach to outlier detection in large datasets. In: Proceedings of the 6th IEEE international conference on data mining (ICDM), Hong Kong, China, pp 478–487. doi:10.1109/ICDM.2006.17

  • Pickup L, Zisserman A (2009) Automatic retrieval of visual continuity errors in movies. In: Proceedings of the 8th ACM international conference on image and video retrieval (CIVR), Santorini, Greece. doi:10.1145/1646396.1646406

  • Pokrajac D, Lazarevic A, Latecki LJ (2007) Incremental local outlier detection for data streams. In: Proceedings of the IEEE symposium on computational intelligence and data mining (CIDM), Honolulu, HI, pp 504–515. doi:10.1109/CIDM.2007.368917

  • Ramaswamy S, Rastogi R, Shim K (2000) Efficient algorithms for mining outliers from large data sets. In: Proceedings of the ACM international conference on management of data (SIGMOD), Dallas, TX, pp 427–438

  • Roddick JF, Spiliopoulou M (1999) A bibliography of temporal, spatial and spatio-temporal data mining research. ACM SIGKDD Explor 1(1):34–38. doi:10.1145/846170.846173

    Article  Google Scholar 

  • Schubert E, Wojdanowski R, Zimek A, Kriegel HP (2012) On evaluation of outlier rankings and outlier scores. In: Proceedings of the 12th SIAM international conference on data mining (SDM), Anaheim, CA, pp 1047–1058

  • Shekhar S, Lu CT, Zhang P (2003) A unified approach to detecting spatial outliers. GeoInformatica 7(2):139–166. doi:10.1023/A:1023455925009

    Article  Google Scholar 

  • Su X, Tsai CL (2011) Outlier detection. Wiley Interdiscip Rev Data Min Knowl Discov 1(3):261–268. doi:10.1002/widm.19

    Article  Google Scholar 

  • Sun P, Chawla S (2004) On local spatial outliers. In: Proceedings of the 4th IEEE international conference on data mining (ICDM), Brighton, UK, pp 209–216. doi:10.1109/ICDM.2004.10097

  • Takeuchi J, Yamanishi K (2006) A unifying framework for detecting outliers and change points from time series. IEEE Trans Knowl Data Eng 18(4):482–492. doi:10.1109/TKDE.2006.1599387

    Article  Google Scholar 

  • Tan PN, Steinbach M, Kumar V (2006) Introduction to Data Mining. Addison Wesley, Upper Saddle River

    Google Scholar 

  • Tang J, Chen Z, Fu AWC, Cheung DW (2002) Enhancing effectiveness of outlier detections for low density patterns. In: Proceedings of the 6th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Taipei, Taiwan, pp 535–548. doi:10.1007/3-540-47887-6_53

  • Vu NH, Gopalkrishnan V (2009) Efficient pruning schemes for distance-based outlier detection. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML PKDD), Bled, Slovenia, pp 160–175. doi:10.1007/978-3-642-04174-7_11

  • Yamanishi K, Takeuchi JI, Williams G, Milne P (2004) On-line unsupervised outlier detection using finite mixture with discounting learning algorithms. Data Min Knowl Discov 8:275–300. doi:10.1023/B:DAMI.0000023676.72185.7c

    Article  MathSciNet  Google Scholar 

  • Yu JX, Qian W, Lu H, Zhou A (2006) Finding centric local outliers in categorical/numerical spaces. Knowl Inf Syst 9(3):309–338. doi:10.1007/s10115-005-0197-6

    Article  Google Scholar 

  • Zhang J, Lou M, Ling TW, Wang H (2004) HOS-miner: a system for detecting outlying subspaces of high-dimensional data. In: Proceedings of the 30th international conference on very large databases (VLDB), Toronto, Canada, pp 1265–1268

  • Zhang K, Hutter M, Jin H (2009) A new local distance-based outlier detection approach for scattered real-world data. In: Proceedings of the 13th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Bangkok, Thailand, pp 813–822. doi:10.1007/978-3-642-01307-2_84

  • Zimek A, Schubert E, Kriegel HP (2012) A survey on unsupervised outlier detection in high-dimensional numerical data. Stat Anal Data Min 5(5):363–387. doi:10.1002/sam.11161

    Article  MathSciNet  Google Scholar 

Download references

Acknowledgments

Arthur Zimek is partially supported by NSERC.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Arthur Zimek.

Additional information

Responsible editor: M. J. Zaki.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Schubert, E., Zimek, A. & Kriegel, HP. Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection. Data Min Knowl Disc 28, 190–237 (2014). https://doi.org/10.1007/s10618-012-0300-z

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-012-0300-z

Keywords

Navigation