Abstract
For many KDD applications, finding the outliers, i.e., the rare events, is more interesting and useful than finding the common cases; detecting criminal activities in e-commerce is one example. Being an outlier, however, is not just a binary property. Instead, it is a property that applies to a certain degree to each object in a data set, depending on how ‘isolated’ this object is with respect to the surrounding clustering structure. In this paper, we formally introduce a new notion of outliers which bases outlier detection on the same theoretical foundation as density-based cluster analysis. Our notion of an outlier is ‘local’ in the sense that the outlier-degree of an object is determined by taking into account the clustering structure in a bounded neighborhood of the object. We demonstrate that this notion of an outlier is more appropriate for detecting different types of outliers than previous approaches, and we also present an algorithm for finding them. Furthermore, we show that by combining the outlier detection with a density-based method to analyze the clustering structure, we can get the outliers almost for free if we already want to perform a cluster analysis on a data set.
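To make the idea of a graded, 'local' outlier-degree concrete, the following is a minimal sketch and not the algorithm presented in the paper. The abstract does not state a formula, so the scoring rule here is an assumption chosen for illustration: each object's density is taken as the inverse of its mean distance to its k nearest neighbors, and its outlier-degree is the ratio of its neighbors' average density to its own. The function names, the parameter k, and the sample points are all hypothetical.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i], excluding i itself."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return [j for _, j in dists[:k]]

def local_density(points, i, k):
    """Illustrative density (an assumption): inverse mean distance to the k nearest neighbors."""
    nbrs = knn(points, i, k)
    mean_d = sum(math.dist(points[i], points[j]) for j in nbrs) / k
    return 1.0 / (mean_d + 1e-12)  # small epsilon guards against duplicate points

def outlier_degree(points, k=3):
    """Score near 1: as dense as its neighborhood. Score well above 1: locally isolated."""
    dens = [local_density(points, i, k) for i in range(len(points))]
    scores = []
    for i in range(len(points)):
        nbrs = knn(points, i, k)
        scores.append(sum(dens[j] for j in nbrs) / (k * dens[i]))
    return scores

# Hypothetical data: a tight cluster, a loose cluster, and one isolated object.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),          # dense cluster
    (20, 20), (20, 30), (30, 20), (30, 30),  # sparse cluster
    (5, 5),                                  # isolated object near the dense cluster
]
scores = outlier_degree(points, k=3)
for p, s in zip(points, scores):
    print(p, round(s, 2))
```

In this sketch the members of both clusters receive the same degree even though their absolute densities differ by a factor of ten, because each object is compared only with its own neighborhood. The isolated object receives a clearly higher degree, which is the graded, local behavior the abstract describes.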
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
Cite this paper
Breunig, M.M., Kriegel, H.-P., Ng, R.T., Sander, J. (1999). OPTICS-OF: Identifying Local Outliers. In: Żytkow, J.M., Rauch, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 1999. Lecture Notes in Computer Science, vol 1704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-48247-5_28
DOI: https://doi.org/10.1007/978-3-540-48247-5_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66490-1
Online ISBN: 978-3-540-48247-5
eBook Packages: Springer Book Archive