Abstract
Mining outliers in a database means finding exceptional objects that deviate from the rest of the data set. Besides classical outlier analysis algorithms, recent studies have focused on mining local outliers, i.e., outliers whose density distribution differs significantly from that of their neighborhood. The estimation of the density distribution at the location of an object has so far been based on the density distribution of its k-nearest neighbors [2,11]. However, when outliers lie where the density distributions in the neighborhood differ significantly, for example, in the case of objects from a sparse cluster close to a denser cluster, this may result in a wrong estimation. To avoid this problem, we propose a simple but effective measure of local outliers based on a symmetric neighborhood relationship. The proposed measure considers both the neighbors and the reverse neighbors of an object when estimating its density distribution. As a result, the outliers so discovered are more meaningful. To compute such local outliers efficiently, several mining algorithms are developed that detect top-n outliers based on our definition. A comprehensive performance evaluation and analysis show that our methods are not only efficient in computation but also more effective in ranking outliers.
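The abstract does not spell out the measure itself, so the following is only a minimal sketch of the two ingredients it names: the k-nearest neighbors of an object and its reverse neighbors (the objects that count it among their own k-nearest neighbors). The function names `knn`, `reverse_knn`, and `symmetric_neighborhood`, the Euclidean distance, the brute-force search, and the index-based tie-breaking are all assumptions made for illustration, not the paper's algorithms.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def knn(points, k):
    """For each point index, return the set of indices of its k nearest neighbors.

    Brute force, O(n^2 log n); ties in distance are broken by the smaller index
    (an assumption made here so that results are deterministic).
    """
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (dist(p, points[j]), j),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def reverse_knn(neighbors):
    """For each point index, return the set of points that list it among their kNN."""
    reverse = [set() for _ in neighbors]
    for i, nn in enumerate(neighbors):
        for j in nn:
            reverse[j].add(i)
    return reverse


def symmetric_neighborhood(points, k):
    """Union of neighbors and reverse neighbors for each point (hypothetical helper)."""
    nn = knn(points, k)
    rnn = reverse_knn(nn)
    return [nn[i] | rnn[i] for i in range(len(points))]


if __name__ == "__main__":
    # Four points in a tight square plus one distant point at index 4.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
    nn = knn(pts, 2)
    rnn = reverse_knn(nn)
    for i in range(len(pts)):
        print(i, sorted(nn[i]), sorted(rnn[i]))
```

In this toy data set the distant point still has two nearest neighbors, but no other point selects it as a neighbor, so its reverse-neighbor set is empty. That asymmetry is the kind of information a measure based only on k-nearest neighbors would not see.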
References
Aggarwal, C., Yu, P.: Outlier Detection for High Dimensional Data. In: SIGMOD 2001 (2001)
Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: Identifying Density-based Local Outliers. In: SIGMOD 2000 (2000)
Chakrabarti, D.: AutoPart: Parameter-Free Graph Partitioning and Outlier Detection. In: PKDD 2004 (2004)
Chen, Z.X., Fu, A.W., Tang, J.: On Complementarity of Cluster and Outlier Detection Schemes. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds.) DaWaK 2003. LNCS, vol. 2737. Springer, Heidelberg (2003)
Chiu, A.L., Fu, A.W.: Enhancements on Local Outlier Detection. In: IDEAS 2003 (2003)
Ester, M., Kriegel, H.P., et al.: A Density-based Algorithm for Discovering Clusters in Large Spatial Databases. In: KDD 1996 (1996)
Guha, S., Rastogi, R., Shim, K.: Cure: An Efficient Clustering Algorithm for Large Databases. In: SIGMOD 1998 (1998)
Hautamäki, V., Kärkkäinen, I., Fränti, P.: Outlier Detection Using k-nearest Neighbour Graph. In: ICPR 2004 (2004)
Han, J.W., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San Francisco
Jagadish, H., Koudas, N., Muthukrishnan, S.: Mining Deviants in a Time Series Database. In: VLDB 1999 (1999)
Jin, W., Tung, A.K.H., Han, J.W.: Mining Top-n Local Outliers in Large Databases. In: KDD 2001 (2001)
Knorr, E., Ng, R.: Algorithms for Mining Distance-Based Outliers in Large Datasets. In: VLDB 1998 (1998)
Knorr, E., Ng, R.: Finding Intensional Knowledge of Distance-Based Outliers. In: VLDB 1999 (1999)
Korn, F., Muthukrishnan, S.: Influence Sets Based on Reverse Nearest Neighbor Queries. In: SIGMOD 2000 (2000)
Muthukrishnan, S., Shah, R., Vitter, J.S.: Mining Deviants in Time Series Data Streams. In: SSDBM 2004 (2004)
Ng, R., Han, J.W.: Efficient and Effective Clustering Method for Spatial Data Mining. In: VLDB 1994 (1994)
Papadimitriou, S., Kitagawa, H., et al.: LOCI: Fast Outlier Detection Using the Local Correlation Integral. In: ICDE 2003 (2003)
Papadimitriou, S., Faloutsos, C.: Cross-Outlier Detection. In: Hadzilacos, T., Manolopoulos, Y., Roddick, J., Theodoridis, Y. (eds.) SSTD 2003. LNCS, vol. 2750. Springer, Heidelberg (2003)
Roussopoulos, N., Kelley, S., Vincent, F.: Nearest neighbor queries. In: SIGMOD 1995 (1995)
Ramaswamy, S., Rastogi, R., Shim, K.: Efficient Algorithms for Mining Outliers from Large Data Sets. In: SIGMOD 2000 (2000)
Shekhar, S., Lu, C.T., Zhang, P.S.: Detecting Graph-based Spatial Outliers. In: KDD 2001 (2001)
Tang, J., Chen, Z.X., et al.: Enhancing Effectiveness of Outlier Detections for Low Density Patterns. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS (LNAI), vol. 2336, p. 535. Springer, Heidelberg (2002)
Wong, W.K., Moore, A.W., et al.: Rule-Based Anomaly Pattern Detection for Detecting Disease Outbreaks. In: AAAI 2002 (2002)
Yiu, M.L., Mamoulis, N.: Clustering Objects on a Spatial Network. In: SIGMOD 2004 (2004)
Yiu, M.L., et al.: Aggregate Nearest Neighbor Queries in Road Networks. IEEE Trans. Knowl. Data Eng. 17(6) (2005)
Zhang, T., et al.: An Efficient Data Clustering Method for Very Large Databases. In: SIGMOD 1996 (1996)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jin, W., Tung, A.K.H., Han, J., Wang, W. (2006). Ranking Outliers Using Symmetric Neighborhood Relationship. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_68
DOI: https://doi.org/10.1007/11731139_68
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33206-0
Online ISBN: 978-3-540-33207-7
eBook Packages: Computer Science (R0)