Abstract
An unsupervised anomaly detection algorithm is typically suited only to a specific type of anomaly: point, collective, or contextual. A mismatch between the anomaly type an algorithm targets and the actual type present in the data can lead to poor performance. In this paper, utilizing the Hilbert–Schmidt independence criterion (HSIC), we propose an unsupervised backward elimination feature selection algorithm, BAHSIC-AD, that identifies a subset of features with the strongest interdependence for anomaly detection. Using BAHSIC-AD, we compare the effectiveness of the recent Spectral Ranking for Anomalies (SRA) algorithm with other popular anomaly detection methods on several synthetic and real-world datasets. Furthermore, we demonstrate that SRA, combined with BAHSIC-AD, is a generally applicable method for detecting point, collective, and contextual anomalies.
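To make the abstract's idea concrete, the following is a minimal sketch of HSIC-based backward elimination. It is not the authors' exact BAHSIC-AD procedure; it assumes the (biased) empirical HSIC estimate tr(KHLH)/(n−1)² with RBF kernels, and greedily removes the feature least dependent on the remaining ones until a target subset size is reached. The function names and the bandwidth parameter `sigma` are illustrative choices, not from the paper.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gaussian RBF kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K = rbf_kernel(X, sigma)
    L = rbf_kernel(Y, sigma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def backward_elimination(X, n_keep, sigma=1.0):
    """Greedily drop the feature least HSIC-dependent on the rest."""
    features = list(range(X.shape[1]))
    while len(features) > n_keep:
        scores = []
        for f in features:
            rest = [g for g in features if g != f]
            scores.append(hsic(X[:, [f]], X[:, rest], sigma))
        # remove the feature with the weakest dependence on the others
        features.pop(int(np.argmin(scores)))
    return features
```

For example, on data where two features are strongly correlated and a third is pure noise, the sketch retains the interdependent pair and discards the noise feature, which is the behavior the abstract describes for anomaly-relevant feature subsets.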
Acknowledgements
All authors acknowledge funding from the Natural Sciences and Engineering Research Council of Canada. The third author acknowledges funding from the Ophelia Lazaridis University Research Chair. The views expressed herein are solely those of the authors. The authors thank the anonymous referees, whose comments have improved the presentation of the paper.
Cite this article
Zhang, H., Nian, K., Coleman, T.F. et al. Spectral ranking and unsupervised feature selection for point, collective, and contextual anomaly detection. Int J Data Sci Anal 9, 57–75 (2020). https://doi.org/10.1007/s41060-018-0161-7