Abstract
Anomaly detection in large datasets, particularly those containing mixed data (categorical, spatial, or spatio-temporal), is one of the most challenging problems in modern data mining and computational intelligence. In this study, we discuss several versions of the well-known Isolation Forest method as an efficient tool for finding outliers or anomalies. The versions are based on binary, ternary, and higher-arity search trees, whereas the traditional Isolation Forest is built on binary search trees. We build and investigate n-ary search trees and analyze their efficiency in the context of anomaly detection.
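To make the idea of an n-ary isolation tree concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state how a node with n branches is split, so the sketch assumes that each node picks one attribute at random and draws n - 1 uniformly random cut points between the minimum and maximum of that attribute. The names `build_tree`, `branch_index`, `path_length`, and `mean_path_length`, as well as the defaults for `max_depth`, `n_trees`, and `sample_size`, are hypothetical. As in the standard Isolation Forest, a shorter average path suggests a more anomalous point.

```python
import random


def branch_index(value, cuts):
    """Return which of the len(cuts) + 1 intervals the value falls into."""
    return sum(value >= c for c in cuts)


def build_tree(points, n_branches, depth=0, max_depth=8, rng=random):
    """Recursively partition points into up to n_branches children per node.

    Assumption: n_branches - 1 uniformly random cut points on one random
    attribute; the source does not specify the splitting rule.
    """
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}  # leaf
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}  # cannot split identical values
    cuts = sorted(rng.uniform(lo, hi) for _ in range(n_branches - 1))
    buckets = [[] for _ in range(n_branches)]
    for p in points:
        buckets[branch_index(p[dim], cuts)].append(p)
    children = [
        build_tree(b, n_branches, depth + 1, max_depth, rng) for b in buckets
    ]
    return {"dim": dim, "cuts": cuts, "children": children}


def path_length(tree, point, depth=0):
    """Count the edges traversed before the point reaches a leaf."""
    if "children" not in tree:
        return depth
    child = tree["children"][branch_index(point[tree["dim"]], tree["cuts"])]
    return path_length(child, point, depth + 1)


def mean_path_length(points, query, n_branches=3, n_trees=100,
                     sample_size=64, seed=0):
    """Average path length of query over a forest of random n-ary trees."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        sample = rng.sample(points, min(sample_size, len(points)))
        tree = build_tree(sample, n_branches, rng=rng)
        total += path_length(tree, query)
    return total / n_trees


if __name__ == "__main__":
    gen = random.Random(1)
    # Hypothetical data: a dense 2-D cluster plus one distant point.
    data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(100)]
    data.append((10.0, 10.0))
    for n in (2, 3, 4):
        inlier = mean_path_length(data, (0.0, 0.0), n_branches=n)
        outlier = mean_path_length(data, (10.0, 10.0), n_branches=n)
        print(n, inlier, outlier)
```

Setting `n_branches=2` gives a binary tree of the kind used by the traditional method, so the same code can be used to compare arities on a given dataset.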
Acknowledgements
Funded by the National Science Centre, Poland, under the CHIST-ERA programme (Grant no. 2018/28/Z/ST6/00563).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Karczmarek, P., Kiersztyn, A., Pedrycz, W. (2020). n-ary Isolation Forest: An Experimental Comparative Analysis. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2020. Lecture Notes in Computer Science, vol 12416. Springer, Cham. https://doi.org/10.1007/978-3-030-61534-5_17
DOI: https://doi.org/10.1007/978-3-030-61534-5_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61533-8
Online ISBN: 978-3-030-61534-5
eBook Packages: Computer Science, Computer Science (R0)