Abstract
iForest uses a collection of isolation trees to detect anomalies. While it is effective in detecting global anomalies, it fails to detect local anomalies in data sets that have multiple clusters of normal instances: the local anomalies are masked by normal clusters of similar density, which makes them less susceptible to isolation. In this paper, we propose a very simple but effective solution to overcome this limitation by replacing the global ranking measure based on path length with a local ranking measure based on relative mass, which takes the local data distribution into consideration. We demonstrate the utility of relative mass by improving the task-specific performance of iForest in anomaly detection and information retrieval tasks.
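To make the contrast between the two ranking measures concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not define relative mass, so the sketch assumes it is the ratio of the number of training points in the parent of the leaf that a query reaches to the number in that leaf, normalised by the subsample size `psi`. The names `Node`, `build_tree`, `build_forest`, `find_leaf` and `score`, and the parameters `min_leaf` and `max_depth`, are hypothetical. The path-length score is also simplified: it is the raw depth, without the leaf-size adjustment used by the original iForest.

```python
import random


class Node:
    """One node of an isolation tree; mass = number of training points in it."""

    def __init__(self, mass, parent=None):
        self.mass = mass
        self.parent = parent
        self.attr = None    # index of the split attribute (None for a leaf)
        self.split = None   # split value on that attribute
        self.left = None
        self.right = None


def build_tree(points, rng, min_leaf=1, parent=None, depth=0, max_depth=8):
    """Recursively partition points with random axis-parallel splits."""
    node = Node(len(points), parent)
    if len(points) <= min_leaf or depth >= max_depth:
        return node
    attr = rng.randrange(len(points[0]))
    lo = min(p[attr] for p in points)
    hi = max(p[attr] for p in points)
    if lo == hi:
        return node  # cannot split on a constant attribute
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[attr] < split]
    right = [p for p in points if p[attr] >= split]
    if not left or not right:
        return node  # degenerate split; keep this node as a leaf
    node.attr, node.split = attr, split
    node.left = build_tree(left, rng, min_leaf, node, depth + 1, max_depth)
    node.right = build_tree(right, rng, min_leaf, node, depth + 1, max_depth)
    return node


def build_forest(data, n_trees, psi, rng, min_leaf=1):
    """Build n_trees trees, each from a random subsample of psi points."""
    return [build_tree(rng.sample(data, psi), rng, min_leaf)
            for _ in range(n_trees)]


def find_leaf(tree, x):
    """Return the leaf that x falls into and the depth of that leaf."""
    node, depth = tree, 0
    while node.attr is not None:
        node = node.left if x[node.attr] < node.split else node.right
        depth += 1
    return node, depth


def score(forest, x, psi):
    """Return (mean path length, mean relative mass) of x over the forest.

    Path length is a global measure: a short path suggests an anomaly.
    Relative mass (ASSUMED form) compares the leaf with its immediate
    parent: a high ratio means x lies in a region much sparser than
    its local surroundings.
    """
    path_total, relmass_total = 0.0, 0.0
    for tree in forest:
        leaf, depth = find_leaf(tree, x)
        path_total += depth
        # If the leaf is the root there is no parent; use its own mass.
        parent_mass = leaf.parent.mass if leaf.parent else leaf.mass
        relmass_total += parent_mass / (leaf.mass * psi)
    return path_total / len(forest), relmass_total / len(forest)
```

Under these assumptions, a point next to a dense cluster can have a long path (many splits are needed to separate it from its neighbours) and still score a high relative mass, because its leaf holds far fewer points than the parent region. In the sketch, `min_leaf` sets the smallest region that is still split and therefore the granularity at which leaf and parent masses are compared.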
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Aryal, S., Ting, K.M., Wells, J.R., Washio, T. (2014). Improving iForest with Relative Mass. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8444. Springer, Cham. https://doi.org/10.1007/978-3-319-06605-9_42
DOI: https://doi.org/10.1007/978-3-319-06605-9_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06604-2
Online ISBN: 978-3-319-06605-9
eBook Packages: Computer Science (R0)