Abstract
We propose a novel clustering-based outlier detection approach for data streams. To handle the stream, we split the data into several windows, and within each window the data is divided into subspaces. First, a clustering algorithm is applied to one subspace. Based on the existing relations between the subspaces, the clusters obtained define partitions of the data in another subspace. The same clustering algorithm is then applied to each partition separately in this second subspace. The process can be iterated over n subspaces. We perform tests on firewall log data sets, testing our approach with two subspaces and visualizing the results with neighborhood graphs in each window. We compare the results obtained with those of the MCOD algorithm. The outlier events can be identified visually, and the evolution of the stream can be observed.
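The abstract does not name the clustering algorithm, the window type, or the criterion used to mark an event as an outlier. The following is a minimal Python sketch of the window / subspace / re-clustering loop under stated assumptions: count-based, non-overlapping windows; records whose attributes are split into subspaces by index; a stand-in clustering (connected components of the eps-neighborhood graph); and a hypothetical rule that flags records ending up in final partitions smaller than `min_size`. The names `windows`, `eps_clusters`, `iterative_subspace_clustering`, `flag_outliers`, `eps` and `min_size` are illustrative and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]


def windows(stream: Sequence[Record], size: int) -> List[List[Record]]:
    """Split the stream into consecutive, non-overlapping count-based windows (assumption)."""
    return [list(stream[i:i + size]) for i in range(0, len(stream), size)]


def project(record: Record, dims: Sequence[int]) -> Tuple[float, ...]:
    """Project a record onto the subspace spanned by the attribute indices in dims."""
    return tuple(record[d] for d in dims)


def eps_clusters(points: Sequence[Tuple[float, ...]], eps: float) -> List[List[int]]:
    """Stand-in clustering (hypothetical choice): connected components of the
    eps-neighborhood graph. Returns clusters as sorted lists of indices into points."""
    seen = [False] * len(points)
    clusters = []
    for start in range(len(points)):
        if seen[start]:
            continue
        seen[start] = True
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(len(points)):
                # Link j to i when it lies within distance eps in this subspace.
                if not seen[j] and math.dist(points[i], points[j]) <= eps:
                    seen[j] = True
                    stack.append(j)
        clusters.append(sorted(component))
    return clusters


def iterative_subspace_clustering(window: Sequence[Record],
                                  subspaces: Sequence[Sequence[int]],
                                  eps: float) -> List[List[int]]:
    """Cluster the window on the first subspace, then re-cluster each resulting
    partition separately on the next subspace, and so on for all subspaces."""
    partitions = [list(range(len(window)))]
    for dims in subspaces:
        refined = []
        for part in partitions:
            points = [project(window[i], dims) for i in part]
            for cluster in eps_clusters(points, eps):
                # Map local indices back to record indices within the window.
                refined.append([part[k] for k in cluster])
        partitions = refined
    return partitions


def flag_outliers(window: Sequence[Record], subspaces: Sequence[Sequence[int]],
                  eps: float, min_size: int) -> List[int]:
    """Hypothetical outlier rule: flag records whose final partition
    has fewer than min_size members."""
    return sorted(i
                  for part in iterative_subspace_clustering(window, subspaces, eps)
                  if len(part) < min_size
                  for i in part)


if __name__ == "__main__":
    # Toy 4-attribute records: subspace A = attributes 0-1, subspace B = attributes 2-3.
    stream = [(0.0, 0.0, 0.0, 0.0), (0.1, 0.0, 0.1, 0.0), (0.0, 0.1, 0.0, 0.1),
              (0.1, 0.1, 5.0, 5.0), (0.0, 0.0, 0.1, 0.1), (9.0, 9.0, 9.0, 9.0)]
    for w, window in enumerate(windows(stream, size=6)):
        print(w, flag_outliers(window, [(0, 1), (2, 3)], eps=0.5, min_size=2))
```

Running the script prints `0 [3, 5]`. Record 5 is already isolated in subspace A, whereas record 3 clusters normally in subspace A and only separates from its partition when that partition is re-clustered in subspace B, which is the kind of event the second-level clustering is meant to expose. A single `eps` is shared across subspaces here for brevity; per-subspace thresholds would be a natural variation. The MCOD comparison is not reproduced.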
References
Aggarwal, C.: A Survey of Stream Clustering Algorithms. In: Data Clustering: Algorithms and Applications, pp. 229–253. CRC Press (2013)
Angiulli, F., Pizzuti, C.: Fast Outlier Detection in High Dimensional Spaces. In: 6th European Conference on Principles of Data Mining and Knowledge Discovery, pp. 15–26. Springer, London (2002)
Breunig, M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: Identifying Density-Based Local Outliers. In: 2000 ACM SIGMOD International Conference on Management of Data, pp. 93–104. ACM, Texas (2000)
Chandola, V., Banerjee, A., Kumar, V.: Anomaly Detection: A Survey. ACM Comput. Surv. 41(3), 15:1–15:58 (2009)
Guha, S., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering Data Streams. In: 41st Annual Symposium on Foundations of Computer Science, pp. 359–366. IEEE Computer Society, Washington DC (2000)
Hawkins, D.M.: Identification of Outliers. Chapman and Hall, New York (1980)
Hodge, V.J., Austin, J.: A Survey of Outlier Detection Methodologies. Artif. Intell. Rev. 22, 85–126 (2004)
Johnson, R.A., Wichern, D.W.: Applied Multivariate Statistical Analysis. Prentice-Hall Inc., New Jersey (1992)
Knorr, E.M., Ng, R.T., Tucakov, V.: Distance-based outliers: algorithms and applications. The VLDB Journal 8, 237–253 (2000)
Kontaki, M., Gounaris, A., Papadopoulos, A.N., Tsichlas, K., Manolopoulos, Y.: Continuous Monitoring of Distance-based Outliers over Data Streams. In: 2011 IEEE 27th International Conference on Data Engineering, pp. 135–146. IEEE Computer Society, Washington (2011)
Pinheiro, P., Didry, Y., Parisot, O., Tamisier, T.: Traitement Visuel et Interactif dans le Logiciel Cadral. In: Atelier Visualisation D’informations, Interaction et fouille de Données, GT-VIF, pp. 33–44. EGC, Rennes (2014)
Ramaswamy, S., Rastogi, R., Shim, K.: Efficient Algorithms for Mining Outliers from Large Data Sets. SIGMOD Rec. 29(2), 427–438 (2000)
VAST Challenge (2012), http://www.vacommunity.org/VAST+Challenge+2012
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Louhi, I., Boudjeloud-Assala, L., Tamisier, T. (2015). Exploration and Visualization Approach for Outlier Detection on Log Files. In: Barbucha, D., Nguyen, N., Batubara, J. (eds) New Trends in Intelligent Information and Database Systems. Studies in Computational Intelligence, vol 598. Springer, Cham. https://doi.org/10.1007/978-3-319-16211-9_1
DOI: https://doi.org/10.1007/978-3-319-16211-9_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16210-2
Online ISBN: 978-3-319-16211-9
eBook Packages: Engineering, Engineering (R0)