Abstract
Outlier detection consists in detecting anomalous observations from data. During the past decade, pattern-based outlier detection methods have proposed to mine all frequent patterns in order to compute the outlier factor of each transaction. This approach remains too expensive despite recent progress in pattern mining field. In this paper, we provide exact and approximate methods for calculating the frequent pattern outlier factor (FPOF) without extracting any pattern or by extracting a small sample. We propose an algorithm that returns the exact FPOF without mining any pattern. Surprisingly, it works in polynomial time on the size of the dataset. We also present an approximate method where the end-user controls the maximum error on the estimated FPOF. Experiments show the interest of both methods for very large datasets where exhaustive mining fails to provide the exact solution. The accuracy of our approximate method outperforms the baseline approach for a same budget in time or number of patterns.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
It is the proportion of pairs of transactions which would be ranked similarly with the approximate FPOF and with the true FPOF (see Sect. 6 for a formal definition).
References
Hawkins, D.M.: Identification of Outliers, vol. 11. Springer, The Netherlands (1980)
He, Z., Xu, X., Huang, Z.J., Deng, S.: FP-outlier: frequent pattern based outlier detection. Comput. Sci. Inf. Syst. 2(1), 103–118 (2005)
Otey, M.E., Ghoting, A., Parthasarathy, S.: Fast distributed outlier detection in mixed-attribute data sets. Data Min. Knowl. Discovery 12(2–3), 203–228 (2006)
Koufakou, A., Secretan, J., Georgiopoulos, M.: Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data. Knowl. Inf. Syst. 29(3), 697–725 (2011)
Knobbe, A., Crémilleux, B., Fürnkranz, J., Scholz, M.: From local patterns to global models: the lego approach to data mining. In: From Local Patterns to Global Models: Proceedings of the ECML PKDD 2008 Workshop, pp. 1–16 (2008)
Liu, B., Hsu, W., Ma, Y.: Integrating classification and association rule mining. In: International Conference on Knowledge Discovery and Data mining (1998)
Liu, Q., Dong, G.: CPCQ: contrast pattern based clustering quality index for categorical data. Pattern Recogn. 45(4), 1739–1748 (2012)
Chaoji, V., Hasan, M.A., Salem, S., Besson, J., Zaki, M.J.: ORIGAMI: a novel and effective approach for mining representative orthogonal graph patterns. Stat. Anal. Data Min. 1(2), 67–84 (2008)
Boley, M., Lucchese, C., Paurat, D., Gärtner, T.: Direct local pattern sampling by efficient two-step random procedures. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 582–590 (2011)
van Leeuwen, M.: Interactive data exploration using pattern mining. In: Jurisica, I., Holzinger, A. (eds.) Knowledge Discovery and Data Mining. LNCS, vol. 8401, pp. 169–182. Springer, Heidelberg (2014)
Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In: International conference on Very Large Data Bases, vol. 1215, pp. 487–499 (1994)
Giacometti, A., Li, D.H., Soulet, A.: Balancing the analysis of frequent patterns. In: Tseng, V.S., Ho, T.B., Zhou, Z.-H., Chen, A.L.P., Kao, H.-Y. (eds.) PAKDD 2014, Part I. LNCS, vol. 8443, pp. 53–64. Springer, Heidelberg (2014)
Acknowledgements
This work has been partially supported by the Prefute project, PEPS 2015, CNRS.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Giacometti, A., Soulet, A. (2016). Frequent Pattern Outlier Detection Without Exhaustive Mining. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science(), vol 9652. Springer, Cham. https://doi.org/10.1007/978-3-319-31750-2_16
Download citation
DOI: https://doi.org/10.1007/978-3-319-31750-2_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31749-6
Online ISBN: 978-3-319-31750-2
eBook Packages: Computer ScienceComputer Science (R0)