[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

Frequent Pattern Outlier Detection Without Exhaustive Mining

  • Conference paper
  • First Online:
Advances in Knowledge Discovery and Data Mining (PAKDD 2016)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9652))

Included in the following conference series:

Abstract

Outlier detection consists in detecting anomalous observations from data. During the past decade, pattern-based outlier detection methods have proposed to mine all frequent patterns in order to compute the outlier factor of each transaction. This approach remains too expensive despite recent progress in pattern mining field. In this paper, we provide exact and approximate methods for calculating the frequent pattern outlier factor (FPOF) without extracting any pattern or by extracting a small sample. We propose an algorithm that returns the exact FPOF without mining any pattern. Surprisingly, it works in polynomial time on the size of the dataset. We also present an approximate method where the end-user controls the maximum error on the estimated FPOF. Experiments show the interest of both methods for very large datasets where exhaustive mining fails to provide the exact solution. The accuracy of our approximate method outperforms the baseline approach for a same budget in time or number of patterns.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
GBP 19.95
Price includes VAT (United Kingdom)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
GBP 35.99
Price includes VAT (United Kingdom)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
GBP 44.99
Price includes VAT (United Kingdom)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    It is the proportion of pairs of transactions which would be ranked similarly with the approximate FPOF and with the true FPOF (see Sect. 6 for a formal definition).

References

  1. Hawkins, D.M.: Identification of Outliers, vol. 11. Springer, The Netherlands (1980)

    Book  MATH  Google Scholar 

  2. He, Z., Xu, X., Huang, Z.J., Deng, S.: FP-outlier: frequent pattern based outlier detection. Comput. Sci. Inf. Syst. 2(1), 103–118 (2005)

    Article  Google Scholar 

  3. Otey, M.E., Ghoting, A., Parthasarathy, S.: Fast distributed outlier detection in mixed-attribute data sets. Data Min. Knowl. Discovery 12(2–3), 203–228 (2006)

    Article  MathSciNet  Google Scholar 

  4. Koufakou, A., Secretan, J., Georgiopoulos, M.: Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data. Knowl. Inf. Syst. 29(3), 697–725 (2011)

    Article  Google Scholar 

  5. Knobbe, A., Crémilleux, B., Fürnkranz, J., Scholz, M.: From local patterns to global models: the lego approach to data mining. In: From Local Patterns to Global Models: Proceedings of the ECML PKDD 2008 Workshop, pp. 1–16 (2008)

    Google Scholar 

  6. Liu, B., Hsu, W., Ma, Y.: Integrating classification and association rule mining. In: International Conference on Knowledge Discovery and Data mining (1998)

    Google Scholar 

  7. Liu, Q., Dong, G.: CPCQ: contrast pattern based clustering quality index for categorical data. Pattern Recogn. 45(4), 1739–1748 (2012)

    Article  Google Scholar 

  8. Chaoji, V., Hasan, M.A., Salem, S., Besson, J., Zaki, M.J.: ORIGAMI: a novel and effective approach for mining representative orthogonal graph patterns. Stat. Anal. Data Min. 1(2), 67–84 (2008)

    Article  MathSciNet  Google Scholar 

  9. Boley, M., Lucchese, C., Paurat, D., Gärtner, T.: Direct local pattern sampling by efficient two-step random procedures. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 582–590 (2011)

    Google Scholar 

  10. van Leeuwen, M.: Interactive data exploration using pattern mining. In: Jurisica, I., Holzinger, A. (eds.) Knowledge Discovery and Data Mining. LNCS, vol. 8401, pp. 169–182. Springer, Heidelberg (2014)

    Google Scholar 

  11. Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In: International conference on Very Large Data Bases, vol. 1215, pp. 487–499 (1994)

    Google Scholar 

  12. Giacometti, A., Li, D.H., Soulet, A.: Balancing the analysis of frequent patterns. In: Tseng, V.S., Ho, T.B., Zhou, Z.-H., Chen, A.L.P., Kao, H.-Y. (eds.) PAKDD 2014, Part I. LNCS, vol. 8443, pp. 53–64. Springer, Heidelberg (2014)

    Chapter  Google Scholar 

Download references

Acknowledgements

This work has been partially supported by the Prefute project, PEPS 2015, CNRS.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Arnaud Giacometti .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Giacometti, A., Soulet, A. (2016). Frequent Pattern Outlier Detection Without Exhaustive Mining. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science(), vol 9652. Springer, Cham. https://doi.org/10.1007/978-3-319-31750-2_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-31750-2_16

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31749-6

  • Online ISBN: 978-3-319-31750-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics