Finding Robust Itemsets under Subsampling

Authors:

Fabian Moerchen,

Toon Calders

ACM Transactions on Database Systems (TODS), Volume 39, Issue 3

Article No.: 20, Pages 1 - 27

https://doi.org/10.1145/2656261

Published: 07 October 2014

Abstract

Mining frequent patterns is plagued by the problem of pattern explosion, making pattern reduction techniques a key challenge in pattern mining. In this article we propose a novel theoretical framework for pattern reduction by measuring the robustness of a property of an itemset such as closedness or nonderivability. The robustness of a property is the probability that this property holds on random subsets of the original data. We study four properties, namely an itemset being closed, free, non-derivable, or totally shattered, and demonstrate how to compute the robustness analytically without actually sampling the data. Our concept of robustness has many advantages: Unlike statistical approaches for reducing patterns, we do not assume a null hypothesis or any noise model and, in contrast to noise-tolerant or approximate patterns, the robust patterns for a given property are always a subset of the patterns with this property. If the underlying property is monotonic then the measure is also monotonic, allowing us to efficiently mine robust itemsets. We further derive a parameter-free technique for ranking itemsets that can be used for top-k approaches. Our experiments demonstrate that we can successfully use the robustness measure to reduce the number of patterns and that ranking yields interesting itemsets.
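To make the definition of robustness concrete, the following is a minimal sketch that computes the robustness of closedness by brute-force enumeration of every subsample of a tiny dataset. It is not the analytic method proposed in the article. It assumes a sampling model in which each transaction is kept independently with probability `alpha`; that model, the toy dataset, and the names `support`, `is_closed`, and `robustness` are illustrative assumptions, not taken from the abstract.

```python
from itertools import product


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def is_closed(itemset, transactions, items):
    """An itemset is closed if every proper superset has strictly lower support."""
    s = support(itemset, transactions)
    return all(support(itemset | {y}, transactions) < s for y in items - itemset)


def robustness(prop, itemset, transactions, items, alpha):
    """Probability that `prop` holds for `itemset` on a random subsample.

    Assumption: each transaction is kept independently with probability
    `alpha`. All 2^n subsamples are enumerated, so this only works for
    very small datasets; it illustrates the definition, nothing more.
    """
    n = len(transactions)
    total = 0.0
    for mask in product((0, 1), repeat=n):
        sub = [t for t, keep in zip(transactions, mask) if keep]
        kept = sum(mask)
        weight = alpha ** kept * (1 - alpha) ** (n - kept)
        if prop(itemset, sub, items):
            total += weight
    return total


# Hypothetical toy data: three transactions over the items {a, b}.
data = [frozenset("ab"), frozenset("ab"), frozenset("a")]
items = frozenset("ab")

if __name__ == "__main__":
    print(robustness(is_closed, frozenset("a"), data, items, alpha=0.5))
    print(robustness(is_closed, frozenset("ab"), data, items, alpha=0.5))
```

In this toy dataset, {a} is closed in a subsample exactly when the third transaction is kept, so its robustness equals `alpha`. The itemset {a, b} contains every item and so has no proper superset, which makes it closed in every subsample under the definition used here.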

Index Terms

1. Information systems
  1. Information systems applications

Information

Published In

ACM Transactions on Database Systems Volume 39, Issue 3

September 2014

264 pages

ISSN:0362-5915

EISSN:1557-4644

DOI:10.1145/2676651

Editor:
Christian S. Jensen
Aalborg University, Denmark

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 October 2014

Accepted: 01 February 2013

Revised: 01 November 2012

Received: 01 June 2012

Published in TODS Volume 39, Issue 3

Qualifiers

Research-article
Research
Refereed

Funding Sources

Fonds Wetenschappelijk Onderzoek

Bibliometrics

Total Citations: 9
Total Downloads: 266
Downloads (Last 12 months): 11
Downloads (Last 6 weeks): 0

Reflects downloads up to 17 Jan 2025

Cited By

M. Ibrahim and R. Missaoui. 2024. Mining actionable concepts in concept lattice using Interestingness Propagation. Journal of Computational Science 75 (102196). Online publication date: Jan-2024.
https://doi.org/10.1016/j.jocs.2023.102196
T. Man, V. Osipov, N. Zhukova, A. Subbotin, and D. Ignatov. 2024. Neural networks for intelligent multilevel control of artificial and natural objects based on data fusion: A survey. Information Fusion 110 (102427). Online publication date: Oct-2024.
https://doi.org/10.1016/j.inffus.2024.102427
T. Hanika and J. Hirth. 2022. Knowledge cores in large formal contexts. Annals of Mathematics and Artificial Intelligence 90, 6, 537--567. Online publication date: 1-Jun-2022.
https://dl.acm.org/doi/10.1007/s10472-022-09790-6
S. Ferré, M. Huchard, M. Kaytoue, S. Kuznetsov, and A. Napoli. 2020. Formal Concept Analysis: From Knowledge Discovery to Knowledge Processing. In A Guided Tour of Artificial Intelligence Research. 411--445. Online publication date: 8-May-2020.
https://doi.org/10.1007/978-3-030-06167-8_13
M. Ibrahim and R. Missaoui. 2019. Approximating concept stability using variance reduction techniques. Discrete Applied Mathematics. Online publication date: Mar-2019.
https://doi.org/10.1016/j.dam.2019.03.002
S. Kuznetsov and T. Makhalova. 2018. On interestingness measures of formal concepts. Information Sciences: an International Journal 442, C, 202--219. Online publication date: 1-May-2018.
https://dl.acm.org/doi/10.1016/j.ins.2018.02.032
A. Buzmakov, S. Kuznetsov, and A. Napoli. 2017. Efficient Mining of Subsample-Stable Graph Patterns. In 2017 IEEE International Conference on Data Mining (ICDM). 757--762. Online publication date: Nov-2017.
https://doi.org/10.1109/ICDM.2017.88
M. Joglekar, H. Garcia-Molina, and A. Parameswaran. 2016. Interactive data exploration with smart drill-down. In 2016 IEEE 32nd International Conference on Data Engineering (ICDE). 906--917. Online publication date: May-2016.
https://doi.org/10.1109/ICDE.2016.7498300
A. Buzmakov, S. Kuznetsov, and A. Napoli. 2015. Fast Generation of Best Interval Patterns for Nonmonotonic Constraints. In Machine Learning and Knowledge Discovery in Databases. 157--172. Online publication date: 29-Aug-2015.
https://doi.org/10.1007/978-3-319-23525-7_10
