Abstract
Mining frequent itemsets from transactional datasets is a well known problem. Thus, various methods have been studied to deal with this issue. Recently, original proposals have emerged from the cross-fertilization between data mining and artificial intelligence. In these declarative approaches, the itemset mining problem is modeled either as a constraint network or a propositional formula whose models correspond to the patterns of interest. In this paper, we focus on the propositional satisfiability based itemset mining framework. Our main goal is to enhance the efficiency of SAT model enumeration algorithms. This issue is particularly crucial for the scalability and competitiveness of such declarative itemset mining approaches. In this context, we deeply analyse the effect of the different SAT solver components on the efficiency of the model enumeration problem. Our analysis includes the main components of modern SAT solvers such as restarts, activity based variable ordering heuristics and clauses learning mechanism. Through extensive experiments, we show that these classical components play an essential role in such procedure to improve the performance by pushing forward the efficiency of SAT solvers. More precisely, our experimental evaluation includes a comparative study in enumerating all the models corresponding to the closed frequent itemsets. Additionally, our experimental analysis is extended to include the Top-k itemset mining problem.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
FIMI: http://fimi.ua.ac.be/data/.
- 2.
References
Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. SIGMOD ’93, pp. 207–216. ACM, New York, NY, USA (1993)
Asín, R., Nieuwenhuis, R., Oliveras, A., Rodríguez-Carbonell, E.: Cardinality networks: a theoretical and empirical study. Constraints 16(2), 195–221 (2011)
Bailleux, O., Boufkhad, Y.: Efficient CNF encoding of boolean cardinality constraints. In: CP, pp. 108–122 (2003)
Biere, A., Heule, M.J.H., van Maaren, H., Walsh, T. (eds.): Handbook of Satisfiability, Frontiers in AI and Applications, vol. 185. IOS Press (2009)
Chauhan, P., Clarke, E.M., Kroening, D.: Using sat based image computation for reachability analysis. Tech. rep., Technical Report CMU-CS-03-151 (2003)
Coquery, E., Jabbour, S., Saïs, L., Salhi, Y.: A SAT-based approach for discovering frequent, closed and maximal patterns in a sequence. In: Proceedings of the 20th European Conference on Artificial Intelligence (ECAI’12), pp. 258–263 (2012)
Davis, M., Logemann, G., Loveland, D.W.: A machine program for theorem-proving. Commun. ACM 5(7), 394–397 (1962)
Fu, A.W.C., Kwong, R.W.W., Tang, J.: Mining n-most interesting itemsets. In: Proceedings of the 12th International Symposium on Methodologies for Intelligent Systems (ISMIS 2000), Lecture Notes in Computer Science, pp. 59–67. Springer (2000)
Gebser, M., Kaufmann, B., Neumann, A., Schaub, T.: Conflict-driven answer set enumeration. In: Baral, C., Brewka, G., Schlipf, J. (eds.) Logic Programming and Nonmonotonic Reasoning. Lecture notes in computer science, vol. 4483, pp. 136–148. Springer, Berlin (2007)
Guns, T., Nijssen, S., Raedt, L.D.: Itemset mining: a constraint programming perspective. Artif. Intell. 175(12–13), 1951–1983 (2011)
Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. SIGMOD Rec. 29, 1–12 (2000)
Han, J., Wang, J., Lu, Y., Tzvetkov, P.: Mining top-k frequent closed patterns without minimum support. In: Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), 9–12 December 2002, Maebashi City, Japan, pp. 211–218 (2002)
Jabbour, S., Lonlac, J., Sais, L., Salhi, Y.: Extending modern SAT solvers for models enumeration. In: Proceedings of the 15th IEEE International Conference on Information Reuse and Integration, IRI 2014, Redwood City, CA, USA, August 13–15, 2014, pp. 803–810 (2014)
Jabbour, S., Sais, L., Salhi, Y.: Boolean satisfiability for sequence mining. In: 22nd ACM International Conference on Information and Knowledge Management (CIKM’13), pp. 649–658. ACM (2013)
Jabbour, S., Sais, L., Salhi, Y.: A pigeon-hole based encoding of cardinality constraints. TPLP 13(4-5-Online-Supplement) (2013)
Jabbour, S., Sais, L., Salhi, Y.: The top-k frequent closed itemset mining using top-k sat problem. In: European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD’03), pp. 403–418 (2013)
Jeroslow, R.G., Wang, J.: Solving propositional satisfiability problems. Ann. Math. Artif. Intell. 1, 167–187 (1990)
Jin, H., Han, H., Somenzi, F.: Efficient conflict analysis for finding all satisfying assignments of a boolean circuit. In: In TACAS’05, LNCS 3440, pp. 287–300. Springer (2005)
Marques-Silva, J.P., Sakallah, K.A.: GRASP—A new search algorithm for satisfiability. In: Proceedings of IEEE/ACM CAD, pp. 220–227 (1996)
McMillan, K.L.: Applying sat methods in unbounded symbolic model checking. In: Proceedings of the 14th International Conference on Computer Aided Verification (CAV’02), pp. 250–264 (2002)
Morgado, A.R., Marques-Silva, J.A.P.: Good Learning and Implicit Model Enumeration. In: International Conference on Tools with Artificial Intelligence (ICTAI’2005), pp. 131–136. IEEE (2005)
Pei, J., Han, J., Lu, H., Nishio, S., Tang, S., Yang, D.: H-mine: hyper-structure mining of frequent patterns in large databases. In: Proceedings IEEE International Conference on Data Mining, 2001. ICDM 2001, pp. 441–448 (2001)
Raedt, L.D., Guns, T., Nijssen, S.: Constraint programming for itemset mining. In: ACM SIGKDD, pp. 204–212 (2008)
Silva, J.P.M., Lynce, I.: Towards robust cnf encodings of cardinality constraints. In: CP, pp. 483–497 (2007)
Sinz, C.: Towards an optimal cnf encoding of boolean cardinality constraints. In: CP’05, pp. 827–831 (2005)
Tiwari, A., Gupta, R., Agrawal, D.: A survey on frequent pattern mining: current status and challenging issues. Inf. Technol. J 9, 1278–1293 (2010)
Tseitin, G.: On the complexity of derivations in the propositional calculus. In: H. Slesenko (ed.) Structures in Constructives Mathematics and Mathematical Logic, Part II, pp. 115–125 (1968)
Uno, T., Kiyomi, M., Arimura, H.: LCM ver. 2: Efficient mining algorithms for frequent/closed/maximal itemsets. In: FIMI ’04, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations, Brighton, UK, November 1, 2004 (2004)
Zhang, L., Madigan, C.F., Moskewicz, M.W., Malik, S.: Efficient conflict driven learning in Boolean satisfiability solver. In: IEEE/ACM CAD’2001, pp. 279–285 (2001)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Dlala, I.O., Jabbour, S., Sais, L., Yaghlane, B.B. (2016). A Comparative Study of SAT-Based Itemsets Mining. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXXIII. SGAI 2016. Springer, Cham. https://doi.org/10.1007/978-3-319-47175-4_3
Download citation
DOI: https://doi.org/10.1007/978-3-319-47175-4_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47174-7
Online ISBN: 978-3-319-47175-4
eBook Packages: Computer ScienceComputer Science (R0)