Abstract
Correlated patterns are an important class of regularities in a database. Although no single measure is universally accepted as the best for judging the interestingness of a pattern, all-confidence is emerging as a popular measure for discovering such patterns, because it satisfies both the anti-monotonic and null-invariance properties. The former property makes pattern mining practicable in real-world applications. The latter lets the user discover patterns involving both frequent and rare items without generating a huge number of patterns. In this paper, we show that although the measure satisfies the null-invariance property, mining patterns containing both frequent and rare items with a single minimum all-confidence (minAllConf) threshold leads to the dilemma known as the “rare item problem.” At a high minAllConf, the discovered correlated patterns involving rare items are very short. At a low minAllConf, combinatorial explosion can occur, producing too many patterns. To confront the problem, the paper introduces an alternative model based on the concept of multiple minAllConf thresholds. The proposed model generalizes the existing model of correlated patterns and lets the user specify a different minAllConf for each pattern depending on the frequencies of its items. A pattern-growth algorithm, called GCoMine, is also proposed to discover the patterns. Experimental results show that GCoMine is efficient and that the proposed model can address the problem effectively.
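To make the measure and the dilemma concrete, the following is a minimal Python sketch, not the GCoMine algorithm. It uses the standard definition of all-confidence (the support of a pattern divided by the largest support of any single item in it). The toy transactions, the function names, and the per-item threshold check in `passes_item_thresholds` are illustrative assumptions; the abstract does not state how a pattern's minAllConf is derived from its items' frequencies.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical toy database: bread and jam are frequent, bed and pillow are rare.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset(t) for t in (
        {"bread", "jam"}, {"bread", "jam"}, {"bread", "jam"}, {"bread", "jam"},
        {"bread"}, {"bread"},
        {"bread", "bed", "pillow"}, {"bread", "bed", "pillow"},
        {"jam"}, {"bed"},
    )
]


def count(pattern: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of the pattern."""
    items = frozenset(pattern)
    return sum(1 for t in transactions if items <= t)


def all_confidence(pattern: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """all-conf(X) = sup(X) / max over items i in X of sup(i)."""
    items = frozenset(pattern)
    max_item_count = max(count([i], transactions) for i in items)
    return count(items, transactions) / max_item_count if max_item_count else 0.0


def passes_single_threshold(pattern, transactions, min_all_conf: float) -> bool:
    """Existing model: one minAllConf for every pattern."""
    return all_confidence(pattern, transactions) >= min_all_conf


def passes_item_thresholds(pattern, transactions, item_min_all_conf: Dict[str, float]) -> bool:
    """ASSUMED rule for illustration only: each item carries its own threshold,
    and sup(X) / sup(i) must reach the threshold of every item i in X."""
    items = frozenset(pattern)
    sup_x = count(items, transactions)
    return all(
        sup_x / count([i], transactions) >= item_min_all_conf[i] for i in items
    )


if __name__ == "__main__":
    mixed = {"bread", "bed", "pillow"}  # one frequent item plus two rare items
    print(all_confidence({"bread", "jam"}, TRANSACTIONS))
    print(all_confidence(mixed, TRANSACTIONS))
    # A single high threshold prunes the mixed pattern.
    print(passes_single_threshold(mixed, TRANSACTIONS, 0.5))
    # Hypothetical per-item thresholds: lenient for the frequent item only.
    thresholds = {"bread": 0.25, "jam": 0.5, "bed": 0.6, "pillow": 0.6}
    print(passes_item_thresholds(mixed, TRANSACTIONS, thresholds))
```

In this toy data, the frequent item bread pulls the all-confidence of the mixed pattern down, so a single threshold high enough to keep the output small also discards it. Lowering that one threshold for every pattern is what risks the combinatorial explosion the abstract describes.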
Cite this article
Rage, U.K., Kitsuregawa, M. Efficient discovery of correlated patterns using multiple minimum all-confidence thresholds. J Intell Inf Syst 45, 357–377 (2015). https://doi.org/10.1007/s10844-014-0314-7
DOI: https://doi.org/10.1007/s10844-014-0314-7