Efficient discovery of correlated patterns using multiple minimum all-confidence thresholds

Journal of Intelligent Information Systems

Abstract

Correlated patterns are an important class of regularities that exist in a database. Although there is no universally accepted best measure for judging the interestingness of a pattern, all-confidence is emerging as a popular measure for discovering such patterns, because it satisfies both the anti-monotonic and null-invariance properties. The former property makes pattern mining practicable in real-world applications. The latter lets the user discover patterns involving both frequent and rare items without generating a huge number of patterns. In this paper, we show that although the measure satisfies the null-invariance property, mining patterns that contain both frequent and rare items with a single minimum all-confidence (minAllConf) threshold leads to the dilemma known as the “rare item problem.” At a high minAllConf, the discovered correlated patterns involving rare items are very short. At a low minAllConf, combinatorial explosion can occur, producing too many patterns. To confront the problem, the paper introduces an alternative model based on the concept of multiple minAllConf thresholds. The proposed model generalizes the existing model of correlated patterns and lets the user specify a different minAllConf for each pattern depending on the frequencies of its items. A pattern-growth algorithm, called GCoMine, is also proposed to discover the patterns. Experimental results show that GCoMine is efficient and that the proposed model addresses the problem effectively.
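
As a rough illustration of the measure discussed in the abstract, the sketch below computes all-confidence in its usual form, the support of a pattern divided by the largest support of any single item in it, and filters patterns against per-item thresholds. It is a brute-force enumeration on toy data, not GCoMine. The function names, the toy transactions, and in particular the rule that a pattern takes the largest threshold among its items are assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations


def support(pattern, transactions):
    """Count the transactions that contain every item of the pattern."""
    items = set(pattern)
    return sum(1 for t in transactions if items <= t)


def all_confidence(pattern, transactions):
    """all-conf(X) = sup(X) / max over items i in X of sup({i})."""
    max_item_sup = max(support([i], transactions) for i in pattern)
    if max_item_sup == 0:
        return 0.0
    return support(pattern, transactions) / max_item_sup


def pattern_threshold(pattern, item_min_all_conf):
    # Assumption (hypothetical rule): a pattern must meet the largest
    # per-item threshold among its items.
    return max(item_min_all_conf[i] for i in pattern)


def correlated_patterns(transactions, item_min_all_conf, max_len=3):
    """Brute-force search over patterns of length 2..max_len."""
    items = sorted(item_min_all_conf)
    found = {}
    for k in range(2, max_len + 1):
        for pattern in combinations(items, k):
            ac = all_confidence(pattern, transactions)
            if ac > 0 and ac >= pattern_threshold(pattern, item_min_all_conf):
                found[pattern] = ac
    return found


# Toy data: bread and milk are frequent, bed and pillow are rare.
transactions = (
    [{"bread", "milk"}] * 4
    + [{"bread"}] * 2
    + [{"milk"}] * 2
    + [{"bed", "pillow"}] * 2
)

# Hypothetical per-item thresholds: stricter for the rare items.
item_min_all_conf = {"bread": 0.6, "milk": 0.6, "bed": 0.9, "pillow": 0.9}

result = correlated_patterns(transactions, item_min_all_conf)
for pattern, ac in sorted(result.items()):
    print(pattern, round(ac, 3))
```

With a single threshold, the same value would have to serve both the frequent pair and the rare pair. Per-item thresholds let each pattern be judged against a bar that reflects the frequencies of its own items.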

Author information

Corresponding author

Correspondence to Uday Kiran Rage.

About this article

Cite this article

Rage, U.K., Kitsuregawa, M. Efficient discovery of correlated patterns using multiple minimum all-confidence thresholds. J Intell Inf Syst 45, 357–377 (2015). https://doi.org/10.1007/s10844-014-0314-7

  • DOI: https://doi.org/10.1007/s10844-014-0314-7
