[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article

Incremental sequence-based frequent query pattern mining from XML queries

Published: 01 June 2009 Publication History

Abstract

Existing algorithms of mining frequent XML query patterns (XQPs) employ a candidate generate-and-test strategy. They involve expensive candidate enumeration and costly tree-containment checking. Further, most of existing methods compute the frequencies of candidate query patterns from scratch periodically by checking the entire transaction database, which consists of XQPs transferred from user query logs. However, it is not straightforward to maintain such discovered frequent patterns in real XML databases as there may be frequent updates that may not only invalidate some existing frequent query patterns but also generate some new frequent query patterns. Therefore, a drawback of existing methods is that they are rather inefficient for the evolution of transaction databases. To address above-mentioned problems, this paper proposes an efficient algorithm ESPRIT to mine frequent XQPs without costly tree-containment checking. ESPRIT transforms XML queries into sequences using a one-to-one mapping technique and mines the frequent sequences to generate frequent XQPs. We propose two efficient incremental algorithms, ESPRIT-i and ESPRIT-i +, to incrementally mine frequent XQPs. We devise several novel optimization techniques of query rewriting, cache lookup, and cache replacement to improve the answerability and the hit rate of caching. We have implemented our algorithms and conducted a set of experimental studies on various datasets. The experimental results demonstrate that our algorithms achieve high efficiency and scalability and outperform state-of-the-art methods significantly.

References

[1]
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: VLDB, pp 487-499.
[2]
Aggarwal C, Ta N, Wang J, Feng J, Zaki MJ (2007) Xproj: a framework for projected structural clustering of xml documents. In: KDD.
[3]
Asai T, Abe K, Kawasoe S, Arimura H, Sakamoto H, Arikawa S (2002) Efficient substructure discovery from large semi-structured data. In: SDM.
[4]
Aumann Y, Feldman R, Liphstat O, Mannila H (1999) Borders: an efficient algorithm for association generation in dynamic databases. J Intell Inf Syst 12(1):61-73.
[5]
Ayres J, Flannick J, Gehrke J, Yiu T (2002) Sequential pattern mining using a bitmap representation. In: KDD.
[6]
Balmin A, Ozcan F, Beyer K, Cochrane R, Pirahesh H (2004) A framework for using materialized xpath views in xml query processing. In: VLDB, pp 60-71.
[7]
Bettini C, Wang XS, Jajodia S (1998) Mining temporal relationships with multiple granularities in time sequences. IEEE Data Eng Bull 21(1):32-38.
[8]
Chen L, Rundensteiner EA, Wang S (2002) Xcache: a semantic caching system for xml queries. In: SIGMOD.
[9]
Chen Y, Yang L, Wang YG (2004) Incremental mining of frequent xml query patterns. In: ICDM, pp 343-346.
[10]
Chung C-W, Min J-K, Shim K (2002) Apex: an adaptive path index for xml data. In: SIGMOD Conference, pp 121-132.
[11]
Dehaspe L, Toivonen H, King R (1998) Finding frequent substructures in chemical compounds. In: KDD, pp 30-36.
[12]
Feng J, Qian Q, Wang J, Zhou L (2006) Exploit sequencing to accelerate hot xml query pattern mining. In: ACM SAC.
[13]
Feng J, Ta N, Li G (2007) Exploit sequencing views in semantic cache to accelerate xpath query evaluation. In: WWW.
[14]
Han J, Dong G, Yin Y (1999) Efficient mining of partial periodic patterns in time series database. In: ICDE, pp 106-115.
[15]
Han J, Pei J, Mortazavi-Asl B, Chen Q, Dayal U, Hsu M (2000) Freespan: frequent pattern-projected sequential pattern mining. In: KDD, pp 355-359.
[16]
Hristidis V, Petropoulos M (2002) Semantic caching of xml databases. In: WebDB.
[17]
Kaushik R, Shenoy P, Bohannon P, Gudes E (2002) Exploiting local similarity for indexing paths in graph-structured data. In: ICDE, pp 129-140.
[18]
Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: ICDM, pp 313-320.
[19]
Kwon J, Rao P, Moon B, Lee S (2005) Fist: scalable xml document filtering by sequencing twig patterns. In: VLDB, pp 217-228.
[20]
Li G, Feng J, Ta N, Zhang Y, Zhou L (2006a) Scend: an efficient semantic cache to exploit xpath query/view answerability. In: WISE, pp 460-473.
[21]
Li G, Feng J, Wang J, Zhang Y, Zhou L (2006b) Incremental mining of frequent query patterns from xml queries for caching. In: ICDM, pp 350-361.
[22]
Luo Q, Krishnamurthy S, Mohan C, Pirahesh H, Woo H, Lindsay BG, Naughton JF (2002) Middle-tier database caching for e-business. In: SIGMOD, pp 600-611.
[23]
Mandhani B, Suciu D (2005) Query caching and view selection for xml databases. In: VLDB.
[24]
Masseglia F, Cathala F, Poncelet P (1998) The psp approach for mining sequential patterns. In: PKDD.
[25]
Milo T, Suciu D (1999) Index structures for path expressions. In: ICDT, pp 277-295.
[26]
Ozden B, Ramaswamy S, Silberschatz A (1998) Cyclic association rules. In: ICDE, pp 412-421.
[27]
Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M (2001) Prefixspan: Mining sequential patterns by prefix-projected growth. In: ICDE, pp 215-224.
[28]
Prüfer H (1918) Neuer beweis eines satzes uber permutationen. Archiv fur Mathematik und Physik 27:142- 144.
[29]
Qun C, Lim A, Ong KW (2003) D(k)-index: an adaptive structural summary for graph-structured data. In: SIGMOD, pp 134-144.
[30]
Rao PR, Moon B (2004) Prix: indexing and querying xml using prufer sequences. In: ICDE, pp 288-299.
[31]
Re C, Brinkley J, Hinshaw K, Suciu D (2004) Distributed xquery. In: Information integration on the web (IIWeb).
[32]
Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: EDBT, pp 3-17.
[33]
Termier A, Rousset M-C, Sebag M (2002) Treefinder: a first step towards xml data mining. In: ICDM, pp 450-457.
[34]
Wang J, Han J (2004) Bide: efficient mining of frequent closed sequences. In: ICDE, pp 79-90.
[35]
Wang K, Liu H (2000) Discovering structural association of semistructured data. IEEE TKDE 12(2):353- 371.
[36]
Wang H, Park S, Fan W, Yu PS (2003) Vist: a dynamic indexmethod for querying xml data by tree structures. In: SIGMOD, pp 110-121.
[37]
Xu W (2005) The framework of an xml semantic caching system. In: WebDB.
[38]
Yan X, Han J, Afshar R (2003) Clospan: mining closed sequential patterns in large databases. In: SDM.
[39]
Yang J, Wang W, Yu PS, Han J (2002) Mining long sequential patterns in a noisy environment. In: SIGMOD, pp 406-417.
[40]
Yang LH, Lee M-L, Hsu W (2003a) Efficient mining of xml query patterns for caching. In: VLDB, pp 69-80.
[41]
Yang LH, Lee M-L, Hsu W, Acharya S (2003b) Mining frequent query patterns from xml queries. In: DASFAA, pp 355-362.
[42]
Yang LH, Lee ML, Hsu W, Guo X (2004) 2pxminer: an efficient two pass mining of frequent xml query patterns. In: KDD.
[43]
Zaki MJ (2002) Efficiently mining frequent trees in a forest. In: SIGKDD, pp 71-80.
[44]
Zaki MJ (2005) Efficiently mining frequent trees in a forest: algorithms and applications. IEEE TKDE 17(8):1021-1035.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Data Mining and Knowledge Discovery
Data Mining and Knowledge Discovery  Volume 18, Issue 3
June 2009
180 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 June 2009

Author Tags

  1. Frequent query patterns
  2. Incremental mining
  3. Sequential pattern mining
  4. XML frequent pattern mining
  5. XML query patterns

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

Cited By

View all

View Options

View options

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media