More Web Proxy on the site http://driver.im/

article

Incremental sequence-based frequent query pattern mining from XML queries

Authors:

Lizhu ZhouAuthors Info & Claims

Data Mining and Knowledge Discovery, Volume 18, Issue 3

Pages 472 - 516

https://doi.org/10.1007/s10618-009-0126-5

Published: 01 June 2009 Publication History

Abstract

Existing algorithms of mining frequent XML query patterns (XQPs) employ a candidate generate-and-test strategy. They involve expensive candidate enumeration and costly tree-containment checking. Further, most of existing methods compute the frequencies of candidate query patterns from scratch periodically by checking the entire transaction database, which consists of XQPs transferred from user query logs. However, it is not straightforward to maintain such discovered frequent patterns in real XML databases as there may be frequent updates that may not only invalidate some existing frequent query patterns but also generate some new frequent query patterns. Therefore, a drawback of existing methods is that they are rather inefficient for the evolution of transaction databases. To address above-mentioned problems, this paper proposes an efficient algorithm ESPRIT to mine frequent XQPs without costly tree-containment checking. ESPRIT transforms XML queries into sequences using a one-to-one mapping technique and mines the frequent sequences to generate frequent XQPs. We propose two efficient incremental algorithms, ESPRIT-i and ESPRIT-i ⁺, to incrementally mine frequent XQPs. We devise several novel optimization techniques of query rewriting, cache lookup, and cache replacement to improve the answerability and the hit rate of caching. We have implemented our algorithms and conducted a set of experimental studies on various datasets. The experimental results demonstrate that our algorithms achieve high efficiency and scalability and outperform state-of-the-art methods significantly.

References

[1]

Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: VLDB, pp 487-499.

[2]

Aggarwal C, Ta N, Wang J, Feng J, Zaki MJ (2007) Xproj: a framework for projected structural clustering of xml documents. In: KDD.

[3]

Asai T, Abe K, Kawasoe S, Arimura H, Sakamoto H, Arikawa S (2002) Efficient substructure discovery from large semi-structured data. In: SDM.

[4]

Aumann Y, Feldman R, Liphstat O, Mannila H (1999) Borders: an efficient algorithm for association generation in dynamic databases. J Intell Inf Syst 12(1):61-73.

Digital Library

[5]

Ayres J, Flannick J, Gehrke J, Yiu T (2002) Sequential pattern mining using a bitmap representation. In: KDD.

[6]

Balmin A, Ozcan F, Beyer K, Cochrane R, Pirahesh H (2004) A framework for using materialized xpath views in xml query processing. In: VLDB, pp 60-71.

[7]

Bettini C, Wang XS, Jajodia S (1998) Mining temporal relationships with multiple granularities in time sequences. IEEE Data Eng Bull 21(1):32-38.

[8]

Chen L, Rundensteiner EA, Wang S (2002) Xcache: a semantic caching system for xml queries. In: SIGMOD.

[9]

Chen Y, Yang L, Wang YG (2004) Incremental mining of frequent xml query patterns. In: ICDM, pp 343-346.

[10]

Chung C-W, Min J-K, Shim K (2002) Apex: an adaptive path index for xml data. In: SIGMOD Conference, pp 121-132.

[11]

Dehaspe L, Toivonen H, King R (1998) Finding frequent substructures in chemical compounds. In: KDD, pp 30-36.

[12]

Feng J, Qian Q, Wang J, Zhou L (2006) Exploit sequencing to accelerate hot xml query pattern mining. In: ACM SAC.

[13]

Feng J, Ta N, Li G (2007) Exploit sequencing views in semantic cache to accelerate xpath query evaluation. In: WWW.

[14]

Han J, Dong G, Yin Y (1999) Efficient mining of partial periodic patterns in time series database. In: ICDE, pp 106-115.

[15]

Han J, Pei J, Mortazavi-Asl B, Chen Q, Dayal U, Hsu M (2000) Freespan: frequent pattern-projected sequential pattern mining. In: KDD, pp 355-359.

[16]

Hristidis V, Petropoulos M (2002) Semantic caching of xml databases. In: WebDB.

[17]

Kaushik R, Shenoy P, Bohannon P, Gudes E (2002) Exploiting local similarity for indexing paths in graph-structured data. In: ICDE, pp 129-140.

[18]

Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: ICDM, pp 313-320.

[19]

Kwon J, Rao P, Moon B, Lee S (2005) Fist: scalable xml document filtering by sequencing twig patterns. In: VLDB, pp 217-228.

[20]

Li G, Feng J, Ta N, Zhang Y, Zhou L (2006a) Scend: an efficient semantic cache to exploit xpath query/view answerability. In: WISE, pp 460-473.

[21]

Li G, Feng J, Wang J, Zhang Y, Zhou L (2006b) Incremental mining of frequent query patterns from xml queries for caching. In: ICDM, pp 350-361.

[22]

Luo Q, Krishnamurthy S, Mohan C, Pirahesh H, Woo H, Lindsay BG, Naughton JF (2002) Middle-tier database caching for e-business. In: SIGMOD, pp 600-611.

[23]

Mandhani B, Suciu D (2005) Query caching and view selection for xml databases. In: VLDB.

[24]

Masseglia F, Cathala F, Poncelet P (1998) The psp approach for mining sequential patterns. In: PKDD.

[25]

Milo T, Suciu D (1999) Index structures for path expressions. In: ICDT, pp 277-295.

[26]

Ozden B, Ramaswamy S, Silberschatz A (1998) Cyclic association rules. In: ICDE, pp 412-421.

[27]

Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M (2001) Prefixspan: Mining sequential patterns by prefix-projected growth. In: ICDE, pp 215-224.

[28]

Prüfer H (1918) Neuer beweis eines satzes uber permutationen. Archiv fur Mathematik und Physik 27:142- 144.

[29]

Qun C, Lim A, Ong KW (2003) D(k)-index: an adaptive structural summary for graph-structured data. In: SIGMOD, pp 134-144.

[30]

Rao PR, Moon B (2004) Prix: indexing and querying xml using prufer sequences. In: ICDE, pp 288-299.

[31]

Re C, Brinkley J, Hinshaw K, Suciu D (2004) Distributed xquery. In: Information integration on the web (IIWeb).

[32]

Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: EDBT, pp 3-17.

[33]

Termier A, Rousset M-C, Sebag M (2002) Treefinder: a first step towards xml data mining. In: ICDM, pp 450-457.

[34]

Wang J, Han J (2004) Bide: efficient mining of frequent closed sequences. In: ICDE, pp 79-90.

[35]

Wang K, Liu H (2000) Discovering structural association of semistructured data. IEEE TKDE 12(2):353- 371.

[36]

Wang H, Park S, Fan W, Yu PS (2003) Vist: a dynamic indexmethod for querying xml data by tree structures. In: SIGMOD, pp 110-121.

[37]

Xu W (2005) The framework of an xml semantic caching system. In: WebDB.

[38]

Yan X, Han J, Afshar R (2003) Clospan: mining closed sequential patterns in large databases. In: SDM.

[39]

Yang J, Wang W, Yu PS, Han J (2002) Mining long sequential patterns in a noisy environment. In: SIGMOD, pp 406-417.

[40]

Yang LH, Lee M-L, Hsu W (2003a) Efficient mining of xml query patterns for caching. In: VLDB, pp 69-80.

[41]

Yang LH, Lee M-L, Hsu W, Acharya S (2003b) Mining frequent query patterns from xml queries. In: DASFAA, pp 355-362.

[42]

Yang LH, Lee ML, Hsu W, Guo X (2004) 2pxminer: an efficient two pass mining of frequent xml query patterns. In: KDD.

[43]

Zaki MJ (2002) Efficiently mining frequent trees in a forest. In: SIGKDD, pp 71-80.

[44]

Zaki MJ (2005) Efficiently mining frequent trees in a forest: algorithms and applications. IEEE TKDE 17(8):1021-1035.

Digital Library

Cited By

Yong LMingmin ZYunliang JHaiying Z(2012)Novel Applications of VRComputers and Graphics10.1016/j.cag.2012.01.00336:3(178-184)Online publication date: 1-May-2012
https://dl.acm.org/doi/10.1016/j.cag.2012.01.003

Incremental sequence-based frequent query pattern mining from XML queries

Recommendations

CanTree: a canonical-order tree for incremental frequent-pattern mining

Since its introduction, frequent-pattern mining has been the subject of numerous studies, including incremental updating. Many existing incremental mining algorithms are Apriori-based, which are not easily adoptable to FP-tree-based frequent-pattern ...
A survey of incremental high-utility itemset mining

Traditional association rule mining has been widely studied. But it is unsuitable for real-world applications where factors such as unit profits of items and purchase quantities must be considered. High-utility itemset mining HUIM is designed to find ...
An effective approach for maintenance of pre-large-based frequent-itemset lattice in incremental mining

Incremental mining has attracted the attention of many researchers due to its usefulness in online applications. Many algorithms have thus been proposed for incrementally mining frequent itemsets. Maintaining a frequent-itemset lattice (FIL) is ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Data Mining and Knowledge Discovery

Data Mining and Knowledge Discovery Volume 18, Issue 3

June 2009

180 pages

ISSN:1384-5810

Issue’s Table of Contents

Copyright © Copyright © 2009 Springer Science+Business Media, LLC.

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 June 2009

Author Tags

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 11 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Yong LMingmin ZYunliang JHaiying Z(2012)Novel Applications of VRComputers and Graphics10.1016/j.cag.2012.01.00336:3(178-184)Online publication date: 1-May-2012
https://dl.acm.org/doi/10.1016/j.cag.2012.01.003

View Options

View options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Issue’s Table of Contents