Efficient Algorithms for Finding Frequent Substructures from Semi-structured Data Streams

Tatsuya Asai¹,
Kenji Abe¹,
Shinji Kawasoe¹,
Hiroki Arimura¹ &
…
Setsuo Arikawa¹

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3609))

Included in the following conference series:

599 Accesses
1 Citations

Abstract

In this paper, we study an online data mining problem from streams of semi-structured data such as XML data. Modeling-semistructured data and patterns as labeled ordered trees, we present an online algorithm StreamT that receives fragments of an unseen possibly infinite semi-structured data in the document order through a data stream, and can return the current set of frequent patterns immediately on request at any time. We give modifications of the algorithm to other online mining models. Furthermore we implement our algorithms in different online models and candidate management strategies, then show empirical analyses to evaluate the algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 35.99; Price includes VAT (United Kingdom)

Softcover Book: GBP 44.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Tree-Miner: Mining Sequential Patterns from SP-Tree

ProSecCo: progressive sequence mining with convergence guarantees

Article 20 August 2019

A Relevance Criterion for Sequential Patterns

References

Abe, K., Kawasoe, S., Asai, T., Arimura, H., Arikawa, S.: Optimized Substructure Discovery for Semi-structured Data. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) PKDD 2002. LNCS (LNAI), vol. 2431, pp. 1–14. Springer, Heidelberg (2002)
Chapter Google Scholar
Abiteboul, S., Buneman, P., Suciu, D.: Data on the Web. Morgan Kaufmann, San Francisco (2000)
Google Scholar
Aho, A.V., Hopcroft, J.E., Ullman, J.D.: Data Structures and Algorithms. Addison-Wesley, Reading (1983)
MATH Google Scholar
Asai, T., Abe, K., Kawasoe, S., Arimura, H., Sakamoto, H., Arikawa, S.: Efficient Substructure Discovery from Large Semi-structured Data. In: Proc. SIAM SDM’02, pp. 158–174 (2002)
Google Scholar
Asai, T., Arimura, H., Abe, K., Kawasoe, S., Arikawa, S.: Online Algorithms for Mining Semi-structured Data Stream. In: Proc. IEEE ICDM’02, pp. 27–34. IEEE Computer Society Press, Los Alamitos (2002)
Google Scholar
Asai, T., Arimura, H., Uno, T., Nakano, S.: Discovering Frequent Substructures in Large Unordred Trees. In: Grieser, G., Tanaka, Y., Yamamoto, A. (eds.) DS 2003. LNCS (LNAI), vol. 2843, pp. 47–61. Springer, Heidelberg (2003)
Chapter Google Scholar
Bayardo Jr., R.J.: Efficiently Mining Long Patterns from Databases. In: Proc. SIGMOD98, pp. 85–93 (1998)
Google Scholar
de Berg, M., van Kreveld, M., Overmars, M., Schwarzkopf, O.: Computational Geometry, Algorithms and Applications. Springer, Heidelberg (2000)
Book Google Scholar
Dehaspe, L., Toivonen, H., King, R.D.: Finding Frequent Substructures in Chemical Compounds. In: Proc. KDD-98, pp. 30–36 (1998)
Google Scholar
Gibbons, P.B., Matias, Y.: Synopsis Data Structures for Massive Data Sets. In: External Memory Algorithms. DIMACS Series in Discr. Math. and Theor. Compt. Sci, vol. 50, pp. 39–70. AMS, New York (2000)
Chapter Google Scholar
Hidber, C.: Online Association Rule Mining. In: Proc. SIGMOD’99, pp. 145–156 (1999)
Google Scholar
Kilpelainen, P., Mannila, H.: Ordered and Unordered Tree Inclusion. SIAM J. Comput. 24(2), 340–356 (1995)
Article MathSciNet Google Scholar
Laird, P., Saul, R.: Discrete Sequence Prediction and Its Applications. Machine Learning 15(1), 43–68 (1994)
MATH Google Scholar
Mannila, H., Toivonen, H., Verkamo, A.I.: Discovering Frequent Episode in Sequences. In: Proc. KDD-95, pp. 210–215. AAAI, Menlo Park (1995)
Google Scholar
Matsuda, T., Horiuchi, T., Motoda, H., Washio, T., Kumazawa, K., Arai, N.: Graph-Based Induction for General Graph Structured Data. In: Arikawa, S., Furukawa, K. (eds.) DS 1999. LNCS (LNAI), vol. 1721, pp. 340–342. Springer, Heidelberg (1999)
Chapter Google Scholar
Miyahara, T., Suzuki, Y., Shoudai, T., Uchida, T., Takahashi, K., Ueda, H.: Discovery of Frequent Tag Tree Patterns in Semistructured Web Documents. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS (LNAI), vol. 2336, pp. 341–355. Springer, Heidelberg (2002)
Chapter Google Scholar
Nakano, S.: Efficient Generation of Plane Trees. Information Processing Letters 84, 167–172 (2002)
Article MathSciNet Google Scholar
Parthasarathy, S., Zaki, M.J., Ogihara, M., Dwarkadas, S.: Incremental and Interactive Sequence Mining. In: CIKM’99, pp. 251–258 (1999)
Google Scholar
Rastogi, R.: Single-Path Algorithms for Querying and Mining Data Streams. In: Proc. SDM’02 Workshop HDM’02, pp. 43–48 (2002)
Google Scholar
W3C, Extensive Markup Language (XML) 1.0 (Second Edition), W3C Recommendation, 06 October (2000), http://www.w3.org/TR/REC-xml
Wang, K., Liu, H.: Discovering Structural Association of Semistructured Data. TKDE 12(2), 353–371 (2000)
Google Scholar
Yamanishi, K., Takeuchi, J.: A Unifying Framework for Detecting Outliers and Change Points from Non-Stationary Time Series Data. In: Proc. SIGKDD-2002, ACM Press, New York (2002)
Google Scholar
Zaki, M.J.: Efficiently mining frequent trees in a forest. In: Proc. SIGKDD-2002, ACM Press, New York (2002)
Google Scholar

Download references

Author information

Authors and Affiliations

Kyushu University, 6–10–1 Hakozaki Higashi-ku, Fukuoka 812–8581, Japan
Tatsuya Asai, Kenji Abe, Shinji Kawasoe, Hiroki Arimura & Setsuo Arikawa

Authors

Tatsuya Asai
View author publications
You can also search for this author in PubMed Google Scholar
Kenji Abe
View author publications
You can also search for this author in PubMed Google Scholar
Shinji Kawasoe
View author publications
You can also search for this author in PubMed Google Scholar
Hiroki Arimura
View author publications
You can also search for this author in PubMed Google Scholar
Setsuo Arikawa
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Akito Sakurai Kôiti Hasida Katsumi Nitta

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Asai, T., Abe, K., Kawasoe, S., Arimura, H., Arikawa, S. (2007). Efficient Algorithms for Finding Frequent Substructures from Semi-structured Data Streams. In: Sakurai, A., Hasida, K., Nitta, K. (eds) New Frontiers in Artificial Intelligence. JSAI JSAI 2003 2004. Lecture Notes in Computer Science(), vol 3609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71009-7_3

Download citation

DOI: https://doi.org/10.1007/978-3-540-71009-7_3
Published: 20 July 2007
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71008-0
Online ISBN: 978-3-540-71009-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Efficient Algorithms for Finding Frequent Substructures from Semi-structured Data Streams

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

Tree-Miner: Mining Sequential Patterns from SP-Tree

ProSecCo: progressive sequence mining with convergence guarantees

A Relevance Criterion for Sequential Patterns

References

Author information

Authors and Affiliations

Editor information

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Efficient Algorithms for Finding Frequent Substructures from Semi-structured Data Streams

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

Tree-Miner: Mining Sequential Patterns from SP-Tree

ProSecCo: progressive sequence mining with convergence guarantees

A Relevance Criterion for Sequential Patterns

References

Author information

Authors and Affiliations

Editor information

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation