Abstract
In this paper, we study an online data mining problem from streams of semi-structured data such as XML data. Modeling-semistructured data and patterns as labeled ordered trees, we present an online algorithm StreamT that receives fragments of an unseen possibly infinite semi-structured data in the document order through a data stream, and can return the current set of frequent patterns immediately on request at any time. We give modifications of the algorithm to other online mining models. Furthermore we implement our algorithms in different online models and candidate management strategies, then show empirical analyses to evaluate the algorithms.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Abe, K., Kawasoe, S., Asai, T., Arimura, H., Arikawa, S.: Optimized Substructure Discovery for Semi-structured Data. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) PKDD 2002. LNCS (LNAI), vol. 2431, pp. 1–14. Springer, Heidelberg (2002)
Abiteboul, S., Buneman, P., Suciu, D.: Data on the Web. Morgan Kaufmann, San Francisco (2000)
Aho, A.V., Hopcroft, J.E., Ullman, J.D.: Data Structures and Algorithms. Addison-Wesley, Reading (1983)
Asai, T., Abe, K., Kawasoe, S., Arimura, H., Sakamoto, H., Arikawa, S.: Efficient Substructure Discovery from Large Semi-structured Data. In: Proc. SIAM SDM’02, pp. 158–174 (2002)
Asai, T., Arimura, H., Abe, K., Kawasoe, S., Arikawa, S.: Online Algorithms for Mining Semi-structured Data Stream. In: Proc. IEEE ICDM’02, pp. 27–34. IEEE Computer Society Press, Los Alamitos (2002)
Asai, T., Arimura, H., Uno, T., Nakano, S.: Discovering Frequent Substructures in Large Unordred Trees. In: Grieser, G., Tanaka, Y., Yamamoto, A. (eds.) DS 2003. LNCS (LNAI), vol. 2843, pp. 47–61. Springer, Heidelberg (2003)
Bayardo Jr., R.J.: Efficiently Mining Long Patterns from Databases. In: Proc. SIGMOD98, pp. 85–93 (1998)
de Berg, M., van Kreveld, M., Overmars, M., Schwarzkopf, O.: Computational Geometry, Algorithms and Applications. Springer, Heidelberg (2000)
Dehaspe, L., Toivonen, H., King, R.D.: Finding Frequent Substructures in Chemical Compounds. In: Proc. KDD-98, pp. 30–36 (1998)
Gibbons, P.B., Matias, Y.: Synopsis Data Structures for Massive Data Sets. In: External Memory Algorithms. DIMACS Series in Discr. Math. and Theor. Compt. Sci, vol. 50, pp. 39–70. AMS, New York (2000)
Hidber, C.: Online Association Rule Mining. In: Proc. SIGMOD’99, pp. 145–156 (1999)
Kilpelainen, P., Mannila, H.: Ordered and Unordered Tree Inclusion. SIAM J. Comput. 24(2), 340–356 (1995)
Laird, P., Saul, R.: Discrete Sequence Prediction and Its Applications. Machine Learning 15(1), 43–68 (1994)
Mannila, H., Toivonen, H., Verkamo, A.I.: Discovering Frequent Episode in Sequences. In: Proc. KDD-95, pp. 210–215. AAAI, Menlo Park (1995)
Matsuda, T., Horiuchi, T., Motoda, H., Washio, T., Kumazawa, K., Arai, N.: Graph-Based Induction for General Graph Structured Data. In: Arikawa, S., Furukawa, K. (eds.) DS 1999. LNCS (LNAI), vol. 1721, pp. 340–342. Springer, Heidelberg (1999)
Miyahara, T., Suzuki, Y., Shoudai, T., Uchida, T., Takahashi, K., Ueda, H.: Discovery of Frequent Tag Tree Patterns in Semistructured Web Documents. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS (LNAI), vol. 2336, pp. 341–355. Springer, Heidelberg (2002)
Nakano, S.: Efficient Generation of Plane Trees. Information Processing Letters 84, 167–172 (2002)
Parthasarathy, S., Zaki, M.J., Ogihara, M., Dwarkadas, S.: Incremental and Interactive Sequence Mining. In: CIKM’99, pp. 251–258 (1999)
Rastogi, R.: Single-Path Algorithms for Querying and Mining Data Streams. In: Proc. SDM’02 Workshop HDM’02, pp. 43–48 (2002)
W3C, Extensive Markup Language (XML) 1.0 (Second Edition), W3C Recommendation, 06 October (2000), http://www.w3.org/TR/REC-xml
Wang, K., Liu, H.: Discovering Structural Association of Semistructured Data. TKDE 12(2), 353–371 (2000)
Yamanishi, K., Takeuchi, J.: A Unifying Framework for Detecting Outliers and Change Points from Non-Stationary Time Series Data. In: Proc. SIGKDD-2002, ACM Press, New York (2002)
Zaki, M.J.: Efficiently mining frequent trees in a forest. In: Proc. SIGKDD-2002, ACM Press, New York (2002)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Asai, T., Abe, K., Kawasoe, S., Arimura, H., Arikawa, S. (2007). Efficient Algorithms for Finding Frequent Substructures from Semi-structured Data Streams. In: Sakurai, A., Hasida, K., Nitta, K. (eds) New Frontiers in Artificial Intelligence. JSAI JSAI 2003 2004. Lecture Notes in Computer Science(), vol 3609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71009-7_3
Download citation
DOI: https://doi.org/10.1007/978-3-540-71009-7_3
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71008-0
Online ISBN: 978-3-540-71009-7
eBook Packages: Computer ScienceComputer Science (R0)