Abstract
Natural language processing based information retrieval (NIR) aims to go beyond the conventional bag-of-words based information retrieval (KIR) by considering syntactic and even semantic information in documents. NIR is a conceptually appealing approach to IR, but is hard due to the need to measure distance/similarity between structures. We aim to move beyond the state of the art in measuring structure similarity for NIR.
In this paper, a novel tree similarity measurement dtwAcs is proposed in terms of a novel interpretation of trees as multi dimensional sequences. We calculate the distance between trees by the way of computing the distance between multi dimensional sequences, which is conducted by integrating the all common subsequences into the dynamic time warping method. Experimental result shows that dtwAcs outperforms the state of the art.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Mauldin, M.: Retrieval performance in FERRET: a conceptual information retrieval system. In: SIGIR 1991 (1991)
Strzalkowski, T. (ed.): Natural language Information Retrieval. Kluwer, New York (1999)
Carballo, J.P., Strzalkowski, T.: Natural language information retrieval: progress report. Information Processing Management 36(1), 155–178 (2000)
Mittendorfer, M., Winiwarter, W.: Exploiting syntactic analysis of queries for information retrieval. Journal of Data and Knowledge Engineering (2002)
Moschitti, A.: Efficient convolution kernels for dependency and constituent syntactic trees. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 318–329. Springer, Heidelberg (2006)
Zhang, D., Lee, W.S.: Question classification using support vector machines. In: SIGIR 2003: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval, pp. 26–32. ACM, New York (2003)
Strzalkowski, T.: Natural Language Information Retrieval Project Homepage, http://www.cs.albany.edu/tomek
Strzalkowski, T., Perez-Carballo, J., Karlgren, J., Hulth, A., Tapanainen, P., Lahtinen, T.: Natural language information retrieval: TREC-8 report. In: TREC 1999, pp. 381–390 (1999)
Bille, P.: A survey on tree edit distance and related problems. Theor. Comput. Sci. 337(1-3), 217–239 (2005)
Chawathe, S.S.: Comparing hierarchical data in external memory. In: VLDB 1999, Edinburgh, UK, pp. 90–101 (1999)
Dalamagas, T., Cheng, T., Winkel, K.J., Sellis, T.: A methodology for clustering xml documents by structure. Information System 31(3), 187–228 (2006)
Selkow, S.M.: The tree-to-tree editing problem. Information Processing Letters 6(6), 184–186 (1977)
Zhang, K., Shasha, D.: Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing 18(6), 1245–1262 (1989)
Che, W., Zhang, M., Aw, A., Tan, C., Liu, T., Li, S.: Using a hybrid convolution tree kernel for semantic role labeling. ACM Transactions on Asian Language Information Processing (TALIP) 7(4), 1–23 (2008)
Collins, M., Duffy, N.: Convolution kernels for natural language. In: Advances in Neural Information Processing Systems 14, pp. 625–632. MIT Press, Cambridge (2001)
Kashima, H., Koyanagi, T.: Kernels for semi-structured data. In: ICML 2002: Proceedings of the Nineteenth International Conference on Machine Learning, pp. 291–298. Morgan Kaufmann Publishers Inc., San Francisco (2002)
Moschitti, A.: Making tree kernels practical for natural language learning. In: Proceedings of the Eleventh International Conference on European Association for Computational Linguistics, Trento, Italy (2006)
Moschitti, A., Pighin, D., Basili, R.: Tree kernels for semantic role labeling. Computational Linguistics 34(2), 193–224 (2008)
Tetsuji, K., Kouichi, H., Hisashi, K., Kiyoko, F.K., Hiroshi, Y.: A spectrum tree kernel. Transactions of the Japanese Society for Artificial Intelligence 22(2), 140–147 (2007)
Vishwanathan, S.V.N., Smola, A.: Fast kernels for string and tree matching. Advances in Neural Information Processing Systems 15 (2003)
Haussler, D.: Convolution kernels on discrete structures. Technical report, Department of Computer Science, University of California at Santa Cruz (1999)
Aiolli, F., Da San Martino, G., Sperduti, A.: Route kernels for trees. In: ICML 2009: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 17–24. ACM, New York (2009)
Elzinga, C., Rahmann, S., Wang, H.: Algorithms for subsequence combinatorics. Theoretical Computer Science 409(3), 394–404 (2008)
Hirschberg, D.S.: A linear space algorithm for computing maximal common subsequences. Communications of the ACM 18(6), 341–343 (1975)
Leslie, C., Eskin, E., Cohen, A., Weston, J., Noble, W.S.: Mismatch string kernels for svm protein classification. Neural Information Processing Systems 15, 1441–1448 (2003)
Levenshtein, V.: Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady 10, 707 (1966)
Wang, H.: All common subsequences. In: IJCAI 2007: Proceedings of the 20th international joint conference on Artifical intelligence, Hyderabad, India, pp. 635–640 (2007)
Sakoe, H.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 26, 43–49 (1978)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lin, Z., Wang, H., McClean, S. (2010). Measuring Tree Similarity for Natural Language Processing Based Information Retrieval. In: Hopfe, C.J., Rezgui, Y., Métais, E., Preece, A., Li, H. (eds) Natural Language Processing and Information Systems. NLDB 2010. Lecture Notes in Computer Science, vol 6177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13881-2_2
Download citation
DOI: https://doi.org/10.1007/978-3-642-13881-2_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13880-5
Online ISBN: 978-3-642-13881-2
eBook Packages: Computer ScienceComputer Science (R0)