Abstract
This paper proposes a word sense language model for information retrieval. Unlike most traditional methods, it combines word senses defined in a thesaurus with a classic statistical model. The word sense language model treats the word sense as a form of linguistic knowledge, which helps handle the mismatch caused by synonyms and the data sparseness caused by limited data. Experimental results on the TREC-Mandarin corpus show that the method improves MAP by 12.5% over a traditional tf-idf retrieval method but is 5.82% lower in MAP than a classic language model. Combining this method with the language model yields increases of 8.92% and 7.93% over either, respectively. We analyze and discuss these modest results and conclude that higher performance from the word sense language model will depend on highly accurate word sense labeling. We believe that linguistic knowledge such as thesaurus word senses will ultimately help improve IR in many ways.
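The idea of scoring documents over thesaurus senses and then combining that score with a word-level language model can be illustrated with a minimal sketch. The abstract does not give the model's equations, so everything here is an assumption made for illustration: the toy `THESAURUS` mapping, Jelinek-Mercer smoothing with weight `lam`, the probability floor, and the linear interpolation weight `alpha` are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus (word -> sense code); not from the paper.
THESAURUS = {
    "car": "VEHICLE", "automobile": "VEHICLE",
    "fast": "SPEED", "quick": "SPEED",
}

def to_senses(tokens):
    """Replace each token by its sense code; unknown words stand for themselves."""
    return [THESAURUS.get(t, t) for t in tokens]

def query_log_likelihood(query, doc, collection, lam=0.5, floor=1e-12):
    """Unigram query log-likelihood with Jelinek-Mercer smoothing (assumed here)."""
    doc_counts, coll_counts = Counter(doc), Counter(collection)
    score = 0.0
    for unit in query:
        p_doc = doc_counts[unit] / len(doc) if doc else 0.0
        p_coll = coll_counts[unit] / len(collection) if collection else 0.0
        # The floor keeps the logarithm finite for units unseen in the collection.
        score += math.log(max((1 - lam) * p_doc + lam * p_coll, floor))
    return score

def combined_score(query, doc, docs, alpha=0.5):
    """Linear interpolation of word-level and sense-level scores (an assumption)."""
    collection = [t for d in docs for t in d]
    word = query_log_likelihood(query, doc, collection)
    sense = query_log_likelihood(
        to_senses(query), to_senses(doc), to_senses(collection)
    )
    return alpha * word + (1 - alpha) * sense

docs = [["automobile", "quick"], ["banana", "slow"]]
query = ["car", "fast"]
ranking = sorted(range(len(docs)),
                 key=lambda i: combined_score(query, docs[i], docs),
                 reverse=True)
print(ranking)
```

In this toy setup the query shares no surface words with either document, so the word-level model cannot separate them, while the sense-level model favors the document whose words map to the same senses as the query. This is the synonym mismatch the abstract refers to; real sense labeling is noisy, which is why labeling accuracy matters.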
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gao, L., Zhang, Y., Liu, T., Liu, G. (2006). Word Sense Language Model for Information Retrieval. In: Ng, H.T., Leong, MK., Kan, MY., Ji, D. (eds) Information Retrieval Technology. AIRS 2006. Lecture Notes in Computer Science, vol 4182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880592_13
DOI: https://doi.org/10.1007/11880592_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45780-0
Online ISBN: 978-3-540-46237-8
eBook Packages: Computer Science (R0)