Abstract
Information in Digital Libraries is explicitly organized, described, and managed. The content of their data resources is summarized in short descriptions, usually called metadata, which can be entered manually or generated automatically. In this context, specialized thesauri are frequently used to provide accurate content for subject or keyword metadata elements. However, if a Digital Library aims to provide access for the general public, it is not reasonable to assume that casual users will use the same terms as the keywords in metadata records. As an initial step toward bridging the semantic gap between user queries and metadata records, the authors of this paper previously created a method for the semantic disambiguation of thesauri with respect to an upper-level ontology (WordNet). This paper now presents the integration of this disambiguation into an information retrieval system, in this case by adapting the vector-space retrieval model. Thanks to the disambiguation, both metadata records and queries can be represented homogeneously as collections of WordNet synsets, which enables the computation of a similarity value used to rank the results.
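The ranking idea in the abstract can be illustrated with a minimal sketch. It assumes that records and queries have already been mapped to synset identifiers, that raw synset counts serve as weights, and that cosine similarity is the similarity value; the abstract does not specify the weighting scheme or the similarity function, so these are illustrative choices. The synset identifiers, record names, and function names below are hypothetical.

```python
import math
from collections import Counter

def synset_vector(synsets):
    """Build a sparse vector of raw counts from a list of synset identifiers."""
    return Counter(synsets)

def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * v[synset] for synset, weight in u.items() if synset in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query_synsets, records):
    """Return (record_id, score) pairs sorted by decreasing similarity."""
    query = synset_vector(query_synsets)
    scored = [(rid, cosine(query, synset_vector(syns)))
              for rid, syns in records.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical metadata records, each already reduced to synset identifiers.
RECORDS = {
    "rec1": ["river.n.01", "water.n.01"],
    "rec2": ["road.n.01", "map.n.01"],
    "rec3": ["river.n.01", "map.n.01", "map.n.01"],
}

# Hypothetical query, disambiguated to a single synset.
RANKING = rank(["river.n.01"], RECORDS)
```

With this toy data, `RANKING` lists `rec1` first, then `rec3`, then `rec2`, because `rec1` devotes a larger share of its vector to the query synset and `rec2` shares no synset with the query. A real system would more plausibly use a tf-idf style weighting in place of raw counts.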
The basic technology of this work has been partially supported by the Spanish Ministry of Science and Technology through the projects TIC2000-1568-C03-01 from the National Plan for Scientific Research, Development and Technology Innovation and FIT-150500-2003-519 from the National Plan for Information Society. The work of J. Lacasta has been partially supported by a grant from the Aragón Government and the European Social Fund (ref. B139/2003).
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nogueras-Iso, J., Lacasta, J., Bañares, J.Á., Muro-Medrano, P.R., Zarazaga-Soria, F.J. (2004). Exploiting Disambiguated Thesauri for Information Retrieval in Metadata Catalogs. In: Conejo, R., Urretavizcaya, M., Pérez-de-la-Cruz, JL. (eds) Current Topics in Artificial Intelligence. TTIA 2003. Lecture Notes in Computer Science, vol 3040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25945-9_32
DOI: https://doi.org/10.1007/978-3-540-25945-9_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22218-7
Online ISBN: 978-3-540-25945-9
eBook Packages: Springer Book Archive