Abstract
Because the information in broad-coverage knowledge bases, such as thesauri, is usually incomplete, merging information from different sources is a good way to increase coverage. We propose a method for enriching a thesaurus with information acquired automatically from dictionaries: pairs of synonyms are assigned to candidate synsets, and the pairs whose elements are not in the thesaurus are clustered to identify new synsets. This method was used to enrich a Brazilian Portuguese thesaurus with synonyms from a European Portuguese dictionary, resulting in a larger and broader thesaurus with new words and new concepts. The assignments and the obtained synsets were manually evaluated and yielded correctness scores higher than 71% and 85%, respectively.
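The two steps named in the abstract (assigning synonym pairs to candidate synsets, then clustering the remaining pairs) can be illustrated with a small sketch. It is not the procedure evaluated in the paper: the abstract does not state how candidates are scored or which clustering algorithm is used, so the sketch assumes that a candidate is any synset already containing one word of the pair, that the candidate with the largest overlap wins, and that connected components of the leftover synonymy graph stand in for the clustering step. The function name `enrich` and the toy data are hypothetical.

```python
from collections import defaultdict


def enrich(thesaurus, pairs):
    """Return (enriched synsets, new synsets) for a list of synonym pairs."""
    synsets = [set(s) for s in thesaurus]
    known = set().union(*synsets) if synsets else set()
    leftover = defaultdict(set)  # synonymy graph over words absent from the thesaurus

    for a, b in pairs:
        if a not in known and b not in known:
            # Neither word is in the original thesaurus: keep the pair for clustering.
            leftover[a].add(b)
            leftover[b].add(a)
            continue
        # ASSUMPTION: candidates are the synsets that already contain a or b,
        # and the one sharing the most words with the pair is chosen.
        candidates = [s for s in synsets if a in s or b in s]
        best = max(candidates, key=lambda s: len(s & {a, b}))
        best.update((a, b))

    # ASSUMPTION: connected components replace the paper's clustering step.
    new_synsets, seen = [], set()
    for word in sorted(leftover):
        if word in seen:
            continue
        component, stack = set(), [word]
        while stack:
            w = stack.pop()
            if w not in component:
                component.add(w)
                stack.extend(leftover[w] - component)
        seen |= component
        new_synsets.append(component)
    return synsets, new_synsets


# Toy data, invented for illustration only.
thesaurus = [{"casa", "lar"}, {"carro", "viatura"}]
pairs = [
    ("casa", "moradia"),
    ("coche", "carro"),
    ("calhau", "pedregulho"),
    ("pedregulho", "penedo"),
]
enriched, new_synsets = enrich(thesaurus, pairs)
```

In this toy run, the first two pairs extend existing synsets, while the last two share no word with the thesaurus and are grouped into a single new synset. A real system would need a similarity threshold to avoid merging distinct senses of ambiguous words, which connected components cannot do.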
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oliveira, H.G., Gomes, P. (2011). Automatically Enriching a Thesaurus with Information from Dictionaries. In: Antunes, L., Pinto, H.S. (eds) Progress in Artificial Intelligence. EPIA 2011. Lecture Notes in Computer Science, vol 7026. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24769-9_34
DOI: https://doi.org/10.1007/978-3-642-24769-9_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24768-2
Online ISBN: 978-3-642-24769-9
eBook Packages: Computer Science (R0)