Abstract
Knowledge engineers have had difficulty automatically constructing and populating domain ontologies, mainly because of the well-known knowledge acquisition bottleneck. In this paper, we attempt to alleviate this problem by proposing an unsupervised approach that uses the web as a large corpus and exploits linguistic patterns to identify and extract instances of ontological classes. The prototype implementation uses shallow syntactic parsing to handle disambiguation. In addition, we propose a confidence-weighted metric that combines different versions of the classical PMI (pointwise mutual information) metric, WordNet similarity measures, and heuristics into a final confidence score; taken together, these can improve the ranking of the candidate instances retrieved by the system. We conducted preliminary experiments comparing the proposed confidence metric against several versions of the PMI metric. The results for the final ranking of the candidate instances are promising, with a gain in precision of up to 24%.
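The abstract names the ingredients of the approach (linguistic patterns over web text, PMI, WordNet similarity, heuristics) but not their exact formulas. The sketch below is a generic illustration under stated assumptions, not the authors' metric. It extracts candidates with a Hearst-style "such as" pattern, scores each one with classical PMI estimated from page hit counts, normalizes that score as NPMI (one common PMI variant, chosen here as an assumption), and combines it with a WordNet similarity value and a heuristic score through a fixed-weight average. The function names, the weights `(0.5, 0.3, 0.2)`, the hit counts, and the similarity values are all hypothetical. The WordNet similarity is assumed to be precomputed (for example, a Wu–Palmer or Lin score in [0, 1]).

```python
import math
import re


def extract_candidates(snippet: str, class_label: str) -> list[str]:
    """Return capitalised phrases that follow '<class_label> such as' (a Hearst pattern)."""
    marker = f"{class_label} such as "
    start = snippet.find(marker)
    if start == -1:
        return []
    # Keep only the text after the marker, up to the end of the sentence.
    tail = snippet[start + len(marker):].split(".")[0]
    candidates = []
    # Split the enumeration on commas and on the conjunctions "and"/"or".
    for chunk in re.split(r",\s*|\s+(?:and|or)\s+", tail):
        words = []
        for token in chunk.split():
            if not token[:1].isupper():
                break  # stop at the first non-capitalised token
            words.append(token)
        if words:
            candidates.append(" ".join(words))
    return candidates


def pmi(hits_joint: int, hits_x: int, hits_y: int, total: int) -> float:
    """Classical PMI, log2(P(x, y) / (P(x) P(y))), estimated from hit counts."""
    if min(hits_joint, hits_x, hits_y) == 0:
        return float("-inf")
    return math.log2(hits_joint * total / (hits_x * hits_y))


def npmi(hits_joint: int, hits_x: int, hits_y: int, total: int) -> float:
    """Normalised PMI in [-1, 1]: PMI divided by -log2 P(x, y)."""
    if hits_joint == 0:
        return -1.0  # never co-occur: minimum association
    p_joint = hits_joint / total
    if p_joint >= 1.0:
        return 1.0  # x and y appear on every page: perfect association
    return pmi(hits_joint, hits_x, hits_y, total) / -math.log2(p_joint)


def confidence(npmi_score: float, wordnet_sim: float, heuristic: float,
               weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Hypothetical weighted average of three evidence sources, each in [0, 1]."""
    w_pmi, w_sim, w_heur = weights
    pmi01 = (npmi_score + 1.0) / 2.0  # rescale NPMI from [-1, 1] to [0, 1]
    return (w_pmi * pmi01 + w_sim * wordnet_sim + w_heur * heuristic) / sum(weights)


if __name__ == "__main__":
    snippet = "Explore cities such as Paris, Rome and Best Hotels today."
    total, class_hits = 1_000_000, 2_000  # hypothetical web page counts
    # Hypothetical (joint hits, candidate hits, WordNet similarity) per candidate.
    evidence = {"Paris": (500, 10_000, 0.9), "Rome": (400, 12_000, 0.85),
                "Best Hotels": (10, 50_000, 0.2)}
    scored = []
    for cand in extract_candidates(snippet, "cities"):
        joint, cand_hits, sim = evidence[cand]
        score = confidence(npmi(joint, class_hits, cand_hits, total), sim, heuristic=1.0)
        scored.append((score, cand))
    for score, cand in sorted(scored, reverse=True):
        print(f"{cand}: {score:.3f}")
```

With these made-up numbers, the spurious candidate "Best Hotels" picked up by the pattern ranks last. The combined score pushes it down because its PMI with the class phrase and its WordNet similarity are both low. This illustrates why a single PMI score can be supplemented with other evidence when ranking pattern-extracted candidates.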
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oliveira, H., Lima, R., Gomes, J., Ferreira, R., Freitas, F., Costa, E. (2012). A Confidence–Weighted Metric for Unsupervised Ontology Population from Web Texts. In: Liddle, S.W., Schewe, KD., Tjoa, A.M., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2012. Lecture Notes in Computer Science, vol 7446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32600-4_14
DOI: https://doi.org/10.1007/978-3-642-32600-4_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32599-1
Online ISBN: 978-3-642-32600-4
eBook Packages: Computer Science, Computer Science (R0)