Abstract
In this paper, we present a fast and scalable Bayesian model for improving weakly annotated data, which is typically generated by a (semi-)automated information extraction (IE) system from Web documents. Weakly annotated data suffers from incorrect ontological role assignments. Our experimental evaluations with the TAP and a collection of 20,000 home pages from university, shopping, and sports Web sites indicate that the model described here can improve the accuracy of role assignments from 40% to 85% for template-driven sites and from 68% to 87% for non-template-driven sites.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gelgi, F., Vadrevu, S., Davulcu, H. (2007). Relational Model Based Annotation of the Web Data. In: Wegrzyn-Wolska, K.M., Szczepaniak, P.S. (eds) Advances in Intelligent Web Mastering. Advances in Soft Computing, vol 43. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72575-6_20
DOI: https://doi.org/10.1007/978-3-540-72575-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72574-9
Online ISBN: 978-3-540-72575-6
eBook Packages: Engineering, Engineering (R0)