Abstract
This paper presents and evaluates a method for detecting DBpedia entity types (classes), which can be used to assess DBpedia’s quality and to supply missing types for untyped resources. The method compares entity embeddings with traditional n-gram models coupled with clustering and classification. We evaluate the results on 358 typical DBpedia classes. Our results show that entity embeddings outperform n-gram models for type detection and can contribute to improving DBpedia’s quality, maintenance, and evolution. This is a step toward improving the quality of Linked Open Data in general.
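To make the n-gram side of this comparison concrete, the following is a minimal sketch of type detection from word n-gram features. It is not the authors' pipeline: the abstract does not name the classifier or the text source, so a nearest-centroid classifier with cosine similarity is swapped in here, and the training descriptions, the helper names (`word_ngrams`, `train_centroids`, `predict_type`), and the choice of n = 2 are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def word_ngrams(text, n=2):
    """Count all word n-grams of length 1..n in a lower-cased text."""
    tokens = text.lower().split()
    feats = Counter()
    for k in range(1, n + 1):
        for i in range(len(tokens) - k + 1):
            feats[" ".join(tokens[i:i + k])] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(examples, featurize):
    """Sum the feature vectors of each type's examples into one centroid."""
    centroids = defaultdict(Counter)
    for text, type_label in examples:
        centroids[type_label].update(featurize(text))
    return dict(centroids)


def predict_type(text, centroids, featurize):
    """Return the type whose centroid is most similar to the text."""
    feats = featurize(text)
    return max(centroids, key=lambda t: cosine(feats, centroids[t]))


# Hypothetical toy descriptions; real input would be text about DBpedia resources.
TRAIN = [
    ("a city in france on the river seine", "dbo:City"),
    ("a city in the north of italy", "dbo:City"),
    ("an american actor and film director", "dbo:Person"),
    ("a french singer and actor born in paris", "dbo:Person"),
]

centroids = train_centroids(TRAIN, word_ngrams)
predicted = predict_type("a small city in spain", centroids, word_ngrams)
print(predicted)
```

Because `predict_type` only needs a function that maps text to a sparse vector, an embedding-based featurizer (for example, a dict keyed by dimension index) could be passed in place of `word_ngrams` to compare the two representations under the same classifier.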
Acknowledgements
We thank the Natural Sciences and Engineering Research Council of Canada (NSERC) for the financial support.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Zhou, H., Zouaq, A., Inkpen, D. (2017). DBpedia Entity Type Detection Using Entity Embeddings and N-Gram Models. In: Różewski, P., Lange, C. (eds) Knowledge Engineering and Semantic Web. KESW 2017. Communications in Computer and Information Science, vol 786. Springer, Cham. https://doi.org/10.1007/978-3-319-69548-8_21
DOI: https://doi.org/10.1007/978-3-319-69548-8_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69547-1
Online ISBN: 978-3-319-69548-8
eBook Packages: Computer Science, Computer Science (R0)