DOI: 10.1145/1242572.1242667
Article

Yago: a core of semantic knowledge

Published: 08 May 2007

Abstract

We present YAGO, a light-weight and extensible ontology with high coverage and quality. YAGO builds on entities and relations and currently contains more than 1 million entities and 5 million facts. This includes the Is-A hierarchy as well as non-taxonomic relations between entities (such as HASWONPRIZE). The facts have been automatically extracted from Wikipedia and unified with WordNet, using a carefully designed combination of rule-based and heuristic methods described in this paper. The resulting knowledge base is a major step beyond WordNet: in quality by adding knowledge about individuals like persons, organizations, products, etc. with their semantic relationships - and in quantity by increasing the number of facts by more than an order of magnitude. Our empirical evaluation of fact correctness shows an accuracy of about 95%. YAGO is based on a logically clean model, which is decidable, extensible, and compatible with RDFS. Finally, we show how YAGO can be further extended by state-of-the-art information extraction techniques.
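
As a rough illustration of the entity-and-relation model the abstract describes, the sketch below stores facts as (subject, relation, object) triples and walks an Is-A hierarchy. It is a minimal sketch, not the paper's implementation: the relation names `type` and `subClassOf`, the helper functions, and the toy facts are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy facts as (subject, relation, object) triples.
# Entities and relation names are illustrative assumptions, not real YAGO data.
FACTS = [
    ("AlbertEinstein", "type", "physicist"),          # Is-A: instance to class
    ("physicist", "subClassOf", "scientist"),         # Is-A: class to superclass
    ("scientist", "subClassOf", "person"),
    ("AlbertEinstein", "hasWonPrize", "NobelPrize"),  # non-taxonomic relation
]


def build_index(facts):
    """Index triples by (subject, relation) for direct lookup of objects."""
    index = defaultdict(set)
    for subject, relation, obj in facts:
        index[(subject, relation)].add(obj)
    return index


def classes_of(entity, index):
    """Return the entity's direct classes plus all their superclasses."""
    seen = set()
    stack = list(index[(entity, "type")])
    while stack:
        cls = stack.pop()
        if cls not in seen:
            seen.add(cls)
            stack.extend(index[(cls, "subClassOf")])
    return seen


index = build_index(FACTS)
print(sorted(classes_of("AlbertEinstein", index)))
print(sorted(index[("AlbertEinstein", "hasWonPrize")]))
```

Keeping taxonomic and non-taxonomic facts in the same triple store means one lookup structure serves both kinds of query; only the traversal in `classes_of` treats the Is-A relations specially.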

    Published In

    WWW '07: Proceedings of the 16th international conference on World Wide Web
    May 2007
    1382 pages
    ISBN: 9781595936547
    DOI: 10.1145/1242572
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. WordNet
    2. Wikipedia

    Qualifiers

    • Article

    Conference

    WWW'07: 16th International World Wide Web Conference
    May 8 - 12, 2007
    Banff, Alberta, Canada

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
