DOI: 10.1145/1242572.1242667
Article

Yago: a core of semantic knowledge

Published: 08 May 2007

Abstract

We present YAGO, a lightweight and extensible ontology with high coverage and quality. YAGO builds on entities and relations and currently contains more than 1 million entities and 5 million facts. This includes the Is-A hierarchy as well as non-taxonomic relations between entities (such as HASWONPRIZE). The facts have been automatically extracted from Wikipedia and unified with WordNet, using a carefully designed combination of rule-based and heuristic methods described in this paper. The resulting knowledge base is a major step beyond WordNet: in quality, by adding knowledge about individuals such as persons, organizations, and products together with their semantic relationships, and in quantity, by increasing the number of facts by more than an order of magnitude. Our empirical evaluation of fact correctness shows an accuracy of about 95%. YAGO is based on a logically clean model, which is decidable, extensible, and compatible with RDFS. Finally, we show how YAGO can be further extended by state-of-the-art information extraction techniques.
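
As a rough illustration of the kind of model the abstract describes (entities, relations, an Is-A hierarchy, and non-taxonomic facts), the following minimal Python sketch stores facts as subject-relation-object triples and follows class membership transitively. The triples, the relation names `type`, `subClassOf`, and `hasWonPrize`, and the helper functions are assumptions made for this example; they are not taken from YAGO's actual data or file format.

```python
from collections import defaultdict

# Hypothetical toy facts as (subject, relation, object) triples.
# Entity and relation names are illustrative only, not YAGO's real data.
FACTS = [
    ("AlbertEinstein", "type", "physicist"),
    ("physicist", "subClassOf", "scientist"),
    ("scientist", "subClassOf", "person"),
    ("AlbertEinstein", "hasWonPrize", "NobelPrize"),
]


def build_index(facts):
    """Index facts by (subject, relation) for direct lookup."""
    index = defaultdict(set)
    for subject, relation, obj in facts:
        index[(subject, relation)].add(obj)
    return index


def classes_of(entity, index):
    """Return every class of the entity, following subClassOf transitively."""
    pending = list(index[(entity, "type")])
    seen = set()
    while pending:
        cls = pending.pop()
        if cls not in seen:
            seen.add(cls)
            pending.extend(index[(cls, "subClassOf")])
    return seen


index = build_index(FACTS)
einstein_classes = classes_of("AlbertEinstein", index)       # taxonomic (Is-A) facts
einstein_prizes = index[("AlbertEinstein", "hasWonPrize")]   # non-taxonomic fact
```

Here the Is-A hierarchy is just another set of triples, so the same lookup serves both taxonomic and non-taxonomic relations; a real system would presumably add typing constraints and persistent storage.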

    Published In

    WWW '07: Proceedings of the 16th international conference on World Wide Web
    May 2007
    1382 pages
    ISBN: 9781595936547
    DOI: 10.1145/1242572
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. WordNet
    2. Wikipedia

    Qualifiers

    • Article

    Conference

    WWW'07: 16th International World Wide Web Conference
    May 8 - 12, 2007
    Banff, Alberta, Canada

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
