DOI: 10.1145/2213836.2213891
research-article

Probase: a probabilistic taxonomy for text understanding

Published: 20 May 2012

Abstract

Knowledge is indispensable to understanding. The ongoing information explosion highlights the need to enable machines to better understand electronic text in human language. Much work has been devoted to creating universal ontologies or taxonomies for this purpose. However, none of the existing ontologies has the needed depth and breadth for universal understanding. In this paper, we present a universal, probabilistic taxonomy that is more comprehensive than any existing one. It contains 2.7 million concepts harnessed automatically from a corpus of 1.68 billion web pages. Unlike traditional taxonomies that treat knowledge as black and white, it uses probabilities to model the inconsistent, ambiguous, and uncertain information it contains. We present details of how the taxonomy is constructed, its probabilistic modeling, and its potential applications in text understanding.
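
To make the idea of a probabilistic (rather than black-and-white) taxonomy concrete, the following is a minimal sketch and not the paper's actual model. It assumes that isA evidence arrives as (concept, instance, count) triples and estimates P(concept | instance) by simple count normalization. The triples, the counts, and the function name `concept_given_instance` are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical isA evidence: (concept, instance, observation count).
# These numbers are invented for illustration; they are not Probase data.
observations = [
    ("company", "apple", 60),
    ("fruit", "apple", 40),
    ("company", "microsoft", 90),
    ("fruit", "banana", 50),
]


def concept_given_instance(obs):
    """Estimate P(concept | instance) by normalizing counts per instance.

    This is plain relative frequency, an assumed stand-in for whatever
    probabilistic model the taxonomy really uses.
    """
    totals = defaultdict(int)
    for _, instance, count in obs:
        totals[instance] += count

    probs = defaultdict(dict)
    for concept, instance, count in obs:
        share = count / totals[instance]
        # Accumulate so that repeated (concept, instance) rows are merged.
        probs[instance][concept] = probs[instance].get(concept, 0.0) + share
    return dict(probs)


probs = concept_given_instance(observations)
print(probs["apple"])
```

Under these made-up counts, an ambiguous instance such as "apple" keeps both readings with different weights instead of being forced into a single category, which is the kind of uncertainty the abstract says the taxonomy models.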


Published In

SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
May 2012
886 pages
ISBN: 9781450312479
DOI: 10.1145/2213836

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. knowledgebase
2. taxonomy
3. text understanding

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '12

Acceptance Rates

SIGMOD '12 Paper Acceptance Rate: 48 of 289 submissions, 17%
Overall Acceptance Rate: 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 122
  • Downloads (last 6 weeks): 15

Reflects downloads up to 30 Dec 2024

Cited By

  • (2025) Unsupervised fuzzy temporal knowledge graph entity alignment via joint fuzzy semantics learning and global structure learning. Neurocomputing 617 (129019). DOI: 10.1016/j.neucom.2024.129019. Online publication date: Feb-2025
  • (2025) MHEC: One-shot relational learning of knowledge graphs completion based on multi-hop information enhancement. Neurocomputing 614 (128760). DOI: 10.1016/j.neucom.2024.128760. Online publication date: Jan-2025
  • (2025) A survey on pragmatic processing techniques. Information Fusion 114 (102712). DOI: 10.1016/j.inffus.2024.102712. Online publication date: Feb-2025
  • (2024) A Hybrid Semantic Representation Method Based on Fusion Conceptual Knowledge and Weighted Word Embeddings for English Texts. Information 15:11 (708). DOI: 10.3390/info15110708. Online publication date: 5-Nov-2024
  • (2024) Deep Learning Classification of Traffic-Related Tweets: An Advanced Framework Using Deep Learning for Contextual Understanding and Traffic-Related Short Text Classification. Applied Sciences 14:23 (11009). DOI: 10.3390/app142311009. Online publication date: 27-Nov-2024
  • (2024) XLORE 3: A Large-Scale Multilingual Knowledge Graph from Heterogeneous Wiki Knowledge Resources. ACM Transactions on Information Systems 42:6 (1-47). DOI: 10.1145/3660521. Online publication date: 19-Aug-2024
  • (2024) A Semantics-enhanced Topic Modelling Technique: Semantic-LDA. ACM Transactions on Knowledge Discovery from Data 18:4 (1-27). DOI: 10.1145/3639409. Online publication date: 12-Feb-2024
  • (2024) Ontology Enrichment for Effective Fine-grained Entity Typing. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2318-2327). DOI: 10.1145/3637528.3671857. Online publication date: 25-Aug-2024
  • (2024) OntoType: Ontology-Guided and Pre-Trained Language Model Assisted Fine-Grained Entity Typing. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (1407-1417). DOI: 10.1145/3637528.3671745. Online publication date: 25-Aug-2024
  • (2024) Generating Intent-aware Clarifying Questions in Conversational Information Retrieval Systems. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (3384-3394). DOI: 10.1145/3627673.3679851. Online publication date: 21-Oct-2024
