DOI: 10.1145/2835776.2835808
research-article

EgoSet: Exploiting Word Ego-networks and User-generated Ontology for Multifaceted Set Expansion

Published: 08 February 2016

Abstract

A key challenge in entity set expansion is that multifaceted input seeds can lead to significant incoherence in the result set. In this paper, we present a solution for handling multifaceted seeds that combines existing user-generated ontologies with a novel word-similarity metric based on skip-grams. By blending the two resources, we produce sparse word ego-networks that are centered on the seed terms and capture semantic equivalence among words. We demonstrate that the resulting networks contain internally coherent clusters, which can be exploited to provide non-overlapping expansions that reflect the different semantic classes of the seeds. Empirical evaluation against state-of-the-art baselines shows that our solution, EgoSet, not only captures multiple facets in the input query but also generates expansions for each facet with higher precision.
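
The idea of a seed-centered ego-network whose clusters yield one expansion per facet can be illustrated with a minimal sketch. This is not the EgoSet implementation: the similarity scores are a hand-made toy table standing in for a skip-gram-based metric, the ontology component is omitted, and connected components stand in for a proper community detection step. The names `build_ego_network`, `connected_components`, `expand`, and `TOY_SIMILARITY` are hypothetical.

```python
# Hypothetical toy similarity table (symmetric pairs); a stand-in for a
# skip-gram-based word-similarity metric, not data from the paper.
TOY_SIMILARITY = {
    ("apple", "orange"): 0.8,
    ("apple", "banana"): 0.7,
    ("orange", "banana"): 0.9,
    ("apple", "google"): 0.75,
    ("apple", "microsoft"): 0.7,
    ("google", "microsoft"): 0.85,
    ("banana", "google"): 0.1,
}


def build_ego_network(seed, similarity, threshold=0.5):
    """Return adjacency sets over the seed's neighbors (seed itself excluded)."""
    # Alters: words whose similarity to the seed meets the threshold.
    alters = {
        b if a == seed else a
        for (a, b), score in similarity.items()
        if seed in (a, b) and score >= threshold
    }
    graph = {word: set() for word in alters}
    # Keep only sufficiently strong edges between alters, so the network stays sparse.
    for (a, b), score in similarity.items():
        if a in alters and b in alters and score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group nodes into connected components (a simple stand-in for community detection)."""
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


def expand(seed, similarity, threshold=0.5):
    """Return one non-overlapping expansion list per cluster of the seed's ego-network."""
    return connected_components(build_ego_network(seed, similarity, threshold))


if __name__ == "__main__":
    # The ambiguous seed "apple" separates into a fruit facet and a company facet.
    print(expand("apple", TOY_SIMILARITY))
```

Removing the seed from its own network is what lets the facets separate: with the seed left in, every neighbor would be connected through it and the graph would collapse into a single cluster.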

      Published In

      WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining
      February 2016
      746 pages
      ISBN: 9781450337168
      DOI: 10.1145/2835776
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. entity set expansion
      2. information extraction
      3. web mining

      Qualifiers

      • Research-article

      Funding Sources

      • National Science Foundation

      Conference

      WSDM 2016
      WSDM 2016: Ninth ACM International Conference on Web Search and Data Mining
      February 22 - 25, 2016
      San Francisco, California, USA

      Acceptance Rates

      WSDM '16 Paper Acceptance Rate 67 of 368 submissions, 18%;
      Overall Acceptance Rate 498 of 2,863 submissions, 17%
