DOI: 10.1145/1963405.1963467
research-article

SEISA: set expansion by iterative similarity aggregation

Published: 28 March 2011

Abstract

In this paper, we study the problem of expanding a set of given seed entities into a more complete set by discovering other entities that belong to the same concept set. A typical example is to use "Canon" and "Nikon" as seed entities and derive other entities (e.g., "Olympus") in the same concept set of camera brands. To discover such relevant entities, we exploit several web data sources, including lists extracted from web pages and user queries from a web search engine. While these web data are highly diverse and carry rich information that usually covers a wide range of the domains of interest, they tend to be very noisy. We observe that previously proposed random-walk-based approaches do not perform well on these noisy data sources. Accordingly, we propose a new general framework based on iterative similarity aggregation, and present detailed experimental results showing that, when using general-purpose web data for set expansion, our approach outperforms previous techniques in both precision and recall.
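
To make the problem statement concrete, the following toy sketch expands the seeds "Canon" and "Nikon" over a handful of made-up web lists. It is only an illustration of the general idea of iteratively aggregating similarities and is not the algorithm from the paper: the Jaccard measure, the averaging rule, the parameters `k` and `alpha`, the function names, and the sample data are all assumptions chosen for the example.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_features(lists):
    """Map each entity to the set of list ids in which it appears."""
    features = defaultdict(set)
    for list_id, items in enumerate(lists):
        for item in items:
            features[item].add(list_id)
    return dict(features)


def avg_similarity(entity, group, features):
    """Average Jaccard similarity between one entity and the other group members."""
    others = [g for g in group if g != entity]
    if not others:
        return 0.0
    mine = features.get(entity, set())
    return sum(jaccard(mine, features.get(g, set())) for g in others) / len(others)


def expand(seeds, lists, k=3, alpha=0.5, max_iter=10):
    """Toy expansion: re-rank candidates until the top-k set stops changing."""
    features = build_features(lists)
    candidates = [e for e in features if e not in seeds]
    # Similarity of each candidate to the seeds; this part never changes.
    seed_score = {e: avg_similarity(e, seeds, features) for e in candidates}

    def top_k(score):
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(candidates, key=lambda e: (-score[e], e))
        return set(ranked[:k])

    current = top_k(seed_score)
    for _ in range(max_iter):
        # Blend seed similarity with similarity to the current expanded set.
        combined = {
            e: alpha * seed_score[e]
            + (1 - alpha) * avg_similarity(e, current, features)
            for e in candidates
        }
        updated = top_k(combined)
        if updated == current:
            break
        current = updated
    return sorted(current)


# Hypothetical "web lists"; the fourth and fifth lists act as noise.
web_lists = [
    ["Canon", "Nikon", "Olympus", "Sony"],
    ["Canon", "Nikon", "Olympus", "Pentax"],
    ["Nikon", "Canon", "Pentax"],
    ["Canon", "Epson", "HP"],
    ["Sony", "Samsung", "LG"],
]
seeds = {"Canon", "Nikon"}
result = expand(seeds, web_lists)
print(result)
```

In this toy data, entities that share many lists with the seeds rank above entities such as "Epson" that co-occur with a seed only in an off-topic list. Real web lists and query logs are far larger and noisier, which is the setting the abstract is concerned with.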



    Information & Contributors

    Information

    Published In

    WWW '11: Proceedings of the 20th international conference on World wide web
    March 2011
    840 pages
    ISBN:9781450306324
    DOI:10.1145/1963405

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. named entity recognition
    2. set expansion
    3. similarity measure

    Qualifiers

    • Research-article

    Conference

    WWW '11: 20th International World Wide Web Conference
    March 28 - April 1, 2011
    Hyderabad, India

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 4
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 25 Feb 2025

    Cited By

    • (2023) Beyond Single Items: Exploring User Preferences in Item Sets with the Conversational Playlist Curation Dataset. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2754-2764. DOI: 10.1145/3539618.3591881. Online publication date: 19-Jul-2023
    • (2023) Distance-based positive and unlabeled learning for ranking. Pattern Recognition, 134:C. DOI: 10.1016/j.patcog.2022.109085. Online publication date: 1-Feb-2023
    • (2022) Rows from Many Sources: Enriching row completions from Wikidata with a pre-trained Language Model. Companion Proceedings of the Web Conference 2022, pp. 1272-1280. DOI: 10.1145/3487553.3524923. Online publication date: 25-Apr-2022
    • (2022) A Meta Path Based Method for Entity Set Expansion in Knowledge Graph. IEEE Transactions on Big Data, 8:3, pp. 616-629. DOI: 10.1109/TBDATA.2018.2805366. Online publication date: 1-Jun-2022
    • (2022) Concept Set Expansion. Automated Taxonomy Discovery and Exploration, pp. 9-29. DOI: 10.1007/978-3-031-11405-2_2. Online publication date: 22-Sep-2022
    • (2022) Topic Aware Contextualized Embeddings for High Quality Phrase Extraction. Advances in Information Retrieval, pp. 457-471. DOI: 10.1007/978-3-030-99736-6_31. Online publication date: 10-Apr-2022
    • (2021) SAUCE. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 4173-4183. DOI: 10.1145/3459637.3481950. Online publication date: 26-Oct-2021
    • (2021) Auto-Validate: Unsupervised Data Validation Using Data-Domain Patterns Inferred from Data Lakes. Proceedings of the 2021 International Conference on Management of Data, pp. 1678-1691. DOI: 10.1145/3448016.3457250. Online publication date: 9-Jun-2021
    • (2021) KTabulator: Interactive Ad hoc Table Creation using Knowledge Graphs. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pp. 1-14. DOI: 10.1145/3411764.3445227. Online publication date: 6-May-2021
    • (2021) FUSE: Multi-faceted Set Expansion by Coherent Clustering of Skip-Grams. Machine Learning and Knowledge Discovery in Databases, pp. 617-632. DOI: 10.1007/978-3-030-67664-3_37. Online publication date: 25-Feb-2021
