DOI: 10.1145/2783258.2783362
research-article

ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering

Published: 10 August 2015

Abstract

Entity recognition is an important but challenging research problem. In practice, many text collections come from specific, dynamic, or emerging domains, which poses significant new challenges for entity recognition: name ambiguity and context sparsity increase, and entities must be detected without domain restriction. In this paper, we investigate entity recognition (ER) with distant supervision and propose a novel relation phrase-based ER framework, called ClusType, that runs data-driven phrase mining to generate entity mention candidates and relation phrases, and enforces the principle that relation phrases should be softly clustered when propagating type information between their argument entities. We then predict the type of each entity mention based on the type signatures of its co-occurring relation phrases and the type indicators of its surface name, as computed over the corpus. Specifically, we formulate a joint optimization problem for two tasks: type propagation with relation phrases and multi-view relation phrase clustering. Our experiments on multiple genres (news, Yelp reviews, and tweets) demonstrate the effectiveness and robustness of ClusType, with an average improvement of 37% in F1 score over the best compared method.
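The prediction step described in the abstract can be illustrated with a small toy sketch. This is not the ClusType implementation: it leaves out the joint optimization and the soft clustering of relation phrases entirely, and every name, phrase, and number in it (`RELATION_SIGNATURE`, `NAME_INDICATOR`, `predict_type`, the mixing weight `alpha`) is a hypothetical chosen for illustration. It only shows how evidence from co-occurring relation phrases might be combined with a surface-name prior to resolve an ambiguous name.

```python
# Illustrative toy only: all phrases, names, and numbers below are made up.
TYPES = ("person", "location", "organization")

# Hypothetical type signatures: how strongly each relation phrase suggests
# each type for its subject argument.
RELATION_SIGNATURE = {
    "is the mayor of": {"person": 0.90, "location": 0.05, "organization": 0.05},
    "was born in": {"person": 0.90, "location": 0.05, "organization": 0.05},
    "is located in": {"person": 0.05, "location": 0.60, "organization": 0.35},
}

# Hypothetical type indicators of surface names (e.g. derived from a
# knowledge base under distant supervision).
NAME_INDICATOR = {
    "Washington": {"person": 0.30, "location": 0.60, "organization": 0.10},
}


def predict_type(surface_name, relation_phrases, alpha=0.5):
    """Mix averaged relation-phrase signatures with the surface-name prior."""
    known = [RELATION_SIGNATURE[r] for r in relation_phrases
             if r in RELATION_SIGNATURE]
    name = NAME_INDICATOR.get(surface_name, {})
    scores = {}
    for t in TYPES:
        # Average the signatures of the known co-occurring relation phrases.
        rel_score = sum(sig[t] for sig in known) / len(known) if known else 0.0
        scores[t] = alpha * rel_score + (1 - alpha) * name.get(t, 0.0)
    best = max(scores, key=scores.get)
    return best, scores


print(predict_type("Washington", ["is the mayor of", "was born in"])[0])  # person
print(predict_type("Washington", ["is located in"])[0])  # location
```

In this toy, the same surface name receives different types depending on the relation phrases around it, which is the kind of name ambiguity the abstract refers to.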

Supplementary Material

MP4 File (p995.mp4)

    Information & Contributors

    Information

    Published In

    KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2015
    2378 pages
    ISBN: 9781450336642
    DOI: 10.1145/2783258
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. entity recognition and typing
    2. relation phrase clustering

    Qualifiers

    • Research-article

    Funding Sources

    • Defense Threat Reduction Agency
    • National Science Foundation
    • U.S. Army Research Laboratory

    Conference

    KDD '15
    Acceptance Rates

    KDD '15 Paper Acceptance Rate 160 of 819 submissions, 20%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 17
    • Downloads (Last 6 weeks): 8
    Reflects downloads up to 19 Dec 2024

    Citations

    Cited By

    • (2023) Reinforcement learning based distantly supervised biomedical named entity recognition. Intelligent Decision Technologies 17:2 (317-330). DOI: 10.3233/IDT-220205. Online publication date: 1-Jan-2023
    • (2023) Few-shot Named Entity Recognition: Definition, Taxonomy and Research Directions. ACM Transactions on Intelligent Systems and Technology 14:5 (1-46). DOI: 10.1145/3609483. Online publication date: 9-Oct-2023
    • (2023) Toward Open-domain Slot Filling via Self-supervised Co-training. Proceedings of the ACM Web Conference 2023 (1928-1937). DOI: 10.1145/3543507.3583541. Online publication date: 30-Apr-2023
    • (2022) Fine-Grained Entity Typing with a Type Taxonomy: a Systematic Review. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2022.3148980. Online publication date: 2022
    • (2022) Open Named Entity Modeling From Embedding Distribution. IEEE Transactions on Knowledge and Data Engineering 34:11 (5472-5483). DOI: 10.1109/TKDE.2021.3049654. Online publication date: 1-Nov-2022
    • (2022) Concept Set Expansion. Automated Taxonomy Discovery and Exploration (9-29). DOI: 10.1007/978-3-031-11405-2_2. Online publication date: 22-Sep-2022
    • (2021) Extraction of Traditional Chinese Medicine Entity: Design of a Novel Span-Level Named Entity Recognition Method With Distant Supervision. JMIR Medical Informatics 9:6 (e28219). DOI: 10.2196/28219. Online publication date: 14-Jun-2021
    • (2021) Fine-Grained Entity Typing via Label Noise Reduction and Data Augmentation. Database Systems for Advanced Applications (356-374). DOI: 10.1007/978-3-030-73194-6_24. Online publication date: 6-Apr-2021
    • (2020) Octet. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2247-2257). DOI: 10.1145/3394486.3403274. Online publication date: 23-Aug-2020
    • (2020) An Automated Pipeline for Character and Relationship Extraction from Readers Literary Book Reviews on Goodreads.com. Proceedings of the 12th ACM Conference on Web Science (277-286). DOI: 10.1145/3394231.3397918. Online publication date: 6-Jul-2020
