DOI: 10.3115/1072228.1072382
Article

Unsupervised learning of generalized names

Published: 24 August 2002

Abstract

We present an algorithm, NOMEN, for learning generalized names in text. Examples of these are names of diseases and infectious agents, such as bacteria and viruses. These names exhibit certain properties that make their identification more complex than that of regular proper names. NOMEN uses a novel form of bootstrapping to grow sets of textual instances and of their contextual patterns. The algorithm makes use of competing evidence to boost the learning of several categories of names simultaneously. We present results of the algorithm on a large corpus. We also investigate the relative merits of several evaluation strategies.
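The abstract describes the approach only at a high level. The sketch below is a hypothetical, heavily simplified illustration of competitive multi-category bootstrapping in general; it is not the published NOMEN algorithm. All of the following are assumptions chosen for illustration: names are single tokens, a "pattern" is just the (previous word, next word) pair around a name, patterns are accepted by a simple accuracy threshold `min_accuracy`, candidate names are scored by counting votes, and the toy corpus is invented. The competition shows up in two places: names already accepted for one category count as negative evidence against a context for every other category, and a candidate claimed equally by two categories is left unlabeled.

```python
from collections import Counter, defaultdict


def build_index(tokens):
    """Map each (left word, right word) context to a Counter of the tokens it surrounds."""
    index = defaultdict(Counter)
    padded = ["<s>"] + tokens + ["</s>"]
    for i in range(1, len(padded) - 1):
        index[(padded[i - 1], padded[i + 1])][padded[i]] += 1
    return index


def bootstrap(tokens, seeds, iterations=2, min_accuracy=0.6):
    """Hypothetical sketch: grow name sets and context sets for competing categories.

    seeds: dict mapping a category label to a set of seed names (single tokens).
    Returns (names, patterns), each a dict keyed by category.
    """
    index = build_index(tokens)
    names = {cat: set(ns) for cat, ns in seeds.items()}
    patterns = {cat: set() for cat in seeds}
    for _ in range(iterations):
        # Which category, if any, each already-accepted name belongs to.
        owner = {n: cat for cat, ns in names.items() for n in ns}
        # Step 1: accept contexts whose known fillers mostly belong to this
        # category; names owned by other categories count as negative evidence.
        for cat in names:
            for ctx, fillers in index.items():
                pos = sum(c for f, c in fillers.items() if owner.get(f) == cat)
                neg = sum(c for f, c in fillers.items() if f in owner and owner[f] != cat)
                if pos and pos / (pos + neg) >= min_accuracy:
                    patterns[cat].add(ctx)
        # Step 2: each unlabeled token gets one vote for a category per accepted
        # context of that category that it fills.
        votes = defaultdict(Counter)
        for cat, ctxs in patterns.items():
            for ctx in ctxs:
                for f in index[ctx]:
                    if f not in owner:
                        votes[f][cat] += 1
        # Step 3: competition. A candidate joins its single best-voted category;
        # a tie between categories leaves it unlabeled.
        for cand, counts in votes.items():
            (best, n), *rest = counts.most_common()
            if not rest or rest[0][1] < n:
                names[best].add(cand)
    return names, patterns


if __name__ == "__main__":
    text = (
        "outbreak of cholera reported . outbreak of anthrax reported . "
        "outbreak of plague reported . infected by salmonella bacteria . "
        "infected by listeria bacteria . infected by plague bacteria ."
    )
    names, patterns = bootstrap(text.split(), {"disease": {"cholera"}, "agent": {"salmonella"}})
    print(sorted(names["disease"]))  # ['anthrax', 'cholera']
    print(sorted(names["agent"]))    # ['listeria', 'salmonella']
```

In this toy run, `plague` fills one disease context and one agent context, so the tie rule keeps either category from absorbing it; a real system would need a more careful scoring of both names and patterns than these simple counts.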

Cited By

  • (2019) Document Gated Reader for Open-Domain Question Answering. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 85-94. DOI: 10.1145/3331184.3331190. Online publication date: 18-Jul-2019
  • (2017) Unsupervised Concept Categorization and Extraction from Scientific Document Titles. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1339-1348. DOI: 10.1145/3132847.3133023. Online publication date: 6-Nov-2017
  • (2016) Automatic Entity Recognition and Typing in Massive Text Data. Proceedings of the 2016 International Conference on Management of Data, pp. 2235-2239. DOI: 10.1145/2882903.2912567. Online publication date: 26-Jun-2016

Information

Published In

COLING '02: Proceedings of the 19th international conference on Computational linguistics - Volume 1
August 2002
1184 pages

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 1,537 of 1,537 submissions, 100%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 48
  • Downloads (last 6 weeks): 6
Reflects downloads up to 11 Dec 2024

Citations

  • (2016) Automatic Entity Recognition and Typing in Massive Text Corpora. Proceedings of the 25th International Conference Companion on World Wide Web, pp. 1025-1028. DOI: 10.1145/2872518.2891065. Online publication date: 11-Apr-2016
  • (2016) A bootstrapping method for extracting attribute names with keys from the web. Proceedings of the 31st Annual ACM Symposium on Applied Computing, pp. 368-371. DOI: 10.1145/2851613.2851992. Online publication date: 4-Apr-2016
  • (2013) Concept-based analysis of scientific literature. Proceedings of the 22nd ACM international conference on Information & Knowledge Management, pp. 1733-1738. DOI: 10.1145/2505515.2505613. Online publication date: 27-Oct-2013
  • (2011) Automatically building training examples for entity extraction. Proceedings of the Fifteenth Conference on Computational Natural Language Learning, pp. 163-171. DOI: 10.5555/2018936.2018955. Online publication date: 23-Jun-2011
  • (2011) Relation guided bootstrapping of semantic lexicons. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2, pp. 266-270. DOI: 10.5555/2002736.2002792. Online publication date: 19-Jun-2011
  • (2011) HITS-based seed selection and stop list construction for bootstrapping. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2, pp. 30-36. DOI: 10.5555/2002736.2002744. Online publication date: 19-Jun-2011
  • (2011) Weighted Vote-Based Classifier Ensemble for Named Entity Recognition. ACM Transactions on Asian Language Information Processing 10:2, pp. 1-37. DOI: 10.1145/1967293.1967296. Online publication date: 1-Jun-2011
