DOI: 10.3115/1220355.1220433

Cascading use of soft and hard matching pattern rules for weakly supervised information extraction

Published: 23 August 2004

Abstract

Current rule induction techniques based on hard matching (i.e., strict slot-by-slot matching) tend to fare poorly when extracting information from natural language texts, which often exhibit great variation. The reason is that hard matching yields relatively high precision but low recall. To tackle this problem, we take advantage of recently proposed soft pattern rules, which offer high recall through probabilistic matching. We propose a bootstrapping framework in which soft and hard matching pattern rules are combined in a cascading manner to realize a weakly supervised rule induction scheme. The system starts with a small set of hand-tagged instances. At each iteration, we first generate soft pattern rules and use them to tag new training instances automatically. We then apply hard pattern rule induction to the overall tagged data to generate more precise rules, which are used to tag the data again. The process can be repeated until satisfactory results are obtained. Our experimental results show that the bootstrapping scheme with two cascaded learners approaches the performance of a fully supervised information extraction system while using far fewer hand-tagged instances.
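
The cascading loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names (`learn_soft`, `soft_score`, `learn_hard`, `bootstrap`), the two-token context representation, the thresholds, and the toy data are hypothetical, and the two learners are deliberately simple stand-ins rather than the soft pattern model or the hard rule inducer used in the paper. The soft learner scores a context by per-slot token frequencies (probabilistic, high recall); the hard learner keeps only exact slot-by-slot patterns with enough support (high precision).

```python
from collections import Counter


def learn_soft(positives):
    """Toy soft pattern: per-slot token frequencies over positive contexts."""
    n = len(positives)
    width = len(positives[0])
    return [
        {tok: c / n for tok, c in Counter(p[i] for p in positives).items()}
        for i in range(width)
    ]


def soft_score(model, ctx):
    """Average per-slot probability; unseen tokens contribute 0."""
    return sum(slot.get(tok, 0.0) for slot, tok in zip(model, ctx)) / len(model)


def learn_hard(positives, min_support=2):
    """Toy hard rules: exact contexts seen at least `min_support` times."""
    counts = Counter(positives)
    return {ctx for ctx, c in counts.items() if c >= min_support}


def bootstrap(seeds, unlabeled, max_iters=5, soft_threshold=0.4, min_support=2):
    """Cascade soft tagging and hard rule induction until tagging is stable."""
    auto = set()        # indices of unlabeled instances currently tagged positive
    hard_rules = set()
    for _ in range(max_iters):
        # Step 1: learn soft patterns from seeds plus current automatic tags.
        training = list(seeds) + [unlabeled[i] for i in sorted(auto)]
        soft = learn_soft(training)
        soft_hits = {
            i for i, ctx in enumerate(unlabeled)
            if soft_score(soft, ctx) >= soft_threshold
        }
        # Step 2: induce hard rules over the overall tagged data.
        overall = list(seeds) + [unlabeled[i] for i in sorted(auto | soft_hits)]
        hard_rules = learn_hard(overall, min_support)
        # Step 3: re-tag the data with the more precise hard rules.
        retagged = {i for i, ctx in enumerate(unlabeled) if ctx in hard_rules}
        if retagged == auto:   # nothing changed, so stop iterating
            break
        auto = retagged
    return hard_rules


# Hypothetical contexts: (token before the slot, token after the slot).
seeds = [("attacked", "by"), ("bombed", "by")]
unlabeled = [
    ("attacked", "by"), ("attacked", "by"), ("bombed", "by"),
    ("kidnapped", "by"), ("kidnapped", "by"),
    ("seen", "by"), ("visited", "in"),
]
rules = bootstrap(seeds, unlabeled)
print(sorted(rules))
```

In this toy run the soft step proposes the unseen context `("kidnapped", "by")` because it shares a slot with the seeds, and the hard step keeps it only because it recurs; the one-off `("seen", "by")` is proposed softly but filtered out by the support requirement.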

Published In

COLING '04: Proceedings of the 20th International Conference on Computational Linguistics
August 2004
1411 pages

Publisher

Association for Computational Linguistics, United States

Cited By

• (2024) DrHouse: An LLM-empowered Diagnostic Reasoning System through Harnessing Outcomes from Sensor Data and Expert Knowledge. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8:4 (1-29). DOI: 10.1145/3699765. Online publication date: 21-Nov-2024
• (2012) Multi event extraction guided by global constraints. Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (70-79). DOI: 10.5555/2382029.2382040. Online publication date: 3-Jun-2012
• (2011) Template-based information extraction without the templates. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1 (976-986). DOI: 10.5555/2002472.2002595. Online publication date: 19-Jun-2011
• (2010) Combining relations for information extraction from free text. ACM Transactions on Information Systems 28:3 (1-35). DOI: 10.1145/1777432.1777437. Online publication date: 2-Jul-2010
• (2009) A local tree alignment-based soft pattern matching approach for information extraction. Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers (169-172). DOI: 10.5555/1620853.1620900. Online publication date: 31-May-2009
• (2006) ARE. Proceedings of the COLING/ACL on Main conference poster sessions (571-578). DOI: 10.5555/1273073.1273147. Online publication date: 17-Jul-2006
• (2005) Generic soft pattern models for definitional question answering. Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval (384-391). DOI: 10.1145/1076034.1076101. Online publication date: 15-Aug-2005
