DOI: 10.1145/2675744.2675762
research-article

Finding acronym expansion using semi-Markov conditional random fields

Published: 09 October 2014

Abstract

Acronyms are out-of-vocabulary terms used heavily in SMS messages, search queries, and social media postings. The performance of text mining algorithms such as part-of-speech (POS) tagging, named entity recognition, and chunking often suffers when they are applied to noisy text. Text normalization systems are developed to normalize such text, and acronym mapping and expansion has become an important component of that process. Since manually collecting acronyms and their corresponding expansions from documents is difficult, there is a need to build such a dictionary automatically using supervised learning. In this work, we focus on the acronym search problem: given acronyms as queries, find their corresponding expansions in a document.
Recent works formulate this problem as a token-level sequence labelling task and employ Hidden Markov Models or Conditional Random Fields (CRFs) to tackle it. However, these models do not utilize the segment-level information inherent in the expansion. We therefore propose a Semi-Markov Conditional Random Field (Semi-CRF) based approach, which lets us write features that operate on a group of neighbouring tokens together and are more effective than features working on individual tokens. We design and implement Semi-Markov Conditional Random Fields to identify the correct acronym expansions in data extracted from Wikipedia and compare their performance with that of Conditional Random Fields. The experimental results show that the Semi-CRF based approach performs better than the CRF based approach on this task.
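
To illustrate what a segment-level feature can capture that a token-level one cannot, the following is a minimal sketch, not the implementation described in the paper. It assumes a single hand-written feature (the initials of a candidate segment spell the acronym), hand-set scores in place of learned weights, and no label-transition features. The names `initials_match`, `segment_score`, `semi_markov_decode`, and `find_expansions` are hypothetical. Decoding uses the standard semi-Markov dynamic program over segment end positions and segment lengths.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start, end, label), with end exclusive


def initials_match(acronym: str, segment: List[str]) -> bool:
    """Segment-level feature: do the tokens' first letters spell the acronym?"""
    initials = "".join(tok[0] for tok in segment if tok)
    return initials.lower() == acronym.lower()


def segment_score(acronym: str, tokens: List[str],
                  start: int, end: int, label: str) -> float:
    """Score for labelling tokens[start:end] as one segment.

    The values are hand-set assumptions; a trained Semi-CRF would
    learn feature weights from labelled data instead.
    """
    if label == "EXP":
        return 2.0 if initials_match(acronym, tokens[start:end]) else -1.0
    return 0.0  # "O" (outside) segments are neutral


def semi_markov_decode(acronym: str, tokens: List[str],
                       max_len: int = 6) -> List[Segment]:
    """Find the highest-scoring segmentation by dynamic programming."""
    n = len(tokens)
    best = [float("-inf")] * (n + 1)  # best[i]: best score of tokens[:i]
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            for label in ("O", "EXP"):
                if label == "O" and length != 1:
                    continue  # outside segments are single tokens
                score = best[start] + segment_score(
                    acronym, tokens, start, end, label)
                if score > best[end]:
                    best[end] = score
                    back[end] = (start, label)
    # Follow the back-pointers from the end of the sentence.
    segments: List[Segment] = []
    end = n
    while end > 0:
        start, label = back[end]
        segments.append((start, end, label))
        end = start
    segments.reverse()
    return segments


def find_expansions(acronym: str, text: str) -> List[str]:
    """Return the text of every segment labelled as an expansion."""
    tokens = text.split()
    return [" ".join(tokens[s:e])
            for s, e, label in semi_markov_decode(acronym, tokens)
            if label == "EXP"]


print(find_expansions("HMM", "the hidden markov model ( HMM ) is popular"))
```

A token-level labeller has to decide on "hidden", "markov", and "model" one token at a time, whereas the segment feature here scores the three tokens as a single unit, which is the kind of information the abstract says token-level models leave unused.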


Cited By

  • (2021) Mining the web to discover acronym-definitions based on sequence labeling and iterative query expansion model. Concurrency and Computation: Practice and Experience 33:17. DOI: 10.1002/cpe.6291. Online publication date: 31-Mar-2021

      Published In

      COMPUTE '14: Proceedings of the 7th ACM India Computing Conference
      October 2014
      175 pages
      ISBN:9781605588148
      DOI:10.1145/2675744

      Sponsors

      • Google India
      • Persistent Systems

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. expansion finding
      2. sequence labelling
      3. text mining

      Qualifiers

      • Research-article

      Conference

      Compute '14
      Sponsor:
      • Google India
      Compute '14: ACM India Compute Conference
      October 9 - 11, 2014
      Nagpur, India

      Acceptance Rates

      COMPUTE '14 Paper Acceptance Rate 21 of 110 submissions, 19%;
      Overall Acceptance Rate 114 of 622 submissions, 18%
