research-article

Finding acronym expansion using semi-Markov conditional random fields

Authors: Ankit Nautial, Nagesh Bhattu Sristy, D. V. L. N. Somayajulu

COMPUTE '14: Proceedings of the 7th ACM India Computing Conference

Article No.: 16, Pages 1 - 6

https://doi.org/10.1145/2675744.2675762

Published: 09 October 2014

Abstract

Acronyms are heavily used out-of-vocabulary terms in SMS messages, search queries, and social media postings. The performance of text mining algorithms such as Part-of-Speech (POS) tagging, Named Entity Recognition, and chunking often suffers when they are applied to such noisy text. Text normalization systems are developed to normalize noisy text, and acronym mapping and expansion has become an important component of the normalization process. Since manually collecting acronyms and their corresponding expansions from documents is difficult, there is a pressing need to build such a dictionary automatically using supervised learning. In this work, we focus on the acronym search problem: given acronyms as queries, find their corresponding expansions in a document.
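To make the input and output of the acronym search problem concrete, here is a minimal rule-based sketch in Python. It is not the learned approach described in this work: given an acronym query and a tokenized document, it returns the first span of consecutive tokens whose initial letters spell the acronym. The function name `find_expansion` and the `STOPWORDS` set are hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical stopword list: function words allowed inside an expansion
# without contributing a letter to the acronym.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def find_expansion(acronym: str, tokens: List[str]) -> Optional[Tuple[int, int]]:
    """Return (start, end) token indices, end exclusive, of the first span whose
    non-stopword initials spell `acronym`, or None if no span matches.
    Tokens are assumed to be non-empty strings, e.g. from str.split()."""
    target = acronym.lower()
    for start in range(len(tokens)):
        if tokens[start].lower() in STOPWORDS:
            continue  # an expansion should not begin with a function word
        initials = ""
        for end in range(start, len(tokens)):
            word = tokens[end].lower()
            if word in STOPWORDS:
                continue  # skip "for" in "Association for Computing Machinery"
            initials += word[0]
            if initials == target:
                return start, end + 1
            if not target.startswith(initials):
                break  # prefix mismatch: no longer span from this start can match
    return None


if __name__ == "__main__":
    doc = "Text normalization follows the Association for Computing Machinery ( ACM ) style".split()
    span = find_expansion("ACM", doc)
    if span is not None:
        print(" ".join(doc[span[0]:span[1]]))  # prints: Association for Computing Machinery
```

The same sketch returns `None` for `POS` in "Part of Speech tagging", because "of" is treated as a stopword even though it supplies the "O". Hand-written rules break on cases like this, which is one reason to learn acronym-expansion mappings from data instead.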

Recent works formulate this problem as a token-level sequence labelling task and employ Hidden Markov Models or Conditional Random Fields (CRFs) to tackle it. However, these models do not utilize the segment-level information inherent in the expansion. Hence we propose an approach based on Semi-Markov Conditional Random Fields (Semi-CRFs), which lets us write features that operate on a group of neighbouring tokens together; such features are more effective than features that operate on individual tokens. We design and implement Semi-CRFs to identify the correct acronym expansions in data extracted from Wikipedia and compare their performance with CRFs. The experimental results show that the Semi-CRF based approach performs better than the CRF based approach on this task.
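The difference between token-level and segment-level features can be made concrete with semi-Markov Viterbi decoding, the inference step of a Semi-CRF in the formulation of Sarawagi and Cohen. Below is a minimal Python sketch, not the authors' implementation: the two labels (`EXP`, `O`), the feature names, the hand-set `WEIGHTS` and `TRANSITION` values, and the `MAX_SEG` cap are all hypothetical, and training of the weights is omitted. The key piece is `exp_initials_match`, which inspects every token of a candidate segment at once.

```python
from typing import Dict, List, Tuple

LABELS = ("O", "EXP")  # EXP marks an acronym expansion; O is any other token
MAX_SEG = 6            # hypothetical cap on expansion length, in tokens

# Hand-set weights for illustration only; a trained Semi-CRF learns these.
WEIGHTS = {"exp_bias": -1.0, "exp_initials_match": 5.0, "exp_len_diff": -1.0}
TRANSITION = {("EXP", "EXP"): -2.0}  # label pairs not listed score 0


def segment_features(tokens: List[str], start: int, end: int,
                     label: str, acronym: str) -> Dict[str, float]:
    """Features of the whole segment tokens[start:end] under `label`."""
    if label == "O":
        return {}  # O segments are unit length and carry no features here
    # Segment-level feature: do the initials of ALL tokens in the segment
    # spell the acronym? No single-token feature can express this.
    initials = "".join(t[0].lower() for t in tokens[start:end])
    return {
        "exp_bias": 1.0,
        "exp_initials_match": float(initials == acronym.lower()),
        "exp_len_diff": float(abs((end - start) - len(acronym))),
    }


def segment_score(tokens: List[str], start: int, end: int, label: str,
                  prev_label: str, acronym: str) -> float:
    """Linear score w . g(y_j, y_{j-1}, x, start, end) of one labelled segment."""
    feats = segment_features(tokens, start, end, label, acronym)
    local = sum(WEIGHTS[name] * value for name, value in feats.items())
    return local + TRANSITION.get((prev_label, label), 0.0)


def semi_markov_viterbi(tokens: List[str], acronym: str) -> List[Tuple[int, int, str]]:
    """Highest-scoring segmentation as (start, end, label) triples, end exclusive."""
    n = len(tokens)
    # best[i][y]: best score of any segmentation of tokens[:i] ending in label y.
    best: List[Dict[str, float]] = [{} for _ in range(n + 1)]
    back: List[Dict[str, Tuple[int, str]]] = [{} for _ in range(n + 1)]
    best[0]["START"] = 0.0
    for end in range(1, n + 1):
        for label in LABELS:
            max_len = 1 if label == "O" else MAX_SEG
            for length in range(1, min(max_len, end) + 1):
                start = end - length
                for prev, prev_score in best[start].items():
                    score = prev_score + segment_score(
                        tokens, start, end, label, prev, acronym)
                    if label not in best[end] or score > best[end][label]:
                        best[end][label] = score
                        back[end][label] = (start, prev)
    # Follow back-pointers from the best final label to recover the segments.
    label = max(best[n], key=lambda y: best[n][y])
    segments: List[Tuple[int, int, str]] = []
    end = n
    while end > 0:
        start, prev = back[end][label]
        segments.append((start, end, label))
        end, label = start, prev
    return segments[::-1]


if __name__ == "__main__":
    tokens = "We compare against Conditional Random Fields ( CRF ) here".split()
    for start, end, label in semi_markov_viterbi(tokens, "CRF"):
        if label == "EXP":
            print(" ".join(tokens[start:end]))  # prints: Conditional Random Fields
```

Restricting `O` to unit-length segments is a common simplification that keeps the search small: decoding costs O(n · MAX_SEG · |labels|²) segment scorings, compared with O(n · |labels|²) for a token-level CRF.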


Cited By

R MVE J (2021). Mining the web to discover acronym‐definitions based on sequence labeling and iterative query expansion model. Concurrency and Computation: Practice and Experience 33:17. Online publication date: 31-Mar-2021. https://doi.org/10.1002/cpe.6291

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Document filtering
      2. Information extraction

Recommendations

Learning conditional random fields with latent sparse features for acronym expansion finding
CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management

The ever increasing usage of acronyms in many kinds of documents, including web pages, is becoming an obstacle for average readers. This paper studies the task of finding expansions in documents for a given set of acronyms. We cast the expansion finding ...

Acronym Expansion Via Hidden Markov Models
ICSENG '11: Proceedings of the 2011 21st International Conference on Systems Engineering

In this paper, we report on design and implementation of a Hidden Markov Model (HMM) to extract acronyms and their expansions. We also report on the training of this HMM with Maximum Likelihood Estimation (MLE) algorithm using a set of examples. Finally,...

Hierarchical hidden conditional random fields for information extraction
LION'05: Proceedings of the 5th international conference on Learning and Intelligent Optimization

Hidden Markov Models (HMMs) are very popular generative models for time series data. Recent work, however, has shown that for many tasks Conditional Random Fields (CRFs), a type of discriminative model, perform better than HMMs. Information extraction ...


Information & Contributors

Information

Published In

COMPUTE '14: Proceedings of the 7th ACM India Computing Conference

October 2014

175 pages

ISBN:9781605588148

DOI:10.1145/2675744

General Chairs: Pushpak Bhattacharya (IIT, Mumbai), P. J. Narayanan (IIIT Hyderabad)

Program Chair: Srinivas Padmanabhuni (ACM India and Infosys Labs)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

Compute '14: ACM India Compute Conference, October 9-11, 2014, Nagpur, India

Sponsor: Google India

Acceptance Rates

COMPUTE '14 paper acceptance rate: 21 of 110 submissions (19%)

Overall acceptance rate: 114 of 622 submissions (18%)

Bibliometrics

Total citations: 1

Total downloads: 87

Downloads (last 12 months): 4

Downloads (last 6 weeks): 0

Reflects downloads up to 16 Dec 2024
