Towards Malay Abbreviation Disambiguation: Corpus and Unsupervised Model

Haoyuan Bu¹¹, Nankai Lin¹², Lianxi Wang¹¹ & Shengyi Jiang¹¹

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14303)

Included in the following conference series:

CCF International Conference on Natural Language Processing and Chinese Computing

Abstract

Abbreviation disambiguation is a crucial natural language processing task in all languages, including Malay. Its objective is to identify, from a set of candidate definitions, the definition that best matches a given abbreviation based on contextual information. Research on Malay abbreviation disambiguation is hindered by the absence of a large-scale abbreviation database, which makes it difficult to train models. A further challenge is to develop a Malay abbreviation disambiguation model that achieves satisfactory restoration performance without annotated samples, thereby helping readers comprehend the literature. To address these issues, we construct a dataset of Malay abbreviations and propose an unsupervised method based on a pre-trained model for abbreviation disambiguation. The method computes a perplexity score for each candidate definition of the abbreviation within the same sentence and selects the definition with the lowest perplexity score as the most suitable choice. On the constructed Malay dataset, the accuracy of our method is only 3% lower than that of the current state-of-the-art (SOTA) supervised approach, a clear advantage among unsupervised methods. In the SDU@AAAI-22 Shared Task 2: Acronym Disambiguation, our method is effective on all four test sets; performance is particularly notable on legal English, where it reaches an accuracy of 77.28%.
The source code and dataset of this paper are publicly available at https://github.com/bhysss/TMAD-CUM.
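
The selection rule described in the abstract (score each candidate definition in the sentence, keep the one with the lowest perplexity) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper scores candidates with a pre-trained model, whereas the sketch below assumes a toy add-one-smoothed bigram model so that it runs without any downloads. The function names `train_bigram_lm` and `disambiguate`, the English sentences, and the candidate definitions are all hypothetical and do not come from the paper's dataset.

```python
import math
from collections import Counter
from typing import Callable, Iterable, Sequence


def train_bigram_lm(corpus: Iterable[str]) -> Callable[[str], float]:
    """Build a perplexity function from an add-one-smoothed bigram model.

    Assumption: this toy model only stands in for the pre-trained
    model used in the paper; it is not the authors' scorer.
    """
    contexts: Counter = Counter()  # how often each token precedes another
    bigrams: Counter = Counter()   # counts of adjacent token pairs
    vocab = set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()
        vocab.update(tokens)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(vocab) + 1  # one extra slot for unseen words

    def perplexity(sentence: str) -> float:
        tokens = ["<s>"] + sentence.lower().split()
        pairs = list(zip(tokens, tokens[1:]))
        if not pairs:
            return float("inf")
        log_prob = sum(
            math.log((bigrams[p] + 1) / (contexts[p[0]] + vocab_size))
            for p in pairs
        )
        # Perplexity is the exponential of the mean negative log-probability.
        return math.exp(-log_prob / len(pairs))

    return perplexity


def disambiguate(
    sentence: str,
    abbreviation: str,
    candidates: Sequence[str],
    perplexity: Callable[[str], float],
) -> str:
    """Substitute each candidate for the abbreviation and return the
    candidate whose substituted sentence has the lowest perplexity."""
    return min(
        candidates,
        key=lambda definition: perplexity(sentence.replace(abbreviation, definition)),
    )


# Hypothetical toy corpus and candidates, invented for illustration only.
corpus = [
    "the model was trained with machine learning methods",
    "machine learning methods need training data",
    "we estimate parameters by maximum likelihood estimation",
    "maximum likelihood estimation fits the parameters",
]
ppl = train_bigram_lm(corpus)
candidates = ["machine learning", "maximum likelihood"]

print(disambiguate("we estimate parameters by ML estimation", "ML", candidates, ppl))
# -> maximum likelihood
print(disambiguate("the model was trained with ML methods", "ML", candidates, ppl))
# -> machine learning
```

Because `disambiguate` takes the scoring function as a parameter, the toy bigram scorer could in principle be replaced by a perplexity estimate from a pre-trained language model without changing the selection logic.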

Acknowledgement

This work was supported by the National Social Science Fund of China (No. 22BTQ045).

Author information

Authors and Affiliations

School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, 510006, Guangdong, China
Haoyuan Bu, Lianxi Wang & Shengyi Jiang
School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, 510006, Guangdong, China
Nankai Lin

Corresponding authors

Correspondence to Nankai Lin or Shengyi Jiang.

Editor information

Editors and Affiliations

Emory University, Atlanta, GA, USA
Fei Liu
Microsoft Research Asia, Beijing, China
Nan Duan
Soochow University, Suzhou, China
Qingting Xu
Soochow University, Suzhou, China
Yu Hong

About this paper

Cite this paper

Bu, H., Lin, N., Wang, L., Jiang, S. (2023). Towards Malay Abbreviation Disambiguation: Corpus and Unsupervised Model. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol 14303. Springer, Cham. https://doi.org/10.1007/978-3-031-44696-2_6

DOI: https://doi.org/10.1007/978-3-031-44696-2_6
Published: 08 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44695-5
Online ISBN: 978-3-031-44696-2
eBook Packages: Computer Science, Computer Science (R0)
