ClinLinker: Medical Entity Linking of Clinical Concept Mentions in Spanish

Published: 02 July 2024

Abstract

Advances in natural language processing techniques, such as named entity recognition and normalization to widely used standardized terminologies like UMLS or SNOMED-CT, along with the digitalization of electronic health records, have significantly advanced clinical text analysis. This study presents ClinLinker, a novel approach employing a two-phase pipeline for medical entity linking that leverages the potential of in-domain adapted language models for biomedical text mining: initial candidate retrieval using a SapBERT-based bi-encoder, followed by re-ranking with a cross-encoder trained with a contrastive-learning strategy tailored to medical concepts in Spanish. This methodology, initially focused on Spanish-language content, substantially outperforms multilingual language models designed for the same purpose, even in complex scenarios involving heterogeneous medical terminologies and when trained on only a subset of the original data. Our results, evaluated using top-k accuracy at k = 25 and other top-k metrics, demonstrate our approach’s performance on two distinct clinical entity linking Gold Standard corpora, DisTEMIST (diseases) and MedProcNER (clinical procedures), outperforming previous benchmarks by 40 points in DisTEMIST and 43 points in MedProcNER, both normalized to SNOMED-CT codes. These findings highlight our approach’s ability to address language-specific nuances and set a new benchmark in entity linking, offering a potent tool for enhancing the utility of digital medical records. The resulting system is of practical value, both for large-scale automatic generation of structured data derived from clinical records and for exhaustive extraction and harmonization of predefined clinical variables of interest.
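The two-phase pipeline described in the abstract — bi-encoder candidate retrieval followed by cross-encoder re-ranking, evaluated with top-k accuracy — can be sketched as follows. This is a minimal illustration only: the embedding and pair-scoring functions are toy stand-ins (hashed character trigrams and character-set overlap), not the SapBERT bi-encoder or contrastively trained cross-encoder actually used by ClinLinker, and the function names are hypothetical.

```python
import numpy as np

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedding via hashed character trigrams (stand-in for a bi-encoder)."""
    v = np.zeros(dim)
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        v[hash(padded[i:i + 3]) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve_candidates(mention: str, concepts: list[str], k: int = 25):
    """Phase 1: retrieve the top-k candidate concepts by embedding similarity."""
    q = toy_embed(mention)
    emb = np.stack([toy_embed(c) for c in concepts])
    scores = emb @ q  # cosine similarity (embeddings are unit-normalized)
    order = np.argsort(-scores)[:k]
    return [(concepts[i], float(scores[i])) for i in order]

def rerank(mention: str, candidates: list[tuple[str, float]]):
    """Phase 2: re-score (mention, candidate) pairs jointly.
    A real cross-encoder reads both texts together; here a toy overlap score."""
    def pair_score(m: str, c: str) -> float:
        a, b = set(m.lower()), set(c.lower())
        return len(a & b) / len(a | b)
    return sorted(candidates, key=lambda mc: -pair_score(mention, mc[0]))

def top_k_accuracy(gold: list[str], ranked: list[list[tuple[str, float]]], k: int):
    """Fraction of mentions whose gold concept appears in the top k candidates."""
    hits = sum(1 for g, r in zip(gold, ranked) if g in [c for c, _ in r[:k]])
    return hits / len(gold)
```

In the actual system, `retrieve_candidates` would encode the SNOMED-CT concept inventory once with the bi-encoder and search it with an approximate-nearest-neighbor index, and `rerank` would be a fine-tuned transformer scoring each mention–candidate pair jointly.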



Published In

Computational Science – ICCS 2024: 24th International Conference, Malaga, Spain, July 2–4, 2024, Proceedings, Part V
July 2024, 457 pages
ISBN: 978-3-031-63774-2
DOI: 10.1007/978-3-031-63775-9

Publisher

Springer-Verlag, Berlin, Heidelberg


Author Tags

1. Encoder-only large language model
2. Contrastive learning
3. Biomedical text mining
4. Medical entity linking
5. SNOMED-CT

Qualifiers

• Article
