ClinLinker: Medical Entity Linking of Clinical Concept Mentions in Spanish

Published: 02 July 2024

Abstract

Advances in natural language processing techniques, such as named entity recognition and normalization to widely used standardized terminologies like UMLS or SNOMED-CT, along with the digitalization of electronic health records, have significantly advanced clinical text analysis. This study presents ClinLinker, a novel approach employing a two-phase pipeline for medical entity linking that leverages the potential of in-domain adapted language models for biomedical text mining: initial candidate retrieval using a SapBERT-based bi-encoder, followed by re-ranking with a cross-encoder trained with a contrastive-learning strategy tailored to medical concepts in Spanish. This methodology, initially focused on Spanish-language content, substantially outperforms multilingual language models designed for the same purpose, even in complex scenarios involving heterogeneous medical terminologies and when trained on only a subset of the original data. Our results, evaluated using top-k accuracy at k = 25 and other top-k metrics, demonstrate our approach’s performance on two distinct clinical entity linking Gold Standard corpora, DisTEMIST (diseases) and MedProcNER (clinical procedures), outperforming previous benchmarks by 40 points in DisTEMIST and 43 points in MedProcNER, both normalized to SNOMED-CT codes. These findings highlight our approach’s ability to address language-specific nuances and set a new benchmark in entity linking, offering a potent tool for enhancing the utility of digital medical records. The resulting system is of practical value, both for large-scale automatic generation of structured data derived from clinical records and for exhaustive extraction and harmonization of predefined clinical variables of interest.
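The two-phase pipeline described in the abstract — bi-encoder candidate retrieval followed by cross-encoder re-ranking, evaluated with top-k accuracy — can be sketched as follows. This is a minimal illustration only: the embedding and pair-scoring functions are toy stand-ins (hashed character trigrams and character-set overlap), not the SapBERT bi-encoder or contrastively trained cross-encoder actually used by ClinLinker, and the function names are hypothetical.

```python
import numpy as np

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedding via hashed character trigrams (stand-in for a bi-encoder)."""
    v = np.zeros(dim)
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        v[hash(padded[i:i + 3]) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve_candidates(mention: str, concepts: list[str], k: int = 25):
    """Phase 1: retrieve the top-k candidate concepts by embedding similarity."""
    q = toy_embed(mention)
    emb = np.stack([toy_embed(c) for c in concepts])
    scores = emb @ q  # cosine similarity (embeddings are unit-normalized)
    order = np.argsort(-scores)[:k]
    return [(concepts[i], float(scores[i])) for i in order]

def rerank(mention: str, candidates: list[tuple[str, float]]):
    """Phase 2: re-score (mention, candidate) pairs jointly.
    A real cross-encoder reads both texts together; here a toy overlap score."""
    def pair_score(m: str, c: str) -> float:
        a, b = set(m.lower()), set(c.lower())
        return len(a & b) / len(a | b)
    return sorted(candidates, key=lambda mc: -pair_score(mention, mc[0]))

def top_k_accuracy(gold: list[str], ranked: list[list[tuple[str, float]]], k: int):
    """Fraction of mentions whose gold concept appears in the top k candidates."""
    hits = sum(1 for g, r in zip(gold, ranked) if g in [c for c, _ in r[:k]])
    return hits / len(gold)
```

In the actual system, `retrieve_candidates` would encode the SNOMED-CT concept inventory once with the bi-encoder and search it with an approximate-nearest-neighbor index, and `rerank` would be a fine-tuned transformer scoring each mention–candidate pair jointly.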



Published In

Computational Science – ICCS 2024: 24th International Conference, Malaga, Spain, July 2–4, 2024, Proceedings, Part V
July 2024, 457 pages
ISBN: 978-3-031-63774-2
DOI: 10.1007/978-3-031-63775-9

Publisher

Springer-Verlag, Berlin, Heidelberg


Author Tags

1. Encoder-only large language model
2. Contrastive learning
3. Biomedical text mining
4. Medical entity linking
5. SNOMED-CT

Qualifiers

• Article
