Applying unsupervised keyphrase methods on concepts extracted from discharge sheets

Hoda Memarzadeh¹,
Nasser Ghadiri ORCID: orcid.org/0000-0002-6519-6548¹,
Matthias Samwald² &
…
Maryam Lotfi Shahreza³

161 Accesses
Explore all metrics

Abstract

Clinical notes contain valuable patient information. These notes are written by health care providers with various scientific levels and writing styles. It might be helpful for clinicians and researchers to understand what information is essential when dealing with extensive electronic medical records. Entities recognizing them and mapping them to standard terminologies is crucial to reducing ambiguity in processing clinical notes. Although named entity recognition and entity linking are critical steps in clinical natural language processing, they can produce repetitive and low-value concepts. On the other hand, all parts of a clinical text do not share the same importance or content in predicting the patient's condition. As a result, it is necessary to identify the section in which each content item is recorded and critical concepts to extract meaning from clinical texts. In this study, these challenges have been addressed by using clinical natural language processing techniques. In addition, a set of unsupervised essential phrase extraction methods has been verified and evaluated to identify key concepts. Considering that most clinical concepts are in the form of multi-word expressions and their accurate identification requires the user to specify an n-gram range, we have proposed a shortcut method to preserve the structure of the term based on TF-IDF (Term Frequency Inverse Document Frequency). To evaluate, we have designed two types of downstream tasks (multiple and binary classification) using the capabilities of transformer-based models. The results show the proposed method's superiority in combination with the SciBERT model. Also, they offer an insight into the efficacy of general methods for extracting essential phrases from clinical notes.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Institutional subscriptions

Rule-Based Method for Automatic Medical Concept Extraction from Unstructured Clinical Text

Clinical Text Retrieval - An Overview of Basic Building Blocks and Applications

An Approach to Extract Meaningful Data from Unstructured Clinical Notes

Data availability

The datasets generated during and/or analyzed during the current study are available at https://physionet.org/content/mimiciii/1.4/.

Code availability

Codes are available at https://github.com/HodaMemar/A3.

Notes

International statistical classification of diseases and related health problems.
The normalized naming system for generic and branded drugs.
Logical Observation Identifiers Names and Codes.
https://github.com/HodaMemar/A3.

References

Dalianis H (2018) Clinical text mining: secondary use of electronic patient records. Springer, Cham
Book Google Scholar
Holzinger A, Haibe-Kains B, Jurisica I (2019) Why imaging data alone is not enough: AI-based integration of imaging, omics, and clinical data. Eur J Nucl Med Mol Imaging 46(13):2722–2730
Article Google Scholar
Yadav P, Steinbach M, Kumar V, Simon G (2018) Mining Electronic Health Records (EHRs) A Survey. ACM Comput Surv 50(6):1–40
Article Google Scholar
Ford E, Carroll JA, Smith HE, Scott D, Cassell JA (2016) Extracting information from the text of electronic medical records to improve case detection: a systematic review. J Am Med Informatics Assoc 23(5):1007–1015. https://doi.org/10.1093/jamia/ocv180
Article Google Scholar
Sheikhalishahi S, Miotto R, Dudley JT, Lavelli A, Rinaldi F, Osmani V (2019) Natural language processing of clinical notes on chronic diseases: systematic review. JMIR Med informatics 7(2):e12239
Article Google Scholar
Liu Z, Lin Y, Sun M (2020) “Document representation bt - representation learning for natural language processing. Springer, Singapore, pp 91–123
Google Scholar
Sammut C, Webb GI (2010) TF–IDF BT-encyclopedia of machine learning. Springer, Boston, pp 986–987
Book Google Scholar
Darabi S, Kachuee M, Fazeli S, Sarrafzadeh M (2020) TAPER: Time-aware patient EHR representation. IEEE J Biomed Heal Informatics 24(11):3268–3275. https://doi.org/10.1109/JBHI.2020.2984931
Article Google Scholar
Sushil M, Šuster S, Luyckx K, Daelemans W (2018) Patient representation learning and interpretable evaluation using clinical notes. J Biomed Inform 84:103–113. https://doi.org/10.1016/j.jbi.2018.06.016
Article Google Scholar
Chen T, Guestrin C (2016) Xgboost: a scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, (pp. 785–794) https://doi.org/10.1145/2939672.2939785
Eyre H et al (2022) Launching into clinical space with medspaCy: a new clinical text processing toolkit in Python. AMIA Annu Symp Proc 2021:438–447
Google Scholar
Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG (2001) A Simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform 34(5):301–310. https://doi.org/10.1006/jbin.2001.1029
Article Google Scholar
Neumann M, King D, Beltagy I, Ammar W (2019) ScispaCy: Fast and robust models for biomedical natural language processing. BioNLP 2019 - SIGBioMed Work. Biomed. Nat. Lang. Process. Proc. 18th BioNLP Work. Shar. Task, (pp. 319–327). https://doi.org/10.18653/v1/w19-5034.
“sklearn.feature_extraction.text.TfidfVectorizer.” https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
Boudin F (2016) PKE: an open source python-based keyphrase extraction toolkit. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics: system demonstrations, (pp. 69–73) [Online]. Available: https://github.com/boudinfl/pke
Mahata D, Kuriakose J, Shah R, Zimmermann R (2018) Key2vec: Automatic ranked keyphrase extraction from scientific articles using phrase embeddings. In Proceedings of the 2018 conference of the North American chapter of the association for computational linguistics: Human Language Technologies, Volume 2 (Short Papers), 2018, pp. 634–639
Maarten Grootendorst, “Keyword Extraction with BERT,” Towar. Data Sci., 2020, [Online]. Available: https://towardsdatascience.com/keyword-extraction-with-bert-724efca412ea
Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv Prepr. arXiv1810.04805
Gu Y et al (2021) Domain-specific language model pretraining for biomedical natural language processing. ACM Trans Comput Healthc 3(1):1–23
Article Google Scholar
Beltagy I, Lo K, Cohan A (2019) SCIBERT: A pretrained language model for scientific text. EMNLP-IJCNLP 2019 - 2019 Conf. Empir. Methods Nat. Lang. Process. 9th Int. Jt. Conf. Nat. Lang. Process. Proc. Conf., pp. 3615–3620. https://doi.org/10.18653/v1/d19-1371
Yogarajan V, Montiel J, Smith T, Pfahringer B (2021) Transformers for multi-label classification of medical text: an empirical comparison. In International Conference on Artificial Intelligence in Medicine, pp. 114–123
Yogarajan V (2022) Domain-specific language models for multi-label classification of medical text. The University of Waikato, New Zealand
Google Scholar
Duque A, Fabregat H, Araujo L, Martinez-Romo J (2021) A keyphrase-based approach for interpretable ICD-10 code classification of Spanish medical reports. Artif. Intell. Med. 121:102177. https://doi.org/10.1016/j.artmed.2021.102177
Article Google Scholar
Schopf T, Klimek S, Matthes F (2022) PatternRank: leveraging pretrained language models and part of speech for unsupervised keyphrase extraction. arXiv Prepr. arXiv2210.05245, 2022
Peng Y, Yan S, Lu Z (2019) Transfer learning in biomedical natural language processing: an evaluation of BERT and ELMo on ten benchmarking datasets. BioNLP 2019 - SIGBioMed Work. Biomed. Nat. Lang. Process. Proc. 18th BioNLP Work. Shar. Task, pp. 58–65. https://doi.org/10.18653/v1/w19-5006.
Michalopoulos G, Wang Y, Kaka H, Chen H, Wong A (2021) UmlsBERT: clinical domain knowledge augmentation of contextual embeddings using the unified medical language system metathesaurus. arXiv Prepr. arXiv2010.10391

Download references

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, 84156-83111, Iran
Hoda Memarzadeh & Nasser Ghadiri
Institute for Artificial Intelligence, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Vienna, Austria
Matthias Samwald
Department of Computer Engineering, Shahreza Campus, University of Isfahan, Isfahan, Iran
Maryam Lotfi Shahreza

Authors

Hoda Memarzadeh
View author publications
You can also search for this author in PubMed Google Scholar
Nasser Ghadiri
View author publications
You can also search for this author in PubMed Google Scholar
Matthias Samwald
View author publications
You can also search for this author in PubMed Google Scholar
Maryam Lotfi Shahreza
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Nasser Ghadiri.

Ethics declarations

Conflict of interest

The authors have no conflict of interest to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Memarzadeh, H., Ghadiri, N., Samwald, M. et al. Applying unsupervised keyphrase methods on concepts extracted from discharge sheets. Pattern Anal Applic 26, 1715–1727 (2023). https://doi.org/10.1007/s10044-023-01198-0

Download citation

Received: 04 November 2022
Accepted: 06 September 2023
Published: 29 September 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s10044-023-01198-0

Applying unsupervised keyphrase methods on concepts extracted from discharge sheets

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Rule-Based Method for Automatic Medical Concept Extraction from Unstructured Clinical Text

Clinical Text Retrieval - An Overview of Basic Building Blocks and Applications

An Approach to Extract Meaningful Data from Unstructured Clinical Notes

Data availability

Code availability

Notes

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Applying unsupervised keyphrase methods on concepts extracted from discharge sheets

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Rule-Based Method for Automatic Medical Concept Extraction from Unstructured Clinical Text

Clinical Text Retrieval - An Overview of Basic Building Blocks and Applications

An Approach to Extract Meaningful Data from Unstructured Clinical Notes

Data availability

Code availability

Notes

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation