Diagnostic Code Group Prediction by Integrating Structured and Unstructured Clinical Data

Akshara Prabhakar¹³,
Shidharth Srinivasan¹³,
Gokul S. Krishnan¹³,¹⁴ &
Sowmya S. Kamath¹³

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13147)

Included in the following conference series:

International Conference on Big Data Analytics

Abstract

Diagnostic coding is a process in which written, verbal, and other patient-case documentation is used to enable disease prediction, accurate documentation, and insurance settlements. It remains a largely manual process, even in countries that have successfully adopted Electronic Health Record (EHR) systems. The problem is exacerbated in developing countries, where the adoption of EHR systems is still not on par with that of their Western counterparts. EHRs contain a wealth of patient information in numerical, text, and image formats. A disease prediction model that exploits all of this information to enable faster and more accurate diagnosis would be highly beneficial. We address this challenging task by proposing mixed ensemble models, consisting of boosting and deep learning architectures, for diagnostic code group prediction. The models are trained on a dataset created by integrating features from structured data (lab test reports) and unstructured data (clinical text). We evaluate the proposed models on MIMIC-III, an open dataset of clinical data, using standard multi-label metrics. Empirical evaluations show that our approach performs significantly better on this task than state-of-the-art works that rely on a single data source. Our novelty lies in effectively integrating relevant information from both data sources, thereby ensuring larger ICD-9 code coverage, handling the inherent class imbalance, and adopting a novel approach to forming the ensemble models.
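The abstract does not spell out how the two data sources are combined or how the ensembles are formed. As a rough illustration only, the sketch below assumes early fusion (concatenating lab-test features with a clinical-text embedding) and plain soft voting (averaging per-label probabilities from a boosting model and a deep model), then scores the thresholded predictions with two standard multi-label metrics, Hamming loss and micro-averaged F1. Every function name, the 0.5 threshold, and the toy probabilities are hypothetical and invented for this example; this is not the authors' method.

```python
"""Hypothetical sketch: early feature fusion, soft-voting ensemble, multi-label metrics."""
from typing import List, Sequence

Matrix = List[List[float]]


def fuse_features(lab: Sequence[float], text: Sequence[float]) -> List[float]:
    """Early fusion (assumed): concatenate structured lab-test features
    with a clinical-text embedding for the same admission."""
    return list(lab) + list(text)


def soft_vote(model_probs: Sequence[Sequence[Sequence[float]]]) -> Matrix:
    """Average per-label probabilities across models.

    model_probs[m][i][j] is model m's probability that admission i
    belongs to code group j. All models must share the same shape.
    """
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    n_labels = len(model_probs[0][0])
    return [
        [sum(model_probs[m][i][j] for m in range(n_models)) / n_models
         for j in range(n_labels)]
        for i in range(n_samples)
    ]


def binarize(probs: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn probabilities into 0/1 code-group predictions with a fixed threshold."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


def hamming_loss(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (admission, code group) pairs that are predicted incorrectly."""
    total = len(y_true) * len(y_true[0])
    wrong = sum(t != p for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp))
    return wrong / total


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """F1 from true positives, false positives and false negatives pooled over all labels."""
    tp = fp = fn = 0
    for rt, rp in zip(y_true, y_pred):
        for t, p in zip(rt, rp):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (invented for illustration): 2 admissions, 3 code groups.
boosting_probs = [[0.9, 0.2, 0.6], [0.1, 0.7, 0.4]]
deep_probs = [[0.7, 0.4, 0.3], [0.3, 0.9, 0.8]]
y_true = [[1, 0, 1], [0, 1, 1]]

y_pred = binarize(soft_vote([boosting_probs, deep_probs]))
print(y_pred)                        # [[1, 0, 0], [0, 1, 1]]
print(hamming_loss(y_true, y_pred))  # 1 wrong out of 6 pairs
print(micro_f1(y_true, y_pred))      # tp=3, fp=0, fn=1, so 6/7
```

Soft voting is only one generic way to combine a boosting model with a deep model; the paper reports its own novel approach to forming the ensembles, which this sketch does not attempt to reproduce.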

A. Prabhakar and S. Srinivasan contributed equally.

G. S. Krishnan contributed to this work as part of Ph.D. research at the HALE Lab, NITK.

Author information

Authors and Affiliations

Healthcare Analytics and Language Engineering (HALE) Lab, Department of Information Technology, National Institute of Technology Karnataka, Surathkal, 575025, India
Akshara Prabhakar, Shidharth Srinivasan, Gokul S. Krishnan & Sowmya S. Kamath
Robert Bosch Centre for Data Science and Artificial Intelligence, Indian Institute of Technology, Madras, India
Gokul S. Krishnan

Corresponding author

Correspondence to Akshara Prabhakar.

Editor information

Editors and Affiliations

University of Hyderabad, Hyderabad, India
Satish Narayana Srirama
Western Norway University of Applied Sciences, Bergen, Norway
Jerry Chun-Wei Lin
University of Cincinnati, Cincinnati, OH, USA
Raj Bhatnagar
Indian Institute of Information Technology Allahabad, Prayagraj, India
Sonali Agarwal
International Institute of Information Technology, Hyderabad, India
P. Krishna Reddy

About this paper

Cite this paper

Prabhakar, A., Srinivasan, S., Krishnan, G.S., Kamath, S.S. (2021). Diagnostic Code Group Prediction by Integrating Structured and Unstructured Clinical Data. In: Srirama, S.N., Lin, J.CW., Bhatnagar, R., Agarwal, S., Reddy, P.K. (eds) Big Data Analytics. BDA 2021. Lecture Notes in Computer Science, vol 13147. Springer, Cham. https://doi.org/10.1007/978-3-030-93620-4_15

DOI: https://doi.org/10.1007/978-3-030-93620-4_15
Published: 18 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93619-8
Online ISBN: 978-3-030-93620-4
eBook Packages: Computer Science, Computer Science (R0)
