Abstract
Diagnostic coding is the process by which written, verbal, and other patient-case documentation is used to enable disease prediction, accurate documentation, and insurance settlements. It remains a predominantly manual process, even in countries that have successfully adopted Electronic Health Record (EHR) systems. The problem is exacerbated in developing countries, where adoption of EHR systems is still not on par with Western counterparts. EHRs contain a wealth of patient information embedded in numerical, text, and image formats. A disease prediction model that exploits all of this information to enable faster and more accurate diagnosis would be highly beneficial. We address this challenging task by proposing mixed ensemble models, consisting of boosting and deep learning architectures, for diagnostic code group prediction. The models are trained on a dataset created by integrating features from structured data (lab test reports) and unstructured data (clinical text). We analyze the proposed models' performance on MIMIC-III, an open dataset of clinical data, using standard multi-label metrics. Empirical evaluation shows that our approach performs strongly on this task compared with state-of-the-art works that rely on a single data source. Our novelty lies in effectively integrating relevant information from both data sources, thereby ensuring larger ICD-9 code coverage; in handling the inherent class imbalance; and in adopting a novel approach to forming the ensemble models.
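The pipeline outlined in the abstract (fusing structured and unstructured features, combining a boosting model with a deep model, and scoring with multi-label metrics) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, fusion is shown as plain concatenation, the ensemble is shown as weighted soft voting, and the 0.5 threshold is arbitrary. The abstract does not specify how the authors fuse features or form their ensembles, so this should not be read as their method.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def fuse_features(structured: Sequence[float], text_embedding: Sequence[float]) -> List[float]:
    """Concatenate lab-test features with a text embedding (assumed fusion scheme)."""
    return list(structured) + list(text_embedding)


def soft_vote(prob_sets: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted average of per-label probabilities from several models (assumed ensemble rule)."""
    total = sum(weights)
    n_samples, n_labels = len(prob_sets[0]), len(prob_sets[0][0])
    return [
        [sum(w * p[i][j] for p, w in zip(prob_sets, weights)) / total
         for j in range(n_labels)]
        for i in range(n_samples)
    ]


def binarize(probs: Matrix, threshold: float = 0.5) -> List[List[int]]:
    """Turn per-label probabilities into 0/1 code-group predictions."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of (sample, label) cells that are predicted incorrectly."""
    wrong = sum(t != p for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp))
    return wrong / (len(y_true) * len(y_true[0]))


def micro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives over all labels."""
    tp = fp = fn = 0
    for rt, rp in zip(y_true, y_pred):
        for t, p in zip(rt, rp):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy per-label probabilities for two patients and three code groups (made-up numbers).
    boosting_probs = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.7]]
    deep_probs = [[0.7, 0.4, 0.8], [0.3, 0.6, 0.1]]
    y_true = [[1, 0, 1], [0, 1, 1]]

    y_pred = binarize(soft_vote([boosting_probs, deep_probs], weights=[1.0, 1.0]))
    print("hamming loss:", hamming_loss(y_true, y_pred))
    print("micro F1:", micro_f1(y_true, y_pred))
```

Micro-averaging pools counts across all code groups, so frequent groups dominate the score; under class imbalance it is usually worth reporting a macro-averaged figure alongside it.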
A. Prabhakar and S. Srinivasan—Equal contribution.
G. S. Krishnan—Author contributed to this work as part of Ph.D. research in HALE Lab, NITK.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Prabhakar, A., Srinivasan, S., Krishnan, G.S., Kamath, S.S. (2021). Diagnostic Code Group Prediction by Integrating Structured and Unstructured Clinical Data. In: Srirama, S.N., Lin, J.CW., Bhatnagar, R., Agarwal, S., Reddy, P.K. (eds) Big Data Analytics. BDA 2021. Lecture Notes in Computer Science, vol 13147. Springer, Cham. https://doi.org/10.1007/978-3-030-93620-4_15
DOI: https://doi.org/10.1007/978-3-030-93620-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93619-8
Online ISBN: 978-3-030-93620-4
eBook Packages: Computer Science, Computer Science (R0)