Dear Editors,
We write to respond to Hond et al.’s recent study entitled “Machine learning for developing a prediction model of hospital admission of emergency department patients: Hype or hope?” [1]. In their study, they highlighted the benefits of reducing Emergency Department (ED) crowding and how prediction of admission can help to achieve this, and they demonstrated that logistic regression and machine learning (ML) models are equally excellent predictors of hospital admission.
Hond et al. cite Afnan et al.’s meta-analysis [2] to suggest that triage nurse prediction is “not good enough to accurately predict the hospitalization of ED patients”. Whilst it is true that triage nurse predictions are not accurate enough on their own to allow one-to-one planning of admission, the meta-analysis showed that triage nurse predictions do significantly correlate with outcome [2]: when a nurse predicts admission, a patient’s chance of needing a hospital bed increases significantly, from a baseline rate of 29% to 63%. Likewise, a nurse’s prediction of discharge decreases the chance of admission to 12%. Therefore, triage nurse prediction of disposition could constitute a covariate in an ML prediction model which only draws on early data (<30 min).
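To make the size of this effect concrete, the three percentages quoted above can be converted into odds. The following is a minimal sketch in Python, not a result reported by the meta-analysis [2]: the likelihood ratios and odds ratio are simply those implied by the figures 29%, 63% and 12%, under the assumption that all three describe the same pooled population. The function and variable names are our own.

```python
import math


def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


def logit(p: float) -> float:
    """Natural log of the odds."""
    return math.log(odds(p))


# Percentages quoted in the text from the meta-analysis [2].
BASELINE = 0.29                 # admission rate before any nurse prediction
AFTER_PREDICT_ADMIT = 0.63      # admission rate when the nurse predicts admission
AFTER_PREDICT_DISCHARGE = 0.12  # admission rate when the nurse predicts discharge

# Implied likelihood ratios: post-test odds divided by pre-test odds.
lr_positive = odds(AFTER_PREDICT_ADMIT) / odds(BASELINE)
lr_negative = odds(AFTER_PREDICT_DISCHARGE) / odds(BASELINE)

# Implied odds ratio between the two nurse predictions.
odds_ratio = odds(AFTER_PREDICT_ADMIT) / odds(AFTER_PREDICT_DISCHARGE)

# A logistic model with the nurse prediction as its only covariate
# reproduces the two conditional rates exactly (assumption: no other features).
intercept = logit(AFTER_PREDICT_DISCHARGE)
coefficient = math.log(odds_ratio)


def predicted_admission_probability(nurse_predicts_admission: bool) -> float:
    """Admission probability from the single-covariate logistic model."""
    z = intercept + coefficient * (1 if nurse_predicts_admission else 0)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    print(f"Implied LR+: {lr_positive:.2f}")
    print(f"Implied LR-: {lr_negative:.2f}")
    print(f"Implied odds ratio: {odds_ratio:.1f}")
```

Under these assumptions the implied positive and negative likelihood ratios are roughly 4.2 and 0.33, and the implied odds ratio is roughly 12.5. In a multivariable model the coefficient for nurse prediction would be adjusted for the other covariates and would need to be estimated from data.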
The model’s use of the number of consultations as a component of complexity/comorbidity is based on a 2018 prospective cohort study which identified several factors, including the number of consultations, that predicted length of stay (LOS) [3]. Even under the assumption that LOS is a reasonable proxy for comorbidity, the advent of COVID-19 since that study complicates the use of “number of consultations” as a component. Many patients presenting to the ED with COVID-19 would have multiple comorbidities, but staff may wish to minimise contact with these patients and instead expedite their transfer to ITU/COVID wards. For EDs which see a high volume of COVID presentations, “number of consultations” may correlate positively with case complexity for some patients and inversely for others, so using it as a feature for model training could cause the model to overfit.
Hond et al. claim that uninterpretable ML models will outperform logistic regression when faced with complex data patterns, but this is not necessarily true, especially when the model’s features are meaningful to domain experts [4]. This applies to the ED: for example, a patient’s presenting complaint and disease severity score, both of which are meaningful to ED clinicians as predictors of needing hospital care, are features included in Hond et al.’s model [1]. This is consistent with their results, which showed no predictive benefit of their ML model over their logistic regression model.
Several ethical and epistemic problems with uninterpretable ML algorithms have been highlighted in other healthcare domains [5], and many of these concerns generalise to ED admission prediction. For example, the algorithm may generalise poorly to different populations due to the presence of confounding variables in the initial evaluation (as has been the case for an ML algorithm designed to detect intracranial haemorrhage [6]); if this were the case in the context of admission prediction, patients from certain demographics may be less likely to receive a hospital bed promptly than others. For instance, patients with learning disabilities may have greater difficulty reporting pain [7], pulse oximetry is worse at detecting hypoxemia in Black patients than in White patients [8], and there may be many more confounders we would fail to consider. Furthermore, how, in practice, would a healthcare professional convince a patient that they need to stay in hospital based on an entirely inscrutable algorithm’s recommendation? Would healthcare professionals and bed managers themselves be inclined to trust such an algorithm? Interpretable models could mitigate these concerns because healthcare professionals can check a model’s reasoning process for plausibility, errors and confounders, which would also promote trust when using the tool.
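As an illustration of what checking a model’s reasoning could look like, the sketch below shows an additive (logistic) score in Python in which each feature’s contribution to the log-odds of admission is visible. It is only a sketch: the intercept, the coefficients and the feature names are hypothetical values chosen for illustration, not estimates from Hond et al.’s model [1] or from any fitted model.

```python
import math

# HYPOTHETICAL intercept and coefficients, for illustration only.
# They are not taken from any published or fitted model.
INTERCEPT = -2.0
COEFFICIENTS = {
    "nurse_predicts_admission": 2.5,
    "high_severity_score": 1.2,
    "presenting_complaint_chest_pain": 0.6,
}


def explain(patient: dict) -> tuple[float, dict]:
    """Return the admission probability and each feature's log-odds contribution.

    `patient` maps feature names to 0/1 indicators; missing features count as 0.
    """
    contributions = {
        name: coef * patient.get(name, 0) for name, coef in COEFFICIENTS.items()
    }
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, contributions


if __name__ == "__main__":
    probability, contributions = explain(
        {"nurse_predicts_admission": 1, "high_severity_score": 1}
    )
    print(f"Predicted admission probability: {probability:.2f}")
    for name, value in contributions.items():
        # A clinician can inspect which features drove the prediction.
        print(f"  {name}: {value:+.1f} log-odds")
```

Because the contributions simply add up, a clinician or bed manager can see which features drove a given prediction and question any that look implausible or confounded.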
We read Hond et al.’s study with great enthusiasm and hope they will consider creating and evaluating interpretable ML models for admission prediction, for which triage nurse prediction could be a covariate.
Authors’ statement
MAMA conceived the idea, wrote the initial draft and approved of the manuscript’s submission.
FA contributed substantially to writing the draft and approved of the manuscript’s submission.
HW, TN, PS, and CK critically revised the manuscript and approved of the manuscript’s submission.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
1. De Hond A., Raven W., Schinkelshoek L., Gaakeer M., Ter Avest E., Sir O., Lameijer H., Hessels R.A., Reijnen R., De Jonge E., Steyerberg E., Nickel C.H., De Groot B. Machine learning for developing a prediction model of hospital admission of emergency department patients: Hype or hope? Int. J. Med. Inform. 2021;152:104496. doi: 10.1016/j.ijmedinf.2021.104496.
2. Afnan M.A.M., Netke T., Singh P., Worthington H., Ali F., Kajamuhan C., Nagra A. Ability of triage nurses to predict, at the time of triage, the eventual disposition of patients attending the emergency department (ED): A systematic literature review and meta-analysis. Emerg. Med. J. 2020. doi: 10.1136/emermed-2019-208910.
3. van der Veen D., Remeijer C., Fogteloo A.J., Heringhaus C., de Groot B. Independent determinants of prolonged emergency department length of stay in a tertiary care centre: a prospective cohort study. Scand. J. Trauma. Resusc. Emerg. Med. 2018;26:81. doi: 10.1186/s13049-018-0547-5.
4. Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019;1(5):206–215. doi: 10.1038/s42256-019-0048-x.
5. Afnan M.A.M., Rudin C., Conitzer V., Savulescu J., Mishra A., Liu Y., Afnan M. Ethical implementation of artificial intelligence to select embryos in in vitro fertilization. 2021. arXiv:2105.00060.
6. Voter A.F., Meram E., Garrett J.W., Yu J.-P.-J. Diagnostic accuracy and failure mode analysis of a deep learning algorithm for the detection of intracranial hemorrhage. J. Am. Coll. Radiol. 2021. doi: 10.1016/j.jacr.2021.03.005.
7. Moulster G. Identifying pain in people who have complex communication needs. Clin. Pract. Rev. Learn. Disabil. 2020;116:19–22.
8. Sjoding M.W., Dickson R.P., Iwashyna T.J., Gay S.E., Valley T.S. Racial bias in pulse oximetry measurement. N. Engl. J. Med. 2020;383(25):2477–2478. doi: 10.1056/NEJMc2029240.