1 Background

Despite considerable advances in anesthesia, ventilation, surgical, and monitoring technologies, about 25% of surgical patients suffer from postoperative complications [1]. Serious adverse events in the time following surgery include respiratory failure, cardiac arrest, severe sepsis, and acute renal failure. These conditions constitute life-threatening risks, in particular for elderly patients and emergency surgery patients [2]. In 2012, Pearse et al. performed a large European study of patient outcomes after non-cardiac surgery [3] and found an overall in-hospital mortality of 4%, which reveals the extent of the problem.

With respect to patient safety, the general ward is the hardest setting in the perioperative care journey in which to detect patient deterioration. For instance, patients there are more likely to undergo a potentially avoidable cardiac arrest and are less likely to survive it than patients in intensive care [4]. Although early warning scores have been introduced on the ward, the resources to perform the required assessments and to integrate these into the hospital data management system are limited. The intermittent monitoring and low nurse-to-patient ratio both contrast with the high levels of vigilance and monitoring in the operating room and intensive care unit (ICU). Given the increased vulnerability of patients, it is no surprise that deterioration sometimes remains unnoticed under these conditions [5]. On this point, Pearse et al. noted that 73% of the patients who died after surgery were never admitted to a higher level of care [3]. This emphasizes the need for early recognition of post-operative patient deterioration.

In 1992, Silber et al. introduced the concept of failure to rescue, defined as hospital death after an adverse occurrence such as a postoperative complication [6]. It was later highlighted that the timely recognition and management of postoperative complications may actually be responsible for variation in hospital mortality rates [1]. Rapid response systems have emerged to address this failure to rescue. Two components can be distinguished: the afferent limb, aimed at detecting deterioration, and the efferent limb, aimed at responding to it. Although both parts are important in improving patient outcomes, we will focus on the former, as advanced data analytics could significantly affect the performance of early warning algorithms.

This paper first describes the current standard of care for the prediction or detection of patient deterioration throughout the perioperative course, and analyzes its current weaknesses and limitations. Then, it discusses analytical frameworks that have emerged to improve postoperative deterioration detection and the key steps to improve perioperative clinical decision support. Finally, the paper concludes with recommendations for future work in the area.

2 Current methods for identification of high-risk (post-) surgical patients

The identification of high-risk surgical patients can take place at different stages of the care process: preoperatively, intraoperatively, or postoperatively [7]. The emergence of electronic medical records has facilitated access to an increasing number of features such as demographics, comorbidities, vital signs, and laboratory data. Consequently, several research teams have explored the use of analytics for the prediction or detection of postoperative deterioration. The developed tools range from single- or multi-parameter predictors to aggregate scores and machine learning algorithms. While preoperative and intraoperative models tend to focus on risk stratification, postoperative models focus more on the (early) detection of patient deterioration. Figure 1 gives a brief overview of the features and scores available at the different steps of perioperative care. These will be covered in more detail in the next few sections.

Fig. 1 Overview of the available features and scores currently used throughout the perioperative journey

2.1 Pre- and intra-operative risk stratification

Surgical risk assessment is of great interest for various reasons. It may guide preoperative decision making and better inform patients and their families [8,9,10]. It also serves as a tool for research, comparative audit, and quality monitoring [8,9,10]. Furthermore, it can be used to plan clinical management, for example by adjusting the postoperative monitoring and level of care depending on the patient’s risk [10]. This last application is of particular interest with a view to address failure to rescue since it could allow a better allocation of resources directly after the operating room release and thus facilitate postoperative deterioration detection and timely intervention in a cost-effective way.

The commonly used ASA physical status score [11] is a very simple preoperative score that stratifies patients into several categories of surgical risk: ASA I, normal patients; ASA II, patients with mild systemic disease; ASA III, severe systemic disease; ASA IV, severe systemic disease that is a constant threat to life; ASA V, patients who are not expected to survive without the operation; and ASA VI, declared brain-dead patients. It is still widely used despite several studies showing inter-observer variability [12,13,14]. In addition to the ASA score, other scores have been developed to provide a preoperative risk assessment, mostly using logistic regression methods [15]. The resulting risk assessment can target in-hospital mortality [8, 16] or a specific type of complication, such as pulmonary complications [17, 18].

Patients undergo extensive monitoring in the operating room and, as such, research has started to investigate the large amount of data recorded during surgery in order to find predictors of patient outcomes. Some papers have highlighted the association between intraoperative factors and postoperative complications. For example, intraoperative hypotension is associated with myocardial damage and 30-day mortality after non-cardiac surgery [19, 20]. The coexisting occurrence of low blood pressure, low bispectral index, and low minimum alveolar concentration of volatile anesthesia, referred to as the “Triple Low”, has also been validated as a predictor of 30-day mortality [21, 22]. Intraoperative features may also be combined into an aggregate score. Gawande et al. developed the Surgical Apgar Score to assess the risk of major complication or death, based on the estimated amount of blood loss, the lowest heart rate, and the lowest mean arterial blood pressure [23]. The score has since been validated across various settings [24,25,26].
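
To make the structure of such an aggregate score concrete, the sketch below computes a Surgical Apgar-style score from the three intraoperative inputs named above. The point bands are paraphrased from commonly cited versions of the score and are illustrative only; they should be verified against the original publication [23] before any use.

```python
def surgical_apgar_like_score(blood_loss_ml: float, lowest_map_mmhg: float,
                              lowest_hr_bpm: float) -> int:
    """Illustrative Surgical Apgar-style score (0-10); higher means lower risk.

    Point bands are paraphrased from commonly cited versions of the score and
    should be checked against Gawande et al. [23] before any use.
    """
    # Estimated blood loss: more bleeding -> fewer points
    if blood_loss_ml > 1000:
        ebl_pts = 0
    elif blood_loss_ml > 600:
        ebl_pts = 1
    elif blood_loss_ml > 100:
        ebl_pts = 2
    else:
        ebl_pts = 3

    # Lowest mean arterial pressure: deeper hypotension -> fewer points
    if lowest_map_mmhg < 40:
        map_pts = 0
    elif lowest_map_mmhg < 55:
        map_pts = 1
    elif lowest_map_mmhg < 70:
        map_pts = 2
    else:
        map_pts = 3

    # Lowest heart rate: pronounced tachycardia -> fewer points
    if lowest_hr_bpm > 85:
        hr_pts = 0
    elif lowest_hr_bpm > 75:
        hr_pts = 1
    elif lowest_hr_bpm > 65:
        hr_pts = 2
    elif lowest_hr_bpm > 55:
        hr_pts = 3
    else:
        hr_pts = 4

    return ebl_pts + map_pts + hr_pts


# Example: 250 mL blood loss, lowest MAP 62 mmHg, lowest HR 78 bpm -> 2 + 2 + 1 = 5
print(surgical_apgar_like_score(250, 62, 78))
```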

However, the use of pre- and intra-operative risk stratification may have some shortcomings. Some of the pre- and intra-operative risk scores were developed from and/or for specific populations [27]. Thus, although these models may work very well in those populations, they may generalize poorly and yield low predictive accuracy in a ward with a more heterogeneous surgical population. Another drawback of surgical risk stratification lies in the widespread use of logistic regression models. These models assume a monotonic (increasing or decreasing) association between the predictor variables and the outcome, whereas in physiology the relation may be more complex (e.g., only very low and very high values may be indicative of risk, whereas values in a wide ‘normal’ range may have very low predictive value) [15]. Finally, and importantly, the clinical implementation of pre- and intra-operative risk stratification for post-operative deterioration detection and associated workflows has proven challenging in routine care [9].
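
The monotonicity limitation can be illustrated with a small synthetic experiment (all data and parameters below are invented for illustration): when risk is elevated only at the extremes of a vital sign, a logistic regression with a single linear term barely separates the classes, while a tree-based model can capture the U-shaped relation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x = rng.normal(80, 15, size=5000)                        # synthetic heart rate (bpm)
p_event = 1 / (1 + np.exp(-(np.abs(x - 80) - 20) / 3))   # risk high only at the extremes
y = (rng.random(5000) < p_event).astype(int)             # binary deterioration label

X_train, X_test, y_train, y_test = train_test_split(
    x.reshape(-1, 1), y, test_size=0.3, random_state=0)

for model in (LogisticRegression(), RandomForestClassifier(random_state=0)):
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    # The linear-term logistic regression stays near chance level; the forest does not.
    print(type(model).__name__, round(auc, 2))
```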

2.2 Postoperative deterioration detection

In the operating room and intensive care unit, patients are closely observed by the clinical teams and are well equipped with monitoring solutions. However, outside these high-acuity settings, such as on the general ward, the standard of care for postoperative deterioration detection still relies on the manual periodic measurement of vital signs, usually every 4–6 h [28]. Laboratory tests may also be used but are infrequently performed for surgical ward patients [29]. Yet, several studies have established an association between a laboratory test result and a complication for a given surgical subpopulation [30,31,32]. Some of these studies are shown in Table 1. Vital signs have been used as indicators of deterioration in a more structured approach. Indeed, criteria of physiological instability emerged with the introduction of medical emergency teams (MET) [33]. These criteria rely on heart rate, systolic blood pressure, and respiratory rate, among others. The introduction of a MET has been shown to reduce the incidence of postoperative adverse outcomes [34]. Several single-parameter systems are described in Gao et al.’s review of track and trigger systems for identifying at-risk patients on the ward [35]. Interestingly, their survey found large variability among track and trigger solutions, with little evidence of reliability and validity. Sensitivity was found to be poor, which could be due to the nature of the (patho)physiology monitored or the choice of trigger threshold [35].

Table 1 Overview of predictors that have been used for deterioration prediction/detection in post-operative patients

In addition to these single-parameter criteria, scores based on multiple parameters have been used for deterioration detection. Early warning scores (EWS) were first introduced in 1997 by Morgan et al. [40] as a bedside tool to help identify patients at risk of deterioration. These scores evaluate the deviation of routine vital signs from normal ranges. Since then, similar track and trigger systems have become common practice in many hospitals for patient monitoring in the general ward, acute assessment unit, or emergency department. The most often cited EWS include the MEWS [41] (using four vital signs and a patient reaction score), the ViEWS [42] (adding oxygen saturation and inspired oxygen to the above), and the NEWS [43]. The NEWS has been recommended by the National Institute for Health and Clinical Excellence to provide a national standard in the UK. A more recent alternative to EWS is the Rothman Index [44], a continuous index that summarizes a patient’s clinical condition based on 26 variables, including vital signs, laboratory results, cardiac rhythms, and nursing assessments. The score was constructed using the variables’ excess risk functions, i.e., the increase in 1-year post-discharge mortality associated with any given value. Both EWS and the Rothman Index are general scores not designed specifically for postoperative care, but they have been further validated in surgical patients [37,38,39] for the prediction of ICU admission, death, or complications. Table 1 provides an overview of parameters and scores that have been evaluated for the prediction/detection of deterioration in post-surgical patients.
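
For readers unfamiliar with how such aggregate scores are computed, the sketch below implements a MEWS-style calculation. The cut-off bands approximate commonly published MEWS values [41] but are illustrative only and must be checked against the original source and local protocols before any use.

```python
def _points(value, bands):
    """Return the points of the first (inclusive upper bound, points) band matched."""
    for upper, pts in bands:
        if value <= upper:
            return pts
    return bands[-1][1]


def mews_like_score(resp_rate, heart_rate, systolic_bp, temperature_c, avpu):
    """Illustrative MEWS-style aggregate score.

    Cut-offs approximate commonly published MEWS bands [41]; they are a sketch and
    must be checked against the original source and local protocols before any use.
    avpu: 'A' alert, 'V' responds to voice, 'P' responds to pain, 'U' unresponsive.
    """
    inf = float("inf")
    score = 0
    score += _points(resp_rate, [(8, 2), (14, 0), (20, 1), (29, 2), (inf, 3)])
    score += _points(heart_rate, [(40, 2), (50, 1), (100, 0), (110, 1), (129, 2), (inf, 3)])
    score += _points(systolic_bp, [(70, 3), (80, 2), (100, 1), (199, 0), (inf, 2)])
    score += _points(temperature_c, [(34.9, 2), (38.4, 0), (inf, 2)])
    score += {"A": 0, "V": 1, "P": 2, "U": 3}[avpu]
    return score


# Example: RR 22, HR 115, SBP 95, T 38.6 °C, responds to voice -> 2 + 2 + 1 + 2 + 1 = 8
print(mews_like_score(22, 115, 95, 38.6, "V"))
```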

For post-operative deterioration detection, the scores in use include either risk scores specific to certain conditions or generic scores that summarize the overall patient condition. The first group includes scores for cardiac arrest [45], acute kidney injury [46], and sepsis [47].

Embedding these scores in hospital care has to take nursing workflow and workload into consideration, especially if nurses are required to periodically perform manual assessments of vital signs and manually enter the data into the hospital data management system to make it available for more advanced risk assessment and early warning.

However, before warning scores such as the EWS [44, 48] can be successfully implemented into clinical workflows, several hurdles have to be addressed. At present, vital signs and warning scores are infrequently assessed, usually only every 4–6 h [28] and up to every 12 h [48], which means deterioration events can easily be missed in between. Furthermore, current EWS have been shown to have poor predictive capabilities for in-hospital mortality, admission to critical care, and cardiac arrest, in particular a low sensitivity and a low positive predictive value [35, 49]. A review by Gao et al. concludes that, at present, there is “little evidence of reliability, validity and utility” [35]. Indeed, studies reporting their impact on clinical outcomes had mixed results: a literature review by Alam et al. found that only two out of six studies indicated a significant reduction of in-hospital mortality (from 5.8 to 3.0% and from 1.4 to 1.2%, respectively) [50]. This highlights the need for more advanced approaches to make the scores more accurate and actionable. Another limitation of current EWS, also valid for the Rothman Index described above, is that vital signs are treated independently, which means that possible correlations, such as (changes in) those between heart rate and blood pressure or between heart rate and respiration rate, are not taken into account [48].

Moreover, many of the published scores have not been based on data acquired from postoperative patients, but rather on a more general ward population, and their extrapolation to post-surgical patients could present challenges [51]. Indeed, the postoperative patient exhibits altered physiology due to the hormonal and metabolic changes triggered by the surgery and by the anesthetics, fluids, (anti)coagulants, vasopressors, and inotropes administered [52]. This, combined with the patient’s underlying disease, physiology, and medications, argues for taking the specific requirements of the postoperative patient into account. An additional shortcoming of current risk and warning scores is that they often fail to provide the underlying clinical context [53], which makes them difficult to act upon.

3 Recent advances and promising directions in data analytics for postoperative patient deterioration detection

3.1 Postoperative physiological trajectory modelling

The need to exploit the temporal structure of data has been advocated by Mao et al. [54]. This is particularly relevant for surgical patients, as they may experience vital sign alterations in response to the postoperative “inflammatory state, volume shifts, and pain” [55] as well as to administered fluids and drugs. It is therefore essential to be able to distinguish pathological trajectories from normal recovery patterns (i.e., taking into account any coexisting conditions). Pimentel et al. [51] analyzed the trajectory of vital signs in the postoperative ward for stable patients, defined as those without ICU admission or in-hospital death. Based on the assumption that postoperative patients start from a deranged state and slowly return to normal, they were able to visualize a general recovery pattern. In recent years, a few papers have investigated the postoperative evolution of several scores, e.g. EWS or the Rothman Index, and shown how they could help differentiate deteriorating patients from normally recovering patients. Their findings are presented in Table 2. They all emphasize that a worsening score is associated with poor outcomes, which highlights the benefit of postoperative trajectory analysis.

Table 2 Postoperative temporal patterns of different scores for deterioration detection

Another approach for the study of temporal patterns involves the direct modelling of physiological trajectories through, for example, Gaussian process regression. This method provides a framework able to handle noisy, sparse, and unevenly sampled data. Pimentel et al. [57] applied this method to manual observations of heart rate and respiratory rate of patients recovering from cancer surgery. The authors were then able to discriminate known (normal) trajectories from unknown (abnormal) trajectories. More recently, Dürichen et al. proposed a multitask Gaussian process, which takes into account the correlation between physiological time series and uses prior knowledge of the relationships between these time series, leading to improved prediction of outcomes [58].
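
As a minimal sketch of this idea, the following example fits a Gaussian process to a handful of irregularly timed heart-rate observations and returns a smooth posterior trajectory with uncertainty bounds, against which new observations could be compared. The data, kernel choice, and hyperparameters are illustrative assumptions, not taken from the cited studies.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel

# Sparse, unevenly sampled observations: hours after surgery and heart rate (bpm)
t_obs = np.array([2.0, 6.5, 11.0, 20.0, 27.5, 41.0]).reshape(-1, 1)
hr_obs = np.array([108.0, 102.0, 96.0, 90.0, 86.0, 82.0])

# Smooth recovery trend (RBF kernel) plus measurement noise (white kernel)
kernel = ConstantKernel(1.0) * RBF(length_scale=10.0) + WhiteKernel(noise_level=4.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t_obs, hr_obs)

# Posterior mean and standard deviation on a regular grid of time points
t_grid = np.linspace(0, 48, 97).reshape(-1, 1)
hr_mean, hr_std = gp.predict(t_grid, return_std=True)

# A new measurement falling far outside mean +/- 2*std could flag an abnormal trajectory
print(hr_mean[::24].round(1))   # predicted heart rate at 0, 12, 24, 36, and 48 h
print(hr_std[::24].round(1))
```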

In addition to detecting when a postoperative patient is deviating from their normal recovery trajectory, it is of interest to know more precisely which complications they are at risk of. In this respect, Thompson et al. looked at the temporal patterns of various postoperative complications [59]. They found, for example, that the greatest incidence of myocardial infarction was in the early postoperative period (i.e., within 1 day) and the greatest incidence of sepsis was in the late postoperative period (i.e., between 8 and 30 days after surgery), while gastrointestinal tract bleeding had the same incidence throughout the postoperative period. This knowledge could be used to develop warnings that are more specific to certain complications and to better guide clinical management.

Recently, Feld et al. presented a machine-learning-based approach for characterizing the temporal evolution of post-operative complications [60]. They demonstrated that the models have significant predictive value, in particular for the development of serious complications, such as coma longer than a day, cardiac arrest, myocardial infarction, septic shock, renal failure, and pneumonia, and of interventional complications, such as unplanned re-intubation, more than 2 days on ventilator support, and the need for blood transfusion. Although integrating the temporal component into scoring systems for postoperative patients would lead to improved deterioration detection, the intermittent monitoring characteristic of the general ward may hamper this [61]. Further limitations include the significant levels of incomplete documentation of vital signs following major surgery [62] and the interruptive nature of vital sign checks, which might not be an accurate measure of the patient’s condition [61, 63]. Therefore, the systematic implementation of continuous monitoring on the general ward has been recommended [61, 64].

A blinded study by Sun et al. [63] reinforces this idea: with continuous pulse oximetry implemented on the postoperative ward, 90% of hypoxemic episodes were found to have been missed by intermittent monitoring. In this respect, Clifton et al. suggest a combination of manual measurements with data from wearable sensors [65]. In their study, mobile pulse oximeters and ECG sensors were provided to ward patients following upper-gastrointestinal cancer surgery. These were combined with manually annotated values for blood pressure and respiration rate. Promising results, in terms of detecting adverse events, were obtained using four different machine-learning techniques (namely one-class support vector machines, one-class Gaussian processes, Gaussian mixture models, and kernel density estimates). Furthermore, two example cases highlighted that the processing of continuous data was able to identify deterioration that remained unnoticed with traditional intermittent EWS from manual measurements. Although it falls outside the scope of this data-focused review, the development of wireless sensors for continuous monitoring on the general ward, and proper integration of the data into the hospital data management system, are gaining momentum and appear to be part of a larger, data-enabled solution to improve timely detection of patient deterioration [5].

3.2 Personalized models

Most of the above-mentioned scores use population-based, rather than individual-based, thresholds that are determined from general statistics. These thresholds may not fit specific patients with more complex (patho)physiologies, leaving room for optimization. Moreover, most of the algorithms were not trained on surgical patients but on general ward populations, which presents some limitations when they are applied postoperatively. Erb et al. [66] recently performed a study on patients undergoing bowel resection with anastomosis. They observed that abnormal vital signs, e.g. elevated heart rate or respiratory rate, were common in the first postoperative week, even in patients without complications, and therefore had poor predictive value. These findings suggest that traditional methods developed in other settings (e.g., EWS) might generalize poorly to surgical patients, or at least to some subpopulations. Furthermore, patients usually exhibit different baseline characteristics, e.g. depending on age and comorbidities. For example, Churpek et al. emphasized that elderly and non-elderly patients have significantly different vital signs prior to cardiac arrest [67]. This population heterogeneity should be taken into account to reduce false alarms for some patients and prevent undetected deterioration in others.

In recent years, personalized models have become a topic of focus [68,69,70]. Visweswaran et al. described patient-specific predictive models in which Bayesian model averaging is performed over a set of models generated using patient characteristics such as history, demographics, and laboratory data [71]. Although they tested their algorithm on sepsis and heart failure, it can be tailored towards other settings and outcomes as well. More recently, Alaa et al. proposed a personalized real-time risk scoring algorithm (using a hierarchical latent class model) for general ward patients, based on their successive laboratory tests and vital sign measurements [72]. In their study, this technique outperformed both the MEWS and the Rothman Index in terms of positive predictive value and sensitivity for the prediction of ICU admission. The observed reduction in false alarms in combination with an increased sensitivity for detecting deterioration validates the attractiveness of patient-specific models for deterioration detection.

3.3 Combination of pre-, intra-, and post-operative predictors

The real-time training required for some personalized models raises multiple challenges when these models are first applied post-operatively. A considerable amount of data must be readily available so that the algorithm can, for example, estimate the patient profile and find similar patients to predict an expected recovery pattern. A promising way to tailor scores and thresholds to the individual patient is to make use of preoperative data. These data can provide a personalized baseline and thus guide the monitoring of patient recovery, i.e., the evolution from a deranged postoperative state towards the desired state. Knowing the preoperative baseline vital signs can, at least in elective surgery patients, provide a customized indication of the normal state of the patient, taking into account any coexisting conditions. For instance, a study by Labgaa et al. suggests that the postoperative drop of serum albumin, compared to the preoperative concentration, reflects the intensity of the surgical stress in patients who underwent elective liver surgery [31]. A significantly greater drop was observed in patients with postoperative complications. This concept could be extended to other parameters in a dynamic scoring framework to track the patient’s postoperative recovery, customized for different types of patients and types of surgery.
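
The baseline idea can be expressed very simply: score a postoperative value relative to the patient's own preoperative measurement rather than against a population norm. The sketch below follows the albumin example; the drop threshold is hypothetical and chosen only for illustration.

```python
def relative_drop(preop_value: float, postop_value: float) -> float:
    """Fractional decrease of a parameter relative to the patient's own preoperative baseline."""
    return (preop_value - postop_value) / preop_value


def flag_excessive_albumin_drop(preop_albumin_g_l: float, postop_albumin_g_l: float,
                                threshold: float = 0.25) -> bool:
    """Flag the patient when serum albumin falls by more than `threshold` of baseline.

    The 0.25 threshold is hypothetical; see Labgaa et al. [31] for clinically studied values.
    """
    return relative_drop(preop_albumin_g_l, postop_albumin_g_l) > threshold


# Example: baseline 42 g/L, postoperative value 29 g/L -> ~31% drop -> flagged
print(flag_excessive_albumin_drop(42.0, 29.0))
```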

In that sense, it may be of interest to leverage all data from pre-, intra-, and post-operative phases in order to take into account the clinical context surrounding the patient’s care journey. However, it is not yet understood how to combine these data points successfully. For instance, Terekhov et al. found that preoperative risk models were not meaningfully improved by adding intraoperative risk using the Surgical Apgar Score [10]. To explain these results, they hypothesized that the preoperative scores already included the intraoperative information in relation to mortality prediction. Hence, further study is required to combine data throughout the care journey into an integrated, patient-specific risk assessment and warning for post-operative deterioration.

Different approaches can be proposed for incorporating the clinical context acquired through pre- and intra-operative risk assessment into strategies for postoperative deterioration detection. One option is to incorporate these parameters into postoperative algorithms personalized to the individual patient. Van Esbroeck et al. have proposed a method to quantify surgical complexity by analyzing the association between individual procedures and postoperative complications [73]. The procedural risk scores that were generated achieved moderate to high levels of discrimination, with a high ability to predict mortality, morbidity, and several complications. One could imagine using these risk scores to weight a postoperative score based on patient-specific factors (e.g., vital signs and/or laboratory data), as sketched below. Hence, the integration of such procedure-specific information could improve the postoperative model and provide a more accurate warning for deterioration. Another option is to use the pre- and intra-operative risk assessments to recommend continuous monitoring for the patients at greatest risk. This would allow early identification of patient deterioration in a cost-effective way [74].
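
A minimal sketch of the weighting idea referenced above is given here; the procedure names, risk weights, and trigger threshold are hypothetical placeholders rather than values from [73].

```python
# Hypothetical procedure-specific risk weights, e.g. derived from procedural risk scores [73]
PROCEDURE_RISK_WEIGHT = {
    "laparoscopic_cholecystectomy": 0.8,
    "colorectal_resection": 1.2,
    "esophagectomy": 1.5,
}


def weighted_ews_trigger(ews_score: float, procedure: str, trigger: float = 5.0) -> bool:
    """Return True when the procedure-weighted early warning score crosses the trigger."""
    weight = PROCEDURE_RISK_WEIGHT.get(procedure, 1.0)
    return ews_score * weight >= trigger


# The same raw EWS of 4 triggers review after an esophagectomy but not after a
# laparoscopic cholecystectomy.
print(weighted_ews_trigger(4, "esophagectomy"),
      weighted_ews_trigger(4, "laparoscopic_cholecystectomy"))
```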

3.4 Machine learning techniques for deterioration detection

Research in the fields of machine learning and pattern recognition has provided innovative solutions to overcome some of the limitations mentioned above. Such methods can be used to, for example, deal with missing data, find trends and correlations, and build models. In this regard, there has been a growing interest in the use of machine learning techniques to improve the traditional EWS for deterioration detection on the general ward, though not yet specifically targeting postoperative patients.

Machine learning refers to automated data analysis for analytical and statistical pattern recognition and model building. It relies on iteratively learning from new data and updating the generated model(s), thereby enabling the discovery of hidden information or patterns in large data sets. Data mining, on the other hand, focuses on the discovery of new, previously unknown properties of an existing large data set. Machine learning and data mining often apply the same methods and are quite similar in approach, but the goals are different: machine learning aims to predict new data based on analyzed data (i.e., input–output), while data mining aims to find hidden information in the present data (i.e., knowledge discovery in databases). In this context, it is important to distinguish supervised learning from unsupervised learning. In supervised learning, the computer receives labeled data and tries to infer a relation between the input parameters (e.g., vital signs) and the labeled output (e.g., cardiac arrest; the supervisory signal). In unsupervised learning, the computer aims to find relations between input parameters and unlabeled (i.e., unidentified) outputs. Several different methods are applied in machine learning and data mining, such as support vector machines, Gaussian processes, Gaussian mixture models, kernel density estimates, latent class models, Parzen windows, and classification and regression trees. More information on these methods can be found elsewhere [75, 76].

In machine learning and data mining, novelty detection is the task of recognizing new data that differ in some way from the data used for training. Novelty detection involves the construction of a model of normality using examples of normal behavior, and then the classification of test data as either normal or abnormal with respect to that model. In the context of EWS systems, it looks at deviations from normal ranges. Novelty detection is a popular approach in this field because it avoids explicitly modeling the abnormal classes, for which data are often too sparse, even though such explicit models would ideally be desired.
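
A minimal sketch of this novelty-detection approach, assuming synthetic vital-sign data and illustrative parameters: a one-class support vector machine is fitted on observations from uneventful recoveries only and then flags new observations that deviate from that model of normality.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
# Training data: [heart rate, respiratory rate, systolic BP] from uneventful recoveries
normal = np.column_stack([
    rng.normal(80, 8, 500),    # heart rate (bpm)
    rng.normal(16, 2, 500),    # respiratory rate (breaths/min)
    rng.normal(120, 10, 500),  # systolic blood pressure (mmHg)
])

scaler = StandardScaler().fit(normal)
detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(scaler.transform(normal))

# Two new observations: one typical, one with tachycardia, tachypnoea, and hypotension
new_obs = np.array([[82.0, 15.0, 118.0],
                    [128.0, 28.0, 88.0]])
print(detector.predict(scaler.transform(new_obs)))   # +1 = consistent with normality, -1 = novelty
```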

Techniques that have been used for deterioration detection include Parzen windows [77] and one-class support vector machines (SVMs) [48]. Parzen windows were used in combination with K-nearest neighbor techniques to detect critical events for 150 patients in the general ward, whereas SVMs were used to classify normal versus abnormal patient progress patterns in a step-down unit (for 19 patients). Recently, Churpek et al. [78] performed a multicenter comparison of various machine learning and regression methods for the prediction of clinical deterioration on the wards. The algorithms incorporated features such as demographics, laboratory values, and vital signs. The random forest algorithm, which constructs a multitude of decision trees and averages their results, performed best in detecting deterioration on the ward (area under the receiver operating characteristic (ROC) curve, AUC: 0.801). Along with several other machine learning algorithms, it outperformed logistic regression (AUC: 0.770 for spline regression and 0.735 for linear regression) and the MEWS (AUC: 0.698).

Mao et al. [54] approached the problem from a different perspective and proposed a data mining framework that tackles the challenges associated with the typically suboptimal quality and consistency of data in hospital databases. A bucketing method was used to overcome the presence of irregular gaps in the data (which is typical of data acquisition for in-hospital patients in the ward). An exploratory under-sampling technique was also applied to address the problem of having many more patients without specific complications than with them (termed class imbalance). A simple logistic regression, performed after these steps, obtained better classification results for patients with and without complications than two machine learning algorithms (support vector machines and decision trees). Improved detection of deterioration and the ability to deal with class imbalance are two of the main advantages of using machine learning that would lead to more accurate alarms or warnings and thereby improve patient care and outcomes.
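
To make the two data-preparation steps concrete, the sketch below buckets irregularly sampled ward measurements into fixed time windows and randomly under-samples the majority (no-complication) class before fitting a simple logistic regression. The column names, window length, and synthetic data are assumptions for illustration, not the exact procedure of Mao et al.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def bucket_vitals(df: pd.DataFrame, hours: int = 6) -> pd.DataFrame:
    """Aggregate irregularly timed vital-sign measurements into fixed windows per patient."""
    return (df.set_index("timestamp")
              .groupby("patient_id")
              .resample(f"{hours}h")
              .agg({"heart_rate": "mean", "resp_rate": "mean"})
              .dropna()
              .reset_index())


def undersample(X: np.ndarray, y: np.ndarray, seed: int = 0):
    """Randomly drop majority-class rows so both classes are equally represented."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    keep_neg = rng.choice(neg, size=len(pos), replace=False)
    idx = np.concatenate([pos, keep_neg])
    return X[idx], y[idx]


# Tiny synthetic record set with irregular measurement times
records = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "timestamp": pd.to_datetime(["2020-01-01 08:10", "2020-01-01 10:40",
                                 "2020-01-01 15:05", "2020-01-01 09:30",
                                 "2020-01-01 18:20"]),
    "heart_rate": [88, 92, 110, 76, 81],
    "resp_rate": [16, 18, 24, 14, 15],
})
print(bucket_vitals(records))

# Class-imbalanced synthetic training set (about 5% complication rate)
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y = (rng.random(2000) < 0.05).astype(int)
X_bal, y_bal = undersample(X, y)
model = LogisticRegression().fit(X_bal, y_bal)
print(round(y_bal.mean(), 2))   # balanced classes -> 0.5
```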

4 Recommendations

This review first described the current standard of care for the prediction or detection of patient deterioration throughout the perioperative course, as summarized in Fig. 1. Several risk and warning scores have been developed and clinically validated for use in (peri-operative) clinical decision support. Although these scores perform well in some studies, their application in other, either more general or more specific, patient populations remains challenging. Moreover, the lack of training on specific patient populations, such as the post-operative patient, and on specific complications, such as respiratory depression or cardiac arrest, hampers the ability to act upon the provided alarms or warnings in clinical practice. Furthermore, the infrequent assessment of vital signs and laboratory tests, and the poor integration of patient monitoring systems with other data sources in the hospital, leave large gaps and limit temporal trajectory analysis. In addition, treating the vital signs independently, without attention to their underlying correlations and their relations to other patient data, such as personal health data and laboratory data, leaves a lot of information hidden.

Recent analytical frameworks have emerged to improve postoperative patient deterioration detection and better support clinical decision-making. Promising approaches include: postoperative physiological trajectory modelling, further enabled by continuous, wearable monitoring; personalization of models, scores, and thresholds; combining pre-, intra-, and post-operative predictors; and machine learning and data mining techniques to uncover previously unknown relations in the vast amount of data generated throughout the patient’s care journey, supporting workflow and clinical decision-making. Figure 2 provides an illustrative summary of several analytical approaches, and how they relate, to improve risk and warning scoring systems for patient deterioration prediction and detection. Based on the promising results of the reviewed work, we conclude with the following recommendations for future work on risk and warning scores for post-operative patient deterioration detection:

Fig. 2 Future of post-operative patient deterioration detection: an illustrative representation of promising approaches, and how they relate, to improve risk and warning scoring systems for patient deterioration prediction and detection

  • Identify important periods in the post-operative journey where deterioration occurs, and then focus predictive models on the conditions that are more common during these periods. An example is the risk of myocardial infarction within 1 day after surgery and of sepsis between 8 and 30 days after surgery.

  • Distinguish pathological trajectories from normal physiological changes by using temporal data modelling and personalization for each patient. These trajectories can be based on vital signs, other indices (EWS, Rothman Index), or a combination of them. Adding pre- and intra-operative data may improve performance in this case.

  • Analyze and model the correlation between features derived from vitals and other types of data and specific complications that can occur post-operatively. This can provide more insight into the underlying (patho)-physiology of the predicted complications.

  • Explore big data approaches that exploit large databases of patients who have similar trajectories and conditions. These databases will also help to discover the relationship between pre-operative conditions, patient demographics, surgical procedures, and post-operative deterioration events. These relationships will be crucial in developing patient-specific and actionable early detection systems.

  • Tackle challenges in real data, including irregular gaps, and use techniques to address class imbalance, in order to improve the implementation and performance of deterioration detection algorithms.

  • Introduce more continuous, yet unobtrusive, monitoring for post-operative patients. Features derived from these devices can be integrated in novel machine learning and modelling techniques that look into the relationship between continuous vital sign data, surgical context, and the patient-specific post-surgical trajectory.

Further to the above recommendations, which focus on the more technical elements of implementing and deploying algorithms for post-operative deterioration detection, it is critically important to consider how these algorithms are embedded in the clinical workflow. We recommend the following considerations for integrating them:

  • In many hospitals worldwide, clinicians and nurses are already suffering from alarm fatigue [79, 80] due to the multitude of alarms in the post-operative period, of which only a small percentage are important or actionable. It is important that the above algorithms have improved sensitivity and specificity, and that they are embedded in the clinical workflow without increasing non-actionable alarms. Machine learning has already been used across various settings to facilitate the reduction of alarms [81,82,83]. However, the problem of alarming is far from solved in everyday clinical workflows and still poses a challenge that needs to be addressed.

  • Before deploying any of these algorithms in clinical practice, a thorough assessment of the clinical workflow needs to be done. In busy clinical environments, it is important to avoid adding too many new tasks and reminders, which are seen as a burden. Identifying who acts on which alarm and at what point in time is crucial for the success of a clinical change associated with these algorithms.

  • Before acting on a recommendation of an algorithm, clinical teams need to understand why the algorithm shows irregular scores. This raises the importance of highlighting the feature changes that led to these scores; e.g., if an algorithm shows a raised risk of kidney injury, the irregular values of urine output and laboratory results need to be obvious to the clinician making a decision.

  • There has to be flexibility in how alerts are received. Although some hospitals are keen on mobile devices, other hospitals avoid having them embedded in the workflow and would rather use existing medical devices, such as patient monitors and central stations.

  • Training and follow-up are especially important for embedding clinical changes due to algorithms in the daily clinical workflow. Before adopting a new solution in the post-operative workflow, adequate training has to be given on what the algorithm does and how it is integrated. Roles for all stakeholders also need to be very clear. After the integration of the algorithm, clinical and nursing teams need to see how well they are doing and how patient care is affected over time. Adherence to clinical protocols is the main factor in maintaining improved clinical results over time.