1 Background

Despite considerable advances in anesthesia, ventilation, surgical, and monitoring technologies, about 25% of surgical patients suffer from postoperative complications [1]. Serious adverse events in the time following surgery include respiratory failure, cardiac arrest, severe sepsis, and acute renal failure. These conditions constitute life-threatening risks, in particular for elderly patients and emergency surgery patients [2]. In 2012, Pearse et al. performed a large European study of patient outcomes after non-cardiac surgery [3] and found an overall in-hospital mortality of 4%, which reveals the extent of the problem.

With respect to patient safety, the general ward is the hardest setting in the perioperative care journey in which to detect patient deterioration. For instance, patients there are more likely to undergo a potentially avoidable cardiac arrest and are less likely to survive it than patients in intensive care [4]. Although early warning scores have been introduced on the ward, the resources to perform the required assessments and to integrate these into the hospital data management system are limited. The intermittent monitoring and low nurse-to-patient ratio both contrast with the high levels of vigilance and monitoring in the operating room and intensive care unit (ICU). Given the increased vulnerability of patients, it is no surprise that deterioration sometimes remains unnoticed under these conditions [5]. On this point, Pearse et al. noted that 73% of the patients who died after surgery were never admitted to a higher level of care [3]. This emphasizes the need for early recognition of post-operative patient deterioration.

In 1992, Silber et al. introduced the concept of failure to rescue, defined as hospital death after an adverse occurrence such as a postoperative complication [6]. It was later highlighted that the timely recognition and management of postoperative complications may actually be responsible for variation in hospital mortality rates [1]. Rapid response systems have emerged to address this failure to rescue. Two components can be distinguished: the afferent limb, aimed at detecting deterioration, and the efferent limb, aimed at responding to it. Although both parts are important in improving patient outcomes, we will focus on the former, as advanced data analytics could significantly affect the performance of early warning algorithms.

This paper first describes the current standard of care for the prediction or detection of patient deterioration throughout the perioperative course, and analyzes its current weaknesses and limitations. Then, it discusses analytical frameworks that have emerged to improve postoperative deterioration detection and the key steps to improve perioperative clinical decision support. Finally, the paper concludes with recommendations for future work in the area.

2 Current methods for identification of high-risk (post-) surgical patients

The identification of high-risk surgical patients can take place at different stages of the care process: preoperatively, intraoperatively, or postoperatively [7]. The emergence of electronic medical records has facilitated access to an increasing number of features such as demographics, comorbidities, vital signs, and laboratory data. Consequently, several research teams have explored the use of analytics for the prediction or detection of postoperative deterioration. The developed tools range from single- or multi-parameter predictors to aggregate scores and machine learning algorithms. While preoperative and intraoperative models tend to focus on risk stratification, postoperative models focus more on the (early) detection of patient deterioration. Figure 1 gives a brief overview of the features and scores available at the different steps of perioperative care. These will be covered in more detail in the next few sections.

Fig. 1 Overview of the available features and scores currently used throughout the perioperative journey

2.1 Pre- and intra-operative risk stratification

Surgical risk assessment is of great interest for various reasons. It may guide preoperative decision making and better inform patients and their families [8,9,10]. It also serves as a tool for research, comparative audit, and quality monitoring [8,9,10]. Furthermore, it can be used to plan clinical management, for example by adjusting the postoperative monitoring and level of care depending on the patient’s risk [10]. This last application is of particular interest with a view to address failure to rescue since it could allow a better allocation of resources directly after the operating room release and thus facilitate postoperative deterioration detection and timely intervention in a cost-effective way.

The commonly used ASA physical status score [11] is a very simple preoperative score that stratifies patients into several categories of surgical risk: ASA I, normal patients; ASA II, patients with mild systemic disease; ASA III, severe systemic disease; ASA IV, severe systemic disease that is a constant threat to life; ASA V, patients who are not expected to survive without the operation; and ASA VI, declared brain-dead patients. It is still widely used despite several studies showing inter-observer variability [12,13,14]. In addition to the ASA score, other scores have been developed to provide a preoperative risk assessment, mostly using logistic regression methods [15]. The resulting risk assessment can target in-hospital mortality [8, 16] or a specific type of complication, such as pulmonary complications [17, 18].

Patients undergo extensive monitoring in the operating room and, as such, research has started to investigate the large amount of data recorded during surgery in order to find predictors of patient outcomes. Some papers have highlighted the association between intraoperative factors and postoperative complications. For example, intraoperative hypotension is associated with myocardial damage and 30-day mortality after non-cardiac surgery [19, 20]. The coexisting occurrence of low blood pressure, low bispectral index, and low minimum alveolar concentration of volatile anesthesia, referred to as the “Triple Low”, has also been validated as a predictor of 30-day mortality [21, 22]. Intraoperative features may also be combined into an aggregate score. Gawande et al. developed the Surgical Apgar Score to assess the risk of major complication or death, based on the estimated amount of blood loss, the lowest heart rate, and the lowest mean arterial blood pressure [23]. The score has since been validated across various settings [24,25,26].
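
To make the structure of such an aggregate score concrete, the sketch below computes a Surgical Apgar-style score from the three intraoperative inputs named above. The point bands are paraphrased from commonly cited versions of the score and are illustrative only; they should be verified against the original publication [23] before any use.

```python
def surgical_apgar_like_score(blood_loss_ml: float, lowest_map_mmhg: float,
                              lowest_hr_bpm: float) -> int:
    """Illustrative Surgical Apgar-style score (0-10); higher means lower risk.

    Point bands are paraphrased from commonly cited versions of the score and
    should be checked against Gawande et al. [23] before any use.
    """
    # Estimated blood loss: more bleeding -> fewer points
    if blood_loss_ml > 1000:
        ebl_pts = 0
    elif blood_loss_ml > 600:
        ebl_pts = 1
    elif blood_loss_ml > 100:
        ebl_pts = 2
    else:
        ebl_pts = 3

    # Lowest mean arterial pressure: deeper hypotension -> fewer points
    if lowest_map_mmhg < 40:
        map_pts = 0
    elif lowest_map_mmhg < 55:
        map_pts = 1
    elif lowest_map_mmhg < 70:
        map_pts = 2
    else:
        map_pts = 3

    # Lowest heart rate: pronounced tachycardia -> fewer points
    if lowest_hr_bpm > 85:
        hr_pts = 0
    elif lowest_hr_bpm > 75:
        hr_pts = 1
    elif lowest_hr_bpm > 65:
        hr_pts = 2
    elif lowest_hr_bpm > 55:
        hr_pts = 3
    else:
        hr_pts = 4

    return ebl_pts + map_pts + hr_pts


# Example: 250 mL blood loss, lowest MAP 62 mmHg, lowest HR 78 bpm -> 2 + 2 + 1 = 5
print(surgical_apgar_like_score(250, 62, 78))
```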

However, the use of pre- and intra-operative risk stratification may have some shortcomings. Some of the pre- and intra-operative risk scores were developed from and/or for specific populations [27]. Thus, although these models may work very well in those populations, they may generalize poorly and yield low predictive accuracy in a ward with a more heterogeneous surgical population. Another drawback of surgical risk stratification lies in the widespread use of logistic regression models. These models assume a monotonic (increasing or decreasing) association between the predictor variables and the outcome, whereas in physiology the relation may be more complex (e.g., only very low and very high values may be indicative of risk, whereas values in a wide ‘normal’ range may have very low predictive value) [15]. Finally, and importantly, the clinical implementation of pre- and intra-operative risk stratification for post-operative deterioration detection and associated workflows has proven challenging in routine care [9].
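
The monotonicity limitation can be illustrated with a small synthetic experiment (all data and parameters below are invented for illustration): when risk is elevated only at the extremes of a vital sign, a logistic regression with a single linear term barely separates the classes, while a tree-based model can capture the U-shaped relation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x = rng.normal(80, 15, size=5000)                        # synthetic heart rate (bpm)
p_event = 1 / (1 + np.exp(-(np.abs(x - 80) - 20) / 3))   # risk high only at the extremes
y = (rng.random(5000) < p_event).astype(int)             # binary deterioration label

X_train, X_test, y_train, y_test = train_test_split(
    x.reshape(-1, 1), y, test_size=0.3, random_state=0)

for model in (LogisticRegression(), RandomForestClassifier(random_state=0)):
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    # The linear-term logistic regression stays near chance level; the forest does not.
    print(type(model).__name__, round(auc, 2))
```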

2.2 Postoperative deterioration detection

In the operating room and intensive care unit, patients are closely observed by the clinical teams and are well equipped with monitoring solutions. However, outside these high-acuity settings, such as on the general ward, the standard of care for postoperative deterioration detection still relies on the manual periodic measurement of vital signs, usually every 4–6 h [28]. Laboratory tests may also be used but are infrequently performed for surgical ward patients [29]. Yet, several studies have established an association between a laboratory test result and a complication for a given surgical subpopulation [30,31,32]. Some of these studies are shown in Table 1. Vital signs have been used as indicators of deterioration in a more structured approach. Indeed, criteria of physiological instability emerged with the introduction of medical emergency teams (MET) [33]. These criteria rely on heart rate, systolic blood pressure, and respiratory rate, among others. The introduction of a MET has been shown to reduce the incidence of postoperative adverse outcomes [34]. Several single-parameter systems are described in Gao et al.’s review of track and trigger systems for identifying at-risk patients on the ward [35]. Interestingly, their survey found large variability among track and trigger solutions, with little evidence of reliability and validity. Sensitivity was found to be poor, which could be due to the nature of the (patho)physiology monitored or the choice of trigger threshold [35].

Table 1 Overview of predictors that have been used for deterioration prediction/detection in post-operative patients

In addition to these single-parameter criteria, scores based on multiple parameters have been used for deterioration detection. Early warning scores (EWS) were first introduced in 1997 by Morgan et al. [40] as a bedside tool to help identify patients at risk of deterioration. These scores evaluate the deviation of routine vital signs from normal ranges. Since then, similar track and trigger systems have become common practice in many hospitals for patient monitoring in the general ward, acute assessment unit, or emergency department. The most often cited EWS include the MEWS [41] (using four vital signs and a patient reaction score), the ViEWS [42] (adding oxygen saturation and inspired oxygen to the above), and the NEWS [43]. The NEWS has been recommended by the National Institute for Health and Clinical Excellence to provide a national standard in the UK. A more recent alternative to EWS is the Rothman Index [44], a continuous index that summarizes a patient’s clinical condition based on 26 variables, including vital signs, laboratory results, cardiac rhythms, and nursing assessments. The score was constructed using the variables’ excess risk functions, i.e., the increase in 1-year post-discharge mortality associated with any given value. Both EWS and the Rothman Index are general scores not designed specifically for postoperative care, but they have been further validated in surgical patients [37,38,39] for the prediction of ICU admission, death, or complications. Table 1 provides an overview of parameters and scores that have been evaluated for the prediction/detection of deterioration in post-surgical patients.
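
For readers unfamiliar with how such aggregate scores are computed, the sketch below implements a MEWS-style calculation. The cut-off bands approximate commonly published MEWS values [41] but are illustrative only and must be checked against the original source and local protocols before any use.

```python
def _points(value, bands):
    """Return the points of the first (inclusive upper bound, points) band matched."""
    for upper, pts in bands:
        if value <= upper:
            return pts
    return bands[-1][1]


def mews_like_score(resp_rate, heart_rate, systolic_bp, temperature_c, avpu):
    """Illustrative MEWS-style aggregate score.

    Cut-offs approximate commonly published MEWS bands [41]; they are a sketch and
    must be checked against the original source and local protocols before any use.
    avpu: 'A' alert, 'V' responds to voice, 'P' responds to pain, 'U' unresponsive.
    """
    inf = float("inf")
    score = 0
    score += _points(resp_rate, [(8, 2), (14, 0), (20, 1), (29, 2), (inf, 3)])
    score += _points(heart_rate, [(40, 2), (50, 1), (100, 0), (110, 1), (129, 2), (inf, 3)])
    score += _points(systolic_bp, [(70, 3), (80, 2), (100, 1), (199, 0), (inf, 2)])
    score += _points(temperature_c, [(34.9, 2), (38.4, 0), (inf, 2)])
    score += {"A": 0, "V": 1, "P": 2, "U": 3}[avpu]
    return score


# Example: RR 22, HR 115, SBP 95, T 38.6 °C, responds to voice -> 2 + 2 + 1 + 2 + 1 = 8
print(mews_like_score(22, 115, 95, 38.6, "V"))
```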

For post-operative deterioration detection, the scores in use include either risk scores specific to certain conditions or generic scores that summarize the overall patient condition. The first group includes scores for cardiac arrest [45], acute kidney injury [46], and sepsis [47].

Embedding these scores in hospital care has to take nursing workflow and workload into consideration, especially if nurses are required to periodically perform manual assessments of vital signs and manually enter the data into the hospital data management system to make it available for more advanced risk assessment and early warning.

However, before warning scores such as the EWS [44, 48] can be successfully implemented into clinical workflows, several hurdles have to be addressed. At present, vital signs and warning scores are infrequently assessed, usually only every 4–6 h [28] and up to every 12 h [48], which means deterioration events can easily be missed in between. Furthermore, current EWS have been shown to have poor predictive capabilities for in-hospital mortality, admission to critical care, and cardiac arrest, in particular a low sensitivity and a low positive predictive value [35, 49]. A review by Gao et al. concludes that, at present, there is “little evidence of reliability, validity and utility” [35]. Indeed, studies reporting their impact on clinical outcomes had mixed results: a literature review by Alam et al. found that only two out of six studies indicated a significant reduction of in-hospital mortality (from 5.8 to 3.0% and from 1.4 to 1.2%, respectively) [50]. This highlights the need for more advanced approaches to make the scores more accurate and actionable. Another limitation of current EWS, also valid for the Rothman Index described above, is that vital signs are treated independently, which means that possible correlations, such as (changes in) those between heart rate and blood pressure or between heart rate and respiration rate, are not taken into account [48].

Moreover, many of the published scores have not been based on data acquired from postoperative patients, but rather on a more general ward population, and their extrapolation to post-surgical patients could present challenges [51]. Indeed, the postoperative patient exhibits altered physiology due to the hormonal and metabolic changes triggered by the surgery and by the anesthetics, fluids, (anti)coagulants, vasopressors, and inotropes administered [52]. This, combined with the patient’s underlying disease, physiology, and medications, argues for taking the specific requirements of the postoperative patient into account. An additional shortcoming of current risk and warning scores is that they often fail to provide the underlying clinical context [53], which makes them difficult to act upon.

3 Recent advances and promising directions in data analytics for postoperative patient deterioration detection

3.1 Postoperative physiological trajectory modelling

The need to exploit the temporal structure of data has been advocated by Mao et al. [54]. This is particularly relevant for surgical patients, as they may experience vital sign alterations in response to the postoperative “inflammatory state, volume shifts, and pain” [55] as well as to administered fluids and drugs. It is therefore essential to be able to distinguish pathological trajectories from normal recovery patterns (i.e., taking into account any coexisting conditions). Pimentel et al. [51] analyzed the trajectory of vital signs in the postoperative ward for stable patients, defined as those without ICU admission or in-hospital death. Based on the assumption that postoperative patients start from a deranged state and slowly return to normal, they were able to visualize a general recovery pattern. In recent years, a few papers have investigated the postoperative evolution of several scores, e.g. EWS or the Rothman Index, and shown how they could help differentiate deteriorating patients from normally recovering patients. Their findings are presented in Table 2. They all emphasize that a worsening score is associated with poor outcomes, which highlights the benefit of postoperative trajectory analysis.

Table 2 Postoperative temporal patterns of different scores for deterioration detection

Another approach for the study of temporal patterns involves the direct modelling of physiological trajectories through, for example, Gaussian process regression. This method provides a framework able to handle noisy, sparse, and unevenly sampled data. Pimentel et al. [57] applied this method to manual observations of heart rate and respiratory rate of patients recovering from cancer surgery. The authors were then able to discriminate known (normal) trajectories from unknown (abnormal) trajectories. More recently, Dürichen et al. proposed a multitask Gaussian process, which takes into account the correlation between physiological time series and uses prior knowledge of the relationships between these time series, leading to improved prediction of outcomes [58].
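
As a minimal sketch of this idea, the following example fits a Gaussian process to a handful of irregularly timed heart-rate observations and returns a smooth posterior trajectory with uncertainty bounds, against which new observations could be compared. The data, kernel choice, and hyperparameters are illustrative assumptions, not taken from the cited studies.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel

# Sparse, unevenly sampled observations: hours after surgery and heart rate (bpm)
t_obs = np.array([2.0, 6.5, 11.0, 20.0, 27.5, 41.0]).reshape(-1, 1)
hr_obs = np.array([108.0, 102.0, 96.0, 90.0, 86.0, 82.0])

# Smooth recovery trend (RBF kernel) plus measurement noise (white kernel)
kernel = ConstantKernel(1.0) * RBF(length_scale=10.0) + WhiteKernel(noise_level=4.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t_obs, hr_obs)

# Posterior mean and standard deviation on a regular grid of time points
t_grid = np.linspace(0, 48, 97).reshape(-1, 1)
hr_mean, hr_std = gp.predict(t_grid, return_std=True)

# A new measurement falling far outside mean +/- 2*std could flag an abnormal trajectory
print(hr_mean[::24].round(1))   # predicted heart rate at 0, 12, 24, 36, and 48 h
print(hr_std[::24].round(1))
```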

In addition to detecting when a postoperative patient is deviating from their normal recovery trajectory, it is of interest to know more precisely which complications they are at risk of. In this respect, Thompson et al. looked at the temporal patterns of various postoperative complications [59]. They found, for example, that the greatest incidence of myocardial infarction was in the early postoperative period (i.e., within 1 day) and the greatest incidence of sepsis was in the late postoperative period (i.e., between 8 and 30 days after surgery), while gastrointestinal tract bleeding had the same incidence throughout the postoperative period. This knowledge could be used to develop warnings that are more specific to certain complications and to better guide clinical management.

Recently, Feld et al. presented a machine-learning-based approach for characterizing the temporal evolution of post-operative complications [60]. They demonstrated that the models have significant predictive value, in particular for the development of serious complications, such as coma longer than a day, cardiac arrest, myocardial infarction, septic shock, renal failure, and pneumonia, and of interventional complications, such as unplanned re-intubation, more than 2 days on ventilator support, and the need for blood transfusion. Although integrating the temporal component into scoring systems for postoperative patients would lead to improved deterioration detection, the intermittent monitoring characteristic of the general ward may hamper this [61]. Further limitations include the significant levels of incomplete documentation of vital signs following major surgery [62] and the interruptive nature of vital sign checks, which might not be an accurate measure of the patient’s condition [61, 63]. Therefore, the systematic implementation of continuous monitoring on the general ward has been recommended [61, 64].

A blinded study by Sun et al. [63] reinforces this idea: with continuous pulse oximetry implemented on the postoperative ward, 90% of hypoxemic episodes were found to have been missed by intermittent monitoring. In this respect, Clifton et al. suggest a combination of manual measurements with data from wearable sensors [65]. In their study, mobile pulse oximeters and ECG sensors were provided to ward patients following upper-gastrointestinal cancer surgery. These were combined with manually annotated values for blood pressure and respiration rate. Promising results, in terms of detecting adverse events, were obtained using four different machine-learning techniques (namely one-class support vector machines, one-class Gaussian processes, Gaussian mixture models, and kernel density estimates). Furthermore, two example cases highlighted that the processing of continuous data was able to identify deterioration that remained unnoticed with traditional intermittent EWS from manual measurements. Although it falls outside the scope of this data-focused review, the development of wireless sensors for continuous monitoring on the general ward, and proper integration of the data into the hospital data management system, are gaining momentum and appear to be part of a larger, data-enabled solution to improve timely detection of patient deterioration [5].

3.2 Personalized models

Most of the above-mentioned scores use population-based, rather than individual-based, thresholds that are determined from general statistics. These thresholds may not fit specific patients with more complex (patho)physiologies, leaving room for optimization. Moreover, most of the algorithms were not trained on surgical patients but on general ward populations, which presents some limitations when they are applied postoperatively. Erb et al. [66] recently performed a study on patients undergoing bowel resection with anastomosis. They observed that abnormal vital signs, e.g. elevated heart rate or respiratory rate, were common in the first postoperative week, even in patients without complications, and therefore had poor predictive value. These findings suggest that traditional methods developed in other settings (e.g., EWS) might generalize poorly to surgical patients, or at least to some subpopulations. Furthermore, patients usually exhibit different baseline characteristics, e.g. depending on age and comorbidities. For example, Churpek et al. emphasized that elderly and non-elderly patients have significantly different vital signs prior to cardiac arrest [67]. This population heterogeneity should be taken into account to reduce false alarms for some patients and prevent undetected deterioration in others.

In recent years, personalized models have become a topic of focus [68,69,70]. Visweswaran et al. described patient-specific predictive models in which Bayesian model averaging is performed over a set of models generated using patient characteristics such as history, demographics, and laboratory data [71]. Although they tested their algorithm on sepsis and heart failure, it can be tailored towards other settings and outcomes as well. More recently, Alaa et al. proposed a personalized real-time risk scoring algorithm (using a hierarchical latent class model) for general ward patients, based on their successive laboratory tests and vital sign measurements [72]. In their study, this technique outperformed both the MEWS and the Rothman Index in terms of positive predictive value and sensitivity for the prediction of ICU admission. The observed reduction in false alarms in combination with an increased sensitivity for detecting deterioration validates the attractiveness of patient-specific models for deterioration detection.

3.3 Combination of pre-, intra-, and post-operative predictors

The real-time training required for some personalized models raises multiple challenges when these models are first applied post-operatively. A considerable amount of data must be readily available so that the algorithm can, for example, estimate the patient profile and find similar patients to predict an expected recovery pattern. A promising way to tailor scores and thresholds to the individual patient is to make use of preoperative data. These data can provide a personalized baseline and thus guide the monitoring of patient recovery, i.e., the evolution from a deranged postoperative state towards the desired state. Knowing the preoperative baseline vital signs can, at least in elective surgery patients, provide a customized indication of the normal state of the patient, taking into account any coexisting conditions. For instance, a study by Labgaa et al. suggests that the postoperative drop of serum albumin, compared to the preoperative concentration, reflects the intensity of the surgical stress in patients who underwent elective liver surgery [31]. A significantly greater drop was observed in patients with postoperative complications. This concept could be extended to other parameters in a dynamic scoring framework to track the patient’s postoperative recovery, customized for different types of patients and types of surgery.
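
The baseline idea can be expressed very simply: score a postoperative value relative to the patient's own preoperative measurement rather than against a population norm. The sketch below follows the albumin example; the drop threshold is hypothetical and chosen only for illustration.

```python
def relative_drop(preop_value: float, postop_value: float) -> float:
    """Fractional decrease of a parameter relative to the patient's own preoperative baseline."""
    return (preop_value - postop_value) / preop_value


def flag_excessive_albumin_drop(preop_albumin_g_l: float, postop_albumin_g_l: float,
                                threshold: float = 0.25) -> bool:
    """Flag the patient when serum albumin falls by more than `threshold` of baseline.

    The 0.25 threshold is hypothetical; see Labgaa et al. [31] for clinically studied values.
    """
    return relative_drop(preop_albumin_g_l, postop_albumin_g_l) > threshold


# Example: baseline 42 g/L, postoperative value 29 g/L -> ~31% drop -> flagged
print(flag_excessive_albumin_drop(42.0, 29.0))
```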

In that sense, it may be of interest to leverage all data from pre-, intra-, and post-operative phases in order to take into account the clinical context surrounding the patient’s care journey. However, it is not yet understood how to combine these data points successfully. For instance, Terekhov et al. found that preoperative risk models were not meaningfully improved by adding intraoperative risk using the Surgical Apgar Score [10]. To explain these results, they hypothesized that the preoperative scores already included the intraoperative information in relation to mortality prediction. Hence, further study is required to combine data throughout the care journey into an integrated, patient-specific risk assessment and warning for post-operative deterioration.

Different approaches can be proposed for incorporating the clinical context acquired through pre- and intra-operative risk assessment into strategies for postoperative deterioration detection. One option is to incorporate these parameters into postoperative algorithms personalized to the individual patient. Van Esbroeck et al. have proposed a method to quantify surgical complexity by analyzing the association between individual procedures and postoperative complications [73]. The procedural risk scores that were generated achieved moderate to high levels of discrimination, with a high ability to predict mortality, morbidity, and several complications. One could imagine using these risk scores to weight a postoperative score based on patient-specific factors (e.g., vital signs and/or laboratory data), as sketched below. Hence, the integration of such procedure-specific information could improve the postoperative model and provide a more accurate warning for deterioration. Another option is to use the pre- and intra-operative risk assessments to recommend continuous monitoring for the patients at greatest risk. This would allow early identification of patient deterioration in a cost-effective way [74].
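
A minimal sketch of the weighting idea referenced above is given here; the procedure names, risk weights, and trigger threshold are hypothetical placeholders rather than values from [73].

```python
# Hypothetical procedure-specific risk weights, e.g. derived from procedural risk scores [73]
PROCEDURE_RISK_WEIGHT = {
    "laparoscopic_cholecystectomy": 0.8,
    "colorectal_resection": 1.2,
    "esophagectomy": 1.5,
}


def weighted_ews_trigger(ews_score: float, procedure: str, trigger: float = 5.0) -> bool:
    """Return True when the procedure-weighted early warning score crosses the trigger."""
    weight = PROCEDURE_RISK_WEIGHT.get(procedure, 1.0)
    return ews_score * weight >= trigger


# The same raw EWS of 4 triggers review after an esophagectomy but not after a
# laparoscopic cholecystectomy.
print(weighted_ews_trigger(4, "esophagectomy"),
      weighted_ews_trigger(4, "laparoscopic_cholecystectomy"))
```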

3.4 Machine learning techniques for deterioration detection

Research in the fields of machine learning and pattern recognition has provided innovative solutions to overcome some of the limitations mentioned above. Such methods can be used to, for example, deal with missing data, find trends and correlations, and build models. In this regard, there has been a growing interest in the use of machine learning techniques to improve the traditional EWS for deterioration detection on the general ward, though not yet specifically targeting postoperative patients.

Machine learning refers to automated data analysis for analytical and statistical pattern recognition and model building. It relies on iteratively learning from new data and updating the generated model(s), thereby enabling the discovery of hidden information or patterns in large data sets. Data mining, on the other hand, focuses on the discovery of new, previously unknown properties of an existing large data set. Machine learning and data mining often apply the same methods and are quite similar in approach, but the goals are different: machine learning aims to predict new data based on analyzed data (i.e., input–output), while data mining aims to find hidden information in the present data (i.e., knowledge discovery in databases). In this context, it is important to distinguish supervised learning from unsupervised learning. In supervised learning, the computer receives labeled data and tries to infer a relation between the input parameters (e.g., vital signs) and the labeled output (e.g., cardiac arrest; the supervisory signal). In unsupervised learning, the computer aims to find relations between input parameters and unlabeled (i.e., unidentified) outputs. Several different methods are applied in machine learning and data mining, such as support vector machines, Gaussian processes, Gaussian mixture models, kernel density estimates, latent class models, Parzen windows, and classification and regression trees. More information on these methods can be found elsewhere [75, 76].

In machine learning and data mining, novelty detection is the task of recognizing new data that differ in some way from the data used for training. Novelty detection involves the construction of a model of normality using examples of normal behavior, and then the classification of test data as either normal or abnormal with respect to that model. In the context of EWS systems, it looks at deviations from normal ranges. Novelty detection is a popular approach in this field because it avoids explicitly modeling the abnormal classes, for which data are often too sparse, even though such explicit models would ideally be desired.
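
A minimal sketch of this novelty-detection approach, assuming synthetic vital-sign data and illustrative parameters: a one-class support vector machine is fitted on observations from uneventful recoveries only and then flags new observations that deviate from that model of normality.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
# Training data: [heart rate, respiratory rate, systolic BP] from uneventful recoveries
normal = np.column_stack([
    rng.normal(80, 8, 500),    # heart rate (bpm)
    rng.normal(16, 2, 500),    # respiratory rate (breaths/min)
    rng.normal(120, 10, 500),  # systolic blood pressure (mmHg)
])

scaler = StandardScaler().fit(normal)
detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(scaler.transform(normal))

# Two new observations: one typical, one with tachycardia, tachypnoea, and hypotension
new_obs = np.array([[82.0, 15.0, 118.0],
                    [128.0, 28.0, 88.0]])
print(detector.predict(scaler.transform(new_obs)))   # +1 = consistent with normality, -1 = novelty
```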

Techniques that have been used for deterioration detection include Parzen windows [77] and one-class support vector machines (SVMs) [48]. Parzen windows were used in combination with K-nearest neighbor techniques to detect critical events for 150 patients in the general ward, whereas SVMs were used to classify normal versus abnormal patient progress patterns in a step-down unit (for 19 patients). Recently, Churpek et al. [78] performed a multicenter comparison of various machine learning and regression methods for the prediction of clinical deterioration on the wards. The algorithms incorporated features such as demographics, laboratory values, and vital signs. The random forest algorithm, which constructs a multitude of decision trees and averages their results, performed best in detecting deterioration on the ward (area under the receiver operating characteristic (ROC) curve, AUC: 0.801). Along with several other machine learning algorithms, it outperformed logistic regression (AUC: 0.770 for spline regression and 0.735 for linear regression) and the MEWS (AUC: 0.698).

Mao et al. [54] approached the problem from a different perspective and proposed a data mining framework that tackles the challenges associated with the typically suboptimal quality and consistency of data in hospital databases. A bucketing method was used to overcome the presence of irregular gaps in the data (which is typical of data acquisition for in-hospital patients in the ward). An exploratory under-sampling technique was also applied to address the problem of having many more patients without specific complications than with them (termed class imbalance). A simple logistic regression, performed after these steps, obtained better classification results for patients with and without complications than two machine learning algorithms (support vector machines and decision trees). Improved detection of deterioration and the ability to deal with class imbalance are two of the main advantages of using machine learning that would lead to more accurate alarms or warnings and thereby improve patient care and outcomes.
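
To make the two data-preparation steps concrete, the sketch below buckets irregularly sampled ward measurements into fixed time windows and randomly under-samples the majority (no-complication) class before fitting a simple logistic regression. The column names, window length, and synthetic data are assumptions for illustration, not the exact procedure of Mao et al.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def bucket_vitals(df: pd.DataFrame, hours: int = 6) -> pd.DataFrame:
    """Aggregate irregularly timed vital-sign measurements into fixed windows per patient."""
    return (df.set_index("timestamp")
              .groupby("patient_id")
              .resample(f"{hours}h")
              .agg({"heart_rate": "mean", "resp_rate": "mean"})
              .dropna()
              .reset_index())


def undersample(X: np.ndarray, y: np.ndarray, seed: int = 0):
    """Randomly drop majority-class rows so both classes are equally represented."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    keep_neg = rng.choice(neg, size=len(pos), replace=False)
    idx = np.concatenate([pos, keep_neg])
    return X[idx], y[idx]


# Tiny synthetic record set with irregular measurement times
records = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "timestamp": pd.to_datetime(["2020-01-01 08:10", "2020-01-01 10:40",
                                 "2020-01-01 15:05", "2020-01-01 09:30",
                                 "2020-01-01 18:20"]),
    "heart_rate": [88, 92, 110, 76, 81],
    "resp_rate": [16, 18, 24, 14, 15],
})
print(bucket_vitals(records))

# Class-imbalanced synthetic training set (about 5% complication rate)
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y = (rng.random(2000) < 0.05).astype(int)
X_bal, y_bal = undersample(X, y)
model = LogisticRegression().fit(X_bal, y_bal)
print(round(y_bal.mean(), 2))   # balanced classes -> 0.5
```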

4 Recommendations

This review first described the current standard of care for the prediction or detection of patient deterioration throughout the perioperative course, as summarized in Fig. 1. Several risk and warning scores have been developed and clinically validated for use in (peri-operative) clinical decision support. Although these scores perform well in some studies, their application in other, either more general or more specific, patient populations remains challenging. Moreover, the lack of training on specific patient populations, such as the post-operative patient, and on specific complications, such as respiratory depression or cardiac arrest, hampers the ability to act upon the provided alarms or warnings in clinical practice. Furthermore, the infrequent assessment of vital signs and laboratory tests, and the poor integration of patient monitoring systems with other data sources in the hospital, leave large gaps and limit temporal trajectory analysis. In addition, treating the vital signs independently, without attention to their underlying correlations and their relations to other patient data, such as personal health data and laboratory data, leaves a lot of information hidden.

Recent analytical frameworks have emerged to improve postoperative patient deterioration detection and better support clinical decision-making. Promising approaches include: postoperative physiological trajectory modelling, further enabled by continuous, wearable monitoring; personalization of models, scores, and thresholds; combining pre-, intra-, and post-operative predictors; and machine learning and data mining techniques to uncover previously unknown relations in the vast amount of data generated throughout the patient’s care journey, supporting workflow and clinical decision-making. Figure 2 provides an illustrative summary of several analytical approaches, and how they relate, to improve risk and warning scoring systems for patient deterioration prediction and detection. Based on the promising results of the reviewed work, we conclude with the following recommendations for future work on risk and warning scores for post-operative patient deterioration detection:

Fig. 2 Future of post-operative patient deterioration detection: an illustrative representation of promising approaches, and how they relate, to improve risk and warning scoring systems for patient deterioration prediction and detection

  • Identify important periods in the post-operative journey where deterioration occurs, and then focus predictive models on the conditions that are more common during these periods. An example is the risk of myocardial infarction within 1 day after surgery and of sepsis between 8 and 30 days after surgery.

  • Distinguish pathological trajectories from normal physiological changes by using temporal data modelling and personalization for each patient. These trajectories can be based on vital signs, other indices (EWS, Rothman Index), or a combination of them. Adding pre- and intra-operative data may improve performance in this case.

  • Analyze and model the correlation between features derived from vitals and other types of data and specific complications that can occur post-operatively. This can provide more insight into the underlying (patho)-physiology of the predicted complications.

  • Explore big data approaches that exploit large databases of patients who have similar trajectories and conditions. These databases will also help to discover the relationship between pre-operative conditions, patient demographics, surgical procedures, and post-operative deterioration events. These relationships will be crucial in developing patient-specific and actionable early detection systems.

  • Tackle challenges in real data, including irregular gaps, and use techniques to address class imbalance, in order to improve the implementation and performance of deterioration detection algorithms.

  • Introduce more continuous, yet unobtrusive, monitoring for post-operative patients. Features derived from these devices can be integrated in novel machine learning and modelling techniques that look into the relationship between continuous vital sign data, surgical context, and the patient-specific post-surgical trajectory.

Further to the above recommendations, which focus on the more technical elements of implementing and deploying algorithms for post-operative deterioration detection, it is critically important to consider how these algorithms are embedded in the clinical workflow. We recommend the following considerations for integrating them:

  • In many hospitals worldwide, clinicians and nurses are already suffering from alarm fatigue [79, 80] due to the multitude of alarms in the post-operative period, of which only a small percentage are important or actionable. It is important that the above algorithms have improved sensitivity and specificity, and that they are embedded in the clinical workflow without increasing non-actionable alarms. Machine learning has already been used across various settings to facilitate the reduction of alarms [81,82,83]. However, the problem of alarming is far from solved in everyday clinical workflows and still poses a challenge that needs to be addressed.

  • Before deploying any of these algorithms in clinical practice, a thorough assessment of the clinical workflow needs to be done. In busy clinical environments, it is important to avoid adding too many new tasks and reminders, which are seen as a burden. Identifying who acts on which alarm and at what point in time is crucial for the success of a clinical change associated with these algorithms.

  • Before acting on a recommendation of an algorithm, clinical teams need to understand why the algorithm shows irregular scores. This raises the importance of highlighting the feature changes that led to these scores; e.g., if an algorithm shows a raised risk of kidney injury, the irregular values of urine output and laboratory results need to be obvious to the clinician making a decision.

  • There has to be flexibility in how alerts are received. Although some hospitals are keen on mobile devices, other hospitals avoid having them embedded in the workflow and would rather use existing medical devices, such as patient monitors and central stations.

  • Training and follow-up are especially important for embedding clinical changes due to algorithms in the daily clinical workflow. Before adopting a new solution in the post-operative workflow, adequate training has to be given on what the algorithm does and how it is integrated. Roles for all stakeholders also need to be very clear. After the integration of the algorithm, clinical and nursing teams need to see how well they are doing and how patient care is affected over time. Adherence to clinical protocols is the main factor in maintaining improved clinical results over time.