Deep Contrastive Survival Analysis with Dual-View Clustering

Chang Cui ^1,2, Yongqiang Tang ^1,* and Wensheng Zhang ^1,2,*

^1 State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

^2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China

^* Authors to whom correspondence should be addressed.

Electronics 2024, 13(24), 4866; https://doi.org/10.3390/electronics13244866

Submission received: 19 November 2024 / Revised: 8 December 2024 / Accepted: 9 December 2024 / Published: 10 December 2024

(This article belongs to the Special Issue Machine Learning for Biomedical Applications)

Figure 1. The overall architecture of the DVC-Surv model. The Siamese autoencoder consists of two autoencoders without parameter sharing, mapping patient covariates into latent spaces of two views. Subsequently, the dual-view clustering module integrates the representations from the two views to cluster the samples. Lastly, the fused representation of the two views and covariates is fed into the survival backbone to obtain an estimation of the survival distribution.

Figure 2. Schematic diagram of triple contrastive learning, including (a) intra-view cluster-guided contrastive learning, (b) inter-view instance-wise contrastive learning, and (c) inter-view cluster-wise contrastive learning.

Figure 3. The visualization of dual-view clustering with t-SNE. The t-SNE algorithm can map high-dimensional data to a low-dimensional space (such as two-dimensional space) while preserving the similarity between data points, thereby enabling the visualization of high-dimensional data distributions. Specifically, the clustering results in two views at the end of pre-training and training are shown. In each figure, different clusters are represented by different colors, with censored and uncensored samples indicated by ‘×’ and ‘·’, respectively.

Figure 4. The feature importance of the model, determined using the SHAP algorithm. Specifically, the SHAP algorithm evaluates the contribution of each feature to the model’s predictions by calculating the marginal effect of each feature on each sample’s prediction. Higher SHAP values indicate that the feature plays a more significant role in the model’s prediction outcomes. Based on this, the average ranking of features’ SHAP values across all samples represents the importance ranking of the features. This helps identify the features on which the model relies when making predictions, thereby providing a better understanding of the model’s decision-making process.

Abstract

Survival analysis aims to analyze the relationship between covariates and events of interest, and is widely applied in multiple research fields, especially in clinical research. Recently, some studies have attempted to discover potential sub-populations in survival data through clustering to assist survival prediction. However, existing models that combine clustering with survival analysis face multiple challenges: incomplete representation caused by single-path encoders, the incomplete information of pseudo-samples, and the misleading effects of boundary samples. To overcome these challenges, we propose a novel deep contrastive survival analysis model with dual-view clustering. Specifically, we design a Siamese autoencoder to construct latent spaces in two views and conduct dual-view clustering to capture patient representations more comprehensively. Moreover, we treat the dual views as mutual augmentations rather than introducing pseudo-samples and, on this basis, propose triple contrastive learning to fully utilize clustering information and dual-view representations to enhance survival prediction. Additionally, we employ a self-paced learning strategy in the dual-view clustering process so that the model handles samples from easy to hard during training, thereby avoiding the misleading effects of boundary samples. Our proposal achieves an average C-index of 0.6653 and an average IBS of 0.1786 on three widely used clinical datasets, both outperforming the existing best methods, which demonstrates its advanced discriminative and calibration performance.

Keywords: survival analysis; neural network; clustering; contrastive learning; multi-view clustering

1. Introduction

Survival analysis is employed to explore the association between covariates and outcomes, and to predict the probability distribution of events of interest [1]. Survival analysis models are extensively employed across various fields, including finance [2] and industry [3,4], with particularly significant applications in the clinical field [5,6]. More specifically, survival analysis models are capable of assessing the relationship between patient covariates and prognostic outcomes, such as predicting the risk distribution of patient mortality. Traditional statistical survival analysis algorithms, exemplified by the Cox Proportional Hazards (CoxPH) model [7], are widely used in clinical settings, including cancer prognosis [8] and intensive care unit (ICU) [9] scenarios. CoxPH is a semi-parametric statistical model that predicts individual risks utilizing likelihood regression. However, the CoxPH model relies on the proportional hazards assumption, which assumes that risks among different individuals are proportionally distributed. Such an assumption is often overly strong and misaligned with real-world conditions. Additionally, traditional models utilize linear or likelihood regression, which lack the ability to manage complex nonlinear information. These limitations impede the further advancement of traditional statistical models.

Recently, driven by the rapid development of machine learning, such technologies have been widely applied in the clinical and biomedical fields [10,11,12]. Consequently, deep survival models have been extensively proposed and applied. DeepSurv [13] was developed to enhance the CoxPH model with deep learning, enabling the prediction of risk parameters via multi-layer neural networks. However, as a deep semi-parametric model, DeepSurv still encounters limitations due to its reliance on artificial assumptions. To overcome this challenge, deep non-parametric and deep fully parametric models have been proposed. DeepHit [14], as a representative deep non-parametric model, predicts the survival distribution over discrete time intervals in a completely data-driven manner with fully connected networks, eliminating the need for assumptions. In terms of deep fully parametric models, approaches such as DSM [15] are proposed to estimate survival using a mixture of multiple parametric distributions, thereby also circumventing the need for assumptions.

However, the above models regard patients as isolated individuals and ignore their interconnections: each patient is treated independently in survival prediction, although associations between patients commonly exist in reality. Recent studies such as SCA [16], DCM [17], and VaDeSC [18] integrate clustering with survival analysis, identifying potential sub-populations in survival data and using the clustering information to enhance survival prediction. Moreover, some studies apply contrastive learning to further capture patient associations. For instance, DSACC [19] utilizes clustering labels and takes uncensored samples as anchors, merging clustering and contrastive learning to augment the model’s predictive ability for censored patients.

Despite significant advancements, current approaches that integrate clustering with survival analysis still face three primary challenges: First, existing studies are limited to constructing a single view for patients. However, multiple types of associations commonly exist among patients, and clustering within a single view can neither adequately uncover potential sub-populations nor represent comprehensive patient information. Second, some models, represented by [20], attempt to utilize data augmentation to enhance contrastive learning. However, the additional samples generated by data augmentation lack precise outcome label and clustering information, potentially providing unreliable and misleading information to the model. Third, current survival models fail to consider the impact of boundary samples in the clustering process, treating all samples uniformly in the training stage. However, compared to easily distinguishable samples, boundary samples with low confidence may mislead the model during training, diminishing clustering performance and adversely affecting survival prediction.

To address the above challenges, we propose a novel deep contrastive survival analysis model with dual-view clustering, DVC-Surv. For the single-view limitation, we propose a Siamese autoencoder and dual-view clustering. Specifically, two independent autoencoders without parameter sharing are employed to learn the latent space distributions in two distinct views, and both views are integrated for clustering. Compared to single-view clustering, this design enables the model to capture sample information more comprehensively. To address the data augmentation limitation, we treat the dual views constructed by the Siamese autoencoder as mutual augmentations rather than generating pseudo-samples. This strategy avoids the potentially misleading effects of pseudo-samples. Based on this framework, we further design triple contrastive learning to utilize clustering information and dual-view distributions in diverse ways, thereby enhancing representations. Furthermore, for the boundary sample limitation, we employ self-paced learning in the clustering process. Specifically, the model progressively introduces training samples from easy to hard, thus minimizing the influence of low-confidence boundary samples on the model. In summary, the main contributions of this study are as follows:

We propose a novel deep contrastive survival analysis model with dual-view clustering. In this model, we design a Siamese autoencoder to construct a dual-view latent space and utilize dual-view clustering to discover potential sub-populations in survival data, achieving comprehensive representations of patient interconnections and thus assisting survival prediction.
We design triple contrastive learning, treating the two views as augmentations of each other, and integrating clustering labels with dual-view representations to construct positive and negative sample pairs. This design leverages contrastive learning from different perspectives to enhance the model’s representational ability.
We employ self-paced learning in dual-view clustering, allowing the model to learn samples from easy to hard, thus avoiding the misleading effect of boundary samples.
We conduct extensive experiments on three widely used real-world clinical datasets. The experimental results validate the superiority of our proposed model.

2. Related Work

In traditional survival analysis, semi-parametric models based on statistics are primarily used for predicting risks. CoxPH [7], a representative of semi-parametric models, is one of the most widely used survival models. It is based on the proportional hazards assumption, which assumes the risks of different patients are proportionally distributed, and employs likelihood regression to capture the relationship between covariates and risk parameters. However, the proportional hazards assumption, being an artificial assumption, is considered overly strong and not reflective of reality. Moreover, traditional statistical models lack the ability to capture the complex nonlinear relationships between covariates and outcomes. These limitations hinder further development and application of these models. In recent years, with the development of machine learning, machine learning and deep learning algorithms are extensively developed and applied [21,22]. Some studies apply machine learning algorithms to enhance the survival analysis ability. Random Survival Forests (RSF) [23] was proposed to utilize the random forest algorithm [24] to construct an ensemble of decision trees for survival prediction. Additionally, refs. [25,26] employed SVM [27] and the Boosting algorithm in survival analysis.

With the further development of deep learning, an increasing number of deep survival models have been proposed. Deep semi-parametric models, represented by DeepSurv [13], are extensions of traditional models, using deep neural networks instead of likelihood regression to predict patient risk parameters. However, similar to traditional semi-parametric methods, these models also rely on the proportional hazards assumption and are limited by it. Moreover, deep fully parametric models use parametric distributions to directly fit survival distributions to avoid assumptions. DSM [15] estimates the survival function with a mixture of multiple Weibull and log-normal distributions, using neural networks to predict distribution parameters. Unlike semi-parametric and parametric approaches, deep non-parametric models directly use neural networks to model survival distributions, thus avoiding assumptions. DeepHit [14] uses multi-layer fully connected networks to predict the survival distribution over discrete times, being completely data-driven. Similarly, DRSA [28] utilizes RNNs instead of fully connected layers as the encoder and SurvTrace [29] relies on Transformer to encode patient covariates. However, the above deep survival models treat patients individually, neglecting associations among patients.

More recently, some studies have focused on mining the latent associations in survival data. Specifically, several models have been proposed to find potential sub-populations in survival data and use the associative information to enhance the predictive capability. SCA [16] combines clustering with survival analysis, predicting outcomes with the truncated Dirichlet process. The DCM model [17] utilizes the expectation–maximization (EM) algorithm to cluster patients. VaDeSC [18] utilizes stochastic gradient variational inference to estimate both the clustering results and survival outcomes. Additionally, DSACC [19] was proposed to deepen the use of clustering information during the survival prediction process. Specifically, in DSACC, patients are mapped into a latent space and clustered, and contrastive learning guided by the cluster labels is used to optimize the representation of censored patients with uncensored patients as anchors, thereby enhancing the model’s representational ability. However, these methods are based on single-path encoders and overlook a comprehensive representation of patients. Moreover, these models cluster all samples simultaneously, neglecting the misleading influence of low-confidence boundary samples.

In recent years, deep multi-view clustering models have been widely proposed. DCCA [30] achieves clustering by learning nonlinear transformations of two views with deep canonical correlation analysis. DMJC [31] utilizes two deep multi-view frameworks to simultaneously learn multiple embeddings, multi-view fusion, and cluster prediction. ACMU [32] introduces an adaptive weighting strategy that applies simple constraints to heterogeneous views to measure their different contributions to consensus prediction. DCMVC [33] enhances the representation of shared information across views through transformer and contrastive learning. DCMVSC [34] introduces contrastive learning techniques and block diagonalization constraints to achieve multi-view deep subspace clustering and the integration of representation learning and clustering processes. Compared to the above models that focus on enhancing multi-view clustering capabilities, our proposal tends to use clustering as additional associative information and is guided by it to improve survival prediction. Moreover, while the above models focus on existing multi-view data, in this study, we attempt to construct dual-view representations from single-view patient data to achieve more comprehensive patient representation.

3. Methods

3.1. Problem Definition

Given $\mathcal{D} = \{\delta_{i}\}_{i = 1}^{N}$ as a survival dataset with $N$ patients in total, each patient sample can be denoted by a triple, i.e., $\delta_{i} = (x_{i}, t_{i}, e_{i})$. Specifically, $x_{i} \in \mathbb{R}^{D}$ represents the $D$-dimensional covariates of patient $i$, and $t_{i}$ is the observed time. The element $e_{i}$ is an occurrence indicator of death. If patient $i$ is not observed to have died during the study period, this patient is censored. In this case, $e_{i}$ is set to 0, and $t_{i}$ represents the censoring time of patient $i$. Otherwise, when patient $i$ is explicitly observed to have died, $e_{i} = 1$ and $t_{i}$ denotes the specific survival time of the patient.

Survival analysis aims to predict the probability of an event of interest occurring based on patient covariates. In this study, we follow the discrete-time setting of DeepHit. Specifically, time is segmented into several time windows, i.e., $\{T_{1}, T_{2}, \dots, T_{max}\}$. The objective of survival analysis is to estimate the probability distribution of death over all discrete time windows, i.e., $\{\hat{p}_{i, t}\}_{t = T_{1}}^{T_{max}} = \{\hat{p}_{i, T_{1}}, \hat{p}_{i, T_{2}}, \dots, \hat{p}_{i, T_{max}}\}$. Moreover, the estimated cumulative survival function is obtained by subtracting the death probabilities accumulated up to window $t$ from one:

$$ \hat{S}_{i, t} = 1 - \sum_{t' = T_{1}}^{t} \hat{p}_{i, t'} $$
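To make the discrete-time setup concrete, the following minimal sketch converts a predicted death distribution over time windows into a survival curve. It assumes the cumulative sum runs from the first window up to the queried window; the function name `survival_curve` and the example probabilities are illustrative and do not come from the paper.

```python
from itertools import accumulate


def survival_curve(pmf):
    """Return S_t = 1 - sum_{t' <= t} p_{t'} for every discrete time window t."""
    cdf = accumulate(pmf)  # running sum of the death probabilities
    return [1.0 - c for c in cdf]


# Hypothetical predicted death probabilities over four time windows.
p_hat = [0.1, 0.2, 0.3, 0.4]
S_hat = survival_curve(p_hat)
```

Because the predicted probabilities sum to one over all windows, the survival estimate decreases monotonically and reaches zero at the last window.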

3.2. Overall Architecture

In this study, we propose a novel DVC-Surv model. In this subsection, we first introduce the overall architecture of our proposed model. The detailed framework of DVC-Surv is presented in Figure 1. The DVC-Surv model comprises five modules, encompassing three network structure modules and two optimization algorithm modules. Specifically, Siamese autoencoder, dual-view clustering, and a survival backbone form the overall network structure of the DVC-Surv model. Furthermore, in the optimization procedure, triple contrastive learning and self-paced learning are employed to augment the overall representational ability of the model. Detailed descriptions of each module and the optimization procedure are presented in the subsequent subsections.

3.3. Siamese Autoencoder

The Siamese autoencoder maps the original patient covariates in parallel into two latent space representations. Inspired by [35], we employ two independent autoencoders for patient encoding instead of traditional single-path encoding. This design extends individual patient features into two views and facilitates the subsequent use of contrastive learning. Specifically, the Siamese autoencoder consists of two independent autoencoders with the same structure but without parameter sharing. $\mathrm{Autoencoder}^{(1)}$ is composed of an encoder $f_{\theta_{enc}^{(1)}}^{(1)}(\cdot)$ and a decoder $g_{\theta_{dec}^{(1)}}^{(1)}(\cdot)$, while $\mathrm{Autoencoder}^{(2)}$ is composed of $f_{\theta_{enc}^{(2)}}^{(2)}(\cdot)$ and $g_{\theta_{dec}^{(2)}}^{(2)}(\cdot)$, where $\theta_{enc}^{(1)}$, $\theta_{dec}^{(1)}$, $\theta_{enc}^{(2)}$, and $\theta_{dec}^{(2)}$ are network parameters. Each encoder and decoder pair has a symmetrical structure. The encoding process can be formulated as follows:

$$ z^{(1)} = f_{\theta_{enc}^{(1)}}^{(1)}(x), \quad z^{(2)} = f_{\theta_{enc}^{(2)}}^{(2)}(x), \tag{1} $$

where $z^{(1)} \in \mathbb{R}^{d}$ and $z^{(2)} \in \mathbb{R}^{d}$ are $d$-dimensional latent representations in the two views. To improve the representation ability of the two encoders, a reconstruction process is introduced in the Siamese autoencoder. Specifically, two decoders are employed to reconstruct the original representations from the two latent spaces. The decoding process is as follows:

$$ \hat{x}^{(1)} = g_{\theta_{dec}^{(1)}}^{(1)}(z^{(1)}), \quad \hat{x}^{(2)} = g_{\theta_{dec}^{(2)}}^{(2)}(z^{(2)}), \tag{2} $$

where $\hat{x}^{(1)} \in \mathbb{R}^{D}$ and $\hat{x}^{(2)} \in \mathbb{R}^{D}$ represent the reconstruction vectors of the two views, respectively. We apply a reconstruction loss to supervise the encoding process of the Siamese autoencoder. The reconstruction loss is the mean squared error (MSE) between the original features and the reconstructed features:

$$ L_{REC} = \frac{1}{N} \sum_{i = 1}^{N} \sum_{v = 1}^{2} \| \hat{x}_{i}^{(v)} - x_{i} \|_{2}^{2}. \tag{3} $$
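The encoding, decoding, and reconstruction loss in Equations (1)–(3) can be sketched as follows. This is a deliberately simplified illustration: each encoder and decoder is reduced to a single bias-free linear layer with random weights, whereas the paper uses multi-layer networks; the helper names `linear`, `init_weights`, and `reconstruction_loss` are hypothetical.

```python
import random


def linear(weights, vec):
    """Apply a bias-free dense layer; `weights` is a list of rows."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def init_weights(rows, cols, rng):
    """Draw a small random weight matrix (illustrative initialization)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def reconstruction_loss(X, X_hat_views):
    """Eq. (3): squared errors summed over both views, averaged over patients."""
    total = 0.0
    for i, x in enumerate(X):
        for X_hat in X_hat_views:
            total += sum((a - b) ** 2 for a, b in zip(X_hat[i], x))
    return total / len(X)


rng = random.Random(0)
D, d = 4, 2  # covariate and latent dimensions (toy values)

# Two independent parameter sets, one per view: no parameter sharing.
enc = [init_weights(d, D, rng), init_weights(d, D, rng)]
dec = [init_weights(D, d, rng), init_weights(D, d, rng)]

X = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(5)]
Z = [[linear(enc[v], x) for x in X] for v in range(2)]         # Eq. (1)
X_hat = [[linear(dec[v], z) for z in Z[v]] for v in range(2)]  # Eq. (2)
loss = reconstruction_loss(X, X_hat)
```

Note that each decoder reads the latent code of its own view, so the two reconstructions are produced by fully separate encoder and decoder pairs.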

3.4. Dual-View Clustering

After the encoding process, the latent representations in two views are obtained, and we conduct clustering in these two latent spaces. Specifically, the latent representations from the two views are concatenated into a global representation, i.e., $Z = [Z^{(1)}; Z^{(2)}] \in \mathbb{R}^{2d \times N}$, where $Z^{(v)}$ denotes the concatenation of the latent representations of all patients in view $v$, i.e., $Z^{(v)} = [z_{1}^{(v)}, \dots, z_{N}^{(v)}] \in \mathbb{R}^{d \times N}$. Here, $[\cdot; \cdot]$ and $[\cdot, \cdot]$ represent vertical and horizontal concatenation, respectively. The K-means algorithm [36] is run in the global representation space to initialize the cluster centers and assignments. Cluster centers are denoted by $M = [M^{(1)}; M^{(2)}] \in \mathbb{R}^{2d \times K}$ with $M^{(v)} = [m_{1}^{(v)}, \dots, m_{K}^{(v)}] \in \mathbb{R}^{d \times K}$, where $K$ is the number of potential sub-populations in the survival data. Cluster assignments are represented by $S = \{s_{i, k}\}_{i = 1, k = 1}^{N, K}$, where $s_{i, k} = 1$ only if patient $i$ belongs to the $k$-th cluster and $s_{i, k} = 0$ otherwise. The vector $s_{i} = [s_{i, 1}, \dots, s_{i, K}]^{\top}$ is therefore one-hot, and $M^{(v)} s_{i}$ selects the center of the cluster assigned to patient $i$ in view $v$. We train the dual-view clustering module by minimizing the mean squared error (MSE) between the latent representations and the center of the corresponding cluster. The cluster loss function can be formulated as follows:

$$ L_{DVC} = \frac{1}{N} \sum_{i = 1}^{N} \sum_{v = 1}^{2} \| z_{i}^{(v)} - M^{(v)} s_{i} \|_{2}^{2}. \tag{4} $$
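As an illustration of Equation (4), the sketch below evaluates the clustering loss and performs the hard assignment that minimizes it for fixed centers, i.e., each patient is assigned to the cluster whose centers are nearest in the concatenated two-view space. It assumes row-wise storage (one list per patient or per center) rather than the column layout used in the notation, and the toy coordinates are invented for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_clusters(Z1, Z2, M1, M2):
    """Pick, per patient, the cluster minimizing the distance summed over views."""
    labels = []
    for z1, z2 in zip(Z1, Z2):
        costs = [sq_dist(z1, m1) + sq_dist(z2, m2) for m1, m2 in zip(M1, M2)]
        labels.append(costs.index(min(costs)))
    return labels


def dvc_loss(Z1, Z2, M1, M2, labels):
    """Eq. (4): mean over patients of squared distances to the assigned centers."""
    total = sum(
        sq_dist(z1, M1[k]) + sq_dist(z2, M2[k])
        for z1, z2, k in zip(Z1, Z2, labels)
    )
    return total / len(labels)


# Toy latent codes of three patients in two views, and K = 2 centers per view.
Z1 = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
Z2 = [[0.0, 1.0], [1.0, 0.0], [5.0, 4.0]]
M1 = [[0.5, 0.5], [5.0, 5.0]]
M2 = [[0.5, 0.5], [5.0, 4.0]]

labels = assign_clusters(Z1, Z2, M1, M2)
loss = dvc_loss(Z1, Z2, M1, M2, labels)
```

Summing the per-view distances before taking the minimum is equivalent to a nearest-center search on the concatenated representation, which is why a single label is shared by both views.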

3.5. Self-Paced Learning

In deep learning, different samples often present varying degrees of learning difficulty for models. For instance, in clustering, there are typically marginal samples located near the boundaries of clusters that fail to provide accurate guidance for model training. Therefore, in the proposed model, we employ self-paced learning (SPL) [37] to achieve progressive model training. SPL is a simple and effective sample learning strategy that ensures the model learns gradually from easy to hard samples. Specifically, in each learning round, SPL prioritizes the inclusion of samples with high confidence and excludes unclear marginal samples, which helps the model acquire high-quality knowledge more stably. Note that SPL is only applied to the reconstruction and clustering processes of the proposed model, since the losses of different samples in survival prediction and triple contrastive learning differ, making it difficult to define the confidence of sample learning. Therefore, the loss function of SPL is defined as a weighted combination of the reconstruction loss $L_{REC}$ and the clustering loss $L_{DVC}$, formulated as follows:

$$ L_{SPL} = \frac{1}{N} \sum_{i = 1}^{N} \sum_{v = 1}^{2} w_{i} \left( \alpha_{REC} \| \hat{x}_{i}^{(v)} - x_{i} \|_{2}^{2} + \alpha_{DVC} \| z_{i}^{(v)} - M^{(v)} s_{i} \|_{2}^{2} \right), \tag{5} $$

where $\alpha_{REC}$ and $\alpha_{DVC}$ are hyperparameters, and $w_{i} \in \{0, 1\}$ is the SPL weight of patient $i$, which is determined by a threshold parameter $\lambda$ as follows:

$$ w_{i} = \begin{cases} 1, & L_{i} \leq \lambda \\ 0, & \text{otherwise} \end{cases} \tag{6} $$

$L_{i}$ is the instance-level reconstruction and clustering loss value of patient $i$, summed over the two views as in Equation (5):

$$ L_{i} = \sum_{v = 1}^{2} \left( \alpha_{REC} \| \hat{x}_{i}^{(v)} - x_{i} \|_{2}^{2} + \alpha_{DVC} \| z_{i}^{(v)} - M^{(v)} s_{i} \|_{2}^{2} \right). \tag{7} $$

Following [38], we utilize a statistics-based adaptive method to update $\lambda$ in each round of training. Thus, $\lambda$ can be formulated as follows:

$$ \lambda = \mu(L_{i}) + \frac{E}{E_{max}} \sigma(L_{i}). \tag{8} $$

$\mu(L_{i})$ and $\sigma(L_{i})$ denote the average and standard deviation of $L_{i}$ over all patients, respectively. $E$ is the current training step, and $E_{max}$ is the total number of training steps. As training progresses, the threshold rises, so SPL gradually admits more samples from easy to hard for the model to learn, thereby achieving progressive and stable training.
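The threshold schedule in Equations (6) and (8) can be sketched in a few lines. The example assumes the population standard deviation (`statistics.pstdev`); the text does not say whether the population or sample estimator is used, and the per-patient loss values are made up for illustration.

```python
from statistics import mean, pstdev


def spl_weights(losses, step, max_steps):
    """Return the threshold of Eq. (8) and the binary weights of Eq. (6)."""
    lam = mean(losses) + (step / max_steps) * pstdev(losses)
    weights = [1 if loss <= lam else 0 for loss in losses]
    return lam, weights


# Hypothetical per-patient losses L_i: two easy and two hard samples.
losses = [0.1, 0.2, 0.9, 1.0]

lam_early, w_early = spl_weights(losses, step=0, max_steps=10)
lam_late, w_late = spl_weights(losses, step=10, max_steps=10)
```

At the first step the threshold equals the mean loss, so only the two easy samples are selected; by the last step the threshold has grown by one standard deviation and a third, harder sample is admitted.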

3.6. Triple Contrastive Learning

Recently, contrastive learning has been widely applied in deep learning; it enhances the model’s representational ability by constructing positive and negative sample pairs, drawing closer the representations of positive pairs and pushing apart those of negative pairs. Based on the dual-view representations and clustering results given by the Siamese autoencoder and dual-view clustering modules, to further enhance the representation in the latent spaces corresponding to the two views and the capability of the model to mine latent information about patients, a triple contrastive learning module is designed. Specifically, as shown in Figure 2, we design three contrastive learning strategies, i.e., intra-view cluster-guided contrastive learning, inter-view instance-wise contrastive learning, and inter-view cluster-wise contrastive learning. Detailed descriptions of each strategy follow.

3.6.1. Intra-View Cluster-Guided Contrastive Learning

Following DSACC [19], we design intra-view cluster-guided contrastive learning. Censored patients commonly exist in survival analysis, and their precise survival outcomes are undetermined. Therefore, based on the results of dual-view clustering, we utilize cluster assignments as pseudo-labels and uncensored patients with clearly observed outcomes as anchors to supervise the learning for censored patients. Specifically, as shown in Figure 2a, within each view, each censored patient forms a positive pair with an uncensored patient from the same cluster, and a negative pair with an uncensored patient from a different cluster. This design leverages the explicit supervisory information of uncensored patients to enhance the representation of censored patients, who lack such information: the distribution of censored samples in the latent space is drawn toward that of similar uncensored samples, thereby achieving representation enhancement based on limited supervisory information. The InfoNCE loss is employed to implement intra-view cluster-guided contrastive learning. The specific loss function is as follows:

$$ L_{IVCG} = - \frac{1}{N} \sum_{i = 1}^{N} \sum_{v = 1}^{2} (1 - e_{i}) \sum_{j = 1}^{N} \mathbb{1}_{e_{j} = 1, s_{i} = s_{j}} \log \frac{\exp(\cos(z_{i}^{(v)}, z_{j}^{(v)}) / \tau)}{\sum_{m = 1}^{N} \mathbb{1}_{e_{m} = 1, s_{i} \neq s_{m}} \exp(\cos(z_{i}^{(v)}, z_{m}^{(v)}) / \tau)}, \tag{9} $$

where $\cos(\cdot, \cdot)$ indicates the cosine similarity of two vectors and $\tau$ is the temperature parameter, which is set as a hyperparameter. The condition $s_{i} \neq s_{m}$ restricts the denominator to be composed entirely of negative pairs. Such a design further enhances the proximity of censored samples to the anchors within the corresponding cluster, thereby ensuring the model’s representational capability for them.
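A direct, unoptimized reading of Equation (9) is sketched below. It assumes the denominator ranges over uncensored patients from other clusters, that latent vectors are nonzero, and that censored patients without any negative anchor are skipped; that last rule and the name `ivcg_loss` are choices made for this example, not details from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def ivcg_loss(views, events, labels, tau=0.5):
    """Eq. (9): pull censored patients toward uncensored anchors of their cluster."""
    n = len(events)
    total = 0.0
    for Z in views:  # sum over the two views
        for i in range(n):
            if events[i] == 1:
                continue  # only censored patients contribute
            neg = sum(
                math.exp(cosine(Z[i], Z[m]) / tau)
                for m in range(n)
                if events[m] == 1 and labels[m] != labels[i]
            )
            if neg == 0.0:
                continue  # assumption: skip when no negative anchor exists
            for j in range(n):
                if events[j] == 1 and labels[j] == labels[i]:
                    total += math.log(math.exp(cosine(Z[i], Z[j]) / tau) / neg)
    return -total / n


# Toy example: patients 0 and 2 are uncensored anchors, 1 and 3 are censored.
Z = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
events = [1, 0, 1, 0]
good = ivcg_loss([Z, Z], events, labels=[0, 0, 1, 1])
bad = ivcg_loss([Z, Z], events, labels=[0, 1, 1, 0])
```

The loss is lower when each censored patient shares a cluster with the anchor it already lies close to, which is the behavior the module is meant to encourage.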

3.6.2. Inter-View Instance-Wise Contrastive Learning

In encoding, we employ Siamese autoencoder instead of the traditional data augmentation of contrastive learning. This design avoids the employment of low-quality and inaccurately supervised pseudo-samples. As shown in Figure 2b, based on two latent spaces constructed by the Siamese autoencoder, we design inter-view instance-wise contrastive learning to enhance the representational consistency between two views. Specifically, the latent representations of the same sample in two views should be considered as a positive pair, while the latent representations of different samples are defined as a negative pair. Such a design can enhance the consistency and complementarity of the model. The specific loss function is as follows:

$$ L_{IVIW} = - \frac{1}{N} \sum_{i = 1}^{N} \sum_{v = 1}^{2} \log \frac{\exp(\cos(z_{i}^{(v)}, z_{i}^{(v')}) / \tau)}{\sum_{j \neq i}^{N} \exp(\cos(z_{i}^{(v)}, z_{j}^{(v')}) / \tau)}, \tag{10} $$

where $v'$ denotes the view other than $v$.
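Equation (10) can be written out as the following sketch, in which the two views of the same patient form the positive pair and the other patients in the opposite view form the negatives. It assumes at least two patients and nonzero latent vectors; the toy vectors and the name `iviw_loss` are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def iviw_loss(Z1, Z2, tau=0.5):
    """Eq. (10): align each patient's two views, contrast against other patients."""
    n = len(Z1)
    total = 0.0
    for Za, Zb in ((Z1, Z2), (Z2, Z1)):  # v = 1, 2 with v' the other view
        for i in range(n):
            pos = math.exp(cosine(Za[i], Zb[i]) / tau)
            neg = sum(
                math.exp(cosine(Za[i], Zb[j]) / tau) for j in range(n) if j != i
            )
            total += math.log(pos / neg)
    return -total / n


Z1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
Z2 = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]   # consistent with Z1
Z2_shuffled = [Z2[1], Z2[2], Z2[0]]           # views no longer match per patient

aligned = iviw_loss(Z1, Z2)
misaligned = iviw_loss(Z1, Z2_shuffled)
```

When the second view is permuted across patients, the positive pairs become dissimilar and the loss increases, so minimizing it pushes the two encoders toward consistent per-patient representations.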

3.6.3. Inter-View Cluster-Wise Contrastive Learning

In the dual-view clustering module, the K-means algorithm is utilized to calculate cluster centers and cluster assignments. Beyond the basic hard clustering assignments, we also design soft labels for clustering to indicate clustering confidence, as shown in Figure 2c. We adopt Student’s t-distribution to measure the clustering confidence by calculating the similarity between the latent representation of each sample and each cluster center, normalized over all $K$ clusters. The pseudo-label of clustering $Q = \{q_{i, k}^{(v)}\}_{i, k, v}^{N, K, 2}$ can be calculated as follows:

$$ q_{i, k}^{(v)} = \frac{\left(1 + \| z_{i}^{(v)} - m_{k}^{(v)} \|_{2}^{2} / \alpha\right)^{- \frac{\alpha + 1}{2}}}{\sum_{k' = 1}^{K} \left(1 + \| z_{i}^{(v)} - m_{k'}^{(v)} \|_{2}^{2} / \alpha\right)^{- \frac{\alpha + 1}{2}}}, \tag{11} $$

where $\alpha$ is the degrees of freedom of Student’s t-distribution, and $q_{i, k}^{(v)}$ represents the confidence of sample $i$ belonging to cluster $k$ in view $v$.
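The soft assignment of Equation (11) is illustrated below for a single sample in one view. The sketch normalizes the Student's t kernel over all K clusters so that the confidences of a sample sum to one, which is an assumption about the intended form; the centers and the default `alpha=1.0` are example values.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def soft_assignments(z, centers, alpha=1.0):
    """Student's t kernel between z and every center, normalized over clusters."""
    kernel = [
        (1.0 + sq_dist(z, m) / alpha) ** (-(alpha + 1.0) / 2.0) for m in centers
    ]
    norm = sum(kernel)
    return [k / norm for k in kernel]


# One latent code and K = 3 hypothetical cluster centers in the same view.
z = [0.0, 0.0]
centers = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
q = soft_assignments(z, centers)
```

With `alpha = 1` the unnormalized kernel values are 1, 0.5, and 0.1, so the sample receives confidences 0.625, 0.3125, and 0.0625 for the three clusters.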

Based on the calculated clustering confidence, we design inter-view cluster-wise contrastive learning to further enhance the consistency of clustering. Specifically, the objective is to ensure that different views of the same cluster are as similar as possible, while different clusters across all views are as distinct as possible. To this end, we concatenate the confidences of all samples in view $v$ for a specific cluster $k$ into a vector $q_{k}^{(v)} = [q_{1, k}^{(v)}, \dots, q_{N, k}^{(v)}]$ as the cluster feature, then consider the features of the same cluster in the two views as a positive pair, and the features of different clusters as negative pairs. The specific loss function can be formulated as follows:

$$ L_{IVCW} = - \sum_{v \neq v'}^{2} \sum_{k = 1}^{K} \log \frac{\exp(\cos(q_{k}^{(v)}, q_{k}^{(v')}) / \tau)}{\sum_{k' \neq k}^{K} \sum_{v'' = 1}^{2} \exp(\cos(q_{k}^{(v'')}, q_{k'}^{(v')}) / \tau)}. \tag{12} $$

The total loss function of triple contrastive learning can be formulated as follows:

$$ L_{TCL} = \alpha_{IVCG} L_{IVCG} + \alpha_{IVIW} L_{IVIW} + \alpha_{IVCW} L_{IVCW}, \tag{13} $$

where $\alpha_{IVCG}$, $\alpha_{IVIW}$, and $\alpha_{IVCW}$ are all hyperparameters.
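The cluster-wise term in Equation (12) operates on cluster features, i.e., one confidence vector per cluster and view, rather than on patient embeddings. The sketch below follows the index pattern of the printed formula literally; `Q1[k]` and `Q2[k]` hold the confidences of all patients for cluster `k` in views 1 and 2, and the numbers are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def ivcw_loss(Q1, Q2, tau=0.5):
    """Eq. (12): same cluster across views is positive, other clusters negative."""
    views = (Q1, Q2)
    n_clusters = len(Q1)
    total = 0.0
    for v, v_other in ((0, 1), (1, 0)):
        for k in range(n_clusters):
            pos = math.exp(cosine(views[v][k], views[v_other][k]) / tau)
            neg = sum(
                math.exp(cosine(views[v2][k], views[v_other][kp]) / tau)
                for kp in range(n_clusters)
                if kp != k
                for v2 in range(2)
            )
            total += math.log(pos / neg)
    return -total


# Confidence vectors of K = 2 clusters over N = 4 patients in each view.
Q1 = [[0.9, 0.8, 0.1, 0.2], [0.1, 0.2, 0.9, 0.8]]
Q2 = [[0.8, 0.9, 0.2, 0.1], [0.2, 0.1, 0.8, 0.9]]
Q2_swapped = [Q2[1], Q2[0]]  # cluster identities disagree between views

consistent = ivcw_loss(Q1, Q2)
inconsistent = ivcw_loss(Q1, Q2_swapped)
```

The loss is smaller when cluster `k` describes the same group of patients in both views, which is the consistency the module is designed to reward.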

3.7. Survival Backbone

Based on the results of the Siamese autoencoder and dual-view clustering, we utilize the fused representations of two latent spaces and patient covariates for the downstream survival analysis task. The fused representations

h

can be calculated as follows:

h = concatenate (\frac{1}{2} (z^{(1)} + z^{(2)}), x),

(14)

where

concatenate (\cdot, \cdot)

is the concatenation of two vectors.

The same as in DeepHit [14], a feed-forward network

g_{θ^{(s u r v)}}^{(s u r v)}

composed of multi-layer fully connected layers and a Softmax layer is employed to conduct survival prediction base on the above fused representations. The survival prediction module takes fused representations as input and outputs the estimation probability distribution of death as follows:

{{\hat{p}}_{t}}_{t = T_{1}}^{T_{max}} = g_{θ^{(s u r v)}}^{(s u r v)} (h),

(15)

and the estimation of the cumulative survival function can be calculated by

{\hat{S}}_{i, t} = 1 - \sum_{t = T_{1}}^{T_{max}} {\hat{p}}_{i, t}

The negative log-likelihood (NLL) and ranking loss function are utilized to supervise the training of the survival backbone. The NLL loss encourages the prediction to approximate the ground truth by maximizing the likelihood of patient survival, specifically maximizing the CIF estimates before the censoring of censored patients and the probability estimates of uncensored patients. The ranking loss enhances the model’s discriminative performance by maximizing the difference in survival estimates between comparable sample pairs. The survival loss function can be formulated as follows:

L_{NLL} = - \frac{1}{N} \sum_{i = 1}^{N} [e_{i} \log ({\hat{p}}_{i, t_{i}}) + (1 - e_{i}) \log ({\hat{S}}_{i, t_{i}})],

(16)

L_{RANK} = \sum_{i \neq j}^{N} 1_{e_{i} = 1, t_{i} < t_{j}} \exp (\frac{{\hat{S}}_{i, t_{i}} - {\hat{S}}_{j, t_{i}}}{σ}),

(17)

L_{SURV} = α_{NLL} L_{NLL} + L_{RANK},

(18)

where σ and α_{NLL} are hyperparameters.
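
As a rough illustration of Equations (16)–(18), the sketch below evaluates both losses on a three-patient toy example. The toy probabilities, the representation of times as indices `t_idx` on the discrete grid, the small constant `eps` inside the logarithms, and the values chosen for `sigma` and `alpha_nll` are assumptions made for the example, not settings taken from the paper.

```python
import numpy as np

def nll_loss(pmf, surv, t_idx, e, eps=1e-8):
    """Equation (16): log p at the event time if e = 1, log S at the censoring time if e = 0."""
    rows = np.arange(len(t_idx))
    p_at_t = pmf[rows, t_idx]
    s_at_t = surv[rows, t_idx]
    return -np.mean(e * np.log(p_at_t + eps) + (1 - e) * np.log(s_at_t + eps))

def rank_loss(surv, t_idx, e, sigma):
    """Equation (17): penalize pairs (i, j) where S_i(t_i) is not well below S_j(t_i)."""
    s_at_ti = surv[:, t_idx].T                  # s_at_ti[i, j] = S_j(t_i)
    own = np.diag(s_at_ti)[:, None]             # own[i] = S_i(t_i)
    comparable = (e[:, None] == 1) & (t_idx[:, None] < t_idx[None, :])
    return np.sum(comparable * np.exp((own - s_at_ti) / sigma))

# Toy data: three patients, four discrete time points.
pmf = np.array([[0.70, 0.10, 0.10, 0.10],
                [0.10, 0.10, 0.10, 0.70],
                [0.25, 0.25, 0.25, 0.25]])
surv = 1.0 - np.cumsum(pmf, axis=1)
t_idx = np.array([0, 2, 1])                     # observed time index per patient
e = np.array([1, 0, 1])                         # 1 = event observed, 0 = censored

alpha_nll = 1.0                                 # hypothetical loss weight
nll = nll_loss(pmf, surv, t_idx, e)
rank = rank_loss(surv, t_idx, e, sigma=1.0)
surv_loss = alpha_nll * nll + rank              # Equation (18)
```

In this toy example the comparable pairs are (0, 1), (0, 2), and (2, 1); the censored patient 1 never acts as the first element of a pair.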

3.8. Loss Functions and Optimization

The proposed model is trained in two stages: pre-training and training. Similar to DSACC [19], during the pre-training stage, the Siamese autoencoder and survival backbone are trained to obtain the initial latent representations. Subsequently, in the training stage, all modules are activated. During training, clusters and network parameters are alternately updated. The specific optimization steps are as follows:

(1) Pre-training: In the pre-training stage, we utilize the reconstruction and survival losses to train the model. The optimization problem in the pre-training stage is as follows:

\min_{Θ} (α_{REC} L_{REC} + L_{SURV}),

(19)

where Θ = \{θ_{enc}^{(1)}, θ_{dec}^{(1)}, θ_{enc}^{(2)}, θ_{dec}^{(2)}, θ^{(surv)}\} is the set of all network parameters.

(2) Initialize clusters: After the pre-training stage, the cluster centers M and assignments S are initialized by K-means. The cluster centers remain fixed until the end of training.

(3) Update network parameters: In the training stage, with the cluster centers M and assignments S fixed, all loss functions are utilized to update the network parameters. The optimization problem is as follows:

\min_{Θ} (L_{SPL} + L_{TCL} + L_{SURV}) .

(20)

(4) Update cluster assignments: With all network parameters Θ fixed, the cluster assignments S are updated by minimizing the clustering loss function. The optimization problem is as follows:

\min_{S} L_{DVC} .

(21)

(5) Update SPL weights: With network parameters Θ and cluster assignments S fixed, the SPL weights are updated by Equations (6)–(8).

The overall optimization procedure of the proposed model is shown in Algorithm 1.

Algorithm 1 Optimization procedure of DVC-Surv

Input: $D = \{(x_{i}, t_{i}, e_{i})\}_{i = 1}^{N}$: survival dataset with N patients; K: the number of potential clusters; $A = \{α_{REC}, α_{DVC}, α_{IVCG}, α_{IVIW}, α_{IVCW}, α_{NLL}\}$: the weights of the loss functions; $E_{max}$: the total number of iterations; $τ$: the temperature parameter; $α$: the degrees of freedom of the Student’s t-distribution; $σ$: the parameter of the ranking loss.
Output: $Θ = \{θ_{enc}^{(1)}, θ_{dec}^{(1)}, θ_{enc}^{(2)}, θ_{dec}^{(2)}, θ^{(surv)}\}$: network parameters.

1: // Pre-training
2: Initialize network parameters $Θ$ by Equation (19);
3: // Initializing
4: Initialize cluster centers $M$ and assignments $S$ by conducting K-means in latent space;
5: // Training
6: for E in $\{1, 2, \dots, E_{max}\}$ do
7:     Update network parameters $Θ$ by Equation (20);
8:     Update cluster assignments $S$ by Equation (21);
9:     Update SPL weights $w$ by Equation (6);
10:     Update SPL threshold $λ$ by Equation (8);
11: end for
12: return network parameters $Θ$.
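
The control flow of Algorithm 1 can be sketched as a small driver that only fixes the order of the alternating updates. The function `run_dvc_surv` and the step names in `steps` are hypothetical placeholders; in this sketch each step merely records that it was called, whereas a real implementation would perform the corresponding optimization.

```python
def run_dvc_surv(steps, n_iterations):
    """Call the (hypothetical) step functions in the order given by Algorithm 1."""
    steps["pretrain"]()                     # Equation (19): reconstruction + survival losses
    steps["init_clusters"]()                # K-means in the latent space
    for _ in range(n_iterations):
        steps["update_network"]()           # Equation (20), clusters held fixed
        steps["update_assignments"]()       # Equation (21), network held fixed
        steps["update_spl_weights"]()       # Equation (6)
        steps["update_spl_threshold"]()     # Equation (8)

# Placeholder steps that only log their own name.
log = []
names = ["pretrain", "init_clusters", "update_network",
         "update_assignments", "update_spl_weights", "update_spl_threshold"]
steps = {name: (lambda name=name: log.append(name)) for name in names}

run_dvc_surv(steps, n_iterations=2)
```

The sketch makes explicit that pre-training and cluster initialization run once, while the four update steps repeat in every iteration.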

4. Experiments and Discussion

4.1. Datasets

To evaluate the performance of the proposed model, three widely used real-world clinical datasets are employed in the experiments:

METABRIC [39]: The Molecular Taxonomy of Breast Cancer International Consortium dataset includes nine patient covariates and encompasses 1980 patients. Of these, 1143 patients are observed to have died during the study period, while the remaining 837 patients are censored.

GBSG [40,41]: GBSG is a compilation of breast cancer patient data from the Rotterdam tumor bank and the German Breast Cancer Study Group, including seven covariates. The GBSG dataset contains 2232 patients, of whom 1266 are tracked to death before the end of the study and the other 966 are censored.

SUPPORT [42]: The Study to Understand Prognoses Preferences Outcomes and Risks of Treatment is a study estimating 180-day survival for severely ill hospitalized adults. The SUPPORT dataset includes fourteen patient covariates and comprises 9105 patients, with 6201 patients observed to have died and the remaining 2904 patients censored.

4.2. Baseline Models

To validate the advantages of the proposed model, the following models are employed as baselines: the Cox Proportional Hazards (CoxPH) model, Random Survival Forests (RSF), DeepSurv, DeepHit, Deep Survival Machines (DSM), and DSACC. Detailed descriptions of these models follow.

CoxPH [7]: CoxPH is a representative traditional semi-parametric survival analysis model that is based on the proportional hazards assumption and uses a linear model fitted by partial likelihood to predict individual patient survival risks.

RSF [23]: RSF is an ensemble survival analysis model based on decision trees; it extends the random forest algorithm to survival analysis. RSF predicts patient risks by learning an ensemble of decision trees.

DeepSurv [13]: DeepSurv is an extension of the CoxPH model with deep learning. Similar to CoxPH, DeepSurv relies on the proportional hazards assumption, but it utilizes deep neural networks instead of a linear model to predict patient risks.

DeepHit [14]: DeepHit is a deep non-parametric survival model that directly predicts the survival distribution over discrete time through fully connected networks; it is entirely data-driven and makes no assumption about the form of the underlying distribution.

DSM [15]: DSM is a deep fully parametric survival model that models the survival distribution as a mixture of parametric distributions and predicts the parameters of these distributions with neural networks.

DSACC [19]: DSACC is a deep non-parametric survival model that discovers latent clusters in survival data through latent clustering and improves the model’s learning of censored patients with contrastive learning based on cluster labels.

4.3. Evaluation Metrics

Two commonly used survival analysis evaluation metrics, namely, the Concordance Index (C-index) [43,44] and the Brier score [45], are utilized in this study to assess the performance of the proposed and other baseline models.

The C-index evaluates the discriminative ability of a model by measuring the consistency between the model-predicted relative risks of sample pairs and the ground truth. The C-index can be defined as

C = \Pr \{{\hat{S}}_{i, t_{i}} < {\hat{S}}_{j, t_{i}} | e_{i} = 1, t_{i} < t_{j}\} = \frac{\sum_{i \neq j}^{N} 1 (e_{i} = 1, t_{i} < t_{j}) \cdot 1 ({\hat{S}}_{i, t_{i}} < {\hat{S}}_{j, t_{i}})}{\sum_{i \neq j}^{N} 1 (e_{i} = 1, t_{i} < t_{j})},

(22)

where the numerator counts the correctly ordered pairs and the denominator is the total number of comparable pairs, i.e., pairs in which patient i is observed to experience the event before time t_{j}. The C-index ranges from 0 to 1, and a higher value indicates better discriminative performance.
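
A direct, unoptimized way to evaluate this definition is sketched below. It conditions on patient i having an observed event (e_i = 1), matching the indicator of the ranking loss in Equation (17); the toy survival curves and the index-based time representation `t_idx` are assumptions made for the example.

```python
import numpy as np

def c_index(surv, t_idx, e):
    """Fraction of comparable pairs (i, j) with S_i(t_i) < S_j(t_i)."""
    s_at_ti = surv[:, t_idx].T                  # s_at_ti[i, j] = S_j(t_i)
    own = np.diag(s_at_ti)[:, None]             # own[i] = S_i(t_i)
    # Pair (i, j) is comparable when i has an observed event before t_j.
    comparable = (e[:, None] == 1) & (t_idx[:, None] < t_idx[None, :])
    concordant = comparable & (own < s_at_ti)
    return concordant.sum() / comparable.sum()

# Toy survival curves: three patients, four discrete time points.
surv = np.array([[0.30, 0.20, 0.10, 0.00],
                 [0.90, 0.80, 0.70, 0.00],
                 [0.95, 0.85, 0.50, 0.00]])
t_idx = np.array([0, 2, 1])                     # observed time index per patient
e = np.array([1, 0, 1])                         # 1 = event observed, 0 = censored

c = c_index(surv, t_idx, e)
```

Here the pairs (0, 1) and (0, 2) are ordered correctly, while the pair (2, 1) is not because patient 2 is assigned a higher survival estimate than patient 1 at patient 2's event time, so the sketch yields a C-index of 2/3.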

The Brier score is a widely used binary classification evaluation metric that measures calibration performance by calculating the mean squared error between probability estimates and the ground truth. Previous research [45] extended the Brier score to survival analysis and introduced the integrated Brier score (IBS), which can be written as

IBS = \frac{1}{N T_{max}} \sum_{t = 1}^{T_{max}} \sum_{i = 1}^{N} [\frac{1 (t_{i} \leq t, e_{i} \neq 0) {({\hat{S}}_{i, t})}^{2}}{\hat{G} (t_{i})} + \frac{1 (t_{i} > t) {(1 - {\hat{S}}_{i, t})}^{2}}{\hat{G} (t)}] ,

(23)

where \hat{G} (\cdot) is the Kaplan–Meier estimate of the censoring distribution, which reweights the samples that remain informative at time t. The IBS ranges from 0 to 1, and a lower IBS indicates better calibration performance.
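
The sketch below computes the integrated Brier score in its standard inverse-probability-of-censoring-weighted form, in which the term for patients still alive at time t uses the survival estimate at t and the censoring weight Ĝ(t), and the result is averaged over the time grid. The helper names, the discrete-grid Kaplan–Meier estimator, and the clipping of Ĝ away from zero are assumptions of this example rather than details given in the text.

```python
import numpy as np

def km_censoring(t_idx, e, n_times):
    """Kaplan-Meier estimate of the censoring survival function on the grid.

    Censoring (e == 0) is treated as the event of interest.
    """
    G = np.ones(n_times)
    g = 1.0
    for t in range(n_times):
        at_risk = np.sum(t_idx >= t)
        censored = np.sum((t_idx == t) & (e == 0))
        if at_risk > 0:
            g *= 1.0 - censored / at_risk
        G[t] = g
    return G

def integrated_brier_score(surv, t_idx, e, G):
    """Average the censoring-weighted squared error over patients and grid points."""
    n, n_times = surv.shape
    grid = np.arange(n_times)[None, :]
    ti = t_idx[:, None]
    died = (ti <= grid) & (e[:, None] == 1)     # event observed at or before t
    alive = ti > grid                           # still under observation at t
    G_safe = np.clip(G, 1e-8, None)             # guard against division by zero
    term_died = died * surv ** 2 / G_safe[t_idx][:, None]
    term_alive = alive * (1.0 - surv) ** 2 / G_safe[None, :]
    return (term_died + term_alive).sum() / (n * n_times)

# Toy data without censoring: two patients, two time points.
surv = np.array([[0.5, 0.0],
                 [1.0, 0.5]])
t_idx = np.array([0, 1])
e = np.array([1, 1])

G = km_censoring(t_idx, e, n_times=2)
ibs = integrated_brier_score(surv, t_idx, e, G)
```

Without censoring, Ĝ equals one everywhere and the score reduces to the mean squared error between the survival estimates and the alive/dead indicator.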

4.4. Experimental Setup

We employ five-fold cross-validation to validate the performance of the proposed model and all other baseline models. Specifically, the data is evenly divided into five parts, with each part taking turns serving as the test set while the other four serve as the training set. In this study, we report the mean and standard deviation of the C-index and IBS of the five-fold cross-validation of all models.

The proposed model is trained with Adam and backpropagation. The batch size is set to 256, and the learning rate is 0.001. All fully connected layers in the proposed model take ReLU as the activation function. We utilize grid search on a fixed validation set to find the optimal combination of all other hyperparameters on the three datasets for our proposed model and all baseline models. Specifically, for each model, 20% of the data in each dataset are randomly selected to form a validation set, and all hyperparameter tuning for that model on that dataset is conducted on this validation set. Once the optimal hyperparameter combination is obtained, the remaining data are utilized to perform five-fold cross-validation with this combination, thereby obtaining the performance metrics of the model. The specific hyperparameters, including their search ranges and the final values used in the experiments, are presented in Table 1. All experiments are conducted with Python 3.10.15, PyTorch 1.12.1, and Pycox 0.2.3 [46].
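
The data-splitting protocol described above can be sketched as follows. The function name `holdout_then_kfold`, the random seed, and the use of simple index permutation (without stratification by event status) are assumptions of this example; the text does not specify how the random selection is implemented.

```python
import numpy as np

def holdout_then_kfold(n_samples, val_frac=0.2, n_folds=5, seed=0):
    """Hold out a validation set for tuning, then build k-fold splits on the rest."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_val = int(round(val_frac * n_samples))
    val_idx, rest = perm[:n_val], perm[n_val:]
    folds = np.array_split(rest, n_folds)       # near-equal parts of the remaining data
    splits = []
    for k in range(n_folds):
        test_idx = folds[k]                     # each part serves as the test set once
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        splits.append((train_idx, test_idx))
    return val_idx, splits

# 1980 is the METABRIC sample size quoted in Section 4.1.
val_idx, splits = holdout_then_kfold(1980)
```

With 1980 patients, this yields 396 validation samples and five folds drawn from the remaining 1584, so the hyperparameter search never sees the data used for the reported cross-validation metrics.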

4.5. Results and Analysis

We compare the performance of our proposed model with other baseline models on three widely used survival datasets in terms of C-index and IBS. Table 2 and Table 3 present the C-index and IBS values for each model on the METABRIC, GBSG, and SUPPORT datasets, reflecting the discriminative and calibration performance, respectively. In each column of the tables, the best-performing model is marked in bold, and the second-best model is underlined. The experimental results indicate that the proposed model outperforms existing models in both discriminative and calibration performance.

On one hand, in terms of discriminative performance, Table 2 presents the C-index values of all models on the three datasets. According to the experimental results, the proposed DVC-Surv model achieves the best C-index on all three datasets. Specifically, on the METABRIC dataset, the DVC-Surv model achieves the highest C-index value of 0.6741, surpassing the best of the baseline models, DSACC. Similarly, on the GBSG dataset, the DVC-Surv model exceeds DSACC with a C-index of 0.6818 and ranks first. On the SUPPORT dataset, the C-index of the DVC-Surv model reaches 0.6415, surpassing all other existing models. Overall, DVC-Surv achieves the highest C-index across the three datasets, with DSACC consistently ranking second. In terms of the average C-index over the three datasets, DVC-Surv ranks first, with 0.6653. In summary, our proposed model achieves the best discriminative performance, indicating that it can effectively predict the risk differences between samples.

On the other hand, in terms of calibration performance, shown in Table 3, the proposed model achieves one best and two second-best results on the three datasets. Specifically, on the METABRIC dataset, the proposed model ranks first, with the lowest IBS value of 0.1603. On the GBSG dataset, the DVC-Surv model ranks second, with only a 0.01% difference from the first-ranked DSACC. On the SUPPORT dataset, DVC-Surv ranks second, with an IBS value of 0.1929, just behind RSF. In terms of average IBS, DVC-Surv emerges as the top model with 0.1786, indicating that it offers the best overall calibration performance and can accurately predict individual survival distributions.

Overall, the proposed model achieves the best overall discriminative and calibration performance. The performance of the DVC-Surv model significantly surpasses that of the traditional CoxPH model. Among the deep learning competitors, DeepHit and DSACC exhibit relatively superior performance. This is because DeepHit, with its fully data-driven design, avoids manual assumptions, while DSACC incorporates latent clustering and contrastive learning. Compared to these two models, especially DSACC, our proposed model additionally employs a Siamese autoencoder, extending patient representation to a dual-representation pathway. Furthermore, it introduces dual-view clustering, triple contrastive learning, and self-paced learning, enhancing the model’s representational capability from multiple perspectives on the basis of DSACC, thereby achieving the best performance.

4.6. Ablation Study

To further validate the contribution of each module, we conduct an ablation study. Specifically, a set of variant models of the DVC-Surv model are designed, from which a certain specific module or modules are removed. By comparing the performance differences between these variant models and the original DVC-Surv model, the specific contribution of each module can be verified. The specific designs of the variant models are as follows:

Model_{0}: This model is the DSACC model. Considering that the proposed model is an improvement based on DSACC, this model is utilized as the baseline for the ablation study.

Model_{1}: The Siamese autoencoder is removed from this variant model, which results in the degradation from two views to a single view. Additionally, due to the lack of dual-view representations, the IVIW and IVCW loss functions in the triple contrastive learning are also removed. Compared to DSACC, this variant model additionally employs SPL, and employs the NLL rather than the RPS loss function.

Model_{2}: The entire triple contrastive learning module, including IVCG, IVIW, and IVCW, is removed from this variant model. Other parts of this variant model remain consistent with the DVC-Surv model.

Model_{3}: The IVIW loss function is removed from this variant model. Other parts of this variant model remain consistent with the DVC-Surv model.

Model_{4}: The IVCW loss function is removed from this variant model. Other parts of this variant model remain consistent with the DVC-Surv model.

Model_{5}: The SPL module is removed from this variant model. Other parts of this variant model remain consistent with the DVC-Surv model.

Table 4 and Table 5 display the results of the ablation study, comparing the C-index and IBS performance of the DVC-Surv model with the variant models. Specifically, the results of Model_{1} confirm the contributions of SPL and the Siamese autoencoder. Its overall discriminative performance is slightly superior to that of DSACC, which serves as Model_{0}, because the additional SPL module enables Model_{1} to start learning from easier samples and avoid the interference of ambiguous samples. However, the calibration performance of Model_{1} is generally weaker than that of Model_{0}, which may be because the RPS loss used by DSACC better optimizes calibration performance, while the NLL loss optimizes both metrics more evenly. Compared to the DVC-Surv model, both the discriminative and calibration performance of Model_{1} are inferior, validating the necessity of the Siamese autoencoder.

Furthermore, the results of Model_{2}–Model_{4} validate the effectiveness of triple contrastive learning. The performance of Model_{2} is among the lowest, whether compared to DSACC or DVC-Surv, due to its lack of triple contrastive learning. Both Model_{3} and Model_{4} show improved discriminative and calibration performance compared to DSACC. Considering that DSACC also utilizes contrastive learning, i.e., the IVCG loss in this study, while Model_{3} and Model_{4} additionally employ the IVCW and IVIW losses, respectively, this validates the effectiveness of these two loss functions. Moreover, compared to the DVC-Surv model, Model_{3} and Model_{4} lack the IVIW and IVCW losses, respectively, and the performance of both models is lower than that of the DVC-Surv model, which also illustrates the contributions of IVIW and IVCW.

Lastly, the experimental results of Model_{5} further demonstrate the contributions of the SPL strategy and the triple contrastive learning module. Compared to Model_{0}–Model_{4}, Model_{5} exhibits superior discriminative and calibration performance, as it is the only variant model that possesses a complete triple contrastive learning module. However, compared to the DVC-Surv model, the performance of Model_{5} is lower, due to its lack of the SPL module. Overall, the experimental results of the ablation study validate the contributions and effectiveness of all modules proposed in this study, including the Siamese autoencoder, SPL, and triple contrastive learning.

4.7. Visualization Analysis

In this study, we integrate multi-view clustering, contrastive learning, and survival analysis to achieve precise patient survival prediction. To further demonstrate the clustering results within the proposed model, a clustering visualization analysis is conducted. Figure 3 presents the dual-view clustering results of the proposed model on the three datasets, specifically the clustering results of the two views at the end of the pre-training and training stages. In each figure, different clusters are marked with different colors, while censored and uncensored patients are denoted by ‘×’ and ‘·’, respectively. The t-SNE algorithm [47] is utilized for visualization. In practice, we select different K values for different datasets through hyperparameter tuning. The number of latent clusters is set to three in METABRIC and GBSG, and to four in SUPPORT. Firstly, the visualization results show that the two views of the model learn similar yet distinct latent space distributions, thus providing complementary information. Secondly, after pre-training, the cluster structure is preliminarily formed, while at the end of training, it is more fully exploited. Lastly, the visualization analysis demonstrates that the proposed model effectively discovers latent clusters in the survival data.

4.8. Interpretability Analysis

To better explain the basis on which the proposed model makes survival predictions, thereby helping clinicians and patients understand and trust our proposed DVC-Surv model, an interpretability analysis is designed and implemented. Specifically, we employ the SHAP algorithm [48] to analyze the importance of the features relied upon during model predictions. Figure 4 presents the feature importance ranking of the proposed model on the three datasets. In the METABRIC and GBSG breast cancer datasets, tumor grading and cellularity are considered the most important features, which aligns with existing clinical research indicating that tumor grading and cellularity reflect the severity of breast cancer [49]. Additionally, in the SUPPORT dataset, which targets severely ill hospitalized adults, WBC and PaO₂/FiO₂ are considered the most important features. Similarly, previous clinical studies confirm that in scenarios such as the ICU, WBC and PaO₂/FiO₂ reflect the levels of inflammation and oxygenation in patients, respectively, representing the patient’s condition and thus being closely related to the outcome [50,51]. In summary, the features that the proposed model relies on are consistent with clinically important factors, which supports the reliability of its survival predictions.

4.9. Computational Resource Analysis

To gain a more intuitive understanding of the model’s efficiency, specifically the computational resources during training and inference, we conducted a computational resource analysis. Specifically, we selected three deep survival analysis models, including DVC-Surv, and conducted experiments on an NVIDIA RTX 3090 GPU to compare their computational resource consumption. To obtain more accurate results, we utilized the SUPPORT dataset, which has the highest number of patients and the largest feature dimensions among the three datasets. The results of the computational resource analysis are presented in Table 6.

On one hand, in terms of GPU memory usage, DeepHit, DSACC, and DVC-Surv require 969 MB, 1199 MB, and 1227 MB, respectively. DeepHit has the simplest network structure, hence it has the lowest GPU memory usage. DSACC, which introduces an autoencoder and a reconstruction module in addition to DeepHit, shows an increase in memory consumption. DVC-Surv, which extends the number of autoencoders from one to two on the basis of DSACC, has the highest memory usage. However, considering that the memory usage of all models is acceptable, the memory consumption of DVC-Surv does not significantly impact its applicability. On the other hand, in terms of training and inference time, DeepHit has the fastest speed. DSACC and DVC-Surv, due to their more complex network structures and loss functions, show an increase in training and inference time. Considering that the training time for every ten epochs for all three models is within 10 s, and the inference time is even less than 1 s, such differences do not reduce the practicality of the proposed DVC-Surv model. Overall, the DVC-Surv model exhibits the best performance, and its increase in computational resource consumption compared to existing models is not significant; thus, it has good practical value.

5. Conclusions

In this paper, we propose the DVC-Surv model to discover latent sub-populations in survival data and utilize the inter-patient associations to guide survival prediction. Firstly, we propose a Siamese autoencoder, instead of the single-path encoder utilized in existing studies, to more comprehensively represent patients from two different views. Secondly, we extend clustering to dual views to fully explore potential sub-populations. Thirdly, based on the clustering results and dual-view representations, we design triple contrastive learning to leverage such information and enhance the model’s survival prediction capability. Lastly, we employ self-paced learning in the dual-view clustering process. Self-paced learning helps the model to progressively learn from easy to hard, prioritizing the inclusion of high-confidence samples in training, thereby reducing the impact of boundary samples. Compared to existing studies, our proposed model achieves superior performance on three widely used clinical datasets, namely, METABRIC, GBSG, and SUPPORT. Specifically, on the three datasets, DVC-Surv achieves C-indexes of 0.6736, 0.6818, and 0.6415, and IBS values of 0.1603, 0.1827, and 0.1929, respectively, ranking first in C-index on all three datasets and first in average IBS. Moreover, an ablation study further validates the effectiveness of the proposed modules. Additionally, we employ t-SNE to visually display the results of the dual-view clustering. In the interpretability analysis, we utilize the SHAP algorithm to present feature importance, showing that the decision making of our proposed model is consistent with clinical knowledge. Future work will include extending the proposed model to multimodal survival analysis.

Author Contributions

Conceptualization, C.C. and Y.T.; methodology, C.C. and Y.T.; software, C.C.; validation, C.C.; formal analysis, C.C.; investigation, C.C.; resources, C.C.; data curation, C.C.; writing—original draft preparation, C.C.; writing—review and editing, Y.T.; visualization, C.C.; supervision, W.Z.; project administration, W.Z.; funding acquisition, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Science and Technology of China under Grant 2022YFB2703300 and the National Natural Science Foundation of China (62106266, U22B2048).

Data Availability Statement

The data presented in this study are openly available in: METABRIC at https://doi.org/10.1038/nature10983, reference [39]; GBSG at https://aacrjournals.org/cancerres/article/60/3/636/507065/The-Urokinase-System-of-Plasminogen-Activation-and (accessed on 8 December 2024), https://doi.org/10.1200/JCO.1994.12.10.2086, references [40,41]; SUPPORT at https://doi.org/10.7326/0003-4819-122-3-199502010-00007, reference [42].

Conflicts of Interest

The authors declare no conflicts of interest.

References

Wiegrebe, S.; Kopper, P.; Sonabend, R.; Bischl, B.; Bender, A. Deep learning for survival analysis: A review. Artif. Intell. Rev. 2024, 57, 65.
Arroyo, A.; Cartea, A.; Moreno-Pino, F.; Zohren, S. Deep attentive survival analysis in limit order books: Estimating fill probabilities with convolutional-transformers. Quant. Financ. 2024, 24, 35–57.
Ahmed, J.; Green, R.C., II. Leveraging survival analysis in cost-aware deepnet for efficient hard drive failure prediction. Neural Comput. Appl. 2024, 1–16.
Li, S.; Yan, Y.; Ji, Y.; Peng, W.; Wan, L.; Zhang, P. Survivability Mapping Strategy for Virtual Wireless Sensor Networks for Link Failures in the Internet of Things. Electronics 2023, 12, 2498.
Wang, S.; Dong, D.; Li, L.; Li, H.; Bai, Y.; Hu, Y.; Huang, Y.; Yu, X.; Liu, S.; Qiu, X.; et al. A deep learning radiomics model to identify poor outcome in COVID-19 patients with underlying health conditions: A multicenter study. IEEE J. Biomed. Health Inform. 2021, 25, 2353–2362.
Ai, D.; Cui, C.; Tang, Y.; Wang, Y.; Zhang, N.; Zhang, C.; Zhen, Y.; Li, G.; Huang, K.; Liu, G.; et al. Machine learning model for predicting physical activity related bleeding risk in Chinese boys with haemophilia A. Thromb. Res. 2023, 232, 43–53.
Cox, D.R. Regression models and life-tables. J. R. Stat. Soc. Ser. B (Methodol.) 1972, 34, 187–202.
Germer, S.; Rudolph, C.; Labohm, L.; Katalinic, A.; Rath, N.; Rausch, K.; Holleczek, B.; Handels, H.; AI-CARE Working Group. Survival analysis for lung cancer patients: A comparison of Cox regression and machine learning models. Int. J. Med. Inform. 2024, 191, 105607.
Shintani, A.K.; Girard, T.D.; Eden, S.K.; Arbogast, P.G.; Moons, K.G.; Ely, E.W. Immortal time bias in critical care research: Application of time-varying Cox regression for observational cohort studies. Crit. Care Med. 2009, 37, 2939–2945.
Alzubaidi, L.; Fadhel, M.A.; Al-Shamma, O.; Zhang, J.; Duan, Y. Deep learning models for classification of red blood cells in microscopy images to aid in sickle cell anemia diagnosis. Electronics 2020, 9, 427.
Paul, A.; Tajin, M.A.S.; Das, A.; Mongan, W.M.; Dandekar, K.R. Energy-efficient respiratory anomaly detection in premature newborn infants. Electronics 2022, 11, 682.
Acharya, S.; Mongan, W.M.; Rasheed, I.; Liu, Y.; Anday, E.; Dion, G.; Fontecchio, A.; Kurzweg, T.; Dandekar, K.R. Ensemble learning approach via kalman filtering for a passive wearable respiratory monitor. IEEE J. Biomed. Health Inform. 2018, 23, 1022–1031.
Katzman, J.L.; Shaham, U.; Cloninger, A.; Bates, J.; Jiang, T.; Kluger, Y. DeepSurv: Personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med. Res. Methodol. 2018, 18, 24.
Lee, C.; Zame, W.; Yoon, J.; Van Der Schaar, M. Deephit: A deep learning approach to survival analysis with competing risks. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
Nagpal, C.; Li, X.; Dubrawski, A. Deep survival machines: Fully parametric survival regression and representation learning for censored data with competing risks. IEEE J. Biomed. Health Inform. 2021, 25, 3163–3175.
Chapfuwa, P.; Li, C.; Mehta, N.; Carin, L.; Henao, R. Survival cluster analysis. In Proceedings of the ACM Conference on Health, Inference, and Learning, Toronto, ON, Canada, 2–4 April 2020; pp. 60–68.
Nagpal, C.; Yadlowsky, S.; Rostamzadeh, N.; Heller, K. Deep Cox mixtures for survival regression. In Proceedings of the Machine Learning for Healthcare Conference, Virtual, 6–7 August 2021; pp. 674–708.
Manduchi, L.; Marcinkevičs, R.; Massi, M.C.; Weikert, T.; Sauter, A.; Gotta, V.; Müller, T.; Vasella, F.; Neidert, M.C.; Pfister, M.; et al. A Deep Variational Approach to Clustering Survival Data. arXiv 2021, arXiv:2106.05763.
Cui, C.; Tang, Y.; Zhang, W. Deep Survival Analysis with Latent Clustering and Contrastive Learning. IEEE J. Biomed. Health Inform. 2024, 28, 3090–3101.
Hong, C.; Yi, F.; Huang, Z. Deep-CSA: Deep Contrastive Learning for Dynamic Survival Analysis with Competing Risks. IEEE J. Biomed. Health Inform. 2022, 26, 4248–4257.
Alom, M.Z.; Taha, T.M.; Yakopcic, C.; Westberg, S.; Sidike, P.; Nasrin, M.S.; Hasan, M.; Van Essen, B.C.; Awwal, A.A.; Asari, V.K. A state-of-the-art survey on deep learning theory and architectures. Electronics 2019, 8, 292.
Abdolrasol, M.G.; Hussain, S.S.; Ustun, T.S.; Sarker, M.R.; Hannan, M.A.; Mohamed, R.; Ali, J.A.; Mekhilef, S.; Milad, A. Artificial neural networks based optimization techniques: A review. Electronics 2021, 10, 2689.
Ishwaran, H.; Kogalur, U.B.; Blackstone, E.H.; Lauer, M.S. Random survival forests. Ann. Appl. Stat. 2008, 2, 841–860.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Van Belle, V.; Pelckmans, K.; Van Huffel, S.; Suykens, J.A. Support vector methods for survival analysis: A comparison between ranking and regression approaches. Artif. Intell. Med. 2011, 53, 107–118.
Chen, Y.; Jia, Z.; Mercola, D.; Xie, X. A gradient boosting algorithm for survival analysis via direct optimization of concordance index. Comput. Math. Methods Med. 2013, 2013, 873595.
Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. Their Appl. 1998, 13, 18–28.
Ren, K.; Qin, J.; Zheng, L.; Yang, Z.; Zhang, W.; Qiu, L.; Yu, Y. Deep recurrent survival analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 4798–4805.
Wang, Z.; Sun, J. Survtrace: Transformers for survival analysis with competing events. In Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, Chicago, IL, USA, 7–10 August 2022; pp. 1–9.
Andrew, G.; Arora, R.; Bilmes, J.; Livescu, K. Deep canonical correlation analysis. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1247–1255.
Xie, Y.; Lin, B.; Qu, Y.; Li, C.; Zhang, W.; Ma, L.; Wen, Y.; Tao, D. Joint deep multi-view learning for image clustering. IEEE Trans. Knowl. Data Eng. 2020, 33, 3594–3606.
Chen, R.; Tang, Y.; Zhang, W.; Feng, W. Adaptive-weighted deep multi-view clustering with uniform scale representation. Neural Netw. 2024, 171, 114–126.
Yang, Z.; Zhu, C.; Li, Z. Deep contrastive multi-view clustering with doubly enhanced commonality. Multimed. Syst. 2024, 30, 196.
Yu, X.; Jiang, Y.; Chao, G.; Chu, D. Deep Contrastive Multi-View Subspace Clustering With Representation and Cluster Interactive Learning. IEEE Trans. Knowl. Data Eng. 2024, 1–12.
Yang, X.; Liu, Y.; Zhou, S.; Wang, S.; Tu, W.; Zheng, Q.; Liu, X.; Fang, L.; Zhu, E. Cluster-guided contrastive graph clustering network. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 10834–10842.
MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Berkeley, CA, USA, 1967; Volume 1, pp. 281–297.
Kumar, M.; Packer, B.; Koller, D. Self-paced learning for latent variable models. In Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, Vancouver, BC, Canada, 6–9 December 2010; Volume 23.
Guo, X.; Liu, X.; Zhu, E.; Zhu, X.; Li, M.; Xu, X.; Yin, J. Adaptive self-paced deep clustering with data augmentation. IEEE Trans. Knowl. Data Eng. 2019, 32, 1680–1693.
Curtis, C.; Shah, S.P.; Chin, S.F.; Turashvili, G.; Rueda, O.M.; Dunning, M.J.; Speed, D.; Lynch, A.G.; Samarajiwa, S.; Yuan, Y.; et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 2012, 486, 346–352.
Foekens, J.A.; Peters, H.A.; Look, M.P.; Portengen, H.; Schmitt, M.; Kramer, M.D.; Brünner, N.; Jänicke, F.; Gelder, M.E.M.v.; Henzen-Logmans, S.C.; et al. The urokinase system of plasminogen activation and prognosis in 2780 breast cancer patients. Cancer Res. 2000, 60, 636–643.
Schumacher, M.; Bastert, G.; Bojar, H.; Hübner, K.; Olschewski, M.; Sauerbrei, W.; Schmoor, C.; Beyerle, C.; Neumann, R.; Rauschecker, H. Randomized 2 × 2 trial evaluating hormonal treatment and the duration of chemotherapy in node-positive breast cancer patients. German Breast Cancer Study Group. J. Clin. Oncol. 1994, 12, 2086–2093.
Knaus, W.A.; Harrell, F.E.; Lynn, J.; Goldman, L.; Phillips, R.S.; Connors, A.F.; Dawson, N.V.; Fulkerson, W.J.; Califf, R.M.; Desbiens, N.; et al. The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. Ann. Intern. Med. 1995, 122, 191–203.
Harrell, F.E.; Califf, R.M.; Pryor, D.B.; Lee, K.L.; Rosati, R.A. Evaluating the yield of medical tests. JAMA 1982, 247, 2543–2546.
Antolini, L.; Boracchi, P.; Biganzoli, E. A time-dependent discrimination index for survival data. Stat. Med. 2005, 24, 3927–3944.
Graf, E.; Schmoor, C.; Sauerbrei, W.; Schumacher, M. Assessment and comparison of prognostic classification schemes for survival data. Stat. Med. 1999, 18, 2529–2545. [Google Scholar] [CrossRef]
Kvamme, H.; Borgan, Ø.; Scheel, I. Time-to-event prediction with neural networks and Cox regression. J. Mach. Learn. Res. 2019, 20, 1–30. [Google Scholar]
Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
Lundberg, S. A unified approach to interpreting model predictions. arXiv 2017, arXiv:1705.07874. [Google Scholar]
Rajan, R.; Poniecka, A.; Smith, T.L.; Yang, Y.; Frye, D.; Pusztai, L.; Fiterman, D.J.; Gal-Gombos, E.; Whitman, G.; Rouzier, R.; et al. Change in tumor cellularity of breast carcinoma after neoadjuvant chemotherapy as a variable in the pathologic assessment of response. Cancer Interdiscip. Int. J. Am. Cancer Soc. 2004, 100, 1365–1373. [Google Scholar] [CrossRef]
Castelli, G.; Pognani, C.; Cita, M.; Stuani, A.; Sgarbi, L.; Paladini, R. Procalcitonin, C-reactive protein, white blood cells and SOFA score in ICU: Diagnosis and monitoring of sepsis. Minerva Anestesiol. 2006, 72, 69. [Google Scholar] [PubMed]
Brown, S.M.; Duggal, A.; Hou, P.C.; Tidswell, M.; Khan, A.; Exline, M.; Park, P.K.; Schoenfeld, D.A.; Liu, M.; Grissom, C.K.; et al. Nonlinear imputation of PaO₂/FIO₂ from SpO₂/FIO₂ among mechanically ventilated patients in the ICU: A prospective, observational study. Crit. Care Med. 2017, 45, 1317–1324. [Google Scholar] [CrossRef] [PubMed]

Figure 1. The overall architecture of the DVC-Surv model. The Siamese autoencoder consists of two autoencoders without parameter sharing, mapping patient covariates into latent spaces of two views. Subsequently, the dual-view clustering module integrates the representations from dual views to cluster the samples. Lastly, the fused representation of the two views and covariates is fed into the survival backbone to obtain an estimation of the survival distribution.

Figure 2. Schematic diagram of triple contrastive learning, including (a) inter-view cluster-guided contrastive learning, (b) intra-view instance-wise contrastive learning, and (c) intra-view cluster-wise contrastive learning.

Figure 3. Visualization of dual-view clustering with t-SNE. The t-SNE algorithm maps high-dimensional data to a low-dimensional space (for example, a two-dimensional space) while preserving the similarity between data points, which makes the distribution of high-dimensional data visible. The clustering results in the two views are shown at the end of pre-training and at the end of training. In each panel, different clusters are represented by different colors, and censored and uncensored samples are indicated by ‘×’ and ‘·’, respectively.

Figure 4. Feature importance of the model, determined using the SHAP algorithm. SHAP evaluates the contribution of each feature to the model’s predictions by calculating the marginal effect of that feature on each sample’s prediction. A SHAP value of larger magnitude indicates that the feature plays a more significant role in the model’s prediction. The average ranking of the features’ SHAP values across all samples gives the importance ranking of the features, which identifies the features the model relies on when making predictions and helps explain its decision-making process.
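
The ranking procedure described in the Figure 4 caption can be illustrated with a minimal sketch. It computes exact Shapley values by enumerating feature coalitions, then ranks features by their mean absolute value across samples. The toy risk model, the feature names, the all-zero baseline, and the use of the mean absolute value as the aggregation rule are assumptions made for illustration; they are not the authors' model or the SHAP library's implementation.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal effect of adding feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def rank_features(predict, samples, baseline, names):
    """Rank features by mean absolute Shapley value across all samples."""
    totals = [0.0] * len(names)
    for x in samples:
        for i, value in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(value)
    means = [total / len(samples) for total in totals]
    return sorted(zip(names, means), key=lambda pair: pair[1], reverse=True)


def toy_risk(features):
    """Hypothetical linear risk score, used only to exercise the functions."""
    age, tumor_size, marker = features
    return 0.5 * age + 2.0 * tumor_size + 0.1 * marker


samples = [[1.0, 0.5, 2.0], [2.0, 1.5, 0.0], [0.0, 1.0, 1.0]]
baseline = [0.0, 0.0, 0.0]
ranking = rank_features(toy_risk, samples, baseline, ["age", "tumor_size", "marker"])
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Exact enumeration scales exponentially with the number of features, so it is only practical for small inputs; libraries such as SHAP rely on approximations for larger models.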

Table 1. Search ranges and final values of hyperparameters.

Parameter	Search Range	METABRIC	GBSG	SUPPORT
K	$\{1, 2, 3, 4, 5\}$	3	3	4
$\alpha_{NLL}$	$[1 \times 10^{-5}, 1]$	$1 \times 10^{-1}$	$1 \times 10^{-1}$	$1 \times 10^{-1}$
$\alpha_{REC}$	$[1 \times 10^{-5}, 1]$	$1 \times 10^{-2}$	$1 \times 10^{-2}$	$5 \times 10^{-2}$
$\alpha_{DVC}$	$[1 \times 10^{-5}, 1]$	$5 \times 10^{-1}$	$1 \times 10^{-2}$	$1 \times 10^{-1}$
$\alpha_{IVCG}$	$[1 \times 10^{-5}, 1]$	$1 \times 10^{-2}$	$1 \times 10^{-2}$	$5 \times 10^{-2}$
$\alpha_{IVIW}$	$[1 \times 10^{-5}, 1]$	$1 \times 10^{-2}$	$5 \times 10^{-2}$	$1 \times 10^{-3}$
$\alpha_{IVCW}$	$[1 \times 10^{-5}, 1]$	$1 \times 10^{-2}$	$1 \times 10^{-2}$	$1 \times 10^{-2}$
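
As a rough illustration of how the coefficients in Table 1 might be used, the sketch below combines six loss terms into one objective as a weighted sum, using the METABRIC column. The weighted-sum form, the dictionary keys, the function name `total_loss`, and the per-term loss values are assumptions for illustration; only the weights come from Table 1.

```python
# Loss weights taken from the METABRIC column of Table 1.
METABRIC_WEIGHTS = {
    "NLL": 1e-1,
    "REC": 1e-2,
    "DVC": 5e-1,
    "IVCG": 1e-2,
    "IVIW": 1e-2,
    "IVCW": 1e-2,
}


def total_loss(losses, weights):
    """Weighted sum of loss terms (assumed form of the overall objective)."""
    missing = set(weights) - set(losses)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(weights[name] * losses[name] for name in weights)


# Hypothetical per-term loss values for a single training step.
example_losses = {
    "NLL": 2.0,
    "REC": 1.0,
    "DVC": 0.4,
    "IVCG": 3.0,
    "IVIW": 3.0,
    "IVCW": 3.0,
}
objective = total_loss(example_losses, METABRIC_WEIGHTS)
print(f"weighted objective: {objective:.4f}")
```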

Table 2. C-Index performance comparison between our model and other rivals (mean ± std).

Model	METABRIC	GBSG	SUPPORT	Average
CoxPH [7]	$0.6349 \pm 0.0112$	$0.6620 \pm 0.0200$	$0.5689 \pm 0.0083$	$0.6219$
RSF [23]	$0.6490 \pm 0.0069$	$0.6575 \pm 0.0170$	$0.6271 \pm 0.0119$	$0.6445$
DeepSurv [13]	$0.6410 \pm 0.0112$	$0.6558 \pm 0.0208$	$0.5729 \pm 0.0188$	$0.6232$
DSM [15]	$0.6658 \pm 0.0131$	$0.6736 \pm 0.0166$	$0.6058 \pm 0.0062$	$0.6484$
DeepHit [14]	$0.6716 \pm 0.0084$	$0.6712 \pm 0.0196$	$0.6313 \pm 0.0072$	$0.6580$
DSACC [19]	$0.6722 \pm 0.0161$	$0.6793 \pm 0.0152$	$0.6350 \pm 0.0074$	$0.6621$
DVC-Surv	$0.6736 \pm 0.0127$	$0.6818 \pm 0.0109$	$0.6415 \pm 0.0053$	$0.6653$
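
For readers unfamiliar with the metric in Table 2, the following minimal sketch computes Harrell's concordance index on a toy sample. It is not necessarily the exact variant used in the experiments (the reference list also includes a time-dependent index), and the data, the function name, and the handling of tied event times (such pairs are skipped) are assumptions for illustration.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when subject i has an observed event and
    times[i] < times[j]. It is concordant when the earlier-failing subject
    has the higher predicted risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the second subject is censored (event indicator 0).
times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
risks = [0.9, 0.3, 0.2, 0.5]
c_index = concordance_index(times, events, risks)
print(f"C-index: {c_index:.2f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why the differences between models in Table 2 appear in the second and third decimal places.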

Table 3. IBS performance comparison between our model and other rivals (mean ± std).

Model	METABRIC	GBSG	SUPPORT	Average
CoxPH [7]	$0.1627 \pm 0.0045$	$0.1829 \pm 0.0055$	$0.2054 \pm 0.0026$	$0.1837$
RSF [23]	$0.1712 \pm 0.0051$	$0.1838 \pm 0.0047$	$0.1896 \pm 0.0038$	$0.1815$
DeepSurv [13]	$0.1628 \pm 0.0065$	$0.1842 \pm 0.0058$	$0.2081 \pm 0.0064$	$0.1850$
DSM [15]	$0.1669 \pm 0.0046$	$0.1853 \pm 0.0050$	$0.2028 \pm 0.0034$	$0.1850$
DeepHit [14]	$0.1698 \pm 0.0067$	$0.1958 \pm 0.0035$	$0.2036 \pm 0.0039$	$0.1897$
DSACC [19]	$0.1616 \pm 0.0063$	$0.1826 \pm 0.0045$	$0.2028 \pm 0.0052$	$0.1823$
DVC-Surv	$0.1603 \pm 0.0090$	$0.1827 \pm 0.0013$	$0.1929 \pm 0.0027$	$0.1786$

Table 4. C-Index performance comparison between our model and other variant models (mean ± std).

Model	METABRIC	GBSG	SUPPORT	Average
${Model}_{0}$	$0.6722 \pm 0.0161$	$0.6793 \pm 0.0152$	$0.6350 \pm 0.0074$	$0.6621$
${Model}_{1}$	$0.6728 \pm 0.0152$	$0.6799 \pm 0.0056$	$0.6359 \pm 0.0052$	$0.6629$
${Model}_{2}$	$0.6699 \pm 0.0220$	$0.6781 \pm 0.0048$	$0.6328 \pm 0.0065$	$0.6603$
${Model}_{3}$	$0.6727 \pm 0.0231$	$0.6803 \pm 0.0037$	$0.6386 \pm 0.0056$	$0.6639$
${Model}_{4}$	$0.6721 \pm 0.0189$	$0.6816 \pm 0.0049$	$0.6389 \pm 0.0065$	$0.6642$
${Model}_{5}$	$0.6730 \pm 0.0154$	$0.6816 \pm 0.0032$	$0.6402 \pm 0.0047$	$0.6649$
DVC-Surv	$0.6736 \pm 0.0127$	$0.6818 \pm 0.0109$	$0.6415 \pm 0.0053$	$0.6653$

Table 5. IBS performance comparison between our model and other variant models (mean ± std).

Model	METABRIC	GBSG	SUPPORT	Average
${Model}_{0}$	$0.1616 \pm 0.0063$	$0.1826 \pm 0.0045$	$0.2028 \pm 0.0052$	$0.1823$
${Model}_{1}$	$0.1629 \pm 0.0094$	$0.1837 \pm 0.0037$	$0.1954 \pm 0.0039$	$0.1807$
${Model}_{2}$	$0.1636 \pm 0.0095$	$0.1843 \pm 0.0021$	$0.1972 \pm 0.0019$	$0.1817$
${Model}_{3}$	$0.1627 \pm 0.0098$	$0.1840 \pm 0.0019$	$0.1945 \pm 0.0025$	$0.1804$
${Model}_{4}$	$0.1639 \pm 0.0087$	$0.1833 \pm 0.0040$	$0.1947 \pm 0.0011$	$0.1806$
${Model}_{5}$	$0.1609 \pm 0.0058$	$0.1836 \pm 0.0015$	$0.1944 \pm 0.0027$	$0.1796$
DVC-Surv	$0.1603 \pm 0.0090$	$0.1827 \pm 0.0013$	$0.1929 \pm 0.0027$	$0.1786$

Table 6. Computational resource analysis on the METABRIC dataset.

Model	DeepHit	DSACC	DVC-Surv
C-index ↑	$0.6716$	$0.6722$	$0.6736$
IBS ↓	$0.1698$	$0.1616$	$0.1603$
GPU memory (MB) ↓	969	1199	1227
Train time (s per 10 epochs) ↓	$6.7659$	$8.0378$	$8.3628$
Test time (s per 10 epochs) ↓	$0.3579$	$0.4360$	$0.4520$
Epochs	500	500	500

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
