. 2021 Aug 26;122:108274. doi: 10.1016/j.patcog.2021.108274

GraphXCOVID: Explainable deep graph diffusion pseudo-Labelling for identifying COVID-19 on chest X-rays

Angelica I Aviles-Rivero ^a,^⁎, Philip Sellars ^b, Carola-Bibiane Schönlieb ^b, Nicolas Papadakis ^c

PMCID: PMC8387569 PMID: 34462610

Abstract

Can one learn to diagnose COVID-19 under extreme minimal supervision? Since the outbreak of the novel COVID-19 there has been a rush for developing automatic techniques for expert-level disease identification on Chest X-ray data. In particular, the use of deep supervised learning has become the go-to paradigm. However, the performance of such models is heavily dependent on the availability of a large and representative labelled dataset. The creation of which is a heavily expensive and time consuming task, and especially imposes a great challenge for a novel disease. Semi-supervised learning has shown the ability to match the incredible performance of supervised models whilst requiring a small fraction of the labelled examples. This makes the semi supervised paradigm an attractive option for identifying COVID-19. In this work, we introduce a graph based deep semi-supervised framework for classifying COVID-19 from chest X-rays. Our framework introduces an optimisation model for graph diffusion that reinforces the natural relation among the tiny labelled set and the vast unlabelled data. We then connect the diffusion prediction output as pseudo-labels that are used in an iterative scheme in a deep net. We demonstrate, through our experiments, that our model is able to outperform the current leading supervised model with a tiny fraction of the labelled examples. Finally, we provide attention maps to accommodate the radiologist’s mental model, better fitting their perceptual and cognitive abilities. These visualisation aims to assist the radiologist in judging whether the diagnostic is correct or not, and in consequence to accelerate the decision.

Keywords: COVID-19, Chest X-ray, Semi-Supervised learning, Deep learning, Explainability

1. Introduction

Since the outbreak of the novel coranavirus disease 2019 (COVID-19), which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), there have been more than 30 million confirmed infected cases and more than 1 million deaths has been reported worldwide (as on September 1st). This threat has encouraged joint efforts to obtain accurate early detection of COVID-19 to try and limit the spread of the pandemic.

Whilst real-time reverse transcription polymerase chain reaction (RT-PCR) COVID-19 test is the current gold standard for diagnosis, this type of test has demonstrated several limits and burdens. Firstly, it is prone to false negatives that heavily rely on the sample acquisition characteristics including insufficient quantities and location (nasal, throat or sputum) [1], [2]. Secondly, the limitation of several world regions in obtaining fast accessibility to the test. The use of imaging techniques, including computerised tomography (CT) and chest x-rays (CXRs), has been suggested as a parallel option to RT-PCR test. Clinical manifestation of COVID-19 is that of a respiratory infection, which is associated with viral pneumonia. Distinguishing viral pneumonia from bacterial pneumonia is a challenging task. This task becomes even harder when a large number of suspected patients need to be screened. This time consuming task strains already limited medical resources, thus reducing the efficiency of the diagnosis.

Computerised tomography (CT) has been a focus of attention in the literature for COVID-19 e.g. [3], [4]. However, the burden imposed in terms of infection control using CT scan suites and the inefficiencies relating to room decontamination and access restriction from several world regions make CT challenging to be used as a routinely basis – despite its high sensitivity [5]. Due to its wide availability and inexpensive screening, a great focus has been placed on Chest X-rays (CXRs) for both clinical and AI areas e.g. [6], [7], [8], [9]. These advantages make CXR a perfect alternative and complement to the RT-PCR. Despite the CXRs advantages, accurate interpretation still remains a challenge [10]. This is because the accuracy of the interpretation relies on the radiologist’s expertise level and there is still a substantial clinical error on the outcome [11]. Therefore, there is an urge for fast automated evaluations of CXRs, to quickly explore the vast amount of data which will save time in evaluating the diseases.

In this work and motivated by the aforementioned advantages of CXRs, we are interested in using CXRs for diagnosis, in which we consider to distinguish three major classes: healthy, COVID-19 and Pneumonia. As pointed out by the WHO chest imaging guidelines [12], imaging based diagnosis using CXRs plays an important role for improving decision making in several cases [12]. Firstly, when RT-PCR testing or results are not immediately available. Secondly, when initial RT-PCR testing is negative, but with high clinical suspicion of COVID-19. Chest imaging is also suggested for patients with suspected or confirmed COVID-19, not currently hospitalised and with mild symptoms, in addition to clinical and laboratory assessment to decide on hospital admission versus home discharge.

For the task of classifying COVID-19 using CXRs data, there has been a fast development of deep learning techniques e.g. [13], [14], [15], [16], [17], [18], [19], in which supervised learning is the go-to paradigm. However, the performance of these techniques strongly rely on a large and representative corpus of labelled data. In the medical domain and in particular with a new disease, this might be a strong assumption in the design of a solution, as it involves scarce annotations that might contain strong human bias. The current leading supervised model for COVID, COVID-Net [9], has reported promising results with a sensitivity of 91% for COVID-19. Hence there is still plenty of room for improvements, namely on how to use the vast amount of available unlabelled data to prevent labelling errors and uncertainties from affecting the classification output.

Motivated by the above limitations in current techniques, we address the following problem – Can one get a robust classifier with performance higher or comparable to the current leading supervised technique for CXR-based diagnosis of COVID-19 using far less labels? To answer to this question we propose a deep semi-supervised framework to go beyond human bias and the limited amount of labelled data. We remark that many deep SSL techniques e.g.[20], [21], [22], [23], have only been considered for natural images. No work has evaluated the performance outside this domain, by considering the fundamental differences between natural and medical images [24]. This paper extends our work in [25] with noticeable differences. Firstly, we construct the graph based on the initial embeddings coming from a deep net, making for an accurate first construction. Secondly, we use our optimisation diffusion model as means for generating pseudo-labels that can be updated iteratively in a trained deep net. Moreover, unlike the pseudo-label perspective of [26], our technique is a graph based approach and the pseudo-labels are computed by our diffusion model rather than the network. Our contributions are as follows:

•
We propose a deep semi-supervised framework, in which we highlight:
- •
  An optimisation model with strong class priors for multi-class graph diffusion, which is based on normalised and non-smooth $p = 1$ Dirichlet energy. Our method offers theoretical guarantees and efficient solving.
- •
  The connection of our diffusion model to the generation of meaningful pseudo-labels, which avoids the current SSL trend on the use of consistency regularisation.
We show that our framework reinforces the natural relation among the tiny labelled set and the vast unlabelled data.
•
We evaluate our technique with several numerical, statistical and visual results using an unified dataset that contains highly diverse samples from different sources. To the best of our knowledge, this is the first graph based deep semi-supervised technique proposed for identifying COVID-19. Moreover, we also report explainable results from our prediction scores to assist and accelerate the radiologist diagnosis.
•
We demonstrate that our technique reports higher sensitivity in COVID-19 and presents better global performance than the current leading deep supervised technique for such application whilst requiring far less labelled data.

2. Related work

The recent problem of classifying CXRs for COVID-19 has observed a fast growing in the literature. Existing techniques are reviewed in this section.

2.1. Chest X-ray classification for COVID-19

The task of classifying CXRs has been widely investigated in the community. The go-to paradigm to address this problem is deep learning e.g. [27], [28], [29], [30]. The fast development of these techniques has been motivated by the release of several benchmarking datasets including ChestXray14 [27] and CheXpert [29], in which a large number of annotated samples from different pathologies (classes) are contained in each dataset. With new diseases such as COVID-19, where the annotated data is limited, one needs to rethink the whole design of the techniques. In this work, we motivate the power of semi-supervised learning for COVID-19. In what follows, we review related COVID-19 and SSL techniques.

The bulk of literature addressing the task of classifying CXRs for COVID-19 is largely based on deep supervised learning e.g.[9], [13], [14], [15], [16], [18], [19], [31], [32], [33], and several techniques arise every single day. Most of approaches apply pre-trained off-the-shelf networks; in which diverse generic architectures, including ResNet [34], DenseNet [35] and VGG [36]. Existing works thus leverage on fine tuned networks and networks trained from scratch on CXRs data. However, there are fundamental differences between natural and medical image classification including features and data size. Hence, as shown in [24], transfer learning might offer little benefit to performance due to the over-parameterisation of standard models. Moreover, the samples available for COVID-19 are scarse in comparison to other type of pneumonia, and one needs to deal with highly imbalanced datasets.

The key assumption of supervised techniques is a well-representative labelled dataset, and in new diseases, such as COVID-19, this core assumption is a strong one. Moreover, the available annotations might be far from being a definite expression of ground truth [37]. Therefore, by using deep supervised learning techniques are prone to labelling error and uncertainty that adversely affect the classification output. Although transfer learning [38] and Generative Adversarial Networks [39] mitigate, at some level, the lack of a large and representative dataset, they weakly account for the mismatch between expert annotation and ground truth annotation, which is generated by human bias and uncertainty, and the performance can be limited due to the differences among datasets as demonstrated in [24].

Motivated by the above-mentioned drawbacks in deep supervised learning and with the goal of generating a high sensitivity technique that largely decreases the need for large annotated set, we introduce a deep semi-supervised technique for the application to COVID-19 identification, which to our knowledge it is the first in its type. In the next subsection we discuss recent techniques in deep SSL for other domains and how they differ to ours.

2.2. Semi-Supervised classification for medical images

Semi-supervised learning has been applied in the medical domain since its early developments, in which model based techniques have been the main focus of attention e.g. [40], [41], [42], [43], [44]. These approaches have demonstrated the potential of SSL- but the generalisation of the feature space along with the computational requirements have raised limitations. Recently, with the advent of deep learning, deep semi-supervised learning has been investigated.

In the past few years, there has been a silent revolution in deep SSL techniques, e.g. [21], [22], [23], [45], that have sought to combine the theoretical underpinning of SSL [46] with the generalisation and feature extraction of deep neural networks.

The largest trend for new deep SSL methods involves consistency regularisation [20], [21], [22], [23]. The core idea of this perspective is to, for both labelled and unlabelled examples, induce perturbations $δ$ and then add a regularisation term to the loss function such that the prediction of the model is invariant to the perturbation, i.e. $f (x + δ) = f (x)$ . The main variety between approaches stem from how to generate perturbations $δ$ . The definition of perturbations is indeed a complex problem, and no work has ever evaluated the performance of consistency based approaches outside the natural image domain.

To the best of our knowledge, there has been no deep SSL approach proposed for COVID-19 analysis. The potential performance of SSL has nevertheless been shown in our prior work [25] on CXRs analysis. We readily competed with SOTA supervised techniques for identification of several pathologies in CXRs, using a small fraction of the available labels. In this work, we extend this framework to build upon the concept of pseudo-labels, first introduced in [26]. However, compared to that initial work, there are now several key differences since a graph based approach is considered and the pseudo-labels are predicted from our diffusion model rather than the deep net.

3. GraphX-COVID framework

This section presents the three parts of our proposed technique: i) data representation and robust graph construction, ii) optimisation model for graph diffusion and iii) driving optimisation that connects our diffusion model with deep nets. The overview of our GraphXCOVID is illustrated in Fig. 1 .

Fig. 1 — Workflow of our proposed technique. We use a tiny labelled set and large unlabelled set. First our technique optimise over the labelled set using (2), this seek to extract meaningful features to construct a strong graph. We then use our proposed diffusion model (5), which results in generated pseudo-labels that are iteratively optimised using (7). The output is the confidence score per class and the attention map.

Problem Definition. Given a small amount of labelled data $D_{L} = {(x_{h}, y_{h})}_{h = 1}^{l}$ with provided labels $L = {1, ., L}$ and $y_{h} \in L$ , and a large amount of unlabelled data $X_{u} = {x_{k}}_{m = l + 1}^{n}$ . The whole set of data is thus $X = X_{L} \cup X_{U}$ , where $X_{L} = {x_{1}, \dots, x_{l}}$ . We seek to infer a function $f : X^{n} \mapsto Y^{n}$ such that $f$ gets a good estimate for ${x_{k}}_{m = l + 1}^{n}$ with minimum generalisation error.

In particular, in a deep semi-supervised setting, one seeks to minimise a functional of the form:

\min_{θ} \overset{labelled set}{\overset{︷}{\sum_{(x, y) \in D_{L}} L_{S} (x, y; θ)}} + γ \overset{unlabelled set}{\overset{︷}{\sum_{x \in X_{u}} L_{U} (x; θ)}},

(1)

where $L_{S}$ is the per example loss for the labelled set (e.g. standard cross-entropy) and $L_{U}$ denotes a loss defined on the unlabelled set (e.g. consistency loss). Moreover, $γ \in R^{+}$ is a weighting parameter to balance the two terms, and $θ$ is the network parameters to estimate.

The current go-to perspective in deep SSL is to apply consistency regularisation for $L_{U}$ in (1), which enforces invariant network predictions with respect to perturbations on the unlabelled data $X_{u}$ , e.g. [20], [21], [22], [23] However, the definition of such $δ$ -pertubations, e.g. flip-and-shift, rotate, posterise and sharpness, is not trivial. In this work, we avoid the explicit definition of such $δ$ by taking a proxy-based perspective. In particular, we rely on the concept of pseudo-labels [26] ${\hat{y}}_{i}$ for images of the unlabelled set $x_{i} \in X_{u}$ . In this work, the pseudo-labels are generated by optimising our graph-based model. Our framework is an iterative two-part technique, the first part concerns pseudo-label generation, including graph representation (see Subsection 3.1) and label diffusion (Subsection 3.2). The second step deals with the update of the generated pseudo-labels (Subsection 3.3).

3.1. Feature extraction & graph construction

The most common data representation is a Euclidean or grid-like structure. We rather define our data in a non-Euclidean domain with a graph. This framework offers different benefits including mathematical properties such as sparseness which allows for fast computation, and the ability to correct initially mislabelled data by smoothing the embeddings. We represent the dataset $X$ as a graph, where each node is an image, to produce pseudo-labels. Then, unlike pure model-based approaches or pure deep learning techniques, we introduce a hybrid model – that is, a combination between a model-based (energy model) and a deep learning framework.

A deep network $f_{θ}$ is considered for updating the pseudo-labels generated by our optimisation model. It is initialised from the tiny labelled set $(x, y) \sim D_{L}$ by minimising:

L_{S} (X_{L}, Y_{L}; θ) : = \sum_{h = 1}^{l} ℓ_{H} (f_{θ} (x_{h}), y_{h}),

(2)

where the loss function $ℓ_{H}$ is cross entropy, which is the most common choice for classification tasks. This optimisation process only involves the provided small labelled set to construct the initial graph (i.e. it is run once as initialisation).

More precisely, a given set of data (or features) can be represented as an undirected weighted graph by $G = (V, E, W)$ composed of $n$ nodes $V = {v_{1}, \dots, v_{n}}$ , which are connected by edges $E = {{v_{i}, v_{j}} : v_{i}, v_{j} \in V}$ with weights $w_{i j} = S (i, j) \geq 0$ that correspond to some similarity measure $S$ between the features of nodes $i \in V$ and $j \in V$ , and $w_{i j} = 0$ if $(i, j) \notin E$ . In our setting, a node $v_{i}$ represents an image of the set $x_{i} \in X$ . A fundamental question when using graph based approaches is how to effectively extract representative features to construct a robust graph. In order to avoid the sensitive task of hand-crafted feature selection, we rely on embeddings automatically produced by (2). Hence we reduce the potentially large generalisation error that appears when extracting hand-crafted features on a small training set $X_{L}$ .

We consider as feature extractor the function $φ_{θ} : X \to R^{P}$ , given by the bottleneck of the current network $f_{θ}$ , that maps the input to some feature space of dimension $P$ . To construct our graph, we compute, for each sample, the set of descriptors by $c_{i} = φ_{θ} (x_{i}) \in R^{P}$ for $x_{i} \in X$ , and connections are created through the k nearest neighbours (k-NN) approach in $R^{P}$ . An illustration of extracted features is given in Fig. 2 .

Fig. 2 — Visualisation of features extracted from Chest X-ray data.

Notice that the model $f_{θ}$ and the feature extractor $φ_{θ}$ are updated (see Section 3.3) through the exposition to pseudo-labels of unlabelled data. The graph thus evolves along our process. We now detail how pseudo-labels are obtained on the graph by following a transductive strategy.

3.2. Label diffusion as pseudo-Labelling

Pseudo-labels are estimated through a diffusion process on the whole graph containing both labeled and unlabelled examples. For each image, our method also assigns a score reflecting the uncertainty of the produced pseudo-label (see Section 3.3). To do this, we first generate the pseudo-labels through an optimisation model based on the normalised graph p-Dirichlet energy expressed as:

Δ_{p} (u) = \sum_{i, j} w_{i j} {∥ \frac{u_{i}}{d_{i}^{1 / p}} - \frac{u_{j}}{d_{j}^{1 / p}} ∥}^{p}, p \geq 1, d_{i} = \sum_{j} w_{i j} > 0,

(3)

where the degree of the node $i$ is denoted by $d_{i}$ , and the weights are computed from the descriptors described in previous section. The minimisation of this energy allows the diffusion of a labelling variable $u$ . Whilst techniques in this line have been reported in the medical domain e.g.[41], [42], [47] and in the pure machine learning community e.g. [48], [49], they only seek to use eigenfunctions of a normalised Dirichlet energy based on the graph Laplacian for $p = 2$ or only approximate $p \to 1$ . However, later machine learning works [50] has demonstrated that using the non smooth $p = 1$ Dirichlet energy (related to total variation) achieves better performance for label propagation.

With this motivation in mind, we introduce an optimisation model based on the normalised and non smooth $p = 1$ Dirichlet energy. In the case $p = 1$ , model (3) is thus $Δ_{1} (u) = | W D^{- 1} u |$ , where $D$ is the diagonal matrix containing the degrees $d_{i}$ and the $m \times n$ matrix $W$ encodes the $m$ edges in the graph. Each of these edges is represented on a different line $i$ of the matrix $W$ , with the value $w_{i j}$ (resp. $- w_{i j}$ ) on the column $i$ (resp. $j$ ).

We now detail our multi-class model that will be applied to $L = 3$ classes: “0: Healthy”, “1: Pneumonia” and “2: COVID-19”. For each class $k = 1 \dots L$ , we set a variable $u^{k}$ that contains the node values for class $k$ and denote $u = [u^{1}, \dots u^{L}]$ . For unlabelled nodes $i > l$ , we couple the $L$ variables with the constraint: $\sum_{k = 1}^{L} u_{i}^{k} = 0, \forall i > l$ . We make the standard assumption that there exists a non empty set of labelled nodes $I_{k} \subset {1 \dots l}$ for each class $k$ . For these nodes, we set $u_{i}^{k} \geq ϵ$ if $i \in I_{k}$ (positive response for the class), and $u_{i}^{k^{'}} \leq - ϵ$ if $i \in I_{k}$ and $k^{'} \neq k$ (negative output for the other classes).

Under such constraints, we seek to minimise the multi-class functional [51] that contains the sum of normalised ratios :

\min_{∥ u ∥ = 1} \sum_{k = 1}^{L} \frac{Δ_{1} u^{k}}{| u^{k} |} .

(4)

We consider an iterative scheme to optimise this problem:

\begin{matrix} u^{(t + 1)} & = & \underset{u}{a r g m i n} \frac{∥ u - u^{(t)} ∥^{2}}{2 Δ t} \\ + \sum_{k = 1}^{L} (Δ_{1} (u^{k}) - \frac{Δ_{1} (u^{k, (t)})}{| u^{k, (t)} |} 〈 s i g n (u^{k, (t)}), u^{k} 〉), \end{matrix}

(5)

where $t$ is a time index associated to the step $Δ t > 0$ . This process diffuses information from labelled nodes to unlabelled ones. To avoid trivial solutions [52], [53], we apply shifting $u^{k, (t + 1)} = u^{k, (t + 1)} - median (u^{k, (t + 1)})$ and normalisation $u^{(t + 1)} = u^{(t + 1)} / ∥ u^{(t + 1)} ∥$ steps at the end of each iteration.

From (5), one can see that the solution $u^{k, (t + 1)}$ satisfies:

\begin{matrix} \sum_{k = 1}^{L} Δ_{1} (u^{k, (t + 1)}) & \leq & \sum_{k = 1}^{L} \frac{Δ_{1} (u^{k, (t)})}{| u^{k, (t)} |} 〈 s i g n (u^{k, (t)}), u^{k, (t + 1)} 〉 \\ \leq & \sum_{k = 1}^{L} \frac{Δ_{1} (u^{k, (t)})}{| u^{k, (t)} |} | u^{k, (t + 1)} |, \end{matrix}

(6)

so that we get a reduction of the normalised ratio $Δ_{1} (u^{k, (t)}) / | u^{k, (t)} |$ along iterations $k$ . When $L = 1$ , the scheme $u^{k, (t)}$ converges to a local minima of (4) which corresponds to a bivalued function that naturally segments the graph [53]. In the general case $L > 1$ , the convergence to a local minima of (4) can be ensured by using a modification of the scheme (5) as proposed in [54]. However, such adaptation comes at the cost of an important additional computational cost. Even of there is no theoretical guarantee for that, we observe a monotonous decrease of (4) with the scheme (5). As a consequence, we suggest to use the flow (5) that presents an acceptable trade-off between theoretical and practical aspects.

Once $u^{k}$ has converged to some $u^{*} = [u^{*, 1}, \dots, u^{*, L}]$ , the label of each node is finally given by ${\hat{y}}_{i} = a r g m a x_{j} u_{i}^{*, j}$ . In practice, our model (5) is solved using an accelerated primal dual algorithm [55]. Our generated pseudo-labels are denoted ${\hat{Y}}_{U} = {{\hat{y}}_{k}}_{k = l + 1}^{n}$ and used to update the classification network $f_{θ}$ as explained in the next section.

We remark that there are several differences between our pseudo-label approach and that of [25], [26]. Firstly and unlike [25], our work generates and updates the embeddings from a deep net to construct, since the beginning, a stronger graph. Secondly, our optimisation model generates pseudo-labels, outside the deep net, and then they are iteratively updated from a deep net. Thirdly, our current model is designed to generate highly certain pseudo-labels since early stages by integrating an uncertainty measure, and also is equipped with a class balance term (see Section 3.3). With respect to [26], our model follows different principle when generating the pseudo-labels. Firstly, unlike [26], our technique is a graph based model. Secondly, [26] generates the pseudo-labels directly from the deep net (i.e. inside the network) by taking maximum predicted probability from the network. On the other hand, we generate the pseudo-labels from our new optimisation model (drawn from the normalised graph p-Dirichlet energy) which is decoupled from the network (i.e. outside the network).

3.3. Deep graph pseudo-Labelling update

Although our graph diffusion model (5) generates relevant pseudo-labels [25], one can further decrease the uncertainty over time. This is one of the major problem in pseudo-labelling as naive pseudo-labels are limited by the confirmation bias. This can be addressed if we consider the two major bottlenecks in real-world problems. A first prevalent issue in the medical domain is to face highly imbalanced class samples. As illustrated in Fig. 3 , this is particularly true in the COVIDx dataset. The second one is to deal with inferred pseudo-labels with different levels of uncertainty. As it has been shown in several works e.g. [56], [57], [58], [59], these problems can be mitigated by weighting the importance over the inferred pseudo-labels and the classes, and update the pseudo-label certainty in an iterative fashion.

Fig. 3 — Class distribution of COVIDx dataset. From top to bottom: samples/class distribution over the full COVIDx dataset and the official partition from COVIDx. We conducted experiments in both sets. This dataset imposes several challenges due to the highly imbalanced classes, in which COVID-19 samples are significantly less than the Pneumonia and Healthy classes.

The problem of classifying with a highly imbalanced dataset has been widely studied in the literature, e.g. [56], [60]. We apply a common strategy for imbalanced class population [56], [61] and add a weighting factor inversely proportional to the effective number of samples for class $k : ω_{k} \propto 1 / E_{n}, n \in Z_{> 0}$ where $E_{n}$ is the total number of samples. For the second problem, we associate an uncertainty weighing factor, $υ_{i}$ , to each $u_{{\hat{y}}_{i}}$ generated in the diffusion process. We use entropy as the measure for uncertainty [58], [62], [63], given by $υ_{i} = 1 - (H (u_{{\hat{y}}_{i}}) / \log (L))$ , where $H$ refers to the entropy and $u_{{\hat{y}}_{i}}$ is normalised beforehand with respect to the values in $u_{i}^{*}$ .

We finally define the main driving optimisation (i.e. estimation of network parameters) as:

\min_{θ} \sum_{i = 1}^{l} ω_{y_{i}} ℓ_{H} (f_{θ} (x_{i}), y_{i}) + \sum_{i = l + 1}^{n} υ_{i} ω_{{\hat{y}}_{i}} ℓ_{H} (f_{θ} (x_{i}), {\hat{y}}_{i}),

(7)

the loss in (7) is connected with model (1) but unlike the typical consistency loss, for $L_{U}$ , we are using the philosophy of pseudo-labels which are generated by our diffusion model.

Let us summarise the overall process. We first optimise (2) for a set of epochs, this serves for extracting the embeddings from the deep net to construct a graph. We then perform (5) to diffuse the small labelled set to the unlabelled data. The output of this process is the generation of pseudo-labels for the unlabelled set. These pseudo-labels are then used to optimise (7), which in turn updates the model parameters. The whole process (feature extraction, graph update, pseudo label diffusion, network update) is then iterated.

4. Experimental results

In this section, we detail the set of experiments conducted to validate our technique.

4.1. Dataset description

We evaluate our approach on the COVIDx Dataset which is a multi-center dataset introduced in [9]. The dataset is composed of a total of 15,254 CXR images. The official partition only considers 13,975 CXR images across 13,870 patients, with a training set composed of 13,675 images and a test of 300 (1579 for the test set on the full dataset). This dataset is, up to our knowledge, the largest and most diverse one for COVID-19; as the samples are from different locations and acquired under different conditions and vendors. COVIDx indeed merges five highly diverse datasets repositories: COVID-19 image data collection [8], Actualmed COVID-19 Chest X-ray Dataset Initiative [64], COVID-19 Chest X-ray Dataset Initiative [65], RSNA Pneumonia Challenge dataset [27], [66] and COVID-19 Radiography Database [67].

The COVIDx dataset contains threes classes: Healthy, Pneumonia and COVID-19. The class breakdown, for the full and official partition CXR images, is illustrated in Fig. 3. As it can be observed from these plots, it is a highly imbalanced dataset, in which COVID-19 samples are much smaller than for the other other two classes to further description in the dataset see [9].

Moreover, to further support the performance of our technique, we use an external dataset to evaluate the generalisation to out-of-distribution samples. To do this, we use the external dataset BIMCV-COVID19 [68], which is composed of images from 11 hospitals from the Valencian Region, Spain. We randomly selected a subset of 200 patient-level samples covering all hospitals, where 75% were COVID-19 confirmed cases as one is interested in show generalisation for the target disease.

4.2. Evaluation methodology & implementation details

We validate our proposed technique as follows. First we evaluate the performance of our approach compared to supervised techniques including the leading fully supervised paper in the field COVID-Net [9]. These comparisons include VGG-16 [36], ResNet-18 and ResNet-50 [34], InceptionV3 [69] and DenseNet-121 [35]. The selection of these architectures follows the same line of motivation as in [9]: they offer a clear advantage for dealing with the unique traits of COVID. Our experiments are then conducted using: i) the official partition, in which the test set is 300 samples split evenly across the classes; ii) the full COVIDx dataset, in which the main differences with respect i) is that the test set is composed of 1579 samples (see Fig. 3); and iii) an additional random partition. We ran all the experiments under the same conditions, and followed standard pre-processing protocol to normalise the images to have zero mean and unit variance. The images we resized to the resolution 480 $\times$ 480.

The evaluation is addressed from both qualitative and quantitative points of view. The former is based on visual outputs of our classification. The latter present the per class computation of sensitivity, positive predictive value and F1-scores. The overall performance is computed in terms of accuracy and error rate. Furthermore, for sake of completeness and guided by the field of estimation statistics, we report along with the error rate the confidence intervals (95%) of all techniques. Finally, we performed a data ablation study of our SSL method by using $10 %, 20 % and 30 %$ of labels.

We now give implementation details. For the COVID-Net [9] technique, we used the implementation and parameters provided by the authors. In particular, we considered the latest suggested model COVIDNet-CXR3-B. For the compared techniques, we used weight decay= 5e-4, momentum= 0.9 and learning rate 1e-2 (1e-3 for [26], and ResNet-18 for a fair comparison). For our technique, the k-NN neighborhood graph has been built with $k = 50$ . A ResNet-18 architecture has been used for the deep network $f_{θ}$ . In practice, we used a total number of epochs of 210 with and a weight decay of $2 \times 10^{- 4}$ and learning rate was set to 5e-2 decreasing with cosine annealing. Furthermore, we follow standard protocol in semi-supervised learning to report our results, we randomly select the labelled samples over five repeated times, that is- one has five different splits. We then report the mean error over the splits. All techniques were implemented in PyTorch and using Stochastic Gradient Descent (SGD) as optimiser.

4.3. Results & discussion

We begin by evaluating the different methods on the official COVIDx partition. As a baseline comparison, we consider six supervised techniques: VGG-16 [36], ResNet-18 and ResNet-50 [34], InceptionV3 [69], DenseNet-121 [35] and COVID-Net [9]. To the best of our knowledge, there exists no semi-supervised technique dedicated to COVID-19 identification that we can compare. Hence, for the sake of fairness, we also adapted one semi-supervised technique, Pseudo-Labelling [26], that has a philosophy close to ours but builds on different principles including a minimum entropy criterion. The supervised methods use the full training set, whereas the SSL techniques only consider 30% of the labelled set.

We provide a detailed quantitative analysis to understand the performance of the different techniques. The per class metrics across the official data partition are thus reported in Table 1 . Concerning positive predictive values, we observe that our GraphXCOVID approach performs the best for healthy and pneumonia classes and it readily competes with COVID-Net in the COVID-19 class. Due to design limitations, VGG-16 performed the worst, whereas ResNet-50 presents performances closer to GraphXCOVID thanks to its residual architecture. We also observe that excepting for the COVID-19 class, InceptionV3 and DenseNet-121 both sightly outperform ResNet-50. In order to show the robustness of our technique, we also considered the full COVIDx dataset, a scenario closer to a real medical setting. We compared our method with the supervised approach of COVID-Net, the second better method on the official partition. As reported in Table 2 , GraphXCOVID here performs better for all classes and all considered metrics, while only using 30% of the labelled set.

Table 1.

Numerical comparison of our technique vs fully supervised approaches. The results report per class metrics, including sensitivity, positive predictive value and F1-scores along with the overall accuracy. Our technique readily competes with all supervised techniques whilst using far less labelled data. $†$ denotes the score reported in [9]. The references cited in this table are [9], [26], [34], [35], [36], [69].

Open in a new tab

Table 2.

Performance comparison of COVID-Net and our technique using the full dataset.

Open in a new tab

The second evaluation is done in terms of sensitivity. As shown in tables 1 and 2, GraphXCOVID reports the highest values for pneumonia and COVID-19. For COVID-19, the true positive proportion is significantly higher for our method (0.94) and COVID-Net (0.91) than for other ones ( $\leq$ 0.88). This observation is confirmed with the sensitivity results on the highly imbalanced full dataset (see Table 2).

To give a view of the relative performance of all techniques, the F1-scores are respectively reported in Tables 1 and 2 for the official and full partitions. It still underlines that VGG-16, ResNet-50, DenseNet-121 and InceptionV3 are not sufficiently competitive, whereas COVID-Net and our GraphXCOVID technique are readily performing in a similar level. However, looking deeper into the performance of these two last techniques, we observe that within a highly imbalanced scenario (see Table 2), our approach outperforms COVID-Net.

We also compare our technique to a SSL technique, with similar philosophy than ours but different in design, Pseudo-Labelling [26]. This technique generates the pseudo-labels directly from the network whilst our approach considers labels coming from the diffusion model. From Table 1, one can observe that GraphXCOVID offers a substantial improvement over Pseudo-Labelling for all metrics. Overall, we achieve better accuracy with an improvement of 8%, and reduce the error rate ( $\pm 1.20$ CI) by more than half as displayed in Fig 4 . The improvement comes from two parts. The first major benefit is related to pseudo-labels generation. The work of [26] provides naive pseudo-labels with the network itself, while our technique generates more certain ones that are iteratively updated using both our diffusion models and the network, along with a uncertainty weight. Secondly, our technique also accounts for imbalanced class distribution.

Fig. 4 — Error rates on the full dataset. Performance comparison error-wise along with confidence intervals of the compared techniques and ours. Top plot displays results from a random partition, while bottom plot corresponds to the official partition. Our technique reported the lowest error rate (95%CI) while using only 30% of labelled samples.

To further support the previous results and give a global performance view, we compute the error rate and the confidence intervals for each model. The results are reported in Fig. 4 for the official and a random partition of this set. For both experiments, VGG-16 reported the worst performance followed by ResNet-18. Our model performed the best among all the compared models, reporting an error of $5.4 \pm 1.1$ at the 95% confidence level. As for the other criteria, COVID-Net ranked second, reporting an error of the model of $6.7 \pm 1.23$ at the 95% confidence level. One can also observe from the top of Fig. 4 that supervised techniques are highly variable at the change of partition. Such approaches are indeed heavily reliant on the training set being well-representative and balance. In comparison, the variation is negligible with our SSL approach.

In order to analyse the robustness of our model, we run an ablation study for different label counts. In Fig. 5 (top), we present the error rates and confidence intervals obtained by GraphXCOVID with 10%, 20% and 30% of the available labels. Why do not increase the percentage of labels? First, we want to use the lowest possible number of labels. Secondly, we seek to keep the advantage of transductive inference. Indeed, as pointed out in early works e.g. [70], the benefit of a transductive model is decreased when a large number of labels is considered. This effect was observed in our experiments. We finally illustrate in Fig. 5 (bottom), the behaviour of our model along iterations. Around 200 epochs were required to reach a stable error rate when considering 30% of labels.

A visual illustration of the result is next provided in Fig. 6 , where the probability scores of our technique are reported and compared with the human prediction (GT). One can see that the obtained classifier $f_{θ}$ easily differentiate between classes. However, the probability scores (from Fig. 6) are not enough to assist the radiologist in making the decision. To accommodate with this issue, we use a Gradient-weighted Class Activation Mapping [71] type solution to highlight abnormal and normal areas in the lungs, in which Pneumonia and COVID-19 are linked to abnormally regions. Samples outputs of the attention maps are displayed in Fig. 7 and compared with the human prediction (GT). The attention maps aims to accommodate with the mental model on how the radiologists work in a clinical scenario. Therefore, we project the attention in the lungs areas as the diseases affect this region by increasing the density. This tool is designed to help, in a friendly user-interface, the radiologist judging whether the diagnostic is correct or not, and in consequence to accelerate the decision. Additionally in Fig. 8 , we also display some misclassified samples. The intuition behind these cases is as follows. Firstly, the inherent complex appearance of the pathologies projected in the chest X-ray imposes a challenge in the predictive capability of the model. Secondly, the outcome is also effected by the difference in acquisition protocol and vendor machines, which introduce artifacts, blurry effects and noise in the chest x-ray images. This is translated to a tail distribution problem that affects the model’s capability to make predictions. However, we remark that the generalisation error is reduced with the proposed pseudo-labelling and uncertainty mechanism. In particular, we reported less missing cases of COVID than the compared techniques (see the per class result in Tables 1 and 2), and our model globally performs better than the compared techniques including when handling external data (see Fig. 12 along with the discussion).

Fig. 7 — Visualisation of the attention maps overlaid on the corresponding chest x-ray image. Our prediction output is also displayed (see bottom part of each output sample) along with the ground truth (GT) which denotes the human consensus prediction (see top part of each output sample). The attention maps highlight abnormal and normal regions to assist the radiologist in making decisions.

.

Fig. 8 — Visual examples of incorrectly classified samples. The results display the predicted probability score and the comparison with the human diagnostic (GT: ground truth).

Fig. 12 — Performance comparison of our technique and COVIDNet using the BIMCV-COVID19 [68] dataset as a external set. Left side displays the error rate for the COVID-19 class whilst the right side the overall performance of both techniques.

Ablation Study. Finally and to further support the design of our technique, we performed an ablation study regarding the influence of the weighting factors, $υ$ and $ω$ , in our model. We first report a performance comparison of our technique, in terms of error rate and using 30% of labelled data, when considering one, both and none of the two factors. The results are reported at the left side of Fig. 9 . We notice that removing both factors (red bar) increases more than twice the error. In contrast, we observe that a substantial decrease in the error rate is achieved when considering both factors (blue bar). In a closer look at the effect of each factor, we observed that while both factors (green and orange bars) indeed improve the performance, the factor with more influence is the uncertainty mechanism ( $υ$ , see orange bar). We suggest that measuring the uncertainty of the pseudo-labels prevents from obtaining certain incorrect pseudo-labels in early training stages, that are propagated in the next epochs. This effect is illustrated in the right side of Fig. 9. In this illustration, we display $υ$ for all the unlabelled samples, and plots reflect the comparison of the pseudo-label prediction with respect of the ground truth. The red and green areas denote the incorrect and correct pseudo-labels (wrt the ground truth). From this figure, we can observe that the certainty in the pseudo-labels is improved as the epochs evolve. This plot support the strength of $υ$ in our model.

Fig. 9 — (Left side) Ablation study on the influence of the weighting factors, $υ$ and $ω$ , in the performance of our model. The plot displays the error rate (95%CI) using 30% of the labels. (Right side) Effect of the uncertainty mechanism in the pseudo-labels. Correct pseudo-labels are displayed in green and incorrect ones in red.. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Additionally, we performed another ablation study to evaluate the performance of our technique with different amounts of labelled data data in the training set. This is illustrated in Fig. 10 . We observed that the results using 30% of the labelled set is a good trade-off between number of labels vs performance due to the transductivity behaviour. In line with earlier works in transduction e.g.[70], [72], a significant increase in performance is gained from an initial increase in the labelled percentage, from 5% to 30%. After $30 %$ , the introduction of additional labelled data provides a tiny amount of extra accuracy. This highlights the success of our approach in its core aim, that is using semi-supervised learning to alleviate the need for large labelled datasets for COVID X-ray classification.

Fig. 10 — Performance comparison of our technique under different percentage of labelled data. The performance of the model increases with more labeled data, but the increase in performance slows drastically past $30 %$ .

Moreover, we compared our technique (with and without our weighting factors) against two recent deep SSL approaches: MT [45] and GCN [22]. We considered the same label count of 30% and used the parameters suggested in these works. The results are reported in Fig. 11 . We observe that our technique (with the weighing factors) performs the best. MT competes readily with the basic version of our model, i.e. without uncertainty and balance parameters. Finally, GCN was clearly outperformed by all compared models. The intuition behind our performance gain is three-fold. Firstly, our proposed optimisation model itself enforces the inherent relation between the unlabelled and labelled data. Secondly, the generation of accurate pseudo-labels with uncertainty estimation increases generalisation. Thirdly, our approach better deals with imbalance datasets where the number of COVID images makes up a tiny fraction of the data.

Fig. 11 — Performance comparison of our technique (with and without including our weighting factors) against two deep SSL techniques GCN [45] and MT [22]. The results are reported, using 30% of labelled data, in terms of the error rate (95%) .

As a final set of experiments, we test the generalisation capability of our technique, and COVIDNet, to external datasets by including a performance comparison using the BIMCV-COVID19 [68] dataset (see Subsection 4.1. for details on the experiment). The results are displayed in Fig. 12 and show the error rate at the 95% confidence level. From the plots, we can observe that COVIDNet exhibits a substantial decrease in performance on the external dataset whilst our technique is more robust in this regard. Particularly, a strong degradation was observed in the COVID-19 class (see left side plot in Fig. 12). This is an expected behaviour of deep learning models, and is particularly noticed in the medical domain e.g. [73], as one assumes that the testing set should have similar distribution to the training set. Why is our model more robust to external data? Our technique has been carefully designed to mitigate, at some level, external distributions. This is achieved by discarding irrelevant samples with low confidence scores. More precisely, GraphXCOVID controls the predictive uncertainty on the generated pseudo-labels, which narrows the discrepancy between the external distribution samples. Contrary, COVIDNet is not equipped with any mechanism to accommodate with external distributions.

From the aforementioned findings, we emphasise a central message: the strong performance when using far less labelled data than the compared techniques is a core strength of our SSL technique. Moreover, we also underline the robustness of our technique when handling external data. We highlight the value of the vast available unlabelled data in medical domain, and in particular the potentials and benefits for diagnostic COVID-19 disease.

5. Conclusion

In this work, we propose a graph-based deep semi-supervised framework for classifying COVID-19 Chext X-ray images based on an optimisation model for label diffusion. Through the minimisation of a normalised and non-smooth $p = 1$ Dirichlet energy, the model generates meaningful pseudo-labels that are iteratively used to update a deep net. To our knowledge, this is the first graph based deep semi-supervised technique for COVID-19 analysis. From our results, we demonstrated that our technique reports higher sensitivity in COVID-19 and better global performances than the current leading deep supervised technique, while requiring a very reduced set of labels. We also provide attention maps as means to visualise the output of our technique. These visualisation aims to assist the radiologist in judging whether the diagnosis is correct. With this work, we investigate the use of deep semi-supervised learning for novel disease prediction, as for COVID-19. Such approaches alleviate the need for a large labelled dataset, which is costly and time consuming to produce especially for emerging pandemic diseases where both human and monetary resources are stretched thin. Future work includes the exploration of other strategies to measure the uncertainty, e.g. [63], and how this can be adapted to the pseudo-labelling setting. Moreover, the transfer of our findings in a more thorough experimental study to test its clinical potential on a larger patient cohort.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

AIAR gratefully acknowledges support from CMIH and CCIMI, University of Cambridge. PS is supported by EPSRC and NPL. CBS acknowledges support from the Leverhulme Trust project on ’Breaking the non-convexity barrier’, the Philip Leverhulme Prize, the Royal Society Wolfson Fellowship, the EPSRC grants EP/S026045/1 and EP/T003553/1, the EPSRC Centre EP/N014588/1, the Wellcome Innovator Award RG98755, European Union Horizon 2020 research and innovation programmes under the Marie Skodowska-Curie grant agreement 777826 NoMADS and 691,070 CHiPS, the CCIMI and the Alan Turing Institute. AIAR and CBS also thank the team of the project ’AI assisted diagnosis and prognostication in Covid-19’ for very helpful discussions. NP acknowledges H2020 RISE project NoMADS.

Biographies

Angelica I. Aviles-Rivero received the PhD degree (2017) from the Polytechnic University of Catalonia, Spain. She is currently a Research Associated in the Department of Pure Mathematics & Mathematical Statistics (DPMMS), University of Cambridge, UK. Her research lies at the intersection of computational mathematics, computer vision and machine learning for applications to large-scale real world problems. Her central research is to develop new data-driven algorithmic techniques that allow computers to gain high-level understanding from vast amounts of data, this, with the aim of aiding the decisions of users from multiple-disciplines. This line of research has allowed her to gain expertise with a wide range of various data types including medical imaging, computational photography, computer graphics and remote sensing to new a few. She is an IEEE, SIAM, CVF and ACM member.

Philip Sellars received the B.A. and M.Sc. degrees from the University of Cambridge, Cambridge,U.K., in 2017, where he is currently pursuing the Ph.D. in mathematics under the supervision of Prof. C.-B. Schönlieb. His research interests are focused on semi-supervised learning and its applications in computer vision.

Carola-Bibiane Schönlieb received the B.S. degree from the Institute for Mathematics, University of Salzburg, Salzburg, Austria, in 2004, and the Ph.D. degree from the University of Cambridge, Cam-bridge, U.K., in 2009.From 2004 to 2005, she held a teaching position in Salzburg. She held a post-doctoral position at the University of Göttingen, Göttingen, Germany,for one year. She became a Lecturer with the Department of Applied Mathematics and Theoret-ical Physics (DAMTP), University of Cambridge,in 2010, where she was promoted to a Reader in 2015 and a Professor in 2018. Since 2011, she has been a fellow of the Jesus College, University of Cambridge, where she is currently a Professor of applied mathematics with DAMTP, the Head of the Cambridge Image Analysis Group, the Director of the Cantab Capital Institute for Mathematics of Information, and the Director of the Engineering and Physical Sciences Research Council Center for Mathematical and Statistical Analysis of Multimodal Clinical Imaging. Her research interests include variational methods, partial differential equations,and machine learning for image analysis, image processing, and inverse imaging problems.

Nicolas Papadakis received the M. Eng degree in applied mathematics from the National Institute of Applied Sciences, Rouen, France, in 2004 and the M. Sc in Analysis from Rouen University in 2004. I obtained the Ph.D. degree in applied mathematics from the University of Rennes, France, in 2007, under the supervision of Etienne Mmin, with long stays at the Laboratorio de Fluiodinica de Buenos Aires, Argentina in 2006 and 2007. From 2008 to 2010, I had a postdoctoral position in Barcelona Media, Spain, under the supervision of Vicent Caselles. I joined the CNRS (section 07) as a full researcher in 2010 at the Laboratoire Jean Kuntzmann, Grenoble, France. Since 2013, I work in the Institut de Mathmatiques de Bordeaux. I realized a long stay at the Department of Applied Mathematics and Theoretical Physics of the University of Cambridge, UK, in 2018–2019. My main research interests include data assimilation, optimal transportation and machine learning for applications in image analysis and processing.

References

1.Xie X., Zhong Z., Zhao W., Zheng C., Wang F., Liu J. Chest ct for typical 2019-ncov pneumonia: relationship to negative rt-pcr testing. Radiology. 2020;296(2):200–343. doi: 10.1148/radiol.2020200343. [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Wikramaratna P., Paton R.S., Ghafari M., Lourenco J. Estimating false-negative detection rate of sars-cov-2 by rt-pcr. MedRxiv (2020) 2020 doi: 10.2807/1560-7917.ES.2020.25.50.2000568. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Zhou S., Wang Y., Zhu T., Xia L. Ct features of coronavirus disease 2019 (covid-19) pneumonia in 62 patients in wuhan, china. American Journal of Roentgenology. 2020;214(6):1287–1294. doi: 10.2214/AJR.20.22975. [DOI] [PubMed] [Google Scholar]
4.Chung M., et al. Ct imaging features of 2019 novel coronavirus (2019-ncov) Radiology. 2020;295(1):202–207. doi: 10.1148/radiol.2020200230. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Fang Y., et al. Sensitivity of chest ct for covid-19: comparison to rt-pcr. Radiology. 2020;296(2):E115–E117. doi: 10.1148/radiol.2020200432. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Jacobi A., Chung M., Bernheim A., Eber C. Portable chest x-ray in coronavirus disease-19 (covid-19): a pictorial review. Clin Imaging. 2020;64:35–42. doi: 10.1016/j.clinimag.2020.04.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Wong H.Y.F., et al. Frequency and distribution of chest radiographic findings in covid-19 positive patients. Radiology. 2020;296(2):E72–E78. doi: 10.1148/radiol.2020201160. [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Cohen J.P., Morrison P., Dao L. Covid-19 image data collection. arXiv preprint arXiv:2003.11597 (2020) 2020 [Google Scholar]
9.Wang L., Lin Z.Q., Wong A. Covid-net: a tailored deep convolutional neural network design for detection of covid-19 cases from chest x-ray images. Sci Rep. 2020;10(1):1–12. doi: 10.1038/s41598-020-76550-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Folio L.R. Springer Science & Business Media; 2012. Chest imaging: An algorithmic approach to learning. [Google Scholar]
11.Bruno M.A., Walker E.A., Abujudeh H.H. Understanding and confronting our mistakes: the epidemiology of error in radiology and strategies for error reduction. Radiographics. 2015;35(6):1668–1676. doi: 10.1148/rg.2015150023. [DOI] [PubMed] [Google Scholar]
12.Organization W.H., et al. Technical Report. World Health Organization; 2020. Use of chest imaging in COVID-19: a rapid advice guide, 11 June 2020 [Online] [Google Scholar]; Available: https://apps.who.int/iris/handle/10665/332336
13.Narin A., Kaya C., Pamuk Z. Automatic detection of coronavirus disease (covid-19) using x-ray images and deep convolutional neural networks. Pattern Analysis and Applications. 2021:1–14. doi: 10.1007/s10044-021-00984-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Apostolopoulos I.D., Aznaouridis S.I., Tzani M.A. Extracting possibly representative covid-19 biomarkers from x-ray images with deep learning approach and image data related to pulmonary diseases. J Med Biol Eng. 2020:1. doi: 10.1007/s40846-020-00529-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
15.Farooq M., Hafeez A. Covid-resnet: a deep learning framework for screening of covid19 from radiographs. arXiv preprint arXiv:2003.14395 (2020) 2020 [Google Scholar]
16.Hemdan E.E.-D., Shouman M.A., Karar M.E. Covidx-net: a framework of deep learning classifiers to diagnose covid-19 in x-ray images. arXiv preprint arXiv:2003.11055 (2020) 2020 [Google Scholar]
17.Roberts M., Driggs D., Thorpe M., Gilbey J., Yeung M., Ursprung S., Aviles-Rivero A.I., Etmann C., McCague C., Beer L., et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for covid-19 using chest radiographs and ct scans. Nature Machine Intelligence. 2021;3(3):199–217. [Google Scholar]
18.Afshar P., Heidarian S., Naderkhani F., Oikonomou A., Plataniotis K.N., Mohammadi A. Covid-caps: a capsule network-based framework for identification of covid-19 cases from x-ray images. Pattern Recognit Lett. 2020;138:638–643. doi: 10.1016/j.patrec.2020.09.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Fan Y., Liu J., Yao R., Yuan X. Covid-19 detection from x-ray images using multi-kernel-size spatial-channel attention network. Pattern Recognit. 2021:108055. doi: 10.1016/j.patcog.2021.108055. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Laine S., Aila T. Temporal ensembling for semi-supervised learning. International Conference on Learning Representations (ICLR) 2017;4(5):6. [Google Scholar]
21.Berthelot D., Carlini N., Goodfellow I., Papernot N., Oliver A., Raffel C.A. Advances in Neural Information Processing Systems. Vol. 32. 2019. Mixmatch: A holistic approach to semisupervised learning; p. 14. [Google Scholar]
22.Tarvainen A., Valpola H. Advances in neural information processing systems (NIPS) 2017. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results; pp. 1195–1204. [Google Scholar]
23.Verma V., Lamb A., Kannala J., Bengio Y., Lopez-Paz D. Interpolation consistency training for semi-supervised learning. International Joint Conference on Artificial Intelligence (IJCAI) 2019:3635–3641. doi: 10.1016/j.neunet.2021.10.008. [DOI] [PubMed] [Google Scholar]
24.Raghu M., Zhang C., Kleinberg J., Bengio S. Neural Information Processing Systems. 2019. Transfusion: Understanding transfer learning for medical imaging; pp. 3347–3357. [Google Scholar]
25.Aviles-Rivero A.I., Papadakis N., Li R., Sellars P., Fan Q., Tan R., Schönlieb C.-B. International Conference on Medical Image Computing and Computer-Assisted Intervention. 2019. Graphx-net - chest x-ray classification under extreme minimal supervision; pp. 504–512. [Google Scholar]
26.Lee D.-H. Workshop on Challenges in Representation Learning, ICML. Vol. 3. 2013. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks; p. 896. [Google Scholar]
27.Wang X., Peng Y., Lu L., Lu Z., Bagheri M., Summers R.M. IEEE Conference on Computer Vision and Pattern Recognition. 2017. Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases; pp. 2097–2106. [Google Scholar]
28.Yao L., Prosky J., Poblenz E., Covington B., Lyman K. Weakly supervised medical diagnosis and localization from multiple resolutions. arXiv preprint arXiv:1803.07703 (2018) 2018 [Google Scholar]
29.Irvin J., Rajpurkar P., Ko M., Yu Y., Ciurea-Ilcus S., Chute C., Marklund H., Haghgoo B., Ball R., Shpanskaya K., et al. Proceedings of the AAAI conference on artificial intelligence. Vol. 33. 2019. Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison; pp. 590–597. [Google Scholar]
30.Baltruschat I., Nickisch H., Grass M., Knopp T., Saalbach A. Comparison of deep learning approaches for multi-label chest x-ray classification. Sci Rep. 2019;9(1):1–10. doi: 10.1038/s41598-019-42294-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
31.Zhang J., Xie Y., Li Y., Shen C., Xia Y. Covid-19 screening on chest x-ray images using deep learning based anomaly detection. arXiv preprint arXiv:2003.12338 (2020) 2020 [Google Scholar]
32.Wang Z., Xiao Y., Li Y., Zhang J., Lu F., Hou M., Liu X. Automatically discriminating and localizing covid-19 from community-acquired pneumonia on chest x-rays. Pattern Recognit. 2021;110:107613. doi: 10.1016/j.patcog.2020.107613. [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Li J., Zhao G., Tao Y., Zhai P., Chen H., He H., Cai T. Multi-task contrastive learning for automatic ct and x-ray diagnosis of covid-19. Pattern Recognit. 2021;114:107848. doi: 10.1016/j.patcog.2021.107848. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.He K., Zhang X., Ren S., Sun J. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. Deep residual learning for image recognition; pp. 770–778. [Google Scholar]
35.Huang G., Liu Z., Van Der Maaten L., Weinberger K.Q. IEEE conference on Computer Vision and Pattern Recognition (CVPR) 2017. Densely connected convolutional networks; pp. 4700–4708. [Google Scholar]
36.Simonyan K., Zisserman A. International Conference on Learning Representations. 2015. Very deep convolutional networks for large-scale image recognition; pp. 1–14. [Google Scholar]
37.Kohli M.D., Summers R.M., Geis J.R. Medical image data and datasets in the era of machine learningwhitepaper from the 2016 c-mimi meeting dataset session. J Digit Imaging. 2017;30(4):392–399. doi: 10.1007/s10278-017-9976-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Bar Y., Diamant I., Wolf L., Lieberman S., Konen E., Greenspan H. International Symposium on Biomedical Imaging. IEEE; 2015. Chest pathology detection using deep learning with non-medical training; pp. 294–297. [Google Scholar]
39.Moradi E., Pepe A., Gaser C., Huttunen H., Tohka J., Initiative A.D.N., et al. Machine learning framework for early mri-based alzheimer’s conversion prediction in mci subjects. Neuroimage. 2015;104:398–412. doi: 10.1016/j.neuroimage.2014.10.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
40.Filipovych R., Davatzikos C., Initiative A.D.N., et al. Semi-supervised pattern classification of medical images: application to mild cognitive impairment (mci) Neuroimage. 2011;55(3):1109–1119. doi: 10.1016/j.neuroimage.2010.12.066. [DOI] [PMC free article] [PubMed] [Google Scholar]
41.Chen H., Li K., Zhu D., Jiang X., Yuan Y., Lv P., Zhang T., Guo L., Shen D., Liu T. Inferring group-wise consistent multimodal brain networks via multi-view spectral clustering. IEEE Trans Med Imaging. 2013;32(9):1576–1586. doi: 10.1109/TMI.2013.2259248. [DOI] [PMC free article] [PubMed] [Google Scholar]
42.Dodero L., Gozzi A., Liska A., Murino V., Sona D. International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2014. Group-wise functional community detection through joint laplacian diagonalization; pp. 708–715. [DOI] [PubMed] [Google Scholar]
43.An L., Adeli E., Liu M., Zhang J., Shen D. International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2016. Semi-supervised hierarchical multimodal feature and sample selection for alzheimers disease diagnosis; pp. 79–87. [DOI] [PMC free article] [PubMed] [Google Scholar]
44.Sun W., Tseng T.-L.B., Zhang J., Qian W. Computerized breast cancer analysis system using three stage semi-supervised learning method. Comput Methods Programs Biomed. 2016;135:77–88. doi: 10.1016/j.cmpb.2016.07.017. [DOI] [PubMed] [Google Scholar]
45.Kipf T.N., Welling M. Semi-supervised classification with graph convolutional networks. International Conference on Learning Representations (ICLR) 2017 [Google Scholar]
46.Chapelle O., Scholkopf B., Zien A. Semi-supervised learning. IEEE Trans. Neural Networks. 2009;20(3) [Google Scholar]; 542–542
47.Wang Z., Zhu X., Adeli E., Zhu Y., Zu C., Nie F., Shen D., Wu G. International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2016. Progressive graph-based transductive learning for multi-modal classification of brain disorder disease; pp. 291–299. [DOI] [PMC free article] [PubMed] [Google Scholar]
48.Belkin M., Niyogi P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 2003;15(6):1373–1396. [Google Scholar]
49.Zhou D., Bousquet O., Lal T.N., Weston J., Schölkopf B. Advances in Neural Information Processing Systems (NIPS) 2004. Learning with local and global consistency; pp. 321–328. [Google Scholar]
50.Bühler T., Hein M. Spectral clustering based on the graph p-laplacian. International Conference on Machine Learning. 2009:81–88. [Google Scholar]
51.Bresson X., Laurent T., Uminsky D., Von Brecht J. Advances in Neural Information Processing Systems. 2013. Multiclass total variation clustering; pp. 1421–1429. [Google Scholar]
52.Hein M., Setzer S., Jost L., Rangapuram S.S. Vol. 26. 2013. The total variation on hypergraphs-learning on hypergraphs revisited; pp. 2427–2435. [Google Scholar]
53.Feld T., Aujol J.-F., Gilboa G., Papadakis N. Rayleigh quotient minimization for absolutely one-homogeneous functionals. Inverse Probl. 2019;35(6):064003. [Google Scholar]
54.Rangapuram S.S., Mudrakarta P.K., Hein M. Advances in Neural Information Processing Systems (NIPS) 2014. Tight continuous relaxation of the balanced k-cut problem; pp. 3131–3139. [Google Scholar]
55.Chambolle A., Pock T. A first-order primal-dual algorithm for convex problems with applications to imaging. J Math Imaging Vis. 2011;40(1):120–145. [Google Scholar]
56.He H., Ma Y. John Wiley & Sons; 2013. Imbalanced learning: Foundations, algorithms, and applications. [Google Scholar]
57.Shi W., Gong Y., Ding C., MaXiaoyu Tao Z., Zheng N. European Conference on Computer Vision (ECCV) 2018. Transductive semi-supervised deep learning using min-max features; pp. 299–315. [Google Scholar]
58.Iscen A., Tolias G., Avrithis Y., Chum O. IEEE Conference on Computer Vision and Pattern Recognition. 2019. Label propagation for deep semi-supervised learning; pp. 5070–5079. [Google Scholar]
59.Sellars P., Aviles-Rivero A., Schönlieb C.B. Two cycle learning: clustering based regularisation for deep semi-supervised classification. arXiv preprint arXiv:2001.05317 (2020) 2020 [Google Scholar]
60.Kukar M., Kononenko I., et al. Proceedings of the 13th European Conference on Artificial Intelligence. Vol. 98. 1998. Cost-sensitive learning with neural networks; pp. 445–449. [Google Scholar]
61.Fernandez A., Garcȡa S., Galar M., Prati R.C., Krawczyk B., Herrera F. Springer; 2018. Learning from imbalanced data sets. [Google Scholar]
62.Kendall A., Gal Y. What uncertainties do we need in bayesian deep learning for computer vision? Adv Neural Inf Process Syst. 2017 [Google Scholar]
63.Abdar M., Pourpanah F., Hussain S., Rezazadegan D., Liu L., Ghavamzadeh M., Fieguth P., Cao X., Khosravi A., Acharya U.R., et al. A review of uncertainty quantification in deep learning: techniques, applications and challenges. Information Fusion. 2021;76:243–297. [Google Scholar]
64.L. Wang et al., Actualmed covid-19 chest x-ray dataset initiative, [Online] Available: https://github.com/agchung/Actualmed-COVID-chestxray-dataset(2020a).
65.L. Wang et al., Fig. 1 covid-19 chest x-ray dataset initiative, [Online]: https://github.com/agchung/Figure1-COVID-chestxray-dataset(2020b).
66.RSNA, The radiological society of north america, [Online]: https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/data(2019).
67.M. Chowdhury, et al., Covid-19 radiography database, [Online] Available: https://www.kaggle.com/tawsifurrahman/covid19-radiography-database(2020).
68.De La Iglesia Vayá M., Saborit J.M., Montell J.A., Pertusa A., Bustos A., Cazorla M., Galant J., Barber X., Orozco-Beltrán D., García-García F., et al. Bimcv covid-19+: a large annotated dataset of rx and ct images from covid-19 patients. arXiv preprint arXiv:2006.01174 (2020) 2020 [Google Scholar]
69.Szegedy C., Vanhoucke V., Ioffe S., Shlens J., Wojna Z. IEEE conference on computer vision and pattern recognition. 2016. Rethinking the inception architecture for computer vision; pp. 2818–2826. [Google Scholar]
70.Joachims T. International Conference on Machine Learning. Vol. 99. 1999. Transductive inference for text classification using support vector machines; pp. 200–209. [Google Scholar]
71.Selvaraju R.R., Cogswell M., Das A., Vedantam R., Parikh D., Batra D. Proceedings of the IEEE international conference on computer vision. 2017. Grad-cam: Visual explanations from deep networks via gradient-based localization; pp. 618–626. [Google Scholar]
72.V. Vapnik, V. Vapnik, Statistical learning theory 156–160, 1998,
73.Wang X., Liang G., Zhang Y., Blanton H., Bessinger Z., Jacobs N. Inconsistent performance of deep learning models on mammogram classification. Journal of the American College of Radiology. 2020;17(6):796–803. doi: 10.1016/j.jacr.2020.01.006. [DOI] [PubMed] [Google Scholar]

PERMALINK

GraphXCOVID: Explainable deep graph diffusion pseudo-Labelling for identifying COVID-19 on chest X-rays

Angelica I Aviles-Rivero

Philip Sellars

Carola-Bibiane Schönlieb

Nicolas Papadakis

Abstract

1. Introduction

2. Related work

2.1. Chest X-ray classification for COVID-19

2.2. Semi-Supervised classification for medical images

3. GraphX-COVID framework

Fig. 1.

3.1. Feature extraction & graph construction

Fig. 2.

3.2. Label diffusion as pseudo-Labelling

3.3. Deep graph pseudo-Labelling update

Fig. 3.

4. Experimental results

4.1. Dataset description

4.2. Evaluation methodology & implementation details

4.3. Results & discussion

Table 1.

Table 2.

Fig. 4.

Fig. 5.

Fig. 6.

Fig. 7.

Fig. 8.

Fig. 12.

Fig. 9.

Fig. 10.

Fig. 11.

5. Conclusion

Declaration of Competing Interest

Acknowledgments

Biographies

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases