
CN111144474B - Multi-view, multi-scale and multi-task lung nodule classification method - Google Patents


Info

Publication number
CN111144474B
Authority
CN
China
Prior art keywords
nodule
semantic features
lung
scale
nodules
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911354214.8A
Other languages
Chinese (zh)
Other versions
CN111144474A (en)
Inventor
黄青松
张帅威
刘利军
冯旭鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN201911354214.8A priority Critical patent/CN111144474B/en
Publication of CN111144474A publication Critical patent/CN111144474A/en
Application granted granted Critical
Publication of CN111144474B publication Critical patent/CN111144474B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F18/241 — Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches (G06F18/00 → G06F18/20 → G06F18/24)
    • G06N3/045 — Computing arrangements based on biological models; neural networks; combinations of networks (G06N3/00 → G06N3/02 → G06N3/04)
    • G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods (G06N3/00 → G06N3/02)
    • G06V2201/03 — Image or video recognition or understanding; recognition of patterns in medical or anatomical images (G06V2201/00)


Abstract

The invention relates to a multi-view, multi-scale and multi-task pulmonary nodule classification method, and belongs to the technical field of medical image processing. The method comprises the following steps: first, extracting 2D nodule slices of 9 views from the 3D volume; extracting 10 mm, 20 mm and 40 mm patches from the 2D nodule slices; constructing 3 convolutional neural network models, training each on the patches extracted from each plane view at the corresponding scale, and then fusing the fully connected layers of the three models to fuse the features of each image of the lung nodule; and jointly training the semantic features of the lung nodules at the fused fully connected layer to obtain a classification result for each semantic feature. The method classifies multiple semantic features of pulmonary nodules and, by combining the multiple classifications, accurately labels the semantic features of a nodule.

Description

Multi-view, multi-scale and multi-task lung nodule classification method
Technical Field
The invention relates to a multi-view, multi-scale and multi-task lung nodule classification method, and belongs to the technical field of medical image processing.
Background
Lung cancer is one of the leading causes of cancer death worldwide; new lung cancer cases in 2019 were estimated to account for 13% of all new cancer cases in the United States, with mortality rates as high as 63%. The American National Lung Screening Trial (NLST) showed that low-dose CT screening can reduce lung cancer mortality; detection and diagnosis of early-stage lung cancer are therefore very important for later treatment. With the advancement of screening technology, the detection rate of nodules is increasing, but the diagnosis of lung nodules by radiologists, which depends on their skill and experience, remains highly subjective.
Providing meaningful diagnostic features to enhance the objectivity of diagnostic decisions is a significant challenge. Research shows that semantic features of lung nodules can be learned from image features, and that correlations exist among these semantic features. The semantic features include subtlety, internal structure, calcification, sphericity, margin, lobulation, spiculation, texture and malignancy. However, prior studies of the relationship between malignancy and the other semantic features treat each semantic feature independently of the others, without combining the semantic features.
Disclosure of Invention
The invention provides a multi-view, multi-scale and multi-task lung nodule classification method, which is used for classifying a plurality of semantic features of lung nodules and solving the problem of unbalanced combined multi-task classification data.
The technical scheme of the invention is as follows: the method for classifying the pulmonary nodules based on multiple views, multiple scales and multiple tasks comprises the following specific steps:
step1, extracting 2D nodule slices of 9 views from the 3-D view;
step2, extracting 10mm, 20mm and 40mm patches on the 2D nodule slice;
step3, constructing 3 convolutional neural network models, training the patches extracted from each plane view according to the scale, and then fusing the full-link layers of the three models to perform feature fusion of each image of the lung nodule;
and Step4, performing combined training on the semantic features of the lung nodules at the full-connection layer to obtain a classification result of the semantic features.
Further, the Step1 includes the specific steps of:
step1.1, extracting a 40×40×40 mm cube centered on the nodule for each nodule candidate; the size of the cube is chosen to contain all nodule information and sufficient background information;
step1.2, to extract the pulmonary nodule image, nine slices are extracted on the planes corresponding to the 9 symmetric cube planes. Three of these symmetry planes are parallel to the cube's faces and are commonly called the sagittal, coronal and axial planes; the other six are symmetry planes that cut diagonally through two opposite faces of the cube, each containing two opposite edges and four vertices of the cube.
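The nine-view extraction of Step 1.2 can be sketched in NumPy. This is an illustrative reconstruction, not the patent's code: the three axis-aligned central planes plus, for each pair of axes, the diagonal and anti-diagonal planes give exactly 9 views. Note that the diagonal planes are sampled on a grid stretched by a factor of √2 along the diagonal; a real pipeline would resample them to isotropic spacing.

```python
import numpy as np

def extract_nine_views(vol):
    """Extract the 9 symmetric plane views of a cubic nodule volume.

    3 axis-aligned central planes (axial, coronal, sagittal) plus
    6 diagonal planes, each containing two opposite edges of the cube.
    """
    n = vol.shape[0]
    assert vol.shape == (n, n, n), "expects a cubic volume"
    c = n // 2
    # Axis-aligned central slices.
    views = [vol[c, :, :], vol[:, c, :], vol[:, :, c]]
    # For each pair of axes, the main-diagonal and anti-diagonal planes.
    for ax1, ax2 in [(0, 1), (0, 2), (1, 2)]:
        views.append(np.diagonal(vol, axis1=ax1, axis2=ax2).copy())
        views.append(np.diagonal(np.flip(vol, axis=ax1), axis1=ax1, axis2=ax2).copy())
    return views
```

Each view of an n×n×n cube is an n×n image, so one nodule yields nine 2D slices, as in the text.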
Further, in Step2, in order to extract quantitative features from the original nodule block rather than segmenting the nodule, we consider nodule slices centered on the nodule, with a slice interval thickness of 0.702 mm; the image of the lung nodule lies at the center of each slice. Since the largest nodule diameter in our subset is 30 mm, a 40 mm slice length completely contains every nodule; a data expansion method is then applied;
for each lung nodule, we first extract the 9 slices and then crop 10 mm, 20 mm and 40 mm patches centered on the nodule. This achieves the effect of segmenting the lung nodule while preserving its original edge information; a data balance method is then applied;
for extraction of nodule features at the three scales, we adopt a VGG-NET-style network, and we train the Multi-SVT model with additional training samples to supplement the learning process given limited training data. We augment the nodules through an image rotation operation: the volume is rotated by a given angle about the vertical, coronal and horizontal axes of the 3D volume in turn, with rotation angles in the range 0-360 degrees, and then re-sliced, yielding a series of 40 mm lung nodule slices with the nodule at the center. Finally, 10 mm and 20 mm lung nodule slices are cropped from the 40 mm slices, centered on the nodule;
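The rotate-then-reslice augmentation and the 10/20/40 mm center crops described above can be sketched as follows. This is an assumption-laden illustration, not the patent's implementation: the function name is invented, `scipy.ndimage.rotate` stands in for whatever resampling the authors used, and a 1 px = 1 mm spacing is assumed for simplicity.

```python
import numpy as np
from scipy.ndimage import rotate

def augment_and_crop(vol, angle_deg, px_per_mm=1.0):
    """Rotate a cubic nodule volume about each anatomical axis, re-slice
    the central plane, and crop 10/20/40 mm patches around the center.

    Assumes the nodule is centered in `vol`; `px_per_mm` maps mm to pixels
    (here 1 px = 1 mm, an illustrative assumption).
    """
    patches = []
    for axes in [(1, 2), (0, 2), (0, 1)]:   # rotation plane per anatomical axis
        rot = rotate(vol, angle_deg, axes=axes, reshape=False, order=1)
        sl = rot[rot.shape[0] // 2]          # central 40 mm slice after rotation
        c = sl.shape[0] // 2
        for mm in (10, 20, 40):              # multi-scale center crops
            h = max(1, int(round(mm * px_per_mm / 2)))
            patches.append(sl[c - h:c + h, c - h:c + h])
    return patches
```

With a 40×40×40 px volume this yields, per rotation angle, three slices with 10×10, 20×20 and 40×40 px patches each, matching the three scales in the text.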
further, the Step3 includes the specific steps of:
step3.1, constructing 3 convolutional neural network models and training by using patches extracted from each plane view according to scales;
step3.2, inputting the nodule slices of each scale into Model A, Model B and Model C respectively, and then fusing the features of each image of the pulmonary nodule at a fully connected layer;
step3.3, each input slice is processed through four convolutional layers and one fully connected layer, followed by a softmax layer.
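The per-stream processing of Step 3.3 (four convolutional layers, one fully connected layer, then softmax) can be illustrated with a toy single-channel forward pass. This is a deliberately simplified sketch, not the patent's network: the 16×16 input size and single channel are assumptions, chosen so that four valid 3×3 convolutions leave an 8×8 map whose flattened size happens to be 64, like the 64-dimensional feature vector mentioned in Step 4.1.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2D convolution of a single-channel image with kernel k."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def relu(x):
    return np.maximum(x, 0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, kernels, W):
    """Four conv layers -> flatten -> one fully connected layer -> softmax."""
    for k in kernels:
        x = relu(conv2d(x, k))
    return softmax(x.ravel() @ W)
```

A 16×16 input shrinks to 14, 12, 10 and finally 8×8 through the four valid 3×3 convolutions; the real model would use multiple channels, padding/pooling and learned parameters.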
Further, the Step4 includes the specific steps of:
Step4.1, the image features of the lung nodule fused in Step3 are passed through the convolutional layers of the network, and the last fully connected layer outputs a 64-dimensional feature vector;
Step4.2, the 9 types of semantic features are encoded in this feature vector, and a separate softmax classification is finally performed for each semantic feature; a loss value is computed from each softmax result and the true lung nodule label, the 9 semantic features are considered simultaneously in the combined multi-task setting, and back-propagation is performed on the total loss over the 9 semantic features.
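The joint multi-task objective of Step 4.2 can be sketched as nine softmax heads over the shared 64-dimensional feature, with the total loss being the sum of the per-task cross-entropies. This is an illustrative reconstruction; the head shapes and class counts are assumptions, not the patent's exact parameterization.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multitask_loss(feat, heads, labels):
    """Total cross-entropy over the 9 semantic-feature heads.

    feat:   (64,) shared feature from the fused fully connected layer.
    heads:  list of 9 weight matrices, one (64, n_classes_k) per feature.
    labels: list of 9 integer class labels (the true annotations).
    """
    total = 0.0
    for W, y in zip(heads, labels):
        p = softmax(feat @ W)                # per-task softmax classification
        total += -np.log(p[y] + 1e-12)       # cross-entropy for this task
    return total                             # back-propagated as one loss
```

With all-zero heads each task's softmax is uniform, so a 2-class setup gives a total loss of exactly 9·ln 2, which is a convenient sanity check.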
Further, the semantic features comprise subtlety, internal structure, calcification, sphericity, margin, lobulation, spiculation, texture and malignancy; the internal structure feature comprises soft tissue, fluid, fat and air.
The invention provides a Multi-SVT method that classifies the malignancy of lung nodules jointly with the other semantic features; classifying the other semantic features enhances the objectivity of lung nodule diagnosis decisions. The Multi-SVT method takes multiple CT images centered on a nodule as model input and outputs the nine semantic features. The 3D characteristics of the lung nodule are represented by its nine views, and for each view a lung nodule image at three scales (10 mm, 20 mm and 40 mm) is captured centered on the nodule to represent nodule edge information and background information; the nodule is not segmented, so the integrity of its edge information is preserved. The nodule images of each scale are then fed into separate convolutional neural networks, so that model parameters are shared among lung nodule images of the same scale; the output features of the three scale-specific networks are fused at a fully connected layer; finally, combined multi-task classification of the nine semantic features is performed after the fused fully connected layer, and the model is trained by minimizing a global loss function.
The steps of Step2, Step3 and Step4 are further described as follows:
1) the data expansion method comprises the following steps:
studies have shown that deep learning algorithms become more effective the more data they can access, and that data augmentation can reduce model overfitting. Our approach not only learns the 3D features of lung nodules through multiple views, but also expands the dataset through different combinations of views. In the first step, we reconstruct the whole lung in three dimensions from the patient's DICOM files in the LIDC dataset, with the help of the 3D Slicer software. In the second step, we obtain the center coordinates L(X, Y, Z) of the nodule from the position information of the lung nodule marked by the radiologists, cut a 40×40×40 mm cube centered on L, and then re-slice the cube, so that the image of the nodule lies at the center of each slice. Finally, the volume is rotated by a given angle about the vertical, coronal and horizontal axes in turn, with rotation angles in the range 0-360 degrees, and the second step is repeated.
The dataset can be expanded through these three steps. Fig. 3 shows the combination of the original lung nodule slices and the combinations of slices rotated by 45 degrees about the vertical, coronal and horizontal axes respectively. Combining slices at different rotation angles can expand the dataset further; the results are shown in Fig. 3, and the specific procedure in Algorithm 1 of Table 1.
TABLE 1 data expansion
[Table 1 is rendered as an image in the original and is not reproduced here.]
2) The data balance method comprises the following steps:
when the training data set is unbalanced, i.e., the available data is not evenly distributed among different classes, the accuracy of the image classification technique is significantly reduced. In the LIDC dataset, the data distribution of semantic features is unbalanced, for example, 70% of nodules of a lobular feature are labeled as lab le 0, 30% of nodules are labeled as lab le 1, and the same is true for other semantic features. Our model combines nine semantic features of lung nodules, and processing any one of them changes the distribution of the other semantic features. Therefore, a proportional ranking algorithm is provided, and nine semantic features are combined to relieve the condition of training set imbalance.
The nine semantic features are given equal weight over the dataset as a whole. In the first step, the score of each nodule is calculated according to equation (1), and the nodules are sorted from largest to smallest score. In the second step, the last M nodules in this ordering are selected, copied, and added to the dataset. Finally, the above steps are repeated until the dataset is relatively balanced over the nine semantic features. Formula (1) is shown below:
[Equation (1) is rendered as an image in the original and is not reproduced here.]
in the formula, N is the total number of classes over the 9 attributes of the lung nodule (46 in total), d indicates whether the nodule belongs to a given class (1 if so, otherwise 0), m is the total number of nodules, and n is the number of nodules belonging to the class.
For the lung nodules newly added in the above process, the data is not oversampled using the synthetic minority oversampling technique (SMOTE); instead, new lung nodule images are generated by rotating the nodule by a small angle about its vertical, coronal and horizontal axes. The algorithm is shown in Table 2 below.
TABLE 2 data Balancing Algorithm
[Table 2 is rendered as an image in the original and is not reproduced here.]
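The proportional ranking step can be sketched as follows. Note the caveat: equation (1) appears only as an image in the source, so the score used here, score_i = Σ_k d[i,k]·(n_k/m), is an assumed reading based solely on the variable definitions given in the text (common-class nodules score high, rare-class nodules score low and get duplicated); the function names are likewise invented for illustration.

```python
import numpy as np

def balance_scores(memberships, m):
    """Per-nodule score for the proportional ranking step.

    memberships: (num_nodules, N) 0/1 matrix where d[i, k] = 1 iff
    nodule i belongs to class k (N = total classes over the 9 attributes).
    ASSUMED form of formula (1): score_i = sum_k d[i, k] * (n_k / m),
    so nodules made up of common classes rank first.
    """
    n_k = memberships.sum(axis=0)        # nodules per class
    return memberships @ (n_k / m)

def oversample_rarest(indices_sorted_desc, M):
    """Duplicate the M lowest-scoring (rarest) nodules, as in step two."""
    return list(indices_sorted_desc) + list(indices_sorted_desc[-M:])
```

In the real procedure these two steps are repeated until the dataset is roughly balanced over all nine semantic features.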
The invention has the beneficial effects that:
1. the multi-view, multi-scale and multi-task lung nodule classification method uses 2D images to represent the 3D volume, and fuses multi-scale, multi-view images to better represent the pathological information of lung nodules;
2. lung nodule semantic feature classification suffers from imbalanced data distribution, and a method is provided to solve the data imbalance problem of combined multi-task classification;
3. multiple semantic features of the lung nodule are classified from the nodule features learned by the model, and the semantic features of a nodule are finally accurately labeled by combining the multiple classifications.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of a multi-view, multi-scale, multi-tasking pulmonary nodule classification method of the present invention;
FIG. 3 is a diagram illustrating the result of the data expansion method of the present invention;
FIG. 4 is a result diagram of the nine semantic features of the lung nodule predicted in the present invention, wherein (a) is the case that the nine semantic features of the lung nodule predicted by the model are all correct, and (b) is the case that the semantic feature predicted by the model is partially incorrect;
FIG. 5 shows the selection of lung nodules and background information on the 30mm and 40mm scale involved in the present invention;
Detailed Description
Example 1: as shown in fig. 1-5, the multi-view, multi-scale, multi-task based lung nodule classification method specifically comprises the following steps:
step1, extracting 2D nodule slices of 9 views from the 3-D view;
further, the Step1 includes the specific steps of:
step1.1, extracting a 40X40mm cube centered on a nodule for each nodule candidate, the size of the cube being selected to contain all nodule information and to include sufficient background information;
wherein a large public dataset, the Lung Image Database Consortium (LIDC-IDRI), is used to train and validate the proposed CAD system. LIDC-IDRI contains 1018 heterogeneous cases from 7 institutions. The slice thickness of the CT images varies from 0.6 mm to 5.0 mm, with a median of 2.0 mm. The images of each case were annotated in a two-stage diagnostic process by 4 experienced chest radiologists. In the first stage, each radiologist independently diagnosed and marked the lesion locations, labeling three categories according to lung nodule diameter: 1) nodules ≥ 3 mm, 2) nodules < 3 mm, 3) non-nodules ≥ 3 mm. In the second stage, each radiologist independently reviewed the labels of the other three and gave a final diagnosis. This two-stage labeling aims to label all findings as completely as possible while avoiding forced consensus. Each case consists of the DICOM images of the associated CT scan and an extensible markup language (XML) file containing the nodule annotations of up to 4 radiologists. The labeling of each semantic feature of a lung nodule depends entirely on the radiologists' past experience, and different radiologists may assign different semantic features to the same lung nodule.
Step1.2, to extract the required pulmonary nodule image, the whole lung nodule is first reconstructed in three dimensions, a 40×40×40 mm region is then cut with the center of the pulmonary nodule as the cube center, and nine slices are extracted on the planes corresponding to the symmetric cube planes. Three of these symmetry planes are parallel to the cube's faces and are commonly referred to as the sagittal, coronal and axial planes. The other six are symmetry planes that cut diagonally through two opposite faces of the cube; each such plane contains two opposite edges and four vertices of the cube. We thus obtain 9 slice views of each nodule.
Specifically, the whole lung of the patient is reconstructed in three dimensions with the 3D Slicer software; after reconstruction, the nodule center coordinate L(X, Y, Z) is found from the position information of the lung nodule marked by the experts, and a 40×40×40 mm cube is cut with L as the center;
step2, extracting 10mm, 20mm and 40mm patches on the 2D nodule slice;
further, in Step2, in order to extract quantitative features from the original nodule block rather than segmenting the nodule, we consider nodule slices centered on the nodule, with a slice interval thickness of 0.702 mm; the image of the lung nodule lies at the center of each slice. Since the largest nodule diameter in our subset is 30 mm, a 40 mm slice length completely contains every nodule; a data expansion method is then applied;
for each lung nodule, we first extract the 9 slices and then crop 10 mm, 20 mm and 40 mm patches centered on the nodule. This achieves the effect of segmenting the lung nodule while preserving its original edge information; a data balance method is then applied;
since the definition of nodule boundaries is ambiguous, the variability among radiologists' subjective readings makes accurate nodule delineation a challenging task. Therefore, to extract quantitative features from the original nodule block rather than segmenting the nodule, we consider nodule slices centered on the nodule: the 9 slices and the 10 mm, 20 mm and 40 mm centered patches together achieve the effect of segmentation while preserving the nodule's original edge information.
For extraction of nodule features at the three scales, we adopt a VGG-NET-style network, and we train the Multi-SVT model with additional training samples to supplement the learning process given limited training data. We augment the nodules through an image rotation operation: the volume is rotated by a given angle about the vertical, coronal and horizontal axes of the 3D volume in turn, with rotation angles in the range 0-360 degrees, and then re-sliced, yielding a series of 40 mm lung nodule slices with the nodule at the center. Finally, 10 mm and 20 mm lung nodule slices are cropped from the 40 mm slices, centered on the nodule;
step3, constructing 3 convolutional neural network models, training the patches extracted from each plane view according to the scale, and then fusing the full-link layers of the three models to perform feature fusion of each image of the lung nodule;
further, the Step3 includes the specific steps of:
step3.1, constructing 3 convolutional neural network models and training by using patches extracted from each plane view according to scales;
step3.2, inputting the nodule slices of each scale into Model A, Model B and Model C respectively, and then fusing the features of each image of the pulmonary nodule at a fully connected layer;
step3.3, each input slice is processed through four convolutional layers and one fully connected layer, followed by a softmax layer.
And Step4, performing combined training on the semantic features of the lung nodules at the full-connection layer to obtain a classification result of the semantic features.
Further, the Step4 includes the specific steps of:
Step4.1, the image features of the lung nodule fused in Step3 are passed through the convolutional layers of the network, and the last fully connected layer outputs a 64-dimensional feature vector;
Step4.2, the 9 types of semantic features are encoded in this feature vector, and a separate softmax classification is finally performed for each semantic feature; a loss value is computed from each softmax result and the true lung nodule label, the 9 semantic features are considered simultaneously in the combined multi-task setting, and back-propagation is performed on the total loss over the 9 semantic features.
Further, the semantic features comprise subtlety, internal structure, calcification, sphericity, margin, lobulation, spiculation, texture and malignancy; the internal structure feature comprises soft tissue, fluid, fat and air.
To illustrate the effects of the present invention, the following experiments were performed:
experiment one: by preprocessing the pulmonary nodule annotation information, the grades of nine semantic features of the pulmonary nodule by four experts are unified, and errors caused by non-unified semantic feature grades are well solved. Since previous studies showed a correlation between nine semantic features of lung nodules, the nine semantic features of lung nodules were trained simultaneously in one model by a multitasking method. The Multi-SVT method not only can train a plurality of semantic features of lung nodules through one model, but also can show positive influence on malignancy classification. For validation, the Multi-SV method was compared (as shown in FIG. 2). The Multi-SV method uses the same model structure as the Multi-SVT method, but the output of the model is only a feature tag of Maligancy, and the experimental results are shown in Table 3.
Table 3 comparison of the two methods.
[Table 3 is rendered as an image in the original and is not reproduced here.]
Table 3 summarizes the average AUC score, accuracy, sensitivity and specificity of the two models. The Multi-SVT model achieves an average AUC of 0.952, average accuracy of 0.913, average sensitivity of 0.843 and average specificity of 0.929, while the Multi-SV model achieves an average AUC of 0.932, average accuracy of 0.905, average sensitivity of 0.816 and average specificity of 0.926. The results show that the proposed Multi-SVT method achieves better performance in predicting malignancy than the Multi-SV method.
Our results are also compared with the best current deep learning model for lung nodule malignancy prediction, which achieves the best malignancy classification performance to date and explains its model through the classification of five other semantic features. Table 4 summarizes the average AUC scores, accuracies, sensitivities and specificities of the nine semantic features for the two models. On the malignancy feature, the Multi-SVT model achieves an average AUC of 0.952, average accuracy of 0.913, average sensitivity of 0.843 and average specificity of 0.929; the comparison model achieves an average AUC of 0.856, average accuracy of 0.842, average sensitivity of 0.705 and average specificity of 0.889. This evaluation shows that the proposed Multi-SVT method outperforms the best-performing existing method in predicting malignancy.
Table 4 representation of the method on other semantic features.
[Table 4 is rendered as an image in the original and is not reproduced here.]
In addition, for the other semantic features, whereas the comparison method considers only five semantic features besides malignancy (Subtlety, Calcification, Sphericity, Margin, Texture), our method considers all eight other semantic features. The same thresholds are used for the five common semantic features, while we define thresholds for the three additional features Internal Structure, Lobulation and Spiculation. As shown in the table, for the eight semantic features (Subtlety, Internal Structure, Calcification, Sphericity, Margin, Lobulation, Spiculation, Texture) we obtain average accuracies of 0.728, 0.985, 0.956, 0.644, 0.802, 0.750, 0.764 and 0.911; average AUC scores of 0.803, 0.908, 0.935, 0.595, 0.787, 0.721, 0.781 and 0.848; average sensitivities of 0.676, 0.50, 0.999, 0.824, 0.892, 0.395, 0.605 and 0.955; and average specificities of 0.866, 0.990, 0.771, 0.276, 0.479, 0.855, 0.805 and 0.595. The comparison of the Multi-SVT results with the HSCNN method in the table shows that our method not only classifies all semantic features, but also achieves better classification results than the best current method on the five jointly considered semantic features.
The figure illustrates the operation of the Multi-SVT model by visualizing Views 1-9 of a lung nodule at the 40 mm scale, together with the model's predictions of the nine semantic features and the expert labels. Fig. 4(a) shows a pulmonary nodule that the Multi-SVT model classifies as benign: the contrast between the nodule region and the surrounding region is large, and the interior is soft tissue, calcified, spherical, clear-edged, without spiculation, lobulated and solid; the predictions for all nine semantic features match the true labels.
Fig. 4(b) shows a case where the Multi-SVT model classifies the lung nodule as benign (the reference label is malignant) and misclassifies 5 of the nodule's semantic features (including subtlety and malignancy), yet still produces a reasonable interpretation.
Experiment two: multiple view comparative experiments
In our Multi-SVT model, each 3D lung nodule is decomposed into 9 fixed views, and each view is cropped at three different scales, yielding 27 2D images per nodule. To demonstrate that multi-view ensemble learning is effective, we tested 3 different view combinations.
Table 5 discusses multiple views
[Table 5 is rendered as an image in the original and is not reproduced here.]
The classification performance in Table 5 shows that the more views we use, the higher the classification accuracy and AUC obtained. This is not surprising, since with more views the model receives more information about each nodule.
Experiment three: comparative experiment of multiple scales
We used images of lung nodules at three dimensions of 10mm, 20mm, and 40 mm. To prove that multi-scale ensemble learning is effective, we tested 3 sets of different scale combinations. Since the 40mm scale can contain all lung nodules, we use this scale as the reference scale.
Table 6 discusses multiscale
[Table 6 is rendered as an image in the original and is not reproduced here.]
The classification performance in Table 6 shows that the more scales we use, the higher the classification accuracy and AUC obtained. This is not surprising: because we take multi-scale original nodule patches rather than segmented regions, the result provides evidence that the information in the original nodule patch is valuable for lung nodule diagnosis.
Experiment four: comparative experiments with different fusion modes
We discuss the combination of different model input streams:
1) Committee-Fusion: the Committee-Fusion method is to merge the outputs of the fully connected layers of each stream and finally classify them. Each ConvNet flow is trained separately using the patches of a particular view. In this configuration, the convolutional layer parameters of different streams are not shared.
2) Late-Fusion: the Late-Fusion method concatenates the outputs of the first fully connected layer and connects the concatenated output directly to the classification layer. In this configuration, the convolutional layer parameters of different streams are shared.
3) Mixed-Fusion: Mixed-Fusion is a combination of the first two approaches. We divide the 27 patches into three separate sets, each containing the 9 different patches of one scale, use the Late-Fusion method within each set, and use the Committee-Fusion method between sets. In this configuration, the convolutional layer parameters of streams within a set are shared, while those of streams in different sets are not.
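The three fusion schemes can be sketched at the feature level as follows. This is an illustrative sketch under simplifying assumptions: the per-stream features and the classifier are treated as given, whereas the real model trains the ConvNet streams end-to-end; Committee-Fusion is rendered here as probability averaging, one common way to merge per-stream outputs.

```python
import numpy as np

def late_fusion(stream_feats):
    """Late-Fusion: concatenate the first fully-connected outputs of all streams."""
    return np.concatenate(stream_feats)

def committee_fusion(stream_probs):
    """Committee-Fusion: merge per-stream outputs (here: average class probabilities)."""
    return np.mean(stream_probs, axis=0)

def mixed_fusion(sets_of_feats, classify):
    """Mixed-Fusion: Late-Fusion inside each same-scale set of 9 views,
    Committee-Fusion across the three scale sets."""
    set_probs = [classify(late_fusion(fs)) for fs in sets_of_feats]
    return committee_fusion(np.stack(set_probs))
```

Since each per-set probability vector sums to 1, their average does too, so the mixed output remains a valid distribution over classes.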
Table 7 discusses different fusion modes
[Table 7 is provided as an image in the original patent document]
The results are shown in Table 7. Sharing parameters among input streams grouped by scale (Mixed-Fusion) yields the highest classification accuracy and AUC. This is not surprising: streams at the same scale see statistically similar inputs, so sharing their convolutional parameters makes parameter training more effective.
Experiment five: selection of dimensions
In the design of the Multi-SVT model, multiple scales are used to better characterize the lung nodule image and its surrounding background; we next discuss why the 10, 20, 40 mm combination was chosen. The distribution of lung nodule diameters shows that the 3-10 mm range accounts for 81%, 10-20 mm for 14%, 20-30 mm for 3%, 30-40 mm for 0.3%, and the remainder are nodules smaller than 3 mm in diameter. A maximum scale of 40 mm therefore fully encompasses the entire nodule and the surrounding tissue without introducing much noise. As shown in fig. 5, when the maximum scale is chosen to be 30 mm, part of the lung nodule information and the lung nodule background information is lost.
Selecting 30 mm as the largest scale covers most lung nodules but loses some nodule information, whereas selecting 50 mm covers all nodule information but introduces more noise. We therefore compared three combinations: 10&20&30 mm, 10&20&40 mm, and 10&20&50 mm. For these three experiments, we fixed a combination of 9 views, the Mixed-Fusion mode, and the model structure. The results are shown in Table 8.
Table 8 discusses the choice of dimensions.
[Table 8 is provided as an image in the original patent document]
Analysis of the results in Table 8 shows that the 10&20&40 mm combination performs best. As expected from the analysis above, the 10&20&30 mm combination loses background information for some nodules, whereas 10&20&50 mm introduces more noise.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (3)

1. A lung nodule classification method based on multiple views, multiple scales and multiple tasks, characterized in that the method comprises the following specific steps:
step1, extracting 2D nodule slices of 9 views from the 3-D view;
step2, extracting 10mm, 20mm and 40mm patches on the 2D nodule slice;
step3, constructing 3 convolutional neural network models, training patches extracted from each plane view according to the scale, and then fusing the full-connection layers of the three models to perform feature fusion of each image of the lung nodule;
step4, performing combined training on the semantic features of the pulmonary nodules at a full-connection layer to obtain a classification result of the semantic features;
the specific steps of Step4 are as follows:
the image characteristics of the pulmonary nodules are fused in Step4.1 and Step3, and further 64-dimensional information is output by the convolutional neural network at the last full-connected layer through the convolutional layer of the convolutional neural network;
Step4.2, nine types of semantic features are encoded in the feature vector, and a softmax classification is finally performed for each semantic feature separately; a loss value is calculated from each softmax classification result and the true lung nodule label, the 9 semantic features are considered simultaneously in a multi-task manner, and back propagation is performed according to the total loss value over the 9 semantic features;
the semantic features comprise subtlety, internal structure, calcification, sphericity, margin, lobulation, spiculation, texture and malignancy parameters, and the internal structure comprises soft tissue, fluid, fat and air parameters.
2. The multi-view, multi-scale, multi-tasking lung nodule classification method of claim 1, wherein: the specific steps of Step1 are as follows:
step1.1, extracting, for each nodule candidate, a 40 mm cube centered on the nodule, the size of the cube being selected to contain all nodule information and sufficient background information;
step1.2, in order to extract the lung nodule images, nine slices are extracted along the 9 symmetric planes of the cube; three of the symmetric planes are parallel to the cube faces and are commonly called the sagittal, coronal and axial planes; the other six are symmetry planes that cut diagonally through two opposite faces of the cube, each such plane containing two opposite edges and four vertices of the cube.
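The nine symmetric planes of step1.2 can be sampled from a cubic volume as follows. This is a minimal numpy sketch under the assumption of nearest-neighbour sampling along the diagonals (so every view comes out N×N); the orientation conventions are illustrative, not the patent's exact implementation.

```python
import numpy as np

def nine_views(cube):
    """Sample the 9 symmetric planes of an N^3 cube as N x N slices.
    Diagonal planes pass through two opposite edges of the cube."""
    n = cube.shape[0]
    c, r = n // 2, np.arange(n)
    # Three planes parallel to the cube faces (axial, coronal, sagittal).
    views = [cube[c, :, :], cube[:, c, :], cube[:, :, c]]
    # Six diagonal planes: each fixes one pair of opposite edges.
    views += [cube[:, r, r], cube[:, r, n - 1 - r],
              cube[r, :, r].T, cube[r, :, n - 1 - r].T,
              cube[r, r, :], cube[n - 1 - r, r, :]]
    return views

vol = np.arange(8 ** 3, dtype=float).reshape(8, 8, 8)
slices = nine_views(vol)
print(len(slices), slices[0].shape)  # 9 (8, 8)
```

Physically, the diagonal sections span N·√2 in one direction but are sampled at N points here, which keeps all nine views the same size for the downstream ConvNets.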
3. The multi-view, multi-scale, multi-tasking lung nodule classification method of claim 1, wherein: the specific steps of Step3 are as follows:
step3.1, constructing 3 convolutional neural network models and training by using patches extracted from each plane view according to scales;
step3.2, respectively inputting the nodule slices of each scale into a Model A, a Model B and a Model C, and then performing feature fusion of each image of the pulmonary nodule on a full-connection layer;
step3.3, the input slice is processed through four convolutional layers, one fully connected layer, and finally a softmax layer.
CN201911354214.8A 2019-12-25 2019-12-25 Multi-view, multi-scale and multi-task lung nodule classification method Active CN111144474B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911354214.8A CN111144474B (en) 2019-12-25 2019-12-25 Multi-view, multi-scale and multi-task lung nodule classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911354214.8A CN111144474B (en) 2019-12-25 2019-12-25 Multi-view, multi-scale and multi-task lung nodule classification method

Publications (2)

Publication Number Publication Date
CN111144474A CN111144474A (en) 2020-05-12
CN111144474B (en) 2022-06-14

Family

ID=70519868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911354214.8A Active CN111144474B (en) 2019-12-25 2019-12-25 Multi-view, multi-scale and multi-task lung nodule classification method

Country Status (1)

Country Link
CN (1) CN111144474B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767952B (en) * 2020-06-30 2024-03-29 重庆大学 Interpretable lung nodule benign and malignant classification method
CN111723817B (en) * 2020-06-30 2023-09-29 重庆大学 Auxiliary detection method for pulmonary nodules
CN112116562A (en) * 2020-08-26 2020-12-22 重庆市中迪医疗信息科技股份有限公司 Method, device, equipment and medium for detecting focus based on lung image data
CN112633336A (en) * 2020-12-10 2021-04-09 重庆大学 Pulmonary nodule identification and classification method based on multi-scale feature fusion
CN113191391A (en) * 2021-04-07 2021-07-30 浙江省交通运输科学研究院 Road disease classification method aiming at three-dimensional ground penetrating radar map
CN112990344B (en) * 2021-04-09 2022-05-24 重庆大学 Multi-view classification method for pulmonary nodules
CN113436726B (en) * 2021-06-29 2022-03-04 南开大学 Automatic lung pathological sound analysis method based on multi-task classification
CN117727464B (en) * 2023-11-23 2024-10-01 重庆邮电大学 Training method and device based on medical multi-view disease prediction model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650830A (en) * 2017-01-06 2017-05-10 西北工业大学 Deep model and shallow model decision fusion-based pulmonary nodule CT image automatic classification method
CN106940816A (en) * 2017-03-22 2017-07-11 杭州健培科技有限公司 Connect the CT image Lung neoplasm detecting systems of convolutional neural networks entirely based on 3D
CN108389178A (en) * 2018-01-11 2018-08-10 上海交通大学 Lung CT preprocess method based on convolutional neural networks and system
CN110309860A (en) * 2019-06-06 2019-10-08 昆明理工大学 The method classified based on grade malignancy of the convolutional neural networks to Lung neoplasm


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Research on Automatic Detection and Classification Methods for Pulmonary Nodules Based on Deep Convolutional Neural Networks"; Wu Baorong; China Master's Theses Full-text Database (Medicine and Health Sciences); 20190815; E072-209 *
MobMivs: Implementing an Efficient Medical Image Visualization System for Mobile Telemedicine;Lijun Liu 等;《2018 9th International Conference on Information Technology in Medicine and Education》;20181227;242-246页 *

Also Published As

Publication number Publication date
CN111144474A (en) 2020-05-12

Similar Documents

Publication Publication Date Title
CN111144474B (en) Multi-view, multi-scale and multi-task lung nodule classification method
Chan et al. CAD and AI for breast cancer—recent development and challenges
CN107016665B (en) CT pulmonary nodule detection method based on deep convolutional neural network
CN109583440A (en) It is identified in conjunction with image and reports the medical image aided diagnosis method edited and system
Niazi et al. Visually meaningful histopathological features for automatic grading of prostate cancer
CN111798424B (en) Medical image-based nodule detection method and device and electronic equipment
Wang et al. 3D Inception U‐net with asymmetric loss for cancer detection in automated breast ultrasound
CN111767952A (en) Interpretable classification method for benign and malignant pulmonary nodules
Ellahyani et al. Fine-tuned deep neural networks for polyp detection in colonoscopy images
Feng et al. Deep learning for chest radiology: a review
Lu et al. Feature driven local cell graph (FeDeG): predicting overall survival in early stage lung cancer
Zhi et al. Deep neural network pulmonary nodule segmentation methods for CT images: Literature review and experimental comparisons
Huang et al. Personalized diagnostic tool for thyroid cancer classification using multi-view ultrasound
Cao et al. 3D convolutional neural networks fusion model for lung nodule detection onclinical CT scans
Yu et al. Convolutional neural network design for breast cancer medical image classification
Mir et al. Artificial intelligence-based techniques for analysis of body cavity fluids: a review
Rezaei et al. A hierarchical GAN method with ensemble CNN for accurate nodule detection
Khouadja et al. Lung Cancer Detection with Machine Learning and Deep Learning: A Narrative Review
Fernández et al. Exploring automatic liver tumor segmentation using deep learning
Parvatikar et al. Prototypical models for classifying high-risk atypical breast lesions
La Cruz et al. Breast Cancer Screening Using Deep Learning
Baskar DeepNet model empowered cuckoo search algorithm for the effective identification of lung cancer nodules
Joshua et al. Lung Cancer Classification with Improvised Three Parameter Logistic Type Distribution Model.
Prithvika et al. A Review on Identification and Classification of Lung Nodules
Wang et al. Bi-ResNet: fully automated classification of unregistered contralateral mammograms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant