Detailed Description
The present invention is described in further detail below with reference to the attached drawings and examples.
As shown in Fig. 1, the method of the present invention comprises the following steps:
(1) Input the mammogram to be diagnosed.
(2) Extract the regions of interest from the input mammogram to obtain the positions of the initial suspicious mass regions.
Analyzing the whole image not only involves a large amount of redundant information but also easily introduces errors. To improve the speed and accuracy of processing, the object of processing is reduced from the whole image to a number of small areas, namely regions of interest, whose positions are the positions of the initial suspicious mass regions used in subsequent processing.
For region-of-interest extraction, a method comparing the left and right breast images has been studied (see in particular F.F. Yin, M.L. Giger, K. Doi, et al., "Computerized detection of masses in digital mammograms: analysis of bilateral subtraction images," Medical Physics, 18).
Because masses are typically bright, approximately circular, and contrast with the surrounding tissue through their gray-level difference, the method of the present invention uses template matching to locate the regions of interest, i.e., the initial suspicious mass regions. The concrete implementation is as follows:
(2.1) Obtain a template T from a two-dimensional hyperbolic secant (sech) function, and compute the correlation between the input mammogram and T to obtain a correlation image;
A two-dimensional hyperbolic secant (sech) function is used to generate a template T of size (2L + 1) × (2L + 1).
The center of the template is the origin; x and y denote the horizontal and vertical coordinates within the template, both ranging over [-L, L], where L is a positive integer. α = (maximum gray level of the input mammogram) - 1, and β = ln(2α)/(L × L).
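Formula (1) itself is not reproduced in this text. As a minimal sketch, assuming the template takes the form T(x, y) = α·sech(β(x² + y²)) + 1 (a hypothetical form chosen because, with the α and β above, the center value equals the maximum gray level and the value at distance L from the center is close to 2), the template could be generated as follows. The 12-bit maximum gray level 4095 used in the usage lines is also an assumption.

```python
import math

def sech_template(L, max_gray):
    """Build a (2L+1) x (2L+1) template from a 2-D sech profile.

    Hypothetical form (formula (1) is not reproduced in the text):
        T(x, y) = alpha * sech(beta * (x^2 + y^2)) + 1
    with alpha = max_gray - 1 and beta = ln(2 * alpha) / (L * L).
    """
    alpha = max_gray - 1
    beta = math.log(2 * alpha) / (L * L)
    template = []
    for y in range(-L, L + 1):              # vertical coordinate (rows)
        row = []
        for x in range(-L, L + 1):          # horizontal coordinate (columns)
            z = beta * (x * x + y * y)
            row.append(alpha / math.cosh(z) + 1)   # sech(z) = 1 / cosh(z)
        template.append(row)
    return template

T = sech_template(L=25, max_gray=4095)
print(len(T), len(T[0]))   # 51 51
print(T[25][25])           # center value 4095.0
```

With this assumed form the template decays smoothly from the brightest gray level at the center toward about 2 on the inscribed circle of radius L, mimicking a bright, roughly circular mass.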
The template T is moved pixel by pixel over the input mammogram, and the correlation cor(T, S) between T and the sub-image S it covers is calculated with formula (2):
$$\mathrm{cor}(T,S)=\frac{\mu_{st}-\mu_t\,\mu_s}{\sigma_t\,\sigma_s} \qquad (2)$$
where μ_st is the mean of the products of the gray levels of corresponding pixels in S and T, μ_t and μ_s are the mean pixel gray levels within T and S, and σ_t and σ_s are the standard deviations of the pixel gray levels within T and S. Applying this at every pixel of the input mammogram gives each pixel a correlation value with respect to T in the range [-1, 1]. All correlation values are then rescaled to form the correlation image: values below 0 are set to 0, and the remaining values are multiplied by the maximum gray level of the input mammogram.
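The correlation and rescaling just described can be sketched in plain Python as below. This is an illustrative sketch, not the patented implementation: the text does not say how border pixels (where T does not fit inside the image) or flat sub-images (zero standard deviation) are handled, so both are set to 0 here as assumptions.

```python
import math

def correlation(T, S):
    """Formula (2): (mu_st - mu_t * mu_s) / (sigma_t * sigma_s), population statistics."""
    t = [v for row in T for v in row]
    s = [v for row in S for v in row]
    n = len(t)
    mu_t, mu_s = sum(t) / n, sum(s) / n
    mu_st = sum(a * b for a, b in zip(t, s)) / n
    sigma_t = math.sqrt(sum((a - mu_t) ** 2 for a in t) / n)
    sigma_s = math.sqrt(sum((b - mu_s) ** 2 for b in s) / n)
    if sigma_t == 0 or sigma_s == 0:
        return 0.0  # flat patch: correlation undefined, treated as 0 (assumption)
    return (mu_st - mu_t * mu_s) / (sigma_t * sigma_s)

def correlation_image(image, T, max_gray):
    """Slide T over the image and rescale: negative -> 0, else value * max_gray.

    Pixels where T does not fit entirely inside the image stay 0 (assumed border policy).
    """
    h, w = len(image), len(image[0])
    L = len(T) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(L, h - L):
        for j in range(L, w - L):
            S = [row[j - L:j + L + 1] for row in image[i - L:i + L + 1]]
            c = correlation(T, S)
            out[i][j] = 0.0 if c < 0 else c * max_gray
    return out
```

Because the statistics are population means and standard deviations, the value in formula (2) is a Pearson correlation and stays within [-1, 1] before rescaling.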
(2.2) Binarize the correlation image with a selected threshold, extract all connected regions from the binary correlation image, and determine the center of each connected region;
A suitable threshold T_low is selected, and the correlation image obtained in step (2.1) is binarized: a pixel whose correlation value is smaller than T_low is assigned 0, otherwise 1. Connected regions are then extracted from the binary correlation image, and the center of each connected region is taken to be the point within that region having the maximum value in the correlation image from step (2.1).
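A minimal sketch of step (2.2) follows, assuming 4-connectivity for the connected regions (the text does not state which connectivity is used). Each region's center is the pixel with the largest correlation value, as described above.

```python
from collections import deque

def regions_and_centers(corr, t_low):
    """Binarize at t_low (>= t_low -> 1) and return (pixels, center) for each region.

    Regions are found by breadth-first search with 4-connectivity (an assumption).
    The center is the region's pixel with the largest correlation value.
    """
    h, w = len(corr), len(corr[0])
    binary = [[1 if corr[i][j] >= t_low else 0 for j in range(w)] for i in range(h)]
    seen = [[False] * w for _ in range(h)]
    regions = []
    for i in range(h):
        for j in range(w):
            if binary[i][j] and not seen[i][j]:
                pixels, queue = [], deque([(i, j)])
                seen[i][j] = True
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                center = max(pixels, key=lambda p: corr[p[0]][p[1]])
                regions.append((pixels, center))
    return regions
```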
(2.3) Using the template definition method of step (2.1), define templates whose sizes differ from that of the template in step (2.1), and perform multi-scale analysis on each connected region extracted in step (2.2);
Following the template definition method of step (2.1), three templates are defined with sizes of 200%, 33%, and 66% of the original template size 2L + 1. For each connected region extracted in step (2.2), the center of each scaled template is placed at the region's center on the original input mammogram, and the correlation between each scaled template and the corresponding sub-image of the original mammogram is calculated. The maximum of these three correlation values and the correlation value obtained in step (2.1) is taken as the final correlation value of the connected region. A suitable threshold T_high (T_high greater than T_low) is selected, and connected regions whose final correlation value is smaller than T_high are excluded as false positives (regions the computer considers non-mass).
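The bookkeeping of the multi-scale step can be sketched as below. The text gives the scales only as percentages of 2L + 1; the rounding of each scaled size to a new half-size L' (round half up, at least 1) is a hypothetical choice. All correlation values are assumed to be in the same rescaled gray-level units as T_high.

```python
import math

def scaled_half_sizes(L, scales=(2.0, 0.33, 0.66)):
    """Half-sizes L' of the rescaled templates (size 2L' + 1).

    Each scaled size is scale * (2L + 1); converting it to L' by rounding
    half up is an assumption, since the text does not specify rounding.
    """
    size = 2 * L + 1
    return [max(1, math.floor((s * size - 1) / 2 + 0.5)) for s in scales]

def keep_region(c_original, c_scaled, t_high):
    """Final correlation = max of the step (2.1) value and the scaled-template values.

    Returns (kept, final); regions with final < t_high are rejected as false positives.
    """
    final = max([c_original, *c_scaled])
    return final >= t_high, final

print(scaled_half_sizes(10))                          # [21, 3, 6]
print(keep_region(2000, [1800, 2500, 2100], 2457))    # (True, 2500)
```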
(2.4) Screen the connected regions remaining after step (2.3) using area and shape rules;
Simple area and shape rules are used to further screen the connected regions not excluded by the multi-scale analysis of step (2.3). In general, a connected region with a small area (typically 5-30 pixels) usually corresponds to a breast calcification, whereas an elongated, strip-shaped region usually corresponds to normal glandular tissue; connected regions meeting either of these conditions are excluded as false positives.
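The text does not define how "strip-shaped" is measured. The sketch below assumes one common measure, the ratio of the standard deviations along the region's two principal axes, with a hypothetical limit of 4.0; treating every region of at most 30 pixels as small is likewise an assumption based on the typical 5-30 pixel range given above.

```python
import math

def is_false_positive(pixels, small_area=30, max_elongation=4.0):
    """Area and shape rules of step (2.4) for one connected region of (row, col) pixels.

    small_area: regions this small are treated as calcifications (assumed cut-off).
    max_elongation: hypothetical limit on the principal-axis standard-deviation ratio.
    """
    n = len(pixels)
    if n <= small_area:
        return True  # small region: likely a calcification
    mean_r = sum(p[0] for p in pixels) / n
    mean_c = sum(p[1] for p in pixels) / n
    srr = sum((p[0] - mean_r) ** 2 for p in pixels) / n
    scc = sum((p[1] - mean_c) ** 2 for p in pixels) / n
    src = sum((p[0] - mean_r) * (p[1] - mean_c) for p in pixels) / n
    # Eigenvalues of the 2x2 coordinate covariance = variances along the principal axes.
    half_trace = (srr + scc) / 2
    disc = math.sqrt(((srr - scc) / 2) ** 2 + src ** 2)
    lam_max, lam_min = half_trace + disc, half_trace - disc
    if lam_min <= 0:
        return True  # degenerate single-line region: treated as strip-shaped
    return math.sqrt(lam_max / lam_min) > max_elongation
```

For a 2 × 60 pixel strip the axis ratio is about 35, so the region is rejected, while a 10 × 10 square has ratio 1 and is kept.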
(2.5) Extract the regions of interest;
After the screening of step (2.4), for each remaining connected region a square (typically 125 × 125 pixels) centered on the region's geometric center is cut out of the original input mammogram; this square is the region of interest, also called the initial suspicious mass region.
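Cropping the region of interest can be sketched as follows. The text does not say what happens when the square would extend past the image edge; shifting the window back inside the image is an assumed policy.

```python
def crop_roi(image, pixels, size=125):
    """Cut a size x size square centered on the region's geometric center.

    pixels: (row, col) coordinates of the connected region.
    Near the image border the window is shifted to stay inside (assumed policy).
    Returns the ROI and its (top, left) offset in the original image.
    """
    h, w = len(image), len(image[0])
    cr = round(sum(p[0] for p in pixels) / len(pixels))   # geometric center row
    cc = round(sum(p[1] for p in pixels) / len(pixels))   # geometric center column
    half = size // 2
    top = min(max(cr - half, 0), max(h - size, 0))
    left = min(max(cc - half, 0), max(w - size, 0))
    roi = [row[left:left + size] for row in image[top:top + size]]
    return roi, (top, left)
```

Keeping the (top, left) offset makes it straightforward to map the boundary found later in the region of interest back onto the original mammogram for display in step (6).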
(3) Segment the suspicious masses in the regions of interest extracted in step (2) and determine their boundaries.
After the regions of interest are extracted, the suspicious masses they contain must be segmented and their boundaries determined accurately.
Methods studied for suspicious mass segmentation include multilayer topographic region growing (see in particular B. Zheng, Y.H. Chang, D. Gur, "Computerized detection of masses in digitized mammograms using single-image segmentation and a multilayer topographic feature analysis," Acad. Radiol., 2: 259-266 (2001)), segmentation based on multiresolution analysis (see in particular S. Liu, C. Babbs, and E. Delp, "Multiresolution detection of spiculated lesions in digital mammograms," IEEE Transactions on Image Processing, 10(6): 874-884 (2001)), and threshold segmentation based on fuzzy entropy (see in particular A.R. Abdel-Dayem, M.R. El-Sakka, "Fuzzy entropy based detection of suspicious masses in digital mammogram images," Proceedings of the 2005 IEEE Engineering in Medicine and Biology Annual Conference, 4017-4022 (September 2005)).
The method of the present invention adopts a segmentation method based on the image gradient and dynamic programming. The concrete implementation is as follows:
(3.1) Determine candidate boundary points on the boundary of the suspicious mass using gradient-related features of the region-of-interest image;
As shown in the left panel of Fig. 3, the region of interest obtained in step (2) is binarized with a given gray-level threshold Thres_1, and a boundary-tracing method yields the corresponding contour line 1 in the binarized region of interest. Repeating this process with threshold Thres_2 yields contour line 2; with threshold Thres_i, contour line i; and so on, until threshold Thres_N yields contour line N. Selecting multiple thresholds thus produces a set of contour lines (a contour group) whose density is related to the image gradient: where the contour lines are dense the image gradient is large, and where they are sparse the gradient is small.
Taking the center of the region of interest as the endpoint, R rays are cast outward counterclockwise at equal angular intervals starting from 0°, and the intersections of each ray with the contour group are found. If the Euclidean distance between two adjacent intersections on the same ray is less than D_min, the two intersections are said to be connected. Each set of connected points containing more than S_min points is represented by its center, which is taken as a candidate boundary point. A ray may have several candidate boundary points, or none. The right panel of Fig. 3 shows the candidate boundary points marked on the original region of interest.
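Step (3.1) can be sketched along a single ray, assuming (as a simplification) that the intersection of the ray with contour line i is where the gray-level profile sampled along the ray crosses Thres_i. Nearest-neighbour sampling at unit steps and linear interpolation between samples are also assumptions. Dense crossings mark a steep gradient; each connected group with more than S_min points becomes one candidate boundary point at the group's mean radius.

```python
import math

def ray_profile(roi, angle, step=1.0):
    """Sample (radius, gray) pairs along a ray from the ROI center (nearest neighbour)."""
    h, w = len(roi), len(roi[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    samples, r = [], 0.0
    while True:
        y = round(cy - r * math.sin(angle))   # rows grow downward, so counterclockwise uses -sin
        x = round(cx + r * math.cos(angle))
        if not (0 <= y < h and 0 <= x < w):
            return samples
        samples.append((r, roi[y][x]))
        r += step

def crossings(profile, thresholds):
    """Radii where the profile crosses each threshold (ray/contour intersections)."""
    hits = []
    for t in thresholds:
        for (r0, g0), (r1, g1) in zip(profile, profile[1:]):
            if (g0 >= t) != (g1 >= t):
                hits.append(r0 + (r1 - r0) * (t - g0) / (g1 - g0))  # linear interpolation
    return sorted(hits)

def candidate_points(hits, d_min, s_min):
    """Group hits whose neighbours are closer than d_min; keep groups larger than s_min.

    Returns (mean radius, group size) pairs; the group size feeds the external cost.
    """
    candidates, group = [], []
    for r in hits:
        if group and r - group[-1] >= d_min:
            if len(group) > s_min:
                candidates.append((sum(group) / len(group), len(group)))
            group = []
        group.append(r)
    if len(group) > s_min:
        candidates.append((sum(group) / len(group), len(group)))
    return candidates
```

For R rays the angles would be 2πk/R for k = 0, ..., R - 1, with one call chain per ray.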
(3.2) Form multiple candidate boundary lines from the candidate boundary points on each ray obtained in step (3.1), and select the optimal candidate boundary line by dynamic programming, thereby determining the boundary of the suspicious mass;
Ideally, if each ray in step (3.1) has exactly one candidate boundary point, connecting these points in order forms a unique candidate boundary line, which is the boundary of the suspicious mass. In practice, however, a ray may carry several candidate boundary points. Selecting one candidate boundary point on each ray and connecting the selected points in order forms one candidate boundary line, so many candidate boundary lines are possible. Because the true boundary passes through positions where the gray level changes sharply (i.e., where the image gradient is large) and is reasonably smooth, the method of the invention defines a cost function for each candidate boundary line and uses dynamic programming to select, among all candidate boundary lines, the one with the optimal cost as the final boundary of the suspicious mass. The concrete implementation is as follows:
Let a candidate boundary line be S = {n_1, n_2, ..., n_R}, where n_i denotes the candidate boundary point selected on the i-th ray. If the i-th ray has no candidate boundary point, one is interpolated from the distances between the center of the region of interest and the candidate boundary points on rays i-1 and i+1. The cost function C of the candidate boundary line S is the sum of the local costs of all candidate boundary points on it:
$$C(S)=\sum_{i=1}^{R} C(n_i) \qquad (3)$$
The local cost C(n_i) consists of an internal cost C_int(n_i) and an external cost C_ext(n_i):
$$C(n_i)=\alpha\,C_{\mathrm{int}}(n_i)+C_{\mathrm{ext}}(n_i) \qquad (4)$$
where α is a constant that adjusts the smoothness of the boundary line.
The internal cost C_int(n_i) is defined as the normalized distance between candidate boundary points n_i and n_{i-1}:
where dist(n_i, n_{i-1}), dist(O, n_i), and dist(O, n_{i-1}) denote, respectively, the distance between candidate boundary points n_i and n_{i-1}, between the center O of the region of interest and n_i, and between O and n_{i-1}. The smaller the normalized distance, i.e., the smoother the candidate boundary line, the smaller the cost.
The external cost C_ext(n_i) is defined as the negative of the number of connected points represented by candidate boundary point n_i. The more connected points a candidate represents, the larger the gradient at that point and the more likely it is to be a true boundary point:
$$C_{\mathrm{ext}}(n_i)=-\,(\text{number of connected points represented by } n_i) \qquad (6)$$
The cost of each candidate boundary line is obtained from this cost function, and dynamic programming is used to determine the optimal (minimum-cost) candidate boundary line among all candidates as the final boundary of the suspicious mass.
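A minimal dynamic-programming sketch of step (3.2) follows. Because the boundary is closed (ray R is adjacent to ray 1), the choice on the first ray is fixed in turn and the closing edge is added at the end. Two assumptions are labeled in the code: the exact normalization of the internal cost is not reproduced in the text, so here the point-to-point distance is divided by the mean of dist(O, n_i) and dist(O, n_{i-1}); and every ray is assumed to hold at least one candidate (empty rays already interpolated).

```python
import math

def internal_cost(p, q):
    """Hypothetical normalized distance between polar points p = (r, theta) and q.

    Euclidean distance (law of cosines) divided by the mean radius; the text's
    exact normalization is not reproduced, so this form is an assumption.
    """
    (r1, t1), (r2, t2) = p, q
    d = math.sqrt(max(r1 * r1 + r2 * r2 - 2 * r1 * r2 * math.cos(t1 - t2), 0.0))
    return d / ((r1 + r2) / 2)

def best_boundary(candidates, alpha):
    """Minimum-cost closed boundary. candidates[i] lists (radius, n_connected) for ray i.

    Local cost = alpha * C_int + C_ext with C_ext = -n_connected (formula (6)).
    Returns (total cost, index of the chosen candidate on each ray).
    """
    R = len(candidates)
    theta = [2 * math.pi * i / R for i in range(R)]

    def point(i, k):
        return (candidates[i][k][0], theta[i])

    best_total, best_path = math.inf, None
    for s in range(len(candidates[0])):          # fix the choice on ray 0
        cost = {s: -candidates[0][s][1]}         # ray 0 internal cost added when closing
        back = []
        for j in range(1, R):
            new_cost, links = {}, {}
            for b, (_, n_conn) in enumerate(candidates[j]):
                step = {a: c + alpha * internal_cost(point(j, b), point(j - 1, a))
                        for a, c in cost.items()}
                a_best = min(step, key=step.get)
                new_cost[b] = step[a_best] - n_conn
                links[b] = a_best
            cost = new_cost
            back.append(links)
        for b, c in cost.items():                # close the contour: ray R-1 back to ray 0
            total = c + alpha * internal_cost(point(0, s), point(R - 1, b))
            if total < best_total:
                path = [b]
                for links in reversed(back):
                    path.append(links[path[-1]])
                best_total, best_path = total, path[::-1]
    return best_total, best_path

cands = [[(10, 5), (20, 5)], [(10, 5)], [(10, 5), (25, 6)], [(10, 5)]]
print(best_boundary(cands, alpha=110)[1])   # [0, 0, 0, 0]: the smooth circle wins
```

With α = 0 only the external cost remains, and the same call picks the outlier (25, 6) on ray 2 because it represents more connected points; a large α such as the example value 110 favors the smooth circular contour.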
(4) Calculate a series of feature values for each segmented suspicious mass. The selected features can generally be divided into geometric, morphological, gray-level, and texture features, among others.
The selected features should satisfy the following criteria:
(1) Discriminability: feature values of objects of different classes differ markedly;
(2) Reliability: objects of the same class have similar feature values;
(3) Independence: features should not be strongly correlated with one another.
Following these criteria, 25 features of each suspicious mass region are extracted; they are detailed in the table of Fig. 4.
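The 25 features themselves are listed only in Fig. 4. As an illustration of the geometric category, the sketch below computes three common shape measures (area by the shoelace formula, perimeter, and circularity 4πA/P²) from a closed boundary polygon; these are examples chosen for illustration and are not claimed to be among the features of Fig. 4. A boundary from step (3.2) given as radii on R rays would first be converted with x = r·cos θ, y = r·sin θ.

```python
import math

def geometric_features(boundary):
    """Area, perimeter and circularity of a closed polygon given as ordered (x, y) vertices."""
    n = len(boundary)
    twice_area, perimeter = 0.0, 0.0
    for k in range(n):
        x0, y0 = boundary[k]
        x1, y1 = boundary[(k + 1) % n]          # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0         # shoelace formula term
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = abs(twice_area) / 2
    circularity = 4 * math.pi * area / perimeter ** 2   # 1.0 for a perfect circle
    return area, perimeter, circularity

print(geometric_features([(0, 0), (2, 0), (2, 2), (0, 2)]))   # area 4.0, perimeter 8.0
```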
(5) Input the calculated feature values into a classifier, which automatically analyzes each initial suspicious mass region to determine the final suspicious mass regions.
Suspicious mass classification is the last stage of automatic breast mass detection. After feature values reflecting mass characteristics have been extracted from each segmented suspicious mass, a classifier determines whether the suspicious mass is positive (a region the computer considers a true mass) or false positive. The selection and design of the classifier largely determine the accuracy of mass detection. Classification is an important component of pattern recognition theory, and features can generally be classified with methods such as linear classifiers, heuristic rules, statistical classifiers, fuzzy classifiers, and artificial neural networks. The method of the invention uses an improved k-nearest neighbor method to classify the features extracted in step (4).
The basic rule of the k-nearest neighbor method is as follows: among all samples (excluding the test sample), find the k samples whose feature vectors are closest to (most similar to) that of the test sample; these samples vote, and the test sample is assigned to the class receiving the most votes. The k-nearest neighbor classifier used in the present method first defines a feature-vector similarity function and a decision function (DI for short).
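For reference, the plain (unimproved) k-nearest neighbor vote described above can be sketched as follows. The class labels "Mass" and "Norm" follow the text; the tie-breaking behavior is a consequence of `Counter.most_common`, which orders equal counts by first occurrence.

```python
from collections import Counter

def knn_vote(test_vec, samples, k):
    """Plain k-nearest-neighbour majority vote.

    samples: (feature_vector, label) pairs, excluding the test sample itself.
    On a tied vote, the label met first among the nearest neighbours wins.
    """
    def sq_dist(v):
        return sum((a - b) ** 2 for a, b in zip(test_vec, v))
    nearest = sorted(samples, key=lambda s: sq_dist(s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

samples = [([0, 0], "Mass"), ([1, 0], "Mass"), ([5, 5], "Norm"), ([6, 5], "Norm"), ([5, 6], "Norm")]
print(knn_vote([0.5, 0], samples, 3))   # Mass
```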
(1) Definition of the similarity function
Denote the feature vector of the test sample Y_Q as V(Y_Q) and that of any sample X other than the test sample as V(X). The similarity function is defined as the reciprocal of the squared Euclidean distance between the two feature vectors, i.e.
$$\mathrm{Sim}(Y_Q, X)=\frac{1}{\lVert V(Y_Q)-V(X)\rVert^{2}} \qquad (7)$$
(2) Definition of the decision function
When deciding on the test sample Y_Q, samples whose feature vectors are closer to that of Y_Q should in principle have a greater influence on the decision. In the original k-nearest neighbor classifier, simple voting can hardly reflect differences in vector distance, so a decision function is defined. The following takes the decision between the mass class and the normal class as an example.
As shown in formula (8), the decision value of the test sample Y_Q is calculated from the first k samples ranked by similarity in descending order. Among the k samples nearest to Y_Q, mass samples are labeled Mass and normal samples Norm, and their numbers are M′ and N′, respectively. Sim(Y_Q, X) is the similarity between Y_Q and sample X defined in (1); Rnk(X) is the rank of sample X in the similarity ordering; X_j^Mass denotes the j-th mass sample, with j in [1, M′]; and X_l^Norm denotes the l-th normal sample, with l in [1, N′].
Like simple voting, this decision value takes into account the classes of the samples near the test sample, but it also takes into account the order of their similarity to the test sample. Experiments show that this decision function outperforms the original k-nearest neighbor decision rule.
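Formula (8) is not reproduced in this text, so the sketch below uses a hypothetical rank-weighted decision value that only illustrates the idea described above: each of the k most similar samples contributes Sim(Y_Q, X)/Rnk(X), added for the mass class and subtracted for the normal class, and a positive value means "Mass". It should not be read as the patented formula. The small `eps` added to the denominator of the similarity is a safeguard against identical vectors, not part of formula (7).

```python
def similarity(u, v, eps=1e-12):
    """Formula (7): reciprocal of the squared Euclidean distance (eps is an added safeguard)."""
    return 1.0 / (sum((a - b) ** 2 for a, b in zip(u, v)) + eps)

def decision_value(test_vec, samples, k):
    """Hypothetical rank-weighted decision value (NOT the text's formula (8)).

    The k most similar samples each contribute Sim / Rnk: positive for "Mass",
    negative for "Norm".
    """
    ranked = sorted(samples, key=lambda s: similarity(test_vec, s[0]), reverse=True)[:k]
    di = 0.0
    for rnk, (vec, label) in enumerate(ranked, start=1):
        weight = similarity(test_vec, vec) / rnk
        di += weight if label == "Mass" else -weight
    return di

def classify(test_vec, samples, k):
    return "Mass" if decision_value(test_vec, samples, k) > 0 else "Norm"

samples = [([0.0, 0.0], "Mass"), ([2.0, 0.0], "Norm"), ([0.0, 2.0], "Norm")]
print(classify([0.1, 0.0], samples, 3))   # Mass
```

In this usage example a plain majority vote over the same three neighbors would return "Norm" (two votes to one), whereas the weighted value is dominated by the very close mass sample; this is the kind of distance information that simple voting cannot express.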
To apply the classifier, it is first trained on training data to obtain classifier parameters suited to the specific problem. As shown in Fig. 5, classifier training consists of two steps, collecting classifier training data and obtaining classifier parameters:
(5.1) Collect classifier training data;
First, a set of mammograms with known diagnoses is input, and the steps of region-of-interest extraction, suspicious mass segmentation, and suspicious mass feature extraction are applied to obtain the segmented suspicious masses and a series of related feature values, which completes the collection of classifier training data.
(5.2) Obtain classifier parameters;
The designed classifier is trained with the calculated feature values of the suspicious masses and their actual diagnoses to obtain the classification parameters, which are written to a classifier parameter file; this completes the classifier training process.
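For a k-nearest neighbor classifier, the "parameters" can plausibly be taken to be k together with the labeled training feature vectors. The sketch below stores them as JSON; the file layout and the key names `k`, `samples`, `features`, and `label` are hypothetical, since the text does not describe the format of the classifier parameter file.

```python
import json

def save_parameters(path, k, feature_vectors, labels):
    """Write k and the labelled training vectors to a JSON file (hypothetical layout)."""
    data = {"k": k,
            "samples": [{"features": list(v), "label": y}
                        for v, y in zip(feature_vectors, labels)]}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)

def load_parameters(path):
    """Read the file back as (k, [(feature_vector, label), ...])."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["k"], [(s["features"], s["label"]) for s in data["samples"]]
```

The loaded sample list has the (feature_vector, label) shape used by the classifier sketches above, so the classification module can read it directly at diagnosis time.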
(6) Overlay the final segmentation results of the suspicious mass regions on the mammogram to be diagnosed that was input in step (1), and display the feature values of the mass regions calculated in step (4) to the user on request.
As shown in Fig. 2, the computer-aided diagnosis system of the present invention comprises an input module 100, a region-of-interest extraction module 200, a suspicious mass segmentation module 300, a suspicious mass region feature extraction module 400, a classification diagnosis module 500, and an output module 600.
The input module 100 receives the mammogram to be diagnosed from the user and passes it to the region-of-interest extraction module 200.
The region-of-interest extraction module 200 extracts the regions of interest in the input mammogram according to step (2), obtains the position information of the initial suspicious mass regions, and passes it to the suspicious mass segmentation module 300.
The suspicious mass segmentation module 300 segments the suspicious masses in the regions of interest extracted by module 200 according to step (3), obtains their boundary information, and passes it to the suspicious mass region feature extraction module 400.
From the received boundary information, the suspicious mass region feature extraction module 400 calculates a series of region-related feature values, such as geometric, morphological, gray-level, and texture features, and passes the results to the classification diagnosis module 500.
The classification diagnosis module 500 inputs the calculated feature values of each suspicious mass region into the classifier, automatically classifies and identifies the initial suspicious mass regions, determines the final suspicious mass regions, and passes the result to the output module 600. The output module 600 overlays the segmentation results of the final suspicious mass regions detected automatically by the computer on the input mammogram to be diagnosed and displays the calculated region-related feature values to the user on request.
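The data flow between modules 200 to 600 can be summarized in a short sketch. The `Candidate` record, the `diagnose` function, and its callable arguments are hypothetical names introduced for illustration; each callable stands in for one module, and the returned list is what the output module 600 would display.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Point = Tuple[float, float]

@dataclass
class Candidate:
    """What flows between modules for one initial suspicious mass region."""
    position: Tuple[int, int]                               # from module 200
    boundary: List[Point] = field(default_factory=list)     # from module 300
    features: List[float] = field(default_factory=list)     # from module 400
    is_mass: bool = False                                   # from module 500

def diagnose(image,
             extract_rois: Callable[[object], List[Tuple[int, int]]],
             segment: Callable[[object, Tuple[int, int]], List[Point]],
             features: Callable[[object, List[Point]], List[float]],
             classify: Callable[[List[float]], bool]) -> List[Candidate]:
    """Chain modules 200-500; return the final mass regions for module 600 to display."""
    results = []
    for pos in extract_rois(image):
        c = Candidate(position=pos)
        c.boundary = segment(image, pos)
        c.features = features(image, c.boundary)
        c.is_mass = classify(c.features)
        results.append(c)
    return [c for c in results if c.is_mass]
```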
Example:
The mammography-based breast cancer computer-aided diagnosis method and system provided by the invention involve a number of parameters, which are tuned jointly according to the characteristics of the specific data being processed so that the whole system performs well. The parameters set for the data set processed in the present invention are listed below:
In step (2.1), the initial template size parameter of the two-dimensional hyperbolic secant (sech) function is L = 25;
In step (2.2), the threshold for binarizing the correlation image is T_low = 0.5 × the maximum gray level of the input mammogram;
In step (2.3), the threshold for the multi-scale analysis is T_high = 0.6 × the maximum gray level of the input mammogram;
In step (3.1), with the center of the region of interest as the endpoint, R = 64 rays are cast outward counterclockwise at equal angular intervals starting from 0°, and the intersections of each ray with the contour group are found. If the Euclidean distance between two adjacent intersections on the same ray is less than D_min = 3, the two intersections are said to be connected. Each connected point set with more than S_min = 10 points is represented by its center as a candidate boundary point;
In step (3.2), the local cost C(n_i) consists of an internal cost C_int(n_i) and an external cost C_ext(n_i):
$$C(n_i)=\alpha\,C_{\mathrm{int}}(n_i)+C_{\mathrm{ext}}(n_i)$$
where the constant α = 110 adjusts the smoothness of the boundary line.
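The parameters of this example can be collected in one place, as sketched below. The class name `ExampleParameters` is hypothetical, and the 12-bit maximum gray level 4095 in the usage lines is an assumed image bit depth; the text gives the thresholds only as fractions of the maximum gray level.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExampleParameters:
    """Parameter values listed in the example (class name is hypothetical)."""
    L: int = 25                 # step (2.1): template size 2L + 1
    t_low_ratio: float = 0.5    # step (2.2): T_low = 0.5 x max gray level
    t_high_ratio: float = 0.6   # step (2.3): T_high = 0.6 x max gray level
    R: int = 64                 # step (3.1): number of rays
    D_min: float = 3            # step (3.1): maximum distance between connected intersections
    S_min: int = 10             # step (3.1): a connected set needs more than S_min points
    alpha: float = 110          # step (3.2): smoothness weight

    def thresholds(self, max_gray):
        """Return (T_low, T_high) for an image with the given maximum gray level."""
        return self.t_low_ratio * max_gray, self.t_high_ratio * max_gray

p = ExampleParameters()
print(2 * p.L + 1, 360 / p.R)     # 51 5.625  (template side, degrees between rays)
print(p.thresholds(4095))         # assumed 12-bit image
```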
Using the mammography-based breast cancer computer-aided diagnosis system, the method automatically analyzes suspicious breast mass regions in a mammogram, provides the position and shape of each mass, and, on request, a series of region-related feature parameters. This prompts the radiologist to focus on the regions and parameters that require attention, improving to some extent the accuracy and efficiency of breast cancer diagnosis. The implementation of the present invention is not limited to the scope disclosed in the above examples; the technical solutions described above may also be implemented in ways different from these examples.