1. Introduction
Breast cancer, according to the WHO, is the leading cause of cancer deaths in women worldwide, affecting approximately 1 in 8 women in their lifetime. It starts in the lining cells of the ducts or lobules of the breast, often without symptoms in the early stage. Over time, it can spread and cause symptoms such as a breast lump, skin redness, and nipple discharge. Metastasis, the spread of the disease beyond the breast to surrounding and distant tissues, contributes to its severity [1]. Breast cancer screening involves imaging from several angles at low radiation doses to avoid side effects. Early detection is crucial to reducing premature deaths, which underscores the importance of regular medical consultation to identify breast changes before the disease progresses.
Mammography has been shown to be useful in the early detection of breast tumours. However, it is not considered a sufficient diagnostic method on its own because, in dense breasts, its sensitivity can fall by up to 33%. Breast ultrasound is therefore used as a complementary method to evaluate the mammary gland [2]. In addition, breast ultrasound provides crucial information for distinguishing between benign and malignant tumours, allowing clinicians and radiologists to analyse image features and determine the nature of the tumour. The key parameters are morphology, which addresses shape and structure (regular borders for benign tumours and an irregular appearance for malignant ones), and echogenicity, which reflects the amount of echo in the image (hypoechoic for benign tumours and markedly hypoechoic for malignant ones). Reported results indicate that breast ultrasound is effective, with a sensitivity of 69.62% and a specificity of 64.19%. In cases requiring microscopic examination, ultrasound is used in conjunction with fine needle aspiration (FNA) biopsy.
Breast ultrasound, with a sensitivity of 73.53% and a specificity of 80%, is preferred over other methods because of its efficacy and the lower pain involved. Although ultrasound combined with fine needle aspiration (FNA) biopsy can occasionally be painful, as it requires multiple passes to obtain tissue, this method is chosen for its efficacy [3]. The real-time two-dimensional image provided by ultrasound allows the use of additional techniques such as Doppler imaging to analyse blood flow and detect tumours with increased flow. Breast ultrasound datasets are typically classified into benign and malignant images, although other forms of classification, such as BI-RADS, have also been studied [4]. Ultrasound imaging, combined with computer vision tools, has demonstrated satisfactory results in breast tumour classification, detection, and segmentation [5].
The aim of this work is to detect the type of breast tumour from images processed by the algorithm. The goal is to assist radiologists in the pre-selection and detection of lesions, improving diagnostic accuracy and reducing the workload and complications associated with over-diagnosis [6]. In one study on detecting breast tumour types, Machine Learning algorithms were used: the Support Vector Machine (SVM) classifier with recursive feature elimination and Random Forest (RF) achieved a high accuracy of 98.25%, promising accurate tumour stage detection [7]. Another study employing various machine learning techniques, including deep neural networks (DNN), convolutional neural networks (CNN), artificial neural networks (ANN), and recursive feature elimination (RFE), found that the DNN-based approach achieved an outstanding accuracy of 97% in breast tumour classification, suggesting the high effectiveness of neural networks in diagnosing this disease [8]. Finally, a computer-aided diagnosis (CAD) algorithm based on a convolutional neural network (CNN) was developed, obtaining an accuracy of 85.38% and demonstrating the effectiveness of that algorithm for breast tumour detection [9].
On the other hand, the Kaggle website, which hosts many datasets, has been used to obtain the database. For this project, the "Breast Ultrasound Image Dataset" has been selected. This database has three folders named benign, malignant, and normal, with 891, 421, and 266 images, respectively [10]. The masks included in this database make it possible to determine the level of certainty of the developed algorithm by comparing each mask with the processed image. Finally, ultrasound mammography is a tool for the early detection of breast tumour type, identifying changes in the breast before they are palpable, visible, or cause any symptoms. The ultrasound mammography image is used as input and prepared for optimal visualisation of breast tumours. MATLAB version 2022 was used to develop the image processing, thanks to the tools it provides, which give it a certain advantage over other languages.
2. Methodology
Figure 1 shows an overview of the steps involved in imaging benign or malignant breast tumours by breast ultrasound.
The diagram consists of seven steps. In the first, the data are loaded in “.png” format. In the second, pre-processing is carried out to improve the quality of the image. In the third stage, image segmentation is carried out. In the fourth, features are extracted to recognise the contours. In the fifth step, the image is classified into different tumour types. In the sixth, a specific tumour type is assigned. Finally, a possible treatment for the tumour identified in the previous step is considered.
2.1. Database and Image Upload
The original breast ultrasound dataset contained 780 images collected in 2018; so far, a total of 1578 images have been collected from women aged 25–75 years. The database has folders for the benign, malignant, and normal categories, with 891 (143 MB), 421 (65.9 MB), and 266 (48 MB) images and file sizes, respectively, giving a total of 256.9 MB. It should be noted that each ultrasound image has its own mask, which is not used in this research. The normal category has been excluded, and 60 images of both the benign and malignant categories have been taken to evaluate the algorithm. The average size of the images is 500 × 500 pixels, and they are in PNG format. The use of machine learning in combination with these ultrasound images has shown excellent results in the classification, detection, and segmentation of breast tumours.
On the other hand, this work focuses on using ultrasound images, as illustrated in Figure 2, for breast tumour detection. These images are generated by emitting high-frequency sound waves that penetrate the breast tissue and reflect off various structures, providing a real-time image. The images, in ".png" format, are part of the "Breast Ultrasound Image Dataset".
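As a minimal sketch of this loading step, assuming MATLAB's imageDatastore together with the Kaggle folder layout described above (the local path and folder names are assumptions), the images can be read as follows:

rootDir = 'Breast_Ultrasound_Image_Dataset';        % hypothetical local path
imds = imageDatastore(fullfile(rootDir, {'benign', 'malignant'}), ...
    'FileExtensions', '.png', ...                   % the dataset is in PNG format
    'LabelSource', 'foldernames');                  % label = containing folder name
% Note: the per-image mask files stored in the same folders would need
% to be filtered out of the datastore before any training.
I = readimage(imds, 1);                             % read the first image
imshow(I); title(string(imds.Labels(1)));           % display it with its label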
2.2. Preprocessing
Improving the quality of ultrasound mammography images is essential to accurately detect breast tissue characteristics. Preprocessing involves converting the images to greyscale using a specific equation. This ensures that the images are properly formatted to calculate pixel intensity values and perform morphological operations to enhance the shape and structure of the object to be analysed in the image.
The NTSC formula is used to calculate and convert an image to greyscale:

$$I(x,y) = 0.299\,R(x,y) + 0.587\,G(x,y) + 0.114\,B(x,y)$$

In this formula:
$I(x,y)$ is the intensity value of the greyscale image at the coordinates (x, y).
$R(x,y)$, $G(x,y)$, and $B(x,y)$ are the intensity values of the red, green, and blue channels of the original RGB image at the coordinates (x, y), respectively.
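As a hedged illustration of this conversion (the file name below is a placeholder), the NTSC weighting can be applied directly, or via MATLAB's built-in rgb2gray, which uses essentially the same coefficients:

RGB = imread('benign (1).png');         % hypothetical file name from the dataset
if size(RGB, 3) == 3                    % convert only if the image has three channels
    R = double(RGB(:,:,1)); G = double(RGB(:,:,2)); B = double(RGB(:,:,3));
    Igray = uint8(0.299*R + 0.587*G + 0.114*B);   % NTSC weighted sum
else
    Igray = RGB;                        % some dataset images are already greyscale
end
% Built-in equivalent for RGB input: Igray = rgb2gray(RGB);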
2.3. Segmentation
After the initial stages and identification of the input image, segmentation of the tissues of interest, in this case breast tumours, is performed. These processes are broken down in Figure 3.
In ultrasound mammography, segmentation is essential to detect suspicious areas of breast tumours and to separate breast tissue from other tissues such as fat and muscle. Several image processing techniques are employed, such as thresholding to highlight regions of interest, followed by dilation and gap-filling to expand and fill the objects of interest. This helps to highlight the edges and segment the areas of interest, as shown in Figure 4. This process is essential for the early and accurate detection of abnormalities in breast tissues and thus for effective diagnosis.
The general equation for thresholding an image is shown below:

$$B(x,y) = \begin{cases} 1, & I(x,y) \geq T \\ 0, & I(x,y) < T \end{cases}$$

In this formula, $B(x,y)$ represents the resulting pixel value at the (x, y) coordinates of the binarised image, $I(x,y)$ is the value at the corresponding coordinates of the original greyscale image, and $T$ is the threshold used to classify the pixels as black or white. Subsequently, morphological operations of dilation and gap filling were used to obtain Figure 4.
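A minimal sketch of this step, assuming Otsu's method for the threshold and a disk-shaped structuring element (both assumptions, since the paper does not state how T or the element were chosen):

T  = graythresh(Igray);                 % Otsu's threshold, T in [0, 1]
BW = imbinarize(Igray, T);              % B(x,y) = 1 where I(x,y) >= T
BW = imdilate(BW, strel('disk', 3));    % dilation: expand the objects of interest
BW = imfill(BW, 'holes');               % gap filling: close enclosed holes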
In the next stage, the Canny operator is applied; Figure 5 shows the edge detection, with distinct contours in the greyscale image. This operator is chosen because of its accuracy in edge detection and its ability to reduce noise. After the edges are highlighted, dilation and filling are applied to improve their continuity, thus facilitating the analysis and extraction of tumour features.
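A sketch of this stage using MATLAB's edge function (the structuring-element size is an assumption; the Canny thresholds are left at their defaults):

E = edge(Igray, 'canny');               % binary Canny edge map
E = imdilate(E, strel('disk', 2));      % thicken edges to improve continuity
E = imfill(E, 'holes');                 % fill the regions closed by the edges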
In the third stage, active contour segmentation, known as "SNAKES", was implemented to achieve accurate segmentation of objects in the ultrasound images. An initial mask was iteratively adjusted to obtain a clear representation of shapes and structures, allowing for effective comparison of equivalent points and the distances between them. Active contours, or "snakes", are segmentation methods that seek to define the boundaries of objects in an image by means of an energy calculation that relates the shape of the contour to the image features. This energy is minimised to obtain the desired contour. Variants such as deformable and region-based contours adapt these mathematical formulations to specific applications.
A snake is a time-varying parametric curve $v(s) = (x(s), y(s))$ defined in the image plane, with $s \in [0, 1]$ [11]. The following energy function expresses the shape of the contour, and it is this function that must be minimised to determine the final shape and position of the snake:

$$E_{snake} = \int_0^1 E_{snake}(v(s))\,ds = \int_0^1 \left[E_{int}(v(s)) + E_{ext}(v(s))\right] ds$$

$$E_{int}(v(s)) = \frac{1}{2}\left(\alpha(s)\,|v'(s)|^2 + \beta(s)\,|v''(s)|^2\right)$$

In these formulas:
$E_{snake}$ is the total energy.
$E_{snake}(v(s))$ represents the energy function dependent on $v(s)$.
$E_{int}$ and $E_{ext}$ are the internal and external energy components, respectively.
$v(s)$ is the contour function parameterised by $s$.
$\alpha(s)$ and $\beta(s)$ are weighting functions.
$v'(s)$ and $v''(s)$ are the first and second derivatives of $v$ with respect to $s$.
The characteristics described above allow for effective control of the physical behaviour and the local continuity of the contour. The final result is shown in Figure 6.
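A hedged sketch of this stage with MATLAB's activecontour function, which iteratively evolves an initial mask; note that its default method is the region-based Chan-Vese variant rather than the classical snake, and the mask placement and iteration count below are assumptions:

mask = false(size(Igray));              % initial mask for the active contour
mask(50:end-50, 50:end-50) = true;      % coarse rectangle around the image centre
AC = activecontour(Igray, mask, 300);   % iteratively minimise the contour energy
imshow(AC); title('Active contour segmentation');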
The fourth stage consists of the multiplication of the three previously obtained binary images: the thresholded one, the one generated by the Canny operator, and the one resulting from the active contours. The product of these images highlights the pixels that are common to all the original images. This process combines the edge information obtained by the different techniques into a single image, highlighting the shared edges, as shown in Figure 7. This improves the accuracy and consistency of edge detection in the final image.
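Because the three images are binary, their element-wise product is simply a logical AND, as this one-line sketch (using the variables from the previous sketches) shows:

Combined = BW & E & AC;                 % a pixel survives only if all three masks agree
% For logical images this is equivalent to BW .* E .* AC.
imshow(Combined); title('Shared edges');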
In the fifth stage, the image regions resulting from the multiplication of the previous stage are labelled. This involves identifying the different areas of the image and assigning a label to each. Features such as the area and the bounding box of each region are then obtained from the previously loaded images. In this case, 60 benign and 35 malignant images were used with the algorithm, which generates a table of values that facilitates the identification of specific objects, in this case tumours. This process provides detailed information about the regions and allows for a more specific analysis of the tumours.
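A minimal sketch of this labelling step, assuming bwlabel for connected-component labelling and regionprops (the command named in the next section) to build the table of region features:

[L, n] = bwlabel(Combined);                        % n labelled connected regions
stats  = regionprops('table', L, 'Area', 'BoundingBox');
disp(stats)                                        % one row of features per region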
2.4. Feature Extraction
Feature extraction involves identifying and obtaining relevant numerical and visual properties from the segmented image. These features represent essential aspects of the objects to be analysed and are used for further analysis or classification. In binarised and segmented images, the tumour usually appears white on a black background, but surrounding noise can affect accuracy. Therefore, features such as the area and the bounding box are used to identify the tumour more accurately. The regionprops command, which provides the Area and BoundingBox properties, is essential in this application. The area is calculated for each labelled region in the multiplied edge image, thus identifying the largest object by its maximum area. The bounding box is obtained for each region and is used to define a rectangle surrounding the tumour. In summary, the area and bounding box are essential quantitative features that facilitate the analysis of the detected objects in a segmented image. Subsequently, the region of interest is cropped using specific values to obtain a more centred view of the tumour. The result of this cropping is presented in a new figure, as illustrated in Figure 8, allowing a clearer and more isolated visualisation of the tumour.
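Continuing the sketch above, the largest region is selected by its area and its bounding box is used to crop the tumour (the margin-free crop is an assumption; the paper mentions cropping with specific values):

[~, idx] = max(stats.Area);             % index of the largest labelled object
bb  = stats.BoundingBox(idx, :);        % [x, y, width, height] of that region
roi = imcrop(Igray, bb);                % crop the region of interest
imshow(roi); title('Cropped tumour region');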
3. Classification
This project uses classification to determine whether a tumour is benign or malignant using artificial neural networks. Sixty images were carefully selected, considering factors such as illumination, contrast, and size to ensure accurate classification. These images came from the breast ultrasound image dataset, with 33 benign and 27 malignant images. Of the total, 50 images (25 malignant and 25 benign) were used to train the neural network, and the remaining 10 were reserved to assess the accuracy of the tests. Five convolutional neural networks were used: AlexNet, LeNet5, LeNet5-Like, VGGNet16, and ZFNet, which have various applications such as facial emotion recognition [12], object recognition [13], and medical applications [14], the last of which is the subject of this study.
Firstly, five tests were performed for each of the neural networks. The best result of each network indicates its highest accuracy, and the average of the five tests indicates the mean value of its accuracy. These results allow us to compare the capabilities of each model in terms of accuracy during the training and testing phases.
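A hedged sketch of this evaluation protocol; trainAndTestCNN is a hypothetical helper (not from the paper) that would train one model on the 50 training images and return its accuracy on the 10 test images:

nRuns = 5;
acc = zeros(1, nRuns);
for k = 1:nRuns
    acc(k) = trainAndTestCNN(imdsTrain, imdsTest);   % hypothetical helper function
end
fprintf('Best accuracy: %.2f%%, mean accuracy: %.2f%%\n', ...
    100*max(acc), 100*mean(acc));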
The architecture of AlexNet was elaborated by [15] and investigated by [16]. The network can learn high-level features from images, using convolutional and pooling layers to extract visual features from the input images, which makes it an efficient architecture for image classification and recognition. After training and testing this neural network five times, an average accuracy of 93.60% was obtained in training, while in testing an average accuracy of 86.00% was obtained (see Table 1). In training, the algorithm correctly classified 23 of the 25 benign images, misclassifying only two, and matched all 25 malignant images, showing a 96% match between the actual and predicted labels. For testing or validation, eight benign and two malignant images were used; the code determined that these images met the segmentation requirement, which allowed the classification analysis to be performed. Seven of the eight benign images were classified correctly, with only one misclassified, and both malignant images were classified correctly. Thus, there was a 90.00% match between the actual and predicted labels.
The LeNet5 architecture extracts image features using convolutional and pooling layers, which reduce dimensionality and highlight key features. These features are then processed through fully connected layers to perform the classification. After training and evaluating the LeNet5 neural network, an average accuracy of 100% in training and 72% in testing was obtained, as shown in Table 1. During training, the algorithm achieved a 100% match between the actual and predicted labels for the 25 benign and 25 malignant images used. However, in the test or validation, in which eight benign and two malignant images were used, an accuracy of 87.5% was obtained for the benign images (seven correct and one incorrect) and 50% for the malignant images (one correct and one incorrect). The result was an 80% match between the actual labels and those predicted in the tests.
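As an illustration of the architecture just described (a sketch only: the 32 × 32 greyscale input size, the tanh activations of the classical design, and the two-class output are assumptions adapted to this task), a LeNet5-style layer stack can be declared in MATLAB's Deep Learning Toolbox as:

layers = [
    imageInputLayer([32 32 1])                % greyscale input image
    convolution2dLayer(5, 6)                  % C1: 6 feature maps, 5x5 filters
    tanhLayer                                 % classical LeNet activation
    averagePooling2dLayer(2, 'Stride', 2)     % S2: subsampling
    convolution2dLayer(5, 16)                 % C3: 16 feature maps
    tanhLayer
    averagePooling2dLayer(2, 'Stride', 2)     % S4: subsampling
    fullyConnectedLayer(120)                  % C5
    tanhLayer
    fullyConnectedLayer(84)                   % F6
    tanhLayer
    fullyConnectedLayer(2)                    % benign vs malignant
    softmaxLayer
    classificationLayer];
% Hypothetical training call:
% net = trainNetwork(imdsTrain, layers, trainingOptions('sgdm'));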
The LeNet5-Like architecture is similar to LeNet5, but with improvements adapted to the needs of the problem at hand, such as the possibility of modifying the number of layers, changing the size of the filters, and adding extra layers, among others. After training and testing the LeNet5-Like neural network, an average accuracy of 100% in training and 72% in testing was achieved, as shown in Table 1. During training, the algorithm achieved a 100% match between the actual and predicted labels for both the 25 benign and 25 malignant images used. However, in the test or validation, where eight benign and two malignant images were used, an accuracy of 87.5% was obtained for the benign images (seven correct and one incorrect) and 50% for the malignant images (one correct and one incorrect). The result was an 80% match between the actual labels and those predicted in the tests.
The VGGNet16 neural network is a convolutional network architecture widely used in image classification, face recognition, computer vision, and similar tasks. It stands out for its ability to learn features and patterns of objects in images, which facilitates the extraction of specific objects from images. In terms of results, VGGNet16 obtained an average accuracy of 74.40% in training and 66.00% in testing, as shown in Table 1. During training, the algorithm achieved a 100% match between the actual and predicted labels for both the 25 benign and 25 malignant images used. In the test or validation, where eight benign and two malignant images were used, all benign images were classified correctly, with no errors; of the malignant images, one was correctly classified and one was misclassified. The result was a 90.00% agreement between the actual labels and those predicted in the tests.
The ZFNet neural network is inspired by the AlexNet architecture and focuses on image classification. It has five convolutional layers to capture simple features and three fully connected layers for more abstract features. It obtained an average accuracy of 96.40% in training and 80.00% in testing, as shown in Table 1. In training, the algorithm got 24 of the 25 benign images right, making one error, and matched all 25 malignant images, resulting in a 98% match between the actual and predicted labels. In the test, which included eight benign and two malignant images, the algorithm was correct on seven of the eight benign images and on both malignant images, resulting in a 90% match between the actual and predicted labels.
Table 1 below shows the results generated by each of the neural networks in more detail.
After running and analysing the various neural networks, it was determined that VGGNet16 obtained the best single result, with an accuracy of 90% in testing. On the other hand, AlexNet stood out by achieving the best average, with a mean test accuracy of 86%, which is considered an acceptable result. These findings demonstrate the ability of neural networks to make accurate predictions on previously unseen data.