Search Results (73)

Search Parameters:
Keywords = VGG 16

30 pages, 4558 KiB  
Article
AI-Powered Lung Cancer Detection: Assessing VGG16 and CNN Architectures for CT Scan Image Classification
by Rapeepat Klangbunrueang, Pongsathon Pookduang, Wirapong Chansanam and Tassanee Lunrasri
Informatics 2025, 12(1), 18; https://doi.org/10.3390/informatics12010018 - 11 Feb 2025
Viewed by 581
Abstract
Lung cancer is a leading cause of mortality worldwide, and early detection is crucial in improving treatment outcomes and reducing death rates. However, diagnosing medical images, such as Computed Tomography scans (CT scans), is complex and requires a high level of expertise. This study focuses on developing and evaluating the performance of Convolutional Neural Network (CNN) models, specifically the Visual Geometry Group 16 (VGG16) architecture, to classify lung cancer CT scan images into three categories: Normal, Benign, and Malignant. The dataset used consists of 1097 CT images from 110 patients, categorized according to these severity levels. The research methodology began with data collection and preparation, followed by training and testing the VGG16 model and comparing its performance with other CNN architectures, including Residual Network with 50 layers (ResNet50), Inception Version 3 (InceptionV3), and Mobile Neural Network Version 2 (MobileNetV2). The experimental results indicate that VGG16 achieved the highest classification performance, with a Test Accuracy of 98.18%, surpassing the other models. This accuracy highlights VGG16’s strong potential as a supportive diagnostic tool in medical imaging. However, a limitation of this study is the dataset size, which may reduce model accuracy when applied to new data. Future studies should consider increasing the dataset size, using Data Augmentation techniques, fine-tuning model parameters, and employing advanced models such as 3D CNN or Vision Transformers. Additionally, incorporating Gradient-weighted Class Activation Mapping (Grad-CAM) to interpret model decisions would enhance transparency and reliability. This study confirms the potential of CNNs, particularly VGG16, for classifying lung cancer CT images and provides a foundation for further development in medical applications. Full article
(This article belongs to the Section Medical and Clinical Informatics)
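For orientation, the transfer-learning setup described in this abstract can be sketched as follows. This is a minimal, hypothetical Keras example rather than the authors' published code: only the VGG16 backbone and the three output classes (Normal, Benign, Malignant) come from the abstract, while the preprocessing, frozen layers, classification head, and hyperparameters are assumptions.

# Minimal sketch: fine-tuning VGG16 for three-class CT image classification (Keras).
# Hyperparameters and the data pipeline are illustrative assumptions, not the authors' exact setup.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

NUM_CLASSES = 3  # Normal, Benign, Malignant

# ImageNet-pretrained convolutional base with the classification head removed.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the base for the initial training phase

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# train_ds / val_ds would be tf.data datasets of (image, one-hot label) batches, e.g. built with
# tf.keras.utils.image_dataset_from_directory(..., label_mode="categorical", image_size=(224, 224)).
# model.fit(train_ds, validation_data=val_ds, epochs=20)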
Show Figures
Figure 1. The three lung CT scan images.
Figure 2. Comparison of model accuracies: training and validation accuracy curves of the four deep learning architectures. (Source: authors' analysis from data, 2024).
Figure 3. The validation accuracy trends of the four deep learning architectures. (Source: authors' analysis from data, 2024).
Figure 4. Analysis of performance trajectories for the four deep learning architectures. (Source: authors' analysis from data, 2024).
Figure 5. Confusion matrices. (Source: authors' analysis from data, 2024).
Figure 6. Grad-CAM visualization using ResNet50 for lung CT analysis. (Source: authors' analysis from data, 2024).
Figure 7. Grad-CAM visualization using ResNet50 for a lung CT scan showing a benign prediction. (Source: authors' analysis from data, 2024).
Figure 8. Grad-CAM visualization using ResNet50 for a lung CT scan with a benign prediction. (Source: authors' analysis from data, 2024).
Figure 9. Grad-CAM visualization using InceptionV3 for a lung CT scan with a normal prediction. (Source: authors' analysis from data, 2024).
Figure 10. Grad-CAM visualization with InceptionV3 for lung CT image analysis, revealing scattered nodular opacities that indicate possible abnormalities. (Source: authors' analysis from data, 2024).
Figure 11. Grad-CAM visualization using InceptionV3 for a lung CT scan with a malignant prediction. (Source: authors' analysis from data, 2024).
Figure 12. Grad-CAM visualization using MobileNetV2 for a lung CT scan with a benign prediction. (Source: authors' analysis from data, 2024).
Figure 13. Grad-CAM visualization using MobileNetV2 for a lung CT scan with a malignant prediction. (Source: authors' analysis from data, 2024).
Figure 14. Grad-CAM visualization of MobileNetV2 for a lung malignancy prediction with a 0.1% confidence improvement. (Source: authors' analysis from data, 2024).
Figure 15. Grad-CAM visualization (VGG16) highlighting nodular opacities in the bilateral upper lungs with asymmetric right-side activation. (Source: authors' analysis from data, 2024).
Figure 16. Grad-CAM visualization of interstitial patterns in CT using VGG16. (Source: authors' analysis from data, 2024).
Figure 17. Grad-CAM visualization of VGG16 for a CT image with right-sided activation. (Source: authors' analysis from data, 2024).
41 pages, 1802 KiB  
Review
A Systematic Review of CNN Architectures, Databases, Performance Metrics, and Applications in Face Recognition
by Andisani Nemavhola, Colin Chibaya and Serestina Viriri
Information 2025, 16(2), 107; https://doi.org/10.3390/info16020107 - 5 Feb 2025
Viewed by 877
Abstract
This study provides a comparative evaluation of face recognition databases and Convolutional Neural Network (CNN) architectures used in training and testing face recognition systems. The databases span from early datasets like Olivetti Research Laboratory (ORL) and Facial Recognition Technology (FERET) to more recent collections such as MegaFace and Ms-Celeb-1M, offering a range of sizes, subject diversity, and image quality. Older databases, such as ORL and FERET, are smaller and cleaner, while newer datasets enable large-scale training with millions of images but pose challenges like inconsistent data quality and high computational costs. The study also examines CNN architectures, including FaceNet and Visual Geometry Group 16 (VGG16), which show strong performance on large datasets like Labeled Faces in the Wild (LFW) and VGGFace, achieving accuracy rates above 98%. In contrast, earlier models like Support Vector Machine (SVM) and Gabor Wavelets perform well on smaller datasets but lack scalability for larger, more complex datasets. The analysis highlights the growing importance of multi-task learning and ensemble methods, as seen in Multi-Task Cascaded Convolutional Networks (MTCNNs). Overall, the findings emphasize the need for advanced algorithms capable of handling large-scale, real-world challenges while optimizing accuracy and computational efficiency in face recognition systems. Full article
(This article belongs to the Special Issue Machine Learning and Data Mining for User Classification)
Show Figures
Figure 1. Face recognition steps [26].
Figure 2. PRISMA-ScR diagram showing all the steps taken to filter out articles [8].
Figure 3. Databases used in face recognition systems.
Figure 4. ORL database samples.
Figure 5. FERET database samples.
Figure 6. AR database samples.
Figure 7. XM2VTS database samples.
Figure 8. FGRC database samples.
Figure 9. LFW database samples.
Figure 10. CMU Multi-PIE database samples.
Figure 11. VGG architecture.
24 pages, 9651 KiB  
Article
Fault Detection in Induction Machines Using Learning Models and Fourier Spectrum Image Analysis
by Kevin Barrera-Llanga, Jordi Burriel-Valencia, Angel Sapena-Bano and Javier Martinez-Roman
Sensors 2025, 25(2), 471; https://doi.org/10.3390/s25020471 - 15 Jan 2025
Viewed by 934
Abstract
Induction motors are essential components in industry due to their efficiency and cost-effectiveness. This study presents an innovative methodology for automatic fault detection by analyzing images generated from the Fourier spectra of current signals using deep learning techniques. A new preprocessing technique incorporating a distinctive background to enhance spectral feature learning is proposed, enabling the detection of four types of faults: healthy motor coupled to a generator with a broken bar (HGB), broken rotor bar (BRB), race bearing fault (RBF), and bearing ball fault (BBF). The dataset was generated from three-phase signals of an induction motor controlled by a Direct Torque Controller under various operating conditions (20–1500 rpm with 0–100% load), resulting in 4251 images. The model, based on a Visual Geometry Group (VGG) architecture with 19 layers, achieved an overall accuracy of 98%, with specific accuracies of 99% for RAF, 100% for BRB, 100% for RBF, and 95% for BBF. Model interpretability was assessed using explainability techniques, which allowed for the identification of specific learning patterns. This analysis introduces a new approach by demonstrating how different convolutional blocks capture particular features: the first convolutional block captures signal shape, while the second identifies background features. Additionally, distinct convolutional layers were associated with each fault type: layer 9 for RAF, layer 13 for BRB, layer 16 for RBF, and layer 14 for BBF. This methodology offers a scalable solution for predictive maintenance in induction motors, effectively combining signal processing, computer vision, and explainability techniques. Full article
(This article belongs to the Special Issue Feature Papers in Fault Diagnosis & Sensors 2024)
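A rough sketch of the spectrum-to-image idea described in this abstract is given below, purely as an illustration: the sampling rate, the 100 Hz analysis band, and the way the magnitude spectrum is rendered onto a 224 × 224 canvas are assumptions inferred from the abstract and figure captions, not the authors' actual pipeline (which additionally applies Savitzky–Golay smoothing and a distinctive background).

# Sketch: turn a one-phase current signal into a 224 x 224 FFT-magnitude image for a CNN.
# Sampling rate, frequency range, and rendering choices are illustrative assumptions.
import numpy as np
from PIL import Image

def fft_to_image(signal: np.ndarray, fs: float, f_max: float = 100.0) -> Image.Image:
    """Compute the magnitude spectrum (in dB) up to f_max Hz and render it as a grayscale image."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = freqs <= f_max
    mag_db = 20.0 * np.log10(spectrum[mask] + 1e-12)

    # Normalise to [0, 1] and draw the spectrum as a filled column plot on a blank canvas.
    mag = (mag_db - mag_db.min()) / (np.ptp(mag_db) + 1e-12)
    height, width = 224, 224
    canvas = np.zeros((height, width), dtype=np.uint8)
    cols = np.linspace(0, width - 1, mask.sum()).astype(int)
    rows = (height - 1 - mag * (height - 1)).astype(int)
    for c, r in zip(cols, rows):
        canvas[r:, c] = 255  # fill below the curve so the spectral shape is easy for a CNN to learn
    return Image.fromarray(canvas)

# Example with a synthetic 50 Hz current sampled at 10 kHz:
fs = 10_000
t = np.arange(0, 1.0, 1.0 / fs)
img = fft_to_image(np.sin(2 * np.pi * 50 * t), fs)
# img.save("fft_sample.png")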
Show Figures
Figure 1. Overview of the background-enhanced FFT signal processing for fault detection. In (a), the original current FFT signal is displayed. In (b), the distinctive background is created. In (c), the background is combined with the FFT signal, forming a unified representation. (d) shows the final background-enhanced FFT image. In (e), the image is resized to 224 × 224 pixels. Finally, (f) represents the input to a CNN model for automatic fault detection.
Figure 2. Fourier spectra of the current signals for each fault type at 1500 rpm and 100% load. (A) HGB, (B) BRB, (C) RBF, and (D) BBF. The x axis represents the frequency in Hz (up to 180 Hz for visualization), and the y axis represents the magnitude in decibels (dB). The model analyzes frequencies up to 100 Hz.
Figure 3. Transformation process of the FFT image. (A) Original FFT signal. (B) Smoothed signal after applying the Savitzky–Golay filter of degree 3. (C) Smoothed signal with a degraded background, including horizontal and vertical reference stripes. (D) Final resized image (224 × 224 pixels) prepared for input to the CNN model.
Figure 4. Training and validation loss, along with training accuracy, over 81 epochs. The loss curves indicate the convergence behavior of the model, while the accuracy curve indicates performance improvements, reaching a validation accuracy of 0.991 at the final epoch.
Figure 5. Images corresponding to the failure classes (A) HGB, (B) BRB, (C) RBF, and (D) BBF obtained under operating conditions of 1500 rpm and 100% load (row 1). Row 2 presents the saliency maps generated for each image, highlighting the relevant areas used by the model to perform the classification.
Figure 6. Visualization of the model's internal interpretability using GradCAM in the VGG19 architecture, which consists of 16 convolutional layers distributed across 6 blocks (the sixth corresponds to classification). The activation maps generated for the classes (A) HGB, (B) BRB, (C) RBF, and (D) BBF highlight the regions relevant for prediction, where blue represents lower activation, red intermediate activation, and yellow maximum activation.
25 pages, 8832 KiB  
Article
3D-CNN with Multi-Scale Fusion for Tree Crown Segmentation and Species Classification
by Jiayao Wang, Zhen Zhen, Yuting Zhao, Ye Ma and Yinghui Zhao
Remote Sens. 2024, 16(23), 4544; https://doi.org/10.3390/rs16234544 - 4 Dec 2024
Viewed by 957
Abstract
Natural secondary forests play a crucial role in global ecological security, climate change mitigation, and biodiversity conservation. However, accurately delineating individual tree crowns and identifying tree species in dense natural secondary forests remains a challenge. This study combines deep learning with traditional image segmentation methods to improve individual tree crown detection and species classification. The approach utilizes hyperspectral data, unmanned aerial vehicle laser scanning data, and ground survey data from Maoershan Forest Farm in Heilongjiang Province, China. The study consists of two main processes: (1) combining semantic segmentation algorithms (U-Net and Deeplab V3 Plus) with the watershed transform (WTS) for tree crown detection (U-WTS and D-WTS algorithms); (2) resampling the original images to different pixel densities (16 × 16, 32 × 32, and 64 × 64 pixels) and inputting them into five 3D-CNN models (ResNet10, ResNet18, ResNet34, ResNet50, VGG16). For tree species classification, a multi-scale fusion branch (MSFB) was combined with these CNN models. The results show that the U-WTS algorithm achieved a recall of 0.809, precision of 0.885, and an F-score of 0.845. ResNet18 with a pixel density of 64 × 64 pixels achieved the highest overall accuracy (OA) of 0.916, an improvement of 0.049 over the original images. After incorporating the MSFB, the OA improved by approximately 0.04 across all models, with only a 6% increase in model parameters. Notably, the floating-point operations (FLOPs) of ResNet18 + MSFB were only one-eighth of those of ResNet18 with 64 × 64 pixels, while achieving similar accuracy (OA: 0.912 vs. 0.916). This framework offers a scalable solution for large-scale tree species distribution mapping and forest resource inventories. Full article
Show Figures
Figure 1. Overview map of the study area: (a) location of Heilongjiang Province in a map of the administrative areas of China; (b) aerial view of Maoershan Experimental Forest Farm, where the numbered areas 1, 2, 3, 4, and 5 are drone flight zones; aerial views of unmanned aerial vehicle (UAV) flight areas No. 4 (c) and No. 5 (d); and aerial photos of (e) mixed coniferous–broadleaf forest and (f) mixed broadleaf forest.
Figure 2. Flowchart of the research process. Note: CHM, canopy height model; GLCM, gray level co-occurrence matrix; RFE, recursive feature elimination; WST, watershed transform.
Figure 3. Flowchart of the U-net + watershed transform algorithm.
Figure 4. Mean spectral reflectance curves of seven tree species groups: birch, elm, Korean pine, Manchurian ash, Manchurian walnut, other coniferous trees, and other broadleaf trees.
Figure 5. Relationship between tree crown images of different pixel densities and feature map sizes obtained by the convolutional neural network model; the original images were resampled to different pixel densities: (a) 8 × 8 pixels, (b) 16 × 16 pixels, (c) 32 × 32 pixels, and (d) 64 × 64 pixels.
Figure 6. Schematic of the ResNet18 model + multi-scale fusion branch module (MSFB) structure. Note: C, channel; D, depth; W, width; and H, height.
Figure 7. Individual tree crown delineation results of three algorithms within a 15 × 15 m plot: (a) the reference tree crown; the results of the (b) U-WST, (c) D-WST, and (d) WST algorithms. Note: CHM, canopy height model.
Figure 8. Recursive feature elimination and feature importance ranking based on the random forest model.
Figure 9. Confusion matrix of the ResNet18 model at four pixel densities: (a) 8 × 8 pixels, (b) 16 × 16 pixels, (c) 32 × 32 pixels, and (d) 64 × 64 pixels. Note: PA represents Producer's Accuracy, UA represents User's Accuracy, KP represents Korean pine, BR represents birch, MA represents Manchurian ash, MW represents Manchurian walnut, OC represents other coniferous trees, and OB represents other broad-leaved trees; the intensity of the color represents the magnitude of the value.
Figure 10. Maps of the distribution of tree species groups in part of unmanned aerial vehicle (UAV) flight region No. 4: (a) UAV flight region No. 4 with a background of hyperspectral images, (b) enlarged view of a local area, and (c) map of tree species prediction results.
Figure 11. Spatial importance and feature importance based on the Shapley Additive exPlanations (SHAP) method: (a) tree crown segmentation; (b) tree species classification; (c) a stack of 40 channels of the original image; (d) a stack of 40 channels after using the SHAP method; (e) SHAP feature importance based on seven tree species groups.
Figure 12. Performance of the U-WST algorithm in identifying tree canopies in mixed broadleaf forests. Note: CHM, canopy height model.
25 pages, 18179 KiB  
Article
ES-L2-VGG16 Model for Artificial Intelligent Identification of Ice Avalanche Hidden Danger
by Daojing Guo, Minggao Tang, Qiang Xu, Guangjian Wu, Guang Li, Wei Yang, Zhihang Long, Huanle Zhao and Yu Ren
Remote Sens. 2024, 16(21), 4041; https://doi.org/10.3390/rs16214041 - 30 Oct 2024
Viewed by 987
Abstract
Ice avalanches (IAs) are highly concealed and sudden and can cause severe disasters. The early identification of IA hidden danger is of great value for disaster prevention and mitigation. However, identifying it by site investigation or manual remote sensing is difficult and inefficient. So, an artificial intelligence method for the identification of IA hidden dangers using a deep learning model has been proposed, with the glacier area of the Yarlung Tsangpo River Gorge in Nyingchi selected for identification and validation. First, through engineering geological investigations, three key identification indices for IA hidden dangers are established: glacier source, slope angle, and cracks. Sentinel-2A satellite data, Google Earth, and ArcGIS are used to extract these indices and construct a feature dataset for the study and validation area. Next, key performance metrics, such as training accuracy, validation accuracy, test accuracy, and loss rates, are compared to assess the performance of the ResNet50 (Residual Neural Network 50) and VGG16 (Visual Geometry Group 16) models. The VGG16 model (96.09% training accuracy) is selected and optimized, using Early Stopping (ES) to prevent overfitting and L2 regularization techniques (L2) to add weight penalties, which constrained model complexity and enhanced simplicity and generalization, ultimately developing the ES-L2-VGG16 (Early Stopping-L2 Norm Regularization Techniques-Visual Geometry Group 16) model (98.61% training accuracy). Lastly, during the validation phase, the model is applied to the Yarlung Tsangpo River Gorge glacier area on the Tibetan Plateau (TP), identifying a total of 100 IA hidden danger areas, with average slopes ranging between 34° and 48°. The ES-L2-VGG16 model achieves an accuracy of 96% in identifying these hidden danger areas, ensuring the precise identification of IA dangers. This study offers a new intelligent technical method for identifying IA hidden danger, with clear advantages and promising application prospects. Full article
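The two optimizations named in this abstract, early stopping and L2 weight regularization on a VGG16 backbone, can be sketched roughly as follows. The penalty strength, patience, classification head, and binary hidden-danger label are illustrative assumptions, not the authors' configuration.

# Sketch: VGG16 classifier head with L2 weight penalties plus an EarlyStopping callback (Keras).
# Regularization strength, patience, and the binary "hidden danger / no danger" head are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 penalty constrains weight magnitudes
    layers.Dense(1, activation="sigmoid",
                 kernel_regularizer=regularizers.l2(1e-4)),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)  # halt training before overfitting sets in

# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[early_stop])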
Show Figures
Figure 1. Study and validation area. (The dataset was sourced from the National Earth System Science Data Center (http://www.geodata.cn, accessed on 1 May 2024) and the Second Glacier Inventory of China (V1.0) and processed using ArcGIS software 3.0.1.)
Figure 2. Terrain and geomorphology of the verification area. (The DEM data (5 m resolution) were sourced from the Shuttle Radar Topography Mission (SRTM) (http://earthexplorer.usgs.gov/, accessed on 20 May 2024) and processed using ArcGIS software 3.0.1.)
Figure 3. Three-dimensional stereo model of the Sedongpu Valley.
Figure 4. Technical processes and methods of IA hidden danger intelligent identification.
Figure 5. IA hidden danger feature fusion and training images. (a) The steepness of the slope increases as the color transitions from yellow to red, indicating a change from a gentle to a steep gradient; (b) the crack in the glacier IA area; (c) the training set image that integrates slope and cracks for comprehensive visualization.
Figure 6. Development of IA in the Sedongpu Valley. (a–c) Satellite images of the Sedongpu Valley on different dates in 2016, 2017, and 2018, highlighting changes in glacier flow and landslide areas; (a1–c1) corresponding schematic maps for the same dates, marking regions such as glacier flow, snow cover, river channels, and landslide debris.
Figure 7. The slope zoning statistics of the Sedongpu Valley.
Figure 8. Slope and crack evolution processes in different glacier source areas. (a) Slope and cracks in the Sedongpu glacier source area (4 December 2017); (b) slope and cracks in the Chamoli glacier source area (5 February 2021); (c) slope and cracks in the Marmolada glacier source area (photo provided by Italy's Alpine Rescue on 3 July 2022) (Google Earth Pro map, retrieved from https://www.google.com/earth/, accessed on 8 July 2024).
Figure 9. Schematic conceptual diagram of IA hidden danger.
Figure 10. Model construction principal processes: (a) ResNet50 model network architecture; (b) VGG16 model network structure.
Figure 11. ResNet50 model training results: (a) model accuracy; (b) model loss rate.
Figure 12. VGG16 model training results: (a) model accuracy; (b) model loss rate.
Figure 13. Confusion matrices for the model test set recognition results: (a) ResNet50; (b) VGG16.
Figure 14. Comparison of accuracy between the VGG16 and D-VGG16 models: (a) training set accuracy; (b) validation set accuracy.
Figure 15. Comparison of loss rate between the VGG16 and ES-VGG16 models: (a) training set loss; (b) validation set loss.
Figure 16. Training performance of the L2-VGG16 model on the validation and test sets: (a) validation set accuracy of the VGG16 and L2-VGG16 models; (b) L2-VGG16 model test set training results.
Figure 17. Training results of the ES-L2-VGG16 model.
Figure 18. IA hidden danger identification process overview: (a) a satellite image of a glacier, with glacier boundaries marked in green and identified cracks highlighted by red ellipses; (b) gradient changes in the glacier area, marked by blue rectangles indicating slopes; (c) the area where the green boundary, red ellipse, and blue rectangle intersect is an IA hidden danger area. (Google Earth Pro map, retrieved from https://www.google.com/earth/, accessed on 15 July 2024.)
Figure 19. Extraction process: (a) slope recognition process; (b) crack extraction process.
Figure 20. Distribution of IA danger levels in the Yarlung Tsangpo River Gorge.
Figure 21. IA hidden danger identification and verification results.
Figure 22. Remote sensing interpretation of glacier areas: (a) automatic identification results for regions (1) and (2); (b) remote sensing satellite image of area (1); (c) remote sensing satellite image of area (2). (Google Earth Pro map, retrieved from https://www.google.com/earth/, accessed on 12 July 2024.)
25 pages, 6970 KiB  
Article
Urban Land Use Classification Model Fusing Multimodal Deep Features
by Yougui Ren, Zhiwei Xie and Shuaizhi Zhai
ISPRS Int. J. Geo-Inf. 2024, 13(11), 378; https://doi.org/10.3390/ijgi13110378 - 30 Oct 2024
Viewed by 1275
Abstract
Urban land use classification plays a significant role in urban studies and provides key guidance for urban development. However, existing methods predominantly rely on either raster structure deep features through convolutional neural networks (CNNs) or topological structure deep features through graph neural networks (GNNs), making it challenging to comprehensively capture the rich semantic information in remote sensing images. To address this limitation, we propose a novel urban land use classification model by integrating both raster and topological structure deep features to enhance the accuracy and robustness of the classification model. First, we divide the urban area into block units based on road network data and further subdivide these units using the fractal network evolution algorithm (FNEA). Next, the K-nearest neighbors (KNN) graph construction method with adaptive fusion coefficients is employed to generate both global and local graphs of the blocks and sub-units. The spectral features and subgraph features are then constructed, and a graph convolutional network (GCN) is utilized to extract the node relational features from both the global and local graphs, forming the topological structure deep features while aggregating local features into global ones. Subsequently, VGG-16 (Visual Geometry Group 16) is used to extract the image convolutional features of the block units, obtaining the raster structure deep features. Finally, the transformer is used to fuse both topological and raster structure deep features, and land use classification is completed using the softmax function. Experiments were conducted using high-resolution Google images and Open Street Map (OSM) data, with study areas on the third ring road of Shenyang and the fourth ring road of Chengdu. The results demonstrate that the proposed method improves the overall accuracy and Kappa coefficient by 9.32% and 0.17, respectively, compared to single deep learning models. Incorporating subgraph structure features further enhances the overall accuracy and Kappa by 1.13% and 0.1. The adaptive KNN graph construction method achieves accuracy comparable to that of the empirical threshold method. This study enables accurate large-scale urban land use classification with reduced manual intervention, improving urban planning efficiency. The experimental results verify the effectiveness of the proposed method, particularly in terms of classification accuracy and feature representation completeness. Full article
Show Figures
Figure 1. The methodological flow of the proposed approach.
Figure 2. Schematic diagram of subgraph structure construction.
Figure 3. VGG-16 structure.
Figure 4. Data preprocessing.
Figure 5. Image segmentation and subgraph construction results.
Figure 6. Comparison of the classification results of different methods for the Shenyang third ring dataset.
Figure 7. Comparison of the classification results of the different methods for the Chengdu fourth ring dataset.
Figure 8. Localized details of the Shenyang third ring ablation experiment.
Figure 9. Localized details of the Chengdu fourth ring ablation experiment.
Figure 10. Results of different convolution layers of the GCN.
Figure 11. Results of different batch sizes of VGG-16.
Figure 12. Results of different encoder layers of the transformer.
16 pages, 8896 KiB  
Article
Automatic Paddy Planthopper Detection and Counting Using Faster R-CNN
by Siti Khairunniza-Bejo, Mohd Firdaus Ibrahim, Marsyita Hanafi, Mahirah Jahari, Fathinul Syahir Ahmad Saad and Mohammad Aufa Mhd Bookeri
Agriculture 2024, 14(9), 1567; https://doi.org/10.3390/agriculture14091567 - 10 Sep 2024
Viewed by 999
Abstract
Counting planthoppers manually is laborious and yields inconsistent results, particularly when dealing with species with similar features, such as the brown planthopper (Nilaparvata lugens; BPH), whitebacked planthopper (Sogatella furcifera; WBPH), zigzag leafhopper (Maiestas dorsalis; ZIGZAG), and green leafhopper (Nephotettix malayanus and Nephotettix virescens; GLH). Most of the available automated counting methods are limited to populations of a small density and often do not consider those with a high density, which require more complex solutions due to overlapping objects. Therefore, this research presents a comprehensive assessment of an object detection algorithm specifically developed to precisely detect and quantify planthoppers. It utilises annotated datasets obtained from sticky light traps, comprising 1654 images across four distinct classes of planthoppers and one class of benign insects. The datasets were subjected to data augmentation and utilised to train four convolutional object detection models based on transfer learning. The results indicated that Faster R-CNN VGG 16 outperformed other models, achieving a mean average precision (mAP) score of 97.69% and exhibiting exceptional accuracy in classifying all planthopper categories. The correctness of the model was verified by entomologists, who confirmed a classification and counting accuracy rate of 98.84%. Nevertheless, the model fails to recognise certain samples because of the high density of the population and the significant overlap among them. This research effectively resolved the issue of low- to medium-density samples by achieving very precise and rapid detection and counting. Full article
(This article belongs to the Special Issue Advanced Image Processing in Agricultural Applications)
Show Figures
Figure 1. The process involved in this research.
Figure 2. The transparent box used to house the light trap. Each side of the box has hundreds of small holes.
Figure 3. Sample of a sticky light trap image placed on top of a white paper.
Figure 4. Sample of four major classes of planthoppers: (a) BPH; (b) GLH; (c) WBPH; and (d) ZIGZAG.
Figure 5. Sample of images in the BENIGN class, exhibiting similarities with major planthopper classes.
Figure 6. Examples of annotated images using the LabelImg software.
Figure 7. Graphical user interface of the developed verification web system.
Figure 8. The user interface of the system used to capture the image and execute the counting process.
Figure 9. mAP values for each epoch.
Figure 10. Loss for each epoch.
Figure 11. Results of detected planthoppers using Faster R-CNN with VGG16.
Figure 12. Detection errors. Red dashed-line squares indicate false positive cases, while blue dashed-line squares indicate false negative cases.
Figure 13. Sample from image no. 134 of Light Trap 2, where the total number of undetected samples is 15 and there are 0 misclassified cases. Undetected classes are labelled by a blue dashed-line box.
15 pages, 9305 KiB  
Article
Symmetric Keys for Lightweight Encryption Algorithms Using a Pre–Trained VGG16 Model
by Ala’a Talib Khudhair, Abeer Tariq Maolood and Ekhlas Khalaf Gbashi
Telecom 2024, 5(3), 892-906; https://doi.org/10.3390/telecom5030044 - 3 Sep 2024
Cited by 1 | Viewed by 1558
Abstract
The main challenge within lightweight cryptographic symmetric key systems is striking a delicate balance between security and efficiency. Consequently, the key issue revolves around crafting symmetric key schemes that are both lightweight and robust enough to safeguard resource-constrained environments. This paper presents a new method of making long symmetric keys for lightweight algorithms. A pre–trained convolutional neural network (CNN) model called visual geometry group 16 (VGG16) is used to take features from two images, turn them into binary strings, make the two strings equal by cutting them down to the length of the shorter string, and then use XOR to make a symmetric key from the binary strings from the two images. The key length depends on the number of features in the two images. Compared to other lightweight algorithms, we found that this method greatly decreases the time required to generate a symmetric key and improves defense against brute force attacks by creating exceptionally long keys. The method successfully passed all 15 tests when evaluated using the NIST SP 800-22 statistical test suite and all Basic Five Statistical Tests. To the best of our knowledge, this is the first research to explore the generation of a symmetric encryption key using a pre–trained VGG16 model. Full article
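A simplified sketch of the key-generation idea outlined in this abstract is given below: extract VGG16 features from two images, binarise them, truncate to the shorter bit string, and XOR the two. The pooled 512-dimensional feature vector, the zero threshold, and the 224 × 224 preprocessing are assumptions made for illustration; in the paper the key length varies with the number of features extracted from the two images.

# Sketch: derive a symmetric key by XOR-ing binarised VGG16 feature vectors of two images.
# The binarisation threshold and image preprocessing are illustrative assumptions.
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing import image

extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")  # 512-dim feature vector

def image_to_bits(path: str) -> np.ndarray:
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    features = extractor.predict(x, verbose=0).ravel()
    return (features > 0).astype(np.uint8)  # binarise: 1 where the feature is positive

def generate_key(path_a: str, path_b: str) -> str:
    bits_a, bits_b = image_to_bits(path_a), image_to_bits(path_b)
    n = min(len(bits_a), len(bits_b))  # cut both strings down to the shorter length
    key_bits = np.bitwise_xor(bits_a[:n], bits_b[:n])
    return "".join(map(str, key_bits))

# key = generate_key("image_one.jpg", "image_two.jpg")  # hypothetical input images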
Show Figures
Figure 1. Symmetric key encryption.
Figure 2. AES encryption algorithm.
Figure 3. DES encryption algorithm.
Figure 4. Blowfish encryption algorithm.
Figure 5. Conventional Machine Learning vs. Transfer Learning.
Figure 6. The graphic depicts the "top" portion of the model being removed. A 3D stack of feature maps convolves the remaining pre-trained output layers.
Figure 7. The results of matrix multiplication are summed onto the feature map.
Figure 8. Activation function (ReLU).
Figure 9. Types of pooling.
Figure 10. Flowchart of the proposed symmetric key generation using the pre-trained VGG16 model.
10 pages, 1304 KiB  
Article
Age and Sex Estimation in Children and Young Adults Using Panoramic Radiographs with Convolutional Neural Networks
by Tuğçe Nur Şahin and Türkay Kölüş
Appl. Sci. 2024, 14(16), 7014; https://doi.org/10.3390/app14167014 - 9 Aug 2024
Cited by 1 | Viewed by 1339
Abstract
Image processing with artificial intelligence has shown significant promise in various medical imaging applications. The present study aims to evaluate the performance of 16 different convolutional neural networks (CNNs) in predicting age and gender from panoramic radiographs in children and young adults. The networks tested included DarkNet-19, DarkNet-53, Inception-ResNet-v2, VGG-19, DenseNet-201, ResNet-50, GoogLeNet, VGG-16, SqueezeNet, ResNet-101, ResNet-18, ShuffleNet, MobileNet-v2, NasNet-Mobile, AlexNet, and Xception. These networks were trained on a dataset of 7336 radiographs from individuals aged between 5 and 21. Age and gender estimation accuracy and mean absolute age prediction errors were evaluated on 340 radiographs. Statistical analyses were conducted using Shapiro–Wilk, one-way ANOVA, and Tukey tests (p < 0.05). The gender prediction accuracy and the mean absolute age prediction error were, respectively, 87.94% and 0.582 for DarkNet-53, 86.18% and 0.427 for DarkNet-19, 84.71% and 0.703 for GoogLeNet, 81.76% and 0.756 for DenseNet-201, 81.76% and 1.115 for ResNet-18, 80.88% and 0.650 for VGG-19, 79.41% and 0.988 for SqueezeNet, 79.12% and 0.682 for Inception-Resnet-v2, 78.24% and 0.747 for ResNet-50, 77.35% and 1.047 for VGG-16, 76.47% and 1.109 for Xception, 75.88% and 0.977 for ResNet-101, 73.24% and 0.894 for ShuffleNet, 72.35% and 1.206 for AlexNet, 71.18% and 1.094 for NasNet-Mobile, and 62.94% and 1.327 for MobileNet-v2. No statistical difference in age prediction performance was found between DarkNet-19 and DarkNet-53, which demonstrated the most successful age estimation results. Despite these promising results, all tested CNNs performed below 90% accuracy and were not deemed suitable for clinical use. Future studies should continue with more-advanced networks and larger datasets. Full article
(This article belongs to the Special Issue Oral Diseases: Diagnosis and Therapy)
Show Figures
Figure 1. Detailed prediction distribution of test radiographs by age and sex groups for the DarkNet-53 network. Correct predictions are shown in shades of blue, while incorrect predictions are displayed in shades of red. For example, all 10 radiographs of 5-year-old girls (05F) were correctly predicted by DarkNet-53. However, among the ten radiographs of 6-year-old females (06F), only one was correctly predicted, while one was predicted as a 5-year-old male (05M), five as 6-year-old males (06M), one as a 7-year-old female (07F), and two as 8-year-old females (08F).
Figure 2. Distribution of predictions for the GoogLeNet network. Correct predictions are shown in shades of blue, while incorrect predictions are displayed in shades of red.
23 pages, 7657 KiB  
Article
A Multi-Feature Fusion Method for Urban Functional Regions Identification: A Case Study of Xi’an, China
by Zhuo Wang, Jianjun Bai and Ruitao Feng
ISPRS Int. J. Geo-Inf. 2024, 13(5), 156; https://doi.org/10.3390/ijgi13050156 - 7 May 2024
Cited by 2 | Viewed by 1915
Abstract
Research on the identification of urban functional regions is of great significance for the understanding of urban structure, spatial planning, resource allocation, and promoting sustainable urban development. However, achieving high-precision urban functional region recognition has always been a research challenge in this field. For this purpose, this paper proposes an urban functional region identification method called ASOE (activity–scene–object–economy), which integrates the features from multi-source data to perceive the spatial differentiation of urban human and geographic elements. First, we utilize VGG16 (Visual Geometry Group 16) to extract high-level semantic features from the remote sensing images with 1.2 m spatial resolution. Then, using scraped building footprints, we extract building object features such as area, perimeter, and structural ratios. Socioeconomic features and population activity features are extracted from Point of Interest (POI) and Weibo data, respectively. Finally, integrating the aforementioned features and using the Random Forest method for classification, the identification results of urban functional regions in the main urban area of Xi’an are obtained. After comparing with the actual land use map, our method achieves an identification accuracy of 91.74%, which is higher than other comparative methods, making it effectively identify four typical urban functional regions in the main urban area of Xi’an (e.g., residential regions, industrial regions, commercial regions, and public regions). The research indicates that the method of fusing multi-source data can fully leverage the advantages of big data, achieving high-precision identification of urban functional regions. Full article
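The final fusion-and-classification step described here amounts to concatenating the four ASOE feature groups per block and training a Random Forest. The sketch below assumes the features have already been extracted; the array names, dimensions, and placeholder data are hypothetical.

# Sketch: classify urban blocks by fusing the four ASOE feature groups with a Random Forest.
# Arrays and dimensions are hypothetical placeholders for the extracted features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

n_blocks = 1000
activity = np.random.rand(n_blocks, 8)        # population-activity features (Weibo)
scene = np.random.rand(n_blocks, 512)         # VGG16 image features
objects_ = np.random.rand(n_blocks, 4)        # building-object features (area, perimeter, ...)
economy = np.random.rand(n_blocks, 16)        # socioeconomic features (POI)
labels = np.random.randint(0, 4, n_blocks)    # residential / industrial / commercial / public

X = np.hstack([activity, scene, objects_, economy])
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3, random_state=42)

clf = RandomForestClassifier(n_estimators=500, random_state=42)
clf.fit(X_train, y_train)
print("overall accuracy:", accuracy_score(y_test, clf.predict(X_test)))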
Show Figures
Figure 1. Spatial location and main road network in the main urban area of Xi'an.
Figure 2. Kernel density map generated by POIs in Xi'an.
Figure 3. Workflow of the ASOE-based methodology.
Figure 4. Network architecture of VGG16.
Figure 5. Verification effect of the optimal model (outside the brackets represents the predicted value, inside the brackets represents the true value, green letters represent correct predictions, and red letters represent incorrect predictions).
Figure 6. Model architecture of BERT.
Figure 7. Thematic maps of building footprints: (a) area of buildings; (b) perimeter of buildings; (c) floor of buildings; (d) ratio of buildings.
Figure 8. The logical structure of the ASOE method.
Figure 9. Map of Xi'an urban functional regions.
Figure 10. Classification accuracy of different data source inputs.
Figure 11. Comparison of classification accuracy for each category from different data sources.
Figure 12. Training and testing results of different CNNs.
Figure 13. Comparison with traditional methods and SOE.
12 pages, 523 KiB  
Article
Automated Ischemic Stroke Classification from MRI Scans: Using a Vision Transformer Approach
by Wafae Abbaoui, Sara Retal, Soumia Ziti and Brahim El Bhiri
J. Clin. Med. 2024, 13(8), 2323; https://doi.org/10.3390/jcm13082323 - 17 Apr 2024
Viewed by 1675
Abstract
Background: This study evaluates the performance of a vision transformer (ViT) model, ViT-b16, in classifying ischemic stroke cases from Moroccan MRI scans and compares it to the Visual Geometry Group 16 (VGG-16) model used in a prior study. Methods: A dataset of 342 MRI scans, categorized into ‘Normal’ and ’Stroke’ classes, underwent preprocessing using TensorFlow’s tf.data API. Results: The ViT-b16 model was trained and evaluated, yielding an impressive accuracy of 97.59%, surpassing the VGG-16 model’s 90% accuracy. Conclusions: This research highlights the ViT-b16 model’s superior classification capabilities for ischemic stroke diagnosis, contributing to the field of medical image analysis. By showcasing the efficacy of advanced deep learning architectures, particularly in the context of Moroccan MRI scans, this study underscores the potential for real-world clinical applications. Ultimately, our findings emphasize the importance of further exploration into AI-based diagnostic tools for improving healthcare outcomes. Full article
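The abstract mentions preprocessing with TensorFlow's tf.data API; a minimal sketch of such an input pipeline is shown below. The directory layout, image size, batch size, and augmentation choices are assumptions, not the authors' exact setup.

# Sketch: a tf.data input pipeline for two-class ("Normal" vs. "Stroke") MRI images.
# Directory structure, image size, and batch size are illustrative assumptions.
import tensorflow as tf

IMG_SIZE, BATCH = (224, 224), 32

train_ds = tf.keras.utils.image_dataset_from_directory(
    "mri_scans/train", image_size=IMG_SIZE, batch_size=BATCH, label_mode="binary")
val_ds = tf.keras.utils.image_dataset_from_directory(
    "mri_scans/val", image_size=IMG_SIZE, batch_size=BATCH, label_mode="binary")

normalize = tf.keras.layers.Rescaling(1.0 / 255)
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.05),
])

train_ds = (train_ds
            .map(lambda x, y: (augment(normalize(x), training=True), y),
                 num_parallel_calls=tf.data.AUTOTUNE)
            .prefetch(tf.data.AUTOTUNE))
val_ds = val_ds.map(lambda x, y: (normalize(x), y)).prefetch(tf.data.AUTOTUNE)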
Show Figures
Figure 1. Sample MRI scans.
Figure 2. Example of an augmented image.
Figure 3. ViT architecture.
Figure 4. ViT-b16 architecture.
Figure 5. Confusion matrix for the ViT-b16 model.
15 pages, 3246 KiB  
Article
Automatic Detection of Banana Maturity—Application of Image Recognition in Agricultural Production
by Liu Yang, Bo Cui, Junfeng Wu, Xuan Xiao, Yang Luo, Qianmai Peng and Yonglin Zhang
Processes 2024, 12(4), 799; https://doi.org/10.3390/pr12040799 - 16 Apr 2024
Cited by 1 | Viewed by 2889
Abstract
With the development of machine vision technology, deep learning and image recognition technology has become a research focus for agricultural product non-destructive inspection. During the ripening process, banana appearance and nutrients clearly change, causing damage and unjustified economic loss. A high-efficiency banana ripeness recognition model was proposed based on a convolutional neural network and transfer learning. Banana photos at different ripening stages were collected as a dataset, and data augmentation was applied. Then, weights and parameters of four models trained on the original ImageNet dataset were loaded and fine-tuned to fit our banana dataset. To investigate the learning rate’s effect on model performance, fixed and updating learning rate strategies are analyzed. In addition, four CNN models, ResNet 34, ResNet 101, VGG 16, and VGG 19, are trained based on transfer learning. Results show that a slower learning rate causes the model to converge slowly, and the training loss function oscillates drastically. With different learning rate updating strategies, MultiStepLR performs the best and achieves a better accuracy of 98.8%. Among the four models, ResNet 101 performs the best with the highest accuracy of 99.2%. This research provides a direct effective model and reference for intelligent fruit classification. Full article
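The learning-rate comparison in this abstract centres on PyTorch-style schedulers such as MultiStepLR; a minimal sketch of attaching one to a fine-tuned ResNet-101 is given below. The milestones, decay factor, class count, and training loop are illustrative assumptions.

# Sketch: fine-tune a pretrained ResNet-101 with a MultiStepLR learning-rate schedule (PyTorch).
# Milestones, gamma, the class count, and the epoch budget are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 5  # ripeness stages (assumed)

model = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # replace the ImageNet head

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10, 20], gamma=0.1)
criterion = nn.CrossEntropyLoss()

# for epoch in range(30):
#     for images, labels in train_loader:   # train_loader: DataLoader over the banana dataset
#         optimizer.zero_grad()
#         loss = criterion(model(images), labels)
#         loss.backward()
#         optimizer.step()
#     scheduler.step()                      # decay the learning rate at the milestone epochs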
Show Figures
Figure 1. General process from banana harvest to sale.
Figure 2. Images of bananas at different ripeness stages.
Figure 3. Different data augmentation effects: (a) original image, (b) rotation, (c) darkening, (d) brightening, (e) pretzel, and (f) blurring.
Figure 4. Schematic diagram of the CNN and transfer learning method.
Figure 5. Learning rate updating strategies.
Figure 6. Accuracy and loss with different fixed initial learning rates: (a) accuracy value; (b) training loss.
Figure 7. Accuracy and loss with different learning rate updating strategies: (a) accuracy value; (b) training loss.
Figure 8. Accuracy and loss with different models: (a) accuracy value; (b) training loss.
Figure 9. Confusion matrix of the test results.
Figure 10. Precision, accuracy, recall, and F1 score on the test set.
20 pages, 6012 KiB  
Article
A Novel Fault Diagnosis Strategy for Diaphragm Pumps Based on Signal Demodulation and PCA-ResNet
by Fanguang Meng, Zhiguo Shi and Yongxing Song
Sensors 2024, 24(5), 1578; https://doi.org/10.3390/s24051578 - 29 Feb 2024
Cited by 3 | Viewed by 1224
Abstract
The efficient and accurate identification of diaphragm pump faults is crucial for ensuring smooth system operation and reducing energy consumption. The structure of diaphragm pumps is complex and using traditional fault diagnosis strategies to extract typical fault characteristics is difficult, facing the risk of model overfitting and high diagnostic costs. In response to the shortcomings of traditional methods, this study innovatively combines signal demodulation methods with residual networks (ResNet) to propose an efficient fault diagnosis strategy for diaphragm pumps. By using a demodulation method based on principal component analysis (PCA), the vibration signal demodulation spectrum of the fault condition is obtained, the typical fault characteristics of the diaphragm pump are accurately extracted, and the sample features are enhanced, reducing the cost of fault diagnosis. Afterward, the PCA-ResNet model is applied to the fault diagnosis of diaphragm pumps. A reasonable model structure and advanced residual block design can effectively reduce the risk of model overfitting and improve the accuracy of fault diagnosis. Compared with the visual geometry group (VGG) 16, VGG19, ResNet50, and autoencoder models, the proposed model has improved accuracy by 35.89%, 80.27%, 2.72%, and 6.12%. Simultaneously, it has higher operational efficiency and lower loss rate, solving the problem of diagnostic lag in practical engineering. Finally, a model optimization strategy is proposed through model evaluation metrics and testing. The reasonable parameter range of the model is obtained, providing a reference and guarantee for further optimization of the model. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
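The PCA stage of the strategy described above can be sketched with scikit-learn, assuming each sample is a flattened demodulation-spectrum vector; the array shapes, placeholder data, and retained-variance threshold below are hypothetical.

# Sketch: reduce demodulation-spectrum samples with PCA before feeding them to a classifier.
# The input matrix and the number of retained components are hypothetical.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

spectra = np.random.rand(600, 2048)   # 600 samples, 2048 spectral bins (placeholder data)

scaled = StandardScaler().fit_transform(spectra)
pca = PCA(n_components=0.95)          # keep enough components to explain 95% of the variance
reduced = pca.fit_transform(scaled)

print(reduced.shape, pca.explained_variance_ratio_.sum())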
Show Figures
Figure 1. Fault diagnosis strategy for the diaphragm pump based on signal demodulation and PCA-ResNet.
Figure 2. Principle of DPCA.
Figure 3. Principle of PCA-ResNet.
Figure 4. Principle of residual learning and skip connection.
Figure 5. Diaphragm pump structure.
Figure 6. Diaphragm pump experimental platform.
Figure 7. Data enhancement flow path.
Figure 8. PCA results of two sample sets: (a) spectrum sample set; (b) DPCA sample set.
Figure 9. Results of the confusion matrix: (a) spectrum sample set; (b) DPCA sample set.
Figure 10. Confusion matrix results of the four models: (a) VGG16; (b) VGG19; (c) ResNet50; (d) PCA-ResNet.
Figure 11. CR of 20 test results for the two models: (a) line chart of the two models; (b) box plot of the two models.
Figure 12. Recall and F1 score of 20 test results for the two models: (a) recall of the two models; (b) F1 score of the two models.
Figure 13. Loss rate and running time of 20 test results for the two models: (a) loss rate of the two models; (b) running time of the two models.
Figure 14. PCA-ResNet model test results for different batch sizes.
Figure 15. PCA-ResNet model test results for different momentums.
Figure 16. PCA-ResNet model test results for different learning rates.
16 pages, 3807 KiB  
Article
VisFormers—Combining Vision and Transformers for Enhanced Complex Document Classification
by Subhayu Dutta, Subhrangshu Adhikary and Ashutosh Dhar Dwivedi
Mach. Learn. Knowl. Extr. 2024, 6(1), 448-463; https://doi.org/10.3390/make6010023 - 16 Feb 2024
Cited by 1 | Viewed by 3311
Abstract
Complex documents have text, figures, tables, and other elements. The classification of scanned copies of different categories of complex documents like memos, newspapers, letters, and more is essential for rapid digitization. However, this task is very challenging as most scanned complex documents look similar. This is because all documents have similar colors of the page and letters, similar textures for all papers, and very few contrasting features. Several attempts have been made in the state of the art to classify complex documents; however, only a few of these works have addressed the classification of complex documents with similar features, and among these, the performances could be more satisfactory. To overcome this, this paper presents a method to use an optical character reader to extract the texts. It proposes a multi-headed model to combine vision-based transfer learning and natural-language-based Transformers within the same network for simultaneous training for different inputs and optimizers in specific parts of the network. A subset of the Ryers Vision Lab Complex Document Information Processing dataset containing 16 different document classes was used to evaluate the performances. The proposed multi-headed VisFormers network classified the documents with up to 94.2% accuracy, while a regular natural-language-processing-based Transformer network achieved 83%, and vision-based VGG19 transfer learning could achieve only up to 90% accuracy. The model deployment can help sort the scanned copies of various documents into different categories. Full article
(This article belongs to the Section Visualization)
Show Figures
Figure 1. The framework workflow includes two key steps: document image classification using transfer learning, followed by OCR and Transformer-based text classification, ultimately integrating vision and OCR for robust classification.
Figure 2. The VisFormers model combines a Transformer and a pre-trained VGG-19 network for document classification, with specific architecture details.
Figure 3. The Grad-CAM heatmap visualization highlights critical regions in document image classification. The reddish color indicates a higher probability density of having decision-making features.
10 pages, 6249 KiB  
Article
AI-Based Detection of Oral Squamous Cell Carcinoma with Raman Histology
by Andreas Weber, Kathrin Enderle-Ammour, Konrad Kurowski, Marc C. Metzger, Philipp Poxleitner, Martin Werner, René Rothweiler, Jürgen Beck, Jakob Straehle, Rainer Schmelzeisen, David Steybe and Peter Bronsert
Cancers 2024, 16(4), 689; https://doi.org/10.3390/cancers16040689 - 6 Feb 2024
Cited by 2 | Viewed by 2131
Abstract
Stimulated Raman Histology (SRH) employs the stimulated Raman scattering (SRS) of photons at biomolecules in tissue samples to generate histological images. Subsequent pathological analysis allows for an intraoperative evaluation without the need for sectioning and staining. The objective of this study was to investigate a deep learning-based classification of oral squamous cell carcinoma (OSCC) and the sub-classification of non-malignant tissue types, as well as to compare the performances of the classifier between SRS and SRH images. Raman shifts were measured at wavenumbers k1 = 2845 cm−1 and k2 = 2930 cm−1. SRS images were transformed into SRH images resembling traditional H&E-stained frozen sections. The annotation of 6 tissue types was performed on images obtained from 80 tissue samples from eight OSCC patients. A VGG19-based convolutional neural network was then trained on 64 SRS images (and corresponding SRH images) and tested on 16. A balanced accuracy of 0.90 (0.87 for SRH images) and F1-scores of 0.91 (0.91 for SRH) for stroma, 0.98 (0.96 for SRH) for adipose tissue, 0.90 (0.87 for SRH) for squamous epithelium, 0.92 (0.76 for SRH) for muscle, 0.87 (0.90 for SRH) for glandular tissue, and 0.88 (0.87 for SRH) for tumor were achieved. The results of this study demonstrate the suitability of deep learning for the intraoperative identification of tissue types directly on SRS and SRH images. Full article
(This article belongs to the Special Issue Recent Advances in Oncology Imaging)
Show Figures
Figure 1. Annotations of tissue classes “Squamous epithelium”, “Stroma”, and “Tumor” on an SRH image (A) and transferred annotations on a corresponding SRS image (B), as well as tiles generated from the annotations with class labels “Squamous epithelium”, “Stroma”, and “Tumor” on an SRH image (C) and on the corresponding SRS image (D). Only tiles that intersect with an annotation by 99% were kept for the generation of the dataset.
Figure 2. Ground truth class labels for each tile (A) and predicted class labels for each tile (B) on a sample SRS image. Both true tiles with class label “Stroma” were classified correctly, whereas 6 tiles with class label “Tumor” were incorrectly classified as “Squamous epithelium” (5 tiles) and “Stroma” (1 tile). Ground truth class labels for each tile (C) and predicted class labels for each tile (D) on a sample SRH image. Both true tiles with class label “Stroma” were classified correctly, whereas 8 tiles with class label “Tumor” were incorrectly classified as “Squamous epithelium” (5 tiles) and “Stroma” (3 tiles).
Figure 3. Confusion matrices for the classification of the CNN on the SRS test dataset (left) and the corresponding SRH test dataset (right). The diverging colormap shows small values in dark blue with increasing brightness according to increasing values. Large values are shown in dark red with decreasing brightness according to increasing values.