Search Results (73)

Search Parameters:
Keywords = VGG 16

30 pages, 4558 KiB  
Article
AI-Powered Lung Cancer Detection: Assessing VGG16 and CNN Architectures for CT Scan Image Classification
by Rapeepat Klangbunrueang, Pongsathon Pookduang, Wirapong Chansanam and Tassanee Lunrasri
Informatics 2025, 12(1), 18; https://doi.org/10.3390/informatics12010018 - 11 Feb 2025
Viewed by 581
Abstract
Lung cancer is a leading cause of mortality worldwide, and early detection is crucial in improving treatment outcomes and reducing death rates. However, diagnosing medical images, such as Computed Tomography scans (CT scans), is complex and requires a high level of expertise. This study focuses on developing and evaluating the performance of Convolutional Neural Network (CNN) models, specifically the Visual Geometry Group 16 (VGG16) architecture, to classify lung cancer CT scan images into three categories: Normal, Benign, and Malignant. The dataset used consists of 1097 CT images from 110 patients, categorized according to these severity levels. The research methodology began with data collection and preparation, followed by training and testing the VGG16 model and comparing its performance with other CNN architectures, including Residual Network with 50 layers (ResNet50), Inception Version 3 (InceptionV3), and Mobile Neural Network Version 2 (MobileNetV2). The experimental results indicate that VGG16 achieved the highest classification performance, with a Test Accuracy of 98.18%, surpassing the other models. This accuracy highlights VGG16’s strong potential as a supportive diagnostic tool in medical imaging. However, a limitation of this study is the dataset size, which may reduce model accuracy when applied to new data. Future studies should consider increasing the dataset size, using Data Augmentation techniques, fine-tuning model parameters, and employing advanced models such as 3D CNN or Vision Transformers. Additionally, incorporating Gradient-weighted Class Activation Mapping (Grad-CAM) to interpret model decisions would enhance transparency and reliability. This study confirms the potential of CNNs, particularly VGG16, for classifying lung cancer CT images and provides a foundation for further development in medical applications. Full article
(This article belongs to the Section Medical and Clinical Informatics)
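For orientation, the transfer-learning setup described in this abstract can be sketched as follows. This is a minimal, hypothetical Keras example rather than the authors' published code: only the VGG16 backbone and the three output classes (Normal, Benign, Malignant) come from the abstract, while the preprocessing, frozen layers, classification head, and hyperparameters are assumptions.

# Minimal sketch: fine-tuning VGG16 for three-class CT image classification (Keras).
# Hyperparameters and the data pipeline are illustrative assumptions, not the authors' exact setup.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

NUM_CLASSES = 3  # Normal, Benign, Malignant

# ImageNet-pretrained convolutional base with the classification head removed.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the base for the initial training phase

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# train_ds / val_ds would be tf.data datasets of (image, one-hot label) batches, e.g. built with
# tf.keras.utils.image_dataset_from_directory(..., label_mode="categorical", image_size=(224, 224)).
# model.fit(train_ds, validation_data=val_ds, epochs=20)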
Show Figures
Figure 1. The three lung CT scan images.
Figure 2. Comparison of model accuracies: training and validation accuracy curves of the four deep learning architectures. (Source: authors' analysis from data, 2024).
Figure 3. The validation accuracy trends of the four deep learning architectures. (Source: authors' analysis from data, 2024).
Figure 4. Analysis of performance trajectories for the four deep learning architectures. (Source: authors' analysis from data, 2024).
Figure 5. Confusion matrices. (Source: authors' analysis from data, 2024).
Figure 6. Grad-CAM visualization using ResNet50 for lung CT analysis. (Source: authors' analysis from data, 2024).
Figure 7. Grad-CAM visualization using ResNet50 for a lung CT scan showing a benign prediction. (Source: authors' analysis from data, 2024).
Figure 8. Grad-CAM visualization using ResNet50 for a lung CT scan with a benign prediction. (Source: authors' analysis from data, 2024).
Figure 9. Grad-CAM visualization using InceptionV3 for a lung CT scan with a normal prediction. (Source: authors' analysis from data, 2024).
Figure 10. Grad-CAM visualization with InceptionV3 for lung CT image analysis, revealing scattered nodular opacities that indicate possible abnormalities. (Source: authors' analysis from data, 2024).
Figure 11. Grad-CAM visualization using InceptionV3 for a lung CT scan with a malignant prediction. (Source: authors' analysis from data, 2024).
Figure 12. Grad-CAM visualization using MobileNetV2 for a lung CT scan with a benign prediction. (Source: authors' analysis from data, 2024).
Figure 13. Grad-CAM visualization using MobileNetV2 for a lung CT scan with a malignant prediction. (Source: authors' analysis from data, 2024).
Figure 14. Grad-CAM visualization of MobileNetV2 for a lung malignancy prediction with a 0.1% confidence improvement. (Source: authors' analysis from data, 2024).
Figure 15. Grad-CAM visualization (VGG16) highlighting nodular opacities in the bilateral upper lungs with asymmetric right-side activation. (Source: authors' analysis from data, 2024).
Figure 16. Grad-CAM visualization of interstitial patterns in CT using VGG16. (Source: authors' analysis from data, 2024).
Figure 17. Grad-CAM visualization of VGG16 for a CT image with right-sided activation. (Source: authors' analysis from data, 2024).
41 pages, 1802 KiB  
Review
A Systematic Review of CNN Architectures, Databases, Performance Metrics, and Applications in Face Recognition
by Andisani Nemavhola, Colin Chibaya and Serestina Viriri
Information 2025, 16(2), 107; https://doi.org/10.3390/info16020107 - 5 Feb 2025
Viewed by 877
Abstract
This study provides a comparative evaluation of face recognition databases and Convolutional Neural Network (CNN) architectures used in training and testing face recognition systems. The databases span from early datasets like Olivetti Research Laboratory (ORL) and Facial Recognition Technology (FERET) to more recent collections such as MegaFace and Ms-Celeb-1M, offering a range of sizes, subject diversity, and image quality. Older databases, such as ORL and FERET, are smaller and cleaner, while newer datasets enable large-scale training with millions of images but pose challenges like inconsistent data quality and high computational costs. The study also examines CNN architectures, including FaceNet and Visual Geometry Group 16 (VGG16), which show strong performance on large datasets like Labeled Faces in the Wild (LFW) and VGGFace, achieving accuracy rates above 98%. In contrast, earlier models like Support Vector Machine (SVM) and Gabor Wavelets perform well on smaller datasets but lack scalability for larger, more complex datasets. The analysis highlights the growing importance of multi-task learning and ensemble methods, as seen in Multi-Task Cascaded Convolutional Networks (MTCNNs). Overall, the findings emphasize the need for advanced algorithms capable of handling large-scale, real-world challenges while optimizing accuracy and computational efficiency in face recognition systems. Full article
(This article belongs to the Special Issue Machine Learning and Data Mining for User Classification)
Show Figures
Figure 1. Face recognition steps [26].
Figure 2. PRISMA-ScR diagram showing all the steps taken to filter out articles [8].
Figure 3. Databases used in face recognition systems.
Figure 4. ORL database samples.
Figure 5. FERET database samples.
Figure 6. AR database samples.
Figure 7. XM2VTS database samples.
Figure 8. FGRC database samples.
Figure 9. LFW database samples.
Figure 10. CMU Multi-PIE database samples.
Figure 11. VGG architecture.
24 pages, 9651 KiB  
Article
Fault Detection in Induction Machines Using Learning Models and Fourier Spectrum Image Analysis
by Kevin Barrera-Llanga, Jordi Burriel-Valencia, Angel Sapena-Bano and Javier Martinez-Roman
Sensors 2025, 25(2), 471; https://doi.org/10.3390/s25020471 - 15 Jan 2025
Viewed by 934
Abstract
Induction motors are essential components in industry due to their efficiency and cost-effectiveness. This study presents an innovative methodology for automatic fault detection by analyzing images generated from the Fourier spectra of current signals using deep learning techniques. A new preprocessing technique incorporating a distinctive background to enhance spectral feature learning is proposed, enabling the detection of four types of faults: healthy motor coupled to a generator with a broken bar (HGB), broken rotor bar (BRB), race bearing fault (RBF), and bearing ball fault (BBF). The dataset was generated from three-phase signals of an induction motor controlled by a Direct Torque Controller under various operating conditions (20–1500 rpm with 0–100% load), resulting in 4251 images. The model, based on a Visual Geometry Group (VGG) architecture with 19 layers, achieved an overall accuracy of 98%, with specific accuracies of 99% for RAF, 100% for BRB, 100% for RBF, and 95% for BBF. Model interpretability was assessed using explainability techniques, which allowed for the identification of specific learning patterns. This analysis introduces a new approach by demonstrating how different convolutional blocks capture particular features: the first convolutional block captures signal shape, while the second identifies background features. Additionally, distinct convolutional layers were associated with each fault type: layer 9 for RAF, layer 13 for BRB, layer 16 for RBF, and layer 14 for BBF. This methodology offers a scalable solution for predictive maintenance in induction motors, effectively combining signal processing, computer vision, and explainability techniques. Full article
(This article belongs to the Special Issue Feature Papers in Fault Diagnosis & Sensors 2024)
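A rough sketch of the spectrum-to-image idea described in this abstract is given below, purely as an illustration: the sampling rate, the 100 Hz analysis band, and the way the magnitude spectrum is rendered onto a 224 × 224 canvas are assumptions inferred from the abstract and figure captions, not the authors' actual pipeline (which additionally applies Savitzky–Golay smoothing and a distinctive background).

# Sketch: turn a one-phase current signal into a 224 x 224 FFT-magnitude image for a CNN.
# Sampling rate, frequency range, and rendering choices are illustrative assumptions.
import numpy as np
from PIL import Image

def fft_to_image(signal: np.ndarray, fs: float, f_max: float = 100.0) -> Image.Image:
    """Compute the magnitude spectrum (in dB) up to f_max Hz and render it as a grayscale image."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = freqs <= f_max
    mag_db = 20.0 * np.log10(spectrum[mask] + 1e-12)

    # Normalise to [0, 1] and draw the spectrum as a filled column plot on a blank canvas.
    mag = (mag_db - mag_db.min()) / (np.ptp(mag_db) + 1e-12)
    height, width = 224, 224
    canvas = np.zeros((height, width), dtype=np.uint8)
    cols = np.linspace(0, width - 1, mask.sum()).astype(int)
    rows = (height - 1 - mag * (height - 1)).astype(int)
    for c, r in zip(cols, rows):
        canvas[r:, c] = 255  # fill below the curve so the spectral shape is easy for a CNN to learn
    return Image.fromarray(canvas)

# Example with a synthetic 50 Hz current sampled at 10 kHz:
fs = 10_000
t = np.arange(0, 1.0, 1.0 / fs)
img = fft_to_image(np.sin(2 * np.pi * 50 * t), fs)
# img.save("fft_sample.png")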
Show Figures
Figure 1. Overview of the background-enhanced FFT signal processing for fault detection. In (a), the original current FFT signal is displayed. In (b), the distinctive background is created. In (c), the background is combined with the FFT signal, forming a unified representation. (d) shows the final background-enhanced FFT image. In (e), the image is resized to 224 × 224 pixels. Finally, (f) represents the input to a CNN model for automatic fault detection.
Figure 2. Fourier spectra of the current signals for each fault type at 1500 rpm and 100% load. (A) HGB, (B) BRB, (C) RBF, and (D) BBF. The x axis represents the frequency in Hz (up to 180 Hz for visualization), and the y axis represents the magnitude in decibels (dB). The model analyzes frequencies up to 100 Hz.
Figure 3. Transformation process of the FFT image. (A) Original FFT signal. (B) Smoothed signal after applying the Savitzky–Golay filter of degree 3. (C) Smoothed signal with a degraded background, including horizontal and vertical reference stripes. (D) Final resized image (224 × 224 pixels) prepared for input to the CNN model.
Figure 4. Training and validation loss, along with training accuracy, over 81 epochs. The loss curves indicate the convergence behavior of the model, while the accuracy curve indicates performance improvements, reaching a validation accuracy of 0.991 at the final epoch.
Figure 5. Images corresponding to the failure classes (A) HGB, (B) BRB, (C) RBF, and (D) BBF obtained under operating conditions of 1500 rpm and 100% load (row 1). Row 2 presents the saliency maps generated for each image, highlighting the relevant areas used by the model to perform the classification.
Figure 6. Visualization of the model's internal interpretability using GradCAM in the VGG19 architecture, which consists of 16 convolutional layers distributed across 6 blocks (the sixth corresponds to classification). The activation maps generated for the classes (A) HGB, (B) BRB, (C) RBF, and (D) BBF highlight the regions relevant for prediction, where blue represents lower activation, red intermediate activation, and yellow maximum activation.
25 pages, 8832 KiB  
Article
3D-CNN with Multi-Scale Fusion for Tree Crown Segmentation and Species Classification
by Jiayao Wang, Zhen Zhen, Yuting Zhao, Ye Ma and Yinghui Zhao
Remote Sens. 2024, 16(23), 4544; https://doi.org/10.3390/rs16234544 - 4 Dec 2024
Viewed by 957
Abstract
Natural secondary forests play a crucial role in global ecological security, climate change mitigation, and biodiversity conservation. However, accurately delineating individual tree crowns and identifying tree species in dense natural secondary forests remains a challenge. This study combines deep learning with traditional image segmentation methods to improve individual tree crown detection and species classification. The approach utilizes hyperspectral data, unmanned aerial vehicle laser scanning data, and ground survey data from Maoershan Forest Farm in Heilongjiang Province, China. The study consists of two main processes: (1) combining semantic segmentation algorithms (U-Net and Deeplab V3 Plus) with the watershed transform (WTS) for tree crown detection (U-WTS and D-WTS algorithms); (2) resampling the original images to different pixel densities (16 × 16, 32 × 32, and 64 × 64 pixels) and inputting them into five 3D-CNN models (ResNet10, ResNet18, ResNet34, ResNet50, VGG16). For tree species classification, a multi-scale fusion branch (MSFB) was combined with these CNN models. The results show that the U-WTS algorithm achieved a recall of 0.809, precision of 0.885, and an F-score of 0.845. ResNet18 with a pixel density of 64 × 64 pixels achieved the highest overall accuracy (OA) of 0.916, an improvement of 0.049 over the original images. After incorporating the MSFB, the OA improved by approximately 0.04 across all models, with only a 6% increase in model parameters. Notably, the floating-point operations (FLOPs) of ResNet18 + MSFB were only one-eighth of those of ResNet18 with 64 × 64 pixels, while achieving similar accuracy (OA: 0.912 vs. 0.916). This framework offers a scalable solution for large-scale tree species distribution mapping and forest resource inventories. Full article
Show Figures
Figure 1. Overview map of the study area: (a) location of Heilongjiang Province in a map of the administrative areas of China; (b) aerial view of Maoershan Experimental Forest Farm, where the numbered areas 1, 2, 3, 4, and 5 are drone flight zones; aerial views of unmanned aerial vehicle (UAV) flight areas No. 4 (c) and No. 5 (d); and aerial photos of (e) mixed coniferous–broadleaf forest and (f) mixed broadleaf forest.
Figure 2. Flowchart of the research process. Note: CHM, canopy height model; GLCM, gray level co-occurrence matrix; RFE, recursive feature elimination; WST, watershed transform.
Figure 3. Flowchart of the U-net + watershed transform algorithm.
Figure 4. Mean spectral reflectance curves of seven tree species groups: birch, elm, Korean pine, Manchurian ash, Manchurian walnut, other coniferous trees, and other broadleaf trees.
Figure 5. Relationship between tree crown images of different pixel densities and feature map sizes obtained by the convolutional neural network model; the original images were resampled to different pixel densities: (a) 8 × 8 pixels, (b) 16 × 16 pixels, (c) 32 × 32 pixels, and (d) 64 × 64 pixels.
Figure 6. Schematic of the ResNet18 model + multi-scale fusion branch module (MSFB) structure. Note: C, channel; D, depth; W, width; and H, height.
Figure 7. Individual tree crown delineation results of three algorithms within a 15 × 15 m plot: (a) the reference tree crown; the results of the (b) U-WST, (c) D-WST, and (d) WST algorithms. Note: CHM, canopy height model.
Figure 8. Recursive feature elimination and feature importance ranking based on the random forest model.
Figure 9. Confusion matrix of the ResNet18 model at four pixel densities: (a) 8 × 8 pixels, (b) 16 × 16 pixels, (c) 32 × 32 pixels, and (d) 64 × 64 pixels. Note: PA represents Producer's Accuracy, UA represents User's Accuracy, KP represents Korean pine, BR represents birch, MA represents Manchurian ash, MW represents Manchurian walnut, OC represents other coniferous trees, and OB represents other broad-leaved trees; the intensity of the color represents the magnitude of the value.
Figure 10. Maps of the distribution of tree species groups in part of unmanned aerial vehicle (UAV) flight region No. 4: (a) UAV flight region No. 4 with a background of hyperspectral images, (b) enlarged view of a local area, and (c) map of tree species prediction results.
Figure 11. Spatial importance and feature importance based on the Shapley Additive exPlanations (SHAP) method: (a) tree crown segmentation; (b) tree species classification; (c) a stack of 40 channels of the original image; (d) a stack of 40 channels after using the SHAP method; (e) SHAP feature importance based on seven tree species groups.
Figure 12. Performance of the U-WST algorithm in identifying tree canopies in mixed broadleaf forests. Note: CHM, canopy height model.
25 pages, 18179 KiB  
Article
ES-L2-VGG16 Model for Artificial Intelligent Identification of Ice Avalanche Hidden Danger
by Daojing Guo, Minggao Tang, Qiang Xu, Guangjian Wu, Guang Li, Wei Yang, Zhihang Long, Huanle Zhao and Yu Ren
Remote Sens. 2024, 16(21), 4041; https://doi.org/10.3390/rs16214041 - 30 Oct 2024
Viewed by 987
Abstract
Ice avalanches (IAs) are highly concealed and sudden and can cause severe disasters. The early identification of IA hidden danger is of great value for disaster prevention and mitigation. However, identifying it by site investigation or manual remote sensing is difficult and inefficient. So, an artificial intelligence method for the identification of IA hidden dangers using a deep learning model has been proposed, with the glacier area of the Yarlung Tsangpo River Gorge in Nyingchi selected for identification and validation. First, through engineering geological investigations, three key identification indices for IA hidden dangers are established: glacier source, slope angle, and cracks. Sentinel-2A satellite data, Google Earth, and ArcGIS are used to extract these indices and construct a feature dataset for the study and validation area. Next, key performance metrics, such as training accuracy, validation accuracy, test accuracy, and loss rates, are compared to assess the performance of the ResNet50 (Residual Neural Network 50) and VGG16 (Visual Geometry Group 16) models. The VGG16 model (96.09% training accuracy) is selected and optimized, using Early Stopping (ES) to prevent overfitting and L2 regularization techniques (L2) to add weight penalties, which constrained model complexity and enhanced simplicity and generalization, ultimately developing the ES-L2-VGG16 (Early Stopping-L2 Norm Regularization Techniques-Visual Geometry Group 16) model (98.61% training accuracy). Lastly, during the validation phase, the model is applied to the Yarlung Tsangpo River Gorge glacier area on the Tibetan Plateau (TP), identifying a total of 100 IA hidden danger areas, with average slopes ranging between 34° and 48°. The ES-L2-VGG16 model achieves an accuracy of 96% in identifying these hidden danger areas, ensuring the precise identification of IA dangers. This study offers a new intelligent technical method for identifying IA hidden danger, with clear advantages and promising application prospects. Full article
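The two optimizations named in this abstract, early stopping and L2 weight regularization on a VGG16 backbone, can be sketched roughly as follows. The penalty strength, patience, classification head, and binary hidden-danger label are illustrative assumptions, not the authors' configuration.

# Sketch: VGG16 classifier head with L2 weight penalties plus an EarlyStopping callback (Keras).
# Regularization strength, patience, and the binary "hidden danger / no danger" head are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 penalty constrains weight magnitudes
    layers.Dense(1, activation="sigmoid",
                 kernel_regularizer=regularizers.l2(1e-4)),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)  # halt training before overfitting sets in

# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[early_stop])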
Show Figures
Figure 1. Study and validation area. (The dataset was sourced from the National Earth System Science Data Center (http://www.geodata.cn, accessed on 1 May 2024) and the Second Glacier Inventory of China (V1.0) and processed using ArcGIS software 3.0.1.)
Figure 2. Terrain and geomorphology of the verification area. (The DEM data (5 m resolution) were sourced from the Shuttle Radar Topography Mission (SRTM) (http://earthexplorer.usgs.gov/, accessed on 20 May 2024) and processed using ArcGIS software 3.0.1.)
Figure 3. Three-dimensional stereo model of the Sedongpu Valley.
Figure 4. Technical processes and methods of IA hidden danger intelligent identification.
Figure 5. IA hidden danger feature fusion and training images. (a) The steepness of the slope increases as the color transitions from yellow to red, indicating a change from a gentle to a steep gradient; (b) the crack in the glacier IA area; (c) the training set image that integrates slope and cracks for comprehensive visualization.
Figure 6. Development of IA in the Sedongpu Valley. (a–c) Satellite images of the Sedongpu Valley on different dates in 2016, 2017, and 2018, highlighting changes in glacier flow and landslide areas; (a1–c1) corresponding schematic maps for the same dates, marking regions such as glacier flow, snow cover, river channels, and landslide debris.
Figure 7. The slope zoning statistics of the Sedongpu Valley.
Figure 8. Slope and crack evolution processes in different glacier source areas. (a) Slope and cracks in the Sedongpu glacier source area (4 December 2017); (b) slope and cracks in the Chamoli glacier source area (5 February 2021); (c) slope and cracks in the Marmolada glacier source area (photo provided by Italy's Alpine Rescue on 3 July 2022) (Google Earth Pro map, retrieved from https://www.google.com/earth/, accessed on 8 July 2024).
Figure 9. Schematic conceptual diagram of IA hidden danger.
Figure 10. Model construction principal processes: (a) ResNet50 model network architecture; (b) VGG16 model network structure.
Figure 11. ResNet50 model training results: (a) model accuracy; (b) model loss rate.
Figure 12. VGG16 model training results: (a) model accuracy; (b) model loss rate.
Figure 13. Confusion matrices for the model test set recognition results: (a) ResNet50; (b) VGG16.
Figure 14. Comparison of accuracy between the VGG16 and D-VGG16 models: (a) training set accuracy; (b) validation set accuracy.
Figure 15. Comparison of loss rate between the VGG16 and ES-VGG16 models: (a) training set loss; (b) validation set loss.
Figure 16. Training performance of the L2-VGG16 model on the validation and test sets: (a) validation set accuracy of the VGG16 and L2-VGG16 models; (b) L2-VGG16 model test set training results.
Figure 17. Training results of the ES-L2-VGG16 model.
Figure 18. IA hidden danger identification process overview: (a) a satellite image of a glacier, with glacier boundaries marked in green and identified cracks highlighted by red ellipses; (b) gradient changes in the glacier area, marked by blue rectangles indicating slopes; (c) the area where the green boundary, red ellipse, and blue rectangle intersect is an IA hidden danger area. (Google Earth Pro map, retrieved from https://www.google.com/earth/, accessed on 15 July 2024.)
Figure 19. Extraction process: (a) slope recognition process; (b) crack extraction process.
Figure 20. Distribution of IA danger levels in the Yarlung Tsangpo River Gorge.
Figure 21. IA hidden danger identification and verification results.
Figure 22. Remote sensing interpretation of glacier areas: (a) automatic identification results for regions (1) and (2); (b) remote sensing satellite image of area (1); (c) remote sensing satellite image of area (2). (Google Earth Pro map, retrieved from https://www.google.com/earth/, accessed on 12 July 2024.)
25 pages, 6970 KiB  
Article
Urban Land Use Classification Model Fusing Multimodal Deep Features
by Yougui Ren, Zhiwei Xie and Shuaizhi Zhai
ISPRS Int. J. Geo-Inf. 2024, 13(11), 378; https://doi.org/10.3390/ijgi13110378 - 30 Oct 2024
Viewed by 1275
Abstract
Urban land use classification plays a significant role in urban studies and provides key guidance for urban development. However, existing methods predominantly rely on either raster structure deep features through convolutional neural networks (CNNs) or topological structure deep features through graph neural networks (GNNs), making it challenging to comprehensively capture the rich semantic information in remote sensing images. To address this limitation, we propose a novel urban land use classification model by integrating both raster and topological structure deep features to enhance the accuracy and robustness of the classification model. First, we divide the urban area into block units based on road network data and further subdivide these units using the fractal network evolution algorithm (FNEA). Next, the K-nearest neighbors (KNN) graph construction method with adaptive fusion coefficients is employed to generate both global and local graphs of the blocks and sub-units. The spectral features and subgraph features are then constructed, and a graph convolutional network (GCN) is utilized to extract the node relational features from both the global and local graphs, forming the topological structure deep features while aggregating local features into global ones. Subsequently, VGG-16 (Visual Geometry Group 16) is used to extract the image convolutional features of the block units, obtaining the raster structure deep features. Finally, the transformer is used to fuse both topological and raster structure deep features, and land use classification is completed using the softmax function. Experiments were conducted using high-resolution Google images and Open Street Map (OSM) data, with study areas on the third ring road of Shenyang and the fourth ring road of Chengdu. The results demonstrate that the proposed method improves the overall accuracy and Kappa coefficient by 9.32% and 0.17, respectively, compared to single deep learning models. Incorporating subgraph structure features further enhances the overall accuracy and Kappa by 1.13% and 0.1. The adaptive KNN graph construction method achieves accuracy comparable to that of the empirical threshold method. This study enables accurate large-scale urban land use classification with reduced manual intervention, improving urban planning efficiency. The experimental results verify the effectiveness of the proposed method, particularly in terms of classification accuracy and feature representation completeness. Full article
Show Figures
Figure 1. The methodological flow of the proposed approach.
Figure 2. Schematic diagram of subgraph structure construction.
Figure 3. VGG-16 structure.
Figure 4. Data preprocessing.
Figure 5. Image segmentation and subgraph construction results.
Figure 6. Comparison of the classification results of different methods for the Shenyang third ring dataset.
Figure 7. Comparison of the classification results of the different methods for the Chengdu fourth ring dataset.
Figure 8. Localized details of the Shenyang third ring ablation experiment.
Figure 9. Localized details of the Chengdu fourth ring ablation experiment.
Figure 10. Results of different convolution layers of the GCN.
Figure 11. Results of different batch sizes of VGG-16.
Figure 12. Results of different encoder layers of the transformer.
16 pages, 8896 KiB  
Article
Automatic Paddy Planthopper Detection and Counting Using Faster R-CNN
by Siti Khairunniza-Bejo, Mohd Firdaus Ibrahim, Marsyita Hanafi, Mahirah Jahari, Fathinul Syahir Ahmad Saad and Mohammad Aufa Mhd Bookeri
Agriculture 2024, 14(9), 1567; https://doi.org/10.3390/agriculture14091567 - 10 Sep 2024
Viewed by 999
Abstract
Counting planthoppers manually is laborious and yields inconsistent results, particularly when dealing with species with similar features, such as the brown planthopper (Nilaparvata lugens; BPH), whitebacked planthopper (Sogatella furcifera; WBPH), zigzag leafhopper (Maiestas dorsalis; ZIGZAG), and green leafhopper (Nephotettix malayanus and Nephotettix virescens; GLH). Most of the available automated counting methods are limited to populations of a small density and often do not consider those with a high density, which require more complex solutions due to overlapping objects. Therefore, this research presents a comprehensive assessment of an object detection algorithm specifically developed to precisely detect and quantify planthoppers. It utilises annotated datasets obtained from sticky light traps, comprising 1654 images across four distinct classes of planthoppers and one class of benign insects. The datasets were subjected to data augmentation and utilised to train four convolutional object detection models based on transfer learning. The results indicated that Faster R-CNN VGG 16 outperformed other models, achieving a mean average precision (mAP) score of 97.69% and exhibiting exceptional accuracy in classifying all planthopper categories. The correctness of the model was verified by entomologists, who confirmed a classification and counting accuracy rate of 98.84%. Nevertheless, the model fails to recognise certain samples because of the high density of the population and the significant overlap among them. This research effectively resolved the issue of low- to medium-density samples by achieving very precise and rapid detection and counting. Full article
(This article belongs to the Special Issue Advanced Image Processing in Agricultural Applications)
Show Figures
Figure 1. The process involved in this research.
Figure 2. The transparent box used to house the light trap. Each side of the box has hundreds of small holes.
Figure 3. Sample of a sticky light trap image placed on top of a white paper.
Figure 4. Sample of four major classes of planthoppers: (a) BPH; (b) GLH; (c) WBPH; and (d) ZIGZAG.
Figure 5. Sample of images in the BENIGN class, exhibiting similarities with major planthopper classes.
Figure 6. Examples of annotated images using the LabelImg software.
Figure 7. Graphical user interface of the developed verification web system.
Figure 8. The user interface of the system used to capture the image and execute the counting process.
Figure 9. mAP values for each epoch.
Figure 10. Loss for each epoch.
Figure 11. Results of detected planthoppers using Faster R-CNN with VGG16.
Figure 12. Detection errors. Red dashed-line squares indicate false positive cases, while blue dashed-line squares indicate false negative cases.
Figure 13. Sample from image no. 134 of Light Trap 2, where the total number of undetected samples is 15 and there are 0 misclassified cases. Undetected classes are labelled by a blue dashed-line box.
15 pages, 9305 KiB  
Article
Symmetric Keys for Lightweight Encryption Algorithms Using a Pre–Trained VGG16 Model
by Ala’a Talib Khudhair, Abeer Tariq Maolood and Ekhlas Khalaf Gbashi
Telecom 2024, 5(3), 892-906; https://doi.org/10.3390/telecom5030044 - 3 Sep 2024
Cited by 1 | Viewed by 1558
Abstract
The main challenge within lightweight cryptographic symmetric key systems is striking a delicate balance between security and efficiency. Consequently, the key issue revolves around crafting symmetric key schemes that are both lightweight and robust enough to safeguard resource-constrained environments. This paper presents a new method of making long symmetric keys for lightweight algorithms. A pre–trained convolutional neural network (CNN) model called visual geometry group 16 (VGG16) is used to take features from two images, turn them into binary strings, make the two strings equal by cutting them down to the length of the shorter string, and then use XOR to make a symmetric key from the binary strings from the two images. The key length depends on the number of features in the two images. Compared to other lightweight algorithms, we found that this method greatly decreases the time required to generate a symmetric key and improves defense against brute force attacks by creating exceptionally long keys. The method successfully passed all 15 tests when evaluated using the NIST SP 800-22 statistical test suite and all Basic Five Statistical Tests. To the best of our knowledge, this is the first research to explore the generation of a symmetric encryption key using a pre–trained VGG16 model. Full article
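A simplified sketch of the key-generation idea outlined in this abstract is given below: extract VGG16 features from two images, binarise them, truncate to the shorter bit string, and XOR the two. The pooled 512-dimensional feature vector, the zero threshold, and the 224 × 224 preprocessing are assumptions made for illustration; in the paper the key length varies with the number of features extracted from the two images.

# Sketch: derive a symmetric key by XOR-ing binarised VGG16 feature vectors of two images.
# The binarisation threshold and image preprocessing are illustrative assumptions.
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing import image

extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")  # 512-dim feature vector

def image_to_bits(path: str) -> np.ndarray:
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    features = extractor.predict(x, verbose=0).ravel()
    return (features > 0).astype(np.uint8)  # binarise: 1 where the feature is positive

def generate_key(path_a: str, path_b: str) -> str:
    bits_a, bits_b = image_to_bits(path_a), image_to_bits(path_b)
    n = min(len(bits_a), len(bits_b))  # cut both strings down to the shorter length
    key_bits = np.bitwise_xor(bits_a[:n], bits_b[:n])
    return "".join(map(str, key_bits))

# key = generate_key("image_one.jpg", "image_two.jpg")  # hypothetical input images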
Show Figures
Figure 1. Symmetric key encryption.
Figure 2. AES encryption algorithm.
Figure 3. DES encryption algorithm.
Figure 4. Blowfish encryption algorithm.
Figure 5. Conventional Machine Learning vs. Transfer Learning.
Figure 6. The graphic depicts the "top" portion of the model being removed. A 3D stack of feature maps convolves the remaining pre-trained output layers.
Figure 7. The results of matrix multiplication are summed onto the feature map.
Figure 8. Activation function (ReLU).
Figure 9. Types of pooling.
Figure 10. Flowchart of the proposed symmetric key generation using the pre-trained VGG16 model.
10 pages, 1304 KiB  
Article
Age and Sex Estimation in Children and Young Adults Using Panoramic Radiographs with Convolutional Neural Networks
by Tuğçe Nur Şahin and Türkay Kölüş
Appl. Sci. 2024, 14(16), 7014; https://doi.org/10.3390/app14167014 - 9 Aug 2024
Cited by 1 | Viewed by 1339
Abstract
Image processing with artificial intelligence has shown significant promise in various medical imaging applications. The present study aims to evaluate the performance of 16 different convolutional neural networks (CNNs) in predicting age and gender from panoramic radiographs in children and young adults. The networks tested included DarkNet-19, DarkNet-53, Inception-ResNet-v2, VGG-19, DenseNet-201, ResNet-50, GoogLeNet, VGG-16, SqueezeNet, ResNet-101, ResNet-18, ShuffleNet, MobileNet-v2, NasNet-Mobile, AlexNet, and Xception. These networks were trained on a dataset of 7336 radiographs from individuals aged between 5 and 21. Age and gender estimation accuracy and mean absolute age prediction errors were evaluated on 340 radiographs. Statistical analyses were conducted using Shapiro–Wilk, one-way ANOVA, and Tukey tests (p < 0.05). The gender prediction accuracy and the mean absolute age prediction error were, respectively, 87.94% and 0.582 for DarkNet-53, 86.18% and 0.427 for DarkNet-19, 84.71% and 0.703 for GoogLeNet, 81.76% and 0.756 for DenseNet-201, 81.76% and 1.115 for ResNet-18, 80.88% and 0.650 for VGG-19, 79.41% and 0.988 for SqueezeNet, 79.12% and 0.682 for Inception-Resnet-v2, 78.24% and 0.747 for ResNet-50, 77.35% and 1.047 for VGG-16, 76.47% and 1.109 for Xception, 75.88% and 0.977 for ResNet-101, 73.24% and 0.894 for ShuffleNet, 72.35% and 1.206 for AlexNet, 71.18% and 1.094 for NasNet-Mobile, and 62.94% and 1.327 for MobileNet-v2. No statistical difference in age prediction performance was found between DarkNet-19 and DarkNet-53, which demonstrated the most successful age estimation results. Despite these promising results, all tested CNNs performed below 90% accuracy and were not deemed suitable for clinical use. Future studies should continue with more-advanced networks and larger datasets. Full article
(This article belongs to the Special Issue Oral Diseases: Diagnosis and Therapy)
Show Figures
Figure 1. Detailed prediction distribution of test radiographs by age and sex groups for the DarkNet-53 network. Correct predictions are shown in shades of blue, while incorrect predictions are displayed in shades of red. For example, all 10 radiographs of 5-year-old girls (05F) were correctly predicted by DarkNet-53. However, among the ten radiographs of 6-year-old females (06F), only one was correctly predicted, while one was predicted as a 5-year-old male (05M), five as 6-year-old males (06M), one as a 7-year-old female (07F), and two as 8-year-old females (08F).
Figure 2. Distribution of predictions for the GoogLeNet network. Correct predictions are shown in shades of blue, while incorrect predictions are displayed in shades of red.
23 pages, 7657 KiB  
Article
A Multi-Feature Fusion Method for Urban Functional Regions Identification: A Case Study of Xi’an, China
by Zhuo Wang, Jianjun Bai and Ruitao Feng
ISPRS Int. J. Geo-Inf. 2024, 13(5), 156; https://doi.org/10.3390/ijgi13050156 - 7 May 2024
Cited by 2 | Viewed by 1915
Abstract
Research on the identification of urban functional regions is of great significance for the understanding of urban structure, spatial planning, resource allocation, and promoting sustainable urban development. However, achieving high-precision urban functional region recognition has always been a research challenge in this field. For this purpose, this paper proposes an urban functional region identification method called ASOE (activity–scene–object–economy), which integrates the features from multi-source data to perceive the spatial differentiation of urban human and geographic elements. First, we utilize VGG16 (Visual Geometry Group 16) to extract high-level semantic features from the remote sensing images with 1.2 m spatial resolution. Then, using scraped building footprints, we extract building object features such as area, perimeter, and structural ratios. Socioeconomic features and population activity features are extracted from Point of Interest (POI) and Weibo data, respectively. Finally, integrating the aforementioned features and using the Random Forest method for classification, the identification results of urban functional regions in the main urban area of Xi’an are obtained. After comparing with the actual land use map, our method achieves an identification accuracy of 91.74%, which is higher than other comparative methods, making it effectively identify four typical urban functional regions in the main urban area of Xi’an (e.g., residential regions, industrial regions, commercial regions, and public regions). The research indicates that the method of fusing multi-source data can fully leverage the advantages of big data, achieving high-precision identification of urban functional regions. Full article
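The final fusion-and-classification step described here amounts to concatenating the four ASOE feature groups per block and training a Random Forest. The sketch below assumes the features have already been extracted; the array names, dimensions, and placeholder data are hypothetical.

# Sketch: classify urban blocks by fusing the four ASOE feature groups with a Random Forest.
# Arrays and dimensions are hypothetical placeholders for the extracted features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

n_blocks = 1000
activity = np.random.rand(n_blocks, 8)        # population-activity features (Weibo)
scene = np.random.rand(n_blocks, 512)         # VGG16 image features
objects_ = np.random.rand(n_blocks, 4)        # building-object features (area, perimeter, ...)
economy = np.random.rand(n_blocks, 16)        # socioeconomic features (POI)
labels = np.random.randint(0, 4, n_blocks)    # residential / industrial / commercial / public

X = np.hstack([activity, scene, objects_, economy])
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3, random_state=42)

clf = RandomForestClassifier(n_estimators=500, random_state=42)
clf.fit(X_train, y_train)
print("overall accuracy:", accuracy_score(y_test, clf.predict(X_test)))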
Show Figures
Figure 1. Spatial location and main road network in the main urban area of Xi'an.
Figure 2. Kernel density map generated by POIs in Xi'an.
Figure 3. Workflow of the ASOE-based methodology.
Figure 4. Network architecture of VGG16.
Figure 5. Verification effect of the optimal model (outside the brackets represents the predicted value, inside the brackets represents the true value, green letters represent correct predictions, and red letters represent incorrect predictions).
Figure 6. Model architecture of BERT.
Figure 7. Thematic maps of building footprints: (a) area of buildings; (b) perimeter of buildings; (c) floor of buildings; (d) ratio of buildings.
Figure 8. The logical structure of the ASOE method.
Figure 9. Map of Xi'an urban functional regions.
Figure 10. Classification accuracy of different data source inputs.
Figure 11. Comparison of classification accuracy for each category from different data sources.
Figure 12. Training and testing results of different CNNs.
Figure 13. Comparison with traditional methods and SOE.
12 pages, 523 KiB  
Article
Automated Ischemic Stroke Classification from MRI Scans: Using a Vision Transformer Approach
by Wafae Abbaoui, Sara Retal, Soumia Ziti and Brahim El Bhiri
J. Clin. Med. 2024, 13(8), 2323; https://doi.org/10.3390/jcm13082323 - 17 Apr 2024
Viewed by 1675
Abstract
Background: This study evaluates the performance of a vision transformer (ViT) model, ViT-b16, in classifying ischemic stroke cases from Moroccan MRI scans and compares it to the Visual Geometry Group 16 (VGG-16) model used in a prior study. Methods: A dataset of 342 MRI scans, categorized into ‘Normal’ and ’Stroke’ classes, underwent preprocessing using TensorFlow’s tf.data API. Results: The ViT-b16 model was trained and evaluated, yielding an impressive accuracy of 97.59%, surpassing the VGG-16 model’s 90% accuracy. Conclusions: This research highlights the ViT-b16 model’s superior classification capabilities for ischemic stroke diagnosis, contributing to the field of medical image analysis. By showcasing the efficacy of advanced deep learning architectures, particularly in the context of Moroccan MRI scans, this study underscores the potential for real-world clinical applications. Ultimately, our findings emphasize the importance of further exploration into AI-based diagnostic tools for improving healthcare outcomes. Full article
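The abstract mentions preprocessing with TensorFlow's tf.data API; a minimal sketch of such an input pipeline is shown below. The directory layout, image size, batch size, and augmentation choices are assumptions, not the authors' exact setup.

# Sketch: a tf.data input pipeline for two-class ("Normal" vs. "Stroke") MRI images.
# Directory structure, image size, and batch size are illustrative assumptions.
import tensorflow as tf

IMG_SIZE, BATCH = (224, 224), 32

train_ds = tf.keras.utils.image_dataset_from_directory(
    "mri_scans/train", image_size=IMG_SIZE, batch_size=BATCH, label_mode="binary")
val_ds = tf.keras.utils.image_dataset_from_directory(
    "mri_scans/val", image_size=IMG_SIZE, batch_size=BATCH, label_mode="binary")

normalize = tf.keras.layers.Rescaling(1.0 / 255)
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.05),
])

train_ds = (train_ds
            .map(lambda x, y: (augment(normalize(x), training=True), y),
                 num_parallel_calls=tf.data.AUTOTUNE)
            .prefetch(tf.data.AUTOTUNE))
val_ds = val_ds.map(lambda x, y: (normalize(x), y)).prefetch(tf.data.AUTOTUNE)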
Show Figures
Figure 1. Sample MRI scans.
Figure 2. Example of an augmented image.
Figure 3. ViT architecture.
Figure 4. ViT-b16 architecture.
Figure 5. Confusion matrix for the ViT-b16 model.
15 pages, 3246 KiB  
Article
Automatic Detection of Banana Maturity—Application of Image Recognition in Agricultural Production
by Liu Yang, Bo Cui, Junfeng Wu, Xuan Xiao, Yang Luo, Qianmai Peng and Yonglin Zhang
Processes 2024, 12(4), 799; https://doi.org/10.3390/pr12040799 - 16 Apr 2024
Cited by 1 | Viewed by 2889
Abstract
With the development of machine vision technology, deep learning and image recognition technology has become a research focus for agricultural product non-destructive inspection. During the ripening process, banana appearance and nutrients clearly change, causing damage and unjustified economic loss. A high-efficiency banana ripeness recognition model was proposed based on a convolutional neural network and transfer learning. Banana photos at different ripening stages were collected as a dataset, and data augmentation was applied. Then, weights and parameters of four models trained on the original ImageNet dataset were loaded and fine-tuned to fit our banana dataset. To investigate the learning rate’s effect on model performance, fixed and updating learning rate strategies are analyzed. In addition, four CNN models, ResNet 34, ResNet 101, VGG 16, and VGG 19, are trained based on transfer learning. Results show that a slower learning rate causes the model to converge slowly, and the training loss function oscillates drastically. With different learning rate updating strategies, MultiStepLR performs the best and achieves a better accuracy of 98.8%. Among the four models, ResNet 101 performs the best with the highest accuracy of 99.2%. This research provides a direct effective model and reference for intelligent fruit classification. Full article
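The learning-rate comparison in this abstract centres on PyTorch-style schedulers such as MultiStepLR; a minimal sketch of attaching one to a fine-tuned ResNet-101 is given below. The milestones, decay factor, class count, and training loop are illustrative assumptions.

# Sketch: fine-tune a pretrained ResNet-101 with a MultiStepLR learning-rate schedule (PyTorch).
# Milestones, gamma, the class count, and the epoch budget are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 5  # ripeness stages (assumed)

model = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # replace the ImageNet head

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10, 20], gamma=0.1)
criterion = nn.CrossEntropyLoss()

# for epoch in range(30):
#     for images, labels in train_loader:   # train_loader: DataLoader over the banana dataset
#         optimizer.zero_grad()
#         loss = criterion(model(images), labels)
#         loss.backward()
#         optimizer.step()
#     scheduler.step()                      # decay the learning rate at the milestone epochs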
Show Figures
Figure 1. General process from banana harvest to sale.
Figure 2. Images of bananas at different ripeness stages.
Figure 3. Different data augmentation effects: (a) original image, (b) rotation, (c) darkening, (d) brightening, (e) pretzel, and (f) blurring.
Figure 4. Schematic diagram of the CNN and transfer learning method.
Figure 5. Learning rate updating strategies.
Figure 6. Accuracy and loss with different fixed initial learning rates: (a) accuracy value; (b) training loss.
Figure 7. Accuracy and loss with different learning rate updating strategies: (a) accuracy value; (b) training loss.
Figure 8. Accuracy and loss with different models: (a) accuracy value; (b) training loss.
Figure 9. Confusion matrix of the test results.
Figure 10. Precision, accuracy, recall, and F1 score on the test set.
20 pages, 6012 KiB  
Article
A Novel Fault Diagnosis Strategy for Diaphragm Pumps Based on Signal Demodulation and PCA-ResNet
by Fanguang Meng, Zhiguo Shi and Yongxing Song
Sensors 2024, 24(5), 1578; https://doi.org/10.3390/s24051578 - 29 Feb 2024
Cited by 3 | Viewed by 1224
Abstract
The efficient and accurate identification of diaphragm pump faults is crucial for ensuring smooth system operation and reducing energy consumption. The structure of diaphragm pumps is complex and using traditional fault diagnosis strategies to extract typical fault characteristics is difficult, facing the risk of model overfitting and high diagnostic costs. In response to the shortcomings of traditional methods, this study innovatively combines signal demodulation methods with residual networks (ResNet) to propose an efficient fault diagnosis strategy for diaphragm pumps. By using a demodulation method based on principal component analysis (PCA), the vibration signal demodulation spectrum of the fault condition is obtained, the typical fault characteristics of the diaphragm pump are accurately extracted, and the sample features are enhanced, reducing the cost of fault diagnosis. Afterward, the PCA-ResNet model is applied to the fault diagnosis of diaphragm pumps. A reasonable model structure and advanced residual block design can effectively reduce the risk of model overfitting and improve the accuracy of fault diagnosis. Compared with the visual geometry group (VGG) 16, VGG19, ResNet50, and autoencoder models, the proposed model has improved accuracy by 35.89%, 80.27%, 2.72%, and 6.12%. Simultaneously, it has higher operational efficiency and lower loss rate, solving the problem of diagnostic lag in practical engineering. Finally, a model optimization strategy is proposed through model evaluation metrics and testing. The reasonable parameter range of the model is obtained, providing a reference and guarantee for further optimization of the model. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
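The PCA stage of the strategy described above can be sketched with scikit-learn, assuming each sample is a flattened demodulation-spectrum vector; the array shapes, placeholder data, and retained-variance threshold below are hypothetical.

# Sketch: reduce demodulation-spectrum samples with PCA before feeding them to a classifier.
# The input matrix and the number of retained components are hypothetical.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

spectra = np.random.rand(600, 2048)   # 600 samples, 2048 spectral bins (placeholder data)

scaled = StandardScaler().fit_transform(spectra)
pca = PCA(n_components=0.95)          # keep enough components to explain 95% of the variance
reduced = pca.fit_transform(scaled)

print(reduced.shape, pca.explained_variance_ratio_.sum())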
Show Figures
Figure 1. Fault diagnosis strategy for the diaphragm pump based on signal demodulation and PCA-ResNet.
Figure 2. Principle of DPCA.
Figure 3. Principle of PCA-ResNet.
Figure 4. Principle of residual learning and skip connection.
Figure 5. Diaphragm pump structure.
Figure 6. Diaphragm pump experimental platform.
Figure 7. Data enhancement flow path.
Figure 8. PCA results of two sample sets: (a) spectrum sample set; (b) DPCA sample set.
Figure 9. Results of the confusion matrix: (a) spectrum sample set; (b) DPCA sample set.
Figure 10. Confusion matrix results of the four models: (a) VGG16; (b) VGG19; (c) ResNet50; (d) PCA-ResNet.
Figure 11. CR of 20 test results for the two models: (a) line chart of the two models; (b) box plot of the two models.
Figure 12. Recall and F1 score of 20 test results for the two models: (a) recall of the two models; (b) F1 score of the two models.
Figure 13. Loss rate and running time of 20 test results for the two models: (a) loss rate of the two models; (b) running time of the two models.
Figure 14. PCA-ResNet model test results for different batch sizes.
Figure 15. PCA-ResNet model test results for different momentums.
Figure 16. PCA-ResNet model test results for different learning rates.
16 pages, 3807 KiB  
Article
VisFormers—Combining Vision and Transformers for Enhanced Complex Document Classification
by Subhayu Dutta, Subhrangshu Adhikary and Ashutosh Dhar Dwivedi
Mach. Learn. Knowl. Extr. 2024, 6(1), 448-463; https://doi.org/10.3390/make6010023 - 16 Feb 2024
Cited by 1 | Viewed by 3311
Abstract
Complex documents have text, figures, tables, and other elements. The classification of scanned copies of different categories of complex documents like memos, newspapers, letters, and more is essential for rapid digitization. However, this task is very challenging as most scanned complex documents look similar. This is because all documents have similar colors of the page and letters, similar textures for all papers, and very few contrasting features. Several attempts have been made in the state of the art to classify complex documents; however, only a few of these works have addressed the classification of complex documents with similar features, and among these, the performances could be more satisfactory. To overcome this, this paper presents a method to use an optical character reader to extract the texts. It proposes a multi-headed model to combine vision-based transfer learning and natural-language-based Transformers within the same network for simultaneous training for different inputs and optimizers in specific parts of the network. A subset of the Ryers Vision Lab Complex Document Information Processing dataset containing 16 different document classes was used to evaluate the performances. The proposed multi-headed VisFormers network classified the documents with up to 94.2% accuracy, while a regular natural-language-processing-based Transformer network achieved 83%, and vision-based VGG19 transfer learning could achieve only up to 90% accuracy. The model deployment can help sort the scanned copies of various documents into different categories. Full article
(This article belongs to the Section Visualization)
Show Figures
Figure 1. The framework workflow includes two key steps: document image classification using transfer learning, followed by OCR and Transformer-based text classification, ultimately integrating vision and OCR for robust classification.
Figure 2. The VisFormers model combines a Transformer and a pre-trained VGG-19 network for document classification, with specific architecture details.
Figure 3. The Grad-CAM heatmap visualization highlights critical regions in document image classification. The reddish color indicates a higher probability density of having decision-making features.
10 pages, 6249 KiB  
Article
AI-Based Detection of Oral Squamous Cell Carcinoma with Raman Histology
by Andreas Weber, Kathrin Enderle-Ammour, Konrad Kurowski, Marc C. Metzger, Philipp Poxleitner, Martin Werner, René Rothweiler, Jürgen Beck, Jakob Straehle, Rainer Schmelzeisen, David Steybe and Peter Bronsert
Cancers 2024, 16(4), 689; https://doi.org/10.3390/cancers16040689 - 6 Feb 2024
Cited by 2 | Viewed by 2131
Abstract
Stimulated Raman Histology (SRH) employs the stimulated Raman scattering (SRS) of photons at biomolecules in tissue samples to generate histological images. Subsequent pathological analysis allows for an intraoperative evaluation without the need for sectioning and staining. The objective of this study was to investigate a deep learning-based classification of oral squamous cell carcinoma (OSCC) and the sub-classification of non-malignant tissue types, as well as to compare the performances of the classifier between SRS and SRH images. Raman shifts were measured at wavenumbers k1 = 2845 cm−1 and k2 = 2930 cm−1. SRS images were transformed into SRH images resembling traditional H&E-stained frozen sections. The annotation of 6 tissue types was performed on images obtained from 80 tissue samples from eight OSCC patients. A VGG19-based convolutional neural network was then trained on 64 SRS images (and corresponding SRH images) and tested on 16. A balanced accuracy of 0.90 (0.87 for SRH images) and F1-scores of 0.91 (0.91 for SRH) for stroma, 0.98 (0.96 for SRH) for adipose tissue, 0.90 (0.87 for SRH) for squamous epithelium, 0.92 (0.76 for SRH) for muscle, 0.87 (0.90 for SRH) for glandular tissue, and 0.88 (0.87 for SRH) for tumor were achieved. The results of this study demonstrate the suitability of deep learning for the intraoperative identification of tissue types directly on SRS and SRH images. Full article
(This article belongs to the Special Issue Recent Advances in Oncology Imaging)
Show Figures
Figure 1. Annotations of tissue classes “Squamous epithelium”, “Stroma”, and “Tumor” on an SRH image (A) and transferred annotations on a corresponding SRS image (B), as well as tiles generated from the annotations with class labels “Squamous epithelium”, “Stroma”, and “Tumor” on an SRH image (C) and on the corresponding SRS image (D). Only tiles that intersect with an annotation by 99% were kept for the generation of the dataset.
Figure 2. Ground truth class labels for each tile (A) and predicted class labels for each tile (B) on a sample SRS image. Both true tiles with class label “Stroma” were classified correctly, whereas 6 tiles with class label “Tumor” were incorrectly classified as “Squamous epithelium” (5 tiles) and “Stroma” (1 tile). Ground truth class labels for each tile (C) and predicted class labels for each tile (D) on a sample SRH image. Both true tiles with class label “Stroma” were classified correctly, whereas 8 tiles with class label “Tumor” were incorrectly classified as “Squamous epithelium” (5 tiles) and “Stroma” (3 tiles).
Figure 3. Confusion matrices for the classification of the CNN on the SRS test dataset (left) and the corresponding SRH test dataset (right). The diverging colormap shows small values in dark blue with increasing brightness according to increasing values. Large values are shown in dark red with decreasing brightness according to increasing values.