Search Results (4,823)

Search Parameters:
Keywords = semantic modeling

21 pages, 2997 KiB  
Article
BEV Semantic Map Reconstruction for Self-Driving Cars with the Multi-Head Attention Mechanism
by Yi-Cheng Liao, Jichiang Tsai and Hsuan-Ying Chien
Electronics 2025, 14(1), 32; https://doi.org/10.3390/electronics14010032 - 25 Dec 2024
Abstract
Environmental perception is crucial for safe autonomous driving, enabling accurate analysis of the vehicle’s surroundings. While 3D LiDAR is traditionally used for 3D environment reconstruction, its high cost and complexity present challenges. In contrast, camera-based cross-view frameworks offer a cost-effective alternative. Hence, this manuscript proposes a new cross-view model that extracts mapping features from camera images and transfers them to a Bird’s-Eye View (BEV) map. In particular, a multi-head attention mechanism in the decoder architecture generates the final semantic map. Each camera learns embedding information corresponding to its position and angle within the BEV map. Cross-view attention fuses information from different perspectives to predict top-down map features enriched with spatial information. The multi-head attention mechanism then performs global dependency matching, enhancing long-range information and capturing latent relationships between features. Transposed convolution replaces traditional upsampling methods, avoiding high similarity among local features and facilitating semantic segmentation inference of the BEV map. Finally, we conduct numerous simulation experiments to verify the performance of our cross-view model. Full article
(This article belongs to the Special Issue Advancement on Smart Vehicles and Smart Travel)
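To make the described decoder concrete, the following is a minimal sketch (not the authors’ implementation) of the general pattern: learned BEV queries attend over flattened multi-camera features with multi-head cross-attention, a second multi-head attention models global dependencies, and transposed convolutions upsample the result into a semantic BEV map. The feature dimensions, camera count, and module names are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): BEV queries + cross-view and global
# multi-head attention + transposed-convolution upsampling to a semantic BEV map.
import torch
import torch.nn as nn

class CrossViewBEVDecoder(nn.Module):
    def __init__(self, dim=256, n_heads=8, bev_hw=(25, 25), n_classes=4):
        super().__init__()
        self.bev_hw = bev_hw
        # One learned query (acting as positional embedding) per BEV cell.
        self.bev_queries = nn.Parameter(torch.randn(bev_hw[0] * bev_hw[1], dim))
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Transposed convolutions upsample the coarse BEV grid to the output map.
        self.head = nn.Sequential(
            nn.ConvTranspose2d(dim, dim // 2, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(dim // 2, n_classes, 2, stride=2),
        )

    def forward(self, cam_feats):
        # cam_feats: (B, n_cams * H * W, dim) flattened features from all cameras.
        B = cam_feats.shape[0]
        q = self.bev_queries.unsqueeze(0).expand(B, -1, -1)
        bev, _ = self.cross_attn(q, cam_feats, cam_feats)   # cross-view attention
        bev, _ = self.self_attn(bev, bev, bev)              # global dependency matching
        h, w = self.bev_hw
        bev = bev.transpose(1, 2).reshape(B, -1, h, w)
        return self.head(bev)                               # (B, n_classes, 4h, 4w)

# Example: 6 cameras, 28x60 feature maps, 256-dim features (assumed sizes).
feats = torch.randn(2, 6 * 28 * 60, 256)
print(CrossViewBEVDecoder()(feats).shape)  # torch.Size([2, 4, 100, 100])
```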
25 pages, 11809 KiB  
Article
DSC-SeNet: Unilateral Network with Feature Enhancement and Aggregation for Real-Time Segmentation of Carbon Trace in the Oil-Immersed Transformer
by Liqing Liu, Hongxin Ji, Junji Feng, Xinghua Liu, Chi Zhang and Chun He
Sensors 2025, 25(1), 43; https://doi.org/10.3390/s25010043 - 25 Dec 2024
Abstract
Large oil-immersed transformers have metal-enclosed shells, making it difficult to visually inspect the internal insulation condition. In this work, visual inspection of internal defects is carried out using a self-developed micro-robot. Carbon traces are the main visual characteristic of internal insulation defects. Their characteristics, such as multiple sizes, diverse morphologies, and irregular edges, pose severe challenges for segmentation accuracy and inference speed. In this paper, a feasible real-time network (deformable-spatial-Canny segmentation network, DSC-SeNet) is designed for carbon trace segmentation. To improve inference speed, a lightweight unilateral feature extraction framework is constructed based on a shallow feature sharing mechanism, which provides feature input for both the semantic path and the spatial path. Meanwhile, the segmentation model is improved in two respects for better accuracy. First, to better perceive the diverse morphology and edge features of carbon traces, three measures, namely deformable convolution (DFC), the Canny edge operator, and a spatial feature refinement module (SFRM), are adopted for feature perception, enhancement, and aggregation, respectively. Second, to improve the fusion of semantic and spatial features, coordinate attention feature aggregation (CAFA) is designed to reduce feature aggregation loss. Experimental results showed that the proposed DSC-SeNet outperformed state-of-the-art models with a good balance between segmentation accuracy and inference speed. For a 512 × 512 input, it achieved 84.7% mIoU, which is 6.4 percentage points higher than that of the baseline short-term dense convolution network (STDC), at a speed of 94.3 FPS on an NVIDIA GTX 2050Ti. This study provides technical support for real-time segmentation of carbon traces and transformer insulation assessment. Full article
(This article belongs to the Section Sensing and Imaging)
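The abstract names a coordinate-attention-based fusion of the semantic and spatial paths (CAFA). The sketch below shows one generic way such direction-aware channel re-weighting can be written; the exact DSC-SeNet layout is not reproduced here, and all shapes and layer sizes are assumptions.

```python
# Rough sketch of coordinate-attention-style fusion of a semantic-path and a
# spatial-path feature map (assumed layout, not the paper's CAFA module).
import torch
import torch.nn as nn

class CoordAttnFusion(nn.Module):
    def __init__(self, ch_sem, ch_spa, ch_out, reduction=16):
        super().__init__()
        ch_in = ch_sem + ch_spa
        mid = max(8, ch_in // reduction)
        self.proj = nn.Conv2d(ch_in, ch_out, 1)
        self.conv1 = nn.Sequential(nn.Conv2d(ch_out, mid, 1), nn.BatchNorm2d(mid), nn.ReLU())
        self.attn_h = nn.Conv2d(mid, ch_out, 1)
        self.attn_w = nn.Conv2d(mid, ch_out, 1)

    def forward(self, f_sem, f_spa):
        x = self.proj(torch.cat([f_sem, f_spa], dim=1))        # fuse the two paths
        b, c, h, w = x.shape
        # Pool along width and along height separately (coordinate attention).
        ph = x.mean(dim=3, keepdim=True)                        # (b, c, h, 1)
        pw = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)    # (b, c, w, 1)
        y = self.conv1(torch.cat([ph, pw], dim=2))              # (b, mid, h+w, 1)
        yh, yw = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.attn_h(yh))                    # (b, c, h, 1)
        a_w = torch.sigmoid(self.attn_w(yw)).permute(0, 1, 3, 2)  # (b, c, 1, w)
        return x * a_h * a_w                                    # direction-aware re-weighting

f_sem = torch.randn(1, 128, 64, 64)   # semantic-path features (assumed size)
f_spa = torch.randn(1, 64, 64, 64)    # spatial-path features (assumed size)
print(CoordAttnFusion(128, 64, 128)(f_sem, f_spa).shape)  # torch.Size([1, 128, 64, 64])
```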
Figures:
Figure 1: Physical drawing of the transformer internal inspection micro-robot.
Figure 2: Discharge carbon trace identification based on improved YOLOv8 [17].
Figure 3: Segmentation of discharge carbon trace based on improved HCP-UNet [18].
Figure 4: Network structure of the DSC-SeNet.
Figure 5: The STDC module and the STDDC incorporated with DFC.
Figure 6: Receptive field differences between common and deformable convolutions.
Figure 7: Network structure of the DFC.
Figure 8: Network architecture of the SFRM.
Figure 9: Network architecture of the CAFA.
Figure 10: Examples of carbon trace samples. (a) dendritic carbon trace; (b) clustered carbon trace; (c) small-scale trace; (d) large-scale trace; (e) trace with low light; (f) trace with high light.
Figure 11: Comparison of carbon traces with and without improved MSRCR. (a) the original image #1; (b) the enhanced image corresponding to #1; (c) the original image #2; (d) the enhanced image corresponding to #2.
Figure 12: Labeling process of carbon trace with Labelme software.
Figure 13: Visual comparison of Canny edge detection with different threshold values. (a) original; (b) (10, 30); (c) (25, 75); (d) (40, 120); (e) (55, 165); (f) (70, 210).
Figure 14: Grad-CAM comparison of segmentation results with and without the SFRM. (a) Input; (b) without SFRM; (c) with SFRM; (d) Ground truth; (e) Prediction.
Figure 15: Grad-CAM comparison of segmentation results with and without the CAFA. (a) Input; (b) without CAFA; (c) with CAFA; (d) Ground truth; (e) Prediction.
Figure 16: Performance comparison of segmentation model with different modules. (a) Loss curve; (b) mIoU curve.
Figure 17: Comparison of DSC-SeNet and state-of-the-art models.
19 pages, 8716 KiB  
Article
Eye Tracking and Semantic Evaluation for Ceramic Teapot Product Modeling
by Wei Liu, Ziyan Hu, Yinan Fei, Jiaqi Chen and Changlong Yu
Appl. Sci. 2025, 15(1), 46; https://doi.org/10.3390/app15010046 - 25 Dec 2024
Abstract
In addition to their practical and aesthetic qualities, ceramic teapots are highly decorative and stylish. Based on the theory of perceptual engineering, this study employs eye-tracking technology and semantic-difference methods to investigate user preferences for ceramic teapot shapes. Using eye-movement experiments, the study first determines users’ visual attention to different morphological regions. Nine styling samples were then developed with the orthogonal alignment method by combining expert classifications of classic and traditional teapot styling elements. By combining a semantic perception questionnaire with a satisfaction questionnaire, the study evaluated users’ visual attention to these samples and their satisfaction with them. It was found that, despite differences in semantic evaluations among consumers of different genders and ages, shapes characterized by classic, rounded, and proportionally coordinated forms better matched consumers’ aesthetic preferences and yielded relatively consistent satisfaction. The purpose of this study is not only to provide a scientific basis for styling ceramic teapots but also to help designers grasp the laws of consumer preference in order to create better products. Full article
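For readers unfamiliar with how eye-tracking data feed such an evaluation, the toy snippet below aggregates fixation durations per area of interest (AOI) of each teapot sample and joins them with semantic-differential and satisfaction ratings. The column names and numbers are invented for illustration and are not the study’s data.

```python
# Toy example (not the study's data): dwell-time share per AOI joined with
# semantic-differential and satisfaction ratings for each styling sample.
import pandas as pd

fixations = pd.DataFrame({
    "sample": ["S1", "S1", "S1", "S2", "S2"],
    "aoi":    ["spout", "body", "handle", "body", "lid"],
    "duration_ms": [220, 540, 180, 610, 240],
})
ratings = pd.DataFrame({
    "sample": ["S1", "S2"],
    "rounded_vs_angular": [1.2, -0.4],   # 7-point semantic-differential scale, centered
    "satisfaction": [5.6, 4.1],
})

# Total dwell time per AOI and per sample, then relative attention share.
dwell = fixations.groupby(["sample", "aoi"])["duration_ms"].sum().unstack(fill_value=0)
share = dwell.div(dwell.sum(axis=1), axis=0)

summary = share.join(ratings.set_index("sample"))
print(summary.round(2))
```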
Figures:
Figure 1: Keyword co-occurrence mapping of ceramic teapot subject literature (Source: author).
Figure 2: Experimental flow chart (Source: author).
Figure 3: AOI division of ceramic teapot (Source: author).
Figure 4: Flowchart of Experiment 1 (Source: author).
Figure 5: Preliminary samples of ceramic teapot form design element combination (Source: author).
Figure 6: Flowchart of Experiment 2 (Source: author).
Figure 7: Hot spot map of experiments in regions of interest (Source: author).
Figure 8: Satisfaction of each sample (Source: author).
Figure 9: Hot spot map for 9 samples (Source: author).
19 pages, 770 KiB  
Article
An Adaptive Multimodal Fusion Network Based on Multilinear Gradients for Visual Question Answering
by Chengfang Zhao, Mingwei Tang, Yanxi Zheng and Chaocong Ran
Electronics 2025, 14(1), 9; https://doi.org/10.3390/electronics14010009 - 24 Dec 2024
Abstract
As an interdisciplinary field of natural language processing and computer vision, Visual Question Answering (VQA) has emerged as a prominent research focus in artificial intelligence. The core of the VQA task is to combine natural language understanding and image analysis to infer answers by extracting meaningful features from textual and visual inputs. However, most current models struggle to fully capture the deep semantic relationships between images and text owing to their limited capacity to comprehend feature interactions, which constrains their performance. To address these challenges, this paper proposes an innovative Trilinear Multigranularity and Multimodal Adaptive Fusion algorithm (TriMMF) that is designed to improve the efficiency of multimodal feature extraction and fusion in VQA tasks. Specifically, the TriMMF consists of three key modules: (1) an Answer Generation Module, which generates candidate answers by extracting fused features and leveraging question features to focus on critical regions within the image; (2) a Fine-grained and Coarse-grained Interaction Module, which achieves multimodal interaction between question and image features at different granularities and incorporates implicit answer information to capture complex multimodal correlations; and (3) an Adaptive Weight Fusion Module, which selectively integrates coarse-grained and fine-grained interaction features based on task requirements, thereby enhancing the model’s robustness and generalization capability. Experimental results demonstrate that the proposed TriMMF significantly outperforms existing methods on the VQA v1.0 and VQA v2.0 datasets, achieving state-of-the-art performance in question–answer accuracy. These findings indicate that the TriMMF effectively captures the deep semantic associations between images and text. The proposed approach provides new insights into multimodal interaction and fusion research, combining domain adaptation techniques to address a broader range of cross-domain visual question answering tasks. Full article
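The Adaptive Weight Fusion Module described above can be pictured as a learned, input-dependent gate over the coarse-grained and fine-grained interaction features. The snippet below is a minimal sketch under assumed tensor shapes, not the TriMMF implementation.

```python
# Minimal sketch: an input-dependent gate blends coarse- and fine-grained
# question-image interaction features (shapes and layer sizes are assumptions).
import torch
import torch.nn as nn

class AdaptiveWeightFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                  nn.Linear(dim, 2))

    def forward(self, f_coarse, f_fine):
        # f_coarse, f_fine: (B, dim) fused features at two granularities.
        w = torch.softmax(self.gate(torch.cat([f_coarse, f_fine], dim=-1)), dim=-1)
        return w[:, 0:1] * f_coarse + w[:, 1:2] * f_fine

fusion = AdaptiveWeightFusion(512)
out = fusion(torch.randn(4, 512), torch.randn(4, 512))
print(out.shape)  # torch.Size([4, 512])
```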
Figures:
Figure 1: TriMMF model framework diagram.
Figure 2: Experimental results for the first N candidate answers.
Figure 3: Correctness of candidate answers and final answers of TriMMF.
Figure 4: Example prediction of the proposed model on the VQA v2.0 dataset.
17 pages, 3527 KiB  
Article
An Ontology-Based Approach for Understanding Appendicectomy Processes and Associated Resources
by Nadeesha Pathiraja Rathnayaka Hitige, Ting Song, Steven J. Craig, Kimberley J. Davis, Xubing Hao, Licong Cui and Ping Yu
Healthcare 2025, 13(1), 10; https://doi.org/10.3390/healthcare13010010 - 24 Dec 2024
Abstract
Background: Traditional methods for analysing surgical processes often fall short in capturing the intricate interconnectedness between clinical procedures, their execution sequences, and associated resources such as hospital infrastructure, staff, and protocols. Aim: This study addresses this gap by developing an ontology for appendicectomy, a computational model that comprehensively represents appendicectomy processes and their resource dependencies to support informed decision making and optimise appendicectomy healthcare delivery. Methods: The ontology was developed using the NeON methodology, drawing knowledge from existing ontologies, scholarly literature, and de-identified patient data from local hospitals. Results: The resulting ontology comprises 108 classes, including 11 top-level classes and 96 subclasses organised across five hierarchical levels. The 11 top-level classes include “clinical procedure”, “appendicectomy-related organisational protocols”, “disease”, “start time”, “end time”, “duration”, “appendicectomy outcomes”, “hospital infrastructure”, “hospital staff”, “patient”, and “patient demographics”. Additionally, the ontology includes 77 object and data properties to define relationships and attributes. The ontology offers a semantic, computable framework for encoding appendicectomy-specific clinical procedures and their associated resources. Conclusion: By systematically representing this knowledge, this study establishes a foundation for enhancing clinical decision making, improving data integration, and ultimately advancing patient care. Future research can leverage this ontology to optimise healthcare workflows and outcomes in appendicectomy management. Full article
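As an illustration of how such an ontology can be encoded programmatically (the authors built theirs on the Protégé platform; the owlready2 sketch below is not their ontology file), a few of the top-level classes and properties named in the abstract could be declared as follows. The IRI, the individuals, and any property names beyond those mentioned above are assumptions.

```python
# Illustrative owlready2 sketch of a few classes/properties from the abstract;
# the IRI, individuals, and extra property names are placeholders.
from owlready2 import get_ontology, Thing, ObjectProperty, DataProperty

onto = get_ontology("http://example.org/appendicectomy.owl")  # placeholder IRI

with onto:
    class ClinicalProcedure(Thing): pass
    class Appendicectomy(ClinicalProcedure): pass
    class HospitalStaff(Thing): pass
    class HospitalInfrastructure(Thing): pass
    class Patient(Thing): pass

    class performedBy(ObjectProperty):          # procedure -> staff
        domain = [ClinicalProcedure]; range = [HospitalStaff]
    class usesInfrastructure(ObjectProperty):   # procedure -> theatre, equipment, ...
        domain = [ClinicalProcedure]; range = [HospitalInfrastructure]
    class hasDurationMinutes(DataProperty):
        domain = [ClinicalProcedure]; range = [int]

# Instantiate, query, and save the toy ontology.
op = Appendicectomy("case_001")
op.performedBy = [HospitalStaff("surgeon_A")]
op.hasDurationMinutes = [45]
print(op.performedBy, op.hasDurationMinutes)
onto.save(file="appendicectomy_demo.owl")
```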
Figures:
Figure 1: Proposed ontology development method (based on the NeON methodology) [38,42,43].
Figure 2: A simplified ontology graph of appendicectomy process and resource ontology with major classes and their relationships.
Figure 3: Simplified disease class with major subclasses and their relationships.
Figure 4: Simplified clinical procedure class with major subclasses and their relationships.
Figure 5: Hospital infrastructure class.
Figure 6: Hospital staff class.
21 pages, 1132 KiB  
Article
Lightweight Multi-Scale Feature Fusion Network for Salient Object Detection in Optical Remote Sensing Images
by Jun Li and Kaigen Huang
Electronics 2025, 14(1), 8; https://doi.org/10.3390/electronics14010008 - 24 Dec 2024
Abstract
Salient object detection in optical remote sensing images (ORSI-SOD) encounters notable challenges, mainly because of the small scale of salient objects and the similarity between these objects and their backgrounds in images captured by satellite and aerial sensors. Conventional approaches frequently struggle to efficiently leverage multi-scale and multi-stage features. Moreover, these methods usually rely on sophisticated and resource-heavy architectures, which can limit their practicality and efficiency in real-world applications. To overcome these limitations, this paper proposes a novel lightweight network called the Multi-scale Feature Fusion Network (MFFNet). Specifically, a Multi-stage Information Fusion (MIF) module is created to improve the detection of salient objects by effectively integrating features from multiple stages and scales. Additionally, we design a Semantic Guidance Fusion (SGF) module to specifically alleviate the problem of semantic dilution often observed in U-Net architecture. Comprehensive evaluations on two benchmark datasets show that the MFFNet attains outstanding performance in four out of eight evaluation metrics while only having 12.14M parameters and 2.75G FLOPs. These results highlight significant advancements over 31 state-of-the-art models, underscoring the efficiency of MFFNet in salient object-detection tasks. Full article
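The Semantic Guidance Fusion idea, deep semantic features steering shallower high-resolution features, can be sketched as below. This is an assumed, simplified layout for illustration, not the released MFFNet code.

```python
# Simplified "semantic guidance" sketch: deep low-resolution features are upsampled
# into an attention map that re-weights shallow features before fusion (assumed layout).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticGuidanceFusion(nn.Module):
    def __init__(self, ch_low, ch_top, ch_out):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(ch_top, ch_low, 1), nn.Sigmoid())
        self.out = nn.Conv2d(ch_low + ch_top, ch_out, 3, padding=1)

    def forward(self, f_low, f_top):
        # f_low: (B, ch_low, H, W) shallow features; f_top: (B, ch_top, h, w) deep features.
        f_top_up = F.interpolate(f_top, size=f_low.shape[-2:], mode="bilinear",
                                 align_corners=False)
        f_low = f_low * self.gate(f_top_up)     # semantic attention on shallow features
        return self.out(torch.cat([f_low, f_top_up], dim=1))

sgf = SemanticGuidanceFusion(ch_low=64, ch_top=256, ch_out=64)
y = sgf(torch.randn(1, 64, 88, 88), torch.randn(1, 256, 11, 11))
print(y.shape)  # torch.Size([1, 64, 88, 88])
```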
Figures:
Figure 1: Comparison of performance and efficiency.
Figure 2: The overall framework of MFFNet is founded on U-Net architecture. The UniFormer-L encoder captures four-level features, which are subsequently forwarded to the MIF modules. In the decoder, four MIF modules integrate multi-stage and multi-scale information, while the SGF module leverages top-level semantic features to guide the synthesis of lower-level information.
Figure 3: Illustration of the structure of MIF.
Figure 4: Illustration of the structure of SGF.
Figure 5: PR curves and F-measure curves on the EORSSD and ORSSD datasets.
Figure 6: Visual comparisons with 13 ORSI-SOD models.
24 pages, 9347 KiB  
Article
RDAU-Net: A U-Shaped Semantic Segmentation Network for Buildings near Rivers and Lakes Based on a Fusion Approach
by Yipeng Wang, Dongmei Wang, Teng Xu, Yifan Shi, Wenguang Liang, Yihong Wang, George P. Petropoulos and Yansong Bao
Remote Sens. 2025, 17(1), 2; https://doi.org/10.3390/rs17010002 - 24 Dec 2024
Abstract
The encroachment of buildings into the waters of rivers and lakes can lead to increased safety hazards, but current semantic segmentation algorithms have difficulty accurately segmenting buildings in such environments. The specular reflection of the water and boats with similar features to the buildings in the environment can greatly affect the performance of the algorithm. Effectively eliminating their influence on the model and further improving the segmentation accuracy of buildings near water will be of great help to the management of river and lake waters. To address the above issues, the present study proposes the design of a U-shaped segmentation network of buildings called RDAU-Net that works through extraction and fuses a convolutional neural network and a transformer to segment buildings. First, we designed a residual dynamic short-cut down-sampling (RDSC) module to minimize the interference of complex building shapes and building scale differences on the segmentation results; second, we reduced the semantic and resolution gaps between multi-scale features using a multi-channel cross fusion transformer module (MCCT); finally, a double-feature channel-wise fusion attention (DCF) was designed to improve the model’s ability to depict building edge details and to reduce the influence of similar features on the model. Additionally, an HRI Building dataset was constructed, comprising water-edge buildings situated in a riverine and lacustrine regulatory context. This dataset encompasses a plethora of water-edge building sample scenarios, offering a comprehensive representation of the subject matter. The experimental results indicated that the statistical metrics achieved by RDAU-Net using the HRI and WHU Building datasets are better than those of others, and that it can effectively solve the building segmentation problems in the management of river and lake waters. Full article
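The double-feature channel-wise fusion attention (DCF) described above can be pictured as a block in which channel statistics from both inputs decide how strongly each branch contributes. The sketch below is an assumption about that general pattern, not the RDAU-Net implementation.

```python
# Generic channel-wise fusion sketch: global channel statistics from two equal-shape
# feature maps are turned into per-branch weights (assumed design, not the paper's DCF).
import torch
import torch.nn as nn

class ChannelwiseFusion(nn.Module):
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * ch, ch // reduction), nn.ReLU(),
                                 nn.Linear(ch // reduction, 2 * ch))

    def forward(self, a, b):
        # a, b: (B, ch, H, W) decoder and skip-connection features of equal shape.
        stats = torch.cat([a.mean(dim=(2, 3)), b.mean(dim=(2, 3))], dim=1)  # (B, 2*ch)
        w = torch.sigmoid(self.mlp(stats)).unsqueeze(-1).unsqueeze(-1)      # (B, 2*ch, 1, 1)
        wa, wb = torch.chunk(w, 2, dim=1)
        return wa * a + wb * b

fuse = ChannelwiseFusion(96)
print(fuse(torch.randn(2, 96, 56, 56), torch.randn(2, 96, 56, 56)).shape)
# torch.Size([2, 96, 56, 56])
```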
Figures:
Figure 1: The architecture of RDAU-Net.
Figure 2: The running flow of dynamic convolution.
Figure 3: The structure of RDSC module.
Figure 4: The structure of the MCCT module.
Figure 5: Multi-channel attention mechanism.
Figure 6: The structure of the DCF module.
Figure 7: Example of a typical sample of the HRI Building dataset.
Figure 8: Results of ablation experiments on the HRI Building dataset. (a) Image. (b) Ground truth. (c) Baseline. (d) Baseline + RDSC. (e) Baseline + RDSC + MCCT. (f) Baseline + RDSC + MCCT + DCF.
Figure 9: Results of ablation experiments on the WHU Building dataset. (a) Image. (b) Ground truth. (c) Baseline. (d) Baseline + RDSC. (e) Baseline + RDSC + MCCT. (f) Baseline + RDSC + MCCT + DCF.
Figure 10: Visualization of the results of comparative experiments on the HRI Building dataset. (a) Image. (b) Ground truth. (c) FCN. (d) U-Net. (e) U-Net++. (f) Swin-UNet. (g) ACC-UNet. (h) CSC-UNet. (i) UCTransNet. (j) DTA-UNet. (k) RDAU-Net.
Figure 11: Visualization of the results of comparative experiments on the WHU Building dataset. (a) Image. (b) Ground truth. (c) FCN. (d) U-Net. (e) U-Net++. (f) Swin-UNet. (g) ACC-UNet. (h) CSC-UNet. (i) UCTransNet. (j) DTA-UNet. (k) RDAU-Net.
20 pages, 10713 KiB  
Article
Detecting Ocean Eddies with a Lightweight and Efficient Convolutional Network
by Haochen Sun, Hongping Li, Ming Xu, Tianyu Xia and Hao Yu
Remote Sens. 2024, 16(24), 4808; https://doi.org/10.3390/rs16244808 - 23 Dec 2024
Abstract
As a ubiquitous mesoscale phenomenon, ocean eddies significantly impact ocean energy and mass exchange. Detecting these eddies accurately and efficiently has become a research focus in ocean remote sensing. Many traditional detection methods, rooted in physical principles, often encounter challenges in practical applications due to their complex parameter settings, while deep learning models, though effective, can be limited by the high computational demands of their extensive parameters. Therefore, this paper proposes a new approach to eddy detection based on altimeter data, the Ghost Attention Deeplab Network (GAD-Net), a lightweight and efficient semantic segmentation model designed to address these issues. The encoder of GAD-Net consists of a lightweight ECA+GhostNet and an Atrous Spatial Pyramid Pooling (ASPP) module, and the decoder integrates an Efficient Attention Network (EAN) module and an Efficient Ghost Feature Integration (EGFI) module. Experimental results show that GAD-Net outperforms other models in evaluation indices, with a lighter model size and lower computational complexity. It also outperforms other segmentation models in actual detection results in different sea areas. Furthermore, GAD-Net achieves detection results comparable to the Py-Eddy-Tracker (PET) method with a smaller eddy radius and a faster detection speed. The model and the constructed eddy dataset are publicly available. Full article
(This article belongs to the Special Issue Artificial Intelligence for Ocean Remote Sensing)
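GAD-Net builds its encoder on a GhostNet-style backbone. The Ghost module itself is a published, well-known building block: a thin primary convolution plus cheap depthwise "ghost" features, concatenated to reach the target width. The sketch below follows that generic recipe; the exact channel settings used in GAD-Net are not known from this listing.

```python
# Generic Ghost module: a thin primary 1x1 convolution plus a cheap depthwise
# convolution whose outputs ("ghost" maps) are concatenated with the primary maps.
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, ch_in, ch_out, ratio=2, dw_kernel=3):
        super().__init__()
        primary = ch_out // ratio
        self.primary = nn.Sequential(
            nn.Conv2d(ch_in, primary, 1, bias=False),
            nn.BatchNorm2d(primary), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(  # depthwise conv generates the "ghost" maps
            nn.Conv2d(primary, ch_out - primary, dw_kernel, padding=dw_kernel // 2,
                      groups=primary, bias=False),
            nn.BatchNorm2d(ch_out - primary), nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

m = GhostModule(64, 128)
print(m(torch.randn(1, 64, 80, 80)).shape)      # torch.Size([1, 128, 80, 80])
print(sum(p.numel() for p in m.parameters()))   # far fewer params than a 64->128 3x3 conv
```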
Figures:
Figure 1: GAD-Net’s overall structure.
Figure 2: Ghost module’s structure.
Figure 3: Ghost bottleneck with different step sizes.
Figure 4: ECA+Net’s structure.
Figure 5: EAN module’s structure.
Figure 6: EGFI module’s structure.
Figure 7: The dataset area is shown in a black box (10°N−30°N, 120°E−150°E).
Figure 8: Confusion matrix calculations for the models. (a) Base model. (b) Base model integrating ECA+GhostNet. (c) Base model integrating ECA+GhostNet and EAN module. (d) GAD-Net.
Figure 9: Results of ablation experiments with the spatial information loss. Red regions are anticyclonic eddies and blue regions are cyclonic eddies. White boxes are undetected eddies and yellow boxes are additionally detected eddies. (a) Input. (b) Ground truth. (c) GAD-Net with the input size of 640 × 640. (d) GAD-Net with the input size of 480 × 480. (e) GAD-Net without the ASPP module. (f) GAD-Net without the EGFI module.
Figure 10: Eddy detection results of the comparison experiment. Red regions are anticyclonic eddies and blue regions are cyclonic eddies. White boxes are undetected eddies and yellow boxes are additionally detected eddies. (a) Input. (b) Ground truth. (c) UNet with ResNet. (d) UNet with GhostNet. (e) PSPNet with ResNet. (f) PSPNet with GhostNet. (g) Deeplabv3+ with ResNet. (h) Deeplabv3+ with GhostNet. (i) LR-ASPP. (j) HRNetv2. (k) Segformer. (l) GAD-Net.
Figure 11: Eddy detection results within the dataset. Red regions are anticyclonic eddies and blue regions are cyclonic eddies. White boxes are undetected eddies and yellow boxes are additionally detected eddies. (a) Input. (b) Ground truth. (c) UNet with ResNet. (d) Deeplabv3+ with ResNet. (e) Segformer. (f) GAD-Net.
Figure 12: Eddy detection results outside the dataset. Red regions are anticyclonic eddies and blue regions are cyclonic eddies. Yellow boxes are additionally detected eddies. (a) Input. (b) UNet with ResNet. (c) Deeplabv3+ with ResNet. (d) Segformer. (e) GAD-Net.
Figure 13: Validation experiment results. Red regions are anticyclonic eddies and blue regions are cyclonic eddies. Yellow boxes are additionally detected eddies, green boxes are incorrectly detected eddies, and the arrows indicate the regional geostrophic flow. (a) Input. (b) PET. (c) GAD-Net. (d) Regional geostrophic flow.
Figure 14: Distribution of eddy radius detected by GAD-Net and PET.
28 pages, 38236 KiB  
Article
Disassembly of Distribution Transformers Based on Multimodal Data Recognition and Collaborative Processing
by Li Wang, Feng Chen, Yujia Hu, Zhiyao Zheng and Kexin Zhang
Algorithms 2024, 17(12), 595; https://doi.org/10.3390/a17120595 - 23 Dec 2024
Abstract
As power system equipment gradually ages, the automated disassembly of transformers has become a critical area of research to enhance both efficiency and safety. This paper presents a transformer disassembly system designed for power systems, leveraging multimodal perception and collaborative processing. By integrating 2D images and 3D point cloud data captured by RGB-D cameras, the system enables the precise recognition and efficient disassembly of transformer covers and internal components through multimodal data fusion, deep learning models, and control technologies. The system employs an enhanced YOLOv8 model for positioning and identifying screw-fastened covers while also utilizing the STDC network for segmentation and cutting path planning of welded covers. In addition, the system captures 3D point cloud data of the transformer’s interior using multi-view RGB-D cameras and performs multimodal semantic segmentation and object detection via the ODIN model, facilitating the high-precision identification and cutting of complex components such as windings, studs, and silicon steel sheets. Experimental results show that the system achieves a recognition accuracy of 99% for both cover and internal component disassembly, with a disassembly success rate of 98%, demonstrating its high adaptability and safety in complex industrial environments. Full article
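Before multi-view fusion and 3D segmentation, each RGB-D frame must be back-projected into a point cloud. The snippet below shows the standard pinhole back-projection step; the intrinsic parameters and the toy depth frame are placeholders, not values from the paper.

```python
# Standard pinhole back-projection: a depth image becomes a 3D point cloud.
# Intrinsics and the toy depth frame are placeholder values.
import numpy as np

def depth_to_pointcloud(depth_m, fx, fy, cx, cy):
    """Back-project a depth image (H, W) in metres into (N, 3) camera-frame points."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]          # drop invalid (zero-depth) pixels

depth = np.full((4, 4), 0.8)           # toy 4x4 depth frame, 0.8 m everywhere
cloud = depth_to_pointcloud(depth, fx=600.0, fy=600.0, cx=2.0, cy=2.0)
print(cloud.shape)  # (16, 3)
```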
Figures:
Figure 1: General scheme for transformer disassembly system.
Figure 2: Schematic diagram of the transformer cover disassembly process.
Figure 3: Schematic diagram of the disassembly process of the internal components of the transformer.
Figure 4: Overall flowchart of transformer copper wire winding stripping method.
Figure 5: Two types of transformer cover. (a) Screw-fastened cover. (b) Welded cover.
Figure 6: Process of target detection.
Figure 7: Diagram of the target detection effect.
Figure 8: Result of the transformer cover segmentation.
Figure 9: The process of the ODIN modeling transformer.
Figure 10: The process of segmentation and identification of internal components of the transformer.
Figure 11: Constant-tension intelligent unwinding system structure.
Figure 12: Intelligent wire removal system.
Figure 13: Architecture of the RCF network.
Figure 14: Transformer disassembly area in the newly established disassembly factory.
Figure 15: Transformer interior before disassembly.
Figure 16: Transformer interior after disassembly.
Figure 17: Component recognition and disassembly process. (a) Transformer cover and internal component recognition. (b) Transformer disassembly process and results.
41 pages, 43778 KiB  
Review
UAV (Unmanned Aerial Vehicle): Diverse Applications of UAV Datasets in Segmentation, Classification, Detection, and Tracking
by Md. Mahfuzur Rahman, Sunzida Siddique, Marufa Kamal, Rakib Hossain Rifat and Kishor Datta Gupta
Algorithms 2024, 17(12), 594; https://doi.org/10.3390/a17120594 - 23 Dec 2024
Abstract
Unmanned Aerial Vehicles (UAVs) have transformed the process of data collection and analysis in a variety of research disciplines, delivering unparalleled adaptability and efficacy. This paper presents a thorough examination of UAV datasets, emphasizing their wide range of applications and progress. UAV datasets consist of various types of data, such as satellite imagery, images captured by drones, and videos. These datasets can be categorized as either unimodal or multimodal, offering a wide range of detailed and comprehensive information. These datasets play a crucial role in disaster damage assessment, aerial surveillance, object recognition, and tracking. They facilitate the development of sophisticated models for tasks like semantic segmentation, pose estimation, vehicle re-identification, and gesture recognition. By leveraging UAV datasets, researchers can significantly enhance the capabilities of computer vision models, thereby advancing technology and improving our understanding of complex, dynamic environments from an aerial perspective. This review aims to encapsulate the multifaceted utility of UAV datasets, emphasizing their pivotal role in driving innovation and practical applications in multiple domains. Full article
(This article belongs to the Special Issue Machine Learning for Pattern Recognition (2nd Edition))
Figures:
Figure 1: Workflow of this study.
Figure 2: Diverse applications of UAV datasets in computer vision research.
Figure A1: Aerial Image Dataset for Applications in Emergency Response (AIDER): a selection of pictures from the augmented database.
Figure A2: Illustrations of the flapping-wing UAV used for data collection and the representative data of BioDrone. Different flight attitudes for various scenes under three lighting conditions are included in the data acquisition process, ensuring that BioDrone can fully reflect the robust visual challenges of the flapping-wing UAVs.
Figure A3: Overview of the ERA dataset. Overall, 2864 labeled video snippets were collected for 24 event classes and 1 normal class: post-earthquake, flood, fire, landslide, mudslide, traffic collision, traffic congestion, harvesting, ploughing, constructing, police chase, conflict, baseball, basketball, boating, cycling, running, soccer, swimming, car racing, party, concert, parade/protest, religious activity, and non-event. For each class, the first (left) and last (right) frames of a video are shown.
Figure A4: Samples of the various FOR-instance data collections’ instance and semantic annotations.
Figure A5: The first frames of representative scenes in the newly constructed UAVDark135. Target ground-truths are marked out by green boxes and sequence names are located at the top left corner of the images. Dark-scene challenges such as objects’ unreliable color features and objects merging into the dark can be seen clearly.
Figure A6: Examples of action videos in the UAV-Human dataset. The first and second rows show two video sequences with significant camera motions and view variations, caused by continuously varying flight attitudes, speeds, and heights. The last three rows display action samples of the dataset, showing the diversity, e.g., distinct views, various capture sites, weather, scales, and motion blur.
Figure A7: Example images and labels from the UAVid dataset. The first row shows the images captured by UAV. The second row shows the corresponding ground truth labels. The third row shows the prediction results of the MS-Dilation net+PRT+FSO model. The last row shows the labels.
Figure A8: Initial frames of specific sequences from the DarkTrack2021 archive. Objects being tracked are indicated by green boxes, and sequence names are shown in the top left corner of the photos.
Figure A9: Illustration of a UAV-based vehicle Re-Identification (ReID) dataset displaying annotated distinguishing elements, such as skylights, bumpers, spare tires, and baggage racks, across diverse vehicle categories (e.g., white truck, black sedan, blue SUV). The green bounding boxes and arrows delineate particular vehicle components essential for identification, while variations in perspective and resolution underscore the difficulties of ReID from UAV imagery. The tabular annotations beneath each vehicle indicate the presence or absence of key elements for ReID.
Figure A10: Exemplary photos extracted from the dataset. The dataset is obtained from a comprehensive real video surveillance system including 174 cameras strategically placed around an urban area spanning over 200 square kilometers.
Figure A11: Graphical representation of complex scenes from the RescueNet dataset. The first and third rows display the original photos, while the lower rows provide the associated annotations for both semantic segmentation and image classification functions. Displayed on the right are the 10 classes, each represented by their segmentation color.
Figure A12: The many modalities present in the UAV-Assistant dataset, shown for a randomly chosen collection of images. The uppermost row displays color photos, the second row displays depth, the third row displays the normal map, and the last row displays flight silhouettes of the drone.
Figure A13: The AU−AIR dataset includes extracted frames that are annotated with object information, time stamp, current location, altitude, velocity of the UAV, and rotation data observed from the IMU sensor. This figure presents an exemplar of it.
Figure A14: Thirteen explicitly chosen gestures, each accompanied by a single picked image. Directions of hand movement are shown by the arrows. The amber color marks serve as approximate indicators of the initial and final locations of the palm for one iteration. Neither the Hover nor Land gestures are dynamic gestures.
Figure A15: Exemplary commands and visual representations derived from the KITE dataset.
23 pages, 2811 KiB  
Review
Crisis Response in Tourism: Semantic Networks and Topic Modeling in the Hotel and Aviation Industries
by Ruohan Tang, Shaofeng Zhao, Won Seok Lee, Sunwoo Park and Yunfei Zhang
Sustainability 2024, 16(24), 11275; https://doi.org/10.3390/su162411275 - 23 Dec 2024
Abstract
The COVID-19 pandemic caused unprecedented global disruptions, with the hotel and aviation industries—two critical pillars of tourism—among the hardest hit. This study analyzed 451 hotel-related and 336 aviation-related records from the Web of Science database, applying semantic network analysis to uncover eight clusters of crisis management knowledge: basic functions, crisis response, operational strategies, epidemic prevention and control, crisis perception, innovative services, scope of influence, and internal and external environments. Latent Dirichlet Allocation (LDA) topic modeling identified distinct thematic strategies for each sector. In hotels, these included Digital Innovation Transformation, Monitoring Management Procedures, Emotional Awareness Incentives, and Resilience Mechanism Establishment. In aviation, strategies focused on Green Economic Transformation, Co-creation Value Realization, Passenger Incentive Mechanisms, and Balancing Health Risks. By visualizing co-occurrence relationships and mapping thematic intersections and divergences, this study provides actionable insights into the recovery strategies of these industries. The findings offer robust support for developing targeted management approaches and decision-making frameworks to ensure the sustainable growth of the tourism sector. Full article
(This article belongs to the Section Tourism, Culture, and Heritage)
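For readers who want to see the LDA step in code, the sketch below runs topic modeling on a few toy abstracts with scikit-learn. The study's own tooling and corpus are not reproduced here; the example documents are invented for illustration.

```python
# Toy LDA run: vectorise a few invented abstracts and print the top terms per topic.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "hotel digital innovation contactless service transformation",
    "hotel employee emotional support incentive resilience",
    "airline green recovery sustainable fuel economic transformation",
    "airline passenger health risk refund incentive loyalty",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-5:][::-1]]   # five highest-weight terms
    print(f"Topic {k}: {', '.join(top)}")
```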
Figures:
Figure 1: Research framework and analytical flowchart (source: own source).
Figure 2: Keyword cloud in hotel and aviation (source: own source). (a) Hotel industry; (b) Aviation industry. In the figures, red represents the hotel industry, while blue represents the aviation industry. The font size varies according to word frequency.
Figure 3: Visualization of semantic network clusters in the hotel and aviation (source: own source). (a) Hotel industry; (b) Aviation industry. Clusters are labeled as follows: Cluster 1 = Basic Functions; Cluster 2 = Crisis Response; Cluster 3 = Operational Strategies; Cluster 4 = Epidemic Prevention and Control; Cluster 5 = Crisis Perception; Cluster 6 = Innovative Services; Cluster 7 = Scope of Influence; Cluster 8 = Internal and External Environment.
Figure 4: LDA topic modeling coordinates for the hotel and aviation (source: own source). (a) Hotel industry; (b) Aviation industry. Here, the x-axis symbolizes resilience and co-creation, while the y-axis denotes management and recovery.
Figure 5: Scatter distribution of the topic prevalence of LDA in the hotel and aviation (source: own source).
18 pages, 930 KiB  
Case Report
Ontological Representation of the Structure and Vocabulary of Modern Greek on the Protégé Platform
by Nikoletta Samaridi, Evangelos Papakitsos and Nikitas Karanikolas
Computation 2024, 12(12), 249; https://doi.org/10.3390/computation12120249 - 23 Dec 2024
Abstract
One of the issues in Natural Language Processing (NLP) and Artificial Intelligence (AI) is language representation and modeling, which aims to manage language structure and find solutions to linguistic issues. In pursuit of the most efficient capture of knowledge about the Modern Greek language, and given the scientifically certified usability of the ontological structuring of data in the fields of the semantic web and cognitive computing, a new ontology of the Modern Greek language at the level of structure and vocabulary is presented in this paper, built on the Protégé platform. Using this logical and structured form of knowledge representation, the research processes and exploits the distributed semantics of linguistic information in an accessible and useful way. Full article
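As an illustration only (the published ontology was built in Protégé and is not reproduced here), the four top-level concepts named in the figures can be declared as OWL classes with Greek and English labels using rdflib; the namespace and the example subclass are assumptions.

```python
# Illustrative rdflib sketch: the four top-level concepts as OWL classes with
# bilingual labels. The namespace and the example subclass are placeholders.
from rdflib import Graph, Namespace, RDF, RDFS, OWL, Literal

EX = Namespace("http://example.org/modern-greek#")   # placeholder namespace
g = Graph()
g.bind("ex", EX)

top_classes = {
    "Morphology": "Μορφολογία",
    "Syntax": "Σύνταξη",
    "Semantics": "Σημασιολογία",
    "Phonetics": "Φωνητική",
}
for en, el in top_classes.items():
    cls = EX[en]
    g.add((cls, RDF.type, OWL.Class))
    g.add((cls, RDFS.label, Literal(el, lang="el")))
    g.add((cls, RDFS.label, Literal(en, lang="en")))

# Example subclass: nouns placed under Morphology (illustrative only).
g.add((EX.Noun, RDF.type, OWL.Class))
g.add((EX.Noun, RDFS.subClassOf, EX.Morphology))

print(g.serialize(format="turtle"))
```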
Figures:
Figure 1: The four (4) basic concepts (Μορφολογία_Morphology, Σύνταξη_Syntax, Σημασιολογία_Semantics, and Φωνητική_Phonetics), on which the ontology of Modern Greek is structured on the Protégé platform.
Figure 2: The data properties (GreekLanguageDataProperty) of the new Greek Language Ontology Dictionary on the Protégé platform.
16 pages, 2833 KiB  
Article
MGKGR: Multimodal Semantic Fusion for Geographic Knowledge Graph Representation
by Jianqiang Zhang, Renyao Chen, Shengwen Li, Tailong Li and Hong Yao
Algorithms 2024, 17(12), 593; https://doi.org/10.3390/a17120593 - 23 Dec 2024
Abstract
Geographic knowledge graph representation learning embeds entities and relationships in geographic knowledge graphs into a low-dimensional continuous vector space, which serves as a basic method that bridges geographic knowledge graphs and geographic applications. Previous geographic knowledge graph representation methods primarily learn the vectors of entities and their relationships from their spatial attributes and relationships, which ignores various semantics of entities, resulting in poor embeddings on geographic knowledge graphs. This study proposes a two-stage multimodal geographic knowledge graph representation (MGKGR) model that integrates multiple kinds of semantics to improve the embedding learning of geographic knowledge graph representation. Specifically, in the first stage, a spatial feature fusion method for modality enhancement is proposed to combine the structural features of geographic knowledge graphs with two modal semantic features. In the second stage, a multi-level modality feature fusion method is proposed to integrate heterogeneous features from different modalities. By fusing the semantics of text and images, the performance of geographic knowledge graph representation is improved, providing accurate representations for downstream geographic intelligence tasks. Extensive experiments on two datasets show that the proposed MGKGR model outperforms the baselines. Moreover, the results demonstrate that integrating textual and image data into geographic knowledge graphs can effectively enhance the model’s performance. Full article
(This article belongs to the Section Algorithms for Multidisciplinary Applications)
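A much-simplified sketch of the idea of multimodal entity embedding follows: structural, text, and image features are gated into one entity vector and scored with a TransE-style distance. The dimensions, the gating, and the scoring function are assumptions for illustration, not the MGKGR formulation.

```python
# Simplified multimodal KG embedding sketch: gate structural/text/image features
# into an entity vector, then score triples with a TransE-style distance.
import torch
import torch.nn as nn

class MultimodalEntityEmbedding(nn.Module):
    def __init__(self, n_ent, n_rel, dim=128, txt_dim=768, img_dim=512):
        super().__init__()
        self.ent = nn.Embedding(n_ent, dim)      # structural embedding
        self.rel = nn.Embedding(n_rel, dim)
        self.txt_proj = nn.Linear(txt_dim, dim)  # text-modality features
        self.img_proj = nn.Linear(img_dim, dim)  # image-modality features
        self.gate = nn.Linear(3 * dim, 3)

    def entity(self, idx, txt_feat, img_feat):
        s, t, i = self.ent(idx), self.txt_proj(txt_feat), self.img_proj(img_feat)
        w = torch.softmax(self.gate(torch.cat([s, t, i], dim=-1)), dim=-1)
        return w[..., :1] * s + w[..., 1:2] * t + w[..., 2:3] * i

    def score(self, h, r_idx, t):
        # TransE-style plausibility: smaller distance means a more plausible triple.
        return -(h + self.rel(r_idx) - t).norm(p=1, dim=-1)

model = MultimodalEntityEmbedding(n_ent=1000, n_rel=20)
h = model.entity(torch.tensor([3]), torch.randn(1, 768), torch.randn(1, 512))
t = model.entity(torch.tensor([7]), torch.randn(1, 768), torch.randn(1, 512))
print(model.score(h, torch.tensor([0]), t))
```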
Figures:
Figure 1: Multimodal data in the geographic knowledge graph provides semantic information for geographic attribute prediction.
Figure 2: The framework of the proposed MGKGR. (A) Multimodal GeoKG Encoding module processes the multimodal data of multimodal GeoKG for effective encoding. (B) Two-Stage Multimodal Feature Fusion module integrates features from multiple modalities to generate the multimodal features of multimodal GeoKG.
Figure 3: Model performance on attribute relations, adjacency relations, and mixed relations.
19 pages, 10695 KiB  
Article
A Scene Knowledge Integrating Network for Transmission Line Multi-Fitting Detection
by Xinhang Chen, Xinsheng Xu, Jing Xu, Wenjie Zheng and Qianming Wang
Sensors 2024, 24(24), 8207; https://doi.org/10.3390/s24248207 - 23 Dec 2024
Abstract
Aiming at the severe occlusion problem and the tiny-scale object problem in the multi-fitting detection task, the Scene Knowledge Integrating Network (SKIN), including the scene filter module (SFM) and the scene structure information module (SSIM), is proposed. Firstly, the particularity of the scene in the multi-fitting detection task is analyzed. The aggregation of the fittings is defined as the scene according to professional knowledge of the power field and the habits of operators in identifying fittings, so the scene knowledge includes global context information, fitting fine-grained visual information, and scene structure information. Then, a scene filter module is designed to learn the global context information and fitting fine-grained visual information, and a scene structure module is designed to learn the scene structure information. Finally, the scene semantic features are used as the carrier to integrate the three categories of information into relative scene features, which can assist in the recognition of occluded and tiny-scale fittings after feature mining and feature integration. The experiments show that the proposed network effectively improves performance on the multi-fitting detection task compared with Faster R-CNN and other state-of-the-art models. In particular, the detection performance for occluded and tiny-scale fittings is significantly improved. Full article
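One way to picture how a scene-fitting co-occurrence prior can assist recognition is to re-weight detector confidences by how often each fitting class co-occurs with the recognised scene. The sketch below is only illustrative; the class names, numbers, and blending rule are invented and are not the SKIN mechanism.

```python
# Illustrative scene prior: detector confidences are blended with a scene-fitting
# co-occurrence matrix so that classes typical of the scene keep more confidence.
import numpy as np

fitting_classes = ["shielding ring", "spacer", "weight", "suspension clamp"]
scenes = ["tension string", "suspension string"]

# Rows: scenes, columns: fitting classes (made-up co-occurrence frequencies).
cooccurrence = np.array([
    [0.90, 0.70, 0.10, 0.05],   # tension string
    [0.20, 0.60, 0.80, 0.95],   # suspension string
])

def rescore(scores, scene_idx, alpha=0.3):
    """Blend raw detector confidences with the scene prior (alpha = prior weight)."""
    prior = cooccurrence[scene_idx]
    return (1 - alpha) * scores + alpha * prior * scores

raw = np.array([0.42, 0.55, 0.48, 0.30])   # detector confidences for one region
print(rescore(raw, scene_idx=1).round(3))  # scene-consistent fittings retain more confidence
```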
Figures:
Figure 1: Fitting object detection problems (objects are indicated by green boxes).
Figure 2: Comparison chart of scene meanings.
Figure 3: Scene definition diagram.
Figure 4: SKIN model structure.
Figure 5: Scene filtering module structure diagram.
Figure 6: Scene-fitting co-existence matrix.
Figure 7: The network structure of SSIM.
Figure 8: Qualitative result comparison on fitting dataset.
Figure 9: More test results.
19 pages, 6995 KiB  
Article
A Classification Model for Fine-Grained Silkworm Cocoon Images Based on Bilinear Pooling and Adaptive Feature Fusion
by Mochen Liu, Xin Hou, Mingrui Shang, Eunice Oluwabunmi Owoola, Guizheng Zhang, Wei Wei, Zhanhua Song and Yinfa Yan
Agriculture 2024, 14(12), 2363; https://doi.org/10.3390/agriculture14122363 - 22 Dec 2024
Viewed by 291
Abstract
The quality of silkworm cocoons affects the quality and cost of silk processing. It is necessary to sort silkworm cocoons prior to silk production. Cocoon images consist of fine-grained images with large intra-class differences and small inter-class differences. The subtle intra-class features pose a serious challenge in accurately locating the effective areas and classifying silkworm cocoons. To improve the perception of intra-class features and the classification accuracy, this paper proposes a bilinear pooling classification model (B-Res41-ASE) based on adaptive multi-scale feature fusion and enhancement. B-Res41-ASE consists of three parts: a feature extraction module, a feature fusion module, and a feature enhancement module. Firstly, the backbone network, ResNet41, is constructed based on the bilinear pooling algorithm to extract complete cocoon features. Secondly, the adaptive spatial feature fusion module (ASFF) is introduced to fuse different semantic information to solve the problem of fine-grained information loss in the process of feature extraction. Finally, the squeeze and excitation module (SE) is used to suppress redundant information, enhance the weight of distinguishable regions, and reduce classification bias. Compared with the widely used classification network, the proposed model achieves the highest classification performance in the test set, with accuracy of 97.0% and an F1-score of 97.5%. The accuracy of B-Res41-ASE is 3.1% and 2.6% higher than that of the classification networks AlexNet and GoogLeNet, respectively, while the F1-score is 2.5% and 2.2% higher, respectively. Additionally, the accuracy of B-Res41-ASE is 1.9% and 7.7% higher than that of the Bilinear CNN and HBP, respectively, while the F1-score is 1.6% and 5.7% higher. The experimental results show that the proposed classification model without complex labelling outperforms other cocoon classification algorithms in terms of classification accuracy and robustness, providing a theoretical basis for the intelligent sorting of silkworm cocoons. Full article
(This article belongs to the Section Digital Agriculture)
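The bilinear pooling step that underlies this family of models is standard: the outer product of channel descriptors is pooled over spatial positions, then signed-square-root and L2 normalised. The sketch below shows that generic operation with illustrative shapes; it is not the B-Res41-ASE model.

```python
# Generic bilinear pooling: spatially pooled outer product of channel descriptors,
# followed by signed square-root and L2 normalisation (shapes are illustrative).
import torch
import torch.nn.functional as F

def bilinear_pool(feat):
    # feat: (B, C, H, W) convolutional features from the backbone.
    b, c, h, w = feat.shape
    x = feat.reshape(b, c, h * w)
    z = torch.bmm(x, x.transpose(1, 2)) / (h * w)          # (B, C, C) outer product
    z = z.reshape(b, c * c)
    z = torch.sign(z) * torch.sqrt(torch.abs(z) + 1e-10)   # signed square-root
    return F.normalize(z, dim=1)                           # L2 normalisation

feat = torch.randn(2, 64, 14, 14)
print(bilinear_pool(feat).shape)  # torch.Size([2, 4096])
```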
Figures:
Figure 1: Images of silkworm cocoons captured by top and bottom camera. (a) Cocoon image captured by camera (top). (b) Cocoon image captured by camera (bottom).
Figure 2: Images of reelable cocoons and different types of waste cocoons.
Figure 3: Cocoon image classification model architecture.
Figure 4: Bilinear pooling-based image classification model for silkworm cocoon images.
Figure 5: Silkworm cocoon image classification model based on bilinear pooling with feature fusion.
Figure 6: Bilinear pooling classification model for silkworm cocoon images based on feature fusion and enhancement.
Figure 7: The training curves of different fusion algorithms. (a) The training accuracy curves of different fusion algorithms. (b) The training loss curves of different fusion algorithms.
Figure 8: Confusion matrix for different fusion algorithms.
Figure 9: The fine-grained classification precision of the silkworm cocoon for different fusion algorithms. A. Cocoon polluted by oil. B. Stained cocoon. C. Cocoon pressed by cocooning frame. D. Crushed cocoon. E. Double cocoon. F. Reelable cocoon. G. Yellow spotted cocoon. H. Decayed cocoon. I. Malformed cocoon.
Figure 10: The training curves of different feature fusion and enhancement methods. (a) The training accuracy curves. (b) The training loss curves.
Figure 11: The confusion matrix for different feature enhancements.
Figure 12: The fine-grained classification precision of the silkworm cocoon for different feature enhancements. A. Cocoon polluted by oil. B. Stained cocoon. C. Cocoon pressed by cocooning frame. D. Crushed cocoon. E. Double cocoon. F. Reelable cocoon. G. Yellow spotted cocoon. H. Decayed cocoon. I. Malformed cocoon.
Figure 13: Comparison of different models with Grad-CAM visualization.
Figure 14: Adaptive spatial feature map before and after fusion visualization.
Figure 15: Accuracy and loss value change curve for each model. (a) The training accuracy curves of different algorithms. (b) The training loss curves of different algorithms.
Figure 16: Experimental images of different varieties of silkworm cocoons.