Search Results (535)

Search Parameters:
Keywords = pyramid feature fusion

17 pages, 2730 KiB  
Article
Redefining Contextual and Boundary Synergy: A Boundary-Guided Fusion Network for Medical Image Segmentation
by Yu Chen, Yun Wu, Jiahua Wu, Xinxin Zhang, Dahan Wang and Shunzhi Zhu
Electronics 2024, 13(24), 4986; https://doi.org/10.3390/electronics13244986 - 18 Dec 2024
Viewed by 244
Abstract
Medical image segmentation plays a crucial role in medical image processing, focusing on the automated extraction of regions of interest (such as organs and lesions) from medical images. This process supports various clinical applications, including diagnosis, surgical planning, and treatment. In this paper, we introduce a Boundary-guided Context Fusion U-Net (BCF-UNet), a novel approach designed to tackle a critical shortcoming in current methods: the inability to effectively integrate boundary information with semantic context. BCF-UNet introduces an Adaptive Multi-Frequency Encoder (AMFE), which uses multi-frequency analysis inspired by the Wavelet Transform (WT) to capture both local and global features efficiently. The AMFE decomposes images into different frequency components and adapts more effectively to boundary texture information through a learnable activation function. Additionally, we introduce a new multi-scale feature fusion module, the Atten-kernel Adaptive Fusion Module (AKAFM), designed to integrate deep semantic information with shallow texture details, significantly bridging the gap between features at different scales. Furthermore, each layer of the encoder sub-network integrates a Boundary-aware Pyramid Module (BAPM), which combines a simple and effective extraction scheme with a priori knowledge to extract multi-scale edge features and improve the accuracy of boundary segmentation. In BCF-UNet, semantic context is used to guide edge information extraction, enabling the model to more effectively comprehend and identify relationships among various organizational structures. Comprehensive experimental evaluations on two datasets demonstrate that the proposed BCF-UNet achieves superior performance compared to existing state-of-the-art methods.
(This article belongs to the Section Artificial Intelligence)
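The AMFE described above is said to decompose features into different frequency components in a wavelet-inspired way. The following is a minimal, illustrative sketch (not the authors' code) of such a split using fixed Haar filters in PyTorch; the haar_split helper, its filter layout, and the tensor sizes are assumptions for illustration only.

```python
# A minimal sketch of a Haar-style multi-frequency split: the low-frequency band
# carries coarse content, the three high-frequency bands carry boundary/texture
# detail that a learnable activation could then reweight (illustrative only).
import torch
import torch.nn.functional as F

def haar_split(x: torch.Tensor):
    """Decompose (B, C, H, W) features into LL, LH, HL, HH sub-bands of size H/2 x W/2."""
    ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
    lh = torch.tensor([[-0.5, -0.5], [0.5, 0.5]])
    hl = torch.tensor([[-0.5, 0.5], [-0.5, 0.5]])
    hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
    bands = torch.stack([ll, lh, hl, hh]).unsqueeze(1)            # (4, 1, 2, 2)
    c = x.shape[1]
    weight = bands.repeat(c, 1, 1, 1).to(dtype=x.dtype, device=x.device)
    out = F.conv2d(x, weight, stride=2, groups=c)                 # (B, 4C, H/2, W/2)
    out = out.view(x.shape[0], c, 4, out.shape[-2], out.shape[-1])
    return out[:, :, 0], out[:, :, 1], out[:, :, 2], out[:, :, 3]

x = torch.randn(1, 16, 64, 64)
ll, lh, hl, hh = haar_split(x)
print(ll.shape, hh.shape)  # torch.Size([1, 16, 32, 32]) twice
```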
Figures:
Figure 1: Two group examples of the skin lesion and colorectal polyp. The first row showcases images from colorectal polyp endoscopies, while the second row displays dermoscopic images of skin lesions. In each image, a solid yellow line delineates the boundary of the region of interest.
Figure 2: Illustration of the proposed BCF-UNet.
Figure 3: Adaptive Multi-Frequency Encoder.
Figure 4: Atten-kernel Adaptive Fusion Module (AKAFM).
Figure 5: Panoramic Channel Enhancement Module (PCEM) and Deformable Spatial Attention Module (DSAM).
Figure 6: Structure of proposed Boundary-aware Pyramid Module (BAPM).
Figure 7: ISIC2018 skin lesion dataset.
Figure 8: Kvasir-SEG polyp dataset.
16 pages, 3143 KiB  
Article
DGA Domain Detection Based on Transformer and Rapid Selective Kernel Network
by Jisheng Tang, Yiling Guan, Shenghui Zhao, Huibin Wang and Yinong Chen
Electronics 2024, 13(24), 4982; https://doi.org/10.3390/electronics13244982 - 18 Dec 2024
Viewed by 257
Abstract
Botnets pose a significant challenge in network security by leveraging Domain Generation Algorithms (DGA) to evade traditional security measures. Extracting DGA domain samples is inherently complex, and current DGA detection models often struggle to capture domain features effectively when facing limited training data. This limitation results in suboptimal detection performance and an imbalance between model accuracy and complexity. To address these challenges, this paper introduces a novel multi-scale feature fusion model that integrates the Transformer architecture with the Rapid Selective Kernel Network (R-SKNet). The proposed model employs the Transformer’s encoder to couple the individual character elements of a domain with the multiple types of relationships within the global domain block. This paper proposes integrating R-SKNet into DGA detection and developing an efficient channel attention (ECA) module. By enhancing the branch information guidance in the SKNet architecture, the approach achieves adaptive receptive field selection, multi-scale feature capture, and lightweight yet efficient multi-scale convolution. Moreover, the improved Feature Pyramid Network (FPN) architecture, termed EFAM, is utilized to adjust channel weights for outputs at different stages of the backbone network, thereby achieving multi-scale feature fusion. Experimental results demonstrate that, in tasks with limited training samples, the proposed method achieves lower computational complexity and higher detection accuracy compared to mainstream detection models.
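The abstract above names an efficient channel attention (ECA) module added to R-SKNet. Below is a minimal sketch of a standard ECA block in PyTorch, assuming the usual adaptive 1D-convolution kernel size; how the paper wires it into R-SKNet is not shown here.

```python
# A minimal sketch of Efficient Channel Attention (ECA): global average pooling
# produces a channel descriptor, a small 1D convolution models local cross-channel
# interaction, and a sigmoid gate rescales the channels.
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # Kernel size chosen adaptively from the channel count, as in the ECA paper.
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):
        # (B, C, H, W) -> descriptor (B, 1, C) -> 1D conv -> gate (B, C, 1, 1)
        y = self.pool(x).squeeze(-1).transpose(-1, -2)
        y = torch.sigmoid(self.conv(y)).transpose(-1, -2).unsqueeze(-1)
        return x * y

feat = torch.randn(2, 64, 32, 32)
print(ECA(64)(feat).shape)  # torch.Size([2, 64, 32, 32])
```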
Figures:
Figure 1: Overall framework.
Figure 2: Sample domain length.
Figure 3: Transformer encoder module.
Figure 4: R-SK convolution structure.
Figure 5: ECA module.
Figure 6: Band matrix.
Figure 7: EFAM structure.
Figure 8: Binary classification results and model parameter comparisons.
21 pages, 6489 KiB  
Article
Peach Leaf Shrinkage Disease Recognition Algorithm Based on Attention Spatial Pyramid Pooling Enhanced with Local Attention Network
by Caihong Zhang, Pingchuan Zhang, Yanjun Hu, Zeze Ma, Xiaona Ding, Ying Yang and Shan Li
Electronics 2024, 13(24), 4973; https://doi.org/10.3390/electronics13244973 - 17 Dec 2024
Viewed by 296
Abstract
Aiming at the many challenges in recognizing peach leaf shrink disease, such as the diverse sizes of diseased-leaf targets, complex background interference, and inflexible adjustment of the training learning rate, we propose a peach leaf shrink disease recognition algorithm based on an attention generalized efficient layer aggregation network. Firstly, the rectified linear unit activation function is used to effectively improve the stability and performance of the model in low-precision computing environments and to mitigate the problem of partial gradient disappearance. Secondly, the integrated squeeze-and-excitation network attention mechanism adaptively focuses on the key pest and disease areas in the image, which significantly enhances the model's ability to recognize pest and disease characteristics. Finally, combined with fast pyramid pooling enhanced with Local Attention Networks, deep fusion of cross-layer features is realized to improve the model's ability to identify complex features and to optimize operational efficiency. Experimental results on the peach leaf shrink disease recognition dataset show that the proposed algorithm achieves a significant performance improvement over the original YOLOv8 algorithm. Specifically, mF1, mPrecision, mRecall, and mAP increased by 0.1075, 0.0723, 0.1224, and 0.1184, respectively, providing strong technical support for intelligent and automated monitoring of peach pests and diseases.
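As a point of reference for the squeeze-and-excitation attention and ReLU-family activation mentioned above, here is a minimal PyTorch sketch of a standard SE block using ReLU6; the reduction ratio of 16 is an illustrative choice, not taken from the paper.

```python
# A minimal sketch of squeeze-and-excitation (SE) channel attention: global
# pooling squeezes each channel to a scalar, a small MLP produces per-channel
# gates, and the input is rescaled channel by channel.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU6(inplace=True),   # bounded activation, stable in low precision
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)  # per-channel gates
        return x * w

print(SEBlock(128)(torch.randn(1, 128, 40, 40)).shape)  # torch.Size([1, 128, 40, 40])
```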
Figures:
Figure 1: The overall architecture of the YOLOv8-SEPyro model.
Figure 2: ReLU6 activation function diagram.
Figure 3: SENet attention mechanism structure diagram.
Figure 4: Spatial pyramid pooling structure diagram.
Figure 5: Enhanced with Local Attention Network structure diagram.
Figure 6: Fast pyramid pooling enhanced with Local Attention Networks.
Figure 7: Datasets of peach leaf shrink disease.
Figure 8: Precision index results.
Figure 9: Recall index results.
Figure 10: F1 score index results.
Figure 11: AP index results.
Figure 12: Point-line diagram of comparative experiments.
Figure 13: Point-line diagram of ablation experiments.
Figure 14: Detection effect diagram.
24 pages, 7279 KiB  
Article
An Accurate Book Spine Detection Network Based on Improved Oriented R-CNN
by Haibo Ma, Chaobo Wang, Ang Li, Aide Xu and Dong Han
Sensors 2024, 24(24), 7996; https://doi.org/10.3390/s24247996 - 14 Dec 2024
Viewed by 360
Abstract
Book localization is crucial for the development of intelligent book inventory systems, where the high-precision detection of book spines is a critical requirement. However, the varying tilt angles and diverse aspect ratios of books on library shelves often reduce the effectiveness of conventional object detection algorithms. To address these challenges, this study proposes an enhanced oriented R-CNN algorithm for book spine detection. First, we replace the standard 3 × 3 convolutions in ResNet50’s residual blocks with deformable convolutions to enhance the network’s capacity for modeling the geometric deformations of book spines. Second, the PAFPN (Path Aggregation Feature Pyramid Network) is integrated into the neck structure to enhance multi-scale feature fusion. Third, to further optimize the anchor box design, we introduce an adaptive initial cluster center selection method for K-median clustering. This allows for a more accurate computation of anchor box aspect ratios that are better aligned with the book spine dataset, improving the model’s training performance. We conducted comparison experiments between the proposed model and other state-of-the-art models on the book spine dataset, and the results demonstrate that the proposed approach reaches an mAP of 90.22%, outperforming the baseline algorithm by 4.47 percentage points. Our method significantly improves detection accuracy, making it highly effective for identifying book spines in real-world library environments.
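The anchor-design step above clusters book-spine aspect ratios with K-median and an adaptive choice of initial centers. A minimal NumPy sketch of that general procedure follows; the farthest-point initialization, k = 3, and the synthetic ratios are assumptions for illustration, not the paper's exact settings.

```python
# A minimal sketch of 1D K-median clustering over box aspect ratios, with initial
# centers spread out so that tall, thin spine shapes each get their own cluster.
import numpy as np

def kmedian_1d(ratios, k=3, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = [ratios[rng.integers(len(ratios))]]
    # Pick the remaining initial centers far from the ones already chosen.
    for _ in range(k - 1):
        d = np.min(np.abs(ratios[:, None] - np.array(centers)[None, :]), axis=1)
        centers.append(ratios[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        assign = np.argmin(np.abs(ratios[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = np.median(ratios[assign == j])  # median, not mean
    return np.sort(centers)

# Synthetic aspect ratios (height/width) mimicking tall, narrow book spines.
rng = np.random.default_rng(1)
ratios = np.concatenate([rng.normal(m, 0.5, 300) for m in (4, 8, 14)])
print(kmedian_1d(ratios, k=3))  # roughly [4, 8, 14]
```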
Figures:
Figure 1: The architecture of the enhanced oriented R-CNN model.
Figure 2: Schematic of deformable convolutional layer.
Figure 3: Schematic of deformable residual block structure.
Figure 4: Schematic of FPN structure.
Figure 5: Schematic of PAFPN structure.
Figure 6: Diagram of midpoint offset representation.
Figure 7: Scatter density plot of the book spine detection dataset.
Figure 8: Samples of the book spine dataset: (a) near-vertical image; (b) tilted books; (c) low-angle shot image.
Figure 9: Dataset distribution: (a) angle distribution histogram; (b) aspect ratio distribution histogram.
Figure 10: Samples of the on-shelf-books-recognition dataset.
Figure 11: Illustration of data augmentation.
Figure 12: Comparison of different clustering methods.
Figure 13: K-median clustering result of the book spine dataset.
Figure 14: The mAP curves of model training under different aspect ratios.
Figure 15: Comparison of detection results on the book spine dataset between (a) method 2 (baseline), (b) method 3 (baseline + DCN), (c) method 4 (baseline + PAFPN), and (d) method 5 (baseline + DCN + PAFPN). In each detection result, the green boxes represent the book spine, the orange boxes denote the book label, and the red dashed ovals and boxes highlight the detection issues.
Figure 16: Comparison of model inference performance.
Figure 17: Comparison of detection results on the book spine dataset between (a) oriented R-CNN and (b) improved oriented R-CNN. In each detection result, the green boxes represent the book spine, the orange boxes denote the book label, and the red dashed ovals and boxes highlight the detection issues.
Figure 18: Visualization results on the on-shelf-books-recognition dataset using the improved oriented R-CNN.
23 pages, 3884 KiB  
Article
Cascaded Feature Fusion Grasping Network for Real-Time Robotic Systems
by Hao Li and Lixin Zheng
Sensors 2024, 24(24), 7958; https://doi.org/10.3390/s24247958 - 13 Dec 2024
Viewed by 434
Abstract
Grasping objects of irregular shapes and various sizes remains a key challenge in the field of robotic grasping. This paper proposes a novel RGB-D data-based grasping pose prediction network, termed Cascaded Feature Fusion Grasping Network (CFFGN), designed for high-efficiency, lightweight, and rapid grasping pose estimation. The network employs innovative structural designs, including depth-wise separable convolutions to reduce parameters and enhance computational efficiency; convolutional block attention modules to augment the model’s ability to focus on key features; multi-scale dilated convolution to expand the receptive field and capture multi-scale information; and bidirectional feature pyramid modules to achieve effective fusion and information flow of features at different levels. In tests on the Cornell dataset, our network achieved grasping pose prediction at a speed of 66.7 frames per second, with accuracy rates of 98.6% and 96.9% for image-wise and object-wise splits, respectively. The experimental results show that our method achieves high-speed processing while maintaining high accuracy. In real-world robotic grasping experiments, our method also proved to be effective, achieving an average grasping success rate of 95.6% on a robot equipped with parallel grippers.
(This article belongs to the Section Sensors and Robotics)
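The grasp representation used here (see Figure 2 below) encodes the grasp angle as cos(2Φ) and sin(2Φ) and recovers it with atan2. A minimal sketch of that encoding and its inverse follows; the helper names are illustrative.

```python
# A minimal sketch of the grasp-angle encoding: mapping the angle to
# (cos 2Φ, sin 2Φ) makes Φ and Φ ± π share one representation (parallel-jaw
# symmetry) and avoids a discontinuity at ±π/2; atan2 inverts the encoding.
import numpy as np

def encode_angle(phi):
    """phi in [-pi/2, pi/2] -> (cos 2phi, sin 2phi)."""
    return np.cos(2.0 * phi), np.sin(2.0 * phi)

def decode_angle(cos2phi, sin2phi):
    """Invert the encoding back to an angle in (-pi/2, pi/2]."""
    return 0.5 * np.arctan2(sin2phi, cos2phi)

phi = np.deg2rad(70.0)
c, s = encode_angle(phi)
print(np.rad2deg(decode_angle(c, s)))  # 70.0
```

The doubled angle is what lets a regression head output two smooth maps instead of one wrap-around angle map.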
Figures:
Figure 1: Grasping configuration representation.
Figure 2: Illustration of the complete grasp representation and angle encoding pipeline. Left: input RGB-D images with grasp parameters annotated: grasp center (u, v), grasp angle θ, and grasp width w. Middle: three parameterized grasp maps derived from the input: the grasp quality map Q (values from 0 to 1.0, indicating grasp success probability), the grasp angle map Φ (angle range [−π/2, π/2]), and the grasp width map W (in pixels). Right: angle encoding using the trigonometric transformations Φ_cos = cos(2Φ) and Φ_sin = sin(2Φ) to handle angle periodicity. The color scales indicate the range of values for each map: grasp quality (0–1.0), angles (−π/2 to π/2), and width (0–100 pixels).
Figure 3: Network architecture of the Cascaded Feature Fusion Grasp Network (CFFGN).
Figure 4: Grasp parameter calculation process. The network takes RGB-D data as input and outputs four values: Q, cos(2Φ), sin(2Φ), and W.
Figure 5: Left: standard convolution with BN and ReLU layers. Right: depth-wise separable convolution structure.
Figure 6: Schematic diagram of the CBAM module. This module comprises two components, the channel attention module and the spatial attention module, and the input features undergo sequential processing.
Figure 7: Channel attention module in the CBAM. The input feature F with dimensions H × W × C undergoes global max pooling (MaxPool) and average pooling (AvgPool) operations, resulting in two feature descriptors of size 1 × 1 × C. These descriptors are then processed by a shared multi-layer perceptron (Shared MLP), and the outputs are combined to generate the final channel attention map M_c with dimensions 1 × 1 × C.
Figure 8: Spatial attention module in the CBAM. The channel-refined feature F′ with dimensions H′ × W′ × C undergoes max pooling and average pooling operations, resulting in features of size H′ × W′ × 1. These are then processed to generate the spatial attention map M_s with dimensions H′ × W′ × 1, which captures important spatial information in the input feature map.
Figure 9: Structure of the Multi-scale Dilated Convolution Module (MCDM).
Figure 10: BiFPN structure diagram. P3–P7 represent feature maps of different scales, from the shallow layer (P3) to the deep layer (P7). Red arrows: top-down path, fusing high-level semantic information into low-level features. Blue arrows: bottom-up path, propagating fine-grained information from low-level to high-level features. Purple arrows: same-level connections, integrating features from the same scale. Black arrows: flow paths of the initial features. The colored circles represent the feature maps at the different scales P3 to P7.
Figure 11: Architecture of the baseline network. It consists of a 9 × 9 convolutional layer, followed by 5 × 5 and 2 × 2 max pooling layers, and then progressive upsampling layers.
Figure 12: Experimental platform setup for robotic grasping. The platform integrates an EPSON C4-A901S six-axis robot arm equipped with an electric parallel gripper as the end-effector. A RealSense D415 depth camera is mounted overhead in an eye-to-hand configuration. The gripping area (marked with a red dashed box) represents the workspace where objects are placed for grasping experiments. All key components are labeled for clarity.
Figure 13: Sequential demonstration of a successful umbrella grasping experiment. Left: the robotic arm approaches the target umbrella based on the predicted optimal grasping pose. Center: the gripper aligns with the detected grasping point on the umbrella body and adjusts to the appropriate width. Right: the gripper successfully executes the grasp and lifts the umbrella, demonstrating the algorithm’s capability to identify and execute grasps on the main body structure rather than on conventional grasping points such as handles.
21 pages, 3197 KiB  
Article
Infrared Aircraft Detection Algorithm Based on High-Resolution Feature-Enhanced Semantic Segmentation Network
by Gang Liu, Jiangtao Xi, Chao Ma and Huixiang Chen
Sensors 2024, 24(24), 7933; https://doi.org/10.3390/s24247933 - 11 Dec 2024
Viewed by 451
Abstract
In order to achieve infrared aircraft detection under interference conditions, this paper proposes an infrared aircraft detection algorithm based on a high-resolution feature-enhanced semantic segmentation network. Firstly, the designed location attention mechanism is utilized to enhance the current-level feature map by obtaining correlation weights between pixels at different positions. Then, it is fused with the high-level feature map rich in semantic features to construct a location attention feature fusion network, thereby enhancing the representation capability of target features. Secondly, based on the idea of using dilated convolutions to expand the receptive field of feature maps, a hybrid atrous spatial pyramid pooling module is designed. By utilizing a serial structure of dilated convolutions with small dilation rates, this module addresses the issue of feature information loss when expanding the receptive field through dilated spatial pyramid pooling. It captures the contextual information of the target, further enhancing the target features. Finally, a dice loss function is introduced to calculate the overlap between the predicted results and the ground truth labels, enabling deeper mining of foreground information for more comprehensive learning from the samples. This paper constructs an infrared aircraft detection algorithm based on a high-resolution feature-enhanced semantic segmentation network, which combines the location attention feature fusion network, the hybrid atrous spatial pyramid pooling module, the dice loss function, and a network that maintains the resolution of feature maps. Experiments conducted on a self-built infrared dataset show that the proposed algorithm achieves a mean intersection over union (mIoU) of 92.74%, a mean pixel accuracy (mPA) of 96.34%, and a mean recall (MR) of 96.19%, all of which outperform classic segmentation algorithms such as DeepLabv3+, Segformer, HRNetv2, and DDRNet. This demonstrates that the proposed algorithm can achieve effective detection of infrared aircraft in the presence of interference.
(This article belongs to the Section Intelligent Sensors)
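The dice loss mentioned above measures the overlap between the predicted foreground mask and the ground truth. A minimal PyTorch sketch of a standard Dice loss follows; the smoothing epsilon is an illustrative choice.

```python
# A minimal sketch of a binary Dice loss: 1 minus the Dice overlap between the
# sigmoid-activated prediction and the ground-truth mask, averaged over the batch.
import torch

def dice_loss(pred_logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """pred_logits, target: (B, H, W); target is a binary mask."""
    prob = torch.sigmoid(pred_logits)
    inter = (prob * target).sum(dim=(1, 2))
    denom = prob.sum(dim=(1, 2)) + target.sum(dim=(1, 2))
    dice = (2 * inter + eps) / (denom + eps)
    return 1.0 - dice.mean()

pred = torch.randn(2, 64, 64)
mask = (torch.rand(2, 64, 64) > 0.9).float()
print(float(dice_loss(pred, mask)))
```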
Figures:
Figure 1: HFSSNet architecture diagram.
Figure 2: The structure of LAM.
Figure 3: Location attention feature fusion network.
Figure 4: Atrous spatial pyramid pooling.
Figure 5: Dilated convolution kernel.
Figure 6: Calculation diagram of HASPP serial structure.
Figure 7: Hybrid atrous spatial pyramid pooling.
Figure 8: Infrared aircraft image with simulated interference under sky background.
Figure 9: Infrared aircraft image with simulated interference under ground background.
Figure 10: Visual effect of location attention mechanism.
Figure 11: The comparison between HRNetv2 and HRNetv2+LAFFN segmentation results.
Figure 12: The comparison between HRNetv2 and HRNetv2+HASPP segmentation results.
Figure 13: The comparison between HRNetv2 and HRNetv2+dice loss segmentation results.
Figure 14: The segmentation results of different algorithms.
26 pages, 6713 KiB  
Article
Improved Field Obstacle Detection Algorithm Based on YOLOv8
by Xinying Zhou, Wenming Chen and Xinhua Wei
Agriculture 2024, 14(12), 2263; https://doi.org/10.3390/agriculture14122263 - 11 Dec 2024
Viewed by 523
Abstract
To satisfy the obstacle avoidance requirements of unmanned agricultural machinery during autonomous operation and address the challenge of rapid obstacle detection in complex field environments, an improved field obstacle detection model based on YOLOv8 was proposed. This model enabled the fast detection and recognition of obstacles such as people, tractors, and electric power pylons in the field. The detection model was built upon the YOLOv8 architecture with three main improvements. First, to adapt to different tasks and complex environments in the field, improve the sensitivity of the detector to various target sizes and positions, and enhance detection accuracy, the CBAM (Convolutional Block Attention Module) was integrated into the backbone layer of the benchmark model. Secondly, a BiFPN (Bi-directional Feature Pyramid Network) architecture replaced the original PANet to enhance the fusion of features across multiple scales, thereby increasing the model’s capacity to distinguish between the background and obstacles. Third, WIoU v3 (Wise Intersection over Union v3) optimized the target boundary loss function, assigning greater focus to medium-quality anchor boxes and enhancing the detector’s overall performance. A dataset comprising 5963 images of people, electric power pylons, telegraph poles, tractors, and harvesters in a farmland environment was constructed. The training set comprised 4771 images, while the validation and test sets each consisted of 596 images. The results from the experiments indicated that the enhanced model attained precision, recall, and average precision scores of 85.5%, 75.1%, and 82.5%, respectively, on the custom dataset. This reflected increases of 1.3, 1.2, and 1.9 percentage points when compared to the baseline YOLOv8 model. Furthermore, the model reached 52 detection frames per second, thereby significantly enhancing the detection performance for common obstacles in the field. The model enhanced by these techniques maintains a high level of detection accuracy while meeting the criteria for real-time obstacle identification by unmanned agricultural equipment during fieldwork.
(This article belongs to the Section Digital Agriculture)
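The BiFPN that replaces PANet above fuses multi-scale features with learnable, normalized weights (the w_n factors in Figure 7 below). A minimal PyTorch sketch of that fast normalized fusion step follows, assuming the inputs are already resampled to a common shape; the module name is illustrative.

```python
# A minimal sketch of BiFPN-style "fast normalized fusion": each incoming feature
# map gets a learnable non-negative weight, and the weights are normalized before
# the weighted sum, so the fusion stays stable without a softmax.
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    def __init__(self, n_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, feats):
        w = torch.relu(self.w)               # keep weights non-negative
        w = w / (w.sum() + self.eps)         # normalize
        return sum(wi * f for wi, f in zip(w, feats))

p_td = WeightedFusion(2)([torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40)])
print(p_td.shape)  # torch.Size([1, 64, 40, 40])
```

In a full BiFPN this fusion node would be followed by a convolution, and the top-down and bottom-up paths each have their own weights.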
Figures:
Figure 1: YOLOv8 network structure.
Figure 2: CBBI-YOLO network structure. We added a lime-green CBAM module between the C2f module and the SPPF module, replaced the original green Concat module with a green BiFPN module, and replaced the original CIoU loss function with the WIoU v3 loss function in the Bbox Loss, while leaving the other modules unchanged.
Figure 3: CBAM attention mechanism. The blue module represents the original feature map, the orange module the channel attention module, and the purple module the spatial attention module. The original input feature map is multiplied element-wise with the channel attention feature map, and the resulting feature map is multiplied element-wise with the spatial attention feature map to obtain the pink module, which is the final feature map.
Figure 4: Channel attention module. The green module represents the input feature map, which undergoes average pooling and maximum pooling to obtain two feature maps (light pink and light purple modules); these are fed into the multilayer perceptron (MLP) to obtain two feature maps (pink-and-green fused module and purple-and-green fused module), which are then summed and passed through the activation function (purple ReLU module) to obtain the channel attention feature map (purple-and-pink fused module).
Figure 5: Spatial attention module. The green module represents the input feature map, which undergoes average pooling and maximum pooling to obtain two feature maps (pink and purple modules); these two feature maps are concatenated and convolved (blue Conv module) to obtain a feature map (white module), which is passed through the activation function to obtain the spatial attention feature map (white-and-grey fused module).
Figure 6: PANet structure: (a) FPN structure; (b) bottom-up structure. (a) and (b) together form the PANet structure. The red dashed line indicates that information is passed from the bottom feature map to the high-level feature map through a large number of convolution operations; the green dashed line indicates that bottom-level information is fused into the current layer and the previous layer until the highest level is reached (from C2 to P2 and then to N2 up to N5), which greatly reduces the number of convolution calculations.
Figure 7: BiFPN structure. P3, P4, and P5 are the outputs of the backbone network; after a convolution adjusts the channels, two downsampling operations produce P6 and P7, giving the inputs P_n^in. The middle part corresponds to the intermediate features P_n^td in the fusion equations, where w_n is the weighting factor, and the right part gives the outputs P_n^out.
Figure 8: Five examples of obstacles.
Figure 9: The distribution of dataset labels. The left image shows the distribution of the center points of the target bounding boxes, with the horizontal (x) and vertical (y) axes representing the normalized width and height coordinates of the image, respectively. The right image shows the distribution of the widths and heights of the target bounding boxes, with the horizontal axis representing the relative width of the target box and the vertical axis the relative height; dark regions indicate locations with higher frequency.
Figure 10: Comparison of precision, recall, and mAP between YOLOv8 and CBBI-YOLO. (The x-axis denotes the number of training epochs, a dimensionless quantity; the y-axis shows a percentage that typically ranges from 0 to 1. Due to the early-stopping strategy, training stopped at around 150 epochs.)
Figure 11: Some test results from the field.
Figure 12: Test results: (a) YOLOv8 model missed detection; (b) CBBI-YOLO model correct detection.
Figure 13: Test results: (a) YOLOv8 model confidence level; (b) CBBI-YOLO model confidence level.
Figure 14: (a) Precision-confidence curve, (b) precision-recall curve, and (c) recall-confidence curve during CBBI-YOLO model training.
Figure 15: Overview of precision, recall, and average precision during CBBI-YOLO model training. (The x-axis denotes the number of training epochs, a dimensionless quantity; the y-axis shows a percentage that typically ranges from 0 to 1. Due to the early-stopping strategy, training stopped at around 150 epochs.)
17 pages, 1791 KiB  
Article
Apple Defect Detection in Complex Environments
by Wei Shan and Yurong Yue
Electronics 2024, 13(23), 4844; https://doi.org/10.3390/electronics13234844 - 9 Dec 2024
Viewed by 416
Abstract
Aiming at the problem of high false-detection and missed-detection rates for apple surface defects in complex environments, a new apple surface defect detection network, space-to-depth convolution-Multi-scale Empty Attention-Context Guided Feature Pyramid Network-You Only Look Once version 8 nano (SMC-YOLOv8n), is designed. Firstly, space-to-depth convolution (SPD-Conv) is introduced before each Faster Implementation of CSP Bottleneck with 2 convolutions (C2f) in the backbone network as a preprocessing step to improve the quality of the input data. Secondly, the Bottleneck in C2f is removed in the neck, and Multi-scale Empty Attention (MSDA) is introduced to enhance the feature extraction ability. Finally, the Context Guided Feature Pyramid Network (CGFPN) is used to replace the Concat method in the neck for feature fusion, thereby improving the expressive ability of the features. Compared with the YOLOv8n baseline network, mean Average Precision (mAP50) increased by 2.7% and 1.1%, and mAP50-95 increased by 4.1% and 2.7%, on the self-built visible-light apple surface defect dataset in complex environments and on a public dataset, respectively. The experimental results show that SMC-YOLOv8n achieves higher efficiency in apple defect detection, which lays a solid foundation for the intelligent picking and grading of apples.
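SPD-Conv, used above as a preprocessing step before each C2f block, rearranges spatial information into channels instead of striding. A minimal PyTorch sketch of the space-to-depth step followed by a convolution is shown here; the 3x3 kernel choice is illustrative.

```python
# A minimal sketch of SPD-Conv's space-to-depth step: each 2x2 spatial block is
# moved into the channel axis (a stride-free downsampling), then a convolution
# mixes the stacked phases.
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(4 * in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        # Gather the four pixel phases of every 2x2 block -> (B, 4C, H/2, W/2).
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(x)

print(SPDConv(32, 64)(torch.randn(1, 32, 80, 80)).shape)  # torch.Size([1, 64, 40, 40])
```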
Figures:
Figure 1: ADDCE research work classification.
Figure 2: The overall architecture of YOLOv8.
Figure 3: SPD-Conv structure diagram.
Figure 4: C2f structure diagram.
Figure 5: MSDA structure diagram.
Figure 6: C2f-MSDA structure diagram.
Figure 7: SE Attention module.
Figure 8: SE Attention structure diagram.
Figure 9: Context guided feature pyramid network architecture diagram.
Figure 10: SMC-YOLOv8n.
Figure 11: Examples of some data sets.
Figure 12: Training curve and test curve.
Figure 13: Test set confusion matrix.
Figure 14: Part of the apple detection map.
17 pages, 9263 KiB  
Article
HHS-RT-DETR: A Method for the Detection of Citrus Greening Disease
by Yi Huangfu, Zhonghao Huang, Xiaogang Yang, Yunjian Zhang, Wenfeng Li, Jie Shi and Linlin Yang
Agronomy 2024, 14(12), 2900; https://doi.org/10.3390/agronomy14122900 - 4 Dec 2024
Viewed by 473
Abstract
Background: Given the severe economic burden that citrus greening disease imposes on fruit farmers and related industries, rapid and accurate disease detection is particularly crucial. This not only effectively curbs the spread of the disease, but also significantly reduces reliance on manual detection within extensive citrus planting areas. Objective: In response to this challenge, and to address the issues posed by resource-constrained platforms and complex backgrounds, this paper designs and proposes a novel method for the recognition and localization of citrus greening disease, named the HHS-RT-DETR model. The goal of this model is to achieve precise detection and localization of the disease while maintaining efficiency. Methods: Based on the RT-DETR-r18 model, the following improvements are made: the HS-FPN (high-level screening-feature pyramid network) is used to improve the feature fusion and feature selection part of the RT-DETR model, where the low-level features are screened and the retained feature information is merged with the high-level features, so as to enhance the feature selection ability and multi-level feature fusion ability of the model. In the feature fusion and feature selection sections, the HWD (hybrid wavelet-directional filter banks) downsampling operator is introduced to prevent the loss of effective information in the channel and reduce the computational complexity of the model. Using the ShapeIoU loss function enables the model to focus on the shape and scale of the bounding box itself, making its bounding box predictions more accurate. Conclusions and Results: This study has successfully developed an improved HHS-RT-DETR model which exhibits efficiency and accuracy on resource-constrained platforms and offers significant advantages for the automatic detection of citrus greening disease. Experimental results show that the improved model, when compared to the RT-DETR-r18 baseline model, has achieved significant improvements in several key performance metrics: the precision increased by 7.9%, the frame rate increased by 4 frames per second (f/s), the recall rose by 9.9%, and the average accuracy also increased by 7.5%, while the number of model parameters was reduced by 0.137 × 10⁷. Moreover, the improved model has demonstrated outstanding robustness in detecting occluded leaves within complex backgrounds. This provides strong technical support for the early detection and timely control of citrus greening disease. Additionally, the improved model has showcased advanced detection capabilities on the PASCAL VOC dataset. Discussions: Future research plans include expanding the dataset to encompass a broader range of citrus species and different stages of citrus greening disease. In addition, the plans involve incorporating leaf images under various lighting conditions and different weather scenarios to enhance the model’s generalization capabilities, ensuring the accurate localization and identification of citrus greening disease in diverse complex environments. Lastly, the integration of the improved model into an unmanned aerial vehicle (UAV) system is envisioned to enable the real-time, regional-level precise localization of citrus greening disease.
(This article belongs to the Section Precision and Digital Agriculture)
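The HS-FPN described above screens low-level features before merging them with high-level features. The sketch below is only a rough, generic illustration of that screen-then-fuse pattern; the channel gate and nearest-neighbor upsampling are assumptions for illustration, not the paper's exact wiring.

```python
# A minimal sketch of a screen-then-fuse step: semantically rich high-level
# features produce a channel mask that filters the low-level features, and the
# upsampled high-level features are then added back in.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScreenAndFuse(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(channels, channels, 1),
                                  nn.Sigmoid())

    def forward(self, low, high):
        mask = self.gate(high)                                   # channel mask from high-level features
        high_up = F.interpolate(high, size=low.shape[-2:], mode="nearest")
        return low * mask + high_up                              # screened low-level + upsampled high-level

low = torch.randn(1, 128, 80, 80)
high = torch.randn(1, 128, 40, 40)
print(ScreenAndFuse(128)(low, high).shape)  # torch.Size([1, 128, 80, 80])
```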
Figures:
Figure 1: Images of selected greening datasets (images of rock sugar oranges, Wokan oranges, and grapefruits in natural and simple backgrounds).
Figure 2: Partial results of dataset expansion.
Figure 3: HHS-RT-DETR model structure (the structural diagram of the improved model).
Figure 4: Structure of the feature selection module (feature selection network structure in the HS-FPN network).
Figure 5: Structure of the SPFF feature fusion module (feature fusion network structure in the HS-FPN network).
Figure 6: ChannelAttention_HSFPN structure.
Figure 7: Feature selection module (feature selection network module in the ChannelAttention-HSFPN network).
Figure 8: Feature fusion module (feature fusion network module in the ChannelAttention-HSFPN network).
Figure 9: HWD module structure.
Figure 10: Comparison of detection effect between the RT-DETR-r18 model and the HHS-RT-DETR model. (a) presents the detection results of the original RT-DETR-r18 model, while (b) displays the outcomes of the enhanced HHS-RT-DETR model. A comparison of the two reveals that the area indicated by the yellow arrow was not detected by the original model but is successfully identified by the improved model.
Figure 11: HWD module compared with other modules in reducing the loss of context information (comparison of the HWD downsampling method with max pooling, average pooling, and strided convolution methods).
Figure 12: Comparison curves of different loss functions.
Figure 13: Comparison of the heatmap effect between the original model and the improved model ((a) is the original image, (b) is the object-detection heatmap from the HHS-RT-DETR model, and (c) is the object-detection heatmap from the RT-DETR-r18 benchmark model).
Figure 14: Comparison curves of different models.
19 pages, 4163 KiB  
Article
Edge-Guided Feature Pyramid Networks: An Edge-Guided Model for Enhanced Small Target Detection
by Zimeng Liang and Hua Shen
Sensors 2024, 24(23), 7767; https://doi.org/10.3390/s24237767 - 4 Dec 2024
Viewed by 392
Abstract
Infrared small target detection technology has been widely applied in the defense sector, including applications such as precision targeting, alert systems, and naval monitoring. However, due to the small size of the targets and the extended imaging distance, accurately detecting drone targets in complex infrared environments remains a considerable challenge. This paper introduces a novel model that integrates edge characteristics with multi-scale feature fusion, named Edge-Guided Feature Pyramid Networks (EG-FPNs). This model aims to capture deep image features while simultaneously emphasizing edge characteristics. The goal is to resolve the problem of missing target information that occurs when Feature Pyramid Networks (FPNs) perform continuous down-sampling to obtain deeper semantic features. Firstly, an improved residual block structure is proposed, integrating multi-scale convolutional feature extraction and inter-channel attention mechanisms, with significant features being emphasized through channel recalibration. Then, a layered feature fusion module is introduced to strengthen the shallow details in images while fusing multi-scale image features, thereby strengthening the shallow edge features. Finally, an edge self-fusion module is proposed to enhance the model’s depiction of image features by extracting edge information and integrating it with multi-scale features. We conducted comparative experiments on multiple datasets using the proposed algorithm and existing advanced methods. The results show improvements in the IoU, nIoU, and F1 metrics, while also showcasing the lightweight nature of EG-FPNs, confirming that they are more suitable for drone detection in resource-constrained infrared scenarios.
(This article belongs to the Section Remote Sensors)
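The edge self-fusion module above injects explicit edge information into the multi-scale features. As a hedged illustration of where such an edge cue can come from, here is a minimal PyTorch sketch that extracts a gradient-magnitude edge map with fixed Sobel filters; the paper's own edge extraction may differ.

```python
# A minimal sketch of Sobel edge extraction: fixed horizontal and vertical
# gradient filters produce a gradient-magnitude map that can be fused with
# learned multi-scale features as an edge prior.
import torch
import torch.nn.functional as F

def sobel_edges(x: torch.Tensor) -> torch.Tensor:
    """x: (B, 1, H, W) single-channel image -> gradient magnitude (B, 1, H, W)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(-1, -2)
    gx = F.conv2d(x, kx.to(x.dtype), padding=1)
    gy = F.conv2d(x, ky.to(x.dtype), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)

img = torch.rand(1, 1, 128, 128)
print(sobel_edges(img).shape)  # torch.Size([1, 1, 128, 128])
```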
Figures:
Figure 1: Structure of the Edge-Guided Feature Pyramid Network (EG-FPN).
Figure 2: Structure of the Residual Block.
Figure 3: Structure of the Adaptive Pyramid Feature Block (APF Block).
Figure 4: Structure of the Hierarchical Multi-Feature Fusion Block (HMF Block).
Figure 5: Structure of the Edge-Aware Residual Block (EAR Block).
Figure 6: ROC curves of different methods on different datasets: (a) NAUU-SIRST; (b) NAUU-SIRST-V2; (c) MSISTD; (d) MDvsFA_cGAN.
Figure 7: Visual comparison of various methods on the NAUU-SIRST and MSISTD datasets.
Figure 8: Three-dimensional visual comparison of various methods on the NAUU-SIRST and MSISTD datasets.
17 pages, 3796 KiB  
Article
FastQAFPN-YOLOv8s-Based Method for Rapid and Lightweight Detection of Walnut Unseparated Material
by Junqiu Li, Jiayi Wang, Dexiao Kong, Qinghui Zhang and Zhenping Qiang
J. Imaging 2024, 10(12), 309; https://doi.org/10.3390/jimaging10120309 - 2 Dec 2024
Viewed by 560
Abstract
Walnuts possess significant nutritional and economic value. Fast and accurate sorting of shells and kernels will enhance the efficiency of automated production. Therefore, we propose a FastQAFPN-YOLOv8s object detection network to achieve rapid and precise detection of unsorted materials. The method uses lightweight Pconv (Partial Convolution) operators to build the FasterNextBlock structure, which serves as the backbone feature extractor for the Fasternet feature extraction network. The ECIoU loss function, combining EIoU (Efficient-IoU) and CIoU (Complete-IoU), speeds up the adjustment of the prediction frame and the network regression. In the Neck section of the network, the QAFPN feature fusion extraction network is proposed to replace the PAN-FPN (Path Aggregation Network—Feature Pyramid Network) in YOLOv8s with a Rep-PAN structure based on the QARepNext reparameterization framework for feature fusion extraction to strike a balance between network performance and inference speed. To validate the method, we built a three-axis mobile sorting device and created a dataset of 3000 images of walnuts after shell removal for experiments. The results show that the improved network has 6,071,008 parameters, a training time of 2.49 h, a model size of 12.3 MB, an mAP (Mean Average Precision) of 94.5%, and a frame rate of 52.1 FPS. Compared with the original model, the number of parameters decreased by 45.5%, the training time was reduced by 32.7%, the model size shrunk by 45.3%, and the frame rate improved by 40.8%. However, some accuracy is sacrificed due to the lightweight design, resulting in a 1.2% decrease in mAP. The network reduces the model size by 59.7 MB and 23.9 MB compared to YOLOv7 and YOLOv6, respectively, and improves the frame rate by 15.67 FPS and 22.55 FPS, respectively. The average confidence and mAP show minimal changes compared to YOLOv7 and improve by 4.2% and 2.4% compared to YOLOv6, respectively. The FastQAFPN-YOLOv8s detection method effectively reduces model size while maintaining recognition accuracy.
(This article belongs to the Section Computer Vision and Pattern Recognition)
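The FasterNextBlock above is built from lightweight Pconv (partial convolution) operators. A minimal PyTorch sketch of FasterNet-style partial convolution follows; the 1/4 convolved-channel ratio is an illustrative default, not necessarily the paper's setting.

```python
# A minimal sketch of partial convolution (PConv): only a fraction of the channels
# pass through the 3x3 convolution, the rest are forwarded untouched, which cuts
# FLOPs and memory traffic relative to a full convolution.
import torch
import torch.nn as nn

class PConv(nn.Module):
    def __init__(self, channels: int, ratio: float = 0.25):
        super().__init__()
        self.cp = max(1, int(channels * ratio))        # channels actually convolved
        self.conv = nn.Conv2d(self.cp, self.cp, 3, padding=1, bias=False)

    def forward(self, x):
        x1, x2 = x[:, :self.cp], x[:, self.cp:]
        return torch.cat([self.conv(x1), x2], dim=1)   # identity on the remainder

print(PConv(64)(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```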
Figures:
Figure 1: Sample images of the dataset.
Figure 2: Experimental platform preparation: (a) experimental platform; (b) field of view of the camera.
Figure 3: YOLOv8s improvement process diagram.
Figure 4: YOLOv8s specific improvement layer.
Figure 5: FasterNext structure construction.
Figure 6: QAFPN structure construction.
Figure 7: Comparison of different loss functions.
Figure 8: Graph of recognition results of different models.
21 pages, 4627 KiB  
Article
CFF-Net: Cross-Hierarchy Feature Fusion Network Based on Composite Dual-Channel Encoder for Surface Defect Segmentation
by Ke’er Qian, Xiaokang Ding, Xiaoliang Jiang, Yingyu Ji and Ling Dong
Electronics 2024, 13(23), 4714; https://doi.org/10.3390/electronics13234714 - 28 Nov 2024
Viewed by 399
Abstract
In industries spanning manufacturing to software development, defect segmentation is essential for maintaining high standards of product quality and reliability. However, traditional segmentation methods often struggle to accurately identify defects due to challenges like noise interference, occlusion, and feature overlap. To solve these problems, we propose a cross-hierarchy feature fusion network based on a composite dual-channel encoder for surface defect segmentation, called CFF-Net. Specifically, in the encoder of CFF-Net, we design a composite dual-channel module (CDCM), which combines standard convolution with dilated convolution and adopts a dual-path parallel structure to enhance the model’s capability in feature extraction. Then, a dilated residual pyramid module (DRPM) is integrated at the junction of the encoder and decoder, which utilizes dilated convolutions with different dilation rates to effectively capture multi-scale context information. In the final output phase, we introduce a cross-hierarchy feature fusion strategy (CFFS) that combines outputs from different layers or stages, thereby improving the robustness and generalization of the network. Finally, we conducted comparative experiments to evaluate CFF-Net against several mainstream segmentation networks across three distinct datasets: a publicly available Crack500 dataset, a self-built Bearing dataset, and another publicly available SD-saliency-900 dataset. The results demonstrated that CFF-Net consistently outperformed competing methods in segmentation tasks. Specifically, in the Crack500 dataset, CFF-Net achieved notable performance metrics, including an Mcc of 73.36%, Dice coefficient of 74.34%, and Jaccard index of 59.53%. For the Bearing dataset, it recorded an Mcc of 76.97%, Dice coefficient of 77.04%, and Jaccard index of 63.28%. Similarly, in the SD-saliency-900 dataset, CFF-Net achieved an Mcc of 84.08%, Dice coefficient of 85.82%, and Jaccard index of 75.67%. These results underscore CFF-Net’s effectiveness and reliability in handling diverse segmentation challenges across different datasets.
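The composite dual-channel module (CDCM) above runs standard and dilated convolutions in parallel. A minimal PyTorch sketch of that dual-path idea follows, with an assumed concat-plus-1x1 fusion and a dilation rate of 2; the paper's exact block layout may differ.

```python
# A minimal sketch of a dual-path block: one standard 3x3 branch, one dilated 3x3
# branch (larger receptive field at the same cost), concatenated and mixed by a
# 1x1 convolution.
import torch
import torch.nn as nn

class DualChannelBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, dilation: int = 2):
        super().__init__()
        self.std = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                                 nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.dil = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=dilation,
                                           dilation=dilation),
                                 nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.fuse = nn.Conv2d(2 * out_ch, out_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([self.std(x), self.dil(x)], dim=1))

print(DualChannelBlock(64, 64)(torch.randn(1, 64, 96, 96)).shape)  # torch.Size([1, 64, 96, 96])
```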
Figures:
Figure 1: Visual illustrations of defects on the Crack500 dataset and corresponding annotations.
Figure 2: Visual illustrations of defects on the Bearing dataset and corresponding annotations.
Figure 3: Visual illustrations of defects on the SD-Saliency-900 dataset and corresponding annotations.
Figure 4: Overall architecture of CFF-Net.
Figure 5: Structure of the original selective kernel module.
Figure 6: Structure of the composite dual-channel module.
Figure 7: Comparison experiment on dilated convolution. The first row: consecutive convolutions with different dilation rates (rate = 1, 2, 4). The second row: consecutive convolutions with the same dilation rate (rate = 2).
Figure 8: Structure of the dilated residual pyramid module.
Figure 9: The loss and accuracy curves for the validation phases of all methods. The first column illustrates the results obtained on the Crack500 dataset, the second column displays the corresponding results for the Bearing dataset, and the last column shows the results for the SD-saliency-900 dataset.
Figure 10: Visual comparison of different methods on the Crack500 dataset.
Figure 11: Visual comparison of different methods on the Bearing dataset.
Figure 12: Visual comparison of different methods on the SD-saliency-900 dataset.
Figure 13: Ablation experiments on the Crack500 dataset.
Figure 14: Bar chart of the ablation experiment.
Figure 15: Examples of poor segmentation on the Crack500 dataset.
Figure 16: Examples of poor segmentation on the Bearing dataset.
Figure 17: Examples of poor segmentation on the SD-saliency-900 dataset.
13 pages, 3314 KiB  
Article
Research on Defect Detection for Overhead Transmission Lines Based on the ABG-YOLOv8n Model
by Yang Yu, Hongfang Lv, Wei Chen and Yi Wang
Energies 2024, 17(23), 5974; https://doi.org/10.3390/en17235974 - 27 Nov 2024
Viewed by 429
Abstract
In the field of smart grid monitoring, real-time defect detection for overhead transmission lines is crucial for ensuring the safety and stability of power systems. This paper proposes a defect detection model for overhead transmission lines based on an improved YOLOv8n model, named ABG-YOLOv8n. The model incorporates four key improvements: Lightweight convolutional neural networks and spatial–channel reconstructed convolutional modules are integrated into the backbone network and feature fusion network, respectively. A bidirectional feature pyramid network is employed to achieve multi-scale feature fusion, and the ASFF mechanism is used to enhance the sensitivity of YOLOv8n’s detection head. Finally, comprehensive comparative experiments were conducted with multiple models to validate the effectiveness of the proposed method based on the obtained prediction curves and various performance metrics. The validation results indicate that the proposed ABG-YOLOv8n model achieves a 4.5% improvement in mean average precision compared to the original YOLOv8n model, with corresponding increases of 3.6% in accuracy and 2.0% in recall. Additionally, the ABG-YOLOv8n model demonstrates superior detection performance compared to other enhanced YOLO models.
(This article belongs to the Section F: Electrical Engineering)
Show Figures

Figure 1

Figure 1
<p>YOLOv8n model network structure diagram.</p>
Full article ">Figure 2
<p>ABG-YOLOv8n model network structure diagram.</p>
Full article ">Figure 3
<p>BiFPN network structure diagram.</p>
Full article ">Figure 4
<p>SCConv structure diagram.</p>
Full article ">Figure 5
<p>ASFF working principle diagram.</p>
Full article ">Figure 6
<p>Data labeling and slicing format.</p>
Full article ">Figure 7
<p>(<b>a</b>) Distribution of label quantities in the dataset. (<b>b</b>) Distribution of label sizes.</p>
Full article ">Figure 8
<p>Confusion matrix.</p>
Full article ">Figure 9
<p>F1-Confidence curves of the YOLOv8n model (<b>a</b>) and the ABG-YOLOv8n model (<b>b</b>).</p>
Full article ">Figure 10
<p>Comparison of the detection results of the YOLOv8n model (<b>a</b>) and the ABG-YOLOv8n model (<b>b</b>).</p>
Full article ">
24 pages, 108807 KiB  
Article
SMEA-YOLOv8n: A Sheep Facial Expression Recognition Method Based on an Improved YOLOv8n Model
by Wenbo Yu, Xiang Yang, Yongqi Liu, Chuanzhong Xuan, Ruoya Xie and Chuanjiu Wang
Animals 2024, 14(23), 3415; https://doi.org/10.3390/ani14233415 - 26 Nov 2024
Viewed by 376
Abstract
Sheep facial expressions are valuable indicators of their pain levels, playing a critical role in monitoring their health and welfare. In response to challenges such as missed detections, false positives, and low recognition accuracy in sheep facial expression recognition, this paper introduces an [...] Read more.
Sheep facial expressions are valuable indicators of their pain levels, playing a critical role in monitoring their health and welfare. In response to challenges such as missed detections, false positives, and low recognition accuracy in sheep facial expression recognition, this paper introduces an enhanced algorithm based on YOLOv8n, referred to as SimAM-MobileViTAttention-EfficiCIoU-AA2_SPPF-YOLOv8n (SMEA-YOLOv8n). Firstly, the proposed method integrates the parameter-free Similarity-Aware Attention Mechanism (SimAM) and MobileViTAttention modules into the CSP Bottleneck with 2 Convolutions (C2f) module of the neck network, aiming to enhance the model’s feature representation and fusion capabilities in complex environments while mitigating the interference of irrelevant background features. Additionally, the EfficiCIoU loss function replaces the original Complete IoU (CIoU) loss function, thereby improving bounding box localization accuracy and accelerating model convergence. Furthermore, the Spatial Pyramid Pooling-Fast (SPPF) module in the backbone network is refined with the addition of two global average pooling layers, strengthening the extraction of sheep facial expression features and bolstering the model’s core feature fusion capacity. Experimental results reveal that the proposed method achieves an mAP@0.5 of 92.5%, a Recall of 91%, a Precision of 86%, and an F1-score of 88.0%, reflecting improvements of 4.5%, 9.1%, 2.8%, and 6.0%, respectively, compared to the baseline model. Notably, the mAP@0.5 for normal and abnormal sheep facial expressions increased by 3.7% and 5.3%, respectively, demonstrating the method’s effectiveness in enhancing recognition accuracy under complex environmental conditions. Full article
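For context on the parameter-free attention cited above, here is a minimal PyTorch sketch of SimAM’s energy-based weighting as described in the original SimAM publication. How it is wired into the C2f blocks of SMEA-YOLOv8n is the authors’ specific design and is not reproduced here; the class name and default lambda are commonly used reference values.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention: weight each position by its inverse energy."""
    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)   # squared deviation from the channel mean
        v = d.sum(dim=(2, 3), keepdim=True) / n             # per-channel variance estimate
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5         # inverse energy; higher = more distinctive
        return x * torch.sigmoid(e_inv)

# Usage: refine a feature map without adding any learnable parameters.
feat = torch.randn(2, 128, 20, 20)
refined = SimAM()(feat)
print(refined.shape)  # torch.Size([2, 128, 20, 20])
```

Because the module adds no learnable parameters, it can be dropped into an existing neck without changing the model size, which fits the lightweight motivation stated in the abstract.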
(This article belongs to the Section Small Ruminants)
Show Figures

Figure 1

Figure 1
<p>Images of eyes, ears, and noses that comply with the SPFES.</p>
Full article ">Figure 2
<p>A selection of sheep facial expression images: (<b>a</b>) normal; (<b>b</b>) abnormal.</p>
Full article ">Figure 3
<p>Augmentation techniques for abnormal sheep facial expression image data.</p>
Full article ">Figure 4
<p>Annotation of sheep facial expression dataset.</p>
Full article ">Figure 5
<p>Architecture of the SMEA-YOLOv8n model for sheep facial expression recognition.</p>
Full article ">Figure 6
<p>Diagram of the YOLOv8n model architecture.</p>
Full article ">Figure 7
<p>Architecture of the SimAM Attention module.</p>
Full article ">Figure 8
<p>Architecture of the MobileViTAttention module.</p>
Full article ">Figure 9
<p>Architecture of the improved SPPF module: (<b>a</b>) SPPF structural diagram. (<b>b</b>) AA2_SPPF structural diagram.</p>
Full article ">Figure 10
<p>Comparative mAP@0.5 evaluation of enhanced sheep facial expression models.</p>
Full article ">Figure 11
<p>Comparison of P-R curves before and after model enhancement. (<b>a</b>) P-R curve in YOLOv8n. (<b>b</b>) P-R curve in SMEA-YOLOv8n.</p>
Full article ">Figure 12
<p>Comparison of F1-score curves before and after model enhancement. (<b>a</b>) F1-score curve in YOLOv8n. (<b>b</b>) F1-score curve in SMEA-YOLOv8n.</p>
Full article ">Figure 13
<p>Visual comparison of sheep facial expression recognition pre- and post-YOLOv8n model enhancements.</p>
Full article ">Figure 14
<p>Comparative analysis of mAP@0.5, Precision, and Recall across various YOLOv8n model enhancements.</p>
Full article ">
22 pages, 96008 KiB  
Article
HSD2Former: Hybrid-Scale Dual-Domain Transformer with Crisscrossed Interaction for Hyperspectral Image Classification
by Binxin Luo, Meihui Li, Yuxing Wei, Haorui Zuo, Jianlin Zhang and Dongxu Liu
Remote Sens. 2024, 16(23), 4411; https://doi.org/10.3390/rs16234411 - 25 Nov 2024
Viewed by 426
Abstract
Hyperspectral image (HSI) classification has been trending toward ever higher accuracy and stronger overall performance. In recent years, Transformers have made remarkable progress in the HSI classification task. However, Transformer-based methods still encounter two main challenges. First, they concentrate on extracting [...] Read more.
Hyperspectral image (HSI) classification has been trending toward ever higher accuracy and stronger overall performance. In recent years, Transformers have made remarkable progress in the HSI classification task. However, Transformer-based methods still encounter two main challenges. First, they concentrate on extracting spectral information and make only limited use of spatial information. Second, they underuse multiscale features and do not sufficiently combine the Transformer’s global feature extraction with multiscale feature extraction. To tackle these challenges, this article proposes a new solution named the hybrid-scale dual-domain Transformer with crisscrossed interaction (HSD2Former) for HSI classification. HSD2Former consists of three functional modules: dual-dimension multiscale convolutional embedding (D2MSCE), mixed domainFormer (MDFormer), and pyramid scale fusion block (PSFB). D2MSCE replaces conventional patch embedding, generating spectral and spatial tokens at different scales and thereby enriching the diversity of spectral-spatial features. MDFormer facilitates self-enhancement within, and information interaction between, the spectral and spatial domains, alleviating the heterogeneity between them. PSFB introduces a straightforward fusion scheme that distills high-level semantic information for classification. Extensive experiments conducted on four datasets demonstrate the robustness and effectiveness of HSD2Former: the OA, AA, and Kappa scores almost all exceed 98%, reaching state-of-the-art performance. Full article
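To make the multiscale-token idea concrete, the sketch below shows a generic multiscale convolutional embedding for a hyperspectral patch, in the spirit of (but not identical to) D2MSCE: parallel convolutions at several kernel sizes each produce their own token set. The kernel sizes, the band count after dimensionality reduction, and the embedding width are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleSpatialEmbedding(nn.Module):
    """Parallel convolutional branches that emit one token set per spatial scale (illustrative)."""
    def __init__(self, in_bands: int = 30, embed_dim: int = 64, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_bands, embed_dim, k, padding=k // 2),
                nn.BatchNorm2d(embed_dim),
                nn.GELU(),
            )
            for k in kernel_sizes
        ])

    def forward(self, x):
        # x: (B, bands, H, W) patch, e.g. after PCA band reduction
        tokens = []
        for branch in self.branches:
            f = branch(x)                                 # (B, D, H, W)
            tokens.append(f.flatten(2).transpose(1, 2))   # (B, H*W, D) token sequence
        return tokens                                     # one token set per scale

patch = torch.randn(4, 30, 9, 9)                          # batch of 9x9 spatial windows
scales = MultiScaleSpatialEmbedding()(patch)
print([t.shape for t in scales])                          # three sets of (4, 81, 64) tokens
```

A Transformer stage can then attend within each scale and fuse the outputs, which is broadly the role the abstract assigns to the pyramid scale fusion block.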
Show Figures

Figure 1

Figure 1
<p>Overall architecture of hybrid-scale dual-domain Transformer with crisscrossed interaction (HSD<sup>2</sup>Former).</p>
Full article ">Figure 2
<p>Structure of dual-dimension multiscale convolutional embedding (D<sup>2</sup>MSCE). (<b>a</b>) spectral multiscale convolutional embedding (SeMSCE), (<b>b</b>) spatial multiscale convolutional embedding (SaMSCE).</p>
Full article ">Figure 3
<p>Structure of mixed domainFormer (MDFormer).</p>
Full article ">Figure 4
<p>Structure of pyramid scale fusion block (PSFB).</p>
Full article ">Figure 5
<p>Visual comparison results on the S-A dataset.</p>
Full article ">Figure 6
<p>Visual comparison results on the UP dataset.</p>
Full article ">Figure 7
<p>Visual comparison results on the HU dataset.</p>
Full article ">Figure 8
<p>Visual comparison results on the IP dataset.</p>
Full article ">Figure 9
<p>t-SNE on the S-A dataset.</p>
Full article ">Figure 10
<p>t-SNE on the UP dataset.</p>
Full article ">Figure 11
<p>t-SNE on the HU dataset.</p>
Full article ">Figure 12
<p>t-SNE on the IP dataset.</p>
Full article ">Figure 13
<p>Sensitivity of principal component number on four datasets: (<b>a</b>) on the S-A dataset, (<b>b</b>) on the UP dataset, (<b>c</b>) on the HU dataset, (<b>d</b>) on the IP dataset.</p>
Full article ">Figure 14
<p>Sensitivity of space size on four datasets: (<b>a</b>) on the S-A dataset, (<b>b</b>) on the UP dataset, (<b>c</b>) on the HU dataset, (<b>d</b>) on the IP dataset.</p>
Full article ">Figure 15
<p>Sensitivity of convolutional kernel number on four datasets: (<b>a</b>) on the S-A dataset, (<b>b</b>) on the UP dataset, (<b>c</b>) on the HU dataset, (<b>d</b>) on the IP dataset.</p>
Full article ">Figure 16
<p>Sensitivity of multihead number on four datasets: (<b>a</b>) on the S-A dataset, (<b>b</b>) on the UP dataset, (<b>c</b>) on the HU dataset, (<b>d</b>) on the IP dataset.</p>
Full article ">Figure 17
<p>Sensitivity of pooling stride on four datasets: (<b>a</b>) on the S-A dataset, (<b>b</b>) on the UP dataset, (<b>c</b>) on the HU dataset, (<b>d</b>) on the IP dataset.</p>
Full article ">Figure 18
<p>Sensitivity of training sample ratio on four datasets: (<b>a</b>) on the S-A dataset, (<b>b</b>) on the UP dataset, (<b>c</b>) on the HU dataset, (<b>d</b>) on the IP dataset.</p>
Full article ">