Search Results (3,690)

Search Parameters:
Keywords = YOLOv3

19 pages, 14493 KiB  
Article
WED-YOLO: A Detection Model for Safflower Under Complex Unstructured Environment
by Zhenguo Zhang, Yunze Wang, Peng Xu, Ruimeng Shi, Zhenyu Xing and Junye Li
Agriculture 2025, 15(2), 205; https://doi.org/10.3390/agriculture15020205 - 18 Jan 2025
Abstract
Accurate safflower recognition is a critical research challenge in the field of automated safflower harvesting. The growing environment of safflowers, including factors such as variable weather conditions in unstructured environments, shooting distances, and diverse morphological characteristics, presents significant difficulties for detection. To address these challenges and enable precise safflower target recognition in complex environments, this study proposes an improved safflower detection model, WED-YOLO, based on YOLOv8n. Firstly, the original bounding box loss function is replaced with the dynamic non-monotonic focusing mechanism Wise Intersection over Union (WIoU), which enhances the model’s bounding box fitting ability and accelerates network convergence. Then, the upsampling module in the network’s neck is substituted with the more efficient and versatile dynamic upsampling module, DySample, to improve the precision of feature map upsampling. Meanwhile, the EMA attention mechanism is integrated into the C2f module of the backbone network to strengthen the model’s feature extraction capabilities. Finally, a small-target detection layer is incorporated into the detection head, enabling the model to focus on small safflower targets. The model is trained and validated using a custom-built safflower dataset. The experimental results demonstrate that the improved model achieves Precision (P), Recall (R), mean Average Precision (mAP), and F1 score values of 93.15%, 86.71%, 95.03%, and 89.64%, respectively. These results represent improvements of 2.9%, 6.69%, 4.5%, and 6.22% over the baseline model. Compared with Faster R-CNN, YOLOv5, YOLOv7, and YOLOv10, WED-YOLO achieved the highest mAP value, outperforming these models by 13.06%, 4.85%, 4.86%, and 4.82%, respectively. The enhanced model exhibits superior precision and lower missed-detection rates in safflower recognition tasks, providing a robust algorithmic foundation for the intelligent harvesting of safflowers. Full article
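The WED-YOLO abstract above swaps the original box loss for Wise-IoU. As a rough orientation only, the sketch below implements a WIoU-v1-style loss in PyTorch (an IoU loss re-weighted by a detached distance term computed on the smallest enclosing box); the paper's dynamic non-monotonic focusing mechanism adds a further re-weighting that is not reproduced here, and the box format and tensor shapes are assumptions.

```python
import torch

def wiou_v1_loss(pred, target, eps=1e-7):
    """Sketch of a Wise-IoU v1 style bounding-box loss.

    pred, target: (N, 4) boxes in (x1, y1, x2, y2) format.
    Returns (1 - IoU) scaled by a distance-based attention term
    computed on the smallest enclosing box.
    """
    # Intersection area
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared centre distance and squared diagonal of the smallest enclosing box
    cxp = (pred[:, 0] + pred[:, 2]) / 2
    cyp = (pred[:, 1] + pred[:, 3]) / 2
    cxt = (target[:, 0] + target[:, 2]) / 2
    cyt = (target[:, 1] + target[:, 3]) / 2
    dist2 = (cxp - cxt) ** 2 + (cyp - cyt) ** 2

    wg = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    hg = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    diag2 = (wg ** 2 + hg ** 2).detach()  # detached so this factor only re-weights

    r_wiou = torch.exp(dist2 / (diag2 + eps))
    return r_wiou * (1.0 - iou)
```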
25 pages, 3461 KiB  
Article
Side-Scan Sonar Small Objects Detection Based on Improved YOLOv11
by Chang Zou, Siquan Yu, Yankai Yu, Haitao Gu and Xinlin Xu
J. Mar. Sci. Eng. 2025, 13(1), 162; https://doi.org/10.3390/jmse13010162 - 18 Jan 2025
Abstract
Underwater object detection using side-scan sonar (SSS) remains a significant challenge in marine exploration, especially for small objects. Conventional methods for small object detection face various obstacles, such as difficulties in feature extraction and the considerable impact of noise on detection accuracy. To address these issues, this study proposes an improved YOLOv11 network named YOLOv11-SDC. Specifically, a new Sparse Feature (SF) module is proposed, replacing the Spatial Pyramid Pooling Fast (SPPF) module from the original YOLOv11 architecture to enhance object feature selection. Furthermore, the proposed YOLOv11-SDC integrates a Dilated Reparam Block (DRB) with a C3k2 module to broaden the model’s receptive field. A Content-Guided Attention Fusion (CGAF) module is also incorporated prior to the detection module to assign appropriate weights to various feature maps, thereby emphasizing the relevant object information. Experimental results clearly demonstrate the superiority of YOLOv11-SDC over several iterations of YOLO versions in detection performance. The proposed method was validated through extensive real-world experiments, yielding a precision of 0.934, recall of 0.698, mAP@0.5 of 0.825, and mAP@0.5:0.95 of 0.598. In conclusion, the improved YOLOv11-SDC offers a promising solution for detecting small objects in SSS images, showing substantial potential for marine applications. Full article
(This article belongs to the Special Issue Artificial Intelligence Applications in Underwater Sonar Images)
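The Dilated Reparam Block cited in the YOLOv11-SDC abstract widens the receptive field through dilated convolutions. The minimal PyTorch sketch below only illustrates that underlying idea (it is not the DRB itself): a 3×3 kernel with dilation 3 covers a 7×7 neighbourhood without adding parameters.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 64, 64)

# A standard 3x3 convolution covers a 3x3 neighbourhood per output pixel.
conv_std = nn.Conv2d(16, 16, kernel_size=3, padding=1)

# With dilation=3 the same 3x3 kernel spans a 7x7 neighbourhood,
# broadening the receptive field with no extra parameters.
conv_dil = nn.Conv2d(16, 16, kernel_size=3, padding=3, dilation=3)

print(conv_std(x).shape, conv_dil(x).shape)  # both keep the 64x64 resolution
```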
21 pages, 4678 KiB  
Article
TBF-YOLOv8n: A Lightweight Tea Bud Detection Model Based on YOLOv8n Improvements
by Wenhui Fang and Weizhen Chen
Sensors 2025, 25(2), 547; https://doi.org/10.3390/s25020547 - 18 Jan 2025
Abstract
Tea bud localization detection not only ensures tea quality, improves picking efficiency, and advances intelligent harvesting, but also fosters tea industry upgrades and enhances economic benefits. To solve the problem of the high computational complexity of deep learning detection models, we developed the Tea Bud DSCF-YOLOv8n (TBF-YOLOv8n) lightweight detection model. Improving the Cross Stage Partial Bottleneck Module with Two Convolutions (C2f) via efficient Distributed Shift Convolution (DSConv) yields the C2f module with DSConv (DSCf), which reduces the model’s size. Additionally, the coordinate attention (CA) mechanism is incorporated to mitigate interference from irrelevant factors, thereby improving mean accuracy. Furthermore, the SIOU_Loss (SCYLLA-IOU_Loss) function and the Dynamic Sample (DySample) up-sampling operator are implemented to accelerate convergence and enhance both average precision and detection accuracy. The experimental results show that compared to the YOLOv8n model, the TBF-YOLOv8n model has a 3.7% increase in accuracy, a 1.1% increase in average accuracy, a 44.4% reduction in giga floating point operations (GFLOPs), and a 13.4% reduction in the total number of parameters included in the model. In comparison experiments with a variety of lightweight detection models, the TBF-YOLOv8n still performs well in terms of detection accuracy while remaining more lightweight. In conclusion, the TBF-YOLOv8n model achieves a commendable balance between efficiency and precision, offering valuable insights for advancing intelligent tea bud harvesting technologies. Full article
(This article belongs to the Section Intelligent Sensors)
Figures:
Figure 1: YOLOv8 framework diagram.
Figure 2: Structure of the TBF-YOLOv8n model. The red dashed box shows the improvement.
Figure 3: Structure of DSConv. KDS is the kernel distribution shift; CDS is the channel distribution shift; ◎ is the Hadamard operation; the CDS is denoted φ; Original represents the original tensor; k is the width and height of the kernel; BLK stands for the block size hyperparameter.
Figure 4: CA structure diagram. “X Avg Pool” and “Y Avg Pool” signify one-dimensional horizontal and one-dimensional vertical global average pooling.
Figure 5: DySample design; χ, O, g, and δ correspond to the input features, upsampled features, offsets, original network, sampling set, and Sigmoid function, respectively. (A) The sampling set S is generated by the sampling point generator (Equation (3)); (B) the sampling point generator, where the sampling set S consists of the offsets and the original network g (Equation (5)). The sampling set S can be generated in two ways, involving two types of offsets: one with a static range factor and the other with a dynamic range factor (Equation (7)).
Figure 6: SIOU loss function.
Figure 7: Data preprocessing. (a) Original image; (b) random rotation; (c) brightness decrease; (d) brightness increase; (e) horizontal flip; (f) vertical flip.
Figure 8: The training process of TBF-YOLOv8n and YOLOv8n.
Figure 9: Results of the model comparison tests. (a) YOLOv7-tiny; (b) YOLOv8n; (c) YOLOv9-tiny; (d) YOLOv10n; (e) YOLO11; (f) YOLOv5_tea; (g) YOLOv8_tea; (h) TBF-YOLOv8n. Red boxes represent correct detections, blue boxes represent incorrect detections, and yellow boxes represent missed detections.
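The coordinate attention (CA) mechanism used in TBF-YOLOv8n (Figure 4 above) factorizes global pooling into separate height and width poolings and re-weights the feature map with the two resulting attention vectors. A minimal PyTorch sketch of the published CA block follows; the channel count and reduction ratio are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Sketch of the coordinate attention block (Hou et al., 2021)."""

    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool along width  -> (N, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool along height -> (N, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        x_h = self.pool_h(x)                      # (N, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # (N, C, W, 1)
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (N, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (N, C, 1, W)
        return x * a_h * a_w

attn = CoordinateAttention(64)
print(attn(torch.randn(2, 64, 40, 40)).shape)  # torch.Size([2, 64, 40, 40])
```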
16 pages, 3773 KiB  
Article
MDA-DETR: Enhancing Offending Animal Detection with Multi-Channel Attention and Multi-Scale Feature Aggregation
by Haiyan Zhang, Huiqi Li, Guodong Sun and Feng Yang
Animals 2025, 15(2), 259; https://doi.org/10.3390/ani15020259 - 17 Jan 2025
Abstract
Conflicts between humans and animals in agricultural and settlement areas have recently increased, resulting in significant resource loss and risks to human and animal lives. This growing issue presents a global challenge. This paper addresses the detection and identification of offending animals, particularly in obscured or blurry nighttime images. This article introduces Multi-Channel Coordinated Attention and Multi-Dimension Feature Aggregation (MDA-DETR). It integrates multi-scale features for enhanced detection accuracy, employing a Multi-Channel Coordinated Attention (MCCA) mechanism to incorporate location, semantic, and long-range dependency information and a Multi-Dimension Feature Aggregation Module (DFAM) for cross-scale feature aggregation. Additionally, the VariFocal Loss function is utilized to assign pixel weights, enhancing detail focus and maintaining accuracy. The experiments use a dataset from the Northeast China Tiger and Leopard National Park, which includes images of six common offending animal species. In comprehensive experiments on this dataset, the mAP50 of MDA-DETR was 1.3%, 0.6%, 0.3%, 3%, 1.1%, and 0.5% higher than RT-DETR-r18, YOLOv8n, YOLOv9-C, DETR, Deformable-DETR, and DCA-YOLOv8, respectively, indicating that MDA-DETR is superior to other advanced methods. Full article
(This article belongs to the Special Issue Animal–Computer Interaction: Advances and Opportunities)
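The VariFocal Loss used by MDA-DETR weights positives by their target quality score and down-weights easy negatives with a focal term. A minimal PyTorch sketch in the style of the original VarifocalNet formulation follows; the alpha/gamma values and the IoU-based targets are illustrative assumptions, not the authors' settings.

```python
import torch
import torch.nn.functional as F

def varifocal_loss(pred_logits, target_score, alpha=0.75, gamma=2.0):
    """Sketch of Varifocal Loss (Zhang et al., 2021).

    pred_logits: raw classification logits.
    target_score: 0 for negatives, an IoU-aware quality score in (0, 1] for positives.
    """
    pred_prob = pred_logits.sigmoid()
    # Positives are weighted by their target score; negatives get a
    # focal-style down-weighting alpha * p^gamma.
    weight = target_score * (target_score > 0).float() \
        + alpha * pred_prob.pow(gamma) * (target_score <= 0).float()
    bce = F.binary_cross_entropy_with_logits(pred_logits, target_score, reduction="none")
    return (bce * weight).sum()

logits = torch.randn(8, 80)
targets = torch.zeros(8, 80)
targets[0, 3] = 0.7  # one positive anchor with IoU-quality 0.7
print(varifocal_loss(logits, targets))
```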
21 pages, 9000 KiB  
Article
An Investigation of Infrared Small Target Detection by Using the SPT–YOLO Technique
by Yongjun Qi, Shaohua Yang, Zhengzheng Jia, Yuanmeng Song, Jie Zhu, Xin Liu and Hongxing Zheng
Technologies 2025, 13(1), 40; https://doi.org/10.3390/technologies13010040 - 17 Jan 2025
Abstract
To detect and recognize small and complex-background-submerged targets in infrared images, we combine a dynamic receptive field fusion strategy and a multi-scale feature fusion mechanism to significantly improve small-target detection performance. First, the space-to-depth convolution module is introduced as a downsampling layer in the backbone; it achieves the same sampling effect while retaining more detailed information, enhancing the model’s ability to detect small targets. Then, the pyramid level 2 feature map, with the smallest receptive field and the highest resolution, is added to the neck, which reduces the loss of positional information during feature sampling. Furthermore, x-small detection heads are added, strengthening the understanding of the overall characteristics and structure of the target and improving the representation and localization of small targets. Finally, the cross-entropy loss function in the original network model is replaced by an adaptive threshold focal loss function, forcing the model to allocate more attention to target features. These methods build on a public tool, the eighth version of You Only Look Once (YOLO), and the improved model is named SPT–YOLO (SPDConv + P2 + Adaptive Threshold + YOLOv8s) in this paper. Experiments on datasets such as infrared small object detection (IR-SOD) and infrared small target detection 1K (IRSTD-1K) were carried out to verify the proposed algorithm; mean average precision values of 94.0% at a threshold of 0.5 and 69% over the range 0.5 to 0.95 were obtained, respectively. The results show that the proposed method achieves the best infrared small-target detection performance compared to existing methods. Full article
(This article belongs to the Section Information and Communication Technologies)
Figures:
Figure 1: A lightweight deep-learning model consisting of the backbone, neck, and head networks, with the YOLOv8 network architecture depicted in detail.
Figure 2: The architecture of the SPT–YOLO detection model, highlighting the improvements made to the original YOLOv8s. The red boxes in the backbone network represent the integration of SPDConv (labeled as SPD). In the neck network, the red-highlighted section corresponds to the addition of the P2 feature layer. Similarly, in the detection head, the red box denotes the inclusion of an x-small detection head.
Figure 3: Conventional maximum pooling and strided convolution schematic; (a) several equal-sized rectangular regions called pooling windows; (b) SPDConv module.
Figure 4: Feature fusion process of the network after adding P2. The process from F3 to F2 is detailed in the purple box on the left, which includes the upsampling and fusion modules. F3 is first upsampled to match the size of P2, after which it is fused with the P2 feature map.
Figure 5: Imbalance between target and background. The left panel shows the original image with the target of interest highlighted within the red dashed border. The right panel shows the background of the image after target removal, with the green dashed border marking where the target was previously located.
Figure 6: Visualization of model evaluation metrics during training (precision, recall, and mAP@0.5).
Figure 7: Comparison of PR of different methods on the IR-SOD dataset.
Figure 8: Comparison of multiple-target detection in a scene, where the blue bounding boxes represent cars and the indigo boxes represent ships. In the ground truth annotations, green boxes denote cars and blue boxes denote ships, while the yellow bounding boxes highlight the missed detections.
Figure 9: Comparative analysis of various models in complex scenes.
Figure 10: Comparative analysis of target detection in scenes with few targets, where the purple bounding boxes indicate missed detections.
Figure 11: Comparison of feature maps output by the third convolutional module of the network.
Figure 12: Comparison of detection performance with 3 and 4 detection heads.
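The space-to-depth downsampling described in the SPT–YOLO abstract (Figures 2 and 3 above) trades resolution for channels instead of discarding pixels with a strided convolution. A minimal PyTorch sketch using PixelUnshuffle as the space-to-depth step follows; the channel counts are illustrative.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Sketch of a space-to-depth downsampling block.

    A 2x2 space-to-depth rearrangement halves H and W and quadruples the
    channels, so no pixel information is thrown away; a stride-1 convolution
    then mixes the stacked channels.
    """

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.space_to_depth = nn.PixelUnshuffle(downscale_factor=2)
        self.conv = nn.Conv2d(4 * in_ch, out_ch, kernel_size=3, stride=1, padding=1)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.conv(self.space_to_depth(x)))

x = torch.randn(1, 32, 160, 160)
print(SPDConv(32, 64)(x).shape)  # torch.Size([1, 64, 80, 80])
```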
24 pages, 17068 KiB  
Article
Automated Fillet Weld Inspection Based on Deep Learning from 2D Images
by Ignacio Diaz-Cano, Arturo Morgado-Estevez, José María Rodríguez Corral, Pablo Medina-Coello, Blas Salvador-Dominguez and Miguel Alvarez-Alcon
Appl. Sci. 2025, 15(2), 899; https://doi.org/10.3390/app15020899 - 17 Jan 2025
Abstract
This work presents an automated welding inspection system based on a neural network trained through a series of 2D images of welding seams obtained in the same study. The object detection method follows a geometric deep learning model based on convolutional neural networks. Following an extensive review of available solutions, algorithms, and networks based on this convolutional strategy, it was determined that the You Only Look Once algorithm in its version 8 (YOLOv8) would be the most suitable for object detection due to its performance and features. Consequently, several models have been trained to enable the system to predict specific characteristics of weld beads. Firstly, the welding strategy used to manufacture the weld bead was predicted, distinguishing between two of them (Flux-Cored Arc Welding (FCAW)/Gas Metal Arc Welding (GMAW)), two of the predominant welding processes used in many industries, including shipbuilding, automotive, and aeronautics. In a subsequent experiment, the distinction between a well-manufactured weld bead and a defective one was predicted. In a final experiment, it was possible to predict whether a weld seam was well-manufactured or not, distinguishing between three possible welding defects. The study demonstrated high performance in three experiments, achieving top results in both binary classification (in the first two experiments) and multiclass classification (in the third experiment). The average prediction success rate exceeded 97% in all three experiments. Full article
(This article belongs to the Special Issue Graph and Geometric Deep Learning)
Figures:
Figure 1: Scheme of the FCAW welding process.
Figure 2: Scheme of the GMAW welding process.
Figure 3: On the left, a Fanuc 200i-D 7L robotic arm equipped with a welding torch; in the background, the gas bottles (argon/carbon dioxide) in their conveniently mixed proportions. On the right, the Lincoln R450 CE multi-process welding machine, placed under the table of the robotic arm and connected to it.
Figure 4: Steel plate where numbered seams were welded and then treated according to the experiment to be carried out.
Figure 5: Equipment used for capturing images in various positions and luminosities: a high-precision camera affixed to the end effector of the robotic arm and a luminaire positioned in different locations, depending on the intended image, with the objective of obtaining images with the most diverse range of luminosities feasible, thereby enabling more comprehensive training.
Figure 6: Diagram illustrating the methodological framework employed in this study. The process begins with the fabrication of the welds needed for the experimental studies, followed by the acquisition of images of these welds. Subsequently, a series of image transformations is performed to train three models, one for each experiment, capable of detecting the manufactured weld seams.
Figure 7: Industrial camera, Ensenso model N35 (IDS-IMAGING, Germany), used to take images of weld seams.
Figure 8: Mild steel plate with several welding beads, labeled with the online tool Roboflow, so that the system can distinguish a correctly manufactured weld from a weld manufactured with some defect.
Figure 9: Set of images from the FCAW-GMAW dataset, showing the predicted label and the percentage of that prediction. The image content is irregular, with the welding bead occupying practically all of the image.
Figure 10: Training curves and performance metrics for the YOLOv8s object detection model detecting FCAW and GMAW weld seams. The x-axis shows the training epochs and the y-axis the loss values, both unitless. The curves show a significant decrease in loss alongside improving precision, recall, and mAP50 scores, indicating that training has been effective.
Figure 11: Plate of fillet weld beads where different beads can be seen, some labeled as GOOD and others as BAD, according to what the algorithm has learned once trained.
Figure 12: Training curves and performance metrics for the YOLOv8s object detection model detecting weld seams manufactured without defects (labeled as GOOD) and weld seams with manufacturing defects (labeled as BAD). The x-axis shows the training epochs and the y-axis the loss values, both unitless. The curves show a significant decrease in loss while precision, recall, and mAP50 scores improve, indicating that training has been effective.
Figure 13: Plate of fillet weld beads analyzed with the model obtained in experiment 3. It shows three of the four types of weld beads (objects) for which the model of this experiment was trained. The image also contains other elements that the model is able to discard.
Figure 14: Training curves and performance metrics for the YOLOv8s object detection model detecting correctly made weld seams without any defects (labeled as GOOD) and weld seams with manufacturing defects, classified into several of the most common defect types (UNDER for undercuts, LOP for lack of penetration, and OP for other problems). The x-axis shows the training epochs and the y-axis the loss values, both unitless. The loss decrease is significant, although somewhat milder than in the two previous experiments, and the precision, recall, and mAP50 scores, while lower than before, indicate that training has been effective.
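The weld-inspection experiments above fine-tune YOLOv8 detectors on labeled weld-bead images. A minimal sketch of such a workflow with the Ultralytics API is given below; the dataset YAML, image file, and hyperparameters are placeholders, not the authors' actual configuration.

```python
# Hypothetical fine-tuning sketch with the Ultralytics YOLOv8 API.
# "welds.yaml" and "weld_bead.jpg" are placeholders; the real class names
# (e.g., FCAW/GMAW or GOOD/BAD) would be defined in the dataset YAML.
from ultralytics import YOLO

model = YOLO("yolov8s.pt")       # pretrained small detection model
model.train(data="welds.yaml",   # dataset config listing images and labels
            epochs=100,
            imgsz=640)

results = model.predict("weld_bead.jpg", conf=0.25)
for box in results[0].boxes:
    print(box.cls, box.conf, box.xyxy)  # predicted class, score, and box
```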
23 pages, 7015 KiB  
Article
Research on Lightweight Scenic Area Detection Algorithm Based on Small Targets
by Yu Zhang and Liya Wang
Electronics 2025, 14(2), 356; https://doi.org/10.3390/electronics14020356 - 17 Jan 2025
Abstract
Given the difficulty of effectively detecting small target objects using traditional detection technology in current scenic waste disposal settings, this paper proposes an improved detection algorithm based on YOLOv8n deployed on mobile carts. Firstly, the C2f-MS (Middle Split) module is proposed to replace the convolution module of the backbone network; retaining the original feature details at different scales enhances the ability to detect small targets while reducing the number of model parameters. Secondly, the neck network is redesigned, introducing the CEPN (Convergence–Expansion Pyramid Network) to enhance semantic feature information during transmission. This improves the capture of detailed information about small targets, enabling effective detection. Finally, a QS-Dot-IoU hybrid loss function is proposed. This loss function enhances sensitivity to target shape, focuses simultaneously on classification and localization, improves the detection performance for small targets, and reduces the occurrence of false detections. Experimental results demonstrate that the proposed algorithm outperforms other detection algorithms in small-target detection performance while maintaining a more compact size. Full article
(This article belongs to the Section Artificial Intelligence)
Figures:
Figure 1: Improved YOLOv8n network structure.
Figure 2: C2f structural diagram.
Figure 3: MS module structural diagram.
Figure 4: C2f-MS module structure diagram.
Figure 5: CEPN convergence diffusion pyramid network module.
Figure 6: MDCR module structure diagram.
Figure 7: Schematic diagram of the image enhancement method. (a) Original image; (b) horizontal rotation; (c) random brightness; (d) random noise.
Figure 8: mAP variation curve of comparative experiments on the TACO dataset.
Figure 9: mAP variation curve of comparative experiments on the VIA Img dataset.
Figure 10: mAP variation curve of comparative experiments on the self-made dataset.
Figure 11: Comparison of heat-map experiment results. (a) Original image; (b) without the attention heatmap; (c) with the attention heatmap.
Figure 12: Comparative result graphs of quantitative experiments on three different datasets. (a) Original image; (b) YOLOv8n detection effect; (c) MS-YOLO detection effect.
Figure 13: Experiment with mobile cart grasping in real scenic scenes. (a) Initial state; (b) grabbing the object; (c) grasping process; (d) grasping complete.
23 pages, 10414 KiB  
Article
Instance Segmentation and 3D Pose Estimation of Tea Bud Leaves for Autonomous Harvesting Robots
by Haoxin Li, Tianci Chen, Yingmei Chen, Chongyang Han, Jinhong Lv, Zhiheng Zhou and Weibin Wu
Agriculture 2025, 15(2), 198; https://doi.org/10.3390/agriculture15020198 - 17 Jan 2025
Abstract
In unstructured tea garden environments, accurate recognition and pose estimation of tea bud leaves are critical for autonomous harvesting robots. Due to variations in imaging distance, tea bud leaves exhibit diverse scale and pose characteristics in camera views, which significantly complicates the recognition and pose estimation process. This study proposes a method using an RGB-D camera for precise recognition and pose estimation of tea bud leaves. The approach first constructs an instance segmentation model for tea bud leaves, followed by a dynamic weight estimation strategy to achieve adaptive pose estimation. Quantitative experiments demonstrate that the instance segmentation model achieves an mAP@50 of 92.0% for box detection and 91.9% for mask detection, improving by 3.2% and 3.4%, respectively, compared to the YOLOv8s-seg instance segmentation model. The pose estimation results indicate a maximum angular error of 7.76°, a mean angular error of 3.41°, a median angular error of 3.69°, and a median absolute deviation of 1.42°. The corresponding distance errors are 8.60 mm, 2.83 mm, 2.57 mm, and 0.81 mm, further confirming the accuracy and robustness of the proposed method. These results indicate that the proposed method can be applied in unstructured tea garden environments for non-destructive and precise harvesting with autonomous tea-bud-leaf harvesting robots. Full article
(This article belongs to the Section Agricultural Technology)
Figures:
Figure 1: Robot adjusting picking pose based on the estimated pose of tea bud leaves. In the figure, A is the apex of the leaf, B is the apex of the tea bud, and C is the lowest point of the stem. D is the centroid of the growth plane formed by A, B, and C. The line connecting D and C defines the pose of the tea bud leaves.
Figure 2: Data collection schematic.
Figure 3: Tea-bud-leaves instance segmentation model.
Figure 4: GELAN and E-GELAN modules. (a) GELAN module; (b) E-GELAN module.
Figure 5: DCNv2.
Figure 6: Dynamic Head.
Figure 7: ORB-SLAM3 algorithm overview.
Figure 8: Comparison of local point clouds obtained from single-position and multiple-position sampling of tea bud leaves.
Figure 9: Tea-bud-leaves instance segmentation results. (A) Original image; (B) proposed instance segmentation model; (C) YOLOv8s-seg model.
Figure 10: Angle errors in tea-bud-leaves pose estimation.
Figure 11: Distance errors in tea-bud-leaves pose estimation.
Figure 12: Pose-estimation results for tea-bud-leaves point clouds of varying quality. (a) Partial point-cloud loss of the tea bud; (b) partial point-cloud loss of both tea bud and leaf; (c) partial point-cloud loss of the stem; (d) point-cloud loss at the intersection of the stem and tea bud.
Figure 13: Tea-bud-leaves pose-estimation results.
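The pose-estimation errors reported above (maximum, mean, and median angular error plus median absolute deviation) can be computed from estimated and ground-truth pose direction vectors as in the NumPy sketch below; the vectors here are synthetic placeholders.

```python
import numpy as np

def angular_error_deg(v_est, v_gt):
    """Angle in degrees between estimated and ground-truth 3D direction vectors."""
    v_est = v_est / np.linalg.norm(v_est, axis=1, keepdims=True)
    v_gt = v_gt / np.linalg.norm(v_gt, axis=1, keepdims=True)
    cos = np.clip(np.sum(v_est * v_gt, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

# Synthetic example: 100 estimated vs ground-truth pose vectors.
rng = np.random.default_rng(0)
gt = rng.normal(size=(100, 3))
est = gt + 0.05 * rng.normal(size=(100, 3))

err = angular_error_deg(est, gt)
print("max:", err.max(), "mean:", err.mean(), "median:", np.median(err))
print("median absolute deviation:", np.median(np.abs(err - np.median(err))))
```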
22 pages, 15460 KiB  
Article
The Application of an Intelligent Agaricus bisporus-Harvesting Device Based on FES-YOLOv5s
by Hao Ma, Yulong Ding, Hongwei Cui, Jiangtao Ji, Xin Jin, Tianhang Ding and Jiaoling Wang
Sensors 2025, 25(2), 519; https://doi.org/10.3390/s25020519 - 17 Jan 2025
Abstract
To address several challenges, including low efficiency, significant damage, and high costs, associated with the manual harvesting of Agaricus bisporus, in this study, a machine vision-based intelligent harvesting device was designed according to its agronomic characteristics and morphological features. This device mainly comprised a frame, camera, truss-type robotic arm, flexible manipulator, and control system. The FES-YOLOv5s deep learning target detection model was used to accurately identify and locate Agaricus bisporus. The harvesting control system, using a Jetson Orin Nano as the main controller, adopted an S-curve acceleration and deceleration motor control algorithm. This algorithm controlled the robotic arm and the flexible manipulator to harvest Agaricus bisporus based on the identification and positioning results. To confirm the impact of vibration on the harvesting process, a stepper motor drive test was conducted using both trapezoidal and S-curve acceleration and deceleration motor control algorithms. The test results showed that the S-curve acceleration and deceleration motor control algorithm exhibited excellent performance in vibration reduction and repeat positioning accuracy. The recognition efficiency and harvesting effectiveness of the intelligent harvesting device were tested using recognition accuracy, harvesting success rate, and damage rate as evaluation metrics. The results showed that the Agaricus bisporus recognition algorithm achieved an average recognition accuracy of 96.72%, with an average missed detection rate of 2.13% and a false detection rate of 1.72%. The harvesting success rate of the intelligent harvesting device was 94.95%, with an average damage rate of 2.67% and an average harvesting yield rate of 87.38%. These results meet the requirements for the intelligent harvesting of Agaricus bisporus and provide insight into the development of intelligent harvesting robots in the industrial production of Agaricus bisporus. Full article
(This article belongs to the Section Sensing and Imaging)
Figures:
Figure 1: Structural diagram of the Agaricus bisporus-harvesting platform: 1. harvesting robot; 2. U-shaped guide rail; 3. mushroom rack; 4. mushroom bed; 5. climbing device.
Figure 2: Structural diagram of the intelligent Agaricus bisporus-harvesting device: 1. frame; 2. frame stepper motor; 3. track wheel; 4. flexible manipulator; 5. camera; 6. gantry-style robotic arm; 7. control system.
Figure 3: Workflow diagram of the Agaricus bisporus-harvesting device.
Figure 4: Truss-type mechanical arm structure diagram: 1. Y-axis stepper motor; 2. synchronous pulley; 3. Y-axis sliding module; 4. timing belt; 5. beam; 6. X-axis sliding module; 7. synchronous pulley; 8. X-axis stepper motor; 9. lead screw sliding platform.
Figure 5: Manipulator structure and pneumatic driving diagram: (a) flexible manipulator structure diagram; (b) pneumatic drive circuit. 1. Servo motor; 2. connector; 3. spring flange; 4. telescopic rod; 5. air inlet; 6. pneumatic flexible fingers.
Figure 6: Hardware system of the Agaricus bisporus-harvesting device.
Figure 7: Upper computer page diagram. The left side shows the unit’s function buttons, operating hours, and total amount picked; the right side shows the real-time detection screen for Agaricus bisporus.
Figure 8: The improved network structure of FES-YOLOv5s. The left side of the figure shows the network structure of FES-YOLOv5s, and the right side shows the network structure of some modules. Adapted from Ma et al. [21].
Figure 9: Coordinate relationship diagram: O is the world coordinate system (red); Om is the flexible manipulator coordinate system (yellow); Opo is the camera coordinate system (green); xoy is the image coordinate system; and tow is the pixel coordinate system.
Figure 10: S-type acceleration and deceleration schematic: (a) S-type acceleration and deceleration curves; (b) acceleration process. T1 and T2 are the acceleration time periods; T3 is the constant-speed time period; T4 and T5 are the deceleration time periods; a is the point of maximum acceleration.
Figure 11: Agaricus bisporus growing room.
Figure 12: Full system test: (a) picking area of the harvesting device; (b) reciprocating line-by-line detection method.
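The S-curve acceleration and deceleration profile described above (Figure 10) smooths the start and end of each axis move to reduce vibration. The sketch below builds a simple smoothstep-based S-curve velocity profile; it illustrates the general idea only and is not the controller implemented on the Jetson Orin Nano.

```python
import numpy as np

def s_curve_velocity(t, t_acc, t_const, v_max):
    """Smoothstep-based S-curve velocity profile.

    Acceleration and deceleration phases last t_acc seconds each and use a
    3u^2 - 2u^3 ramp, so acceleration starts and ends at zero (low jerk);
    the middle phase holds v_max for t_const seconds.
    """
    t_total = 2 * t_acc + t_const
    u = np.clip(t / t_acc, 0.0, 1.0)              # ramp-up progress
    d = np.clip((t_total - t) / t_acc, 0.0, 1.0)  # ramp-down progress
    ramp_up = 3 * u**2 - 2 * u**3
    ramp_down = 3 * d**2 - 2 * d**3
    return v_max * np.minimum(ramp_up, ramp_down)

# Illustrative move: 0.4 s accel, 1.2 s cruise, 0.4 s decel, 200 steps/s peak.
t = np.linspace(0, 2 * 0.4 + 1.2, 500)
v = s_curve_velocity(t, t_acc=0.4, t_const=1.2, v_max=200.0)
print(v[:5], v.max())
```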
17 pages, 2041 KiB  
Article
LEAF-Net: A Unified Framework for Leaf Extraction and Analysis in Multi-Crop Phenotyping Using YOLOv11
by Ameer Tamoor Khan and Signe Marie Jensen
Agriculture 2025, 15(2), 196; https://doi.org/10.3390/agriculture15020196 - 17 Jan 2025
Abstract
Accurate leaf segmentation and counting are critical for advancing crop phenotyping and improving breeding programs in agriculture. This study evaluates YOLOv11-based models for automated leaf detection and segmentation across spring barley, spring wheat, winter wheat, winter rye, and winter triticale. The key focus is assessing whether a unified model trained on a combined multi-crop dataset can outperform crop-specific models. Results show that the unified model achieves superior performance in bounding box tasks, with mAP@50 exceeding 0.85 for spring crops and 0.7 for winter crops. Segmentation tasks, however, reveal mixed results, with individual models occasionally excelling in recall for winter crops. These findings highlight the benefits of dataset diversity in improving generalization, while emphasizing the need for larger annotated datasets to address variability in real-world conditions. While the combined dataset improves generalization, the unique characteristics of individual crops may still benefit from specialized training. Full article
Figures:
Figure 1: Sample images from the spring and winter crops dataset, displaying annotated leaves highlighted in green, which were otherwise challenging to distinguish in raw images. Spring barley and spring wheat images were captured using an agricultural robot (overall height approximately 2.15 m), while winter wheat, winter rye, and winter triticale images were collected using a drone flying at an altitude of 8 m.
Figure 2: Spring and winter crop dataset distribution illustrating the count of images in the training and testing datasets for each crop type.
Figure 3: Spring and winter crop leaf distribution showing the count of manually annotated leaves across the training and testing datasets for each crop type.
Figure 4: PCA of spring and winter crops visualized using features from a pretrained ResNet50 model. The clustering reflects height-based differences: robot images capture larger leaf sizes due to proximity to the ground and controlled conditions, while drone images show smaller leaf sizes influenced by natural, uncontrolled environmental factors.
Figure 5: A schematic diagram of YOLOv11 illustrating its three core components: Backbone, Neck, and Head. The Backbone handles multi-scale feature extraction using advanced blocks such as C3k2 and Spatial Pyramid Pooling Fast (SPPF), designed for efficient feature representation. The Neck aggregates and refines these features with additional mechanisms like Cross-Stage Partial with Spatial Attention (C2PSA). Finally, the Head predicts object bounding boxes, masks, and classifications using enhanced multi-scale processing.
Figure 6: (a–d) Training and validation losses for spring barley, spring wheat, winter wheat, winter rye, and winter triticale across combined and individual datasets, showing box and segmentation losses.
Figure 7: (a–d) Precision and recall for spring barley, spring wheat, winter wheat, winter rye, and winter triticale across combined and individual datasets, showing box and segmentation precision and recall.
Figure 8: (a–d) Mean Average Precision (mAP) for spring barley, spring wheat, winter wheat, winter rye, and winter triticale across combined and individual datasets, showing box and segmentation mAP@50 and mAP@50:95.
Figure 9: (a–e) Prediction results for spring barley, spring wheat (robot images), and winter wheat, rye, and triticale (8 m drone images), with detected crops highlighted in color-coded annotations.
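The PCA visualization in Figure 4 above projects pretrained ResNet50 features of the crop images onto two components. A minimal sketch of that feature-extraction and projection step follows; the image paths are placeholders and the preprocessing choices are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from sklearn.decomposition import PCA
from PIL import Image

# Pretrained ResNet50 with the classification head removed -> 2048-d features.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image_paths = ["spring_barley_001.jpg", "winter_rye_001.jpg"]  # placeholders
with torch.no_grad():
    feats = torch.stack([
        backbone(preprocess(Image.open(p).convert("RGB")).unsqueeze(0)).squeeze(0)
        for p in image_paths
    ])

coords = PCA(n_components=2).fit_transform(feats.numpy())
print(coords.shape)  # (n_images, 2) -> points to scatter-plot per crop type
```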
18 pages, 5456 KiB  
Article
Smart Agricultural Pest Detection Using I-YOLOv10-SC: An Improved Object Detection Framework
by Wenxia Yuan, Lingfang Lan, Jiayi Xu, Tingting Sun, Xinghua Wang, Qiaomei Wang, Jingnan Hu and Baijuan Wang
Agronomy 2025, 15(1), 221; https://doi.org/10.3390/agronomy15010221 - 17 Jan 2025
Abstract
Aiming at the problems of insufficient detection accuracy and high false detection rates of traditional pest detection models in the face of small targets and incomplete targets, this study proposes an improved target detection network, I-YOLOv10-SC. The network leverages Space-to-Depth Convolution to enhance its capability in detecting small insect targets. The Convolutional Block Attention Module is employed to improve feature representation and attention focus. Additionally, Shape Weights and Scale Adjustment Factors are introduced to optimize the loss function. The experimental results show that compared with the original YOLOv10, the model generated by the improved algorithm improves accuracy by 5.88 percentage points, the recall rate by 6.67 percentage points, the balance score by 6.27 percentage points, and the mAP value by 4.26 percentage points, while reducing the bounding box loss by 18.75%, the classification loss by 27.27%, and the feature point loss by 8%. Model oscillation has also been significantly reduced. The enhanced I-YOLOv10-SC network effectively addresses the challenges of detecting small and incomplete insect targets in tea plantations, offering high precision and recall rates, thus providing a solid technical foundation for intelligent pest monitoring and precise prevention in smart tea gardens. Full article
Figures:
Figure 1: Label distribution.
Figure 2: Data augmentation.
Figure 3: Improved YOLOv10 network structure.
Figure 4: Space-to-Depth Convolution (the inline symbol denotes the output after convolution with a stride of 1).
Figure 5: Convolutional Block Attention Module.
Figure 6: Loss function variation curve.
Figure 7: Performance metrics curve. (Note: light blue is T. aurantii, green is X. fornicatus, red is A. apicalis, yellow is E. pirisuga, and dark blue is all categories.)
Figure 8: GradCAM heatmaps from the ablation study.
Figure 9: External validation comparison.
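The Convolutional Block Attention Module used in I-YOLOv10-SC (Figure 5 above) applies channel attention followed by spatial attention. A compact PyTorch sketch of the standard CBAM design is given below; the channel count, reduction ratio, and kernel size are illustrative.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Sketch of CBAM: channel attention then spatial attention."""

    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Channel attention: shared MLP over avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: 7x7 conv over channel-wise avg and max maps.
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=(2, 3), keepdim=True)
        mx = torch.amax(x, dim=(2, 3), keepdim=True)
        x = x * torch.sigmoid(self.mlp(avg) + self.mlp(mx))

        avg_map = torch.mean(x, dim=1, keepdim=True)
        max_map = torch.amax(x, dim=1, keepdim=True)
        x = x * torch.sigmoid(self.spatial(torch.cat([avg_map, max_map], dim=1)))
        return x

print(CBAM(64)(torch.randn(2, 64, 40, 40)).shape)  # torch.Size([2, 64, 40, 40])
```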
17 pages, 7356 KiB  
Article
Increasing Neural-Based Pedestrian Detectors’ Robustness to Adversarial Patch Attacks Using Anomaly Localization
by Olga Ilina, Maxim Tereshonok and Vadim Ziyadinov
J. Imaging 2025, 11(1), 26; https://doi.org/10.3390/jimaging11010026 - 17 Jan 2025
Abstract
Object detection in images is a fundamental component of many safety-critical systems, such as autonomous driving, video surveillance systems, and robotics. Adversarial patch attacks, being easily implemented in the real world, provide effective counteraction to object detection by state-of-the-art neural-based detectors, which poses a serious danger in various fields of activity. Existing defense methods against patch attacks are insufficiently effective, which underlines the need to develop new reliable solutions. In this manuscript, we propose a method which helps to increase the robustness of neural network systems to adversarial input images. The proposed method consists of a Deep Convolutional Neural Network to reconstruct a benign image from the adversarial one; a Calculating Maximum Error block to highlight the mismatches between input and reconstructed images; a Localizing Anomalous Fragments block to extract the anomalous regions using the Isolation Forest algorithm from histograms of images’ fragments; and a Clustering and Processing block to group and evaluate the extracted anomalous regions. The proposed method, based on anomaly localization, demonstrates high resistance to adversarial patch attacks while maintaining the high quality of object detection. The experimental results show that the proposed method is effective in defending against adversarial patch attacks. Using the YOLOv3 algorithm with the proposed defensive method for pedestrian detection on the INRIA-Person dataset under adversarial attacks, the mAP50 metric reaches 80.97%, compared to 46.79% without a defensive method. The results demonstrate that the proposed method is promising for improving the security of object detection systems. Full article
(This article belongs to the Section Image and Video Processing)
Figures:
Figure 1: Example of an adversarial patch attack, generated by minimizing the object detector’s objectness score. The image is taken from [13].
Figure 2: Simplified scheme of the proposed method.
Figure 3: Simplified diagram of the proposed Deep Convolutional Neural Network (DCNN) architecture. The encoder is based on the convolutional part of ResNet50, excluding the final fully connected classification layer, and consists of sequential blocks that reduce the spatial dimensions of the input image. The decoder consists of sequential deconvU blocks, which increase the spatial dimensions to reconstruct the original image. The DCNN is trained using the mean squared error (MSE) loss function to minimize the difference between the original and reconstructed images. The input image X and the reconstructed image Y are processed through the blocks of the scheme presented in Figure 2.
Figure 4: Example of a benign image (a) and its corresponding error map (c); example of an adversarial image (b) and its corresponding error map (d).
Figure 5: Simplified scheme of the Localizing Anomalous Fragments block.
Figure 6: Examples of anomalous fragment maps Δ_IF for clean images (second row) and for adversarial images (fourth row).
Figure 7: Simplified scheme of the proposed clustering and processing block.
Figure 8: Examples of anomaly maps for a benign image (a) and for an adversarial image (b).
Figure 9: Visualization of YOLOv3 object detection on several examples from the INRIA-Person dataset, subjected to an adversarial patch attack [13]. The green bounding boxes represent detected “person” objects, with the objectness score displayed in the top-left corner of each box. The left column shows detection results without defensive pre-processing, while the right column illustrates the impact of the proposed defense method.
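The defense described above localizes adversarial regions by running Isolation Forest over histograms of fragments of the reconstruction-error map. The sketch below illustrates that localization step on a synthetic error map; the fragment size, histogram bins, and contamination rate are assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def anomalous_fragments(error_map, frag=32, bins=16, contamination=0.05):
    """Return a boolean grid marking fragments whose error histograms look anomalous."""
    h, w = error_map.shape
    rows, cols = h // frag, w // frag
    hists = []
    for i in range(rows):
        for j in range(cols):
            patch = error_map[i * frag:(i + 1) * frag, j * frag:(j + 1) * frag]
            hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0), density=True)
            hists.append(hist)
    labels = IsolationForest(contamination=contamination, random_state=0).fit_predict(hists)
    return (np.array(labels) == -1).reshape(rows, cols)  # -1 marks outlier fragments

# Synthetic error map with one high-error (patch-like) region.
rng = np.random.default_rng(0)
err = rng.uniform(0.0, 0.1, size=(256, 256))
err[64:128, 64:128] = rng.uniform(0.6, 1.0, size=(64, 64))
print(anomalous_fragments(err))
```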
16 pages, 4947 KiB  
Article
FE-YOLO: An Efficient Deep Learning Model Based on Feature-Enhanced YOLOv7 for Microalgae Identification and Detection
by Gege Ding, Yuhang Shi, Zhenquan Liu, Yanjuan Wang, Zhixuan Yao, Dan Zhou, Xuexiu Zhu and Yiqin Li
Biomimetics 2025, 10(1), 62; https://doi.org/10.3390/biomimetics10010062 - 16 Jan 2025
Abstract
The identification and detection of microalgae are essential for the development and utilization of microalgae resources. Traditional methods for microalgae identification and detection have many limitations. Herein, a Feature-Enhanced YOLOv7 (FE-YOLO) model for microalgae cell identification and detection is proposed. Firstly, the feature extraction capability was enhanced by integrating the CAGS (Coordinate Attention Group Shuffle Convolution) attention module into the Neck section. Secondly, the SIoU (SCYLLA-IoU) algorithm was employed to replace the CIoU (Complete IoU) loss function in the original model, addressing the issues of unstable convergence. Finally, we captured and constructed a microalgae dataset containing 6300 images of seven species of microalgae, addressing the issue of a lack of microalgae cell datasets. Compared to the YOLOv7 model, the proposed method shows greatly improved average Precision, Recall, mAP@50, and mAP@95; our proposed algorithm achieved increases of 9.6%, 1.9%, 9.7%, and 6.9%, respectively. In addition, the average detection time of a single image was 0.0455 s, marking a 9.2% improvement. Full article
(This article belongs to the Section Bioinspired Sensorics, Information Processing and Control)
Figures:
Figure 1: Microalgae image acquisition device.
Figure 2: FE-YOLO model structure.
Figure 3: CAGS network structure.
Figure 4: The scheme for calculating the angle cost and distance cost contributions to the loss function. B represents the predicted bounding box, B^gt denotes the ground-truth bounding box, and B* and B*GT are the minimum enclosing rectangles of the predicted and ground-truth bounding boxes, respectively. C_w and C_h represent the width and height of the minimum enclosing rectangle, and α indicates the angular difference between the bounding boxes.
Figure 5: Training performance of different models. (a) Recall, (b) Precision, (c) mAP@0.5, (d) mAP@0.95.
Figure 6: Comparison of P-R curves during training. (a) YOLOv7, (b) FE-YOLO.
Figure 7: Comparison of ROC curves during training. (a) YOLOv7, (b) FE-YOLO. Dashed line: random-guessing baseline.
Figure 8: Visualization of attention maps for YOLOv7, ablation1, and FE-YOLO on the microalgae dataset.
Figure 9: mAP@0.95 curves for different methods on the microalgae dataset.
Figure 10: The detection results of different methods.
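The per-image detection time reported for FE-YOLO (0.0455 s on average) is the kind of figure obtained by timing warm forward passes. A generic PyTorch timing sketch follows; the placeholder network stands in for whatever detector is being measured.

```python
import time
import torch
import torch.nn as nn

# Placeholder network; substitute the detector under test (e.g., a YOLO model).
model = nn.Sequential(*[nn.Conv2d(3 if i == 0 else 32, 32, 3, padding=1) for i in range(8)]).eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
x = torch.randn(1, 3, 640, 640, device=device)

with torch.no_grad():
    for _ in range(5):               # warm-up runs (allocator, cuDNN autotune, etc.)
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()     # ensure queued GPU work has finished
    start = time.perf_counter()
    runs = 50
    for _ in range(runs):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()

elapsed = (time.perf_counter() - start) / runs
print(f"average detection time: {elapsed:.4f} s/image ({1.0 / elapsed:.1f} FPS)")
```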
14 pages, 5200 KiB  
Article
Evaluating YOLOv4 and YOLOv5 for Enhanced Object Detection in UAV-Based Surveillance
by Mugtaba Abdalrazig Mohamed Alhassan and Ersen Yılmaz
Processes 2025, 13(1), 254; https://doi.org/10.3390/pr13010254 - 16 Jan 2025
Abstract
Traditional surveillance systems often rely on fixed cameras with limited coverage and human monitoring, which can lead to potential errors and delays. Unmanned Aerial Vehicles (UAVs) equipped with object detection algorithms, such as You Only Look Once (YOLO), offer a robust solution for dynamic surveillance, enabling real-time monitoring over large and inaccessible areas. In this study, we present a comparative analysis of YOLOv4 and YOLOv5 for UAV-based surveillance applications, focusing on two critical metrics: detection speed (Frames Per Second, FPS) and accuracy (Average Precision, AP). Using aerial imagery captured by a UAV, along with 20,288 images from the Microsoft Common Objects in Context (MS COCO) dataset, we evaluate each model’s suitability for deployment in high-demand environments. The results indicate that YOLOv5 outperforms YOLOv4 with a 1.63-fold increase in FPS and a 1.09-fold improvement in AP, suggesting that YOLOv5 is a more efficient option for UAV-based detection. However, to align with recent advancements, this study also highlights potential areas for integrating newer YOLO models and transformer-based architectures in future research to further enhance detection performance and model robustness. This work aims to provide a solid foundation for UAV-based object detection, while acknowledging the need for continuous development to accommodate newer models and evolving detection challenges. Full article
(This article belongs to the Section Advanced Digital and Other Processes)
Figures:
Figure 1: The YOLOv4 architecture, consisting of CSPDarknet53 as the backbone, SPP and PAN as the neck, and the YOLOv3 head [5].
Figure 2: The YOLOv5 architecture, delineating its three primary components: backbone, neck, and head. This representation is derived from the TensorBoard visualization of the model and the official documentation available in the YOLOv5 repository [24].
Figure 3: The Intersection over Union.
Figure 4: Box plot illustrating the distribution of (a) Average Precision (AP) and (b) Average Recall (AR) for YOLOv4 across the person, bus, and car object categories.
Figure 5: Box plot illustrating the distribution of (a) Average Precision (AP) and (b) Average Recall (AR) for YOLOv5 across the person, bus, and car object categories.
Figure 6: YOLO object detection (residual blocks, bounding box regression, and Intersection over Union).
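Both the AP comparison above and the matching rule illustrated in Figures 3 and 6 rest on the Intersection over Union between predicted and ground-truth boxes. A minimal reference implementation follows; the boxes and the 0.5 decision threshold are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

pred, gt = (50, 50, 150, 150), (60, 60, 160, 160)
score = iou(pred, gt)
print(f"IoU = {score:.3f}, true positive at 0.5 threshold: {score >= 0.5}")
```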
23 pages, 3671 KiB  
Article
Improved YOLOv10 for Visually Impaired: Balancing Model Accuracy and Efficiency in the Case of Public Transportation
by Rio Arifando, Shinji Eto, Tibyani Tibyani and Chikamune Wada
Informatics 2025, 12(1), 7; https://doi.org/10.3390/informatics12010007 - 16 Jan 2025
Abstract
Advancements in automation and artificial intelligence have significantly impacted accessibility for individuals with visual impairments, particularly in the realm of bus public transportation. Effective bus detection and bus point-of-view (POV) classification are crucial for enhancing the independence of visually impaired individuals. This study introduces the Improved-YOLOv10, a novel model designed to tackle challenges in bus identification and POV classification by integrating Coordinate Attention (CA) and Adaptive Kernel Convolution (AKConv) into the YOLOv10 framework. The Improved YOLOv10 advances the YOLOv10 architecture through the incorporation of CA, which enhances long-range dependency modeling and spatial awareness, and AKConv, which dynamically adjusts convolutional kernels for superior feature extraction. These enhancements aim to improve both detection accuracy and efficiency, essential for real-time applications in assistive technologies. Evaluation results demonstrate that the Improved-YOLOv10 offers significant improvements in detection performance, including better accuracy, precision, and recall compared to YOLOv10. The model also exhibits reduced computational complexity and storage requirements, highlighting its efficiency. While the classification results show some trade-offs, with a slightly decreased overall F1 score, the Improved-YOLOv10 remains advantageous for classification tasks in terms of Giga Floating Point Operations (GFLOPs), parameters, and weight size (MB). The model’s architectural improvements contribute to its robustness and efficiency, making it a suitable choice for real-time applications and assistive technologies. Full article
Figures:
Figure 1: Diversity of urban environments in the bus detection dataset.
Figure 2: Visualization of the dataset: (a) the number of annotations for each class; (b) a visual representation of the location and size of the bounding boxes; (c) the statistical distribution of bounding box positions; (d) the statistical distribution of bounding box sizes; (e) detailed label distribution analysis.
Figure 3: Different patterns of road area under different illuminations between daytime and nighttime.
Figure 4: Examples of (A) good and (B) bad viewpoints.
Figure 5: The YOLOv10 model structure.
Figure 6: The structure of Coordinate Attention.
Figure 7: AKConv network structure.
Figure 8: Initial sampling shape.
Figure 9: The 5 × 5 different initial sample shapes.
Figure 10: The Improved YOLOv10 model structure.
Figure 11: Comparison of YOLOv10 and YOLOv10-PSCA-AKConv performance metrics.
Figure 12: Training and evaluation results of YOLOv10 (baseline detection model).
Figure 13: Training and evaluation results of YOLOv10-PSCA-AKConv (improved detection model).
Figure 14: Loss comparison between YOLOv10 and YOLOv10-PSCA-AKConv.
Figure 15: Training and evaluation results of (a) YOLOv10 (baseline detection model) and (b) YOLOv10-PSCA-AKConv (improved detection model).
Figure 16: Confusion matrix results for (a) YOLOv10 and (b) YOLOv10-PSCA-AKConv.
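The efficiency comparison above reports parameter counts and weight size (MB) alongside GFLOPs. The first two can be computed directly in PyTorch, as sketched below with a placeholder model; GFLOPs normally require a profiler (e.g., thop or fvcore) and are omitted here.

```python
import os
import torch
import torch.nn as nn

# Placeholder model; substitute the detector being compared.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1),
    nn.SiLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1),
    nn.SiLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 4),
)

n_params = sum(p.numel() for p in model.parameters())
torch.save(model.state_dict(), "model.pt")
size_mb = os.path.getsize("model.pt") / (1024 ** 2)

print(f"Parameters: {n_params / 1e6:.2f} M")
print(f"Weight: {size_mb:.2f} MB")
```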