1. Introduction
In marine engineering, metal corrosion is a severe issue, particularly in environments with high salinity, high humidity, and high oxygen levels, where structures such as ships, offshore platforms, port facilities, and subsea pipelines are often affected by electrochemical corrosion [1,2]. Due to the harsh conditions of the marine environment, traditional corrosion detection methods, such as visual inspection, electrochemical measurements, and ultrasonic testing, face challenges such as low detection accuracy, high maintenance costs, and difficulties in data analysis [3]. Therefore, the development of efficient, automated corrosion detection technologies has become an important solution to this problem. In recent years, artificial intelligence (AI)-based image recognition technologies have achieved significant success in intelligent corrosion detection and have been widely applied to the identification and analysis of surface corrosion on structures [4]. These technologies can quickly and accurately detect corrosion, providing strong support for the maintenance and management of marine engineering projects.
Among various AI technologies, deep learning-based object detection algorithms, particularly the YOLO (You Only Look Once) series, have demonstrated significant application potential due to their fast detection speed and high accuracy [5]. YOLOv5 in particular has garnered attention for its speed and accuracy, achieving positive results in the corrosion detection of marine engineering structures [6]. However, marine metal corrosion is characterized by complexity and irregularity, especially in early metal structures, where frequently occurring small-scale corrosion features pose challenges for detection [7]. Although YOLOv5 has made progress in detecting small objects, there is still room for improvement in its performance when dealing with complex backgrounds and marine corrosion features. This necessitates further optimization of the YOLOv5 model’s detection performance to better meet the complex demands of marine corrosion detection.
The traditional YOLOv5 model relies on an Intersection over Union (IoU)-based loss function for bounding box regression, but plain IoU loss often leads to slow convergence and low accuracy in small object detection tasks. As a result, improving the loss function has become an important direction for enhancing the performance of the YOLOv5 model [8]. In recent years, researchers have proposed various improved loss functions, such as Generalized Intersection over Union (GIoU), Distance Intersection over Union (DIoU), and Complete Intersection over Union (CIoU). These loss functions not only accelerate convergence but also significantly improve the accuracy of small object detection [9]. However, despite the good results achieved by these methods in general object detection tasks, the complex backgrounds and irregular features of marine metal corrosion still pose significant challenges to existing algorithms. Therefore, further optimizing the loss function to enhance the performance of YOLOv5 in detecting small and numerous corrosion features remains of great practical value and has substantial room for improvement.
To enhance the accuracy and robustness of the YOLOv5 model in detecting small target corrosion features, this paper focuses on enhancing the model’s loss function, specifically examining several typical IoU-based loss functions. Through a comparative analysis of how these IoU-based loss functions affect the performance of the YOLOv5 model in small target corrosion detection tasks, the goal is to propose a new YOLOv5 model that incorporates an improved IoU loss function, thereby optimizing its effectiveness in marine metal corrosion detection. This approach aims to provide a more efficient method for identifying corrosion damage, which will aid in the maintenance and management of marine engineering structures.
To clearly articulate the purpose and fundamental approach of this research, the subsequent sections of the article first present a detailed literature review of the YOLOv5 object detection algorithm, focusing on the evolution of the YOLOv5 algorithm, methods for optimizing the loss function, and its specific applications in marine metal corrosion detection. Additionally, the unique advantages and challenges of YOLOv5 in small target detection are analyzed, along with a discussion on the impact of different loss functions on the performance of the YOLOv5 model. These analyses will provide a solid theoretical foundation for the subsequent optimization of YOLOv5’s effectiveness in corrosion detection tasks.
2. Literature Review
The YOLO algorithm employs an end-to-end training approach, directly detecting objects in raw images, eliminating the need for complex preprocessing and postprocessing steps, simplifying system design, and improving detection efficiency [10]. In 2017, Redmon et al. proposed YOLOv2 (YOLO9000), which combined anchor boxes with a deeper feature extraction network, further enhancing detection accuracy [11]. In 2018, YOLOv3 was introduced, using multi-scale feature maps and an improved network structure, which strengthened its ability to detect objects in complex backgrounds and at different scales [12]. In 2020, Alexey Bochkovskiy et al. proposed YOLOv4, which used CSPDarknet53 as the backbone network, incorporated Mosaic data augmentation, the DropBlock regularization method, and the CIoU loss function, and introduced the Spatial Attention Module (SAM), greatly enhancing the model’s feature extraction efficiency and accuracy [13]. In June 2020, the Ultralytics team, led by Glenn Jocher, released the YOLOv5 model, which was the first version to be implemented using PyTorch (1.6.0), making the model easier to deploy and integrate. The model offers different scales (YOLOv5s, YOLOv5m, YOLOv5l, YOLOv5x) to suit varying computational capabilities and real-time demands. It improved accuracy while maintaining high speed, especially in small object detection [14]. Since then, the YOLO series has continuously innovated, with multiple versions released from YOLOv5 to YOLOv10, progressively improving model performance. YOLOv5, due to its outstanding performance, remains one of the key models for researchers to study and apply. Many scholars have optimized the traditional YOLOv5 model by designing lightweight versions through optimized training strategies and model architecture, incorporating more efficient feature extraction and multi-scale feature fusion techniques [15]. Notably, substantial improvements have been made in small object detection tasks, including modifications to the backbone [16,17,18], the introduction of attention mechanisms [19,20], improvements to the neck network [21,22,23,24], and advancements in loss functions [25].
In the field of computer vision, the choice and optimization of the loss function have a direct effect on object detection performance, yet they are often overlooked. Designing an appropriate loss function is therefore crucial for the detection of object features [26,27]. In object detection tasks, the key is to accurately predict the bounding boxes of the target objects. To assess and optimize bounding box prediction, various loss functions have been proposed to minimize the difference between the predicted and ground truth boxes. Among them, Intersection over Union (IoU) is widely adopted due to its intuitiveness and effectiveness. However, traditional IoU-based loss functions suffer from slow convergence and inaccurate predictions [28]. As a result, many researchers have improved the IoU loss function to propose more efficient alternatives. The IoU loss function measures the accuracy of predictions by calculating the ratio of the intersection to the union of the predicted and ground truth boxes. While IoU loss theoretically reflects the quality of the predicted boxes well, in practice it often requires more iterations to converge, and when there is no overlap between the predicted and ground truth boxes, the gradient vanishes, leading to inefficient learning [29]. To address this vanishing gradient in non-overlapping cases, researchers proposed the Generalized IoU (GIoU) loss function. GIoU adds a penalty term to IoU so that useful gradient information is provided even when the boxes do not overlap. The penalty is based on the smallest box enclosing both the predicted and ground truth boxes: GIoU loss improves convergence in non-overlapping scenarios by minimizing the portion of this enclosing area that is covered by neither box [30]. Although GIoU loss improves convergence speed to some extent, it still depends on IoU changes, which leads to slower convergence in certain situations (e.g., when the predicted and ground truth boxes are vertically or horizontally aligned). To further accelerate convergence, researchers introduced the Distance IoU (DIoU) loss function. DIoU loss directly optimizes the distance between the center points of the boxes, significantly improving training convergence speed [31]. To further improve the accuracy of bounding box regression, researchers proposed the Complete IoU (CIoU) loss function. CIoU loss not only considers the overlapping area and center point distance but also introduces aspect ratio consistency as an optimization objective, further enhancing both the accuracy and convergence speed of object detection [32]. Notably, improved IoU loss functions (such as CIoU) were first introduced into the YOLO series in YOLOv4 to improve the localization ability of the bounding boxes. YOLOv5 likewise uses CIoU as the primary loss function for bounding box regression.
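As a minimal sketch of the four metrics described above, the following Python function computes IoU, GIoU, DIoU, and CIoU for two axis-aligned boxes given as (x1, y1, x2, y2). The function name, the box format, and the small `eps` stabilizer are illustrative assumptions, not the implementation used in this study; the corresponding loss is obtained as 1 minus each value.

```python
import math


def iou_family(box_a, box_b, eps=1e-9):
    """Return IoU, GIoU, DIoU and CIoU for two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    # Overlap area (zero when the boxes are disjoint).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = wa * ha + wb * hb - inter
    iou = inter / (union + eps)

    # Smallest enclosing box: its area drives GIoU, its diagonal DIoU/CIoU.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    enclose = cw * ch
    giou = iou - (enclose - union) / (enclose + eps)

    # Squared centre distance, normalised by the squared enclosing diagonal.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    c2 = cw ** 2 + ch ** 2
    diou = iou - rho2 / (c2 + eps)

    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (
        math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)
    ciou = diou - alpha * v

    return {"iou": iou, "giou": giou, "diou": diou, "ciou": ciou}


# Two disjoint 2 x 2 boxes: IoU is zero, but GIoU and DIoU stay informative.
scores = iou_family((0, 0, 2, 2), (3, 0, 5, 2))
print({name: round(value, 3) for name, value in scores.items()})
```

For the disjoint pair in the example, IoU is exactly zero regardless of how far apart the boxes are, whereas GIoU and DIoU become more negative as the gap grows, which is the property the text attributes to them.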
In addition, some researchers have proposed various improved loss functions, such as adaptive loss functions, weighted loss functions, and multi-task learning loss functions, to enhance the detection performance and generalization ability of the traditional YOLOv5 model for small object features. The adaptive loss function adjusts its weights or form dynamically based on the characteristics of the data or the performance of the model. This improves the model’s robustness and generalization ability and yields better detection performance, especially when dealing with imbalanced data, outliers, or other complex situations. Typical adaptive loss functions include Focal Loss [33], Adaptive Loss [34], Class Balanced Loss [35], Weighted Loss [36], and Label Smoothing [37]. The introduction of weighted loss functions has mitigated the shortcomings of the traditional YOLOv5 model in this regard to some extent. The basic idea of the weighted loss function is to assign different weights to different samples or classes when calculating the loss. In this way, the weighted loss function can better handle class imbalances, noisy data, or the importance of specific samples, thus improving the model’s performance and generalization ability. Typical weighted loss functions include Weighted Cross-Entropy Loss [38], Weighted Mean Squared Error (W-MSE) [39], Class Balanced Loss [40], and Focal Loss, among others. The multi-task learning loss function is designed to simultaneously optimize multiple related tasks and combine the losses of different tasks through weighted summation. By leveraging the correlations between tasks, multi-task learning can improve the model’s learning efficiency and generalization ability. Typical multi-task learning loss functions include Weighted Average Loss [41], Shared Feature Loss [42], Joint Training Loss [43], Gradient Adjustment Loss [44], and Cross-task Loss [45], among others.
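To make the three ideas above concrete, the sketch below shows one simple instance of each for a single binary prediction: a positively weighted cross-entropy, the standard binary focal loss, and a weighted sum of task losses. The function names and default values (`pos_weight`, `alpha=0.25`, `gamma=2.0`) are illustrative assumptions and are not taken from this study.

```python
import math


def _clamp(p, low=1e-7):
    """Keep a probability away from 0 and 1 so the logarithm stays finite."""
    return min(max(p, low), 1.0 - low)


def weighted_bce(p, y, pos_weight=1.0):
    """Binary cross-entropy with an extra weight on positive samples."""
    p = _clamp(p)
    return -(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p))


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights samples the model already gets right."""
    p = _clamp(p)
    p_t = p if y == 1 else 1 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1 - alpha  # class-balancing weight
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


def multitask_loss(losses, weights):
    """Combine per-task losses through a weighted summation."""
    return sum(w * l for w, l in zip(weights, losses))


# An easy positive (p = 0.9) contributes far less than a hard one (p = 0.1).
print(focal_loss(0.9, 1), focal_loss(0.1, 1))
# Example combination of box, objectness and class losses (weights assumed).
print(multitask_loss([0.03, 0.05, 0.01], [0.05, 1.0, 0.5]))
```

With `gamma=0` the focal loss reduces to an alpha-weighted cross-entropy, which shows how the adaptive and weighted families mentioned in the text overlap.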
Although considerable research has focused on optimizing the YOLOv5 model for small object detection, in the task of detecting marine metal corrosion, corrosion targets often exhibit irregular shapes and complex backgrounds [46], which still present challenges for the existing YOLOv5 model in detecting corrosion features. Therefore, exploring and comparing the effects of different loss functions on the YOLOv5 model’s performance, specifically for marine metal corrosion image datasets, is the primary objective of this study. This research compares several typical loss functions, including Normalized Wasserstein Distance (NWD), Shape Intersection over Union (Shape-IoU), Wise Intersection over Union (Wise-IoU, WIoU), Focal and Efficient Intersection over Union (Focal-EIoU), SCYLLA Intersection over Union (SIoU), and CIoU, evaluating their performance differences in detecting small corrosion features. Through this comparative analysis, the paper aims to identify the most suitable loss function for small object corrosion detection and propose an optimized YOLOv5 model to further improve detection accuracy and efficiency in marine metal corrosion detection.
4. Results
In this study, a metal corrosion image dataset is used as a case example to compare and analyze five YOLOv5 models enhanced with IoU-based loss functions: YOLOv5-NWD, YOLOv5-Shape-IoU, YOLOv5-Wise-IoU, YOLOv5-Focal-EIoU, and YOLOv5-SIoU. The traditional YOLOv5 model serves as the control group, and performance is evaluated using precision, recall, loss function, and FPS as key metrics.
4.1. Comparison of Model Performance
4.1.1. Model Performance Comparison Using Traditional Metrics
Figure 4 shows the precision of each model across epochs. The overall trend is similar for all models, with precision stabilizing around 25 epochs. YOLOv5-NWD achieved the highest precision at 0.794, an improvement of approximately 7.2% over the traditional YOLOv5 model. YOLOv5-Shape-IoU followed with a precision of 0.774, a 4.5% improvement. YOLOv5-Focal-EIoU, YOLOv5-SIoU, and YOLOv5-Wise-IoU achieved precisions of 0.758, 0.756, and 0.743, with improvements of 2.3%, 2.0%, and 0.2%, respectively. YOLOv5-NWD thus showed the most significant improvement, followed by YOLOv5-Shape-IoU, while YOLOv5-Wise-IoU showed only a marginal gain.
As shown in Figure 5, all models demonstrate similar performance in terms of recall, indicating comparable capabilities in identifying actual corrosion points. YOLOv5-Shape-IoU and YOLOv5-Wise-IoU have recall rates of 0.645 and 0.646, respectively, slightly outperforming the baseline model’s recall rate of 0.639, suggesting a modest improvement in detecting corrosion instances. In contrast, YOLOv5-NWD has a recall rate of 0.628, which is 1.7% lower than the baseline. YOLOv5-Focal-EIoU and YOLOv5-SIoU have recall rates of 0.638 and 0.634, respectively, both showing minimal reductions compared to the baseline.
In terms of the F1-Score, YOLOv5-Shape-IoU and YOLOv5-NWD perform similarly, with scores of 0.704 and 0.701, reflecting improvements of 2.6% and 2.2%, respectively, compared to the baseline model. In contrast, YOLOv5-Focal-EIoU, YOLOv5-Wise-IoU, and YOLOv5-SIoU exhibit close F1-Scores of 0.693, 0.691, and 0.690, respectively, showing more modest gains of 1.0%, 0.7%, and 0.6% over the baseline.
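The relationship between the reported precision, recall, and F1-Score values can be checked with a short calculation. The sketch below assumes that F1 is the usual harmonic mean of precision and recall and that the percentage changes quoted in this section are relative to the baseline value; the helper names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def relative_change(value, baseline):
    """Change of `value` with respect to `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline


# Precision and recall reported above for YOLOv5-NWD and YOLOv5-Shape-IoU.
print(round(f1_score(0.794, 0.628), 3))          # 0.701
print(round(f1_score(0.774, 0.645), 3))          # 0.704
# Recall of YOLOv5-NWD relative to the baseline recall of 0.639.
print(round(relative_change(0.628, 0.639), 1))   # -1.7
```

Under these assumptions, the F1-Scores of 0.701 and 0.704 follow directly from the precision and recall values given for the two models.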
Figure 6 presents the confusion matrix results for each model, revealing minimal differences in corrosion detection performance. The baseline YOLOv5 model achieved a corrosion detection accuracy of 0.67 with a background false positive rate of 0.33. Both YOLOv5-NWD and YOLOv5-Shape-IoU models showed a detection accuracy of 0.69 and a background false positive rate of 0.31, reflecting a 5.4% improvement over the baseline. The YOLOv5-Wise-IoU model had an accuracy of 0.68 and a false positive rate of 0.32, resulting in a modest 1.5% improvement. In contrast, YOLOv5-Focal-EIoU and YOLOv5-SIoU both had a corrosion detection accuracy of 0.66 and a background false positive rate of 0.34, showing a 1.5% decline compared to the baseline.
4.1.2. Model Performance Comparison Based on Loss Functions
In the evaluation of loss functions, box loss and object loss are the primary metrics used to assess the performance of models in small target metal corrosion detection.
Figure 7 illustrates the trend of box loss for each model throughout training. All models demonstrate a similar pattern of box loss reduction over the epochs, with a rapid initial drop followed by gradual stabilization, particularly after 150 epochs, where the losses plateau. By the 200th epoch, YOLOv5-Shape-IoU, YOLOv5-Wise-IoU, and YOLOv5-SIoU converged around a box loss of 0.027, while YOLOv5 and YOLOv5-Focal-EIoU settled around 0.023. YOLOv5-NWD recorded the lowest box loss, converging at approximately 0.014. Thus, in terms of loss convergence, YOLOv5-NWD demonstrated the best performance. However, in terms of loss fluctuation, YOLOv5-Wise-IoU showed the most notable instability, with significant spikes around the 12th and 125th epochs, indicating inconsistency in performance on certain training samples.
Figure 8 illustrates the object loss trends for each model during training. As a general pattern, all models exhibit a decrease in object loss as epochs progress, gradually stabilizing, which indicates that the models are continuously learning and refining their prediction accuracy. Although the initial object loss values vary among the models, after about 25 epochs, all improved models demonstrate similar patterns in terms of trends, fluctuations, and convergence behavior. By the 200th epoch, the values have largely stabilized around 0.05, indicating that the models have learned to accurately identify corrosion targets and their locations, with comparable accuracy levels.
Looking at the trends in object loss, YOLOv5-NWD shows a consistent decrease in object loss as training progresses, with faster convergence and more stable improvement. However, models like YOLOv5-Shape-IoU, YOLOv5-Wise-IoU, YOLOv5-SIoU, and YOLOv5-Focal-EIoU experience greater fluctuations during the early stages, with a pattern of increasing loss before it declines, reflecting higher uncertainty and slower convergence. The baseline YOLOv5 model begins with a significantly lower initial object loss than the other models, leading to a much lower final convergence value.
4.1.3. Model Performance Comparison Based on FPS
Performance comparison based on the FPS metric shows that all improved models exhibit a decrease in FPS compared to the baseline model, as illustrated in Figure 9. Notably, YOLOv5-Wise-IoU experienced the largest reduction at approximately 22.3%. This is followed by YOLOv5-Focal-EIoU, YOLOv5-Shape-IoU, YOLOv5-NWD, and YOLOv5-SIoU, with reductions of 17.2%, 13.1%, 11.4%, and 8.5%, respectively. This indicates that while the models show enhancements in accuracy and F1 score, these improvements come at the cost of processing speed.
4.1.4. Comprehensive Comparison of Model Performance
Considering the evaluation metrics, a comprehensive comparison of model performance was conducted using a radar chart, illustrated in Figure 10. YOLOv5-NWD stands out with balanced results across the metrics: aside from recall, its scores match or come close to the best values among the improved models, with the highest precision and an F1-Score (0.701) nearly equal to the best (0.704, YOLOv5-Shape-IoU). This indicates that YOLOv5-NWD strikes a strong balance between precision and recall, making it the best overall performer for the small target corrosion image dataset in this study. Following closely is YOLOv5-Shape-IoU, which presents balanced results but a lower precision than YOLOv5-NWD. YOLOv5-Wise-IoU ranks third, showing less consistency in its metrics, with notable improvements over the baseline model observed only in recall and confusion matrix outcomes. Both YOLOv5-Focal-EIoU and YOLOv5-SIoU demonstrate similar enhancement effects; although their metrics are relatively balanced, the overall improvements are limited, providing minimal advantages over the baseline model.
In terms of overall metric comparison, YOLOv5-NWD demonstrates the most significant improvement. While its processing speed decreased by 11.4% and its recall by 1.7% compared to the baseline model, the other metrics saw varying levels of enhancement, with precision showing the largest gain of around 7.2%. As a result, for the small target metal corrosion image dataset used in this study, the YOLOv5 model enhanced with the NWD loss function achieves the best performance.
4.2. YOLOv5-NWD Sensitivity to IoU Proportion
In the previous comparison, the YOLOv5-NWD model utilized a 4:6 weight ratio between IoU and NWD. To further refine the model and determine the most effective weight distribution, this study investigated various weight ratios for IoU and NWD, with the goal of improving the detection of corrosion features in the dataset. Using the performance results at the 200th epoch as the baseline, the influence of various IoU and NWD weight ratios on model performance was analyzed.
Figure 11 presents the evaluation results of YOLOv5-NWD across different weight ratios.
The performance of the YOLOv5-NWD model depends on the weight ratios assigned to IoU and NWD. Precision reaches its highest value of 0.794 at an IoU ratio of 4:6 but declines when the IoU ratio is adjusted either higher or lower, with the most significant drop occurring when the IoU ratio decreases. For recall, the peak value of 0.628 is observed with an IoU ratio of 6:4, while the lowest value of 0.600 is seen at a ratio of 4:6. Regarding the F1 score, the results remain largely consistent. The highest F1 score of 0.70 is recorded for ratios of 6:4 and 7:3, while the lowest of 0.68 is found at ratios of 3:7 and 4:6. The difference of only 0.02 between these values indicates that variations in IoU and NWD weights have minimal impact on the F1 score.
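The weight ratio discussed above can be read as a convex combination of an IoU-based term and an NWD term. The sketch below is one plausible formulation under stated assumptions, not necessarily the exact loss used in this study: boxes are (cx, cy, w, h), NWD is computed by treating each box as a 2D Gaussian, the constant `c` is a dataset-dependent normalizer (12.8 is only a placeholder), plain IoU stands in for the IoU-type term, and `iou_ratio=0.4` corresponds to an IoU:NWD ratio of 4:6.

```python
import math


def iou_xywh(a, b):
    """Plain IoU of two boxes given as (cx, cy, w, h)."""
    iw = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    ih = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    inter = max(0.0, iw) * max(0.0, ih)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union


def nwd(a, b, c=12.8):
    """Normalized Wasserstein distance between the Gaussians of two boxes."""
    w2 = math.sqrt(
        (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
        + ((a[2] - b[2]) ** 2 + (a[3] - b[3]) ** 2) / 4
    )
    return math.exp(-w2 / c)


def box_loss(pred, target, iou_ratio=0.4, c=12.8):
    """Weighted sum of (1 - IoU) and (1 - NWD); iou_ratio=0.4 means 4:6."""
    return iou_ratio * (1 - iou_xywh(pred, target)) \
        + (1 - iou_ratio) * (1 - nwd(pred, target, c))


target = (50.0, 50.0, 8.0, 8.0)
pred = (54.0, 50.0, 8.0, 8.0)
for ratio in (0.3, 0.4, 0.5, 0.6, 0.7):
    print(ratio, round(box_loss(pred, target, iou_ratio=ratio), 4))
```

Sweeping `iou_ratio` as in the loop mirrors the 3:7 to 7:3 settings compared in this section; in a real training run the IoU term would typically be the CIoU variant already used by YOLOv5.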
Table 2 shows the pre-process time, inference time, and non-maximum suppression (NMS) time for the YOLOv5-NWD model under different IoU ratios. The pre-process time remains nearly constant across all ratios, with a slight increase from 0.4 ms to 0.5 ms at the 7:3 ratio. This stability suggests that adjustments to the IoU ratio do not significantly impact the pre-processing stage, allowing for a consistent initial step prior to model inference.
In contrast, inference time and NMS time show slight variations based on the ratio. Inference time ranges from 12.0 ms at the 3:7 ratio to 13.5 ms at the 6:4 ratio, indicating that higher IoU ratios introduce a minor increase in inference time. This may be due to the model requiring more focus on precise bounding box calculations when there is a greater emphasis on IoU. Similarly, NMS time increases from 2.0 ms at the 3:7 ratio to 2.3 ms at the 6:4 and 7:3 ratios. This trend suggests that a stronger focus on IoU could require more intensive processing during the NMS stage, possibly due to an increased number of overlaps in bounding box predictions that need to be filtered.
The FPS trend under different IoU ratios is somewhat unpredictable. The highest FPS is observed at a 3:7 ratio, while the lowest appears at a 6:4 ratio. Generally, FPS tends to decrease as the IoU ratio increases, though this is not a strict pattern. Accuracy in detecting corrosion regions remains relatively stable across ratios, with a maximum probability of 0.71 at a 6:4 ratio and a minimum of 0.69 at 3:7, 4:6, and 5:5 ratios. This slight 0.02 difference indicates that adjusting the IoU and NWD weight ratios has minimal impact on the model’s precision in accurately detecting corrosion areas. If accuracy is prioritized, a 4:6 ratio offers the best balance for YOLOv5-NWD. However, for a more comprehensive performance that includes recall, F1 score, and confusion matrix metrics, a 6:4 ratio proves optimal. Meanwhile, the 3:7 ratio is the most effective for FPS and inference time, making it suitable for applications requiring high-speed processing. All IoU configurations achieve an FPS above 60 Hz, indicating that YOLOv5-NWD meets real-time processing requirements across different ratio settings, although applications needing faster processing may benefit from the 3:7 or 5:5 ratios.
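The link between the per-image timings and FPS can be illustrated with a short calculation. This sketch assumes that FPS is the reciprocal of the summed pre-process, inference, and NMS times, which this section does not state explicitly; the timings are the ones quoted in the text for the 3:7 and 6:4 ratios, and the results are a consistency check rather than the reported FPS values.

```python
def fps_from_times(pre_ms, inference_ms, nms_ms):
    """Frames per second implied by per-image stage times in milliseconds."""
    return 1000.0 / (pre_ms + inference_ms + nms_ms)


# Timings quoted in the text: (pre-process, inference, NMS) in ms.
timings = {
    "3:7": (0.4, 12.0, 2.0),
    "6:4": (0.4, 13.5, 2.3),
}
for ratio, stage_times in timings.items():
    print(ratio, round(fps_from_times(*stage_times), 1))
```

Under this assumption both settings come out above 60 frames per second, in line with the real-time claim made above.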
4.3. YOLOv5-NWD Sensitivity to Different Dataset Sizes
The aforementioned research discovered that at an IoU to NWD ratio of 5:5, the YOLOv5-NWD model strikes an optimal balance between precision and recall and excels in both recall and F1 score metrics, which makes it ideal for environments with high demands for comprehensive performance. Therefore, to investigate how the performance of the YOLOv5-NWD model is affected by the size of the training dataset under this IoU ratio, this paper further examines the comprehensive performance of the YOLOv5-NWD model when trained on datasets consisting of 300, 600, 900, and 1266 images.
Figure 12 displays the detection accuracy of the YOLOv5-NWD model as epochs progress under various training dataset sizes. It is evident that the trend of model accuracy changing with epochs is quite similar across different training dataset sizes, and the final accuracies are also closely matched. Furthermore, the larger the training dataset size, the sooner the model tends to stabilize and the higher the accuracy, suggesting that the overall model accuracy generally increases with the enlargement of the training dataset size.
Figure 13 offers a statistical analysis of the precision, recall, and F1 scores of the YOLOv5-NWD model across various image dataset sizes. It is clear that with an image dataset of 1266 images, the model attains the highest F1 score, suggesting that the model’s precision and recall are well-balanced, leading to peak performance. Although the F1 score of the model trained on a 900-image dataset is marginally lower than that of the 600-image dataset, the difference of 0.002 is negligible. In general, as the size of the image dataset grows, the model’s F1 score also rises, and the balance between precision and recall improves, indicating better model performance.
Table 3 compiles the performance results of the YOLOv5-NWD model in terms of processing speed after training with datasets of different sizes. With a dataset of 300 images, the FPS is 37.0 Hz; with 600 images, it increases to 41.8 Hz, an increase of 4.8 Hz; with 900 images, it reaches 55.0 Hz, an increase of 13.2 Hz; and with 1266 images, it further increases to 66.2 Hz, an additional increase of 11.2 Hz. This indicates that the model’s FPS significantly improves with the expansion of the training image dataset. Moreover, Table 3 also reveals that as the size of the training image dataset increases, there is a noticeable decrease in the model’s preprocessing time, inference time, and NMS time, demonstrating that the model’s processing capabilities are enhanced with a larger training dataset.
In conclusion, the size of the training image dataset significantly affects the performance of the YOLOv5-NWD model. As the dataset grows, the model’s precision, recall, and F1 score improve noticeably. For example, with a dataset of 1266 images, the model strikes a better balance between precision and recall, suggesting that it achieves its best performance in detecting corrosion features at this dataset size.
4.4. YOLOv5-NWD Model Validation
To further validate the detection performance of the YOLOv5-NWD model (with an IoU:NWD ratio of 5:5) for corrosion features under various environmental conditions, especially its adaptability and robustness, two images of corroded steel structures were randomly selected from corrosion images available on the internet. Image processing techniques were applied to simulate five environmental conditions: bright, dark, rainy, windy, and snowy scenarios. Specifically, image contrast was adjusted to simulate varying lighting conditions, salt-and-pepper noise was added to mimic sandstorms and haze, Gaussian blur was applied to represent heavy rainfall, and a snow effect was added to simulate heavy snowfall.
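The kind of image processing described above can be sketched in a few lines of dependency-free Python operating on a grayscale image stored as a list of rows with values from 0 to 255. The function names, parameter values, and the 3 x 3 blur kernel are assumptions made for illustration; the actual tools and settings used in this study are not specified here, and the snow effect is omitted because its construction is not described.

```python
import random


def adjust_contrast(img, factor):
    """Scale each pixel's distance from mid-grey (128); clip to 0..255."""
    return [[min(255, max(0, round(128 + factor * (p - 128)))) for p in row]
            for row in img]


def salt_and_pepper(img, amount, rng):
    """Set roughly a fraction `amount` of pixels to pure black or white."""
    out = [row[:] for row in img]
    for row in out:
        for x in range(len(row)):
            u = rng.random()
            if u < amount / 2:
                row[x] = 0        # pepper
            elif u < amount:
                row[x] = 255      # salt
    return out


def gaussian_blur3(img):
    """Blur with the 3 x 3 kernel [1 2 1; 2 4 2; 1 2 1] / 16, edges clamped."""
    k = (1, 2, 1)
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(h - 1, max(0, y + dy))
                    xx = min(w - 1, max(0, x + dx))
                    acc += k[dy + 1] * k[dx + 1] * img[yy][xx]
            out[y][x] = round(acc / 16)
    return out


rng = random.Random(0)  # fixed seed so the noisy variant is reproducible
image = [[(x * 16 + y * 8) % 256 for x in range(8)] for y in range(8)]
variants = {
    "low_contrast": adjust_contrast(image, 0.5),
    "high_contrast": adjust_contrast(image, 1.5),
    "sandstorm": salt_and_pepper(image, 0.05, rng),
    "rain": gaussian_blur3(image),
}
print({name: v[0][:4] for name, v in variants.items()})
```

Each variant would then be passed to the trained detector in the same way as the unmodified image, so that any change in detections can be attributed to the simulated condition.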
Using these processed images, the YOLOv5-NWD model was employed to identify corrosion features under different environmental conditions, assessing its recognition performance in adverse weather scenarios.
Figure 14 illustrates the model’s detection results for corrosion features in the two images. The results indicate that, except in poorly lit environments, the YOLOv5-NWD model effectively identifies corrosion features across various conditions, with confidence scores for accurately identified features generally being high, reaching up to 0.97. However, when the corrosion color is similar to the background color (such as the coating color), the model’s accuracy declines, particularly in dark environments where identification accuracy is notably lower.
Therefore, while the YOLOv5-NWD model demonstrates good recognition of corrosion features under different lighting and weather conditions, its accuracy significantly drops in dimly lit settings. This study suggests that using a corrosion feature image dataset obtained through data augmentation techniques is both reasonable and feasible for training models to enhance generalization capabilities. After training on these datasets, the YOLOv5-NWD model shows good adaptability to target detection tasks in various environments and exhibits satisfactory performance in detecting corrosion features in real-world scenarios. However, the validation experiments also reveal that there is room for improvement in the model’s accuracy for detecting corrosion features in extremely harsh environments. Future research will focus on optimizing the model structure and enriching the dataset of corrosion images from real-world environments.
5. Discussion
5.1. Analysis of Model Performance Comparison Results
This study evaluated the performance of YOLOv5 along with five of its modified versions for detecting small metal corrosion targets. The results demonstrated that YOLOv5-NWD achieved the most significant overall performance improvement among the modified models. Except for FPS and recall, all evaluation metrics showed improvements. Notably, precision reached 0.794, a 7.2% improvement over the original YOLOv5 model. The confusion-matrix detection accuracy and F1-Score also increased by 5.4% and 2.2%, respectively, although the recall rate decreased by 1.7%. These results suggest that the NWD loss function is more effective in capturing the distribution characteristics of corrosion images, thereby improving the model’s sensitivity to small targets. Compared to the CIoU loss function, NWD is better at minimizing false positives in small target corrosion detection, resulting in improved precision. This difference can be attributed to distinct optimization objectives, robustness against background noise, and the handling of gradient information during training. The CIoU loss function focuses primarily on box overlaps, which is effective for large target detection but less so for small targets, where the overlap is minimal. In contrast, NWD takes into account the overall distribution, making it more suitable for small target detection.
Moreover, NWD’s consideration of the background distribution allows the model to achieve higher detection accuracy in complex environments, thus enhancing the F1 score. Importantly, the NWD loss function provides smoother gradient updates during training, which helps the model effectively adjust its weights and avoid getting stuck in local optima, particularly when handling small targets. However, the FPS of YOLOv5-NWD dropped by 11.4% compared to the standard YOLOv5 model, likely due to the increased computational complexity of the NWD loss function compared to CIoU. Additionally, using NWD may require more iterations for convergence, increasing both training time and inference delay, especially when dealing with small targets that demand more computational resources.
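The argument that overlap-based losses behave poorly for small targets while NWD varies smoothly can be illustrated numerically. The sketch below compares plain IoU and NWD when a small box and a large box are shifted by the same number of pixels; the box sizes, shifts, (cx, cy, w, h) format, and the normalizer `c=12.8` are illustrative assumptions rather than values from this study.

```python
import math


def iou_xywh(a, b):
    """Plain IoU of two boxes given as (cx, cy, w, h)."""
    iw = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    ih = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    inter = max(0.0, iw) * max(0.0, ih)
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)


def nwd(a, b, c=12.8):
    """Normalized Wasserstein distance between the Gaussians of two boxes."""
    w2 = math.sqrt(
        (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
        + ((a[2] - b[2]) ** 2 + (a[3] - b[3]) ** 2) / 4
    )
    return math.exp(-w2 / c)


# Shift a small (6 x 6) and a large (48 x 48) box horizontally by 3 and 7 px.
for size in (6.0, 48.0):
    reference = (0.0, 0.0, size, size)
    for shift in (3.0, 7.0):
        moved = (shift, 0.0, size, size)
        print(f"size={size:>4} shift={shift} "
              f"IoU={iou_xywh(reference, moved):.3f} "
              f"NWD={nwd(reference, moved):.3f}")
```

For the 6 x 6 box, a 3-pixel shift already cuts IoU to one third and a 7-pixel shift drives it to zero, while the 48 x 48 box keeps a high IoU under the same shifts. NWD depends only on the offset and size difference, so it decays gradually and stays positive in every case, which is the behavior the discussion above relies on.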
The YOLOv5-Shape-IoU model also demonstrated a notable improvement in overall performance. Compared to the traditional YOLOv5 model, this version showed enhancements across several key metrics: precision increased by 4.5%, recall improved by 0.9%, the F1 score rose by 2.6%, and the probability of correctly identifying positive corrosion samples in the confusion matrix grew by 5.4%. Although the FPS decreased by approximately 12.66%, these gains reflect the improved bounding box regression, classification, and localization capabilities brought by the Shape-IoU loss function. Additionally, the higher F1 score suggests that this model is more effective at balancing precision and recall. The performance gains can be attributed to the intrinsic properties of the Shape-IoU loss function, its adaptability to small targets, and parameter adjustments during the training process. For example, Shape-IoU incorporates shape information when calculating the overlap of target boxes, which makes the model more sensitive when detecting small targets. This focus on boundary and shape information contributes to improvements in both recall and precision. In contrast, CIoU, being a more general loss function, takes into account both the overlap and the spatial relationship between the target boxes, but it may not perform as effectively with small targets. Furthermore, Shape-IoU is likely more successful at minimizing misclassifications, which leads to better accuracy in the confusion matrix. It is important to note that the decrease in FPS suggests an increase in computational complexity when using the Shape-IoU loss function, especially when the dataset contains many small targets. This may impact the inference speed, as a lower FPS implies that real-time applications might require more computational resources or necessitate compromises in processing speed.
In summary, while the YOLOv5-Shape-IoU model has achieved significant performance improvements, trade-offs between speed and accuracy must be carefully considered in practical applications.
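The way the F1 score balances precision and recall, as discussed above, can be made concrete with a short sketch. The counts below are made-up illustrative numbers, not results from this study, and the function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical detector A: balanced errors (80 TP, 20 FP, 20 FN).
p_a, r_a, f1_a = precision_recall_f1(tp=80, fp=20, fn=20)

# Hypothetical detector B: fewer false alarms but many missed targets.
p_b, r_b, f1_b = precision_recall_f1(tp=60, fp=5, fn=40)

print(f"A: precision={p_a:.3f} recall={r_a:.3f} F1={f1_a:.3f}")
print(f"B: precision={p_b:.3f} recall={r_b:.3f} F1={f1_b:.3f}")
```

Detector B has the higher precision, yet its F1 score is lower because the harmonic mean penalizes the drop in recall. For this reason, a gain in F1 is read here as a better balance rather than a gain in a single metric.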
Considering all evaluation metrics, it is evident that compared to YOLOv5-NWD and YOLOv5-Shape-IoU, the three improved models—YOLOv5-Focal-EIoU, YOLOv5-SIoU, and YOLOv5-Wise-IoU—showed only minor improvements across key indicators. Specifically, the YOLOv5-Wise-IoU model demonstrated limited enhancement; compared to the CIOU loss function, the introduction of the Wise-IoU loss function resulted in only a 0.2% increase in precision, a 0.7% improvement in the F1 score, and a 1.5% improvement in the probability of correctly identifying positive corrosion samples in the confusion matrix. However, it also experienced the largest decrease in FPS, dropping by 22.25%. This suggests that while the Wise-IoU loss function improved the model’s performance to some extent, the gains were minimal. This limited improvement might be attributed to the characteristics of the Wise-IoU loss function not fully leveraging its potential for detecting small target corrosion images. The weak features of small corrosion targets make them susceptible to background interference, hindering performance gains. Additionally, shortcomings in parameter settings and data augmentation strategies during model training could have also played a role. The increased computational complexity introduced by Wise-IoU led to a reduction in inference speed, and limitations in the quantity and quality of small target samples in the dataset may have further undermined the potential for model improvement.
In summary, based on the results comparison and analysis, the NWD loss function provides the most comprehensive improvement in the detection performance of small target corrosion images for the traditional YOLOv5 model among the IoU-based improved loss functions. The Shape-IoU loss function ranks second, while the other three loss functions show relatively small and comparable overall improvements. In terms of precision and FPS, the Wise-IoU loss function showed the weakest performance. Regarding recall, F1 score, and accuracy, the SIoU loss function performed the poorest.
5.2. Discussion on Sensitivity Analysis of IoU Proportion in YOLOv5-NWD
This study explored how different IoU and NWD weight ratios affect the performance of the YOLOv5-NWD model. Five configurations with IoU:NWD weight ratios of 3:7, 4:6, 5:5, 6:4, and 7:3 were compared on precision, recall, F1 score, FPS, and confusion matrix results. The findings show that the confusion matrix and F1 score are relatively insensitive to the proportion of IoU in the loss function, with maximum variation rates of only 2.90% and 2.94%, respectively. Precision, recall, and FPS, however, are significantly influenced by the IoU ratio, with maximum variations of 4.61%, 4.67%, and 12.64%, respectively. Specifically, the highest precision was achieved when the IoU ratio was 4:6, while the lowest occurred at 3:7. For recall, the maximum value appeared at a ratio of 6:4, and the minimum at 4:6. FPS showed a different trend, with the highest value at 3:7 and the lowest at 6:4. Although none of the metrics varies strictly monotonically with the IoU ratio, the precision, recall, F1 score, and confusion matrix results for YOLOv5-NWD generally showed an upward trend with increasing IoU proportion, with fluctuations at certain ratios. Conversely, FPS tended to decrease as the NWD proportion decreased, again with minor fluctuations at certain ratios. This indicates that incorporating IoU in the model’s loss function improves the overall performance in terms of precision, recall, and F1 score, whereas incorporating NWD contributes to enhancing the processing speed of the model.
Overall, when the IoU ratio is 5:5, the performance evaluation metrics show a balanced outcome, indicating that the YOLOv5-NWD model achieves good overall performance at this ratio. Therefore, in corrosion image recognition tasks, it is crucial to strike an appropriate balance between IoU and NWD in the loss function. Generally, a ratio of 5:5 is a reasonable choice, and deviations from this balance should be minimized. From this perspective, both 4:6 and 6:4 ratios are also relatively good alternatives. In the dataset used in this study, the ratio of 4:6 yielded the highest precision, while the ratio of 6:4 resulted in the best recall, F1 score, and confusion matrix results. In corrosion image detection applications, the choice of the IoU proportion in the loss function should be made based on specific application requirements.
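One common way to realize such an IoU:NWD ratio is a convex combination of the two box-regression terms. The sketch below assumes this form, with a ratio of 3:7 read as an IoU weight of 0.3. The exact formula used in this study is not restated in this section, so the function name, the mapping from ratios to weights, and the example similarity values are all assumptions.

```python
def combined_box_loss(iou, nwd, iou_weight=0.5):
    """Blend IoU-based and NWD-based regression losses.

    iou and nwd are similarity scores in [0, 1]; iou_weight is the share
    of the IoU term, so an IoU:NWD ratio of 3:7 corresponds to 0.3.
    """
    if not 0.0 <= iou_weight <= 1.0:
        raise ValueError("iou_weight must lie in [0, 1]")
    return iou_weight * (1.0 - iou) + (1.0 - iou_weight) * (1.0 - nwd)

# The five IoU:NWD ratios compared in the sensitivity analysis.
RATIOS = {"3:7": 0.3, "4:6": 0.4, "5:5": 0.5, "6:4": 0.6, "7:3": 0.7}

# Hypothetical tiny prediction that misses its target: IoU is 0, NWD is not.
iou, nwd = 0.0, 0.58

losses = {name: combined_box_loss(iou, nwd, w) for name, w in RATIOS.items()}
for name, value in losses.items():
    print(f"IoU:NWD = {name} -> loss = {value:.3f}")
```

Under this assumed form, raising the IoU weight increases the penalty for non-overlapping small boxes, while raising the NWD weight keeps the loss graded by distance. The ratio therefore controls which of the two behaviours dominates the regression signal.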
Based on the performance analysis and discussion, the strengths and limitations of each improved model suggest the following industrial applications. YOLOv5-NWD is suitable for high-precision small-target corrosion detection tasks, such as in marine infrastructure. While its inference speed is slower, it performs with higher stability and accuracy in complex environments. YOLOv5-Shape-IoU is ideal for high-precision tasks like metal pipeline and bridge inspections, but its lower FPS limits its use in real-time monitoring systems. YOLOv5-Wise-IoU contributes to accuracy improvements but experiences a significant drop in FPS, making it suitable for environments with less stringent real-time requirements, such as storage facilities. YOLOv5-Focal-EIoU and YOLOv5-SIoU are well-suited for corrosion detection in complex environments, balancing precision and recall, but still require optimization for fast inference in real-time detection applications. Overall, the choice of model depends on the specific trade-offs between real-time performance, accuracy, and computational resources required for the application.
5.3. Discussion on Sensitivity Analysis of Different Dataset Sizes in YOLOv5-NWD
This research assesses how different training image dataset sizes impact the performance of the YOLOv5-NWD model. The results show that as the training image dataset expands, the model’s precision stabilizes more quickly. The main reason is that larger training datasets offer more samples, allowing the model to learn a broader range of features and patterns, thus improving its generalization capabilities. This enhancement in generalization helps the model learn and stabilize its precision more rapidly. Moreover, larger training datasets enable more effective hyperparameter optimization and data augmentation strategies, which in turn speed up the stabilization of precision.
Additionally, as the size of the training image dataset increases, the model more easily strikes a balance between precision and recall, leading to better overall performance. This is likely due to the dataset’s diversity, which encompasses more scenarios and conditions, enabling the model to maintain good performance under various circumstances. Large-scale datasets also allow for more nuanced adjustments to post-processing strategies, such as Non-Maximum Suppression (NMS), and assist the model in learning the settings of anchor boxes more effectively. This contributes to improving the model’s precision and recall and achieving a better balance between them, enhancing the model’s overall performance.
In terms of processing speed, the model’s FPS, inference time, and NMS time all improve significantly as the dataset size grows. This indicates that training with larger datasets not only enhances the model’s recognition performance for corrosion features but also improves computational efficiency while maintaining high precision, allowing the model to adapt to more real-time application scenarios.
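The link between FPS, inference time, and NMS time can be sketched as follows. The sketch assumes that FPS is derived from the per-image pre-processing, inference, and NMS latencies in milliseconds, which is how YOLOv5's validation output is typically converted. The latency values are hypothetical, and the study's exact FPS definition is not restated in this section.

```python
def fps_from_latency(pre_ms, inference_ms, nms_ms):
    """Convert per-image latencies (milliseconds) into frames per second."""
    total_ms = pre_ms + inference_ms + nms_ms
    if total_ms <= 0:
        raise ValueError("total latency must be positive")
    return 1000.0 / total_ms

def relative_change_percent(baseline, new):
    """Signed percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100.0

# Hypothetical timings: 10 ms per image versus 12.5 ms per image.
baseline_fps = fps_from_latency(pre_ms=0.5, inference_ms=8.0, nms_ms=1.5)
slower_fps = fps_from_latency(pre_ms=0.5, inference_ms=10.0, nms_ms=2.0)
change = relative_change_percent(baseline_fps, slower_fps)

print(f"baseline: {baseline_fps:.1f} FPS, slower: {slower_fps:.1f} FPS")
print(f"relative change: {change:.1f}%")
```

Because NMS time enters the total directly, a model that emits fewer redundant candidate boxes can raise FPS even when the network's forward pass is unchanged.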
Overall, the size of the dataset significantly affects the YOLOv5-NWD model’s recognition performance in corrosion feature detection tasks. Increasing the dataset size can effectively enhance the model’s object detection performance and processing speed, thereby improving its robustness to complex corrosion samples and background noise. Therefore, it is recommended, whenever possible, to train the recommended YOLOv5-NWD model on a larger-scale training dataset in order to achieve better detection performance for small corrosion features and faster model processing speed.
It is important to note that while expanding the dataset can improve model performance, it may also increase computational costs and training time, sometimes leading to hardware resource limitations and latency issues in practical applications. To mitigate these issues, techniques such as data sampling, incremental learning, and model compression can be employed to reduce computational burdens and enhance training efficiency. Additionally, large datasets may lead to overfitting, especially when sample diversity is lacking. To address this, data augmentation, regularization techniques, and cross-domain data transfer can be used to strengthen the model’s generalization capabilities and prevent overfitting to specific samples. Furthermore, the model’s adaptability in different environments remains a challenge, particularly when dealing with corrosion samples from varying climatic or environmental conditions. By simulating environmental conditions and integrating multimodal data, the model’s robustness and adaptability can be effectively enhanced. Lastly, dataset imbalances may cause the model to favor majority class corrosion samples, affecting the detection accuracy of minority class samples. To tackle this challenge, strategies such as oversampling, undersampling, and weighted loss functions can be used to address class imbalances and improve the model’s detection capabilities for rare samples. Therefore, future enhancements to model performance and practicality can be achieved by optimizing training methods, increasing data diversity, improving environmental adaptability, and addressing dataset imbalances.
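As an illustration of the weighted-loss strategy mentioned above, the following sketch derives per-class weights from label frequencies using the widely used "balanced" heuristic, n_samples / (n_classes * class_count). The class names and counts are hypothetical and do not describe the dataset used in this study.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return a weight per class that is inversely proportional to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical imbalanced label set: 90 majority-class and 10 minority-class samples.
labels = ["pitting"] * 90 + ["crevice"] * 10

weights = inverse_frequency_weights(labels)
for cls, weight in sorted(weights.items()):
    print(f"{cls}: weight = {weight:.3f}")
```

Multiplying each sample's classification loss by the weight of its class makes errors on rare corrosion types count more, which is one simple alternative to oversampling or undersampling.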
6. Conclusions
To explore the impact of IoU-based loss functions on the performance of YOLOv5 for small target corrosion image detection, this study proposed modifications to the IoU loss function and developed five improved YOLOv5 models: YOLOv5-NWD, YOLOv5-Shape-IoU, YOLOv5-Wise-IoU, YOLOv5-Focal-EIoU, and YOLOv5-SIoU. Using the metal corrosion dataset from the Zhoushan Seawater Station of China National Materials Corrosion and Protection Science Data Center for training, the performance of the improved models and the traditional YOLOv5 was evaluated based on precision, recall, F1 score, and FPS. The results showed that YOLOv5-NWD achieved the best overall performance among the modified models, with a precision increase of 7.2%, an F1 score increase of 2.2%, and a 5.4% improvement in recognizing positive corrosion samples in the confusion matrix. YOLOv5-Shape-IoU followed closely, with a 4.5% increase in precision, a 2.6% increase in the F1 score, and a 5.4% improvement in positive sample recognition in the confusion matrix. In contrast, the performance gains for YOLOv5-Focal-EIoU, YOLOv5-SIoU, and YOLOv5-Wise-IoU were relatively small. Particularly, YOLOv5-Wise-IoU showed only a 0.2% improvement in precision, a 0.7% increase in the F1 score, and a 1.5% improvement in recognizing positive samples, while FPS decreased by approximately 22.25%. These findings suggest that, among the various IoU-based loss functions, the NWD loss function provides the most significant improvement in the performance of YOLOv5 for small target corrosion image detection, followed by the Shape-IoU loss function. Meanwhile, Focal-EIoU, Wise-IoU, and SIoU offered comparatively limited performance gains.
To further analyze the impact of different IoU and NWD weight ratios on the performance of the YOLOv5-NWD model, this study compared five configurations with IoU:NWD ratios of 3:7, 4:6, 5:5, 6:4, and 7:3, evaluating their performance based on precision, recall, F1 score, FPS, and confusion matrix metrics. The findings indicated that the accuracy of positive sample recognition in the confusion matrix and the F1 score of the YOLOv5-NWD model were less sensitive to the proportion of IoU in the loss function. However, precision, recall, and FPS were notably affected by this proportion. Overall, a balanced performance across all evaluation metrics was observed when the IoU ratio was 5:5, suggesting that the YOLOv5-NWD model exhibited good overall performance at this ratio. In corrosion image recognition tasks, finding an optimal balance between IoU and NWD in the loss function is crucial. For the dataset used in this study, a 4:6 ratio yielded the highest precision, while a 6:4 ratio produced the best recall, F1 score, and confusion matrix results. Therefore, in corrosion image detection applications, selecting an appropriate IoU ratio in the loss function should be based on the specific requirements of the application.
Moreover, this study explored the effects of different dataset sizes on the performance of the YOLOv5-NWD (5:5) model. The size of the dataset had a significant impact on the model’s precision, recall, F1 score, and inference speed. With an increased dataset size, the model reached a stable and higher precision more quickly. At the same time, the model’s F1 score increased, and a better balance between precision and recall was achieved, resulting in improved overall model performance. Furthermore, enlarging the dataset also effectively enhanced the model’s inference speed.
The results demonstrate that the proposed approach accurately identifies corrosion features, improving both detection precision and efficiency, especially in challenging environmental conditions such as lighting variations, shadows, and occlusions. These advancements not only boost YOLOv5’s performance in detecting small corrosion targets but also offer valuable theoretical insights for loss function design, contributing to the progress of related research areas. The findings provide practical support for corrosion monitoring and protection, promoting more scientific and precise management of material safety. However, the inherent irregularity and instability of corrosion features present ongoing challenges, requiring further model enhancements to improve performance in complex settings.
Nevertheless, there are some potential issues in the research. For instance, due to the insufficient quality and quantity of the original corrosion images, the dataset created has certain limitations, which may affect the model’s performance in corrosion image recognition tasks and its generalization ability, thus reducing the applicability and scalability of the research findings. Additionally, the loss functions employed in the study are not comprehensive, as there is insufficient consideration of other IoU loss functions and non-IoU loss functions. Furthermore, the relevant ablation experiments are not thorough enough, which may affect the generalizability of the model’s results. In addition, the study only considers improvements to the performance of the YOLOv5 model from the perspective of IoU loss functions, without taking into account the impact of model depth, model architecture, and other factors on the model’s performance.
Given the current limitations, future research can expand and optimize the study in several areas. First, to improve the representativeness and credibility of the research findings, there is a need to further increase the sample size of corrosion images, covering more types of corrosion patterns and different environmental factors. Additionally, conducting comparative studies with data from different regions and testing environments will enhance the universality of the research, particularly in terms of adaptability under different climate conditions and metal materials. Optimizing model structure and adjusting hyperparameters are also key to improving model performance, especially in complex environments. In addition, the generalizability of the research results can be enhanced from multiple perspectives. For example, exploring the effects of different dataset sizes and model depths on the improved model’s performance in detecting small object corrosion features, while also incorporating a more comprehensive range of loss function types for ablation experiments, could improve the generalization ability of the model proposed in this study. In coastal environments, corrosion features on metal structures often exhibit significant irregularity and instability, and the images themselves are affected by variations in lighting, shadows, and object occlusion, all of which pose challenges to the model’s accuracy. Therefore, improving image recognition technology for precise and rapid detection of small target corrosion features in complex environments is an urgent need. Future research can also incorporate multimodal information, such as thermography and spectral analysis, to enhance the accuracy and robustness of image recognition.