Open Access Article

Enhancing YOLOv5 Performance for Small-Scale Corrosion Detection in Coastal Environments Using IoU-Based Loss Functions

Qifeng Yu, Yudong Han, Yi Han, Xinjia Gao and Lingyu Zheng

College of Transport and Communications, Shanghai Maritime University, Shanghai 201306, China

Authors to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2024, 12(12), 2295; https://doi.org/10.3390/jmse12122295

Submission received: 2 November 2024 / Revised: 30 November 2024 / Accepted: 10 December 2024 / Published: 13 December 2024

(This article belongs to the Special Issue Monitoring and Evaluation of Marine Engineering Equipment and Structures)

Figure 1. Framework of the study.
Figure 2. Dataset annotation flow chart.
Figure 3. Distribution of corrosion area ranges.
Figure 4. Precision varies with epoch for different models.
Figure 5. Performance evaluation of various YOLOv5 models.
Figure 6. Confusion matrix outcomes for various models.
Figure 7. Box loss variation over epochs for different models.
Figure 8. Objective loss variation over epochs for different models.
Figure 9. FPS of different models.
Figure 10. Comparative analysis of model performance.
Figure 11. Model performance comparison for IoU ratios.
Figure 12. Model performance comparison for different dataset sizes.
Figure 13. Comparison of YOLOv5-NWD performance across different dataset sizes.
Figure 14. Corrosion monitoring diagrams in different harsh environments.

Abstract

The high salinity, humidity, and oxygen-rich environments of coastal marine areas pose serious corrosion risks to metal structures, particularly in equipment such as ships, offshore platforms, and port facilities. With the development of artificial intelligence technologies, image recognition-based intelligent detection methods have provided effective support for corrosion monitoring in marine engineering structures. This study aims to explore the performance improvements of different modified YOLOv5 models in small-object corrosion detection tasks, focusing on five IoU-based improved loss functions and their optimization effects on the YOLOv5 model. First, the study utilizes corrosion testing data from the Zhoushan seawater station of the China National Materials Corrosion and Protection Science Data Center to construct a corrosion image dataset containing 1266 labeled images. Then, based on the improved IoU loss functions, five YOLOv5 models were constructed: YOLOv5-NWD, YOLOv5-Shape-IoU, YOLOv5-WIoU, YOLOv5-Focal-EIoU, and YOLOv5-SIoU. These models, along with the traditional YOLOv5 model, were trained using the dataset, and their performance was evaluated using metrics such as precision, recall, F1 score, and FPS. The results showed that YOLOv5-NWD performed the best across all metrics, with a 7.2% increase in precision and a 2.2% increase in F1 score. The YOLOv5-Shape-IoU model followed, with improvements of 4.5% in precision and 2.6% in F1 score. In contrast, the performance improvements of YOLOv5-Focal-EIoU, YOLOv5-SIoU, and YOLOv5-WIoU were more limited. Further analysis revealed that different IoU ratios significantly affected the performance of the YOLOv5-NWD model. Experiments showed that the 4:6 ratio yielded the highest precision, while the 6:4 ratio performed the best in terms of recall, F1 score, and confusion matrix results. In addition, this study conducted an assessment using four datasets of different sizes: 300, 600, 900, and 1266 images. 
The results indicate that increasing the size of the training dataset enables the model to find a better balance between precision and recall, that is, a higher F1 score, while also effectively improving the model’s processing speed. Therefore, the choice of an appropriate IoU ratio should be based on specific application needs to optimize model performance. This study provides theoretical support for small-object corrosion detection tasks, advances the development of loss function design, and enhances the detection accuracy and reliability of YOLOv5 in practical applications.

Keywords: corrosion monitoring; YOLOv5; loss function; performance evaluation; marine engineering

1. Introduction

In marine engineering, metal corrosion is a severe issue, particularly in environments with high salinity, high humidity, and high oxygen levels, where structures such as ships, offshore platforms, port facilities, and subsea pipelines are often affected by electrochemical corrosion [1,2]. Due to the harsh conditions of the marine environment, traditional corrosion detection methods, such as visual inspection, electrochemical measurements, and ultrasonic testing, face challenges such as low detection accuracy, high maintenance costs, and difficulties in data analysis [3]. Therefore, the development of efficient, automated corrosion detection technologies has become an important solution to this problem. In recent years, artificial intelligence (AI)-based image recognition technologies have achieved significant success in intelligent corrosion detection, where AI technologies have been widely applied to the identification and analysis of surface corrosion on structures [4]. This technology can quickly and accurately detect corrosion phenomena, providing strong support for the maintenance and management of marine engineering projects.

Among AI technologies, deep learning-based object detection algorithms, particularly the YOLO (You Only Look Once) series, have shown significant application potential owing to their fast detection speed and high accuracy [5]. YOLOv5 in particular has attracted attention for its speed and accuracy and has achieved positive results in the corrosion detection of marine engineering structures [6]. However, marine metal corrosion is complex and irregular, especially on metal structures at an early stage of corrosion, where frequently occurring small-scale corrosion features pose challenges for detection [7]. Although YOLOv5 has made progress in detecting small objects, there is still room to improve its performance when dealing with complex backgrounds and marine corrosion features. This calls for further optimization of the YOLOv5 model's detection performance to better meet the complex demands of marine corrosion detection.

The traditional YOLOv5 model relies on an Intersection over Union (IoU)-based loss function for bounding box regression, but such loss functions often lead to slow convergence and low accuracy in small object detection tasks. As a result, improving the loss function has become an important direction for enhancing the performance of the YOLOv5 model [8]. In recent years, researchers have proposed various improved loss functions, such as Generalized Intersection over Union (GIoU), Distance Intersection over Union (DIoU), and Complete Intersection over Union (CIoU). These loss functions not only accelerate model convergence but also significantly improve the accuracy of small object detection [9]. However, despite the good results achieved by these methods in general object detection tasks, the complex backgrounds and irregular features of marine metal corrosion still pose significant challenges to existing algorithms. Therefore, further optimizing the loss function to enhance the performance of YOLOv5 in detecting small and numerous corrosion features remains of great practical value and has substantial room for improvement.

To enhance the accuracy and robustness of the YOLOv5 model in detecting small target corrosion features, this paper focuses on enhancing the model’s loss function, specifically examining several typical IoU-based loss functions. Through a comparative analysis of how these IoU-based loss functions affect the performance of the YOLOv5 model in small target corrosion detection tasks, the goal is to propose a new YOLOv5 model that incorporates an improved IoU loss function, thereby optimizing its effectiveness in marine metal corrosion detection. This approach aims to provide a more efficient method for identifying corrosion damage, which will aid in the maintenance and management of marine engineering structures.

To clearly articulate the purpose and fundamental approach of this research, the subsequent sections of the article first present a detailed literature review of the YOLOv5 object detection algorithm, focusing on the evolution of the YOLOv5 algorithm, methods for optimizing the loss function, and its specific applications in marine metal corrosion detection. Additionally, the unique advantages and challenges of YOLOv5 in small target detection are analyzed, along with a discussion on the impact of different loss functions on the performance of the YOLOv5 model. These analyses will provide a solid theoretical foundation for the subsequent optimization of YOLOv5’s effectiveness in corrosion detection tasks.

2. Literature Review

The YOLO algorithm employs an end-to-end training approach, directly detecting objects in raw images, eliminating the need for complex preprocessing and postprocessing steps, simplifying system design, and improving detection efficiency [10]. In 2017, Redmon et al. proposed YOLOv2 (YOLO9000), which combined anchor boxes with a deeper feature extraction network, further enhancing detection accuracy [11]. In 2018, YOLOv3 was introduced, using multi-scale feature maps and an improved network structure, which strengthened its ability to detect objects in complex backgrounds and at different scales [12]. In 2020, Alexey Bochkovskiy et al. proposed YOLOv4, which used CSPDarknet53 as the backbone network, incorporated Mosaic data augmentation, the DropBlock regularization method, and the CIoU loss function, and introduced the Spatial Attention Module (SAM), greatly enhancing the model’s feature extraction efficiency and accuracy [13]. In June 2020, the Ultralytics team, led by Glenn Jocher, released the YOLOv5 model, which was the first version to be implemented using PyTorch (1.6.0), making the model easier to deploy and integrate. The model offers different scales (YOLOv5s, YOLOv5m, YOLOv5l, YOLOv5x) to suit varying computational capabilities and real-time demands. It improved accuracy while maintaining high speed, especially in small object detection [14]. Since then, the YOLO series has continuously innovated, with multiple versions released from YOLOv5 to YOLOv10, progressively improving model performance. YOLOv5, due to its outstanding performance, remains one of the key models for researchers to study and apply. Many scholars have optimized the traditional YOLOv5 model by designing lightweight versions through optimized training strategies and model architecture, incorporating more efficient feature extraction and multi-scale feature fusion techniques [15]. 
Notably, substantial improvements have been made in small object detection tasks, including modifications to the backbone [16,17,18], the introduction of attention mechanisms [19,20], improvements to the neck network [21,22,23,24], and advancements in loss functions [25].

In computer vision, the choice and optimization of the loss function are often overlooked, even though they directly affect object detection performance. Designing an appropriate loss function is therefore crucial for detecting object features [26,27]. In object detection tasks, the key is to accurately predict the bounding boxes of the target objects. To assess and optimize bounding box prediction, various loss functions have been proposed to minimize the difference between the predicted and ground truth boxes. Among them, Intersection over Union (IoU) is widely adopted due to its intuitiveness and effectiveness. However, traditional IoU-based loss functions suffer from slow convergence and inaccurate predictions [28]. As a result, many researchers have modified the IoU loss function to obtain more efficient alternatives. The IoU loss function measures the accuracy of predictions by calculating the ratio of the intersection to the union of the predicted and ground truth boxes. While IoU loss theoretically reflects the quality of the predicted boxes well, in practice it often requires more iterations to converge, and when the predicted and ground truth boxes do not overlap, the gradient vanishes and learning becomes inefficient [29]. To address this vanishing-gradient problem, researchers proposed the Generalized IoU (GIoU) loss function. GIoU adds a penalty term to IoU so that effective gradient information is provided even in non-overlapping cases. GIoU loss improves convergence speed in non-overlapping scenarios by penalizing the part of the smallest enclosing box that is not covered by the predicted and ground truth boxes [30]. 
Although GIoU loss improves convergence speed to some extent, it still depends on changes in IoU, which leads to slower convergence in certain situations (e.g., when the predicted and ground truth boxes are vertically or horizontally aligned). To further accelerate convergence, researchers introduced the Distance IoU (DIoU) loss function. DIoU loss directly minimizes the distance between the center points of the boxes, significantly improving training convergence speed [31]. To further improve the accuracy of bounding box regression, researchers proposed the Complete IoU (CIoU) loss function. CIoU loss considers not only the overlapping area and center point distance but also aspect ratio consistency as an optimization objective, further enhancing both the accuracy and convergence speed of object detection [32]. Notably, improved IoU loss functions (such as CIoU) were first introduced to the YOLO series in YOLOv4 to improve bounding box localization, and YOLOv5 likewise uses CIoU as its primary loss function for bounding box regression.

In addition, some researchers have proposed various improved loss functions, such as adaptive loss functions, weighted loss functions, and multi-task learning loss functions, to enhance the detection performance and generalization ability of the traditional YOLOv5 model for small object features. The adaptive loss function adjusts its weights or form dynamically based on the characteristics of the data or the performance of the model, improving the model’s robustness and generalization ability, especially when dealing with imbalanced data, outliers, or other complex situations, demonstrating better detection performance. Typical adaptive loss functions include Focal Loss [33], Adaptive Loss [34], Class Balanced Loss [35], Weighted Loss [36], and Label Smoothing [37]. The introduction of weighted loss functions has improved the shortcomings of the traditional YOLOv5 model in this regard to some extent. The basic idea of the weighted loss function is to assign different weights to different samples or classes when calculating the loss. This way, the weighted loss function can better handle class imbalances, noisy data, or the importance of specific samples, thus improving the model’s performance and generalization ability. Typical weighted loss functions include Weighted Cross-Entropy Loss [38], Weighted Mean Squared Error (W-MSE) [39], Class Balanced Loss [40], and Focal Loss, among others. The multi-task learning loss function is designed to simultaneously optimize multiple related tasks and combine the losses of different tasks through weighted summation. By leveraging the correlations between tasks, multi-task learning can improve the model’s learning efficiency and generalization ability. Typical multi-task learning loss functions include Weighted Average Loss [41], Shared Feature Loss [42], Joint Training Loss [43], Gradient Adjustment Loss [44], and Cross-task Loss [45], among others.

Although considerable research has focused on optimizing the YOLOv5 model for small object detection, corrosion targets in marine metal corrosion detection often exhibit irregular shapes and complex backgrounds [46], which still present challenges for the existing YOLOv5 model. Therefore, exploring and comparing the effects of different loss functions on the YOLOv5 model's performance, specifically for marine metal corrosion image datasets, is the primary objective of this study. This research compares several typical loss functions, including Normalized Wasserstein Distance (NWD), Shape Intersection over Union (Shape-IoU), Wise Intersection over Union (WIoU), Focal Expanded Intersection over Union (Focal-EIoU), Soft Intersection over Union (SIoU), and CIoU, evaluating their performance differences in detecting small corrosion features. Through this comparative analysis, the paper aims to identify the most suitable loss function for small object corrosion detection and propose an optimized YOLOv5 model to further improve detection accuracy and efficiency in marine metal corrosion detection.

3. Materials and Methods

This research leverages corrosion data from the China National Center for Materials Corrosion and Protection Science, specifically the Zhoushan seawater station test data, and applies advanced data processing techniques to build a corrosion-labeled image dataset for various metal protective coatings. Five enhanced models are proposed, all based on the traditional YOLOv5 framework and commonly used loss functions. These include YOLOv5-NWD, YOLOv5-Shape-IoU, YOLOv5-WIoU, YOLOv5-Focal-EIoU, and YOLOv5-SIoU. Using the corrosion-labeled image dataset, these models are trained and validated, with the original YOLOv5 model serving as a benchmark. A thorough comparative analysis is carried out using standard performance metrics to evaluate the impact of the different loss functions on the YOLOv5 model’s performance. The study’s framework is depicted in Figure 1.

3.1. Data

3.1.1. Dataset Acquisition

The dataset utilized in this study originates from open-source corrosion testing images published by the China National Center for Materials Corrosion and Protection Science. It provides comprehensive details on various testing environments for metal corrosion coatings, as well as the duration and conditions at different marine port stations. For this study, the seawater immersion test image data from the Zhoushan seawater station was selected as the primary dataset due to its completeness and clarity. The Zhoushan station is located in Luotou, Dinghai District, Zhoushan City, Zhejiang Province, at coordinates 122°06′ E and 30°00′ N. This area is characterized by low salinity, high turbidity, significant sediment presence, and minimal marine biofouling. The test specimens, made of standard Q235 carbon steel (100 mm × 50 mm × 3 mm), were immersed in seawater from the East China Sea. Seven coating types were tested: powder epoxy coating, epoxy coating, chlorinated rubber, fluorocarbon coating, Wuxi anti-fouling coating, fusion-bonded epoxy coating, and epoxy zinc-rich coating system. The tests were conducted over durations of 24, 60, and 96 months. From this data, 125 valid images were collected as the initial dataset. Table 1 provides details on the coating types and sample specimens used in the tests.

3.1.2. Data Augmentation

Given the limited availability of corrosion data in this study, 11 data augmentation techniques were applied to simulate real corrosion monitoring under various harsh weather conditions, expanding the dataset significantly. These augmentations include brightness adjustment to simulate varying light intensities, salt-and-pepper noise to mimic sandstorm effects, Gaussian noise to replicate foggy conditions, and scaling to represent different monitoring distances, generating a total of 1500 images. However, certain augmentation techniques led to partial loss of corrosion features on metal surfaces with fewer distinguishing characteristics, which increased computational load without impacting the experimental results. As a result, further filtering and removal of these flawed images were required. Ultimately, 1266 valid images were retained for model training, validation, and testing, comprising 125 original images and 1141 augmented ones.
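As a rough illustration of the kinds of augmentation named above (brightness adjustment, salt-and-pepper noise, Gaussian noise, and scaling), the following NumPy sketch may help. The function names, parameter values, and the nearest-neighbour resize are assumptions made for illustration; the study does not state how its augmentations were implemented.

```python
import numpy as np

def adjust_brightness(img, factor):
    """Scale pixel intensities; factor > 1 brightens, factor < 1 darkens."""
    out = img.astype(np.float32) * factor
    return np.clip(out, 0, 255).astype(np.uint8)

def add_salt_and_pepper(img, amount, rng):
    """Set roughly a fraction `amount` of pixels to pure black or pure white."""
    out = img.copy()
    mask = rng.random(img.shape[:2])
    out[mask < amount / 2] = 0          # "pepper" pixels
    out[mask > 1 - amount / 2] = 255    # "salt" pixels
    return out

def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    noise = rng.normal(0.0, sigma, img.shape)
    out = img.astype(np.float32) + noise
    return np.clip(out, 0, 255).astype(np.uint8)

def rescale_nearest(img, scale):
    """Resize by `scale` with nearest-neighbour sampling (hypothetical choice)."""
    h, w = img.shape[:2]
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return img[rows][:, cols]

# Example on a synthetic grey image standing in for a specimen photograph.
rng = np.random.default_rng(0)
image = np.full((4, 6, 3), 100, dtype=np.uint8)
augmented = [
    adjust_brightness(image, 1.5),
    add_salt_and_pepper(image, 0.05, rng),
    add_gaussian_noise(image, 10.0, rng),
    rescale_nearest(image, 0.5),
]
```

When an image is rescaled for detection training, its bounding box labels would need to be scaled by the same factor.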

3.1.3. Dataset Annotation

To generate the necessary label files for deep learning, this study used Labelme (v5.4.1) for manual annotation of the initial dataset, consisting of 125 images. A deep learning model was then trained on this annotated dataset, producing a set of weight files. These weights were subsequently applied in Python (3.8.18) to automatically label the expanded dataset of 1141 images. Finally, experts conducted manual adjustments to further refine the annotations, resulting in a final label file used for model training. Figure 2 depicts the dataset annotation process. The green rectangular boxes in Figure 2 are used to mark corrosion features, with each box containing four key corner points (green dots) that indicate the coordinates of the corrosion features in the image.

3.2. Statistical Analysis of Corrosion Area

A preliminary analysis of the corrosion image dataset revealed a high prevalence of blister corrosion, with many of the blisters being relatively small. In this study, the corroded areas of each test specimen were annotated, and their sizes estimated, as shown in Figure 3. The analysis indicates that 4145 corrosion spots have an area of less than 5 mm², comprising 73.3% of the total. Of these, 999 spots fall within the 0 to 1.0 mm² range, representing 17.7% of the total. Additionally, 1340 spots have an area between 1.0 and 2.0 mm², accounting for 23.7%, and 1806 spots fall between 2.0 and 5.0 mm², making up 32.0% of the total. This suggests that the dataset is dominated by small corrosion features, which poses greater challenges for the object detection model.

The IoU loss function quantifies prediction accuracy by calculating the ratio of the intersection to the union of the predicted and ground-truth bounding boxes. This function helps improve the model’s precision in detecting small corrosion targets and enhances its robustness to noise. Given the characteristics of the corrosion image dataset, this study focuses on evaluating the performance of the YOLOv5 model, enhanced with IoU-based loss functions, in detecting small corrosion features.
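The IoU ratio described above can be sketched in a few lines of Python. The corner-format boxes `(x1, y1, x2, y2)` and the function name are assumptions made for illustration. The example also shows why small targets are harder: the same one-pixel shift costs a small box far more IoU than a large one.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A 4x4 box shifted by one pixel versus a 40x40 box shifted by one pixel.
small = box_iou((0, 0, 4, 4), (1, 0, 5, 4))
large = box_iou((0, 0, 40, 40), (1, 0, 41, 40))
print(round(small, 3), round(large, 3))  # 0.6 0.951
```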

3.3. Loss Functions and Models

3.3.1. CIoU

The YOLOv5 model typically uses the CIoU loss function, which improves upon the DIoU loss function. In addition to accounting for the overlap between the predicted and ground-truth bounding boxes, CIoU also considers the distance between their center points and the alignment of their aspect ratios. The formula for calculating the CIoU loss function is presented in Equation (1).

$$L_{CIoU} = 1 - IoU + \frac{\rho^{2}(B, B^{gt})}{c_{1}^{2}} + \beta v \tag{1}$$

where $L_{CIoU}$ denotes the CIoU loss, $IoU$ is the ratio of the overlapping area of the predicted and ground-truth boxes to the area of their union, $\rho$ denotes the Euclidean distance between the center points of the two boxes, $B$ is the predicted bounding box, $B^{gt}$ is the ground-truth bounding box, and $c_{1}$ is the diagonal length of the smallest enclosing box that covers both the predicted and ground-truth bounding boxes. $\beta$ serves as an adjustment factor and $v$ measures the consistency of the aspect ratio.

In object detection, especially with the YOLOv4 model, the CIoU loss function has proven effective in enhancing the accuracy of small target detection, particularly in complex environments. This function refines the predicted bounding box by considering multiple factors, such as the position, shape, and size of the target. Although it introduces greater computational complexity compared to the traditional IoU loss function, the significant improvements in detection accuracy justify this added complexity.
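Equation (1) can be sketched as follows for a single pair of boxes. The text does not spell out $v$ or $\beta$, so the sketch assumes the definitions commonly given for CIoU: $v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}$ and $\beta = \frac{v}{(1 - IoU) + v}$. The center-format boxes `(cx, cy, w, h)` and the function name are also illustrative assumptions.

```python
import math

def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss for two boxes given as (cx, cy, w, h) with positive w and h."""
    # Convert to corner coordinates.
    px1, py1 = pred[0] - pred[2] / 2, pred[1] - pred[3] / 2
    px2, py2 = pred[0] + pred[2] / 2, pred[1] + pred[3] / 2
    gx1, gy1 = gt[0] - gt[2] / 2, gt[1] - gt[3] / 2
    gx2, gy2 = gt[0] + gt[2] / 2, gt[1] + gt[3] / 2

    # Overlap term.
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = pred[2] * pred[3] + gt[2] * gt[3] - inter
    iou = inter / (union + eps)

    # Squared center distance over squared diagonal of the enclosing box.
    rho2 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency v and its weight beta (assumed definitions).
    v = (4 / math.pi ** 2) * (math.atan(gt[2] / gt[3]) - math.atan(pred[2] / pred[3])) ** 2
    beta = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + beta * v
```

Unlike plain IoU loss, the center-distance term keeps the loss informative when the boxes do not overlap: two 2 × 2 boxes whose centers are 10 units apart give a loss above 1 rather than a flat value of exactly 1.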

3.3.2. NWD

Although the CIoU loss function is effective in object recognition, it encounters difficulties when detecting small corrosion features. The small size of these features results in a reduced IoU, where even slight localization errors can significantly affect detection accuracy. Furthermore, CIoU may not fully capture the distinctive properties of small targets, leading to less accurate results. To address these challenges, the NWD (Normalized Wasserstein Distance) loss function was proposed [47]. NWD treats bounding boxes as Gaussian distributions and utilizes Wasserstein distance to improve detection of small corrosion targets. By accounting for both overlap and spatial distribution, NWD enhances localization precision, particularly for subtle corrosion features that traditional IoU-based methods may miss. The fundamental formula for the NWD loss function is presented in Equation (2) below [48]:

$$NWD(Q_{a}, Q_{b}) = \exp\left(-\frac{\sqrt{W_{2}^{2}(Q_{a}, Q_{b})}}{c_{2}}\right) \tag{2}$$

where $Q_{a}$ and $Q_{b}$ denote the Gaussian distribution parameters for the two bounding boxes. The formula uses $W_{2}^{2}(Q_{a}, Q_{b})$, which represents the squared Wasserstein distance between these two distributions. This metric is employed to quantify the difference between the predicted and ground-truth bounding boxes in a more robust manner than traditional methods. $c_{2}$ serves as a normalization constant. The calculation of the squared Wasserstein distance is provided in Equation (3).

$$W_{2}^{2}(Q_{a}, Q_{b}) = \left\| \left[c_{x_{a}}, c_{y_{a}}, \frac{w_{a}}{2}, \frac{h_{a}}{2}\right]^{T} - \left[c_{x_{b}}, c_{y_{b}}, \frac{w_{b}}{2}, \frac{h_{b}}{2}\right]^{T} \right\|_{2}^{2} \tag{3}$$

where $c_{x_{a}}$, $c_{y_{a}}$, $c_{x_{b}}$, and $c_{y_{b}}$ represent the center coordinates of the two bounding boxes, and $w_{a}$, $h_{a}$, $w_{b}$, and $h_{b}$ denote the widths and heights of the respective bounding boxes. The distance is thus the squared Euclidean norm of the difference between the two parameter vectors.

The NWD loss function refines how differences between bounding boxes are quantified and captures the subtle features of small targets, which enhances detection accuracy for such objects. Compared with the traditional CIoU loss function, NWD improves the model's robustness and adaptability, particularly in complex settings where small targets may be obscured or overlap with others. The final NWD loss function is expressed in Equation (4).

$$L_{NWD} = \frac{1}{2N} \sum_{i=1}^{N} (1 - NWD_{i}) + \frac{1}{2N} \sum_{i=1}^{N} (1 - IoU_{i}) \tag{4}$$

where $L_{NWD}$ denotes the NWD loss and $N$ denotes the number of detection boxes. Averaging aggregates the losses from multiple targets into a single value for further computation and optimization of the loss function. This approach ensures that both NWD and IoU similarity information are integrated during training, which boosts the model's precision and robustness in detecting small targets.
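Equations (2) to (4) can be sketched together as below. The center-format boxes `(cx, cy, w, h)`, the function names, and the default normalization constant `c=12.8` are assumptions made for illustration; the text does not give the value of $c_{2}$ used in this study. The `nwd_weight` parameter is likewise hypothetical: a value of 0.5 reproduces the equal weighting of Equation (4), and other values would give an unequal mix of the NWD and IoU terms.

```python
import math

def wasserstein2_sq(a, b):
    """Squared 2-Wasserstein distance of Equation (3) for boxes (cx, cy, w, h)."""
    return (
        (a[0] - b[0]) ** 2
        + (a[1] - b[1]) ** 2
        + ((a[2] - b[2]) / 2) ** 2
        + ((a[3] - b[3]) / 2) ** 2
    )

def nwd(a, b, c=12.8):
    """Normalized Wasserstein Distance of Equation (2); c is an assumed constant."""
    return math.exp(-math.sqrt(wasserstein2_sq(a, b)) / c)

def nwd_iou_loss(nwd_values, iou_values, nwd_weight=0.5):
    """Weighted mean of (1 - NWD) and (1 - IoU); weight 0.5 matches Equation (4)."""
    n = len(nwd_values)
    nwd_term = sum(1 - v for v in nwd_values) / n
    iou_term = sum(1 - v for v in iou_values) / n
    return nwd_weight * nwd_term + (1 - nwd_weight) * iou_term

# Two small boxes that do not overlap: IoU is 0, but NWD is still informative.
a, b = (0, 0, 2, 2), (3, 0, 2, 2)
similarity = nwd(a, b)
loss = nwd_iou_loss([similarity], [0.0])
```

Because NWD decays smoothly with distance, it still distinguishes a near miss from a far miss when the IoU of both is zero, which is the property that makes it attractive for small targets.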

3.3.3. Shape-IoU

The traditional Intersection over Union (IoU) metric is frequently utilized in object detection tasks to measure the accuracy of predicted bounding boxes. It computes the detection accuracy by measuring the ratio of the overlapping area to the union area between the predicted and ground-truth bounding boxes. However, when dealing with complex or irregular shapes, or objects with inconsistent boundaries, the traditional IoU may be insufficient, as it only considers area overlaps and neglects shape similarity.

Shape-IoU (Shape Intersection over Union) is a performance metric for evaluating object detection models, designed for objects with complex shapes and irregular boundaries [49]. Unlike traditional IoU-based loss functions, Shape-IoU introduces an additional regularization term that penalizes shape discrepancies between the predicted and ground truth bounding boxes. Specifically, it considers the aspect ratio and geometric properties of the bounding boxes, which further enhances the precision of small object localization. By adjusting the IoU value according to shape similarity, Shape-IoU reflects the alignment between the predicted and actual bounding boxes more accurately, which in turn improves the robustness and accuracy of evaluations for complex objects. The final Shape-IoU loss function is calculated as shown in Equation (5) [50].

$$L_{Shape\text{-}IoU} = 1 - IoU + D^{shape} + 0.5 \times \Omega^{shape} \tag{5}$$

where $D^{shape}$ quantifies the shape-weighted distance between the center points of the predicted bounding box and the ground truth box, and $\Omega^{shape}$ represents a shape-related regularization term. This regularization term is multiplied by a coefficient of 0.5, indicating its relatively minor influence on the overall loss.
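A possible reading of Equation (5) is sketched below. The text does not define $D^{shape}$ or $\Omega^{shape}$ in detail, so the sketch assumes the formulation usually associated with Shape-IoU: weights $ww$ and $hh$ derived from the ground-truth width and height raised to a `scale` exponent, a center distance weighted by them, and a shape cost with exponent `theta = 4`. These definitions, the parameter defaults, and the function name are assumptions and should be checked against Ref. [49].

```python
import math

def shape_iou_loss(pred, gt, iou, scale=0.0, theta=4.0):
    """Shape-IoU loss sketch for boxes (cx, cy, w, h); `iou` is precomputed."""
    # Shape weights from the ground-truth box (assumed definition).
    ws, hs = gt[2] ** scale, gt[3] ** scale
    ww = 2 * ws / (ws + hs)
    hh = 2 * hs / (ws + hs)

    # Squared diagonal of the smallest enclosing box.
    cw = max(pred[0] + pred[2] / 2, gt[0] + gt[2] / 2) - min(pred[0] - pred[2] / 2, gt[0] - gt[2] / 2)
    ch = max(pred[1] + pred[3] / 2, gt[1] + gt[3] / 2) - min(pred[1] - pred[3] / 2, gt[1] - gt[3] / 2)
    c2 = cw ** 2 + ch ** 2

    # Shape-weighted center distance D^shape.
    d_shape = hh * (pred[0] - gt[0]) ** 2 / c2 + ww * (pred[1] - gt[1]) ** 2 / c2

    # Shape cost Omega^shape from relative width and height differences.
    omega_w = hh * abs(pred[2] - gt[2]) / max(pred[2], gt[2])
    omega_h = ww * abs(pred[3] - gt[3]) / max(pred[3], gt[3])
    omega_shape = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + d_shape + 0.5 * omega_shape
```

With `scale=0` both weights equal 1, so the two axes are treated alike; a positive `scale` would let the ground-truth aspect ratio shift the penalty toward one axis.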

3.3.4. Wise-IoU

Wise-IoU is an enhanced bounding box regression loss function that refines anchor box quality evaluation through a dynamic focusing mechanism and non-monotonic attention coefficients. It overcomes the limitations of traditional IoU in handling low-quality samples by adjusting gradient gains based on anchor box quality. For high-quality anchors, Wise-IoU minimizes unnecessary disturbances, while for low-quality anchors, it effectively reduces the impact of harmful gradients. This makes Wise-IoU particularly useful in object detection tasks, including high-precision applications such as small target and metal corrosion detection, significantly enhancing detection accuracy and stability.

In this study, the Wise-IoU loss function proposed by Ref. [51] is used as the loss metric for bounding box regression. This function helps address issues with low-quality samples in training data, where geometric factors like distance and aspect ratio can amplify penalties, negatively affecting the model’s generalization ability. By mitigating the impact of geometric factors when anchor boxes are well aligned with target boxes, Wise-IoU enhances the model’s performance and generalization ability in detecting small metal corrosion targets. Wise-IoU loss is computed using Equation (6) [51].

$$L_{Wise\text{-}IoU} = \exp\left(\frac{(x - x^{gt})^{2} + (y - y^{gt})^{2}}{(W_{g}^{2} + H_{g}^{2})^{*}}\right) \times L_{IoU} \tag{6}$$

In this context, $L_{IoU}$ refers to the standard $IoU$ loss, which evaluates the overlap between the predicted and ground truth box. $x$ and $y$ are the horizontal and vertical center coordinates of the predicted box, and $x^{gt}$ and $y^{gt}$ are those of the ground truth box. $W_{g}$ and $H_{g}$ represent the width and height of the minimum enclosing box, and the superscript $*$ indicates that this term is detached from the computational graph so that it contributes no gradient. When the Wise-IoU loss function is used in YOLOv5, the total loss can be expressed as the weighted combination of individual loss functions given in Equation (7).

$$L_{loss} = \lambda_{1} L_{BCEL} + \lambda_{2} L_{DFL} + \lambda_{3} L_{Wise\text{-}IoU} \tag{7}$$

where $\lambda_{1}$, $\lambda_{2}$, and $\lambda_{3}$ are the weighting factors for each loss component. $L_{BCEL}$ is the Binary Cross-Entropy Loss, responsible for determining if the predicted box contains a target. $L_{DFL}$ is the Distribution Focal Loss, used for location regression, addressing the task of box position prediction.
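Equations (6) and (7) can be sketched for a single box pair as follows. The sketch covers only the distance-based attention factor of Equation (6), not the dynamic focusing mechanism described earlier, and it uses plain floats, so the gradient detachment marked by $*$ has no counterpart here. The center-format boxes `(cx, cy, w, h)`, the function names, and the unit weights in `lambdas` are illustrative assumptions.

```python
import math

def wise_iou_loss(pred, gt, iou):
    """Equation (6) for boxes (cx, cy, w, h); `iou` is precomputed."""
    # Width and height of the minimum enclosing box.
    wg = max(pred[0] + pred[2] / 2, gt[0] + gt[2] / 2) - min(pred[0] - pred[2] / 2, gt[0] - gt[2] / 2)
    hg = max(pred[1] + pred[3] / 2, gt[1] + gt[3] / 2) - min(pred[1] - pred[3] / 2, gt[1] - gt[3] / 2)
    # In an autograd framework the denominator would be detached from the graph.
    dist2 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    attention = math.exp(dist2 / (wg ** 2 + hg ** 2))
    return attention * (1 - iou)

def total_loss(l_bce, l_dfl, l_wiou, lambdas=(1.0, 1.0, 1.0)):
    """Weighted sum of Equation (7); the weights here are placeholders."""
    return lambdas[0] * l_bce + lambdas[1] * l_dfl + lambdas[2] * l_wiou
```

When the two centers coincide the attention factor equals 1 and the loss reduces to the plain IoU loss; it grows as the centers move apart, which amplifies the penalty for poorly localized boxes.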

3.3.5. Focal-EIoU

The Focal-EIoU loss function is an enhancement that combines the benefits of Focal Loss and EIoU Loss to improve model performance, particularly when detecting small targets [41]. In object detection, the primary goal is to precisely locate and classify objects within an image. However, small targets pose unique challenges due to their limited size and subtle characteristics, often resulting in class imbalance and inaccurate bounding box predictions.

In datasets, small targets often represent a small fraction, resulting in significant class imbalance. Focal Loss mitigates this by assigning higher weights to harder-to-detect samples and lowering the weights of easily detectable ones, enabling the model to concentrate on more difficult examples and enhancing its ability to detect small objects. Meanwhile, EIoU Loss extends the traditional IoU by incorporating additional geometric factors, such as the distance between the centers and the overlap of bounding boxes. This enhancement provides a more stable gradient, especially when there is significant overlap between the predicted and target boxes, leading to more accurate regression. By combining the strengths of both Focal Loss and EIoU Loss, the Focal-EIoU loss function performs exceptionally well in small target detection tasks. It effectively addresses class imbalance while offering more precise bounding box regression, thereby boosting overall detection accuracy. In real-world applications, particularly those in complex environments with high precision demands, the Focal-EIoU loss function proves to be highly advantageous. The definition of the Focal-EIoU loss function is presented in Equation (8) [52].

$$L_{\text{Focal-EIoU}} = -\alpha (1 - p_{t})^{\gamma} \log(p_{t}) + (1 - EIoU) \tag{8}$$

where $\alpha$ and $\gamma$ are focal loss adjustment parameters that balance the weights of positive and negative samples. $p_{t}$ denotes the predicted probability, while $\log(p_{t})$ represents the logarithmic loss term, which quantifies the discrepancy between the predicted and actual values. $1 - EIoU$ serves as the bounding box regression loss term, assessing the difference between the predicted bounding box and the ground truth.
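The sketch below evaluates Equation (8) exactly as written above, in plain Python. Two points are assumptions rather than details given in the text: the EIoU score is taken to be IoU minus normalized center, width, and height penalties (a common EIoU formulation), and the defaults `alpha=0.25` and `gamma=2.0` are the usual Focal Loss values. The function names are hypothetical, and boxes are `(cx, cy, w, h)` tuples.

```python
import math


def corners(box):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def eiou(pred, target):
    """Assumed EIoU score: IoU minus center, width and height penalties."""
    px1, py1, px2, py2 = corners(pred)
    tx1, ty1, tx2, ty2 = corners(target)
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pred[2] * pred[3] + target[2] * target[3] - inter
    # Width and height of the smallest box enclosing both boxes.
    c_w = max(px2, tx2) - min(px1, tx1)
    c_h = max(py2, ty2) - min(py1, ty1)
    center_pen = ((pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2) / (c_w ** 2 + c_h ** 2)
    width_pen = (pred[2] - target[2]) ** 2 / c_w ** 2
    height_pen = (pred[3] - target[3]) ** 2 / c_h ** 2
    return inter / union - center_pen - width_pen - height_pen


def focal_eiou_loss(p_t, pred, target, alpha=0.25, gamma=2.0):
    """Equation (8): focal classification term plus (1 - EIoU); needs 0 < p_t <= 1."""
    focal_term = -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
    return focal_term + (1.0 - eiou(pred, target))


box = (10.0, 10.0, 4.0, 6.0)
print(focal_eiou_loss(0.9, box, box))                     # confident, well-aligned prediction
print(focal_eiou_loss(0.3, (11.0, 10.0, 3.0, 6.0), box))  # uncertain, misaligned prediction
```

The focal factor $(1 - p_{t})^{\gamma}$ shrinks the first term for confident predictions, so hard examples dominate the classification part of the loss while the second term penalizes geometric mismatch.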

3.3.6. SIoU

SCYLLA-IoU (SIoU) is a loss function used for bounding box regression, commonly applied in object detection tasks to assess the overlap between predicted and ground truth boxes [44]. SIoU improves small target detection accuracy by integrating the traditional IoU metric with additional geometric factors [53]. While IoU is a standard measure for evaluating bounding box overlaps, its weak gradient information often leads to suboptimal training outcomes, particularly when detecting small targets. To address this issue, SIoU incorporates factors like center point distance, aspect ratio, and scale, offering richer gradient information that better guides model training.

SIoU takes into account both the overlap area and the shape and positional relationships of bounding boxes, offering more stable and effective gradient updates during training. This approach significantly improves the model’s localization accuracy and classification performance, especially for small target detection. Consequently, using the SIoU loss function in small target detection tasks enhances the model’s ability to detect and identify small objects in images, leading to better performance in complex environments and high-precision applications.

Based on the research by Gevorgyan et al. [44], the introduction of SIoU greatly facilitated the model’s training process. SIoU allows the predicted box to rapidly shift towards the nearest axis, requiring only a single coordinate (X or Y) for regression, thereby accelerating model convergence. In traditional YOLOv5, the CIoU loss function is used for bounding box regression, accounting for overlap area, centroid distance, and aspect ratio between the predicted and ground truth boxes. However, SIoU provides a better representation of variations in width, height, and confidence. Therefore, this study replaces CIoU with SIoU to improve the model’s overall detection performance. The SIoU loss function is provided in Equation (9) [54].

$$L_{\text{SIoU}} = 1 - IoU + \frac{\theta + \Delta}{2} \tag{9}$$

where $\theta$ typically denotes the shape alignment penalty, which captures the differences in shape between the predicted and ground truth boxes. $\Delta$ represents the center point alignment penalty, assessing the deviation between the center points of the predicted and ground truth boxes. These penalties are averaged to adjust the IoU value, thereby introducing additional geometric alignment information into the loss function.
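Equation (9) gives only the overall form of the loss, so the following plain-Python sketch fills in the two penalties under stated assumptions. The angle-aware distance cost and the exponential shape cost follow the SIoU formulation of Gevorgyan [44] as commonly implemented, and the exponent `shape_power=4.0` is an assumed default. The function name `siou_loss` is hypothetical, and boxes are `(cx, cy, w, h)` tuples.

```python
import math


def siou_loss(pred, target, shape_power=4.0):
    """Equation (9): 1 - IoU + (shape penalty + center penalty) / 2."""
    px1, py1 = pred[0] - pred[2] / 2, pred[1] - pred[3] / 2
    px2, py2 = pred[0] + pred[2] / 2, pred[1] + pred[3] / 2
    tx1, ty1 = target[0] - target[2] / 2, target[1] - target[3] / 2
    tx2, ty2 = target[0] + target[2] / 2, target[1] + target[3] / 2

    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pred[2] * pred[3] + target[2] * target[3] - inter
    iou = inter / union

    # Enclosing box size, used to normalize the center offsets.
    c_w = max(px2, tx2) - min(px1, tx1)
    c_h = max(py2, ty2) - min(py1, ty1)
    dx, dy = target[0] - pred[0], target[1] - pred[1]
    sigma = math.hypot(dx, dy)

    # Angle cost: 0 when the offset lies along an axis, 1 at 45 degrees.
    sin_alpha = min(1.0, abs(dy) / sigma) if sigma > 0 else 0.0
    angle_cost = 1.0 - 2.0 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2

    # Center point penalty (Delta in Equation (9)), modulated by the angle cost.
    gamma = 2.0 - angle_cost
    rho_x, rho_y = (dx / c_w) ** 2, (dy / c_h) ** 2
    center_penalty = (1.0 - math.exp(-gamma * rho_x)) + (1.0 - math.exp(-gamma * rho_y))

    # Shape penalty (theta in Equation (9)) from relative width/height mismatch.
    omega_w = abs(pred[2] - target[2]) / max(pred[2], target[2])
    omega_h = abs(pred[3] - target[3]) / max(pred[3], target[3])
    shape_penalty = ((1.0 - math.exp(-omega_w)) ** shape_power
                     + (1.0 - math.exp(-omega_h)) ** shape_power)

    return 1.0 - iou + (shape_penalty + center_penalty) / 2.0


target_box = (50.0, 50.0, 10.0, 10.0)
print(siou_loss(target_box, target_box))              # aligned boxes: loss is ~0
print(siou_loss((53.0, 50.0, 10.0, 10.0), target_box))  # offset along the x axis
```

Both penalties are non-negative, so the result is never smaller than the plain IoU loss $1 - IoU$.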

3.4. Evaluation Metrics

3.4.1. Traditional Evaluation Metrics

In the evaluation of classification model performance, traditional metrics include Precision, Recall, F1-Score, and the Confusion Matrix.

Precision is the ratio of true positive instances to all instances predicted as positive by the model, reflecting the model’s ability to correctly identify positive cases. A high precision indicates a low false positive rate and a high accuracy in predicting the true category. The calculation is provided in Equation (10).

$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \tag{10}$$

Here, True Positive (TP) refers to the number of correctly predicted positive samples, and False Positive (FP) refers to the number of negative samples incorrectly predicted as positive by the model.

Recall assesses the percentage of actual positive samples that are correctly identified by the model, indicating its effectiveness in detecting positive instances. A higher recall value signifies that the model is able to correctly identify the majority of positive samples. The formula for calculating Recall is given in Equation (11).

$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \tag{11}$$

False Negative (FN) represents the number of actual positive samples that the model misses, that is, positive samples incorrectly predicted as negative.

The F1-Score, calculated as the harmonic mean of Precision and Recall, provides a balanced metric for assessing both of these measures. It assesses the model’s ability to balance precision and recall. The formula for calculating the F1-Score is shown in Equation (12).

$$\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{12}$$

The Confusion Matrix compares the model’s predictions with the actual labels, classifying the results into four categories: True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). The confusion matrix, typically shown as an n × n matrix, offers a visual representation of how the predicted labels align with the true classes. It is a useful tool for assessing the performance of classification models, as it highlights both accurate and erroneous predictions. This detailed view helps identify areas of bias in the model’s predictions and guides improvements in its strategy.
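Equations (10) to (12) can be sketched in a few lines of Python. The counts in the first example are hypothetical and chosen only for illustration; the second example plugs in the precision (0.794) and recall (0.628) reported for YOLOv5-NWD in Section 4.1.1 to check that they reproduce the reported F1-Score.

```python
def precision(tp, fp):
    """Equation (10): share of predicted positives that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Equation (11): share of actual positives that are found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Equation (12): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts: 8 correct detections, 2 false alarms, 4 missed targets.
p = precision(tp=8, fp=2)
r = recall(tp=8, fn=4)
print(p, r, f1_score(p, r))

# Values reported for YOLOv5-NWD in Section 4.1.1.
print(round(f1_score(0.794, 0.628), 3))  # 0.701
```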

3.4.2. Loss Function

The loss function is essential in deep learning as it quantifies the discrepancy between the model’s predictions and the actual labels. By minimizing the loss, the model adjusts its parameters to enhance prediction accuracy. Therefore, the loss function (Loss) is commonly used as a performance metric in YOLOv5. In object detection tasks, the loss function typically comprises three components, each weighted to balance its contribution to the overall loss.

Localization Loss quantifies the difference between the predicted and ground-truth bounding boxes. Common localization loss functions include IoU loss and its improved variants, such as GIoU, DIoU, and CIoU Loss. These loss functions calculate the discrepancy based on the IoU between the predicted and ground-truth boxes, offering a more accurate reflection of overlap and being more sensitive to the box’s position and size prediction.

Confidence Loss measures the model’s confidence in the presence of an object within each predicted bounding box. It ensures that the model not only identifies the bounding boxes but also accurately assesses whether an object is present within the detected box.

Classification Loss assesses how accurately the model predicts the object class within each bounding box. It is crucial in multi-class object detection tasks, ensuring the model correctly classifies the detected objects. Since this study focuses on a single class, the classification loss is zero and thus excluded from the analysis.

3.4.3. Inference Speed

In object detection tasks, FPS (Frames Per Second) is a key performance metric that reflects the model’s image processing speed, indicating how many frames it can process per second. A higher FPS signifies faster model performance. In metal corrosion detection, FPS directly impacts the system’s real-time capabilities and overall performance. A higher FPS allows for quicker processing and analysis of corrosion images, enabling timely feedback and faster responses—critical for real-time monitoring and automated detection. As FPS requirements vary across different applications, it is important to select an appropriate FPS range that aligns with the task’s complexity and real-time needs when designing and evaluating a corrosion detection system.

FPS is calculated by measuring the time it takes to process a single frame. Specifically, FPS is determined by dividing a fixed time interval (usually 1 s) by the time required to process one frame. The formula for this calculation is shown in Equation (13).

$$\text{FPS} = 1000 \times \frac{1}{\text{Preprocess} + \text{Inference} + \text{NMS}} \tag{13}$$

where, Preprocess denotes the time required for the image preprocessing phase, in ms. Inference represents the time taken for the inference phase, in ms, and NMS corresponds to the time spent during the NMS phase, in ms.

In YOLO models, FPS measurement typically accounts for the time taken by various steps, including image scaling, padding, forward propagation, and NMS. Optimizing these processes can boost the model’s FPS, thus improving its effectiveness in real-time applications.
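As a small worked example of Equation (13), the sketch below converts per-frame stage timings in milliseconds into FPS. The function name `fps` is hypothetical. The sample timings (0.4 ms, 12.0 ms, 2.0 ms) are the values quoted in Section 4.2 for the 3:7 IoU ratio.

```python
def fps(preprocess_ms, inference_ms, nms_ms):
    """Equation (13): frames per second from per-frame stage times in ms."""
    return 1000.0 / (preprocess_ms + inference_ms + nms_ms)


print(round(fps(0.4, 12.0, 2.0), 1))  # 69.4
```

Because inference time dominates the denominator, a change of 1 ms in inference shifts FPS far more than a change of 0.1 ms in pre-processing.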

4. Results

In this study, a metal corrosion image dataset is used as a case example to compare and analyze five YOLOv5 models enhanced with IoU-based loss functions: YOLOv5-NWD, YOLOv5-Shape-IoU, YOLOv5-Wise-IoU, YOLOv5-Focal-EIoU, and YOLOv5-SIoU. The traditional YOLOv5 model serves as the control group, and performance is evaluated using precision, recall, loss function, and FPS as key metrics.

4.1. Comparison of Model Performance

4.1.1. Model Performance Comparison Using Traditional Metrics

Figure 4 shows the precision progression of each model across epochs. The overall trend is similar for all models, with precision stabilizing around 25 epochs. YOLOv5-NWD achieved the highest precision at 0.794, marking an improvement of approximately 7.2% over the traditional YOLOv5 model. YOLOv5-Shape-IoU followed with a precision of 0.774, reflecting a 4.5% improvement. YOLOv5-Focal-EIoU, YOLOv5-SIoU, and YOLOv5-Wise-IoU achieved precisions of 0.758, 0.756, and 0.743, with improvements of 2.3%, 2.0%, and 0.2%, respectively. YOLOv5-NWD showed the most significant improvement, while YOLOv5-Shape-IoU came second. YOLOv5-Wise-IoU showed the smallest gain, with a marginal increase in precision.

As shown in Figure 5, all models demonstrate similar performance in terms of recall, indicating comparable capabilities in identifying actual corrosion points. YOLOv5-Shape-IoU and YOLOv5-Wise-IoU have recall rates of 0.645 and 0.646, respectively, slightly outperforming the baseline model’s recall rate of 0.639, suggesting a modest improvement in detecting corrosion instances. In contrast, YOLOv5-NWD has a recall rate of 0.628, which is 1.7% lower than the baseline. YOLOv5-Focal-EIoU and YOLOv5-SIoU have recall rates of 0.638 and 0.634, respectively, both showing minimal reductions compared to the baseline.
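The percentage differences quoted here appear to be relative changes with respect to the baseline value; this is an inference from the numbers, not a definition stated in the text. A minimal sketch using the recall values above (the helper name `relative_change` is hypothetical):

```python
def relative_change(value, baseline):
    """Relative change of `value` with respect to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


baseline_recall = 0.639
recalls = {
    "YOLOv5-NWD": 0.628,
    "YOLOv5-Shape-IoU": 0.645,
    "YOLOv5-Wise-IoU": 0.646,
    "YOLOv5-Focal-EIoU": 0.638,
    "YOLOv5-SIoU": 0.634,
}
for name, value in recalls.items():
    print(f"{name}: {relative_change(value, baseline_recall):+.1f}%")
```

For YOLOv5-NWD this gives about -1.7%, matching the reduction stated above.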

In terms of the F1-Score, YOLOv5-Shape-IoU and YOLOv5-NWD perform similarly, with scores of 0.704 and 0.701, reflecting improvements of 2.6% and 2.2%, respectively, compared to the baseline model. In contrast, YOLOv5-Focal-EIoU, YOLOv5-Wise-IoU, and YOLOv5-SIoU exhibit close F1-Scores of 0.693, 0.691, and 0.690, respectively, showing more modest gains of 1.0%, 0.7%, and 0.6% over the baseline.

Figure 6 presents the confusion matrix results for each model, revealing minimal differences in corrosion detection performance. The baseline YOLOv5 model achieved a corrosion detection accuracy of 0.67 with a background false positive rate of 0.33. Both YOLOv5-NWD and YOLOv5-Shape-IoU models showed a detection accuracy of 0.69 and a background false positive rate of 0.31, reflecting a 5.4% improvement over the baseline. The YOLOv5-Wise-IoU model had an accuracy of 0.68 and a false positive rate of 0.32, resulting in a modest 1.5% improvement. In contrast, YOLOv5-Focal-EIoU and YOLOv5-SIoU both had a corrosion detection accuracy of 0.66 and a background false positive rate of 0.34, showing a 1.5% decline compared to the baseline.

4.1.2. Model Performance Comparison Based on Loss Functions

In the evaluation of loss functions, box loss and object loss are the primary metrics used to assess the performance of models in small target metal corrosion detection. Figure 7 illustrates the trend of box loss for each model throughout training. All models demonstrate a similar pattern of box loss reduction over the epochs, with a rapid initial drop followed by gradual stabilization, particularly after 150 epochs, where the losses plateau. By the 200th epoch, YOLOv5-Shape-IoU, YOLOv5-Wise-IoU, and YOLOv5-SIoU converged around a box loss of 0.027, while YOLOv5 and YOLOv5-Focal-EIoU settled around 0.023. YOLOv5-NWD recorded the lowest box loss, converging at approximately 0.014. Thus, in terms of loss convergence, YOLOv5-NWD demonstrated the best performance. However, in terms of loss fluctuation, YOLOv5-Wise-IoU showed the most notable instability, with significant spikes around the 12th and 125th epochs, indicating inconsistency in performance on certain training samples.

Figure 8 illustrates the object loss trends for each model during training. As a general pattern, all models exhibit a decrease in object loss as epochs progress, gradually stabilizing, which indicates that the models are continuously learning and refining their prediction accuracy. Although the initial object loss values vary among the models, after about 25 epochs, all improved models demonstrate similar patterns in terms of trends, fluctuations, and convergence behavior. By the 200th epoch, the values have largely stabilized around 0.05, indicating that the models have learned to accurately identify corrosion targets and their locations, with comparable accuracy levels.

Looking at the trends in object loss, YOLOv5-NWD shows a consistent decrease in object loss as training progresses, with faster convergence and more stable improvement. However, models like YOLOv5-Shape-IoU, YOLOv5-Wise-IoU, YOLOv5-SIoU, and YOLOv5-Focal-EIoU experience greater fluctuations during the early stages, with a pattern of increasing loss before it declines, reflecting higher uncertainty and slower convergence. The baseline YOLOv5 model begins with a significantly lower initial object loss than the other models, leading to a much lower final convergence value.

4.1.3. Model Performance Comparison Based on FPS

Performance comparison based on the FPS metric shows that all improved models exhibit a decrease in FPS compared to the baseline model, as illustrated in Figure 9. Notably, YOLOv5-Wise-IoU experienced the largest reduction at approximately 22.3%. This is followed by YOLOv5-Focal-EIoU, YOLOv5-Shape-IoU, YOLOv5-NWD, and YOLOv5-SIoU, with reductions of 17.2%, 13.1%, 11.4%, and 8.5%, respectively. This indicates that while the models show enhancements in accuracy and F1 score, these improvements come at the cost of processing speed.

4.1.4. Comprehensive Comparison of Model Performance

Considering the evaluation metrics, a comprehensive comparison of model performance was conducted using a radar chart, illustrated in Figure 10. YOLOv5-NWD stands out with balanced results across various metrics: it has the highest precision, shares the best confusion matrix result with YOLOv5-Shape-IoU, and its F1-Score of 0.701 is within 0.003 of the highest value (0.704, YOLOv5-Shape-IoU); only its recall falls below that of the other improved models. This indicates that YOLOv5-NWD strikes a strong balance between precision and recall, making it the best performer for the small target corrosion image dataset in this study. Following closely is YOLOv5-Shape-IoU, which presents balanced results but a lower precision than YOLOv5-NWD. YOLOv5-Wise-IoU ranks third, showing less consistency in its metrics, with significant improvements observed only in recall and confusion matrix outcomes compared to the baseline model. Both YOLOv5-Focal-EIoU and YOLOv5-SIoU demonstrate similar enhancement effects; although their metrics are relatively balanced, the overall improvements are limited, providing minimal advantages over the baseline model.

In terms of overall metric comparison, YOLOv5-NWD demonstrates the most significant improvement. While its processing speed decreased by 11.4% compared to the baseline model, most other metrics saw varying levels of enhancement, with precision showing the largest gain of around 7.2%. As a result, for the small target metal corrosion image dataset used in this study, the YOLOv5 model enhanced with the NWD loss function achieves the best performance.

4.2. YOLOv5-NWD Sensitivity to IoU Proportion

In the previous comparison, the YOLOv5-NWD model utilized a 4:6 weight ratio between IoU and NWD. To further refine the model and determine the most effective weight distribution, this study investigated various weight ratios for IoU and NWD, with the goal of improving the detection of corrosion features in the dataset. Using the performance results at the 200th epoch as the baseline, the influence of various IoU and NWD weight ratios on model performance was analyzed. Figure 11 presents the evaluation results of YOLOv5-NWD across different weight ratios.
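To make the weight ratio concrete, the sketch below blends an IoU loss and an NWD loss with a single weight, so that a 4:6 ratio corresponds to `iou_weight=0.4`. It rests on several assumptions: boxes are `(cx, cy, w, h)` tuples modelled as 2D Gaussians for the NWD term, plain IoU stands in for the CIoU term used in YOLOv5, and the normalizing constant `c=12.8` is a dataset-dependent value borrowed from common NWD implementations rather than from this study. The function names are hypothetical.

```python
import math


def iou(pred, target):
    """Plain IoU of two (cx, cy, w, h) boxes."""
    inter_w = max(0.0, min(pred[0] + pred[2] / 2, target[0] + target[2] / 2)
                  - max(pred[0] - pred[2] / 2, target[0] - target[2] / 2))
    inter_h = max(0.0, min(pred[1] + pred[3] / 2, target[1] + target[3] / 2)
                  - max(pred[1] - pred[3] / 2, target[1] - target[3] / 2))
    inter = inter_w * inter_h
    return inter / (pred[2] * pred[3] + target[2] * target[3] - inter)


def nwd(pred, target, c=12.8):
    """Normalized Wasserstein distance similarity in (0, 1]; c is an assumed constant."""
    w2_sq = ((pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2
             + ((pred[2] - target[2]) / 2) ** 2 + ((pred[3] - target[3]) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)


def blended_box_loss(pred, target, iou_weight=0.4, c=12.8):
    """Weighted sum of IoU loss and NWD loss; iou_weight=0.4 models a 4:6 ratio."""
    return (iou_weight * (1.0 - iou(pred, target))
            + (1.0 - iou_weight) * (1.0 - nwd(pred, target, c)))


small_target = (0.0, 0.0, 2.0, 2.0)
for offset in (0.0, 10.0, 20.0):
    shifted = (offset, 0.0, 2.0, 2.0)
    print(offset, round(blended_box_loss(shifted, small_target), 3))
```

For the two non-overlapping boxes the IoU term is saturated at its maximum, yet the blended loss still grows with distance, which illustrates why the NWD term can help with very small targets.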

The performance of the YOLOv5-NWD model depends on the weight ratios assigned to IoU and NWD. Precision reaches its highest value of 0.794 at an IoU ratio of 4:6 but declines when the IoU ratio is adjusted either higher or lower, with the most significant drop occurring when the IoU ratio decreases. For recall, the peak value of 0.628 is observed with an IoU ratio of 6:4, while the lowest value of 0.600 is seen at a ratio of 4:6. Regarding the F1 score, the results remain largely consistent. The highest F1 score of 0.70 is recorded for ratios of 6:4 and 7:3, while the lowest of 0.68 is found at ratios of 3:7 and 4:6. The difference of only 0.02 between these values indicates that variations in IoU and NWD weights have minimal impact on the F1 score.

Table 2 shows the pre-process time, inference time, and non-maximum suppression (NMS) time for the YOLOv5-NWD model under different IoU ratios. The pre-process time remains nearly constant across all ratios, with a slight increase from 0.4 ms to 0.5 ms at the 7:3 ratio. This stability suggests that adjustments to the IoU ratio do not significantly impact the pre-processing stage, allowing for a consistent initial step prior to model inference.

In contrast, inference time and NMS time show slight variations based on the ratio. Inference time ranges from 12.0 ms at the 3:7 ratio to 13.5 ms at the 6:4 ratio, indicating that higher IoU ratios introduce a minor increase in inference time. This may be due to the model requiring more focus on precise bounding box calculations when there is a greater emphasis on IoU. Similarly, NMS time increases from 2.0 ms at the 3:7 ratio to 2.3 ms at the 6:4 and 7:3 ratios. This trend suggests that a stronger focus on IoU could require more intensive processing during the NMS stage, possibly due to an increased number of overlaps in bounding box predictions that need to be filtered.

The FPS trend under different IoU ratios is somewhat unpredictable. The highest FPS is observed at a 3:7 ratio, while the lowest appears at a 6:4 ratio. Generally, FPS tends to decrease as the IoU ratio increases, though this is not a strict pattern. The confusion matrix accuracy for corrosion regions remains relatively stable across ratios, with a maximum probability of 0.71 at a 6:4 ratio and a minimum of 0.69 at 3:7, 4:6, and 5:5 ratios. This slight 0.02 difference indicates that adjusting the IoU and NWD weight ratios has minimal impact on how reliably the model detects corrosion areas. If precision is prioritized, a 4:6 ratio is the best choice for YOLOv5-NWD. However, for a more comprehensive performance that includes recall, F1 score, and confusion matrix metrics, a 6:4 ratio proves optimal. Meanwhile, the 3:7 ratio is the most effective for FPS and inference time, making it suitable for applications requiring high-speed processing. All IoU configurations achieve an FPS above 60 Hz, indicating that YOLOv5-NWD meets real-time processing requirements across different ratio settings, although applications needing faster processing may benefit from the 3:7 or 5:5 ratios.

4.3. YOLOv5-NWD Sensitivity to Different Dataset Sizes

The aforementioned research discovered that at an IoU to NWD ratio of 5:5, the YOLOv5-NWD model strikes an optimal balance between precision and recall and excels in both recall and F1 score metrics, which makes it ideal for environments with high demands for comprehensive performance. Therefore, to investigate how the performance of the YOLOv5-NWD model is affected by the size of the training dataset under this IoU ratio, this paper further examines the comprehensive performance of the YOLOv5-NWD model when trained on datasets consisting of 300, 600, 900, and 1266 images. Figure 12 displays the detection accuracy of the YOLOv5-NWD model as epochs progress under various training dataset sizes. It is evident that the trend of model accuracy changing with epochs is quite similar across different training dataset sizes, and the final accuracies are also closely matched. Furthermore, the larger the training dataset size, the sooner the model tends to stabilize and the higher the accuracy, suggesting that the overall model accuracy generally increases with the enlargement of the training dataset size.

Figure 13 offers a statistical analysis of the precision, recall, and F1 scores of the YOLOv5-NWD model across various image dataset sizes. It is clear that with an image dataset of 1266 images, the model attains the highest F1 score, suggesting that the model’s precision and recall are well-balanced, leading to peak performance. Although the F1 score of the model trained on a 900-image dataset is marginally lower than that of the 600-image dataset, the difference of 0.002 is negligible. In general, as the size of the image dataset grows, the model’s F1 score also rises, and the balance between precision and recall improves, indicating better model performance.

Table 3 compiles the performance results of the YOLOv5-NWD model in terms of processing speed after training with datasets of different sizes. With a dataset of 300 images, the FPS is 37.0 Hz; with 600 images, it increases to 41.8 Hz, an increase of 4.8 Hz; with 900 images, it reaches 55.0 Hz, an increase of 13.2 Hz; and with 1266 images, it further increases to 66.2 Hz, an additional increase of 11.2 Hz. This indicates that the model’s FPS significantly improves with the expansion of the training image dataset. Moreover, Table 3 also reveals that as the size of the training image dataset increases, there is a noticeable decrease in the model’s preprocessing time, inference time, and NMS time, demonstrating that the model’s processing capabilities are enhanced with a larger training dataset.

In conclusion, the size of the training image dataset significantly affects the performance of the YOLOv5-NWD model. With an increase in dataset size, there is a noticeable enhancement in the model’s precision, recall, and F1 score. For example, with a dataset of 1266 images, the model strikes a better balance between precision and recall, suggesting that it achieves its best performance in detecting corrosion features at this dataset size.

4.4. YOLOv5-NWD Model Validation

To further validate the detection performance of the YOLOv5-NWD model (with an IoU:NWD ratio of 5:5) for corrosion features under various environmental conditions, especially its adaptability and robustness, two images of corroded steel structures were randomly selected from corrosion images available on the internet. Image processing techniques were applied to simulate five environmental conditions: bright, dark, rainy, windy, and snowy scenarios. Specifically, image contrast was adjusted to simulate varying lighting conditions, salt-and-pepper noise was added to mimic sandstorms and haze, Gaussian blur was applied to represent heavy rainfall, and a snow effect was added to simulate heavy snowfall.
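The kinds of perturbation described above can be sketched in plain Python on a small grayscale image stored as a list of rows. This is only an illustration under stated assumptions: the study does not specify its tools or parameters, a 3x3 mean filter stands in for Gaussian blur, the snow effect is omitted, and all function names and parameter values are hypothetical.

```python
import random


def adjust_contrast(img, factor):
    """Scale each pixel's deviation from mid-grey (128); factor > 1 raises contrast."""
    return [[min(255, max(0, round(128 + factor * (p - 128)))) for p in row] for row in img]


def salt_and_pepper(img, amount, rng):
    """Set roughly `amount` of the pixels to pure black (pepper) or white (salt)."""
    out = [row[:] for row in img]
    for row in out:
        for x in range(len(row)):
            r = rng.random()
            if r < amount / 2:
                row[x] = 0
            elif r < amount:
                row[x] = 255
    return out


def box_blur(img):
    """3x3 mean filter with edge clamping, a crude stand-in for Gaussian blur."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        new_row = []
        for x in range(w):
            window = [img[min(h - 1, max(0, y + dy))][min(w - 1, max(0, x + dx))]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            new_row.append(round(sum(window) / 9))
        out.append(new_row)
    return out


rng = random.Random(0)  # fixed seed so the example is repeatable
image = [[rng.randrange(256) for _ in range(8)] for _ in range(8)]
variants = {
    "low_contrast": adjust_contrast(image, 0.5),
    "sandstorm": salt_and_pepper(image, 0.1, rng),
    "rain": box_blur(image),
}
for name, variant in variants.items():
    print(name, variant[0])
```

In practice each variant would be passed to the trained detector and the resulting detections compared with those on the unmodified image.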

Using these processed images, the YOLOv5-NWD model was employed to identify corrosion features under different environmental conditions, assessing its recognition performance in adverse weather scenarios. Figure 14 illustrates the model’s detection results for corrosion features in the two images. The results indicate that, except in poorly lit environments, the YOLOv5-NWD model effectively identifies corrosion features across various conditions, with confidence scores for accurately identified features generally being high, reaching up to 0.97. However, when the corrosion color is similar to the background color (such as the coating color), the model’s accuracy declines, particularly in dark environments where identification accuracy is notably lower.

Therefore, while the YOLOv5-NWD model demonstrates good recognition of corrosion features under different lighting and weather conditions, its accuracy significantly drops in dimly lit settings. This study suggests that using a corrosion feature image dataset obtained through data augmentation techniques is both reasonable and feasible for training models to enhance generalization capabilities. After training on these datasets, the YOLOv5-NWD model shows good adaptability to target detection tasks in various environments and exhibits satisfactory performance in detecting corrosion features in real-world scenarios. However, the validation experiments also reveal that there is room for improvement in the model’s accuracy for detecting corrosion features in extremely harsh environments. Future research will focus on optimizing the model structure and enriching the dataset of corrosion images from real-world environments.

5. Discussion

5.1. Analysis of Model Performance Comparison Results

This study evaluated the performance of YOLOv5 along with five of its modified versions for detecting small metal corrosion targets. The results demonstrated that YOLOv5-NWD achieved the most significant overall performance improvement among the modified models. Except for FPS and recall, all evaluation metrics showed improvements. Notably, precision reached 0.794, representing a 7.2% improvement over the original YOLOv5 model. The confusion matrix accuracy and F1-Score also increased by 5.4% and 2.2%, respectively, although the recall rate decreased by 1.7%. These results suggest that the NWD loss function is more effective in capturing the distribution characteristics of corrosion images, thereby improving the model’s sensitivity to small targets. Compared to the CIoU loss function, NWD outperforms in minimizing false positives for small target corrosion detection, resulting in improved precision. This difference can be attributed to distinct optimization objectives, robustness against background noise, and the handling of gradient information during training. The CIoU loss function focuses primarily on box overlaps, which is effective for large target detection but less so for small targets, where the overlap is minimal. In contrast, NWD takes into account the overall distribution, making it more suitable for small target detection.

Moreover, NWD’s consideration of the background distribution allows the model to achieve higher detection accuracy in complex environments, thus enhancing the F1 score. Importantly, the NWD loss function provides smoother gradient updates during training, which helps the model effectively adjust its weights and avoid getting stuck in local optima, particularly when handling small targets. However, the FPS of YOLOv5-NWD dropped by 11.4% compared to the standard YOLOv5 model, likely due to the increased computational complexity of the NWD loss function compared to CIoU. Additionally, using NWD may require more iterations for convergence, increasing both training time and inference delay, especially when dealing with small targets that demand more computational resources.

The YOLOv5-Shape-IoU model also demonstrated a notable improvement in overall performance. Compared to the traditional YOLOv5 model, this version showed enhancements across several key metrics: precision increased by 4.5%, recall improved by 0.9%, the F1 score rose by 2.6%, and the probability of correctly identifying positive corrosion samples in the confusion matrix grew by 5.4%. Although the FPS decreased by approximately 12.66%, these gains reflect the improved bounding box regression, classification, and localization capabilities brought by the Shape-IoU loss function. Additionally, the higher F1 score suggests that this model is more effective at balancing precision and recall. The performance gains can be attributed to the intrinsic properties of the Shape-IoU loss function, its adaptability to small targets, and parameter adjustments during the training process. For example, Shape-IoU incorporates shape information when calculating the overlap of target boxes, which makes the model more sensitive when detecting small targets. This focus on boundary and shape information contributes to improvements in both recall and precision. In contrast, CIoU, being a more general loss function, takes into account both the overlap and the spatial relationship between the target boxes, but it may not perform as effectively with small targets. Furthermore, Shape-IoU is likely more successful at minimizing misclassifications, which leads to better accuracy in the confusion matrix. It is important to note that the decrease in FPS suggests an increase in computational complexity when using the Shape-IoU loss function, especially when the dataset contains many small targets. This may impact the inference speed, as a lower FPS implies that real-time applications might require more computational resources or necessitate compromises in processing speed.

In summary, while the YOLOv5-Shape-IoU model has achieved significant performance improvements, trade-offs between speed and accuracy must be carefully considered in practical applications.

Considering all evaluation metrics, it is evident that compared to YOLOv5-NWD and YOLOv5-Shape-IoU, the three other improved models (YOLOv5-Focal-EIoU, YOLOv5-SIoU, and YOLOv5-Wise-IoU) showed only minor improvements across key indicators. Specifically, the YOLOv5-Wise-IoU model demonstrated limited enhancement; compared to the CIoU loss function, the introduction of the Wise-IoU loss function resulted in only a 0.2% increase in precision, a 0.7% improvement in the F1 score, and a 1.5% improvement in the probability of correctly identifying positive corrosion samples in the confusion matrix. However, it also experienced the largest decrease in FPS, dropping by 22.25%. This suggests that while the Wise-IoU loss function improved the model’s performance to some extent, the gains were minimal. This limited improvement might be attributed to the characteristics of the Wise-IoU loss function not fully leveraging its potential for detecting small target corrosion images. The weak features of small corrosion targets make them susceptible to background interference, hindering performance gains. Additionally, shortcomings in parameter settings and data augmentation strategies during model training could have also played a role. The increased computational complexity introduced by Wise-IoU led to a reduction in inference speed, and limitations in the quantity and quality of small target samples in the dataset may have further undermined the potential for model improvement.

In summary, among the IoU-based improved loss functions, the NWD loss function provides the most comprehensive improvement over the traditional YOLOv5 model in detecting small-target corrosion images. The Shape-IoU loss function ranks second, while the other three loss functions show relatively small and comparable overall improvements. The Wise-IoU loss function showed the weakest performance in terms of precision and FPS, and the SIoU loss function performed worst in terms of recall, F1 score, and accuracy.

5.2. Discussion on Sensitivity Analysis of IoU Proportion in YOLOv5-NWD

This study explored how different IoU and NWD weight ratios affect the performance of the YOLOv5-NWD model. Five configurations with IoU:NWD ratios of 3:7, 4:6, 5:5, 6:4, and 7:3 were compared on precision, recall, F1 score, FPS, and confusion matrix results. The findings show that the confusion matrix results and the F1 score are relatively insensitive to the proportion of IoU in the loss function, with maximum variation rates of only 2.90% and 2.94%, respectively. Precision, recall, and FPS are more strongly influenced by the IoU ratio, with maximum variations of 4.61%, 4.67%, and 12.64%, respectively. Specifically, the highest precision was achieved at a ratio of 4:6 and the lowest at 3:7. For recall, the maximum appeared at 6:4 and the minimum at 4:6. FPS showed a different trend, with the highest value at 3:7 and the lowest at 6:4 (Table 2). Although the metrics do not change monotonically with the IoU ratio, the precision, recall, F1 score, and confusion matrix results for YOLOv5-NWD generally trended upward as the IoU proportion increased, with fluctuations at certain ratios. Conversely, FPS tended to decrease as the NWD proportion decreased, again with minor fluctuations. This indicates that a larger IoU share in the loss function tends to improve precision, recall, and F1 score, whereas a larger NWD share tends to improve the processing speed of the model.
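
The weighting discussed here can be sketched as a convex combination of an IoU term and an NWD term. The code below is an illustrative sketch only, not the authors' implementation: the function names are hypothetical, plain IoU stands in for whichever IoU variant the model actually uses, and the normalization constant `c=12.8` is a placeholder assumption that would need to be set for the dataset at hand.

```python
import math


def iou_xywh(a, b):
    """Plain IoU of two axis-aligned boxes given as (cx, cy, w, h)."""
    ax1, ax2 = a[0] - a[2] / 2, a[0] + a[2] / 2
    ay1, ay2 = a[1] - a[3] / 2, a[1] + a[3] / 2
    bx1, bx2 = b[0] - b[2] / 2, b[0] + b[2] / 2
    by1, by2 = b[1] - b[3] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nwd_xywh(a, b, c=12.8):
    """Normalized Gaussian Wasserstein distance between two boxes.

    Each box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy)
    and standard deviations (w/2, h/2). `c` is a dataset-dependent
    normalization constant; 12.8 is only a placeholder assumption.
    """
    w2_squared = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
                  + ((a[2] - b[2]) / 2) ** 2 + ((a[3] - b[3]) / 2) ** 2)
    return math.exp(-math.sqrt(w2_squared) / c)


def mixed_box_loss(pred, target, iou_weight=0.5, c=12.8):
    """Weighted sum of (1 - IoU) and (1 - NWD); iou_weight=0.5 is the 5:5 ratio."""
    nwd_weight = 1.0 - iou_weight
    return (iou_weight * (1.0 - iou_xywh(pred, target))
            + nwd_weight * (1.0 - nwd_xywh(pred, target, c)))


# Two tiny 4x4 boxes whose centers are 5 px apart: they do not overlap.
pred, target = (10.0, 10.0, 4.0, 4.0), (15.0, 10.0, 4.0, 4.0)
print(iou_xywh(pred, target))   # IoU is 0, so the IoU term is saturated
print(nwd_xywh(pred, target))   # NWD still varies smoothly with the offset
print(mixed_box_loss(pred, target, iou_weight=0.5))
```

For small, non-overlapping boxes the IoU term gives no graded signal, while the NWD term still shrinks as the boxes approach each other; this is the usual motivation for mixing the two terms in small-object detection.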

Overall, the evaluation metrics are most balanced at an IoU:NWD ratio of 5:5, indicating that the YOLOv5-NWD model achieves good overall performance at this ratio. In corrosion image recognition tasks, it is therefore important to strike an appropriate balance between IoU and NWD in the loss function. A ratio of 5:5 is generally a reasonable choice, and large deviations from it should be avoided; the neighboring ratios of 4:6 and 6:4 are also relatively good alternatives. In the dataset used in this study, the ratio of 4:6 yielded the highest precision, while the ratio of 6:4 resulted in the best recall, F1 score, and confusion matrix results. In corrosion image detection applications, the IoU proportion in the loss function should therefore be chosen according to the specific application requirements.

Based on the performance analysis and discussion, the strengths and limitations of each improved model suggest the following industrial applications. YOLOv5-NWD is suitable for high-precision small-target corrosion detection tasks, such as in marine infrastructure; although its inference speed is slower, it offers higher stability and accuracy in complex environments. YOLOv5-Shape-IoU is well suited to high-precision tasks such as metal pipeline and bridge inspections, but its lower FPS limits its use in real-time monitoring systems. YOLOv5-Wise-IoU contributes to accuracy improvements but experiences a significant drop in FPS, making it suitable for environments with less stringent real-time requirements, such as storage facilities. YOLOv5-Focal-EIoU and YOLOv5-SIoU are well suited to corrosion detection in complex environments, balancing precision and recall, but still require optimization for fast inference in real-time detection applications. Overall, the choice of model depends on the specific trade-offs between real-time performance, accuracy, and the computational resources required for the application.

5.3. Discussion on Sensitivity Analysis of Different Dataset Sizes in YOLOv5-NWD

This research assesses how different training image dataset sizes impact the performance of the YOLOv5-NWD model. The results show that as the training image dataset expands, the model’s precision stabilizes more quickly. The main reason is that larger training datasets offer more samples, allowing the model to learn a broader range of features and patterns, thus improving its generalization capabilities. This enhancement in generalization helps the model learn and stabilize its precision more rapidly. Moreover, larger training datasets enable more effective hyperparameter optimization and data augmentation strategies, which in turn speed up the stabilization of precision.

Additionally, as the size of the training image dataset increases, the model more easily strikes a balance between precision and recall, leading to better overall performance. This is likely due to the dataset’s diversity, which encompasses more scenarios and conditions, enabling the model to maintain good performance under various circumstances. Large-scale datasets also allow for more nuanced adjustments to post-processing strategies, such as Non-Maximum Suppression (NMS), and assist the model in learning the settings of anchor boxes more effectively. This contributes to improving the model’s precision and recall and achieving a better balance between them, enhancing the model’s overall performance.
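
For readers unfamiliar with the post-processing step mentioned above, the following is a minimal sketch of greedy Non-Maximum Suppression in plain Python. It is a generic textbook version, not the code used in this study; the function names, the example boxes, and the threshold of 0.45 are illustrative assumptions.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.45):
    """Return indices of kept boxes, highest score first.

    A box is kept only if its IoU with every already-kept box
    does not exceed `iou_threshold`.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou_xyxy(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two overlap heavily, the third is far away.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)
```

Raising the threshold keeps more overlapping detections and lowering it suppresses more of them, which is the kind of adjustment that trades precision against recall.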

In terms of processing speed, the model’s FPS, inference time, and NMS time all improved markedly as the dataset size grew: between 300 and 1266 training images, FPS rose from 37.0 to 66.2, inference time fell from 21.5 ms to 12.6 ms, and NMS time fell from 4.9 ms to 2.1 ms (Table 3). This indicates that training with larger datasets not only enhances the model’s recognition performance for corrosion features but also improves computational efficiency while maintaining high precision, allowing the model to adapt to more real-time application scenarios.

Overall, the size of the dataset significantly affects the YOLOv5-NWD model’s recognition performance in corrosion feature detection tasks. Increasing the dataset size can effectively enhance both detection performance and processing speed, thereby improving performance on complex corrosion samples and under background noise. It is therefore recommended to acquire more training images whenever possible and to train the recommended YOLOv5-NWD model on the resulting larger dataset, in order to achieve better detection of small corrosion features and faster processing.

It is important to note that while expanding the dataset can improve model performance, it may also increase computational costs and training time, sometimes leading to hardware resource limitations and latency issues in practical applications. To mitigate these issues, techniques such as data sampling, incremental learning, and model compression can be employed to reduce computational burdens and enhance training efficiency. Additionally, large datasets may lead to overfitting, especially when sample diversity is lacking. To address this, data augmentation, regularization techniques, and cross-domain data transfer can be used to strengthen the model’s generalization capabilities and prevent overfitting to specific samples. Furthermore, the model’s adaptability in different environments remains a challenge, particularly when dealing with corrosion samples from varying climatic or environmental conditions. By simulating environmental conditions and integrating multimodal data, the model’s robustness and adaptability can be effectively enhanced. Lastly, dataset imbalances may cause the model to favor majority class corrosion samples, affecting the detection accuracy of minority class samples. To tackle this challenge, strategies such as oversampling, undersampling, and weighted loss functions can be used to address class imbalances and improve the model’s detection capabilities for rare samples. Therefore, future enhancements to model performance and practicality can be achieved by optimizing training methods, increasing data diversity, improving environmental adaptability, and addressing dataset imbalances.

6. Conclusions

To explore the impact of IoU-based loss functions on the performance of YOLOv5 for small target corrosion image detection, this study proposed modifications to the IoU loss function and developed five improved YOLOv5 models: YOLOv5-NWD, YOLOv5-Shape-IoU, YOLOv5-Wise-IoU, YOLOv5-Focal-EIoU, and YOLOv5-SIoU. Using the metal corrosion dataset from the Zhoushan Seawater Station of China National Materials Corrosion and Protection Science Data Center for training, the performance of the improved models and the traditional YOLOv5 was evaluated based on precision, recall, F1 score, and FPS. The results showed that YOLOv5-NWD achieved the best overall performance among the modified models, with a precision increase of 7.2%, an F1 score increase of 2.2%, and a 5.4% improvement in recognizing positive corrosion samples in the confusion matrix. YOLOv5-Shape-IoU followed closely, with a 4.5% increase in precision, a 2.6% increase in the F1 score, and a 5.4% improvement in positive sample recognition in the confusion matrix. In contrast, the performance gains for YOLOv5-Focal-EIoU, YOLOv5-SIoU, and YOLOv5-Wise-IoU were relatively small. Particularly, YOLOv5-Wise-IoU showed only a 0.2% improvement in precision, a 0.7% increase in the F1 score, and a 1.5% improvement in recognizing positive samples, while FPS decreased by approximately 22.25%. These findings suggest that, among the various IoU-based loss functions, the NWD loss function provides the most significant improvement in the performance of YOLOv5 for small target corrosion image detection, followed by the Shape-IoU loss function. Meanwhile, Focal-EIoU, Wise-IoU, and SIoU offered comparatively limited performance gains.

To further analyze the impact of different IoU and NWD weight ratios on the performance of the YOLOv5-NWD model, this study compared five configurations with IoU:NWD ratios of 3:7, 4:6, 5:5, 6:4, and 7:3, evaluating them on precision, recall, F1 score, FPS, and confusion matrix metrics. The accuracy of positive sample recognition in the confusion matrix and the F1 score were less sensitive to the proportion of IoU in the loss function, whereas precision, recall, and FPS were notably affected by it. Overall, performance across all evaluation metrics was most balanced at a ratio of 5:5, suggesting that the YOLOv5-NWD model exhibits good overall performance at this ratio. In corrosion image recognition tasks, finding an appropriate balance between IoU and NWD in the loss function is crucial. For the dataset used in this study, a 4:6 ratio yielded the highest precision, while a 6:4 ratio produced the best recall, F1 score, and confusion matrix results. In corrosion image detection applications, the IoU ratio in the loss function should therefore be selected according to the specific requirements of the application.

This study also explored the effects of different dataset sizes on the performance of the YOLOv5-NWD (5:5) model. Dataset size has a significant impact on the model’s precision, recall, F1 score, and inference speed. With a larger dataset, the model reaches a stable and higher precision more quickly. At the same time, the F1 score increases and precision and recall are better balanced, improving overall model performance. Enlarging the dataset also effectively enhances the model’s inference speed.

The results demonstrate that the proposed approach accurately identifies corrosion features, improving both detection precision and efficiency, especially in challenging environmental conditions such as lighting variations, shadows, and occlusions. These advancements not only boost YOLOv5’s performance in detecting small corrosion targets but also offer valuable theoretical insights for loss function design, contributing to the progress of related research areas. The findings provide practical support for corrosion monitoring and protection, promoting more scientific and precise management of material safety. However, the inherent irregularity and instability of corrosion features present ongoing challenges, requiring further model enhancements to improve performance in complex settings.

Nevertheless, there are some potential issues in the research. For instance, due to the insufficient quality and quantity of the original corrosion images, the dataset created has certain limitations, which may affect the model’s performance in corrosion image recognition tasks and its generalization ability, thus reducing the applicability and scalability of the research findings. Additionally, the loss functions employed in the study are not comprehensive, as there is insufficient consideration of other IoU loss functions and non-IoU loss functions. Furthermore, the relevant ablation experiments are not thorough enough, which may affect the generalizability of the model’s results. In addition, the study only considers improvements to the performance of the YOLOv5 model from the perspective of IoU loss functions, without taking into account the impact of model depth, model architecture, and other factors on the model’s performance.

Given the current limitations, future research can expand and optimize the study in several areas. First, to improve the representativeness and credibility of the findings, the sample size of corrosion images should be increased further, covering more types of corrosion patterns and different environmental factors. Comparative studies with data from different regions and testing environments would also enhance the universality of the research, particularly its adaptability to different climate conditions and metal materials. Optimizing the model structure and adjusting hyperparameters are likewise key to improving model performance, especially in complex environments. The generalizability of the results can be enhanced from multiple perspectives, for example by exploring the effects of different dataset sizes and model depths on the improved model’s detection of small-object corrosion features, and by incorporating a more comprehensive range of loss function types in ablation experiments. In coastal environments, corrosion features on metal structures are often irregular and unstable, and imaging conditions such as variations in lighting, shadows, and object occlusion further challenge the model’s accuracy. Improving image recognition technology for precise and rapid detection of small-target corrosion features in complex environments is therefore an urgent need. Future research can also incorporate multimodal information, such as thermography and spectral analysis, to enhance the accuracy and robustness of image recognition.

Author Contributions

Conceptualization, Q.Y. and X.G.; methodology, Q.Y., X.G. and L.Z.; software, Y.H. (Yudong Han); validation, Q.Y. and Y.H. (Yudong Han); formal analysis, Q.Y., L.Z. and Y.H. (Yudong Han); investigation, Y.H. (Yudong Han) and Y.H. (Yi Han); data curation, Y.H. (Yudong Han) and Y.H. (Yi Han); Visualization, Y.H. (Yudong Han) and Y.H. (Yi Han); Supervision, Q.Y., X.G. and L.Z.; writing—original draft preparation, Y.H. (Yudong Han); writing—review and editing, X.G., L.Z. and Q.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in this article. Further inquiries can be directed to the corresponding author.

Acknowledgments

All authors would like to express their sincere thanks to the editor and reviewers for their constructive comments.

Conflicts of Interest

The authors declare no conflicts of interest.

Symbols and Notations

Symbol	Description
$L_{CIoU}$	Loss function of CIoU
$IoU$	Intersection over Union of the predicted and actual bounding boxes
$ρ$	Euclidean distance
$B$	Predicted bounding box
$B^{gt}$	Actual bounding box
$c_{1}$	Diagonal distance of the smallest enclosing box that covers both the predicted and actual bounding boxes
$β$	Adjustment factor
$v$	Consistency of the aspect ratio
$Q_{a}$ and $Q_{b}$	Gaussian distribution parameters for the first and second bounding boxes, respectively
$W_{2}^{2} (Q_{a}, Q_{b})$	Squared Wasserstein distance between the two Gaussian distributions
$c_{2}$	Normalization constant that adjusts the sensitivity of the loss function to ensure stability during the training process
$c_{x_{a}}$	x-coordinate of the center of the first bounding box
$c_{y_{a}}$	y-coordinate of the center of the first bounding box
$c_{x_{b}}$	x-coordinate of the center of the second bounding box
$c_{y_{b}}$	y-coordinate of the center of the second bounding box
$w_{a}$ and $w_{b}$	Widths of the first and second bounding boxes, respectively
$h_{a}$ and $h_{b}$	Heights of the first and second bounding boxes, respectively
$L_{NWD}$	Loss function of NWD
$N$	Number of detection frames
$L_{Shape-IoU}$	Loss function of Shape-IoU
$D^{shape}$	Shape difference between the center points of the predicted and ground truth boxes
$Ω^{shape}$	Shape-related regularization term
$L_{Wise-IoU}$	Loss function of Wise-IoU
$x^{gt}$	Horizontal coordinate of the ground truth bounding box
$y^{gt}$	Vertical coordinate of the ground truth bounding box
$W_{g}$	Width of the ground truth bounding box
$H_{g}$	Height of the ground truth bounding box
$L_{loss}$	Combined individual loss functions
$λ_{1}$, $λ_{2}$, $λ_{3}$	Weight factors for each loss component
$L_{BCEL}$	Binary Cross-Entropy Loss
$L_{DFL}$	Distribution Focal Loss
$L_{Focal-EIoU}$	Loss function of Focal-EIoU
$α$	Importance of positive and negative samples in focal loss
$γ$	Focusing strength to emphasize harder examples in training
$p_{t}$	Predicted probability
$EIoU$	Bounding box regression loss term
$L_{SIoU}$	Loss function of SIoU
$θ$	Shape alignment penalty
$Δ$	Center point alignment penalty

Figure 1. Framework of the study.

Figure 2. Dataset annotation flow chart.

Figure 3. Distribution of corrosion area ranges.

Figure 4. Precision varies with epoch for different models.

Figure 5. Performance evaluation of various YOLOv5 models.

Figure 6. Confusion matrix outcomes for various models.

Figure 7. Box loss variation over epochs for different models.

Figure 8. Objective loss variation over epochs for different models.

Figure 9. FPS of different models.

Figure 10. Comparative analysis of model performance.

Figure 11. Model performance comparison for IoU ratios.

Figure 12. Model performance comparison for different dataset sizes.

Figure 13. Comparison of YOLOv5-NWD performance across different dataset sizes.

Figure 14. Corrosion monitoring diagrams in different harsh environments.

Table 1. Fundamental details for various coating types.

Coating	Coating Composition	Coating Thickness	Test Time (Month)
Powder epoxy	Powder Epoxy Coating	700 μm	24/60
Fluorocarbon	Epoxy Zinc-Rich Primer/Sealer/High-Build Epoxy Asphalt/Fluorocarbon Topcoat	430 μm	24/60
Epoxy	Inorganic Zinc-Rich Primer/Epoxy Mastic/Epoxy Topcoat	290 μm	24/60
Chlorinated rubber	Inorganic Zinc-Rich Primer/Sealer/Solvent-Free Ultra-High Build Epoxy Intermediate layer	500 μm	24/60
Wuxi anti-fouling	Epoxy Zinc-Rich Primer/Epoxy Mastic/Wuxi Anti-Fouling Topcoat	450 μm	24
Fusion Bonded Epoxy	Fusion-Bonded Epoxy Coating	1000 μm	96
Epoxy Zinc-Rich Coating System	Epoxy Zinc-Rich Primer/Epoxy Mica Iron Intermediate Coat/High-Build Chlorinated Rubber Topcoat	475 μm	96

Table 2. Processing speed for different ratios of IoU to NWD (in YOLOv5-NWD).

IoU:NWD	FPS (Hz)	Pre-Process (ms)	Inference (ms)	NMS (ms)
3:7	69.5	0.4	12.0	2.0
4:6	63.7	0.4	13.1	2.2
5:5	66.2	0.4	12.6	2.1
6:4	61.7	0.4	13.5	2.3
7:3	62.1	0.5	13.3	2.3
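
As a worked check on the FPS column of Table 2, the sketch below computes a maximum variation rate. The definition used, (max − min) / min, is an assumption rather than a formula stated in the text; it is chosen because it reproduces the 12.64% FPS variation quoted in Section 5.2.

```python
# FPS values copied from Table 2 (IoU:NWD ratio -> FPS).
fps_by_ratio = {"3:7": 69.5, "4:6": 63.7, "5:5": 66.2, "6:4": 61.7, "7:3": 62.1}


def max_variation_rate(values):
    """Spread between the largest and smallest value, relative to the smallest (in %)."""
    values = list(values)
    lowest, highest = min(values), max(values)
    return (highest - lowest) / lowest * 100.0


fastest = max(fps_by_ratio, key=fps_by_ratio.get)
slowest = min(fps_by_ratio, key=fps_by_ratio.get)
rate = max_variation_rate(fps_by_ratio.values())
print(f"fastest: {fastest}, slowest: {slowest}, variation: {rate:.2f}%")
```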

Table 3. Processing speed for different dataset sizes in YOLOv5-NWD with a 5:5 IoU to NWD ratio.

Dataset Sizes (Images)	FPS (Hz)	Pre-Process (ms)	Inference (ms)	NMS (ms)
300	37.0	0.6	21.5	4.9
600	41.8	0.4	19.4	4.1
900	55.0	0.4	14.5	3.3
1266	66.2	0.4	12.6	2.1
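
The FPS values in Table 3 appear consistent with the reciprocal of the summed per-image latency (pre-processing, inference, and NMS). The sketch below checks this under that assumption; the relation is inferred from the table rather than quoted from the text, and small differences are expected because the latencies are rounded to 0.1 ms.

```python
# Rows copied from Table 3: dataset size -> (pre-process, inference, NMS) in ms.
latency_ms = {
    300: (0.6, 21.5, 4.9),
    600: (0.4, 19.4, 4.1),
    900: (0.4, 14.5, 3.3),
    1266: (0.4, 12.6, 2.1),
}
reported_fps = {300: 37.0, 600: 41.8, 900: 55.0, 1266: 66.2}


def fps_from_latency(pre_ms, inference_ms, nms_ms):
    """Frames per second implied by the total per-image latency in milliseconds."""
    return 1000.0 / (pre_ms + inference_ms + nms_ms)


estimated_fps = {size: fps_from_latency(*parts) for size, parts in latency_ms.items()}
for size in latency_ms:
    print(f"{size:>5} images: estimated {estimated_fps[size]:.1f} FPS, "
          f"reported {reported_fps[size]:.1f} FPS")
```

Under this reading, the speed-up with larger datasets comes mainly from the shorter inference and NMS times, since pre-processing is nearly constant across rows.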

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yu, Q.; Han, Y.; Han, Y.; Gao, X.; Zheng, L. Enhancing YOLOv5 Performance for Small-Scale Corrosion Detection in Coastal Environments Using IoU-Based Loss Functions. J. Mar. Sci. Eng. 2024, 12, 2295. https://doi.org/10.3390/jmse12122295

