Article

Enhancing Bolt Object Detection via AIGC-Driven Data Augmentation for Automated Construction Inspection

1 School of Civil Engineering and Architecture, Wuhan Polytechnic University, Wuhan 430023, China
2 Guangdong Provincial Key Laboratory of Intelligent Disaster Prevention and Emergency Technologies for Urban Lifeline Engineering, Dongguan 523808, China
3 School of Transportation and Logistics Engineering, Wuhan University of Technology, Wuhan 432063, China
4 Central & Southern China Municipal Engineering Design and Research Institute Co., Ltd., Wuhan 430010, China
5 School of Civil and Transportation Engineering, Guangdong University of Technology, Guangzhou 510006, China
* Author to whom correspondence should be addressed.
Buildings 2025, 15(5), 819; https://doi.org/10.3390/buildings15050819
Submission received: 6 February 2025 / Revised: 25 February 2025 / Accepted: 3 March 2025 / Published: 5 March 2025

Abstract

In the engineering domain, the detection of damage in high-strength bolts is critical for ensuring the safe and reliable operation of equipment. Traditional manual inspection methods are not only inefficient but also susceptible to human error. This paper proposes an automated bolt damage identification method leveraging AIGC (Artificial Intelligence Generated Content) technology and object detection algorithms. Specifically, we introduce the application of AIGC in image generation, focusing on the Stable Diffusion model. Given that the quality of bolt images generated directly by the Stable Diffusion model is suboptimal, we employ the LoRA fine-tuning technique to enhance the model, thereby generating a high-quality dataset of bolt images. This dataset is then used to train the YOLO (You Only Look Once) object detection algorithm, demonstrating significant improvements in both accuracy and recall for bolt damage recognition. Experimental results show that the LoRA fine-tuned Stable Diffusion model significantly enhances the performance of the YOLO algorithm, providing an efficient and accurate solution for automated bolt damage detection. Future work will concentrate on further optimizing the model to improve its robustness and real-time performance, thereby better meeting the demands of practical industrial applications.

1. Introduction

In recent years, the rapid advancement of artificial intelligence (AI) and computer vision technologies has greatly influenced various industries, including construction and structural safety. Image-based object detection algorithms [1,2,3] have become crucial tools for the automated inspection and maintenance of infrastructure, such as bolts in construction structures [4,5]. These algorithms can automatically identify, classify, and localize target objects within images, thereby significantly enhancing detection efficiency and accuracy. Among these algorithms, the YOLO (You Only Look Once) algorithm has gained widespread application in object detection tasks due to its high efficiency and real-time performance [1]. YOLO is capable of simultaneously performing target localization and classification within a single inference process, which markedly improves detection speed and makes it well-suited for real-time monitoring requirements in industrial settings [2,3,4].
In engineering structures, bolts serve as critical components for connection and fixation. Damage to these components can pose significant safety risks [5,6,7,8]. Traditional methods for detecting bolt damage predominantly rely on manual inspection, which is not only time-consuming and labor-intensive but also susceptible to missed or erroneous detections. In contrast, image-based object detection techniques offer an automated and intelligent approach to detection, significantly enhancing work efficiency and reliability. For instance, in the maintenance of critical infrastructure such as large bridges [9,10] and wind power installations [11,12], automated bolt damage detection systems can rapidly and accurately identify potential issues without disrupting normal operations, thereby ensuring the safe and continuous operation of these facilities.
However, the performance of object detection algorithms is significantly influenced by the quality and quantity of the training data. In practical applications, acquiring a substantial amount of high-quality images of bolt damage often presents significant challenges. Firstly, the occurrence of bolt damage exhibits randomness and diversity, making large-scale collection through conventional methods difficult. Secondly, manual annotation of these images requires considerable time and human resources, thereby limiting the enhancement of the performance of object detection algorithms. Consequently, the effective augmentation of high-quality training data has emerged as an urgent issue that requires immediate attention and resolution.
To address this challenge, AIGC (Artificial Intelligence Generated Content) technology offers a promising solution for data augmentation [13]. By leveraging generative models to produce high-quality image data, AIGC can effectively mitigate the issue of insufficient training samples. Specifically, AIGC employs advanced deep learning models to generate realistic images, thereby enriching the training dataset and enhancing the model’s generalization capability [14,15,16,17]. This approach not only increases the diversity of the training data but also improves the robustness and reliability of object detection algorithms.
Frid-Adar et al. established a deep learning framework based on the deep convolutional GAN (DCGAN) and used it to generate 64×64-pixel CT images of liver lesions [18]. This approach not only enhanced the quality of image generation but also provided additional training samples for medical image analysis. Jia carried out a thorough review of the application of diffusion models in AIGC, emphasizing the importance of these models in image generation [19]. The research indicated that diffusion models are capable of generating high-fidelity images by controlling the process of adding and removing noise and are applicable to complex tasks such as image synthesis, medical imaging, and interactive media creation. Zuo et al. put forward a residual diffusion model for generating super-resolution remote sensing images, further improving the details and clarity of the images [20]. Yang conducted a comprehensive analysis of the application of generative artificial intelligence (GAI) in the fields of education and art, emphasizing the significant capabilities of AIGC technology in multimodal content generation [21]. The study demonstrated that AIGC can simultaneously process and understand language, images, videos, and audio, thereby generating relevant content across multiple domains. This capability not only enhances the diversity of educational materials and artistic creations but also provides a novel approach for expanding image datasets. Shao et al. provided a comprehensive review of the recent state of AIGC studies in medicine [22]. Jin et al. reported on how state-of-the-art AIGC technologies can drive design innovation and identified their promising future applications [23].
As mentioned above, a substantial number of high-quality bolt damage images is needed to evaluate the damage state of structures. With the help of AIGC, a large number of high-strength bolt images can be generated. However, when a general-purpose model is used directly to generate images of high-strength bolts, the quality of the generated images often falls short of expectations. The primary reason for this suboptimal performance is that images of bolt damage exhibit specific geometric structures and texture characteristics that are challenging for general-purpose generative models to fully capture. Stable Diffusion, as an emerging generative model, demonstrates significant capability in producing highly realistic images and is applicable across a wide range of scenarios. Because of these shortcomings in the directly generated images, it is necessary to fine-tune the Stable Diffusion model in order to generate a high-quality dataset of bolt images.
The fine-tuning process typically involves the following steps: Firstly, an appropriate pre-trained model is selected as the foundation. Secondly, the model parameters are fine-tuned based on the specific characteristics of bolt damage images. Thirdly, through extensive experimental validation and iterative optimization, the quality and diversity of the generated images are rigorously ensured. Subsequently, the generated dataset was employed to train and verify the performance of the YOLO algorithm, validating its efficacy in the identification of bolt damage.
Therefore, this paper investigates a bolt target detection algorithm that utilizes AIGC-based data augmentation. First, we describe the construction of a bolt target detection dataset based on AIGC technology, covering AIGC itself, image generation algorithms, Stable Diffusion and its processing procedure, and the production of the bolt target detection dataset. Second, the generated dataset is used for training and performance validation of the YOLO algorithm, confirming its validity in the recognition of bolt damage. Finally, the experimental validation shows that the approach presented in this paper can prominently enhance the performance of the YOLO algorithm, offering an effective solution for the automated detection of bolt damage.

2. The Fabrication of a Bolt Target Detection Dataset Based on AIGC Technology

2.1. A Brief Introduction to AIGC

Artificial Intelligence Generated Content (AIGC for brevity) [14] pertains to a set of approaches and procedures for automatically generating contents such as text, images, audio, and video through artificial intelligence techniques. Along with the advancement of deep learning, AIGC has achieved remarkable progress in the domain of content creation, particularly in natural language processing (NLP) [24] and computer vision (CV) [25].
In the field of natural language processing (NLP), AIGC techniques have demonstrated the capability to generate coherent texts, summaries, and even complete narratives and articles. By training on extensive text corpora, these technologies can identify and replicate language patterns and contextual nuances, enabling the production of novel textual content.
In the field of computer vision, AIGC technology has demonstrated the capability to generate high-quality images and videos. These generated visual assets find applications in diverse domains such as game design, film production, and data science, where they can be employed to create synthetic datasets for training machine learning models.

2.2. Image Generation Algorithm

Image generation algorithms represent a critical research direction in the fields of computer vision and machine learning, with the objective of generating novel and realistic image data. Currently, prominent image generation algorithms include Variational Autoencoders (VAEs) [26], Generative Adversarial Networks (GANs) [27], and diffusion models [28].
A Variational Autoencoder (VAE) is a generative model comprising an encoder and a decoder. The encoder maps input data to the parameters of a latent variable distribution, typically the mean and variance, while the decoder reconstructs the data from these latent variables, generating new samples in the data space. The core objective of a VAE is to learn a compact latent representation of the data and enable the generation of novel, realistic data samples. VAE has found extensive applications across various domains, including image generation, text generation, and audio synthesis. Specifically, in image generation tasks, VAE can effectively learn the latent structure of images and generate new images that are similar to the training data. The training process of VAE involves maximizing the Evidence Lower Bound (ELBO), which is achieved by minimizing both the reconstruction error and the Kullback–Leibler (KL) divergence between the learned latent distribution and a prior distribution. A key component of a VAE is the reparameterization trick, which allows gradients to flow through the stochastic sampling process, enabling efficient training via gradient-based optimization methods.
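To make the ELBO objective and the reparameterization trick concrete, the following minimal PyTorch sketch (with illustrative layer sizes that are not taken from the paper) shows a toy VAE whose loss combines the reconstruction error with the KL divergence to a standard normal prior.

```python
# Minimal VAE sketch: ELBO = reconstruction error + KL divergence,
# with the reparameterization trick keeping sampling differentiable.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)   # outputs mean and log-variance
        self.dec = nn.Linear(z_dim, x_dim)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def elbo_loss(x, x_hat, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction="sum")                  # reconstruction term
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # KL to N(0, I)
    return recon + kl

vae = TinyVAE()
x = torch.rand(8, 784)                     # stand-in for flattened images
x_hat, mu, logvar = vae(x)
loss = elbo_loss(x, x_hat, mu, logvar)
loss.backward()
```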
Generative Adversarial Networks (GANs), introduced by Goodfellow et al. in 2014, constitute a powerful generative model that produces high-quality data samples through an adversarial process between a generator and a discriminator. The generator network is responsible for generating realistic synthetic data, while the discriminator network evaluates the authenticity of the data. During training, the generator progressively learns to generate increasingly indistinguishable fake data, while the discriminator concurrently refines its ability to differentiate between real and generated data. This competitive training paradigm has significantly advanced the field of generative modeling, enabling GANs to exhibit extensive application potential across various domains, including image generation, style transfer, and data augmentation.
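The adversarial game described above can be summarized in a single toy training step; the tiny fully connected generator and discriminator below are illustrative assumptions, not networks used in this study.

```python
# One adversarial training step: the discriminator learns to separate real
# from generated samples, then the generator learns to fool it.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(16, 784)                 # stand-in for real images
z = torch.randn(16, 32)

# Discriminator step: label real samples 1, generated samples 0
fake = G(z).detach()
loss_d = bce(D(real), torch.ones(16, 1)) + bce(D(fake), torch.zeros(16, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to make the discriminator label fakes as real
loss_g = bce(D(G(z)), torch.ones(16, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```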
Diffusion models are generative models based on probabilistic processes, designed to generate new data samples such as images and text. The core principle of these models is to simulate a gradual transformation process where data transitions from its original distribution to a simple noise distribution, followed by a reverse process that reconstructs the original data step by step. In the forward diffusion process, the model incrementally adds noise to the input data until it is fully transformed into noise; in the reverse diffusion process, the model learns to progressively remove the noise to reconstruct high-quality data samples. The training of diffusion models involves two key stages: first, converting the data into noise over multiple time steps, and second, learning the reverse process to reconstruct the data from the noise. During the reverse process, neural networks such as U-Net architectures are typically employed to predict and remove noise at each step. This approach enables diffusion models to maintain the diversity and authenticity of the generated data.
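The two training stages can be sketched as follows: the forward process mixes data with noise according to a schedule, and a network is optimized to predict that noise so it can later be removed step by step. The linear beta schedule and the tiny fully connected "denoiser" below are simplifying assumptions; Stable Diffusion instead uses a U-Net operating in a latent space.

```python
# Forward diffusion plus the noise-prediction training objective.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)          # cumulative alpha_bar_t

def q_sample(x0, t, noise):
    """Forward process: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    a = alphas_bar[t].view(-1, 1)
    return a.sqrt() * x0 + (1 - a).sqrt() * noise

denoiser = nn.Sequential(nn.Linear(64 + 1, 256), nn.SiLU(), nn.Linear(256, 64))

x0 = torch.randn(8, 64)                                  # stand-in for clean data
t = torch.randint(0, T, (8,))
noise = torch.randn_like(x0)
x_t = q_sample(x0, t, noise)

# Training objective: predict the added noise from (x_t, t)
t_emb = (t.float() / T).unsqueeze(1)
pred = denoiser(torch.cat([x_t, t_emb], dim=1))
loss = nn.functional.mse_loss(pred, noise)
loss.backward()
```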

2.3. A Comprehensive Analysis of Stable Diffusion

Stable Diffusion, jointly developed and open-sourced by Stability AI, the CompVis team, and RunwayML, is a robust text-to-image generative model. It is freely available under an open-source license, marking a significant milestone in the evolution of AI-driven image generation. This model has facilitated substantial advancements in various applications, including creative design, content generation, and research.
The Stable Diffusion framework is not a monolithic model but an integrated system comprising multiple components. Specifically, it consists of three key modules: the ClipText text encoder, the U-Net image generator, and the VAE decoder. The schematic diagram of the Stable Diffusion model architecture is illustrated in Figure 1.
(1)
Contrastive Language–Image Pre-training Model
CLIP (Contrastive Language–Image Pre-training) [29], a multimodal pre-training neural network model introduced by OpenAI, undergoes pre-training with a vast amount of image–text paired data to acquire the alignment relationship between images and texts. The core concept of CLIP lies in applying self-supervised learning approaches to map images and texts into the same vector space, enabling semantically related images and texts to be proximate to each other within the vector space.
The CLIP model comprises two primary encoders: a text encoder and an image encoder. The text encoder is based on the Transformer architecture, which converts textual input into vector representations. In contrast, the image encoder utilizes the Vision Transformer (ViT) architecture to transform visual input into corresponding vector representations.
The Transformer model is a deep learning architecture based on the self-attention mechanism, first introduced by Vaswani et al. in 2017 [30]. The key innovation of this model lies in its ability to process sequential data without relying on the recurrent structure typical of traditional Recurrent Neural Networks (RNNs), thereby enhancing its efficiency in handling long-distance dependencies. The core components of the Transformer include the Multi-Head Self-Attention mechanism and Positional Encoding, which enable the model to establish direct dependencies between any two positions within the sequence irrespective of their relative positions.
The Transformer model has achieved groundbreaking advancements in the field of natural language processing (NLP), particularly in tasks such as machine translation, text summarization, and language modeling. One of its most notable characteristics is its scalability; by increasing the depth and width of the model architecture and expanding the scale of training data, the performance of the model can be significantly enhanced. Moreover, the self-attention mechanism inherent in the Transformer model facilitates its application in multimodal tasks. For instance, the text encoder component of the aforementioned CLIP model is built upon the Transformer architecture.
The Vision Transformer (ViT) is an advanced deep learning model that extends the revolutionary Transformer architecture from natural language processing (NLP) to computer vision tasks [31]. The key innovation of the ViT lies in its departure from traditional convolutional neural network (CNN) structures and its adoption of the multi-head self-attention mechanism for image data processing. This shift represents a significant breakthrough in image processing methodologies and introduces a novel perspective to the field of computer vision. The model structure is illustrated in Figure 2.
In the Vision Transformer (ViT), images are first segmented into fixed-size patches, which are then linearly projected into a high-dimensional space to form a sequence. This sequence is subsequently fed into the Transformer model, where each patch is treated as an element in the sequence. Through this approach, the ViT can effectively capture local features within the image and learn the global relationships among these features via the self-attention mechanism. This global perspective provides the ViT with a unique advantage in comprehending the content of images.
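A minimal sketch of this patch-embedding step is given below; the 16-pixel patch size and 768-dimensional embedding are the common ViT-Base defaults and are used here only for illustration.

```python
# Patch embedding: split the image into fixed-size patches, flatten each patch,
# project it linearly, and add positional embeddings before the Transformer.
import torch
import torch.nn as nn

img = torch.rand(1, 3, 224, 224)          # (batch, channels, height, width)
patch, dim = 16, 768
n_patches = (224 // patch) ** 2           # 196 patches

# Cut into non-overlapping 16x16 patches and flatten each one
patches = img.unfold(2, patch, patch).unfold(3, patch, patch)       # (1, 3, 14, 14, 16, 16)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, n_patches, 3 * patch * patch)

proj = nn.Linear(3 * patch * patch, dim)          # linear projection of patches
pos = nn.Parameter(torch.zeros(1, n_patches, dim))
tokens = proj(patches) + pos                      # sequence fed to the Transformer
print(tokens.shape)                               # torch.Size([1, 196, 768])
```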
Another key characteristic of the ViT is its scalability. By adjusting the model’s size and depth, as well as the scale of the training data, the performance of the ViT can be significantly enhanced. This scalability enables the ViT to adapt to a wide range of visual tasks, including but not limited to image classification, object detection, and semantic segmentation. The ViT’s performance in these tasks is comparable to that of state-of-the-art CNN-based models at this time and, in some cases, even surpasses them. The success of the ViT has also spurred interest in multimodal learning tasks, such as those combining images and text. In these tasks, the image encoder of the ViT can be integrated with the text encoder of the Transformer to achieve cross-modal understanding and generation. This cross-modal capability offers new possibilities for developing AI systems capable of comprehending and generating complex visual content. In the CLIP model, the ViT serves as the image encoder.
During the pre-training phase, CLIP employs contrastive learning to construct a similarity matrix, thereby learning the matching relationships between images and text pairs. The training process is illustrated in Figure 3.
A key characteristic of CLIP is its ability to perform predictions on new images or texts without having been trained on specific categories, a capability known as zero-shot learning. In zero-shot learning tasks, CLIP can classify inputs by computing the cosine similarity between text prompts and images, obviating the need for task-specific training data. This approach has yielded remarkable performance across various domains, including image classification, image–text retrieval, and image–text generation. Moreover, the CLIP model has demonstrated strong performance in diverse applications such as OCR, geolocation, and action recognition.
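As an illustration of zero-shot classification by similarity, the following sketch uses the Hugging Face transformers wrapper around a public CLIP checkpoint; the checkpoint name, image file, and bolt-related prompts are assumptions and are not part of the original study.

```python
# Zero-shot classification with CLIP: embed candidate text prompts and an
# image in the shared space and compare them by similarity.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of an intact bolt", "a photo of a corroded bolt", "a photo of a loosened bolt"]
image = Image.open("bolt.jpg")                      # hypothetical input image

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)    # similarity turned into class probabilities
print(dict(zip(labels, probs[0].tolist())))
```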
The training process of the CLIP model consists of two distinct phases. In the first phase, contrastive learning is employed to establish correlations between text descriptions and images. In the second phase, a classification task is introduced to enable the model to associate text prompts with their corresponding images. Overall, CLIP is a robust multimodal model that achieves outstanding performance across various visual and language tasks; however, it also faces certain limitations and challenges.
(2)
U-Net-Based Image Generator
The Stable Diffusion model employs the U-Net architecture [32] as its core image generator. Initially developed for biomedical image segmentation tasks, U-Net is named for its distinctive U-shaped structure. In the context of Stable Diffusion, the U-Net framework is utilized to extract and reconstruct features from training images, enabling the model to generate accurate and diverse image data even with a limited number of training samples. The key advantage of U-Net lies in its ability to capture details at multiple scales, thereby enhancing the precision and quality of generated images. The model structure is illustrated in Figure 4.
During the training of diffusion models, the addition of noise is a critical component, and the Scheduler defines the algorithm for progressively adding and removing noise. The Scheduler governs the denoising steps within the diffusion process, determining both the stochastic nature of these steps and the methodology for reconstructing clear image samples from noisy data. As such, the Scheduler functions as a key sampling mechanism, directly influencing the model’s performance and the quality of the final generated images.
In the Stable Diffusion model, the U-Net image generator and the Scheduler must collaborate to accomplish the image generation task. U-Net leverages its encoder–decoder architecture to process image data, while the Scheduler regulates the progressive addition and removal of noise throughout the generation process.
Specifically, the Scheduler defines a progressive diffusion process from pure noise to the target image. In the initial stages of generation, the Scheduler introduces a substantial amount of noise, rendering the U-Net’s input highly stochastic. Over time, the Scheduler gradually reduces the noise level. At each time step, the U-Net predicts and removes noise, progressively reconstructing clearer image features. This process involves multiple convolutional layers and skip connections within the U-Net, which collectively refine the details of the image.
The Scheduler not only governs the reduction of noise but also introduces stochastic elements during the generation process to enhance the diversity of images generated by the model. This stochasticity is achieved by employing distinct noise patterns at different time steps or dynamically adjusting the noise level. Consequently, U-Net adapts to these variations and learns to reconstruct high-quality images under a wide range of conditions.
Throughout the entire generation process, the close collaboration between U-Net and the Scheduler is essential for the progressive enhancement of image quality. The prediction network component of U-Net is responsible for forecasting the next noise pattern based on the current noisy image and historical information, while the Scheduler dynamically adjusts the noise addition strategy for subsequent time steps according to these predictions. This dynamic interaction enables the Stable Diffusion model to generate highly realistic images while preserving image diversity.
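The interplay between the noise-prediction network and the Scheduler can be sketched as a simple sampling loop; the example below uses a DDPM scheduler from the diffusers library and a placeholder function in place of the real U-Net, so it only illustrates the control flow.

```python
# The Scheduler proposes the timestep sequence and converts each predicted
# noise into the next, less noisy latent; the U-Net supplies the prediction.
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000)
scheduler.set_timesteps(20)                    # 20 sampling steps, as in Section 2.4

latents = torch.randn(1, 4, 64, 64)            # start from pure latent noise

def unet(x, t):
    # Placeholder for the trained noise-prediction network
    return torch.zeros_like(x)

for t in scheduler.timesteps:
    noise_pred = unet(latents, t)                                   # U-Net predicts the noise
    latents = scheduler.step(noise_pred, t, latents).prev_sample    # Scheduler removes it
```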
(3)
VAE Decoding Unit
Within the framework of the Stable Diffusion model, the VAE decoder serves to transform latent variables into observable images. As a critical component of the generative model, the VAE decoder collaborates with the encoder to achieve efficient data compression and high-fidelity reconstruction. The image generation process facilitated by the decoder is illustrated in Figure 5.
The VAE decoder is typically a deep neural network that takes the latent representation obtained from the encoder as input and outputs the pixel values of the image. In the context of Stable Diffusion, the decoder is specifically designed to reverse the diffusion process, progressively reconstructing a clear image from the noisy latent space. This involves precisely predicting the noise reduction at each time step and dynamically adjusting the image generation process accordingly.
The VAE decoder’s strength lies in its ability to generate novel data samples by learning the latent distribution of the data. In the context of image generation, this capability enables it to produce diverse and realistic images, even when the latent space representation is highly abstract. The decoder’s network architecture typically includes multiple transposed convolutional layers (also known as deconvolutional layers), which progressively upscale the feature maps while simultaneously incorporating contextual information from the encoder through skip connections.
During the training process of Stable Diffusion, the decoder’s objective is to maximize the log-likelihood of the generated images while ensuring the diversity of latent representations and maintaining high-quality image generation. To achieve this goal, the decoder undergoes continuous optimization to ensure that the generated images are not only visually satisfactory but also statistically consistent with the real data distribution.

2.4. The Processing Procedure for Stable Diffusion

The working process of Stable Diffusion is shown in Figure 6. The procedure for generating images from text with Stable Diffusion is as follows:
(1) Stable Diffusion generates random tensors in the latent space, which can be controlled by setting the seed of the random number generator. By fixing the seed to a specific value, the same random tensor can be consistently reproduced.
(2) The U-Net network takes the latent noisy image and text prompt as inputs and predicts the noise, also represented as a 4×64×64 tensor in the latent space.
(3) The predicted latent noise is subtracted from the latent image, resulting in an updated latent image.
(4) Steps 2 and 3 are iterated for a predefined number of sampling steps, typically 20 iterations.
(5) Finally, the VAE decoder transforms the refined latent image back into the pixel space.
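These five steps are what a complete text-to-image pipeline carries out internally. A minimal sketch of invoking such a pipeline with the diffusers library is shown below; the checkpoint name and prompt are assumptions, and fixing the random generator seed reproduces the same initial latent tensor, as noted in step (1).

```python
# Text-to-image generation with a pretrained Stable Diffusion pipeline.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(42)   # fixed seed -> same latent tensor
image = pipe(
    "high-strength hexagonal bolt, metallic texture",        # text prompt (step 2)
    num_inference_steps=20,                                   # sampling steps (step 4)
    generator=generator,
).images[0]                                                   # VAE-decoded pixel image (step 5)
image.save("bolt_generated.png")
```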

2.5. Fine-Tuning of Stable Diffusion Based on LoRA

Stable Diffusion, as an advanced latent diffusion model for text-to-image generation, demonstrates the capability to produce high-quality and diverse images. However, in specific application scenarios, the pre-trained model may not fully meet user requirements. Therefore, fine-tuning is essential to enhance the model’s performance and adaptability. The goal of fine-tuning is to incorporate domain-specific knowledge while preserving the model’s generalization ability, enabling it to generate images that better align with specific needs based on text prompts. In this study, Stable Diffusion is required to generate images of bolts. However, the pre-trained model lacks the capability to generate such images accurately. Without fine-tuning, the generated bolt images exhibit poor quality. As shown in Figure 7, when the prompt words “bolt”, “metallic texture”, “high-strength bolt”, and “hexagonal bolt” are input into the Stable Diffusion WebUI, the resulting images demonstrate significant deficiencies. Consequently, the un-fine-tuned Stable Diffusion model is unsuitable for expanding the bolt target detection dataset. Thus, the fine-tuning technique of Stable Diffusion becomes a critical focus of this section.
Stable Diffusion [33] has four common fine-tuning approaches: Textual Inversion [34], Hypernetwork [35], Dreambooth [36], and LoRA [37].
Textual Inversion is a technique that trains the model to associate specific text prompts with corresponding image features. By introducing new embeddings into the text encoder, this method enables the model to learn and generate images based on detailed text descriptions. Specifically, Textual Inversion allows the model to recognize and produce image characteristics that are closely aligned with the newly created embeddings, thereby enhancing its ability to generate contextually relevant images.
A Hypernetwork is a compact neural network integrated with the Stable Diffusion model, designed to modify its style by adjusting the cross-attention mechanisms. Typically, a Hypernetwork has a size of less than 200 MB and cannot function independently. Instead, it must be used in conjunction with a larger model, such as a Checkpoint model, to facilitate image generation.
Dreambooth is a fine-tuning method that trains models using a limited set of example images, enabling the generation of images with specific styles or characteristics. This approach is particularly suitable for scenarios where datasets have higher tolerance for variability and can effectively handle abstract or generalized concepts.
LoRA (low-rank adaptation) is a fine-tuning technique that achieves its objectives by introducing minor modifications to the cross-attention layers of the model. Compared to Hypernetwork models, LoRA models are typically smaller in size while maintaining a favorable balance between training performance and file size. This characteristic endows LoRA models with advantages in storage efficiency and training capacity.
LoRA fine-tuning offers an efficient and resource-saving method for adapting the Stable Diffusion model. It adjusts the model’s generation capabilities through low-rank adaptation layers, without modifying the original model’s parameters. By using LoRA fine-tuning, Stable Diffusion can be adapted to generate specific categories of images (bolt images) in a short time, while avoiding the high computational cost associated with updating a large number of parameters in traditional fine-tuning methods.
As shown in Figure 8, during LoRA fine-tuning, the original weight matrix $W \in \mathbb{R}^{d_{out} \times d_{in}}$ is frozen and does not participate in the update. Only the low-rank matrices $U \in \mathbb{R}^{r \times d_{in}}$ and $D \in \mathbb{R}^{d_{out} \times r}$ have their parameters updated during training, where the rank $r \ll d_{in}, d_{out}$. The introduction of the low-rank matrices $U$ and $D$ allows for adaptation of the original weight matrix: their product $A = D U \in \mathbb{R}^{d_{out} \times d_{in}}$ is added to the frozen matrix $W$, resulting in a new weight matrix $W^{*} = W + A$, which produces the final fine-tuned output $Y$.
When applying LoRA to Stable Diffusion (SD), it is common to fine-tune all the parameters of the multi-head attention modules within the U-Net, specifically adjusting the four key matrices of the multi-head attention module ($W_Q$, $W_K$, $W_V$, $W_O$). This approach enables LoRA to effectively optimize these matrices to meet task-specific requirements while keeping the original weights unchanged. Not only does this method reduce the computational load and storage requirements, but it also speeds up the fine-tuning process, especially when dealing with large-scale pre-trained models.
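A minimal sketch of this low-rank update is given below: the pre-trained linear projection is frozen and only the two small matrices are trained, so the effective weight becomes $W^{*} = W + DU$. The dimensions, rank, and scaling factor are illustrative assumptions rather than the settings used in this study.

```python
# Minimal LoRA wrapper for a linear layer: W stays frozen, only D and U train.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                       # freeze the original weights
        d_out, d_in = base.weight.shape
        self.U = nn.Parameter(torch.randn(r, d_in) * 0.01)   # r x d_in
        self.D = nn.Parameter(torch.zeros(d_out, r))          # d_out x r, initialized to zero
        self.scale = alpha / r

    def forward(self, x):
        # Equivalent to applying W + scale * (D @ U) to x
        return self.base(x) + self.scale * (x @ self.U.t() @ self.D.t())

# Wrapping a stand-in for one attention projection matrix (e.g. W_Q) of the U-Net:
w_q = nn.Linear(320, 320)
w_q_lora = LoRALinear(w_q, r=4)
y = w_q_lora(torch.randn(2, 77, 320))
```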
Among these methods, Dreambooth and LoRA are currently the most prevalent fine-tuning solutions for Stable Diffusion. To determine the most suitable fine-tuning approach, this study conducted experiments by fine-tuning the model with both Dreambooth and LoRA, subsequently generating images of bolts. Figure 9 and Figure 10, respectively, illustrate the generation results from the models fine-tuned using Dreambooth and LoRA.
Figure 9 illustrates the image generation results after fine-tuning the Stable Diffusion model using Dreambooth. The model successfully captures certain metallic and threaded characteristics typical of bolts, such as their general shape and texture. However, it struggles to generate realistic and complete bolt images. Notably, while the bolts appear somewhat recognizable, the details and the consistency in terms of shapes, reflections, and surface textures are not fully captured, leading to incomplete representations. This limitation is evident in the artifacts and lack of fine-grained features that are crucial for realistic bolt representations.
Figure 10 demonstrates the results of fine-tuning the Stable Diffusion model using LoRA. The model shows significant improvements in generating realistic and detailed bolt images, with accurate representation of metallic surfaces, threading, and finer textures. Compared to the Dreambooth fine-tuned model, LoRA achieves more complete and realistic bolt renderings, with better consistency in shape, texture, and reflection details. The effectiveness of LoRA in fine-tuning is evident in the enhanced image quality, which addresses the shortcomings observed in Figure 9. This is achieved while maintaining computational efficiency, which makes LoRA a more suitable approach for this task.
Therefore, considering factors such as image generation quality, computational resources, and training efficiency, this study opted for LoRA to fine-tune the Stable Diffusion model.

2.6. Production of Bolt Target Detection Dataset

In this study, images of bolts were captured using unmanned aerial vehicles (UAVs) and smartphones. A total of 810 images were collected for the bolt target detection dataset, with 750 images allocated to the training set and 60 to the validation set. Among these 810 images, the annotations included 1087 bounding boxes for bolts, 327 for corrosion bolts, and 360 for loosened bolts. After manual screening and cropping, 50 high-quality bolt images were selected for fine-tuning the Stable Diffusion model. These images were carefully chosen to ensure the model received high-quality data, improving the accuracy and robustness of the generated results. Among these 50 images, 15 were bolt images, 20 were corrosion bolt images, and 15 were loosened bolt images.
Through extensive experimentation, it was found that generating a single bolt image per generation process yields the highest image quality. Therefore, this study utilized Stable Diffusion to generate individual bolt images and subsequently combined four separate bolt images into a composite image, with each composite image containing four bolts. In total, 200 images were generated, and each composite image was formed by merging four individual bolt images, ensuring high-quality image generation while including different types of bolts in a single image. These generated images were then manually annotated to mark the positions and damage states of the bolts. The image generation results are illustrated in Figure 11.
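The composition step can be sketched with a few lines of PIL code; the file names and the 512-pixel tile size below are assumptions for illustration.

```python
# Tile four individually generated bolt images into one 2x2 composite image.
from PIL import Image

tiles = [Image.open(f"generated_bolt_{i}.png").resize((512, 512)) for i in range(4)]

composite = Image.new("RGB", (1024, 1024))
for i, tile in enumerate(tiles):
    x, y = (i % 2) * 512, (i // 2) * 512      # place each tile in the 2x2 grid
    composite.paste(tile, (x, y))
composite.save("composite_bolts.png")
```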
After enhancement using Stable Diffusion, the training set expanded to 800 images. This includes the original 750 images in the training set, combined with the newly generated 50 composite bolt images. By annotating the enhanced dataset, a robust bolt target detection dataset was constructed, improving the model’s ability to identify bolts and their various damage conditions.

3. The YOLO Algorithm and Its Performance Validation

The YOLO (You Only Look Once) algorithm is a highly efficient object detection method widely employed in real-time object detection tasks. In this study, we will employ three versions of the YOLO algorithm—YOLOv5, YOLOv8, and YOLOv11—for bolt damage identification. Through comparative experiments, we aim to verify whether the dataset generated by fine-tuning the Stable Diffusion model with LoRA can enhance the performance of the YOLO algorithms. The experimental results will demonstrate the performance improvements of different YOLO versions on the augmented dataset.

3.1. Experimental Configuration and Parameters

In this study, the bolt dataset is augmented using two Stable Diffusion fine-tuning techniques, Dreambooth and LoRA, and three versions of the YOLO algorithm—YOLOv5, YOLOv8, and YOLOv11—are employed for bolt damage identification. Through comparative experiments on the original and augmented datasets, we evaluate whether the dataset generated by the LoRA-fine-tuned Stable Diffusion model improves the performance of the YOLO algorithms, and the results quantify the gains achieved by each YOLO version.

3.2. Evaluation Indexes

To evaluate the model’s performance, this study employs mean average precision (mAP) on the validation set as the evaluation metric. mAP is defined as the average of the average precisions (APs) across multiple classification tasks. AP, in turn, is calculated as the area under the precision–recall (PR) curve. Therefore, before computing mAP, it is necessary to first derive the PR curve using precision and recall values. The formulas for calculating precision and recall are provided below.
$$\mathrm{Precision} = \frac{TP}{TP + FP} = \frac{TP}{\text{all detections}}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} = \frac{TP}{\text{all ground truths}}$$
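From these definitions, AP follows by sweeping the confidence threshold over the ranked detections and integrating the resulting precision-recall curve; the toy scores and labels in the sketch below are illustrative only.

```python
# Compute AP as the area under the precision-recall curve for one class.
import numpy as np

scores = np.array([0.95, 0.90, 0.80, 0.60, 0.40])   # detection confidences
is_tp  = np.array([1,    1,    0,    1,    0])       # 1 = matches a ground-truth bolt
n_gt = 3                                             # total ground-truth boxes

order = np.argsort(-scores)                          # rank detections by confidence
tp_cum = np.cumsum(is_tp[order])
fp_cum = np.cumsum(1 - is_tp[order])

precision = tp_cum / (tp_cum + fp_cum)               # TP / all detections so far
recall = tp_cum / n_gt                               # TP / all ground truths

ap = np.trapz(precision, recall)                     # area under the PR curve
print(f"AP = {ap:.3f}")                              # mAP averages the per-class APs
```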

3.3. Experimental Environment and Hyperparameters

Our experimental hardware environment is listed in Table 1.
To ensure the fairness of training, exactly the same parameter settings were used for all models in the experiments; the specific settings are shown in Table 2.
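As a sketch of how such a run can be launched with the Ultralytics package using the Table 2 settings, see below; the dataset YAML path and the pretrained weights file are assumptions (the paper does not list them), and the YOLOv5 and YOLOv11 runs differ only in the weights passed to the constructor.

```python
# Train a YOLO model with the Table 2 hyperparameters.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")          # pretrained nano weights as starting point
model.train(
    data="bolt_dataset.yaml",       # hypothetical dataset config (train/val paths, 3 classes)
    epochs=100,
    imgsz=640,
    batch=16,
    optimizer="AdamW",
    lr0=0.001,
    cos_lr=True,                    # cosine annealing learning-rate decay
)
metrics = model.val()               # reports per-class AP and mAP on the validation set
```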

3.4. Experimental Results

To evaluate the effectiveness of data augmentation using LoRA fine-tuning of Stable Diffusion, this study divides the dataset into three categories: the original dataset, the dataset augmented by Dreambooth fine-tuned Stable Diffusion, and the dataset augmented by LoRA fine-tuned Stable Diffusion. Performance evaluations are conducted across three YOLO models: YOLOv5, YOLOv8, and YOLOv11. The experimental setup is detailed in Table 3.
As indicated in Table 3, Groups 1–3 utilize YOLOv5, Groups 4–6 employ YOLOv8, and Groups 7–9 adopt YOLOv11 as their respective experimental models. For performance validation across these three distinct YOLO algorithms, the following datasets are used: the original bolt dataset, the DB-SD augmented dataset (generated by fine-tuning Stable Diffusion using Dreambooth), and the LoRA-SD augmented dataset (generated by fine-tuning Stable Diffusion using LoRA). The training process and model mAP results are illustrated in Figure 12.
As shown in Table 4, all three YOLO algorithms demonstrated significantly improved detection performance (mAP) on the dataset augmented using LoRA-fine-tuned Stable Diffusion. Specifically, YOLOv5n increased from 92.4% to 95.3%, YOLOv8n from 91.9% to 96.1%, and YOLOv11n from 91.7% to 97.0%. In contrast, none of the three algorithms achieved performance improvements on the dataset augmented using Dreambooth-fine-tuned Stable Diffusion; notably, the YOLOv5 and YOLOv8 models experienced performance degradation. This can be attributed to the significant discrepancy between the bolt images generated by the Dreambooth-fine-tuned model and the actual bolt images, leading to difficulties in training convergence. Therefore, the Dreambooth fine-tuning approach is not suitable for data augmentation in industrial inspection datasets. The experimental results indicate that image data generated by the LoRA-fine-tuned Stable Diffusion model can substantially enhance the performance of YOLO algorithms. These findings validate the effectiveness of using fine-tuned Stable Diffusion models for dataset augmentation in object detection tasks. In order to show the results better, the confusion matrix for the training process of YOLO can be seen in Figure 13.

4. Conclusions

This study introduces an advanced bolt damage recognition method leveraging AIGC technology and object detection algorithms. Specifically, the Stable Diffusion model is fine-tuned using LoRA to generate a high-quality dataset of bolt images. The effectiveness of this dataset in enhancing the performance of the YOLO algorithm has been rigorously validated. Experimental results demonstrate that the dataset generated through LoRA fine-tuning of the Stable Diffusion model significantly improves both the accuracy and recall rate of the YOLO algorithm, thereby providing a robust and efficient solution for automated bolt damage detection.
Compared to prior research, this study employs the Stable Diffusion model and LoRA fine-tuning technique, which have demonstrated superior performance in both image generation quality and training efficacy. LoRA fine-tuning is particularly adept at adapting to the specific task of generating bolt images, resulting in highly realistic visual outputs that closely resemble actual bolt images. This capability provides a robust dataset for training object detection algorithms. Relative to Dreambooth fine-tuning, LoRA fine-tuning achieves a more optimal balance between image fidelity and training efficiency, leading to significant improvements in the performance of the YOLO algorithm.
Furthermore, this study selected the YOLO series algorithms (YOLOv5, YOLOv8, and YOLOv11) for object detection due to their superior efficiency and real-time performance. These algorithms have been widely adopted in various object detection tasks because of their computational efficiency and speed. Compared with traditional detection methods, the AIGC-based data augmentation approach has achieved significant improvements in performance metrics. Specifically, the mean average precision (mAP) of YOLOv5 increased from 92.4% to 95.3%, that of YOLOv8 increased from 91.9% to 96.1%, and that of YOLOv11 increased from 91.7% to 97.0%. These results demonstrate that integrating AIGC technology with object detection algorithms not only addresses the issue of insufficient training data but also substantially enhances the overall detection performance.
Despite the significant achievements of this study, several limitations remain. The generated images may not fully replace real images in certain complex industrial environments, particularly under challenging lighting conditions or in the presence of substantial background noise. Additionally, this study has only validated the AIGC data augmentation approach on the YOLO series of algorithms. Future research should consider applying this method to other object detection algorithms to further assess its generalizability. Moreover, the dataset used in this study is relatively limited in size. To further optimize model performance, future work should involve larger-scale datasets.
The theoretical contribution of this research lies in introducing a novel approach for bolt damage recognition that integrates AIGC technology with object detection algorithms. This method offers a fresh perspective on addressing the challenge of insufficient data in industrial inspection tasks. By optimizing the Stable Diffusion model through LoRA fine-tuning, this study not only enhances the performance of object detection algorithms but also provides robust support for the application of AIGC technology in industrial settings. In terms of practical application, the automated bolt damage detection method proposed herein holds significant practical value. It effectively reduces the labor intensity and error rate associated with manual inspections, thereby ensuring the safe operation of industrial equipment. Future work will focus on further refining the model and improving detection performance to promote the widespread adoption of this method in the industrial sector.

Author Contributions

Investigation, C.H.; Resources, C.H., Z.L. and W.F.; Data curation, C.H. and W.F.; Writing—original draft, C.H., B.H., Y.Z. and S.Q.; Validation, B.H.; Conceptualization, J.W.; Supervision, C.Z. and J.W.; Funding acquisition, J.W.; Writing—reviewing and editing, C.Z., J.W. and Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

The corresponding author wishes to thank the Open Project Program of Guangdong Provincial Key Laboratory of Intelligent Disaster Prevention and Emergency Technologies for Urban Lifeline Engineering (No. 2022ZB04) for their support.

Data Availability Statement

The data presented in this study are available upon reasonable request.

Conflicts of Interest

Author Wang Feng was employed by the company Central & Southern China Municipal Engineering Design and Research Institute Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Xu, L.; Zhao, Y.; Zhai, Y.; Huang, L.; Ruan, C. Small Object Detection in UAV Images Based on YOLOv8n. Int. J. Comput. Int. Sys. 2024, 17, 223. [Google Scholar] [CrossRef]
  2. Lawal, O.M. Real-Time Cucurbit Fruit Detection in Greenhouse Using Improved YOLO Series Algorithm. Precis. Agric. 2024, 25, 347–359. [Google Scholar] [CrossRef]
  3. Nan, G.; Zhao, Y.; Lin, C.; Ye, Q. General Optimization Methods for YOLO Series Object Detection in Remote Sensing Images. IEEE Signal Process. Lett. 2024, 31, 2860–2864. [Google Scholar] [CrossRef]
  4. Gao, Z.; Li, Y.; Chen, Z.; Asif, M.; Xu, L.; Li, X.; Aaron Gulliver, T. Intelligent Spectrum Sensing of Consumer IoT Based on GAN-GRU-YOLO. IEEE Trans. Consum. Electron. 2024, 70, 6140–6148. [Google Scholar] [CrossRef]
  5. Yang, Y.; Cheng, H.; Du, K.; Liang, B.; Hu, W.; Luo, B.; Zhang, K. Microscale Damage Modeling of Bolt-Hole Contact Interface during the Bolt Installation Process of Composite Structure. Compos. Struct. 2022, 291, 115561. [Google Scholar] [CrossRef]
  6. Champati, A.; Voggu, S.; Lute, V. Detection of Damage in Bolted Steel Structures Using Vibration Signature Analysis. J. Vib. Eng. Technol. 2024, 12, 1399–1412. [Google Scholar] [CrossRef]
  7. Li, X.; Zheng, B.; Chen, Y.; Zou, C. A hybrid methodology for estimating train-induced rigid foundation building vibrations. Constr. Build. Mater. 2025, 460, 139852. [Google Scholar] [CrossRef]
  8. Zou, C.; Li, X.; He, C.; Zhou, S. An efficient method for estimating building dynamic response due to train operations in tunnel considering transmission path from source to receiver. Comput. Struct. 2024, 305, 107555. [Google Scholar] [CrossRef]
  9. Li, Z.; Shao, P.; Zhao, M.; Yan, K.; Liu, G.; Wan, L.; Xu, X.; Li, K. Optimized Deep Learning for Steel Bridge Bolt Corrosion Detection and Classification. J. Constr. Steel Res. 2024, 215, 108570. [Google Scholar] [CrossRef]
  10. Tao, Z.; Zhang, D.; Tu, D.; He, L.; Zou, C. Prediction of train-induced ground-borne vibration transmission considering parametric uncertainties. Probab. Eng. Mech. 2025, 79, 103731. [Google Scholar] [CrossRef]
  11. Huang, H.; Wang, Y.; Pang, Q. Analysis and Prediction of Wind Turbine Bolts Based on GPR Method. J. Mech. Sci. Technol. 2023, 37, 1155–1164. [Google Scholar] [CrossRef]
  12. Tao, T.; Yang, Y.; Yang, T.; Liu, S.; Guo, X.; Wang, H.; Liu, Z.; Chen, W.; Liang, C.; Long, K.; et al. Time-Domain Fatigue Damage Assessment for Wind Turbine Tower Bolts under Yaw Optimization Control at Offshore Wind Farm. Ocean Eng. 2024, 303, 117706. [Google Scholar] [CrossRef]
  13. Chen, Y.; Zhao, Z.; Liu, J.; Tan, S.; Liu, C. Application of Generative AI-Based Data Augmentation Technique in Transformer Winding Deformation Fault Diagnosis. Eng. Fail. Anal. 2024, 159, 108115. [Google Scholar] [CrossRef]
  14. Li, F.; Ge, J.; Wang, X.; Zhao, G.; Yu, X.; Li, X. Privacy-Preserving Vertical Federated Broad Learning System for Artificial Intelligence Generated Image Content. J. Real-Time Image Proc. 2024, 21, 14. [Google Scholar] [CrossRef]
  15. Wang, B.; Yang, F. Lightweight and Privacy-Preserving Hierarchical Federated Learning Mechanism for Artificial Intelligence-Generated Image Content. J. Real-Time Image Proc. 2024, 21, 149. [Google Scholar] [CrossRef]
  16. Zhang, J.; Sun, L.; Jin, C.; Gao, J.; Li, X.; Luo, J.; Pan, Z.; Tang, Y.; Wang, J. Recent Advances in Artificial Intelligence Generated Content. Front. Inform. Technol. Electron. Eng. 2024, 25, 1–5. [Google Scholar] [CrossRef]
  17. Vijendran, M.; Deng, J.; Chen, S.; Ho, E.S.L.; Shum, H.P.H. Artificial Intelligence for Geometry-Based Feature Extraction, Analysis and Synthesis in Artistic Images: A Survey. Artif. Intell. Rev. 2024, 58, 64. [Google Scholar] [CrossRef]
  18. Frid-Adar, M.; Diamant, I.; Klang, E.; Amitai, M.; Goldberger, J.; Greenspan, H. GAN-Based Synthetic Medical Image Augmentation for Increased CNN Performance in Liver Lesion Classification. Neurocomputing 2018, 321, 321–331. [Google Scholar] [CrossRef]
  19. Jia, Y. A Comprehensive Review of Diffusion Models in AI-Generated Content for Image Applications. ACE 2024, 94, 197–202. [Google Scholar] [CrossRef]
  20. Zuo, X.; Tian, Z.; Yin, M.; Dang, L.; Qiao, B.; Liu, Y.; Xie, Y. Remote sensing super-resolution image generation based on residual diffusion model. J. Henan Norm. Univ. 2025, 1–8. [Google Scholar]
  21. Yang, J.; Zhang, H. Development and Challenges of Generative Artificial Intelligence in Education and Art. HSET 2024, 85, 1334–1347. [Google Scholar] [CrossRef]
  22. Shao, L.; Chen, B.; Zhang, Z.; Zhang, Z.; Chen, X. Artificial Intelligence Generated Content (AIGC) in Medicine: A Narrative Review. Math. Biosci. Eng. 2024, 21, 1672–1711. [Google Scholar] [CrossRef] [PubMed]
  23. Jin, J.; Yang, M.; Hu, H.; Guo, X.; Luo, J.; Liu, Y. Empowering Design Innovation Using AI-Generated Content. J. Eng. Des. 2025, 36, 1–18. [Google Scholar] [CrossRef]
  24. Li, B.; Yang, P.; Sun, Y.; Hu, Z.; Yi, M. Advances and Challenges in Artificial Intelligence Text Generation. Front. Inform. Technol. Electron. Eng. 2024, 25, 64–83. [Google Scholar] [CrossRef]
  25. Safiya, K.M.; Pandian, R. A Real-Time Image Captioning Framework Using Computer Vision to Help the Visually Impaired. Multimed. Tools Appl. 2023, 83, 59413–59438. [Google Scholar] [CrossRef]
  26. Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv 2022, arXiv:1312.6114. [Google Scholar]
  27. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  28. Sohl-Dickstein, J.; Weiss, E.A.; Maheswaranathan, N.; Ganguli, S. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. In International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2015; pp. 2256–2265. [Google Scholar]
  29. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning Transferable Visual Models from Natural Language Supervision. In International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2021; pp. 8748–8763. [Google Scholar]
  30. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2023, arXiv:1706.03762. [Google Scholar]
  31. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  32. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; Volume 9351, pp. 234–241. [Google Scholar]
  33. Yuan, Z.; Li, L.; Wang, Z.; Zhang, X. Watermarking for Stable Diffusion Models. IEEE Internet Things J. 2024, 11, 35238–35249. [Google Scholar] [CrossRef]
  34. Tai, Y.; Yang, K.; Peng, T.; Huang, Z.; Zhang, Z. Defect Image Sample Generation with Diffusion Prior for Steel Surface Defect Recognition. IEEE Trans. Autom. Sci. Eng. 2024, 1–13. [Google Scholar] [CrossRef]
  35. Jiang, X.; Wang, Z.; Liu, W. Information Dissemination in Dynamic Hypernetwork. Phys. A 2019, 532, 121578. [Google Scholar] [CrossRef]
  36. Wang, Z.; Wang, X.; Xie, L.; Qi, Z.; Shan, Y.; Wang, W.; Luo, P. StyleAdapter: A Unified Stylized Image Generation Model. Int. J. Comput. Vis. 2024. [Google Scholar] [CrossRef]
  37. Zhang, M.; Yang, J.; Xian, Y.; Li, W.; Gu, J.; Meng, W.; Zhang, J.; Zhang, X. AG-SDM: Aquascape Generation Based on Stable Diffusion Model with Low-rank Adaptation. Comput. Anim. Virtual Worlds 2024, 35, e2252. [Google Scholar] [CrossRef]
Figure 1. Schematic diagram of the Stable Diffusion model architecture.
Figure 2. The structure of the ViT model.
Figure 3. Flowchart of the training process of the CLIP model.
Figure 4. Structural diagram of the U-Net model.
Figure 5. Workflow chart of the VAE decoder.
Figure 6. Workflow chart of Stable Diffusion.
Figure 7. Text prompt in Stable Diffusion WebUI (Version 1.10.0).
Figure 8. LoRA fine-tuning model process.
Figure 9. The fine-tuning result of Dreambooth.
Figure 10. The effect diagram of LoRA fine-tuning.
Figure 11. Images of data augmentation.
Figure 12. Diagram of the training process of YOLO (the precision–recall curves for the 9 groups).
Figure 13. The confusion matrix for the training process of YOLO.
Table 1. Experimental environment parameters.

Name | Parameter Information
CPU | Intel Core i7-13650HX (Intel Corporation, Santa Clara, CA, USA)
GPU | NVIDIA GeForce RTX 4060 8 GB (NVIDIA Corporation, Santa Clara, CA, USA)
Memory | 32 GB
Operating system | Windows 10 (Microsoft Corporation, Redmond, WA, USA)
Development language | Python
Table 2. YOLO training parameters.

Name | Parameter Information
Epochs | 100
Optimizer | AdamW
Initial learning rate | 0.001
Batch size | 16
Image size | 640 × 640
Learning rate decay strategy | Cosine Annealing
Table 3. Experimental division.

Experimental Categorization | Dataset | Model Selection
Group 1 | Original | YOLOv5
Group 2 | DB-SD Augmented | YOLOv5
Group 3 | LoRA-SD Augmented | YOLOv5
Group 4 | Original | YOLOv8
Group 5 | DB-SD Augmented | YOLOv8
Group 6 | LoRA-SD Augmented | YOLOv8
Group 7 | Original | YOLOv11
Group 8 | DB-SD Augmented | YOLOv11
Group 9 | LoRA-SD Augmented | YOLOv11
Table 4. Performance assessment of the YOLO algorithm.

Models | Datasets | Epochs | AP (Bolt) | AP (Corrosion) | AP (Loosened) | mAP
YOLOv5n | Original | 100 | 0.961 | 0.857 | 0.955 | 0.924
YOLOv5n | DB-SD Augmented | 100 | 0.961 | 0.847 | 0.953 | 0.920
YOLOv5n | LoRA-SD Augmented | 100 | 0.952 | 0.981 | 0.927 | 0.953
YOLOv8n | Original | 100 | 0.941 | 0.872 | 0.945 | 0.919
YOLOv8n | DB-SD Augmented | 100 | 0.964 | 0.808 | 0.916 | 0.896
YOLOv8n | LoRA-SD Augmented | 100 | 0.965 | 0.989 | 0.930 | 0.961
YOLOv11n | Original | 100 | 0.946 | 0.868 | 0.937 | 0.917
YOLOv11n | DB-SD Augmented | 100 | 0.951 | 0.869 | 0.933 | 0.918
YOLOv11n | LoRA-SD Augmented | 100 | 0.970 | 0.995 | 0.947 | 0.970
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
