Article

Disassembly of Distribution Transformers Based on Multimodal Data Recognition and Collaborative Processing

1 Materials Branch, State Grid Zhejiang Electric Power Co., Ltd., Hangzhou 310015, China
2 Zhejiang Huadian Equipment Testing and Research Institute Co., Ltd., Hangzhou 310015, China
3 Huzhou Institute of Zhejiang University, Huzhou 313000, China
4 Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027, China
* Authors to whom correspondence should be addressed.
Algorithms 2024, 17(12), 595; https://doi.org/10.3390/a17120595
Submission received: 25 November 2024 / Revised: 20 December 2024 / Accepted: 20 December 2024 / Published: 23 December 2024
Figure 1. General scheme for the transformer disassembly system.
Figure 2. Schematic diagram of the transformer cover disassembly process.
Figure 3. Schematic diagram of the disassembly process of the internal components of the transformer.
Figure 4. Overall flowchart of the transformer copper wire winding stripping method.
Figure 5. Two types of transformer cover: (a) screw-fastened cover; (b) welded cover.
Figure 6. Process of target detection.
Figure 7. Diagram of the target detection effect.
Figure 8. Result of the transformer cover segmentation.
Figure 9. The process of ODIN modeling the transformer.
Figure 10. The process of segmentation and identification of internal components of the transformer.
Figure 11. Constant-tension intelligent unwinding system structure.
Figure 12. Intelligent wire removal system.
Figure 13. Architecture of the RCF network.
Figure 14. Transformer disassembly area in the newly established disassembly factory.
Figure 15. Transformer interior before disassembly.
Figure 16. Transformer interior after disassembly.
Figure 17. Component recognition and disassembly process: (a) transformer cover and internal component recognition; (b) transformer disassembly process and results.

Abstract

As power system equipment gradually ages, the automated disassembly of transformers has become a critical area of research to enhance both efficiency and safety. This paper presents a transformer disassembly system designed for power systems, leveraging multimodal perception and collaborative processing. By integrating 2D images and 3D point cloud data captured by RGB-D cameras, the system enables the precise recognition and efficient disassembly of transformer covers and internal components through multimodal data fusion, deep learning models, and control technologies. The system employs an enhanced YOLOv8 model for positioning and identifying screw-fastened covers while also utilizing the STDC network for segmentation and cutting path planning of welded covers. In addition, the system captures 3D point cloud data of the transformer’s interior using multi-view RGB-D cameras and performs multimodal semantic segmentation and object detection via the ODIN model, facilitating the high-precision identification and cutting of complex components such as windings, studs, and silicon steel sheets. Experimental results show that the system achieves a recognition accuracy of 99% for both cover and internal component disassembly, with a disassembly success rate of 98%, demonstrating its high adaptability and safety in complex industrial environments.

1. Introduction

Transformers play a crucial role in transmission and distribution systems, and their reliable operation is essential for grid stability. However, with aging equipment and increasing maintenance demands, transformer disassembly has become progressively more complex. Traditional manual methods are not only inefficient but also hazardous and environmentally damaging, underscoring the need for safer, automated solutions [1,2,3]. Current technologies still rely heavily on manual labor, requiring skilled operators and presenting safety risks, such as mishandling bolts or fasteners, which can lead to injury. Furthermore, disassembling welded covers requires precise cutting and handling, and errors in identifying key components can damage critical parts [4]. Enhancing disassembly efficiency while ensuring safety remains a key challenge in transformer recycling and maintenance [5,6].
Significant progress has been made in the automation of transformer disassembly, particularly in the context of electric vehicle battery recycling. With the advancement of deep learning techniques in the industry [7,8,9], researchers have developed multimodal image fusion and intelligent control technologies to enhance component identification and operational efficiency [10]. Cognitive robotics, which integrates vision systems with intelligent planning, enables robots to adapt dynamically in real-time, improving system flexibility [11]. Robotic arms, combined with deep learning, are increasingly employed for disassembling industrial transformers, allowing for the precise targeting of bolts and welded parts [12]. Additionally, multimodal fusion techniques, which combine 2D images with 3D point cloud data from RGB-D cameras, significantly enhance target recognition in complex environments [13]. Automated and semi-automated wire stripping machines have also been introduced to improve the recycling of waste wires. For example, pneumatic automatic wire strippers can increase efficiency by 25% compared to manual methods [14]. Researchers have also developed a multi-diameter cable-cutting machine for cables with varying diameters. Experiments have shown that adjusting the pusher extension distance improves both cutting speed and quality, enhancing the efficiency and performance of industrial cable processing [15].
Recent advancements have led to the development of automated transformer disassembly methods that leverage machine learning and image recognition technologies. Researchers have designed robot-based disassembly systems, primarily for recycling waste transformers, which integrate multimodal data fusion to identify and process complex structures such as bolts and welds [16]. The incorporation of IoT and big data technologies has also enabled the remote monitoring and dynamic operation of these systems, facilitating real-time analysis and optimization of the disassembly process [17,18,19]. For electronic waste, intelligent robotic and vision-based solutions use multimodal data to accurately identify hazardous and recyclable materials, efficiently processing them on automated platforms [20]. Moreover, multimodal fusion technology, combining RGB-D cameras with deep learning models, improves disassembly automation by accurately detecting and evaluating critical components like welds and bolts [21]. To address the challenges of recycling small-diameter waste transformer wires—where traditional mechanical knives wear quickly and lack automation, and methods like incineration or chemical stripping lead to environmental pollution—a device utilizing ultrasonic cutting technology has been developed. This approach offers high efficiency and minimal wear, providing an effective solution for stripping waste transformer wires [15].
Despite global advancements in automated transformer disassembly, several challenges remain. In complex industrial environments, ensuring robustness and accuracy is particularly difficult when dealing with worn bolts or deformed weld joints, where current algorithms often struggle with efficiency and precision, especially in real-time applications [22,23]. While multimodal data fusion holds promise for complex scene recognition, improving real-time processing and reducing latency are critical challenges that still need to be addressed. Striking a balance between computational efficiency and accuracy is a key focus of ongoing research. Furthermore, given the diversity of power equipment models and specifications, developing adaptable automated disassembly systems that can accommodate various equipment types is becoming increasingly important [10,11].
To address the limitations of current technologies, this study aims to improve the efficiency and safety of transformer disassembly through the use of multimodal technology, proposing an intelligent system that integrates multimodal data processing, deep learning, and automated control. The system starts by accurately identifying the transformer cover type using 2D images and 3D point cloud data from an RGB-D camera, then selects the most suitable disassembly strategy based on real-time conditions. For screw-fastened covers, the system assesses the screw conditions and performs automated disassembly. For welded covers, it employs precise edge detection and cutting path planning. The system extracts 2D image features of the internal transformer structure using a convolutional neural network (CNN), mapping these features into 3D space. It then utilizes the K-Nearest Neighbor (KNN) algorithm and positional coding for 3D feature fusion, accurately locating cutting components (e.g., screws, copper plates, studs, silicon steel sheets) through target detection and executing the cutting process based on the planned path. This approach enables the efficient disassembly of waste transformers. Additionally, the developed wire extraction line effectively separates wire insulation from metal cores, enhancing both the integrity and efficiency of wire recovery.

2. Transformer Disassembly System Preprogram Design

This paper proposes a transformer disassembly system based on multimodal sensing and intelligent control for the efficient and safe disposal of used transformers. The system integrates RGB-D cameras, multimodal data fusion, and deep learning models to enable fully automated transformer disassembly. The process is divided into two main components: the cover plate and the internal structure.

2.1. General Scheme for Transformer Disassembly System

The disassembly process of the transformer is shown in Figure 1 and consists of five main steps. First, data from the transformer’s cover plate are collected through sensors, providing the necessary support for identifying the type and structure of the cover plate. Next, the system uses the collected data to identify and classify the transformer cover plate and selects the appropriate disassembly method, such as unscrewing screws or welding and cutting, to complete the disassembly of the cover plate. After the cover plate is removed, the sensors further collect data on the internal components of the transformer, helping the system understand its internal structure and component characteristics. Then, the system performs precise identification and disassembly of the internal components of the transformer to ensure that important parts are effectively separated. Finally, the system processes the transformer’s winding section, stripping the copper wire for recycling. The entire process is efficient and safe, leveraging multimodal perception and intelligent control technologies to achieve the full disassembly of obsolete transformers.

2.2. Transformer Cover Disassembly Program

The cover disassembly process is illustrated in Figure 2. The system captures multi-view 2D images and 3D point cloud data using RGB-D cameras, employing fusion algorithms to improve feature extraction and address recognition challenges in complex environments. Initially, the system identifies the cover connection type using a pre-trained classification model, categorizing it as either screw-fastened or welded.
For screw-fastened covers, the system uses an improved YOLOv8 model integrated with a Mask Attention mechanism to accurately detect screw locations. Once identified, the system assesses the condition of each screw. If any screw is found to be slipping or damaged, the system automatically switches to cutting mode for disassembly, ensuring continuous operation.
For welded covers, the system employs a segmentation model based on an enhanced STDC network to precisely segment the cover and its components, enabling the accurate identification of weld seams and optimal cutting path planning. An improved SegFormer model generates the weld cut line, incorporating multi-scale feature extraction, positional encoding, and geometric feature assistance modules to enhance edge precision. Additionally, dynamic edge distance adjustment mechanisms are applied to different components, avoiding critical areas and ensuring safe cutting operations.

2.3. Transformer Internal Component Disassembly Program

The disassembly process of the internal components of the transformer is illustrated in Figure 3. After the cover is removed, the system begins processing the internal parts, focusing on the identification and disassembly of complex components such as windings, studs, and silicon steel sheets. Using RGB-D cameras positioned at multiple angles, the system collects 3D point cloud data and employs the ODIN model for semantic segmentation and object detection, alternately integrating 2D and 3D information to achieve precise identification.
The system constructs a 3D point cloud model of the transformer’s interior by merging multi-view images with depth data, enabling the effective segmentation and localization of complex structures. The KNN algorithm, combined with positional encoding, ensures accurate identification of intricate geometric structures like windings and studs. Once identified, the system plans the cutting path and executes the disassembly based on the specific characteristics of each component, ensuring a smooth and efficient transformer disassembly process.

2.4. Transformer Copper Winding Stripping Methods

The transformer copper wire winding stripping system comprises two main components, as shown in Figure 4: wire constant tension release technology and intelligent wire removal technology. These two systems work in tandem to achieve continuous and efficient wire extraction, facilitating effective wire recycling.
The principle of payoff tension control operates as follows: The payoff motor speed regulates the forward speed of the production product at the payoff stage, while the traction speed or take-up speed governs the speed at the traction or take-up locations. These speeds are synchronized through data processing and coordination by the main control unit, creating a speed differential that generates the required payoff tension on the production product. The constant tension intelligent servo unwinding system is a high-precision control system designed to manage the wire unwinding process. By precisely controlling the unwinding tension, it prevents stretching, deformation, or breakage of the wire, ensuring product quality and production efficiency. In this system, a servo motor accurately controls the output torque of the unwinding mechanism to maintain a stable wire feeding speed. Tension sensors continuously monitor tension fluctuations during the feeding process. The controller utilizes a combination of traditional PID control and adaptive fuzzy neural networks to optimize the servo motor’s control strategy. The intelligent wire removal system for conductors uses an ultrasonic cutter head to strip the outer insulation layer of the wire within a constant tension payoff system. It integrates a control algorithm that leverages pressure and vision sensors to regulate the cutting depth of the insulation layer, ensuring complete removal while protecting the cutter head. To enhance precision, the system is equipped with a displacement sensor that accurately measures the robotic arm’s movement, ensuring operational accuracy. An ARM microcontroller serves as the control system, employing PID algorithms for real-time control of the cutter head’s feed depth. Additionally, a camera monitors the wire removal process, while the RCF algorithm, integrated with a neural network, performs real-time detection to confirm whether the wire has been fully stripped.

2.5. Exception Handling and Manual Intervention

To enhance the automation and adaptability of the system in complex situations, various anomaly detection and handling mechanisms have been developed. When issues arise—such as screw slippage, weld recognition failures, or equipment obstructions—the system automatically pauses and triggers an alert for manual intervention. Additionally, an optimization mechanism based on historical data has been integrated to improve anomaly assessment. By dynamically adjusting the judgment threshold through comparisons between current inspection results and historical maintenance data, the system seeks to minimize manual interventions, thereby improving overall disassembly efficiency.

3. Transformer Tank Cover Disassembly Algorithm

3.1. Image Acquisition and Pre-Processing

The RGB-D camera captures both 2D images and 3D point cloud data, enhancing image features through multimodal fusion techniques. This camera not only provides rich texture information but also delivers accurate depth data, which can be integrated through multimodal fusion methods [24,25]. During data processing, the system performs not only 2D and 3D feature fusion within a single view but also cross-view multiscale feature extraction, enriching spatial information and improving target detection across different sizes while enhancing the model’s generalization performance [26]. Specifically, the RGB image features and 3D point cloud data are effectively combined using an adaptive attention mechanism, resulting in improved overall detection accuracy [25]. The acquired image data are illustrated in Figure 5.

3.2. Detection and Evaluation of Cover Types

In transformer maintenance and recycling, accurately identifying the type and assessing the condition of the cover plate is crucial for automated disassembly. Transformer covers are primarily secured either by screws or welding. For screw-fastened covers, an inspection and evaluation of the screws is necessary, whereas welded covers necessitate specific cutting operations. Traditionally, operators determine disassembly strategies through manual inspection, which can be inefficient and pose safety risks. Therefore, implementing automated systems can enhance both the efficiency and safety of disassembly processes by accurately recognizing cover types and automatically assessing their conditions.
In order to address the above problems, this paper proposes an automated identification scheme utilizing a pre-trained EfficientNetV2 [27] model. This model effectively classifies how the transformer cover is connected to the enclosure.
$C = \mathrm{EfficientNetV2}(F_{\mathrm{fusion}})$.
The system processes the fused feature map $F_{\mathrm{fusion}}$ to produce classification results, categorizing the connection method of the cover plate as either screw-fastened or welded. This approach enables rapid and accurate identification, streamlining subsequent operations such as screw detection or welding and cutting tasks.
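For illustration, such a cover-type classifier could be assembled from an EfficientNetV2 backbone, for example via the timm library. The sketch below is a minimal example under assumed settings (two-class head, 224 × 224 input) and does not reproduce the authors' exact configuration.

```python
import torch
import timm

# Two-class cover-type classifier (screw-fastened vs. welded) built on an
# EfficientNetV2 backbone from timm. Illustrative sketch only; set
# pretrained=True to load ImageNet weights before fine-tuning on cover images.
model = timm.create_model("tf_efficientnetv2_s", pretrained=False, num_classes=2)
model.eval()

def classify_cover(image_batch: torch.Tensor) -> str:
    """image_batch: (1, 3, H, W) tensor of the fused cover image."""
    with torch.no_grad():
        logits = model(image_batch)
    return "screw-fastened" if logits.argmax(dim=1).item() == 0 else "welded"

# Dummy input just to show the call pattern.
print(classify_cover(torch.randn(1, 3, 224, 224)))
```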
For screw-fastened covers, the system necessitates the further detection and evaluation of screws, particularly assessing for deformation or slippage. This paper employs an improved Scan2CAD [28] scheme, which accurately aligns RGB-D data with the transformer’s CAD model, enhancing screw detection accuracy. Scan2CAD extracts features from the RGB-D image I RGB and the CAD model M CAD using a 3D CNN, facilitating precise alignment.
$F_{\mathrm{aligned}} = \mathrm{Align}(F_{\mathrm{RGBD}}, F_{\mathrm{CAD}}; W_{\mathrm{CNN}})$
where $W_{\mathrm{CNN}}$ denotes the weights of the 3D CNN and $F_{\mathrm{aligned}}$ is the aligned feature map. Based on this feature alignment, this paper utilizes the improved PointNet++ [29] model to detect screw and cover deformation. The system extracts the screw deformation features $Z_{\mathrm{screw}}$ and the cover deformation features $Z_{\mathrm{plate}}$ using the PointNet++ model.
$Z_{\mathrm{screw}} = f_{\mathrm{PointNet++}}(F_{\mathrm{aligned}}^{\mathrm{screw}})$
$Z_{\mathrm{plate}} = f_{\mathrm{PointNet++}}(F_{\mathrm{aligned}}^{\mathrm{plate}})$.
The model analyzes the differences between the RGB-D data and the CAD model to assess severe deformation in screws and cover plates. To enhance its capability in complex scenarios, this paper introduces a historical data enhancement scheme. By incorporating historical maintenance data, the model adaptively adjusts the judgment threshold, optimizing detection for common failure modes such as screw slipping, thread wear, and local cover plate deformation. Historical data features are integrated into the PointNet++ model and compared with current inspection results to evaluate the similarity of failure modes.
$Z_{\mathrm{history}} = f_{\mathrm{history}}(H_{\mathrm{history}})$.
The detection results of screw deformation are utilized not only to assess disassembly difficulty but also to estimate the torque required for unscrewing. To achieve this, this paper employs a graph neural network (GNN) to estimate the unscrewing moment T estimate [30]. The GNN model leverages the geometric deformations of the screws and cover along with material properties, generating a force simulation based on the discrepancies between the RGB-D data and the CAD model, ultimately calculating the unscrewing moment.
$T_{\mathrm{estimate}} = f_{\mathrm{GNN}}(Z_{\mathrm{screw}}, Z_{\mathrm{plate}}, M_{\mathrm{CAD}})$.
When the estimated torque T estimate exceeds the allowable range of the device, the system determines that the screws cannot be removed through conventional screwing methods. In such cases, the system automatically transitions to alternative disassembly methods, such as cutting operations.
$\mathrm{Anomaly} = \begin{cases} 1, & \text{if } T_{\mathrm{estimate}} > T_{\mathrm{threshold}} \text{ or } Z_{\mathrm{screw}} \text{ indicates severe deformation} \\ 0, & \text{otherwise.} \end{cases}$
Finally, based on the detection results of the screws and cover plate, the system makes a corresponding disassembly decision. If abnormalities such as screw slippage, deformation, or serious cover plate distortion are detected, the system automatically selects a cutting program. Conversely, if the screws and cover plate are in a normal condition, the system opts for the screw-unscrewing disassembly method. This approach, driven by intelligent detection and decision-making, enhances disassembly efficiency while reducing safety risks and improving precision.
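A minimal sketch of this decision rule, with the torque threshold and deformation flags treated as given inputs, is shown below.

```python
def choose_disassembly_mode(t_estimate: float,
                            t_threshold: float,
                            screw_deformed: bool,
                            plate_deformed: bool) -> str:
    """Return 'cutting' when unscrewing is judged infeasible, else 'unscrew'.

    Mirrors the anomaly rule above: switch to cutting if the estimated torque
    exceeds the device limit or severe deformation is detected.
    """
    if t_estimate > t_threshold or screw_deformed or plate_deformed:
        return "cutting"
    return "unscrew"

print(choose_disassembly_mode(42.0, 35.0, False, False))  # -> "cutting"
```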

3.3. Disassembly of Screw-Fastened Cover Plate

To fully leverage the transformer’s feature conditions, this paper employs a target detection method based on YOLOv8 [31] integrated with Mask Attention [32] for the precise localization of each nut. Given the challenges associated with transformer nuts, such as rust, contamination, and breakage, traditional target detection methods may struggle to effectively utilize depth information and subtle features. This approach enhances the target detection network by incorporating Mask Attention to focus on small targets like nuts on the cover plate. Additionally, it introduces embedded nut template matching to initialize the Query against the complex backgrounds of the transformer cover, accounting for possible occlusions, partial wear, or rust. This strategy aims to better utilize the shape and distribution characteristics of the nuts, thereby improving detection accuracy in complex environments. The modified network structure is illustrated in Figure 6.

3.3.1. Multi-Scale Feature Extraction

To address the challenges posed by the complex backgrounds on the transformer cover and the small size of the nuts, this approach utilizes YOLOv8’s backbone for multi-scale feature extraction. By employing feature maps at various scales, the system effectively captures both the intricate details and the broader context of the nuts.
The input image is initially processed by the backbone network of YOLOv8 for feature extraction. This backbone employs a series of convolutional layers and modules to extract multi-scale features from the image, resulting in three key feature maps: the third-level feature map P 3 , the fourth-level feature map P 4 , and the fifth-level feature map P 5 .

3.3.2. Feature Map Conversion and Projection

Each feature map, $P_3$, $P_4$, and $P_5$, is first flattened and then converted to a $D$-dimensional vector using linear projection to fit the subsequent processing steps. The specific process is described as follows:
$F_i = \mathrm{Flatten}(P_i), \quad i \in \{3, 4, 5\}$
$S_i = F_i W_i, \quad S_i \in \mathbb{R}^{N_i \times D}$
where $W_i$ is the projection matrix and $N_i = H_i \times W_i$ is the number of flattened features [33].
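For concreteness, the flattening and projection step can be sketched in PyTorch as follows; the channel widths and embedding dimension are assumed values, not the paper's settings.

```python
import torch
import torch.nn as nn

D = 256  # common embedding dimension (assumed)

# Assumed channel widths of the P3/P4/P5 feature maps.
channels = {3: 256, 4: 512, 5: 1024}
projections = nn.ModuleDict({str(i): nn.Linear(c, D) for i, c in channels.items()})

def flatten_and_project(feature_maps: dict) -> dict:
    """feature_maps: {level: (B, C_i, H_i, W_i)} -> {level: (B, N_i, D)}."""
    out = {}
    for i, P in feature_maps.items():
        F_i = P.flatten(2).transpose(1, 2)      # (B, N_i, C_i), N_i = H_i * W_i
        out[i] = projections[str(i)](F_i)       # (B, N_i, D)
    return out

feats = {i: torch.randn(1, c, 80 // 2 ** (i - 3), 80 // 2 ** (i - 3))
         for i, c in channels.items()}
print({i: tuple(v.shape) for i, v in flatten_and_project(feats).items()})
```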

3.3.3. Mask Attention for Region Extraction and Optimization

To enhance the accuracy of nut detection, particularly when nuts are contaminated, rusted, or partially obscured, this paper integrates a Mask Attention mechanism into the target detection framework. By generating masks through Mask Attention, the system focuses on critical features within specific regions, thereby improving recognition accuracy. Once the feature extraction is completed, the resulting feature map is further processed by a six-layer transformer that incorporates the Mask Attention module.
At each layer, the global information $\mathrm{GAP}(S_i^{(l)})$ of each feature map is extracted through global average pooling and then processed by a lightweight convolutional network to produce a preliminary mask $M_i^{(l)}$:
$M_i^{(l)} = \mathrm{Softmax}(\mathrm{Conv}(\mathrm{GAP}(S_i^{(l)})))$
where $S_i^{(l)}$ represents the feature map of the $l$-th layer, $l = 1, 2, \ldots, 6$; $\mathrm{GAP}(S_i^{(l)})$ denotes the global average pooling of the feature map $S_i^{(l)}$; $\mathrm{Conv}$ is the convolutional operation; and $M_i^{(l)}$ is the mask normalized by the Softmax operation to adjust the importance of each pixel within the feature map. This Mask Attention mechanism allows the model to focus on target areas while minimizing background interference, effectively ignoring irrelevant information, such as oil contamination on the transformer cover, thereby enhancing nut detection accuracy.
The generated mask $M_i^{(l)}$ is applied to the input feature map $S_i^{(l)}$ using element-wise multiplication to obtain the masked feature map $\hat{S}_i^{(l)}$:
$\hat{S}_i^{(l)} = M_i^{(l)} \odot S_i^{(l)}$
where $\odot$ is the element-wise multiplication operation and $\hat{S}_i^{(l)}$ is the masked feature map. Mask Attention hones in on the region where the nut is located through this masking strategy, effectively filtering out noise from the complex background and greatly improving the detection of nuts in the presence of contamination, occlusion, and similar conditions.
The masked feature map $\hat{S}_i^{(l)}$ is fed into the multi-head self-attention mechanism for further processing.
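One plausible reading of this mask-generation step is sketched below in PyTorch: global average pooling followed by a lightweight 1 × 1 convolution and Softmax yields a channel-wise mask that is broadcast over the feature map. The exact mask layout is an assumption of the sketch, not a detail stated in the paper.

```python
import torch
import torch.nn as nn

class MaskAttention(nn.Module):
    """Sketch of the mask step: GAP -> 1x1 conv -> Softmax -> element-wise
    weighting of the feature map (assumed channel-wise mask layout)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, S: torch.Tensor) -> torch.Tensor:
        gap = S.mean(dim=(2, 3), keepdim=True)        # GAP(S): (B, C, 1, 1)
        M = torch.softmax(self.conv(gap), dim=1)       # mask M: (B, C, 1, 1)
        return M * S                                    # masked feature map

x = torch.randn(2, 256, 40, 40)
print(MaskAttention(256)(x).shape)                      # torch.Size([2, 256, 40, 40])
```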

3.3.4. Initialization of Embedded Square-Nut Template Matching Query

To efficiently detect nuts on transformer covers, particularly when encountering various nut types, the Query is initialized with an embedded square-edge nut template match, which ensures that Query accurately captures the geometric features of the nut and adapts to variations in the size and shape of the nut.
(1)
Predefined Regular Nut Templates: Regular N-sided polygon templates define the geometry and dimensions of the nut, allowing for flexible adaptation to various nut types. The vertices of the template can be calculated using the following formula:
$(x_i, y_i) = \left(x_c + l \cdot \cos\tfrac{2\pi i}{N},\; y_c + l \cdot \sin\tfrac{2\pi i}{N}\right), \quad i = 1, 2, \ldots, N$
where $(x_c, y_c)$ is the center coordinate of the nut template, $(x_i, y_i)$ is the $i$-th template vertex, $l$ is the side length of the template defining the size of the nut, and $N$ is the number of sides of the regular polygon, allowing the model to adapt to various nut designs [34].
(2)
Matching Templates and Candidate Boxes: After generating the candidate boxes, each Query is initialized using the embedded nut template. Given a detected nut region represented by the candidate box $B_i = (x_i, y_i, w_i, h_i)$, the initialization process of the Query is as follows:
First, the template side length $l_{\mathrm{match}}$ is adjusted according to the size of the candidate box so that the template fits the actual size of the nut:
$l_{\mathrm{match}} = \frac{w_i + h_i}{2}$.
Generate matched nut template vertices:
$T_{\mathrm{match}}^{N} = \left\{ \left(x_i + l_{\mathrm{match}} \cdot \cos\tfrac{2\pi j}{N},\; y_i + l_{\mathrm{match}} \cdot \sin\tfrac{2\pi j}{N}\right) \right\}_{j=1}^{N}.$
This formula matches the center coordinates of the candidate box to the vertices of the regular N-sided template, thus providing a geometric basis for the subsequent Query [35,36].
(3)
Geometry Initialization of the Query: The matched regular N-sided nut templates are used as the initial Query for subsequent target detection tasks. The geometry of each Query after initialization is as follows:
$Q_i^{\mathrm{init}} = T_{\mathrm{match}}^{N}$.
This Query contains the geometric information of the nut, serving as an effective input for the subsequent multi-head self-attention mechanism.
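For illustration, the template construction and Query initialization above can be sketched as follows, assuming hexagonal nuts (N = 6); the candidate-box values in the example are arbitrary.

```python
import math

def nut_template(xc: float, yc: float, side: float, N: int = 6):
    """Vertices of a regular N-gon nut template centred at (xc, yc)."""
    return [(xc + side * math.cos(2 * math.pi * j / N),
             yc + side * math.sin(2 * math.pi * j / N)) for j in range(1, N + 1)]

def init_query(box):
    """Initialise a geometric Query from a candidate box (x, y, w, h)."""
    x, y, w, h = box
    l_match = (w + h) / 2          # fit the template to the detected nut size
    return nut_template(x, y, l_match)

print(init_query((120.0, 85.0, 14.0, 12.0)))
```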

3.3.5. Multi-Head Self-Attention Mechanism

(1)
Generate Query, Key, and Value Vectors: First, the feature map $\hat{S}_i^{(l)}$ is mapped into the Query vector $Q_i^{(l)}$, Key vector $K_i^{(l)}$, and Value vector $V_i^{(l)}$ through linear transformations using the learnable weight matrices $W_Q^{(l)}$, $W_K^{(l)}$, and $W_V^{(l)}$, respectively.
$Q_i^{(l)} = W_Q^{(l)} Q_i^{\mathrm{init}} + W_Q^{(l)} \hat{S}_i^{(l)}, \quad K_i^{(l)} = W_K^{(l)} \hat{S}_i^{(l)}, \quad V_i^{(l)} = W_V^{(l)} \hat{S}_i^{(l)}.$
(2)
Compute the Attention Scores: Attention scores are calculated using the Query vector $Q_i^{(l)}$ and Key vector $K_i^{(l)}$. After normalization through the Softmax function, these attention scores are applied to the Value vector $V_i^{(l)}$.
$\mathrm{Attention}(Q_i^{(l)}, K_i^{(l)}, V_i^{(l)}) = \mathrm{Softmax}\!\left(\frac{Q_i^{(l)} (K_i^{(l)})^{T}}{\sqrt{d_k}}\right) V_i^{(l)}$
where $d_k$ is the dimension of the Query and Key vectors, used to scale the attention scores.
(3)
Multi-Head Attention Fusion: The multi-head self-attention mechanism concatenates the results of the individual heads and generates the final feature representation $Z_i^{(l)}$ using a linear transformation:
$Z_i^{(l)} = W_O^{(l)} \, \mathrm{Concat}(\mathrm{Attention}_1^{(l)}, \mathrm{Attention}_2^{(l)}, \ldots, \mathrm{Attention}_h^{(l)})$
where $W_O^{(l)}$ is the linear transformation matrix of the output and $h$ is the number of attention heads.
(4)
Residual Connection and Normalization: To preserve the information of the input features, the input feature map $S_i^{(l)}$ is summed with the multi-head attention output $Z_i^{(l)}$ through a residual connection, followed by normalization with a LayerNorm layer.
$\tilde{Z}_i^{(l)} = \mathrm{LayerNorm}(Z_i^{(l)} + S_i^{(l)})$
where $\tilde{Z}_i^{(l)}$ is the normalized feature map.
(5)
Feed-Forward Network (FFN): The output of each layer $\tilde{Z}_i^{(l)}$ is further processed by an FFN, which consists of two linear layers and a ReLU activation function.
$\mathrm{FFN}(\tilde{Z}_i^{(l)}) = W_2^{(l)} \, \mathrm{ReLU}(W_1^{(l)} \tilde{Z}_i^{(l)})$
where $W_1^{(l)}$ and $W_2^{(l)}$ are the linear transformation matrices in the FFN.
(6)
Calculation of the Bounding Box: After processing through the Mask Transformer, the output includes the center coordinates $(x, y)$, width and height $(w, h)$, and category confidence $p$ of each predicted box. The specific calculations are as follows:
$\hat{x} = \sigma(\mathrm{output}_x), \quad \hat{y} = \sigma(\mathrm{output}_y), \quad \hat{w} = \exp(\mathrm{output}_w) \times w_a, \quad \hat{h} = \exp(\mathrm{output}_h) \times h_a, \quad p = \sigma(\mathrm{output}_p)$
where $\sigma$ is the sigmoid function, $\mathrm{output}_x$, $\mathrm{output}_y$, $\mathrm{output}_w$, $\mathrm{output}_h$, and $\mathrm{output}_p$ are the outputs of the network, and $(w_a, h_a)$ are the width and height of the anchor box.
(7)
Output: The final detection result contains the location and category confidence of the nut.
$\mathrm{output} = \{(x, y, w, h, p)\}$
where $(x, y)$ is the center coordinate of the nut, $w$ and $h$ are the width and height of the nut, and $p$ is the category confidence.
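For illustration, the anchor-based decoding in the equations above can be sketched as follows; the raw output values and anchor sizes are arbitrary example numbers.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def decode_box(out_x, out_y, out_w, out_h, out_p, anchor_w, anchor_h):
    """Decode raw network outputs into an (x, y, w, h, p) detection,
    following the sigmoid/exponential scheme above."""
    return (sigmoid(out_x),                 # centre x
            sigmoid(out_y),                 # centre y
            math.exp(out_w) * anchor_w,     # width scaled by anchor width
            math.exp(out_h) * anchor_h,     # height scaled by anchor height
            sigmoid(out_p))                 # category confidence

print(decode_box(0.3, -0.1, 0.2, 0.1, 2.0, 32.0, 32.0))
```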
After the target detection model outputs the 2D detection box along with its center point, the coordinates are transformed into the world coordinate system using the camera's intrinsic and extrinsic parameters [37]. During the transformer disassembly process, a camera is mounted on top to capture real-time images of the transformer cover. The improved YOLOv8 model is employed to localize the screw positions, outputting their coordinates based on the camera's intrinsic and extrinsic parameters. These coordinates are then provided to the screwing robotic arm for the disassembly operation. The diagram of the target detection effect is shown in Figure 7.

3.4. Segmentation and Disassembly of Welded Covers

3.4.1. Semantic Segmentation of Welded-Type Cover Plates

The visible light image captured by the camera is input into the cover recognition segmentation network, where the cover and each of its components are finely segmented using the improved Short-Term Dense Concatenation (STDC) network. To better adapt to the structural characteristics of the transformer cover plate and enhance the network’s perception of spatial location and geometry, this paper introduces a geometric feature assistant module and position coding, leveraging the spatial distribution characteristics of the transformer cover plate components. This approach optimizes the segmentation performance of the STDC network for various components.
(1)
Feature Extraction: The input image I input is fed into the STDC network for feature extraction, which gathers multi-scale features through multiple convolutional layers and STDC modules. Based on the traditional feature extraction, this paper introduces a geometric feature assist module, specifically designed for segmenting complex components on the surface of the transformer cover, such as screws, bushings, and release valves.
The geometric feature assist module first detects geometric structures (e.g., circles, lines, edges) in the image and generates a geometric feature map $F_{\mathrm{geo}}$. These geometric features are fused with the multi-scale features $F_{\mathrm{STDC}}$ extracted through the STDC module to produce an enhanced feature map $F_{\mathrm{enhanced}}$.
$F_{\mathrm{geo}} = \mathrm{Geo\_Feature}(I_{\mathrm{input}})$
$F_{\mathrm{enhanced}} = F_{\mathrm{STDC}} + \alpha \cdot F_{\mathrm{geo}}$
where $\alpha$ is the weighting coefficient of the geometric feature map [24].
(2)
Multi-scale Feature Fusion: To improve the segmentation of various components on the cover, particularly screws and sleeves with spatially fixed locations, the enhanced STDC network employs a multi-scale feature fusion strategy combined with a position coding mechanism to boost its spatial sensing capability.
The position encoding module incorporates the spatial position information of each pixel, $\mathrm{Pos}(x, y)$, into the feature map. The feature maps at different scales $F_s$ are combined with the corresponding position encoding for feature fusion, resulting in a higher-resolution fused feature map $F_{\mathrm{fusion}}$.
$F_{\mathrm{fusion}} = \sum_{s=1}^{S} w_s \cdot (F_s + \mathrm{Pos}(x, y))$
where $F_s$ is the input feature map of the $s$-th scale, $w_s$ is the corresponding weight, $S$ is the number of scales, and $\mathrm{Pos}(x, y)$ is the position encoding of the pixel [25].
(3)
Up-Sampling of the Feature Map: The extracted enhanced multi-scale features are up-sampled by a fully convolutional network (FCN) to produce the segmentation result image. This up-sampling process employs the deconvolution operation and integrates both geometric features and position encoding to ensure accurate segmentation results, particularly in edge regions.
$M = \mathrm{Deconv}(F_{\mathrm{fusion}})$
where $M$ is the segmentation result image [25]. The cover segmentation result is shown in Figure 8.
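For illustration, the weighted multi-scale fusion with position encoding can be sketched as follows. The coordinate-grid encoding and the scale weights are assumptions of the sketch, since the exact form of $\mathrm{Pos}(x, y)$ is not spelled out above.

```python
import torch
import torch.nn.functional as F

def position_encoding(h: int, w: int) -> torch.Tensor:
    """Normalised (x, y) coordinate grid collapsed to one channel, used as a
    simple stand-in for Pos(x, y)."""
    ys = torch.linspace(0, 1, h).view(h, 1).expand(h, w)
    xs = torch.linspace(0, 1, w).view(1, w).expand(h, w)
    return torch.stack([xs, ys]).mean(0, keepdim=True)       # (1, H, W)

def fuse_scales(features, weights, out_size):
    """Weighted sum of position-augmented multi-scale feature maps.
    features: list of (B, C, H_s, W_s) tensors; weights: matching scalars."""
    fused = 0.0
    for F_s, w_s in zip(features, weights):
        pe = position_encoding(F_s.shape[2], F_s.shape[3]).to(F_s).unsqueeze(0)
        F_aug = F.interpolate(F_s + pe, size=out_size, mode="bilinear",
                              align_corners=False)
        fused = fused + w_s * F_aug
    return fused

feats = [torch.randn(1, 64, 32, 32), torch.randn(1, 64, 64, 64)]
print(fuse_scales(feats, [0.6, 0.4], (128, 128)).shape)       # (1, 64, 128, 128)
```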

3.4.2. Automated Cutting Line Generation for Welded-Type Cover Plates

To achieve the effective separation of the transformer cover from the housing, a graphic-based processing method is proposed. This approach aims to accurately extract edges and generate safe, reliable cut lines by considering the geometric features of the transformer cover (e.g., bolt holes, piping interfaces) and the contamination present in the industrial environment (e.g., oil stains, light reflections). Additionally, this method incorporates mask expansion for each component to ensure that the cut line remains at a safe distance from the edge of the transformer cover, preventing cuts into other components.
(1)
Semantic Segmentation of Images: First, the input image of the transformer cover $I_{\mathrm{input}}$ is processed by a semantic segmentation model to generate a mask $M_{\mathrm{mask}}$ that identifies the location of key components such as bolt holes and pipe connections.
$M_{\mathrm{mask}} = \mathrm{SemanticSegmentation}(I_{\mathrm{input}})$
where $M_{\mathrm{mask}}$ is the binary image containing each component area.
(2)
Surface Part Mask Expansion Treatment: To maintain a safe cutting distance, the mask is expanded using morphological operations, resulting in a dilated mask $M_{\mathrm{dilated}}$.
$M_{\mathrm{dilated}} = \mathrm{Dilation}(M_{\mathrm{mask}}, k_i)$
where $k_i$ is the size of the dilation kernel. The expansion treatment ensures that the cut line remains clear of the actual boundaries of the part, avoiding cuts into critical components [28].
(3)
Canny Edge Detection: Canny edge detection is performed on the dilated mask $M_{\mathrm{dilated}}$ to extract the initial edges of the transformer cover $E_{\mathrm{initial}}$:
$E_{\mathrm{initial}} = \mathrm{Canny}(M_{\mathrm{dilated}})$
where $E_{\mathrm{initial}}$ denotes the edge information extracted by the Canny operator, including the edges of each component and the outer contour of the transformer cover plate [28].
(4)
Generation of Closed-Loop Contours: Discontinuous edges are connected by a shortest-path algorithm and a minimum spanning tree algorithm to create a complete closed-loop profile $E_{\mathrm{closed}}$.
$E_{\mathrm{closed}} = \mathrm{Close}(E_{\mathrm{initial}})$
where the closed-loop operation connects disconnected edges, ensuring that the generated edges form complete closed contours suitable for subsequent cut line generation [30].
(5)
Selection of Suitable Contours and Smoothing: A closed-loop profile matching the transformer cover is selected and smoothed to create the optimized profile $E_{\mathrm{smooth}}$:
$E_{\mathrm{smooth}} = \mathrm{Smooth}(E_{\mathrm{closed}})$.
Smoothing is accomplished by spline interpolation, which reduces noise and irregularities in the edges, ensuring the continuity and smoothness of the cut line in practical applications [29].
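For concreteness, the mask dilation, edge extraction, and contour smoothing steps can be sketched with standard OpenCV and SciPy routines. The safety margin, Canny thresholds, and smoothing factor below are illustrative assumptions rather than the paper's parameters.

```python
import cv2
import numpy as np
from scipy.interpolate import splprep, splev

def generate_cut_line(component_mask: np.ndarray, margin_px: int = 15):
    """Sketch of the cut-line pipeline: dilate the component mask to keep a
    safety margin, run Canny edge detection, keep the largest closed contour,
    and smooth it with a periodic spline."""
    kernel = np.ones((margin_px, margin_px), np.uint8)
    dilated = cv2.dilate(component_mask, kernel)                   # M_dilated
    edges = cv2.Canny(dilated, 50, 150)                            # E_initial
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)          # closed loops
    contour = max(contours, key=cv2.contourArea).squeeze(1)[::5]   # (N, 2), subsampled
    tck, _ = splprep([contour[:, 0], contour[:, 1]], s=100.0, per=True)
    x_s, y_s = splev(np.linspace(0, 1, 400), tck)                  # E_smooth
    return np.stack([x_s, y_s], axis=1)

mask = np.zeros((480, 640), np.uint8)
cv2.circle(mask, (320, 240), 100, 255, -1)                          # toy mask
print(generate_cut_line(mask).shape)                                # (400, 2)
```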

4. Algorithm for Disassembly of Internal Transformer Components

4.1. Semantic Segmentation of Multimodal Transformer Components

In the automated disassembly of transformers, the semantic segmentation of internal components is crucial for ensuring accurate disassembly. The internal structure of a transformer is complex, containing various components such as screws, copper plates, winding studs, and silicon steel sheets, each differing significantly in physical location and geometry. Therefore, achieving efficient multimodal component recognition and segmentation is a key objective for the automated system.
To address this challenge, this paper proposes a semantic segmentation scheme for multimodal components based on the ODIN model. The ODIN model utilizes a design centered on cross-view feature fusion, enabling the effective processing of multi-view data through alternating 2D in-view fusion and 3D cross-view feature fusion [38]. In the transformer disassembly scenario, the system captures internal images of the transformer using a multi-view RGB-D camera, combining these data with depth information to extract and fuse 2D and 3D features at different scales. This model maintains high accuracy in complex component recognition tasks, particularly when multi-view information is incomplete or data are missing, as ODIN’s feature fusion technique significantly enhances component recognition accuracy.
The ODIN model is employed for 2D and 3D semantic segmentation, incorporating cross-view feature fusion techniques. Transformer data are captured using RGB-D cameras and IMUs. The model leverages multi-scale feature fusion, which enhances the detection of targets of different sizes and improves generalization performance across multiple scenes. During transformer disassembly, the ODIN model enables accurate 2D and 3D instance segmentation and the identification of components such as screw support columns that need to be cut, ensuring high-precision target localization and segmentation to support subsequent disassembly operations.
The ODIN model structure is shown in Figure 9, and its functionality is achieved through the following key components:
(1)
Alternating 2D and 3D Information Fusion: Two-dimensional feature fusion occurs within each image view, which is subsequently projected into 3D space for cross-view feature fusion. Finally, the fused 3D features are reprojected back to the 2D plane. The 2D feature fusion is conducted using pixel coordinates, while 3D feature fusion utilizes spatial coordinates for positional encoding to ensure information consistency.
(2)
Multi-scale Feature Fusion: ODIN conducts feature fusion at different scales to enhance the detection of targets of various sizes. This is accomplished by utilizing multi-scale feature maps during both 2D and 3D feature fusion, ensuring that targets of different sizes are effectively detected and segmented.
In the 2D feature extraction process, multi-scale feature maps are extracted using Swin Transformer [39]. These feature maps are fused at multiple scales:
$F_{\mathrm{fused}} = \sum_{s=1}^{S} w_s \cdot F_s$
where $F_s$ represents the feature map of the $s$-th scale, $w_s$ denotes the corresponding weight, and $S$ is the number of scales.
In the 3D feature fusion process, the KNN algorithm is used to select neighboring points in 3D space, with feature fusion achieved through relative position encoding.
$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}} + \mathrm{PE}_{3D}\right) V$
where $Q$, $K$, and $V$ are the Query, Key, and Value vectors, respectively, and $\mathrm{PE}_{3D}$ is the position encoding of the 3D space.
(3)
Target Detection and Feature Classification: Target detection using the ODIN model predicts the category and location of the component by the following equation.
$Y = \sigma(W \cdot F_{\mathrm{fusion}} + b)$
where $W$ and $b$ are the weight and bias of the model, respectively, and $F_{\mathrm{fusion}}$ is the fused feature map.
During the transformer disassembly process, the ODIN model alternately fuses 2D and 3D features from image data acquired by multi-view RGB-D cameras. When the input is a single RGB image, the model skips the 3D fusion layer, while when the input is an RGB-D sequence, the model extracts 2D features within each view before projecting them into 3D space and combining feature information from different views for cross-view fusion, ensuring a comprehensive perception of the transformer’s internal structure. This allows ODIN to accurately identify screws, support columns, and other components that need to be cut, reducing the risk of misuse and improving disassembly efficiency and safety. The ODIN model employs multimodal feature fusion and a pre-trained 2D backbone network to maintain high-performance recognition and segmentation capabilities in complex scenarios, fully meeting disassembly requirements. The process of segmentation and identification of internal components of the transformer is shown in Figure 10.
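As an illustration of the KNN-based cross-view fusion described above, the sketch below attends over the k nearest neighbours of each 3D point. The distance-based bias is a simple stand-in for the relative positional encoding $\mathrm{PE}_{3D}$ and does not reproduce ODIN's implementation.

```python
import torch

def knn_fuse(points: torch.Tensor, feats: torch.Tensor, k: int = 8) -> torch.Tensor:
    """KNN-based 3D feature fusion sketch: each point attends over its k
    nearest neighbours; the negative neighbour distance biases the attention
    logits as a stand-in for a relative positional encoding."""
    dists = torch.cdist(points, points)                       # (N, N)
    knn_d, knn_i = dists.topk(k, dim=1, largest=False)         # (N, k)
    neigh = feats[knn_i]                                        # (N, k, D)
    d_k = feats.shape[1]
    logits = (feats.unsqueeze(1) * neigh).sum(-1) / d_k ** 0.5 - knn_d
    attn = torch.softmax(logits, dim=-1)                        # (N, k)
    return (attn.unsqueeze(-1) * neigh).sum(dim=1)              # fused (N, D)

pts, f = torch.randn(500, 3), torch.randn(500, 64)
print(knn_fuse(pts, f).shape)                                   # torch.Size([500, 64])
```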

4.2. Technical Route for Disassembly of Internal Components

The overall process of internal component disassembly involves acquiring the 3D model and depth information of the transformer using a multi-view RGB-D camera. This is followed by extracting 2D image features and mapping them to 3D space using a CNN, then combining the KNN algorithm and positional coding to perform 3D feature fusion. The system accurately recognizes the locations of the components to be cut (e.g., screws, copper plates, studs, silicon steel sheets) through target detection techniques, and subsequently executes the cutting operation based on the generated cutting path.
The steps are as follows:
(1)
Cutting the cover plate first: Separate the cover plate from the inside of the transformer by recognizing and cutting the screws and copper plates beneath it.
(2)
Cutting the studs: Cut the studs around the windings to separate the cleat from the windings.
(3)
Final treatment of the silicon steel sheet: Separate the silicon steel sheet from the winding by cutting or lifting it.
In the specific implementation of the program for disassembling the internal components of the transformer, each step follows the same technical approach: first, visual sensors are employed to collect internal data from the transformer. Next, neural network methods are used for feature extraction and fusion. The target detection technique is then applied to identify the components of the transformer that need disassembly at each step. Finally, an optimization method based on the identified components generates the cutting path plan, allowing the cutting machine to follow this path and complete the overall disassembly process of the transformer.

4.3. Multimodal Data Acquisition for Transformer Internal Components

Through the RGB-D cameras installed on the top and four sides of the transformer, multi-view images and depth information are acquired, capturing the 3D data of components such as screws and covers. The top camera, positioned directly above the transformer, captures a top view and obtains depth information for the screws and covers located on the upper surface. Meanwhile, RGB-D cameras installed on each of the four sides provide omni-directional data acquisition from various angles, ensuring comprehensive information about the transformer. This setup lays a solid foundation for subsequent operations, including semantic segmentation and other analyses.
An efficient image acquisition strategy is developed based on the camera mounting method. The RGB-D camera, installed as described, can capture multi-view images from both above and the sides of the transformer. For each image, the acquired RGB image is denoted as I ( x , y ) and the depth map as D ( x , y ) , where x and y represent the pixel coordinates of the image.
Considering the optical properties of the camera, accurate and usable 3D data of transformer components such as screws and covers can only be obtained if the internal and external parameters of the camera are accurately determined. The internal reference matrix of the camera K defines the relationship between pixel coordinates and the camera coordinate system, while the external reference matrix [ R T ] describes the transformation from the camera coordinate system to the world coordinate system, where R is the rotation matrix and T is the translation vector. However, the optical characteristics of the camera lens can cause geometric distortion in images, necessitating the use of calibration methods for camera distortion correction to obtain complete internal and external parameters.
The internal reference matrix K and distortion parameters of the camera k 1 , k 2 , k 3 , p 1 , p 2 were obtained using the Zhang Zhengyou calibration method [37]:
$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$
$x_{\mathrm{corrected}} = x (1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x y + p_2 (r^2 + 2 x^2)$
$y_{\mathrm{corrected}} = y (1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_2 x y + p_1 (r^2 + 2 y^2)$
For each calibration image, the rotation matrix R and translation vector T of the camera relative to the calibration plate are calculated to determine the camera’s relative position and orientation. The coordinate system transformation method is employed to generate the external reference matrix by combining the rotation matrix R and the translation vector T into the transformation relation matrix [ R T ] . This matrix describes the transformation from the camera coordinate system to the world coordinate system.
$[R \mid T] = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix}$
After that, the depth map projection method is utilized to project the pixel coordinates into 3D space using the depth map $D(x, y)$ along with the camera parameters $K$ and $[R \mid T]$, yielding the 3D coordinates $P_{3D}$:
$P_{3D} = K^{-1} \cdot [R \mid T] \cdot \begin{bmatrix} x & y & D(x, y) & 1 \end{bmatrix}^{T}$
where $P_{3D} = [X, Y, Z]^{T}$ is the point in 3D space.
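For concreteness, the back-projection of a pixel and its depth into world coordinates can be written with the conventional pinhole decomposition of this transformation; the intrinsic values in the example are illustrative.

```python
import numpy as np

def pixel_to_world(u, v, depth, K, R, T):
    """Back-project pixel (u, v) with depth into world coordinates using the
    standard pinhole model: pixel -> camera frame, then camera -> world,
    assuming [R | T] maps world coordinates into the camera frame."""
    p_cam = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))   # camera frame
    return R.T @ (p_cam - T)                                     # world frame

K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
R, T = np.eye(3), np.zeros(3)
print(pixel_to_world(400, 260, 1.2, K, R, T))
```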

4.4. Feature Extraction and Classification

In the cutting process of transformers, feature extraction, fusion, and target detection techniques are crucial for achieving high-precision automated cutting. Firstly, for each transformer component image, multi-scale feature extraction is performed on the RGB image $I(x, y)$ using a CNN to generate feature maps $F_s(x, y)$ at different scales, represented as
$F_s(x, y) = \mathrm{CNN}_s(I(x, y))$
This multi-scale feature extraction method effectively captures the surface information of the component, providing a foundation for subsequent 3D feature mapping. After completing the 2D feature extraction, the 2D features are projected into 3D space using the depth map $D(x, y)$ along with the intrinsic and extrinsic parameter matrices $K$ and $[R \mid T]$ of the camera. The coordinate points in 3D space are obtained as follows:
$P_{3D} = K^{-1} \cdot [R \mid T]^{-1} \cdot D(x, y)$
With this step, the features are converted from a 2D plane to a 3D point cloud, ensuring an accurate correspondence with the physical space.
To enhance the spatial consistency of the 3D features, a KNN algorithm is employed to identify the neighboring points of each 3D point. The features are then fused by combining the positional encoding and attention mechanisms.
$F_{3D}^{\mathrm{fused}}(P_{3D}) = \sum_{P_j \in \mathcal{N}(P_{3D})} \mathrm{Attn}\big(F_{3D}(P_{3D}), F_{3D}(P_j), PE(P_{3D}, P_j)\big)$
The method can effectively integrate spatial information and is particularly suitable for fusing features of complex transformer structures, such as the contact surface of windings and silicon steel sheets.
After feature fusion, the ODIN deep learning model is leveraged to perform target detection on the fused 3D features, identifying the key components of the transformer. For the detection of screws, the model predicts them using the following equation:
$Y_{\mathrm{screw}} = \sigma(W \cdot F_{3D}^{\mathrm{fused}}(P_{3D}) + b)$
This process outputs the target location and category, providing accurate coordinate information for subsequent cutting path planning.
Although the basic process of feature extraction and target detection is consistent, the characteristics of different transformer parts lead to varying complexities in the detection methods. For simpler parts like screws and copper plates, feature extraction mainly relies on surface information. In contrast, for windings and studs, the structural complexity necessitates higher accuracy to capture intricate details. The contact surface between the silicon steel sheet and the winding requires consideration of its spatial contact characteristics, relying on positional encoding in 3D feature fusion to ensure accurate detection.
Therefore, in the transformer cutting task, the accurate identification of different components is achieved by effectively combining 2D and 3D information along with utilizing feature extraction and target detection techniques. The structural differences among components necessitate the application of varied feature extraction and detection strategies, ultimately enabling efficient automated cutting.

5. Transformer Copper Winding Stripping Methods

5.1. Constant-Tension Intelligent Unwinding System

A constant-tension intelligent unwinding system is a high-precision control system designed to manage the wire-unwinding process, and it is primarily used in the production and processing of cables, fiber optic cables, and similar materials. Its design goal is to prevent wire stretching, deformation, or breakage by accurately controlling the unwinding tension, thereby ensuring product quality and production efficiency. The system mainly comprises servo motors and drives, tension sensors, and intelligent controllers. The system structure is shown in Figure 11.
A constant-tension intelligent unwinding system achieves continuous and constant tension release of the wire from the bobbin through the collaboration of various hardware components, ensuring stability and reliability during the unwinding process. The system primarily consists of the following parts:
(1)
Driven payoff reel: The primary function of the follower pulley is to release the wire by making contact with it. The follower pulley rotates with the master pulley, allowing the wire to be released from the bobbin in a slow and even manner.
(2)
Active payoff reel: The primary function of the active wheel is to drive the rotation of the driven payoff wheel, thereby controlling the release speed and tension of the wire. It is typically powered by a motor that adjusts its speed based on feedback from the tension signal, ensuring constant tension during the payoff process.
(3)
Servo motor: The servomotor enables precise speed control of the active payoff wheel based on instructions from the tension controller. By finely adjusting the rotational speed of the active payoff wheel, it effectively controls the wire’s release speed, which is essential for maintaining constant wire tension.
(4)
Rollers: The rollers guide the wire’s path and ensure a smooth release from the bobbin. Typically arranged between the driven and active payoff pulleys, they help maintain a steady course for the wire.
(5)
Tension sensors: The primary function of the tension sensors is to detect the real-time tension of the wire and provide feedback signals to the tension controller. Common types include pressure and load sensors, which accurately measure the wire’s tension.
(6)
Controllers: The controller is a crucial component of the system, responsible for adjusting the rotational speed of the active payoff wheel based on feedback from the tension sensor to maintain constant wire tension. Tension controllers typically use PID algorithms or other neural network algorithms to quickly and accurately adjust the tension.
The constant tension control algorithm is based on PID tension control, represented by the following equation:
$u_{\mathrm{PID}}(t) = K_p e(t) + K_i \int_0^t e(\tau)\, d\tau + K_d \frac{d e(t)}{d t}$
where $e(t)$ is the error signal and $K_p$, $K_i$, $K_d$ are the proportional, integral, and derivative coefficients, respectively. The tension error, computed from the set tension value and the tension sensor feedback, enables tension stabilization control through PID. However, since wires of varying thicknesses and specifications require different settings, a single PID configuration may not provide stable control. Therefore, tension compensation is introduced as a feedforward quantity to enhance performance.
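A minimal sketch of PID tension control with an additive feedforward term is given below; the gains, sampling period, and feedforward value are illustrative assumptions, not the parameters of the deployed system.

```python
class TensionPID:
    """Discrete PID tension controller with an optional feedforward term."""
    def __init__(self, kp: float, ki: float, kd: float, dt: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint: float, measured: float, feedforward: float = 0.0):
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error + self.ki * self.integral
                + self.kd * derivative + feedforward)

pid = TensionPID(kp=2.0, ki=0.5, kd=0.05, dt=0.01)
print(pid.update(setpoint=10.0, measured=9.2, feedforward=0.3))
```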

5.2. Design of Adaptive Fuzzy Neural Networks

The core idea of adaptive fuzzy neural networks is to map input data into an implicit space through fuzzification, allowing for training and prediction within that space. These networks offer significant advantages over traditional neural networks when handling fuzzy and uncertain data.
The design of the adaptive fuzzy neural network involves three key components: data collection, network construction, and network training. The application of the adaptive neural network in the constant tension control system will be described in detail for each of these parts below.
(1)
Collection of Tension Data: High-precision tension sensors collect real-time wire tension data, including the tension magnitude, change trends, and other time series information. These collected data are then normalized for subsequent neural network training. Additionally, environmental state data—such as wire diameter and friction coefficient of the insulation layer—are gathered as auxiliary inputs for the neural network.
(2)
Adaptive Neural Network Construction: The adaptive neural network comprises several layers specifically designed for the constant-tension system. The input layer receives and normalizes various external input signals. The fuzzification layer applies Gaussian membership functions to fuzzify these inputs. The rule layer combines the fuzzy input variables using predefined fuzzy rules. The inference layer computes the final fuzzy output variables through fuzzy inference. The defuzzification layer converts these fuzzy outputs into specific numerical values. The final output layer processes these numerical outputs into control commands, which also serve as feedback signals to regulate the active payoff system's operation. These layers are represented by the equations below.
$u_{ij}(t) = \exp\!\left(-\frac{\left(a_i(t) - c_{ij}(t)\right)^2}{2 \sigma_{ij}^2(t)}\right), \quad j = 1, 2, \ldots, m$
where $u_{ij}$ is the output of the $j$-th membership function corresponding to the $i$-th input at time $t$, and $c_{ij}(t)$ and $\sigma_{ij}(t)$ are the center and width of the $j$-th membership function, respectively.
$\phi_j(t) = f_i \prod_{i=1}^{n} u_{ij}(t), \quad j = 1, 2, \ldots, m$
$f_i = \frac{1}{1 + \exp(-h_i)}$
$h_i = \phi_j(t-1) \, \lambda_j(t)$
where $\phi_j(t)$ is the output of the $j$-th rule-layer node at time $t$, and $\phi_j(t-1)$ denotes the output at the previous moment. $f_i$ represents the sigmoid function, $h_i$ is the internal variable, and $\lambda_j(t)$ refers to the feedback weight of the recursive link.
$$\bar{\phi}_j(t) = \frac{\phi_j(t)}{\sum_{j=1}^{m} \phi_j(t)}$$
$$y(t) = \sum_{j=1}^{m} \omega_j(t)\,\bar{\phi}_j(t)$$
where $y(t)$ is the network output, $\omega_j(t)$ are the output weights of the network, and $\bar{\phi}_j(t)$ is the output of the defuzzification (normalization) layer.
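The following is a minimal numerical sketch of one forward pass through the layer structure described above (Gaussian fuzzification, a product rule layer gated by the sigmoid of the recursive feedback, normalization, and a weighted output). The network size and the parameter values are illustrative assumptions only, not the trained parameters of our system.

```python
import numpy as np

n_inputs, n_rules = 3, 5                      # illustrative network size
centers = np.zeros((n_inputs, n_rules))       # c_ij, membership centers
widths = np.ones((n_inputs, n_rules))         # sigma_ij, membership widths
feedback_w = np.full(n_rules, 0.1)            # lambda_j, recursive feedback weights
output_w = np.full(n_rules, 0.2)              # omega_j, output weights
phi_prev = np.zeros(n_rules)                  # phi_j(t-1), previous rule outputs


def forward(a, phi_prev):
    # Fuzzification layer: Gaussian membership of each input to each rule.
    u = np.exp(-((a[:, None] - centers) ** 2) / (2.0 * widths ** 2))
    # Recursive link: sigmoid of the previous rule output times its feedback weight.
    gate = 1.0 / (1.0 + np.exp(-(phi_prev * feedback_w)))
    # Rule layer: product of memberships over all inputs, gated by the feedback.
    phi = gate * np.prod(u, axis=0)
    # Defuzzification (normalization) layer.
    phi_bar = phi / (np.sum(phi) + 1e-12)
    # Output layer: weighted sum gives the tension-control signal.
    return np.dot(output_w, phi_bar), phi


# e.g. inputs: measured tension (N), wire diameter (mm), friction coefficient.
y, phi_prev = forward(np.array([14.2, 0.8, 0.35]), phi_prev)
```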
To enhance the convergence of the network, this paper adjusts the parameters of the recursive fuzzy neural network using a gradient descent algorithm with an adaptive learning rate.
$$\eta = \eta_{\max} - d\,\frac{\eta_{\max} - \eta_{\min}}{D}$$
where η max and η min are the maximum and minimum learning rates, respectively, d is the current iteration step, and D is the total number of iteration steps. In the initial stage, the parameters are adjusted significantly to enable rapid optimization, saving time. As the number of iteration steps increases, the learning rate is gradually reduced to ensure the network’s stability.
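A small sketch of this schedule is given below; the learning-rate bounds are assumed values, and the commented line indicates how the schedule could drive a hypothetical gradient-descent update of the output weights.

```python
def adaptive_lr(step, total_steps, lr_max=0.1, lr_min=0.001):
    """Linearly decay the learning rate from lr_max to lr_min over training."""
    return lr_max - step * (lr_max - lr_min) / total_steps


# Hypothetical gradient-descent update of the output weights omega_j:
# output_w -= adaptive_lr(d, D) * grad_output_w
```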

5.3. Wire Intelligent Disconnection Technology

The wire removal system employs an ultrasonic cutter head to strip the outer insulation layer of the wire within the constant-tension payoff system. A control algorithm that leverages pressure sensors and vision sensors regulates the depth of the insulation cut, ensuring complete insulation removal while safeguarding the cutter head. The system is illustrated in Figure 12.
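The sketch below illustrates one possible way such a depth-regulation decision could be organized. The sensor and actuator interfaces (read_pressure, measure_insulation_thickness, set_feed_depth) and the numeric thresholds are hypothetical placeholders, not the actual hardware drivers or settings of our system.

```python
SAFETY_MARGIN_MM = 0.05    # assumed extra feed beyond the measured insulation thickness
PRESSURE_LIMIT_N = 8.0     # assumed force threshold protecting the ultrasonic cutter head


def regulate_cut(read_pressure, measure_insulation_thickness, set_feed_depth):
    """One decision step: feed slightly deeper than the insulation, back off on overload."""
    thickness = measure_insulation_thickness()       # from the vision sensor (Section 5.4)
    target_depth = thickness + SAFETY_MARGIN_MM      # ensure the insulation is fully cut
    if read_pressure() > PRESSURE_LIMIT_N:
        # The head has met the copper conductor or an obstruction: retract the margin.
        target_depth -= SAFETY_MARGIN_MM
    set_feed_depth(target_depth)
    return target_depth


# Example with stub sensors/actuators standing in for the real hardware drivers.
depth = regulate_cut(lambda: 3.2, lambda: 0.6, lambda d: None)
```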

5.4. RCF-Based Edge Detection Technique

By fusing feature maps from various scales, the RCF model effectively captures different levels of detail, excelling particularly in edge detection tasks. We utilize RCF as a detection network for measuring the thickness of the wire insulation layer, enabling two key functions: first, it detects the cutter head’s feed depth (i.e., the insulation thickness to be cut) in real time, ensuring the cutter head’s feed exceeds this thickness for effective cutting; second, it verifies the quality of the insulation layer peeling, providing feedback signals to the system in case of subpar quality.
The network is depicted in Figure 13. It comprises a backbone, an RCF module, and an edge thickness regression prediction head, utilizing ResNet-50 as the backbone network. The RCF module includes a series of convolutional layers and activation functions for fusing edge features and generating edge response maps. Meanwhile, the edge thickness regression prediction head predicts both the classification categories of wire-stripping results and the thickness of the wire insulation layer.
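As an illustration of this structure, the sketch below assembles a simplified RCF-style network in PyTorch with a ResNet-50 backbone, per-stage side branches fused into an edge response map, and a head producing a stripping-result classification plus a thickness value. The layer sizes, the two-class head, and the pooling strategy are assumptions for the sketch, not the exact configuration used in this work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision


class RCFThicknessNet(nn.Module):
    """Simplified RCF-style edge network with a thickness/classification head."""

    def __init__(self, num_classes=2):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        # Stem + the four residual stages of ResNet-50 serve as the backbone.
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool)
        self.stages = nn.ModuleList([backbone.layer1, backbone.layer2,
                                     backbone.layer3, backbone.layer4])
        channels = [256, 512, 1024, 2048]
        # RCF-style side branches: 1x1 convolutions producing per-stage edge responses.
        self.side = nn.ModuleList([nn.Conv2d(c, 1, kernel_size=1) for c in channels])
        self.fuse = nn.Conv2d(len(channels), 1, kernel_size=1)   # fuses the side outputs
        # Prediction head: stripping-result classification + insulation thickness regression.
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(channels[-1], num_classes + 1))

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = self.stem(x)
        side_maps = []
        for stage, side in zip(self.stages, self.side):
            feats = stage(feats)
            # Upsample every side response to the input resolution before fusing.
            side_maps.append(F.interpolate(side(feats), size=(h, w),
                                           mode="bilinear", align_corners=False))
        edge_map = torch.sigmoid(self.fuse(torch.cat(side_maps, dim=1)))
        out = self.head(feats)
        return edge_map, out[:, :-1], out[:, -1]    # edge map, class logits, thickness


model = RCFThicknessNet()
edge_map, cls_logits, thickness = model(torch.randn(1, 3, 224, 224))
```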
The loss function comprises classification loss and thickness regression loss. The classification loss uses cross-entropy loss to minimize the distance between the predicted and actual labels of the wire-cutting results. In contrast, the thickness regression loss employs squared loss to minimize the difference between the predicted and labeled thickness values.
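A standalone illustration of this combined objective, using random stand-ins for the network outputs and labels, might look as follows; the batch size and two-class setup are assumptions.

```python
import torch
import torch.nn.functional as F

# Random stand-ins for the network outputs and the labels of a batch of four samples.
cls_logits = torch.randn(4, 2)            # predicted stripping-result classes
thickness_pred = torch.rand(4)            # predicted insulation thickness (mm)
cls_target = torch.randint(0, 2, (4,))    # labeled stripping-result classes
thickness_target = torch.rand(4)          # labeled insulation thickness (mm)

# Cross-entropy for the classification branch, squared (MSE) loss for the thickness branch.
loss = F.cross_entropy(cls_logits, cls_target) + F.mse_loss(thickness_pred, thickness_target)
```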
In training the RCF network, rather than classifying samples into positive and negative, the network averages the labeling results from multiple labelers to obtain a final edge probability value between 0 and 1. A value of 0 indicates pixels not labeled by any labeler, while a value of 1 signifies that all labelers marked those pixels.
We define a parameter η ; when the averaged result exceeds η , it is classified as a positive sample, while a result of 0 indicates a negative sample. Any other pixel results are ignored. The loss for each pixel is then calculated based on these classifications.
$$l(X_i; W) = \begin{cases} \alpha \cdot \log\big(1 - P(X_i; W)\big) & \text{if } Y_i = 0 \\ 0 & \text{if } 0 < Y_i \leq \eta \\ \beta \cdot \log P(X_i; W) & \text{otherwise} \end{cases}$$
where
$$\alpha = \lambda \cdot \frac{|Y^+|}{|Y^+| + |Y^-|}$$
$$\beta = \frac{|Y^-|}{|Y^+| + |Y^-|}$$
Here, $|Y^+|$ and $|Y^-|$ denote the numbers of positive and negative samples, and $\lambda$ balances the two classes. The total loss is the sum of the per-pixel losses over the $K$ side outputs and the fused output:
$$L(W) = \sum_{i=1}^{|I|} \left( \sum_{k=1}^{K} l\big(X_i^{(k)}; W\big) + l\big(X_i^{\mathrm{fuse}}; W\big) \right)$$
where $|I|$ is the number of pixels in the image.
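For illustration, the class-balanced pixel loss above could be implemented as in the sketch below, written with explicit negative signs so that the minimized quantity is non-negative. The threshold eta and the balancing factor lam are assumed values, not those used in our training.

```python
import torch


def rcf_pixel_loss(pred, label, eta=0.5, lam=1.1, eps=1e-6):
    """Class-balanced edge loss: `pred` holds P(X_i; W), `label` the averaged annotations Y_i."""
    pos = label > eta                 # positive edge pixels (Y_i > eta)
    neg = label == 0                  # negative pixels (Y_i = 0)
    n_pos = pos.sum().float()
    n_neg = neg.sum().float()
    alpha = lam * n_pos / (n_pos + n_neg + eps)   # weight applied to negative pixels
    beta = n_neg / (n_pos + n_neg + eps)          # weight applied to positive pixels
    loss_neg = -alpha * torch.log(1.0 - pred[neg] + eps).sum()
    loss_pos = -beta * torch.log(pred[pos] + eps).sum()
    return loss_pos + loss_neg        # pixels with 0 < Y_i <= eta contribute nothing


# Example on a random prediction/label pair; in training this term would be summed
# over the K side outputs and the fused output of the network.
pred = torch.rand(1, 1, 64, 64)
label = torch.rand(1, 1, 64, 64)
loss = rcf_pixel_loss(pred, label)
```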
By incorporating scale information, the trained RCF network can accurately determine in real time if the cutting depth exceeds the insulation layer thickness. Additionally, by adjusting the threshold value, the network can modify its tolerance for the completeness of the wire insulation cut, enabling the autonomous assessment of whether the insulation has been fully removed.

6. Transformer Disassembly Results

6.1. Experimental Results of Oil Tank Cover Disassembly

We are establishing a new intelligent disassembly factory for electrical equipment; a photo of its transformer disassembly area is shown in Figure 14.
The experiments utilized an RGB-D camera for data acquisition, employing multimodal fusion to integrate 2D images and 3D point cloud data for cover plate identification and processing. A disassembly robot, equipped with a torque-adjustable robotic arm and a laser cutter, automated the disassembly of various covers. The experimental setup simulated an industrial environment, featuring transformer covers of different sizes and conditions, with 50 samples each of screw-fastened and welded covers. Each sample underwent surface treatment (such as rust or oil) or was sourced from actual end-of-life transformers to accurately replicate real industrial conditions.
To fully assess the disassembly performance of the oil tank cover, the following evaluation metrics were employed in the experiments:
(1)
Recognition accuracy: Measures the system’s ability to classify screw-fastened and welded covers accurately.
(2)
Disassembly time: Total time taken from identification to the completion of disassembly.
(3)
Success rate: The rate of successful disassembly without causing damage.
(4)
Safety assessment: Evaluates the impact on transformer core components during the disassembly process.
The results are illustrated in Table 1.
The recognition accuracy reached 99% for both screw-fastened and welded covers, demonstrating that the combination of RGB-D cameras and multimodal fusion technology effectively mitigates the adverse effects of contamination, such as oil and rust, on recognition performance. By leveraging both the texture information from 2D images and the depth data from 3D point clouds, the system maintains high accuracy even in complex industrial environments.
The average disassembly time for screw-fastened covers was 186.3 s, compared to 265.4 s for welded covers. The increased time for welded covers stems from the complexity of cutting through welded joints, necessitating a more intricate process. While screw-fastened covers allow for straightforward torque control with a robotic arm, welded covers involve laser cutting, edge recognition, and detailed cutting path planning, making the operation more time-consuming.
The success rates for disassembling screw-fastened and welded covers were 98% and 96%, respectively. The higher success rate for screw-fastened covers was due to precise torque control of the robotic arm, minimizing damage during disassembly. In contrast, the slightly lower success rate for welded covers was linked to issues like melt adhesion from localized melting of the joints. Enhancing laser cutting precision and optimizing laser power could further improve outcomes. In safety assessments, no damage occurred to the transformer core during the disassembly of either cover type, highlighting effective safety measures and the use of an expansion mask to protect critical components. Overall, the system demonstrated strong performance in recognition accuracy, efficiency, safety, and success rate, confirming the feasibility of the proposed approach.

6.2. Experimental Results of Internal Component Disassembly

To validate the effectiveness of the automated disassembly system for transformer internal components, 50 standardized experiments were conducted to evaluate system performance in disassembly time, success rate, and safety. The experiments took place in a controlled laboratory environment, with the transformer designated for disassembly shown in Figure 15.
A multi-view RGB-D camera was employed to capture 2D images and 3D point cloud data of the transformer interior, ensuring comprehensive coverage of all disassembled parts. The cameras were strategically mounted on the top and four sides of the transformer to effectively acquire data on screws, copper plates, winding studs, and silicon steel sheets.
High-performance computing equipment facilitated data processing, while an automated cutting robot arm performed the disassembly operations. A stable experimental environment was maintained to ensure the accuracy of the cutting process.
To objectively evaluate the system’s performance, three key metrics were established: disassembly time, disassembly success rate, and safety assessment. Disassembly time refers to the total duration from system activation to the completion of internal component removal. Disassembly success rate is the proportion of successful disassembly operations relative to the total number of experiments conducted. The safety assessment focuses on detecting any damage to critical transformer components, such as windings and key electronic parts, during the disassembly process.
In the 50 standardized transformer cutting experiments, the system achieved an average disassembly time of 876.6 s, with the fastest recorded at 848.2 s and the slowest at 896.7 s. Out of these operations, 49 were successful, resulting in a disassembly success rate of 98%. The sole failure occurred during silicon steel sheet processing due to a path planning error. Notably, no successful disassembly operation damaged core components, demonstrating effective cutting path planning that mitigated risks to critical parts. The results are visually represented in Figure 16 and summarized in Table 2.

6.2.1. Visualization of the Disassembly Process

To present the experiments more comprehensively and to further demonstrate the effectiveness of our method, we provide additional disassembly photos, as shown in Figure 17. Figure 17a illustrates the recognition of the transformer cover and internal components, while Figure 17b shows the process and results of dismantling discarded transformers.

6.2.2. Analysis of Disassembly Time

The results from the 50 experiments indicate that the system’s average disassembly time is 876.6 s, a significant advantage over traditional methods. This time reduction is largely due to the automated path planning and target detection techniques employed by the system, which prove particularly efficient for handling screws and winding studs. The transformer used in our experiments is a 10 kV, 400 kVA oil-immersed transformer. The number of screws is approximately 150 to 300, the length of the core welds is about 1.5 to 3 m, the length of the oil tank welds is about 5 to 8 m, and the weld length of other components is about 5 to 10 m.

6.2.3. Analysis of Disassembly Success Rate

The system achieved a success rate of 98%, demonstrating its strong adaptability to the complex internal structures of transformers. The sole failure during silicon steel sheet processing highlights the challenges posed by specific transformer designs, suggesting that further optimization of the path planning algorithm could enhance performance in such cases.

6.2.4. Analysis of Safety Assessment

The automated disassembly system has proven effective in safeguarding the core components of the transformer, with no damage reported to the windings or other critical parts during successful operations. This reliability highlights the system’s superiority over manual methods, as it minimizes the risk of mishandling and enhances the overall safety during the disassembly process.
The results from the 50 experiments validate the system’s effectiveness in disassembly time, success rate, and safety. Its automated path planning and target detection notably enhance disassembly efficiency while protecting core components. Future research should focus on optimizing the path planning algorithm to increase adaptability to complex structures and further improve the success rate of disassembly operations.

7. Conclusions

This paper presents an automated transformer disassembly method leveraging multimodal sensing and cooperative control. By fusing 2D images and 3D point cloud data from a multi-view RGB-D camera, alongside deep learning models and advanced cutting technologies, the system demonstrates high accuracy and efficiency in disassembling transformers. The experimental results highlight the system’s impressive disassembly success rate and safety, effectively handling key components like screws, copper plates, winding studs, and silicon steel sheets while preventing damage to critical transformer components.
The experimental results indicate that
  • The system achieves an average disassembly time of under 20 min, marking a significant reduction compared to traditional manual methods.
  • The disassembly success rate stands at 98%, showcasing the system’s adaptability to various complex structures and transformer types.
  • Importantly, there was no damage to core components in any successful disassembly experiments, ensuring a safe disassembly process.
The system notably enhances disassembly efficiency and simplifies operations through automated path planning and multimodal feature fusion technology. Its versatility allows it to adapt to various transformer types, presenting promising applications in waste transformer recycling.

Author Contributions

Conceptualization, L.W.; methodology, F.C.; software, Y.H.; validation, L.W.; formal analysis, F.C.; investigation, Z.Z.; resources, Y.H.; data curation, Y.H.; writing—original draft preparation, L.W.; writing—review and editing, Y.H. and K.Z.; visualization, K.Z.; supervision, Y.H. and Z.Z.; project administration, Y.H.; funding acquisition, Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This paper is funded by the State Grid Zhejiang Electric Power Science and Technology Project (5211WF240001 Research on Key Technology of Fine Dismantling and Green Full-volume Recovery of Distribution Transformer Based on Artificial Intelligence).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors Li Wang and Feng Chen were employed by the company State Grid Zhejiang Electric Power Co., Ltd., and the authors Yujia Hu and Zhiyao Zheng were employed by the company Zhejiang Huadian Equipment Testing and Research Insitute Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Table 1. Experimental results of disassembling the oil tank cover.

| Cover Type | Recognition Accuracy | Disassembly Time | Success Rate | Safety Assessment |
|---|---|---|---|---|
| Screw-fastened cover | 99% | 186.3 s | 98% | Undamaged |
| Welded cover | 99% | 265.4 s | 96% | Undamaged |
Table 2. Experimental results of disassembling the transformer interior.

| Disassembly Step | Average Disassembly Time | Disassembly Success Rate | Safety Assessment |
|---|---|---|---|
| Cutting cover | 290.5 s | 100% | Undamaged |
| Cutting winding studs | 305.7 s | 98% | Undamaged |
| Cutting or lifting silicon steel sheets | 280.4 s | 98% | Undamaged |