
CN117132914B - General power equipment identification large model method and system - Google Patents

General power equipment identification large model method and system

Info

Publication number
CN117132914B
CN117132914B (granted from application CN202311403372.4A)
Authority
CN
China
Prior art keywords
large model
prompt
implicit
image
sam
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311403372.4A
Other languages
Chinese (zh)
Other versions
CN117132914A
Inventor
杨必胜
陈驰
付晶
邵瑰玮
严正斐
邹勤
金昂
王治邺
吴少龙
孙上哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University (WHU)
Priority to CN202311403372.4A
Publication of CN117132914A
Application granted
Publication of CN117132914B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/17 Terrestrial scenes taken from planes or by drones
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S 10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S 10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The present invention proposes a general power equipment identification large model method and system, taking the oblique images collected by a UAV inspection system as the research object. To suit the characteristics of these data, a single-stage object detector is trained to detect object bounding boxes in the image as initial prompt information, and intermediate-layer features of the image encoder are used to generate prompts containing semantic category information. By fusing the two kinds of prompt information, a category-aware general power equipment segmentation model is formed. The method alleviates the problems that training models for power scenarios requires massive data yet covers only a few recognizable categories, and provides basic data for subsequent power equipment defect diagnosis and three-dimensional modeling.

Description

General power equipment identification large model method and system

Technical Field

The present invention belongs to the field of automatic identification of power equipment in inspection images captured during UAV power-line inspection, and proposes a general large model for power equipment identification.

Background Art

As transmission line mileage grows on a large scale year after year, ensuring the safe and stable operation of transmission lines faces a major challenge. With the explosive development of drones, computer vision, and embedded technologies, the power inspection mode is gradually shifting from traditional manual methods to fine-grained drone inspection. During inspection, the sensor pod mounted on the drone collects oblique images along the power corridor; object detection algorithms deployed on board or in the back end locate and identify power equipment and diagnose latent faults. The new "drone + visual recognition system" inspection mode is becoming mainstream because of its low cost and high efficiency.

In the field of computer vision, the SAM technique achieves high-performance zero-shot detection and excellent segmentation in general scenes. In power inspection, however, because UAV images are complex and target sizes vary widely, existing detection algorithms have low versatility, limited parameter capacity, and poor generalization, making it difficult to build a general large model of power equipment, and their performance on real transmission lines still lacks robustness. Against this background, adopting SAM in drone power inspection can effectively improve detection performance. But SAM is a category-agnostic instance segmentation method that relies heavily on a priori manual prompts, including points, boxes, and rough masks; these limitations make SAM unsuitable for fully automatic interpretation of power inspection images.

Summary of the Invention

Aiming at the poor versatility and the small number of recognizable categories of existing algorithms in power scenarios, the present invention takes inspection data processing in power scenarios as the research object and designs a general power equipment identification large model method and system with wide applicability and complete category coverage.

The general power equipment identification large model method designed by the present invention is characterized by the following steps:

Step 1: collect power inspection image data and construct a transmission line dataset;

Step 2: train a single-stage object detector to detect object bounding boxes in the dataset images as explicit prompts;

Step 3: process the image data from step 1 with the image encoder of the large model, then generate an implicit prompt containing semantic category information with the large model's prompter; fuse the two forms of prompts from steps 2 and 3, and pass the fused semantic category information into the large model to obtain general power equipment recognition results;

The fusion works as follows: the explicit prompts generated by the single-stage detector are aligned with the implicit prompt features generated by the intermediate layers of the large model, and the position information and category information of the two kinds of prompts are fused by computing the mapping between the explicit prompts and the implicit-prompt feature map.

In a preferred embodiment, step 1 is implemented as follows:

Step 1.1: first screen and clean the collected inspection images of various scenes;

Step 1.2: annotate the inspection images with labelImg. The transmission line scenes include cities, greenhouses, farmland, bushes, wasteland, lakes, and so on; the power equipment and foreign-intrusion categories cover transmission towers, insulators, grading rings, vibration dampers, spacers, shattered insulator discs, and hanging foreign objects;

Step 1.3: feed the processed oblique images into the single-stage object detector.
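As an illustration of the annotation step, labelImg can export Pascal-VOC-style XML files; a minimal parser for such a file (the class names and coordinates below are hypothetical, not from the patent's dataset) might look like:

```python
import xml.etree.ElementTree as ET

def parse_labelimg_xml(xml_text):
    """Parse a labelImg Pascal-VOC annotation into (class, box) tuples."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        box = obj.find("bndbox")
        # boxes are stored as integer pixel corners (xmin, ymin, xmax, ymax)
        coords = tuple(int(box.findtext(tag))
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((name, coords))
    return objects

# hypothetical annotation with two of the categories named in step 1.2
sample = """
<annotation>
  <object><name>insulator</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>300</xmax><ymax>420</ymax></bndbox>
  </object>
  <object><name>damper</name>
    <bndbox><xmin>50</xmin><ymin>10</ymin><xmax>90</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>
"""

print(parse_labelimg_xml(sample))
# [('insulator', (120, 80, 300, 420)), ('damper', (50, 10, 90, 60))]
```

The parsed boxes would then be converted to whatever label format the chosen detector expects.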

Further, the single-stage object detector uses YOLOv8.

In a preferred embodiment, step 2 proceeds as follows:

Step 2.1: the original image is rescaled and padded;

Step 2.2: after data augmentation and preprocessing, the image from step 2.1 is fed into the backbone network of the single-stage detector;

Step 2.3: multi-scale feature fusion is performed on the features extracted by the backbone network;

Step 2.4: the fused features are fed into the detection head of the single-stage detector to obtain the object categories and rough detection boxes contained in the image.

Further, the data augmentation and preprocessing in step 2.2 include horizontal and vertical flipping, contrast adjustment, rotation, mosaic augmentation, adaptive anchor-box computation, and adaptive grey-value padding.
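A minimal sketch of two of the listed augmentations, flipping and mosaic, using plain NumPy; the 640×640 mosaic canvas follows the size given in the detailed embodiment, while real YOLO pipelines additionally rescale and jitter the four tiles:

```python
import numpy as np

def flip_augment(img, horizontal=True, vertical=False):
    """Horizontal/vertical flips from the augmentation list."""
    if horizontal:
        img = img[:, ::-1]
    if vertical:
        img = img[::-1, :]
    return img

def mosaic(imgs, size=640):
    """Mosaic augmentation: tile four images onto one size x size canvas."""
    assert len(imgs) == 4
    half = size // 2
    canvas = np.zeros((size, size, 3), dtype=imgs[0].dtype)
    slots = [(0, 0), (0, half), (half, 0), (half, half)]  # quadrant corners
    for img, (y, x) in zip(imgs, slots):
        h, w = min(half, img.shape[0]), min(half, img.shape[1])
        canvas[y:y + h, x:x + w] = img[:h, :w]  # crop each tile to its quadrant
    return canvas

# four flat-colour dummy images stand in for inspection photos
patches = [np.full((400, 400, 3), v, dtype=np.uint8) for v in (10, 60, 120, 200)]
m = mosaic(patches)
print(m.shape)                     # (640, 640, 3)
print(m[0, 0, 0], m[0, 639, 0])    # 10 60  (top-left vs top-right quadrant)
```

In training, the bounding-box labels of the four source images would be shifted into the corresponding quadrants as well.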

In a preferred embodiment, step 3 uses the SAM large model, and proceeds as follows:

Step 3.1: the original image is passed through the pre-trained ViT backbone network to generate intermediate feature maps;

Step 3.2: the intermediate features obtained from the ViT backbone in the previous step are fed into a lightweight feature aggregation module to obtain fused semantic features;

Step 3.3: after the fused semantic features are obtained, a prompter generates implicit prompt embeddings for the SAM mask decoder;

Step 3.4: the explicit prompts generated by the single-stage detector are aligned with the implicit prompt features generated by the intermediate layers of SAM, and prompt fusion is then performed to extract rich semantic information.

Based on the same inventive concept, this scheme also designs a system implementing the general power equipment identification large model method, comprising:

a data acquisition module, which collects power inspection image data and constructs a transmission line dataset;

a single-stage object detector module, which takes the detected object bounding boxes as explicit prompts;

a general power equipment recognition module, which takes the intermediate-layer features of the image encoder in the large model as the input of a prompter to generate implicit prompts containing semantic category information, fuses the explicit and implicit prompts, and passes the fused semantic category information into the large model to obtain general power equipment recognition results;

wherein the fusion aligns the explicit prompts generated by the single-stage detector with the implicit prompt features generated by the intermediate layers of SAM, and fuses the position information and category information of the two kinds of prompts by computing the mapping between the explicit prompts and the implicit-prompt feature map.

Based on the same inventive concept, this scheme also designs an electronic device, comprising:

one or more processors;

a storage device for storing one or more programs;

when the one or more programs are executed by the one or more processors, the one or more processors implement the general power equipment identification large model method.

Based on the same inventive concept, this scheme also designs a computer-readable medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the general power equipment identification large model method is implemented.

Compared with the prior art, the present invention has the following advantages and beneficial effects:

The types of power equipment recognized and the applicable scenarios are greatly expanded, supporting automatic processing of inspection data across multiple time periods and scene types, such as cities, farmland, lakes, forests, grassland, and wasteland; overall detection accuracy improves markedly while a high recall rate is maintained; and the method shows strong robustness and high generalization ability in real transmission line scenarios.

The oblique images collected by the UAV inspection system are taken as the research object. To suit the characteristics of these data, a single-stage object detector identifies object bounding boxes in the image as initial prompt information, and intermediate-layer features of the image encoder are used to generate prompts containing semantic category information. By fusing the two kinds of prompt information, a category-aware general power equipment segmentation model is completed. The method alleviates the problem that massive inspection data in power scenarios can only be recognized into few categories, and provides basic data for subsequent power equipment defect diagnosis and three-dimensional modeling.

Brief Description of the Drawings

Figure 1 is a flow chart of an embodiment of the present invention.

Figure 2 is a structural diagram of the single-stage detector in an embodiment of the present invention.

Figure 3 is a detailed structural diagram of the single-stage detector in an embodiment of the present invention.

Figure 4 is a structural diagram of the decoder in an embodiment of the present invention.

Detailed Description of Embodiments

The technical solution of the present invention is described below with reference to the drawings and embodiments.

Embodiment 1

The general power equipment identification large model method designed by the present invention is illustrated in detail using inspection images collected by a UAV inspection system. The process of the method, shown in Figure 1, includes the following steps:

Step 1: collect power inspection image data and construct a transmission line dataset;

Step 2: train a single-stage object detector and take the detected object bounding boxes in the image data as explicit prompts.

Step 3: take the intermediate-layer features of the image encoder as the input of a prompter and generate prompts containing semantic category information. Fuse the two forms of prompts and pass the semantic category information into SAM, thereby obtaining general power equipment recognition results. The large model preferably uses SAM; other image segmentation models are also possible, but SAM is optimal in this embodiment.

Further, the specific implementation of step 1 includes the following sub-steps:

Step 1.1: first screen and clean the collected inspection images of various scenes.

Step 1.2: annotate the inspection images with labelImg. The transmission line scenes include cities, greenhouses, farmland, bushes, wasteland, lakes, and so on; the power equipment and foreign-intrusion categories cover transmission towers, insulators, grading rings, vibration dampers, spacers, shattered insulator discs, hanging foreign objects, and so on.

Step 1.3: feed the processed oblique images into the single-stage object detector YOLOv8.

Further, the specific implementation of step 2 includes the following sub-steps:

Step 2.1: the original image is rescaled and padded to a 640×640 resolution.
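The rescale-and-pad step can be sketched as a letterbox transform. The sketch below uses nearest-neighbour resizing and a grey pad value of 114 as assumptions, since the patent fixes neither detail; YOLO implementations typically interpolate bilinearly:

```python
import numpy as np

def letterbox(img, new_size=640, pad_value=114):
    """Resize the longer side to new_size, keep aspect ratio, pad the rest."""
    h, w = img.shape[:2]
    scale = new_size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbour index maps (dependency-free stand-in for cv2.resize)
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    canvas = np.full((new_size, new_size, img.shape[2]), pad_value, dtype=img.dtype)
    top, left = (new_size - nh) // 2, (new_size - nw) // 2  # centre the image
    canvas[top:top + nh, left:left + nw] = resized
    return canvas, scale, (top, left)

img = np.zeros((960, 1280, 3), dtype=np.uint8)   # dummy oblique frame
out, scale, (top, left) = letterbox(img)
print(out.shape, scale, top, left)   # (640, 640, 3) 0.5 80 0
```

The returned scale and offsets are kept so that predicted boxes can later be mapped back to original image coordinates.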

Step 2.2: data augmentation and preprocessing are applied to the rescaled image. The augmentation includes horizontal and vertical flipping, contrast adjustment, rotation, and mosaic augmentation; the preprocessing includes adaptive anchor-box computation and adaptive grey-value padding.

Step 2.3: the augmented image is fed into the backbone network of the single-stage detector. The original image first passes through a CBS convolution module with a 6×6 kernel and stride 2, and then through four joint modules each composed of a CBS block with a 3×3 kernel and stride 2 and a C2f block. The CBS module consists of a 2-D convolution, 2-D batch normalization, and a scaled exponential linear unit activation function. The C2f module learns residual features, adding more skip connections and extra split operations to obtain richer gradient-flow information while remaining lightweight.

Step 2.4: the features extracted by the backbone are fused across scales by a PAN-FPN multi-scale feature pyramid that incorporates C2f, outputting three feature maps at 80×80, 40×40, and 20×20.
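The three output scales correspond to backbone strides of 8, 16, and 32 on a 640×640 input, which can be checked directly:

```python
def fpn_grid_sizes(input_size=640, strides=(8, 16, 32)):
    """Spatial sizes of the three detection feature maps: the backbone
    downsamples the square input by strides of 8, 16 and 32."""
    return [input_size // s for s in strides]

sizes = fpn_grid_sizes()
print(sizes)                           # [80, 40, 20]
total_cells = sum(s * s for s in sizes)
print(total_cells)                     # 8400 prediction cells across the pyramid
```

The large 80×80 map handles small parts such as dampers, while the 20×20 map covers large structures such as towers.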

Step 2.5: the fused features are fed into the YOLO detection head. YOLOv8 uses a decoupled head that separates classification from box regression. The loss computation mainly involves a positive/negative sample assignment strategy and the loss terms themselves: YOLOv8 adopts task-aligned assignment, selecting positive samples according to a weighted score of classification and regression; the classification branch uses binary cross-entropy loss, and the regression branch uses Distribution Focal Loss and the CIoU loss. The head outputs the object categories and rough detection boxes contained in the image.
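As a sketch of the regression branch, the CIoU term combines overlap, centre distance, and aspect-ratio consistency; the dependency-free implementation below follows the published CIoU definition (it is not code from the patent), and the training loss would be 1 − CIoU:

```python
import math

def ciou(box_a, box_b, eps=1e-9):
    """Complete-IoU between two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection over union
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter + eps)
    # squared centre distance over squared enclosing-box diagonal
    rho2 = ((ax1 + ax2) - (bx1 + bx2)) ** 2 / 4 + ((ay1 + ay2) - (by1 + by2)) ** 2 / 4
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan((bx2 - bx1) / (by2 - by1 + eps))
                              - math.atan((ax2 - ax1) / (ay2 - ay1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v

print(ciou((0, 0, 2, 2), (0, 0, 2, 2)))   # ≈ 1.0 (perfect match)
print(ciou((0, 0, 2, 2), (2, 0, 4, 2)))   # ≈ -0.2 (adjacent, no overlap)
```

Unlike plain IoU, CIoU still provides a gradient when the boxes do not overlap, which helps the head converge on rough boxes.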

Further, step 3 uses the SAM large model, which comprises an encoder, a prompter, a fusion module, and a decoder. Its specific implementation includes the following sub-steps:

Step 3.1: the original UAV inspection image is fed into the SAM encoder, and intermediate feature maps are generated after the pre-trained ViT encoder backbone. The ViT backbone's pre-trained masked autoencoder processes the original image into intermediate features. The original image is rescaled to a 1024 resolution and discretized by a convolution with a 16×16 kernel and stride 16 into a 64×64×768 tensor; the tensor is flattened along the spatial and channel dimensions before entering the multi-layer ViT backbone, whose output is compressed to a feature dimension of 256 by two convolution layers with kernel sizes 1 and 3, respectively.
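The stride-16 discretization can be illustrated with a pure-NumPy patchify. Note that for a 3-channel image, 16×16×3 raw pixel values happen to equal the 768-dimensional token size quoted above; the actual encoder applies a learned projection rather than this raw flattening:

```python
import numpy as np

def patchify(img, patch=16):
    """Split an (H, W, C) image into non-overlapping patch vectors, mirroring
    the spatial tiling of a stride-16 patch-embedding convolution."""
    h, w, c = img.shape
    gh, gw = h // patch, w // patch
    x = img[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, c)
    # gather each patch's pixels into one vector per grid cell
    return x.transpose(0, 2, 1, 3, 4).reshape(gh, gw, patch * patch * c)

img = np.zeros((1024, 1024, 3), dtype=np.float32)   # dummy 1024-scale input
tokens = patchify(img)
print(tokens.shape)                   # (64, 64, 768)
print(tokens.reshape(-1, 768).shape)  # (4096, 768) after flattening
```

This makes the 64×64 grid and 4096-token sequence length of the SAM image encoder concrete.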

Step 3.2: the intermediate features obtained from the ViT backbone in the previous step are fed into the lightweight feature aggregation module of the large-model prompter to generate fused semantic features, from which the mask decoder then produces implicit prompts. This module learns to represent semantic features from the various intermediate feature layers of the ViT without increasing the computational complexity of the prompter.

The features of the SAM backbone are first downsampled: a 1×1 convolution layer reduces the channels to 1/16 of the original dimension, and a 3×3 convolution layer with stride 2 reduces the spatial dimensions. The final fusion stage consists of two 3×3 convolution layers and one 1×1 convolution layer, restoring the original channel dimension expected by the SAM mask decoder. Taking the fused semantic features obtained in the previous step as input, the prompter generates prompts for the SAM mask decoder. Candidate object boxes are first generated by a lightweight anchor-based region proposal network (RPN); the visual feature representation of each object is then obtained from the position-encoded feature map by RoI pooling. Three perception heads are derived from the visual features: a semantic head, a localization head, and a prompt head. The semantic head determines the specific object category; the localization head establishes the matching criterion between generated prompts and object instance masks, i.e., localization-based greedy matching; and the prompt head generates the prompt embeddings required by the SAM mask decoder. Because RoI pooling can cause subsequent prompt generation to lose position information relative to the whole image, the positional encoding (PE) is merged into the original fused features (Fagg).
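The shape arithmetic of the aggregation module can be sketched as follows; the matrix multiply stands in for the 1×1 channel-reduction convolution and strided slicing stands in for the 3×3 stride-2 convolution (the weights are random placeholders, not trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

def reduce_channels(feat, factor=16):
    """A 1x1 convolution acts per-pixel: a (C -> C/factor) matrix multiply."""
    c = feat.shape[-1]
    w = rng.standard_normal((c, c // factor)).astype(np.float32)
    return feat @ w

def halve_spatial(feat):
    """Stand-in for the 3x3 stride-2 convolution: keep every second pixel."""
    return feat[::2, ::2]

feat = rng.standard_normal((64, 64, 768)).astype(np.float32)  # ViT feature map
x = reduce_channels(feat)      # (64, 64, 48): channels cut to 1/16
x = halve_spatial(x)           # (32, 32, 48): spatial dims halved
print(x.shape)
```

The final fusion convolutions would then restore the 256-channel dimension the SAM mask decoder expects.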

The training loss of the model includes the binary classification loss and localization loss of the RPN, the classification loss of the semantic head, the regression loss of the localization head, and the segmentation loss of the frozen SAM mask decoder; the total loss is the sum of these terms.

Step 3.3: the explicit prompts generated by the single-stage detector are aligned with the implicit prompt features generated by the intermediate layers of SAM; by computing the mapping between the explicit prompts and the implicit-prompt feature map, the position information and category information of the two kinds of prompts are fused, providing more precise localization and classification accuracy.
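Aligning an explicit pixel-space box with the implicit-prompt feature map amounts to mapping box coordinates onto feature-map cells; a hypothetical helper for the 64×64 map of step 3.1 (illustrative only, not the patent's actual fusion code) could be:

```python
def box_to_feature_cells(box, img_size=1024, grid=64):
    """Map an explicit (x1, y1, x2, y2) pixel box onto the inclusive cell
    range of a grid x grid implicit-prompt feature map."""
    stride = img_size / grid           # pixels covered by one feature cell
    x1, y1, x2, y2 = box
    cx1, cy1 = int(x1 // stride), int(y1 // stride)
    cx2 = min(grid - 1, int((x2 - 1) // stride))   # x2/y2 are exclusive edges
    cy2 = min(grid - 1, int((y2 - 1) // stride))
    return (cx1, cy1, cx2, cy2)

print(box_to_feature_cells((128, 256, 512, 768)))  # (8, 16, 31, 47)
```

The implicit features inside this cell range would then carry the detector's category label during prompt fusion.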

Step 3.4: the mask decoder integrates the two embeddings output by the image encoder and the prompt encoder and decodes the final segmentation mask. A Transformer jointly processes the prompt-aligned image embeddings and four additional token embeddings: one IoU token embedding and three segmentation-result token embeddings. The token embeddings learned by the Transformer pass through the final task heads to produce the target result.

The technical solution and beneficial effects of the present invention are further illustrated below with a concrete application example.

On multiple oblique-image datasets collected by UAVs and processed by the method of the present invention, more than 12 kinds of power-scene components and more than 20 kinds of defects are recognized, the power equipment segmentation mIoU is 0.65, and the running speed reaches 60 FPS. This shows that the present invention achieves general power equipment segmentation covering more categories at higher accuracy while maintaining processing efficiency.

Embodiment 2

Based on the same inventive concept, this scheme also designs a general power equipment identification large model system, comprising: a data acquisition module, which collects power inspection image data and constructs a transmission line dataset;

a single-stage object detector module, which takes the detected object bounding boxes as explicit prompts;

a general power equipment recognition module, which processes the image data from the data acquisition module with the image encoder of the large model and then generates implicit prompts containing semantic category information with the prompter; it fuses the explicit and implicit prompts and passes the fused semantic category information into the large model to obtain general power equipment recognition results;

wherein the fusion aligns the explicit prompts generated by the single-stage detector with the implicit prompt features generated by the intermediate layers of SAM, and fuses the position information and category information of the two kinds of prompts by computing the mapping between the explicit prompts and the implicit-prompt feature map.

Since the system introduced in Embodiment 2 of the present invention is the system used to implement the general power equipment identification large model method of Embodiment 1, those skilled in the art can understand its specific structure and variations based on the method introduced in Embodiment 1, so they are not described again here.

Embodiment 3

Based on the same inventive concept, the present invention also provides an electronic device, comprising one or more processors and a storage device for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in Embodiment 1.

Since the device introduced in Embodiment 3 of the present invention is the electronic device used to implement the general power equipment identification large model method of Embodiment 1, those skilled in the art can understand its specific structure and variations based on the method introduced in Embodiment 1, so they are not described again here. All electronic devices used by the method of the embodiments of the present invention fall within the intended scope of protection of the present invention.

Embodiment 4

Based on the same inventive concept, the present invention further provides a computer-readable medium on which a computer program is stored; when the program is executed by a processor, the method described in Embodiment 1 is implemented.

Since the device introduced in Embodiment 4 of the present invention is the computer-readable medium used to implement the universal power equipment identification large model method of Embodiment 1, a person skilled in the art can, based on the method introduced in Embodiment 1, understand the specific structure and variations of this medium; they are therefore not described again here. Any computer-readable medium used by a method of the embodiments of the present invention falls within the intended scope of protection of the present invention.

The above content is a further detailed description of the present invention in combination with specific embodiments, and the specific implementation of the present invention shall not be regarded as being limited to these descriptions. A person of ordinary skill in the art to which the present invention belongs may also make several simple deductions or substitutions without departing from the concept of the present invention, all of which shall be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A method for identifying a large model of a universal power device, comprising the steps of:
step 1, acquiring power inspection image data and constructing a power transmission line data set;
step 2, training a single-stage target detector, and taking the detected image data target bounding box as an explicit prompt;
step 3, adopting a SAM large model, wherein the large model comprises a VIT encoder, a prompter, a fusion module and a decoder; processing the image data of step 1 with the VIT encoder of the large model, and generating an implicit prompt containing semantic category information with the prompter of the large model; the fusion module fuses the two forms of prompts obtained in steps 2 and 3, and transmits the fused semantic category information into the large model to obtain a universal power recognition result;
the fusion mode is that an explicit prompt generated by a single-stage target detector is aligned with an implicit prompt feature generated by a large model middle layer, and the position information and the category information of the two prompts are fused by calculating the mapping relation between the explicit prompt and the implicit prompt feature map.
2. The universal power device identification large model method of claim 1, wherein: the step 1 is specifically implemented as follows:
step 1.1, screening and cleaning acquired inspection images of various scenes;
step 1.2, labeling inspection images by using labelImg, wherein a transmission line scene comprises cities, greenhouses, farmlands, bushes, barren lands and lakes, and power equipment and external invaded object types cover transmission towers, insulators, equalizing rings, damper blocks, spacers, insulator burst sheets and hanging objects;
step 1.3, inputting the processed oblique image to a single-stage object detector.
3. The universal power device identification large model method of claim 2, wherein: the single-stage object detector employs YOLOv8.
4. The universal power device identification large model method of claim 1, wherein: the specific process of the step 2 is as follows:
step 2.1, performing scale change and filling on the original image;
step 2.2, the image processed in the step 2.1 is subjected to data enhancement and pretreatment and then is input into a backbone network of the single-stage target detector;
step 2.3, carrying out multi-scale feature fusion on the features extracted by the backbone network;
and 2.4, inputting the fused characteristics to a single-stage target detector, and acquiring the target category and the rough detection frame contained in the image.
5. The universal power device identification large model method of claim 4, wherein:
the data enhancement and preprocessing in step 2.2 comprises: horizontal and vertical overturn, contrast adjustment, rotation, mosaic enhancement, adaptive anchor frame calculation and adaptive gray filling.
6. The universal power device identification large model method of claim 1, wherein:
the specific implementation process of the SAM big model in the step 3 is as follows:
step 3.1, inputting original image data, and generating an intermediate feature map through a pre-training VIT encoder;
step 3.2, the lightweight feature aggregation module of the prompter generates fusion semantic features from the acquired intermediate feature map, and then the prompter is utilized to generate implicit prompt for the SAM mask decoder;
step 3.3, the fusion module aligns the explicit prompt generated by the single-stage target detector with the implicit prompt features generated by the SAM intermediate layer, and then performs prompt fusion to extract semantic information;
and 3.4, integrating two embeddings output by the VIT encoder and the prompter by the decoder, and decoding a final segmentation mask.
7. The universal power device identification large model method of claim 6, wherein: the original image input in step 3.1 is scaled to a 1024 scale.
8. A system for implementing the universal power device identification large model method of claim 1, characterized in that:
the power transmission line data acquisition module acquires power inspection image data and constructs a power transmission line data set;
the single-stage target detector module takes the detected target bounding box as an explicit prompt;
the universal power recognition module processes the image data in the data acquisition module by utilizing an image encoder in the large model, and generates an implicit prompt containing semantic category information by a large model prompter; fusing the explicit prompt and the implicit prompt, and transmitting the fused semantic category information into the large model to obtain a universal power recognition result;
the fusion mode is that an explicit prompt generated by a single-stage target detector is aligned with an implicit prompt feature generated by a SAM middle layer, and the position information and the category information of the two prompts are fused by calculating the mapping relation between the explicit prompt and the implicit prompt feature map.
9. An electronic device, comprising:
one or more processors;
a storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A computer readable medium having a computer program stored thereon, characterized by: the program, when executed by a processor, implements the method of any of claims 1-7.
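Claim 7 fixes the SAM input at a 1024 scale, and claim 2 feeds the detector's boxes forward as explicit prompts. A minimal sketch of that input preparation (NumPy only; nearest-neighbour resampling and the helper names are assumptions standing in for the model's actual preprocessing):

```python
import numpy as np

def resize_longest_side(image, target=1024):
    """Scale an image so its longest side equals `target`, preserving
    aspect ratio (nearest-neighbour resampling, illustrative only)."""
    h, w = image.shape[:2]
    scale = target / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Index back into the source image for each output pixel.
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    return image[rows[:, None], cols]

def scale_boxes(boxes, orig_hw, target=1024):
    """Rescale detector boxes (explicit prompts) into the resized frame."""
    scale = target / max(orig_hw)
    return np.asarray(boxes, dtype=float) * scale
```

Boxes rescaled with the same factor stay registered with the resized image, so they can serve directly as explicit prompts in the 1024-scale frame.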
CN202311403372.4A 2023-10-27 2023-10-27 General power equipment identification large model method and system Active CN117132914B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311403372.4A CN117132914B (en) 2023-10-27 2023-10-27 General power equipment identification large model method and system

Publications (2)

Publication Number Publication Date
CN117132914A (en) 2023-11-28
CN117132914B (en) 2024-01-30

Family

ID=88858669

Country Status (1)

Country Link
CN (1) CN117132914B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118052830B (en) * 2024-01-04 2024-12-10 重庆邮电大学 A multi-lesion retinal segmentation method based on implicit cues

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797890A (en) * 2020-05-18 2020-10-20 中国电力科学研究院有限公司 A method and system for detecting defects in transmission line equipment
WO2021189507A1 (en) * 2020-03-24 2021-09-30 南京新一代人工智能研究院有限公司 Rotor unmanned aerial vehicle system for vehicle detection and tracking, and detection and tracking method
CN114359754A (en) * 2021-12-21 2022-04-15 武汉大学 Unmanned aerial vehicle power inspection laser point cloud real-time transmission conductor extraction method
CN114842365A (en) * 2022-07-04 2022-08-02 中国科学院地理科学与资源研究所 Unmanned aerial vehicle aerial photography target detection and identification method and system
CN115294476A (en) * 2022-07-22 2022-11-04 武汉大学 Edge computing intelligent detection method and equipment for UAV power inspection
CN115359360A (en) * 2022-10-19 2022-11-18 福建亿榕信息技术有限公司 Power field operation scene detection method, system, equipment and storage medium
WO2023126914A2 (en) * 2021-12-27 2023-07-06 Yeda Research And Development Co. Ltd. METHOD AND SYSTEM FOR SEMANTIC APPEARANCE TRANSFER USING SPLICING ViT FEATURES
CN116824135A (en) * 2023-05-23 2023-09-29 重庆大学 Atmospheric natural environment test industrial product identification and segmentation method based on machine vision
CN116883893A (en) * 2023-06-28 2023-10-13 西南交通大学 Tunnel face underground water intelligent identification method and system based on infrared thermal imaging
CN116883801A (en) * 2023-07-20 2023-10-13 华北电力大学(保定) YOLOv8 target detection method based on attention mechanism and multi-scale feature fusion

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Adapting Segment Anything Model for Change Detection in VHR Remote Sensing Images; Lei Ding et al.; arXiv; pp. 1-9 *
Segment Anything; Alexander Kirillov et al.; arXiv; pp. 1-30 *
Remote sensing identification of easily floating objects along transmission channels based on improved UNet; Yang Zhi et al.; High Voltage Engineering; Vol. 49, No. 8; pp. 3395-3404 *
Analysis of intelligent identification methods for defects in UAV inspection images of transmission lines; Fu Jing et al.; High Voltage Engineering; Vol. 49; pp. 103-110 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant