CN117132914B - General power equipment identification large model method and system - Google Patents
- Publication number
- CN117132914B CN117132914B CN202311403372.4A CN202311403372A CN117132914B CN 117132914 B CN117132914 B CN 117132914B CN 202311403372 A CN202311403372 A CN 202311403372A CN 117132914 B CN117132914 B CN 117132914B
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06V20/17 — Terrestrial scenes taken from planes or by drones
- G06V10/26 — Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/806 — Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
- Y04S10/50 — Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention proposes a general large-model method and system for power equipment identification, taking the oblique images collected by a UAV inspection system as its research object. Tailored to the characteristics of these data, a single-stage object detector is trained to identify target bounding boxes in the images as initial (explicit) prompts, while the intermediate-layer features of the image encoder are used to generate prompts carrying semantic category information. By fusing the two kinds of prompts, a category-aware general power equipment segmentation model is formed. The method alleviates two problems of training models for power scenarios — the massive data required and the small number of categories recognized — and supplies base data for subsequent power equipment defect diagnosis and 3D modeling.
Description
Technical field
The invention relates to the automatic identification of power equipment in inspection images during UAV power line inspection, and proposes a general large model for power equipment identification.
Background
As transmission line mileage grows substantially year after year, ensuring the safe and stable operation of transmission lines has become a major challenge. With the rapid development of UAVs, computer vision, and embedded technology, power inspection is gradually shifting from traditional manual work to fine-grained UAV patrols. During an inspection flight, the sensor pod mounted on the UAV collects oblique images along the power corridor; object detection algorithms deployed on board or in the back end locate and identify power equipment and diagnose latent faults. The new "UAV + visual recognition" inspection mode is becoming mainstream because of its low cost and high efficiency.
In computer vision, SAM achieves high-performance zero-shot detection and excellent segmentation in general scenes. In power inspection, because UAV images are complex and target sizes vary widely, existing detection algorithms have low versatility, limited parameter counts, and poor generalization, which makes a general large model for power equipment hard to realize; their performance on real transmission lines still lacks robustness. Accordingly, applying SAM in UAV power inspection can effectively improve detection performance. However, SAM is a category-agnostic instance segmentation method that relies heavily on a priori manual prompts, including points, boxes, and coarse masks; these limitations make SAM unsuitable for fully automatic interpretation of power inspection images.
发明内容Contents of the invention
Summary of the invention
Aiming at the poor versatility and small number of recognized categories of existing algorithms in power scenarios, the invention takes inspection data processing in power scenarios as its research object and designs a general power equipment identification large-model method and system with wide application scenarios and complete recognition categories.
The general power equipment identification large-model method designed by the invention is characterized by the following steps:
Step 1: collect power inspection image data and construct a transmission line dataset;
Step 2: train a single-stage object detector and use the target bounding boxes it detects in the dataset images as explicit prompts;
Step 3: process the image data of step 1 with the large model's image encoder, then use the large model's prompter to generate implicit prompts containing semantic category information; fuse the two forms of prompts from steps 2 and 3 and pass the fused semantic category information into the large model to obtain general power equipment recognition results.
The fusion works as follows: the explicit prompts generated by the single-stage detector are aligned with the implicit prompt features generated by the large model's intermediate layers, and the mapping between the explicit prompts and the implicit prompt feature map is computed so that the positional information and category information of the two prompts are fused.
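The fusion step above can be sketched as a coordinate mapping: the detector's boxes live in image pixels, while the implicit prompt features live on a coarser grid, so the explicit prompts must be rescaled into the feature-map frame before their category labels are attached. A minimal sketch (the function name and the plain box rescaling are illustrative assumptions, not the patented fusion module):

```python
import numpy as np

def fuse_prompts(boxes_xyxy, classes, feat_hw, img_hw):
    """Map explicit detector boxes onto the implicit prompt feature grid.

    boxes_xyxy: (N, 4) pixel-space boxes from the one-stage detector.
    classes:    (N,) class ids predicted for each box.
    feat_hw:    (H, W) spatial size of the implicit prompt feature map.
    img_hw:     (h, w) size of the input image.
    Returns boxes in feature-grid coordinates plus their classes, i.e. the
    positional and category information the decoder consumes together.
    """
    sy = feat_hw[0] / img_hw[0]
    sx = feat_hw[1] / img_hw[1]
    scale = np.array([sx, sy, sx, sy])
    grid_boxes = boxes_xyxy * scale  # positions, now in the feature-map frame
    return grid_boxes, classes
```

For a 1024×1024 image and a 64×64 feature map, a 512×512 pixel box lands on a 32×32 cell region of the grid.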
In a preferred implementation, step 1 proceeds as follows:
Step 1.1: screen and clean the inspection images collected in various scenes;
Step 1.2: annotate the inspection images with labelImg, where the transmission line scenes include cities, greenhouses, farmland, bushes, wasteland, lakes, etc., and the power equipment and external intrusion categories cover transmission towers, insulators, grading rings, vibration dampers, spacers, shattered insulator discs, and wind-blown hanging objects;
Step 1.3: feed the processed oblique images into the single-stage object detector.
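labelImg writes Pascal VOC XML by default, while a YOLO-family detector consumes normalized `class cx cy w h` text lines; the dataset-construction step therefore typically includes a conversion like the dependency-free sketch below (the function name and class list are illustrative assumptions):

```python
import xml.etree.ElementTree as ET

def voc_to_yolo(xml_text, class_names):
    """Convert one labelImg (Pascal VOC) annotation into YOLO txt lines.

    Each output line is "class_id cx cy w h" with coordinates normalized
    to the image size, as expected by YOLO-style trainers.
    """
    root = ET.fromstring(xml_text)
    w = float(root.findtext("size/width"))
    h = float(root.findtext("size/height"))
    lines = []
    for obj in root.iter("object"):
        cls = class_names.index(obj.findtext("name"))
        b = obj.find("bndbox")
        x1, y1 = float(b.findtext("xmin")), float(b.findtext("ymin"))
        x2, y2 = float(b.findtext("xmax")), float(b.findtext("ymax"))
        cx, cy = (x1 + x2) / 2 / w, (y1 + y2) / 2 / h
        bw, bh = (x2 - x1) / w, (y2 - y1) / h
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")
    return lines
```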
Further, the single-stage object detector uses YOLOv8.
In a preferred implementation, step 2 proceeds as follows:
Step 2.1: the original images are rescaled and padded;
Step 2.2: after data augmentation and preprocessing, the images from step 2.1 are fed into the backbone of the single-stage detector;
Step 2.3: multi-scale feature fusion is applied to the features extracted by the backbone;
Step 2.4: the fused features are fed to the single-stage detector head to obtain the target categories and coarse detection boxes contained in the image.
Further, the data augmentation and preprocessing in step 2.2 include horizontal and vertical flipping, contrast adjustment, rotation, mosaic augmentation, adaptive anchor box calculation, and adaptive grey padding.
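Of the augmentations listed, mosaic is the least self-explanatory: four training images are stitched onto one canvas so that each batch exposes objects at varied scales and contexts. A simplified sketch (a real YOLO-style mosaic also jitters the crossing point and remaps the box labels accordingly):

```python
import numpy as np

def mosaic4(imgs, out_size=640):
    """Stitch four equally sized images into one mosaic canvas.

    Simplified: each image contributes its top-left out_size/2 crop, and the
    canvas is pre-filled with the conventional grey value 114.
    """
    s = out_size // 2
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)
    quads = [(0, 0), (0, s), (s, 0), (s, s)]  # (y, x) of each quadrant
    for img, (y, x) in zip(imgs, quads):
        canvas[y:y + s, x:x + s] = img[:s, :s]
    return canvas
```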
In a preferred implementation, step 3 uses the SAM large model; the specific process is as follows:
Step 3.1: the original image is fed through the pre-trained ViT backbone to generate intermediate feature maps;
Step 3.2: the intermediate features obtained from the ViT backbone are fed into a lightweight feature aggregation module to obtain fused semantic features;
Step 3.3: after the fused semantic features are obtained, a prompter generates implicit prompt embeddings for the SAM mask decoder;
Step 3.4: the explicit prompts generated by the single-stage detector are aligned with the implicit prompt features generated by the SAM intermediate layers, and prompt fusion is then performed to extract rich semantic information.
Based on the same inventive concept, this scheme also designs a system implementing the general power equipment identification large-model method, comprising:
a data acquisition module, which collects power inspection image data and constructs a transmission line dataset;
a single-stage object detector module, which uses the detected target bounding boxes as explicit prompts;
a general power recognition module, which uses the intermediate-layer features of the large model's image encoder as the prompter's input to generate implicit prompts containing semantic category information, fuses the explicit and implicit prompts, and passes the fused semantic category information into the large model to obtain general power equipment recognition results;
where the fusion works as follows: the explicit prompts generated by the single-stage detector are aligned with the implicit prompt features generated by the SAM intermediate layers, and the mapping between the explicit prompts and the implicit prompt feature map is computed so that the positional and category information of the two prompts are fused.
Based on the same inventive concept, this scheme also designs an electronic device, comprising:
one or more processors;
a storage device for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the general power equipment identification large-model method.
Based on the same inventive concept, this scheme also designs a computer-readable medium storing a computer program, characterized in that when the program is executed by a processor it implements the general power equipment identification large-model method.
Compared with the prior art, the invention has the following advantages and beneficial effects:
The types of power equipment recognized and the supported application scenarios are greatly expanded, covering automatic processing of inspection data from multiple time periods and scene types such as cities, farmland, lakes, forests, grassland, and wasteland; overall detection accuracy improves markedly while a high recall rate is maintained; and the method is robust and generalizes well in real transmission line scenarios.
The oblique images collected by a UAV inspection system are taken as the research object. Tailored to the data characteristics, a single-stage object detector identifies target bounding boxes in the images as initial prompts, and intermediate-layer features of the image encoder generate prompts containing semantic category information. By fusing the two kinds of prompts, a category-aware general power equipment segmentation model is completed. The method alleviates the low number of recognition categories for the massive inspection data in power scenarios and supplies base data for subsequent defect diagnosis and 3D modeling of power equipment.
Brief description of the drawings
Figure 1: flowchart of an embodiment of the invention.
Figure 2: structure of the single-stage detector in an embodiment of the invention.
Figure 3: detailed structure of the single-stage detector in an embodiment of the invention.
Figure 4: structure of the decoder in an embodiment of the invention.
Detailed description
The technical scheme of the invention is described below with reference to the drawings and embodiments.
Embodiment 1
The general power equipment identification large-model method designed by the invention is illustrated in detail using inspection images collected by a UAV inspection system. The procedure, shown in Figure 1, comprises the following steps:
Step 1: collect power inspection image data and construct a transmission line dataset;
Step 2: train a single-stage object detector and use the target bounding boxes detected in the image data as explicit prompts.
Step 3: use the intermediate-layer features of the image encoder as the prompter's input to generate prompts containing semantic category information; fuse the two forms of prompts and pass the semantic category information into SAM, thereby obtaining general power equipment recognition results. Preferably the large model is SAM; other image segmentation models are also possible, and SAM is optimal in this embodiment.
Further, the implementation of step 1 comprises the following sub-steps:
Step 1.1: screen and clean the inspection images collected in various scenes.
Step 1.2: annotate the inspection images with labelImg. The transmission line scenes include cities, greenhouses, farmland, bushes, wasteland, lakes, etc.; the power equipment and external intrusion categories cover transmission towers, insulators, grading rings, vibration dampers, spacers, shattered insulator discs, wind-blown hanging objects, etc.
Step 1.3: feed the processed oblique images into the single-stage object detector YOLOv8.
Further, the implementation of step 2 comprises the following sub-steps:
Step 2.1: the original image is rescaled and padded to 640×640.
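The scale-and-pad operation in step 2.1 is commonly implemented as a "letterbox": resize so the longer side reaches 640 while preserving aspect ratio, then pad the shorter side with grey. A dependency-free sketch (value 114 is the conventional YOLO fill; nearest-neighbour indexing stands in for a proper interpolated resize):

```python
import numpy as np

def letterbox(img, new_size=640, fill=114):
    """Resize keeping aspect ratio, then pad to a square new_size canvas.

    Returns the canvas, the scale ratio, and the (top, left) padding offsets,
    which are needed later to map detections back to the original image.
    """
    h, w = img.shape[:2]
    r = new_size / max(h, w)
    nh, nw = int(round(h * r)), int(round(w * r))
    # nearest-neighbour resize via index lookup (keeps the sketch stdlib-free)
    ys = (np.arange(nh) / r).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / r).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    canvas = np.full((new_size, new_size) + img.shape[2:], fill, img.dtype)
    top, left = (new_size - nh) // 2, (new_size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas, r, (top, left)
```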
Step 2.2: data augmentation and preprocessing are applied to the rescaled image. Augmentation includes horizontal and vertical flipping, contrast adjustment, rotation, and mosaic augmentation; preprocessing includes adaptive anchor box calculation and adaptive grey padding.
Step 2.3: the augmented image is fed into the single-stage detector backbone. The image first passes through a CBS convolution module with a 6×6 kernel and stride 2, then through four joint modules each composed of a CBS block (3×3 kernel, stride 2) and a C2f block. A CBS module consists of a 2-D convolution, 2-D batch normalization, and a scaled exponential linear unit activation function. The C2f module learns residual features, adding more skip connections and extra split operations to obtain richer gradient flow while staying lightweight.
Step 2.4: the features extracted by the backbone undergo multi-scale feature fusion in a PAN-FPN feature pyramid that incorporates C2f, producing three output feature maps at scales 80×80, 40×40, and 20×20.
Step 2.5: the fused features are fed to the YOLO detection head. YOLOv8 uses a decoupled head that separates classification from box regression. Loss computation covers the positive/negative sample assignment strategy and the loss terms: YOLOv8 adopts task-aligned assignment, selecting positive samples according to the score-weighted combination of classification and regression; the loss has a classification branch and a regression branch, where the classification branch still uses binary cross-entropy and the regression branch uses Distribution Focal Loss and the CIoU loss function. The output is the target categories and coarse detection boxes contained in the image.
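The CIoU term used by the regression branch augments plain IoU with a centre-distance penalty and an aspect-ratio consistency term. A scalar sketch for two xyxy boxes (illustrative only; degenerate zero-area boxes are not guarded against):

```python
import math

def ciou(b1, b2):
    """Complete-IoU between two (x1, y1, x2, y2) boxes."""
    x1, y1, x2, y2 = b1
    X1, Y1, X2, Y2 = b2
    iw = max(0.0, min(x2, X2) - max(x1, X1))
    ih = max(0.0, min(y2, Y2) - max(y1, Y1))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (X2 - X1) * (Y2 - Y1) - inter
    iou = inter / union
    # squared centre distance over squared enclosing-box diagonal
    cw = max(x2, X2) - min(x1, X1)
    ch = max(y2, Y2) - min(y1, Y1)
    rho2 = ((x1 + x2 - X1 - X2) ** 2 + (y1 + y2 - Y1 - Y2) ** 2) / 4
    c2 = cw ** 2 + ch ** 2
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan((x2 - x1) / (y2 - y1))
                              - math.atan((X2 - X1) / (Y2 - Y1))) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return iou - rho2 / c2 - alpha * v
```

The regression loss is then `1 - ciou(pred, target)`, so perfectly overlapping boxes incur zero loss.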
Further, step 3 uses the SAM large model, which comprises an encoder, a prompter, a fusion module, and a decoder. Its implementation comprises the following sub-steps:
Step 3.1: the original UAV inspection image is fed into the SAM encoder, and the pre-trained ViT encoder backbone generates intermediate feature maps. The ViT backbone, pre-trained as a masked autoencoder, processes the original image into intermediate features: the image is rescaled to 1024, a convolution with kernel size 16 and stride 16 discretizes it into a 64×64×768 tensor, and the tensor is flattened sequentially along the feature-map width and channel dimensions before entering the multi-layer ViT backbone. The ViT output is then compressed to 256 feature channels by two convolution layers with kernel sizes 1 and 3, respectively.
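The 16×16, stride-16 patch embedding in step 3.1 is equivalent to cutting the 1024×1024 image into a 64×64 grid of non-overlapping patches and linearly projecting each patch to 768 channels. A shape-level sketch (the random projection stands in for the learned convolution kernel):

```python
import numpy as np

def patchify(img, patch=16, dim=768, rng=np.random.default_rng(0)):
    """Stand-in for the ViT patch embedding: non-overlapping patches,
    each flattened and projected to `dim` channels."""
    h, w, c = img.shape
    gh, gw = h // patch, w // patch
    # cut into (gh, gw) patches of shape (patch, patch, c)
    patches = img.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    flat = patches.reshape(gh * gw, patch * patch * c)
    proj = rng.standard_normal((patch * patch * c, dim)) * 0.02
    return (flat @ proj).reshape(gh, gw, dim)  # 64 x 64 x 768 for a 1024 input
```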
Step 3.2: the intermediate features obtained from the ViT backbone in the previous step are fed into the prompter's lightweight feature aggregation module to generate fused semantic features, from which the mask decoder later receives implicit prompts. The module learns to represent the semantic features of the various intermediate ViT layers without increasing the prompter's computational complexity.
The aggregation takes the SAM backbone features and the downsampled features generated from them: a 1×1 convolution layer first reduces the channels from c to 1/16 of the original dimension, a 3×3 convolution layer with stride 2 then reduces the spatial dimension, and a final fusion block, consisting of two 3×3 convolution layers and one 1×1 convolution layer, restores the original channel size expected by the SAM mask decoder. Using the fused semantic features, the prompter generates prompts for the SAM mask decoder. An anchor-based lightweight region proposal network (RPN) first generates candidate target boxes, and RoI pooling extracts a visual feature representation of each object from the position-encoded feature map. Three perception heads branch off these visual features: a semantic head, a localization head, and a prompt head. The semantic head determines the specific target category; the localization head establishes the matching criterion between generated prompts and target instance masks, i.e. localization-based greedy matching; and the prompt head generates the prompt embeddings required by the SAM mask decoder. Because the RoI pooling operation can cause subsequent prompt generation to lose position information relative to the whole image, the positional encoding (PE) is merged into the original fused feature (F_agg).
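A shape-level sketch of the aggregation path just described: each intermediate ViT map is channel-reduced (a 1×1 convolution in the real module; plain channel-group averaging here), the reduced maps are summed, and a projection restores the decoder's 256 channels (tiling stands in for the learned fusion convolutions). Everything except the shapes is an illustrative stand-in:

```python
import numpy as np

def aggregate(feats, out_c=256):
    """Fuse a list of (H, W, c) intermediate feature maps into (H, W, out_c).

    Stand-ins: channel-group mean for the 1x1 reduction to c/16, summation
    for multi-layer fusion, tiling for the final channel-restoring convs.
    """
    reduced = [f.reshape(f.shape[0], f.shape[1], -1, 16).mean(-1)
               for f in feats]                      # each (H, W, c/16)
    fused = np.sum(reduced, axis=0)                 # (H, W, c/16)
    reps = -(-out_c // fused.shape[-1])             # ceil division
    return np.tile(fused, reps)[..., :out_c]        # (H, W, out_c)
```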
The model's loss comprises the binary classification and localization losses of the RPN, the classification loss of the semantic head, the regression loss of the localization head, and the segmentation loss of the frozen SAM mask decoder; the total loss is the sum of these terms.
Step 3.3: the explicit prompts generated by the single-stage detector are aligned with the implicit prompt features generated by the SAM intermediate layers; by computing the mapping between the explicit prompts and the implicit prompt feature map, the positional and category information of the two prompts are fused, providing more accurate localization and classification.
Step 3.4: the mask decoder integrates the two embeddings output by the image encoder and the prompt encoder and decodes the final segmentation mask. A Transformer learns the prompt-aligned image embedding together with four additional token embeddings: one IoU token embedding and three segmentation-result token embeddings. The token embeddings learned by the Transformer pass through the final task heads to produce the target results.
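In step 3.4, the three segmentation-result tokens decode to three candidate masks and the IoU token scores each of them; downstream, the highest-scoring candidate is typically the one kept. This selection is standard SAM multimask behaviour, sketched here as an assumption rather than a claim about the patent's exact head:

```python
def select_mask(masks, iou_scores):
    """Pick the candidate mask whose predicted IoU score is highest."""
    best = max(range(len(masks)), key=lambda i: iou_scores[i])
    return masks[best], iou_scores[best]
```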
The technical scheme and beneficial effects of the invention are further illustrated below with a concrete application.
On multiple oblique image datasets collected by UAVs and processed with the method of the invention, more than 12 kinds of power scene components and more than 20 kinds of defects are recognized, the power equipment segmentation mIoU reaches 0.65, and the running speed reaches 60 FPS. This shows that the invention achieves general power equipment segmentation over more categories and with higher accuracy while maintaining processing efficiency.
实施例二Embodiment 2
基于同一发明构思,本方案还设计一种通用电力设备识别大模型系统,包括数据获取模块,采集获取电力巡检影像数据,构建输电线路数据集;Based on the same inventive concept, this plan also designs a general power equipment identification large model system, including a data acquisition module to collect and obtain power inspection image data and build a transmission line data set;
单阶段目标检测器模块,将检测到的目标边界框作为显式提示;A single-stage object detector module that uses detected object bounding boxes as explicit cues;
通用电力识别模块,利用大模型中图像编码器处理数据获取模块中影像数据,再经提示器生成包含语义类别信息的隐式提示;融合显示提示和隐式提示,将融合后的语义类别信息传入大模型中,获得通用电力识别结果;The general electric power recognition module uses the image encoder in the large model to process the data to obtain the image data in the module, and then generates implicit prompts containing semantic category information through the prompter; fuses the display prompts and implicit prompts, and transmits the fused semantic category information. into a large model to obtain general power identification results;
其中,融合方式为,将单阶段检测器生成的显式提示与SAM中间层生成的隐式提示特征对齐,通过计算显式提示与隐式提示特征图的映射关系,融合两种提示的位置信息与类别信息。Among them, the fusion method is to align the explicit prompts generated by the single-stage detector with the implicit prompt features generated by the SAM intermediate layer, and fuse the position information of the two prompts by calculating the mapping relationship between the explicit prompts and the implicit prompt feature maps. and category information.
由于本发明实施例二所介绍的设备为实施本发明实施例一种通用电力设备识别大模型方法所采用的系统,故而基于本发明实施例一介绍的方法,本领域所属技术人员能够了解该电子设备的具体结构及变形,故而在此不再赘述。Since the equipment introduced in Embodiment 2 of the present invention is a system used to implement a universal power equipment identification large model method in Embodiment 1 of the present invention, based on the method introduced in Embodiment 1 of the present invention, those skilled in the art can understand the electronic The specific structure and deformation of the equipment will not be described again here.
Embodiment 3
Based on the same inventive concept, the present invention also provides an electronic device comprising one or more processors and a storage apparatus for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in Embodiment 1.
Since the device introduced in Embodiment 3 of the present invention is the electronic device used to implement the general power equipment identification large model method of Embodiment 1 of the present invention, based on the method introduced in Embodiment 1, those skilled in the art can understand the specific structure and variations of this electronic device, which are therefore not described again here. Any electronic device used to implement a method of the embodiments of the present invention falls within the intended scope of protection of the present invention.
Embodiment 4
Based on the same inventive concept, the present invention also provides a computer-readable medium on which a computer program is stored; when the program is executed by a processor, the method described in Embodiment 1 is implemented.
Since the device introduced in Embodiment 4 of the present invention is the computer-readable medium used to implement the general power equipment identification large model method of Embodiment 1 of the present invention, based on the method introduced in Embodiment 1, those skilled in the art can understand the specific structure and variations of this device, which are therefore not described again here. Any electronic device used to implement a method of the embodiments of the present invention falls within the intended scope of protection of the present invention.
The above content is a further detailed description of the present invention in combination with specific embodiments, and the specific implementation of the present invention shall not be deemed limited to these descriptions. For those of ordinary skill in the technical field to which the present invention belongs, several simple deductions or substitutions may be made without departing from the concept of the present invention, all of which shall be regarded as falling within the scope of protection of the present invention.