
CN117132914B - Method and system for identifying large model of universal power equipment - Google Patents

Method and system for identifying large model of universal power equipment

Info

Publication number
CN117132914B
CN117132914B (application CN202311403372.4A)
Authority
CN
China
Prior art keywords
prompt
large model
image
implicit
universal power
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311403372.4A
Other languages
Chinese (zh)
Other versions
CN117132914A (en)
Inventor
杨必胜
陈驰
付晶
邵瑰玮
严正斐
邹勤
金昂
王治邺
吴少龙
孙上哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202311403372.4A priority Critical patent/CN117132914B/en
Publication of CN117132914A publication Critical patent/CN117132914A/en
Application granted granted Critical
Publication of CN117132914B publication Critical patent/CN117132914B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/10 - Terrestrial scenes
    • G06V20/17 - Terrestrial scenes taken from planes or by drones
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 - INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S - SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 - Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 - Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a universal power equipment identification method and system based on a large model, taking oblique images acquired by an unmanned aerial vehicle (UAV) inspection system as the research object. In view of the data characteristics, a single-stage target detector is trained to recognize target bounding boxes in the image as initial prompt information, and the middle-layer features of the image encoder are used to generate prompts containing semantic category information. By fusing the two kinds of prompt information, a class-aware universal power equipment segmentation model is formed. The method effectively addresses the problems that training a model in power scenes requires huge amounts of data and that few categories can be identified, and provides basic data for subsequent defect diagnosis and three-dimensional modeling of power equipment.

Description

Method and system for identifying large model of universal power equipment
Technical Field
The invention belongs to the field of automatic identification of power equipment in inspection images for unmanned aerial vehicle power inspection, and provides a universal power equipment identification large model.
Background
With the mileage of transmission lines increasing year by year, guaranteeing their safe and stable operation has become a major challenge. With the rapid development of unmanned aerial vehicle (UAV), computer vision and embedded technologies, the power inspection mode is gradually shifting from the traditional manual mode to refined UAV inspection. During UAV inspection, a sensor pod carried on the UAV collects oblique images along the power corridor, and the power equipment is located and identified through a target detection algorithm deployed on the UAV or in the background to diagnose hidden hazards. This new power inspection mode combining UAVs with a visual recognition system is becoming the mainstream inspection mode due to its low cost and high efficiency.
In the field of computer vision, the SAM (Segment Anything Model) technology achieves high-performance zero-shot detection and excellent segmentation performance in general scenes. In the field of power inspection, because UAV images are complex and target scales vary greatly, conventional detection algorithms have low universality, limited parameter counts and poor generalization capability; a universal large model for power equipment is therefore difficult to realize, and detection performance still lacks robustness on actual transmission lines. On this basis, adopting the SAM technology in UAV power inspection can effectively improve detection performance. However, SAM is a class-agnostic instance segmentation method that relies heavily on a priori manual cues, including points, boxes and rough masks; this limitation makes SAM unsuitable for fully automatic interpretation of power inspection images.
Disclosure of Invention
Aiming at the problems that existing algorithms in power scenes have poor generality and few recognizable categories, the invention takes the processing of inspection data in power scenes as the research object and designs a universal power equipment identification large model method and system with wide application scenes and complete identification categories.
The invention relates to a method for identifying a large model of universal power equipment, which is characterized by comprising the following steps:
step 1, acquiring power inspection image data and constructing a power transmission line data set;
step 2, training a single-stage target detector, and detecting a target boundary box from the data set image as an explicit prompt;
step 3, processing the image data in the step 1 by using an image encoder in the large model, and generating an implicit prompt containing semantic category information by using a prompter of the large model; fusing the prompts in the two forms in the steps 2 and 3, and transmitting the fused semantic category information into a large model to obtain a universal power identification result;
the fusion mode is that an explicit prompt generated by a single-stage detector is aligned with an implicit prompt feature generated by a large model middle layer, and the position information and the category information of the two prompts are fused by calculating the mapping relation between the explicit prompt and the implicit prompt feature map.
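The fusion step described above can be sketched as follows. This is a minimal, hypothetical illustration of mapping a detector box (in image pixels) onto the implicit prompt feature grid and injecting category information as a one-hot vector; the actual patent module learns this fusion, and the sizes used (1024-pixel input, 64×64 grid) are assumptions taken from the embodiment.

```python
import numpy as np

def box_to_feature_cells(box_xyxy, img_size=1024, feat_size=64):
    """Map an explicit box prompt (image pixels) onto the implicit prompt
    feature grid via the image-to-feature-map scale."""
    scale = feat_size / img_size
    x1, y1, x2, y2 = box_xyxy
    fx1, fy1 = int(np.floor(x1 * scale)), int(np.floor(y1 * scale))
    fx2, fy2 = int(np.ceil(x2 * scale)), int(np.ceil(y2 * scale))
    return fx1, fy1, fx2, fy2

def fuse_prompts(implicit_feat, box_xyxy, class_id, img_size=1024):
    """Fuse position (box) and category information into the implicit prompt
    feature map: add a one-hot class vector to the feature cells covered by
    the box (a hypothetical fusion rule, for illustration only)."""
    c, h, w = implicit_feat.shape
    fx1, fy1, fx2, fy2 = box_to_feature_cells(box_xyxy, img_size, h)
    onehot = np.zeros(c)
    onehot[class_id % c] = 1.0
    fused = implicit_feat.copy()
    fused[:, fy1:fy2, fx1:fx2] += onehot[:, None, None]
    return fused
```

For example, a 128–256 pixel box on a 1024-pixel image covers cells 8–16 of a 64×64 grid, so only those cells of the chosen class channel are marked.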
In a preferred manner, the following steps are specifically implemented in step 1:
step 1.1, screening and cleaning acquired inspection images of various scenes;
step 1.2, labeling inspection images by using labelImg, wherein a transmission line scene comprises cities, greenhouses, farmlands, shrubs, barren lands, lakes and the like, and power equipment and external invaded object types cover transmission towers, insulators, equalizing rings, damper blocks, spacers, insulator burst sheets and hanging objects;
step 1.3, inputting the processed oblique image to a single-stage object detector.
Further, the single-stage object detector employs YOLOv8.
In a preferred manner, the specific process of step 2 is as follows:
step 2.1, the original image is subjected to scale change and filling;
step 2.2, the image processed in the step 2.1 is subjected to data enhancement and pretreatment and then is input into a backbone network of the single-stage detector;
step 2.3, carrying out multi-scale feature fusion on the features extracted by the backbone network;
and 2.4, inputting the fused characteristics to a single-stage target detector, and acquiring the target category and the rough detection frame contained in the image.
Further, the data enhancement and preprocessing in step 2.2 includes: horizontal and vertical overturn, contrast adjustment, rotation, mosaic enhancement, adaptive anchor frame calculation and adaptive gray filling.
In a preferred mode, a SAM large model is adopted in the step 3, and the specific process is as follows:
step 3.1, inputting an original image, and generating an intermediate feature map through a pre-trained ViT backbone network;
step 3.2, inputting the intermediate features obtained in the previous step through a ViT backbone network into a lightweight feature aggregation module to obtain fused semantic features;
step 3.3, after the fused semantic features are obtained, generating implicit hint embedding for the SAM mask decoder by using a prompter;
and 3.4, aligning the explicit prompt generated by the single-stage detector with the implicit prompt feature generated by the SAM middle layer, and then carrying out prompt fusion to extract rich semantic information.
Based on the same inventive concept, the scheme also designs a system for realizing the large model identification method of the universal power equipment:
the power transmission line data acquisition module acquires power inspection image data and constructs a power transmission line data set;
the single-stage target detector module takes the detected target boundary box as an explicit hint;
the universal power recognition module is used for feeding the middle-layer features of the image encoder in the large model into the prompter to generate implicit prompts containing semantic category information; fusing the explicit prompt and the implicit prompt, and transmitting the fused semantic category information into the large model to obtain a universal power recognition result;
the fusion mode is that an explicit prompt generated by a single-stage detector is aligned with an implicit prompt feature generated by a SAM middle layer, and the position information and the category information of the two prompts are fused by calculating the mapping relation between the explicit prompt and the implicit prompt feature map.
Based on the same inventive concept, the scheme also designs electronic equipment, which comprises:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a generic power device identification large model approach.
Based on the same inventive concept, the present solution further provides a computer readable medium having a computer program stored thereon, wherein: the program, when executed by the processor, implements a generic power device identification large model method.
Compared with the prior art, the invention has the following advantages and beneficial effects:
The identified power equipment types and application scenes are greatly expanded, so that automatic processing of multi-period, multi-type scene inspection data (such as cities, farmlands, lakes, forests, grasslands and barren lands) can be supported; the overall detection precision is obviously improved while a high recall rate is maintained; and the method is robust and generalizes well in actual transmission line scenes.
Oblique images acquired by the UAV inspection system are taken as the research object. In view of the data characteristics, a single-stage target detector is adopted to identify target bounding boxes in the image as initial prompt information, and the middle-layer features of the image encoder are used to generate prompts containing semantic category information. By fusing the two kinds of prompt information, a class-aware universal power equipment segmentation model is completed. The method effectively solves the problem that massive inspection data in power scenes cover few recognizable categories, and provides basic data for subsequent defect diagnosis and three-dimensional modeling of power equipment.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 is a block diagram of a single-stage detector in an embodiment of the invention.
FIG. 3 is a schematic diagram of a single-stage detector in accordance with an embodiment of the present invention.
Fig. 4 is a block diagram of a decoder in an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is described below with reference to the accompanying drawings and examples.
Example 1
The invention provides a general power equipment large model identification method, which is characterized in that inspection images acquired by an unmanned aerial vehicle inspection system are selected to specifically explain the method. The process of the method is shown in the attached figure 1, and comprises the following steps:
step 1, acquiring power inspection image data and constructing a power transmission line data set;
and step 2, training a single-stage target detector, and taking the detected image data target bounding box as an explicit prompt.
And 3, the middle-layer features of the image encoder form the input of a prompter, which generates a prompt containing semantic category information. The two forms of prompts are fused, and the semantic category information is transmitted into the SAM, thereby obtaining a universal power recognition result. Preferably, the large model adopts SAM; other image segmentation models are also possible, but in this embodiment the SAM model is optimal.
Further, the specific implementation of the step 1 includes the following sub-steps:
step 1.1, firstly screening and cleaning the collected inspection images of various scenes.
And 1.2, labeling the inspection image by using labelImg. The power transmission line scene comprises cities, greenhouses, farmlands, shrubs, barren lands, lakes and the like, and the power equipment and external invaded object types cover power transmission towers, insulators, equalizing rings, shock-proof hammers, spacing bars, insulator burst pieces, hanging objects and the like.
Step 1.3, the processed oblique image is input to a single-stage object detector YOLOv8.
Further, the specific implementation of the step 2 includes the following sub-steps:
step 2.1, scaling the original image to 640 x 640 scale through scale change and filling.
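The scale change and filling of step 2.1 can be sketched as a standard letterbox transform. This numpy-only version (nearest-neighbour resize, gray value 114) is an assumption of the usual YOLO-style input preparation, not code from the patent.

```python
import numpy as np

def letterbox(img, new_size=640, pad_value=114):
    """Scale-and-pad ("letterbox") an image to new_size x new_size while
    preserving aspect ratio, filling the border with a uniform gray value
    (adaptive gray filling)."""
    h, w = img.shape[:2]
    r = new_size / max(h, w)                       # scale so the long side fits
    nh, nw = int(round(h * r)), int(round(w * r))
    # nearest-neighbour resize via index sampling (keeps the sketch numpy-only)
    ys = (np.arange(nh) / r).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / r).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    out = np.full((new_size, new_size) + img.shape[2:], pad_value, dtype=img.dtype)
    top, left = (new_size - nh) // 2, (new_size - nw) // 2
    out[top:top + nh, left:left + nw] = resized
    return out, r, (top, left)
```

A 480×960 image, for example, is scaled by 2/3 to 320×640 and centered with 160 gray rows above and below.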
And 2.2, carrying out data enhancement and pretreatment on the zoomed image, wherein the enhancement means comprise horizontal and vertical overturning, contrast adjustment, rotation, mosaic enhancement and the like. The preprocessing comprises adaptive anchor frame calculation and adaptive gray filling.
And 2.3, inputting the image after data enhancement into the single-stage detector backbone network. The original image first passes through a convolutional CBS module with a 6×6 kernel and stride 2, and then through a combination of four CBS modules (3×3 kernel, stride 2) and C2f modules. A CBS module consists of one two-dimensional convolution, one two-dimensional batch normalization layer and a SiLU (sigmoid-weighted linear unit) activation function. The C2f module learns residual features; more skip connections and additional split operations are added, obtaining richer gradient-flow information while remaining lightweight.
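The spatial sizes produced by the stem and the four stride-2 stages described above can be checked with a short size-tracing sketch; the padding values are the usual choices for these kernel sizes and are assumptions, not stated in the patent.

```python
def conv_out(size, k, s, p):
    """Spatial size after a 2-D convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * p - k) // s + 1

def backbone_scales(inp=640):
    """Trace the stem (6x6 kernel, stride 2, assumed padding 2) and the four
    3x3 stride-2 CBS+C2f stages (assumed padding 1) of a YOLOv8-style
    backbone, returning the four stage output sizes."""
    s = conv_out(inp, k=6, s=2, p=2)               # stem: 640 -> 320
    sizes = []
    for _ in range(4):
        s = conv_out(s, k=3, s=2, p=1)             # each stage halves the size
        sizes.append(s)
    return sizes                                    # strides 4, 8, 16, 32
```

The last three sizes (80, 40, 20) match the three PAN-FPN output scales named in step 2.4.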
And 2.4, performing multi-scale feature fusion on the features extracted through the backbone network through a multi-scale feature pyramid PAN-FPN fused with C2f, and outputting three feature graphs with the feature graph scales of 80×80, 40×40 and 20×20.
And 2.5, inputting the fused features into the YOLO detection head. YOLOv8 employs a decoupled head, separating classification from regression. The loss calculation mainly comprises a positive/negative sample assignment strategy and the loss terms: YOLOv8 adopts the task-aligned assignment principle, selecting positive samples according to a weighted score of classification and regression. The loss comprises classification and regression branches, where the classification branch still uses binary cross-entropy loss and the regression branch uses Distribution Focal Loss and the CIoU loss. The target categories and rough detection boxes contained in the image are output.
Further, step 3 adopts a SAM big model, the model comprises an encoder, a prompter, a fusion module and a decoder, and the specific implementation of the method comprises the following sub-steps:
and 3.1, inputting original unmanned aerial vehicle inspection image data into an SAM encoder, and generating an intermediate feature map after a pre-trained VIT encoder backbone network. The pre-training mask of the VIT backbone processes the original image from the encoder as an intermediate feature. The original image is scaled to 1024 scales, the convolution with the convolution kernel size of 16 and the step size of 16 is adopted to discretize the image into vectors with the size of 64 multiplied by 768, the vectors are sequentially flattened in the width of the feature map and the channel dimension and then enter a multi-layer VIT backbone network, the feature dimension of the VIT backbone network output vector is 256 through the convolution of two layers, and the sizes of the two layers of convolution kernels are 1 and 3 respectively.
And 3.2, inputting the intermediate features obtained from the ViT backbone network in the previous step into the lightweight feature aggregation module of the large-model prompter to generate fused semantic features, from which implicit prompts are generated for the SAM mask decoder. The module learns to represent semantic features from the various intermediate feature layers of ViT without increasing the computational complexity of the prompter; the process is formulated as follows:
Let F_i denote the intermediate features of the SAM backbone and F_i^down the corresponding downsampled features:

F_i^down = Conv_{3×3, s=2}(Conv_{1×1}(F_i))

where the 1×1 convolutional layer first reduces the channel dimension from c to c/16, and the 3×3 convolutional layer with stride 2 then reduces the spatial dimension. The fused semantic feature is obtained by aggregating the downsampled features of the layers and restoring the channel size expected by the SAM mask decoder:

F_agg = Conv_fuse(Concat_i(Conv_{3×3}(F_i^down)))

where Conv_{3×3} denotes a convolutional layer of size 3×3 and Conv_fuse denotes the final fusing layers, comprising two 3×3 convolutional layers and one 1×1 convolutional layer. The semantic features obtained in the previous step are then input to the prompter, which generates the prompt for the SAM mask decoder. Candidate target boxes are first generated by a lightweight anchor-based region proposal network (RPN). Visual feature representations of individual objects are obtained from the position-encoded feature map by RoI pooling. Three perception heads branch from the visual features: a semantic head H_sem, a localization head H_loc and a prompt head H_prompt. The semantic head determines the specific target category; the localization head establishes the matching criterion, i.e. location-based greedy matching, between a generated prompt and the target instance mask; and the prompt head generates the prompt embedding required by the SAM mask decoder. Because RoI pooling causes the subsequent prompt generation to lose position information relative to the whole image, a position encoding (PE) is incorporated into the original fused feature F_agg:

F_agg' = F_agg + PE
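The channel and spatial reductions of the aggregation module can be sanity-checked with a small shape-bookkeeping sketch. The 1/16 channel reduction, stride-2 spatial halving and 256-channel decoder width follow the text; treating four aggregated ViT layers and ignoring the learned weights are simplifying assumptions.

```python
def aggregate_shapes(c=768, hw=64, n_layers=4):
    """Shape bookkeeping for the lightweight aggregation module: each ViT
    intermediate feature of shape (c, hw, hw) is reduced to c/16 channels by
    a 1x1 conv and halved spatially by a 3x3 stride-2 conv; the n_layers
    maps are concatenated and fused back to the decoder channel size (256)."""
    reduced = (c // 16, hw // 2, hw // 2)            # per-layer downsampled feature
    concat = (n_layers * (c // 16),) + reduced[1:]   # stacked across layers
    fused = (256,) + reduced[1:]                     # after the fusing convs
    return reduced, concat, fused
```

With the embodiment's 768-channel, 64×64 ViT features, each layer shrinks to 48×32×32 before fusion.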
the model's penalty includes binary classification and localization penalty for the RPN network, classification penalty for the semantic header, regression penalty for the localization header, and segmentation penalty for the frozen SAM mask decoder. The total loss can be expressed as:
and 3.3, aligning the explicit prompt generated by the single-stage detector with the implicit prompt feature generated by the SAM middle layer, and fusing the position information and the category information of the two prompts by calculating the mapping relation between the explicit prompt and the implicit prompt feature map, thereby providing more accurate positioning and classification precision.
And 3.4, the mask decoder integrates the two embeddings output by the image encoder and the prompt encoder respectively and decodes the final segmentation mask, using a Transformer to learn from the prompt-aligned image embedding and the embeddings of 4 additional tokens. The 4 token embeddings are the IoU token embedding and 3 segmentation-result token embeddings; the token embeddings are obtained through Transformer learning, and the target result is obtained through the final task head.
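The token sequence consumed by the decoder can be sketched as follows. Replacing the learned IoU and mask tokens with random placeholders is an assumption for illustration only; in the real model they are trained parameters.

```python
import numpy as np

def decoder_tokens(prompt_embed, embed_dim=256, n_masks=3):
    """Assemble a SAM-style decoder token sequence: 1 IoU token plus n_masks
    segmentation-result tokens (learned in the real model; random
    placeholders here), followed by the prompt embeddings. The Transformer
    then attends between these tokens and the image embedding."""
    rng = np.random.default_rng(0)
    iou_token = rng.standard_normal((1, embed_dim))
    mask_tokens = rng.standard_normal((n_masks, embed_dim))
    return np.concatenate([iou_token, mask_tokens, prompt_embed], axis=0)
```

Five prompt embeddings, for example, yield a sequence of 1 + 3 + 5 = 9 tokens of width 256.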
The technical scheme and the beneficial effects of the invention are further described below in connection with specific example applications.
After multiple oblique image data sets acquired by UAVs are processed by the method, more than 12 categories of power scene components and more than 20 categories of defects are identified, the power equipment segmentation mIoU remains high, and the running speed reaches 60 FPS. The invention can ensure processing efficiency while realizing segmentation of general power equipment with more categories and higher precision.
Example two
Based on the same conception, the scheme also designs a universal power equipment identification large model system, which comprises a data acquisition module that acquires power inspection image data and constructs a power transmission line data set;
the single-stage target detector module takes the detected target boundary box as an explicit hint;
the universal power recognition module processes the image data in the data acquisition module by utilizing an image encoder in the large model, and generates an implicit prompt containing semantic category information through the prompter; the explicit prompt and the implicit prompt are fused, and the fused semantic category information is transmitted into the large model to obtain a universal power recognition result;
the fusion mode is that an explicit prompt generated by a single-stage detector is aligned with an implicit prompt feature generated by a SAM middle layer, and the position information and the category information of the two prompts are fused by calculating the mapping relation between the explicit prompt and the implicit prompt feature map.
Because the system described in the second embodiment of the present invention implements the universal power equipment identification large model method of the first embodiment, a person skilled in the art can understand its specific structure and variants based on the method described in the first embodiment, and details are therefore not repeated herein.
Example III
Based on the same inventive concept, the invention also provides an electronic device comprising one or more processors; a storage means for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in embodiment one.
Because the device described in the third embodiment of the present invention is the electronic device used for implementing the universal power equipment identification large model method of the first embodiment, a person skilled in the art can understand its specific structure and variants based on the method described in the first embodiment, and details are therefore not repeated herein. All electronic devices adopted by the method of the embodiments of the invention fall within the intended scope of protection.
Example IV
Based on the same inventive concept, the present invention also provides a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the method described in embodiment one.
Because the medium described in the fourth embodiment of the present invention is the computer readable medium used for implementing the universal power equipment identification large model method of the first embodiment, a person skilled in the art can understand its specific structure and variants based on the method described in the first embodiment, and details are therefore not repeated herein. All computer readable media adopted by the method of the embodiments of the invention fall within the intended scope of protection.
The foregoing is a further detailed description of the invention in connection with specific embodiments, and it is not intended that the invention be limited to such description. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (10)

1. A method for identifying a large model of a universal power device, comprising the steps of:
step 1, acquiring power inspection image data and constructing a power transmission line data set;
step 2, training a single-stage target detector, and taking the detected image data target bounding box as an explicit prompt;
step 3, adopting a SAM large model, wherein the large model comprises a VIT encoder, a prompter, a fusion module and a decoder, processing the image data in the step 1 by using the VIT encoder in the large model, and generating an implicit prompt containing semantic category information by using the prompter of the large model; the fusion module fuses the prompts in the two forms in the steps 2 and 3, and transmits the fused semantic category information into the large model to obtain a universal power recognition result;
the fusion mode is that an explicit prompt generated by a single-stage target detector is aligned with an implicit prompt feature generated by a large model middle layer, and the position information and the category information of the two prompts are fused by calculating the mapping relation between the explicit prompt and the implicit prompt feature map.
2. The universal power device identification large model method of claim 1, wherein: the step 1 is specifically implemented as follows:
step 1.1, screening and cleaning acquired inspection images of various scenes;
step 1.2, labeling inspection images by using labelImg, wherein a transmission line scene comprises cities, greenhouses, farmlands, bushes, barren lands and lakes, and power equipment and external invaded object types cover transmission towers, insulators, equalizing rings, damper blocks, spacers, insulator burst sheets and hanging objects;
step 1.3, inputting the processed oblique image to a single-stage object detector.
3. The universal power device identification large model method of claim 2, wherein: the single-stage object detector employs YOLOv8.
4. The universal power device identification large model method of claim 1, wherein: the specific process of the step 2 is as follows:
step 2.1, performing scale change and filling on the original image;
step 2.2, the image processed in the step 2.1 is subjected to data enhancement and pretreatment and then is input into a backbone network of the single-stage target detector;
step 2.3, carrying out multi-scale feature fusion on the features extracted by the backbone network;
and 2.4, inputting the fused characteristics to a single-stage target detector, and acquiring the target category and the rough detection frame contained in the image.
5. The universal power device identification large model method of claim 4, wherein:
the data enhancement and preprocessing in step 2.2 comprises: horizontal and vertical overturn, contrast adjustment, rotation, mosaic enhancement, adaptive anchor frame calculation and adaptive gray filling.
6. The universal power equipment identification large model method of claim 1, wherein the SAM large model in step 3 operates as follows:
step 3.1, inputting the original image data and generating intermediate feature maps through a pretrained ViT encoder;
step 3.2, generating fused semantic features from the acquired intermediate feature maps with the prompter's lightweight feature aggregation module, and then using the prompter to generate an implicit prompt for the SAM mask decoder;
step 3.3, aligning, in the fusion module, the explicit prompt generated by the single-stage object detector with the implicit prompt features generated by the SAM intermediate layer, and then fusing the prompts to extract semantic information;
step 3.4, integrating, in the decoder, the two embeddings output by the ViT encoder and the prompter, and decoding the final segmentation mask.
7. The universal power equipment identification large model method of claim 6, wherein the original image input in step 3.1 is resized to a scale of 1024.
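Since step 3.1 resizes the input so that its longest side is 1024, an explicit box prompt produced by the detector in original image coordinates must be mapped into that 1024-scale space before fusion. A minimal sketch (the corner-coordinate convention is an assumption for illustration):

```python
def scale_box_to_sam(box, orig_hw, sam_size=1024):
    """Map an explicit box prompt (x1, y1, x2, y2) from original
    image coordinates into the 1024-scale input space of the SAM
    ViT encoder (longest side resized to `sam_size`)."""
    h, w = orig_hw
    r = sam_size / max(h, w)      # same ratio applied to the image
    return [c * r for c in box]
```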
8. A system for implementing the universal power equipment identification large model method of claim 1, characterized in that:
the transmission line data acquisition module collects power inspection image data and constructs the transmission line dataset;
the single-stage object detector module takes the detected target bounding boxes as the explicit prompt;
the universal power recognition module processes the image data from the data acquisition module using the image encoder in the large model, and generates an implicit prompt containing semantic category information through the large model prompter; the explicit prompt and the implicit prompt are fused, and the fused semantic category information is fed into the large model to obtain the universal power recognition result;
the fusion is performed by aligning the explicit prompt generated by the single-stage object detector with the implicit prompt features generated by the SAM intermediate layer, and fusing the position information and category information of the two prompts by computing the mapping relation between the explicit prompt and the implicit prompt feature map.
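One possible reading of the "mapping relation" between the explicit prompt and the implicit prompt feature map is to project the box onto the intermediate feature grid and pool the implicit features it covers. The stride of 16 (a ViT on a 1024 input yields a 64x64 grid) and the average pooling are assumptions for illustration, not details fixed by the claim.

```python
import numpy as np

def fuse_prompts(box_1024, implicit_feat, stride=16):
    """Fuse an explicit box prompt (x1, y1, x2, y2 in the 1024-scale
    input space) with implicit prompt features (H, W, C) by computing
    the box's footprint on the feature grid and average-pooling the
    implicit features inside it. Illustrative sketch only."""
    x1, y1, x2, y2 = [int(round(c / stride)) for c in box_1024]
    g = implicit_feat.shape[0]                    # square grid size
    x1, y1 = max(x1, 0), max(y1, 0)               # clamp to the grid,
    x2 = min(max(x2, x1 + 1), g)                  # keeping >= 1 cell
    y2 = min(max(y2, y1 + 1), g)
    region = implicit_feat[y1:y2, x1:x2]          # cells covered by the box
    return region.reshape(-1, implicit_feat.shape[-1]).mean(axis=0)
```

The pooled vector carries the implicit semantic-category information for the region the explicit prompt localizes, which is what is passed on to the mask decoder.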
9. An electronic device, comprising:
one or more processors;
a storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer readable medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the method of any one of claims 1-7.
CN202311403372.4A 2023-10-27 2023-10-27 Method and system for identifying large model of universal power equipment Active CN117132914B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311403372.4A CN117132914B (en) 2023-10-27 2023-10-27 Method and system for identifying large model of universal power equipment

Publications (2)

Publication Number Publication Date
CN117132914A (en) 2023-11-28
CN117132914B (en) 2024-01-30

Family

ID=88858669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311403372.4A Active CN117132914B (en) 2023-10-27 2023-10-27 Method and system for identifying large model of universal power equipment

Country Status (1)

Country Link
CN (1) CN117132914B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118052830A (en) * 2024-01-04 2024-05-17 重庆邮电大学 Multi-lesion retina segmentation method based on implicit prompt
CN118135325A (en) * 2024-03-29 2024-06-04 中国海洋大学 Medical image fine granularity classification method based on segmentation large model guidance

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797890A (en) * 2020-05-18 2020-10-20 中国电力科学研究院有限公司 Method and system for detecting defects of power transmission line equipment
WO2021189507A1 (en) * 2020-03-24 2021-09-30 南京新一代人工智能研究院有限公司 Rotor unmanned aerial vehicle system for vehicle detection and tracking, and detection and tracking method
CN114359754A (en) * 2021-12-21 2022-04-15 武汉大学 Unmanned aerial vehicle power inspection laser point cloud real-time transmission conductor extraction method
CN114842365A (en) * 2022-07-04 2022-08-02 中国科学院地理科学与资源研究所 Unmanned aerial vehicle aerial photography target detection and identification method and system
CN115294476A (en) * 2022-07-22 2022-11-04 武汉大学 Edge calculation intelligent detection method and device for unmanned aerial vehicle power inspection
CN115359360A (en) * 2022-10-19 2022-11-18 福建亿榕信息技术有限公司 Power field operation scene detection method, system, equipment and storage medium
WO2023126914A2 (en) * 2021-12-27 2023-07-06 Yeda Research And Development Co. Ltd. METHOD AND SYSTEM FOR SEMANTIC APPEARANCE TRANSFER USING SPLICING ViT FEATURES
CN116824135A (en) * 2023-05-23 2023-09-29 重庆大学 Atmospheric natural environment test industrial product identification and segmentation method based on machine vision
CN116883893A (en) * 2023-06-28 2023-10-13 西南交通大学 Tunnel face underground water intelligent identification method and system based on infrared thermal imaging
CN116883801A (en) * 2023-07-20 2023-10-13 华北电力大学(保定) YOLOv8 target detection method based on attention mechanism and multi-scale feature fusion

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Adapting Segment Anything Model for Change Detection in VHR Remote Sensing Images; Lei Ding et al.; arXiv; pp. 1-9 *
Segment Anything; Alexander Kirillov et al.; arXiv; pp. 1-30 *
Remote Sensing Identification of Easily Floating Objects along Transmission Channels Based on Improved UNet; Yang Zhi et al.; High Voltage Engineering; Vol. 49, No. 8; pp. 3395-3404 *
Analysis of Intelligent Defect Identification Methods for UAV Inspection Images of Transmission Lines; Fu Jing et al.; High Voltage Engineering; Vol. 49; pp. 103-110 *

Similar Documents

Publication Publication Date Title
CN111723748B (en) Infrared remote sensing image ship detection method
Yang et al. Pixor: Real-time 3d object detection from point clouds
CN117132914B (en) Method and system for identifying large model of universal power equipment
Wang et al. Real-time underwater onboard vision sensing system for robotic gripping
CN111768388A (en) Product surface defect detection method and system based on positive sample reference
Hong et al. USOD10K: a new benchmark dataset for underwater salient object detection
CN114693661A (en) Rapid sorting method based on deep learning
CN116385958A (en) Edge intelligent detection method for power grid inspection and monitoring
CN114926747A (en) Remote sensing image directional target detection method based on multi-feature aggregation and interaction
CN108537844A (en) A kind of vision SLAM winding detection methods of fusion geological information
CN112507861A (en) Pedestrian detection method based on multilayer convolution feature fusion
Wang et al. MCF3D: Multi-stage complementary fusion for multi-sensor 3D object detection
CN113011308A (en) Pedestrian detection method introducing attention mechanism
CN116246119A (en) 3D target detection method, electronic device and storage medium
Zhang et al. CE-RetinaNet: A channel enhancement method for infrared wildlife detection in UAV images
CN117809082A (en) Bridge crack disease detection method and device based on crack self-segmentation model
Li et al. CSF-Net: Color spectrum fusion network for semantic labeling of airborne laser scanning point cloud
CN116309348A (en) Lunar south pole impact pit detection method based on improved TransUnet network
CN115937520A (en) Point cloud moving target segmentation method based on semantic information guidance
Huang et al. ES-Net: An efficient stereo matching network
CN115526852A (en) Molten pool and splash monitoring method in selective laser melting process based on target detection and application
CN111898671B (en) Target identification method and system based on fusion of laser imager and color camera codes
CN112800932B (en) Method for detecting remarkable ship target in offshore background and electronic equipment
CN115719368B (en) Multi-target ship tracking method and system
Mao et al. Power transmission line image segmentation method based on binocular vision and feature pyramid network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant