
CN116416503A - Small sample target detection method, system and medium based on multi-mode fusion - Google Patents


Info

Publication number
CN116416503A
CN116416503A
Authority
CN
China
Prior art keywords
small sample
network
feature map
fusion
sample target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310238165.1A
Other languages
Chinese (zh)
Inventor
宋程程
李捷
张瑞伟
赵火军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Jiuzhou Electric Group Co Ltd
Original Assignee
Sichuan Jiuzhou Electric Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Jiuzhou Electric Group Co Ltd filed Critical Sichuan Jiuzhou Electric Group Co Ltd
Priority to CN202310238165.1A priority Critical patent/CN116416503A/en
Publication of CN116416503A publication Critical patent/CN116416503A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a small sample target detection method, system and medium based on multi-mode fusion. The method comprises the following steps: acquiring coaxial visible light images and infrared images; inputting the visible light image and the infrared image into a double-branch feature extraction network and performing feature extraction to obtain a visible light mode feature map and an infrared light feature map; performing feature fusion on the visible light mode feature map and the infrared light feature map to obtain a fused multi-mode fusion feature map; inputting the multi-mode fusion feature map into an improved RPN network and into an ROI network, and training each model to obtain a basic small sample target detection model; adjusting the basic small sample target detection model to obtain a final small sample target detection model related to the target task; and performing target task detection on the small sample to be detected using the final small sample target detection model. The invention achieves a good detection effect and high detection performance for small sample target detection.

Description

Small sample target detection method, system and medium based on multi-mode fusion
Technical Field
The invention relates to the technical field of image processing, in particular to a small sample target detection method, a system and a medium based on multi-mode fusion.
Background
At present, artificial intelligence with deep learning as its core technology is developing continuously, and deep learning algorithms are widely applied in fields such as image classification and image target detection. However, these techniques depend on the support of large data samples, and in some practical application scenarios, such as disaster prevention and reduction, large-area monitoring, and national defense, collecting a large amount of labeled data is very expensive and difficult. Therefore, how to detect targets with only a small amount of labeled data, i.e., target detection under the small sample condition, has become a current hotspot and difficulty of machine learning.
At present, most small sample target detection methods are still based on traditional target detection methods combined with the idea of small sample learning, optimized by improving the algorithm's ideas and structure. However, because the number of samples is small, the features such methods can extract are limited, so the algorithm's effect cannot reach a practical application level; how to learn as much as possible from a small number of samples is the key research problem.
However, the existing small sample target detection method has the problems of poor detection effect, low detection performance, large influence by samples and the like.
Disclosure of Invention
The invention aims to solve the technical problems that the existing small sample target detection methods have poor detection effect, low detection performance, and are greatly influenced by the samples. The invention provides a small sample target detection method, system and medium based on multi-mode fusion, which realize detection and identification under extreme conditions such as poor illumination and partial occlusion of the target. Aiming at the characteristics of infrared images and visible light images, a dual-branch feature extraction network consisting of a residual network and an improved residual network is designed, where the improved residual feature extraction network introduces variable convolution to better extract image features. A Soft-NMS method is also designed to reduce the missed detection easily caused by overlapping of multiple targets. The invention improves the model's ability to recognize targets by using the multi-mode fusion features, and then fine-tunes the model with a small number of small-sample-class samples to realize target detection under the small sample condition.
The invention is realized by the following technical scheme:
in a first aspect, the present invention provides a small sample target detection method based on multi-modal fusion, the method comprising:
acquiring coaxial visible light images and infrared images;
inputting the visible light image and the infrared image into a network based on double-branch feature extraction, and performing feature extraction to obtain a visible light mode feature map and an infrared light feature map;
carrying out feature fusion on the visible light mode feature map and the infrared light feature map to obtain a fused multi-mode fusion feature map;
inputting the multi-mode fusion feature map into an improved RPN network, inputting the multi-mode fusion feature map into an ROI network, and training each model to obtain a basic small sample target detection model;
adjusting the basic small sample target detection model to obtain a final small sample target detection model related to the target task; and performing target task detection on the small sample to be detected by adopting a final small sample target detection model.
Firstly, coaxially acquiring a visible light image and an infrared image, inputting the visible light image and the infrared image into a double-branch feature extraction network designed by the invention for feature extraction, wherein the double-branch feature extraction network comprises an original residual error network and an improved residual error network, inputting an extracted multi-mode fusion feature map into an RPN, inputting the multi-mode fusion feature map into an ROI network, and performing model pre-training; and then fine-tuning the pre-training model under the condition of a small sample aiming at the target task to obtain a target detection network related to the target task. According to the invention, the multi-mode information fusion idea is applied to the two-stage target detection network, so that the dependence on the number of training samples is greatly reduced, the influence of extreme environments such as poor illumination, partial shielding of targets and the like is reduced, the learning ability of tasks is improved, the robustness of an algorithm is enhanced, and the problem of poor detection effect of small samples in the current stage is solved.
Further, the double-branch feature extraction network comprises an original residual feature extraction network and a variable-convolution-based residual network feature extraction network;
performing feature extraction on the infrared image with the original residual feature extraction network to obtain the infrared light feature map;
and performing feature extraction on the visible light image with the variable-convolution-based residual network feature extraction network to obtain the visible light mode feature map.
Further, the variable-convolution-based sampling in the variable-convolution-based residual network feature extraction network is expressed as:

y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)

wherein y(p_0) is the feature map output at each position p_0, w(p_n) is the sampling weight, x is the input feature map, p_0 enumerates each position on the output feature map y, p_n enumerates each position in R, R is the receptive field, and Δp_n is the offset.
Further, inputting the multi-mode fusion feature map into the improved RPN network and into the ROI network and performing model training to obtain the basic small sample target detection model comprises:
inputting the multi-mode fusion feature map F into the improved RPN network to generate Anchor boxes; cutting and filtering the Anchor boxes, and outputting Bbox bounding boxes and category scores, i.e., obtaining the best target RoIs;
inputting the multi-mode fusion feature map F into the ROI network together with the target RoIs output by the improved RPN network; passing through ROI pooling into the fully connected layer for output, and completing classification and regression to obtain the trained basic small sample target detection model.
Further, the improved RPN network adopts a linear Soft-NMS to remove redundant detection frames and obtain the best target RoIs; the expression of the linear Soft-NMS is:

s_i = s_i,                       iou(M, b_i) < N_t
s_i = s_i · (1 - iou(M, b_i)),   iou(M, b_i) ≥ N_t

wherein s_i is the detection score, N_t is the threshold, M is the current highest-scoring prediction frame, b_i is a candidate detection frame, and iou(M, b_i) is their overlap ratio.
Further, the adjusting of the basic small sample target detection model specifically comprises:
for the basic small sample target detection model, freezing all layer parameters except the last layer, removing the last layer, and initializing its parameters;
and adjusting the last-layer parameters with the small sample fine-tuning data set, combined with a cosine similarity classifier, to obtain the final small sample target detection model.
Further, the small sample fine-tuning data set is obtained by sampling K = 1, 2, 3, 5, 10 images per category from the base classes and the new classes respectively.
In a second aspect, the present invention further provides a small sample target detection system based on multi-mode fusion, where the system is used to implement the small sample target detection method based on multi-mode fusion; the system comprises:
the data acquisition module is used for acquiring coaxial visible light images and infrared images;
the double-branch feature extraction network module is used for inputting the visible light image and the infrared image and extracting features to obtain a visible light mode feature map and an infrared light feature map;
the multi-mode feature fusion module is used for carrying out feature fusion on the visible light mode feature map and the infrared light feature map to obtain a fused multi-mode fusion feature map;
the classification and regression module is used for inputting the multi-mode fusion feature map into the improved RPN network, inputting the multi-mode fusion feature map into the ROI network, and training each model to obtain a basic small sample target detection model;
the adjusting module is used for adjusting the basic small sample target detection model to obtain a final small sample target detection model related to the target task; and performing target task detection on the small sample to be detected by adopting a final small sample target detection model.
In a third aspect, the present invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the small sample target detection method based on multi-modal fusion when executing the computer program.
In a fourth aspect, the present invention further provides a computer readable storage medium, where a computer program is stored, where the computer program when executed by a processor implements the method for detecting a small sample target based on multi-modal fusion.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. firstly, coaxially acquiring a visible light image and an infrared image, inputting the visible light image and the infrared image into a double-branch feature extraction network designed by the invention for feature extraction, wherein the double-branch feature extraction network comprises an original residual error network and an improved residual error network, inputting an extracted multi-mode feature map into an RPN, inputting a multi-mode fusion feature map into an ROI network, and performing model pre-training; and then fine-tuning the pre-training model under the condition of a small sample aiming at the target task to obtain a target detection network related to the target task. According to the invention, the multi-mode information fusion idea is applied to the two-stage target detection network, so that the dependence on the number of training samples is greatly reduced, the influence of extreme environments such as poor illumination, partial shielding of targets and the like is reduced, the learning ability of tasks is improved, the robustness of an algorithm is enhanced, and the problem of poor detection effect of small samples in the current stage is solved.
2. The dual-branch feature extraction network designed by the invention adopts different feature extraction models for different image characteristics. The improved feature extraction model adopts the idea of variable convolution to augment the spatial sampling positions, improving the capacity to handle geometric transformations, so it can better cope with target movement, scaling, rotation and the like, and efficiently learns the information of visible light images from a small number of samples. Meanwhile, the fusion module is based on feature-level fusion; compared with common image-level fusion, the modal features are more complete, improving the detection and identification effects. In addition, the invention adopts Soft-NMS in place of the traditional NMS, which effectively handles complex backgrounds, target occlusion, overlapping and the like, reduces the algorithm's missed detection rate, and improves the detection effect.
Drawings
The accompanying drawings, which are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention. In the drawings:
FIG. 1 is a flow chart of a small sample target detection method based on multi-modal fusion;
FIG. 2 is a detailed flowchart of a small sample target detection method based on multi-modal fusion according to the present invention;
FIG. 3 is a block diagram of the network residual error extraction based on the dual-branch feature of the present invention;
FIG. 4 is a block diagram of a small sample target detection system based on multi-modal fusion.
Detailed Description
For the purpose of making apparent the objects, technical solutions and advantages of the present invention, the present invention will be further described in detail with reference to the following examples and the accompanying drawings, wherein the exemplary embodiments of the present invention and the descriptions thereof are for illustrating the present invention only and are not to be construed as limiting the present invention.
The existing small sample target detection methods have problems such as poor detection effect, low detection performance, and being greatly influenced by the samples. Under the small sample condition, detection and identification based on only one modality of information about an object are difficult, while multi-modal information describes the target from different angles, so the features can complement each other. Based on this, the invention introduces multi-mode learning on the basis of the small sample setting to improve algorithm performance, using the visible light mode and infrared mode information of the image to improve detection and identification performance.
Common visible light and infrared multi-mode fusion is based on image-level fusion (an infrared image and a visible light image are fused into one picture and used as the input in the data processing stage), so some information of each mode is lost.
Firstly, coaxially acquiring a visible light image and an infrared image, inputting the visible light image and the infrared image into a double-branch feature extraction network designed by the invention for feature extraction, wherein the double-branch feature extraction network comprises an original residual error network and an improved residual error network, inputting an extracted multi-mode feature map into an RPN, inputting a multi-mode fusion feature map into an ROI network, and performing model pre-training; and then fine-tuning the pre-training model under the condition of a small sample aiming at the target task to obtain a target detection network related to the target task. According to the invention, the multi-mode information fusion idea is applied to the two-stage target detection network, so that the dependence on the number of training samples is greatly reduced, the influence of extreme environments such as poor illumination, partial shielding of targets and the like is reduced, the learning ability of tasks is improved, the robustness of an algorithm is enhanced, and the problem of poor detection effect of small samples in the current stage is solved.
Example 1
As shown in fig. 1, the method for detecting a small sample target based on multi-mode fusion of the present invention includes:
acquiring coaxial visible light images and infrared images;
inputting the visible light image and the infrared image into a network based on double-branch feature extraction, and performing feature extraction to obtain a visible light mode feature map and an infrared light feature map;
carrying out feature fusion on the visible light mode feature map and the infrared light feature map to obtain a fused multi-mode fusion feature map;
inputting the multi-mode fusion feature map into the improved RPN network and into the ROI network, and performing model training to obtain a basic small sample target detection model;
adjusting the basic small sample target detection model to obtain a final small sample target detection model related to the target task; and performing target task detection on the small sample to be detected by adopting the final small sample target detection model.
As shown in fig. 2, the specific implementation is as follows:
step 1, coaxial visible light images S are collected l And infrared image Q l Data set, l=1, 2,..m, m+1,..l, l is the total number of categories, training data set and test data set were made. Firstly, classifying collected data into a base class data set X according to categories base And new class data set X new Wherein the base class data set is composed of k classes of visible light images and infrared image data, and the new class data set is composed of the remaining classes of visible light images and infrared image data:
X base ={S i ,Q i },i=1,2,...,k
X new ={S j ,Q j },j=m+1,m+2...,l
training data set X train Consisting of base class data, test data set X test Consists of partial base class and new class;
X train =X base
X test =X base ∪X new
In specific implementation, step 1 divides the disjoint base classes and new classes in a 4:1 ratio; the data in both the base classes and the new classes are labeled visible light images and infrared images, and K images are extracted from each category during testing; the base classes are randomly divided into a training set and a verification set in a 7:3 ratio.
Specifically, the visible light images and infrared images used in the experiments correspond one to one; the base classes contain a large number of samples while the new classes contain only a small number, and K images are extracted each time, where K = 1, 2, 3, 5, 10.
Step 2, constructing a network based on double-branch feature extraction aiming at the characteristics of visible light images and infrared images; the dual-branch feature extraction network comprises an original residual feature extraction network (ResNet 101) and a residual feature extraction network (improved ResNet 101) based on variable convolution; as shown in fig. 3, (a) is a process flow diagram of an original residual block in an original residual feature extraction network, and (b) is a process flow diagram of a variable convolution-based residual block in a variable convolution-based residual network feature extraction network.
The original sampling of the original residual feature extraction network may be expressed as:

y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n)

The sampling of the variable convolution in the variable-convolution-based residual network feature extraction network is expressed as:

y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)

wherein y(p_0) is the feature map output at each position p_0, w(p_n) is the sampling weight, x is the input feature map, p_0 enumerates each position on the output feature map y, p_n enumerates each position in R, R is the receptive field, and Δp_n is the offset.
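The two sampling rules above can be sketched in plain NumPy; the 3×3 receptive field, single channel, and bilinear interpolation at fractional offset locations are illustrative assumptions (a real model would use a deep-learning framework's deformable convolution operator):

```python
import numpy as np

def bilinear(x, py, px):
    """Bilinearly sample the single-channel map x (H, W) at fractional (py, px)."""
    H, W = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    val = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yi, xi = y0 + dy, x0 + dx
            if 0 <= yi < H and 0 <= xi < W:
                val += (1 - abs(py - yi)) * (1 - abs(px - xi)) * x[yi, xi]
    return val

def deformable_sample(x, w, p0, offsets):
    """y(p_0) = sum over p_n in R of w(p_n) * x(p_0 + p_n + delta_p_n).

    With all offsets zero this reduces to the original residual sampling
    y(p_0) = sum of w(p_n) * x(p_0 + p_n) over the 3x3 receptive field R."""
    R = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return sum(w[n] * bilinear(x, p0[0] + dy + oy, p0[1] + dx + ox)
               for n, ((dy, dx), (oy, ox)) in enumerate(zip(R, offsets)))
```

The learned offsets Δp_n are what let the sampling grid deform to follow target movement, scaling and rotation.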
Step 3, input the visible light image and the infrared image into the double-branch feature extraction network and perform feature extraction. Specifically, the original residual feature extraction network is used to extract features from the infrared image, obtaining the infrared light feature map f_h;
based on characteristics of the visible light image such as shape and color, the variable-convolution-based residual network feature extraction network is used to extract features from the visible light image, obtaining the visible light mode feature map f_k.
Step 4, improve the RPN network by adopting a linear Soft-NMS in place of the traditional NMS, which improves the detection performance of the algorithm in complex scenes; the input of the ROI layer is the multi-mode feature obtained by fusing the visible mode feature and the infrared mode feature, which can improve the classification precision of targets; the expression of the linear Soft-NMS is:

s_i = s_i,                       iou(M, b_i) < N_t
s_i = s_i · (1 - iou(M, b_i)),   iou(M, b_i) ≥ N_t

wherein s_i is the detection score, N_t is the threshold, and iou(M, b_i) is the overlap ratio between the highest-scoring prediction frame M and candidate frame b_i.
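The linear score decay above can be sketched as follows (plain Python; the greedy loop, the [x1, y1, x2, y2] box format, and the default threshold value are illustrative assumptions):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes in [x1, y1, x2, y2] form."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def linear_soft_nms(boxes, scores, nt=0.5):
    """Greedy linear Soft-NMS: instead of discarding a frame that overlaps the
    current best frame M, decay its score by s_i <- s_i * (1 - iou(M, b_i))."""
    scores = list(scores)
    remaining = list(range(len(boxes)))
    order = []
    while remaining:
        m = max(remaining, key=lambda i: scores[i])
        order.append(m)
        remaining.remove(m)
        for i in remaining:
            ov = iou(boxes[m], boxes[i])
            if ov >= nt:          # below N_t the score is kept unchanged
                scores[i] *= (1.0 - ov)
    return order, scores
```

Because overlapping frames are only attenuated rather than deleted, a genuinely distinct but overlapping target can still survive with a reduced score, which is what lowers the missed-detection rate in crowded scenes.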
Step 5, perform feature fusion on the visible light mode feature map f_k and the infrared light feature map f_h to obtain the fused multi-mode fusion feature map F; the feature fusion formula is:

F = f_k ⊕ f_h

wherein F is the fused multi-mode fusion feature map, f_k is the visible light mode feature map, f_h is the infrared mode feature map, and ⊕ denotes point-by-point addition.
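A minimal sketch of the point-by-point fusion F = f_k ⊕ f_h, with NumPy arrays standing in for the two branch feature maps (the explicit shape check is an illustrative addition):

```python
import numpy as np

def fuse_features(f_k, f_h):
    """F = f_k (+) f_h: element-wise (point-by-point) addition of the visible
    light mode feature map and the infrared light feature map."""
    if f_k.shape != f_h.shape:
        raise ValueError("both branches must produce feature maps of equal shape")
    return f_k + f_h
```

Element-wise addition keeps the fused map the same shape as each branch's output, so the downstream RPN and ROI layers need no structural change.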
Step 6, input the multi-mode fusion feature map F obtained in step 5 into the improved RPN network. Sliding-window processing is first performed on the picture to obtain a series of detection frames B and corresponding scores S; the positions of foreground suggestion frames are judged by setting a confidence threshold, eliminating frames whose scores fall below the threshold and keeping those above it. Unlike NMS, which completely suppresses overlapping prediction frames, the linear Soft-NMS only attenuates the detection score of a detection frame B1 that highly overlaps the prediction frame M;
Step 7, input the multi-mode fusion feature map F and the information output by the improved RPN network in step 6 into the ROI Pooling layer; using the multi-mode feature map in place of the single-mode feature map of the traditional algorithm provides richer information for classification and identification under the small sample condition, improving accuracy. Specifically:
the multi-mode fusion feature map F is input into the improved RPN network to generate Anchor boxes; the Anchor boxes are cut and filtered, and Bbox bounding boxes and category scores are output, i.e., the best target RoIs are obtained;
the multi-mode fusion feature map F is input into the ROI network together with the target RoIs output by the improved RPN network; after ROI pooling, the result enters the fully connected layer for output, completing classification and regression.
Step 8, performing model training iteration to obtain a pre-training model (namely a basic small sample target detection model) with excellent performance on the base class;
in the iterative training process of the basic small sample target detection model, the total network loss is composed of three parts, which are expressed as follows:
L_total = L_rpn + L_cls + L_loc

wherein L_total is the total network loss, L_rpn is the RPN network loss, L_cls is the network classification loss, and L_loc is the network bounding-box regression loss.
Step 9, sample K = 1, 2, 3, 5, 10 images per category from the base classes and the new classes respectively to make the small sample fine-tuning data sets, with the remaining data sets used for verification;
step 10, introducing a cosine similarity classifier in the fine tuning stage, wherein the cosine similarity evaluation function is as follows:
S_{i,j} = α · F(x_i)^T ω_j / (‖F(x_i)‖ · ‖ω_j‖)

wherein S_{i,j} is the similarity score between the input x of the i-th target proposal and the j-th class weight vector, F(x) is the input feature, ω_j is the weight vector of class-j targets, and α is the scale factor.
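The cosine similarity evaluation function can be sketched as follows (the scale factor value α = 20 and the small epsilon guarding against zero norms are illustrative assumptions):

```python
import numpy as np

def cosine_scores(feats, weights, alpha=20.0):
    """S[i, j] = alpha * F(x_i)^T w_j / (||F(x_i)|| * ||w_j||)."""
    fn = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-9)
    wn = weights / (np.linalg.norm(weights, axis=1, keepdims=True) + 1e-9)
    return alpha * fn @ wn.T
```

Normalizing both the feature and the class weight removes magnitude differences between base and new classes, which is why cosine classifiers are favored in the fine-tuning stage of few-shot detectors.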
Step 11, for the basic small sample target detection model obtained in step 8, freeze all layer parameters except the last layer, remove the last layer, and initialize its parameters; adjust the last-layer parameters with the small sample fine-tuning data set to obtain a detector with excellent detection performance on both the base classes and the new classes (i.e., the final small sample target detection model).
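The freeze-and-fine-tune step can be sketched as follows; representing the model as a plain dict of NumPy arrays is an illustrative assumption (in practice this is done by setting trainable/requires_grad flags in the training framework):

```python
import numpy as np

def prepare_finetune(params, last_key, rng=None):
    """Freeze every layer except the last, and re-initialize the last layer.

    `params` maps layer names to weight arrays; returns (new_params, trainable),
    where `trainable` marks only the last layer for updating."""
    if rng is None:
        rng = np.random.default_rng(0)
    trainable = {name: (name == last_key) for name in params}
    new_params = dict(params)
    new_params[last_key] = rng.normal(0.0, 0.01, size=params[last_key].shape)
    return new_params, trainable
```

Only the re-initialized last layer then receives gradient updates from the K-shot fine-tuning set, preserving the features learned on the base classes.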
Compared with the prior art, the invention has the following beneficial effects:
the dual-branch feature extraction network based on the design of the invention adopts different feature extraction models aiming at different features of images, wherein the improved feature extraction model adopts the idea of variable convolution to increase the space sampling position, improves the processing capacity of geometric transformation, can better cope with the conditions of target movement, size scaling, rotation and the like, and efficiently learns the information of visible light images under the condition of a small number of samples.
According to the invention, soft-NMS is adopted to replace the traditional NMS for improvement, so that the problems of complex background, target masking, overlapping and the like can be effectively solved, the omission ratio of an algorithm is reduced, and the detection effect is improved.
The method of fusing the visible light mode features and the infrared mode features improves the performance of the small sample algorithm, mainly its classification performance. Research by many scholars shows that under the small sample condition the target detection network is not greatly affected in locating targets; the difficulty lies in learning useful target features from a small number of samples. Compared with traditional small sample detection algorithms, the method of the invention has better robustness and detection effect.
Example 2
As shown in fig. 4, the difference between the present embodiment and embodiment 1 is that the present embodiment provides a small sample target detection system based on multi-mode fusion, which is used to implement a small sample target detection method based on multi-mode fusion described in embodiment 1; the system comprises:
the data acquisition module is used for acquiring coaxial visible light images and infrared images;
the feature extraction network module is used for extracting features from the input visible light image and infrared image to obtain a visible light mode feature map and an infrared light feature map;
the multi-mode feature fusion module is used for carrying out feature fusion on the visible light mode feature map and the infrared light feature map to obtain a fused multi-mode fusion feature map;
the classification and regression module is used for inputting the multi-modal fusion feature map into the improved RPN network and into the ROI network, and performing model training to obtain a basic small sample target detection model;
the adjusting module is used for adjusting the basic small sample target detection model to obtain a final small sample target detection model related to the target task; and performing target task detection on the small sample to be detected by adopting a final small sample target detection model.
Each module executes according to the steps of the multi-modal fusion-based small sample target detection method described in embodiment 1; details are not repeated in this embodiment.
Meanwhile, the invention also provides a computer device which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the small sample target detection method based on multi-mode fusion when executing the computer program.
Meanwhile, the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the small sample target detection method based on multi-mode fusion when being executed by a processor.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description of the embodiments is provided to illustrate the general principles of the invention and is not intended to limit the invention to the particular embodiments disclosed; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the invention shall fall within the scope of the invention.

Claims (10)

1. A method for detecting a small sample target based on multi-modal fusion, characterized by comprising the following steps:
acquiring coaxial visible light images and infrared images;
inputting the visible light image and the infrared image into a dual-branch feature extraction network and extracting features to obtain a visible light mode feature map and an infrared light feature map;
performing feature fusion on the visible light mode feature map and the infrared light feature map to obtain a multi-mode fusion feature map;
inputting the multi-mode fusion feature map into an improved RPN network, inputting the multi-mode fusion feature map into an ROI network, and performing model training to obtain a basic small sample target detection model;
adjusting the basic small sample target detection model to obtain a final small sample target detection model related to a target task; and performing target task detection on the small sample to be detected by adopting a final small sample target detection model.
2. The method for detecting a small sample target based on multi-modal fusion according to claim 1, wherein the dual-branch feature extraction network comprises an original residual feature extraction network and a residual feature extraction network based on deformable convolution;
the original residual feature extraction network performs feature extraction on the infrared image to obtain the infrared light feature map;
the residual feature extraction network based on deformable convolution performs feature extraction on the visible light image to obtain the visible light mode feature map.
3. The method for detecting a small sample target based on multi-modal fusion according to claim 2, wherein the sampling of the deformable convolution in the deformable-convolution-based residual feature extraction network is expressed as:

$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$

wherein $y(p_0)$ is the output feature map value at each position $p_0$, $w(p_n)$ is the sampling weight, $x$ is the input feature map, $p_n$ enumerates the positions in $R$, $R$ is the receptive field (the regular sampling grid), and $\Delta p_n$ is the learned offset.
4. The method for detecting a small sample target based on multi-modal fusion according to claim 1, wherein the inputting the multi-modal fusion feature map into an improved RPN network and into an ROI network and performing model training to obtain a basic small sample target detection model comprises:
inputting the multi-modal fusion feature map into the improved RPN network to generate anchor boxes; clipping and filtering the anchor boxes, and outputting bounding boxes (Bbox) and category scores, thereby obtaining the optimal target ROIs;
inputting the multi-modal fusion feature map into the ROI network together with the target ROIs output by the improved RPN network; after ROI pooling, the features pass through a fully connected layer for output, completing classification and regression to obtain the trained basic small sample target detection model.
5. The method for detecting a small sample target based on multi-modal fusion according to claim 4, wherein the improved RPN network uses linear Soft-NMS to remove redundant detection boxes and obtain the optimal target ROIs; the linear Soft-NMS is expressed as:

$$s_i = \begin{cases} s_i, & \mathrm{iou}(M, b_i) < N_t \\ s_i \left(1 - \mathrm{iou}(M, b_i)\right), & \mathrm{iou}(M, b_i) \ge N_t \end{cases}$$

wherein $s_i$ is the detection score of box $b_i$, $N_t$ is the overlap threshold, $M$ is the current highest-scoring detection box, and $\mathrm{iou}(M, b_i)$ is the overlap ratio between $M$ and $b_i$.
6. The method for detecting a small sample target based on multi-modal fusion according to claim 1, wherein the adjusting of the basic small sample target detection model specifically comprises:
freezing the parameters of all layers of the basic small sample target detection model except the last layer, and removing and re-initializing the last-layer parameters;
fine-tuning the last-layer parameters with the small sample fine-tuning dataset in combination with a cosine similarity classifier to obtain the final small sample target detection model.
7. The method for detecting a small sample target based on multi-modal fusion according to claim 6, wherein the small sample fine-tuning dataset is obtained by sampling K = 1, 2, 3, 5, or 10 instances per class from the base classes and the novel classes respectively.
8. A small sample target detection system based on multi-modal fusion, the system comprising:
the data acquisition module is used for acquiring coaxial visible light images and infrared images;
the feature extraction network module is used for extracting features from the input visible light image and infrared image to obtain a visible light mode feature map and an infrared light feature map;
the multi-mode feature fusion module is used for carrying out feature fusion on the visible light mode feature map and the infrared light feature map to obtain a fused multi-mode fusion feature map;
the classification and regression module is used for inputting the multi-modal fusion feature map into an improved RPN network and into an ROI network, and performing model training to obtain a basic small sample target detection model;
the adjustment module is used for adjusting the basic small sample target detection model to obtain a final small sample target detection model related to a target task; and performing target task detection on the small sample to be detected by adopting a final small sample target detection model.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements a multimodal fusion based small sample object detection method as claimed in any of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements a multimodal fusion based small sample object detection method according to any of claims 1 to 7.
CN202310238165.1A 2023-03-13 2023-03-13 Small sample target detection method, system and medium based on multi-mode fusion Pending CN116416503A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310238165.1A CN116416503A (en) 2023-03-13 2023-03-13 Small sample target detection method, system and medium based on multi-mode fusion


Publications (1)

Publication Number Publication Date
CN116416503A true CN116416503A (en) 2023-07-11

Family

ID=87048969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310238165.1A Pending CN116416503A (en) 2023-03-13 2023-03-13 Small sample target detection method, system and medium based on multi-mode fusion

Country Status (1)

Country Link
CN (1) CN116416503A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117237697A (en) * 2023-08-01 2023-12-15 北京邮电大学 Small sample image detection method, system, medium and equipment
CN117237697B (en) * 2023-08-01 2024-05-17 北京邮电大学 Small sample image detection method, system, medium and equipment
CN117274899A (en) * 2023-09-20 2023-12-22 中国人民解放军海军航空大学 Storage hidden danger detection method based on visible light and infrared light image feature fusion
CN117274899B (en) * 2023-09-20 2024-05-28 中国人民解放军海军航空大学 Storage hidden danger detection method based on visible light and infrared light image feature fusion
CN118090743A (en) * 2024-04-22 2024-05-28 山东浪潮数字商业科技有限公司 Porcelain winebottle quality detection system based on multi-mode image recognition technology


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination