
CN116416503A - Small sample target detection method, system and medium based on multi-mode fusion - Google Patents


Info

Publication number
CN116416503A
CN116416503A
Authority
CN
China
Prior art keywords
small sample
network
feature map
fusion
sample target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310238165.1A
Other languages
Chinese (zh)
Inventor
宋程程
李捷
张瑞伟
赵火军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Jiuzhou Electric Group Co Ltd
Original Assignee
Sichuan Jiuzhou Electric Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Jiuzhou Electric Group Co Ltd filed Critical Sichuan Jiuzhou Electric Group Co Ltd
Priority to CN202310238165.1A priority Critical patent/CN116416503A/en
Publication of CN116416503A publication Critical patent/CN116416503A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a small sample target detection method, system and medium based on multi-mode fusion. The method comprises the following steps: acquiring coaxial visible light images and infrared images; inputting the visible light image and the infrared image into a double-branch feature extraction network and performing feature extraction to obtain a visible light mode feature map and an infrared light feature map; performing feature fusion on the visible light mode feature map and the infrared light feature map to obtain a fused multi-mode fusion feature map; inputting the multi-mode fusion feature map into an improved RPN network and into an ROI network, and training each model to obtain a basic small sample target detection model; adjusting the basic small sample target detection model to obtain a final small sample target detection model related to the target task; and performing target task detection on the small sample to be detected using the final small sample target detection model. The invention achieves a good detection effect and high detection performance for small sample target detection.

Description

Small sample target detection method, system and medium based on multi-mode fusion
Technical Field
The invention relates to the technical field of image processing, in particular to a small sample target detection method, a system and a medium based on multi-mode fusion.
Background
At present, artificial intelligence with deep learning as its core technology is developing continuously, and deep learning algorithms are widely applied in fields such as image classification and image target detection. However, these techniques depend on the support of large data samples, and in some practical application scenarios, such as disaster prevention and reduction, large-area monitoring, and national defense, collecting a large amount of labeled data is very expensive and difficult. Therefore, how to detect targets with only a small amount of labeled data, i.e., target detection under the small sample condition, has become a current hotspot and difficulty of machine learning.
At present, most small sample target detection methods are still based on traditional target detection methods combined with the idea of small sample learning, optimized by improving the algorithm's ideas and structure. However, because the number of samples is small, the features such methods can extract are limited, so the algorithm's effect cannot reach a practical application level; how to learn as much as possible from a small number of samples is the key research problem.
However, the existing small sample target detection method has the problems of poor detection effect, low detection performance, large influence by samples and the like.
Disclosure of Invention
The invention aims to solve the technical problems that the existing small sample target detection methods have poor detection effect, low detection performance, and are greatly influenced by the samples. The invention provides a small sample target detection method, system and medium based on multi-mode fusion, which realize detection and identification under extreme conditions such as poor illumination and partial occlusion of the target. Aiming at the characteristics of infrared images and visible light images, a dual-branch feature extraction network consisting of a residual network and an improved residual network is designed, where the improved residual feature extraction network introduces variable convolution to better extract image features. A Soft-NMS method is also designed to reduce the missed detection easily caused by overlapping of multiple targets. The invention improves the model's ability to recognize targets by using the multi-mode fusion features, and then fine-tunes the model with a small number of small-sample-class samples to realize target detection under the small sample condition.
The invention is realized by the following technical scheme:
in a first aspect, the present invention provides a small sample target detection method based on multi-modal fusion, the method comprising:
acquiring coaxial visible light images and infrared images;
inputting the visible light image and the infrared image into a network based on double-branch feature extraction, and performing feature extraction to obtain a visible light mode feature map and an infrared light feature map;
carrying out feature fusion on the visible light mode feature map and the infrared light feature map to obtain a fused multi-mode fusion feature map;
inputting the multi-mode fusion feature map into an improved RPN network, inputting the multi-mode fusion feature map into an ROI network, and training each model to obtain a basic small sample target detection model;
adjusting the basic small sample target detection model to obtain a final small sample target detection model related to the target task; and performing target task detection on the small sample to be detected by adopting a final small sample target detection model.
Firstly, coaxially acquiring a visible light image and an infrared image, inputting the visible light image and the infrared image into a double-branch feature extraction network designed by the invention for feature extraction, wherein the double-branch feature extraction network comprises an original residual error network and an improved residual error network, inputting an extracted multi-mode fusion feature map into an RPN, inputting the multi-mode fusion feature map into an ROI network, and performing model pre-training; and then fine-tuning the pre-training model under the condition of a small sample aiming at the target task to obtain a target detection network related to the target task. According to the invention, the multi-mode information fusion idea is applied to the two-stage target detection network, so that the dependence on the number of training samples is greatly reduced, the influence of extreme environments such as poor illumination, partial shielding of targets and the like is reduced, the learning ability of tasks is improved, the robustness of an algorithm is enhanced, and the problem of poor detection effect of small samples in the current stage is solved.
Further, the double-branch feature extraction network comprises an original residual feature extraction network and a variable-convolution-based residual network feature extraction network;
performing feature extraction on the infrared image with the original residual feature extraction network to obtain the infrared light feature map;
and performing feature extraction on the visible light image with the variable-convolution-based residual network feature extraction network to obtain the visible light mode feature map.
Further, the variable-convolution-based sampling in the variable-convolution-based residual network feature extraction network is expressed as:

y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)

wherein y(p_0) is the feature map output at each position p_0, w(p_n) is the sampling weight, x is the input feature map, p_0 enumerates each position on the output feature map y, p_n enumerates each position in R, R is the receptive field, and Δp_n is the offset.
Further, inputting the multi-mode fusion feature map into the improved RPN network and into the ROI network and performing model training to obtain the basic small sample target detection model comprises:
inputting the multi-mode fusion feature map F into the improved RPN network to generate Anchor boxes; cutting and filtering the Anchor boxes, and outputting Bbox bounding boxes and category scores, i.e., obtaining the best target RoIs;
inputting the multi-mode fusion feature map F into the ROI network together with the target RoIs output by the improved RPN network; passing through ROI pooling into the fully connected layer for output, and completing classification and regression to obtain the trained basic small sample target detection model.
Further, the improved RPN network adopts a linear Soft-NMS to remove redundant detection frames and obtain the best target RoIs; the expression of the linear Soft-NMS is:

s_i = s_i,                       iou(M, b_i) < N_t
s_i = s_i · (1 - iou(M, b_i)),   iou(M, b_i) ≥ N_t

wherein s_i is the detection score, N_t is the threshold, M is the current highest-scoring prediction frame, b_i is a candidate detection frame, and iou(M, b_i) is their overlap ratio.
Further, the adjusting of the basic small sample target detection model specifically comprises:
for the basic small sample target detection model, freezing all layer parameters except the last layer, removing the last layer, and initializing its parameters;
and adjusting the last-layer parameters with the small sample fine-tuning data set, combined with a cosine similarity classifier, to obtain the final small sample target detection model.
Further, the small sample fine-tuning data set is obtained by sampling K = 1, 2, 3, 5, 10 images per category from the base classes and the new classes respectively.
In a second aspect, the present invention further provides a small sample target detection system based on multi-mode fusion, where the system is used to implement the small sample target detection method based on multi-mode fusion; the system comprises:
the data acquisition module is used for acquiring coaxial visible light images and infrared images;
the double-branch feature extraction network module is used for inputting the visible light image and the infrared image and extracting features to obtain a visible light mode feature map and an infrared light feature map;
the multi-mode feature fusion module is used for carrying out feature fusion on the visible light mode feature map and the infrared light feature map to obtain a fused multi-mode fusion feature map;
the classification and regression module is used for inputting the multi-mode fusion feature map into the improved RPN network, inputting the multi-mode fusion feature map into the ROI network, and training each model to obtain a basic small sample target detection model;
the adjusting module is used for adjusting the basic small sample target detection model to obtain a final small sample target detection model related to the target task; and performing target task detection on the small sample to be detected by adopting a final small sample target detection model.
In a third aspect, the present invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the small sample target detection method based on multi-modal fusion when executing the computer program.
In a fourth aspect, the present invention further provides a computer readable storage medium, where a computer program is stored, where the computer program when executed by a processor implements the method for detecting a small sample target based on multi-modal fusion.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. firstly, coaxially acquiring a visible light image and an infrared image, inputting the visible light image and the infrared image into a double-branch feature extraction network designed by the invention for feature extraction, wherein the double-branch feature extraction network comprises an original residual error network and an improved residual error network, inputting an extracted multi-mode feature map into an RPN, inputting a multi-mode fusion feature map into an ROI network, and performing model pre-training; and then fine-tuning the pre-training model under the condition of a small sample aiming at the target task to obtain a target detection network related to the target task. According to the invention, the multi-mode information fusion idea is applied to the two-stage target detection network, so that the dependence on the number of training samples is greatly reduced, the influence of extreme environments such as poor illumination, partial shielding of targets and the like is reduced, the learning ability of tasks is improved, the robustness of an algorithm is enhanced, and the problem of poor detection effect of small samples in the current stage is solved.
2. The dual-branch feature extraction network designed by the invention adopts different feature extraction models for different image characteristics. The improved feature extraction model adopts the idea of variable convolution to augment the spatial sampling positions, improving the capacity to handle geometric transformations, so it can better cope with target movement, scaling, rotation and the like, and efficiently learns the information of visible light images from a small number of samples. Meanwhile, the fusion module is based on feature-level fusion; compared with common image-level fusion, the modal features are more complete, improving the detection and identification effects. In addition, the invention adopts Soft-NMS in place of the traditional NMS, which effectively handles complex backgrounds, target occlusion, overlapping and the like, reduces the algorithm's missed detection rate, and improves the detection effect.
Drawings
The accompanying drawings, which are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention. In the drawings:
FIG. 1 is a flow chart of a small sample target detection method based on multi-modal fusion;
FIG. 2 is a detailed flowchart of a small sample target detection method based on multi-modal fusion according to the present invention;
FIG. 3 is a block diagram of the network residual error extraction based on the dual-branch feature of the present invention;
FIG. 4 is a block diagram of a small sample target detection system based on multi-modal fusion.
Detailed Description
For the purpose of making apparent the objects, technical solutions and advantages of the present invention, the present invention will be further described in detail with reference to the following examples and the accompanying drawings, wherein the exemplary embodiments of the present invention and the descriptions thereof are for illustrating the present invention only and are not to be construed as limiting the present invention.
The existing small sample target detection methods have problems such as poor detection effect, low detection performance, and being greatly influenced by the samples. Under the small sample condition, detection and identification based on only one modality of information about an object are difficult, while multi-modal information describes the target from different angles, so the features can complement each other. Based on this, the invention introduces multi-mode learning on the basis of the small sample setting to improve algorithm performance, using the visible light mode and infrared mode information of the image to improve detection and identification performance.
Common visible light and infrared multi-mode fusion is based on image-level fusion (an infrared image and a visible light image are fused into one picture and used as the input in the data processing stage), so some information of each mode is lost.
Firstly, coaxially acquiring a visible light image and an infrared image, inputting the visible light image and the infrared image into a double-branch feature extraction network designed by the invention for feature extraction, wherein the double-branch feature extraction network comprises an original residual error network and an improved residual error network, inputting an extracted multi-mode feature map into an RPN, inputting a multi-mode fusion feature map into an ROI network, and performing model pre-training; and then fine-tuning the pre-training model under the condition of a small sample aiming at the target task to obtain a target detection network related to the target task. According to the invention, the multi-mode information fusion idea is applied to the two-stage target detection network, so that the dependence on the number of training samples is greatly reduced, the influence of extreme environments such as poor illumination, partial shielding of targets and the like is reduced, the learning ability of tasks is improved, the robustness of an algorithm is enhanced, and the problem of poor detection effect of small samples in the current stage is solved.
Example 1
As shown in fig. 1, the method for detecting a small sample target based on multi-mode fusion of the present invention includes:
acquiring coaxial visible light images and infrared images;
inputting the visible light image and the infrared image into a network based on double-branch feature extraction, and performing feature extraction to obtain a visible light mode feature map and an infrared light feature map;
carrying out feature fusion on the visible light mode feature map and the infrared light feature map to obtain a fused multi-mode fusion feature map;
inputting the multi-mode fusion feature map into the improved RPN network and into the ROI network, and performing model training to obtain a basic small sample target detection model;
adjusting the basic small sample target detection model to obtain a final small sample target detection model related to the target task; and performing target task detection on the small sample to be detected by adopting the final small sample target detection model.
As shown in fig. 2, the specific implementation is as follows:
step 1, coaxial visible light images S are collected l And infrared image Q l Data set, l=1, 2,..m, m+1,..l, l is the total number of categories, training data set and test data set were made. Firstly, classifying collected data into a base class data set X according to categories base And new class data set X new Wherein the base class data set is composed of k classes of visible light images and infrared image data, and the new class data set is composed of the remaining classes of visible light images and infrared image data:
X base ={S i ,Q i },i=1,2,...,k
X new ={S j ,Q j },j=m+1,m+2...,l
training data set X train Consisting of base class data, test data set X test Consists of partial base class and new class;
X train =X base
X test =X base ∪X new
In specific implementation, step 1 divides the disjoint base classes and new classes in a 4:1 ratio; the data in both the base classes and the new classes are labeled visible light images and infrared images, and K images are extracted from each category during testing; the base classes are randomly divided into a training set and a verification set in a 7:3 ratio.
Specifically, the visible light images and infrared images used in the experiments correspond one to one; the base classes contain a large number of samples while the new classes contain only a small number, and K images are extracted each time, where K = 1, 2, 3, 5, 10.
Step 2, constructing a network based on double-branch feature extraction aiming at the characteristics of visible light images and infrared images; the dual-branch feature extraction network comprises an original residual feature extraction network (ResNet 101) and a residual feature extraction network (improved ResNet 101) based on variable convolution; as shown in fig. 3, (a) is a process flow diagram of an original residual block in an original residual feature extraction network, and (b) is a process flow diagram of a variable convolution-based residual block in a variable convolution-based residual network feature extraction network.
The original sampling of the original residual feature extraction network may be expressed as:

y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n)

The sampling of the variable convolution in the variable-convolution-based residual network feature extraction network is expressed as:

y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)

wherein y(p_0) is the feature map output at each position p_0, w(p_n) is the sampling weight, x is the input feature map, p_0 enumerates each position on the output feature map y, p_n enumerates each position in R, R is the receptive field, and Δp_n is the offset.
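The two sampling rules above can be sketched in plain NumPy; the 3×3 receptive field, single channel, and bilinear interpolation at fractional offset locations are illustrative assumptions (a real model would use a deep-learning framework's deformable convolution operator):

```python
import numpy as np

def bilinear(x, py, px):
    """Bilinearly sample the single-channel map x (H, W) at fractional (py, px)."""
    H, W = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    val = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yi, xi = y0 + dy, x0 + dx
            if 0 <= yi < H and 0 <= xi < W:
                val += (1 - abs(py - yi)) * (1 - abs(px - xi)) * x[yi, xi]
    return val

def deformable_sample(x, w, p0, offsets):
    """y(p_0) = sum over p_n in R of w(p_n) * x(p_0 + p_n + delta_p_n).

    With all offsets zero this reduces to the original residual sampling
    y(p_0) = sum of w(p_n) * x(p_0 + p_n) over the 3x3 receptive field R."""
    R = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return sum(w[n] * bilinear(x, p0[0] + dy + oy, p0[1] + dx + ox)
               for n, ((dy, dx), (oy, ox)) in enumerate(zip(R, offsets)))
```

The learned offsets Δp_n are what let the sampling grid deform to follow target movement, scaling and rotation.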
Step 3, input the visible light image and the infrared image into the double-branch feature extraction network and perform feature extraction. Specifically, the original residual feature extraction network is used to extract features from the infrared image, obtaining the infrared light feature map f_h;
based on characteristics of the visible light image such as shape and color, the variable-convolution-based residual network feature extraction network is used to extract features from the visible light image, obtaining the visible light mode feature map f_k.
Step 4, improve the RPN network by adopting a linear Soft-NMS in place of the traditional NMS, which improves the detection performance of the algorithm in complex scenes; the input of the ROI layer is the multi-mode feature obtained by fusing the visible mode feature and the infrared mode feature, which can improve the classification precision of targets; the expression of the linear Soft-NMS is:

s_i = s_i,                       iou(M, b_i) < N_t
s_i = s_i · (1 - iou(M, b_i)),   iou(M, b_i) ≥ N_t

wherein s_i is the detection score, N_t is the threshold, and iou(M, b_i) is the overlap ratio between the highest-scoring prediction frame M and candidate frame b_i.
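The linear score decay above can be sketched as follows (plain Python; the greedy loop, the [x1, y1, x2, y2] box format, and the default threshold value are illustrative assumptions):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes in [x1, y1, x2, y2] form."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def linear_soft_nms(boxes, scores, nt=0.5):
    """Greedy linear Soft-NMS: instead of discarding a frame that overlaps the
    current best frame M, decay its score by s_i <- s_i * (1 - iou(M, b_i))."""
    scores = list(scores)
    remaining = list(range(len(boxes)))
    order = []
    while remaining:
        m = max(remaining, key=lambda i: scores[i])
        order.append(m)
        remaining.remove(m)
        for i in remaining:
            ov = iou(boxes[m], boxes[i])
            if ov >= nt:          # below N_t the score is kept unchanged
                scores[i] *= (1.0 - ov)
    return order, scores
```

Because overlapping frames are only attenuated rather than deleted, a genuinely distinct but overlapping target can still survive with a reduced score, which is what lowers the missed-detection rate in crowded scenes.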
Step 5, perform feature fusion on the visible light mode feature map f_k and the infrared light feature map f_h to obtain the fused multi-mode fusion feature map F; the feature fusion formula is:

F = f_k ⊕ f_h

wherein F is the fused multi-mode fusion feature map, f_k is the visible light mode feature map, f_h is the infrared mode feature map, and ⊕ denotes point-by-point addition.
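A minimal sketch of the point-by-point fusion F = f_k ⊕ f_h, with NumPy arrays standing in for the two branch feature maps (the explicit shape check is an illustrative addition):

```python
import numpy as np

def fuse_features(f_k, f_h):
    """F = f_k (+) f_h: element-wise (point-by-point) addition of the visible
    light mode feature map and the infrared light feature map."""
    if f_k.shape != f_h.shape:
        raise ValueError("both branches must produce feature maps of equal shape")
    return f_k + f_h
```

Element-wise addition keeps the fused map the same shape as each branch's output, so the downstream RPN and ROI layers need no structural change.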
Step 6, input the multi-mode fusion feature map F obtained in step 5 into the improved RPN network. Sliding-window processing is first performed on the picture to obtain a series of detection frames B and corresponding scores S; the positions of foreground suggestion frames are judged by setting a confidence threshold, eliminating frames whose scores fall below the threshold and keeping those above it. Unlike NMS, which completely suppresses overlapping prediction frames, the linear Soft-NMS only attenuates the detection score of a detection frame B1 that highly overlaps the prediction frame M;
Step 7, input the multi-mode fusion feature map F and the information output by the improved RPN network in step 6 into the ROI Pooling layer; using the multi-mode feature map in place of the single-mode feature map of the traditional algorithm provides richer information for classification and identification under the small sample condition, improving accuracy. Specifically:
the multi-mode fusion feature map F is input into the improved RPN network to generate Anchor boxes; the Anchor boxes are cut and filtered, and Bbox bounding boxes and category scores are output, i.e., the best target RoIs are obtained;
the multi-mode fusion feature map F is input into the ROI network together with the target RoIs output by the improved RPN network; after ROI pooling, the result enters the fully connected layer for output, completing classification and regression.
Step 8, performing model training iteration to obtain a pre-training model (namely a basic small sample target detection model) with excellent performance on the base class;
in the iterative training process of the basic small sample target detection model, the total network loss is composed of three parts, which are expressed as follows:
L_total = L_rpn + L_cls + L_loc

wherein L_total is the total network loss, L_rpn is the RPN network loss, L_cls is the network classification loss, and L_loc is the network bounding-box regression loss.
Step 9, sample K = 1, 2, 3, 5, 10 images per category from the base classes and the new classes respectively to make the small sample fine-tuning data sets, with the remaining data sets used for verification;
step 10, introducing a cosine similarity classifier in the fine tuning stage, wherein the cosine similarity evaluation function is as follows:
S_{i,j} = α · F(x_i)^T ω_j / (‖F(x_i)‖ · ‖ω_j‖)

wherein S_{i,j} is the similarity score between the input x of the i-th target proposal and the j-th class weight vector, F(x) is the input feature, ω_j is the weight vector of class-j targets, and α is the scale factor.
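The cosine similarity evaluation function can be sketched as follows (the scale factor value α = 20 and the small epsilon guarding against zero norms are illustrative assumptions):

```python
import numpy as np

def cosine_scores(feats, weights, alpha=20.0):
    """S[i, j] = alpha * F(x_i)^T w_j / (||F(x_i)|| * ||w_j||)."""
    fn = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-9)
    wn = weights / (np.linalg.norm(weights, axis=1, keepdims=True) + 1e-9)
    return alpha * fn @ wn.T
```

Normalizing both the feature and the class weight removes magnitude differences between base and new classes, which is why cosine classifiers are favored in the fine-tuning stage of few-shot detectors.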
Step 11, for the basic small sample target detection model obtained in step 8, freeze all layer parameters except the last layer, remove the last layer, and initialize its parameters; adjust the last-layer parameters with the small sample fine-tuning data set to obtain a detector with excellent detection performance on both the base classes and the new classes (i.e., the final small sample target detection model).
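The freeze-and-fine-tune step can be sketched as follows; representing the model as a plain dict of NumPy arrays is an illustrative assumption (in practice this is done by setting trainable/requires_grad flags in the training framework):

```python
import numpy as np

def prepare_finetune(params, last_key, rng=None):
    """Freeze every layer except the last, and re-initialize the last layer.

    `params` maps layer names to weight arrays; returns (new_params, trainable),
    where `trainable` marks only the last layer for updating."""
    if rng is None:
        rng = np.random.default_rng(0)
    trainable = {name: (name == last_key) for name in params}
    new_params = dict(params)
    new_params[last_key] = rng.normal(0.0, 0.01, size=params[last_key].shape)
    return new_params, trainable
```

Only the re-initialized last layer then receives gradient updates from the K-shot fine-tuning set, preserving the features learned on the base classes.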
Compared with the prior art, the invention has the following beneficial effects:
the dual-branch feature extraction network based on the design of the invention adopts different feature extraction models aiming at different features of images, wherein the improved feature extraction model adopts the idea of variable convolution to increase the space sampling position, improves the processing capacity of geometric transformation, can better cope with the conditions of target movement, size scaling, rotation and the like, and efficiently learns the information of visible light images under the condition of a small number of samples.
According to the invention, soft-NMS is adopted to replace the traditional NMS for improvement, so that the problems of complex background, target masking, overlapping and the like can be effectively solved, the omission ratio of an algorithm is reduced, and the detection effect is improved.
The method of fusing the visible light mode features and the infrared mode features improves the performance of the small sample algorithm, mainly its classification performance. Research by many scholars shows that under the small sample condition the target detection network is not greatly affected in locating targets; the difficulty lies in learning useful target features from a small number of samples. Compared with traditional small sample detection algorithms, the method of the invention has better robustness and detection effect.
Example 2
As shown in fig. 4, the difference between the present embodiment and embodiment 1 is that the present embodiment provides a small sample target detection system based on multi-mode fusion, which is used to implement a small sample target detection method based on multi-mode fusion described in embodiment 1; the system comprises:
the data acquisition module is used for acquiring coaxial visible light images and infrared images;
the feature extraction network module is used for extracting features from the input visible light image and infrared image to obtain a visible light mode feature map and an infrared light feature map;
the multi-mode feature fusion module is used for carrying out feature fusion on the visible light mode feature map and the infrared light feature map to obtain a fused multi-mode fusion feature map;
the classification and regression module is used for inputting the multi-modal fusion feature map into the improved RPN network and into the ROI network, and performing model training to obtain a basic small sample target detection model;
the adjusting module is used for adjusting the basic small sample target detection model to obtain a final small sample target detection model related to the target task; and performing target task detection on the small sample to be detected by adopting a final small sample target detection model.
Each module executes according to the steps of the multi-modal fusion-based small sample target detection method described in embodiment 1; details are not repeated in this embodiment.
Meanwhile, the invention also provides a computer device which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the small sample target detection method based on multi-mode fusion when executing the computer program.
Meanwhile, the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the small sample target detection method based on multi-mode fusion when being executed by a processor.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description of the embodiments is provided to illustrate the general principles of the invention and is not intended to limit the invention to the particular embodiments disclosed; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the invention shall fall within the scope of the invention.

Claims (10)

1. A method for detecting a small sample target based on multi-modal fusion, characterized by comprising the following steps:
acquiring coaxial visible light images and infrared images;
inputting the visible light image and the infrared image into a dual-branch feature extraction network and extracting features to obtain a visible light mode feature map and an infrared light feature map;
performing feature fusion on the visible light mode feature map and the infrared light feature map to obtain a multi-mode fusion feature map;
inputting the multi-mode fusion feature map into an improved RPN network, inputting the multi-mode fusion feature map into an ROI network, and performing model training to obtain a basic small sample target detection model;
adjusting the basic small sample target detection model to obtain a final small sample target detection model related to a target task; and performing target task detection on the small sample to be detected by adopting a final small sample target detection model.
2. The method for detecting a small sample target based on multi-modal fusion according to claim 1, wherein the dual-branch feature extraction network comprises an original residual feature extraction network and a residual feature extraction network based on deformable convolution;
the original residual feature extraction network performs feature extraction on the infrared image to obtain the infrared light feature map;
the residual feature extraction network based on deformable convolution performs feature extraction on the visible light image to obtain the visible light mode feature map.
3. The method for detecting a small sample target based on multi-modal fusion according to claim 2, wherein the sampling of the deformable convolution in the deformable-convolution-based residual feature extraction network is expressed as:

$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$

wherein $y(p_0)$ is the output feature map value at each position $p_0$, $w(p_n)$ is the sampling weight, $x$ is the input feature map, $p_n$ enumerates the positions in $R$, $R$ is the receptive field (the regular sampling grid), and $\Delta p_n$ is the learned offset.
4. The method for detecting a small sample target based on multi-modal fusion according to claim 1, wherein the inputting the multi-modal fusion feature map into an improved RPN network and into an ROI network and performing model training to obtain a basic small sample target detection model comprises:
inputting the multi-modal fusion feature map into the improved RPN network to generate anchor boxes; clipping and filtering the anchor boxes, and outputting bounding boxes (Bbox) and category scores, thereby obtaining the optimal target ROIs;
inputting the multi-modal fusion feature map into the ROI network together with the target ROIs output by the improved RPN network; after ROI pooling, the features pass through a fully connected layer for output, completing classification and regression to obtain the trained basic small sample target detection model.
5. The method for detecting a small sample target based on multi-modal fusion according to claim 4, wherein the improved RPN network uses linear Soft-NMS to remove redundant detection boxes and obtain the optimal target ROIs; the linear Soft-NMS is expressed as:

$$s_i = \begin{cases} s_i, & \mathrm{iou}(M, b_i) < N_t \\ s_i \left(1 - \mathrm{iou}(M, b_i)\right), & \mathrm{iou}(M, b_i) \ge N_t \end{cases}$$

wherein $s_i$ is the detection score of box $b_i$, $N_t$ is the overlap threshold, $M$ is the current highest-scoring detection box, and $\mathrm{iou}(M, b_i)$ is the overlap ratio between $M$ and $b_i$.
6. The method for detecting a small sample target based on multi-modal fusion according to claim 1, wherein the adjusting of the basic small sample target detection model specifically comprises:
freezing the parameters of all layers of the basic small sample target detection model except the last layer, and removing and re-initializing the last-layer parameters;
fine-tuning the last-layer parameters with the small sample fine-tuning dataset in combination with a cosine similarity classifier to obtain the final small sample target detection model.
7. The method for detecting a small sample target based on multi-modal fusion according to claim 6, wherein the small sample fine-tuning dataset is obtained by sampling K = 1, 2, 3, 5, or 10 instances per class from the base classes and the novel classes respectively.
8. A small sample target detection system based on multi-modal fusion, the system comprising:
the data acquisition module is used for acquiring coaxial visible light images and infrared images;
the feature extraction network module is used for extracting features from the input visible light image and infrared image to obtain a visible light mode feature map and an infrared light feature map;
the multi-mode feature fusion module is used for carrying out feature fusion on the visible light mode feature map and the infrared light feature map to obtain a fused multi-mode fusion feature map;
the classification and regression module is used for inputting the multi-modal fusion feature map into an improved RPN network and into an ROI network, and performing model training to obtain a basic small sample target detection model;
the adjustment module is used for adjusting the basic small sample target detection model to obtain a final small sample target detection model related to a target task; and performing target task detection on the small sample to be detected by adopting a final small sample target detection model.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements a multimodal fusion based small sample object detection method as claimed in any of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements a multimodal fusion based small sample object detection method according to any of claims 1 to 7.
CN202310238165.1A 2023-03-13 2023-03-13 Small sample target detection method, system and medium based on multi-mode fusion Pending CN116416503A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310238165.1A CN116416503A (en) 2023-03-13 2023-03-13 Small sample target detection method, system and medium based on multi-mode fusion


Publications (1)

Publication Number Publication Date
CN116416503A true CN116416503A (en) 2023-07-11

Family

ID=87048969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310238165.1A Pending CN116416503A (en) 2023-03-13 2023-03-13 Small sample target detection method, system and medium based on multi-mode fusion

Country Status (1)

Country Link
CN (1) CN116416503A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117237697A (en) * 2023-08-01 2023-12-15 北京邮电大学 Small sample image detection method, system, medium and equipment
CN117237697B (en) * 2023-08-01 2024-05-17 北京邮电大学 Small sample image detection method, system, medium and equipment
CN117274899A (en) * 2023-09-20 2023-12-22 中国人民解放军海军航空大学 Storage hidden danger detection method based on visible light and infrared light image feature fusion
CN117274899B (en) * 2023-09-20 2024-05-28 中国人民解放军海军航空大学 Storage hidden danger detection method based on visible light and infrared light image feature fusion
CN118090743A (en) * 2024-04-22 2024-05-28 山东浪潮数字商业科技有限公司 Porcelain winebottle quality detection system based on multi-mode image recognition technology


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination