CN117690161B - Pedestrian detection method, device and medium based on image fusion - Google Patents
- Publication number: CN117690161B
- Application number: CN202311704548.XA
- Authority: CN (China)
- Prior art keywords: feature, image, visible light, thermal infrared, fusion
- Legal status: Active (an assumption, not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of this status)
Classifications
- G06V40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06V10/42 — Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
- G06V10/764 — Image or video recognition using pattern recognition or machine learning, using classification, e.g. of video objects
- G06V10/806 — Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
- G06V10/82 — Image or video recognition using pattern recognition or machine learning, using neural networks
Abstract
The invention relates to a pedestrian detection method, device and medium based on image fusion, comprising the following steps: S1, acquiring a real-time visible light image and a thermal infrared image and preprocessing them; S2, performing multi-scale feature extraction several times on the preprocessed visible light image and thermal infrared image respectively, generating several visible light feature maps and several thermal infrared feature maps; S3, performing weighted fusion of the visible light feature maps and the thermal infrared feature maps to obtain class activation maps; S4, inputting the class activation maps into a feature pyramid network for multi-scale feature fusion, generating fused feature maps; S5, executing detection tasks on the fused feature maps and outputting pedestrian detection results, wherein the detection tasks comprise pedestrian prediction bounding box regression and pedestrian prediction bounding box object classification. Compared with the prior art, the invention improves the accuracy and real-time performance of pedestrian detection results.
Description
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to a pedestrian detection method, device and medium based on image fusion.
Background
In industry, transport vehicles are usually driven manually, but the complexity of the night-time workshop environment and possible driver error introduce uncertainty into driving safety, seriously threatening pedestrian safety and production efficiency. Object detection is one of the key tasks in computer vision, and pedestrian detection, an important branch of it, has developed remarkably; its results, however, depend largely on the quality of the input image. Under complex or low illumination, an optical imaging sensor can hardly provide enough information to outline the target clearly, and traditional single-modality detection techniques struggle to produce an ideal imaging result, which directly affects the accuracy and reliability of the pedestrian detection algorithm's output.
In this context, multi-modal object detection techniques have emerged; they aim to obtain more comprehensive target information by combining data from different sensors. Existing multi-modal pedestrian detection methods generally use several backbone networks to extract feature maps from each input modality and then fuse them algorithmically; the fusion stage lets the detection model draw detailed information from every input and thus achieve better performance. For example, LEE et al. proposed a cascade fusion method that concatenates the two modal feature maps to double the channel number and then uses an NiN layer to output the important features; however, doubling the channel number introduces redundant computation, increases complexity, degrades real-time performance and limits model deployment. KIM et al. proposed a weighted fusion method based on regions of interest, which selects the regions to fuse by judging the amount of features extracted from each region of interest; but it sacrifices the features of unfused regions, reducing detection accuracy for tiny targets, ignores the advantages of inter-modality fusion (leading to insufficient fusion), and discards intra-modality global information (losing fusion information). It is therefore necessary to design a pedestrian detection method that fully exploits the advantages of modality fusion and improves the accuracy and real-time performance of pedestrian detection.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a pedestrian detection method, device and medium based on image fusion, which improve the accuracy and real-time performance of pedestrian detection results.
The aim of the invention can be achieved by the following technical scheme:
a pedestrian detection method based on image fusion comprises the following steps:
S1, acquiring a real-time visible light image and a thermal infrared image and preprocessing them;
S2, performing multi-scale feature extraction several times on the preprocessed visible light image and thermal infrared image respectively, generating several visible light feature maps and several thermal infrared feature maps;
S3, performing weighted fusion of the visible light feature maps and the thermal infrared feature maps to obtain class activation maps;
S4, inputting the class activation maps into a feature pyramid network for multi-scale feature fusion, generating fused feature maps;
S5, executing detection tasks on the fused feature maps and outputting pedestrian detection results, wherein the detection tasks comprise pedestrian prediction bounding box regression and pedestrian prediction bounding box object classification.
Further, in step S1, the preprocessing includes:
unifying the pixel size and format of the image data;
performing filtering-based noise reduction and image enhancement.
Further, in step S2, the specific process of multi-scale feature extraction is as follows:
S201, acquiring the preprocessed visible light image and thermal infrared image, sampling every other (isolated) pixel in the horizontal and vertical directions, and generating several visible light image feature layers and several thermal infrared image feature layers;
S202, superposing all visible light image feature layers, superposing all thermal infrared image feature layers, inputting each into a convolution network for feature extraction, and generating the visible light feature maps and the thermal infrared feature maps.
Further, in step S202, the convolution network includes a convolution layer, a spatial pyramid pooling layer SPP, and a residual block layer.
Further, the specific process of step S3 is as follows:
S301, obtaining the visible light feature map F_v and the thermal infrared feature map F_t, performing an inner product operation to obtain a first feature map F_1, and then performing a spatial attention operation to obtain a second feature map F_2;
S302, obtaining the visible light feature map F_v and the thermal infrared feature map F_t, performing an addition operation to obtain a third feature map F_3, and then performing a convolution operation to obtain a fourth feature map F_4;
S303, performing a channel self-attention operation on the second feature map F_2 and the fourth feature map F_4 to generate the class activation map M.
Further, in step S301, the spatial attention operation proceeds as follows:
perform maximum pooling and average pooling on the first feature map F_1 respectively, then apply a convolution operation to each result;
splice the convolution results through an activation function to obtain the second feature map F_2.
Further, in step S303, the channel self-attention operation proceeds as follows:
perform an inner product operation on the second feature map F_2 and the fourth feature map F_4, then perform maximum pooling and average pooling respectively;
weight the maximum pooling and average pooling results respectively, then splice them through an activation function.
Further, the activation function is a Sigmoid activation function.
The invention also provides an electronic device comprising a memory, a processor and a program stored in the memory, wherein the processor implements the method according to any one of the above when executing the program.
The invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method as described in any of the above.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention performs multi-scale feature extraction on the preprocessed visible light image and thermal infrared image respectively to generate several visible light feature maps and thermal infrared feature maps, then fuses them by weighting to obtain class activation maps. This effectively highlights the pedestrian feature information in both feature maps without losing information, helps capture more details and features in the image data, and improves the accuracy of the pedestrian detection result.
2. During multi-scale feature extraction, the invention first samples every other pixel in the horizontal and vertical directions, then superposes the resulting feature layers and passes them through a convolution network several times. The spatial pyramid pooling layer SPP used in the network handles targets of different sizes better without significantly enlarging the network, which speeds up model training and further improves the real-time performance of pedestrian detection.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a pedestrian detection model structure based on image fusion;
FIG. 3 is a schematic diagram of a CAM activation module.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples. The present embodiment is implemented on the premise of the technical scheme of the present invention, and a detailed implementation manner and a specific operation process are given, but the protection scope of the present invention is not limited to the following examples.
Examples:
This embodiment first builds a pedestrian detection model based on image fusion, as shown in FIG. 2, comprising a CSPDarknet module, a CAM activation module, a feature pyramid network and a detection head. The CSPDarknet module performs multi-scale feature extraction on the visible light image and the thermal infrared image respectively to obtain visible light feature maps and thermal infrared feature maps; the CAM activation module performs weighted fusion of the visible light and thermal infrared feature maps to obtain class activation maps; the feature pyramid network performs multi-scale feature fusion on the class activation maps to generate fused feature maps; and the detection head executes the detection tasks on the fused feature maps and outputs the final pedestrian detection result.
Based on this model, the embodiment provides a pedestrian detection method based on image fusion, as shown in FIG. 1, comprising the following steps:
S1, acquiring a real-time visible light image and a thermal infrared image and preprocessing.
The preprocessing process includes the following steps:
unifying the pixel size of each image to 320×240 pixels and the data format to json;
performing filtering-based noise reduction and image enhancement; common noise reduction methods include mean filtering, Gaussian filtering and median filtering, and common enhancement methods include Laplacian-based enhancement and logarithmic (Log) transformation-based enhancement.
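As an illustrative sketch of this preprocessing step, assuming OpenCV, median filtering for noise reduction and Laplacian sharpening for enhancement (the embodiment does not fix these particular choices):

```python
import cv2
import numpy as np

def preprocess(img: np.ndarray, size=(320, 240)) -> np.ndarray:
    """Unify size, then apply filtering-based noise reduction and enhancement."""
    img = cv2.resize(img, size)            # unify pixel size to 320x240 (width, height)
    img = cv2.medianBlur(img, 3)           # median filtering for noise reduction
    lap = cv2.Laplacian(img, cv2.CV_64F)   # Laplacian-based image enhancement (sharpening)
    return np.clip(img - 0.5 * lap, 0, 255).astype(np.uint8)
```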
S2, inputting the preprocessed visible light image and thermal infrared image into the CSPDarknet module respectively for multi-scale feature extraction, generating several visible light feature maps and several thermal infrared feature maps. The specific process is as follows:
S201, acquiring the preprocessed visible light image and thermal infrared image and inputting them into a Focus layer, which samples every other pixel in the horizontal and vertical directions, generating visible light image feature layers and thermal infrared image feature layers. Each input image is thereby recombined into four feature layers, which are then superposed, expanding the number of input channels four-fold: the superposed feature layers have 12 channels compared with the original 3-channel input. A minimal sketch of this slicing operation is given below.
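A minimal PyTorch sketch of the Focus-style slicing just described (the function name is illustrative):

```python
import torch

def focus_slice(x: torch.Tensor) -> torch.Tensor:
    """Sample every other pixel in each direction and stack the four resulting
    feature layers along the channel axis: (B, 3, H, W) -> (B, 12, H/2, W/2)."""
    return torch.cat(
        [x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]],
        dim=1,
    )
```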
S202, inputting the result of step S201 into a convolution network three times for feature extraction, where the convolution network comprises a convolution layer, a spatial pyramid pooling layer SPP and several residual block layers. In the convolution layer, the kernel size is 3×3, the stride is 2 and the padding is 1; the spatial pyramid pooling layer SPP has three branches, with kernel sizes of 5×5, 7×7 and 9×9 respectively; each residual block layer applies a 1×1 convolution and a 3×3 convolution to the image. After three rounds of feature extraction, as shown in FIG. 2, a first, second and third visible light feature map and a first, second and third thermal infrared feature map are generated. The SPP layer handles targets of different sizes better without significantly enlarging the network, which speeds up model training and further improves the real-time performance of pedestrian detection.
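A hedged sketch of the SPP block, reading the listed 5×5, 7×7 and 9×9 kernels as pooling kernel sizes in the conventional SPP arrangement (stride 1, padding k//2 so spatial size is preserved); the trailing 1×1 reduction conv is an assumption:

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling: parallel max-pool branches concatenated with the input."""
    def __init__(self, channels: int):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in (5, 7, 9)]
        )
        self.reduce = nn.Conv2d(channels * 4, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.reduce(torch.cat([x] + [p(x) for p in self.pools], dim=1))
```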
S3, inputting the visible light feature maps and the thermal infrared feature maps into the CAM activation module for weighted fusion, obtaining the corresponding class activation maps.
The CAM activation module enhances the representation of the feature maps and integrates the feature information of the visible light and thermal branches, helping to capture more details and features while ensuring that intra-modality features are not lost for the sake of inter-modality complementarity. The structure of the CAM activation module is shown in FIG. 3. First, an inner product of the visible light feature map F_v and the thermal infrared feature map F_t yields the first feature map F_1, and a spatial attention operation on F_1 yields the second feature map F_2. In parallel, adding F_v and F_t yields the third feature map F_3, and a convolution on F_3 yields the fourth feature map F_4. Finally, a channel self-attention operation on F_2 and F_4 generates the class activation map M. The whole process is expressed as:

F_1 = F_v ⊙ F_t,  F_2 = SA(F_1),  F_3 = F_v + F_t,  F_4 = conv(F_3),  M = CSA(F_2, F_4),

where CSA is the channel self-attention operation, SA is the spatial attention operation, conv is a convolution operation, and ⊙ denotes the element-wise (inner) product.
The spatial attention operation SA is:

F_2 = SA(F_1) = σ( [ F^{7×7}(F_avg(F_1)) ; F^{7×7}(F_max(F_1)) ] ) ⊙ F_1,

where F_avg is average pooling, F_max is maximum pooling, F^{7×7} is a convolution with a 7×7 kernel, σ is the activation function used to splice the convolution results, and [ ; ] denotes concatenation.
The channel self-attention operation CSA, with F = F_2 ⊙ F_4, is:

M = CSA(F_2, F_4) = σ( W_1(W_0(F_avg(F))) + W_1(W_0(F_max(F))) ) ⊙ F,

where F_avg is average pooling, F_max is maximum pooling, W_1 and W_0 are trainable weight matrices, and σ is the Sigmoid activation function.
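Putting S301–S303 together, a PyTorch sketch of the CAM activation module follows. The channel-reduction ratio, the CBAM-style single 7×7 conv over the concatenated channel-pooled maps, and re-applying the attention weights to the fused map are assumptions where the description leaves details open:

```python
import torch
import torch.nn as nn

class CAMFusion(nn.Module):
    """Weighted fusion of visible (F_v) and thermal (F_t) feature maps
    into a class activation map M, following steps S301-S303."""
    def __init__(self, channels: int):
        super().__init__()
        self.sa_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)          # spatial attention
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.w0 = nn.Conv2d(channels, channels // 4, kernel_size=1)       # trainable W0
        self.w1 = nn.Conv2d(channels // 4, channels, kernel_size=1)       # trainable W1

    def forward(self, fv: torch.Tensor, ft: torch.Tensor) -> torch.Tensor:
        f1 = fv * ft                                    # S301: inner product -> F1
        pooled = torch.cat(
            [f1.mean(dim=1, keepdim=True), f1.amax(dim=1, keepdim=True)], dim=1
        )
        f2 = torch.sigmoid(self.sa_conv(pooled)) * f1   # spatial attention -> F2
        f4 = self.conv(fv + ft)                         # S302: add, then conv -> F4
        f = f2 * f4                                     # S303: inner product of F2, F4
        avg = self.w1(torch.relu(self.w0(f.mean(dim=(2, 3), keepdim=True))))
        mx = self.w1(torch.relu(self.w0(f.amax(dim=(2, 3), keepdim=True))))
        return torch.sigmoid(avg + mx) * f              # channel self-attention -> M
```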
Through step S3, the first, second and third class activation maps are obtained. A class activation map is a visual heat map generated by a special convolutional neural network structure; it effectively highlights the pedestrian feature information in the visible light and thermal infrared feature maps without losing information, helps capture more details and features in the image data, and improves the accuracy of the pedestrian detection result.
S4, inputting each class activation map into the feature pyramid network for multi-scale feature fusion, generating the fused feature maps.
The first, second and third class activation maps are input into the feature pyramid network (the YOLOX PAFPN). The network comprises four up-sampling and four down-sampling processes, and three feature maps are output after the second, third and fourth down-sampling steps respectively; the sampling convolutions use a 3×3 kernel with a stride of 2, carrying out the multi-scale feature fusion.
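For illustration only, a simplified top-down FPN-style fusion is sketched below; the embodiment itself uses the YOLOX PAFPN, which adds a bottom-up path that this minimal version omits:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniFPN(nn.Module):
    """Fuse three class activation maps of decreasing resolution top-down."""
    def __init__(self, in_channels: list, out_channels: int = 256):
        super().__init__()
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels]
        )
        self.smooth = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, 3, padding=1) for _ in in_channels]
        )

    def forward(self, feats: list) -> list:
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 1, 0, -1):   # top-down pathway: upsample and add
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest"
            )
        return [s(l) for s, l in zip(self.smooth, laterals)]
```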
S5, executing the detection tasks on the fused feature maps and outputting the pedestrian detection result, wherein the detection tasks comprise pedestrian prediction bounding box regression and pedestrian prediction bounding box object classification.
This embodiment selects the YOLOX Head as the detection head. It uses a 1×1 convolution to reduce feature maps with differing channel numbers to a uniform channel number, which helps unify the feature map dimensions. Two parallel branches, each containing two 3×3 convolution kernels, then perform the different detection tasks, namely pedestrian prediction bounding box regression and pedestrian prediction bounding box object classification respectively.
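A sketch of such a decoupled head for a single pyramid level (the intermediate width, activation, and output parameterization are assumptions):

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    """1x1 conv unifies channels, then two parallel branches of two 3x3 convs
    perform bounding box regression and object classification."""
    def __init__(self, in_channels: int, num_classes: int = 1, width: int = 256):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, width, kernel_size=1)
        def branch():
            return nn.Sequential(
                nn.Conv2d(width, width, 3, padding=1), nn.SiLU(),
                nn.Conv2d(width, width, 3, padding=1), nn.SiLU(),
            )
        self.reg_branch, self.cls_branch = branch(), branch()
        self.reg_out = nn.Conv2d(width, 4, kernel_size=1)            # box (x, y, w, h)
        self.cls_out = nn.Conv2d(width, num_classes, kernel_size=1)  # pedestrian score

    def forward(self, x: torch.Tensor):
        x = self.stem(x)
        return self.reg_out(self.reg_branch(x)), self.cls_out(self.cls_branch(x))
```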
In this embodiment, the OSU-CT visible light/thermal infrared dataset, labeled in Cvml format, is used to train the pedestrian detection model based on image fusion. Data with missing labels are first screened out, leaving 4125 image pairs in total; the pixel sizes of the visible light and thermal infrared images are then uniformly adjusted to 320×240 pixels and the format to json before being input to the model. After the model executes the detection tasks, the loss of each detection task is calculated separately, and the parameters of the pedestrian detection model are updated by the back-propagation algorithm.
In a preferred embodiment, the loss of the pedestrian prediction bounding box regression task is calculated using the IOU loss function:

L_IOU = 1 − |box_gt ∩ box_pre| / |box_gt ∪ box_pre|,

where box_gt and box_pre are the ground-truth box region and the predicted box region of the target detection, respectively.
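A minimal sketch of this IOU loss for axis-aligned boxes given as (x1, y1, x2, y2):

```python
import torch

def iou_loss(box_pre: torch.Tensor, box_gt: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """IOU loss = 1 - |intersection| / |union| for (N, 4) boxes as (x1, y1, x2, y2)."""
    lt = torch.max(box_pre[:, :2], box_gt[:, :2])   # top-left of intersection
    rb = torch.min(box_pre[:, 2:], box_gt[:, 2:])   # bottom-right of intersection
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (box_pre[:, 2] - box_pre[:, 0]) * (box_pre[:, 3] - box_pre[:, 1])
    area_g = (box_gt[:, 2] - box_gt[:, 0]) * (box_gt[:, 3] - box_gt[:, 1])
    union = area_p + area_g - inter + eps
    return (1.0 - inter / union).mean()
```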
In a preferred embodiment, the loss of the pedestrian prediction bounding box object classification task is calculated with a cross-entropy loss function with a loss weight of 1.0; the regression loss uses the IOU loss function with a loss weight of 5.0; the learning rate is 0.00001; and the loss weight of the L1 loss function is 1.0.
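Under these settings, the total training loss could be assembled as in this hedged sketch (the per-term loss values are assumed to be computed elsewhere; the combination rule itself is not spelled out in the text):

```python
import torch

def total_loss(cls_loss: torch.Tensor, iou_loss_val: torch.Tensor,
               l1_loss_val: torch.Tensor) -> torch.Tensor:
    """Weighted sum per the preferred embodiment: cls 1.0, IOU 5.0, L1 1.0."""
    return 1.0 * cls_loss + 5.0 * iou_loss_val + 1.0 * l1_loss_val
```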
To verify the performance of the invention, this example conducted experiments on a public dataset and compared against some current mainstream pedestrian detection methods. The experiments were trained and tested according to the experimental specifications of the corresponding datasets; the results are shown in Table 1.
In Table 1, Method 1 applies the Fast-RCNN method to the visible light single-modality dataset, Method 2 applies the Fast-RCNN method to the thermal infrared single-modality dataset, and Method 3 applies the YOLOX method to the multi-modality dataset. Comparing Methods 1 and 2 with the method of the invention shows that, relative to single-modality methods, the proposed method supplements additional information from the other modality and retains detection capability across different challenging scenes. Comparing Method 3 with the method of the invention shows that the CAM (Class Activation Map) activation module effectively highlights pedestrian feature information without losing information within each modality, achieving a superior detection effect.
Table 1: Experimental results on the OSU-CT dataset

| Method | AP50 | AP75 | mAP |
|---|---|---|---|
| Method 1 | 75.4 | 36.2 | 37.8 |
| Method 2 | 66.3 | 21.2 | 30.6 |
| Method 3 | 84.2 | 36.3 | 41.8 |
| The method of the invention | 98.6 | 59.3 | 57.6 |
The foregoing description covers only the preferred embodiments of the invention and is not intended to limit it. The invention also encompasses technical schemes formed by any combination of the above technical features.
The previous description of the embodiments is provided to enable a person of ordinary skill in the art to make and use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles described herein may be applied to other embodiments without inventive effort. The present invention is therefore not limited to the embodiments described above; improvements and modifications made by those skilled in the art according to this disclosure without departing from the scope of the present invention fall within its protection.
Claims (9)
1. A pedestrian detection method based on image fusion, characterized by comprising the following steps:
S1, acquiring a real-time visible light image and a thermal infrared image and preprocessing them;
S2, performing multi-scale feature extraction several times on the preprocessed visible light image and thermal infrared image respectively, generating several visible light feature maps and several thermal infrared feature maps;
S3, performing weighted fusion of the visible light feature maps and the thermal infrared feature maps to obtain class activation maps;
S4, inputting the class activation maps into a feature pyramid network for multi-scale feature fusion, generating fused feature maps;
S5, executing detection tasks on the fused feature maps and outputting pedestrian detection results, wherein the detection tasks comprise pedestrian prediction bounding box regression and pedestrian prediction bounding box object classification;
The specific process of step S3 is as follows:
S301, obtaining the visible light feature map and the thermal infrared feature map and performing an inner product operation to obtain a first feature map F_1, then performing a spatial attention operation on the first feature map F_1 to obtain a second feature map F_2;
S302, obtaining the visible light feature map and the thermal infrared feature map and performing an addition operation to obtain a third feature map F_3, then performing a convolution operation on the third feature map F_3 to obtain a fourth feature map F_4;
S303, performing a channel self-attention operation on the second feature map F_2 and the fourth feature map F_4 to generate the class activation map.
2. The pedestrian detection method based on image fusion according to claim 1, wherein in step S1 the preprocessing includes:
unifying the pixel size and format of the image data;
performing filtering-based noise reduction and image enhancement.
3. The pedestrian detection method based on image fusion according to claim 1, wherein in step S2 the specific process of multi-scale feature extraction is as follows:
S201, acquiring the preprocessed visible light image and thermal infrared image, sampling every other (isolated) pixel in the horizontal and vertical directions, and generating several visible light image feature layers and several thermal infrared image feature layers;
S202, superposing all visible light image feature layers, superposing all thermal infrared image feature layers, inputting each into a convolution network for feature extraction, and generating the visible light feature maps and the thermal infrared feature maps.
4. A pedestrian detection method based on image fusion according to claim 3, wherein in step S202, the convolution network comprises a convolution layer, a spatial pyramid pooling layer SPP, and a residual block layer.
5. The pedestrian detection method based on image fusion according to claim 1, wherein in step S301 the spatial attention operation proceeds as follows:
perform maximum pooling and average pooling on the first feature map F_1 respectively, then apply a convolution operation to each result;
splice the convolution results through an activation function to obtain the second feature map F_2.
6. The pedestrian detection method based on image fusion according to claim 1, wherein in step S303 the channel self-attention operation proceeds as follows:
perform an inner product operation on the second feature map F_2 and the fourth feature map F_4, then perform maximum pooling and average pooling respectively;
weight the maximum pooling and average pooling results respectively, then splice them through an activation function.
7. The pedestrian detection method based on image fusion of claim 6, wherein the activation function is a Sigmoid activation function.
8. An electronic device comprising a memory, a processor, and a program stored in the memory, wherein the processor implements the method of any of claims 1-7 when executing the program.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-7.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311704548.XA | 2023-12-12 | 2023-12-12 | Pedestrian detection method, device and medium based on image fusion |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN117690161A | 2024-03-12 |
| CN117690161B | 2024-06-04 |
Family

ID: 90125990

Family Applications (1)

| Application Number | Status | Priority Date | Filing Date |
|---|---|---|---|
| CN202311704548.XA (CN117690161B) | Active | 2023-12-12 | 2023-12-12 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN117690161B (en) |
Families Citing this family (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118314490B * | 2024-06-11 | 2024-09-17 | 合肥工业大学 (Hefei University of Technology) | Air-space-ground multi-scale re-decision method and system for ultra-high voltage transformer substation |
Citations (6)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR101908481B1 * | 2017-07-24 | 2018-12-10 | 동국대학교 산학협력단 (Dongguk University Industry-Academic Cooperation Foundation) | Device and method for pedestrian detection |
| CN113139542A * | 2021-04-28 | 2021-07-20 | 北京百度网讯科技有限公司 (Beijing Baidu Netcom Science and Technology Co., Ltd.) | Target detection method, device, equipment and computer readable storage medium |
| CN116452937A * | 2023-04-25 | 2023-07-18 | 重庆邮电大学 (Chongqing University of Posts and Telecommunications) | Multi-mode characteristic target detection method based on dynamic convolution and attention mechanism |
| CN116580425A * | 2023-05-12 | 2023-08-11 | 浙江工业大学 (Zhejiang University of Technology) | Multispectral pedestrian detection method based on cross-Transformer fusion |
| CN116645696A * | 2023-05-31 | 2023-08-25 | 长春理工大学重庆研究院 (Chongqing Research Institute of Changchun University of Science and Technology) | Contour information guided feature detection method for multi-mode pedestrian detection |
| CN117132759A * | 2023-08-02 | 2023-11-28 | 上海无线电设备研究所 (Shanghai Radio Equipment Research Institute) | Saliency target detection method based on multiband visual image perception and fusion |
Also Published As

| Publication number | Publication date |
|---|---|
| CN117690161A | 2024-03-12 |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |