
CN117765359A - Battlefield target detection system and method based on fusion of visible light and infrared image - Google Patents

Battlefield target detection system and method based on fusion of visible light and infrared image

Info

Publication number
CN117765359A
CN117765359A (application CN202311595532.XA)
Authority
CN
China
Prior art keywords
feature
visible light
infrared
temperature difference
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311595532.XA
Other languages
Chinese (zh)
Inventor
常天庆
张杰
赵立阳
张雷
郭理彬
韩斌
罗鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Academy of Armored Forces of PLA
Original Assignee
Academy of Armored Forces of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Academy of Armored Forces of PLA filed Critical Academy of Armored Forces of PLA
Priority to CN202311595532.XA priority Critical patent/CN117765359A/en
Publication of CN117765359A publication Critical patent/CN117765359A/en
Pending legal-status Critical Current

Landscapes

  • Image Processing (AREA)

Abstract

The invention relates to a battlefield target detection system based on visible light and infrared image fusion, which comprises a two-way feature extraction network, a feature alignment module, a temperature difference sensing module, a feature fusion module, a neck network and a task head network. The two-way feature extraction network extracts features of the visible light image and of the infrared image respectively; the feature alignment module spatially aligns the features of the visible light image and the infrared image; the temperature difference sensing module produces a predicted temperature difference mask; the feature fusion module generates a fused feature map; the neck network further fuses the multi-scale fused feature maps, and the task head network completes target category prediction and target position prediction by convolution on the fused feature maps provided by the neck network to obtain the final prediction result. The invention can better cope with complex ground battlefield environments, eliminate the adverse effects of various interference factors, improve detection precision and efficiency, and provide a reliable basis for battlefield command decisions.

Description

Battlefield target detection system and method based on fusion of visible light and infrared image
Technical Field
The invention relates to the technical field of military target detection, in particular to a battlefield target detection system and method based on visible light and infrared image fusion.
Background
With the continuous development of military technology, battlefield target detection technology is being applied ever more widely. For example, in war, an army can determine information such as the enemy's strength and military equipment by identifying enemy targets; in counter-terrorism actions, the police can track and strike terrorists by identifying suspect targets; in maritime patrol, the coast guard can strike pirates, illegal fishing boats and the like by identifying suspicious ships. As battlefield target detection technology develops, it not only helps the army win on the battlefield, but also helps the police fight terrorism and maintain social stability. Battlefield target detection technology accurately identifies and judges targets on the ground, at sea or in the air through specific technical means, thereby providing accurate guidance and assurance for military operations. However, the actual battlefield environment is usually complex, and various interference factors such as heavy fog and night combat are present when battlefield targets are detected, so that the target features detected by information-technology means are unclear and the target positions inaccurate; this affects the judgment of target conditions on the battlefield and fails to provide a reliable basis for command decisions.
Disclosure of Invention
The invention aims to provide a battlefield target detection system and method based on the fusion of visible light and infrared images, which overcome the above defects of the prior art.
The battlefield target detection system based on the fusion of the visible light and the infrared image comprises a two-way feature extraction network, a feature alignment module, a temperature difference sensing module, a feature fusion module, a neck network and a task head network, wherein the two-way feature extraction network is used for respectively extracting features of the visible light image and obtaining a visible light feature map, extracting features of the infrared image and obtaining an infrared feature map; the characteristic alignment module is used for carrying out space alignment on the characteristics of the visible light image and the infrared image; the temperature difference sensing module is used for carrying out convolution processing on the infrared characteristic map to obtain a predicted temperature difference mask; the feature fusion module is used for fusing the visible light feature map and the infrared feature map by combining the temperature difference mask and generating a fused feature map; the neck network is used for further fusing the deep fusion feature map and the shallow fusion feature map, and the task head network is used for completing target category prediction and target position prediction through convolution processing according to the fusion feature map provided by the neck network so as to obtain a final prediction result.
Preferably, the two-way feature extraction network comprises a visible light image feature extraction network and an infrared image feature extraction network, wherein the visible light image feature extraction network is used for extracting shallow layer and deep layer features of a visible light image and obtaining a visible light feature map, and the infrared image feature extraction network is used for extracting shallow layer and deep layer features of an infrared image and obtaining an infrared feature map.
Preferably, the neck network adopts an FPN network or a PAN network.
A battlefield target detection method based on visible light and infrared image fusion comprises the following steps:
step one, extracting features; respectively extracting features of the visible light image through a two-way feature extraction module to obtain a visible light feature map, extracting features of the infrared image and obtaining an infrared feature map;
step two, aligning the characteristics; the method comprises the steps that the features of a visible light image and an infrared image are spatially aligned through a feature alignment module;
step three, predicting a temperature difference mask; the temperature difference sensing module is used for carrying out convolution processing on the infrared characteristic map to obtain a predicted temperature difference mask;
step four, feature fusion; fusing the visible light characteristic map and the infrared characteristic map through a characteristic fusion module to generate a primary fused characteristic map, and carrying out characteristic weighting on the primary fused characteristic map and the temperature difference mask predicted value in a mode of multiplying corresponding spatial position characteristics to obtain a final fused characteristic map;
step five, obtaining a final prediction result; and processing the fusion feature map through a neck network and a task head network to obtain target category prediction and target position prediction, and obtaining a final prediction result after target bounding box regression processing.
Preferably, in the second step, the method for spatially aligning the features of the visible light image and the infrared image through the feature alignment module comprises: performing channel concatenation on the visible light image features and the infrared image features to obtain a concatenated feature map; passing the concatenated feature map through a convolution block to obtain the offset parameters of a deformable convolution, the offset parameters determining the sampling positions of the deformable convolution kernel; and applying the deformable convolution to the infrared feature map to obtain the aligned infrared feature map.
Preferably, in the third step, the infrared feature map is passed through a convolution block of the temperature difference sensing module to obtain the predicted temperature difference mask. During training, the target region image and the target-plus-background region image are extracted respectively, and the mean and variance of the pixels in each region are calculated, giving the mean u_1 and variance σ_1 of the target region image and the mean u_2 and variance σ_2 of the target-plus-background region image; a single Gaussian model N_1(u_1, σ_1) of the target region and a single Gaussian model N_2(u_2, σ_2) of the target-plus-background region are constructed; the KL divergence D_KL(N_1(u_1, σ_1) || N_2(u_2, σ_2)) of the two single Gaussian models is calculated; and the label value T_label of the temperature difference mask in the target region is calculated from D_KL, where α is a hyperparameter, D_KL ∈ [0, +∞) and T_label ∈ [0, 1).
Preferably, in the fourth step, the fusion feature map is calculated as F_fus = (F_T + F_I) · e^(β·T), where F_fus denotes the fused feature map, F_T denotes the visible light feature map, F_I denotes the infrared feature map, T denotes the predicted value of the temperature difference mask, and β is a hyperparameter controlling the degree of feature enhancement.
Preferably, in the fifth step, the target bounding box regression processing method comprises: providing paired visible light and infrared images; taking the visible light image as the primary modality, the infrared image as the auxiliary modality, and the annotation information of the visible light image as the label information; obtaining the temperature difference mask prediction, the category prediction cls_prediction and the target spatial position prediction reg_prediction, and calculating the total loss by combining the label information according to loss = loss_t + loss_cls + loss_reg, where loss_t denotes the temperature difference mask loss, calculated as loss_t = (1 − T_label)·log(1 − T) + T_label·log(T), loss_cls denotes the class loss and uses a cross-entropy loss function, and loss_reg denotes the target position regression loss and uses an IoU loss; after the total loss is back-propagated, the network weight parameters are updated.
Preferably, in the first step, the visible light image features are extracted through the visible light image feature extraction network in the two-way feature extraction network to obtain three levels of visible light feature maps with different scales, and the infrared image features are extracted through the infrared image feature extraction network in the two-way feature extraction network to obtain three levels of infrared feature maps with different scales.
Preferably, in the second step, the features of the visible light feature map and the infrared feature map at the same level of the two-way feature extraction network are aligned by the feature alignment module, so as to obtain three aligned infrared feature maps taking the visible light feature maps as reference; in the third step, after the three aligned infrared feature maps are processed by the temperature difference sensing module, the predicted values of three temperature difference masks are obtained respectively; in the fourth step, the aligned infrared feature maps, the visible light feature maps and the predicted values of the temperature difference masks are processed by the feature fusion module to obtain three levels of primary fused feature maps, and the final fused feature maps are obtained after combining the temperature difference masks.
According to the battlefield target detection system based on the fusion of the visible light and the infrared image, provided by the invention, the characteristics of the visible light image and the infrared image are extracted through the convolution block, and the alignment and fusion of the two characteristic spaces are further completed, so that the target detection precision under the complex ground battlefield environment is improved. The battlefield target detection system can better cope with complex ground battlefield environments, eliminates the adverse effects of various interference factors, has all-weather target detection capability, can automatically perform space alignment on multispectral image features, senses the temperature difference of different objects in an infrared image, further improves detection precision and efficiency through fusion features, and provides reliable basis for command decisions in a battlefield.
Drawings
FIG. 1 is a block diagram of a battlefield target detection system of the present invention;
FIG. 2 is a block diagram of the workflow of the feature alignment module of the present invention;
FIG. 3 is a block diagram of the operation of the temperature difference sensing module according to the present invention;
FIG. 4 is a block diagram of the workflow of the feature fusion module of the present invention.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
Example 1:
referring to fig. 1, a battlefield object detection system based on fusion of visible light and infrared images is improved in that: the system comprises a double-path feature extraction network, a feature alignment module, a temperature difference sensing module, a feature fusion module, a neck network and a task head network, wherein the double-path feature extraction network is used for respectively extracting features of a visible light image and obtaining a visible light feature map, extracting features of an infrared image and obtaining an infrared feature map; the characteristic alignment module is used for carrying out space alignment on the characteristics of the visible light image and the infrared image; the temperature difference sensing module is used for carrying out convolution processing on the infrared characteristic map to obtain a predicted temperature difference mask; the feature fusion module is used for fusing the visible light feature map and the infrared feature map by combining the temperature difference mask and generating a fused feature map; the neck network is used for further fusing the deep fusion feature map and the shallow fusion feature map, and the task head network is used for completing target category prediction and target position prediction through convolution processing according to the fusion feature map provided by the neck network so as to obtain a final prediction result.
Further, the two-way feature extraction network comprises a visible light image feature extraction network and an infrared image feature extraction network, wherein the visible light image feature extraction network is used for extracting shallow layer and deep layer features of a visible light image and obtaining a visible light feature map, and the infrared image feature extraction network is used for extracting shallow layer and deep layer features of an infrared image and obtaining an infrared feature map. The network structures of the visible light image feature extraction network and the infrared image feature extraction network are the same, and the shallow layer features and the deep layer features of the visible light image and the infrared image are respectively extracted.
Further, the two-way feature extraction network adopts a ResNet-series feature extraction network or a Darknet-series feature extraction network.
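As an illustration of this dual-branch design, the following minimal PyTorch sketch (an assumption for illustration, not the patented implementation) builds two structurally identical ResNet-18 branches, one for the visible light image and one for the infrared image, each returning feature maps at 1/8, 1/16 and 1/32 of the input resolution; all class and variable names here are hypothetical.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18


class SingleBranch(nn.Module):
    def __init__(self):
        super().__init__()
        net = resnet18(weights=None)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.layer1, self.layer2 = net.layer1, net.layer2   # strides 4 and 8
        self.layer3, self.layer4 = net.layer3, net.layer4   # strides 16 and 32

    def forward(self, x):
        x = self.stem(x)
        c2 = self.layer2(self.layer1(x))   # (N, 128, H/8,  W/8)
        c3 = self.layer3(c2)               # (N, 256, H/16, W/16)
        c4 = self.layer4(c3)               # (N, 512, H/32, W/32)
        return c2, c3, c4


class TwoWayBackbone(nn.Module):
    """Two structurally identical branches for visible-light and infrared images."""

    def __init__(self):
        super().__init__()
        self.visible_branch = SingleBranch()
        self.infrared_branch = SingleBranch()

    def forward(self, img_visible, img_infrared):
        return self.visible_branch(img_visible), self.infrared_branch(img_infrared)


# usage sketch: two (N, 3, H, W) inputs yield three feature levels per branch
(f_v1, f_v2, f_v3), (f_t1, f_t2, f_t3) = TwoWayBackbone()(
    torch.randn(1, 3, 512, 512), torch.randn(1, 3, 512, 512))
```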
Further, the neck network adopts an FPN network or a PAN network. The neck network fuses the deep fusion feature map and the shallow fusion feature map, so that the three feature maps sent into the task head network have shallow texture features and deep semantic features at the same time.
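A minimal FPN-style sketch of such a neck is given below, assuming the three fused feature maps have strides 8, 16 and 32; the channel numbers and module names are illustrative assumptions rather than the patented design.

```python
import torch.nn as nn
import torch.nn.functional as F


class SimpleFPNNeck(nn.Module):
    def __init__(self, in_channels=(128, 256, 512), out_channels=128):
        super().__init__()
        # 1x1 lateral convolutions bring every level to a common channel width
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        # 3x3 smoothing convolutions after top-down addition
        self.smooth = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                     for _ in in_channels])

    def forward(self, fus1, fus2, fus3):   # strides 8, 16, 32
        p3 = self.lateral[2](fus3)
        p2 = self.lateral[1](fus2) + F.interpolate(p3, scale_factor=2, mode="nearest")
        p1 = self.lateral[0](fus1) + F.interpolate(p2, scale_factor=2, mode="nearest")
        # each output now mixes shallow texture and deep semantic information
        return [s(p) for s, p in zip(self.smooth, (p1, p2, p3))]
```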
Further, the task head network is comprised of a plurality of convolutions.
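The following sketch shows one way a purely convolutional task head could produce the class prediction and position prediction; the number of classes, regression channels and layer layout are assumptions for illustration only.

```python
import torch.nn as nn


class ConvTaskHead(nn.Module):
    def __init__(self, in_channels=128, num_classes=5, num_reg=4):
        super().__init__()
        # classification branch: per-location class scores (cls_prediction)
        self.cls_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(in_channels, num_classes, 1))
        # regression branch: per-location box offsets (reg_prediction)
        self.reg_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(in_channels, num_reg, 1))

    def forward(self, fused_feature_map):
        return self.cls_branch(fused_feature_map), self.reg_branch(fused_feature_map)
```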
According to the battlefield target detection system based on the fusion of the visible light and the infrared image, provided by the embodiment, the characteristics of the visible light image and the infrared image are extracted through the convolution block, the alignment and fusion of the two characteristic spaces are further completed, and the target detection precision under the complex ground battlefield environment is improved. The battlefield target detection system in the embodiment can better cope with complex ground battlefield environments, eliminates the adverse effects of various interference factors, has all-weather target detection capability, can automatically perform spatial alignment on multispectral image features, senses the temperature difference of different objects in infrared images, further improves detection precision and efficiency through fusion features, and provides reliable basis for command decisions in battlefields.
Example 2:
referring to fig. 1, a battlefield target detection method based on fusion of visible light and infrared images is improved in that: the method comprises the following steps:
step one, extracting features; respectively extracting features of the visible light image through a two-way feature extraction module to obtain a visible light feature map, extracting features of the infrared image and obtaining an infrared feature map;
step two, aligning the characteristics; the method comprises the steps that the features of a visible light image and an infrared image are spatially aligned through a feature alignment module;
step three, predicting a temperature difference mask; the temperature difference sensing module is used for carrying out convolution processing on the infrared characteristic map to obtain a predicted temperature difference mask;
step four, feature fusion; fusing the visible light characteristic map and the infrared characteristic map through a characteristic fusion module to generate a primary fused characteristic map, and carrying out characteristic weighting on the primary fused characteristic map and the temperature difference mask predicted value in a mode of multiplying corresponding spatial position characteristics to obtain a final fused characteristic map;
step five, obtaining a final prediction result; and processing the fusion feature map through a neck network and a task head network to obtain target category prediction and target position prediction, and obtaining a final prediction result after target bounding box regression processing.
Further, referring to fig. 2, in the second step, the method for spatially aligning the features of the visible light image and the infrared image through the feature alignment module comprises: performing channel concatenation on the visible light image features and the infrared image features to obtain a concatenated feature map; passing the concatenated feature map through a convolution block to obtain the offset parameters of a deformable convolution, the offset parameters determining the sampling positions of the deformable convolution kernel; and applying the deformable convolution to the infrared feature map to obtain the aligned infrared feature map.
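Under the assumption that the deformable convolution here corresponds to torchvision's deformable convolution operator, the alignment step can be sketched as follows; the module and variable names are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class FeatureAlign(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        # offset predictor: 2 coordinates (dy, dx) per kernel sampling point
        self.offset_conv = nn.Conv2d(2 * channels, 2 * kernel_size * kernel_size,
                                     kernel_size=3, padding=1)
        self.deform_conv = DeformConv2d(channels, channels, kernel_size,
                                        padding=kernel_size // 2)

    def forward(self, feat_visible, feat_infrared):
        concat = torch.cat([feat_visible, feat_infrared], dim=1)  # channel concatenation
        offsets = self.offset_conv(concat)                        # (N, 2*k*k, H, W)
        aligned_infrared = self.deform_conv(feat_infrared, offsets)
        return aligned_infrared


# usage sketch: align one level of (N, C, H/8, W/8) feature maps
f_v = torch.randn(1, 128, 64, 64)
f_t = torch.randn(1, 128, 64, 64)
aligned = FeatureAlign(128)(f_v, f_t)   # same spatial size as f_v
```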
Further, referring to fig. 3, in the third step, the infrared feature map is passed through a convolution block of the temperature difference sensing module to obtain the predicted temperature difference mask. During training, the target region image and the target-plus-background region image are extracted respectively, and the mean and variance of the pixels in each region are calculated, giving the mean u_1 and variance σ_1 of the target region image and the mean u_2 and variance σ_2 of the target-plus-background region image; a single Gaussian model N_1(u_1, σ_1) of the target region and a single Gaussian model N_2(u_2, σ_2) of the target-plus-background region are constructed; the KL divergence D_KL(N_1(u_1, σ_1) || N_2(u_2, σ_2)) of the two single Gaussian models is calculated; and the label value T_label of the temperature difference mask in the target region is calculated from D_KL, where α is a hyperparameter, D_KL ∈ [0, +∞) and T_label ∈ [0, 1). The predicted temperature difference mask represents the temperature difference between the region represented by the current feature point and the surrounding region. The larger the temperature difference mask value, the greater the temperature difference between the region and its surroundings, and the greater the likelihood that a target is present.
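A minimal numerical sketch of this label computation is given below. The closed-form KL divergence between two univariate Gaussians is standard; the final mapping from D_KL to T_label is not given explicitly in the text above, so the monotone mapping 1 − e^(−α·D_KL) used here is only an assumption that satisfies the stated constraints D_KL ∈ [0, +∞) and T_label ∈ [0, 1). All function names are hypothetical.

```python
import numpy as np


def gaussian_kl(u1, s1, u2, s2):
    """KL(N(u1, s1^2) || N(u2, s2^2)) for univariate Gaussians."""
    return np.log(s2 / s1) + (s1 ** 2 + (u1 - u2) ** 2) / (2 * s2 ** 2) - 0.5


def temperature_mask_label(target_pixels, target_and_background_pixels, alpha=1.0):
    # region statistics from the infrared image crops
    u1, s1 = target_pixels.mean(), target_pixels.std()
    u2, s2 = target_and_background_pixels.mean(), target_and_background_pixels.std()
    d_kl = gaussian_kl(u1, s1, u2, s2)
    return 1.0 - np.exp(-alpha * d_kl)   # assumed mapping, value in [0, 1)
```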
Further, referring to fig. 4, in the fourth step, the fusion feature map is calculated as F_fus = (F_T + F_I) · e^(β·T), where F_fus denotes the fused feature map, F_T denotes the visible light feature map, F_I denotes the infrared feature map, T denotes the predicted value of the temperature difference mask, and β is a hyperparameter controlling the degree of feature enhancement. In the feature fusion module, fusion is performed by adding the feature maps, which achieves complementarity between the different features. The temperature difference mask acts as spatial attention on the fused feature map, assigning different weights to the fused features at different positions during fusion and enhancing the features of important regions.
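This fusion rule can be sketched in a few lines of PyTorch, with the temperature difference mask broadcast over the channel dimension; the variable names are illustrative.

```python
import torch


def fuse_features(f_visible, f_infrared_aligned, temp_mask, beta=1.0):
    # f_visible, f_infrared_aligned: (N, C, H, W); temp_mask: (N, 1, H, W)
    primary = f_visible + f_infrared_aligned   # element-wise addition, complementary features
    weight = torch.exp(beta * temp_mask)       # spatial attention, broadcast over channels
    return primary * weight                    # final fused feature map, (N, C, H, W)
```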
Further, in the fifth step, the target bounding box regression processing method comprises: providing paired visible light and infrared images; taking the visible light image as the primary modality, the infrared image as the auxiliary modality, and the annotation information of the visible light image as the label information; obtaining the temperature difference mask prediction, the category prediction cls_prediction and the target spatial position prediction reg_prediction, and calculating the total loss by combining the label information according to loss = loss_t + loss_cls + loss_reg, where loss_t denotes the temperature difference mask loss, calculated as loss_t = (1 − T_label)·log(1 − T) + T_label·log(T), loss_cls denotes the class loss and uses a cross-entropy loss function, and loss_reg denotes the target position regression loss and uses an IoU loss; after the total loss is back-propagated, the network weight parameters are updated.
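A minimal sketch of the total loss is shown below. It applies a leading minus sign to the temperature difference term so that it behaves as a standard binary cross-entropy to be minimised (an assumption about the sign convention), and it uses a plain 1 − IoU regression loss as a stand-in for the stated IoU loss; the function and variable names are illustrative.

```python
import torch
import torch.nn.functional as F
from torchvision.ops import box_iou


def total_loss(t_pred, t_label, cls_logits, cls_target, boxes_pred, boxes_gt):
    eps = 1e-6
    # temperature-difference mask loss (BCE form, sign convention assumed)
    loss_t = -((1 - t_label) * torch.log(1 - t_pred + eps)
               + t_label * torch.log(t_pred + eps)).mean()
    loss_cls = F.cross_entropy(cls_logits, cls_target)   # class loss
    iou = torch.diag(box_iou(boxes_pred, boxes_gt))      # paired IoU of matched boxes
    loss_reg = (1.0 - iou).mean()                        # plain IoU loss
    return loss_t + loss_cls + loss_reg
```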
The battlefield target detection method based on the fusion of the visible light and the infrared image can better cope with complex ground battlefield environments, eliminates adverse effects of various interference factors, has all-weather target detection capability, can automatically perform space alignment on multispectral image features, senses temperature differences of different objects in the infrared image, further improves detection precision and efficiency through the fusion features, and provides reliable basis for command decisions in battlefields.
Example 3:
taking the dimensions of the visible light image and the infrared image as (W, H, 3) as an example on the basis of embodiment 2, the battlefield target detection method comprises the following steps:
step one, extracting features; the sizes of the visible light image and the infrared image are (W, H, 3), and the two-way feature extraction network respectively extracts the features of the visible light image and the infrared image to obtain three levels of feature maps with different scales; the visible light image feature extraction network obtains visible light feature maps F_V1, F_V2 and F_V3, with scales (W/8, H/8, C), (W/16, H/16, C) and (W/32, H/32, C) respectively; the infrared image feature extraction network obtains infrared feature maps F_T1, F_T2 and F_T3, with scales (W/8, H/8, C), (W/16, H/16, C) and (W/32, H/32, C) respectively.
Step two, aligning the characteristics; the visible light feature map and the infrared feature map of the same level obtained by the two-way feature extraction network are processed by the feature alignment module to obtain an aligned infrared feature map taking the visible light feature map as reference; in total, three levels of aligned infrared feature maps F_A_T1, F_A_T2 and F_A_T3 are obtained, with scales (W/8, H/8, C), (W/16, H/16, C) and (W/32, H/32, C) respectively.
Step three, predicting a temperature difference mask; after the aligned infrared feature maps F_A_T1, F_A_T2 and F_A_T3 pass through the temperature difference sensing module, the predicted values T1, T2 and T3 of the temperature difference masks are obtained, with scales (W/8, H/8, 1), (W/16, H/16, 1) and (W/32, H/32, 1) respectively.
Step four, feature fusion; the aligned infrared feature maps, the visible light feature maps and the predicted values of the temperature difference masks are processed by the feature fusion module to obtain three levels of fused feature maps FUS1, FUS2 and FUS3, with scales (W/8, H/8, C), (W/16, H/16, C) and (W/32, H/32, C) respectively; the fusion process is shown in fig. 4: first, the visible light feature map and the aligned infrared feature map are fused by adding the corresponding pixels to obtain a primary fused feature map, and then feature weighting is carried out on the primary fused feature map and the predicted value of the temperature difference mask by multiplying the features at corresponding spatial positions to obtain the final fused feature maps FUS1, FUS2 and FUS3.
Step five, target category prediction and target position prediction; the three levels of fused feature maps FUS1, FUS2 and FUS3 are passed through the neck network and the task head network to obtain the category prediction cls_prediction and the target position prediction reg_prediction, and the final prediction result is obtained after post-processing.
It should be noted that the foregoing detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is intended to include the plural unless the context clearly indicates otherwise. Furthermore, it will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, steps, operations, devices, components, and/or groups thereof.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or otherwise described herein.
Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Spatially relative terms, such as "above", "over", "on the upper surface of", "on top of" and the like, may be used herein for ease of description to describe the spatial position of one device or feature relative to another device or feature as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations in use or operation in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements described as "above" or "over" other devices or structures would then be oriented "below" or "beneath" the other devices or structures. Thus, the exemplary term "above" may include both the "above" and "below" orientations. The device may also be positioned in other ways, such as rotated 90 degrees or at other orientations, and the spatially relative descriptors used herein are interpreted accordingly.
In the above detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, like numerals typically identify like components unless context indicates otherwise. The illustrated embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented herein.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A battlefield target detection system based on visible light and infrared image fusion is characterized in that: the system comprises a double-path feature extraction network, a feature alignment module, a temperature difference sensing module, a feature fusion module, a neck network and a task head network, wherein the double-path feature extraction network is used for respectively extracting features of a visible light image and obtaining a visible light feature map, extracting features of an infrared image and obtaining an infrared feature map; the characteristic alignment module is used for carrying out space alignment on the characteristics of the visible light image and the infrared image; the temperature difference sensing module is used for carrying out convolution processing on the infrared characteristic map to obtain a predicted temperature difference mask; the feature fusion module is used for fusing the visible light feature map and the infrared feature map by combining the temperature difference mask and generating a fused feature map; the neck network is used for further fusing the deep fusion feature map and the shallow fusion feature map, and the task head network is used for completing target category prediction and target position prediction through convolution processing according to the fusion feature map provided by the neck network so as to obtain a final prediction result.
2. The battlefield target detection system according to claim 1, wherein: the two-way feature extraction network comprises a visible light image feature extraction network and an infrared image feature extraction network, wherein the visible light image feature extraction network is used for extracting shallow layer and deep layer features of a visible light image and obtaining a visible light feature map, and the infrared image feature extraction network is used for extracting the shallow layer and deep layer features of an infrared image and obtaining an infrared feature map.
3. The battlefield target detection system according to claim 1, wherein: the neck network adopts an FPN network or a PAN network.
4. A battlefield target detection method based on visible light and infrared image fusion is characterized by comprising the following steps of: the method comprises the following steps:
step one, extracting features; respectively extracting features of the visible light image through a two-way feature extraction module to obtain a visible light feature map, extracting features of the infrared image and obtaining an infrared feature map;
step two, aligning the characteristics; the method comprises the steps that the features of a visible light image and an infrared image are spatially aligned through a feature alignment module;
step three, predicting a temperature difference mask; the temperature difference sensing module is used for carrying out convolution processing on the infrared characteristic map to obtain a predicted temperature difference mask;
step four, feature fusion; fusing the visible light characteristic map and the infrared characteristic map through a characteristic fusion module to generate a primary fused characteristic map, and carrying out characteristic weighting on the primary fused characteristic map and the temperature difference mask predicted value in a mode of multiplying corresponding spatial position characteristics to obtain a final fused characteristic map;
step five, obtaining a final prediction result; and processing the fusion feature map through a neck network and a task head network to obtain target category prediction and target position prediction, and obtaining a final prediction result after target bounding box regression processing.
5. The battlefield target detection method of claim 4, wherein: in the second step, the method for spatially aligning the features of the visible light image and the infrared image through the feature alignment module comprises: performing channel concatenation on the visible light image features and the infrared image features to obtain a concatenated feature map; passing the concatenated feature map through a convolution block to obtain the offset parameters of a deformable convolution, the offset parameters determining the sampling positions of the deformable convolution kernel; and applying the deformable convolution to the infrared feature map to obtain the aligned infrared feature map.
6. The battlefield target detection method of claim 5, wherein: in the third step, the infrared feature map is passed through a convolution block of the temperature difference sensing module to obtain the predicted temperature difference mask; during training, the target region image and the target-plus-background region image are extracted respectively, and the mean and variance of the pixels in each region are calculated, giving the mean u_1 and variance σ_1 of the target region image and the mean u_2 and variance σ_2 of the target-plus-background region image; a single Gaussian model N_1(u_1, σ_1) of the target region and a single Gaussian model N_2(u_2, σ_2) of the target-plus-background region are constructed; the KL divergence D_KL(N_1(u_1, σ_1) || N_2(u_2, σ_2)) of the two single Gaussian models is calculated; and the label value T_label of the temperature difference mask in the target region is calculated from D_KL, where α is a hyperparameter, D_KL ∈ [0, +∞) and T_label ∈ [0, 1).
7. The battlefield target detection method of claim 6, wherein: in the fourth step, the fusion feature map is calculated as F_fus = (F_T + F_I) · e^(β·T), where F_fus denotes the fused feature map, F_T denotes the visible light feature map, F_I denotes the infrared feature map, T denotes the predicted value of the temperature difference mask, and β is a hyperparameter controlling the degree of feature enhancement.
8. The battlefield target detection method of claim 7, wherein: in the fifth step, the target bounding box regression processing method comprises: providing paired visible light and infrared images; taking the visible light image as the primary modality, the infrared image as the auxiliary modality, and the annotation information of the visible light image as the label information; obtaining the temperature difference mask prediction, the category prediction cls_prediction and the target spatial position prediction reg_prediction, and calculating the total loss by combining the label information according to loss = loss_t + loss_cls + loss_reg, where loss_t denotes the temperature difference mask loss, calculated as loss_t = (1 − T_label)·log(1 − T) + T_label·log(T), loss_cls denotes the class loss and uses a cross-entropy loss function, and loss_reg denotes the target position regression loss and uses an IoU loss; after the total loss is back-propagated, the network weight parameters are updated.
9. The battlefield target detection method of claim 4, wherein: in the first step, the visible light image features are extracted through the visible light image feature extraction network in the two-way feature extraction network to obtain three levels of visible light feature maps with different scales, and the infrared image features are extracted through the infrared image feature extraction network in the two-way feature extraction network to obtain three levels of infrared feature maps with different scales.
10. The battlefield target detection method according to claim 9, wherein: in the second step, the features of the visible light feature map and the infrared feature map at the same level of the two-way feature extraction network are aligned by the feature alignment module, so as to obtain three aligned infrared feature maps taking the visible light feature maps as reference; in the third step, after the three aligned infrared feature maps are processed by the temperature difference sensing module, the predicted values of three temperature difference masks are obtained respectively; in the fourth step, the aligned infrared feature maps, the visible light feature maps and the predicted values of the temperature difference masks are processed by the feature fusion module to obtain three levels of primary fused feature maps, and the final fused feature maps are obtained after combining the temperature difference masks.
CN202311595532.XA 2023-11-27 2023-11-27 Battlefield target detection system and method based on fusion of visible light and infrared image Pending CN117765359A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311595532.XA CN117765359A (en) 2023-11-27 2023-11-27 Battlefield target detection system and method based on fusion of visible light and infrared image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311595532.XA CN117765359A (en) 2023-11-27 2023-11-27 Battlefield target detection system and method based on fusion of visible light and infrared image

Publications (1)

Publication Number Publication Date
CN117765359A true CN117765359A (en) 2024-03-26

Family

ID=90313594

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311595532.XA Pending CN117765359A (en) 2023-11-27 2023-11-27 Battlefield target detection system and method based on fusion of visible light and infrared image

Country Status (1)

Country Link
CN (1) CN117765359A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210174149A1 (en) * 2018-11-20 2021-06-10 Xidian University Feature fusion and dense connection-based method for infrared plane object detection
CN114333047A (en) * 2021-11-30 2022-04-12 深圳市朗驰欣创科技股份有限公司 Human body tumbling detection device and method based on double-light perception information fusion
CN116704273A (en) * 2023-07-03 2023-09-05 北京理工大学 Self-adaptive infrared and visible light dual-mode fusion detection method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210174149A1 (en) * 2018-11-20 2021-06-10 Xidian University Feature fusion and dense connection-based method for infrared plane object detection
CN114333047A (en) * 2021-11-30 2022-04-12 深圳市朗驰欣创科技股份有限公司 Human body tumbling detection device and method based on double-light perception information fusion
CN116704273A (en) * 2023-07-03 2023-09-05 北京理工大学 Self-adaptive infrared and visible light dual-mode fusion detection method

Similar Documents

Publication Publication Date Title
CN111259809B (en) Unmanned aerial vehicle coastline floating garbage inspection system based on DANet
CN109086668B (en) Unmanned aerial vehicle remote sensing image road information extraction method based on multi-scale generation countermeasure network
CN110866887A (en) Target situation fusion sensing method and system based on multiple sensors
CN109255317B (en) Aerial image difference detection method based on double networks
ES2908944A2 (en) A computer-implemented method and system for detecting small objects on an image using convolutional neural networks
CN109255286B (en) Unmanned aerial vehicle optical rapid detection and identification method based on deep learning network framework
CN111079518A (en) Fall-down abnormal behavior identification method based on scene of law enforcement and case handling area
Mutalib et al. A brief study on paddy applications with image processing and proposed architecture
Al-Sheary et al. Crowd monitoring system using unmanned aerial vehicle (UAV)
CN104268574A (en) SAR image change detecting method based on genetic kernel fuzzy clustering
CN115049948A (en) Unmanned aerial vehicle inspection method and device based on neural network model and related equipment
CN115147745A (en) Small target detection method based on urban unmanned aerial vehicle image
Dutta et al. Forest fire detection using combined architecture of separable convolution and image processing
Cheng et al. Moving Target Detection Technology Based on UAV Vision
Zhao et al. Environmental perception and sensor data fusion for unmanned ground vehicle
Wu et al. Research on asphalt pavement disease detection based on improved YOLOv5s
Kizilay et al. A yolor based visual detection of amateur drones
Sumalan et al. Flood evaluation in critical areas by UAV surveillance
Demars et al. Multispectral detection and tracking of multiple moving targets in cluttered urban environments
CN110738229B (en) Fine-grained image classification method and device and electronic equipment
CN117765359A (en) Battlefield target detection system and method based on fusion of visible light and infrared image
Pavlove et al. Efficient Deep Learning Methods for Automated Visibility Estimation at Airports
CN113792806B (en) Method for generating countermeasure patch
Naito et al. Damage detection method for buildings with machine-learning techniques utilizing images of automobile running surveys aftermath of the 2016 Kumamoto Earthquake
CN118172719A (en) Deep learning-based detection method for unworn life jackets by ship rows

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination