CN114218999A - Millimeter wave radar target detection method and system based on fusion image characteristics - Google Patents
Millimeter wave radar target detection method and system based on fusion image characteristics
- Publication number
- CN114218999A (application CN202111288212.0A)
- Authority
- CN
- China
- Prior art keywords
- feature
- feature map
- target detection
- image
- radar
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/253: Pattern recognition; analysing; fusion techniques of extracted features
- G06N3/045: Neural networks; architecture; combinations of networks
- G06N3/08: Neural networks; learning methods
- G06N3/084: Learning methods; backpropagation, e.g. using gradient descent
Abstract
The invention relates to a millimeter wave radar target detection method based on fused image features, which comprises the following steps. Step 1: obtain a 3D bird's-eye view feature map of an input image through an image feature processing module, and input it into a radar data feature and image feature fusion module. Step 2: obtain a normalized radar feature map through the radar data feature and image feature fusion module, and fuse it with the 3D bird's-eye view feature map to obtain a fused feature map. Step 3: train a target detection network of the target detection module based on the fused feature map. Step 4: after the target detection network training is finished, input radar data to carry out target detection. Compared with the prior art, the method improves the accuracy of target detection for autonomous vehicles and avoids the drop in detection accuracy that the laser radar and the camera suffer under severe weather conditions.
Description
Technical Field
The invention relates to the technical field of target detection for automatic driving, and in particular to a millimeter wave radar target detection method and system based on fused image features.
Background
In recent years, automatic driving has been one of the hot spots of research and application. One of its core technologies is the ability of an autonomous vehicle to detect obstacles accurately and efficiently, i.e., target detection. In this field, target detection algorithms that fuse laser radar and camera data have long been the main research direction. In practical applications, however, an autonomous vehicle inevitably faces severe weather conditions, to which both the camera and the laser radar are sensitive: under rain, fog, or snow their vision and perception abilities degrade significantly, which reduces the reliability of such algorithms.
Compared with the laser radar and the camera, the millimeter wave radar works in the millimeter wave band (30-300 GHz), at frequencies below visible light, and has a strong ability to penetrate fog, smoke, and dust, so it adapts very well to severe weather conditions. It has therefore attracted considerable attention in land and marine automatic driving. One main approach to target detection with millimeter wave radar is to process the radar information with a deep neural network. This idea was proposed long ago but remains difficult to realize, mainly for the following reasons: 1) semantic information is difficult to extract from the radio signals received by the millimeter wave radar; 2) owing to inherent differences between the laser radar and the radar, existing algorithms based on the laser radar are difficult to apply directly to the radar; 3) many surrounding obstacles, such as buildings and trees, generate a large amount of noise, which reduces recognition accuracy.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art and to provide a millimeter wave radar target detection method and system based on fused image features.
The purpose of the invention can be realized by the following technical scheme:
a millimeter wave radar target detection method based on fused image features comprises the following steps:
step 1: obtaining a 3D bird's-eye view feature map of an input image through an image feature processing module, and inputting the 3D bird's-eye view feature map into a radar data feature and image feature fusion module;
step 2: obtaining a normalized radar feature map through the radar data feature and image feature fusion module, and fusing it with the 3D bird's-eye view feature map to obtain a fused feature map;
step 3: training a target detection network of the target detection module based on the fused feature map;
step 4: after the target detection network training is finished, inputting radar data to carry out target detection.
In the step 1, the process of obtaining the 3D bird's-eye view feature map of the input image by the image feature processing module specifically includes the following steps:
step 101: inputting the image into a pre-trained deep residual network, extracting image features at several different resolutions, i.e., outputting multi-scale feature maps, and generating a feature pyramid;
step 102: the feature pyramid upsamples from the low-resolution image features to provide context information for the high-resolution image features;
step 103: performing projection transformation on the image features through simple camera geometry, projecting the image feature maps at the different scales into a 3D bird's-eye view feature map.
In step 101, the image features at the different resolutions are the outputs of the residual stages of the deep residual network, from the third convolutional layer to the seventh convolutional layer.
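Steps 101 and 102 can be sketched in code. The snippet below uses torchvision's ResNet-50 feature-pyramid helper as a stand-in backbone (an assumption for illustration; the patent only specifies a pre-trained deep residual network with a feature pyramid, and torchvision >= 0.13 is assumed):

```python
import torch
import torchvision

# A pre-trained 50-layer residual network with a feature pyramid on top.
# resnet_fpn_backbone is a torchvision convenience wrapper, used here only
# as a stand-in for the patent's backbone.
backbone = torchvision.models.detection.backbone_utils.resnet_fpn_backbone(
    backbone_name="resnet50",
    weights=torchvision.models.ResNet50_Weights.DEFAULT,
)
image = torch.randn(1, 3, 384, 1280)   # a single input image (shape assumed)
features = backbone(image)             # dict of multi-scale feature maps
for name, fmap in features.items():
    print(name, tuple(fmap.shape))     # pyramid levels, coarse to fine
```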
In step 103, the process of performing projective transformation on the image features through simple camera geometry specifically includes the following steps (a code sketch follows step d):
step a: compressing a feature map from the feature pyramid, of height H, width W, and C channels, along the vertical dimension and the channel dimension into a bottleneck feature of size B × W through a bottleneck layer of the deep residual network; the bottleneck layer is a 1 × 1 convolutional network used to reduce the feature dimensionality, i.e., the feature map is compressed along the vertical and channel dimensions according to the expansion rate ε of the bottleneck layer, while the horizontal dimension is unchanged during this compression, i.e., the width W is unchanged;
step b: passing the bottleneck feature through a 1 × 1 convolution layer and expanding it along the depth dimension to predict a feature tensor of dimension C × Z × W in a polar coordinate system, where Z is the depth, W is the width, and C is the number of channels;
step c: resampling the feature tensor from the polar coordinate system to a Cartesian coordinate system to obtain the 3D bird's-eye view feature map corresponding to the planar feature map;
step d: returning to step a, projecting the feature maps at all scales into 3D bird's-eye view feature maps, and splicing them into a 3D bird's-eye view feature map of the same size as the radar feature map.
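A minimal sketch of steps a through d, assuming PyTorch: the channel count C, bottleneck width B, number of depth bins Z, and the precomputed Cartesian sampling grid are illustrative assumptions; step d would apply this module to each pyramid level and concatenate the results:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseTransformer(nn.Module):
    """Steps a-d in miniature: collapse each image column into a bottleneck,
    predict features along the depth axis (a polar tensor), then resample the
    polar tensor onto a Cartesian bird's-eye view grid."""

    def __init__(self, C=64, H=32, B=128, Z=50):
        super().__init__()
        # step a: compress the vertical and channel dimensions (C*H values per
        # image column) into a B-dimensional bottleneck; width W is unchanged
        self.bottleneck = nn.Conv1d(C * H, B, kernel_size=1)
        # step b: expand the bottleneck to C features per depth bin (C*Z)
        self.depth_expand = nn.Conv1d(B, C * Z, kernel_size=1)
        self.C, self.Z = C, Z

    def forward(self, feat, cart_grid):
        # feat: (N, C, H, W) feature map from one pyramid level
        N, C, H, W = feat.shape
        cols = feat.reshape(N, C * H, W)             # flatten image columns
        polar = self.depth_expand(self.bottleneck(cols))
        polar = polar.reshape(N, self.C, self.Z, W)  # (N, C, Z, W), polar
        # step c: resample polar -> Cartesian; cart_grid is (N, Zc, Xc, 2)
        # with normalized (column, depth) coordinates from the camera geometry
        return F.grid_sample(polar, cart_grid, align_corners=False)
```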
In the step 2, the process of obtaining the fused feature map specifically includes the following steps:
step 201: radar data features are extracted from the radar data through the 3D convolutional layer, and a radar feature map is obtained;
step 202: carrying out feature normalization on the extracted radar data features through a normalization layer;
step 203: fusing the normalized radar feature map with the 3D bird's-eye view feature map obtained by the image feature processing module to obtain a fused feature map (sketched in code below).
In step 202, the process of performing the feature normalization specifically includes the following steps:
step a: calculating the expectation and standard deviation of each hidden layer:

$$\mu^{l}=\frac{1}{H}\sum_{i=1}^{H}a_{i}^{l},\qquad \sigma^{l}=\sqrt{\frac{1}{H}\sum_{i=1}^{H}\left(a_{i}^{l}-\mu^{l}\right)^{2}}$$

where $\mu^{l}$ and $\sigma^{l}$ are the expectation and standard deviation of the $l$-th hidden layer, $H$ is the number of nodes of the normalization layer, and $a_{i}^{l}$ is the value of the $i$-th node of the $l$-th hidden layer before activation;

step b: obtaining the normalized value from the expectation and standard deviation of each hidden layer:

$$\bar{a}_{i}^{l}=\frac{g_{i}}{\sigma^{l}}\left(a_{i}^{l}-\mu^{l}\right)+b_{i}$$

where $\bar{a}_{i}^{l}$ is the normalized value of the $i$-th node of the $l$-th hidden layer, and $g$ and $b$ are the gain and bias parameters, respectively;

step c: obtaining the feature-normalized output through an activation function:

$$h=\mathrm{Relu}\left(\bar{a}^{l}\right)$$

where $\mathrm{Relu}(\cdot)$ is the activation function and $h$ is the output after feature normalization; the expression of the activation function $\mathrm{Relu}(\cdot)$ is:

$$\mathrm{Relu}(x)=\max(0,x)$$

where $\max(\cdot)$ makes this a piecewise linear function that sets all negative inputs to 0 and leaves all positive inputs unchanged, and $x$ is the input value.
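Steps a through c amount to layer normalization followed by a ReLU; a direct transcription as a sketch (with g and b as learnable per-node parameters, and a small ε term added for numerical stability) is:

```python
import torch

def normalize_features(a, g, b, eps=1e-5):
    """Steps a-c: layer normalization over the H nodes of a hidden layer,
    followed by the Relu activation. `a` holds the pre-activation values
    with the node dimension last; g and b are gain and bias parameters."""
    mu = a.mean(dim=-1, keepdim=True)                           # expectation
    sigma = a.var(dim=-1, unbiased=False, keepdim=True).sqrt()  # std deviation
    a_bar = g * (a - mu) / (sigma + eps) + b                    # normalized value
    return torch.relu(a_bar)                                    # h = Relu(a_bar)

# Example: normalize a batch of 4 hidden layers with H = 8 nodes each.
h = normalize_features(torch.randn(4, 8), g=torch.ones(8), b=torch.zeros(8))
```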
In step 3, the process of training the target detection network of the target detection module based on the fused feature map specifically includes the following steps:
step 301: inputting the fused feature map into a target detection network of a target detection module, and obtaining a predicted confidence map through forward propagation;
step 302: updating the network parameters by back propagation; based on the real label maps in the data set and the confidence map predicted in step 301, a cross-entropy loss is adopted as the loss function $l$ of the target detection network:

$$l=-\sum_{c}\sum_{(i,j)}D_{c,i,j}\log \hat{D}_{c,i,j}$$

where $D$ is the true label matrix, $\hat{D}$ is the prediction matrix, $c$ is the target detection category, and $(i,j)$ is the image pixel index;
step 303: repeating steps 301 to 302; the training is finished when the learning rate α is smaller than a set value or the number of training iterations is larger than a set number.
In step 301, the target detection network of the target detection module is a deep neural network comprising 10 3D convolutional layers and 5 3D deconvolution layers.
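A sketch of such a network follows; apart from the layer counts, the kernel sizes, channel widths, and the final 1 × 1 confidence head are assumptions:

```python
import torch.nn as nn

def build_detection_network(in_ch=128, width=64, num_classes=3):
    """10 3D convolutional layers followed by 5 3D deconvolution layers,
    ending in a per-class confidence map. All hyperparameters besides the
    layer counts are illustrative."""
    layers, ch = [], in_ch
    for _ in range(10):                     # 10 3D convolutional layers
        layers += [nn.Conv3d(ch, width, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
        ch = width
    for _ in range(5):                      # 5 3D deconvolution layers
        # resolution-preserving transposed convolutions, for simplicity
        layers += [nn.ConvTranspose3d(width, width, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    layers += [nn.Conv3d(width, num_classes, kernel_size=1)]  # confidence map
    return nn.Sequential(*layers)
```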
In step 302, the data set includes images and radar data with the same time stamp, and the real labels corresponding to the detected targets.
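The time-aligned triples of step 302 can be represented by a minimal dataset wrapper (the sample list and its loading are placeholders, not the patent's data pipeline):

```python
from torch.utils.data import Dataset

class RadarImageDataset(Dataset):
    """Time-aligned samples: an image, the radar frame with the same time
    stamp, and the real label map for the detected targets."""

    def __init__(self, samples):
        self.samples = samples  # list of (image, radar, label_map) tensors

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image, radar, label_map = self.samples[idx]
        return image, radar, label_map
```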
A system for implementing the millimeter wave radar target detection method as described, the system comprising:
an image feature processing module: performs image feature extraction and image feature projection transformation, and is used for obtaining the 3D bird's-eye view feature map of the input image;
a radar data feature and image feature fusion module: performs radar data feature extraction and radar feature map normalization, and is used for fusing the radar feature map with the 3D bird's-eye view feature map to obtain a fused feature map;
a target detection network module: comprises the target detection network trained based on the fused feature map, and is used for carrying out target detection on input radar data.
Compared with the prior art, the invention has the following advantages:
1. the radar target detection method provided by the invention fuses image features, linking the spatial and temporal information of the radar with the semantic information of the image; this reduces the influence of noise around the radar, improves the accuracy of target detection, and enables an autonomous vehicle to detect obstacles accurately and efficiently;
2. after the target detection network of the radar target detection system has been trained, target detection requires only radar data as input; this avoids situations in which images captured by a camera are occluded or unclear under severe weather conditions such as rain, fog, and snow, and improves the reliability with which an autonomous vehicle detects obstacles.
Drawings
FIG. 1 is a schematic diagram of the system of the present invention.
FIG. 2 is a schematic diagram of the image feature processing of the present invention.
FIG. 3 is a schematic diagram of projective transformation according to the present invention.
FIG. 4 is a schematic diagram of radar feature processing and fusion in accordance with the present invention.
Fig. 5 is a schematic structural diagram of the target detection network according to the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments.
Examples
As shown in fig. 1, the invention provides a millimeter wave radar target detection method based on fused image features, which comprises the following steps:
step 1: obtaining a 3D bird's-eye view feature map of an input image through an image feature processing module, and inputting the 3D bird's-eye view feature map into a radar data feature and image feature fusion module;
step 2: obtaining a normalized radar feature map through the radar data feature and image feature fusion module, and fusing it with the 3D bird's-eye view feature map to obtain a fused feature map;
step 3: training a target detection network of the target detection module based on the fused feature map;
step 4: after the target detection network training is finished, inputting radar data to carry out target detection.
In step 1, the process of image feature extraction and projection specifically includes the following steps:
step 101: extracting image features at several different resolutions from the input image, i.e., multi-scale feature maps, through the pre-trained deep residual network; these features are the outputs of the residual stages of the network from the third convolutional layer to the seventh convolutional layer;
step 102: the feature pyramid enhances the high-resolution features with lower-level spatial context: the multi-scale feature maps extracted in step 101 are passed to the feature pyramid, which upsamples the low-resolution image features to provide context information for the high-resolution image features, so that the high-resolution features cover a large spatial context;
step 103: projecting the feature pyramid into a 3D bird's-eye view feature map: projection transformation is performed through simple camera geometry, projecting a planar feature map of height H, width W, and C channels to a 3D bird's-eye view feature map of width X, depth Z, and C channels.
as shown in fig. 4, in step 2, the process of fusing the radar data features and the image features specifically includes the following steps:
step 201: extracting radar data features from the radar data through the 3D convolutional layer;
step 202: performing feature normalization on the extracted radar data features through a normalization layer;
step 203: fusing the normalized radar feature map with the 3D bird's-eye view feature map of the image.
As shown in fig. 5, in step 3, the process of training the target detection network of the target detection network module specifically includes the following steps:
step 301: inputting the feature map obtained in step 2 by fusing the radar data features and the image features into the target detection network, which consists of 10 3D convolutional layers and 5 3D deconvolution layers, and obtaining a predicted confidence map through forward propagation;
step 302: updating the network parameters by back propagation; based on the real label maps in the data set and the confidence map predicted in step 301, a cross-entropy loss is adopted as the loss function $l$ of the target detection network:

$$l=-\sum_{c}\sum_{(i,j)}D_{c,i,j}\log \hat{D}_{c,i,j}$$

where $D$ is the true label matrix, $\hat{D}$ is the prediction matrix, $c$ is the target detection category, and $(i,j)$ is the image pixel index;
step 303: repeating steps 301 to 302; the training is finished when the learning rate α is smaller than a set value or the number of training iterations is larger than a set number.
Specific examples are given below:
the detection targets comprise automobiles, pedestrians and bicyclists, and the data set adopted comprises images and radar data with the same time stamp and real labels corresponding to the detection targets.
As shown in fig. 2, image feature extraction and projective transformation are performed:
1) training a 50-layer deep residual network on an image training set to obtain a pre-trained deep residual network model (learning rate below 0.0001);
2) inputting a single image into the pre-trained deep residual network, reading the outputs of the residual stages from the third convolutional layer to the seventh convolutional layer, outputting the multi-scale image features, and generating the feature pyramid;
3) the feature pyramid upsamples from the small-scale feature maps to provide context information for the large-scale feature maps;
4) projecting the image feature maps at the different scales into 3D bird's-eye view feature maps; taking a planar feature map of height H, width W, and C channels as an example, compressing it along the vertical dimension and the channel dimension into a bottleneck feature of size B × W;
5) passing the bottleneck feature through a 1 × 1 convolution layer and predicting along the depth dimension to obtain a feature tensor of dimension C × Z × W in polar coordinates;
6) resampling the feature tensor from polar coordinates to Cartesian coordinates to obtain the 3D bird's-eye view feature map corresponding to the planar feature map;
7) repeating steps 4) to 6), projecting the image feature maps at all scales into 3D bird's-eye view feature maps, and splicing them into a 3D bird's-eye view feature map of the same size as the radar feature map.
As shown in fig. 4, the millimeter wave radar data obtained after radio-signal processing are represented in a range-angle coordinate system, which can be regarded as a bird's-eye view; feature extraction is performed on the radar data, which are then fused with the 3D bird's-eye view feature map:
1) extracting radar data features from the millimeter wave radar data through the 3D convolutional layer;
2) performing feature normalization on the radar feature map through a normalization layer:
step a: calculating the expectation and standard deviation of each hidden layer:

$$\mu^{l}=\frac{1}{H}\sum_{i=1}^{H}a_{i}^{l},\qquad \sigma^{l}=\sqrt{\frac{1}{H}\sum_{i=1}^{H}\left(a_{i}^{l}-\mu^{l}\right)^{2}}$$

where $\mu^{l}$ and $\sigma^{l}$ are the expectation and standard deviation of the $l$-th hidden layer, $H$ is the number of nodes of the normalization layer, and $a_{i}^{l}$ is the value of the $i$-th node of the $l$-th hidden layer before activation;

step b: obtaining the normalized value from the expectation and standard deviation of each hidden layer:

$$\bar{a}_{i}^{l}=\frac{g_{i}}{\sigma^{l}}\left(a_{i}^{l}-\mu^{l}\right)+b_{i}$$

where $\bar{a}_{i}^{l}$ is the normalized value of the $i$-th node of the $l$-th hidden layer, and $g$ and $b$ are the gain and bias parameters, respectively;

step c: obtaining the feature-normalized output through an activation function:

$$h=\mathrm{Relu}\left(\bar{a}^{l}\right)$$

where $\mathrm{Relu}(\cdot)$ is the activation function and $h$ is the output after feature normalization; the expression of the activation function $\mathrm{Relu}(\cdot)$ is:

$$\mathrm{Relu}(x)=\max(0,x)$$

where $\max(\cdot)$ makes this a piecewise linear function that sets all negative inputs to 0 and leaves all positive inputs unchanged, and $x$ is the input value.
3) Fusing the normalized radar feature map with the 3D bird's-eye view feature map. According to the order of fusion and prediction, fusion can be divided into early fusion and late fusion. Early fusion, which is adopted in this embodiment, first fuses the multi-layer features and then trains a predictor on the fused features. Early fusion comes in two forms, concat and add: the add operation sums the feature maps element-wise, leaving the number of channels unchanged, i.e., it increases the information content of each feature dimension while keeping the dimensionality fixed; the concat operation, by contrast, increases the number of channels by directly connecting the two feature maps, i.e., the channels are merged, the dimensionality of the feature map grows, and the information content of each individual dimension is unchanged. Late fusion improves detection performance by combining the detection results of different layers: before the final fusion is completed, detection already starts on partially fused layers (possibly on several layers), and the multiple detection results are fused at the end.
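The difference between the two early-fusion operations can be seen directly on tensors (shapes are illustrative):

```python
import torch

radar_feat = torch.randn(1, 64, 50, 50)   # normalized radar feature map
image_feat = torch.randn(1, 64, 50, 50)   # 3D bird's-eye view feature map

fused_add = radar_feat + image_feat                     # add: channels stay 64
fused_cat = torch.cat([radar_feat, image_feat], dim=1)  # concat: channels become 128
print(fused_add.shape, fused_cat.shape)
```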
As shown in fig. 5, the deep neural network of the target detection network module consists of 10 3D convolutional layers and 5 3D deconvolution layers, and is trained as follows:
1) inputting the fused feature map into a target detection network of a target detection module, and obtaining a predicted confidence map through forward propagation;
2) updating the network parameters by back propagation; based on the real label maps in the data set and the predicted confidence map, a cross-entropy loss is adopted as the loss function $l$ of the target detection network:

$$l=-\sum_{c}\sum_{(i,j)}D_{c,i,j}\log \hat{D}_{c,i,j}$$

where $D$ is the true label matrix, $\hat{D}$ is the prediction matrix, $c$ is the target detection category, and $(i,j)$ is the image pixel index;
3) repeating steps 1) and 2); the training is finished when the learning rate is less than 0.0001 or the number of training iterations is more than 10000.
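A sketch of this training loop, assuming PyTorch; the optimizer, learning-rate schedule, and data loader are assumptions not fixed by the embodiment:

```python
import torch
import torch.nn.functional as F

def train(network, loader, lr=1e-2, min_lr=1e-4, max_iters=10000):
    """Forward pass to a confidence map, cross-entropy against the real
    label map, back propagation, and the stopping rule of this embodiment."""
    opt = torch.optim.SGD(network.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.999)
    for it, (fused_map, label_map) in enumerate(loader):
        conf_map = network(fused_map)                 # predicted confidence map
        loss = F.cross_entropy(conf_map, label_map)   # cross-entropy loss l
        opt.zero_grad()
        loss.backward()                               # back propagation
        opt.step()
        sched.step()
        if opt.param_groups[0]["lr"] < min_lr or it + 1 >= max_iters:
            break                                     # stopping rule
```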
The target detection network model in the above embodiment is exemplary and non-limiting: in practical applications the number of network layers can be increased or decreased, and other feature-extraction networks can be used instead. Based on this method, an autonomous vehicle can detect obstacles accurately and efficiently, and can still do so reliably under severe weather conditions, which improves the reliability of the target detection method.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and those skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A millimeter wave radar target detection method based on fused image features, characterized by comprising the following steps:
step 1: obtaining a 3D bird's-eye view feature map of an input image through an image feature processing module, and inputting the 3D bird's-eye view feature map into a radar data feature and image feature fusion module;
step 2: obtaining a normalized radar feature map through the radar data feature and image feature fusion module, and fusing it with the 3D bird's-eye view feature map to obtain a fused feature map;
step 3: training a target detection network of the target detection module based on the fused feature map;
step 4: after the target detection network training is finished, inputting radar data to carry out target detection.
2. The method for detecting the millimeter wave radar target based on the fused image feature as claimed in claim 1, wherein in the step 1, the process of obtaining the 3D bird's eye view feature map of the input image through the image feature processing module specifically includes the following steps:
step 101: inputting the image into a pre-trained deep residual network, extracting image features at several different resolutions, i.e., outputting multi-scale feature maps, and generating a feature pyramid;
step 102: the feature pyramid upsamples from the low-resolution image features to provide context information for the high-resolution image features;
step 103: performing projection transformation on the image features through simple camera geometry, projecting the image feature maps at the different scales into a 3D bird's-eye view feature map.
3. The method according to claim 2, wherein in step 101, the image features at the different resolutions are the outputs of the residual stages of the deep residual network, from the third convolutional layer to the seventh convolutional layer.
4. The method according to claim 2, wherein in step 103, the process of performing projective transformation on the image features through a simple camera geometry specifically comprises the following steps:
step a: compressing a feature map of height H, width W, and C channels in the feature pyramid along the vertical dimension and the channel dimension into a bottleneck feature of size B × W through a bottleneck layer of the deep residual network, wherein the bottleneck layer is a 1 × 1 convolutional network used to reduce the feature dimensionality, i.e., the feature map is compressed along the vertical and channel dimensions according to the expansion rate ε of the bottleneck layer, and the horizontal dimension is unchanged during the compression, i.e., the width W is unchanged;
step b: passing the bottleneck feature through a 1 × 1 convolution layer and expanding it along the depth dimension to predict a feature tensor of dimension C × Z × W in a polar coordinate system, where Z is the depth, W is the width, and C is the number of channels;
step c: resampling the feature tensor from the polar coordinate system to a Cartesian coordinate system to obtain the 3D bird's-eye view feature map corresponding to the feature map;
step d: returning to step a, projecting the feature maps at all scales into 3D bird's-eye view feature maps, and splicing them into a 3D bird's-eye view feature map of the same size as the radar feature map.
5. The method for detecting the millimeter wave radar target based on the fusion image feature as claimed in claim 1, wherein in the step 2, the process of obtaining the fusion feature map specifically comprises the following steps:
step 201: radar data features are extracted from the radar data through the 3D convolutional layer, and a radar feature map is obtained;
step 202: performing feature normalization on the extracted radar data features through a normalization layer;
step 203: fusing the normalized radar feature map with the 3D bird's-eye view feature map obtained by the image feature processing module to obtain the fused feature map.
6. The method according to claim 5, wherein in step 202, the process of performing feature normalization specifically comprises the following steps:
step a: calculating the expectation and standard deviation of each hidden layer:

$$\mu^{l}=\frac{1}{H}\sum_{i=1}^{H}a_{i}^{l},\qquad \sigma^{l}=\sqrt{\frac{1}{H}\sum_{i=1}^{H}\left(a_{i}^{l}-\mu^{l}\right)^{2}}$$

where $\mu^{l}$ and $\sigma^{l}$ are the expectation and standard deviation of the $l$-th hidden layer, $H$ is the number of nodes of the normalization layer, and $a_{i}^{l}$ is the value of the $i$-th node of the $l$-th hidden layer before activation;

step b: obtaining the normalized value from the expectation and standard deviation of each hidden layer:

$$\bar{a}_{i}^{l}=\frac{g_{i}}{\sigma^{l}}\left(a_{i}^{l}-\mu^{l}\right)+b_{i}$$

where $\bar{a}_{i}^{l}$ is the normalized value of the $i$-th node of the $l$-th hidden layer, and $g$ and $b$ are the gain and bias parameters, respectively;

step c: obtaining the feature-normalized output through an activation function:

$$h=\mathrm{Relu}\left(\bar{a}^{l}\right)$$

where $\mathrm{Relu}(\cdot)$ is the activation function and $h$ is the output after feature normalization; the expression of the activation function $\mathrm{Relu}(\cdot)$ is:

$$\mathrm{Relu}(x)=\max(0,x)$$

where $\max(\cdot)$ makes this a piecewise linear function that sets all negative inputs to 0 and leaves all positive inputs unchanged, and $x$ is the input value.
7. The method for detecting the millimeter wave radar target based on the fused image feature as claimed in claim 1, wherein in the step 3, the process of training the target detection network of the target detection module based on the fused feature map specifically comprises the following steps:
step 301: inputting the fused feature map into a target detection network of a target detection module, and obtaining a predicted confidence map through forward propagation;
step 302: updating the network parameters by back propagation; based on the real label maps in the data set and the confidence map predicted in step 301, a cross-entropy loss is adopted as the loss function $l$ of the target detection network:

$$l=-\sum_{c}\sum_{(i,j)}D_{c,i,j}\log \hat{D}_{c,i,j}$$

where $D$ is the true label matrix, $\hat{D}$ is the prediction matrix, $c$ is the target detection category, and $(i,j)$ is the image pixel index;
step 303: repeating steps 301 to 302; the training is finished when the learning rate α is smaller than a set value or the number of training iterations is larger than a set number.
8. The method for detecting the target of the millimeter wave radar based on the fused image feature of claim 7, wherein in step 301, the target detection network of the target detection module is a deep neural network comprising 10 3D convolutional layers and 5 3D deconvolution layers.
9. The method for detecting millimeter wave radar targets based on fused image features as claimed in claim 7, wherein in step 302, the data set includes images and radar data with the same time stamp, and the real labels corresponding to the detected targets.
10. A system for implementing the millimeter wave radar target detection method according to any one of claims 1 to 9, the system comprising:
an image feature processing module: performs image feature extraction and image feature projection transformation, and is used for obtaining the 3D bird's-eye view feature map of the input image;
a radar data feature and image feature fusion module: performs radar data feature extraction and radar feature map normalization, and is used for fusing the radar feature map with the 3D bird's-eye view feature map to obtain a fused feature map;
a target detection network module: comprises the target detection network trained based on the fused feature map, and is used for carrying out target detection on input radar data.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202111288212.0A | 2021-11-02 | 2021-11-02 | Millimeter wave radar target detection method and system based on fusion image characteristics |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| CN114218999A | 2022-03-22 |
Family
ID=80696593
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202111288212.0A | Pending | 2021-11-02 | 2021-11-02 |

Country Status (1)

| Country | Link |
| --- | --- |
| CN | CN114218999A |
Cited By (3)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| WO2023216460A1 | 2022-05-09 | 2023-11-16 | 合众新能源汽车股份有限公司 | Aerial view-based multi-view 3D object detection method, memory and system |
| CN114998856A | 2022-06-17 | 2022-09-02 | 苏州浪潮智能科技有限公司 | 3D target detection method, device, equipment and medium of multi-camera image |
| CN114998856B | 2022-06-17 | 2023-08-08 | 苏州浪潮智能科技有限公司 | 3D target detection method, device, equipment and medium for multi-camera image |
Legal Events
| Date | Code | Title | Description |
| --- | --- | --- | --- |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |