
CN112734827A - Target detection method and device, electronic equipment and storage medium - Google Patents

Target detection method and device, electronic equipment and storage medium

Info

Publication number
CN112734827A
Authority
CN
China
Prior art keywords
convolution
preset
current
point
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110019670.8A
Other languages
Chinese (zh)
Other versions
CN112734827B (en)
Inventor
Bai Yu (白宇)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Kunpeng Jiangsu Technology Co Ltd
Original Assignee
Jingdong Kunpeng Jiangsu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Kunpeng Jiangsu Technology Co Ltd filed Critical Jingdong Kunpeng Jiangsu Technology Co Ltd
Priority to CN202110019670.8A priority Critical patent/CN112734827B/en
Publication of CN112734827A publication Critical patent/CN112734827A/en
Application granted granted Critical
Publication of CN112734827B publication Critical patent/CN112734827B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/60 - Analysis of geometric attributes
    • G06T 7/62 - Analysis of geometric attributes of area, perimeter, diameter or volume
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/60 - Analysis of geometric attributes
    • G06T 7/66 - Analysis of geometric attributes of image moments or centre of gravity
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/70 - Determining position or orientation of objects or cameras
    • G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/75 - Determining position or orientation of objects or cameras using feature-based methods involving models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10028 - Range image; Depth image; 3D point clouds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10032 - Satellite or aerial image; Remote sensing
    • G06T 2207/10044 - Radar image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30248 - Vehicle exterior or interior
    • G06T 2207/30252 - Vehicle exterior; Vicinity of vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the invention disclose a target detection method and device, an electronic device, and a storage medium. The method includes: acquiring radar point cloud data collected for an application scene; inputting the radar point cloud data into a preset convolutional neural network model, where the preset convolutional neural network model includes a preset convolutional layer composed of preset convolution kernels of a preset size, the effective convolution points in each preset convolution kernel that participate in the current convolution calculation are determined based on the parameter weight values corresponding to that kernel, and the parameter weight values are obtained by training the preset convolutional neural network model; and determining target object information in the application scene according to the output of the preset convolutional neural network model. The technical solution of the embodiments of the invention improves the accuracy of target detection.

Description

Target detection method and device, electronic equipment and storage medium
Technical Field
The present invention relates to computer technologies, and in particular, to a target detection method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of computer technology, target detection can be performed on a target object in an application scene so as to obtain position information and the like of the target object in the application scene. For example, in an unmanned scene, target objects in the environment surrounding the unmanned vehicle may be detected.
Currently, the acquired radar point cloud data may be processed using a common convolutional layer or a sparse convolutional layer (such as a sub-manifold sparse convolutional layer) to obtain a target object in an application scene.
However, in the process of implementing the present invention, the inventor finds that at least the following problems exist in the prior art:
Because the collected radar point cloud data are sparse, while an ordinary convolutional layer is designed for dense data, an ordinary convolutional layer cannot process radar point cloud data effectively, which reduces the accuracy of target detection. A sparse convolutional layer (such as a submanifold sparse convolutional layer), on the other hand, suppresses the spatial dilation of sparse radar point cloud data to a certain extent, but in doing so it also suppresses the fusion of adjacent information, so adjacent information can be fused only through pooling layers or larger convolution strides. Using too many pooling layers or a large convolution stride cannot guarantee the resolution of the final feature map, which again reduces the accuracy of target detection.
Disclosure of Invention
The embodiment of the invention provides a target detection method, a target detection device, electronic equipment and a storage medium, and aims to improve the accuracy of target detection.
In a first aspect, an embodiment of the present invention provides a target detection method, including:
acquiring radar point cloud data acquired aiming at an application scene;
inputting the radar point cloud data into a preset convolutional neural network model, wherein the preset convolutional neural network model comprises a preset convolutional layer consisting of preset convolutional kernels with preset sizes, effective convolutional points in the preset convolutional kernels, which participate in current convolution calculation, are determined based on parameter weight values corresponding to the preset convolutional kernels, and the parameter weight values are obtained by training the preset convolutional neural network model;
and determining the target object information in the application scene according to the output of the preset convolutional neural network model.
In a second aspect, an embodiment of the present invention further provides an object detection apparatus, including:
the radar point cloud data acquisition module is used for acquiring radar point cloud data acquired aiming at an application scene;
the radar point cloud data input module is used for inputting the radar point cloud data into a preset convolution neural network model, wherein the preset convolution neural network model comprises a preset convolution layer formed by preset convolution kernels with preset sizes, effective convolution points participating in current convolution calculation in the preset convolution kernels are determined based on parameter weight values corresponding to the preset convolution kernels, and the parameter weight values are obtained by training the preset convolution neural network model;
and the target object information determining module is used for determining the target object information in the application scene according to the output of the preset convolutional neural network model.
In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the target detection method provided by any embodiment of the invention.
In a fourth aspect, the embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the object detection method provided in any embodiment of the present invention.
The embodiment of the invention has the following advantages or beneficial effects:
The radar point cloud data are processed by a preset convolutional layer composed of preset convolution kernels of a preset size in a preset convolutional neural network model, and the effective convolution points that participate in the current convolution calculation are determined based on parameter weight values obtained through pre-training. The effective convolution points in each preset convolution kernel can therefore be adjusted dynamically, so the model obtains more fusion results of adjacent information while the resolution of the final feature map is preserved, which improves the accuracy of target detection.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description will be given below of the drawings required for the embodiments or the technical solutions in the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a target detection method according to an embodiment of the present invention;
FIG. 2 is an example of a predetermined convolution kernel according to an embodiment of the present invention;
FIG. 3 is an example of a feature extraction submodel according to an embodiment of the invention;
fig. 4 is a flowchart of a target detection method according to a second embodiment of the present invention;
fig. 5 is a schematic structural diagram of an object detection apparatus according to a third embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a target detection method according to an embodiment of the present invention. The method is applicable to detecting a target object in an application scene, in particular an automatic driving scene, and may also be used in other application scenes that require target detection. The method may be performed by a target detection apparatus, which may be implemented in software and/or hardware and integrated in an electronic device, for example one mounted on an autonomous vehicle. As shown in fig. 1, the method specifically includes the following steps:
and S110, acquiring radar point cloud data collected aiming at an application scene.
The application scene may refer to any service scene in which a target object needs to be detected, such as a robot scene, an unmanned scene, and the like. The radar point cloud data may include three-dimensional coordinate information for a plurality of location points in an application scene. For example, the acquired radar point cloud data may include three-dimensional coordinate information of height × width points, that is, the input dimensions of the radar point cloud data are: height × width × 3. Wherein height represents the height of the radar point cloud data; width represents the width of the radar point cloud data.
Specifically, radar point cloud data may be collected in advance by a radar detector arranged in the application scene, and the collected data are then acquired for processing. For example, in an unmanned scene, radar point cloud data collected by a radar detector pre-installed on an unmanned vehicle may be acquired.
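As a concrete illustration of the input layout described above, the following sketch builds an input tensor of shape height × width × 3. The grid resolution and the use of PyTorch are illustrative assumptions, not part of the patent.

```python
import torch

height, width = 512, 512                      # illustrative grid resolution
points = torch.randn(height, width, 3)        # one (x, y, z) coordinate per cell
# Conv2d expects (batch, channels, height, width):
model_input = points.permute(2, 0, 1).unsqueeze(0)   # -> (1, 3, height, width)
```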
And S120, inputting the radar point cloud data into a preset convolutional neural network model, wherein the preset convolutional neural network model comprises a preset convolutional layer consisting of preset convolutional kernels with preset size, effective convolutional points participating in current convolution calculation in the preset convolutional kernels are determined based on parameter weight values corresponding to the preset convolutional kernels, and the parameter weight values are obtained by training the preset convolutional neural network model.
The preset convolutional neural network model may be a preset deep neural network model that performs target detection based on radar point cloud data. The convolutional neural network model in this embodiment may be obtained in advance by training on sample data in an existing training manner. A preset convolutional layer is a network layer that performs a convolution operation on its input data; the preset convolutional neural network model may include one or more preset convolutional layers, the specific number of which may be set based on the business requirements and the depth of the model. A preset convolution kernel is the function in a preset convolutional layer that is convolved with the adjacent information in the input data, and its preset size may be set in advance based on the business requirements. For example, fig. 2 shows an example of a preset convolution kernel whose preset size is 3 × 3. The preset convolution kernel may be characterized by a matrix of the preset size, such as a 3 × 3 matrix. The convolution points in the preset convolution kernel are the points corresponding to each row and each column of the kernel; the preset convolution kernel in fig. 2 thus includes 9 convolution points. An effective convolution point is a convolution point in the preset convolution kernel that participates in the current convolution calculation, such as the convolution points in the shaded area in fig. 2. An ineffective convolution point is a convolution point that does not participate in the current convolution calculation, such as the convolution points in the white area in fig. 2. In the current convolution calculation of the preset convolution kernel in fig. 2, only the 4 effective convolution points are convolved with the corresponding adjacent information in the input data; the other 5 ineffective convolution points are not. The number and positions of the effective convolution points are determined based on the pre-trained parameter weight values corresponding to the preset convolution kernel; that is, the effective convolution points are obtained through self-learning of the preset convolutional neural network model, so that the model can perform more effective convolution operations based on its own requirements and thereby obtain more fusion results of adjacent information. The parameter weight values corresponding to a preset convolution kernel are specially set training parameters used to determine whether each convolution point in the kernel is an effective convolution point. Their size is consistent with the preset size of the preset convolution kernel; for example, the parameter weight values may also be characterized by a matrix of the preset size.
Each of the parameter weight values is obtained by training in a process of training a preset convolutional neural network model, so that the number and positions of effective convolution points participating in current calculation in a preset convolution kernel can be dynamically adjusted.
Specifically, every convolution point in the convolution kernel of an existing ordinary or sparse convolutional layer must participate in the current convolution calculation; that is, the actual size of an existing convolution kernel is fixed. To suppress the spatial dilation of radar point cloud data, some convolution calculations in a sparse convolutional layer cannot be performed, so the fusion of adjacent information is also suppressed, which reduces the accuracy of target detection. In contrast, the preset convolution kernels in the preset convolutional layer of this embodiment can perform the convolution operation with only part of their convolution points, based on the requirements of the model; that is, the actual size of the preset convolution kernel can be adjusted dynamically. The situation in which convolution cannot be performed is thereby avoided, the preset convolutional neural network model obtains more fusion results of adjacent information without needing excessive pooling layers or large convolution strides, the resolution of the final feature map is preserved, and the accuracy of target detection is improved.
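The following is a minimal PyTorch sketch of one plausible reading of such a preset convolutional layer: a learned parameter weight per convolution point selects, via an activation threshold, which kernel points take part in the convolution. The class name `MaskedConv2d`, the parameter name `point_weights`, and the sigmoid activation are illustrative assumptions, not the patent's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Module):
    """Convolution whose effective convolution points are selected by
    learned per-point parameter weight values."""
    def __init__(self, in_ch, out_ch, kernel_size=3, threshold=0.5):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=kernel_size // 2)
        # One trainable weight per convolution point, same size as the kernel.
        self.point_weights = nn.Parameter(torch.zeros(kernel_size, kernel_size))
        self.threshold = threshold

    def forward(self, x, prev_mask=None):
        # Activation value per convolution point, in (0, 1).
        activation = torch.sigmoid(self.point_weights)
        mask = (activation >= self.threshold).float()
        if prev_mask is not None:
            # Non-first convolution: a point stays effective only if it was
            # also effective in the previous convolution calculation.
            mask = mask * prev_mask
        # Ineffective convolution points are zeroed out and contribute nothing.
        masked_weight = self.conv.weight * mask    # broadcasts over (out, in, k, k)
        out = F.conv2d(x, masked_weight, self.conv.bias,
                       padding=self.conv.padding)
        return out, mask
```

Note that a hard threshold blocks gradients to `point_weights`; a practical training setup would need a soft mask or a straight-through estimator, a detail the patent text does not specify.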
And S130, determining target object information in the application scene according to the output of the preset convolutional neural network model.
Specifically, the radar point cloud data are input into the pre-trained preset convolutional neural network model, which performs the convolution operation and other processing on the data using the determined preset convolutional layers and outputs the detected target object information based on a more accurate processing result. More accurate target object information can thus be obtained from the output of the model, improving the accuracy of target detection.
Illustratively, S130 may include: determining the length, width, height, rotation angle, and center point position of the bounding box in which a three-dimensional target object in the application scene is located, according to the output of the preset convolutional neural network model. The bounding box is a closed space that completely encloses the target object; a complex target object can be wrapped in a simple bounding box, so that its location can be characterized more conveniently. For example, when detecting a three-dimensional target object in an application scene, the obtained target object information may include the length, width, height, rotation angle, and center point position of the bounding box in which the object is located, so that the location and form of the target object are known more accurately, further improving the accuracy of target detection. A sketch of this box parameterization follows.
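A minimal sketch of that seven-parameter box output; the type name and field layout are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class BoundingBox3D:
    cx: float       # center point position
    cy: float
    cz: float
    length: float
    width: float
    height: float
    yaw: float      # rotation angle around the vertical axis
```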
According to the technical scheme, the radar point cloud data are processed by the preset convolutional layer composed of preset convolution kernels of a preset size in the preset convolutional neural network model, and the effective convolution points participating in the current convolution calculation are determined based on the pre-trained parameter weight values. The effective convolution points can therefore be adjusted dynamically, the model obtains more fusion results of adjacent information, the resolution of the final feature map is preserved, and the accuracy of target detection is improved.
On the basis of the above technical solution, the preset convolutional neural network model may specifically include a feature extraction submodel and a target detection submodel, where the feature extraction submodel includes a preset number of preset convolutional layers. The specific number of preset convolutional layers may be set based on business requirements and model depth. For example, the more preset convolutional layers, the deeper the model and the higher its performance, but also the more training parameters and the slower the training; an appropriate number of preset convolutional layers should therefore be chosen based on the business requirements.
Wherein the feature extraction submodel is to: performing feature extraction on the input radar point cloud data to obtain a radar feature map corresponding to the radar point cloud data, and inputting the radar feature map into a target detection sub-model; the target detection submodel is to: and determining target object information in the application scene according to the input radar feature map, and outputting the target object information.
The radar feature map may include a feature response value for each feature point extracted from the radar point cloud data, and may be, but is not limited to, a radar feature map based on a BEV (Bird's Eye View) representation. The feature extraction submodel in this embodiment may consist of a preset number of preset convolutional layers alone, or of preset convolutional layers together with pooling layers. For example, when the convolution stride of each preset convolutional layer is 2, no pooling layer is needed; when the convolution stride of each preset convolutional layer is 1, pooling layers with a pooling step size of 2 are needed, so that the feature vectors output by the convolutional layers are reduced and overfitting is mitigated. The convolution stride is the step by which the preset convolution kernel moves at each convolution calculation. The pooling step size characterizes the data region covered by each pooling operation, over which max pooling or average pooling may be performed.
For example, fig. 3 gives an example of a feature extraction submodel. As shown in fig. 3, the radar point cloud data are processed inside the feature extraction submodel as follows: the input radar point cloud data (height, width, 3) first pass through 2 preset convolutional layers with a preset convolution kernel size of 3 × 3 and a convolution stride of 1, followed by a pooling layer with a 3 × 3 window and a pooling step size of 2; then through 3 such preset convolutional layers followed by another pooling layer with a 3 × 3 window and a pooling step size of 2; then through 3 more such preset convolutional layers followed by a third pooling layer with a 3 × 3 window and a pooling step size of 2; and finally through 3 preset convolutional layers with a preset convolution kernel size of 3 × 3 and a convolution stride of 1, extracting a more accurate radar feature map of size (height/8, width/8, 256). The 3 pooling layers with a pooling step size of 2 each halve the height and width, so the height and width of the resulting radar feature map are reduced by a factor of 8.
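A minimal PyTorch sketch of this backbone follows, using ordinary `nn.Conv2d` layers as stand-ins for the preset convolutional layers; the intermediate channel widths are illustrative assumptions (only the final 256 channels are stated above).

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs, pool=True):
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, stride=1, padding=1),
                   nn.ReLU(inplace=True)]
    if pool:
        # 3x3 pooling window with pooling step size 2 halves height and width.
        layers.append(nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
    return nn.Sequential(*layers)

feature_extractor = nn.Sequential(
    conv_block(3,   64,  2),              # 2 convs + pool -> height/2, width/2
    conv_block(64,  128, 3),              # 3 convs + pool -> height/4, width/4
    conv_block(128, 256, 3),              # 3 convs + pool -> height/8, width/8
    conv_block(256, 256, 3, pool=False),  # 3 convs -> (height/8, width/8, 256)
)
```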
The target detection submodel in this embodiment may be a deep neural network that performs target detection based on the extracted radar feature map. For example, its network structure may consist of 2 two-dimensional convolutional layers, each with 3 × 3 convolution kernels, such as 2 ordinary convolutional layers. After the more accurate input radar feature map passes through these 2 convolutional layers, the final target object information can be generated quickly, further improving the speed and accuracy of target detection.
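A corresponding sketch of that detection head is below; the output of 7 channels per cell (one box parameterization) is an illustrative assumption.

```python
import torch.nn as nn

detection_head = nn.Sequential(
    nn.Conv2d(256, 256, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 7, kernel_size=3, padding=1),  # cx, cy, cz, l, w, h, yaw per cell
)
```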
Example two
Fig. 4 is a flowchart of a target detection method according to a second embodiment of the present invention, and this embodiment describes in detail a determination manner of effective convolution points in a preset convolution kernel, which participate in current convolution calculation, based on the above embodiments. Wherein explanations of the same or corresponding terms as those of the above embodiments are omitted.
Referring to fig. 4, the target detection method provided in this embodiment specifically includes the following steps:
s410, acquiring radar point cloud data collected aiming at an application scene;
s420, inputting the radar point cloud data into a preset convolutional neural network model, wherein a preset convolutional kernel of each preset convolutional layer in the preset convolutional neural network model determines an effective convolutional point participating in current convolutional calculation based on a parameter weight value corresponding to the preset convolutional kernel through the following steps S421-S423:
s421, determining a current weight value corresponding to the current convolution point based on a parameter weight value corresponding to a preset convolution kernel for each convolution point in the preset convolution kernel.
The preset convolution kernel corresponds to a parameter weight value with the same size, namely, each convolution point in the preset convolution kernel corresponds to each weight value in the parameter weight values one to one. For example, the sizes of the predetermined convolution kernel and the parameter weight value are 3 × 3.
Specifically, after the training of the preset convolutional neural network model is finished, a specific parameter weight value corresponding to a preset convolutional kernel of each preset convolutional layer in the preset convolutional neural network model can be obtained. Each convolution point in the preset convolution kernel of the preset convolution layer may be used as a current convolution point to determine whether each convolution point is an effective convolution point participating in the current convolution calculation, so as to obtain all effective convolution points in the preset convolution kernel. In this embodiment, based on the specific position of the current convolution point in the preset convolution kernel, the weight value at the same position may be obtained from the corresponding parameter weight values, and is used as the current weight value corresponding to the current convolution point. For example, if the current convolution point is a convolution point in the 1 st row and 1 st column in the preset convolution kernel, the weight value in the 1 st row and 1 st column in the parameter weight values is used as the current weight value corresponding to the current convolution point.
And S422, determining a current activation value corresponding to the current convolution point according to the current weight value.
The current activation value represents whether the current convolution point is an effective convolution point. The higher the current activation value, the more important the current convolution point is, and the more it needs to participate in the current convolution calculation.
Specifically, a preset activation function may be used to determine a current activation value corresponding to the current convolution point according to the current weight value. For example, the current activation value corresponding to the current convolution point may be determined based on the following formula:
$R_i = \sigma(W_i) = \frac{1}{1 + e^{-W_i}}$
where $R_i$ is the current activation value corresponding to the i-th current convolution point in the preset convolution kernel, and $W_i$ is the current weight value corresponding to the i-th current convolution point in the preset convolution kernel.
And S423, detecting whether the current convolution point is an effective convolution point participating in the current convolution calculation or not according to the current activation value and a preset threshold value.
The preset threshold may be a preset minimum activation value corresponding to the effective convolution point. In this embodiment, the value range of the current activation value determined by using the preset activation function is 0 to 1, so that the preset threshold value can be set to 0.5.
Specifically, whether the current convolution point is a valid convolution point is determined by directly detecting whether the current activation value is greater than or equal to a preset threshold value. If the current activation value is greater than or equal to the preset threshold, it may be determined that the current convolution point is an effective convolution point, and if the current activation value is less than the preset threshold, it may be determined that the current convolution point is an ineffective convolution point, so that all effective convolution points in the preset convolution kernel may be obtained.
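As a worked illustration of this check (assuming the sigmoid form above), the sketch below computes the activation values for a 3 × 3 kernel and thresholds them at 0.5; the numeric weights are invented for the example.

```python
import torch

point_weights = torch.tensor([[ 0.8, -1.2,  0.3],
                              [-0.4,  1.5, -2.0],
                              [ 0.1, -0.6,  0.9]])   # illustrative learned values
activation = torch.sigmoid(point_weights)            # every value lies in (0, 1)
valid_mask = activation >= 0.5                       # True -> effective point
# Since sigmoid(w) >= 0.5 exactly when w >= 0, the effective points here are
# the five points with non-negative weight.
```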
It should be noted that when each preset convolutional layer performs the convolution calculation with its preset convolution kernel, only the effective convolution points in the kernel are convolved with the corresponding information in the input data, and the ineffective convolution points do not participate in the current convolution calculation. The convolution operation is thereby adjusted dynamically, so the preset convolutional neural network model can perform more convolution operations while still suppressing spatial dilation, obtain more fusion results of adjacent information, and thus improve the accuracy of target detection.
Exemplarily, S423 may include: if the current convolution is the first convolution, detecting whether the current convolution point is an effective convolution point participating in the current convolution calculation or not according to the current activation value and a preset threshold value; and if the current convolution is not the first convolution, acquiring a last detection result corresponding to the current convolution point in the last convolution calculation, and detecting whether the current convolution point is an effective convolution point participating in the current convolution calculation according to the current activation value, a preset threshold value and the last detection result.
The preset convolutional neural network model may include a plurality of preset convolutional layers, and each preset convolutional layer has a preset convolutional kernel. The first convolution may refer to a convolution operation in a first preset convolution layer used in a preset convolution neural network model.
Specifically, for each preset convolutional layer used in the preset convolutional neural network model, the effective convolution points in its preset convolution kernel may be determined in order. When determining the effective convolution points in the preset convolution kernel of the first preset convolutional layer, the current convolution is the first convolution of the model; in that case, whether the current activation value is greater than or equal to the preset threshold can be detected directly to decide whether the current convolution point is an effective convolution point. For example, if the current activation value is greater than or equal to the preset threshold, the current convolution point may be determined to be an effective convolution point. When determining the effective convolution points in the preset convolution kernel of the second or a subsequent preset convolutional layer, the current convolution is not the first convolution of the model; in that case, the detection result for the current preset convolutional layer can be determined based on the detection result of the effective convolution points in the previous preset convolutional layer, further improving the accuracy of target detection.
Illustratively, detecting whether the current convolution point is an effective convolution point participating in the current convolution calculation according to the current activation value, the preset threshold, and the last detection result may include: if the current activation value is greater than or equal to the preset threshold and the current convolution point was an effective convolution point participating in the last convolution calculation, determining the current convolution point to be an effective convolution point participating in the current convolution calculation.
Specifically, when the current activation value is greater than or equal to the preset threshold and the current convolution point was also an effective convolution point in the previous preset convolutional layer, the current convolution point is determined to be an effective convolution point in the current preset convolutional layer. If the current activation value is smaller than the preset threshold, or the current convolution point was an ineffective convolution point in the previous preset convolutional layer, the current convolution point is determined to be an ineffective convolution point in the current preset convolutional layer. The detection result of the current preset convolutional layer can thus be determined more accurately based on the last detection result, further improving the accuracy of target detection.
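The rule just described reduces to a logical AND between the current threshold check and the previous layer's mask; a minimal sketch, with the function name as an illustrative assumption:

```python
import torch

def effective_mask(point_weights, prev_mask=None, threshold=0.5):
    # Threshold check on the current activation values.
    mask = torch.sigmoid(point_weights) >= threshold
    if prev_mask is not None:
        # Non-first convolution: also require the point to have been
        # effective in the last convolution calculation.
        mask = mask & prev_mask
    return mask
```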
And S430, determining target object information in the application scene according to the output of the preset convolutional neural network model.
According to the technical scheme of this embodiment, for each convolution point in the preset convolution kernel, the current weight value corresponding to the current convolution point is determined based on the parameter weight values corresponding to the kernel, the current activation value is determined from the current weight value, and whether the current convolution point is an effective convolution point participating in the current convolution calculation can be detected quickly from the current activation value and the preset threshold. The convolution operation is thus adjusted dynamically, so the preset convolutional neural network model can perform more convolution operations while still suppressing spatial dilation, obtain more fusion results of adjacent information, and further improve the accuracy of target detection.
The following is an embodiment of the object detection apparatus provided in the embodiments of the present invention, which belongs to the same inventive concept as the object detection methods in the embodiments described above, and reference may be made to the embodiments of the object detection method for details that are not described in detail in the embodiments of the object detection apparatus.
EXAMPLE III
Fig. 5 is a schematic structural diagram of a target detection apparatus according to a third embodiment of the present invention, which is applicable to a situation of detecting a target object in an application scene. The device specifically includes: a radar point cloud data acquisition module 510, a radar point cloud data input module 520, and a target object information determination module 530.
The radar point cloud data acquisition module 510 is configured to acquire radar point cloud data acquired for an application scene; a radar point cloud data input module 520, configured to input radar point cloud data into a preset convolutional neural network model, where the preset convolutional neural network model includes a preset convolutional layer composed of preset convolutional kernels of a preset size, an effective convolutional point in the preset convolutional kernel, which is involved in current convolutional calculation, is determined based on a parameter weight value corresponding to the preset convolutional kernel, and the parameter weight value is obtained by training the preset convolutional neural network model; and a target object information determining module 530, configured to determine target object information in the application scene according to an output of the preset convolutional neural network model.
Optionally, the apparatus further comprises:
the current weight value determining module is used for determining a current weight value corresponding to the current convolution point based on a parameter weight value corresponding to a preset convolution kernel for each convolution point in the preset convolution kernel;
the current activation value determining module is used for determining a current activation value corresponding to the current convolution point according to the current weight value;
and the effective convolution point detection module is used for detecting whether the current convolution point is an effective convolution point participating in current convolution calculation or not according to the current activation value and a preset threshold value.
Optionally, the effective convolution point detection module includes:
the first detection unit is used for detecting whether the current convolution point is an effective convolution point participating in the current convolution calculation or not according to the current activation value and a preset threshold value if the current convolution is the first convolution;
and the second detection unit is used for acquiring a last detection result corresponding to the current convolution point in the last convolution calculation if the current convolution is not the first convolution, and detecting whether the current convolution point is an effective convolution point participating in the current convolution calculation according to the current activation value, a preset threshold value and the last detection result.
Optionally, the first detection unit is specifically configured to: determine the current convolution point to be an effective convolution point participating in the current convolution calculation if the current activation value is greater than or equal to the preset threshold.
Optionally, the second detection unit is specifically configured to: determine the current convolution point to be an effective convolution point participating in the current convolution calculation if the current activation value is greater than or equal to the preset threshold and the current convolution point was an effective convolution point participating in the last convolution calculation.
Optionally, the current activation value determining module determines the current activation value corresponding to the current convolution point based on the following formula:
$R_i = \sigma(W_i) = \frac{1}{1 + e^{-W_i}}$
where $R_i$ is the current activation value corresponding to the i-th current convolution point in the preset convolution kernel, and $W_i$ is the current weight value corresponding to the i-th current convolution point in the preset convolution kernel.
Optionally, the preset convolutional neural network model specifically includes a feature extraction submodel and a target detection submodel, where the feature extraction submodel includes a preset number of preset convolutional layers; wherein,
the feature extraction submodel is to: performing feature extraction on the input radar point cloud data to obtain a radar feature map corresponding to the radar point cloud data, and inputting the radar feature map into a target detection sub-model;
the target detection submodel is to: and determining target object information in the application scene according to the input radar feature map, and outputting the target object information.
Optionally, the target object information determining module 530 is specifically configured to: and determining the length, width, height, rotation angle and central point position of an enclosure where the three-dimensional target object in the application scene is located according to the output of the preset convolutional neural network model.
The target detection device provided by the embodiment of the invention can execute the target detection method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects for executing the target detection method.
It should be noted that, in the embodiment of the object detection apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
Example four
Fig. 6 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. FIG. 6 illustrates a block diagram of an exemplary electronic device 12 suitable for use in implementing embodiments of the present invention. The electronic device 12 shown in fig. 6 is only an example and should not bring any limitation to the function and the scope of use of the embodiment of the present invention.
As shown in FIG. 6, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, and commonly referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing by executing programs stored in the system memory 28, for example, to implement a target detection method provided by the embodiment of the present invention, the method includes:
acquiring radar point cloud data acquired aiming at an application scene;
inputting radar point cloud data into a preset convolutional neural network model, wherein the preset convolutional neural network model comprises a preset convolutional layer consisting of preset convolutional kernels with preset sizes, effective convolutional points in the preset convolutional kernels, which participate in current convolution calculation, are determined based on parameter weight values corresponding to the preset convolutional kernels, and the parameter weight values are obtained by training the preset convolutional neural network model;
and determining target object information in the application scene according to the output of the preset convolutional neural network model.
Of course, those skilled in the art can understand that the processor can also implement the technical solution of the target detection method provided in any embodiment of the present invention.
EXAMPLE five
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of an object detection method as provided by any of the embodiments of the present invention, the method comprising:
acquiring radar point cloud data acquired aiming at an application scene;
inputting radar point cloud data into a preset convolutional neural network model, wherein the preset convolutional neural network model comprises a preset convolutional layer consisting of preset convolutional kernels with preset sizes, effective convolutional points in the preset convolutional kernels, which participate in current convolution calculation, are determined based on parameter weight values corresponding to the preset convolutional kernels, and the parameter weight values are obtained by training the preset convolutional neural network model;
and determining target object information in the application scene according to the output of the preset convolutional neural network model.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer-readable storage medium may be, for example but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It will be understood by those skilled in the art that the modules or steps of the invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of computing devices, and optionally they may be implemented by program code executable by a computing device, such that it may be stored in a memory device and executed by a computing device, or it may be separately fabricated into various integrated circuit modules, or it may be fabricated by fabricating a plurality of modules or steps thereof into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (11)

1. A method of object detection, comprising:
acquiring radar point cloud data acquired aiming at an application scene;
inputting the radar point cloud data into a preset convolutional neural network model, wherein the preset convolutional neural network model comprises a preset convolutional layer consisting of preset convolutional kernels with preset sizes, effective convolutional points in the preset convolutional kernels, which participate in current convolution calculation, are determined based on parameter weight values corresponding to the preset convolutional kernels, and the parameter weight values are obtained by training the preset convolutional neural network model;
and determining the target object information in the application scene according to the output of the preset convolutional neural network model.
2. The method of claim 1, wherein determining the effective convolution points in the preset convolution kernel that participate in the current convolution calculation based on the parameter weight values corresponding to the preset convolution kernel comprises:
for each convolution point in the preset convolution kernel, determining a current weight value corresponding to the current convolution point based on a parameter weight value corresponding to the preset convolution kernel;
determining a current activation value corresponding to the current convolution point according to the current weight value;
and detecting whether the current convolution point is an effective convolution point participating in current convolution calculation or not according to the current activation value and a preset threshold value.
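A minimal sketch of the three steps of claim 2, assuming the activation value is the absolute weight normalized by the largest absolute weight in the kernel; the patent's actual formula appears only as an image under claim 6, so this rule is an assumption.

import numpy as np

def effective_point_mask(kernel: np.ndarray, threshold: float) -> np.ndarray:
    weights = kernel                                                 # current weight value per point
    activations = np.abs(weights) / (np.abs(weights).max() + 1e-12)  # assumed activation rule
    return activations >= threshold                                  # effective-point test per point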
3. The method of claim 2, wherein detecting whether the current convolution point is an effective convolution point participating in the current convolution calculation according to the current activation value and the preset threshold value comprises:
if the current convolution is the first convolution, detecting whether the current convolution point is an effective convolution point participating in the current convolution calculation according to the current activation value and the preset threshold value;
and if the current convolution is not the first convolution, acquiring a last detection result corresponding to the current convolution point in the last convolution calculation, and detecting whether the current convolution point is an effective convolution point participating in the current convolution calculation according to the current activation value, the preset threshold value and the last detection result.
4. The method of claim 3, wherein detecting whether the current convolution point is an effective convolution point participating in the current convolution calculation according to the current activation value and the preset threshold value comprises:
if the current activation value is greater than or equal to the preset threshold value, determining the current convolution point as an effective convolution point participating in the current convolution calculation.
5. The method of claim 3, wherein detecting whether the current convolution point is an effective convolution point participating in the current convolution calculation according to the current activation value, the preset threshold value and the last detection result comprises:
if the current activation value is greater than or equal to the preset threshold value and the current convolution point was an effective convolution point participating in the last convolution calculation, determining the current convolution point as an effective convolution point participating in the current convolution calculation.
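Claims 3 to 5 can be read as a monotonically shrinking validity mask: once a convolution point fails the test, it stays ineffective. A hedged sketch, assuming the last detection result is cached as a boolean mask with the kernel's shape:

import numpy as np

def update_effective_mask(activations, threshold, prev_mask=None):
    mask = activations >= threshold   # claim 4: threshold test (first convolution)
    if prev_mask is not None:         # claim 5: not the first convolution
        mask = mask & prev_mask       # effective now AND effective in the last calculation
    return mask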
6. The method of claim 2, wherein the current activation value corresponding to the current convolution point is determined based on the following formula:
[Formula rendered as image FDA0002888214640000021 in the original publication; it defines R_i in terms of W_i]
wherein R_i is the current activation value corresponding to the i-th convolution point in the preset convolution kernel, and W_i is the current weight value corresponding to the i-th convolution point in the preset convolution kernel.
7. The method according to claim 1, wherein the preset convolutional neural network model specifically comprises a feature extraction submodel and a target detection submodel, the feature extraction submodel comprising a preset number of the preset convolutional layers; wherein,
the feature extraction submodel is configured to: perform feature extraction on the input radar point cloud data to obtain a radar feature map corresponding to the radar point cloud data, and input the radar feature map into the target detection submodel;
the target detection submodel is configured to: determine target object information in the application scene according to the input radar feature map, and output the target object information.
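The two-submodel structure of claim 7 might be sketched as below; the callable-layer representation and the toy detection head are assumptions.

import numpy as np

def feature_extraction_submodel(point_grid, preset_layers):
    fmap = point_grid
    for layer in preset_layers:   # a preset number of preset convolutional layers
        fmap = layer(fmap)        # each layer is a callable masked convolution
    return fmap                   # the radar feature map

def target_detection_submodel(fmap, score_threshold=0.5):
    return np.argwhere(fmap > score_threshold)  # toy head: cells with target evidence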
8. The method according to any one of claims 1 to 7, wherein determining the target object information in the application scene according to the output of the preset convolutional neural network model comprises:
and determining the length, width, height, rotation angle and center point position of the bounding box in which a three-dimensional target object in the application scene is located, according to the output of the preset convolutional neural network model.
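The seven quantities in claim 8 amount to the common 7-degree-of-freedom parameterization of a 3D bounding box; a plain data holder (field names assumed) could look like:

from dataclasses import dataclass

@dataclass
class TargetBox3D:
    length: float   # extent of the bounding box along its heading direction
    width: float
    height: float
    yaw: float      # rotation angle about the vertical axis
    cx: float       # center point position, x
    cy: float       # center point position, y
    cz: float       # center point position, z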
9. An object detection device, comprising:
the radar point cloud data acquisition module is configured to acquire radar point cloud data collected for an application scene;
the radar point cloud data input module is configured to input the radar point cloud data into a preset convolutional neural network model, wherein the preset convolutional neural network model comprises a preset convolutional layer composed of preset convolution kernels of a preset size, the effective convolution points in the preset convolution kernels that participate in the current convolution calculation are determined based on the parameter weight values corresponding to the preset convolution kernels, and the parameter weight values are obtained by training the preset convolutional neural network model;
and the target object information determining module is configured to determine target object information in the application scene according to the output of the preset convolutional neural network model.
10. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the object detection method of any one of claims 1-8.
11. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the object detection method according to any one of claims 1 to 8.
CN202110019670.8A 2021-01-07 2021-01-07 Target detection method and device, electronic equipment and storage medium Active CN112734827B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110019670.8A CN112734827B (en) 2021-01-07 2021-01-07 Target detection method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112734827A 2021-04-30
CN112734827B 2024-06-18

Family

ID=75591105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110019670.8A Active CN112734827B (en) 2021-01-07 2021-01-07 Target detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112734827B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2975556A1 (en) * 2014-07-18 2016-01-20 NISS Group SA System and method for identifying and authenticating ammunition
KR20190013162A (en) * 2017-07-31 2019-02-11 서울대학교산학협력단 Method for convolution operation redution and system for performing the same
KR20190067680A * 2017-12-07 2019-06-17 한양대학교 산학협력단 Semantic image segmentation method based on deep learning
KR20190074938A (en) * 2017-12-20 2019-06-28 연세대학교 산학협력단 Digital neural, artificial neuron for artificial neuron network and inference engine having the same
CN108230314A (en) * 2018-01-03 2018-06-29 天津师范大学 A kind of image quality measure method based on deep activation pond
WO2019233166A1 (en) * 2018-06-04 2019-12-12 杭州海康威视数字技术股份有限公司 Surface defect detection method and apparatus, and electronic device
WO2020114047A1 (en) * 2018-12-07 2020-06-11 北京达佳互联信息技术有限公司 Image style transfer and data storage method and apparatus, and electronic device
CN110288086A (en) * 2019-06-13 2019-09-27 天津大学 A kind of configurable convolution array accelerator structure based on Winograd
CN111222465A (en) * 2019-11-07 2020-06-02 深圳云天励飞技术有限公司 Image analysis method based on convolutional neural network and related equipment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114549926A (en) * 2022-01-24 2022-05-27 北京百度网讯科技有限公司 Target detection and target detection model training method and device
CN114581746A (en) * 2022-03-04 2022-06-03 北京百度网讯科技有限公司 Object detection method, device, equipment and medium
CN114581746B (en) * 2022-03-04 2023-09-05 北京百度网讯科技有限公司 Object detection method, device, equipment and medium
CN115600652A (en) * 2022-11-29 2023-01-13 深圳市唯特视科技有限公司(Cn) Convolutional neural network processing device, high-speed target detection method and equipment

Also Published As

Publication number Publication date
CN112734827B (en) 2024-06-18

Similar Documents

Publication Publication Date Title
CN109188457B (en) Object detection frame generation method, device, equipment, storage medium and vehicle
CN112734827B (en) Target detection method and device, electronic equipment and storage medium
CN109272442B (en) Method, device and equipment for processing panoramic spherical image and storage medium
CN112085056B (en) Target detection model generation method, device, equipment and storage medium
CN111242122A (en) Lightweight deep neural network rotating target detection method and system
CN113312361B (en) Track query method, device, equipment, storage medium and computer program product
CN113989616B (en) Target detection method, device, equipment and storage medium
CN113762003B (en) Target object detection method, device, equipment and storage medium
CN112650300A (en) Unmanned aerial vehicle obstacle avoidance method and device
CN114187589A (en) Target detection method, device, equipment and storage medium
US12125124B1 (en) Matrix transpose hardware acceleration
CN110530375B (en) Robot adaptive positioning method, positioning device, robot and storage medium
US12112533B2 (en) Method and apparatus for data calculation in neural network model, and image processing method and apparatus
US20220044104A1 (en) Method and apparatus for forward computation of neural network, and computer-readable storage medium
CN115344805A (en) Material auditing method, computing equipment and storage medium
CN114882465A (en) Visual perception method and device, storage medium and electronic equipment
US11281935B2 (en) 3D object detection from calibrated 2D images
CN117953581A (en) Method and device for identifying actions, electronic equipment and readable storage medium
CN112668596A (en) Three-dimensional object recognition method and device and recognition model training method and device
CN116740160A (en) Millisecond level multi-plane real-time extraction method and device in complex traffic scene
WO2022017129A1 (en) Target object detection method and apparatus, electronic device, and storage medium
JP2003294416A (en) Stereoscopic image processor
CN113840169B (en) Video processing method, device, computing equipment and storage medium
WO2018220824A1 (en) Image discrimination device
CN109064393B (en) Face special effect processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant