WO2022017129A1 - Target object detection method, apparatus, electronic device, and storage medium - Google Patents

Info

Publication number
WO2022017129A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
sparse matrix
layer
point cloud
convolution module
Prior art date
Application number
PCT/CN2021/102684
Other languages
English (en)
French (fr)
Inventor
付万增
王哲
石建萍
Original Assignee
上海商汤临港智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤临港智能科技有限公司
Publication of WO2022017129A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/20 Image enhancement or restoration using local operators
    • G06T5/30 Erosion or dilatation, e.g. thinning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Definitions

  • the present disclosure relates to the technical field of lidar, and in particular, to a target object detection method, device, electronic device, and storage medium.
  • object detection and segmentation algorithms are the core algorithms for many artificial intelligence applications.
  • For example, object detection and segmentation algorithms can be applied in the field of autonomous driving, where vehicles, pedestrians, obstacles, and the like are detected to avoid collisions.
  • A convolutional neural network is a feedforward neural network with a deep structure that includes convolution calculations. It is one of the deep learning algorithms and is widely used in artificial intelligence scenarios. Detection and segmentation algorithms based on convolutional neural networks generally organize a complex computing model through a huge number of parameters to complete specific tasks. Such computing models often place extremely high demands on the performance of computing devices and suffer from a large amount of computation, high power consumption, and high latency, which makes the detection of target objects complex, computationally expensive, and time-consuming.
  • the present disclosure provides at least a target object detection method, apparatus, electronic device, and storage medium.
  • the present disclosure provides a target object detection method, comprising: acquiring target point cloud data of a target scene collected by a radar device; generating, based on the target point cloud data, at least one target sparse matrix corresponding to the target point cloud data, where the target sparse matrix is used to represent whether there are target objects at different positions of the target scene; and determining, based on the at least one target sparse matrix and the target point cloud data, three-dimensional detection data of the target object included in the target scene.
  • With the above method, at least one corresponding target sparse matrix can be generated for the acquired target point cloud data, and the target sparse matrix represents whether there are target objects at different positions of the target scene. In this way, when the three-dimensional detection data of the target object is determined based on the target sparse matrix and the target point cloud data, the target positions at which a target object exists can be determined from the target sparse matrix, so that the features corresponding to the target positions are processed while the features corresponding to all other positions are not. This reduces the amount of computation needed to obtain the three-dimensional detection data of the target object and improves detection efficiency.
  • generating at least one target sparse matrix corresponding to the target point cloud data based on the target point cloud data includes: determining, based on the target point cloud data, a target sparse matrix corresponding to each layer of convolution module in the neural network used for detecting the target object.
  • a corresponding target sparse matrix can be determined for each layer of convolution module of the neural network, so that each layer of convolution module can process the input feature map based on the target sparse matrix.
  • determining, based on the target point cloud data, a target sparse matrix corresponding to each layer of convolution modules in the neural network for detecting the target object including: based on the target point cloud data, generating an initial sparse matrix; based on the initial sparse matrix, determining a target sparse matrix matching the target size of the feature map input to each layer of the convolution module of the neural network.
  • With the above method, an initial sparse matrix can be generated based on the target point cloud data, and a corresponding target sparse matrix can then be determined from the initial sparse matrix for each layer of convolution module of the neural network. The target sparse matrix corresponding to each layer of convolution module matches the target size of the feature map input to that layer, so that each layer of convolution module can process the input feature map based on its target sparse matrix.
  • generating an initial sparse matrix based on the target point cloud data includes: determining a target area corresponding to the target point cloud data; dividing the target area into a plurality of grid areas; determining the matrix element value corresponding to each grid area based on the grid areas in which the points of the target point cloud corresponding to the target point cloud data are located; and generating, based on the matrix element value corresponding to each grid area, the initial sparse matrix corresponding to the target point cloud data.
  • With the above method, the matrix element value of each grid area can be determined based on whether points of the target point cloud exist in that grid area. For example, if a grid area contains points of the target point cloud, the matrix element value of the grid area is 1, indicating that there is a target object in the grid area. The initial sparse matrix is then generated from the matrix element values corresponding to the grid areas, which provides data support for subsequently determining the three-dimensional detection data of the target object.
  • determining, based on the initial sparse matrix, a target sparse matrix matching the target size of the feature map input to each layer of convolution module of the neural network includes any of the following: determining, based on the initial sparse matrix, the output sparse matrix corresponding to each layer of convolution module in the neural network, and using the output sparse matrix as the target sparse matrix; determining, based on the initial sparse matrix, the input sparse matrix corresponding to each layer of convolution module in the neural network, and using the input sparse matrix as the target sparse matrix; or determining, based on the initial sparse matrix, the input sparse matrix and the output sparse matrix corresponding to each layer of convolution module in the neural network, fusing the input sparse matrix and the output sparse matrix to obtain a fused sparse matrix, and using the fused sparse matrix as the target sparse matrix corresponding to that layer of convolution module.
  • With the above method, several ways are provided to generate the target sparse matrix corresponding to each layer of convolution module: the target sparse matrix can be an input sparse matrix, an output sparse matrix, or a fused sparse matrix generated from an input sparse matrix and an output sparse matrix.
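The third option can be illustrated with a minimal sketch. The text above does not state how the input and output sparse matrices are fused, so the function name `fuse` and its `mode` parameter are hypothetical, and both element-wise intersection and element-wise union are shown only as plausible candidates.

```python
def fuse(input_mask, output_mask, mode="and"):
    """Fuse two 0/1 matrices of equal size element by element.

    mode="and" keeps positions marked in both matrices (intersection);
    mode="or" keeps positions marked in either matrix (union).
    Both modes are assumptions; the source does not name the fusion rule.
    """
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    if len(input_mask) != len(output_mask) or any(
        len(a) != len(b) for a, b in zip(input_mask, output_mask)
    ):
        raise ValueError("matrices must have the same size")
    if mode == "and":
        return [[a & b for a, b in zip(ra, rb)]
                for ra, rb in zip(input_mask, output_mask)]
    return [[a | b for a, b in zip(ra, rb)]
            for ra, rb in zip(input_mask, output_mask)]


input_mask = [[1, 1, 0], [0, 1, 0]]
output_mask = [[1, 0, 0], [0, 1, 1]]
print(fuse(input_mask, output_mask))        # intersection
print(fuse(input_mask, output_mask, "or"))  # union
```

Intersection yields the sparsest matrix and therefore the least computation, while union is the more conservative choice; which trade-off the disclosure intends is not stated in this passage.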
  • determining, based on the initial sparse matrix, the input sparse matrix corresponding to each layer of convolution module in the neural network includes: using the initial sparse matrix as the input sparse matrix corresponding to the first-layer convolution module of the neural network; and determining, based on the input sparse matrix corresponding to the (i-1)-th layer convolution module, an input sparse matrix that corresponds to the i-th layer convolution module and matches the target size of the feature map input to the i-th layer convolution module, where i is a positive integer greater than 1 and less than n+1, and n is the total number of layers of convolution modules of the neural network.
  • With the above method, the initial sparse matrix is used as the input sparse matrix corresponding to the first-layer convolution module, the input sparse matrix of each subsequent layer of convolution module is determined in turn, and the target sparse matrix is then determined from the input sparse matrix, which provides data support for subsequently determining the three-dimensional detection data of the target object based on the target sparse matrix of each layer of convolution module.
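The layer-by-layer derivation of input sparse matrices can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the matrix of layer i-1 is resized to the target size of layer i, so the helper `downsample_any` (an output cell is 1 if any covered input cell is 1) and the list `layer_input_sizes` are hypothetical. The sketch also assumes that feature maps never grow from one layer to the next and that the initial sparse matrix already has the size of the first layer's input.

```python
def downsample_any(mask, out_rows, out_cols):
    """Shrink a 0/1 matrix: an output cell is 1 if any input cell mapped to it is 1."""
    in_rows, in_cols = len(mask), len(mask[0])
    out = [[0] * out_cols for _ in range(out_rows)]
    for r in range(in_rows):
        for c in range(in_cols):
            if mask[r][c]:
                out[r * out_rows // in_rows][c * out_cols // in_cols] = 1
    return out


def input_sparse_matrices(initial, layer_input_sizes):
    """Return one input sparse matrix per layer.

    layer_input_sizes[i] is the (rows, cols) of the feature map fed to layer i+1.
    Layer 1 uses the initial sparse matrix; layer i is derived from layer i-1.
    """
    matrices = [initial]
    for rows, cols in layer_input_sizes[1:]:
        matrices.append(downsample_any(matrices[-1], rows, cols))
    return matrices


initial = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]
per_layer = input_sparse_matrices(initial, [(4, 4), (2, 2), (1, 1)])
for layer_index, matrix in enumerate(per_layer, start=1):
    print(layer_index, matrix)
```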
  • determining, based on the initial sparse matrix, the output sparse matrix corresponding to each layer of convolution module in the neural network includes: determining the output sparse matrix corresponding to the neural network based on a size threshold of the target object and the initial sparse matrix; generating, based on that output sparse matrix, an output sparse matrix that corresponds to the n-th layer convolution module and matches the target size of the feature map input to the n-th layer convolution module; and generating, based on the output sparse matrix corresponding to the (j+1)-th layer convolution module, an output sparse matrix that corresponds to the j-th layer convolution module and matches the target size of the feature map input to the j-th layer convolution module, where j is a positive integer greater than or equal to 1 and less than n, and n is the total number of layers of convolution modules of the neural network.
  • With the above method, the output sparse matrix of the neural network can be determined based on the initial sparse matrix, the output sparse matrices of the n-th layer convolution module down to the first-layer convolution module can then be determined in turn, and the target sparse matrix is determined from the output sparse matrix of each layer, which provides data support for subsequently determining the three-dimensional detection data of the target object based on the target sparse matrix of each layer of convolution module.
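The backward derivation of output sparse matrices can be sketched as below. Several points are assumptions rather than statements of the disclosure: the size threshold is applied here as a dilation radius measured in grid cells (`size_threshold_cells`), and matrices are resized with the hypothetical helper `resize_any`, which marks an output cell when any input cell it covers is marked. The passage above only says that the network-level output sparse matrix depends on the size threshold and the initial sparse matrix, and that layer j is derived from layer j+1.

```python
def dilate(mask, radius):
    """Mark every cell within `radius` cells (Chebyshev distance) of a marked cell."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                        out[rr][cc] = 1
    return out


def resize_any(mask, out_rows, out_cols):
    """Resize a 0/1 matrix; works for both shrinking and enlarging."""
    in_rows, in_cols = len(mask), len(mask[0])
    out = []
    for r in range(out_rows):
        r0 = r * in_rows // out_rows
        r1 = max(r0 + 1, (r + 1) * in_rows // out_rows)
        row = []
        for c in range(out_cols):
            c0 = c * in_cols // out_cols
            c1 = max(c0 + 1, (c + 1) * in_cols // out_cols)
            hit = any(mask[i][j] for i in range(r0, r1) for j in range(c0, c1))
            row.append(int(hit))
        out.append(row)
    return out


def output_sparse_matrices(initial, size_threshold_cells, layer_input_sizes):
    """Derive the matrix for layer n first, then layer j from layer j+1."""
    network_output = dilate(initial, size_threshold_cells)
    n = len(layer_input_sizes)
    matrices = [None] * n
    matrices[n - 1] = resize_any(network_output, *layer_input_sizes[n - 1])
    for j in range(n - 2, -1, -1):
        matrices[j] = resize_any(matrices[j + 1], *layer_input_sizes[j])
    return matrices


initial = [[0] * 8 for _ in range(8)]
initial[1][1] = 1  # a single occupied grid area
per_layer = output_sparse_matrices(initial, 1, [(8, 8), (4, 4)])
for matrix in per_layer:
    print(matrix)
```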
  • determining the three-dimensional detection data of the target object included in the target scene based on the at least one target sparse matrix and the target point cloud data includes: generating, based on the target point cloud data, a target point cloud feature map corresponding to the target point cloud data; and determining, based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene by using a neural network for detecting target objects, where the neural network includes multiple layers of convolution modules.
  • generating a target point cloud feature map corresponding to the target point cloud data based on the target point cloud data includes: for each grid area, determining the feature information corresponding to the grid area based on the coordinate information indicated by the target point cloud data for the points of the target point cloud located in that grid area, where the grid areas are generated by dividing the target area corresponding to the target point cloud data according to a preset number of grids; and generating, based on the feature information corresponding to each grid area, the target point cloud feature map corresponding to the target point cloud data.
  • With the above method, a target point cloud feature map corresponding to the target point cloud data is generated based on the target point cloud data, and the target point cloud feature map includes the position information of each point of the target point cloud, so that the three-dimensional detection data of the target object included in the target scene can be determined more accurately based on the target point cloud feature map and the at least one target sparse matrix.
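A minimal sketch of building such a feature map is given below. The passage only says that the feature information of a grid area is determined from the coordinate information of the points located in it; the concrete feature used here (mean x, mean y, mean z, and point count) and the function name `point_cloud_feature_map` are assumptions chosen for illustration.

```python
def point_cloud_feature_map(points, x_range, y_range, n_rows, n_cols):
    """Return an n_rows x n_cols grid of [mean_x, mean_y, mean_z, count].

    Empty grid areas keep the all-zero feature [0.0, 0.0, 0.0, 0].
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    cell_h = (x_max - x_min) / n_rows
    cell_w = (y_max - y_min) / n_cols
    grid = [[[0.0, 0.0, 0.0, 0] for _ in range(n_cols)] for _ in range(n_rows)]
    for x, y, z in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # point lies outside the target area
        row = min(int((x - x_min) / cell_h), n_rows - 1)
        col = min(int((y - y_min) / cell_w), n_cols - 1)
        cell = grid[row][col]
        cell[0] += x
        cell[1] += y
        cell[2] += z
        cell[3] += 1
    for row_cells in grid:
        for cell in row_cells:
            if cell[3]:
                cell[0] /= cell[3]
                cell[1] /= cell[3]
                cell[2] /= cell[3]
    return grid


points = [(0.25, 0.25, 1.0), (0.75, 0.5, 3.0)]
feature_map = point_cloud_feature_map(points, (0.0, 2.0), (0.0, 2.0), 2, 2)
print(feature_map[0][0])
```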
  • determining, based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene by using a neural network for detecting target objects includes: determining, based on the target sparse matrix corresponding to the first-layer convolution module in the neural network, the feature information to be convolved in the target point cloud feature map, and using the first-layer convolution module to perform convolution processing on that feature information to generate the feature map input to the second-layer convolution module; determining, based on the target sparse matrix corresponding to the k-th layer convolution module in the neural network, the feature information to be convolved in the feature map input to the k-th layer convolution module, and using the k-th layer convolution module to perform convolution processing on that feature information to generate the feature map input to the (k+1)-th layer convolution module, where k is a positive integer greater than 1 and less than n, and n is the total number of layers of convolution modules of the neural network; and determining, based on the target sparse matrix corresponding to the n-th layer convolution module in the neural network, the feature information to be convolved in the feature map input to the n-th layer convolution module, and using the n-th layer convolution module to perform convolution processing on that feature information to obtain the three-dimensional detection data of the target object included in the target scene.
  • With the above method, for each layer of convolution module, the feature information to be convolved can be determined based on the target sparse matrix of that layer. Convolution processing is performed on the feature information to be convolved, while the other feature information in the feature map is not convolved. This reduces the amount of computation of the convolution processing in each layer of convolution module and improves the operation efficiency of each layer, which in turn reduces the computation of the neural network and improves the detection efficiency of the target object.
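The saving described above can be illustrated with a single-channel sketch in which the convolution is evaluated only at positions marked in the target sparse matrix. The sketch assumes zero padding, stride 1, an output of the same size as the input, and that the matrix marks the positions at which results are computed; the function name `masked_conv2d` is hypothetical, and a real convolution module would have many channels and possibly a different stride.

```python
def masked_conv2d(feature_map, kernel, mask):
    """Evaluate a k x k convolution (cross-correlation form) only where mask is 1."""
    rows, cols = len(feature_map), len(feature_map[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue  # no target object indicated here: skip the computation
            acc = 0.0
            for dr in range(k):
                for dc in range(k):
                    rr, cc = r + dr - pad, c + dc - pad
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += kernel[dr][dc] * feature_map[rr][cc]
            out[r][c] = acc
    return out


feature_map = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 1]]
result = masked_conv2d(feature_map, kernel, mask)
for row in result:
    print(row)
```

Here only two of the nine positions are evaluated, so the inner multiply-accumulate loop runs for two positions instead of nine.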
  • determining, based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene by using a neural network for detecting target objects includes: for each layer of convolution module in the neural network other than the last layer, determining the convolution vector corresponding to that layer based on the target sparse matrix corresponding to that layer and the feature map input to that layer, and determining, based on the convolution vector corresponding to that layer, the feature map input to the next layer of convolution module; determining the convolution vector corresponding to the last layer of convolution module based on the target sparse matrix corresponding to the last layer and the feature map input to the last layer; and determining, based on the convolution vector corresponding to the last layer of convolution module, the three-dimensional detection data of the target object included in the target scene.
  • With the above method, a convolution vector corresponding to each layer of convolution module can be generated based on the target sparse matrix of that layer and the input feature map. The convolution vector includes the feature information to be processed in the feature map, namely the feature information in the feature map whose position matches a position of the target object indicated in the target sparse matrix. The generated convolution vector is processed, while the feature information in the feature map other than the feature information to be processed is not processed. This reduces the amount of computation of the convolution processing in each layer of convolution module and improves the operation efficiency of each layer, which in turn reduces the computation of the neural network and improves the detection efficiency of the target object.
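One possible reading of the convolution vector is a gather-compute-scatter scheme: the neighbourhoods at marked positions are gathered into compact vectors, only those vectors are multiplied with the kernel, and the results are written back to their positions. The sketch below follows that reading for a single channel; the function names and the exact layout of the vectors are assumptions, since the passage does not define them.

```python
def gather_convolution_vectors(feature_map, mask, k=3):
    """Collect the flattened k x k neighbourhood of every position where mask is 1."""
    rows, cols = len(feature_map), len(feature_map[0])
    pad = k // 2
    positions, vectors = [], []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                patch = [
                    feature_map[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0.0
                    for rr in range(r - pad, r + pad + 1)
                    for cc in range(c - pad, c + pad + 1)
                ]
                positions.append((r, c))
                vectors.append(patch)
    return positions, vectors


def apply_kernel(positions, vectors, kernel, rows, cols):
    """Multiply each gathered vector with the flattened kernel and scatter the result."""
    flat_kernel = [w for kernel_row in kernel for w in kernel_row]
    out = [[0.0] * cols for _ in range(rows)]
    for (r, c), vector in zip(positions, vectors):
        out[r][c] = sum(w * v for w, v in zip(flat_kernel, vector))
    return out


feature_map = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 1]]
positions, vectors = gather_convolution_vectors(feature_map, mask)
result = apply_kernel(positions, vectors, kernel, 3, 3)
print(positions)
print(result)
```

Gathering first keeps the arithmetic dense and contiguous, which is usually what makes this kind of sparsity pay off on real hardware.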
  • the present disclosure provides a target object detection device, comprising: an acquisition module for acquiring target point cloud data of a target scene collected by a radar device; a generation module for generating, based on the target point cloud data, at least one target sparse matrix corresponding to the target point cloud data, where the target sparse matrix is used to represent whether there are target objects at different positions of the target scene; and a determination module for determining, based on the at least one target sparse matrix and the target point cloud data, the three-dimensional detection data of the target object included in the target scene.
  • the present disclosure provides an electronic device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device is running, the processor and the memory communicate with each other through the bus, and when the machine-readable instructions are executed by the processor, the steps of the target object detection method according to the first aspect or any one of its implementations are executed.
  • the present disclosure provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the target object detection method according to the first aspect or any one of its implementations are executed.
  • FIG. 1 shows a schematic flowchart of a target object detection method provided by an embodiment of the present disclosure
  • FIG. 2 shows a schematic flowchart of a specific method for determining a target sparse matrix corresponding to each layer of convolution modules in a neural network based on target point cloud data in a target object detection method provided by an embodiment of the present disclosure
  • FIG. 3 shows a schematic diagram of a target area and an initial sparse matrix corresponding to the target area provided by an embodiment of the present disclosure
  • FIG. 4 shows a schematic diagram of the architecture of a target object detection apparatus provided by an embodiment of the present disclosure
  • FIG. 5 shows a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • In general, object detection and segmentation algorithms based on convolutional neural networks organize a complex computing model through a huge number of parameters to complete specific tasks. Such computing models often place extremely high demands on the performance of computing devices, and in practical applications they suffer from a large amount of computation, high power consumption, and high latency, which makes the detection of target objects complex, computationally expensive, and time-consuming.
  • an embodiment of the present disclosure provides a target object detection method, which can reduce the computation amount of the neural network and improve the detection efficiency of the target object.
  • the execution body of the method may be a server or a terminal device, for example, the terminal device may be a mobile phone, a tablet computer, a vehicle-mounted computer, or the like.
  • FIG. 1 is a schematic flowchart of a target object detection method provided by an embodiment of the present disclosure. As shown in FIG. 1 , the method includes S101 to S103, wherein:
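  • S101 Acquire target point cloud data of a target scene collected by a radar device.
  • S102 Generate, based on the target point cloud data, at least one target sparse matrix corresponding to the target point cloud data; the target sparse matrix is used to represent whether there are target objects at different positions of the target scene.
  • S103 Determine, based on the at least one target sparse matrix and the target point cloud data, the three-dimensional detection data of the target object included in the target scene.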
  • In the above method, at least one corresponding target sparse matrix may be generated for the acquired target point cloud data, and the target sparse matrix is used to represent whether there are target objects corresponding to the target point cloud data at different positions of the target scene. For example, in the field of autonomous driving, the target objects can be motor vehicles, non-motor vehicles, pedestrians, obstacles, and the like around an autonomous vehicle equipped with a radar device. Accordingly, when the three-dimensional detection data of the target object is determined based on the target sparse matrix and the target point cloud data, the target position of the target object in the target scene can be determined from the target sparse matrix, so that the features corresponding to the target position are processed while the features corresponding to other positions in the target scene are not, thereby reducing the amount of computation required to obtain the three-dimensional detection data of the target object and improving detection efficiency.
  • the radar device may be a laser radar, a millimeter-wave radar, or the like, and the embodiment of the present disclosure is described by taking the radar device as a laser radar device as an example.
  • the lidar device collects the target point cloud data of the target scene by emitting scan lines in real time.
  • the target scene can be any scene.
  • the target scene can be a real-time scene encountered by a vehicle equipped with a lidar device during driving.
  • At least one target sparse matrix corresponding to the target point cloud data may be generated based on the target point cloud data.
  • the target sparse matrix can represent whether there are target objects at different positions of the target scene.
  • the target sparse matrix may be a binary matrix, that is, each element of the sparse matrix is either 0 or 1.
  • the value of the matrix element corresponding to the position where the target object exists in the target scene can be set to 1, then the position is the target position; the value of the matrix element corresponding to the position where the target object does not exist in the target scene can be set to 0.
  • generating at least one target sparse matrix corresponding to the target point cloud data based on the target point cloud data may include: determining, based on the target point cloud data, the target sparse matrix corresponding to each layer of convolution module in the neural network used to detect the target object.
  • the neural network may be a trained neural network for detecting target objects.
  • the neural network may include multiple layers of convolution modules, and each layer of convolution module may include one convolutional layer; in this case, a corresponding target sparse matrix may be determined for each layer of convolution module, that is, for each convolutional layer. Alternatively, the neural network may include multiple network modules (blocks), each of which includes multiple convolutional layers; in this case, a corresponding target sparse matrix may be determined for each network module, that is, one target sparse matrix is determined for the multiple convolutional layers included in that network module.
  • a corresponding target sparse matrix can be determined for each layer of convolution modules in the neural network based on the target point cloud data.
  • When training the neural network, training sample point cloud data can be obtained, at least one sample sparse matrix corresponding to the training sample point cloud data can be generated based on it, and the neural network can then be trained on the training sample point cloud data and the corresponding at least one sample sparse matrix to obtain a trained neural network.
  • In the above embodiment, a corresponding target sparse matrix can be determined for each layer of convolution module of the neural network, so that each layer of convolution module can process the input feature map based on its target sparse matrix.
  • Determining, based on the target point cloud data, the target sparse matrix corresponding to each layer of convolution module in the neural network may include:
  • S201 Based on the target point cloud data, generate an initial sparse matrix.
  • S202 Based on the initial sparse matrix, determine a target sparse matrix that matches the target size of the feature map input to each layer of convolution module of the neural network.
  • In the above embodiment, an initial sparse matrix can be generated based on the target point cloud data, and a corresponding target sparse matrix can then be determined from the initial sparse matrix for each layer of convolution module of the neural network. The target sparse matrix corresponding to each layer of convolution module matches the target size of the feature map input to that layer, so that each layer of convolution module can process the input feature map based on its target sparse matrix.
  • Generating the initial sparse matrix based on the target point cloud data includes:
  • A1 Determine a target area corresponding to the target point cloud data, and divide the target area into a plurality of grid areas according to a preset number of grids.
  • A2 Determine the matrix element value corresponding to each grid area based on the grid area where the point of the target point cloud corresponding to the target point cloud data is located.
  • A3 based on the matrix element value corresponding to each grid area, generate an initial sparse matrix corresponding to the target point cloud data.
  • In the above embodiment, the matrix element value of each grid area can be determined based on whether points of the target point cloud exist in that grid area. For example, if a grid area contains points of the target point cloud, the matrix element value of the grid area is 1, indicating that there is a target object at the location of the grid area. The initial sparse matrix is then generated from the matrix element values corresponding to the grid areas, which provides data support for subsequently determining the three-dimensional detection data of the target object.
  • the target area corresponding to the target point cloud data may be a detection area determined from the position of the lidar device when it acquires the target point cloud data (for example, taking this position as the starting position) and the farthest distance that the lidar device can detect (for example, taking this farthest distance as the length).
  • the target area can be determined according to the actual situation in combination with the target point cloud data.
  • the preset number of grids may be N × M, and the target area may be divided into N × M grid areas, where N and M are positive integers.
  • the values of N and M can be set according to actual needs.
  • the target point cloud data includes the position information of multiple points of the target point cloud, and the grid area in which each point is located can be determined from the position information of that point. Then, for each grid area, when the grid area contains a point of the target point cloud, the value of the matrix element corresponding to the grid area can be 1; when the grid area contains no point of the target point cloud, the value of the matrix element corresponding to the grid area can be 0. In this way, the value of the matrix element corresponding to each grid area is determined.
  • an initial sparse matrix corresponding to the target point cloud data can be generated based on the matrix element value corresponding to each grid area, where the numbers of rows and columns of the initial sparse matrix correspond to the number of grids. For example, if the number of grids is N × M, the initial sparse matrix has N rows and M columns, that is, it is an N × M matrix.
  • the figure includes a laser radar device 31 .
  • the obtained target area 32 is divided into a plurality of grid areas according to the preset number of grids to obtain The divided grid regions 321 .
  • an initial sparse matrix 33 corresponding to the target point cloud data is generated.
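  • The grid-occupancy step described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `initial_sparse_matrix`, the rectangular `x_range`/`y_range` description of the target area, and the sample points are all assumptions made for the example.

```python
def initial_sparse_matrix(points, x_range, y_range, n_rows, n_cols):
    """Return an n_rows x n_cols 0/1 matrix; 1 marks a grid area containing a point."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    matrix = [[0] * n_cols for _ in range(n_rows)]
    for x, y, *_ in points:  # any further coordinates (e.g. z) are ignored here
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # the point lies outside the target area
        row = int((x - x_min) / (x_max - x_min) * n_rows)
        col = int((y - y_min) / (y_max - y_min) * n_cols)
        matrix[row][col] = 1
    return matrix


# Hypothetical 4 m x 4 m target area divided into N x M = 4 x 4 grid areas.
points = [(0.5, 0.5, 0.1), (3.2, 1.1, 0.4), (3.9, 1.9, 0.0), (9.0, 9.0, 0.0)]
grid = initial_sparse_matrix(points, (0.0, 4.0), (0.0, 4.0), n_rows=4, n_cols=4)
for row in grid:
    print(row)
```

  • In this sketch two of the points fall into the same grid area and one lies outside the target area, so only two matrix elements are set to 1.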
  • a target sparse matrix matching the target size of the feature map input to each layer of convolution module of the neural network may be determined based on the initial sparse matrix.
  • the target sparse matrix matching the target size of the feature map input to the convolution module of each layer of the neural network can be determined in the following manner:
  • Mode 1: Based on the initial sparse matrix, determine the output sparse matrix corresponding to each layer of convolution module in the neural network, and use the output sparse matrix as the target sparse matrix.
  • Mode 2: Based on the initial sparse matrix, determine the input sparse matrix corresponding to each layer of convolution module in the neural network, and use the input sparse matrix as the target sparse matrix.
  • Mode 3: Based on the initial sparse matrix, determine the input sparse matrix and the output sparse matrix corresponding to each layer of convolution module in the neural network, fuse the input sparse matrix and the output sparse matrix to obtain a fused sparse matrix, and use the fused sparse matrix as the target sparse matrix corresponding to that layer's convolution module.
  • the target sparse matrix may be obtained from the output sparse matrix, may also be obtained from the input sparse matrix, or may also be obtained by fusing the input sparse matrix and the output sparse matrix.
  • that is, the target sparse matrix corresponding to each layer of convolution module can be an input sparse matrix, an output sparse matrix, or a fused sparse matrix generated from the input sparse matrix and the output sparse matrix.
  • this method obtains the target sparse matrix from the output sparse matrix.
  • the output sparse matrix corresponding to each layer of convolution module in the neural network may be determined based on the initial sparse matrix, and the output sparse matrix is used as the target sparse matrix.
  • the output sparse matrix can be used to represent whether there are target objects at different positions corresponding to the target scene in the output results of each layer of convolution modules in the neural network.
  • for example, if there is a target object at position A, the value of the matrix element at the position corresponding to position A in the output sparse matrix can be 1; if there is no target object at position A, the value of the matrix element at the position corresponding to position A in the output sparse matrix can be 0.
  • this method obtains the target sparse matrix from the input sparse matrix.
  • the input sparse matrix corresponding to each layer of convolution module in the neural network may be determined based on the initial sparse matrix, and the input sparse matrix is used as the target sparse matrix.
  • the input sparse matrix may represent whether there are target objects at different positions corresponding to the target scene in the input data of each layer of convolution modules in the neural network.
  • for example, if there is a target object at position A, the value of the matrix element at the position corresponding to position A in the input sparse matrix can be 1; if there is no target object at position A, the value of the matrix element at the position corresponding to position A in the input sparse matrix can be 0.
  • the output sparse matrix corresponding to each layer of convolution module can be determined by mode 1, and the input sparse matrix corresponding to each layer of convolution module can be determined by mode 2. The input sparse matrix and the output sparse matrix of a layer are then fused to obtain a fused sparse matrix, and the fused sparse matrix is used as the target sparse matrix corresponding to the convolution module of that layer.
  • the intersection of the input sparse matrix and the output sparse matrix can be taken to obtain a fused sparse matrix; the union of the input sparse matrix and the output sparse matrix can also be taken to obtain the fused sparse matrix.
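  • The two fusion options can be illustrated with element-wise AND (intersection) and OR (union) over 0/1 matrices. The function name `fuse` and its `mode` parameter are hypothetical and serve only this sketch.

```python
def fuse(input_sparse, output_sparse, mode="intersection"):
    """Fuse two equally sized 0/1 matrices element by element."""
    if mode == "intersection":
        combine = lambda a, b: a & b  # 1 only where both matrices are 1
    elif mode == "union":
        combine = lambda a, b: a | b  # 1 where either matrix is 1
    else:
        raise ValueError("mode must be 'intersection' or 'union'")
    return [[combine(a, b) for a, b in zip(row_in, row_out)]
            for row_in, row_out in zip(input_sparse, output_sparse)]


input_sparse = [[1, 1, 0], [0, 1, 0]]
output_sparse = [[0, 1, 0], [0, 1, 1]]
print(fuse(input_sparse, output_sparse, "intersection"))
print(fuse(input_sparse, output_sparse, "union"))
```

  • The intersection yields a sparser target sparse matrix (less computation), while the union keeps every position flagged by either matrix.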
  • the input sparse matrix is:
  • the input sparse matrix corresponding to each layer of convolution modules in the neural network is determined, which may include:
  • i is a positive integer greater than 1 and less than n+1, where n is the total number of layers of the convolution module of the neural network.
  • the initial sparse matrix can be used as the input sparse matrix corresponding to the first-layer convolution module of the neural network.
  • the input sparse matrix corresponding to the second-layer convolution module can be obtained from the input sparse matrix corresponding to the first-layer convolution module, and the numbers of rows and columns of the input sparse matrix corresponding to the second-layer convolution module are consistent with the target size of the feature map input to the second-layer convolution module.
  • an image dilation operation or an image erosion operation can be used to process the input sparse matrix corresponding to the first-layer convolution module to obtain a processed sparse matrix; after the numbers of rows and columns of the processed sparse matrix are adjusted to match the target size of the feature map input to the second-layer convolution module, the input sparse matrix of the second-layer convolution module is obtained.
  • in the same way, the input sparse matrix corresponding to the first-layer convolution module, the input sparse matrix corresponding to the second-layer convolution module, ..., and the input sparse matrix corresponding to the n-th layer convolution module (that is, the last convolution module of the neural network) can be obtained.
  • a dilation processing range can be predetermined, and image dilation processing is performed on the input sparse matrix based on the dilation processing range to obtain a processed sparse matrix, wherein the dilation processing range can be determined based on the size threshold of the target object or according to actual needs.
  • the dilated sparse matrix can be:
  • the erosion processing of the input sparse matrix is the inverse of the dilation processing.
  • an erosion processing range can be predetermined, and image erosion processing is performed on the input sparse matrix based on the erosion processing range to obtain the processed sparse matrix.
  • the erosion processing range may be determined based on the size threshold of the target object, or may be determined according to actual needs.
  • the sparse matrix after erosion processing can be:
  • the processed sparse matrix can be adjusted by up-sampling or down-sampling into a matrix whose numbers of rows and columns match the target size of the feature map input to the second-layer convolution module, thereby obtaining the input sparse matrix of the second-layer convolution module. There are various ways of adjusting the numbers of rows and columns of the processed sparse matrix; the ones named here are only illustrative.
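  • As a rough sketch of how one layer's input sparse matrix could be derived from the previous layer's, the code below dilates (or erodes) a 0/1 matrix with a square window and then down-samples it. The square window, the `radius` parameter, the clipping at the matrix border, and the block-wise maximum used for down-sampling are assumptions chosen for illustration; the text above does not fix any of them.

```python
def _window(matrix, r, c, radius):
    """Values of the cells within `radius` of (r, c), clipped at the matrix border."""
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[i][j]
            for i in range(max(0, r - radius), min(rows, r + radius + 1))
            for j in range(max(0, c - radius), min(cols, c + radius + 1))]


def dilate(matrix, radius=1):
    """A cell becomes 1 if any cell in its window is 1."""
    return [[1 if any(_window(matrix, r, c, radius)) else 0
             for c in range(len(matrix[0]))] for r in range(len(matrix))]


def erode(matrix, radius=1):
    """A cell stays 1 only if every cell in its window is 1."""
    return [[1 if all(_window(matrix, r, c, radius)) else 0
             for c in range(len(matrix[0]))] for r in range(len(matrix))]


def downsample(matrix, factor):
    """Shrink by `factor`; an output cell is 1 if any cell of its block is 1."""
    rows, cols = len(matrix) // factor, len(matrix[0]) // factor
    return [[max(matrix[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor))
             for c in range(cols)] for r in range(rows)]


layer1 = [[0, 0, 0, 0],
          [0, 1, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
dilated = dilate(layer1)                 # processed sparse matrix
layer2 = downsample(dilated, factor=2)   # matches a 2 x 2 feature map
print(dilated)
print(layer2)
```

  • A larger `radius` keeps more positions active in later layers; erosion has the opposite effect and makes the matrix sparser.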
  • the sparse degree of the sparse matrix can also be adjusted.
  • the sparse degree of the sparse matrix can be adjusted by adjusting the number of grids; or the sparse degree of the sparse matrix can also be adjusted through the erosion process.
  • the sparse degree of the sparse matrix is: the ratio of the number of matrix elements with a matrix element value of 1 in the sparse matrix to the total number of all matrix elements included in the sparse matrix.
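  • The definition above amounts to a simple ratio; a minimal sketch follows, in which the helper name `sparse_degree` is an assumption.

```python
def sparse_degree(matrix):
    """Ratio of matrix elements equal to 1 to the total number of matrix elements."""
    total = sum(len(row) for row in matrix)
    ones = sum(value == 1 for row in matrix for value in row)
    return ones / total


example = [[1, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 1, 0, 0]]
print(sparse_degree(example))
```

  • Under this definition a lower value means fewer positions are processed by the convolution module.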
  • the initial sparse matrix can be used as the input sparse matrix corresponding to the first-layer convolution module, the input sparse matrix of each layer of convolution module can be determined in turn, and the target sparse matrix can then be determined based on the input sparse matrix. This provides data support for subsequently determining the three-dimensional detection data of the target object based on the target sparse matrix of each layer of convolution module.
  • the output sparse matrix corresponding to each layer of convolution module in the neural network is determined, which may include:
  • C3 based on the output sparse matrix corresponding to the j+1th layer convolution module, generate an output sparse matrix corresponding to the jth layer convolution module and matching the target size of the feature map input by the jth layer convolution module, where j is a positive integer greater than or equal to 1 and less than n, where n is the total number of layers of the convolution module of the neural network.
  • the processed sparse matrix is the output sparse matrix corresponding to the neural network.
  • based on the output sparse matrix corresponding to the neural network, the output sparse matrix of the n-th layer convolution module (that is, the last convolution module of the neural network) is determined, and so on, to obtain the output sparse matrix of the (n-1)-th layer convolution module, ..., the output sparse matrix of the second-layer convolution module, and the output sparse matrix of the first-layer convolution module.
  • an image dilation operation or an image erosion operation can be used to process the output sparse matrix corresponding to the convolution module of the previous layer to obtain a processed sparse matrix; after the numbers of rows and columns of the processed sparse matrix are adjusted to match the target size of the feature map input to the current-layer convolution module, the output sparse matrix of the current-layer convolution module is obtained.
  • for the process of determining the output sparse matrix of each layer of convolution module, reference may be made to the above process of determining the input sparse matrix, which is not described in detail here.
  • when the target sparse matrix of each convolution module of the neural network is obtained by fusing the input sparse matrix and the output sparse matrix, the output sparse matrix and the input sparse matrix of each layer of convolution module can each be obtained by the methods above, and the obtained output sparse matrix and input sparse matrix are then fused to obtain the target sparse matrix of each convolution module.
  • the output sparse matrix can be determined based on the initial sparse matrix, and the output sparse matrices of the n-th layer convolution module, ..., and the first-layer convolution module can be determined in turn. The target sparse matrix is then determined based on the output sparse matrix of each layer, which provides data support for subsequently determining the three-dimensional detection data of the target object based on the target sparse matrix of each layer of convolution module.
  • three-dimensional detection data of the target object included in the target scene may be determined based on at least one target sparse matrix, target point cloud data, and a neural network for detecting the target object.
  • the three-dimensional detection data includes one or more of: the coordinates of the center point of the detection frame of the target object, the three-dimensional size of the detection frame, the orientation angle of the detection frame, the type of the detection frame, the confidence level of the detection frame, the target tracking ID, the speed and acceleration of the target object, a timestamp, and the like.
  • the position of the 3D detection frame of the target object cannot exceed the target area. That is, if the coordinates of the center point of the 3D detection frame are (X, Y, Z) and its dimensions are length L, width W, and height H, the following conditions are satisfied: 0 ≤ X − L/2, X + L/2 ≤ N_max, 0 ≤ Y − W/2, and Y + W/2 ≤ M_max, where N_max and M_max are the length and width thresholds of the target area.
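  • The containment condition can be checked with a few comparisons. The sketch below reads the condition as 0 ≤ X − L/2, X + L/2 ≤ N_max, 0 ≤ Y − W/2 and Y + W/2 ≤ M_max; the non-strict inequalities and the function name `box_within_target_area` are assumptions made for this example.

```python
def box_within_target_area(x, y, length, width, n_max, m_max):
    """True if a box centred at (x, y) with the given length and width
    lies inside the target area [0, n_max] x [0, m_max]."""
    return (0 <= x - length / 2 and x + length / 2 <= n_max
            and 0 <= y - width / 2 and y + width / 2 <= m_max)


print(box_within_target_area(5.0, 5.0, 4.0, 2.0, n_max=10.0, m_max=10.0))
print(box_within_target_area(1.0, 5.0, 4.0, 2.0, n_max=10.0, m_max=10.0))
```

  • The second call fails the check because the box would extend below 0 along the length direction.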
  • the three-dimensional detection data of the target object included in the target scene is determined based on at least one target sparse matrix and target point cloud data, including:
  • Step 1 Based on the target point cloud data, a target point cloud feature map corresponding to the target point cloud data is generated.
  • Step 2 Based on the target point cloud feature map and at least one target sparse matrix, use a neural network for detecting the target object to determine the three-dimensional detection data of the target object included in the target scene, wherein the neural network includes a multi-layer convolution module.
  • the target point cloud data can be input into the neural network and preprocessed to generate the target point cloud feature map corresponding to the target point cloud data; the three-dimensional detection data of the target object included in the target scene is then determined using the target point cloud feature map, the at least one target sparse matrix, and the neural network.
  • the feature map corresponding to the target point cloud data (i.e., corresponding to the target point cloud) is referred to simply as the target point cloud feature map.
  • a target point cloud feature map corresponding to the target point cloud data is generated, which may include:
  • for each grid area, the feature information corresponding to the grid area is determined based on the coordinate information indicated by the target point cloud data for the points of the target point cloud located in the grid area, wherein the grid areas are generated by dividing the target area corresponding to the target point cloud data according to the preset number of grids.
  • a target point cloud feature map corresponding to the target point cloud data is generated.
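  • for a grid area that contains points of the target point cloud, the coordinate information indicated by the target point cloud data for each of those points constitutes the feature information corresponding to the grid area; for a grid area that contains no points of the target point cloud, the feature information of the grid area can be 0.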
  • the target point cloud feature map corresponding to the target point cloud data is generated.
  • the size of the target point cloud feature map can be N × M × C, where N × M is consistent with the size of the target sparse matrix of the first-layer convolution module and C can be the maximum number of points of the target point cloud included in any one grid area. For example, if grid area A contains the most points of any grid area, namely 50 points of the target point cloud, then the value of C is 50; that is, the target point cloud feature map includes 50 feature maps of size N × M, and each feature map includes the coordinate information of at least one point of the target point cloud.
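  • One way to picture the N × M × C feature map is to collect the coordinates of the points falling into each grid area and pad every grid area up to the largest count C. The layout `features[row][col][channel]`, the zero-tuple padding, and the function name `point_cloud_feature_map` are assumptions for this sketch only.

```python
def point_cloud_feature_map(points, x_range, y_range, n_rows, n_cols):
    """Return features[row][col][channel], one (x, y, z) tuple per channel.

    The number of channels C is the largest number of points found in any
    grid area; grid areas with fewer points are padded with zeros.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    cells = [[[] for _ in range(n_cols)] for _ in range(n_rows)]
    for x, y, z in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # outside the target area
        row = int((x - x_min) / (x_max - x_min) * n_rows)
        col = int((y - y_min) / (y_max - y_min) * n_cols)
        cells[row][col].append((x, y, z))
    channels = max(len(cell) for row in cells for cell in row)
    padding = (0.0, 0.0, 0.0)
    return [[cell + [padding] * (channels - len(cell)) for cell in row]
            for row in cells]


points = [(0.5, 0.5, 0.1), (0.6, 0.7, 0.2), (3.2, 1.1, 0.4)]
features = point_cloud_feature_map(points, (0.0, 4.0), (0.0, 4.0), 4, 4)
print(len(features), len(features[0]), len(features[0][0]))
print(features[3][1])
```

  • Here the fullest grid area holds two points, so C is 2 and every other grid area is padded to two entries.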
  • a target point cloud feature map corresponding to the target point cloud data is generated, and the target point cloud feature map includes the position information of each point of the target point cloud; based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene can then be determined more accurately.
  • the three-dimensional detection data of the target object included in the target scene may be determined based on the target point cloud feature map, at least one target sparse matrix, and the neural network.
  • the three-dimensional detection data of the target object included in the target scene can be determined in the following two ways:
  • Method 1 Based on the target point cloud feature map and at least one target sparse matrix, determine the three-dimensional detection data of the target object included in the target scene, including:
  • Based on the target sparse matrix corresponding to the k-th layer convolution module in the neural network, determine the feature information to be convolved in the feature map input to the k-th layer convolution module, and use the k-th layer convolution module of the neural network to perform convolution processing on that feature information, generating the feature map input to the (k+1)-th layer convolution module, where k is a positive integer greater than 1 and less than n, and n is the total number of layers of convolution modules of the neural network.
  • Based on the target sparse matrix corresponding to the n-th layer convolution module in the neural network, determine the feature information to be convolved in the feature map input to the n-th layer convolution module, and use the n-th layer convolution module of the neural network to perform convolution processing on that feature information to obtain the three-dimensional detection data of the target object included in the target scene.
  • the target sparse matrix of the first-layer convolution module can be used to determine the feature information to be convolved in the target point cloud feature map input to the first-layer convolution module.
  • the target position with the matrix value of 1 in the target sparse matrix may be determined, and the feature information of the position corresponding to the target position in the target point cloud feature map is determined as the feature information to be convolved.
  • the first-layer convolution module is used to perform convolution processing on the feature information to be convolved in the target point cloud feature map, generating the feature map input to the second-layer convolution module. The target sparse matrix of the second-layer convolution module is then used to determine the feature information to be convolved in the feature map input to the second-layer convolution module, and the second-layer convolution module performs convolution processing on that feature information to generate the feature map input to the third-layer convolution module, and so on, until the feature map input to the n-th layer convolution module (the last convolution module in the neural network) is obtained.
  • the three-dimensional detection data of the target object included in the target scene is obtained by determining the feature information to be convolved for the n-th layer convolution module and performing convolution processing on it.
  • the feature information to be convolved can be determined based on the target sparse matrix of each layer of convolution module and the input feature map, and convolution processing is performed only on the feature information to be convolved; the other feature information in the feature map is not convolved. This reduces the amount of computation in the convolution processing of each layer of convolution module and improves the operating efficiency of each layer, which in turn reduces the amount of computation of the neural network and improves the efficiency of detecting the target object.
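  • The saving described above can be illustrated with a single-channel convolution that is evaluated only where the target sparse matrix is 1. This sketch assumes that the target sparse matrix selects the output positions to compute, that the kernel is square with an odd side, and that the border is zero-padded; the function name `sparse_conv2d` is hypothetical.

```python
def sparse_conv2d(feature_map, kernel, target_sparse):
    """Convolve only at positions where target_sparse is 1.

    Returns the output map and the number of positions actually computed.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    k = len(kernel) // 2  # half-width of the (odd, square) kernel
    output = [[0.0] * cols for _ in range(rows)]
    computed = 0
    for r in range(rows):
        for c in range(cols):
            if target_sparse[r][c] != 1:
                continue  # skipped: no convolution at this position
            acc = 0.0
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:  # zero padding
                        acc += feature_map[rr][cc] * kernel[dr + k][dc + k]
            output[r][c] = acc
            computed += 1
    return output, computed


feature_map = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
target_sparse = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
output, computed = sparse_conv2d(feature_map, kernel, target_sparse)
print(output, computed)
```

  • Only 2 of the 9 positions are convolved in this example; the remaining outputs stay at 0.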
  • Method 2 Based on the target point cloud feature map and at least one target sparse matrix, determine the three-dimensional detection data of the target object included in the target scene, including:
  • the convolution vector corresponding to the convolution module of each layer may also be determined based on the target sparse matrix corresponding to the convolution module of that layer and the feature map input to the convolution module of that layer. For example, for the first-layer convolution module, the target positions whose matrix value is 1 in the target sparse matrix of the first-layer convolution module can be determined, the feature information at the corresponding positions in the target point cloud feature map can be extracted, and the extracted feature information constitutes the convolution vector corresponding to the first-layer convolution module.
  • the img2col and col2img technologies can be used so that the first-layer convolution module performs matrix multiplication on the corresponding convolution vector to obtain the feature map input to the second-layer convolution module.
  • the feature map input to the convolution module of the last layer can be obtained.
  • the convolution vector corresponding to the convolution module of the last layer is determined.
  • the convolution vector corresponding to the convolution module of the last layer is processed to determine the three-dimensional detection data of the target object included in the target scene.
  • a convolution vector corresponding to each layer of convolution module can be generated based on the target sparse matrix of each layer of convolution module and the input feature map, and the convolution vector includes the feature information to be processed in the feature map.
  • the feature information to be processed is the feature information in the feature map that matches the positions of the three-dimensional detection data of the target object indicated in the target sparse matrix. Only the generated convolution vector is processed; the feature information in the feature map other than the feature information to be processed is not processed. This reduces the amount of computation in the convolution processing of each layer of convolution module, improves the operating efficiency of each layer, and in turn reduces the amount of computation of the neural network and improves the efficiency of detecting the target object.
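  • The convolution-vector approach can be sketched as a gather, a matrix product, and a scatter: the neighbourhoods of the selected positions are flattened into rows (in the spirit of img2col), multiplied by the flattened kernel, and written back to their positions (in the spirit of col2img). The single-channel layout, the zero padding, and the names `gather_patches` and `conv_via_matmul` are assumptions for illustration.

```python
def gather_patches(feature_map, target_sparse, k):
    """Flatten the (2k+1) x (2k+1) neighbourhood of every position marked 1."""
    rows, cols = len(feature_map), len(feature_map[0])
    positions, patches = [], []
    for r in range(rows):
        for c in range(cols):
            if target_sparse[r][c] != 1:
                continue
            patch = [feature_map[r + dr][c + dc]
                     if 0 <= r + dr < rows and 0 <= c + dc < cols else 0.0
                     for dr in range(-k, k + 1) for dc in range(-k, k + 1)]
            positions.append((r, c))
            patches.append(patch)
    return positions, patches


def conv_via_matmul(feature_map, kernel, target_sparse):
    """Convolution as (patch matrix) x (flattened kernel), scattered back."""
    k = len(kernel) // 2
    flat_kernel = [w for row in kernel for w in row]
    positions, patches = gather_patches(feature_map, target_sparse, k)
    # Matrix-vector product: one dot product per gathered patch.
    values = [sum(p * w for p, w in zip(patch, flat_kernel)) for patch in patches]
    output = [[0.0] * len(feature_map[0]) for _ in feature_map]
    for (r, c), value in zip(positions, values):
        output[r][c] = value  # scatter back to the original position
    return output


feature_map = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
target_sparse = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
print(conv_via_matmul(feature_map, kernel, target_sparse))
```

  • The patch matrix has one row per selected position, so its size, and therefore the cost of the multiplication, shrinks with the sparse degree of the target sparse matrix.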
  • the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
  • an embodiment of the present disclosure also provides a target object detection apparatus.
  • in the schematic diagram of its architecture, the target object detection apparatus provided by the embodiment of the present disclosure includes an acquisition module 401, a generation module 402, and a determination module 403.
  • an acquisition module 401 configured to acquire target point cloud data of a target scene collected by a radar device
  • a generating module 402 configured to generate at least one target sparse matrix corresponding to the target point cloud data based on the target point cloud data; the target sparse matrix is used to represent whether there are target objects at different positions of the target scene;
  • the determining module 403 is configured to determine the three-dimensional detection data of the target object included in the target scene based on the at least one target sparse matrix and the target point cloud data.
  • the generating module 402 when generating at least one target sparse matrix corresponding to the target point cloud data based on the target point cloud data, is configured to: based on the target point cloud data, A target sparse matrix corresponding to each layer of convolution modules in the neural network for detecting the target object is determined.
  • the generating module 402, when determining, based on the target point cloud data, the target sparse matrix corresponding to each layer of convolution module in the neural network for detecting the target object, is configured to: generate an initial sparse matrix based on the target point cloud data; and determine, based on the initial sparse matrix, a target sparse matrix matching the target size of the feature map input to each layer of convolution module of the neural network.
  • the generating module 402 when generating an initial sparse matrix based on the target point cloud data, is used for:
  • an initial sparse matrix corresponding to the target point cloud data is generated.
  • the generation module 402, when determining, based on the initial sparse matrix, a target sparse matrix matching the target size of the feature map input to each layer of convolution module of the neural network, is used for:
  • the fusion sparse matrix is used as the target sparse matrix corresponding to the convolution module of this layer.
  • the generating module 402, when determining the input sparse matrix corresponding to each layer of convolution module in the neural network based on the initial sparse matrix, is configured to: use the initial sparse matrix as the input sparse matrix corresponding to the first-layer convolution module of the neural network; and, based on the input sparse matrix corresponding to the (i-1)-th layer convolution module, determine the input sparse matrix that corresponds to the i-th layer convolution module and matches the target size of the feature map input to the i-th layer convolution module, wherein i is a positive integer greater than 1 and less than n+1, and n is the total number of layers of convolution modules of the neural network.
  • the generation module 402, when determining, based on the initial sparse matrix, a target sparse matrix that matches the target size of the feature map input to each layer of convolution module of the neural network, is used for: determining the output sparse matrix corresponding to the neural network based on the size threshold of the target object and the initial sparse matrix;
  • an output sparse matrix corresponding to the nth layer convolution module and matching the target size of the feature map input by the nth layer convolution module is generated;
  • an output sparse matrix corresponding to the jth layer convolution module and matching the target size of the feature map input by the jth layer convolution module is generated, where j is a positive integer greater than or equal to 1 and less than n, where n is the total number of layers of the convolution module of the neural network.
  • the determining module 403 is used for:
  • a neural network for detecting target objects is used to determine the three-dimensional detection data of the target object included in the target scene, wherein the neural network includes a multi-layer convolution module.
  • the determining module 403 when generating the target point cloud feature map corresponding to the target point cloud data based on the target point cloud data, is used for:
  • the feature information corresponding to the grid area is determined based on the coordinate information indicated by the target point cloud data corresponding to the points of the target point cloud located in the grid area; wherein, the grid area The area is generated by dividing the target area corresponding to the target point cloud data according to the preset number of grids;
  • a target point cloud feature map corresponding to the target point cloud data is generated.
  • the determining module 403, when determining the three-dimensional detection data of the target object included in the target scene based on the target point cloud feature map and the at least one target sparse matrix by using a neural network for detecting target objects, is used for:
  • the feature information to be convolved in the target point cloud feature map is subjected to convolution processing to generate a feature map input to the second-layer convolution module;
  • Based on the target sparse matrix corresponding to the k-th layer convolution module in the neural network, determine the feature information to be convolved in the feature map input to the k-th layer convolution module, and use the k-th layer convolution module of the neural network to perform convolution processing on that feature information, generating the feature map input to the (k+1)-th layer convolution module, where k is a positive integer greater than 1 and less than n, and n is the total number of layers of convolution modules of the neural network;
  • Based on the target sparse matrix corresponding to the n-th layer convolution module in the neural network, determine the feature information to be convolved in the feature map input to the n-th layer convolution module, and use the n-th layer convolution module of the neural network to perform convolution processing on that feature information to obtain the three-dimensional detection data of the target object included in the target scene.
  • the determining module 403, when determining the three-dimensional detection data of the target object included in the target scene based on the target point cloud feature map and the at least one target sparse matrix by using a neural network for detecting target objects, is used for:
  • based on the convolution vector corresponding to the convolution module of this layer, determine the feature map input to the convolution module of the next layer; based on the target sparse matrix corresponding to the convolution module of the last layer and the feature map input to the convolution module of the last layer, determine the convolution vector corresponding to the convolution module of the last layer; and based on the convolution vector corresponding to the convolution module of the last layer, determine the three-dimensional detection data of the target object included in the target scene.
  • the functions or modules included in the apparatus provided by the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments.
  • an embodiment of the present disclosure further provides an electronic device including a processor 501 , a memory 502 and a bus 503 .
  • the memory 502 is used to store execution instructions and includes an internal memory 5021 and an external memory 5022. The internal memory 5021 is used to temporarily store the operation data in the processor 501 and the data exchanged with the external memory 5022, such as a hard disk; the processor 501 exchanges data with the external memory 5022 through the internal memory 5021.
  • the processor 501 communicates with the memory 502 through the bus 503, so that the processor 501 executes the following instructions:
  • At least one target sparse matrix corresponding to the target point cloud data is generated; the target sparse matrix is used to represent whether there are target objects at different positions of the target scene;
  • three-dimensional detection data of the target object included in the target scene is determined.
  • an embodiment of the present disclosure further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is run by a processor, the target object detection method described in the above method embodiments is executed.
  • the computer program product of the target object detection method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program codes, and the instructions included in the program codes can be used to execute the target object detection method described in the above method embodiments. For details, refer to the above method embodiments, which will not be repeated here.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a processor-executable non-volatile computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

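The present disclosure provides a target object detection method and apparatus, an electronic device, and a storage medium. The method includes: acquiring target point cloud data of a target scene collected by a radar device; generating, based on the target point cloud data, at least one target sparse matrix corresponding to the target point cloud data, where the target sparse matrix represents whether a target object is present at different positions of the target scene; and determining, based on the at least one target sparse matrix and the target point cloud data, three-dimensional detection data of the target object included in the target scene.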

Description

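Target object detection method and apparatus, electronic device, and storage medium
Cross-Reference to Related Applications
This patent application claims priority to Chinese patent application No. 202010712645.3, filed on July 22, 2020 and entitled "Target object detection method and apparatus, electronic device, and storage medium", which is incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of lidar, and in particular to a target object detection method and apparatus, an electronic device, and a storage medium.
Background
Object detection and segmentation algorithms are the core algorithms of many artificial intelligence applications. For example, in autonomous driving they can be used, while a vehicle is travelling, to detect the motor vehicles, non-motor vehicles, pedestrians, and obstacles around the vehicle so as to avoid collisions.
A convolutional neural network is a feedforward neural network (Feedforward Neural Network) that includes convolution computations and has a deep structure; it is one of the algorithms of deep learning and is widely used in artificial intelligence scenarios. Detection and segmentation algorithms based on convolutional neural networks generally organize a complex computational model with a very large number of parameters to accomplish a specific task. Such a model typically places very high demands on the performance of the computing device and, in practice, suffers from heavy computation, high power consumption, and high latency, so that detecting a target object is complex, computationally expensive, and time-consuming.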
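Summary
In view of this, the present disclosure provides at least a target object detection method and apparatus, an electronic device, and a storage medium.
In a first aspect, the present disclosure provides a target object detection method, including: acquiring target point cloud data of a target scene collected by a radar device; generating, based on the target point cloud data, at least one target sparse matrix corresponding to the target point cloud data, where the target sparse matrix represents whether a target object is present at different positions of the target scene; and determining, based on the at least one target sparse matrix and the target point cloud data, three-dimensional detection data of the target object included in the target scene.
With the above method, at least one target sparse matrix can be generated for the acquired target point cloud data, the target sparse matrix representing whether a target object is present at different positions of the target scene. When the three-dimensional detection data of the target object is determined based on the target sparse matrix and the target point cloud data, the target positions at which a target object is present can be determined from the target sparse matrix, so that only the features corresponding to those target positions are processed and the features corresponding to all other positions are left unprocessed. This reduces the amount of computation needed to obtain the three-dimensional detection data of the target object and improves detection efficiency.
In a possible implementation, generating, based on the target point cloud data, the at least one target sparse matrix corresponding to the target point cloud data includes: determining, based on the target point cloud data, a target sparse matrix corresponding to each layer's convolution module in a neural network used to detect the target object.
In this implementation, a corresponding target sparse matrix can be determined, based on the target point cloud data, for each layer's convolution module of the neural network, so that each layer's convolution module can process its input feature map based on the target sparse matrix.
In a possible implementation, determining, based on the target point cloud data, the target sparse matrix corresponding to each layer's convolution module in the neural network used to detect the target object includes: generating an initial sparse matrix based on the target point cloud data; and determining, based on the initial sparse matrix, a target sparse matrix that matches the target size of the feature map input to each layer's convolution module of the neural network.
In this implementation, an initial sparse matrix can be generated based on the target point cloud data, and a corresponding target sparse matrix can then be determined from the initial sparse matrix for each layer's convolution module of the neural network. The target sparse matrix of each layer's convolution module matches the target size of the feature map input to that convolution module, so that each layer's convolution module can process its input feature map based on the target sparse matrix.
In a possible implementation, generating the initial sparse matrix based on the target point cloud data includes: determining a target region corresponding to the target point cloud data; dividing the target region into a plurality of grid regions according to a preset number of grids; determining a matrix element value for each grid region based on the grid regions in which the points of the target point cloud corresponding to the target point cloud data are located; and generating the initial sparse matrix corresponding to the target point cloud data based on the matrix element value of each grid region.
Here, whether a point of the target point cloud exists in each grid region can be judged based on the target point cloud data, and the matrix element value of each grid region is determined from the result. For example, if a point of the target point cloud exists in a grid region, the matrix element value of that grid region is 1, indicating that a target object is present in that grid region. The initial sparse matrix is then generated from the matrix element values of the grid regions, which provides data support for subsequently determining the three-dimensional detection data of the target object.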
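In a possible implementation, determining, based on the initial sparse matrix, the target sparse matrix that matches the target size of the feature map input to each layer's convolution module of the neural network includes any one of the following: determining, based on the initial sparse matrix, an output sparse matrix corresponding to each layer's convolution module in the neural network, and using the output sparse matrix as the target sparse matrix; determining, based on the initial sparse matrix, an input sparse matrix corresponding to each layer's convolution module in the neural network, and using the input sparse matrix as the target sparse matrix; or determining, based on the initial sparse matrix, an input sparse matrix and an output sparse matrix corresponding to each layer's convolution module in the neural network, fusing the input sparse matrix and the output sparse matrix to obtain a fused sparse matrix, and using the fused sparse matrix as the target sparse matrix of that layer's convolution module.
In this implementation, several ways are provided for generating the target sparse matrix of each layer's convolution module: the target sparse matrix may be the input sparse matrix, the output sparse matrix, or a fused sparse matrix generated from the input sparse matrix and the output sparse matrix.
In a possible implementation, determining, based on the initial sparse matrix, the input sparse matrix corresponding to each layer's convolution module in the neural network includes: using the initial sparse matrix as the input sparse matrix of the first-layer convolution module of the neural network; and determining, based on the input sparse matrix of the (i-1)-th layer convolution module, an input sparse matrix of the i-th layer convolution module that matches the target size of the feature map input to the i-th layer convolution module, where i is a positive integer greater than 1 and less than n+1, and n is the total number of convolution module layers of the neural network.
In this way, the initial sparse matrix can be used as the input sparse matrix of the first-layer convolution module, and the input sparse matrix of each layer's convolution module can be determined in turn. The target sparse matrix can then be determined from the input sparse matrix, which provides data support for subsequently determining the three-dimensional detection data of the target object based on the target sparse matrix of each layer's convolution module.
In a possible implementation, determining, based on the initial sparse matrix, the output sparse matrix corresponding to each layer's convolution module in the neural network includes: determining an output sparse matrix corresponding to the neural network based on a size threshold of the target object and the initial sparse matrix; generating, based on that output sparse matrix, an output sparse matrix of the n-th layer convolution module that matches the target size of the feature map input to the n-th layer convolution module; and generating, based on the output sparse matrix of the (j+1)-th layer convolution module, an output sparse matrix of the j-th layer convolution module that matches the target size of the feature map input to the j-th layer convolution module, where j is a positive integer greater than or equal to 1 and less than n, and n is the total number of convolution module layers of the neural network.
In this way, the output sparse matrix can be determined based on the initial sparse matrix and then used to determine, in turn, the output sparse matrix of the n-th layer convolution module, ..., and the output sparse matrix of the first-layer convolution module. The target sparse matrix can then be determined from the output sparse matrix of each layer, which provides data support for subsequently determining the three-dimensional detection data of the target object based on the target sparse matrix of each layer's convolution module.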
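In a possible implementation, determining, based on the at least one target sparse matrix and the target point cloud data, the three-dimensional detection data of the target object included in the target scene includes: generating, based on the target point cloud data, a target point cloud feature map corresponding to the target point cloud data; and determining, based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene by using a neural network for detecting the target object, where the neural network includes multiple layers of convolution modules.
In a possible implementation, generating, based on the target point cloud data, the target point cloud feature map corresponding to the target point cloud data includes: for each grid region, determining feature information corresponding to the grid region based on the coordinate information indicated by the target point cloud data of the points of the target point cloud located in the grid region, where the grid regions are generated by dividing the target region corresponding to the target point cloud data according to a preset number of grids; and generating the target point cloud feature map corresponding to the target point cloud data based on the feature information of each grid region.
In this implementation, the target point cloud feature map corresponding to the target point cloud data is generated from the feature information of each grid region and includes the position information of each point of the target point cloud. Based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene can therefore be determined relatively accurately.
In a possible implementation, determining, based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene by using the neural network for detecting the target object includes: determining, based on the target sparse matrix of the first-layer convolution module in the neural network, the feature information to be convolved in the target point cloud feature map, and convolving that feature information with the first-layer convolution module to generate the feature map input to the second-layer convolution module; determining, based on the target sparse matrix of the k-th layer convolution module in the neural network, the feature information to be convolved in the feature map input to the k-th layer convolution module, and convolving that feature information with the k-th layer convolution module to generate the feature map input to the (k+1)-th layer convolution module, where k is a positive integer greater than 1 and less than n, and n is the total number of convolution module layers of the neural network; and determining, based on the target sparse matrix of the n-th layer convolution module in the neural network, the feature information to be convolved in the feature map input to the n-th layer convolution module, and convolving that feature information with the n-th layer convolution module to obtain the three-dimensional detection data of the target object included in the target scene.
Here, the feature information to be convolved can be determined based on the target sparse matrix and the input feature map of each layer's convolution module. Only that feature information is convolved, and the remaining feature information in the feature map is not. This reduces the amount of computation each layer's convolution module performs and improves its efficiency, which in turn reduces the computation of the neural network and improves the efficiency of target object detection.
In a possible implementation, determining, based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene by using the neural network for detecting the target object includes: for each layer's convolution module in the neural network other than the last layer, determining a convolution vector for that convolution module based on its target sparse matrix and the feature map input to it, and determining the feature map input to the next layer's convolution module based on that convolution vector; determining a convolution vector for the last-layer convolution module based on its target sparse matrix and the feature map input to it; and determining the three-dimensional detection data of the target object included in the target scene based on the convolution vector of the last-layer convolution module.
Here, a convolution vector can be generated for each layer's convolution module based on its target sparse matrix and input feature map. The convolution vector contains the feature information to be processed in the feature map, namely the feature information in the feature map that matches the positions at which the target sparse matrix indicates three-dimensional detection data of a target object is present. The generated convolution vector is processed, and the remaining feature information in the feature map is not. This reduces the amount of computation each layer's convolution module performs and improves its efficiency, which in turn reduces the computation of the neural network and improves the efficiency of target object detection.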
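For the effects of the apparatus, electronic device, and so on described below, refer to the description of the above method; they are not repeated here.
In a second aspect, the present disclosure provides a target object detection apparatus, including: an acquisition module, configured to acquire target point cloud data of a target scene collected by a radar device; a generation module, configured to generate, based on the target point cloud data, at least one target sparse matrix corresponding to the target point cloud data, where the target sparse matrix represents whether a target object is present at different positions of the target scene; and a determination module, configured to determine, based on the at least one target sparse matrix and the target point cloud data, three-dimensional detection data of the target object included in the target scene.
In a third aspect, the present disclosure provides an electronic device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device runs, the processor communicates with the memory through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the target object detection method according to the first aspect or any implementation thereof.
In a fourth aspect, the present disclosure provides a computer-readable storage medium storing a computer program that, when run by a processor, performs the steps of the target object detection method according to the first aspect or any implementation thereof.
To make the above objects, features, and advantages of the present disclosure easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.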
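Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly introduced below. These drawings show embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a target object detection method provided by an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of a specific way of determining, based on target point cloud data, the target sparse matrix corresponding to each layer's convolution module in a neural network, in a target object detection method provided by an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of a target region and the initial sparse matrix corresponding to the target region provided by an embodiment of the present disclosure;
Fig. 4 is a schematic architecture diagram of a target object detection apparatus provided by an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.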
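Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments generally described and shown in the drawings can be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
In general, object detection and segmentation algorithms based on convolutional neural networks organize a complex computational model with a very large number of parameters to accomplish a specific task. Such a model typically places very high demands on the performance of the computing device and, in practice, suffers from heavy computation, high power consumption, and high latency, so that detecting a target object is complex, computationally expensive, and time-consuming.
To address these problems, an embodiment of the present disclosure provides a target object detection method that can reduce the computation of the neural network and improve the efficiency of target object detection.
To facilitate understanding of the embodiments of the present disclosure, a target object detection method disclosed in the embodiments is first described in detail.
The method may be executed by a server or by a terminal device; for example, the terminal device may be a mobile phone, a tablet computer, an in-vehicle computer, or the like.
Fig. 1 is a schematic flowchart of a target object detection method provided by an embodiment of the present disclosure. As shown in Fig. 1, the method includes S101 to S103:
S101: acquire target point cloud data of a target scene collected by a radar device.
S102: generate, based on the target point cloud data, at least one target sparse matrix corresponding to the target point cloud data; the target sparse matrix represents whether a target object is present at different positions of the target scene.
S103: determine, based on the at least one target sparse matrix and the target point cloud data, three-dimensional detection data of the target object included in the target scene.
In the above method, at least one target sparse matrix can be generated for the acquired target point cloud data. The target sparse matrix represents whether a target object corresponding to the target point cloud data is present at different positions of the target scene. In autonomous driving, for example, the target objects may be the motor vehicles, non-motor vehicles, pedestrians, and obstacles around an autonomous vehicle equipped with the radar device. Accordingly, when the three-dimensional detection data of the target object is determined based on the target sparse matrix and the target point cloud data, the target positions in the target scene at which a target object is present can be determined from the target sparse matrix, so that only the features corresponding to those target positions are processed and the features corresponding to the other positions in the target scene are not. This reduces the computation needed to obtain the three-dimensional detection data of the target object and improves detection efficiency.
Regarding S101: the radar device may be a lidar, a millimeter-wave radar, or the like; the embodiments of the present disclosure are described using a lidar device as an example. The lidar device collects the target point cloud data of the target scene by emitting scan lines in real time. The target scene may be any scene; in autonomous driving, for example, it may be the real-time scene encountered by a vehicle equipped with the lidar device while driving.
Regarding S102: after the target point cloud data of the target scene is acquired, at least one target sparse matrix corresponding to the target point cloud data can be generated based on it. The target sparse matrix represents whether a target object is present at different positions of the target scene.
Here, the sparse matrix may be a matrix consisting of 0s and 1s, that is, every element value is 0 or 1. For example, the matrix element value for a position in the target scene at which a target object is present may be set to 1, and that position is a target position; the matrix element value for a position at which no target object is present may be set to 0.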
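In an optional implementation, generating, based on the target point cloud data, the at least one target sparse matrix corresponding to the target point cloud data may include: determining, based on the target point cloud data, the target sparse matrix corresponding to each layer's convolution module in the neural network used to detect the target object.
Here, the neural network may be a trained neural network for detecting the target object. The neural network may include multiple layers of convolution modules, each of which may include one convolution layer; in that case, a target sparse matrix is determined for each convolution module, that is, for each convolution layer. Alternatively, the neural network may include multiple network modules (blocks), each of which includes multiple convolution layers; in that case, a target sparse matrix is determined for each network module, that is, one target sparse matrix is shared by the convolution layers in that network module. The structure of the neural network used to detect the target object can be set as needed; the above is only an example.
For a trained neural network used to detect the target object, a corresponding target sparse matrix can be determined for each layer's convolution module based on the target point cloud data.
When the neural network used to detect the target object is trained, training sample point cloud data can be acquired and at least one sample sparse matrix corresponding to it can be generated. The neural network is then trained based on the training sample point cloud data and the corresponding at least one sample sparse matrix to obtain the trained neural network.
In this implementation, a corresponding target sparse matrix can be determined, based on the target point cloud data, for each layer's convolution module of the neural network, so that each layer's convolution module can process its input feature map based on the target sparse matrix.
In an optional implementation, referring to Fig. 2, determining, based on the target point cloud data, the target sparse matrix corresponding to each layer's convolution module in the neural network may include:
S201: generate an initial sparse matrix based on the target point cloud data.
S202: determine, based on the initial sparse matrix, a target sparse matrix that matches the target size of the feature map input to each layer's convolution module of the neural network.
In this implementation, an initial sparse matrix can be generated based on the target point cloud data, and a corresponding target sparse matrix can then be determined from the initial sparse matrix for each layer's convolution module of the neural network. The target sparse matrix of each layer's convolution module matches the target size of the feature map input to that convolution module, so that each layer's convolution module can process its input feature map based on the target sparse matrix.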
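Regarding S201: as an optional implementation, generating the initial sparse matrix based on the target point cloud data includes:
A1: determine a target region corresponding to the target point cloud data, and divide the target region into a plurality of grid regions according to a preset number of grids.
A2: determine a matrix element value for each grid region based on the grid regions in which the points of the target point cloud corresponding to the target point cloud data are located.
A3: generate the initial sparse matrix corresponding to the target point cloud data based on the matrix element value of each grid region.
Here, whether a point of the target point cloud exists in each grid region can be judged based on the target point cloud data, and the matrix element value of each grid region is determined from the result. For example, if a point of the target point cloud exists in a grid region, the matrix element value of that grid region is 1, indicating that a target object is present at the position of that grid region. The initial sparse matrix is then generated from the matrix element values of the grid regions, which provides data support for subsequently determining the three-dimensional detection data of the target object.
For example, the target region corresponding to the target point cloud data may be the detection region determined from the position of the lidar device when it acquires the target point cloud data (for example, taking that position as the starting position) and the farthest distance the lidar device can detect (for example, taking that distance as the length). The target region can be determined according to the actual situation in combination with the target point cloud data.
In a specific implementation, the preset number of grids may be N×M, in which case the target region is divided into N×M grid regions, where N and M are positive integers whose values can be set as needed.
In a specific implementation, the target point cloud data includes the position information of multiple points of the target point cloud, and the grid region in which each point is located can be determined from that position information. For each grid region, if a point of the target point cloud exists in the grid region, its matrix element value may be 1; if no point of the target point cloud exists in the grid region, its matrix element value may be 0. The matrix element value of every grid region is determined in this way.
After the matrix element value of each grid region is determined, the initial sparse matrix corresponding to the target point cloud data can be generated from these values. The numbers of rows and columns of the initial sparse matrix correspond to the number of grids; for example, if the number of grids is N×M, the initial sparse matrix has N rows and M columns, that is, it is an N×M matrix.
Referring to Fig. 3, the figure shows a lidar device 31 and a target region 32 obtained with the lidar device at its center. The target region is divided into a plurality of grid regions according to the preset number of grids, giving the grid regions 321. The grid regions in which the points of the target point cloud are located are then determined; the matrix element value of each grid region containing a point of the target point cloud (the grid regions shaded black in the figure) is set to 1, and the matrix element value of each grid region containing no point is set to 0. Finally, the initial sparse matrix 33 corresponding to the target point cloud data is generated from the matrix element values of the grid regions.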
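Steps A1 to A3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function name `build_initial_sparse_matrix`, the `(x, y, z)` point layout, and the axis-aligned rectangular target region given by `x_range` and `y_range` are all hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z); this layout is an assumption


def build_initial_sparse_matrix(
    points: Sequence[Point],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    n_rows: int,
    n_cols: int,
) -> List[List[int]]:
    """Return an n_rows x n_cols matrix of 0/1 occupancy flags."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    cell_x = (x_max - x_min) / n_rows  # extent of one grid region along x
    cell_y = (y_max - y_min) / n_cols  # extent of one grid region along y
    matrix = [[0] * n_cols for _ in range(n_rows)]
    for x, y, _z in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # the point lies outside the target region
        row = min(int((x - x_min) / cell_x), n_rows - 1)
        col = min(int((y - y_min) / cell_y), n_cols - 1)
        matrix[row][col] = 1  # at least one point falls in this grid region
    return matrix
```

Only the horizontal coordinates decide which grid region a point belongs to in this sketch; the height is ignored, which matches a top-down N×M grid but is an assumption about how the target region is laid out.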
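Regarding S202: after the initial sparse matrix is obtained, a target sparse matrix that matches the target size of the feature map input to each layer's convolution module of the neural network can be determined based on it.
As an optional implementation, the target sparse matrix that matches the target size of the feature map input to each layer's convolution module of the neural network can be determined in the following ways:
Mode 1: determine, based on the initial sparse matrix, the output sparse matrix corresponding to each layer's convolution module in the neural network, and use the output sparse matrix as the target sparse matrix.
Mode 2: determine, based on the initial sparse matrix, the input sparse matrix corresponding to each layer's convolution module in the neural network, and use the input sparse matrix as the target sparse matrix.
Mode 3: determine, based on the initial sparse matrix, the input sparse matrix and the output sparse matrix corresponding to each layer's convolution module in the neural network, fuse the input sparse matrix and the output sparse matrix to obtain a fused sparse matrix, and use the fused sparse matrix as the target sparse matrix of that layer's convolution module.
Here, the target sparse matrix may be obtained from the output sparse matrix, from the input sparse matrix, or by fusing the input sparse matrix and the output sparse matrix.
In this implementation, several ways are provided for generating the target sparse matrix of each layer's convolution module: the target sparse matrix may be the input sparse matrix, the output sparse matrix, or a fused sparse matrix generated from the input sparse matrix and the output sparse matrix.
In Mode 1, the target sparse matrix is obtained from the output sparse matrix. In a specific implementation, the output sparse matrix corresponding to each layer's convolution module in the neural network can be determined based on the initial sparse matrix, and that output sparse matrix is the target sparse matrix. The output sparse matrix represents whether a target object is present at the different positions of the target scene that correspond to the output of each layer's convolution module. For example, if a target object is present at position A of the target scene in the output of a convolution module, the matrix element value at the position corresponding to position A in the output sparse matrix may be 1; if no target object is present at position A, that matrix element value may be 0.
In Mode 2, the target sparse matrix is obtained from the input sparse matrix. In a specific implementation, the input sparse matrix corresponding to each layer's convolution module in the neural network can be determined based on the initial sparse matrix, and that input sparse matrix is the target sparse matrix. The input sparse matrix represents whether a target object is present at the different positions of the target scene that correspond to the input data of each layer's convolution module. For example, if a target object is present at position A of the target scene in the input data of a convolution module, the matrix element value at the position corresponding to position A in the input sparse matrix may be 1; if no target object is present at position A, that matrix element value may be 0.
In Mode 3, the output sparse matrix of each layer's convolution module can be determined as in Mode 1 and the input sparse matrix as in Mode 2. The input sparse matrix and the output sparse matrix of each layer's convolution module are then fused to obtain a fused sparse matrix, which is used as the target sparse matrix of that layer's convolution module.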
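In a specific implementation, the fused sparse matrix can be obtained by taking the intersection of the input sparse matrix and the output sparse matrix, or by taking their union. For example, if the input sparse matrix is:
Figure PCTCN2021102684-appb-000001
and the output sparse matrix is:
Figure PCTCN2021102684-appb-000002
then taking the intersection of the input sparse matrix and the output sparse matrix gives the fused sparse matrix:
Figure PCTCN2021102684-appb-000003
and taking the union of the input sparse matrix and the output sparse matrix gives the fused sparse matrix:
Figure PCTCN2021102684-appb-000004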
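The two fusion options can be illustrated with element-wise logical operations on 0/1 matrices. The sketch below is only an illustration: the function names and the two 3×3 matrices are made up for the example and are not the matrices shown in the figures of the disclosure.

```python
from typing import List

Mask = List[List[int]]  # 0/1 sparse matrix stored as nested lists


def fuse_intersection(a: Mask, b: Mask) -> Mask:
    """Element is 1 only where both matrices are 1 (logical AND)."""
    return [[x & y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def fuse_union(a: Mask, b: Mask) -> Mask:
    """Element is 1 where either matrix is 1 (logical OR)."""
    return [[x | y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Hypothetical input and output sparse matrices of one convolution module.
input_mask = [[1, 1, 0], [0, 1, 0], [0, 0, 0]]
output_mask = [[0, 1, 1], [0, 1, 0], [0, 0, 1]]

fused_and = fuse_intersection(input_mask, output_mask)
fused_or = fuse_union(input_mask, output_mask)
```

The intersection keeps fewer active positions and therefore saves more computation, while the union is more conservative about discarding positions; which one suits a given network is a design choice the text leaves open.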
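In an optional implementation, determining, based on the initial sparse matrix, the input sparse matrix corresponding to each layer's convolution module in the neural network may include:
B1: use the initial sparse matrix as the input sparse matrix of the first-layer convolution module of the neural network.
B2: determine, based on the input sparse matrix of the (i-1)-th layer convolution module, an input sparse matrix of the i-th layer convolution module that matches the target size of the feature map input to the i-th layer convolution module, where i is a positive integer greater than 1 and less than n+1, and n is the total number of convolution module layers of the neural network.
Here, the initial sparse matrix can be used as the input sparse matrix of the first-layer convolution module of the neural network. The input sparse matrix of the second-layer convolution module can be obtained from the input sparse matrix of the first-layer convolution module, and its numbers of rows and columns are consistent with the target size of the feature map input to the second-layer convolution module.
For example, an image dilation operation or an image erosion operation can be applied to the input sparse matrix of the first-layer convolution module to obtain a processed sparse matrix. After the numbers of rows and columns of the processed sparse matrix are adjusted to match the target size of the feature map input to the second-layer convolution module, the input sparse matrix of the second-layer convolution module is obtained. By analogy, the input sparse matrix of the first-layer convolution module, the input sparse matrix of the second-layer convolution module, ..., and the input sparse matrix of the n-th layer convolution module (the last-layer convolution module of the neural network) can be obtained.
For example, a dilation range can be determined in advance, and image dilation is applied to the input sparse matrix based on that range to obtain the processed sparse matrix. The dilation range may be determined based on the size threshold of the target object or according to actual needs.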
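For example, if the input sparse matrix is:
Figure PCTCN2021102684-appb-000005
then the sparse matrix after dilation may be:
Figure PCTCN2021102684-appb-000006
The above dilation process is only an example.
For example, erosion of the input sparse matrix is the inverse of dilation. Specifically, an erosion range can be determined in advance, and image erosion is applied to the input sparse matrix based on that range to obtain the processed sparse matrix. The erosion range may be determined based on the size threshold of the target object or according to actual needs.
For example, if the input sparse matrix is:
Figure PCTCN2021102684-appb-000007
then the sparse matrix after erosion may be:
Figure PCTCN2021102684-appb-000008
The above erosion process is only an example.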
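Dilation and erosion of a 0/1 sparse matrix can be sketched as follows. This is an illustrative sketch rather than the operation used in the disclosure: it assumes a square neighbourhood of half-width `radius` (a hypothetical parameter standing in for the dilation or erosion range) and ignores neighbours that fall outside the matrix.

```python
from typing import List

Mask = List[List[int]]


def dilate(mask: Mask, radius: int = 1) -> Mask:
    """Set to 1 every element within `radius` of an element that is 1."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                        out[rr][cc] = 1
    return out


def erode(mask: Mask, radius: int = 1) -> Mask:
    """Keep an element at 1 only if all in-bounds neighbours within `radius` are 1."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][c] = int(all(
                mask[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ))
    return out
```

With this convention, dilating a single active element with `radius=1` yields a 3×3 block of ones, and eroding that block with the same radius recovers the single element.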
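In a specific implementation, the numbers of rows and columns of the processed sparse matrix can be adjusted by upsampling or downsampling so that they match the target size of the feature map input to the second-layer convolution module, which gives the input sparse matrix of the second-layer convolution module. There are many ways to adjust the numbers of rows and columns of the processed sparse matrix; the above is only an example.
In a specific implementation, the degree of sparsity of the sparse matrix can also be adjusted, for example by changing the number of grids or by applying erosion. The degree of sparsity of a sparse matrix is the ratio of the number of matrix elements whose value is 1 to the total number of matrix elements in the sparse matrix.
In this way, the initial sparse matrix can be used as the input sparse matrix of the first-layer convolution module, and the input sparse matrix of each layer's convolution module can be determined in turn. The target sparse matrix can then be determined from the input sparse matrix, which provides data support for subsequently determining the three-dimensional detection data of the target object based on the target sparse matrix of each layer's convolution module.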
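Steps B1 and B2, together with the resizing step and the sparsity ratio defined above, can be put together in a short sketch. Everything here is an assumption made for illustration: the helper names, the use of dilation (rather than erosion) between layers, the block-wise "any element is 1" rule used for resizing, and the list `layer_sizes` giving the feature-map size of each layer.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]


def dilate(mask: Mask, radius: int) -> Mask:
    """Square-neighbourhood dilation of a 0/1 matrix."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                        out[rr][cc] = 1
    return out


def resize_mask(mask: Mask, out_rows: int, out_cols: int) -> Mask:
    """Up- or downsample; an output element is 1 if any covered input element is 1."""
    in_rows, in_cols = len(mask), len(mask[0])
    out = []
    for r in range(out_rows):
        r0 = r * in_rows // out_rows
        r1 = max((r + 1) * in_rows // out_rows, r0 + 1)
        row = []
        for c in range(out_cols):
            c0 = c * in_cols // out_cols
            c1 = max((c + 1) * in_cols // out_cols, c0 + 1)
            row.append(int(any(mask[i][j] for i in range(r0, r1) for j in range(c0, c1))))
        out.append(row)
    return out


def sparsity(mask: Mask) -> float:
    """Ratio of elements equal to 1 to the total number of elements."""
    return sum(map(sum, mask)) / (len(mask) * len(mask[0]))


def input_sparse_matrices(
    initial: Mask, layer_sizes: Sequence[Tuple[int, int]], radius: int = 1
) -> List[Mask]:
    """B1: layer 1 uses the initial matrix; B2: layer i is derived from layer i-1."""
    masks = [initial]  # assumes layer_sizes[0] equals the size of `initial`
    for rows, cols in layer_sizes[1:]:
        masks.append(resize_mask(dilate(masks[-1], radius), rows, cols))
    return masks
```

Using an "any" rule when downsampling keeps a coarse cell active whenever one of the fine cells it covers is active, so occupied positions are not silently dropped; a plain nearest-neighbour resize would be cheaper but could lose them.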
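In a possible implementation, determining, based on the initial sparse matrix, the output sparse matrix corresponding to each layer's convolution module in the neural network may include:
C1: determine the output sparse matrix corresponding to the neural network based on the size threshold of the target object and the initial sparse matrix.
C2: generate, based on that output sparse matrix, an output sparse matrix of the n-th layer convolution module that matches the target size of the feature map input to the n-th layer convolution module.
C3: generate, based on the output sparse matrix of the (j+1)-th layer convolution module, an output sparse matrix of the j-th layer convolution module that matches the target size of the feature map input to the j-th layer convolution module, where j is a positive integer greater than or equal to 1 and less than n, and n is the total number of convolution module layers of the neural network.
Here, the dilation range can first be determined from the size threshold of the target object, and the initial sparse matrix is dilated based on that range to obtain a processed sparse matrix, which is the output sparse matrix corresponding to the neural network. For the dilation process, refer to the description above; it is not repeated here.
The output sparse matrix is used to determine the output sparse matrix of the n-th layer convolution module (the last-layer convolution module of the neural network) and, by analogy, the output sparse matrix of the (n-1)-th layer convolution module, ..., the output sparse matrix of the second-layer convolution module, and the output sparse matrix of the first-layer convolution module.
For example, an image dilation operation or an image erosion operation can be applied to the output sparse matrix of the previously processed convolution module (that is, the (j+1)-th layer when the j-th layer is being determined) to obtain a processed sparse matrix. After the numbers of rows and columns of the processed sparse matrix are adjusted to match the target size of the feature map input to the current layer's convolution module, the output sparse matrix of the current layer's convolution module is obtained. The process of determining the output sparse matrix of each layer's convolution module is similar to the process of determining the input sparse matrix described above and is not described in detail here.
When the target sparse matrix of each layer's convolution module of the neural network is obtained by fusing the input sparse matrix and the output sparse matrix, the output sparse matrix and the input sparse matrix of each layer's convolution module can be obtained separately using the above methods and then fused to obtain the target sparse matrix of each convolution module.
In this way, the output sparse matrix can be determined based on the initial sparse matrix and then used to determine, in turn, the output sparse matrix of the n-th layer convolution module, ..., and the output sparse matrix of the first-layer convolution module. The target sparse matrix can then be determined from the output sparse matrix of each layer, which provides data support for subsequently determining the three-dimensional detection data of the target object based on the target sparse matrix of each layer's convolution module.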
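The backward pass of steps C1 to C3 can be sketched in the same style. This is a hedged illustration, not the procedure of the disclosure: `size_threshold_cells` is a hypothetical parameter expressing the target object's size threshold as a dilation radius in grid cells, `step_radius` is a hypothetical optional dilation applied between layers, and `layer_sizes` lists the feature-map size of each layer from first to last.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]


def dilate(mask: Mask, radius: int) -> Mask:
    """Square-neighbourhood dilation of a 0/1 matrix (radius 0 is the identity)."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                        out[rr][cc] = 1
    return out


def resize_mask(mask: Mask, out_rows: int, out_cols: int) -> Mask:
    """Up- or downsample; an output element is 1 if any covered input element is 1."""
    in_rows, in_cols = len(mask), len(mask[0])
    out = []
    for r in range(out_rows):
        r0 = r * in_rows // out_rows
        r1 = max((r + 1) * in_rows // out_rows, r0 + 1)
        row = []
        for c in range(out_cols):
            c0 = c * in_cols // out_cols
            c1 = max((c + 1) * in_cols // out_cols, c0 + 1)
            row.append(int(any(mask[i][j] for i in range(r0, r1) for j in range(c0, c1))))
        out.append(row)
    return out


def output_sparse_matrices(
    initial: Mask,
    layer_sizes: Sequence[Tuple[int, int]],
    size_threshold_cells: int,
    step_radius: int = 0,
) -> List[Mask]:
    """Return one output sparse matrix per layer, ordered from layer 1 to layer n."""
    current = dilate(initial, size_threshold_cells)  # C1: network-level output matrix
    masks: List[Mask] = []
    n = len(layer_sizes)
    for layer in range(n - 1, -1, -1):  # C2 for layer n, then C3 for layers n-1 .. 1
        if layer < n - 1:
            current = dilate(current, step_radius)  # optional processing between layers
        current = resize_mask(current, *layer_sizes[layer])
        masks.append(current)
    masks.reverse()
    return masks
```

The masks are computed from the last layer back to the first and then reversed, so `masks[0]` belongs to the first-layer convolution module.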
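Regarding S103: in a specific implementation, the three-dimensional detection data of the target object included in the target scene can be determined based on the at least one target sparse matrix, the target point cloud data, and the neural network used to detect the target object. The three-dimensional detection data includes one or more of: the coordinates of the center point of the detection box of the target object, the three-dimensional size of the detection box, the orientation angle of the detection box, the category of the detection box, the confidence of the detection box, the ID used for target tracking, the speed and acceleration of the target object, a timestamp, and so on.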
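Here, the three-dimensional detection box of the target object must not extend beyond the target region. That is, if the center point of the three-dimensional detection box is at (X, Y, Z) and its size is length L, width W, and height H, the following conditions are satisfied: 0 ≤ X - L/2, X + L/2 < N_max, 0 ≤ Y - W/2, and Y + W/2 < M_max, where N_max and M_max are the length threshold and the width threshold of the target region.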
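The boundary condition on the detection box can be written as a small check. The sketch assumes the symmetric form of the condition, with half the box length on either side of X and half the box width on either side of Y; the class `Box3D` and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box3D:
    """Center (x, y, z) and size of a three-dimensional detection box."""
    x: float
    y: float
    z: float
    length: float
    width: float
    height: float


def box_inside_region(box: Box3D, n_max: float, m_max: float) -> bool:
    """True if the box footprint lies inside [0, n_max) x [0, m_max)."""
    return (
        0 <= box.x - box.length / 2
        and box.x + box.length / 2 < n_max
        and 0 <= box.y - box.width / 2
        and box.y + box.width / 2 < m_max
    )
```

Such a check could be used, for instance, to discard candidate boxes that leave the target region; the text does not say how boxes violating the condition are handled, so that use is an assumption.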
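In an optional implementation, determining, based on the at least one target sparse matrix and the target point cloud data, the three-dimensional detection data of the target object included in the target scene includes:
Step 1: generate, based on the target point cloud data, a target point cloud feature map corresponding to the target point cloud data.
Step 2: determine, based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene by using the neural network for detecting the target object, where the neural network includes multiple layers of convolution modules.
In a specific implementation, the target point cloud data can be input to the neural network and preprocessed to generate the target point cloud feature map corresponding to the target point cloud data. The target point cloud feature map, the at least one target sparse matrix, and the neural network are then used to determine the three-dimensional detection data of the target object included in the target scene. Here, the feature map corresponding to the target point cloud data (that is, to the target point cloud) is referred to as the target point cloud feature map.
In Step 1, generating, based on the target point cloud data, the target point cloud feature map corresponding to the target point cloud data may include:
For each grid region, determine the feature information corresponding to the grid region based on the coordinate information indicated by the target point cloud data of the points of the target point cloud located in the grid region, where the grid regions are generated by dividing the target region corresponding to the target point cloud data according to a preset number of grids.
Generate the target point cloud feature map corresponding to the target point cloud data based on the feature information of each grid region.
For each grid region, if points of the target point cloud exist in the grid region, the coordinate information indicated by the target point cloud data of those points constitutes the feature information of the grid region; if no point of the target point cloud exists in the grid region, the feature information of the grid region may be 0.
The target point cloud feature map corresponding to the target point cloud data is generated from the feature information of each grid region. The size of the target point cloud feature map may be N×M×C, where N×M is consistent with the size of the target sparse matrix of the first-layer convolution module, and C may be the maximum number of points of the target point cloud contained in any grid region. For example, if grid region A contains the most points of the target point cloud among all grid regions, say 50 points, then C is 50; that is, the target point cloud feature map includes 50 feature maps of size N×M, each of which includes the coordinate information of at least one point of the target point cloud.
In this implementation, the target point cloud feature map corresponding to the target point cloud data is generated from the feature information of each grid region and includes the position information of each point of the target point cloud. Based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene can therefore be determined relatively accurately.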
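The construction of an N×M×C feature map from per-grid coordinates can be sketched as below. The sketch makes several assumptions that the text does not fix: points are `(x, y, z)` tuples, the target region is an axis-aligned rectangle, each channel entry stores a whole coordinate triple, and grid regions with fewer than C points are padded with zeros. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z); this layout is an assumption


def build_point_cloud_feature_map(
    points: Sequence[Point],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    n_rows: int,
    n_cols: int,
) -> List[List[List[Point]]]:
    """Return an n_rows x n_cols x C nested list; C is the largest per-grid point count."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    cell_x = (x_max - x_min) / n_rows
    cell_y = (y_max - y_min) / n_cols
    cells: List[List[List[Point]]] = [[[] for _ in range(n_cols)] for _ in range(n_rows)]
    for point in points:
        x, y, _z = point
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # the point lies outside the target region
        row = min(int((x - x_min) / cell_x), n_rows - 1)
        col = min(int((y - y_min) / cell_y), n_cols - 1)
        cells[row][col].append(point)
    channels = max((len(cell) for row in cells for cell in row), default=0)
    zero: Point = (0.0, 0.0, 0.0)  # padding for grid regions with fewer points
    return [[cell + [zero] * (channels - len(cell)) for cell in row] for row in cells]
```

An empty grid region ends up with C zero entries, which corresponds to the statement that its feature information may be 0.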
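In Step 2, the three-dimensional detection data of the target object included in the target scene can be determined based on the target point cloud feature map, the at least one target sparse matrix, and the neural network.
In a specific implementation, the three-dimensional detection data of the target object included in the target scene can be determined in the following two ways:
Mode 1: determining, based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene includes:
1. Based on the target sparse matrix of the first-layer convolution module in the neural network, determine the feature information to be convolved in the target point cloud feature map, and convolve that feature information with the first-layer convolution module to generate the feature map input to the second-layer convolution module.
2. Based on the target sparse matrix of the k-th layer convolution module in the neural network, determine the feature information to be convolved in the feature map input to the k-th layer convolution module, and convolve that feature information with the k-th layer convolution module to generate the feature map input to the (k+1)-th layer convolution module, where k is a positive integer greater than 1 and less than n, and n is the total number of convolution module layers of the neural network.
3. Based on the target sparse matrix of the n-th layer convolution module in the neural network, determine the feature information to be convolved in the feature map input to the n-th layer convolution module, and convolve that feature information with the n-th layer convolution module to obtain the three-dimensional detection data of the target object included in the target scene.
In this implementation, the target sparse matrix of the first-layer convolution module can be used to determine the feature information to be convolved in the target point cloud feature map input to the first-layer convolution module. Specifically, the target positions at which the matrix value in the target sparse matrix is 1 can be determined, and the feature information at the corresponding positions in the target point cloud feature map is taken as the feature information to be convolved.
The first-layer convolution module then convolves the feature information to be convolved in the target point cloud feature map to generate the feature map input to the second-layer convolution module. Next, the target sparse matrix of the second-layer convolution module is used to determine the feature information to be convolved in the feature map input to the second-layer convolution module, and the second-layer convolution module convolves that feature information to generate the feature map input to the third-layer convolution module. By analogy, the feature map input to the n-th layer convolution module (the last-layer convolution module in the neural network) is obtained; the feature information to be convolved for the n-th layer convolution module is determined and convolved to obtain the three-dimensional detection data of the target object included in the target scene.
Here, the feature information to be convolved can be determined based on the target sparse matrix and the input feature map of each layer's convolution module. Only that feature information is convolved, and the remaining feature information in the feature map is not. This reduces the amount of computation each layer's convolution module performs and improves its efficiency, which in turn reduces the computation of the neural network and improves the efficiency of target object detection.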
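The idea of Mode 1, convolving only at positions where the target sparse matrix is 1, can be shown on a single-channel feature map. The sketch below is a simplified illustration under stated assumptions: one channel, a square odd-sized kernel, zero padding at the border, and the hypothetical function name `masked_conv2d`; a real convolution module would have many channels and learned weights.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Mask = List[List[int]]


def masked_conv2d(feature: Matrix, mask: Mask, kernel: Matrix) -> Tuple[Matrix, int]:
    """Convolve only where mask is 1; return the output and the number of positions computed."""
    rows, cols = len(feature), len(feature[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    computed = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue  # inactive position: skip the multiply-accumulate work
            acc = 0.0
            for dr in range(k):
                for dc in range(k):
                    rr, cc = r + dr - pad, c + dc - pad
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += feature[rr][cc] * kernel[dr][dc]
            out[r][c] = acc
            computed += 1
    return out, computed
```

The returned count makes the saving visible: with a mask that marks one position out of nine, only one kernel application is performed instead of nine.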
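Mode 2: determining, based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene includes:
1. For each layer's convolution module in the neural network other than the last layer, determine a convolution vector for that convolution module based on its target sparse matrix and the feature map input to it, and determine the feature map input to the next layer's convolution module based on that convolution vector.
2. Determine a convolution vector for the last-layer convolution module based on its target sparse matrix and the feature map input to it, and determine the three-dimensional detection data of the target object included in the target scene based on that convolution vector.
In this implementation, the convolution vector of each layer's convolution module can also be determined based on the target sparse matrix of that convolution module and the feature map input to it. For example, for the first-layer convolution module, the target positions at which the matrix value in its target sparse matrix is 1 can be determined, the feature information at the corresponding positions in the target point cloud feature map can be identified, and that feature information is extracted to form the convolution vector of the first-layer convolution module.
Further, using the img2col and col2img techniques, the first-layer convolution module can perform a matrix multiplication on its convolution vector to obtain the feature map input to the second-layer convolution module. With the same processing, the feature map input to the last-layer convolution module can be obtained; the convolution vector of the last-layer convolution module is determined based on its target sparse matrix and feature map, and is processed to determine the three-dimensional detection data of the target object included in the target scene.
Here, a convolution vector can be generated for each layer's convolution module based on its target sparse matrix and input feature map. The convolution vector contains the feature information to be processed in the feature map, namely the feature information in the feature map that matches the positions at which the target sparse matrix indicates three-dimensional detection data of a target object is present. The generated convolution vector is processed, and the remaining feature information in the feature map is not. This reduces the amount of computation each layer's convolution module performs and improves its efficiency, which in turn reduces the computation of the neural network and improves the efficiency of target object detection.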
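Mode 2 can be illustrated with a gather, multiply, scatter sketch in the spirit of the img2col and col2img techniques mentioned above. It is an assumption-laden illustration, not the implementation of the disclosure: it uses a single channel and zero padding, it interprets the "convolution vector" as the list of flattened patches gathered at active positions, and the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Mask = List[List[int]]


def gather_patches(feature: Matrix, mask: Mask, k: int = 3) -> Tuple[List[Tuple[int, int]], Matrix]:
    """Collect one flattened k x k patch per active position (zero padded at the border)."""
    rows, cols = len(feature), len(feature[0])
    pad = k // 2
    positions, patches = [], []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                patch = [
                    feature[r + dr][c + dc] if 0 <= r + dr < rows and 0 <= c + dc < cols else 0.0
                    for dr in range(-pad, pad + 1)
                    for dc in range(-pad, pad + 1)
                ]
                positions.append((r, c))
                patches.append(patch)
    return positions, patches


def conv_via_gather(feature: Matrix, mask: Mask, kernel: Matrix) -> Matrix:
    """Multiply the gathered patches by the flattened kernel, then scatter the results back."""
    positions, patches = gather_patches(feature, mask, len(kernel))
    weights = [w for row in kernel for w in row]
    values = [sum(x * w for x, w in zip(patch, weights)) for patch in patches]  # matrix-vector product
    out = [[0.0] * len(feature[0]) for _ in feature]
    for (r, c), value in zip(positions, values):
        out[r][c] = value  # write each result back to its original position
    return out
```

Compared with skipping positions inside the convolution loop, gathering the active patches into a dense matrix lets the arithmetic run as one matrix multiplication, which is typically what optimized linear-algebra routines handle best.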
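A person skilled in the art will understand that, in the above method of the specific implementations, the order in which the steps are written does not imply a strict order of execution or impose any limitation on the implementation; the specific order of execution of the steps should be determined by their functions and possible internal logic.
Based on the same concept, an embodiment of the present disclosure further provides a target object detection apparatus. Fig. 4 is a schematic architecture diagram of the target object detection apparatus provided by an embodiment of the present disclosure, which includes an acquisition module 401, a generation module 402, and a determination module 403.
The acquisition module 401 is configured to acquire target point cloud data of a target scene collected by a radar device;
the generation module 402 is configured to generate, based on the target point cloud data, at least one target sparse matrix corresponding to the target point cloud data, where the target sparse matrix represents whether a target object is present at different positions of the target scene;
the determination module 403 is configured to determine, based on the at least one target sparse matrix and the target point cloud data, three-dimensional detection data of the target object included in the target scene.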
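In a possible implementation, when generating, based on the target point cloud data, the at least one target sparse matrix corresponding to the target point cloud data, the generation module 402 is configured to: determine, based on the target point cloud data, the target sparse matrix corresponding to each layer's convolution module in the neural network used to detect the target object.
In a possible implementation, when determining, based on the target point cloud data, the target sparse matrix corresponding to each layer's convolution module in the neural network used to detect the target object, the generation module 402 is configured to: generate an initial sparse matrix based on the target point cloud data; and determine, based on the initial sparse matrix, a target sparse matrix that matches the target size of the feature map input to each layer's convolution module of the neural network.
In a possible implementation, when generating the initial sparse matrix based on the target point cloud data, the generation module 402 is configured to:
determine a target region corresponding to the target point cloud data, and divide the target region into a plurality of grid regions according to a preset number of grids;
determine a matrix element value for each grid region based on the grid regions in which the points of the target point cloud corresponding to the target point cloud data are located;
generate the initial sparse matrix corresponding to the target point cloud data based on the matrix element value of each grid region.
In a possible implementation, when determining, based on the initial sparse matrix, the target sparse matrix that matches the target size of the feature map input to each layer's convolution module of the neural network, the generation module 402 is configured to:
determine, based on the initial sparse matrix, the output sparse matrix corresponding to each layer's convolution module in the neural network, and use the output sparse matrix as the target sparse matrix; or
determine, based on the initial sparse matrix, the input sparse matrix corresponding to each layer's convolution module in the neural network, and use the input sparse matrix as the target sparse matrix; or
determine, based on the initial sparse matrix, the input sparse matrix and the output sparse matrix corresponding to each layer's convolution module in the neural network, fuse the input sparse matrix and the output sparse matrix to obtain a fused sparse matrix, and use the fused sparse matrix as the target sparse matrix of that layer's convolution module.
In a possible implementation, when determining, based on the initial sparse matrix, the input sparse matrix corresponding to each layer's convolution module in the neural network, the generation module 402 is configured to: use the initial sparse matrix as the input sparse matrix of the first-layer convolution module of the neural network; and determine, based on the input sparse matrix of the (i-1)-th layer convolution module, an input sparse matrix of the i-th layer convolution module that matches the target size of the feature map input to the i-th layer convolution module, where i is a positive integer greater than 1 and less than n+1, and n is the total number of convolution module layers of the neural network.
In a possible implementation, when determining, based on the initial sparse matrix, the target sparse matrix that matches the target size of the feature map input to each layer's convolution module of the neural network, the generation module 402 is configured to: determine the output sparse matrix corresponding to the neural network based on the size threshold of the target object and the initial sparse matrix;
generate, based on that output sparse matrix, an output sparse matrix of the n-th layer convolution module that matches the target size of the feature map input to the n-th layer convolution module;
generate, based on the output sparse matrix of the (j+1)-th layer convolution module, an output sparse matrix of the j-th layer convolution module that matches the target size of the feature map input to the j-th layer convolution module, where j is a positive integer greater than or equal to 1 and less than n, and n is the total number of convolution module layers of the neural network.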
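In a possible implementation, when determining, based on the at least one target sparse matrix and the target point cloud data, the three-dimensional detection data of the target object included in the target scene, the determination module 403 is configured to:
generate, based on the target point cloud data, a target point cloud feature map corresponding to the target point cloud data;
determine, based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene by using the neural network for detecting the target object, where the neural network includes multiple layers of convolution modules.
In a possible implementation, when generating, based on the target point cloud data, the target point cloud feature map corresponding to the target point cloud data, the determination module 403 is configured to:
for each grid region, determine the feature information corresponding to the grid region based on the coordinate information indicated by the target point cloud data of the points of the target point cloud located in the grid region, where the grid regions are generated by dividing the target region corresponding to the target point cloud data according to a preset number of grids;
generate the target point cloud feature map corresponding to the target point cloud data based on the feature information of each grid region.
In a possible implementation, when determining, based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene by using the neural network for detecting the target object, the determination module 403 is configured to:
determine, based on the target sparse matrix of the first-layer convolution module in the neural network, the feature information to be convolved in the target point cloud feature map, and convolve that feature information with the first-layer convolution module to generate the feature map input to the second-layer convolution module;
determine, based on the target sparse matrix of the k-th layer convolution module in the neural network, the feature information to be convolved in the feature map input to the k-th layer convolution module, and convolve that feature information with the k-th layer convolution module to generate the feature map input to the (k+1)-th layer convolution module, where k is a positive integer greater than 1 and less than n, and n is the total number of convolution module layers of the neural network;
determine, based on the target sparse matrix of the n-th layer convolution module in the neural network, the feature information to be convolved in the feature map input to the n-th layer convolution module, and convolve that feature information with the n-th layer convolution module to obtain the three-dimensional detection data of the target object included in the target scene.
In a possible implementation, when determining, based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene by using the neural network for detecting the target object, the determination module 403 is configured to:
for each layer's convolution module in the neural network other than the last layer, determine a convolution vector for that convolution module based on its target sparse matrix and the feature map input to it, and determine the feature map input to the next layer's convolution module based on that convolution vector; determine a convolution vector for the last-layer convolution module based on its target sparse matrix and the feature map input to it; and determine the three-dimensional detection data of the target object included in the target scene based on the convolution vector of the last-layer convolution module.
In some embodiments, the functions or modules of the apparatus provided by the embodiments of the present disclosure can be used to execute the methods described in the method embodiments above. For their specific implementation, refer to the description of the method embodiments above; for brevity, it is not repeated here.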
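Based on the same technical concept, and referring to Fig. 5, an embodiment of the present disclosure further provides an electronic device including a processor 501, a memory 502, and a bus 503. The memory 502 stores execution instructions and includes an internal memory 5021 and an external memory 5022. The internal memory 5021 temporarily stores operation data of the processor 501 and data exchanged with the external memory 5022, such as a hard disk; the processor 501 exchanges data with the external memory 5022 through the internal memory 5021. When the electronic device 500 runs, the processor 501 communicates with the memory 502 through the bus 503, so that the processor 501 executes the following instructions:
acquire target point cloud data of a target scene collected by a radar device;
generate, based on the target point cloud data, at least one target sparse matrix corresponding to the target point cloud data, where the target sparse matrix represents whether a target object is present at different positions of the target scene;
determine, based on the at least one target sparse matrix and the target point cloud data, three-dimensional detection data of the target object included in the target scene.
In addition, an embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program that, when run by a processor, executes the target object detection method described in the above method embodiments.
The computer program product of the target object detection method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the steps of the target object detection method described in the above method embodiments; for details, refer to the above method embodiments, which are not repeated here.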
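A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system and apparatus described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here. In the several embodiments provided by the present disclosure, it should be understood that the disclosed system, apparatus, and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they can be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, and other media that can store program code.
The above are only specific implementations of the present disclosure, but the scope of protection of the present disclosure is not limited to them. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed herein shall fall within the scope of protection of the present disclosure. The scope of protection of the present disclosure shall therefore be subject to the scope of protection of the claims.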

Claims (14)

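  1. A target object detection method, comprising:
    acquiring target point cloud data of a target scene collected by a radar device;
    generating, based on the target point cloud data, at least one target sparse matrix corresponding to the target point cloud data, wherein the target sparse matrix represents whether a target object is present at different positions of the target scene;
    determining, based on the at least one target sparse matrix and the target point cloud data, three-dimensional detection data of the target object included in the target scene.
  2. The method according to claim 1, wherein generating, based on the target point cloud data, the at least one target sparse matrix corresponding to the target point cloud data comprises:
    determining, based on the target point cloud data, a target sparse matrix corresponding to each layer's convolution module in a neural network used to detect the target object.
  3. The method according to claim 2, wherein determining, based on the target point cloud data, the target sparse matrix corresponding to each layer's convolution module in the neural network used to detect the target object comprises:
    generating an initial sparse matrix based on the target point cloud data;
    determining, based on the initial sparse matrix, a target sparse matrix that matches the target size of the feature map input to each layer's convolution module of the neural network.
  4. The method according to claim 3, wherein generating the initial sparse matrix based on the target point cloud data comprises:
    determining a target region corresponding to the target point cloud data;
    dividing the target region into a plurality of grid regions according to a preset number of grids;
    determining a matrix element value for each grid region based on the grid regions in which the points of the target point cloud corresponding to the target point cloud data are located;
    generating the initial sparse matrix corresponding to the target point cloud data based on the matrix element value of each grid region.
  5. The method according to claim 3 or 4, wherein determining, based on the initial sparse matrix, the target sparse matrix that matches the target size of the feature map input to each layer's convolution module of the neural network comprises any one of the following:
    determining, based on the initial sparse matrix, an output sparse matrix corresponding to each layer's convolution module in the neural network, and using the output sparse matrix as the target sparse matrix;
    determining, based on the initial sparse matrix, an input sparse matrix corresponding to each layer's convolution module in the neural network, and using the input sparse matrix as the target sparse matrix;
    determining, based on the initial sparse matrix, an input sparse matrix and an output sparse matrix corresponding to each layer's convolution module in the neural network, fusing the input sparse matrix and the output sparse matrix to obtain a fused sparse matrix, and using the fused sparse matrix as the target sparse matrix of that layer's convolution module.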
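  6. The method according to claim 5, wherein determining, based on the initial sparse matrix, the input sparse matrix corresponding to each layer's convolution module in the neural network comprises:
    using the initial sparse matrix as the input sparse matrix of the first-layer convolution module of the neural network;
    determining, based on the input sparse matrix of the (i-1)-th layer convolution module, an input sparse matrix of the i-th layer convolution module that matches the target size of the feature map input to the i-th layer convolution module, wherein i is a positive integer greater than 1 and less than n+1, and n is the total number of convolution module layers of the neural network.
  7. The method according to claim 5, wherein determining, based on the initial sparse matrix, the output sparse matrix corresponding to each layer's convolution module in the neural network comprises:
    determining an output sparse matrix corresponding to the neural network based on a size threshold of the target object and the initial sparse matrix;
    generating, based on that output sparse matrix, an output sparse matrix of the n-th layer convolution module that matches the target size of the feature map input to the n-th layer convolution module;
    generating, based on the output sparse matrix of the (j+1)-th layer convolution module, an output sparse matrix of the j-th layer convolution module that matches the target size of the feature map input to the j-th layer convolution module, wherein j is a positive integer greater than or equal to 1 and less than n, and n is the total number of convolution module layers of the neural network.
  8. The method according to any one of claims 1 to 7, wherein determining, based on the at least one target sparse matrix and the target point cloud data, the three-dimensional detection data of the target object included in the target scene comprises:
    generating, based on the target point cloud data, a target point cloud feature map corresponding to the target point cloud data;
    determining, based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene by using a neural network for detecting the target object, wherein the neural network includes multiple layers of convolution modules.
  9. The method according to claim 8, wherein generating, based on the target point cloud data, the target point cloud feature map corresponding to the target point cloud data comprises:
    for each grid region, determining feature information corresponding to the grid region based on the coordinate information indicated by the target point cloud data of the points of the target point cloud located in the grid region, wherein the grid regions are generated by dividing the target region corresponding to the target point cloud data according to a preset number of grids;
    generating the target point cloud feature map corresponding to the target point cloud data based on the feature information of each grid region.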
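  10. The method according to claim 8 or 9, wherein determining, based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene by using the neural network for detecting the target object comprises:
    determining, based on the target sparse matrix of the first-layer convolution module in the neural network, the feature information to be convolved in the target point cloud feature map, and convolving that feature information with the first-layer convolution module to generate the feature map input to the second-layer convolution module;
    determining, based on the target sparse matrix of the k-th layer convolution module in the neural network, the feature information to be convolved in the feature map input to the k-th layer convolution module, and convolving that feature information with the k-th layer convolution module to generate the feature map input to the (k+1)-th layer convolution module, wherein k is a positive integer greater than 1 and less than n, and n is the total number of convolution module layers of the neural network;
    determining, based on the target sparse matrix of the n-th layer convolution module in the neural network, the feature information to be convolved in the feature map input to the n-th layer convolution module, and convolving that feature information with the n-th layer convolution module to obtain the three-dimensional detection data of the target object included in the target scene.
  11. The method according to claim 8 or 9, wherein determining, based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene by using the neural network for detecting the target object comprises:
    for each layer's convolution module in the neural network other than the last layer, determining a convolution vector for that convolution module based on its target sparse matrix and the feature map input to it, and determining the feature map input to the next layer's convolution module based on that convolution vector;
    determining a convolution vector for the last-layer convolution module based on its target sparse matrix and the feature map input to it, and determining the three-dimensional detection data of the target object included in the target scene based on the convolution vector of the last-layer convolution module.
  12. A target object detection apparatus, comprising:
    an acquisition module, configured to acquire target point cloud data of a target scene collected by a radar device;
    a generation module, configured to generate, based on the target point cloud data, at least one target sparse matrix corresponding to the target point cloud data, wherein the target sparse matrix represents whether a target object is present at different positions of the target scene;
    a determination module, configured to determine, based on the at least one target sparse matrix and the target point cloud data, three-dimensional detection data of the target object included in the target scene.
  13. An electronic device, comprising a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor communicates with the memory through the bus, and the machine-readable instructions, when executed by the processor, perform the target object detection method according to any one of claims 1 to 11.
  14. A computer-readable storage medium storing a computer program that, when run by a processor, performs the target object detection method according to any one of claims 1 to 11.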
PCT/CN2021/102684 2020-07-22 2021-06-28 目标对象检测方法、装置、电子设备及存储介质 WO2022017129A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010712645.3A CN113971734A (zh) 2020-07-22 2020-07-22 目标对象检测方法、装置、电子设备及存储介质
CN202010712645.3 2020-07-22

Publications (1)

Publication Number Publication Date
WO2022017129A1 true WO2022017129A1 (zh) 2022-01-27

Family

ID=79584945

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/102684 WO2022017129A1 (zh) 2020-07-22 2021-06-28 目标对象检测方法、装置、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN113971734A (zh)
WO (1) WO2022017129A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180357773A1 (en) * 2017-06-13 2018-12-13 TuSimple Sparse image point correspondences generation and correspondences refinement system for ground truth static scene sparse flow generation
US20190147245A1 (en) * 2017-11-14 2019-05-16 Nuro, Inc. Three-dimensional object detection for autonomous robotic systems using image proposals
CN110222626A (zh) * 2019-06-03 2019-09-10 宁波智能装备研究院有限公司 一种基于深度学习算法的无人驾驶场景点云目标标注方法
CN110414577A (zh) * 2019-07-16 2019-11-05 电子科技大学 一种基于深度学习的激光雷达点云多目标地物识别方法
CN110738194A (zh) * 2019-11-05 2020-01-31 电子科技大学中山学院 一种基于点云有序编码的三维物体识别方法
CN111199206A (zh) * 2019-12-30 2020-05-26 上海眼控科技股份有限公司 三维目标检测方法、装置、计算机设备及存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117237614A (zh) * 2023-11-10 2023-12-15 江西啄木蜂科技有限公司 基于深度学习的湖面漂浮物小目标检测方法
CN117237614B (zh) * 2023-11-10 2024-02-06 江西啄木蜂科技有限公司 基于深度学习的湖面漂浮物小目标检测方法

Also Published As

Publication number Publication date
CN113971734A (zh) 2022-01-25

Similar Documents

Publication Publication Date Title
CN111489358B (zh) 一种基于深度学习的三维点云语义分割方法
WO2022017131A1 (zh) 点云数据的处理方法、智能行驶控制方法及装置
CN111160214B (zh) 一种基于数据融合的3d目标检测方法
CN111126359B (zh) 基于自编码器与yolo算法的高清图像小目标检测方法
CN111046767B (zh) 一种基于单目图像的3d目标检测方法
WO2023193401A1 (zh) 点云检测模型训练方法、装置、电子设备及存储介质
CN112184867B (zh) 点云特征提取方法、装置、设备及存储介质
CN113759338B (zh) 一种目标检测方法、装置、电子设备及存储介质
CN113850136A (zh) 基于yolov5与BCNN的车辆朝向识别方法及系统
CN112613450A (zh) 一种增强在困难样本上表现的3d目标检测方法
WO2022017129A1 (zh) 目标对象检测方法、装置、电子设备及存储介质
CN115731382A (zh) 点云目标检测方法、装置、计算机设备和存储介质
CN115147692A (zh) 目标检测方法、装置、电子设备及存储介质
CN115861595B (zh) 一种基于深度学习的多尺度域自适应异源图像匹配方法
CN115909255B (zh) 图像生成、图像分割方法、装置、设备、车载终端及介质
CN115239776B (zh) 点云的配准方法、装置、设备和介质
CN112257686B (zh) 人体姿态识别模型的训练方法、装置及存储介质
CN116758214A (zh) 遥感图像的三维建模方法、装置、电子设备及存储介质
CN113033578B (zh) 基于多尺度特征匹配的图像校准方法、系统、终端及介质
CN116129069A (zh) 平面区域面积的计算方法、装置、电子设备和存储介质
CN116310899A (zh) 基于YOLOv5改进的目标检测方法及装置、训练方法
CN111967579A (zh) 使用卷积神经网络对图像进行卷积计算的方法和装置
CN112949656B (zh) 水下地形匹配定位方法、设备及计算机存储介质
CN117710755B (zh) 一种基于深度学习的车辆属性识别系统及方法
CN116524329B (zh) 用于低算力平台的网络模型构建方法、装置、设备和介质

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15/06/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21845234

Country of ref document: EP

Kind code of ref document: A1