CN117994755A - Parking space detection method and device
- Publication number: CN117994755A
- Application number: CN202211375789.XA
- Authority: CN (China)
- Legal status: Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
- G06V20/586—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of parking space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The application provides a parking space detection method and device, relating to the technical field of parking space detection. The method comprises the following steps: acquiring an image frame and at least one point cloud frame after space-time synchronization; taking the image frame as a target image frame and the at least one point cloud frame as a target point cloud frame, respectively extracting features of the target image frame and of the target point cloud frame, taking the features of the target image frame as image coding features and the features of the target point cloud frame as point cloud features; respectively performing voxelization processing on the image coding features and the point cloud features in an abstract point cloud space to obtain image voxel features and point cloud voxel features; performing top view fusion processing on the image voxel features and the point cloud voxel features in the abstract point cloud space to obtain fused top view features; and determining, based on the fused top view features, parking space information for parking the vehicle. The method can extract the image coding features and the point cloud features and fuse the corresponding voxel features in the abstract point cloud space, thereby improving the accuracy and reliability of the parking space information.
Description
Technical Field
The application relates to the technical field of parking space detection, in particular to a parking space detection method and device.
Background
An intelligent parking assistance system relies on accurate localization of a parking space. A traditional approach locates line-marked parking spaces or open parking spaces in a parking scene using sensors such as millimeter-wave and ultrasonic radars together with classical algorithms such as point cloud clustering; however, this approach depends on the point cloud accuracy of the radar sensor, and false detections can occur in some parking scenes due to excessive noise points. Another current approach uses a camera sensor to acquire the planar corner coordinates of ground-marked parking spaces and then obtains the spatial position of the actual marked space through the conversion relationship between the camera system and the physical world; however, when the camera is occluded or faced with severe illumination changes or adverse weather such as rain, fog, or snow, missing or erroneous image information causes false detections or missed detections. This approach also relies on relatively complete ground parking lines as input, and lacks prior knowledge and generalization capability for open parking spaces without complete markings.
In summary, existing parking space detection methods all detect parking spaces using data from a single sensor, which results in low parking space detection accuracy and reliability; this problem needs to be solved.
Disclosure of Invention
In view of the above, the application provides a parking space detection method and device to solve the problem of low parking space detection accuracy and reliability in the prior art. The technical scheme is as follows:
A parking space detection method comprises the following steps:
Acquiring an image frame and at least one point cloud frame after space-time synchronization, wherein different point cloud frames are acquired by different types of radar sensors on a vehicle, the image frame is acquired by a camera on the vehicle, the difference between the acquisition timestamp of each point cloud frame and that of the image frame is smaller than a preset time threshold, and the image frame and each point cloud frame are in the coordinate system of a pre-built abstract point cloud space centered on the vehicle;
Taking the acquired image frame as a target image frame and the acquired at least one point cloud frame as a target point cloud frame, respectively extracting features of the target image frame and of the target point cloud frame, taking the features of the target image frame as image coding features and the features of the target point cloud frame as point cloud features, wherein the image coding features fuse semantic information and depth prediction information of the target image frame, and the point cloud features represent material information of a point cloud target and distance information between the point cloud target and the vehicle;
Respectively performing voxelization processing on the image coding features and the point cloud features in the abstract point cloud space to obtain image voxel features and point cloud voxel features;
Performing top view fusion processing on the image voxel features and the point cloud voxel features in the abstract point cloud space to obtain fused top view features;
Determining, based on the fused top view features, parking space information for parking the vehicle.
Optionally, acquiring an image frame and at least one point cloud frame after the space-time synchronization includes:
Acquiring an image frame set and a point cloud frame set, wherein the image frame set comprises a plurality of image frames acquired by a camera, the point cloud frame set comprises a plurality of point cloud frames respectively acquired by a plurality of types of radar sensors, and each image frame in the image frame set and each point cloud frame in the point cloud frame set correspond to an acquisition time stamp;
determining an image frame and at least one point cloud frame of which the acquisition time stamps meet the requirement of a preset time threshold from the image frame set and the point cloud frame set;
An abstract point cloud space centered on a vehicle is constructed, and an image frame and at least one point cloud frame meeting the requirement of a preset time threshold are converted into a coordinate system of the abstract point cloud space, so that an image frame and at least one point cloud frame after space-time synchronization are obtained.
Optionally, extracting the features of the target image frame includes:
Scaling the target image frame to a preset size to obtain a scaled image frame;
Extracting semantic features of the zoomed image frames, and up-sampling the semantic features to obtain feature information of target dimensions, wherein preset dimensions contained in the feature information of the target dimensions represent depth prediction information of the target image frames;
And determining the image coding characteristics of the target image frame according to the characteristic information of the target dimension.
Optionally, extracting the feature of the target point cloud frame includes:
Screening out point cloud data in the target point cloud frame that do not fall within the preset spatial range corresponding to the abstract point cloud space, to obtain a target point cloud frame after the point cloud data are screened;
and extracting characteristics of the target point cloud frames after the point cloud data are screened.
Optionally, the abstract point cloud space is divided into a plurality of subspaces in advance, and the space range of each subspace is the same;
voxel operation is carried out on the image coding features in the abstract point cloud space, and the voxel operation comprises the following steps:
Determining a space coordinate tensor corresponding to the image coding characteristics;
and voxelizing the image coding feature and the space coordinate tensor to divide the image coding feature into a plurality of subspaces so as to obtain the image voxel feature.
Optionally, determining the spatial coordinate tensor corresponding to the image coding feature includes:
Determining depth prediction information of pixel points in a target image frame;
Determining three-dimensional coordinate information of the pixel points in the target image frame according to the depth prediction information of the pixel points in the target image frame and the two-dimensional coordinate information of the pixel points in the target image frame;
And determining a space coordinate tensor corresponding to the image coding feature according to the three-dimensional coordinate information of the pixel point in the target image frame.
Optionally, performing top view fusion processing on the image voxel features and the point cloud voxel features in an abstract point cloud space to obtain fused top view features, including:
carrying out pooling treatment on voxel features respectively positioned in each subspace in the image voxel features to obtain aggregate image voxel features corresponding to each subspace, and obtaining a first top view feature by the aggregate image voxel features respectively corresponding to each subspace;
Carrying out pooling treatment on voxel features respectively positioned in each subspace in the point cloud voxel features to obtain aggregate point cloud voxel features corresponding to each subspace, and obtaining a second top view feature by the aggregate point cloud voxel features respectively corresponding to each subspace;
and superposing and fusing the first top view feature and the second top view feature to obtain a fused top view feature.
Optionally, determining parking space information for parking the vehicle based on the fused top view feature includes:
carrying out planar feature extraction treatment on the fused top view features to obtain a segmentation feature map;
Carrying out regional clustering on the segmentation feature map to obtain initial parking space information;
carrying out rectangular processing on the initial parking space information to obtain rectangular parking space information;
converting rectangular parking space information into a world coordinate system to obtain the parking space information in the world coordinate system;
and screening out the parking space information which does not meet the preset size threshold value in the parking space information under the world coordinate system to obtain the parking space information for parking the vehicle.
Optionally, the target point cloud frame includes a target laser point cloud frame and a target millimeter wave point cloud frame, wherein the target laser point cloud frame is a point cloud frame acquired by a laser radar sensor, and the target millimeter wave point cloud frame is a point cloud frame acquired by a millimeter wave radar sensor.
A parking space detection device, comprising:
The system comprises a target frame acquisition module, a target frame acquisition module and a target frame acquisition module, wherein the target frame acquisition module is used for acquiring an image frame and at least one point cloud frame after space-time synchronization, different point cloud frames are acquired based on different types of radar sensors on a vehicle, the image frame is acquired based on a camera on the vehicle, the difference value of an acquisition time stamp of each point cloud frame and the image frame is smaller than a preset time threshold, and the image frame and each point cloud frame are positioned in a coordinate system of an abstract point cloud space which is built in advance and takes the vehicle as a center;
The feature extraction module is used for taking the acquired image frames as target image frames, taking at least one acquired point cloud frame as a target point cloud frame, respectively extracting features of the target image frames and the target point cloud frames, taking the features of the target image frames as image coding features and taking the features of the target point cloud frames as point cloud features, wherein semantic information and depth prediction information of the target image frames are fused in the image coding features, and the point cloud features represent material information of a point cloud target and distance information between the point cloud target and a vehicle;
The voxelization module is used for respectively voxelization processing of the image coding feature and the point cloud feature in the abstract point cloud space to obtain an image voxel feature and a point cloud voxel feature;
The feature fusion module is used for carrying out overlooking fusion processing on the image voxel features and the point cloud voxel features in the abstract point cloud space to obtain fusion overhead view features;
And the parking space detection module is used for determining parking space information for parking the vehicle based on the fused top view characteristics.
According to the technical scheme, the parking space detection method first acquires an image frame and at least one point cloud frame after space-time synchronization; then takes the acquired image frame as a target image frame and the acquired at least one point cloud frame as a target point cloud frame, respectively extracts features of the target image frame and of the target point cloud frame, takes the features of the target image frame as image coding features and the features of the target point cloud frame as point cloud features; then respectively voxelizes the image coding features and the point cloud features in an abstract point cloud space to obtain image voxel features and point cloud voxel features; then performs top view fusion processing on the image voxel features and the point cloud voxel features in the abstract point cloud space to obtain fused top view features; and finally determines, based on the fused top view features, parking space information for parking the vehicle. In this way, parking space detection can be performed based on the space-time-synchronized target image frame and target point cloud frame, the image coding features and the point cloud features can be extracted efficiently during detection, and the voxel features respectively corresponding to the image coding features and the point cloud features can be fused in the abstract point cloud space, thereby effectively improving the accuracy and reliability of the parking space information.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a parking space detection method according to an embodiment of the present application;
fig. 2 is a schematic flow chart of another parking space detection method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a parking space detection device according to an embodiment of the present application;
Fig. 4 is a hardware structure block diagram of a parking space detection device provided by the embodiment of the application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The application provides a parking space detection method, which is described in detail by the following embodiment.
Referring to fig. 1, a flow chart of a parking space detection method provided by an embodiment of the present application is shown, where the parking space detection method may include:
step S101, acquiring an image frame and at least one point cloud frame after space-time synchronization.
The method comprises the steps that different point cloud frames are acquired based on radar sensors of different types on a vehicle, image frames are acquired based on cameras on the vehicle, the difference value of acquisition time stamps of the image frames and each point cloud frame is smaller than a preset time threshold, and the image frames and each point cloud frame are located in a coordinate system of an abstract point cloud space which is built in advance and is centered on the vehicle.
The image frame contains parking space lines and obstacles (such as physical objects in the environment, e.g. walls, pillars, and vehicles) and can be acquired by a camera (such as a binocular camera) mounted on the vehicle; the point cloud frame contains obstacles and can be acquired by radar sensors (such as a millimeter wave radar sensor or a laser radar sensor) mounted on the vehicle.
It is noted that, in the at least one point cloud frame acquired in this step, different point cloud frames are acquired by different types of radar sensors, for example, one point cloud frame acquired by a laser radar sensor and one point cloud frame acquired by a millimeter wave radar sensor are included in the at least one point cloud frame.
The spatio-temporal synchronization in this step includes time synchronization and spatial synchronization.
Considering that the time stamps of the data frames acquired by different acquisition devices (i.e. the camera and the radar sensor) may be different, if the time stamps of the two data frames differ greatly, movement of an obstacle may occur between the acquisition time stamps of the two data frames, resulting in inaccuracy of the detected parking space. For this, this step requires acquisition of time-synchronized image frames and at least one point cloud frame.
Here, time synchronization does not mean that the acquisition time stamps of the image frame and at least one point cloud frame are identical, but that the difference between the acquisition time stamp of the image frame and the acquisition time stamp of any one point cloud frame is smaller than a preset time threshold. Optionally, the preset time threshold corresponding to the difference between the collection timestamp of the image frame and the collection timestamp of the different point cloud frames may be different, for example, at least one point cloud frame includes a first point cloud frame and a second point cloud frame, if the difference between the collection timestamp of the image frame and the collection timestamp of the first point cloud frame is smaller than the preset time threshold 1, and the difference between the collection timestamp of the image frame and the collection timestamp of the second point cloud frame is smaller than the preset time threshold 2, then the image frame, the first point cloud frame and the second point cloud frame are considered to be time-synchronized.
It should be understood that the image frames acquired by the camera are in an image coordinate system, while the point cloud frames acquired by the radar sensor are in a corresponding radar coordinate system, and that for facilitating subsequent processing, spatial synchronization needs to be performed on the image frames and at least one point cloud frame, i.e. the image frames and each point cloud frame are converted into a coordinate system of a pre-built vehicle-centered abstract point cloud space.
Here, the abstract point cloud space includes a plurality of subspaces that are parallel to the ground and have the same resolution.
Optionally, the process of constructing the abstract point cloud space centered on the vehicle in this step includes: with the vehicle as the center, a square region of 100 meters by 100 meters is defined along the x direction (e.g. parallel to the ground and pointing directly ahead of the vehicle) and the y direction (e.g. parallel to the ground and pointing to the driver's left), that is, the application detects parking space information within a 100-meter range around the vehicle; this square region is then uniformly divided into 200 x 200 grid cells, each with a resolution of 0.5 meter by 0.5 meter.
Of course, the preset spatial range (100 meters x 100 meters) of the abstract point cloud space and the resolution of each grid cell are examples and do not limit the present application.
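For illustration only, the following Python sketch (not part of the original disclosure; the function and variable names are hypothetical, and the values follow the example above) shows how a vehicle-centered 200 x 200 bird's eye view grid with 0.5-meter cells can be constructed and how a vehicle-frame coordinate maps to a grid cell.

```python
# Illustrative sketch, assuming the example values above (100 m x 100 m range, 0.5 m cells).
GRID_RANGE_M = 100.0   # square range centered on the vehicle
CELL_SIZE_M = 0.5      # resolution of each grid cell
GRID_CELLS = int(GRID_RANGE_M / CELL_SIZE_M)   # 200 cells along each axis

def xy_to_cell(x_m, y_m):
    """Map a vehicle-centered (x, y) point to a (row, col) grid cell,
    or return None when it falls outside the preset spatial range."""
    half = GRID_RANGE_M / 2.0
    if not (-half <= x_m < half and -half <= y_m < half):
        return None
    col = int((x_m + half) / CELL_SIZE_M)
    row = int((y_m + half) / CELL_SIZE_M)
    return row, col

print(xy_to_cell(0.0, 0.0))     # grid center -> (100, 100)
print(xy_to_cell(-49.9, 49.9))  # near a corner -> (199, 0)
```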
It should also be noted that the present application can perform data information interaction, integration and processing through the middleware tool (ROS 2) of the vehicle.
Step S102, taking the acquired image frames as target image frames, taking at least one acquired point cloud frame as a target point cloud frame, respectively extracting the characteristics of the target image frames and the target point cloud frames, taking the characteristics of the target image frames as image coding characteristics, and taking the characteristics of the target point cloud frames as point cloud characteristics.
Wherein, the image coding characteristics are fused with semantic information and depth prediction information of the target image frame. Optionally, the point cloud features extracted in the step represent material information of a point cloud target and distance information between the point cloud target and a vehicle, where the point cloud target specifically refers to an obstacle.
Optionally, the point cloud features include, but are not limited to, the following information: the coordinate value of the point cloud, the radar cross section RCS of the millimeter wave point cloud and the speed of the millimeter wave point cloud.
Considering that the millimeter wave radar sensor is unaffected by illumination and weather and can provide depth and other target characteristic information, that images contain rich semantic information with high detection precision, and that the laser radar sensor offers high precision, high detection resolution, long detection range, and other characteristics, the complementary sensing information of different modalities provided by the millimeter wave radar sensor, the laser radar sensor, and the camera for the current environment can be fully exploited to achieve high-precision parking space detection.
In an alternative embodiment, the target point cloud frame includes a target laser point cloud frame and a target millimeter wave point cloud frame, wherein the target laser point cloud frame is a point cloud frame acquired by the laser radar sensor, and the target millimeter wave point cloud frame is a point cloud frame acquired by the millimeter wave radar sensor.
Step S103, respectively performing voxelization processing on the image coding features and the point cloud features in the abstract point cloud space to obtain image voxel features and point cloud voxel features.
The voxelization processing in this step is used to divide the image coding features and the point cloud features into the abstract point cloud space.
And step S104, carrying out overlooking fusion processing on the image voxel characteristics and the point cloud voxel characteristics in an abstract point cloud space to obtain fusion overhead view characteristics.
In this step, the top view fusion process is used to fuse the image voxel features and the point cloud voxel features in the same subspace.
The abstract point cloud space constructed by the application is a two-dimensional space, and depth prediction information is fused into the image coding features, so the image voxel features corresponding to the image coding features are features in a three-dimensional space; through the processing of this step, the image voxel features and the point cloud voxel features can be subjected to top view fusion processing in the abstract point cloud space.
Here, the top view fusion processing is used to fuse image voxel features and point cloud voxel features at different heights within the same subspace. For example, if a subspace contains image voxel feature 1, point cloud voxel feature 1 at a first height, and point cloud voxel feature 2 at a second height, then this step can fuse image voxel feature 1, point cloud voxel feature 1, and point cloud voxel feature 2 to obtain the fusion feature for that subspace, and thus the final fused top view feature.
Step S105, based on the fused top view features, parking space information for parking the vehicle is determined.
The fused top view feature is essentially a bird's eye view feature map about parking space lines and obstacles, so that it can be accurately determined which positions have parking spaces and which parking spaces have obstacles in the current scene through the fused top view feature, and parking space information corresponding to vehicles is obtained. Here, the parking space information refers to three-dimensional space position information of a parking space in which a vehicle can park.
According to the parking space detection method, an image frame and at least one point cloud frame after space-time synchronization are first acquired; the acquired image frame is then taken as a target image frame and the acquired at least one point cloud frame as a target point cloud frame, features of the target image frame and of the target point cloud frame are extracted respectively, the features of the target image frame are taken as image coding features and the features of the target point cloud frame as point cloud features; the image coding features and the point cloud features are then respectively voxelized in an abstract point cloud space to obtain image voxel features and point cloud voxel features; the image voxel features and the point cloud voxel features are then subjected to top view fusion processing in the abstract point cloud space to obtain fused top view features; and finally, parking space information for parking the vehicle is determined based on the fused top view features. In this way, parking space detection can be performed based on the space-time-synchronized target image frame and target point cloud frame, the image coding features and the point cloud features can be extracted efficiently during detection, and the voxel features respectively corresponding to the image coding features and the point cloud features can be fused in the abstract point cloud space, thereby effectively improving the accuracy and reliability of the parking space information.
In one possible implementation manner, the process of "step S101, acquiring an image frame and at least one point cloud frame after spatio-temporal synchronization" may include:
Step S1011, acquiring an image frame set and a point cloud frame set.
The image frame set comprises a plurality of image frames collected by the camera, the point cloud frame set comprises a plurality of point cloud frames respectively collected by a plurality of types of radar sensors, and each image frame in the image frame set and each point cloud frame in the point cloud frame set correspond to a collection time stamp.
Here, the image frames acquired by the camera are in YUV format, and the point cloud frames acquired by the radar sensors are in the corresponding radar coordinate systems. The application can acquire, over a period of time, a plurality of YUV-format image frames and a plurality of point cloud frames in radar coordinate systems, collected respectively by the camera and the radar sensors on the vehicle. Here, the plurality of point cloud frames include point cloud frames acquired by multiple types of radar sensors, for example a plurality of laser point cloud frames acquired by a laser radar sensor and a plurality of millimeter wave point cloud frames acquired by a millimeter wave radar sensor.
For example, in this step, a plurality of image frames in YUV format acquired by a camera may be acquired, a millimeter wave point cloud frame under a plurality of millimeter wave radar polar coordinate systems acquired by a millimeter wave radar sensor may be acquired, and a laser point cloud frame under a plurality of laser radar three-dimensional coordinate systems acquired by a laser radar sensor may be acquired.
To facilitate subsequent space-time synchronization, after the YUV-format image frames are acquired, they are parsed, and the parsed image frames form the image frame set in this step; after the point cloud frames in the various radar coordinate systems are acquired, they are parsed, and the parsed point cloud frames form the point cloud frame set in this step.
Optionally, parsing the image frame includes: and performing RGB format conversion and preprocessing on the image frame to obtain the image frame composed of normalized floating point pixel data in RGB format.
Specifically, this step may first perform RGB format conversion on a YUV-format image frame to obtain an RGB-format image frame, and then preprocess the RGB-format image frame to obtain an image frame composed of normalized floating-point pixel data in RGB format. Here, preprocessing is used to convert the pixel data in the RGB-format image frame into normalized floating-point data.
The calculation formula for converting the YUV image frame into the RGB format is as follows:
where Y, U and V represent the Y, U and V values in the YUV-format image frame, and R, G and B represent the R, G and B values obtained by the format conversion, respectively.
The normalized calculation formula is:
I = (RGB - mean) / std    (Formula (2))
In the formula, RGB denotes the three channel pixel values corresponding to the R, G and B values, mean is the image mean calculated from the RGB values, std is the image standard deviation calculated from the RGB values, and I is an image frame composed of normalized floating-point pixel data in RGB format; this image frame is one of the image frames in the image frame set.
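As an illustration, the following Python sketch shows this preprocessing step. Formula (1) is not reproduced above, so standard BT.601 full-range YUV-to-RGB coefficients are assumed here; the coefficients and function names are assumptions, not the patent's exact formula.

```python
# Illustrative preprocessing sketch, assuming BT.601 full-range YUV-to-RGB conversion.
import numpy as np

def yuv_to_rgb(yuv: np.ndarray) -> np.ndarray:
    """yuv: H x W x 3 uint8 image; returns float RGB in [0, 255]."""
    y = yuv[..., 0].astype(np.float32)
    u = yuv[..., 1].astype(np.float32) - 128.0
    v = yuv[..., 2].astype(np.float32) - 128.0
    r = y + 1.402 * v
    g = y - 0.344136 * u - 0.714136 * v
    b = y + 1.772 * u
    return np.clip(np.stack([r, g, b], axis=-1), 0.0, 255.0)

def normalize(rgb: np.ndarray) -> np.ndarray:
    """Formula (2): I = (RGB - mean) / std, computed per image."""
    mean = rgb.mean()
    std = rgb.std() + 1e-6  # avoid division by zero
    return (rgb - mean) / std
```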
Optionally, taking an example that the point cloud frames in the point cloud frame set include a laser point cloud frame in a laser radar three-dimensional coordinate system and a millimeter wave point cloud frame in a millimeter wave radar polar coordinate system, the step can convert the laser point cloud frame in the laser radar three-dimensional coordinate system and the millimeter wave point cloud frame in the millimeter wave radar polar coordinate system into a world coordinate system.
Taking a millimeter wave point cloud frame in the millimeter wave radar polar coordinate system as an example, assume that the offset vector of the millimeter wave radar sensor relative to the world coordinate system is Tr = [Trx, Try, Trz], where Trx, Try and Trz denote the offset components of the millimeter wave radar sensor relative to the world coordinate system in the x, y and z directions, respectively. The calculation formula for converting the millimeter wave point cloud data in the millimeter wave point cloud frame into the three-dimensional world coordinate system is as follows:
where x_w, y_w and z_w are the coordinate values of the millimeter wave point cloud in the x, y and z directions of the world coordinate system, R is the radial distance between the millimeter wave radar sensor and the target (e.g. an obstacle), and θ is the azimuth angle between the millimeter wave radar sensor and the target.
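Since the body of formula (3) is not reproduced above, the following Python sketch is a hedged reconstruction from the variable definitions: a planar polar-to-Cartesian conversion with the sensor offset vector Tr is assumed, and the function name is hypothetical.

```python
# Hedged reconstruction of formula (3): projecting a millimeter wave radar return given in
# polar form (radial distance R, azimuth theta) into the world coordinate system using the
# radar's offset vector Tr = [Trx, Try, Trz]. A planar conversion is assumed.
import math

def mmwave_to_world(R: float, theta_rad: float, Tr):
    Trx, Try, Trz = Tr
    x_w = R * math.cos(theta_rad) + Trx
    y_w = R * math.sin(theta_rad) + Try
    z_w = Trz  # the millimeter wave radar is assumed to measure in a plane at sensor height
    return x_w, y_w, z_w
```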
Step S1012, determining an image frame and at least one point cloud frame whose acquisition time stamps meet a preset time threshold requirement from the image frame set and the point cloud frame set.
Specifically, the step can use the acquisition time stamp to perform time synchronization, so as to obtain the same frame of data in time alignment.
Taking an example that the point cloud frame set includes a plurality of laser point cloud frames and a plurality of millimeter wave point cloud frames, the following formula can be adopted for time alignment in this step:
where T_sr denotes the acquisition timestamp corresponding to a millimeter wave point cloud frame, T_sl denotes the acquisition timestamp corresponding to a laser point cloud frame, T_sc denotes the acquisition timestamp corresponding to an image frame in the image frame set, and threshold denotes the preset time threshold, which can be defined according to specific requirements.
In this step, when Sync is true, the millimeter wave point cloud frame, the laser point cloud frame and the image frame currently substituted into formula (4) are time-synchronized; as shown in formula (4), two comparisons with threshold are required so that both the millimeter wave point cloud frame and the laser point cloud frame match the image frame. When Sync is false, the millimeter wave point cloud frame, the laser point cloud frame and the image frame currently substituted into formula (4) are not time-synchronized.
The above formula (4) is only an example, and is not limited to this step.
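For illustration, the following Python sketch expresses the time-synchronization test described by formula (4). The function name and the use of two separate thresholds are assumptions based on the text above.

```python
# Minimal sketch of the time-synchronization test of formula (4): an image frame and the two
# radar point cloud frames are treated as the same frame only if both timestamp differences
# fall below the preset threshold(s).
def is_time_synchronized(t_sc: float, t_sr: float, t_sl: float,
                         threshold_r: float, threshold_l: float) -> bool:
    """t_sc: image timestamp, t_sr: millimeter wave frame timestamp,
    t_sl: laser frame timestamp (seconds). Thresholds may differ per sensor."""
    return abs(t_sr - t_sc) < threshold_r and abs(t_sl - t_sc) < threshold_l
```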
It should be noted that in the process of steps S1011 to S1012, the image frames and the point cloud frames are parsed first and then time-synchronized; this is only an optional implementation of the application. The application may also time-synchronize the image frames and the point cloud frames first, and then parse only the time-synchronized point cloud frame and image frame, so as to improve efficiency.
Step S1013, constructing an abstract point cloud space with a vehicle as a center, and converting an image frame and at least one point cloud frame meeting the requirement of a preset time threshold into a coordinate system of the abstract point cloud space to obtain an image frame and at least one point cloud frame after space-time synchronization.
The process of constructing the abstract point cloud space may refer to the description in step S101 and is not repeated here.
The image frame and the at least one point cloud frame meeting the requirement of the preset time threshold are an image frame and the at least one point cloud frame after time synchronization. The step can respectively convert an image frame and at least one point cloud frame after time synchronization into a coordinate system of an abstract point cloud space.
Taking the case where the at least one time-synchronized point cloud frame includes a laser point cloud frame and a millimeter wave point cloud frame as an example, this step can convert the time-synchronized laser point cloud frame and millimeter wave point cloud frame into the coordinate system of the abstract point cloud space using the extrinsic parameter matrices obtained through a radar sensor calibration algorithm, thereby achieving spatial synchronization of the time-synchronized laser point cloud frame and millimeter wave point cloud frame.
The coordinate conversion of any point cloud data in the laser point cloud frame and the millimeter wave point cloud frame after time synchronization adopts the following formula (5):
where x_w, y_w and z_w are the three-dimensional coordinates of a point cloud in the world coordinate system (for the millimeter wave point cloud, these have the same meaning as x_w, y_w and z_w in formula (3)), x_c, y_c and z_c are the three-dimensional coordinates of the point cloud in the abstract point cloud space coordinate system, Tc is the translation matrix (obtained by calibration with a calibration board when the equipment leaves the factory), and Rc is the rotation matrix (obtained by rotating the point cloud data successively by an angle α about the x axis, β about the y axis, and γ about the z axis), where α, β and γ are preset values; the "0" below Rc in formula (5) is a 2x2 zero matrix, and the "1" below Tc is a 2x2 ones matrix.
In this step, the calculation formula of the rotation matrix Rc is as follows (formula 6).
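Since the bodies of formulas (5) and (6) are not reproduced above, the following Python sketch is a hedged reconstruction from the variable definitions: a standard x-then-y-then-z Euler rotation convention and a rigid transform with translation Tc are assumed, and the function names are hypothetical.

```python
# Hedged sketch of formulas (5) and (6): building Rc from the preset Euler angles
# (alpha about x, beta about y, gamma about z, applied in that order) and transforming a
# world-frame point into the abstract point cloud space with translation Tc.
import numpy as np

def rotation_matrix(alpha: float, beta: float, gamma: float) -> np.ndarray:
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx  # rotate about x, then y, then z

def world_to_abstract(p_world: np.ndarray, Rc: np.ndarray, Tc: np.ndarray) -> np.ndarray:
    """p_world: (3,) point [x_w, y_w, z_w]; returns [x_c, y_c, z_c]."""
    return Rc @ p_world + Tc
```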
In summary, in the embodiment, the space-time synchronization is performed on the image frames and the point cloud frames, so that the subsequent parking space detection is performed on the point cloud frames and the image frames based on the space-time synchronization, and the accuracy of the parking space detection is improved.
The following describes a process of "extracting features of a target image frame" in step S102.
Optionally, the process of "extracting features of the target image frame" includes the steps of:
step S1021, scaling the target image frame to a preset size to obtain a scaled image frame.
Optionally, the scaled image frame has a size of 128x352x3.
Step S1022, extracting semantic features of the scaled image frames, and up-sampling the semantic features to obtain feature information of the target dimension.
The preset dimension in the characteristic information of the target dimension represents the depth prediction information of the target image frame.
Alternatively, this step may pre-train an EfficientNet network and input the scaled image frame into the trained EfficientNet network, so as to extract the semantic features (a high-dimensional semantic feature) of the scaled image frame through the EfficientNet network. The construction and training process of the EfficientNet network is prior art and is not described in detail here.
This step can expand the dimension of the semantic features to a target dimension through an upsampling layer of the EfficientNet network, where the target dimension refers to D+C dimensions, D represents the dimension of the depth prediction information (i.e. the preset dimension), and C represents the final planar image feature dimension (the dimension of the semantic features).
With reference to the abstract point cloud space constructed in step S101, this step may limit the depth prediction range of the image depth prediction network (for example, a convolutional neural network, CNN); optionally, the range may be limited to 4 meters to 45 meters from the vehicle with a depth resolution of 1 meter, so that the image depth prediction network obtains a 41-dimensional depth prediction, i.e. the value of D in this step may specifically be 41.
Of course, extracting the semantic features with an EfficientNet network in this embodiment is only an example; other networks may also be used, for example the convolutional neural network model RegNet, and this step may even extract the semantic features with an EfficientNet network and the convolutional neural network model RegNet respectively.
Step S1023, determining the image coding characteristics of the target image frame according to the characteristic information of the target dimension.
Specifically, the step can multiply the front D-dimensional depth feature information and the rear C-dimensional semantic feature in the feature information of the target dimension to obtain the image coding feature of the target image frame.
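For illustration, the following Python sketch shows one way this multiplication can be realized: an outer product per pixel between the D-dimensional depth prediction and the C-dimensional semantic feature. The tensor sizes follow the examples in the text (D = 41, C = 64, 8 x 22 feature map); the random inputs and the axis ordering are assumptions.

```python
# Illustrative sketch of step S1023: split the (D+C)-dimensional feature per pixel into a
# D-dimensional depth prediction and a C-dimensional semantic feature, and multiply them to
# form the image coding features.
import numpy as np

D, C, H, W = 41, 64, 8, 22
feat = np.random.rand(D + C, H, W).astype(np.float32)   # feature information of the target dimension

depth = feat[:D]        # D x H x W depth prediction information (front D dimensions)
semantic = feat[D:]     # C x H x W semantic feature (rear C dimensions)

# Outer product per pixel: (D x 1 x H x W) * (1 x C x H x W) -> D x C x H x W
image_coding = depth[:, None, :, :] * semantic[None, :, :, :]
print(image_coding.shape)  # (41, 64, 8, 22), matching the 41x8x22x64 tensor up to axis order
```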
According to this embodiment, image coding features fusing the semantic features and the depth prediction information can be extracted through the EfficientNet network, so that the accuracy and reliability of parking space detection are effectively improved.
In an alternative embodiment, the process of "extracting the feature of the target point cloud frame" in step S102 may include: and screening out the point cloud data which does not meet the preset space range corresponding to the extracted point cloud space in the point cloud frame to obtain a point cloud frame after screening the point cloud data, and extracting the characteristics of the point cloud frame after screening the point cloud data.
Specifically, the preset spatial range of the abstract point cloud space constructed in the aforementioned step S101 is a range of 50 meters in the front-rear and left-right directions of the vehicle, that is, a range from minus 50 meters to plus 50 meters in the x and y directions of the abstract point cloud space.
Some point clouds in the target point cloud frame acquired in step S101 may have coordinate values that exceed this preset spatial range; such point cloud data are screened out of the target point cloud frame in this step, and the remaining point cloud data form the target point cloud frame after the point cloud data are screened. Features are then extracted from this screened target point cloud frame to obtain the point cloud features.
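For illustration, the following Python sketch shows this screening step under the -50 m to +50 m range assumed above; the function name and the point array layout are assumptions.

```python
# Minimal sketch of screening out point cloud data outside the preset spatial range of the
# abstract point cloud space (assumed here to be -50 m to +50 m in x and y).
import numpy as np

def filter_point_cloud(points: np.ndarray, half_range_m: float = 50.0) -> np.ndarray:
    """points: N x F array whose first three columns are x, y, z in the abstract
    point cloud space; returns only the rows inside the preset range."""
    keep = (np.abs(points[:, 0]) <= half_range_m) & (np.abs(points[:, 1]) <= half_range_m)
    return points[keep]
```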
After the image encoding features and the point cloud features are extracted in step S102, a voxelization operation can be performed in step S103.
In one possible implementation, the process of "voxelizing the image coding feature in the abstract point cloud space" in step S103 includes:
step S1031, determining a spatial coordinate tensor corresponding to the image coding feature.
Here, the "determining the spatial coordinate tensor corresponding to the image coding feature" includes: determining depth prediction information of pixel points in a target image frame; determining three-dimensional coordinate information of the pixel points in the target image frame according to the depth prediction information of the pixel points in the target image frame and the two-dimensional coordinate information of the pixel points in the target image frame; and determining a space coordinate tensor corresponding to the image coding feature according to the three-dimensional coordinate information of the pixel point in the target image frame.
Optionally, the depth prediction information of each pixel in the target image frame may be determined by the image depth prediction network mentioned in the aforementioned step S1022: specifically, the target image frame is input into the image depth prediction network to obtain the depth prediction information of each pixel output by the network, where the depth prediction information is a value in the range of 4 meters to 45 meters; for example, a depth prediction of 30 meters for a pixel indicates that the object corresponding to that pixel is 30 meters from the vehicle.
In this step, "two-dimensional coordinate information of a pixel point in a target image frame" refers to two-dimensional coordinate information of a pixel point in the target image frame, for example, a pixel point is located at the (20,54) position of the target image frame with the upper left corner as the origin of coordinates. Then, assuming that the depth prediction information of the pixel is 30, the three-dimensional coordinate information of the pixel is (20,54,30).
After the three-dimensional coordinate information of each pixel point in the target image frame is obtained through the process, the spatial coordinate tensor corresponding to the image coding feature can be determined based on the three-dimensional coordinate information of the pixel point in the target image frame.
Optionally, the image encoding features have a size of 41x8x22x64 and the spatial coordinate tensor has a size of 41x8x22x3.
Step S1032, voxelizing the image coding feature and the space coordinate tensor to divide the image coding feature into a plurality of subspaces, thereby obtaining the image voxel feature.
Specifically, in this step, the image coding feature and the corresponding space coordinate tensor are subjected to voxelization operation, so that the image coding feature can be divided into the abstract point cloud space, and feature information of the image coding feature in each subspace included in the abstract point cloud space, namely, the image voxel feature, is obtained. The process of voxelization is prior art and will not be described in detail herein.
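For illustration, the following Python sketch shows one possible voxelization of this kind: each feature vector in the image coding features is assigned to the grid cell indicated by its entry in the spatial coordinate tensor. The tensor sizes and grid parameters follow the examples in the text; the floor-based cell indexing and the dictionary representation are assumptions.

```python
# Illustrative sketch of step S1032: assign each image coding feature vector to the subspace
# (grid cell) given by its spatial coordinate.
import numpy as np

GRID, CELL, HALF = 200, 0.5, 50.0

features = np.random.rand(41, 8, 22, 64).astype(np.float32)   # image coding features
coords = np.random.uniform(-HALF, HALF, size=(41, 8, 22, 3))  # spatial coordinate tensor (x, y, z)

flat_feats = features.reshape(-1, 64)
flat_xy = coords.reshape(-1, 3)[:, :2]
cell_idx = np.floor((flat_xy + HALF) / CELL).astype(np.int64)  # grid cell index along x and y
valid = np.all((cell_idx >= 0) & (cell_idx < GRID), axis=1)

# "Image voxel features": the collection of feature vectors falling into each subspace
voxel_features = {}
for cell, f in zip(map(tuple, cell_idx[valid]), flat_feats[valid]):
    voxel_features.setdefault(cell, []).append(f)
```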
Correspondingly, the process of "voxel processing on point cloud features in abstract point cloud space" in step S103 includes: and voxelizing the point cloud features to divide the point cloud features into a plurality of subspaces, so as to obtain the point cloud voxel features.
Specifically, in this step the point cloud features may be voxelized (point cloud voxelization is a common operation for processing point cloud data that converts unordered three-dimensional points into ordered 3D voxel features, and is widely applied to vehicle-mounted and ground-based three-dimensional laser data) and thereby divided into the subspaces included in the abstract point cloud space, so as to obtain the features of the point cloud features in each subspace, i.e. the point cloud voxel features in each subspace.
Through the voxelization operation of this embodiment, the image coding features and the point cloud features can be divided into the subspaces included in the abstract point cloud space, which facilitates the subsequent fusion.
In an alternative embodiment, the process of step S104 "performing the top view fusion processing on the image voxel feature and the point cloud voxel feature in the abstract point cloud space to obtain the fused top view feature" is described.
Specifically, the process may include:
Step S1041, performing sum pooling processing on voxel features respectively located in each subspace in the image voxel features to obtain aggregate image voxel features corresponding to each subspace, and obtaining a first top view feature from the aggregate image voxel features respectively corresponding to each subspace.
This step may aggregate the image voxel features per subspace. Specifically, the image voxel features can represent feature information of objects (such as obstacles and parking space lines) contained in the target image frame, and the sum pooling in this step means that the image voxel features mapped into each subspace of the abstract point cloud space are aggregated; that is, this step is equivalent to aggregating the feature information of the objects in each subspace of the abstract point cloud space, so the resulting aggregate image voxel feature corresponding to each subspace can represent the objects in that subspace. The first top view feature obtained from the aggregate image voxel features corresponding to the respective subspaces is therefore essentially a bird's eye view feature map of the target image frame.
Step S1042, performing pooling processing on the voxel features located in each subspace among the point cloud voxel features to obtain an aggregate point cloud voxel feature corresponding to each subspace, and obtaining a second top view feature from the aggregate point cloud voxel features corresponding to the respective subspaces.
This step may aggregate the point cloud voxel features per subspace. Specifically, the point cloud voxel features can reflect the obstacle information in the at least one point cloud frame acquired in step S101, and the pooling in this step means aggregating the point cloud voxel features located in each subspace of the abstract point cloud space; that is, this step is equivalent to aggregating the obstacle information in each subspace of the abstract point cloud space, so the resulting aggregate point cloud voxel feature corresponding to each subspace can represent the obstacles in that subspace. The second top view feature obtained from the aggregate point cloud voxel features corresponding to the respective subspaces is therefore essentially a bird's eye view feature map of the target point cloud frame.
And step S1043, overlapping and fusing the first top view feature and the second top view feature to obtain a fused top view feature.
In this step, the first top view feature and the second top view feature may be superimposed to obtain a final fused top view feature.
Alternatively, the fused top view features have dimensions of 64x1x200x200.
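For illustration, the following Python sketch shows steps S1041 to S1043 in a simplified form: sum-pooling the voxel features that fall into each subspace to form one bird's eye view map per modality, then superposing the two maps. The grid size and channel counts follow the examples in the text; the toy inputs and the concatenation along the channel axis are assumptions.

```python
# Minimal sketch of steps S1041-S1043 under the stated assumptions.
import numpy as np

GRID, C_IMG, C_PC = 200, 64, 64

image_voxel_features = {(100, 100): [np.ones(C_IMG, dtype=np.float32)]}      # toy example inputs
cloud_voxel_features = {(100, 101): [np.ones(C_PC, dtype=np.float32)] * 3}

def sum_pool_to_bev(voxel_features: dict, channels: int) -> np.ndarray:
    """voxel_features: {(row, col): list of feature vectors}; returns channels x GRID x GRID."""
    bev = np.zeros((channels, GRID, GRID), dtype=np.float32)
    for (row, col), feats in voxel_features.items():
        bev[:, row, col] = np.sum(feats, axis=0)   # aggregate all features in this subspace
    return bev

image_bev = sum_pool_to_bev(image_voxel_features, C_IMG)   # first top view feature
cloud_bev = sum_pool_to_bev(cloud_voxel_features, C_PC)    # second top view feature
fused_bev = np.concatenate([image_bev, cloud_bev], axis=0) # fused top view feature
```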
In one possible implementation manner, the application designs a bird's eye view feature image encoder in which steps S1041 to S1043 are implemented; the image voxel features and the point cloud voxel features are then input into the bird's eye view feature image encoder to encode the fused top view feature.
In another possible implementation manner, the application designs a multi-modal feature-level fusion network that implements the voxelization of step S103 and steps S1041 to S1043; the image coding features and the point cloud features are then input into the multi-modal feature-level fusion network to obtain the fused top view feature. The multi-modal feature-level fusion network can achieve feature-level fusion of visual images and radar point cloud data and can efficiently predict the specific position of a parking space in three-dimensional space; at the same time, the network structure is compressed and optimized as much as possible, so that real-time requirements can be met and the network can be applied in real application scenarios.
According to the embodiment, the features (namely the point cloud features and the image coding features) of different data modes can be aggregated, so that the parking space information can be detected based on the fused top view features obtained by aggregation, and the accuracy and the reliability of parking space detection are improved.
In still another possible implementation manner, the present application may further design a parking space detection model in which steps S102 to S105 are implemented; the space-time synchronized image frame and at least one point cloud frame obtained in step S101 are then input into the parking space detection model to obtain the parking space information. Here, the parking space detection model is a neural network model, and its training process can refer to the training of neural network models in the prior art, which is not described herein again.
The parking space detection model in this embodiment integrates parts of several classical detection networks into a single network structure, so that the input target image frame and target point cloud frame can undergo the series of processes such as feature extraction and fusion within one model, realizing end-to-end target detection.
The application also provides the following embodiment, which describes the process of determining, in step S105, the parking space information for parking the vehicle based on the fused top view features.
Optionally, "step S105, determining parking space information for parking the vehicle based on the fused top view features" may include:
and step S1051, performing planar feature extraction processing on the fused top view features to obtain a segmentation feature map.
Optionally, the planar feature extraction processing in this step can be implemented by using the first three convolution stages (layer1 to layer3) of a ResNet network followed by two layers of bilinear-interpolation upsampling; that is, the fused top view features are processed by layer1 to layer3 of the ResNet structure, and the resulting features are upsampled twice by bilinear interpolation to obtain the segmentation feature map.
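As a rough sketch under stated assumptions, with torchvision's ResNet-18 stages standing in for the unspecified ResNet variant, an untrained backbone, and two example output classes, the planar feature extraction described above could look as follows.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class BevSegHead(nn.Module):
    """Fused BEV map -> ResNet layer1-3 -> two bilinear 2x upsamplings -> segmentation map."""

    def __init__(self, num_classes=2):
        super().__init__()
        backbone = resnet18(weights=None)   # untrained stand-in backbone (assumption)
        self.layer1 = backbone.layer1       # 64 -> 64 channels
        self.layer2 = backbone.layer2       # 64 -> 128 channels, stride 2
        self.layer3 = backbone.layer3       # 128 -> 256 channels, stride 2
        self.head = nn.Conv2d(256, num_classes, kernel_size=1)

    def forward(self, bev):                 # bev: (B, 64, 200, 200)
        x = self.layer3(self.layer2(self.layer1(bev)))   # (B, 256, 50, 50)
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        return self.head(x)                 # (B, num_classes, 200, 200)

seg_logits = BevSegHead()(torch.randn(1, 64, 200, 200))
```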
And step S1052, carrying out regional clustering on the segmentation feature map to obtain initial parking space information.
Through region clustering, this step can cluster together the features in the segmentation feature map that belong to one candidate parking space, thereby obtaining the initial parking space information within the preset spatial range of the abstract point cloud space.
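A minimal sketch of the region clustering, assuming the segmentation feature map has been reduced to an (H, W) array whose nonzero pixels mark parking-space regions and using simple connected-component labelling (the present application does not fix a particular clustering algorithm; the minimum cluster size is a hypothetical parameter):

```python
import numpy as np
from scipy import ndimage

def cluster_parking_regions(seg_map, min_pixels=50):
    """Group connected foreground pixels into candidate parking regions."""
    labels, num = ndimage.label(seg_map > 0)            # connected-component labelling
    regions = []
    for i in range(1, num + 1):
        ys, xs = np.nonzero(labels == i)
        if len(xs) >= min_pixels:                       # drop tiny, likely noisy clusters
            regions.append(np.stack([xs, ys], axis=1))  # (x, y) grid coordinates per region
    return regions

regions = cluster_parking_regions(np.random.randint(0, 2, size=(200, 200)))
```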
And step S1053, carrying out rectangular processing on the initial parking space information to obtain rectangular parking space information.
An actual parking space is rectangular, whereas the obtained initial parking space information may not describe a rectangular region; therefore, rectangularization processing needs to be performed on the initial parking space information in this step to obtain rectangular parking space information.
Alternatively, the rectangular parking space information may be abstract point cloud space coordinate information of the parking space to be parked.
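One possible way to realize the rectangularization is a minimum-area rotated rectangle fit, sketched below with OpenCV (an assumption for illustration; the present application does not name a specific routine):

```python
import cv2
import numpy as np

def rectangularize(region_pixels):
    """Fit a minimum-area rotated rectangle to one clustered candidate region.

    region_pixels: (N, 2) array of (x, y) grid coordinates of the region.
    Returns the 4 rectangle corners and the (width, height) of the fit, in grid units.
    """
    pts = region_pixels.astype(np.float32)
    rect = cv2.minAreaRect(pts)            # ((cx, cy), (w, h), angle)
    corners = cv2.boxPoints(rect)          # 4 corner points of the rotated rectangle
    return corners, rect[1]

corners, size = rectangularize(np.array([[10, 10], [10, 60], [30, 60], [30, 10]]))
```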
And step S1054, converting the rectangular parking space information into a world coordinate system to obtain the parking space information in the world coordinate system.
Because the rectangular parking space information is information in the abstract point cloud space, the information needs to be converted into a world coordinate system to obtain the real parking space information.
Optionally, this step can use the physical-world resolution of the segmentation feature map to inversely transform the candidate parking space back into real physical space coordinates, thereby obtaining the parking space information in the world coordinate system.
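A minimal sketch of this inverse transform, assuming a hypothetical resolution of 0.1 m per grid cell, the ego vehicle at the grid centre, and a known SE(2) vehicle-to-world pose (none of these values are taken from the present application):

```python
import numpy as np

def grid_to_world(corners_grid, resolution=0.1, grid_size=200, ego_pose=np.eye(3)):
    """Map rectangle corners from BEV grid cells to world coordinates."""
    centre = grid_size / 2.0
    xy_vehicle = (np.asarray(corners_grid) - centre) * resolution   # metres, vehicle frame
    homog = np.hstack([xy_vehicle, np.ones((len(xy_vehicle), 1))])  # homogeneous 2D points
    return (ego_pose @ homog.T).T[:, :2]                            # corners in the world frame
```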
And step S1055, screening out the parking space information which does not meet the preset size threshold value in the parking space information under the world coordinate system to obtain the parking space information for parking the vehicle.
A parking space must meet a certain size threshold before the vehicle can park in it; only parking spaces satisfying this requirement are usable, and if a candidate parking space is smaller than the vehicle, the vehicle cannot park there. The screening of this step is therefore needed in order to obtain the parking space information that can actually be used for parking the vehicle.
Alternatively, the step may first determine the size of the parking space information in the world coordinate system, and then compare the determined size with a preset size threshold value to screen out the parking space information that does not satisfy the preset size threshold value.
Here, the preset size threshold may be set according to actual conditions, for example according to the actual size of the vehicle or the maximum size of existing vehicle types; alternatively, the preset size threshold may be 2 meters by 5 meters.
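A minimal sketch of the size screening, using the 2 m by 5 m example threshold mentioned above and assuming the slot sizes have already been converted to metres:

```python
def filter_by_size(slots, min_size=(2.0, 5.0)):
    """Keep only parking slots whose fitted rectangle meets the size threshold.

    slots: list of (corners_world, (width, length)) tuples with sizes in metres.
    """
    min_short, min_long = min_size
    kept = []
    for corners, (w, l) in slots:
        short_side, long_side = sorted((w, l))
        if short_side >= min_short and long_side >= min_long:
            kept.append(corners)
    return kept
```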
In summary, the parking space detection method provided by the application takes the image-only LiftSplatShoot algorithm as a baseline and optimizes and enhances it by incorporating radar point cloud information: the fusion structure over the abstract point cloud space is modified, radar point cloud features are added on top of the visual semantic features, and fusion is realized at the high-dimensional feature level, alleviating the problem of insufficient information from a single sensor. The effective information of visual image data and radar data is efficiently extracted and fused in feature space, which effectively improves the accuracy and reliability of three-dimensional parking space position detection in complex scenes, extends the effective detection distance, and allows a larger range of parking spaces to be detected than with traditional detection methods.
In order for those skilled in the art to better understand the present application, the procedure of the foregoing embodiments is summarized by the following example.
Referring to fig. 2, a flow chart of another parking space detection method provided by the application is shown in fig. 2, and the parking space detection method comprises the following steps:
And S1, acquiring an image frame set and a point cloud frame set.
And S2, determining an image frame and at least one point cloud frame of which the acquisition time stamps meet the preset time threshold requirement from the image frame set and the point cloud frame set.
And S3, constructing an abstract point cloud space with the vehicle as the center, and converting an image frame and at least one point cloud frame meeting the requirement of a preset time threshold into a coordinate system of the abstract point cloud space to obtain an image frame and at least one point cloud frame after space-time synchronization.
And S4a, taking the space-time synchronized image frame as the target image frame, scaling the target image frame to a preset size, extracting semantic features from the scaled image frame, upsampling the semantic features to obtain feature information of the target dimension, and determining the image coding features of the target image frame according to the feature information of the target dimension.
And S4b, taking at least one point cloud frame after space-time synchronization as a target point cloud frame, extracting characteristics of the target point cloud frame, and taking the characteristics of the target point cloud frame as point cloud characteristics.
And S5, respectively carrying out voxelization processing on the image coding feature and the point cloud feature in the abstract point cloud space to obtain an image voxel feature and a point cloud voxel feature.
And S6, carrying out top view fusion processing on the image voxel features and the point cloud voxel features in the abstract point cloud space to obtain the fused top view features.
And S7, determining parking space information for parking the vehicle based on the fused top view features.
The processes of steps S1 to S7 correspond to the steps in the foregoing embodiments; for details, reference may be made to the detailed description of the corresponding steps above, which is not repeated here. For illustration, a brief sketch of the time synchronization performed in step S2 is given below.
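This sketch of step S2 only pairs each image frame with the point cloud frames whose acquisition timestamps differ by less than a threshold; the (timestamp, data) layout and the 50 ms value are assumptions, not values from the present application.

```python
def synchronize(image_frames, cloud_frames, max_dt=0.05):
    """Collect, for each image frame, the point cloud frames acquired within max_dt seconds.

    image_frames, cloud_frames: lists of (timestamp_seconds, data) tuples.
    Returns a list of (image_frame, [matching point cloud frames]) pairs.
    """
    pairs = []
    for img_ts, img in image_frames:
        clouds = [(ts, pc) for ts, pc in cloud_frames if abs(ts - img_ts) < max_dt]
        if clouds:                       # keep only image frames with at least one match
            pairs.append(((img_ts, img), clouds))
    return pairs
```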
The embodiment of the application also provides a parking space detection device, which is described below, and the parking space detection device described below and the parking space detection method described above can be referred to correspondingly.
Referring to fig. 3, a schematic structural diagram of a parking space detection device provided by an embodiment of the present application is shown, and as shown in fig. 3, the parking space detection device may include: a target frame acquisition module 301, a feature extraction module 302, a voxelization module 303, a feature fusion module 304 and a parking space detection module 305.
The target frame acquisition module 301 is configured to acquire an image frame and at least one point cloud frame after space-time synchronization, where different point cloud frames are acquired based on different types of radar sensors on a vehicle, the image frame is acquired based on a camera on the vehicle, a difference value between an acquisition timestamp of the image frame and an acquisition timestamp of each point cloud frame is smaller than a preset time threshold, and the image frame and each point cloud frame are in a coordinate system of a pre-built abstract point cloud space centered on the vehicle.
The feature extraction module 302 is configured to take the acquired image frame as a target image frame, take the acquired at least one point cloud frame as a target point cloud frame, respectively extract features of the target image frame and the target point cloud frame, take features of the target image frame as image coding features, and take features of the target point cloud frame as point cloud features, wherein semantic information and depth prediction information of the target image frame are fused in the image coding features, and the point cloud features represent material information of a point cloud target and distance information between the point cloud target and a vehicle.
And the voxelization module 303 is used for respectively voxelizing the image coding feature and the point cloud feature in the abstract point cloud space to obtain the image voxel feature and the point cloud voxel feature.
The feature fusion module 304 is configured to perform top view fusion processing on the image voxel feature and the point cloud voxel feature in the abstract point cloud space, so as to obtain a fused top view feature.
The parking space detection module 305 is configured to determine parking space information for parking the vehicle based on the fused top view feature.
The parking space detection device provided by the application first acquires a space-time synchronized image frame and at least one point cloud frame, then takes the acquired image frame as the target image frame and the acquired at least one point cloud frame as the target point cloud frame, extracts the features of the target image frame and the target point cloud frame respectively, takes the features of the target image frame as image coding features and the features of the target point cloud frame as point cloud features, then voxelizes the image coding features and the point cloud features in the abstract point cloud space to obtain image voxel features and point cloud voxel features, performs top view fusion processing on the image voxel features and the point cloud voxel features in the abstract point cloud space to obtain the fused top view features, and finally determines the parking space information for parking the vehicle based on the fused top view features. In this way, parking space detection can be performed based on the space-time synchronized target image frame and target point cloud frame, the image coding features and the point cloud features can be extracted efficiently, and the voxel features corresponding to each of them can be fused in the abstract point cloud space, thereby effectively improving the accuracy and reliability of the parking space information.
In one possible implementation manner, the target frame acquisition module 301 may include: the system comprises a data frame set acquisition module, a time synchronization module and a space synchronization module.
The system comprises a data frame set acquisition module, a point cloud frame set acquisition module and a data frame set acquisition module, wherein the data frame set acquisition module is used for acquiring an image frame set and a point cloud frame set, the image frame set comprises a plurality of image frames acquired by a camera, the point cloud frame set comprises a plurality of point cloud frames acquired by a plurality of types of radar sensors respectively, and each image frame in the image frame set and each point cloud frame in the point cloud frame set correspond to an acquisition time stamp.
And the time synchronization module is used for determining an image frame and at least one point cloud frame of which the acquisition time stamps meet the preset time threshold requirement from the image frame set and the point cloud frame set.
The space synchronization module is used for constructing an abstract point cloud space centering on the vehicle, converting an image frame and at least one point cloud frame meeting the requirement of a preset time threshold into a coordinate system of the abstract point cloud space, and obtaining an image frame and at least one point cloud frame after space-time synchronization.
In one possible implementation, the feature extraction module 302 may include, when extracting the features of the target image frame: an image frame scaling module, a semantic feature extraction module and an image coding feature determination module.
And the image frame scaling module is used for scaling the target image frame to a preset size to obtain a scaled image frame.
The semantic feature extraction module is used for extracting semantic features of the scaled image frames, and up-sampling the semantic features to obtain feature information of the target dimension, wherein the preset dimension contained in the feature information of the target dimension characterizes the depth prediction information of the target image frames.
And the image coding feature determining module is used for determining the image coding feature of the target image frame according to the feature information of the target dimension.
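For illustration, the following sketch gives one Lift-Splat-style reading of these three modules (an assumption, not the exact network of the present application): the frame is scaled to a preset size, semantic features are extracted and upsampled, and a preset slice of the output channels is treated as a per-pixel depth distribution that is combined with the remaining semantic channels to form the image coding feature. The backbone, the preset size and the channel counts are placeholder choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageEncoder(nn.Module):
    """Scale -> semantic features -> upsample -> depth/context split (sketch only)."""

    def __init__(self, depth_bins=41, context_ch=64):
        super().__init__()
        self.backbone = nn.Sequential(                        # stand-in semantic extractor
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        self.out = nn.Conv2d(128, depth_bins + context_ch, kernel_size=1)
        self.depth_bins = depth_bins

    def forward(self, img):
        img = F.interpolate(img, size=(256, 512), mode="bilinear",
                            align_corners=False)              # scale to a preset size (assumed)
        feat = self.backbone(img)                             # semantic features at 1/4 scale
        feat = F.interpolate(feat, scale_factor=2, mode="bilinear", align_corners=False)
        feat = self.out(feat)                                 # feature information of the target dimension
        depth = feat[:, :self.depth_bins].softmax(dim=1)      # preset slice = depth prediction
        context = feat[:, self.depth_bins:]                   # remaining semantic channels
        return depth.unsqueeze(1) * context.unsqueeze(2)      # image coding feature (B, C, D, h, w)

encoded = ImageEncoder()(torch.randn(1, 3, 720, 1280))
```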
In one possible implementation manner, the feature extraction module 302 may include, when extracting the feature of the target point cloud frame: and the point cloud screening module and the point cloud characteristic determining module.
And the point cloud screening module is used for screening out the point cloud data in the target point cloud frame that does not fall within the preset spatial range corresponding to the abstract point cloud space, obtaining the target point cloud frame after the point cloud data are screened out.
And the point cloud characteristic determining module is used for extracting characteristics of the target point cloud frames after screening the point cloud data.
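A minimal sketch of the point cloud screening, with an assumed preset spatial range for the abstract point cloud space (the real range is not specified here):

```python
import numpy as np

def screen_point_cloud(points,
                       x_range=(-10.0, 10.0),
                       y_range=(-10.0, 10.0),
                       z_range=(-2.0, 3.0)):
    """Drop points that fall outside the preset spatial range of the abstract point cloud space.

    points: (N, D) array whose first three columns are x, y, z in the vehicle frame.
    """
    keep = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]) &
            (points[:, 2] >= z_range[0]) & (points[:, 2] < z_range[1]))
    return points[keep]

filtered = screen_point_cloud(np.random.uniform(-20, 20, size=(1000, 4)))
```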
In one possible implementation manner, the above-mentioned abstract point cloud space is divided into a plurality of subspaces in advance, and the resolution of each subspace is the same.
The voxelization module 303 may include: the system comprises a space coordinate tensor determination module and an image voxelization module.
And the space coordinate tensor determining module is used for determining the space coordinate tensor corresponding to the image coding characteristic.
And the image voxelization module is used for voxelizing the image coding feature and the space coordinate tensor so as to divide the image coding feature into a plurality of subspaces to obtain the image voxel feature.
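A minimal sketch of this voxelization, assuming the spatial coordinate tensor supplies one 3D point per image coding feature; the grid size, voxel size and origin below are illustrative values only:

```python
import torch

def voxelize_image_features(feats, coords,
                            grid=(200, 200, 5),
                            voxel_size=(0.1, 0.1, 1.0),
                            origin=(-10.0, -10.0, -2.0)):
    """Assign each image coding feature to the subspace (voxel) its 3D point falls into.

    feats:  (N, C) image coding features
    coords: (N, 3) spatial coordinate tensor in metres (vehicle frame)
    Returns the in-grid features and their integer voxel indices.
    """
    idx = torch.floor((coords - torch.tensor(origin)) / torch.tensor(voxel_size)).long()
    valid = ((idx >= 0) & (idx < torch.tensor(grid))).all(dim=1)   # keep in-range points only
    return feats[valid], idx[valid]

feats, vox_idx = voxelize_image_features(torch.randn(1000, 64),
                                          torch.rand(1000, 3) * 20 - 10)
```

The kept features and indices can then be pooled per subspace as in the top view fusion sketch given earlier.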
In one possible implementation manner, the spatial coordinate tensor determination module may include: the device comprises a depth prediction information determining module, a three-dimensional coordinate information determining module and a three-dimensional coordinate information reference module.
And the depth prediction information determining module is used for determining the depth prediction information of the pixel points in the target image frame.
And the three-dimensional coordinate information determining module is used for determining the three-dimensional coordinate information of the pixel points in the target image frame according to the depth prediction information of the pixel points in the target image frame and the two-dimensional coordinate information of the pixel points in the target image frame.
And the three-dimensional coordinate information reference module is used for determining a space coordinate tensor corresponding to the image coding feature according to the three-dimensional coordinate information of the pixel points in the target image frame.
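One common way to realize these three modules is pinhole back-projection: each pixel's 2D coordinates are combined with its predicted depth through the camera intrinsics and extrinsics. The sketch below assumes known matrices for both, which are not specified in the present application.

```python
import torch

def pixels_to_3d(depth, intrinsics, cam_to_vehicle):
    """Build the spatial coordinate tensor from per-pixel depth predictions.

    depth:          (H, W) predicted depth in metres
    intrinsics:     (3, 3) camera matrix K
    cam_to_vehicle: (4, 4) extrinsic transform from camera to vehicle frame
    Returns an (H*W, 3) tensor of 3D points in the vehicle frame.
    """
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).reshape(-1, 3)   # homogeneous pixels
    rays = pix @ torch.linalg.inv(intrinsics).T        # back-projected camera rays
    pts_cam = rays * depth.reshape(-1, 1)              # scale rays by predicted depth
    pts_h = torch.cat([pts_cam, torch.ones(len(pts_cam), 1)], dim=1)
    return (pts_h @ cam_to_vehicle.T)[:, :3]           # points in the vehicle frame
```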
In one possible implementation, the feature fusion module 304 may include: the system comprises a first pooling module, a second pooling module and a superposition fusion module.
And the first pooling module is used for performing pooling treatment on voxel features respectively positioned in each subspace in the image voxel features to obtain aggregate image voxel features corresponding to each subspace, and obtaining a first top view feature from the aggregate image voxel features respectively corresponding to each subspace.
And the second pooling module is used for performing pooling treatment on voxel features respectively positioned in each subspace in the point cloud voxel features to obtain an aggregate point cloud voxel feature corresponding to each subspace, and obtaining a second top view feature from the aggregate point cloud voxel features respectively corresponding to each subspace.
And the superposition fusion module is used for superposing and fusing the first top view characteristic and the second top view characteristic to obtain a fused top view characteristic.
In one possible implementation manner, the parking space detection module 305 may include: the system comprises a segmentation feature map determining module, a regional clustering module, a rectangular parking space information determining module, a world coordinate converting module and a parking space screening module.
And the segmentation feature map determining module is used for carrying out planar feature extraction processing on the fused top view features to obtain a segmentation feature map.
And the regional clustering module is used for carrying out regional clustering on the segmentation feature map to obtain initial parking space information.
And the rectangular parking space information determining module is used for carrying out rectangular processing on the initial parking space information to obtain rectangular parking space information.
And the world coordinate conversion module is used for converting the rectangular parking space information into a world coordinate system to obtain the parking space information in the world coordinate system.
And the parking space screening module is used for screening out the parking space information which does not meet the preset size threshold value in the parking space information under the world coordinate system to obtain the parking space information for parking the vehicle.
In one possible implementation manner, the target point cloud frame includes a target laser point cloud frame and a target millimeter wave point cloud frame, wherein the target laser point cloud frame is a point cloud frame acquired by a laser radar sensor, and the target millimeter wave point cloud frame is a point cloud frame acquired by the millimeter wave radar sensor.
The embodiment of the application also provides parking space detection equipment. Optionally, fig. 4 shows a block diagram of a hardware structure of the parking space detection device, and referring to fig. 4, the hardware structure of the parking space detection device may include: at least one processor 401, at least one communication interface 402, at least one memory 403, and at least one communication bus 404;
In the embodiment of the present application, the number of the processor 401, the communication interface 402, the memory 403 and the communication bus 404 is at least one, and the processor 401, the communication interface 402 and the memory 403 complete communication with each other through the communication bus 404;
processor 401 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention, etc.;
the memory 403 may include a high-speed RAM memory, and may further include a non-volatile memory (non-volatile memory), etc., such as at least one magnetic disk memory;
Wherein the memory 403 stores a program, the processor 401 may call the program stored in the memory 403, the program being for:
Acquiring an image frame and at least one point cloud frame after space-time synchronization, wherein different point cloud frames are acquired based on different types of radar sensors on a vehicle, the image frame is acquired based on a camera on the vehicle, the difference value of the acquisition time stamp of each point cloud frame and the image frame is smaller than a preset time threshold, and the image frame and each point cloud frame are in a coordinate system of a pre-built abstract point cloud space taking the vehicle as the center;
Taking the acquired image frames as target image frames, taking at least one acquired point cloud frame as a target point cloud frame, respectively extracting characteristics of the target image frames and the target point cloud frames, taking the characteristics of the target image frames as image coding characteristics, and taking the characteristics of the target point cloud frames as point cloud characteristics, wherein semantic information and depth prediction information of the target image frames are fused in the image coding characteristics, and the point cloud characteristics represent material information of a point cloud target and distance information between the point cloud target and a vehicle;
respectively carrying out voxelization treatment on the image coding feature and the point cloud feature in an abstract point cloud space to obtain an image voxel feature and a point cloud voxel feature;
Carrying out top view fusion processing on the image voxel features and the point cloud voxel features in the abstract point cloud space to obtain fused top view features;
based on the fused top view features, parking space information for the vehicle parking is determined.
Alternatively, the refinement function and the extension function of the program may be described with reference to the above.
The embodiment of the application also provides a readable storage medium, on which a computer program is stored, which when executed by a processor, implements the parking space detection method as described above.
Alternatively, the refinement function and the extension function of the program may be described with reference to the above.
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for identical or similar parts between the embodiments, reference may be made to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A parking space detection method, characterized by comprising the following steps:
acquiring an image frame and at least one point cloud frame after space-time synchronization, wherein different point cloud frames are acquired based on different types of radar sensors on a vehicle, the image frame is acquired based on a camera on the vehicle, the difference between the acquisition time stamp of the image frame and that of each point cloud frame is smaller than a preset time threshold, and the image frame and each point cloud frame are in a coordinate system of an abstract point cloud space which is built in advance and takes the vehicle as the center;
Taking the obtained image frame as a target image frame, taking the obtained at least one point cloud frame as a target point cloud frame, respectively extracting characteristics of the target image frame and the target point cloud frame, taking the characteristics of the target image frame as image coding characteristics and taking the characteristics of the target point cloud frame as point cloud characteristics, wherein semantic information and depth prediction information of the target image frame are fused in the image coding characteristics, and the point cloud characteristics represent material information of a point cloud target and distance information of the point cloud target and the vehicle;
respectively voxelizing the image coding feature and the point cloud feature in the abstract point cloud space to obtain an image voxel feature and a point cloud voxel feature;
performing top view fusion processing on the image voxel feature and the point cloud voxel feature in the abstract point cloud space to obtain fused top view features;
and determining parking space information for parking the vehicle based on the fused top view features.
2. The parking space detection method according to claim 1, wherein the acquiring an image frame and at least one point cloud frame after the spatio-temporal synchronization includes:
acquiring an image frame set and a point cloud frame set, wherein the image frame set comprises a plurality of image frames acquired by the camera, the point cloud frame set comprises a plurality of point cloud frames acquired by a plurality of types of radar sensors respectively, and each image frame in the image frame set and each point cloud frame in the point cloud frame set correspond to an acquisition time stamp;
determining an image frame and at least one point cloud frame with acquisition time stamps meeting the preset time threshold requirement from the image frame set and the point cloud frame set;
And constructing the abstract point cloud space centering on the vehicle, and converting the image frame and at least one point cloud frame meeting the preset time threshold requirement into a coordinate system of the abstract point cloud space to obtain the image frame and at least one point cloud frame after the time-space synchronization.
3. The parking space detection method according to claim 1, wherein extracting features of the target image frame includes:
scaling the target image frame to a preset size to obtain a scaled image frame;
Extracting semantic features of the scaled image frames, and up-sampling the semantic features to obtain feature information of target dimensions, wherein preset dimensions contained in the feature information of the target dimensions represent depth prediction information of the target image frames;
And determining the image coding characteristics of the target image frame according to the characteristic information of the target dimension.
4. The parking space detection method according to claim 1, wherein extracting the characteristics of the target point cloud frame comprises:
Screening out point cloud data which does not meet the preset space range corresponding to the abstract point cloud space in the target point cloud frame to obtain a target point cloud frame after screening the point cloud data;
And extracting the characteristics of the target point cloud frame after the point cloud data are screened.
5. The parking space detection method according to claim 1, wherein the abstract point cloud space is divided into a plurality of subspaces in advance, and the resolution of each subspace is the same;
voxelizing the image coding feature in the abstract point cloud space comprises the following steps:
Determining a spatial coordinate tensor corresponding to the image coding feature;
And voxelizing the image coding feature and the space coordinate tensor to divide the image coding feature into the plurality of subspaces so as to obtain the image voxel feature.
6. The parking space detection method according to claim 5, wherein the determining the spatial coordinate tensor corresponding to the image coding feature includes:
Determining depth prediction information of pixel points in the target image frame;
Determining three-dimensional coordinate information of the pixel points in the target image frame according to the depth prediction information of the pixel points in the target image frame and the two-dimensional coordinate information of the pixel points in the target image frame;
and determining a space coordinate tensor corresponding to the image coding feature according to the three-dimensional coordinate information of the pixel point in the target image frame.
7. The parking space detection method according to claim 1, wherein the performing a top view fusion processing on the image voxel feature and the point cloud voxel feature in the abstract point cloud space to obtain a fused top view feature includes:
Carrying out pooling treatment on voxel features respectively positioned in each subspace in the image voxel features to obtain aggregate image voxel features corresponding to each subspace, and obtaining a first top view feature by the aggregate image voxel features respectively corresponding to each subspace;
performing pooling treatment on voxel features respectively located in each subspace in the point cloud voxel features to obtain aggregate point cloud voxel features corresponding to each subspace, and obtaining a second top view feature by the aggregate point cloud voxel features respectively corresponding to each subspace;
And superposing and fusing the first top view feature and the second top view feature to obtain the fused top view feature.
8. The parking space detection method according to claim 1, wherein the determining parking space information for the vehicle to park based on the fused top view feature includes:
Performing planar feature extraction processing on the fused top view features to obtain a segmentation feature map;
carrying out regional clustering on the segmentation feature map to obtain initial parking space information;
carrying out rectangular processing on the initial parking space information to obtain rectangular parking space information;
converting the rectangular parking space information into a world coordinate system to obtain the parking space information in the world coordinate system;
and screening out the parking space information which does not meet the preset size threshold value in the parking space information in the world coordinate system to obtain the parking space information for parking the vehicle.
9. The parking space detection method according to any one of claims 1 to 8, wherein the target point cloud frame includes a target laser point cloud frame and a target millimeter wave point cloud frame, wherein the target laser point cloud frame is a point cloud frame acquired by a laser radar sensor, and the target millimeter wave point cloud frame is a point cloud frame acquired by a millimeter wave radar sensor.
10. A parking space detection device, characterized by comprising:
The target frame acquisition module is used for acquiring an image frame and at least one point cloud frame after space-time synchronization, wherein different point cloud frames are acquired based on different types of radar sensors on a vehicle, the image frame is acquired based on a camera on the vehicle, the difference between the acquisition time stamp of the image frame and that of each point cloud frame is smaller than a preset time threshold, and the image frame and each point cloud frame are in a coordinate system of a pre-constructed abstract point cloud space taking the vehicle as the center;
The feature extraction module is used for taking the acquired image frames as target image frames, taking the acquired at least one point cloud frame as a target point cloud frame, respectively extracting features of the target image frames and the target point cloud frame, taking the features of the target image frames as image coding features and taking the features of the target point cloud frames as point cloud features, wherein semantic information and depth prediction information of the target image frames are fused in the image coding features, and the point cloud features represent material information of a point cloud target and distance information of the point cloud target and the vehicle;
The voxelization module is used for respectively voxelizing the image coding feature and the point cloud feature in the abstract point cloud space to obtain an image voxel feature and a point cloud voxel feature;
the feature fusion module is used for carrying out top view fusion processing on the image voxel features and the point cloud voxel features in the abstract point cloud space to obtain fused top view features;
and the parking space detection module is used for determining parking space information for parking the vehicle based on the fused top view characteristics.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211375789.XA | 2022-11-04 | 2022-11-04 | Parking space detection method and device
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211375789.XA | 2022-11-04 | 2022-11-04 | Parking space detection method and device
Publications (1)
Publication Number | Publication Date |
---|---|
CN117994755A (en) | 2024-05-07
Family
ID=90889930
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211375789.XA | Parking space detection method and device | 2022-11-04 | 2022-11-04
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117994755A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118522178A (en) * | 2024-07-22 | 2024-08-20 | 深圳云游四海信息科技有限公司 | Parking space state detection method, system, device and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 