CN118244281A - Vision and radar fusion target positioning method and device - Google Patents
Vision and radar fusion target positioning method and device
- Publication number
- CN118244281A (application CN202410380680.8A)
- Authority
- CN
- China
- Prior art keywords
- image data
- radar
- target object
- state
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/02—Systems using the reflection of electromagnetic waves other than radio waves
- G01S17/06—Systems determining position data of a target
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/86—Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/88—Lidar systems specially adapted for specific applications
- G01S17/93—Lidar systems specially adapted for specific applications for anti-collision purposes
- G01S17/931—Lidar systems specially adapted for specific applications for anti-collision purposes of land vehicles
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S7/00—Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
- G01S7/48—Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S17/00
- G01S7/4808—Evaluating distance, position or velocity data
Abstract
The invention relates to a vision and radar fusion target positioning method, which comprises the following steps. S100: acquiring radar point cloud data and first visible light image data, and unifying their modalities to form initial data. S200: acquiring the initial data, designing a depth completion network, and outputting a dense depth map; acquiring two-dimensional bounding box information of a target object, combining the dense depth map with the bounding box information, and extracting the depth information of the target object from the dense depth map. S300: performing a three-dimensional position calculation using the extracted depth information, the internal parameters of the looking-around camera and the external parameters between the vehicle-mounted laser radar and the looking-around camera, to obtain the coordinate position of the target object in three-dimensional space. By fusing vision and radar data, the radar point cloud information is supplemented and enhanced and a denser depth map is generated, thereby enhancing target identification and enabling accurate positioning.
Description
Technical Field
The invention relates to the technical field of road traffic, in particular to a vision and radar integrated target positioning method and device.
Background
Autonomous driving technology has very broad application prospects. In urban traffic, autonomous vehicles can relieve road congestion, reduce traffic accidents and improve the utilization of social resources. In the logistics field, autonomous driving can also improve transportation efficiency and reduce costs, providing a more intelligent solution for the express delivery and logistics industries, among others.
The Chinese invention patent application No. CN202111113213.1 provides a method, a device, a terminal and a medium for positioning a target object. The method comprises the following steps: acquiring target point cloud data collected by a radar arranged within a set range corresponding to the target object while the target object is moving; obtaining target point cloud distribution information within a target range corresponding to the current position of the target object based on the target point cloud data; determining, from the point cloud distribution information included in a pre-established map, candidate point cloud distribution information that matches the target point cloud distribution information; and determining the position based on the candidate point cloud distribution information for which the target error function of the target point cloud data, evaluated against the corresponding point cloud data in the candidate point cloud distribution information after rotation and/or translation, takes its minimum value.
However, when radar point cloud data alone are used for positioning, the laser beams emitted by the radar spread out in space as the distance increases, so the number of valid returned points decreases. With such sparse point cloud data, the radar system cannot accurately identify the shape, size and position of a target, so the target is difficult to identify or is identified incorrectly.
Disclosure of Invention
The invention provides a visual and radar integrated target positioning method and device aiming at the technical problems in the prior art.
The technical scheme for solving the technical problems is as follows: a vision and radar fused target positioning method, the method comprising:
based on a vehicle-mounted laser radar and a vehicle-mounted camera set, S100: acquiring radar point cloud data and first visible light image data, and unifying their modalities to form initial data;
S200: acquiring the initial data, designing a depth completion network, and outputting a dense depth map; acquiring two-dimensional bounding box information of a target object, combining the dense depth map with the bounding box information, and extracting the depth information of the target object from the dense depth map;
S300: performing three-dimensional position calculation by using the extracted depth information, the internal parameters of the looking-around camera and the external parameters between the vehicle-mounted laser radar and the looking-around camera to obtain the coordinate position of the target object in the three-dimensional space;
wherein the two-dimensional image coordinates are converted into normalized coordinates using the internal parameters of the looking-around camera, the normalized coordinates are converted into three-dimensional world coordinates by combining the external parameters between the vehicle-mounted laser radar and the looking-around camera, and the depth information provides the scale information in this conversion process.
Further, the method further comprises the following steps: s400: acquiring the coordinate position of the target object in the three-dimensional space, matching the coordinate position with a pre-manufactured high-precision map, and determining the final position of the target object on the map; the method comprises the steps of converting the coordinate position of a target object into a global coordinate system identical to a high-precision map, acquiring motion information by using a sensor, wherein the motion information comprises speed and direction information of the current target object, simultaneously matching the converted coordinate and motion information with a plurality of road networks in the high-precision map by using a map matching algorithm to obtain matching conformity, and further determining the final position of the target object on the map.
Further, if only one candidate has a matching conformity above 96%, that road network position is taken as the final position; if more than one candidate has a matching conformity above 96%, each such candidate is marked as a suspected position.
Further, second visible light image data and radar point cloud data are acquired, key features such as edges, corner points and textures are extracted from the second visible light image data to distinguish it from the first visible light image data, and modal unification with the radar point cloud data at the corresponding position is performed to form recheck data; the suspected positions are matched against the recheck data by the map matching algorithm to obtain new matching conformity values, and the position with the highest matching conformity is taken as the final position.
Further, second visible light image data are acquired and divided into front image data and rear image data, preprocessing and stabilizing processing are respectively carried out on the front image data and the rear image data, key feature points are extracted from the processed front image data and rear image data, feature points in the front image data and the rear image data are matched through a feature matching algorithm, a corresponding relation between images is established, and third visible light image data is formed.
Further, analyzing the characteristic point motion between the continuous frames of the front image data and the rear image data to estimate the motion state of the target object, fusing the information of the continuous multi-frame images, improving the accuracy and the robustness of state estimation, tracking the characteristic points and obtaining the motion trail between the continuous frames; and estimating state vectors such as speed, acceleration and direction change of the target object by using the acquired motion track, and predicting the state of the target object at the next moment by using a Kalman filtering algorithm according to the state vectors so as to realize more accurate positioning.
Further, the kalman filtering algorithm step includes:
S501: initializing parameters, an initial state vector and a covariance matrix; the state vector contains an estimated value of the system state and a current speed value parameter; the covariance matrix represents the uncertainty of the state vector, the size of the uncertainty is the same as the dimension of the state vector, and diagonal elements of the covariance matrix represent the variances of all state variables;
S502: predicting a state at a next time using a dynamic model of the system, applying the state and control input at a previous time to the model to obtain a predicted state; at the same time as predicting the state, the covariance matrix needs to be updated to reflect the increase in uncertainty due to process noise.
Further, the steps further include: s503: calculating a measurement residual error and a Kalman gain by using the features extracted from the radar point cloud data and the third visible light image data and combining the predicted value of the state vector; the measurement residual is the difference between the actual measurement value and the predicted value; the Kalman gain is used as a weight factor for balancing the predicted state and the current measured value; the Kalman gain is calculated according to the covariance matrix and the measurement noise;
S504: updating the state vector by using the Kalman gain and the measurement residual error, and fusing the predicted and measured information to obtain the optimal state estimation at the current moment; updating the covariance matrix to reflect changes in the uncertainty of the state estimate due to measurement noise and kalman gain, continuously fusing the radar point cloud data and the third visible light image data, and updating the state estimate of the vehicle.
A vision and radar fused target positioning device, comprising:
The acquisition unit is used for acquiring visible light image data and radar point cloud data and forming initial data, wherein the looking-around camera acquires first visible light image data, the front camera and the rear camera acquire second visible light image data, and the laser radar acquires radar point cloud data;
the processing unit is used for designing a depth completion network to process the initial data, generating a dense depth map, acquiring two-dimensional bounding box information of a target object, and extracting and persistently storing the depth information, so that in the subsequent target positioning process the stored information can be read directly without being recalculated;
The prediction unit is used for predicting the state of the target object at the next moment, wherein a Kalman filtering algorithm is used for data fusion, and the target object in the high-speed motion state is predicted;
And the positioning unit is used for generating the final position of the target object on the map, wherein the coordinate position in the three-dimensional space is acquired by using three-dimensional position calculation, the matching conformity is obtained by using a map matching algorithm, and the final position of the target object on the map is determined.
A computer readable storage medium having stored therein a computer software program which when executed by a processor implements the vision-radar fusion target positioning method described above.
The beneficial effects of the invention are as follows: by fusing the acquired image data with the radar point cloud data, the radar point cloud information is supplemented and enhanced; a depth completion network is designed so that sparse point cloud data are interpolated by the algorithm into a denser depth map, thereby enhancing target identification and enabling accurate positioning.
Drawings
FIG. 1 is a flow chart of a method for locating a target by combining vision and radar according to the present invention;
FIG. 2 is a flowchart of an algorithm of a vision and radar fusion target positioning method according to the present invention;
FIG. 3 is a diagram of a visual and radar integrated target positioning device according to the present invention.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In the description of the present application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more of the described features. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
In the description of the present application, the term "for example" is used to mean "serving as an example, instance, or illustration." Any embodiment described as "for example" in this disclosure is not necessarily to be construed as preferred or advantageous over other embodiments. The following description is presented to enable any person skilled in the art to make and use the application. In the following description, details are set forth for purposes of explanation. It will be apparent to one of ordinary skill in the art that the present application may be practiced without these specific details. In other instances, well-known structures and processes have not been described in detail so as not to obscure the description of the application with unnecessary detail. Thus, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
Example 1
As shown in fig. 1, the method for positioning a target by combining vision and radar according to the application comprises the following steps: based on the vehicle-mounted laser radar and the vehicle-mounted camera set, S100: and acquiring radar point cloud data and first visible light image data, and performing modal unification on the radar point cloud data and the first visible light image data to form initial data.
The vehicle-mounted camera set comprises a looking-around camera, which provides 360-degree environment perception around the target object; its scanning area is smaller than that of the laser radar, and the looking-around camera is used to collect the first visible light image data.
Specifically, the modal unification comprises time synchronization, spatial calibration and data preprocessing of the radar point cloud data and the first visible light image data. For time synchronization, the sensors are synchronized with a unified clock source, and the closest time points are found by analysing the timestamps of the sensor data streams to approximately synchronize the data, ensuring that data captured by different sensors are obtained at the same or similar time points. For spatial calibration, a unified reference coordinate system is selected, clearly visible features are used as calibration points, the rotation matrix R and translation vector T between the radar point cloud data and the first visible light image data are calculated and optimized, and, with the calibration points as centers, the optimized rotation matrix R and translation vector T are used to align the radar point cloud data and the first visible light image data to the reference coordinate system. For data preprocessing, denoising and filtering are applied to the radar point cloud data, and brightness adjustment, contrast enhancement and distortion correction are applied to the first visible light image data.
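By way of illustration only (the patent does not disclose code), a minimal Python sketch of this modal unification step might pair radar frames with the closest image timestamps and express the point cloud in the camera reference frame using the calibrated R and T. The function names, the 0.05 s gap threshold and the data layouts are assumptions for the sketch.

```python
import numpy as np

def nearest_timestamp_pairs(radar_stamps, image_stamps, max_gap=0.05):
    """Pair each radar frame with the image whose timestamp is closest.

    radar_stamps, image_stamps: 1-D arrays of seconds from a common clock source.
    max_gap: pairs farther apart than this (seconds) are discarded.
    """
    pairs = []
    for i, t in enumerate(radar_stamps):
        j = int(np.argmin(np.abs(image_stamps - t)))
        if abs(image_stamps[j] - t) <= max_gap:
            pairs.append((i, j))
    return pairs

def align_points_to_reference(points_lidar, R, T):
    """Apply the optimized rotation R (3x3) and translation T (3,) to express
    lidar points (N x 3) in the chosen reference coordinate system."""
    return points_lidar @ R.T + T
```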
S200: acquiring initial data, designing a depth complement network, and outputting a dense depth map; acquiring two-dimensional boundary frame information of a target object, combining the dense depth map with the two-dimensional boundary frame information, and extracting depth information of the target object on the dense depth map;
Acquiring the initial data and designing the depth completion network comprises: collecting and labeling a group of initial data as annotation data, generating a sparse depth map from the annotation data as the input of the depth completion network, using the remaining initial data as ground-truth labels during training, and outputting a dense depth map.
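The patent does not specify the architecture of the depth completion network. As a purely hypothetical stand-in, a small encoder-decoder in PyTorch that takes the RGB image and the sparse depth map as a 4-channel input and regresses a dense depth map could look as follows (layer sizes and the L1 loss are assumptions, not taken from the disclosure).

```python
import torch
import torch.nn as nn

class TinyDepthCompletion(nn.Module):
    """Minimal encoder-decoder: RGB (3 ch) + sparse depth (1 ch) -> dense depth (1 ch)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, rgb, sparse_depth):
        x = torch.cat([rgb, sparse_depth], dim=1)   # B x 4 x H x W (H, W divisible by 4)
        return self.decoder(self.encoder(x))        # B x 1 x H x W dense depth

# Training would minimize a regression loss against the ground-truth depth labels,
# e.g. loss = torch.nn.functional.l1_loss(pred_depth, gt_depth).
```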
The sparse depth map is generated by projecting radar point clouds onto an image plane, the dense depth map is consistent with the size of an input image, and each pixel point has a corresponding depth value which represents the distance between a certain point in a scene and a camera.
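A minimal sketch of this projection step, assuming a pinhole intrinsic matrix K and points already expressed in the camera frame (z pointing forward), is given below; pixels without a radar return stay zero, and when several points fall on the same pixel the last one written is kept.

```python
import numpy as np

def project_to_sparse_depth(points_cam, K, height, width):
    """Project camera-frame radar points (N x 3) onto the image plane and
    build a sparse depth map of the same size as the input image."""
    depth = np.zeros((height, width), dtype=np.float32)
    valid = points_cam[:, 2] > 0.1                  # keep points in front of the camera
    pts = points_cam[valid]
    uvw = (K @ pts.T).T                             # homogeneous pixel coordinates
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth[v[inside], u[inside]] = pts[inside, 2]    # depth value = z in the camera frame
    return depth
```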
A two-dimensional target detection algorithm is applied to the visible light image in the initial data to identify and locate the target object and obtain its two-dimensional bounding box information; the dense depth map produced by the depth completion network is combined with the two-dimensional bounding box information, and the depth information of the target object is extracted from the dense depth map.
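The patent does not state how a single depth value is taken from the pixels inside the box; as one robust choice (an assumption for this sketch), the median of the valid depths inside the bounding box can be used.

```python
import numpy as np

def target_depth_from_bbox(dense_depth, bbox):
    """Extract one depth estimate for a detected target.

    dense_depth: H x W dense depth map from the completion network.
    bbox: (x_min, y_min, x_max, y_max) in pixel coordinates.
    Returns the median depth inside the box (robust to background pixels),
    or None if the box contains no valid depth.
    """
    x0, y0, x1, y1 = [int(round(c)) for c in bbox]
    patch = dense_depth[y0:y1, x0:x1]
    valid = patch[patch > 0]
    return float(np.median(valid)) if valid.size else None
```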
Specifically, the two-dimensional target detection algorithm identifies objects present in the image by feature extraction, classification and localization, and marks the target object with a rectangular box; this two-dimensional bounding box represents the position and size of the target object in the image.
S300: performing three-dimensional position calculation by using the extracted depth information, the internal parameters of the looking-around camera and the external parameters between the vehicle-mounted laser radar and the looking-around camera to obtain the coordinate position of the target object in the three-dimensional space;
In step S300, the position of the target object in three-dimensional space is calculated through the projection geometry relationship using the perspective projection method: the two-dimensional image coordinates are converted into normalized coordinates using the internal parameters of the looking-around camera, the normalized coordinates are converted into three-dimensional world coordinates by combining the external parameters between the vehicle-mounted laser radar and the looking-around camera, and the depth information provides the scale information in this conversion process.
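A short sketch of this back-projection, assuming a pinhole model with intrinsic matrix K and camera-to-world extrinsics R, t derived from the lidar-camera calibration (the argument names are illustrative):

```python
import numpy as np

def backproject_to_world(u, v, depth, K, R_cam_to_world, t_cam_to_world):
    """Back-project a pixel (u, v) with known depth into 3-D world coordinates.

    K: 3x3 camera intrinsic matrix; R, t: extrinsics mapping camera -> world.
    The depth supplies the metric scale that the image alone cannot provide.
    """
    pixel = np.array([u, v, 1.0])
    ray_cam = np.linalg.inv(K) @ pixel      # normalized coordinates, z component = 1
    p_cam = ray_cam * depth                 # multiplying by the z-depth gives the metric point
    return R_cam_to_world @ p_cam + t_cam_to_world
```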
S400: acquiring the coordinate position of the target object in the three-dimensional space, matching the coordinate position with a pre-manufactured high-precision map, and determining the final position of the target object on the map; the method comprises the steps of converting the coordinate position of a target object into a global coordinate system identical to a high-precision map, acquiring motion information by using a sensor, wherein the motion information comprises speed and direction information of the current target object, simultaneously matching the converted coordinate and motion information with a plurality of road networks in the high-precision map by using a map matching algorithm to obtain matching conformity, and further determining the final position of the target object on the map.
The map matching algorithm receives the converted coordinates and motion information and sweeps the scanned point cloud over candidate positions on the map. At each candidate position, the error and distance between each scanned point and the corresponding point on the high-precision map are calculated and the squared errors are summed; candidate road sections resembling the target object's surroundings are selected according to this sum of squared errors. The distance, angle difference and trajectory information between the candidate roads are then calculated from the coordinate data to obtain an error value range, from which the matching conformity is derived: the smaller the error value range, the higher the matching conformity.
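A minimal sketch of this candidate scoring, assuming two hypothetical helpers that stand in for the pose transform and the high-precision-map lookup (neither is defined in the patent), and an assumed monotone mapping from error sum to conformity:

```python
import numpy as np

def match_conformity(scan_in_map_frame, map_points_for, candidates):
    """Score candidate map positions by the sum of squared point errors.

    scan_in_map_frame(cand): assumed helper returning the N x d scan points
        transformed into the map frame under candidate pose `cand`.
    map_points_for(cand): assumed helper returning the N x d corresponding
        points looked up from the high-precision map.
    Returns the best candidate and a conformity score; a smaller error sum
    yields a higher conformity, mirroring the description above.
    """
    best_cand, best_err = None, np.inf
    for cand in candidates:
        scan = scan_in_map_frame(cand)
        ref = map_points_for(cand)
        err = float(np.sum((scan - ref) ** 2))
        if err < best_err:
            best_cand, best_err = cand, err
    conformity = 1.0 / (1.0 + best_err)   # monotone: lower error -> higher conformity
    return best_cand, conformity
```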
The technical scheme provided by the embodiment of the application at least has the following technical effects or advantages:
The application adopts a vision and radar data fusion technology: the acquired image data and radar point cloud data are fused, the radar point cloud information is supplemented and enhanced, a depth completion network is designed, and sparse point cloud data are interpolated by the algorithm into a denser depth map, thereby enhancing target identification and enabling accurate positioning.
Example 2
In embodiment 1, precise positioning is achieved using a technique of combining vision and radar, but when identifying the surrounding area of the target object, there are a large number of road areas with high similarity, resulting in positioning errors.
If only one candidate has a matching conformity above 96%, that road network position is taken as the final position; if more than one candidate has a matching conformity above 96%, each such candidate is marked as a suspected position.
The vehicle-mounted camera set further comprises a front camera and a rear camera, which perform long-distance scanning of the areas in front of and behind the target object respectively. Their scanning areas are distributed in a sector shape centered on the target object, and their scanning ranges are larger than that of the looking-around camera; the part of the scanning area that exceeds the range of the looking-around camera is the difference area, for which the second visible light image data are collected.
Second visible light image data and radar point cloud data are acquired, key features such as edges, corner points and textures are extracted from the second visible light image data to distinguish it from the first visible light image data, and modal unification with the radar point cloud data at the corresponding position is performed to form recheck data; the suspected positions are matched against the recheck data by the map matching algorithm to obtain new matching conformity values, and the position with the highest matching conformity is taken as the final position.
The technical scheme provided by the embodiment of the application at least has the following technical effects or advantages:
According to the application, the front camera and the rear camera are adopted to carry out long-distance scanning to obtain second visible light image data, the front and rear expanded scanning area of the target object is subjected to secondary identification, and the suspected position is independently identified through the map matching algorithm, so that the effect of accurately positioning the highly similar road area is realized.
Example 3
In the above embodiments, accurate positioning of highly similar road areas is achieved by secondary recognition of the front and rear areas. However, when the target object moves at high speed, a relative displacement occurs between the position at which the first laser beam is emitted and the position at which the last laser beam is emitted (after the radar completes one scan); the size of this displacement depends on the vehicle speed. The point clouds of the target object captured at these two moments therefore present different information in the coordinate system, which causes positioning lag and hence positioning errors.
And acquiring and dividing second visible light image data, dividing the second visible light image data into front image data and rear image data, respectively preprocessing and stabilizing the front image data and the rear image data, extracting key feature points from the processed front image data and rear image data, matching the feature points in the front image data and the rear image data by using a feature matching algorithm, establishing a corresponding relation between images, and forming third visible light image data.
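The patent does not name a specific feature matching algorithm; as one possible instantiation (an assumption), ORB keypoints with a brute-force Hamming matcher in OpenCV could establish the correspondences between the processed images.

```python
import cv2

def match_features(img_a, img_b, max_matches=200):
    """Detect ORB keypoints in two grayscale images and return matched point
    pairs, sorted by descriptor distance (best first)."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return []
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
    return [(kp_a[m.queryIdx].pt, kp_b[m.trainIdx].pt) for m in matches[:max_matches]]
```

Tracking these matched points across consecutive frames yields the motion trajectories used for state estimation below.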
Analyzing the characteristic point motion between the front image data and the rear view image data continuous frames to estimate the motion state of the target object, fusing the information of continuous multi-frame images, improving the accuracy and the robustness of state estimation, tracking the characteristic points and obtaining the motion trail between the continuous frames; and estimating state vectors such as speed, acceleration and direction change of the target object by using the acquired motion track, and predicting the state of the target object at the next moment by using a Kalman filtering algorithm according to the state vectors so as to realize more accurate positioning.
The front camera collects front image data, the rear camera collects rear view image data, and preprocessing is carried out on the collected front image and rear view image, wherein the preprocessing comprises denoising, distortion correction, brightness and contrast adjustment and is used for improving image quality; for a high-speed dynamic environment, stabilizing treatment is carried out through image registration, so that the influence of rapid motion on images is reduced; the motion state of the target object includes velocity, acceleration, angular velocity, and angular acceleration.
The front image and the rear image acquired at the same moment are divided into a group, a plurality of groups of front images and rear images are acquired, the data features and the state vectors of the rear images are extracted and used as the input of a Kalman filtering algorithm, the front image data features and the state vectors at the same moment are used for verifying the state predicted by the Kalman filtering algorithm, and the adaptability of the Kalman filtering algorithm to the image data in the current scene is improved.
As shown in fig. 2, the kalman filtering algorithm includes:
S501: initializing parameters, an initial state vector and a covariance matrix; the state vector contains an estimated value of the system state and a current speed value parameter; the covariance matrix represents the uncertainty of the state vector, the size of the uncertainty is the same as the dimension of the state vector, and diagonal elements of the covariance matrix represent the variances of all state variables;
in particular, covariance matrices are a tool to describe the correlation between the individual components of a multidimensional random variable. In kalman filtering, it is used to represent the uncertainty of the state vector and the correlation between the individual state variables. By constantly updating the covariance matrix, the Kalman filtering algorithm can dynamically adjust the uncertainty of the state estimation, thereby achieving more accurate state estimation. At initialization, the covariance matrix is set according to the initial uncertainty on the system state. The specific values depend on a priori knowledge of the state of the system, setting different variance values for the uncertainty of the initial vector parameters. For example, if the uncertainty for the initial position is large and the uncertainty for the velocity is small, a large position variance and a small velocity variance may be provided.
S502: predicting a state at a next time using a dynamic model of the system, applying the state and control input at a previous time to the model to obtain a predicted state; at the same time as predicting the state, the covariance matrix needs to be updated to reflect the increase in uncertainty due to process noise.
Specifically, the dynamic model uses a motion equation of the vehicle to describe the dynamic behavior of the vehicle, and is built according to the physical characteristics and kinematic constraints of the vehicle, so as to predict the vehicle state at the next moment. Generating a state transition matrix and a control input matrix according to an initial state vector of the target object, wherein the state transition matrix represents a process of how a state evolves with time under the condition of no control input; the control input matrix describes how the control input affects the change of state.
The method comprises the steps of modeling process noise into zero-mean Gaussian white noise, wherein the process noise represents uncertainty and external interference of a model, a covariance matrix is used for describing uncertainty of the noise, selection of the covariance matrix is adjusted according to actual application scenes and experience, and the covariance matrix is set to be a diagonal matrix, wherein diagonal elements represent variances of process noise of all state variables.
S503: calculating a measurement residual error and a Kalman gain by using the features extracted from the radar point cloud data and the third visible light image data and combining the predicted value of the state vector; the measurement residual is the difference between the actual measurement value and the predicted value; the Kalman gain is used as a weight factor for balancing the predicted state and the current measured value; the Kalman gain is calculated from the covariance matrix and the measurement noise.
S504: updating the state vector by using the Kalman gain and the measurement residual error, and fusing the predicted and measured information to obtain the optimal state estimation at the current moment; updating the covariance matrix to reflect changes in the uncertainty of the state estimate due to measurement noise and kalman gain, continuously fusing the radar point cloud data and the third visible light image data, and updating the state estimate of the vehicle.
Specifically, the third visible light image data and the radar point cloud data are subjected to data fusion by using a Kalman filtering algorithm, the state of the target object at the next moment is predicted, and the map matching algorithm of the embodiment is used for matching according to the predicted data state information at the next moment, so that the final position of the target object on the map is obtained.
Specifically, several groups of front and rear images acquired at the same moments are input in batches as a comparison group. The front image data features and state vectors at each moment are used to verify the state predicted by the Kalman filtering algorithm, the real-time state vector of the target object is continuously updated, and the performance of the Kalman filtering algorithm in the current scene is improved. The subsequent front and rear images are then feature-fused to form the third visible light image data, which exposes the correlation between the front and rear image feature states to the greatest extent and further improves positioning accuracy.
The technical scheme provided by the embodiment of the application at least has the following technical effects or advantages:
According to the application, the Kalman filtering algorithm is adopted to conduct data fusion, the state of the target object at the next moment is predicted, effective data information can be obtained under the high-speed motion state, positioning is conducted, and the adaptability of the Kalman filtering algorithm under the scene is continuously improved by using a plurality of groups of front images and rear images obtained at the same moment as a comparison group for batch input, so that the effect of accurately positioning the predicted motion state is realized.
Example 4
Based on the above embodiment, as shown in fig. 3, in an embodiment of the present application, there is further provided a target positioning device for combining vision and radar, including: the acquisition unit is used for acquiring visible light image data and radar point cloud data and forming initial data, wherein the looking-around camera acquires first visible light image data, the front camera and the rear camera acquire second visible light image data, and the laser radar acquires radar point cloud data;
the processing unit is used for designing a depth completion network to process the initial data, generating a dense depth map, acquiring two-dimensional bounding box information of a target object, and extracting and persistently storing the depth information, so that in the subsequent target positioning process the stored information can be read directly without being recalculated;
The prediction unit is used for predicting the state of the target object at the next moment, wherein a Kalman filtering algorithm is used for data fusion, and the target object in the high-speed motion state is predicted;
And the positioning unit is used for generating the final position of the target object on the map, wherein the coordinate position in the three-dimensional space is acquired by using three-dimensional position calculation, the matching conformity is obtained by using a map matching algorithm, and the final position of the target object on the map is determined.
The embodiment of the application also provides a computer readable storage medium, wherein the storage medium stores a computer software program, and the computer software program realizes the vision and radar fusion target positioning method when being executed by a processor.
In the foregoing embodiments, the descriptions of the embodiments are focused on, and for those portions of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (10)
1. A method for locating a target by combining vision with radar, the method comprising: based on the vehicle-mounted laser radar and the vehicle-mounted camera set, S100: acquiring radar point cloud data and first visible light image data, and unifying the radar point cloud data and the first visible light image data in a mode to form initial data;
S200: acquiring the initial data, designing a depth completion network, and outputting a dense depth map; acquiring two-dimensional bounding box information of a target object, combining the dense depth map with the bounding box information, and extracting the depth information of the target object from the dense depth map;
S300: performing three-dimensional position calculation by using the extracted depth information, the internal parameters of the looking-around camera and the external parameters between the vehicle-mounted laser radar and the looking-around camera to obtain the coordinate position of the target object in the three-dimensional space;
wherein the two-dimensional image coordinates are converted into normalized coordinates using the internal parameters of the looking-around camera, the normalized coordinates are converted into three-dimensional world coordinates by combining the external parameters between the vehicle-mounted laser radar and the looking-around camera, and the depth information provides the scale information in the conversion process.
2. A vision-radar-fused target positioning method as defined in claim 1, further comprising: s400: acquiring the coordinate position of the target object in the three-dimensional space, matching the coordinate position with a pre-manufactured high-precision map, and determining the final position of the target object on the map; the method comprises the steps of converting the coordinate position of a target object into a global coordinate system identical to a high-precision map, acquiring motion information by using a sensor, wherein the motion information comprises speed and direction information of the current target object, simultaneously matching the converted coordinate and motion information with a plurality of road networks in the high-precision map by using a map matching algorithm to obtain matching conformity, and further determining the final position of the target object on the map.
3. The vision and radar fusion target positioning method according to claim 2, wherein if only one candidate has a matching conformity above 96%, that road network position is taken as the final position; and if more than one candidate has a matching conformity above 96%, each such candidate is marked as a suspected position.
4. The vision and radar fusion target positioning method according to claim 3, wherein second visible light image data and radar point cloud data are acquired, key features of edges, corner points and textures are extracted from the second visible light image data to distinguish it from the first visible light image data, and modal unification with the radar point cloud data at the corresponding position is performed to form recheck data; and the suspected positions are matched against the recheck data by the map matching algorithm to obtain new matching conformity values, the position with the highest matching conformity being taken as the final position.
5. The method for locating a target by combining vision and radar according to claim 4, wherein the second visible light image data is acquired and divided into front image data and rear image data, the front image data and the rear image data are preprocessed and stabilized respectively, key feature points are extracted from the processed front image data and rear image data, feature points in the front image data and rear image data are matched by using a feature matching algorithm, a corresponding relation between images is established, and third visible light image data is formed.
6. The method for positioning a target by combining vision and radar according to claim 5, wherein the method is characterized in that the motion state of a target object is estimated by analyzing the motion of characteristic points between continuous frames of front image data and rear image data, the information of continuous multi-frame images is combined, the accuracy and the robustness of state estimation are improved, the characteristic points are tracked, and the motion trail between the continuous frames is obtained; and estimating state vectors such as speed, acceleration and direction change of the target object by using the acquired motion track, and predicting the state of the target object at the next moment by using a Kalman filtering algorithm according to the state vectors so as to realize more accurate positioning.
7. The method for locating a target by combining vision and radar according to claim 6, wherein the kalman filtering algorithm step comprises:
S501: initializing parameters, an initial state vector and a covariance matrix; the state vector contains an estimated value of the system state and a current speed value parameter; the covariance matrix represents the uncertainty of the state vector, the size of the uncertainty is the same as the dimension of the state vector, and diagonal elements of the covariance matrix represent the variances of all state variables;
S502: predicting a state at a next time using a dynamic model of the system, applying the state and control input at a previous time to the model to obtain a predicted state; at the same time as predicting the state, the covariance matrix needs to be updated to reflect the increase in uncertainty due to process noise.
8. The vision and radar fusion target positioning method of claim 7, wherein the steps further comprise: s503: calculating a measurement residual error and a Kalman gain by using the features extracted from the radar point cloud data and the third visible light image data and combining the predicted value of the state vector; the measurement residual is the difference between the actual measurement value and the predicted value; the Kalman gain is used as a weight factor for balancing the predicted state and the current measured value; the Kalman gain is calculated according to the covariance matrix and the measurement noise;
S504: updating the state vector by using the Kalman gain and the measurement residual error, and fusing the predicted and measured information to obtain the optimal state estimation at the current moment; updating the covariance matrix to reflect changes in the uncertainty of the state estimate due to measurement noise and kalman gain, continuously fusing the radar point cloud data and the third visible light image data, and updating the state estimate of the vehicle.
9. A vision and radar fusion target positioning device, comprising:
The acquisition unit is used for acquiring visible light image data and radar point cloud data and forming initial data, wherein the looking-around camera acquires first visible light image data, the front camera and the rear camera acquire second visible light image data, and the laser radar acquires radar point cloud data;
the processing unit is used for designing a depth completion network to process the initial data, generating a dense depth map, acquiring two-dimensional bounding box information of a target object, and extracting and persistently storing the depth information, so that in the subsequent target positioning process the stored information can be read directly without being recalculated;
The prediction unit is used for predicting the state of the target object at the next moment, wherein a Kalman filtering algorithm is used for data fusion, and the target object in the high-speed motion state is predicted;
And the positioning unit is used for generating the final position of the target object on the map, wherein the coordinate position in the three-dimensional space is acquired by using three-dimensional position calculation, the matching conformity is obtained by using a map matching algorithm, and the final position of the target object on the map is determined.
10. A computer readable storage medium, characterized in that the storage medium has stored therein a computer software program which, when executed by a processor, implements a vision-radar-converged target positioning method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410380680.8A CN118244281A (en) | 2024-03-30 | 2024-03-30 | Vision and radar fusion target positioning method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410380680.8A CN118244281A (en) | 2024-03-30 | 2024-03-30 | Vision and radar fusion target positioning method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118244281A true CN118244281A (en) | 2024-06-25 |
Family
ID=91550626
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410380680.8A Pending CN118244281A (en) | 2024-03-30 | 2024-03-30 | Vision and radar fusion target positioning method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118244281A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118379367A (en) * | 2024-06-26 | 2024-07-23 | 中联重科股份有限公司 | Parameter calibration method for camera and radar, working machine and new energy vehicle |
CN118608570A (en) * | 2024-08-07 | 2024-09-06 | 深圳市浩瀚卓越科技有限公司 | Visual tracking correction method, device and equipment based on holder and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111563442B (en) | Slam method and system for fusing point cloud and camera image data based on laser radar | |
CN110988912B (en) | Road target and distance detection method, system and device for automatic driving vehicle | |
CN111882612B (en) | Vehicle multi-scale positioning method based on three-dimensional laser detection lane line | |
CN109345588B (en) | Tag-based six-degree-of-freedom attitude estimation method | |
CN110675418B (en) | Target track optimization method based on DS evidence theory | |
KR101856401B1 (en) | Method, apparatus, storage medium, and device for processing lane line data | |
Berrio et al. | Camera-LIDAR integration: Probabilistic sensor fusion for semantic mapping | |
CN111693972A (en) | Vehicle position and speed estimation method based on binocular sequence images | |
CN111340797A (en) | Laser radar and binocular camera data fusion detection method and system | |
CN110288659B (en) | Depth imaging and information acquisition method based on binocular vision | |
CN110569704A (en) | Multi-strategy self-adaptive lane line detection method based on stereoscopic vision | |
CN112257605B (en) | Three-dimensional target detection method, system and device based on self-labeling training sample | |
WO2020104423A1 (en) | Method and apparatus for data fusion of lidar data and image data | |
Muñoz-Bañón et al. | Targetless camera-LiDAR calibration in unstructured environments | |
CN118244281A (en) | Vision and radar fusion target positioning method and device | |
CN112825192B (en) | Object identification system and method based on machine learning | |
CN110674705A (en) | Small-sized obstacle detection method and device based on multi-line laser radar | |
CN108645375B (en) | Rapid vehicle distance measurement optimization method for vehicle-mounted binocular system | |
CN115937810A (en) | Sensor fusion method based on binocular camera guidance | |
Konrad et al. | Localization in digital maps for road course estimation using grid maps | |
CN112906777A (en) | Target detection method and device, electronic equipment and storage medium | |
JPH07103715A (en) | Method and apparatus for recognizing three-dimensional position and attitude based on visual sense | |
CN113436239A (en) | Monocular image three-dimensional target detection method based on depth information estimation | |
CN118429524A (en) | Binocular stereoscopic vision-based vehicle running environment modeling method and system | |
CN112712566A (en) | Binocular stereo vision sensor measuring method based on structure parameter online correction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |