CN115082712B - Target detection method and device based on radar-vision fusion and readable storage medium - Google Patents
- Publication number
- CN115082712B, CN202211014407.0A, CN202211014407A
- Authority
- CN
- China
- Prior art keywords
- target
- radar
- current
- matching
- visual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S13/00—Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
- G01S13/86—Combinations of radar systems with non-radar systems, e.g. sonar, direction finder
- G01S13/867—Combination of radar systems with cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Computer Networks & Wireless Communication (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Radar Systems Or Details Thereof (AREA)
Abstract
The application discloses a target detection method and device based on radar-vision fusion and a readable storage medium, wherein the method comprises the following steps: acquiring a current radar frame and a current video frame acquired at the current moment, and determining a current target matching result of the current radar frame and the current video frame, wherein the current target matching result comprises a matching relation between a radar target in the current radar frame and a visual target in the current video frame; obtaining historical target matching results of at least one historical radar frame and at least one historical video frame collected at a historical moment before the current moment, wherein the historical target matching results comprise matching relations between radar targets in the historical radar frames and visual targets in the historical video frames; and updating the current target matching result based on each historical target matching result to obtain the updated current target matching result. In this way, the matching accuracy and stability can be improved.
Description
Technical Field
The application relates to the technical field of target detection, and in particular to a target detection method and device based on radar-vision fusion and a readable storage medium.
Background
Monitoring technology based on integrated radar-video devices is receiving increasing attention in the security field. Radar can obtain measurement information of a moving target, namely its spatial position and motion information, with a high detection probability, but it cannot achieve high target identification accuracy. Video/image equipment can identify targets with high accuracy, but the motion information and spatial position information of a target are not easy to obtain. If radar information and video information are effectively fused, higher target identification accuracy together with motion and spatial position information can be obtained; however, the fusion schemes adopted in the related art have poor accuracy.
Disclosure of Invention
The application provides a target detection method and device based on radar-vision fusion and a readable storage medium, which can improve the accuracy and stability of matching.
In order to solve the technical problem, the technical scheme adopted by the application is as follows: a target detection method based on radar-vision fusion is provided, and the method comprises the following steps: acquiring a current radar frame and a current video frame acquired at the current moment, and determining a current target matching result of the current radar frame and the current video frame, wherein the current target matching result comprises a matching relation between a radar target in the current radar frame and a visual target in the current video frame; obtaining historical target matching results of at least one historical radar frame and at least one historical video frame collected at a historical moment before the current moment, wherein the historical target matching results comprise matching relations between radar targets in the historical radar frames and visual targets in the historical video frames; and updating the current target matching result based on each historical target matching result to obtain the updated current target matching result.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided an object detection apparatus comprising a memory and a processor connected to each other, wherein the memory is used for storing a computer program, and the computer program, when executed by the processor, is used for implementing the target detection method based on radar-vision fusion in the above technical solution.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided an object detection apparatus including: the matching module is used for acquiring a current radar frame and a current video frame acquired at the current moment and determining a current target matching result of the current radar frame and the current video frame, wherein the current target matching result comprises a matching relation between a radar target in the current radar frame and a visual target in the current video frame; the acquisition module is used for acquiring historical target matching results of at least one historical radar frame and at least one historical video frame which are acquired at a historical moment before the current moment, wherein the historical target matching results comprise matching relations between radar targets in the historical radar frames and visual targets in the historical video frames; the processing module is connected with the matching module and the obtaining module and used for updating the current target matching result based on each historical target matching result to obtain the updated current target matching result.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided a computer-readable storage medium for storing a computer program, which, when executed by a processor, is used for implementing the target detection method based on radar-vision fusion in the above technical solution.
Through the scheme, the beneficial effects of the application are that: firstly, acquiring a current radar frame and a current video frame which are acquired at the current moment; performing target matching processing on a current radar frame and a current video frame to obtain a current target matching result, and storing the current target matching result, wherein the current target matching result comprises a matching relation between a radar target in the current radar frame and a visual target in the current video frame; obtaining a historical target matching result of at least one historical moment before the current moment, wherein the historical target matching result comprises a matching relation between a radar target in a historical radar frame and a visual target in a historical video frame; updating the current target matching result based on each historical target matching result to obtain an updated current target matching result; the target matching result of the current moment is corrected by using the target matching result of the historical moment, so that the target matching results of the historical moment and the current moment are fused, the updated target matching results have enough correlation in time sequence, the fusion error caused by inaccurate matching of a single moment is avoided, the effect of radar-vision fusion is improved, and the accurate and stable target matching result can be obtained.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts. Wherein:
fig. 1 is a schematic flowchart of an embodiment of a target detection method based on radar-vision fusion provided in the present application;
FIG. 2 is a schematic flow chart diagram illustrating another embodiment of a target detection method based on radar-vision fusion provided in the present application;
FIG. 3 is a schematic flowchart of a radar-vision fusion-based target detection method according to another embodiment of the present disclosure;
FIG. 4 is a schematic view of a sliding window provided herein;
FIG. 5 is a schematic flow chart of S35 in the embodiment shown in FIG. 3;
FIG. 6 is a schematic diagram of an embodiment of an object detection device provided in the present application;
FIG. 7 is a schematic structural diagram of another embodiment of an object detection apparatus provided herein;
FIG. 8 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be noted that the following examples are only illustrative of the present application, and do not limit the scope of the present application. Likewise, the following examples are only some examples and not all examples of the present application, and all other examples obtained by a person of ordinary skill in the art without any inventive work are within the scope of the present application.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
It should be noted that the terms "first", "second" and "third" in the present application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of indicated technical features. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specified otherwise. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Referring to fig. 1, fig. 1 is a schematic flowchart of an embodiment of a target detection method based on radar-vision fusion according to the present application, where the execution subject of the embodiment is a target detection device, and the target detection device is a device having a radar detection function and a video recording function, such as a robot; the method comprises:
s11: and acquiring a current radar frame and a current video frame acquired at the current moment, and determining a current target matching result of the current radar frame and the current video frame.
The object detection device includes an image sensor and a radar sensor. Traditional security equipment mainly consists of visible light cameras, but a visible light camera cannot work at night; although an infrared camera can make up for this defect of the visible light camera, it increases cost and operation difficulty; furthermore, the optical sensors in an infrared camera are susceptible to weather conditions such as fog, rain or snow, in which the monitoring effect is poor. The millimeter-wave radar actively transmits electromagnetic waves and receives echo signals of the same frequency; it has a very high detection probability for moving objects or objects with a large radar cross section (RCS), and a lower (but non-zero) detection probability for static objects; the millimeter-wave radar can work around the clock and is less affected by weather.
For the above reasons, the radar sensor in this embodiment adopts a millimeter-wave radar, which can measure the distance and angle between a target in the target scene and the target detection device, as well as the target's radial speed and RCS; the measurement period of the millimeter-wave radar (i.e., the period for transmitting and receiving signals) can be set according to specific application needs and is generally set to 0.1 second, i.e., a working frequency of 10 Hz. The image sensor can obtain appearance detail information of objects in the target scene.
It is to be noted that, for the convenience of description, a target in a target scene detected by the millimeter wave radar is referred to as a radar target, and a target in a target scene captured by the image sensor is referred to as a visual target.
Acquiring image data of the current moment by adopting an image sensor to obtain a current video frame; acquiring radar data of the current moment by adopting a radar sensor to obtain a current radar frame; and matching the current radar frame with the current video frame to obtain a matching result (recorded as a current target matching result) of the radar target in the current radar frame and the visual target in the current video frame, wherein the current target matching result comprises a matching relation between the radar target in the current radar frame and the visual target in the current video frame. Specifically, the coordinates of the radar target in the radar coordinate system can be identified from the current radar frame, so as to obtain a first position coordinate; identifying coordinates of the visual target in a visual coordinate system from the current video frame to obtain second position coordinates; converting the second position coordinate into a radar coordinate system to obtain a third position coordinate; and matching the third position coordinate with the first position coordinate to generate a current target matching result, and storing the current target matching result.
In an embodiment, the current radar frame may be processed by using a radar target detection method to detect the radar target and the position coordinate where the radar target is located in the current radar frame, where the radar target detection method may be a Kalman filtering method or another conventional radar target detection algorithm, which is not limited herein. A neural network model for target detection in the image processing field can be used to process the current video frame to obtain the visual target and the position coordinate where the visual target is located in the current video frame, where the neural network model for target detection can be selected according to the actual situation, for example: a Region-based Convolutional Neural Network (R-CNN), Fast R-CNN, a Spatial Pyramid Pooling Network (SPP-Net), a Region-based Fully Convolutional Network (R-FCN), YOLO (You Only Look Once) or SSD (Single Shot MultiBox Detector), etc., without limitation.
Further, the conversion of position coordinates between the radar coordinate system and the visual coordinate system may be achieved using conventional coordinate conversion methods. For example, four pixel points distributed in a rectangle can be selected on the current video frame and marked as A, B, C and D; the visual coordinates of the four pixel points A, B, C and D in the visual coordinate system and the corresponding radar coordinates in the radar coordinate system are obtained, and a homography matrix between the radar coordinate system and the visual coordinate system is calculated from these visual coordinates and radar coordinates using a four-point method, so that the conversion of position coordinates between the radar coordinate system and the visual coordinate system can be achieved with the homography matrix. The calculation of the homography matrix by the four-point method and the conversion of coordinates by the homography matrix are conventional operations in coordinate conversion and are not described in detail or limited herein.
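For illustration only, the four-point calibration and the pixel-to-radar mapping described above could be carried out with OpenCV roughly as in the sketch below; the calibration point values are hypothetical placeholders, not values from the patent.

```python
# Illustrative sketch: four-point homography calibration between the visual
# (pixel) coordinate system and the radar coordinate system.
# All point values below are hypothetical placeholders.
import numpy as np
import cv2

# Pixel coordinates of the four calibration points A, B, C, D in the video frame.
pixel_pts = np.array([[100, 200], [500, 200], [500, 400], [100, 400]], dtype=np.float32)
# Radar-plane coordinates (e.g., meters) measured for the same four points.
radar_pts = np.array([[-2.0, 10.0], [2.0, 10.0], [2.0, 5.0], [-2.0, 5.0]], dtype=np.float32)

# Homography mapping pixel coordinates into the radar coordinate system.
H, _ = cv2.findHomography(pixel_pts, radar_pts)

def pixel_to_radar(u, v):
    """Map one pixel coordinate (u, v) to radar-plane coordinates via H."""
    pt = np.array([[[u, v]]], dtype=np.float32)      # shape (1, 1, 2)
    mapped = cv2.perspectiveTransform(pt, H)
    return float(mapped[0, 0, 0]), float(mapped[0, 0, 1])

# Example: map the centre of a visual detection box.
x_r, y_r = pixel_to_radar(320, 240)
```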
In one embodiment, for each visual target at the current moment, calculating the distance between the visual target and each radar target in the current radar frame based on the third position coordinate of the visual target and the first position coordinate of the radar target in the current radar frame to obtain a plurality of distances; based on the plurality of distances, a current target matching result is generated. Specifically, substituting the third position coordinate of the visual target and the first position coordinate of the radar target into a distance calculation formula to obtain the distance between the third position coordinate and the first position coordinate; calculating the minimum value of the plurality of distances, and determining that the visual target corresponding to the minimum value is matched with the radar target corresponding to the minimum value; and establishing a matching relation between the radar target corresponding to the minimum value and the visual target corresponding to the minimum value.
In other embodiments, for each radar target at the current time, calculating a distance between the radar target and each visual target in the current video frame based on the first position coordinates of the radar target and the third position coordinates of the visual target in the current video frame, so as to obtain a plurality of distances; based on the plurality of distances, a current target matching result is generated.
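As a non-authoritative sketch of the single-frame nearest-neighbour matching just described (the data layout and function names are assumptions for illustration, not taken from the patent), matching each visual target to its closest radar target could look like this; the reverse direction, matching each radar target to its closest visual target, is symmetric.

```python
# Minimal sketch of single-frame nearest-neighbour matching.
# visual_targets: list of (target_id, (x_c, y_c)) third-position coordinates
#                 already mapped into the radar coordinate system.
# radar_targets:  list of (target_id, (x_r, y_r)) first-position coordinates.
import math

def match_frame(visual_targets, radar_targets):
    """Return {visual_id: radar_id} chosen by minimum Euclidean distance."""
    matches = {}
    for v_id, (x_c, y_c) in visual_targets:
        best_id, best_d = None, float("inf")
        for r_id, (x_r, y_r) in radar_targets:
            d = math.hypot(x_c - x_r, y_c - y_r)   # distance in the radar plane
            if d < best_d:
                best_id, best_d = r_id, d
        if best_id is not None:
            matches[v_id] = best_id
    return matches
```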
S12: and obtaining a historical target matching result of at least one historical radar frame and at least one historical video frame which are collected at a historical moment before the current moment.
Since the target matching result generated at each acquisition time is stored, the historical target matching result can be directly read from the storage device, and the historical target matching result comprises the matching relationship between the radar target in the historical radar frame and the visual target in the historical video frame. It can be understood that the number of the historical time may be set according to the specific application requirement, and this embodiment is not limited herein, for example: the number of the historical moments is one or more than one.
It can be understood that the current time and the historical time are relative: if the current time is the 5th time, the 1st to 4th times are the historical times corresponding to the current time; if the current time is the 7th time, the 1st to 6th times are the historical times.
S13: and updating the current target matching result based on each historical target matching result to obtain the updated current target matching result.
For each visual target, the number of times the visual target matches each radar target at the current time and the historical times is counted, and the updated matching relationship is determined based on these matching counts. Specifically, the updated current target matching result includes the updated matching relationship between the radar target in the current radar frame and the visual target in the current video frame; the maximum value of all matching counts corresponding to the visual target can be calculated, and a matching relationship is established between the radar target corresponding to the maximum value and the visual target corresponding to the maximum value to obtain the updated matching relationship.
Furthermore, the matching count corresponds to the number of times a radar target has been matched with the visual target, and each radar target matched with the visual target has its own matching count. The set formed by the current time and at least one historical time before the current time is recorded as the current acquisition time period, and the number of radar targets matched with a given visual target within the current acquisition time period may be one or more. For example, suppose the visual target is marked S and the radar targets in the current radar frame are marked L1~L3; at historical time T_31 the visual target S is matched with radar target L3; at historical time T_32 the visual target S is matched with radar target L1; at historical time T_33 the visual target S is matched with radar target L1; and at the current time T_34 the visual target S is matched with radar target L2. Since radar target L1 has the largest matching count (two matches), the updated matching relationship is that visual target S matches radar target L1.
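A minimal sketch of this counting/voting update, assuming each frame's matching result is stored as a dictionary from visual-target identifier to radar-target identifier (these names are illustrative, not the patent's):

```python
# Sketch of the voting step: for each visual target, count how often each
# radar target was matched over the historical and current frames, and keep
# the radar target with the largest matching count.
from collections import Counter

def update_matches(frame_results):
    """frame_results: list of per-frame {visual_id: radar_id} dicts."""
    votes = {}
    for result in frame_results:
        for v_id, r_id in result.items():
            votes.setdefault(v_id, Counter())[r_id] += 1
    # The radar target with the maximum matching count wins for each visual target.
    return {v_id: counter.most_common(1)[0][0] for v_id, counter in votes.items()}

# Example mirroring the description: S matched L3, L1, L1, L2 over four frames.
history = [{"S": "L3"}, {"S": "L1"}, {"S": "L1"}, {"S": "L2"}]
assert update_matches(history)["S"] == "L1"
```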
The radar-vision fusion method provided by the embodiment uses the target matching result at the historical moment, and the target matching result at the historical moment and the target matching result at the current moment are fused, so that the final target matching result at the current moment has enough correlation in time sequence, the influence of abnormal detection at a single acquisition moment on fusion is avoided, and the matching accuracy and stability are improved; in addition, the scheme provided by the embodiment has a wide application range, such as: parks, construction sites, crossroads, roads, park entrances and exits or gates and the like.
Referring to fig. 2, fig. 2 is a schematic flowchart of another embodiment of a target detection method based on radar-vision fusion, the method including:
s21: and collecting radar visual information of the target scene, and processing the radar visual information to obtain detection information.
The radar visual information includes first radar information and first visual information: an image sensor is used to shoot the target scene to obtain the first visual information, and a radar sensor is used to detect the target scene to obtain the first radar information; then, the first radar information and the first visual information are processed to obtain the detection information.
In a specific embodiment, the target detection apparatus further includes a computer processor, and since information delay in the acquisition process has a significant influence on the labeling result, the number of intermediate nodes between the device side (i.e. the image sensor and the radar sensor) and the computer processor needs to be as small as possible, so as to improve the information transmission efficiency and reduce the time delay; the computer processor may be a System on a Chip (SOC), among others.
S22: and processing the detection information to obtain a plurality of radar vision detection information sets.
After the detection information is obtained, splitting the detection information to generate a plurality of radar vision detection information sets; specifically, an intersection exists between the acquisition time periods corresponding to two adjacent radar vision detection information sets, each radar vision detection information set comprises a radar detection information set and a visual detection information set corresponding to the radar detection information set, each radar detection information set comprises radar detection information of a radar target in the acquisition time period, each visual detection information set comprises visual detection information of the visual target in the acquisition time period, each acquisition time period comprises a history moment and a current moment, the history moment can be adjacent to the current moment, and the number of the history moments is at least 1.
It is understood that the lengths of the radar detection information set and the visual detection information set may be set according to application requirements. Taking the radar detection information set as an example, assume the acquisition times are denoted T_1~T_8 and the length of the radar detection information set and the visual detection information set is set to 4, i.e., each radar detection information set includes radar detection information for 4 acquisition times. Then the first radar detection information set includes the radar detection information at acquisition times T_1~T_4, the second at acquisition times T_2~T_5, the third at acquisition times T_3~T_6, the fourth at acquisition times T_4~T_7, and the fifth at acquisition times T_5~T_8. Alternatively, the first radar detection information set includes the radar detection information at acquisition times T_1~T_4, the second at acquisition times T_3~T_6, and the third at acquisition times T_5~T_8.
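The splitting into overlapping sets could be sketched as follows; a window length of 4 and the two step sizes reproduce the two variants above (the function and variable names are assumptions for illustration):

```python
# Sketch of splitting a time-ordered sequence of per-frame detection records
# into radar vision detection information sets of a fixed window length that
# overlap by a chosen step.
def split_into_sets(frames, window=4, step=1):
    """frames: list of per-frame detection records ordered by time."""
    return [frames[i:i + window]
            for i in range(0, len(frames) - window + 1, step)]

times = ["T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8"]
print(split_into_sets(times, window=4, step=1))  # T1-T4, T2-T5, ..., T5-T8
print(split_into_sets(times, window=4, step=2))  # T1-T4, T3-T6, T5-T8
```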
S23: and taking out a current radar vision detection information set from the plurality of radar vision detection information sets, matching all radar detection information in the current radar vision detection information set with corresponding visual detection information in the current radar vision detection information set, and generating a current target matching result.
After the current radar detection information set is obtained, matching the radar detection information set in the current radar detection information set with the visual detection information set to obtain a current target matching result; and then, returning to the step of taking out the current radar vision detection information set from the plurality of radar vision detection information sets until all the radar vision detection information sets are traversed, and finally obtaining a target matching result corresponding to each radar vision detection information set. Specifically, the target matching result is obtained by processing detection information (including radar detection information and visual detection information) of the current time and the historical time.
For example, assume the current time is denoted T_m and the historical times are denoted T_(m-p)~T_(m-1); the radar detection information set includes the radar detection information at acquisition times T_(m-p)~T_m, and the visual detection information set includes the visual detection information at acquisition times T_(m-p)~T_m. The radar detection information at acquisition time T_(m-p) is matched with the visual detection information at acquisition time T_(m-p) to obtain a first target matching result, which records which radar target each visual target matches at acquisition time T_(m-p). The radar detection information at acquisition time T_(m-p+1) is matched with the visual detection information at acquisition time T_(m-p+1) to obtain a second target matching result, which records which radar target each visual target matches at acquisition time T_(m-p+1). By analogy, the radar detection information at acquisition time T_m is finally matched with the visual detection information at acquisition time T_m to obtain the last target matching result, which records which radar target each visual target matches at acquisition time T_m. After this sequential matching, the matching results of (p+1) acquisition times are obtained, and the final target matching result at acquisition time T_m can be obtained by processing these matching results.
The method for fusing the radar and the vision uses the information of the historical moment, and the updated target matching result of the current moment is sufficiently related in time sequence by fusing the information of the historical moment and the information of the current moment, so that the influence of abnormal detection of a single acquisition moment on fusion is avoided, and the matching accuracy and stability are improved.
Referring to fig. 3, fig. 3 is a schematic flowchart illustrating a target detection method based on radar-vision fusion according to another embodiment of the present application, where the method includes:
s31: and collecting radar visual information of the target scene.
S31 is the same as S21 in the above embodiment, and is not described again here.
S32: and performing time synchronization processing on the first radar information and the first visual information to obtain second radar information and second visual information.
In order to obtain a more accurate target matching result, the first radar information and the first visual information may be time-synchronized, that is, the acquisition time of the first radar information and the acquisition time of the first visual information are aligned.
In a specific embodiment, time synchronization can be divided into software time synchronization and hardware time synchronization. The software time synchronization method has a larger error but is low in cost, flexible and configurable; the hardware time synchronization method has a smaller error but is costly, requires an additional circuit to be designed, and is not easy to modify. Therefore, for application scenarios involving low-speed target tracking, a software time synchronization method is adopted; for application scenarios involving high-speed target tracking, a hardware time synchronization method is adopted. After the time synchronization operation is performed, the second radar information and the second visual information correspond to almost the same time, with only a small time error between them, so the time-synchronized radar visual information can be output in a bundled manner.
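A software time-synchronization step of the kind described here could, for example, pair each radar frame with the video frame whose timestamp is nearest and discard pairs whose time error exceeds a tolerance; this is an illustrative sketch, and the 50 ms tolerance is an arbitrary assumption.

```python
# Sketch of software time synchronization by nearest-timestamp pairing.
# Timestamps are assumed to be in seconds.
def sync_frames(radar_frames, video_frames, max_dt=0.05):
    """radar_frames / video_frames: lists of (timestamp, data) sorted by time."""
    pairs = []
    for t_r, radar in radar_frames:
        t_v, video = min(video_frames, key=lambda f: abs(f[0] - t_r))
        if abs(t_v - t_r) <= max_dt:
            pairs.append((t_r, radar, video))   # bundled, time-synchronized output
    return pairs
```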
S33: and generating detection information based on the second radar information and the second visual information.
Extracting second radar information from the radar visual information after time synchronization; filtering the second radar information to obtain third radar information; and carrying out target detection processing on the third radar information to obtain radar detection information of the radar target. Specifically, the third radar information includes a plurality of radar frames, including a current radar frame and historical radar frames, and the second radar information may be predicted and updated using a Kalman filter to filter out interference signals; it is to be understood that other filtering methods may also be used to filter out the interference information in the second radar information, which is not limited to the Kalman filter.
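For illustration only, a generic constant-velocity Kalman filter over a radar target's planar position could be sketched as below; the process and measurement noise values are placeholder assumptions and do not come from the patent.

```python
# Textbook constant-velocity Kalman filter sketch for one radar target.
import numpy as np

class SimpleKalman:
    def __init__(self, x0, dt=0.1):
        self.x = np.array(x0, dtype=float)          # state [px, py, vx, vy]
        self.P = np.eye(4)                          # state covariance
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * 0.01                   # process noise (assumed)
        self.R = np.eye(2) * 0.1                    # measurement noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        z = np.asarray(z, dtype=float)              # measured position [px, py]
        y = z - self.H @ self.x                     # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)    # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```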
Extracting second visual information from the radar visual information after time synchronization; and carrying out target detection processing on the second visual information to obtain visual detection information of the detection target. Specifically, the second visual information comprises a plurality of video frames, wherein the plurality of video frames comprise a current video frame and a historical video frame; the second visual information can be processed by using a deep learning target detection model to detect visual targets such as people, animals, motor vehicles or non-motor vehicles; the deep learning object detection model may be an object detection model commonly used in the related art.
S34: and splitting the detection information by adopting a sliding window method to obtain a plurality of thunder vision detection information sets.
The length of the sliding window is fixed, and the window slides forward by a preset step, so that there is always enough information in the window and the information in the sliding window is updated continuously as it slides; the length of the sliding window is larger than the sliding step. For example, a sliding window with length N and step 1 is set and moved along the time sequence, so that the information in the sliding window is always the latest N frames of information; the N frames of information in the sliding window are then fused. As shown in FIG. 4, with N being 2 and W being the sliding window: at time T_11, the information in the sliding window includes the information of the K-th frame and the (K+1)-th frame; at time T_12, it includes the information of the (K+1)-th frame and the (K+2)-th frame; at time T_13, it includes the information of the (K+2)-th frame and the (K+3)-th frame; at time T_14, it includes the information of the (K+3)-th frame and the (K+4)-th frame; and at time T_15, it includes the information of the (K+4)-th frame and the (K+5)-th frame. Further, as shown in FIG. 4, a certain number of frames of the first radar information and the first visual information may be stored after time synchronization; for example, the second radar information and the second visual information may be stored in a storage space A, and another storage space B may be opened up to store the information in the current radar vision detection information set.
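A streaming version of such a sliding window (length N, step 1) can be kept with a double-ended queue, as in this sketch; the names and the placeholder fusion hook are illustrative assumptions.

```python
# Sketch of maintaining the latest N synchronized frames in a sliding window:
# each new frame pushes the oldest one out, as in the figure with N = 2.
from collections import deque

N = 2                                   # window length from the example above
window = deque(maxlen=N)                # holds the frames of the current window

def fuse(frames):
    """Placeholder: single-frame matching and voting over the window go here."""
    pass

def on_new_frame(radar_info, visual_info):
    window.append((radar_info, visual_info))
    if len(window) == N:
        fuse(list(window))              # fuse the N frames currently in the window
```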
A plurality of radar detection information sets can be obtained by adopting a sliding window method, and the acquisition time corresponding to each piece of visual detection information in the visual detection information sets is the same as the acquisition time corresponding to each piece of radar detection information in the radar detection information sets, or the time difference between the two is smaller than a set value.
S35: and taking out a current radar vision detection information set from the plurality of radar vision detection information sets, matching all radar detection information in the current radar vision detection information set with corresponding visual detection information in the current radar vision detection information set, and generating a current target matching result.
The current radar vision detection information set corresponds to the current acquisition time period; the radar detection information comprises a first position coordinate where a radar target is located, and the visual detection information comprises a second position coordinate where a visual target is located; the following is a detailed description of how fusion is performed.
A target in the real world is represented in the form of point-cloud coordinates in the radar coordinate system. In the visual coordinate system, the image in the second visual information is processed by the target detection model to obtain the position coordinates of the visual target (denoted as second position coordinates), and the second position coordinates may be marked in the image by a detection frame (e.g., a rectangular frame) enclosing the pixels of the visual target. If a target can be observed by the radar sensor and the image sensor at the same time, then by matching the radar target in the radar coordinate system with the visual target in the visual coordinate system, both the appearance detail information of the target and its speed and position coordinates can be obtained.
In a specific embodiment, the scheme shown in fig. 5 may be adopted to generate the target matching result, which includes the following steps:
s51: and respectively selecting the current radar detection information and the current visual detection information from the radar detection information set in the current radar visual detection information set and the visual detection information set in the current radar visual detection information set.
According to the time sequence, a radar vision detection information set is taken out from all the radar vision detection information sets to obtain the current radar vision detection information set; then, according to the time sequence, one piece of radar detection information and one piece of visual detection information are respectively taken out from the radar detection information set in the current radar vision detection information set and the visual detection information set in the current radar vision detection information set to obtain the current radar detection information and the current visual detection information, wherein the acquisition time corresponding to the current radar detection information is the same as the acquisition time corresponding to the current visual detection information.
S52: and converting each second position coordinate in the current visual detection information into a radar coordinate system to obtain a corresponding third position coordinate.
The second position coordinate may be the pixel coordinate of the center of the detection frame of the visual target. Specifically, for a visual target whose pixel coordinate is denoted (u_i, v_i), the pixel coordinate (u_i, v_i) is mapped into a coordinate in the radar coordinate system according to a calibration mapping function, yielding the mapping coordinate (x_ci, y_ci); the mapping coordinate (x_ci, y_ci) is the third position coordinate.
S53: and calculating the distance between the third position coordinate of the visual target and each first position coordinate in the current radar detection information aiming at each visual target corresponding to the current visual detection information to obtain a plurality of distances.
For each group of detection information (including radar detection information and visual detection information) in the sliding window, the distance from the mapping coordinate (x_ci, y_ci) of each visual target to the coordinate (x_rj, y_rj) of each radar target in the current radar detection information is calculated by traversal. Specifically, the distance is calculated as
d = √((x_ci − x_rj)² + (y_ci − y_rj)²),
where d is the distance, (x_ci, y_ci) is the mapping coordinate, and (x_rj, y_rj) is the coordinate of the radar target (denoted as the radar coordinate).
S54: based on the plurality of distances, a current target matching result is generated.
After the distances between a visual target in the current visual detection information and all radar targets in the current radar detection information are calculated, the matching object of the visual target is determined using all of these distances; the process then returns to the step of respectively selecting the current radar detection information and the current visual detection information from the radar detection information set in the current radar vision detection information set and the visual detection information set in the current radar vision detection information set, i.e., S51 is executed again, until the current radar vision detection information set has been fully traversed.
In one embodiment, the minimum value of all distances corresponding to the third position coordinate is calculated; and determining that the radar target corresponding to the minimum value is matched with the visual target corresponding to the minimum value, and establishing a matching relation between the radar target corresponding to the minimum value and the visual target corresponding to the minimum value to obtain a current target matching result.
After the current target matching result and the historical target matching result are obtained, the matching times of the visual target and each radar target can be counted based on the current target matching result and the historical target matching result, wherein the matching times are the same as the times of the radar target matched with the visual target; and determining the updated matching relation based on the matching times.
Further, for each visual target, selecting the radar target which is matched with the visual target most frequently in the sliding window as an object matched with the visual target (recorded as a matched object) by using a voting method; specifically, the maximum value of all matching times is calculated; and establishing a matching relation between the visual target corresponding to the maximum value and the radar target corresponding to the maximum value to obtain an updated matching relation.
For example, suppose the current acquisition time period corresponding to the sliding window is denoted T_(n-3)~T_n, the current visual detection information includes the detection information of visual targets H_1~H_3, the current radar detection information includes the detection information of radar targets G_1~G_3, and the first position coordinates of radar targets G_1~G_3 are denoted F1~F3, respectively. For visual target H_a (a being 1, 2 or 3), the mapping coordinate corresponding to H_a is calculated and denoted Z_a; the distances from Z_a to F1, F2 and F3 are calculated to obtain L_a1, L_a2 and L_a3; the minimum of L_a1, L_a2 and L_a3 is calculated to obtain L_azmin; and the radar target corresponding to L_azmin is taken as the target object of visual target H_a. Through the above processing, the target object corresponding to each of the visual targets H_1~H_3 can be obtained. Suppose that, for visual target H_1, its target object at acquisition time T_(n-3) is radar target G_1, at acquisition time T_(n-2) is radar target G_2, at acquisition time T_(n-1) is radar target G_1, and at acquisition time T_n is radar target G_1; then its corresponding target objects include radar target G_1 and radar target G_2, with matching counts of 3 and 1 respectively, and therefore radar target G_1 is determined as the matching object of visual target H_1, i.e., the two are the same target.
It can be understood that the matching relationship of a visual target, once determined, may be stored, so that the calculation does not need to be repeated after the sliding window moves, saving time. For example, assume the acquisition time period corresponding to the current sliding window is T_21~T_23 and the acquisition time period corresponding to the next sliding window is T_22~T_24; then the matching relationships at acquisition times T_22 and T_23 do not need to be recalculated and can be obtained by directly calling the previously stored matching relationships.
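The reuse of previously computed single-frame results when the window slides could be sketched as a simple cache keyed by acquisition time; this reuses the hypothetical match_frame helper from the earlier sketch and is an illustrative assumption, not the patent's implementation.

```python
# Sketch of caching per-frame matching results so overlapping windows do not
# recompute them: results are keyed by acquisition timestamp and reused when
# the window slides forward.
frame_match_cache = {}   # timestamp -> {visual_id: radar_id}

def get_frame_match(timestamp, visual_targets, radar_targets):
    if timestamp not in frame_match_cache:
        # match_frame is the hypothetical single-frame matcher sketched earlier.
        frame_match_cache[timestamp] = match_frame(visual_targets, radar_targets)
    return frame_match_cache[timestamp]
```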
Understandably, the radar coordinate can be converted into a visual coordinate system to obtain a radar mapping coordinate; and then, calculating the distance between the radar mapping coordinate and the pixel coordinate of each visual target to determine the visual target matched with the radar target, wherein the specific implementation scheme is similar to the scheme and is not repeated herein.
In conclusion, the above scheme records radar information and visual information of historical time and current time in a sliding window; for each visual target, recording a radar target matched with the visual target and the matching times thereof according to a single-frame nearest neighbor matching principle; selecting the radar target with the most matching times as the matching relation of the visual target by adopting a voting method; because more information of historical moments is used, the fusion result is more accurate.
It is worth to be noted that when the size of the sliding window is 1 and the step length is 1, it is indicated that the historical target matching result is not used at all, and only the single-frame matching result at the current moment is trusted; when the size of the sliding window is increased, the weight of the historical target matching result is increased, and the weight of the single-frame matching result is reduced; therefore, the weights of the historical target matching result and the single-frame matching result can be adjusted by adjusting the parameters of the sliding window, so that the influence of the historical target matching result and the single-frame matching result on the final fusion result is changed; the parameters of the sliding window can be adjusted according to different environments, so that the algorithm has stronger adaptability.
Because single-frame matching places high demands on the information, if the visual detection is unstable, the detection frame of the visual target jitters, and a large deviation is produced when coordinate conversion is subsequently performed through the calibration mapping function, which can cause matching errors when targets are dense and affect the fusion accuracy. Based on this, in the sliding-window-based radar-vision fusion method provided by this embodiment, a sliding window is used to process the radar detection information and the visual detection information in time sequence, and single-frame matching is performed on each group of radar detection information and visual detection information in the sliding window according to the nearest-neighbour principle; then, a voting method is adopted to select the matching relationship over the whole sliding window, and the radar detection information of the matching relationship is fused with the corresponding visual detection information to obtain the target matching result at the current moment. Because more historical target matching results are used, the influence of inaccurate single-frame radar or visual detection on the fusion is avoided, the matching accuracy and stability are improved, and the anti-interference capability is stronger. In addition, compared with a single-frame fusion scheme, the weights of the historical target matching result and the current target matching result in the fusion result can be adjusted by adjusting the parameters of the sliding window, so the method can adapt to more complex scenes.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an embodiment of the object detection apparatus provided in the present application; the object detection apparatus 60 includes a memory 61 and a processor 62 connected to each other, the memory 61 is used for storing a computer program, and the computer program, when executed by the processor 62, is used for implementing the target detection method based on radar-vision fusion in the foregoing embodiment.
Referring to fig. 7, fig. 7 is a schematic structural diagram of another embodiment of the object detection device provided in the present application, where the object detection device 70 includes: a matching module 71, an acquisition module 72 and a processing module 73.
The matching module 71 is configured to obtain a current radar frame and a current video frame acquired at a current time, and determine a current target matching result of the current radar frame and the current video frame, where the current target matching result includes a matching relationship between a radar target in the current radar frame and a visual target in the current video frame;
the obtaining module 72 is configured to obtain a historical target matching result of a historical radar frame and a historical video frame collected at least one historical time before the current time, where the historical target matching result includes a matching relationship between a radar target in the historical radar frame and a visual target in the historical video frame;
the processing module 73 is connected to the matching module 71 and the obtaining module 72, and is configured to update the current target matching result based on each historical target matching result, so as to obtain an updated current target matching result.
According to the embodiment, the current target matching result is corrected by using the historical target matching result, so that the influence of inaccuracy of single-frame radar or visual detection on fusion is avoided, the matching accuracy and stability are improved, and the anti-interference capability is stronger.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an embodiment of a computer-readable storage medium 80 provided by the present application, where the computer-readable storage medium 80 is used for storing a computer program 81, and the computer program 81, when executed by a processor, is used for implementing the target detection method based on radar-vision fusion in the foregoing embodiment.
The computer readable storage medium 80 may be a server, a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and various media capable of storing program codes.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of modules or units is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
If the technical scheme of the application relates to personal information, a product applying the technical scheme of the application clearly informs personal information processing rules before processing the personal information, and obtains personal independent consent. If the technical scheme of the application relates to sensitive personal information, a product applying the technical scheme of the application obtains individual consent before processing the sensitive personal information, and simultaneously meets the requirement of 'express consent'. For example, at a personal information collection device such as a camera, a clear and significant identifier is set to inform that the personal information collection range is entered, the personal information is collected, and if the person voluntarily enters the collection range, the person is regarded as agreeing to collect the personal information; or on the device for processing the personal information, under the condition of informing the personal information processing rule by using obvious identification/information, obtaining personal authorization in the modes of pop-up window information or asking the person to upload personal information thereof and the like; the personal information processing rule may include information such as a personal information processor, a personal information processing purpose, a processing method, and a type of personal information to be processed.
The above embodiments are merely examples, and not intended to limit the scope of the present application, and all modifications, equivalents, and flow charts using the contents of the specification and drawings of the present application, or those directly or indirectly applied to other related arts, are included in the scope of the present application.
Claims (7)
1. A target detection method based on radar-vision fusion, characterized by comprising the following steps:
acquiring a current radar frame and a current video frame acquired at the current moment, and determining a current target matching result of the current radar frame and the current video frame, wherein the current target matching result comprises a matching relation between a radar target in the current radar frame and a visual target in the current video frame;
obtaining historical target matching results of at least one historical radar frame and at least one historical video frame which are collected at a historical moment before the current moment, wherein the historical target matching results comprise matching relations between radar targets in the historical radar frames and visual targets in the historical video frames;
updating the current target matching result based on each historical target matching result to obtain an updated current target matching result; the updated current target matching result comprises an updated matching relationship between the radar target in the current radar frame and the visual target in the current video frame;
wherein, the step of updating the current target matching result based on each historical target matching result to obtain an updated current target matching result comprises:
counting the matching times of each visual target and the radar target at the current time and the historical time, wherein the matching times are the same as the times of the radar target matched with the visual target;
calculating the maximum value of all the matching times corresponding to the visual target;
and establishing a matching relation between the radar target corresponding to the maximum value and the visual target corresponding to the maximum value to obtain the updated matching relation.
2. The method for detecting the target based on radar-vision fusion as claimed in claim 1, wherein the step of determining the current target matching result of the current radar frame and the current video frame comprises:
identifying coordinates of the radar target in a radar coordinate system from the current radar frame to obtain first position coordinates;
identifying coordinates of the visual target in a visual coordinate system from the current video frame to obtain second position coordinates;
converting the second position coordinate into the radar coordinate system to obtain a third position coordinate;
and matching the third position coordinate with the first position coordinate to generate the current target matching result.
3. The method of claim 2, wherein the step of matching the third position coordinate with the first position coordinate to generate the current target matching result comprises:
for each visual target at the current moment, calculating the distance between the visual target and each radar target in the current radar frame based on the third position coordinate of the visual target and the first position coordinate of the radar target in the current radar frame to obtain a plurality of distances; or
For each radar target at the current moment, calculating the distance between the radar target and each visual target in the current video frame based on the first position coordinates of the radar target and the third position coordinates of the visual target in the current video frame to obtain a plurality of distances;
generating the current target matching result based on the plurality of distances.
4. The method of claim 3, wherein the step of generating the current target matching result based on the plurality of distances comprises:
calculating the minimum value of the plurality of distances, and determining that the visual target corresponding to the minimum value is matched with the radar target corresponding to the minimum value;
and establishing a matching relation between the radar target corresponding to the minimum value and the visual target corresponding to the minimum value.
5. A target detection apparatus, comprising a memory and a processor connected to each other, wherein the memory is configured to store a computer program which, when executed by the processor, implements the target detection method based on radar-vision fusion according to any one of claims 1 to 4.
6. A target detection device, comprising:
the matching module is used for acquiring a current radar frame and a current video frame acquired at the current moment, and determining a current target matching result of the current radar frame and the current video frame, wherein the current target matching result comprises a matching relation between a radar target in the current radar frame and a visual target in the current video frame;
the acquisition module is used for acquiring historical target matching results of at least one historical radar frame and at least one historical video frame which are acquired at a historical moment before the current moment, wherein the historical target matching results comprise matching relations between radar targets in the historical radar frames and visual targets in the historical video frames;
the processing module is connected with the matching module and the acquisition module and is used for updating the current target matching result based on each historical target matching result to obtain an updated current target matching result; the updated current target matching result comprises an updated matching relationship between the radar target in the current radar frame and the visual target in the current video frame; wherein the step of updating the current target matching result based on each historical target matching result to obtain an updated current target matching result comprises: counting, for each visual target, the number of times the visual target is matched with a same radar target at the current moment and the historical moments, so as to obtain a matching count for each radar target matched with the visual target; calculating the maximum value among all the matching counts corresponding to the visual target; and establishing a matching relation between the radar target corresponding to the maximum value and the visual target corresponding to the maximum value, so as to obtain the updated matching relation.
7. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the target detection method based on radar-vision fusion according to any one of claims 1 to 4.
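For readers who want to relate the claims to an implementation, the following is a minimal Python sketch of one possible reading of claims 1 to 4: visual targets are projected into the radar coordinate system, each visual target is matched to the nearest radar target, and the matching relation is then updated by a majority vote over the current and historical frames. The function names (visual_to_radar, match_nearest, update_with_history), the homography-based coordinate conversion, the Euclidean distance metric, and the dictionary-based data layout are assumptions made for illustration only and are not taken from the patent specification.

```python
# Illustrative sketch only (not the patent's reference implementation).
# Assumed calibration model: a 3x3 homography from image plane to radar plane.
from collections import defaultdict
import numpy as np


def visual_to_radar(pixel_xy, homography):
    """Convert a visual target's position in the visual coordinate system
    (image pixels) into a position in the radar coordinate system."""
    p = np.array([pixel_xy[0], pixel_xy[1], 1.0])
    q = homography @ p
    return q[:2] / q[2]


def match_nearest(visual_targets, radar_targets, homography):
    """For each visual target, keep the radar target at minimum distance
    in the radar coordinate system (cf. claims 2 to 4)."""
    matches = {}
    for vid, pixel_xy in visual_targets.items():
        pos = visual_to_radar(pixel_xy, homography)
        rid = min(radar_targets,
                  key=lambda r: float(np.linalg.norm(pos - radar_targets[r])))
        matches[vid] = rid
    return matches


def update_with_history(current_matches, historical_matches):
    """Update step of claim 1: for each visual target, count how often it was
    matched with each radar target over the current and historical frames,
    and keep the radar target with the maximum matching count."""
    counts = defaultdict(lambda: defaultdict(int))
    for frame in historical_matches + [current_matches]:
        for vid, rid in frame.items():
            counts[vid][rid] += 1
    return {vid: max(rids, key=rids.get) for vid, rids in counts.items()}


if __name__ == "__main__":
    # Hypothetical data: two visual targets, two radar targets, identity calibration.
    homography = np.eye(3)
    visual = {"v1": (320.0, 240.0), "v2": (100.0, 80.0)}
    radar = {"r1": np.array([319.0, 241.0]), "r2": np.array([98.0, 83.0])}
    current = match_nearest(visual, radar, homography)   # {'v1': 'r1', 'v2': 'r2'}
    history = [{"v1": "r1", "v2": "r2"}, {"v1": "r1", "v2": "r1"}]
    print(update_with_history(current, history))         # {'v1': 'r1', 'v2': 'r2'}
```

Under this reading, a single-frame mismatch cannot override a matching relation that has been consistent across the preceding frames, since the radar target with the maximum matching count is retained.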
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211014407.0A CN115082712B (en) | 2022-08-23 | 2022-08-23 | Target detection method and device based on radar-vision fusion and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211014407.0A CN115082712B (en) | 2022-08-23 | 2022-08-23 | Target detection method and device based on radar-vision fusion and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115082712A (en) | 2022-09-20
CN115082712B (en) | 2022-11-22
Family
ID=83244952
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211014407.0A Active CN115082712B (en) | Target detection method and device based on radar-vision fusion and readable storage medium | 2022-08-23 | 2022-08-23
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115082712B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115631627A (en) * | 2022-10-17 | 2023-01-20 | 大陆软件系统开发中心(重庆)有限公司 | Method for detecting target in road by using road side equipment and road side equipment |
2022: 2022-08-23 CN CN202211014407.0A patent/CN115082712B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2819704A1 (en) * | 2005-04-20 | 2006-10-20 | Accipiter Radar Technologies Inc. | Low-cost, high-performance radar networks |
CN110187334A (en) * | 2019-05-28 | 2019-08-30 | 深圳大学 | A kind of target monitoring method, apparatus and computer readable storage medium |
CN114299417A (en) * | 2021-12-09 | 2022-04-08 | 连云港杰瑞电子有限公司 | Multi-target tracking method based on radar-vision fusion |
CN114509753A (en) * | 2021-12-24 | 2022-05-17 | 浙江大华技术股份有限公司 | Fusion method of radar video data and related equipment |
Non-Patent Citations (2)
Title |
---|
Radar-camera Fusion for Road Target Classification; Kheireddine Aziz et al.; 2020 IEEE Radar Conference (RadarConf20); 2020-12-04; 1-6 *
Research on Traffic Target Tracking Algorithms Based on Radar and Video Fusion (雷达与视频融合的交通目标跟踪算法研究); Wei Chenyi (魏晨依); CNKI (知网); 2022-04-15; 1-76 *
Also Published As
Publication number | Publication date |
---|---|
CN115082712A (en) | 2022-09-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110675418B (en) | Target track optimization method based on DS evidence theory | |
CN110988912B (en) | Road target and distance detection method, system and device for automatic driving vehicle | |
CN113359097B (en) | Millimeter wave radar and camera combined calibration method | |
CN109087510B (en) | Traffic monitoring method and device | |
CN110689562A (en) | Trajectory loop detection optimization method based on generation of countermeasure network | |
CN111781608B (en) | Moving target detection method and system based on FMCW laser radar | |
CN113160327A (en) | Method and system for realizing point cloud completion | |
CN115376109B (en) | Obstacle detection method, obstacle detection device, and storage medium | |
CN106504274A (en) | A kind of visual tracking method and system based under infrared camera | |
CN114089329A (en) | Target detection method based on fusion of long and short focus cameras and millimeter wave radar | |
CN112541403B (en) | Indoor personnel falling detection method by utilizing infrared camera | |
CN115546705A (en) | Target identification method, terminal device and storage medium | |
CN115082712B (en) | Target detection method and device based on radar-vision fusion and readable storage medium | |
CN110992424A (en) | Positioning method and system based on binocular vision | |
CN111899345B (en) | Three-dimensional reconstruction method based on 2D visual image | |
CN114969221A (en) | Method for updating map and related equipment | |
CN110864670B (en) | Method and system for acquiring position of target obstacle | |
CN115512542B (en) | Track restoration method and system considering shielding based on roadside laser radar | |
CN113673569B (en) | Target detection method, device, electronic equipment and storage medium | |
CN116862832A (en) | Three-dimensional live-action model-based operator positioning method | |
CN116935356A (en) | Weak supervision-based automatic driving multi-mode picture and point cloud instance segmentation method | |
CN116665179A (en) | Data processing method, device, domain controller and storage medium | |
CN116912517A (en) | Method and device for detecting camera view field boundary | |
CN115542271A (en) | Radar coordinate and video coordinate calibration method, equipment and related device | |
CN114581889A (en) | Fusion method, device, equipment, medium and product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||