
CN112097732A - Binocular camera-based three-dimensional distance measurement method, system, equipment and readable storage medium - Google Patents


Info

Publication number
CN112097732A
CN112097732A
Authority
CN
China
Prior art keywords
image
dimensional
coordinate
real
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010769781.6A
Other languages
Chinese (zh)
Inventor
杨超
张文涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Smarter Eye Technology Co Ltd
Original Assignee
Beijing Smarter Eye Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Smarter Eye Technology Co Ltd filed Critical Beijing Smarter Eye Technology Co Ltd
Priority to CN202010769781.6A priority Critical patent/CN112097732A/en
Publication of CN112097732A publication Critical patent/CN112097732A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C3/00 - Measuring distances in line of sight; Optical rangefinders
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G06F18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/13 - Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10028 - Range image; Depth image; 3D point clouds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20228 - Disparity calculation for image-based rendering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30248 - Vehicle exterior or interior
    • G06T2207/30252 - Vehicle exterior; Vicinity of vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Electromagnetism (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a binocular camera-based three-dimensional distance measurement method, system, equipment and readable storage medium. The method comprises the following steps: performing deep learning target detection on an original image from a binocular camera, determining a target feature, and acquiring frame information of the target area of the target feature; calculating the parallax of the original image point by point to obtain a corresponding parallax map, and determining a real coordinate three-dimensional image from the parallax map; determining the real three-dimensional coordinate values of the corner points, based on a clustering statistical method over coordinate values in a set region, from the boundary corner points of the target feature in the two-dimensional image and the real coordinate three-dimensional image; and calculating and outputting the position coordinates of the corner points of the cube in the image coordinate system through the conversion relation between the world coordinate system and the image coordinate system. The three-dimensional coordinates of a vehicle target in space are thus obtained directly from a vision sensor, making up for the disadvantages of monocular- and lidar-based schemes in hardware cost, computational power and sample labeling cost.

Description

Binocular camera-based three-dimensional distance measurement method, system, equipment and readable storage medium
Technical Field
The embodiment of the application relates to the technical field of three-dimensional distance measurement, in particular to a binocular camera-based three-dimensional distance measurement method, system, equipment and readable storage medium.
Background
In recent years, vehicle detection has become an indispensable function in the field of automatic driving. The most mature approach is two-dimensional vehicle detection based on RGB images, which outputs the vehicle category and its coordinates in two-dimensional space. In an automatic driving scene, however, indexes such as the three-dimensional size and rotation angle of the target object must also be derived from the image, and the bird's-eye-view projection information plays a crucial role in subsequent path planning and control.
At present, 3D target detection is developing rapidly. The mainstream scheme combines a monocular camera and a multi-line lidar, but both the hardware cost (sensor and computing unit) and the sample labeling cost are obstacles to large-scale deployment of this technical scheme.
Disclosure of Invention
Therefore, the embodiment of the application provides a binocular camera-based three-dimensional distance measurement method, system, equipment and readable storage medium that fuse the output information of monocular two-dimensional detection and binocular ranging, so that the three-dimensional coordinates of a vehicle target in space can be obtained directly from a vision sensor, making up for the disadvantages of monocular- and lidar-based schemes in hardware cost, computational power and sample labeling cost.
In order to achieve the above object, the embodiments of the present application provide the following technical solutions:
according to a first aspect of embodiments of the present application, there is provided a binocular camera-based three-dimensional distance measurement method, the method including:
acquiring an original image of a binocular camera;
performing deep learning target detection on the original image, determining a target feature, and acquiring frame information of a target area of the target feature;
calculating the parallax of the original image point by point to obtain a corresponding parallax map, and determining a real coordinate three-dimensional image according to the parallax map;
determining real three-dimensional coordinate values of four corner points based on a clustering statistical method over coordinate values in a set region, according to the boundary corner points of the target feature object in the two-dimensional image and the real coordinate three-dimensional image;
determining the real three-dimensional coordinate values of the other four corner points of the outer envelope cube of the target feature according to the real three-dimensional coordinate values of the four corner points of the outer envelope cube of the target feature;
and calculating and outputting the position coordinates of the corner points of the cube in the image coordinate system through the conversion relation between the world coordinate system and the image coordinate system.
Optionally, the target features are a vehicle tail and a wheel;
the obtaining of the frame information of the target area of the target feature object includes:
performing edge positioning on the detected target areas of the vehicle tail and the wheels, searching for frames around the vehicle tail and the wheels with an edge enhancement algorithm, and refining the positions of the vehicle tail and the wheels to obtain the frame information;
and performing straight-line fitting on the frame information acquired from the left-eye camera and the right-eye camera, and respectively solving the intersection points of the corresponding frame side lines, so as to obtain the position of the vehicle side-body envelope line in the image coordinate system.
Optionally, the clustering statistical method is to sort the coordinate values inside the region, remove the outlying noise values, and take the mean to obtain the real three-dimensional coordinate value inside the region.
Optionally, the calculating the disparity point by point for the original image to obtain a corresponding disparity map includes:
and calculating the parallax of the left image and the right image of the binocular camera which are overlapped and effective image areas point by point to obtain a parallax map corresponding to the original image, and converting the parallax map into a spatial information point cloud map.
According to a second aspect of embodiments of the present application, there is provided a binocular camera-based three-dimensional ranging system, the system including:
the image acquisition module is used for acquiring original images of the binocular camera;
the frame information determining module is used for carrying out deep learning target detection on the original image, determining a target feature object and acquiring frame information of a target area of the target feature object;
the real coordinate three-dimensional image determining module is used for calculating the parallax of the original image point by point to obtain a corresponding parallax image, and determining a real coordinate three-dimensional image according to the parallax image;
the boundary corner three-dimensional coordinate calculation module is used for determining the real three-dimensional coordinate values of four corner points based on a clustering statistical method over coordinate values in a set region, according to the boundary corner points of the target feature in the two-dimensional image and the real coordinate three-dimensional image; and for determining the real three-dimensional coordinate values of the remaining four corner points of the outer envelope cube of the target feature according to the real three-dimensional coordinate values of the first four corner points;
and the real three-dimensional coordinate output module is used for calculating and outputting the position coordinates of the corner points of the cube in the image coordinate system through the conversion relation between the world coordinate system and the image coordinate system.
Optionally, the target features are a vehicle tail and a wheel;
the frame information determining module is specifically configured to:
performing edge positioning on the detected target areas of the vehicle tail and the wheels, searching for frames around the vehicle tail and the wheels with an edge enhancement algorithm, and refining the positions of the vehicle tail and the wheels to obtain the frame information;
and performing straight-line fitting on the frame information acquired from the left-eye camera and the right-eye camera, and respectively solving the intersection points of the corresponding frame side lines, so as to obtain the position of the vehicle side-body envelope line in the image coordinate system.
Optionally, the clustering statistical method is to sort the coordinate values inside the region, remove the outlying noise values, and take the mean to obtain the real three-dimensional coordinate values inside the region.
Optionally, the real coordinate three-dimensional image determining module is specifically configured to:
and calculating the parallax of the left image and the right image of the binocular camera which are overlapped and effective image areas point by point to obtain a parallax map corresponding to the original image, and converting the parallax map into a spatial information point cloud map.
According to a third aspect of embodiments herein, there is provided an apparatus comprising: a data acquisition device, a processor and a memory; the data acquisition device is configured to acquire data; the memory is configured to store one or more program instructions; and the processor is configured to execute the one or more program instructions to perform the method of any one of the first aspect.
According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium having one or more program instructions embodied therein for performing the method of any of the first aspects.
In summary, the embodiments of the present application provide a binocular camera-based three-dimensional distance measurement method, system, device and readable storage medium, which perform deep learning target detection on an original image from a binocular camera, determine a target feature, and acquire frame information of the target area of the target feature; calculate the parallax of the original image point by point to obtain a corresponding parallax map, and determine a real coordinate three-dimensional image from the parallax map; determine the real three-dimensional coordinate values of the corner points, based on a clustering statistical method over coordinate values in a set region, from the boundary corner points of the target feature in the two-dimensional image and the real coordinate three-dimensional image; and calculate and output the position coordinates of the corner points of the cube in the image coordinate system through the conversion relation between the world coordinate system and the image coordinate system. The output information of monocular two-dimensional detection and binocular ranging is fused, so the three-dimensional coordinates of a vehicle target in space can be obtained directly from a vision sensor, making up for the disadvantages of monocular- and lidar-based schemes in hardware cost, computational power and sample labeling cost.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
The structures, ratios, sizes and the like shown in this specification are only used to match the content disclosed in the specification, for the understanding of those skilled in the art; they do not limit the conditions under which the present invention can be implemented and therefore carry no technically limiting significance. Any structural modification, change of ratio or adjustment of size that does not affect the functions and purposes achievable by the present invention shall still fall within the scope of the present invention.
Fig. 1 is a schematic flow chart of a binocular camera-based three-dimensional distance measurement method according to an embodiment of the present disclosure;
fig. 2 is a schematic view of an embodiment of binocular camera-based three-dimensional distance measurement provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of sample labeling rules provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a bounding envelope outline provided by an embodiment of the present application;
fig. 5 is a schematic view of a target parallax provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of an image of real coordinates X, Y, Z provided by an embodiment of the present application;
fig. 7 is a schematic diagram illustrating a parallax of a calculated boundary corner point according to an embodiment of the present application;
fig. 8 is a schematic diagram of a cluster statistical method provided in an embodiment of the present application;
FIG. 9 is a schematic diagram illustrating a three-dimensional frame display of a vehicle target according to an embodiment of the present application;
fig. 10 is a block diagram of a binocular camera-based three-dimensional distance measuring system according to an embodiment of the present application.
Detailed Description
The present invention is described below through particular embodiments; other advantages and effects of the invention will be readily apparent to those skilled in the art from this disclosure. It should be understood that the described embodiments are merely some, not all, of the embodiments of the invention, and are not intended to limit the invention to the particular forms disclosed. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present invention.
Fig. 1 shows a binocular camera-based three-dimensional distance measurement method provided in an embodiment of the present application, including the following steps:
step 101: and acquiring an original image of the binocular camera.
Step 102: and carrying out deep learning target detection on the original image, determining a target feature, and acquiring frame information of a target area of the target feature.
Step 103: and calculating the parallax of the original image point by point to obtain a corresponding parallax map, and determining a real coordinate three-dimensional image according to the parallax map.
Step 104: And determining the real three-dimensional coordinate values of the four corner points based on a clustering statistical method over coordinate values in a set region, according to the boundary corner points of the target feature object in the two-dimensional image and the real coordinate three-dimensional image.
Step 105: and determining the real three-dimensional coordinate values of the other four corner points of the outer envelope cube of the target feature according to the real three-dimensional coordinate values of the four corner points of the outer envelope cube of the target feature.
Step 106: and calculating and outputting the position coordinates of the corner points of the cube in the image coordinate system through the conversion relation between the world coordinate system and the image coordinate system.
In one possible embodiment, in step 102, the target features are the vehicle tail and the wheels; the obtaining of the frame information of the target area of the target feature includes: performing edge positioning on the detected target areas of the vehicle tail and the wheels, searching for frames around the vehicle tail and the wheels with an edge enhancement algorithm, and refining the positions of the vehicle tail and the wheels to obtain the frame information; and performing straight-line fitting on the frame information acquired from the left-eye camera and the right-eye camera, and respectively solving the intersection points of the corresponding frame side lines, so as to obtain the position of the vehicle side-body envelope line in the image coordinate system.
In a possible implementation manner, in step 103, the calculating of the parallax of the original image point by point to obtain a corresponding parallax map includes: calculating the parallax point by point over the overlapping and valid image area of the left and right images of the binocular camera to obtain a parallax map corresponding to the original image, and converting the parallax map into a spatial information point cloud map.
In a possible implementation manner, in step 104, the clustering statistical method is to sort the coordinate values inside the region, remove the outlying noise values, and take the mean to obtain the real three-dimensional coordinate values inside the region.
To make the method provided in the embodiment of the application clearer, it is further explained with reference to fig. 2, which shows the flow of a vehicle three-dimensional detection method based on vehicle tail detection, wheel detection and binocular ranging.
A first part: and detecting the two-dimensional target of the vehicle tail and the vehicle wheel based on deep learning.
The deep learning target detection mainly comprises three parts: sample labeling, model training and model inference. Sample labeling defines the detection categories (the tail and wheels of a motor vehicle), as shown in FIG. 3; the model is trained with a yolov3-based target detection network; and the model inference part sets a target filtering mechanism (NMS, i.e. non-maximum suppression; a confidence threshold; and matching by the positional relation of the vehicle tail and wheels).
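The target filtering mechanism in the model inference part can be illustrated with the non-maximum suppression step. The sketch below is generic greedy NMS over candidate boxes; the box format ([x1, y1, x2, y2]) and the IoU and confidence thresholds are assumptions for illustration, not values specified by the embodiment.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5, conf_thresh=0.3):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes.

    Illustrative sketch of the target filtering step; thresholds and box
    format are assumptions, not values from the patent."""
    keep_mask = scores >= conf_thresh            # confidence threshold
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    order = np.argsort(scores)[::-1]             # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of box i with the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter)
        order = order[1:][iou <= iou_thresh]     # drop overlapping boxes
    return boxes[keep]
```

In the embodiment, this filtering is further combined with the tail/wheel positional-relation matching, which is specific to the vehicle geometry and not shown here.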
A second part: and obtaining a vehicle two-dimensional envelope bounding box.
Vehicle tail (blue frame) and wheel (yellow frame) target frames are obtained through the deep learning model, and the slope k of the wheel direction line (shown as a yellow dotted line in fig. 4) is obtained by connecting the centre coordinates of the two yellow wheel frames. The red line is the envelope line of the vehicle side body (y = kx + b); knowing the slope k and the crossing point a, the position of the vehicle side-body envelope line in the image coordinate system can be determined.
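The geometric step above amounts to taking the slope of the line through the two wheel-box centres and anchoring the side-body envelope line y = kx + b at a known crossing point. A minimal sketch, in which the box format [x1, y1, x2, y2] and the name of the anchor point are assumptions:

```python
def side_envelope_line(wheel_box_a, wheel_box_b, anchor_point):
    """Slope k of the wheel direction line from the two wheel-box centres,
    then the intercept b of the side-body envelope y = k*x + b through a
    known anchor point (the crossing point 'a' in the patent's figure).
    Box format [x1, y1, x2, y2] is an assumption."""
    cx1 = (wheel_box_a[0] + wheel_box_a[2]) / 2
    cy1 = (wheel_box_a[1] + wheel_box_a[3]) / 2
    cx2 = (wheel_box_b[0] + wheel_box_b[2]) / 2
    cy2 = (wheel_box_b[1] + wheel_box_b[3]) / 2
    k = (cy2 - cy1) / (cx2 - cx1)              # slope of the wheel line
    b = anchor_point[1] - k * anchor_point[0]  # envelope passes through anchor
    return k, b
```

With k and b known, the envelope line's position in the image coordinate system is fully determined, as the text notes.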
And a third part: a parallax image is calculated.
Using the left and right images of the binocular camera, the parallax is calculated point by point (only for the image area where the left and right cameras overlap and are valid), and a parallax map corresponding to the original image is obtained. Because the magnitude of the parallax represents distance, the parallax map can be converted into a spatial information point cloud map. As shown in fig. 5, different grey values in the parallax map represent different parallaxes, and black is background (the parallax value at the background is invalid).
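A minimal, unoptimised sketch of point-by-point parallax calculation by block matching follows. It is illustrative only: the embodiment does not specify the matching algorithm, a production stereo pipeline would use an optimised dense matcher, and the window size and disparity range here are arbitrary.

```python
import numpy as np

def block_match_disparity(left, right, max_disp=8, win=2):
    """Brute-force point-by-point disparity via SAD block matching.

    Only the overlapping, valid image area (x >= max_disp) is evaluated,
    as the patent notes; elsewhere the disparity stays 0 (invalid)."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(win, h - win):
        for x in range(max_disp + win, w - win):
            patch = left[y - win:y + win + 1, x - win:x + win + 1].astype(np.int32)
            costs = []
            for d in range(max_disp):
                cand = right[y - win:y + win + 1,
                             x - d - win:x - d + win + 1].astype(np.int32)
                costs.append(np.abs(patch - cand).sum())  # sum of absolute differences
            disp[y, x] = int(np.argmin(costs))            # best-matching shift
    return disp
```

Each pixel's disparity is the horizontal shift that minimises the matching cost between the left patch and the right patch; larger shifts mean closer points.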
The fourth part: true coordinates X, Y, Z are computed for the image.
From the disparity map and the calibration file, X, Y, Z images of real coordinates are obtained, as shown in fig. 6.
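The conversion from the disparity map to the real-coordinate X, Y, Z images follows the standard rectified-stereo relations Z = f·B/d, X = (u - cx)·Z/f, Y = (v - cy)·Z/f, where the focal length f (in pixels), baseline B and principal point (cx, cy) come from the calibration file. A sketch assuming this standard model (the patent itself only references the calibration file):

```python
import numpy as np

def disparity_to_xyz(disp, f, baseline, cx, cy):
    """Per-pixel real-coordinate X, Y, Z images from a disparity map,
    using the standard rectified-stereo relations
        Z = f * B / d,   X = (u - cx) * Z / f,   Y = (v - cy) * Z / f.
    Invalid (non-positive) disparities are mapped to NaN."""
    h, w = disp.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    with np.errstate(divide="ignore", invalid="ignore"):
        Z = np.where(disp > 0, f * baseline / disp, np.nan)
    X = (u - cx) * Z / f
    Y = (v - cy) * Z / f
    return X, Y, Z
```

The three returned arrays correspond to the X, Y and Z images of fig. 6.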
The fifth part is that: the coordinates of the boundary points are calculated.
The real coordinate values of the four corner points are obtained by applying the set-region coordinate value clustering statistical method (as shown in fig. 7) at the boundary corner points A, B, C, D of the known vehicle target in the two-dimensional image, together with the real coordinate images X, Y, Z. In fig. 7, the red areas are the X and Z real coordinate calculation areas, and the green area is the Y real coordinate calculation area (region selection criterion: a boundary area with clear texture and rich parallax). The real coordinates of the four corner points are obtained by computing the X, Y, Z values of the coordinate points inside each region.
Fig. 8 shows the clustering statistical method: the coordinate values in the region are collected and sorted, the outlying noise values are removed, and the coordinate value required for the region is obtained by averaging the remainder.
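The clustering statistical method of fig. 8 can be sketched as a trimmed mean: sort the sampled coordinate values, discard a fraction at the extremes as noise, and average the remainder. The 10% trim fraction is an assumption; the embodiment only states that noise values are removed before averaging.

```python
import numpy as np

def region_coordinate_statistic(values, trim_frac=0.1):
    """Sort the coordinate values sampled inside the region, discard a
    fraction of outlying values at both ends as noise, and average the
    rest. trim_frac = 0.1 is an assumed value, not from the patent."""
    v = np.sort(np.asarray(values, dtype=float))
    k = int(len(v) * trim_frac)              # number of samples to trim per end
    core = v[k:len(v) - k] if k > 0 else v
    return core.mean()
```

This makes the corner-point coordinates robust to the occasional gross disparity error inside the calculation region.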
A sixth part: and outputting the real three-dimensional coordinates of the vehicle target.
Given the three-dimensional coordinates of the four corner points (A, B, C, D) of the outer envelope cube of the vehicle, the three-dimensional coordinates of the remaining four corner points (E, F, G, H) can be determined. The position coordinates of the corner points of the cube in the image coordinate system are then obtained through the conversion relation between the world coordinate system and the image coordinate system; the display effect on the two-dimensional image is shown in fig. 9.
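This last step can be sketched as completing the cube and projecting with the pinhole relation p = K·P/Z. Offsetting the rear corners purely along the viewing axis by an assumed vehicle length is a simplification of the embodiment's derivation (which uses the side-body envelope), and the intrinsic matrix K here stands in for the world-to-image conversion relation.

```python
import numpy as np

def complete_and_project(front_corners, length, K):
    """Given the four measured corner points A, B, C, D of the vehicle's
    outer envelope cube (camera-frame coordinates, metres) and an assumed
    vehicle length along the viewing axis, derive the remaining corners
    E-H by a depth offset, then project all eight into the image with the
    pinhole relation p = K @ P / Z.  The pure-Z offset and the use of K
    alone (no extrinsics) are simplifying assumptions."""
    front = np.asarray(front_corners, dtype=float)   # shape (4, 3): A, B, C, D
    rear = front + np.array([0.0, 0.0, length])      # E, F, G, H
    cube = np.vstack([front, rear])                  # all 8 cube corners
    pix = (K @ cube.T).T                             # homogeneous pixel coords
    return pix[:, :2] / pix[:, 2:3]                  # perspective divide
```

The eight projected points are what fig. 9 draws as the three-dimensional frame on the two-dimensional image.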
This method of vehicle target three-dimensional detection based on the fusion of monocular target detection and binocular ranging not only greatly reduces hardware and sample labeling costs, but also demands far less computational power than conventional deep-learning-based three-dimensional detection methods, which considerably expands the application products and deployment scenarios of the vehicle three-dimensional detection function.
In summary, the embodiment of the present application provides a binocular camera-based three-dimensional distance measurement method, which performs deep learning target detection on an original image from a binocular camera, determines a target feature, and acquires frame information of the target area of the target feature; calculates the parallax of the original image point by point to obtain a corresponding parallax map, and determines a real coordinate three-dimensional image from the parallax map; determines the real three-dimensional coordinate values of the corner points, based on a clustering statistical method over coordinate values in a set region, from the boundary corner points of the target feature in the two-dimensional image and the real coordinate three-dimensional image; and calculates and outputs the position coordinates of the corner points of the cube in the image coordinate system through the conversion relation between the world coordinate system and the image coordinate system. The output information of monocular two-dimensional detection and binocular ranging is fused, so the three-dimensional coordinates of a vehicle target in space can be obtained directly from a vision sensor, making up for the disadvantages of monocular- and lidar-based schemes in hardware cost, computational power and sample labeling cost.
Based on the same technical concept, an embodiment of the present application further provides a binocular camera-based three-dimensional distance measurement system, as shown in fig. 10, the system includes:
an image obtaining module 1001 is configured to obtain a binocular camera original image.
The frame information determining module 1002 is configured to perform deep learning target detection on the original image, determine a target feature, and obtain frame information of a target area of the target feature.
A real coordinate three-dimensional image determining module 1003, configured to calculate a parallax point by point for the original image to obtain a corresponding parallax map, and determine a real coordinate three-dimensional image according to the parallax map.
A boundary corner three-dimensional coordinate calculation module 1004, configured to determine, according to the boundary corner points of the target feature in the two-dimensional image and the real coordinate three-dimensional image, the real three-dimensional coordinate values of four corner points based on a clustering statistical method over coordinate values in a set region; and to determine the real three-dimensional coordinate values of the remaining four corner points of the outer envelope cube of the target feature according to the real three-dimensional coordinate values of the first four corner points.
And a real three-dimensional coordinate output module 1005, configured to calculate and output position coordinates of the corner points of the cube in the image coordinate system through a conversion relationship between the world coordinate system and the image coordinate system.
Optionally, the target features are the vehicle tail and the wheels; the frame information determining module 1002 is specifically configured to: perform edge positioning on the detected target areas of the vehicle tail and the wheels, search for frames around the vehicle tail and the wheels with an edge enhancement algorithm, and refine the positions of the vehicle tail and the wheels to obtain the frame information; and perform straight-line fitting on the frame information acquired from the left-eye camera and the right-eye camera, and respectively solve the intersection points of the corresponding frame side lines, so as to obtain the position of the vehicle side-body envelope line in the image coordinate system.
Optionally, the clustering statistical method is to sort the coordinate values inside the region, remove the outlying noise values, and take the mean to obtain the real three-dimensional coordinate values inside the region.
Optionally, the real coordinate three-dimensional image determining module 1003 is specifically configured to: calculate the parallax point by point over the overlapping and valid image area of the left and right images of the binocular camera to obtain a parallax map corresponding to the original image, and convert the parallax map into a spatial information point cloud map.
Based on the same technical concept, an embodiment of the present application further provides an apparatus, including: the device comprises a data acquisition device, a processor and a memory; the data acquisition device is used for acquiring data; the memory is to store one or more program instructions; the processor is configured to execute one or more program instructions to perform the method described above.
Based on the same technical concept, the embodiment of the present application further provides a computer-readable storage medium, wherein the computer-readable storage medium contains one or more program instructions, and the one or more program instructions are used for executing the method described above.
In this specification, the method embodiments are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. For the relevant parts of the other embodiments, reference is made to the description of the method embodiments.
It is noted that, although the operations of the methods of the present invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be broken down into multiple steps.
Although the present application provides method steps as described in the embodiments or flowcharts, more or fewer steps may be included based on conventional or non-inventive effort. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only order of execution. When an apparatus or client product executes in practice, the steps may be executed sequentially or in parallel according to the methods shown in the embodiments or figures (for example, in a parallel-processor or multithreaded environment, or even in a distributed data processing environment). The terms "comprise" and "comprising", and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded.
The units, devices, modules, and the like set forth in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. For convenience of description, the above devices are divided into various modules by function and described separately. Of course, when implementing the present application, the functions of the modules may be implemented in one or more pieces of software and/or hardware, a module implementing a given function may be implemented by a combination of multiple sub-modules or sub-units, and so on. The above-described apparatus embodiments are merely illustrative; for example, the division into units is only one logical division, and other divisions may be used in practice: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, devices, or units, and may be electrical, mechanical, or of another form.
Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included therein for performing the various functions may also be regarded as structures within the hardware component, or even as both software modules for performing the method and structures within the hardware component.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for enabling a computer device (which may be a personal computer, a mobile terminal, a server, or a network device) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The application is operational with numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The above embodiments further describe the invention in detail; it should be understood that they are merely illustrative of the present invention and are not intended to limit its scope. Any modifications, equivalent substitutions, improvements, and the like made within the spirit and principle of the present invention shall be included in the scope of the present invention.

Claims (10)

1. A binocular camera-based three-dimensional distance measurement method is characterized by comprising the following steps:
acquiring an original image of a binocular camera;
performing deep learning target detection on the original image, determining a target feature, and acquiring frame information of a target area of the target feature;
calculating the parallax of the original image point by point to obtain a corresponding parallax map, and determining a real coordinate three-dimensional image according to the parallax map;
determining real three-dimensional coordinate values of four corner points, based on a cluster statistical method applied to coordinate values within a set region, according to the boundary corner points of the target feature object in the two-dimensional image and the real coordinate three-dimensional image;
determining the real three-dimensional coordinate values of the remaining four corner points of the outer envelope cube of the target feature according to the real three-dimensional coordinate values of the four corner points; and
calculating and outputting the position coordinates of the corner points of the cube in the image coordinate system through the conversion relation between the world coordinate system and the image coordinate system.
2. The method of claim 1, wherein the target features are a vehicle tail and wheels;
the obtaining of the frame information of the target area of the target feature object includes:
performing edge positioning on the detected target areas of the vehicle tail and the wheels, searching for frames around the vehicle tail and the wheels by using an edge enhancement algorithm, and further positioning the vehicle tail and the wheels to obtain frame information; and
fitting straight lines to the frame information acquired from the left-eye and right-eye cameras and solving the intersection points of the corresponding edge lines, so as to obtain the position of the vehicle side-body envelope line in the image coordinate system.
3. The method of claim 1, wherein the cluster statistical method is to sort the coordinate values inside the region, remove the noise values at the extremes of the sorted sequence, and take the mean of the remaining values to obtain the real three-dimensional coordinate values inside the region.
4. The method of claim 1, wherein said calculating the disparity for the original image point-by-point to obtain a corresponding disparity map comprises:
calculating, point by point, the parallax over the overlapping effective image areas of the left and right images of the binocular camera to obtain a disparity map corresponding to the original image, and converting the disparity map into a spatial-information point cloud map.
5. A binocular camera based three-dimensional ranging system, the system comprising:
the image acquisition module is used for acquiring original images of the binocular camera;
the frame information determining module is used for carrying out deep learning target detection on the original image, determining a target feature object and acquiring frame information of a target area of the target feature object;
the real coordinate three-dimensional image determining module is used for calculating the parallax of the original image point by point to obtain a corresponding parallax image, and determining a real coordinate three-dimensional image according to the parallax image;
the boundary corner three-dimensional coordinate calculation module is used for determining real three-dimensional coordinate values of four corner points, based on a cluster statistical method applied to coordinate values within a set region, according to the boundary corner points of the target feature object in the two-dimensional image and the real coordinate three-dimensional image, and for determining the real three-dimensional coordinate values of the remaining four corner points of the outer envelope cube of the target feature according to the real three-dimensional coordinate values of the four corner points; and
and the real three-dimensional coordinate output module is used for calculating and outputting the position coordinates of the corner points of the cube in the image coordinate system through the conversion relation between the world coordinate system and the image coordinate system.
6. The system of claim 5, wherein the target features are a vehicle tail and a vehicle wheel;
the frame information determining module is specifically configured to:
performing edge positioning on the detected target areas of the vehicle tail and the wheels, searching for frames around the vehicle tail and the wheels by using an edge enhancement algorithm, and further positioning the vehicle tail and the wheels to obtain frame information; and
fitting straight lines to the frame information acquired from the left-eye and right-eye cameras and solving the intersection points of the corresponding edge lines, so as to obtain the position of the vehicle side-body envelope line in the image coordinate system.
7. The system of claim 5, wherein the cluster statistical method is to sort the coordinate values inside the region, remove the noise values at the extremes of the sorted sequence, and take the mean of the remaining values to obtain the real three-dimensional coordinate values inside the region.
8. The system of claim 5, wherein the real coordinate three-dimensional image determination module is specifically configured to:
calculating, point by point, the parallax over the overlapping effective image areas of the left and right images of the binocular camera to obtain a disparity map corresponding to the original image, and converting the disparity map into a spatial-information point cloud map.
9. An apparatus, characterized in that the apparatus comprises: the device comprises a data acquisition device, a processor and a memory;
the data acquisition device is used for acquiring data; the memory is to store one or more program instructions; the processor, configured to execute one or more program instructions to perform the method of any of claims 1-4.
10. A computer-readable storage medium having one or more program instructions embodied therein, the one or more program instructions being for performing the method of any one of claims 1-4.
CN202010769781.6A 2020-08-04 2020-08-04 Binocular camera-based three-dimensional distance measurement method, system, equipment and readable storage medium Pending CN112097732A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010769781.6A CN112097732A (en) 2020-08-04 2020-08-04 Binocular camera-based three-dimensional distance measurement method, system, equipment and readable storage medium


Publications (1)

Publication Number Publication Date
CN112097732A 2020-12-18

Family

ID=73749521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010769781.6A Pending CN112097732A (en) 2020-08-04 2020-08-04 Binocular camera-based three-dimensional distance measurement method, system, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112097732A (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103868460A (en) * 2014-03-13 2014-06-18 桂林电子科技大学 Parallax optimization algorithm-based binocular stereo vision automatic measurement method
CN106228110A (en) * 2016-07-07 2016-12-14 浙江零跑科技有限公司 A kind of barrier based on vehicle-mounted binocular camera and drivable region detection method
WO2018058356A1 (en) * 2016-09-28 2018-04-05 驭势科技(北京)有限公司 Method and system for vehicle anti-collision pre-warning based on binocular stereo vision
CN110148169A (en) * 2019-03-19 2019-08-20 长安大学 A kind of vehicle target 3 D information obtaining method based on PTZ holder camera
US20190301861A1 (en) * 2018-03-02 2019-10-03 TuSimple Method and apparatus for binocular ranging
US20200082553A1 (en) * 2018-09-07 2020-03-12 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for generating three-dimensional data, device, and storage medium
CN110926408A (en) * 2019-12-04 2020-03-27 北京中科慧眼科技有限公司 Short-distance measuring method, device and system based on characteristic object and storage medium


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668428A (en) * 2020-12-21 2021-04-16 北京百度网讯科技有限公司 Vehicle lane change detection method, roadside device, cloud control platform and program product
CN112784814A (en) * 2021-02-10 2021-05-11 中联重科股份有限公司 Posture recognition method for vehicle backing and warehousing and conveying vehicle backing and warehousing guide system
CN112784814B (en) * 2021-02-10 2024-06-07 中联重科股份有限公司 Gesture recognition method for reversing and warehousing of vehicle and reversing and warehousing guiding system of conveying vehicle
CN112907681A (en) * 2021-02-26 2021-06-04 北京中科慧眼科技有限公司 Combined calibration method and system based on millimeter wave radar and binocular camera
CN113624142A (en) * 2021-07-26 2021-11-09 成都飞机工业(集团)有限责任公司 Method for detecting concave-convex amount of airplane assembly fastener
CN113658274A (en) * 2021-08-23 2021-11-16 海南大学 Individual spacing automatic calculation method for primate species behavior analysis
CN113658274B (en) * 2021-08-23 2023-11-28 海南大学 Automatic individual spacing calculation method for primate population behavior analysis
CN113658226A (en) * 2021-08-26 2021-11-16 中国人民大学 Height detection method and system for height limiting device
CN113658226B (en) * 2021-08-26 2023-09-05 中国人民大学 Height detection method and system for height limiting device
CN113781565A (en) * 2021-09-15 2021-12-10 湖南视比特机器人有限公司 Battery replacing method and system, electronic equipment and storage medium
CN115439331A (en) * 2022-09-02 2022-12-06 北京百度网讯科技有限公司 Corner point correction method and three-dimensional model generation method and device in meta universe
CN118175423A (en) * 2024-05-15 2024-06-11 山东云海国创云计算装备产业创新中心有限公司 Focal length determining system, method, equipment, medium and product

Similar Documents

Publication Publication Date Title
CN112097732A (en) Binocular camera-based three-dimensional distance measurement method, system, equipment and readable storage medium
CN111553859B (en) Laser radar point cloud reflection intensity completion method and system
CN108520536B (en) Disparity map generation method and device and terminal
CN110119698B (en) Method, apparatus, device and storage medium for determining object state
JP6955783B2 (en) Information processing methods, equipment, cloud processing devices and computer program products
CN113378760A (en) Training target detection model and method and device for detecting target
CN114179788B (en) Automatic parking method, system, computer readable storage medium and vehicle terminal
CN115147809B (en) Obstacle detection method, device, equipment and storage medium
KR20150074544A (en) Method of tracking vehicle
US11282180B1 (en) Object detection with position, pose, and shape estimation
CN115719436A (en) Model training method, target detection method, device, equipment and storage medium
CN115410167A (en) Target detection and semantic segmentation method, device, equipment and storage medium
CN112509126B (en) Method, device, equipment and storage medium for detecting three-dimensional object
CN115187941A (en) Target detection positioning method, system, equipment and storage medium
WO2023131203A1 (en) Semantic map updating method, path planning method, and related apparatuses
CN110673607B (en) Feature point extraction method and device under dynamic scene and terminal equipment
CN113126120A (en) Data annotation method, device, equipment, storage medium and computer program product
CN118429524A (en) Binocular stereoscopic vision-based vehicle running environment modeling method and system
CN115683109B (en) Visual dynamic obstacle detection method based on CUDA and three-dimensional grid map
CN116363615B (en) Data fusion method, device, vehicle and storage medium
CN117870716A (en) Map interest point display method and device, electronic equipment and storage medium
CN117197419A (en) Lei Dadian cloud labeling method and device, electronic equipment and storage medium
US11417063B2 (en) Determining a three-dimensional representation of a scene
CN117095054A (en) Autonomous positioning and mapping method for inspection robot based on factory station semantic information
CN116642490A (en) Visual positioning navigation method based on hybrid map, robot and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20201218)