CN112927281B - Depth detection method, depth detection device, storage medium and electronic equipment - Google Patents
Depth detection method, depth detection device, storage medium and electronic equipment
- Publication number
- CN112927281B (application CN202110367514.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/579—Depth or shape recovery from multiple images from motion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
- G06T7/337—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
- G06T7/85—Stereo camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Abstract
The disclosure provides a depth detection method, a depth detection device, a storage medium and electronic equipment, and relates to the technical field of computer vision. The method comprises the following steps: acquiring point cloud data of an object to be detected, which is acquired by a laser radar, and at least two images of the object to be detected, which are acquired by at least two cameras; determining first depth information of the object to be detected by analyzing the point cloud data, wherein the first depth information comprises first depth values of different areas of the object to be detected; determining second depth information of the object to be detected by performing stereo matching on the at least two images, wherein the second depth information comprises second depth values of different areas of the object to be detected; determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value of the different areas; and fusing the first depth information and the second depth information by using the first weight value and the second weight value to obtain target depth information of the object to be detected. The method and the device expand the applicable scene and have higher practicability.
Description
Technical Field
The disclosure relates to the technical field of computer vision, and in particular relates to a depth detection method, a depth detection device, a computer readable storage medium and electronic equipment.
Background
Depth detection refers to detecting a distance between an observer and an object to be measured in a depth direction so as to restore three-dimensional stereo information of the object to be measured.
In the related art, depth detection is mostly implemented by a specific sensor and its matched algorithm, where the sensor may be a binocular camera, a laser radar (Light Detection and Ranging, LiDAR), a TOF (Time of Flight) sensor, a structured light camera, and so on. Depth detection by any single sensor has certain limitations: for example, every sensor detects depth values with low accuracy for objects beyond its detection range, a binocular camera detects depth values with low accuracy on weakly textured parts of an object, and a laser radar is easily affected by multipath interference, so the accuracy of the depth values it detects at object edges is low. Therefore, the related art places high requirements on the depth detection scene and has low practicability.
Disclosure of Invention
The disclosure provides a depth detection method, a depth detection device, a computer readable storage medium and an electronic device, so as to at least improve the problem that the requirements of the related technology on a depth detection scene are high to a certain extent.
According to a first aspect of the present disclosure, there is provided a depth detection method, including: acquiring point cloud data of an object to be detected, which is acquired by a laser radar, and at least two images of the object to be detected, which are acquired by at least two cameras; determining first depth information of the object to be detected by analyzing the point cloud data, wherein the first depth information comprises first depth values of different areas of the object to be detected; determining second depth information of the object to be detected by performing stereo matching on the at least two images, wherein the second depth information comprises second depth values of different areas of the object to be detected; determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value of the different region; and fusing the first depth information and the second depth information by using the first weight value and the second weight value to obtain target depth information of the object to be detected.
According to a second aspect of the present disclosure, there is provided a depth detection apparatus comprising: the system comprises a data acquisition module, a detection module and a display module, wherein the data acquisition module is configured to acquire point cloud data of an object to be detected, which is acquired by a laser radar, and at least two images of the object to be detected, which are acquired by at least two cameras; a first depth information determining module configured to determine first depth information of the object to be measured by analyzing the point cloud data, the first depth information including first depth values of different regions of the object to be measured; a second depth information determining module configured to determine second depth information of the object to be measured by stereo matching the at least two images, the second depth information including second depth values of different areas of the object to be measured; the weight value determining module is configured to determine a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value of the different area; and the depth information fusion module is configured to fuse the first depth information and the second depth information by using the first weight value and the second weight value to obtain target depth information of the object to be detected.
According to a third aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the depth detection method of the first aspect described above and possible implementations thereof.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the depth detection method of the first aspect described above and possible implementations thereof via execution of the executable instructions.
The technical scheme of the present disclosure has the following beneficial effects:
The scheme realizes the fusion of the depth information detected by the laser radar and the binocular (or multi-view) camera, can overcome the limitation of a single sensor system, expands the range of the detectable depth value and the applicable depth detection scene, improves the accuracy of the depth detection, and has higher practicability.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Fig. 1 shows a system architecture diagram of an operating environment in the present exemplary embodiment;
Fig. 2 shows a schematic structural diagram of an electronic device in the present exemplary embodiment;
Fig. 3 shows a flowchart of a depth detection method in the present exemplary embodiment;
fig.4 shows a flowchart of acquiring point cloud data in the present exemplary embodiment;
Fig. 5 shows a flowchart of determining second depth information in the present exemplary embodiment;
fig. 6 shows a flowchart of determining a first weight value and a second weight value in the present exemplary embodiment;
Fig. 7 shows a schematic diagram of a depth value range in the present exemplary embodiment;
fig. 8 shows another flowchart for determining the first weight value and the second weight value in the present exemplary embodiment;
fig. 9 shows a flowchart of another depth detection method in the present exemplary embodiment;
Fig. 10 shows a schematic structural diagram of a depth detection device in the present exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only and not necessarily all steps are included. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
In the related art, a scheme of fusing an active depth sensor with a binocular camera has emerged. For example, the laser radar and the data acquired by the binocular camera are fused, and the data acquired by the two sensors are mutually verified to remove the error data. However, the laser radar and the binocular camera have respective detection ranges, and the detection range obtained by intersecting the detection ranges of the laser radar and the binocular camera is small, so that the depth detection scene is very limited.
In view of the foregoing, exemplary embodiments of the present disclosure first provide a depth detection method. Fig. 1 shows a system architecture diagram of an operating environment of the present exemplary embodiment. Referring to FIG. 1, the system architecture includes a data acquisition device 110 and a computing device 120. The data acquisition device 110 includes a camera system 111, a lidar 112 and a synchronizer 113. The camera system 111 may be used to collect image data of an object to be measured, and includes at least two cameras, such as the first camera 1111 and the second camera 1112 shown in fig. 1, and the camera system 111 is a binocular camera system. The camera system 111 may further include a third camera, a fourth camera, etc., which is not limited by the present disclosure. The lidar 112 may be configured to transmit a laser signal to an object to be measured, and obtain point cloud data of the object to be measured by analyzing the received reflected signal. Synchronizer 113 may be used to time synchronize camera system 111 with lidar 112 such that the time at which camera system 111 collects image data is synchronized with the time at which lidar 112 collects point cloud data. The data acquisition device 110 and the computing device 120 may form a connection through a wired or wireless communication link such that the data acquisition device 110 transmits the acquired data to the computing device 120. Computing device 120 includes a processor 121 and a memory 122. The memory 122 is used for storing executable instructions of the processor 121, and may also store application data, such as image data, video data, etc. The processor 121 is configured to perform the depth detection method of the present exemplary embodiment via execution of executable instructions to process data transmitted by the data acquisition device 110 to obtain corresponding target depth information.
In one embodiment, the data acquisition device 110 and the computing device 120 may be two devices that are independent of each other, e.g., the data acquisition device 110 is a robot and the computing device 120 is a computer for controlling the robot.
In another embodiment, the data acquisition device 110 and the computing device 120 may also be integrated in the same device, for example, the vehicle-mounted intelligent device includes the data acquisition device 110 and the computing device 120, and the overall process of data acquisition and data processing is performed to implement depth detection and automatic driving of the vehicle.
Application scenarios of the depth detection method of the present exemplary embodiment include, but are not limited to: in the automatic running of the vehicle or robot, at least two cameras are controlled to acquire image data of a front object to be measured, a laser radar is controlled to acquire point cloud data of the object to be measured, the acquired image data and the point cloud data are processed by executing the depth detection method of the exemplary embodiment to obtain target depth information of the object to be measured, a three-dimensional structure of the object to be measured is reconstructed, and a decision in the automatic running is determined according to the target depth information.
The exemplary embodiments of the present disclosure also provide an electronic device for performing the above-described depth detection method. The electronic device may be the computing device 120 described above or the computing device 120 including the data acquisition device 110.
The configuration of the above-described electronic device will be exemplarily described below taking the mobile terminal 200 in fig. 2 as an example. It will be appreciated by those skilled in the art that the configuration of fig. 2 can also be applied to stationary type devices in addition to components specifically for mobile purposes.
As shown in fig. 2, the mobile terminal 200 may specifically include: processor 210, internal memory 221, external memory interface 222, USB (Universal Serial Bus) interface 230, charge management module 240, power management module 241, battery 242, antenna 1, antenna 2, mobile communication module 250, wireless communication module 260, audio module 270, speaker 271, receiver 272, microphone 273, headset interface 274, sensor module 280, display 290, camera module 291, indicator 292, motor 293, keys 294, SIM (Subscriber Identity Module) card interface 295, and the like.
Processor 210 may include one or more processing units, such as: an AP (Application Processor), a modem processor, a GPU (Graphics Processing Unit), an ISP (Image Signal Processor), a controller, an encoder, a decoder, a DSP (Digital Signal Processor), a baseband processor and/or an NPU (Neural-network Processing Unit), and the like.
The encoder can encode (i.e., compress) image or video data, for example, encode a captured image of the object to be measured into corresponding code stream data so as to reduce the bandwidth occupied by data transmission; the decoder can decode (i.e., decompress) the code stream data of an image or video to restore the image or video data, for example, decode the code stream data corresponding to an image of the object to be measured to obtain the original image data. The mobile terminal 200 may support one or more encoders and decoders. In this way, the mobile terminal 200 can process images or videos in various encoding formats, such as image formats like JPEG (Joint Photographic Experts Group), PNG (Portable Network Graphics), and BMP (Bitmap), and video formats like MPEG (Moving Picture Experts Group)-1, MPEG-2, H.263, H.264, and HEVC (High Efficiency Video Coding).
In one embodiment, processor 210 may include one or more interfaces through which connections are made with other components of mobile terminal 200.
Internal memory 221 may be used to store computer executable program code that includes instructions. The internal memory 221 may include a volatile memory and a nonvolatile memory. The processor 210 performs various functional applications of the mobile terminal 200 and data processing by executing instructions stored in the internal memory 221.
The external memory interface 222 may be used to connect an external memory, such as a Micro SD card, to enable expansion of the memory capabilities of the mobile terminal 200. The external memory communicates with the processor 210 through the external memory interface 222 to implement data storage functions, such as storing files of images, videos, and the like.
The USB interface 230 is an interface conforming to the USB standard specification, and may be used to connect a charger to charge the mobile terminal 200, or may be connected to a headset or other electronic device.
The charge management module 240 is configured to receive a charge input from a charger. The charging management module 240 may also supply power to the device through the power management module 241 while charging the battery 242; the power management module 241 may also monitor the status of the battery.
The wireless communication function of the mobile terminal 200 may be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, a modem processor, a baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. The mobile communication module 250 may provide a solution including 2G/3G/4G/5G wireless communication applied on the mobile terminal 200. The wireless communication module 260 may provide wireless communication solutions applied on the mobile terminal 200, including WLAN (Wireless Local Area Network, such as a Wi-Fi network), BT (Bluetooth), GNSS (Global Navigation Satellite System), FM (Frequency Modulation), NFC (Near Field Communication), IR (Infrared), and the like.
The mobile terminal 200 may implement a display function through a GPU, a display screen 290, an AP, and the like, and display a user interface. For example, when the user turns on the photographing function, the mobile terminal 200 may display a photographing interface, a preview image, and the like in the display screen 290.
The mobile terminal 200 may implement a photographing function through an ISP, a camera module 291, an encoder, a decoder, a GPU, a display 290, an AP, and the like. For example, the user may activate a related service for depth detection, trigger a shooting function to be started, and at this time, an image of the object to be measured may be acquired through the camera module 291.
The mobile terminal 200 may implement audio functions through an audio module 270, a speaker 271, a receiver 272, a microphone 273, a headphone interface 274, an AP, and the like.
The sensor module 280 may include an ambient light sensor 2801, a pressure sensor 2802, a gyro sensor 2803, a barometric pressure sensor 2804, etc. to implement a corresponding sensing function.
The indicator 292 may be an indicator light, which may be used to indicate a state of charge, a change in power, a message indicating a missed call, a notification, etc. The motor 293 may generate vibration cues, may also be used for touch vibration feedback, or the like. The keys 294 include a power on key, a volume key, etc.
The mobile terminal 200 may support one or more SIM card interfaces 295 for interfacing with a SIM card to enable telephony and mobile communications, among other functions.
The depth detection method of the present exemplary embodiment is described below with reference to fig. 3, and fig. 3 shows an exemplary flow of the depth detection method, which may include:
Step S310, acquiring point cloud data of an object to be detected acquired by a laser radar and at least two images of the object to be detected acquired by at least two cameras;
step S320, determining first depth information of the object to be detected by analyzing the point cloud data, wherein the first depth information comprises first depth values of different areas of the object to be detected;
step S330, determining second depth information of the object to be detected by performing stereo matching on the at least two images, wherein the second depth information comprises second depth values of different areas of the object to be detected;
step S340, determining a first weight value corresponding to a first depth value and a second weight value corresponding to a second depth value of the different areas;
step S350, the first depth information and the second depth information are fused by using the first weight value and the second weight value, so as to obtain the target depth information of the object to be measured.
By the method, fusion of depth information detected by the laser radar and the binocular (or multi-view) camera is realized, the limitation of a single sensor system can be overcome, the range of the detectable depth value and the applicable depth detection scene are expanded, the accuracy of the depth detection is improved, and the method has higher practicability.
Each step in fig. 3 is specifically described below.
Referring to fig. 3, in step S310, point cloud data of an object to be measured acquired by a lidar and at least two images of the object to be measured acquired by at least two cameras are acquired.
The object to be measured refers to an environment in front of the laser radar and the camera, and comprises objects in the environment. The laser radar generally comprises a transmitter and a receiver, wherein the transmitter transmits laser signals, the receiver receives the reflected laser signals after the laser signals are reflected at an object to be detected, depth information of the object to be detected can be calculated by analyzing time difference between the transmitted and received laser signals, and meanwhile three-dimensional information of the object to be detected is determined according to a coordinate system of the laser radar, so that point cloud data of the object to be detected are generated.
At the same time, a camera system comprising at least two cameras may acquire images of the object to be measured. Taking a binocular camera system as an example, the at least two captured images include a first image, which may be a left view in the binocular, and a second image, which may be a right view.
In one embodiment, referring to fig. 4, the acquiring the point cloud data of the object to be measured acquired by the lidar may include:
Step S410, acquiring multi-frame point cloud data acquired by a laser radar in a motion process;
step S420, registering the multi-frame point cloud data, and fusing the registered multi-frame point cloud data to obtain point cloud data of the object to be detected.
During the motion of the laser radar, its coordinate system changes as its pose changes, so the multiple frames of point cloud data are acquired under different poses, and each frame of point cloud data represents the three-dimensional information of the object to be measured in a different laser radar coordinate system. Therefore, the multi-frame point cloud data can be registered so that the registered point cloud data are located in the same coordinate system, and then fused to obtain point cloud data that is denser than a single frame; partial error points in the point cloud data can also be removed, thereby improving the accuracy of the point cloud data.
In one embodiment, one frame may be selected from the multi-frame point cloud data as a reference frame, and the other frames may be registered to the reference frame. For example, the lidar collects k frames of point cloud data together during motion, and registers each of the 2 nd to the k th frames to the 1 st frame based on the 1 st frame.
In general, the object to be measured is a static object, that is, the shape of the object to be measured is unchanged in the moving process of the laser radar, so that the point cloud data of different frames corresponds to the object to be measured with the same shape. Thus, when the registration is performed, the optimal transformation parameters are determined for the frame to be registered, so that the transformed transformation parameters and the reference frame are overlapped as much as possible.
The present disclosure is not limited to a particular registration algorithm. For example, an ICP (Iterative Closest Point) algorithm may be used: when aligning the 2nd frame to the 1st frame, the 2nd frame point cloud data is transformed into the 1st frame coordinate system based on initial transformation parameters (typically including a rotation matrix and a translation vector), and nearest-point pairs are formed with the 1st frame point cloud data; the average distance of the nearest-point pairs is calculated to construct a loss function; the transformation parameters are optimized through iteration so that the loss function value decreases continuously until convergence, yielding the optimized transformation parameters; and the 2nd frame point cloud data is transformed into the 1st frame coordinate system using the optimized transformation parameters, thereby completing the registration of the 2nd frame to the 1st frame.
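For illustration only, the following Python sketch implements a minimal point-to-point ICP between one frame and the reference frame; it is not the patent's implementation, and the function names, the use of NumPy/SciPy, and the convergence criterion are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (SVD/Kabsch)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # guard against reflections
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

def icp_register(frame, reference, iterations=30, tol=1e-6):
    """Register an N x 3 frame to the reference frame; returns aligned points, R and t."""
    tree = cKDTree(reference)
    R_total, t_total = np.eye(3), np.zeros(3)
    current = frame.copy()
    prev_err = np.inf
    for _ in range(iterations):
        dist, idx = tree.query(current)              # nearest-point pairs
        R, t = best_rigid_transform(current, reference[idx])
        current = current @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
        err = dist.mean()                            # loss: mean distance of the pairs
        if abs(prev_err - err) < tol:                # stop when the loss converges
            break
        prev_err = err
    return current, R_total, t_total
```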
In one embodiment, step S420 may be implemented by:
Determining a reference frame in the multi-frame point cloud data, and registering other frame point cloud data except the reference frame into a coordinate system corresponding to the reference frame, wherein the coordinate system is a three-dimensional coordinate system;
in a coordinate system corresponding to the reference frame, dividing a cube or a cuboid lattice according to the resolution, actual requirements and the like of the laser radar;
Dividing the points in the registered point cloud data of each frame into grids according to the x, y and z coordinates of the points, wherein the points in the same grid are regarded as homonymous points;
Counting the number of points in each grid; if the number of points is smaller than a homonymous-point number threshold, the points in that grid are judged to be error points and are eliminated. The homonymous-point number threshold can be determined empirically or in combination with the number of frames of point cloud data; for example, when k frames of point cloud data are acquired in total, the threshold can be s × k, where s is a coefficient smaller than 1, for example 0.5 or 0.25;
and forming the remaining points into a set to obtain the fused point cloud data (a code sketch of this grid-based fusion is given below).
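A minimal sketch of the grid-based fusion described in the list above, assuming the registered frames have been stacked into a single N x 3 array; the voxel size, the coefficient s and the function name are illustrative assumptions rather than values taken from the patent.

```python
import numpy as np

def fuse_registered_frames(points, k_frames, cell_size=0.05, s=0.5):
    """Keep only points whose grid cell contains at least s * k_frames homonymous points."""
    voxel_idx = np.floor(points / cell_size).astype(np.int64)     # cube grid index per point
    _, inverse, counts = np.unique(voxel_idx, axis=0,
                                   return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    keep = counts[inverse] >= s * k_frames                        # homonymous-point threshold
    return points[keep]                                           # fused point cloud data
```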
With continued reference to fig. 3, in step S320, first depth information of the object to be measured is determined by analyzing the above-mentioned point cloud data, where the first depth information includes first depth values of different regions of the object to be measured.
The first depth information refers to depth information of the object to be measured, which is determined based on point cloud data of the laser radar, and comprises first depth values of different areas of the object to be measured. In order to facilitate discrimination, the present exemplary embodiment refers to a depth value obtained based on a laser radar as a first depth value and a depth value obtained based on a camera system as a second depth value.
The coordinates of the points are included in the point cloud data, where the axis coordinates corresponding to the depth direction are depth values, for example, the general z-axis coordinates are depth values, and of course, the x-axis or the y-axis may also be used as the axis of the depth direction, which is related to the direction setting of the coordinate system, which is not limited in this disclosure. It can be seen that the depth value of the object to be measured can be directly obtained from the point cloud data.
Considering that the depth values in the point cloud data are coordinate values in the lidar coordinate system, they may be further transformed into the coordinate system of the camera system to facilitate the fusion of subsequent depth information. In one embodiment, step S320 may include:
And based on a first calibration parameter between the laser radar and a first camera of the at least two cameras, projecting the point cloud data to a coordinate system of the first camera to obtain first depth information of the object to be detected.
In a camera system, each camera has a respective camera coordinate system, from which one camera is typically selected as the main camera, and the coordinate system of the main camera is taken as the coordinate system of the whole camera system. The selected main camera is denoted as the first camera in the present exemplary embodiment, and may be any one of the at least two cameras described above. For example, in a binocular camera system, a left camera is generally taken as a main camera, and the left camera may be taken as a first camera.
In this exemplary embodiment, the laser radar and the first camera may be calibrated in advance, for example, a Zhang Zhengyou calibration method may be used. The first calibration parameter is a calibration parameter between the laser radar and the first camera, and may be a transformation parameter between a coordinate system of the laser radar and a coordinate system of the first camera. Therefore, after the point cloud data of the laser radar are obtained, the point cloud data can be projected into the coordinate system of the first camera from the coordinate system of the laser radar by adopting the first calibration parameters, so that the coordinates of each point in the coordinate system of the first camera are obtained, and further, the depth value of each point in the coordinate system of the first camera, namely the first depth value, is obtained.
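As an illustration of this projection step, the sketch below assumes the first calibration parameter is given as a rotation matrix R and translation vector t from the laser radar coordinate system to the first camera coordinate system, and that K is the intrinsic matrix of the first camera; these symbols and the pixel-rounding strategy are assumptions, not part of the patent.

```python
import numpy as np

def first_depth_map(points_lidar, R, t, K, image_size):
    """Project lidar points into the first camera to obtain a sparse map of first depth values."""
    h, w = image_size
    pts_cam = points_lidar @ R.T + t                 # lidar frame -> first camera frame
    pts_cam = pts_cam[pts_cam[:, 2] > 0]             # keep points in front of the camera
    uv = pts_cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]                      # pinhole projection to pixel coordinates
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth = np.zeros((h, w), dtype=np.float32)       # 0 marks "no first depth value"
    order = np.argsort(-pts_cam[valid, 2])           # write far points first, near points last
    depth[v[valid][order], u[valid][order]] = pts_cam[valid, 2][order]
    return depth
```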
It should be noted that both the laser radar and the camera system have a certain resolution when detecting depth information. The receiver of a laser radar, for example, comprises an array of elements that receive laser signals reflected from different areas of the object to be measured, and the depth value resolved by each element is taken as the depth value of a point on the object to be measured, which in effect corresponds to a local area of the object. Thus, the greater the number of elements, the denser the array and the higher the resolution of the resulting depth values, i.e., the smaller the area corresponding to each point. For the camera system, the resolution of the detected depth information is related to the texture features, the number of feature points, and so on in the images. In the present exemplary embodiment, no particular distinction is made between the concepts of points and areas of the object to be measured.
The first depth information mainly comprises a set of first depth values for different regions. In addition, the first depth information may further include other information than the first depth value, for example, a first confidence level corresponding to the first depth value, and the like.
In one embodiment, the lidar may output a first confidence. For example, a first confidence corresponding to a first depth value of the different region is quantitatively calculated according to the intensity of the laser signal received by the receiver, and in general, the intensity of the laser signal is positively correlated with the first confidence.
In another embodiment, the first confidence level may be determined according to the fusion result of the multi-frame point cloud data. For example, after the points in the point cloud data of each frame after registration are divided into the grids according to the x, y and z coordinates thereof, the number of points in each grid and the depth difference between different points in each grid (i.e., the difference between the depth-direction axis coordinates of different points, such as the difference between the z-axis coordinates) are counted, the larger the number of points in the grid, the smaller the depth difference between the points, the higher the first confidence of the points. By way of example, the following equation (1) may be employed for calculation:
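The formula image of equation (1) is not reproduced in this text; one reconstruction that is consistent with the explanation below, but should be read as an assumption rather than a verbatim copy of the original drawing, is:

Conf1(grid_i) = (count(grid_i) / k)^a · (1 − σd(grid_i) / Δd(grid_i))^b        (1)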
where Conf1 represents the first confidence; grid_i represents the i-th grid, p denotes a point in grid_i, and count(grid_i) represents the number of points in grid_i; k is the number of frames of point cloud data; d represents depth, σd(grid_i) represents the standard deviation of the depth values of the points within grid_i, and Δd(grid_i) represents the span of the depth values of grid_i (for example, the span on the z-axis); a and b are two empirical exponents, and illustratively both lie in the (0, 1) range.
As can be seen from equation (1), the first confidence is obtained by the two-part product. Wherein, the larger the number of points in the grid, the larger the ratio of the number of points in the grid to the number of frames k, the larger the first partial value, and the larger the first confidence; the more concentrated the depth values within the grid, the smaller the standard deviation and the smaller the ratio of the depth value span, the larger the second partial value and the larger the first confidence.
With continued reference to fig. 3, in step S330, second depth information of the object to be measured is determined by stereo matching the at least two images, where the second depth information includes second depth values of different regions of the object to be measured.
The second depth information refers to depth information of the object to be measured, which is determined based on the camera system, and includes second depth values of different areas of the object to be measured. The at least two images are images acquired by different cameras aiming at the same object to be measured, so that the three-dimensional information of the object to be measured can be recovered through three-dimensional reconstruction, and the second depth information is obtained.
In one embodiment, referring to fig. 5, step S330 may include:
step S510, based on the second calibration parameters between the at least two cameras, performing stereo matching on the at least two images to obtain a binocular disparity map;
step S520, determining second depth information of the object to be measured according to the binocular disparity map.
As can be seen from the above, each camera in the camera system has a respective camera coordinate system, and the coordinate system of the first camera (i.e. the main camera) is selected as the coordinate system of the whole camera system, so that other cameras in the camera system and the first camera can be calibrated in advance, for example, a Zhang Zhengyou calibration method can be adopted. The second calibration parameter is a calibration parameter between the other cameras and the first camera, and may be a transformation parameter between the coordinate system of the other cameras and the coordinate system of the first camera. When the camera system comprises three or more cameras, the second camera, the third camera and the like can be calibrated to the first camera, so that a plurality of groups of second calibration parameters are obtained.
Based on the second calibration parameters, the images can be subjected to pairwise stereo matching. For example, if the camera system includes two cameras, each of the two cameras captures an image, wherein a first camera captures a first image and a second camera captures a second image; and performing stereo matching on the first image and the second image based on a second calibration parameter between the first camera and the second camera to obtain a binocular disparity map corresponding to the first image and the second image. If the camera system comprises three cameras, the three cameras respectively acquire one image, wherein a first camera acquires a first image, a second camera acquires a second image, and a third camera acquires a third image; the first image and the second image are subjected to stereo matching based on a second calibration parameter between the first camera and the second camera, so that binocular disparity maps corresponding to the first image and the second image are obtained, and the binocular disparity maps can be recorded as (1-2) binocular disparity maps for distinguishing conveniently; and (3) performing stereo matching on the first image and the third image based on a second calibration parameter between the first camera and the third camera to obtain a binocular disparity map corresponding to the first image and the third image, which can be recorded as (1-3) binocular disparity maps.
The present disclosure is not limited to a specific algorithm of stereo Matching, and may be implemented, for example, by adopting SGM algorithm (Semi-Global Matching).
The binocular disparity map includes a disparity value for each point, and a second depth value for each point can be calculated by combining the camera parameters with the second calibration parameter (mainly the baseline length between the cameras); the second depth value of each point in the first image is generally calculated with the first camera as the reference, so as to obtain the second depth information.
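For illustration, the sketch below uses OpenCV's semi-global block matching (a variant related to the SGM algorithm mentioned above) together with the standard depth = f·B/d relation for a rectified pair; the parameter values and the function name are assumptions.

```python
import cv2
import numpy as np

def second_depth_from_pair(left_gray, right_gray, focal_px, baseline_m, num_disp=128):
    """Stereo-match a rectified 8-bit grayscale pair and convert disparity to second depth values."""
    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=num_disp,      # must be a multiple of 16
        blockSize=5,
        P1=8 * 5 * 5,                 # smoothness penalties of semi-global matching
        P2=32 * 5 * 5,
        uniquenessRatio=10,
    )
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]   # depth = f * B / d
    return depth                                              # second depth map in the first camera view
```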
When a plurality of binocular disparity maps are obtained, a set of second depth information may be calculated according to each binocular disparity map, for example, the (1-2) second depth information may be obtained according to the (1-2) binocular disparity map, and the (1-3) second depth information may be obtained according to the (1-3) binocular disparity map. Further, the second depth values of the same region (or the same point) in the second depth information of different groups are fused, for example, an average value may be calculated, so as to obtain a group of fused second depth information.
The second depth information comprises a set of second depth values for the different regions. In addition, the second depth information may further include other information than the second depth value, for example, a second confidence level corresponding to the second depth value, and the like.
In one embodiment, the second confidence may be estimated using a machine learning model. For example, a convolutional neural network is trained in advance; the at least two images and the second depth information (usually the depth image corresponding to the first image) are input into the convolutional neural network, which after processing outputs a second-confidence map containing the second confidence of each point of the object to be measured.
In another embodiment, an LRC (Left-Right Consistency) check may be used to detect false parallax matches, in particular occluded regions at depth discontinuities of the object to be measured, and these regions are given a lower second confidence.
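As an illustration of the left-right consistency idea (a sketch under the assumption that disparity maps are available for both views; it is not the patent's algorithm), pixels whose left and right disparities disagree can be flagged so that a lower second confidence is assigned to them.

```python
import numpy as np

def lrc_inconsistency_mask(disp_left, disp_right, max_diff=1.0):
    """True where the left-view disparity is not confirmed by the right-view disparity map."""
    h, w = disp_left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # A left pixel at column x with disparity d should correspond to right pixel x - d.
    x_right = np.clip(np.round(xs - disp_left).astype(int), 0, w - 1)
    d_right = disp_right[ys, x_right]
    return np.abs(disp_left - d_right) > max_diff    # likely occlusions / false matches
```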
With continued reference to fig. 3, in step S340, a first weight value corresponding to a first depth value and a second weight value corresponding to a second depth value of the different region are determined.
As can be seen from the above, the first depth information and the second depth information are depth information of the object to be measured obtained through different ways, and different areas of the object to be measured have a first depth value in the first depth information and a second depth value in the second depth information. The present exemplary embodiment determines a first weight value corresponding to a first depth value and a second weight value corresponding to a second depth value of a different region, so as to facilitate subsequent weighted fusion.
It should be noted that, it is generally difficult for the first depth information and the second depth information to cover all areas or all points of the object to be measured. Thus, there may be areas where depth values are detected only in one of the first depth information and the second depth information, i.e. the areas have only one of the first depth value and the second depth value. When the first depth information and the second depth information are fused, the first depth value or the second depth value can be directly adopted, and the first weight value and the second weight value do not need to be calculated. Other regions are detected as depth values in the first depth information and the second depth information, i.e. have the first depth value and the second depth value, and step S340 mainly calculates the first weight value and the second weight value for the regions.
In one embodiment, referring to fig. 6, step S340 may include:
step S610, a first depth value range and a second depth value range are obtained, wherein the first depth value range is a depth value detection range of the laser radar, and the second depth value range is a depth value detection range of the at least two cameras;
Step S620, determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to the first depth value range, the second depth value range, and the first depth value and the second depth value of the different areas.
The first depth value range, that is, the range of the laser radar, is generally an index of the performance parameters of the laser radar. The second range of depth values may be determined based on internal parameters, baseline lengths, etc. of the at least two cameras. For example, when the laser radar and the binocular camera are arranged in the mobile phone to perform depth detection, the depth value detection range of the laser radar is generally close, the depth value detection range of the binocular camera is relatively far, and referring to fig. 7, for example, the first depth value range of the laser radar is 0.1-3 meters, and the second depth value range of the camera system is 0.6-5 meters.
In the first depth value range, the first depth value detected based on the laser radar is more reliable, a higher first weight value can be set, and in the second depth value range, the second depth value detected based on the camera system is more reliable, and a higher second weight value can be set. Therefore, the first weight value and the second weight value can be determined by comparing the first depth value and the second depth value of different areas of the object to be detected with the first depth value range and the second depth value range.
In one embodiment, referring to fig. 8, step S620 may include:
step S810, determining a first depth median and a second depth median, wherein the first depth median is the median of the first depth value range, and the second depth median is the median of the second depth value range;
Step S820, determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to the difference between the first depth value and the first depth median and the difference between the second depth value and the second depth median.
Generally, the closer the first depth value is to the first depth median, the more reliable it is and the greater the first weight value may be set; the same holds for the second depth value. Therefore, the difference between the first depth value and the first depth median and the difference between the second depth value and the second depth median can be used as the basis for determining the first weight value and the second weight value.
In one embodiment, a first weight corresponding to the first depth value may be calculated according to a difference between the first depth value and the first depth median, a second weight corresponding to the second depth value may be calculated according to a difference between the second depth value and the second depth median, and the first weight and the second weight may be normalized.
In one embodiment, the first weight value and the second weight value may be calculated simultaneously in combination with a difference between the first depth value and the first depth median and a difference between the second depth value and the second depth median. By way of example, the following equation (2) may be employed for calculation:
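The formula image of equation (2) is not reproduced in this text; one reconstruction consistent with the explanation below, under the assumption that a smaller normalized difference yields a larger weight and that the two weights sum to 1, is:

diff1(p) = |d1(p) − med1| / Δd1,   diff2(p) = |d2(p) − med2| / Δd2

w1(p) = diff2(p) / (diff1(p) + diff2(p)),   w2(p) = diff1(p) / (diff1(p) + diff2(p))        (2)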
where w1(p) represents the first weight value of point p and w2(p) represents the second weight value of point p; d1(p) and d2(p) represent the first and second depth values of point p; med1 represents the first depth median and med2 the second depth median; |d1(p) − med1| is the difference between the first depth value and the first depth median, and |d2(p) − med2| is the difference between the second depth value and the second depth median; Δd1 represents the span of the first depth value range (i.e., the difference between its upper and lower limits), and Δd2 represents the span of the second depth value range. It can be seen that diff1(p) is the normalized difference between the first depth value and the first depth median, and diff2(p) is the normalized difference between the second depth value and the second depth median; the first weight value and the second weight value are then calculated from diff1(p) and diff2(p).
In one embodiment, the first depth value range may be partitioned from the second depth value range. Referring to fig. 7, the first depth value range of the laser radar and the second depth value range of the camera system are intersected to obtain a common range, namely a range of 0.6-3 meters; the complement of the common range in the first depth value range is a first unilateral range, namely a range of 0.1-0.6 meter; the complement of the common range in the second depth value range is a second single-sided range, i.e. a range of 3-5 meters. For any region of the object to be measured, if the first depth value is within the first single-side range, the second depth value approaches to the boundary between the first single-side range and the common range (for example, the difference from 0.6 meter is smaller than a certain boundary threshold, the boundary threshold is for example, 0.1 meter), the first weight value of the region is set to 1, and the second weight value is set to 0; if the first depth value is close to the boundary between the second single-side range and the public range (for example, the difference value between the first depth value and 3 meters is smaller than a certain boundary threshold value, the boundary threshold value is 0.1 meter, for example), and the second depth value is in the second single-side range, the first weight value of the region is set to 0, and the second weight value is set to 1; if the first depth value and the second depth value are both in the common range, the first weight value and the second weight value are calculated according to the difference between the first depth value and the first depth median and the difference between the second depth value and the second depth median, for example, the calculation can be performed by referring to the above formula (2).
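The range-partition logic just described can be sketched as follows, using the example ranges of fig. 7, a hypothetical boundary threshold of 0.1 meter, and the weight formula reconstructed as equation (2); all constants and names are illustrative assumptions.

```python
def range_based_weights(d1, d2,
                        lidar_range=(0.1, 3.0), camera_range=(0.6, 5.0),
                        boundary=0.1):
    """Return (w1, w2) for one region given its first depth d1 and second depth d2."""
    common_lo, common_hi = camera_range[0], lidar_range[1]   # common range: 0.6 m .. 3 m
    if d1 < common_lo and abs(d2 - common_lo) < boundary:
        return 1.0, 0.0          # first single-sided range: trust the laser radar
    if abs(d1 - common_hi) < boundary and d2 > common_hi:
        return 0.0, 1.0          # second single-sided range: trust the camera system
    # Both depth values lie in the common range: weight by distance to the depth medians.
    med1 = 0.5 * (lidar_range[0] + lidar_range[1])
    med2 = 0.5 * (camera_range[0] + camera_range[1])
    diff1 = abs(d1 - med1) / (lidar_range[1] - lidar_range[0])
    diff2 = abs(d2 - med2) / (camera_range[1] - camera_range[0])
    total = diff1 + diff2
    if total == 0:
        return 0.5, 0.5
    return diff2 / total, diff1 / total   # smaller normalized difference -> larger weight
```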
When the first depth information and the second depth information are calculated, a corresponding first confidence coefficient and a corresponding second confidence coefficient can be obtained and can be used for calculating a first weight value and a second weight value. In one embodiment, step S620 may include:
And determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to at least one of the first depth value range, the second depth value range, the first depth value and the second depth value of the different areas, and at least one of a first confidence corresponding to the first depth value and a second confidence corresponding to the second depth value.
The following is described in two cases:
(1) Only one of the first confidence and the second confidence is obtained.
Taking the first confidence coefficient as an example, for any region of the object to be detected, if the first confidence coefficient of the region is lower than a first confidence lower limit threshold value, setting a first weight value of the region to be 0 and setting a second weight value to be 1; if the first confidence coefficient of the region is higher than the first confidence upper threshold value, setting the first weight value of the region to be 1 and setting the second weight value to be 0; if the first confidence of the region is between the first confidence upper threshold and the first confidence lower threshold, calculating a first weight value and a second weight value according to the difference between the first depth value and the first depth median and the difference between the second depth value and the second depth median. The first upper confidence threshold and the first lower confidence threshold may be set according to experience and actual requirements, for example, the first upper confidence threshold is 0.8, and the first lower confidence threshold is 0.2.
(2) Both the first confidence and the second confidence are obtained.
In one embodiment, for any region of the object to be measured, if the first confidence is lower than the first confidence threshold and the second confidence is higher than the second confidence threshold, the first weight value of the region is set to 0, and the second weight value is set to 1; if the first confidence coefficient is higher than the first confidence threshold value and the second confidence coefficient is lower than the second confidence threshold value, setting the first weight value of the region to be 1 and setting the second weight value to be 0; if the first confidence coefficient is higher than the first confidence threshold value and the second confidence coefficient is higher than the second confidence threshold value, calculating a first weight value and a second weight value according to the difference between the first depth value and the first depth median value and the difference between the second depth value and the second depth median value, for example, the calculation can be performed by referring to the formula (2); if the first confidence level is below the first confidence threshold and the second confidence level is below the second confidence threshold, discarding the first depth value and the second depth value for the region. The first confidence threshold is a confidence lower limit threshold of the depth value detected by the laser radar, and the second confidence threshold is a confidence lower limit threshold of the depth value detected by the camera system, and can be set according to performance, actual requirements and the like of the sensor, and the first confidence threshold and the second confidence threshold can be the same or different. For example, the first confidence threshold and the second confidence threshold may each be 0.2.
In one embodiment, the first weight value and the second weight value calculated from the difference between the first depth value and the first depth median and the difference between the second depth value and the second depth median may be further combined with the first confidence and the second confidence: for example, the first weight value may be corrected by an exponent or coefficient derived from the first confidence, and the second weight value by an exponent or coefficient derived from the second confidence, so as to optimize the two weight values; the optimized first weight value and second weight value may then be normalized to output the final first weight value and second weight value.
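A small sketch of one way such a correction and normalization could look; the multiplicative form and the exponent gamma are assumptions made for illustration, not the patent's formula:

```python
import numpy as np

def refine_weights(w1, w2, conf1, conf2, gamma=1.0):
    """Modulate the median-based weights by the confidences (a coefficient
    correction; gamma != 1 acts as an exponent correction) and renormalize
    so the final weights of each region sum to 1."""
    w1c = w1 * np.power(conf1, gamma)
    w2c = w2 * np.power(conf2, gamma)
    total = w1c + w2c + 1e-9
    return w1c / total, w2c / total
```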
With continued reference to fig. 3, in step S350, the first depth information and the second depth information are fused by using the first weight value and the second weight value, so as to obtain target depth information of the object to be measured.
Specifically, for any region of the object to be detected: if the region has a first depth value but no second depth value, the first depth value is taken as the target depth value of the region; if the region has a second depth value but no first depth value, the second depth value is taken as the target depth value of the region; and if the region has both a first depth value and a second depth value, a weighted calculation is performed on the first depth value and the second depth value using the first weight value and the second weight value to obtain the target depth value of the region. The target depth values of all regions form a set, i.e., the target depth information described above.
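A sketch of this per-region fusion rule, under the assumption (made here only for illustration) that the depth maps are float arrays in which a missing depth value is encoded as NaN:

```python
import numpy as np

def fuse_depths(d1, d2, w1, w2):
    """Fuse the lidar depth map d1 and the stereo depth map d2 region by
    region using the weights w1 and w2; missing values are NaN."""
    only1 = ~np.isnan(d1) & np.isnan(d2)
    only2 = np.isnan(d1) & ~np.isnan(d2)
    both = ~np.isnan(d1) & ~np.isnan(d2)

    target = np.full_like(d1, np.nan)
    target[only1] = d1[only1]
    target[only2] = d2[only2]
    target[both] = w1[both] * d1[both] + w2[both] * d2[both]
    return target
```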
By fusing the first depth information and the second depth information, the depth detection strengths of the laser radar and the camera system are combined: depth holes or missing information in the first depth information caused by the reflectivity of the object's material or by multipath interference are filled, and depth holes or missing information at depth discontinuities in the second depth information caused by occlusion are filled. At the same time, depth values of low reliability in either source are improved, the accuracy of depth detection is increased, and more accurate and reliable target depth information is obtained. The depth detection range and the applicable scenes are also expanded, giving the scheme higher practicability.
In one embodiment, the target depth information may be further filtered, for example with an edge-preserving filtering algorithm such as BF (Bilateral Filter), GF (Guided Filtering), or FBS (Filter Bank Summation), so that the depth information is smoothed while edge information of the object to be detected is retained.
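For instance, the fused map could be smoothed with OpenCV's bilateral filter; the parameter values below are illustrative and the NaN handling is a simplification, not the patent's procedure:

```python
import cv2
import numpy as np

def smooth_depth(target_depth):
    """Edge-preserving smoothing of the fused depth map with a bilateral
    filter; NaN holes are filled with 0 here only so the filter can run."""
    depth = np.nan_to_num(target_depth.astype(np.float32), nan=0.0)
    return cv2.bilateralFilter(depth, d=9, sigmaColor=0.1, sigmaSpace=5.0)
```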
Fig. 9 shows an implementation flow of the depth detection method, taking a hardware configuration of a laser radar and a binocular camera as an example, including:
Step S901, calibrating the binocular camera;
Step S902, calibrating the first camera in the binocular camera and the laser radar;
Step S903, acquiring multi-frame point cloud data through the laser radar, and acquiring two images through the binocular camera;
Step S904, registering the acquired multi-frame point cloud data, and fusing the registered multi-frame point cloud data to obtain one frame of dense point cloud data;
Step S905, projecting the point cloud data into the coordinate system of the first camera according to the calibration parameters of the first camera and the laser radar, to obtain first depth information;
Step S906, performing binocular stereo matching on the two acquired images according to the calibration parameters of the binocular camera, to obtain second depth information;
Step S907, processing the two images and the second depth information with an LRC (left-right consistency) algorithm or a machine learning model to calculate a second confidence (a stereo-matching and LRC sketch follows this list);
Step S908, determining a first weight value and a second weight value according to the second confidence, the first depth information and the second depth information, and performing weighted fusion of the first depth information and the second depth information according to the first weight value and the second weight value;
Step S909, further filtering the fused depth information;
Step S910, outputting the filtered target depth information.
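The stereo-matching and confidence steps S906-S907 can be illustrated with OpenCV's semi-global matcher and a simple left-right consistency check; the matcher parameters and the 0/1 confidence encoding are assumptions for this sketch, not values taken from the patent:

```python
import cv2
import numpy as np

def stereo_depth_and_confidence(left_gray, right_gray, lrc_thresh=1.0):
    """Sketch of S906-S907: SGBM matching plus a left-right consistency
    (LRC) check used as a simple per-pixel second confidence."""
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                 blockSize=5, P1=8 * 5 * 5, P2=32 * 5 * 5,
                                 uniquenessRatio=10, speckleWindowSize=100,
                                 speckleRange=2)
    disp_left = sgbm.compute(left_gray, right_gray).astype(np.float32) / 16.0

    # Disparity of the right image via the flip trick.
    disp_right = sgbm.compute(np.ascontiguousarray(right_gray[:, ::-1]),
                              np.ascontiguousarray(left_gray[:, ::-1]))
    disp_right = (disp_right.astype(np.float32) / 16.0)[:, ::-1]

    h, w = disp_left.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    x_in_right = np.clip((xs - disp_left).astype(int), 0, w - 1)
    lrc_error = np.abs(disp_left - disp_right[np.arange(h)[:, None], x_in_right])

    confidence = np.where((disp_left > 0) & (lrc_error <= lrc_thresh), 1.0, 0.0)
    return disp_left, confidence
```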
In one embodiment, the target depth information and the images acquired by the camera system may be formed into a dataset, where the images serve as training data and the target depth information serves as annotation data (ground truth). Such a dataset can be used to train a machine-learning model for depth estimation, and helps improve the accuracy and completeness of the dataset.
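A minimal sketch of how one (image, ground-truth) sample could be assembled; the dictionary layout and the NaN convention for missing depth are assumptions made here for illustration:

```python
import numpy as np

def build_sample(image, target_depth):
    """Pair a camera image with the fused depth as annotation data."""
    valid_mask = ~np.isnan(target_depth)        # regions with a fused depth value
    return {
        "image": image,
        "depth_gt": np.nan_to_num(target_depth, nan=0.0),
        "valid_mask": valid_mask,
    }
```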
The exemplary embodiments of the present disclosure also provide a depth detection apparatus. Referring to fig. 10, the depth detection apparatus 1000 may include:
A data acquisition module 1010 configured to acquire point cloud data of an object to be measured acquired by a lidar and at least two images of the object to be measured acquired by at least two cameras;
A first depth information determining module 1020 configured to determine first depth information of the object to be measured by analyzing the point cloud data, the first depth information including first depth values of different regions of the object to be measured;
a second depth information determining module 1030 configured to determine second depth information of the object to be measured by stereo matching the at least two images, the second depth information including second depth values of different regions of the object to be measured;
The weight value determining module 1040 is configured to determine a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value of the different regions;
The depth information fusion module 1050 is configured to fuse the first depth information and the second depth information by using the first weight value and the second weight value, so as to obtain target depth information of the object to be measured.
In one embodiment, the data acquisition module 1010 is configured to:
Acquiring multi-frame point cloud data acquired by a laser radar in a motion process;
registering the multi-frame point cloud data, and fusing the registered multi-frame point cloud data to obtain the point cloud data of the object to be detected.
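A sketch of this registration-and-fusion step, under the assumption that each frame already has a 4x4 pose registering it to the reference frame (obtained, for example, from ICP or the lidar's motion estimate, which is not shown here):

```python
import numpy as np

def fuse_point_clouds(frames, poses):
    """frames: list of (N_i, 3) point arrays; poses: list of 4x4 transforms
    mapping each frame into the coordinate system of the first frame.
    Returns one denser (sum of N_i, 3) point cloud."""
    fused = []
    for pts, T in zip(frames, poses):
        pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])  # homogeneous coordinates
        fused.append((pts_h @ T.T)[:, :3])                    # transform into the reference frame
    return np.vstack(fused)
```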
In one embodiment, the first depth information determination module 1020 is configured to:
And based on a first calibration parameter between the laser radar and a first camera of the at least two cameras, projecting the point cloud data to a coordinate system of the first camera to obtain first depth information of the object to be detected.
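A sketch of this projection, assuming the first calibration parameter consists of a 4x4 lidar-to-camera extrinsic T_cam_lidar and a 3x3 intrinsic matrix K; when several points fall on the same pixel, the nearest one is kept:

```python
import numpy as np

def project_cloud_to_depth(points_lidar, T_cam_lidar, K, image_size):
    """Project lidar points into the first camera to build the first depth map.

    points_lidar: (N, 3) points in the lidar frame; image_size: (height, width).
    Pixels with no projected point stay NaN."""
    h, w = image_size
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (pts_h @ T_cam_lidar.T)[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0]              # keep points in front of the camera

    uvz = pts_cam @ K.T
    u = (uvz[:, 0] / uvz[:, 2]).astype(int)
    v = (uvz[:, 1] / uvz[:, 2]).astype(int)
    z = pts_cam[:, 2]

    depth = np.full((h, w), np.nan, dtype=np.float32)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    for ui, vi, zi in zip(u[inside], v[inside], z[inside]):
        if np.isnan(depth[vi, ui]) or zi < depth[vi, ui]:
            depth[vi, ui] = zi                        # keep the nearest point per pixel
    return depth
```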
In one embodiment, the second depth information determination module 1030 is configured to:
Based on second calibration parameters between at least two cameras, performing stereo matching on at least two images to obtain a binocular disparity map;
and determining second depth information of the object to be measured according to the binocular disparity map.
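Depth follows from disparity through the standard rectified-pinhole relation depth = f * B / d, where f is the focal length in pixels and B the baseline obtained from the second calibration parameter; a small sketch:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map to metric depth; non-positive disparities
    are treated as missing and encoded as NaN."""
    depth = np.full_like(disparity, np.nan, dtype=np.float32)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```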
In one embodiment, the weight value determination module 1040 is configured to:
acquiring a first depth value range and a second depth value range, wherein the first depth value range is a depth value detection range of the laser radar, and the second depth value range is a depth value detection range of at least two cameras;
And determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to the first depth value range, the second depth value range and the first depth value and the second depth value of different areas.
In one embodiment, the weight value determination module 1040 is configured to:
Determining a first depth median and a second depth median, the first depth median being a median of the first range of depth values, the second depth median being a median of the second range of depth values;
And determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to the difference between the first depth value and the first depth median and the difference between the second depth value and the second depth median.
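One plausible realization of this rule (the original's formula (2) is not reproduced in this section, so the inverse-distance form below is an assumption): the closer a depth value lies to the median of its sensor's detection range, the larger its weight, and the weights are normalized to sum to 1.

```python
import numpy as np

def weights_from_medians(d1, d2, range1, range2):
    """d1 / d2: first and second depth values per region; range1 / range2:
    (min, max) depth detection ranges of the lidar and the cameras."""
    med1 = 0.5 * (range1[0] + range1[1])
    med2 = 0.5 * (range2[0] + range2[1])
    dist1 = np.abs(d1 - med1)
    dist2 = np.abs(d2 - med2)
    w1 = dist2 / (dist1 + dist2 + 1e-9)   # smaller distance to its median -> larger weight
    w2 = 1.0 - w1
    return w1, w2
```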
In one embodiment, the weight value determination module 1040 is configured to:
And determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to at least one of the first depth value range, the second depth value range, and the first depth value and the second depth value of the different areas, together with the first confidence corresponding to the first depth value and the second confidence corresponding to the second depth value.
Details of each part of the above apparatus are already described in the method part of the embodiments, and thus will not be described in detail.
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium, which may be implemented in the form of a program product comprising program code; when the program product runs on an electronic device, the program code causes the electronic device to perform the steps according to the various exemplary embodiments of the disclosure described in the "Exemplary Method" section above. In one embodiment, the program product may be implemented as a portable compact disc read-only memory (CD-ROM) containing program code, and may be run on an electronic device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with exemplary embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Those skilled in the art will appreciate that the various aspects of the present disclosure may be implemented as a system, method, or program product. Accordingly, various aspects of the disclosure may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module," or "system." Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (9)
1. A depth detection method, comprising:
acquiring point cloud data of an object to be detected, which is acquired by a laser radar, and at least two images of the object to be detected, which are acquired by at least two cameras;
determining first depth information of the object to be detected by analyzing the point cloud data, wherein the first depth information comprises first depth values of different areas of the object to be detected;
Determining second depth information of the object to be detected by performing stereo matching on the at least two images, wherein the second depth information comprises second depth values of different areas of the object to be detected;
determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value of the different region;
fusing the first depth information and the second depth information by using the first weight value and the second weight value to obtain target depth information of the object to be detected;
the determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value of the different region includes:
acquiring a first depth value range and a second depth value range, wherein the first depth value range is a depth value detection range of the laser radar, and the second depth value range is a depth value detection range of the at least two cameras;
And determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to the first depth value range, the second depth value range, and the first depth value and the second depth value of the different areas.
2. The method of claim 1, wherein the acquiring the point cloud data of the object under test acquired by the lidar comprises:
Acquiring multi-frame point cloud data acquired by a laser radar in a motion process;
registering the multi-frame point cloud data, and fusing the registered multi-frame point cloud data to obtain the point cloud data of the object to be detected.
3. The method according to claim 1, wherein the determining the first depth information of the object to be measured by analyzing the point cloud data comprises:
And based on a first calibration parameter between the laser radar and a first camera of the at least two cameras, projecting the point cloud data to a coordinate system of the first camera to obtain first depth information of the object to be detected.
4. The method according to claim 1, wherein determining the second depth information of the object to be measured by stereo matching the at least two images comprises:
Based on a second calibration parameter between the at least two cameras, performing stereo matching on the at least two images to obtain a binocular disparity map;
And determining second depth information of the object to be detected according to the binocular disparity map.
5. The method of claim 1, wherein the determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to the first depth value range, the second depth value range, and the first depth value and the second depth value of the different region comprises:
determining a first depth median and a second depth median, the first depth median being a median of the first range of depth values and the second depth median being a median of the second range of depth values;
And determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to the difference between the first depth value and the first depth median and the difference between the second depth value and the second depth median.
6. The method of claim 1, wherein the determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to the first depth value range, the second depth value range, and the first depth value and the second depth value of the different region comprises:
and determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to at least one of the first depth value range, the second depth value range, and the first depth value and the second depth value of the different areas, and at least one of a first confidence corresponding to the first depth value and a second confidence corresponding to the second depth value.
7. A depth detection device, comprising:
the system comprises a data acquisition module, a detection module and a display module, wherein the data acquisition module is configured to acquire point cloud data of an object to be detected, which is acquired by a laser radar, and at least two images of the object to be detected, which are acquired by at least two cameras;
A first depth information determining module configured to determine first depth information of the object to be measured by analyzing the point cloud data, the first depth information including first depth values of different regions of the object to be measured;
A second depth information determining module configured to determine second depth information of the object to be measured by stereo matching the at least two images, the second depth information including second depth values of different areas of the object to be measured;
the weight value determining module is configured to determine a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value of the different area;
the depth information fusion module is configured to fuse the first depth information and the second depth information by using the first weight value and the second weight value to obtain target depth information of the object to be detected;
the determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value of the different area includes:
acquiring a first depth value range and a second depth value range, wherein the first depth value range is a depth value detection range of the laser radar, and the second depth value range is a depth value detection range of the at least two cameras;
And determining a first weight value corresponding to the first depth value and a second weight value corresponding to the second depth value according to the first depth value range, the second depth value range, and the first depth value and the second depth value of the different areas.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any one of claims 1 to 6.
9. An electronic device, comprising:
A processor; and
A memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any one of claims 1 to 6 via execution of the executable instructions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110367514.0A CN112927281B (en) | 2021-04-06 | 2021-04-06 | Depth detection method, depth detection device, storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112927281A CN112927281A (en) | 2021-06-08 |
CN112927281B (en) | 2024-07-02
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant |