WO2024056030A1 - Image depth estimation method, device, electronic device, and storage medium - Google Patents
- Publication number
- WO2024056030A1 (PCT/CN2023/118825)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- binocular
- image
- disparity
- pixel
- pixels
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
Definitions
- Embodiments of the present disclosure relate to the field of image processing technology, for example, to an image depth estimation method, device, electronic device, and storage medium.
- In the related art, image depth estimation methods include binocular ranging methods.
- The principle of binocular ranging is to match feature points extracted from the binocular images (i.e., the left-eye image and the right-eye image). The depth of each matched feature point is then determined from the pre-calibrated parameters of the binocular image acquisition equipment and the pixel positions of the matched feature point in each of the two images.
- However, this method can only estimate depth for image parts where feature points are distinctive. It cannot determine depth for parts where feature points are not distinctive.
- Embodiments of the present disclosure provide an image depth estimation method, device, electronic device, and storage medium, which can realize depth estimation of multiple parts of the entire image.
- embodiments of the present disclosure provide an image depth estimation method, including:
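- obtaining the relative depth values of multiple pixels in a binocular image;
- obtaining the bidirectional disparity of the same scene point between the binocular images, and determining a target area where the bidirectional disparity is consistent;
- determining the absolute depth values of pixels in the target area based on the bidirectional disparity of those pixels;
- constructing a mapping relationship between relative depth values and absolute depth values, based on the absolute depth values of pixels in the target area and the relative depth values of pixels in the binocular image; and
- determining the absolute depth value of the pixel to be determined in the binocular image, according to the mapping relationship and the relative depth value of that pixel.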
- embodiments of the present disclosure also provide an image depth estimation device, including:
- a relative depth acquisition module, configured to obtain the relative depth values of multiple pixels in the binocular image;
- a target area determination module, configured to obtain the bidirectional disparity of the same scene point between the binocular images, and to determine the target area where the bidirectional disparity is consistent;
- an area absolute depth determination module, configured to determine the absolute depth values of pixels in the target area based on the bidirectional disparity of those pixels;
- a mapping relationship building module, configured to construct a mapping relationship between relative depth values and absolute depth values, based on the absolute depth values of pixels in the target area and the relative depth values of pixels in the binocular image; and
- an overall absolute depth determination module, configured to determine the absolute depth value of the pixel to be determined in the binocular image, according to the mapping relationship and the relative depth value of that pixel.
- embodiments of the present disclosure also provide an electronic device, including:
- at least one processor; and
- a storage device arranged to store at least one program,
- wherein, when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the image depth estimation method described in any one of the embodiments of the present disclosure.
- Embodiments of the disclosure further provide a storage medium containing computer-executable instructions. When executed by a computer processor, the instructions perform the image depth estimation method described in any embodiment of the disclosure.
- FIG. 1 is a schematic flowchart of an image depth estimation method provided by an embodiment of the present disclosure.
- FIG. 2 is a schematic flowchart of an image depth estimation method provided by an embodiment of the present disclosure.
- Figure 3 is a schematic structural diagram of an image depth estimation device provided by an embodiment of the present disclosure.
- FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
- The term “include” and its variations are open-ended, i.e., “including but not limited to.”
- the term “based on” means “based at least in part on.”
- the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; and the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
- FIG. 1 is a schematic flowchart of an image depth estimation method provided by an embodiment of the present disclosure.
- the embodiments of the present disclosure are suitable for full-image depth estimation of binocular images.
- The method may be performed by an image depth estimation device. The device may be implemented in software, hardware, or both, and may be configured in an electronic device such as a computer.
- The image depth estimation method provided by this embodiment includes:
- S110 Obtain the relative depth values of multiple pixels in the binocular image.
- The relative depth values of multiple pixels in the binocular image can be predetermined and stored in a preset storage space, and then read directly from that storage space.
- Alternatively, the relative depth values can be determined in real time by processing the binocular image after it is acquired. Any of these approaches can be used to obtain the relative depth values of multiple pixels in the binocular image: reading from the preset storage space, determining them in real time, or other optional methods.
- the binocular images may include left-eye images and right-eye images.
- Existing monocular image depth estimation methods can be used to estimate the relative depth values of multiple pixels in the left-eye image and the right-eye image separately. Examples include traditional cue-based methods, traditional machine-learning-based methods, and supervised or unsupervised deep learning methods.
- Such methods can estimate the relative nearness or farness of every pixel in the image, yielding relative depth values for all pixels.
- Relative depth values carry no scale information. That is, they cannot indicate the actual distance between the object corresponding to a pixel and the image acquisition device.
- S120 Obtain the bidirectional disparity between binocular images of the same scene point, and determine the target area with consistent bidirectional disparity.
- The scene is the real scene captured in the binocular image. A point in the scene projects onto both the left-eye image and the right-eye image.
- The bidirectional disparity of the same scene point between the binocular images can include two offsets in image coordinates:
  - the offset of the scene point in the left-eye image relative to the same scene point in the right-eye image; and
  - the offset of the scene point in the right-eye image relative to the same scene point in the left-eye image.
- The first offset can be understood as follows: take the left-eye image as the reference image and the right-eye image as the target image. The offset is then the distance from the scene point's projection in the left-eye image to its projection in the right-eye image.
- The second offset can be understood as follows: take the right-eye image as the reference image and the left-eye image as the target image. The offset is then the distance from the scene point's projection in the right-eye image to its projection in the left-eye image.
- Steps S110 and S120 have no strict order. They can be executed simultaneously or in either order.
- Likewise, the bidirectional disparity of the same scene point between the binocular images can be obtained in several ways: reading a predetermined bidirectional disparity from the preset storage space, determining it in real time, or other optional methods.
- The same scene points can be matched using traditional matching algorithms such as the scale-invariant feature transform (SIFT) or Speeded-Up Robust Features (SURF). The bidirectional disparity of each matched scene point is then determined from its positions in the image coordinates of the two binocular images.
- Alternatively, the optical flow of the same scene point between the binocular images can be determined with a traditional optical flow algorithm or with deep-learning-based optical flow estimation. This optical flow can then be used as the bidirectional disparity of the scene point between the binocular images.
- Once the pixels with consistent bidirectional disparity are identified, the target area can be determined. For example, at least one connected area composed of such pixels can be used as the target area. As another example, the largest of these connected areas can be used as the target area.
- Determining the target area with consistent bidirectional disparity may include determining target pixels based on the bidirectional disparity, then determining the target area from multiple target pixels. The bidirectional disparity includes a first disparity and a second disparity:
  - the first disparity is the disparity of the same scene point from the left-eye image to the right-eye image; and
  - the second disparity is the disparity of the same scene point from the right-eye image to the left-eye image.
- the pixel corresponding to the scene point where the difference between the first disparity and the second disparity is within a preset range may be used as the target pixel.
- the preset range can be set based on empirical values or experimental values, for example, it can be 1-2 pixels.
- The difference between the first disparity and the second disparity may be computed as the larger of their absolute values minus the smaller. If this difference is within the preset range, the corresponding pixel can be used as a target pixel, i.e., a pixel with consistent bidirectional disparity.
- the target area can be determined based on at least one connected area composed of multiple target pixels, for example, the connected area with the largest area is used as the target area.
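The consistency check and largest-connected-area selection described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation. It assumes the following, none of which comes from the source:
- rectified images with purely horizontal, signed disparity maps `d_lr` (first disparity) and `d_rl` (second disparity);
- a tolerance `tol` in pixels;
- 4-connectivity when grouping pixels into areas.

All names are hypothetical.

```python
from collections import deque

import numpy as np


def consistent_target_area(d_lr: np.ndarray, d_rl: np.ndarray, tol: float = 1.0) -> np.ndarray:
    """Boolean mask, in left-image coordinates, of the largest 4-connected
    area of pixels whose bidirectional disparity is consistent.

    Assumed convention (hypothetical): x_right = x_left + d_lr[y, x_left]
    and x_left = x_right + d_rl[y, x_right].
    """
    h, w = d_lr.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Column in the right image where each left pixel's scene point lands.
    xr = np.rint(xs + d_lr).astype(int)
    inside = (xr >= 0) & (xr < w)
    # Second disparity sampled at the corresponding right-image pixel.
    d2 = d_rl[ys, np.clip(xr, 0, w - 1)]
    # Larger absolute value minus smaller absolute value, as in the text.
    diff = np.abs(np.abs(d_lr) - np.abs(d2))
    target = inside & (diff <= tol)

    # Label 4-connected components of target pixels with BFS; keep the largest.
    labels = np.zeros((h, w), dtype=int)
    best_label, best_size, current = 0, 0, 0
    for y0, x0 in zip(*np.nonzero(target)):
        if labels[y0, x0]:
            continue
        current += 1
        labels[y0, x0] = current
        queue, size = deque([(y0, x0)]), 0
        while queue:
            y, x = queue.popleft()
            size += 1
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and target[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = current
                    queue.append((ny, nx))
        if size > best_size:
            best_label, best_size = current, size
    return labels == best_label if best_size else np.zeros((h, w), dtype=bool)
```

Because the check compares absolute values only, as the text describes, it does not verify that the two disparities point in opposite directions. A stricter variant could compare `d_lr + d2` against zero instead.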
- Because the bidirectional disparity of the pixels in the target area is consistent, the disparity information in this area can be considered more accurate. Determining the target area with consistent bidirectional disparity therefore makes it easier to compute accurate absolute depth values.
- S130 Determine the absolute depth value of the pixels in the target area according to the bidirectional disparity of the pixels in the target area.
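- Based on the stereo imaging principle, the absolute depth value $Z$ of a pixel in the target area can be calculated as $Z = \frac{f \cdot t}{d}$, where:
  - t can represent the distance between the binocular image acquisition devices;
  - f can represent the focal length of the binocular image acquisition device; and
  - d is the disparity magnitude of the pixel.
- The first disparity, the second disparity, or the average of the two can be used as d to calculate the absolute depth value of the pixel.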
- the absolute depth value has scale information, that is, it can determine the distance between the object corresponding to the pixel in the target area and the image acquisition device.
- The absolute depth value can be expressed in units such as millimeters, centimeters, or meters.
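The S130 computation can be sketched with the standard stereo relation $Z = f \cdot t / d$, using the t (baseline) and f (focal length) defined above. This is a minimal sketch under these assumptions:
- the focal length is given in pixels, so Z comes out in the unit of the baseline;
- `d1[y, x]` and `d2[y, x]` already refer to the same scene point.

The function name and the `mode` parameter are illustrative, not from the source.

```python
import numpy as np


def absolute_depth(d1: np.ndarray, d2: np.ndarray, mask: np.ndarray,
                   focal_px: float, baseline: float, mode: str = "mean") -> np.ndarray:
    """Return Z = f * t / d inside the target area (mask) and NaN elsewhere.

    mode picks d: "first" -> |d1|, "second" -> |d2|, "mean" -> average of both magnitudes.
    """
    if mode == "first":
        d = np.abs(d1)
    elif mode == "second":
        d = np.abs(d2)
    elif mode == "mean":
        d = 0.5 * (np.abs(d1) + np.abs(d2))
    else:
        raise ValueError(f"unknown mode: {mode}")
    depth = np.full(d.shape, np.nan)
    valid = mask & (d > 0)  # zero disparity would mean infinite depth
    depth[valid] = focal_px * baseline / d[valid]
    return depth
```

For example, with f = 1000 px, t = 0.1 m, and disparities of -8 and 12 px, the mean magnitude is 10 px and Z = 100 / 10 = 10 m.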
- S140 Construct a mapping relationship between relative depth values and absolute depth values based on the absolute depth values of pixels in the target area and the relative depth values of pixels in the binocular image.
- the final relative depth value of each pixel in the target area can be determined according to the relative depth value of each pixel in the target area in the left-eye image and the right-eye image respectively.
- the average or weighted value of the relative depth values of each pixel in the target area in the left-eye image and the right-eye image can be used as the final relative depth value of each pixel in the target area.
- a mapping relationship between the relative depth value and the absolute depth value can be constructed based on the final relative depth value and absolute depth value of the pixel in the target area.
- the final relative depth value of the pixel in the target area is positively correlated with the absolute depth value.
- multiple candidate mapping relationships that conform to the mapping rules from relative depth values to absolute depth values can be preset, such as preset positive correlation linear mapping relationships, logarithmic mapping relationships, etc.
- The final relative depth values and absolute depth values of multiple pixels in the target area can be substituted into each candidate mapping relationship. The candidate with the highest degree of fit is then used as the mapping relationship between relative depth values and absolute depth values.
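Selecting among preset candidate mappings by degree of fit might look like the sketch below. Three choices here are assumptions made for illustration:
- the two candidates (linear and logarithmic);
- fitting each candidate by least squares;
- measuring "degree of fit" with the coefficient of determination R².

```python
import numpy as np


def fit_candidate_mappings(rel: np.ndarray, absd: np.ndarray):
    """Fit each preset candidate by least squares; return (name, mapping, r2)
    for the candidate with the highest R^2.

    Candidates (illustrative): linear Z = a*r + b and logarithmic
    Z = a*ln(r) + b. The logarithmic candidate requires all r > 0.
    """
    candidates = {
        "linear": lambda r: r,
        "logarithmic": np.log,
    }
    ss_tot = np.sum((absd - absd.mean()) ** 2)
    best = None
    for name, transform in candidates.items():
        x = transform(rel)
        a, b = np.polyfit(x, absd, 1)  # slope, intercept
        r2 = 1.0 - np.sum((absd - (a * x + b)) ** 2) / ss_tot
        if best is None or r2 > best[2]:
            # Bind a, b and transform now so the closure keeps this candidate's values.
            mapping = lambda r, a=a, b=b, t=transform: a * t(r) + b
            best = (name, mapping, r2)
    return best
```

Since the text states that relative and absolute depth are positively correlated, a practical version might also reject candidates whose fitted slope `a` is not positive.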
- Alternatively, constructing the mapping relationship between relative depth values and absolute depth values may include:
  - performing function fitting, with the absolute depth values of multiple pixels in the target area as the dependent variable and the relative depth values of the corresponding pixels in the binocular image as the independent variable; and
  - using the fitting result as the mapping relationship from relative depth values to absolute depth values.
- The independent variable may be the final relative depth values of the multiple pixels in the target area.
- Existing fitting tools (such as MATLAB, etc.) can be used to fit the final relative depth value and absolute depth value as a function from the independent variable to the dependent variable, and the obtained fitting result can be used as a mapping relationship. Therefore, there is no need to preset candidate mapping relationships, and the obtained mapping relationship can be made most consistent with the mapping relationship between relative depth values and absolute depth values in the current binocular image, thereby improving the accuracy of depth estimation in different binocular images.
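The direct function fit can also be sketched with NumPy's `numpy.polynomial.Polynomial.fit` instead of MATLAB. The polynomial form and its degree are assumptions; the source does not fix the function family. The fitted callable is then applied to the relative depth of every pixel in the binocular image. The helper names are hypothetical.

```python
import numpy as np


def fit_mapping(rel_target: np.ndarray, abs_target: np.ndarray, degree: int = 2):
    """Least-squares polynomial fit of absolute depth (dependent variable)
    against relative depth (independent variable) over target-area pixels."""
    valid = np.isfinite(rel_target) & np.isfinite(abs_target)
    return np.polynomial.Polynomial.fit(rel_target[valid], abs_target[valid], degree)


def apply_mapping(mapping, rel_full: np.ndarray) -> np.ndarray:
    """Map the relative depth of every pixel (any array shape) to absolute depth."""
    return mapping(rel_full)
```

Polynomials can extrapolate poorly for relative depths outside the range seen in the target area, so a low degree is usually the safer default.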
- the mapping relationship determined based on the relative depth values and absolute depth values of multiple pixels in the target area is more accurate and can be used to map all pixels in the binocular image from relative depth values to absolute depth values.
- S150 Determine the absolute depth value of the pixel to be determined in the binocular image, according to the mapping relationship and the relative depth value of that pixel.
- The final relative depth value of each pixel in the binocular image can be determined. Substituting it into the mapping relationship yields the absolute depth value of each pixel in the binocular image.
- In this way, the absolute depth values of pixels in areas other than the target area are obtained through the mapping relationship. This improves, to a certain extent, the efficiency of determining absolute depth values.
- Obtaining the relative depth values of multiple pixels in the binocular image may include processing the binocular image through a monocular depth estimation model. The supervision information used when training this model can include the absolute depth values of pixels in the target area of a sample binocular image.
- the monocular depth estimation model can be considered a deep learning model.
- The training process of the monocular depth estimation model may include:
  - obtaining the sample binocular image;
  - obtaining the bidirectional disparity of the same scene point between the sample binocular images, and determining the target area with consistent bidirectional disparity; and
  - determining the absolute depth values of pixels in the target area of the sample binocular image, based on the bidirectional disparity of those pixels.
- The absolute depth values of pixels in this target area are relatively accurate.
- The absolute depth values of pixels in the target area of the sample binocular image can therefore be used as supervision information to train the monocular depth estimation model. After these absolute depth values are determined, the training process may further include:
  - processing the sample binocular image through the monocular depth estimation model to obtain the relative depth values of its pixels; and
  - determining the error between the relative depth values and the absolute depth values of pixels in the target area, and back-propagating it to train the monocular depth estimation model.
- The error between the relative depth values and the absolute depth values of pixels in the target area of the sample binocular image can be determined in several ways:
  - the degree of dispersion of the distribution of these relative and absolute depth values can be used as the error; or
  - the relative depth values of pixels in the target area can be multiplied by a preset scale coefficient, and the difference between the scaled values and the absolute depth values used as the error.
- Other error calculation methods can also be applied; they are not exhaustively listed here.
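The two error variants above can be illustrated with NumPy for readability. In actual training they would be written in an autodiff framework so that the error can be back-propagated. The ratio-based reading of "dispersion" (coefficient of variation of absolute / relative depth) is an assumption, and all names are hypothetical.

```python
import numpy as np


def scaled_depth_error(rel: np.ndarray, absd: np.ndarray, mask: np.ndarray, scale: float) -> float:
    """Mean |scale * relative - absolute| over target-area pixels
    (the preset-scale-coefficient variant)."""
    diff = scale * rel[mask] - absd[mask]
    return float(np.mean(np.abs(diff)))


def ratio_dispersion_error(rel: np.ndarray, absd: np.ndarray, mask: np.ndarray) -> float:
    """One possible reading of the dispersion variant (assumption): the
    coefficient of variation of absolute / relative depth, which is 0 when
    relative depth is exactly proportional to absolute depth."""
    ratio = absd[mask] / rel[mask]
    return float(np.std(ratio) / np.mean(ratio))
```

The dispersion variant needs no preset scale. It only penalizes departures from proportionality, which matches the scale-free nature of relative depth.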
- the estimation accuracy of the monocular depth estimation model can be improved by using the absolute depth values of pixels in the target area of the sample binocular image as supervision information to determine the error in the training process of the monocular depth estimation model.
- After training, the monocular depth estimation model can be used to estimate the relative depth of pixels in binocular images.
- Obtaining the bidirectional disparity of the same scene point between binocular images may include processing the binocular images through an optical flow estimation model. The supervision information used when training this model includes the bidirectional disparity of the same scene points in areas of the sample binocular image other than the target area.
- Optical flow estimation models can also be considered deep learning models.
- The training process of the optical flow estimation model may include:
  - obtaining a sample binocular image;
  - obtaining the relative depth values of multiple pixels in the sample binocular image;
  - inputting the sample binocular image into the optical flow estimation model, which processes it to obtain the bidirectional disparity of the same scene point between the sample binocular images; and
  - determining the target area with consistent bidirectional disparity, as well as the other areas of the sample binocular image outside it.
- The disparity information in these other areas is less accurate. Optimizing it allows a better optical flow estimation model to be trained.
- The training process of the optical flow estimation model may further include:
  - determining absolute depth values based on the relative depth values of pixels in the other areas of the sample binocular image;
  - determining the standard disparity of pixels in the other areas from those absolute depth values; and
  - determining the error between the bidirectional disparity and the standard disparity of the same scene points in the other areas, and back-propagating it to train the optical flow estimation model.
- To determine this error, two differences can be calculated for the other areas of the sample binocular image: the first disparity minus the standard disparity, and the second disparity minus the standard disparity. The error is then determined from the two differences, for example as their mean or weighted value.
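The optical-flow supervision error can be sketched as below. The sketch makes three assumptions, none specified by the source:
- the standard disparity is obtained by inverting the stereo relation, d = f·t/Z;
- `abs_depth` already holds absolute depths for the other areas, mapped from their relative depths;
- `d1` and `d2` are aligned so that index [y, x] refers to the same scene point.

The weight `w1` and all names are illustrative.

```python
import numpy as np


def flow_supervision_error(d1: np.ndarray, d2: np.ndarray, abs_depth: np.ndarray,
                           target_mask: np.ndarray, focal_px: float, baseline: float,
                           w1: float = 0.5) -> float:
    """Weighted mean of |first - standard| and |second - standard| disparity
    errors over pixels outside the target area."""
    other = ~target_mask & np.isfinite(abs_depth) & (abs_depth > 0)
    # Standard disparity from absolute depth: d = f * t / Z.
    d_std = focal_px * baseline / abs_depth[other]
    e1 = np.mean(np.abs(np.abs(d1[other]) - d_std))  # first disparity vs standard
    e2 = np.mean(np.abs(np.abs(d2[other]) - d_std))  # second disparity vs standard
    return float(w1 * e1 + (1.0 - w1) * e2)
```

With `w1 = 0.5` this is the plain mean of the two differences; other weights give the weighted variant.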
- A more accurate optical flow estimation model can be obtained by using, as supervision information during training, the error between the bidirectional disparity and the standard disparity of the same scene points outside the target area. After training, the optical flow estimation model can be used to estimate the bidirectional disparity of the same scene point between binocular images.
- In summary, the method of the embodiments of the present disclosure:
  - obtains the relative depth values of pixels in the binocular image;
  - obtains the bidirectional disparity of the same scene point between the binocular images, and determines a target area with consistent bidirectional disparity;
  - determines the absolute depth values of pixels in the target area from their bidirectional disparity;
  - constructs a mapping relationship between relative and absolute depth values, based on the absolute depth values of pixels in the target area and the relative depth values of pixels in the binocular image; and
  - determines the absolute depth value of the pixel to be determined in the binocular image, according to the mapping relationship and that pixel's relative depth value.
- a target area with consistent bidirectional disparity in the binocular image can be considered to have more accurate disparity information, and the absolute depth value determined based on the disparity information of the target area is also more accurate.
- This mapping relationship can then be used to map the complete relative depth values of the entire image to absolute depth values, enabling depth estimation for multiple parts of the entire image.
- the embodiments of the present disclosure may be combined with the optional solution of the image depth estimation method provided in the above embodiments.
- the image depth estimation method provided in this embodiment can be applied to depth estimation of binocular videos.
- the absolute depth values of pixels in the current binocular video frame and the next binocular video frame can be determined.
- In addition, the absolute depth values of the current binocular video frame can be propagated to the next binocular video frame along the temporal disparity between the two frames. This corrects the absolute depth values of pixels in the next binocular video frame, improving the accuracy of their estimation.
- FIG. 2 is a schematic flowchart of an image depth estimation method provided by an embodiment of the present disclosure.
- the binocular image may include binocular video frames, and the method may include:
- The binocular video frame can be any video frame in a binocular video file. The file can be a video stream captured in real time or a video file pre-stored locally. The method provided by this embodiment can process the binocular video file frame by frame.
- L_K can be used to represent the left-eye video frame of the current binocular video frame;
- R_K can be used to represent the right-eye video frame of the current binocular video frame;
- L_K+1 can be used to represent the left-eye video frame of the next binocular video frame; and
- R_K+1 can be used to represent the right-eye video frame of the next binocular video frame.
- the relative depth values of multiple pixels in L_K, R_K, L_K+1 and R_K+1 may be obtained by reading from a preset storage space, determining in real time, or other optional methods.
- the trained monocular depth estimation model can be used to process L_K, R_K, L_K+1 and R_K+1 to obtain the relative depth values of multiple pixels in L_K, R_K, L_K+1 and R_K+1.
- In step S220, the bidirectional disparity of the same scene point can likewise be obtained between L_K and R_K, and between L_K+1 and R_K+1. The options are reading a predetermined bidirectional disparity from the preset storage space, determining it in real time, or other optional methods.
- For example, the trained optical flow estimation model can process L_K with R_K, and L_K+1 with R_K+1. This yields the disparity of the same scene point in four directions: from L_K to R_K, from R_K to L_K, from L_K+1 to R_K+1, and from R_K+1 to L_K+1.
- The disparity from L_K to R_K can be called the first disparity of the current binocular video frame;
- the disparity from R_K to L_K can be called the second disparity of the current binocular video frame;
- the disparity from L_K+1 to R_K+1 can be called the first disparity of the next binocular video frame; and
- the disparity from R_K+1 to L_K+1 can be called the second disparity of the next binocular video frame.
- Pixels in the current binocular video frame whose first and second disparities differ by an amount within a preset range can be used as target pixels. The target area of the current binocular video frame is determined from these target pixels. The target area of the next binocular video frame is determined the same way, from its own target pixels.
- Based on the stereo imaging principle, the absolute depth values of pixels in the target area of the current binocular video frame can be determined from their bidirectional disparity. The absolute depth values of pixels in the target area of the next binocular video frame are determined the same way.
- The mapping relationship between relative and absolute depth values for the current binocular video frame can be constructed from the absolute and relative depth values of multiple pixels in its target area. The mapping relationship for the next binocular video frame is constructed in the same way from its own target area.
- S250 Determine the absolute depth values of the pixels to be determined in the current and next binocular video frames, according to each frame's mapping relationship and the relative depth values of those pixels.
- the relative depth value of the pixel to be determined in the current binocular video frame can be substituted into the mapping relationship of the current binocular video frame to determine the absolute depth value of the pixel to be determined in the current binocular video frame.
- the relative depth value of the pixel to be determined in the next binocular video frame can be substituted into the mapping relationship of the next binocular video frame to determine the absolute depth value of the pixel to be determined in the next binocular video frame.
- in step S260, the temporal disparity of the same scene point from L_K to L_K+1 and from R_K to R_K+1 can also be obtained by reading a predetermined temporal disparity from a preset storage space, by determining it in real time, or by other optional methods.
- the absolute depth value of a pixel in the current binocular video frame can be used to correct the absolute depth value of the corresponding pixel in the next binocular video frame, thereby improving the accuracy of the absolute depth values of pixels in the next binocular video frame.
- correcting the absolute depth value of the pixel in the next binocular video frame based on the absolute depth value of the pixel in the current binocular video frame and the temporal disparity may include: determining a temporal estimate of the absolute depth of the pixel in the next binocular video frame based on the absolute depth value of the pixel in the current binocular video frame and the temporal disparity; and performing weighted smoothing on the absolute depth value of the pixel in the next binocular video frame according to the temporal estimate.
- the pixels in the current binocular video frame can be shifted according to the temporal disparity of the same scene point from L_K to L_K+1 and from R_K to R_K+1 to determine the corresponding pixels in the next binocular video frame.
- the absolute depth value of the pixel in the current binocular video frame can be propagated forward to the corresponding pixel in the next binocular video frame to estimate the absolute depth value of that corresponding pixel, that is, to obtain a temporal estimate of the absolute depth of the pixel in the next binocular video frame.
- the temporal estimate of the absolute depth of the pixel in the next binocular video frame and the absolute depth value of the pixel to be determined in the next binocular video frame obtained in step S250 can be weighted and summed. This applies a temporal smoothing correction to the value obtained in step S250 and yields a more accurate absolute depth value for the pixel in the next binocular video frame.
- the temporal estimate of the absolute depth of the corresponding pixel in the next binocular video frame can also be determined based on the absolute depth value of the pixel in the current binocular video frame and the temporal disparity.
- the absolute depth values of the pixels in the next binocular video frame are then corrected according to the distribution of the temporal estimates, so that their distribution moves closer to the distribution of the temporal estimates. It can be understood that other methods of correcting the absolute depth values of pixels in the next binocular video frame can also be applied here; they are not enumerated exhaustively.
- binocular video frames can originate from binocular videos in virtual reality (Virtual Reality, VR) scenes.
- the image depth estimation method provided by the embodiments of the present disclosure can be used to determine the absolute depth values of frame pixels in VR binocular videos. This facilitates subsequent services such as adding special effects to VR binocular videos and helps improve the viewing experience of users.
- the technical solutions of the embodiments of the present disclosure can be applied to depth estimation of binocular videos.
- the absolute depth values of pixels in the current binocular video frame and the next binocular video frame can be determined.
- the absolute depth values of the current binocular video frame can also be propagated forward to the next binocular video frame through the temporal disparity from the current binocular video frame to the next binocular video frame.
- the image depth estimation method provided by the embodiments of the present disclosure belongs to the same concept as the image depth estimation method provided by the above-mentioned embodiments.
- Technical details not described in detail in this embodiment can be found in the above-mentioned embodiments, and this embodiment has the same effects as the above-mentioned embodiments.
- FIG. 3 is a schematic structural diagram of an image depth estimation device provided by an embodiment of the present disclosure.
- the image depth estimation device provided in this embodiment is suitable for full image depth estimation of binocular images.
- the image depth estimation device may include:
- the relative depth acquisition module 310 is configured to acquire relative depth values of multiple pixels in the binocular image
- the target area determination module 320 is configured to obtain the bidirectional disparity of the same scene point between binocular images, and determine the target area with consistent bidirectional disparity;
- the area absolute depth determination module 330 is configured to determine the absolute depth value of the pixels in the target area based on the bidirectional disparity of the pixels in the target area;
- the mapping relationship building module 340 is configured to construct a mapping relationship between the relative depth value and the absolute depth value based on the absolute depth value of the pixel in the target area and the relative depth value of the pixel in the binocular image;
- the overall absolute depth determination module 350 is configured to determine the absolute depth value of the pixel to be determined in the binocular image according to the mapping relationship and the relative depth value of the pixel to be determined in the binocular image.
- the relative depth acquisition module can be set to:
- the binocular image is processed through the monocular depth estimation model to obtain the relative depth values of multiple pixels in the binocular image;
- the supervision information of the monocular depth estimation model during the training process includes the absolute depth value of the pixels in the target area of the sample binocular image.
- the target area determination module can be set to:
- the binocular images are processed through the optical flow estimation model to obtain the bidirectional disparity of the same scene point between the binocular images;
- the supervision information of the optical flow estimation model during the training process includes the bidirectional disparity of the same scene points in areas of the sample binocular images other than the target area.
- the target area determination module can be set to:
- the target pixel is determined based on the bidirectional disparity; where the bidirectional disparity includes a first disparity and a second disparity.
- the first disparity is the disparity of the same scene point from the left eye image to the right eye image
- the second disparity is the disparity of the same scene point from the right eye image to the left eye image;
- the mapping relationship building module can be set to:
- the absolute depth values of multiple pixels in the target area are used as dependent variables, and the relative depth values of corresponding pixels in the binocular image are used as independent variables to perform function fitting;
- the result of function fitting is used as a mapping relationship from relative depth values to absolute depth values.
- the binocular images may include binocular video frames
- the image depth estimation device may also include:
- the temporal disparity determination module is configured to determine the temporal disparity of the same scene point from the current binocular video frame to the next binocular video frame;
- the overall absolute depth determination module can also be configured to correct the absolute depth values of pixels in the next binocular video frame based on the absolute depth values of pixels in the current binocular video frame and the temporal disparity.
- the overall absolute depth determination module can be set to:
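- determine a temporal estimate of the absolute depth of the pixels in the next binocular video frame based on the absolute depth values of the pixels in the current binocular video frame and the temporal disparity, and perform weighted smoothing on the absolute depth values of the pixels in the next binocular video frame according to the temporal estimate.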
- the image depth estimation device provided by the embodiments of the disclosure can execute the image depth estimation method provided by any embodiment of the disclosure, and has functional modules and effects corresponding to the execution method.
- FIG. 4 shows a schematic structural diagram of an electronic device (such as the terminal device or server in FIG. 4 ) 400 suitable for implementing embodiments of the present disclosure.
- Electronic devices in embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (Personal Digital Assistant, PDA), tablet computers (PAD), portable multimedia players (Portable Media Player, PMP) and vehicle-mounted terminals (such as vehicle-mounted navigation terminals), and fixed terminals such as digital TVs and desktop computers.
- the electronic device shown in FIG. 4 is only an example and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
- the electronic device 400 may include a processor (such as a central processing unit, a graphics processor, etc.) 401.
- the processor 401 may perform various appropriate actions and processes according to a program stored in a read-only memory (Read-Only Memory, ROM) 402 or a program loaded from a storage device 408 into a random access memory (Random Access Memory, RAM) 403.
- the RAM 403 also stores various programs and data required for the operation of the electronic device 400.
- the processing device 401, ROM 402 and RAM 403 are connected to each other via a bus 404.
- An input/output (I/O) interface 405 is also connected to bus 404.
- the following devices can be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 407 including, for example, a liquid crystal display (Liquid Crystal Display, LCD), a speaker, a vibrator, etc.; storage devices 408 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 409.
- the communication device 409 may allow the electronic device 400 to communicate wirelessly or wiredly with other devices to exchange data.
- Although FIG. 4 illustrates the electronic device 400 with various apparatuses, it should be understood that it is not required to implement or have all of the illustrated apparatuses. More or fewer apparatuses may alternatively be implemented or provided.
- embodiments of the present disclosure include a computer program product including a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
- the computer program may be downloaded and installed from a network via the communication device 409, or installed from the storage device 408 or the ROM 402.
- when the computer program is executed by the processor 401, the above-mentioned functions defined in the image depth estimation method of the embodiments of the present disclosure are performed.
- the electronic device provided by the embodiments of the present disclosure belongs to the same concept as the image depth estimation method provided by the above embodiments.
- Technical details not described in detail in this embodiment can be found in the above embodiments, and this embodiment has the same effects as the above embodiments.
- Embodiments of the present disclosure provide a computer-readable storage medium on which a computer program is stored.
- when the program is executed by a processor, the image depth estimation method provided in the above embodiments is implemented.
- the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two.
- the computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination thereof.
- Examples of computer-readable storage media may include, but are not limited to: an electrical connection having at least one conductor, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM) or flash memory (FLASH), optical fiber, portable compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
- a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
- a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
- Program code contained on a computer-readable medium can be transmitted using any appropriate medium, including but not limited to wires, optical cables, radio frequency (Radio Frequency, RF), etc., or any suitable combination of the above.
- the client and server can communicate using any currently known or future developed network protocol, such as HyperText Transfer Protocol (HTTP), and can be interconnected with digital data communication in any form or medium (e.g., a communication network).
- Examples of communication networks include local area networks (LANs), wide area networks (WANs), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
- the above-mentioned computer-readable storage medium may be included in the above-mentioned electronic device; it may also exist independently without being assembled into the electronic device.
- the above-mentioned computer-readable medium carries at least one program.
- when the at least one program is executed by the electronic device, the electronic device is caused to:
- obtain the relative depth value of each pixel in the binocular image; obtain the bidirectional disparity of the same scene point between the binocular images, and determine the target area in which the bidirectional disparity is consistent; determine the absolute depth value of each pixel in the target area based on the bidirectional disparity of each pixel in the target area; construct a mapping relationship between relative depth values and absolute depth values based on the absolute depth value of each pixel in the target area and its relative depth value in the binocular image; and determine the absolute depth value of each pixel in the binocular image according to the mapping relationship and the relative depth value of each pixel in the binocular image.
- Computer program code for performing the operations of the present disclosure may be written in at least one programming language or a combination thereof. These languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
- each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains at least one executable instruction for implementing the specified logical function.
- It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may actually be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved.
- the units involved in the embodiments of the present disclosure can be implemented in software or hardware. The names of the units and modules do not constitute limitations on the units and modules themselves.
- exemplary types of hardware logic components include: field programmable gate arrays (Field Programmable Gate Array, FPGA), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), application specific standard products (Application Specific Standard Parts, ASSP), systems on chip (System on Chip, SOC), complex programmable logic devices (Complex Programmable Logic Device, CPLD), etc.
- a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- Machine-readable storage media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
- more specific examples of machine-readable storage media include an electrical connection based on at least one wire, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
- an image depth estimation method is provided, which includes:
- the absolute depth value of the pixel to be determined in the binocular image is determined.
- an image depth estimation method is provided, further comprising:
- obtaining the relative depth values of multiple pixels in the binocular image includes:
- the supervision information of the monocular depth estimation model during the training process includes the absolute depth value of the pixels in the target area of the sample binocular image.
- an image depth estimation method is provided, further comprising:
- obtaining the bidirectional disparity of the same scene point between the binocular images includes:
- the supervision information of the optical flow estimation model during the training process includes the bidirectional disparity of the same scene point in areas of the sample binocular images other than the target area.
- an image depth estimation method is provided, further comprising:
- determining the target area where the bidirectional disparity is consistent includes:
- the bidirectional disparity includes a first disparity and a second disparity
- the first disparity is the disparity from the left eye image to the right eye image of the same scene point
- the second disparity is the disparity from the right eye image to the left eye image of the same scene point;
- a target area is determined based on a plurality of the target pixels.
- an image depth estimation method is provided, further comprising:
- constructing the mapping relationship between relative depth values and absolute depth values based on the absolute depth values of pixels in the target area and the relative depth values of pixels in the binocular image includes:
- the result of fitting the function is used as a mapping relationship from relative depth values to absolute depth values.
- an image depth estimation method is provided, further comprising:
- the binocular images include binocular video frames
- the method further includes:
- the absolute depth value of each pixel in the next binocular video frame is corrected.
- an image depth estimation method is provided, further comprising:
- correcting the absolute depth value of the pixel in the next binocular video frame based on the absolute depth value of the pixel in the current binocular video frame and the temporal disparity includes:
- a weighted smoothing process is performed on the absolute depth value of the pixel in the next binocular video frame.
- an image depth estimation device is provided, which includes:
- the relative depth acquisition module is configured to obtain the relative depth values of multiple pixels in the binocular image
- a target area determination module configured to obtain the bidirectional disparity of the same scene point between the binocular images, and determine the target area where the bidirectional disparity is consistent
- a region absolute depth determination module configured to determine the absolute depth value of the pixels in the target region based on the bidirectional disparity of each pixel in the target region
- a mapping relationship building module configured to construct a mapping relationship between relative depth values and absolute depth values based on the absolute depth values of pixels in the target area and the relative depth values of pixels in the binocular image;
- the overall absolute depth determination module is configured to determine the absolute depth value of the pixel to be determined in the binocular image according to the mapping relationship and the relative depth value of the pixel to be determined in the binocular image.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Processing (AREA)
Abstract
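The embodiments of the present disclosure disclose an image depth estimation method and apparatus, an electronic device, and a storage medium. The method includes the following steps. Relative depth values of multiple pixels in a binocular image are obtained. The bidirectional disparity of the same scene point between the binocular images is obtained, and a target area in which the bidirectional disparity is consistent is determined. Absolute depth values of pixels in the target area are determined according to the bidirectional disparity of those pixels. A mapping relationship between relative depth values and absolute depth values is constructed according to the absolute depth values of the pixels in the target area and the relative depth values of pixels in the binocular image. Finally, the absolute depth values of pixels to be determined in the binocular image are determined according to the mapping relationship and the relative depth values of those pixels.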
Description
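This application claims priority to Chinese Patent Application No. 202211114979.6, filed with the China Patent Office on September 14, 2022, the entire contents of which are incorporated herein by reference.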
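The embodiments of the present disclosure relate to the technical field of image processing, for example, to an image depth estimation method and apparatus, an electronic device, and a storage medium.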
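Image depth estimation methods in the related art include binocular ranging. Binocular ranging works as follows. Feature points extracted from the binocular images (that is, the left-eye image and the right-eye image) are matched. The depth of those feature points is then determined from the pre-calibrated parameters of the binocular image acquisition device and the pixel positions of the matched feature points in each of the binocular images. However, this method can only estimate depth for image parts with salient feature points and cannot determine the depth of parts where feature points are not salient.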
Summary
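The embodiments of the present disclosure provide an image depth estimation method and apparatus, an electronic device, and a storage medium, which can achieve depth estimation for multiple parts of the full image.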
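In a first aspect, an embodiment of the present disclosure provides an image depth estimation method, including:
obtaining relative depth values of multiple pixels in a binocular image;
obtaining the bidirectional disparity of the same scene point between the binocular images, and determining a target area in which the bidirectional disparity is consistent;
determining absolute depth values of pixels in the target area according to the bidirectional disparity of the pixels in the target area;
constructing a mapping relationship between relative depth values and absolute depth values according to the absolute depth values of the pixels in the target area and the relative depth values of pixels in the binocular image; and
determining absolute depth values of pixels to be determined in the binocular image according to the mapping relationship and the relative depth values of the pixels to be determined in the binocular image.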
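In a second aspect, an embodiment of the present disclosure further provides an image depth estimation apparatus, including:
a relative depth acquisition module, configured to obtain relative depth values of multiple pixels in a binocular image;
a target area determination module, configured to obtain the bidirectional disparity of the same scene point between the binocular images and determine a target area in which the bidirectional disparity is consistent;
an area absolute depth determination module, configured to determine absolute depth values of pixels in the target area according to the bidirectional disparity of the pixels in the target area;
a mapping relationship building module, configured to construct a mapping relationship between relative depth values and absolute depth values according to the absolute depth values of the pixels in the target area and the relative depth values of pixels in the binocular image; and
an overall absolute depth determination module, configured to determine absolute depth values of pixels to be determined in the binocular image according to the mapping relationship and the relative depth values of the pixels to be determined in the binocular image.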
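In a third aspect, an embodiment of the present disclosure further provides an electronic device, including:
at least one processor; and
a storage apparatus, configured to store at least one program,
where the at least one program, when executed by the at least one processor, causes the at least one processor to implement the image depth estimation method according to any embodiment of the present disclosure.
In a fourth aspect, an embodiment of the present disclosure further provides a storage medium containing computer-executable instructions, which, when executed by a computer processor, perform the image depth estimation method according to any embodiment of the present disclosure.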
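FIG. 1 is a schematic flowchart of an image depth estimation method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of an image depth estimation method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of an image depth estimation apparatus provided by an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.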
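As used herein, the term "include" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units. They are not used to limit the order or interdependence of the functions performed by these apparatuses, modules, or units.
It should be noted that the modifiers "one" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive. Those skilled in the art should understand them as "at least one" unless the context clearly indicates otherwise.
It can be understood that the data involved in this technical solution, including but not limited to the data itself and the acquisition or use of the data, should comply with the requirements of applicable laws, regulations, and related provisions.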
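FIG. 1 is a schematic flowchart of an image depth estimation method provided by an embodiment of the present disclosure. This embodiment is applicable to full-image depth estimation of binocular images. The method can be performed by an image depth estimation apparatus. The apparatus can be implemented in the form of at least one of software and hardware, and can be configured in an electronic device, for example a computer.
As shown in FIG. 1, the image depth estimation method provided by this embodiment includes:
S110. Obtain relative depth values of multiple pixels in the binocular image.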
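In the embodiments of the present disclosure, the relative depth values of multiple pixels in the binocular image may be determined in advance and stored in a preset storage space, and then read directly from that storage space. Alternatively, they may be determined in real time by processing the binocular image after it is obtained. The relative depth values can thus be obtained by reading from the preset storage space, by real-time determination, or by other optional methods.
The binocular image may include a left-eye image and a right-eye image. Existing monocular depth estimation methods can be used to estimate depth for multiple pixels in the left-eye image and in the right-eye image separately. Examples include traditional cue-based methods, traditional machine-learning-based methods, and supervised or unsupervised deep learning methods. The result is the relative depth values of multiple pixels in the left-eye and right-eye images, that is, the relative depth values of multiple pixels in the binocular image.
Existing monocular depth estimation methods can completely estimate how near or far the pixels in an image are relative to one another, so relative depth values are obtained for all pixels in the image. However, these relative depth values carry no scale information. They cannot tell the distance between the object corresponding to a pixel and the image acquisition apparatus.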
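S120. Obtain the bidirectional disparity of the same scene point between the binocular images, and determine a target area in which the bidirectional disparity is consistent.
In the embodiments of the present disclosure, the scene is the actual scene corresponding to the binocular images. A point in the scene is projected onto both the left-eye image and the right-eye image. In other words, the left-eye image and the right-eye image each contain one pixel, and these two pixels represent the same scene point. The bidirectional disparity of the same scene point between the binocular images may include two offsets in image coordinates. The first is the offset of the scene point in the left-eye image relative to the same scene point in the right-eye image. The second is the offset of the scene point in the right-eye image relative to the same scene point in the left-eye image.
The offset of the scene point in the left-eye image relative to the same scene point in the right-eye image can be understood as follows. Take the left-eye image as the reference image and the right-eye image as the target image; the offset is then the distance from the projection of the scene point in the left-eye image to its projection in the right-eye image. Similarly, for the offset of the scene point in the right-eye image relative to the same scene point in the left-eye image, take the right-eye image as the reference image and the left-eye image as the target image. The offset is then the distance from the projection of the scene point in the right-eye image to its projection in the left-eye image.
Steps S120 and S110 have no strict order; they can be executed simultaneously or one after the other in either order. In step S120, the bidirectional disparity of the same scene point between the binocular images can likewise be obtained by reading a predetermined bidirectional disparity from the preset storage space, by real-time determination, or by other optional methods.
The same scene point can be matched using traditional matching algorithms such as the scale-invariant feature transform (Scale-invariant feature transform, SIFT) or speeded up robust features (Speeded Up Robust Features, SURF). The bidirectional disparity of the scene point between the binocular images can then be determined from its positions in the image coordinates of the two images. Alternatively, the optical flow of the same scene point between the binocular images can be determined by a traditional optical flow algorithm or by deep-learning-based optical flow estimation, and this optical flow can be used as the bidirectional disparity.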
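Once the bidirectional disparity of the same scene point between the binocular images has been determined, a consistency check is applied to each pixel. The pixel is first shifted using the disparity of the left-eye image relative to the right-eye image. It is then shifted again using the disparity of the right-eye image relative to the left-eye image. If the pixel returns to its original position, or to a position very close to it, after the two shifts, its bidirectional disparity can be considered consistent. The target area can be determined from multiple pixels with consistent bidirectional disparity. For example, at least one connected region formed by such pixels may be used as the target area, or the largest of these connected regions may be used as the target area.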
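The round-trip check described above can be sketched in Python. This is a minimal illustration, not the disclosure's implementation. It assumes rectified stereo, so each disparity is a purely horizontal offset in pixels. It also assumes dense disparity maps `disp_lr` (left to right) and `disp_rl` (right to left). The function name and the `tol` parameter are hypothetical.

```python
import numpy as np


def round_trip_consistent(disp_lr, disp_rl, tol=1.0):
    """Round-trip (left-right) consistency mask on the left-image grid.

    disp_lr[y, x]: horizontal offset moving left-image pixel (y, x) to its match in the right image.
    disp_rl[y, x]: horizontal offset moving right-image pixel (y, x) back to the left image.
    A pixel is consistent if moving it with disp_lr and then with disp_rl
    brings it back to within `tol` pixels of where it started.
    """
    h, w = disp_lr.shape
    ys, xs = np.indices((h, w))
    # First move: left image -> right image.
    x_right = xs + disp_lr
    xr = np.rint(x_right).astype(int)
    inside = (xr >= 0) & (xr < w)  # pixels whose match falls outside the image fail
    xr = np.clip(xr, 0, w - 1)
    # Second move: right image -> left image, using the disparity at the landing pixel.
    x_back = x_right + disp_rl[ys, xr]
    return inside & (np.abs(x_back - xs) <= tol)
```

In this sketch, a pixel whose first move leaves the image is marked inconsistent. That is a choice made here, not something the text specifies.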
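In some optional implementations, determining the target area in which the bidirectional disparity is consistent may also include the following. Target pixels are determined according to the bidirectional disparity, which includes a first disparity and a second disparity. The first disparity is the disparity of the same scene point from the left-eye image to the right-eye image, and the second disparity is the disparity of the same scene point from the right-eye image to the left-eye image. The target area is then determined according to multiple target pixels.
In these optional implementations, pixels corresponding to scene points whose first and second disparities differ by no more than a preset range can be used as target pixels. The preset range can be set from empirical or experimental values, for example 1 to 2 pixels. The difference between the first and second disparities can be computed as the larger of their absolute values minus the smaller. If this difference is within the preset range, the corresponding pixel can be used as a target pixel, that is, a pixel with consistent bidirectional disparity. The target area can then be determined from at least one connected region formed by multiple target pixels, for example by taking the connected region with the largest area as the target area.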
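A sketch of this variant follows. It assumes both disparity maps are already expressed on the same pixel grid, for example with the second disparity sampled at each left-image pixel's match; the text does not specify this alignment. The names `target_area` and `max_diff` are illustrative. The default of 2 pixels comes from the example range above. The code applies the larger-minus-smaller test on absolute values and keeps the largest 4-connected region with a plain breadth-first search, so only NumPy is needed.

```python
from collections import deque

import numpy as np


def target_area(d1, d2, max_diff=2.0):
    """Boolean mask of the largest 4-connected region of consistent pixels.

    d1, d2: first and second disparity per pixel, on the same grid (assumed).
    """
    a1, a2 = np.abs(d1), np.abs(d2)
    diff = np.maximum(a1, a2) - np.minimum(a1, a2)  # larger magnitude minus smaller
    target = diff <= max_diff  # target pixels: consistent bidirectional disparity
    h, w = target.shape
    labels = np.zeros((h, w), dtype=int)
    best_label, best_size, next_label = 0, 0, 0
    for y0 in range(h):
        for x0 in range(w):
            if not target[y0, x0] or labels[y0, x0]:
                continue
            # Flood-fill a new connected component starting at (y0, x0).
            next_label += 1
            labels[y0, x0] = next_label
            queue, size = deque([(y0, x0)]), 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and target[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
            if size > best_size:
                best_label, best_size = next_label, size
    return labels == best_label if best_size else np.zeros_like(target)
```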
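Because the bidirectional disparity of the pixels within the target area is consistent, the disparity information of this area can be considered more accurate. Determining a target area with consistent bidirectional disparity therefore allows absolute depth values to be calculated more accurately.
S130. Determine absolute depth values of pixels in the target area according to the bidirectional disparity of the pixels in the target area.
According to the principle of three-dimensional imaging, the disparity d and the absolute depth value Z of a pixel satisfy d = f·t / Z. Here t denotes the distance between the binocular image acquisition devices and f denotes their focal length. When f and d are both measured in pixels, Z has the same unit as t. In the embodiments of the present disclosure, the bidirectional disparity of pixels in the target area is consistent. Any one of the first disparity, the second disparity, or the mean of the two can therefore be taken as d to calculate the absolute depth value Z of a pixel. This absolute depth value carries scale information. It gives the distance from the object corresponding to a pixel in the target area to the image acquisition apparatus, expressed for example in millimeters, centimeters, or meters.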
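The relation d = f·t / Z inverts to Z = f·t / d. A minimal sketch follows. It assumes the focal length is given in pixels and the baseline in meters, so Z comes out in meters. It uses the mean of the two disparity magnitudes as d, which is one of the options above. The function and parameter names are illustrative.

```python
import numpy as np


def depth_from_disparity(d1, d2, focal_px, baseline_m, mask):
    """Absolute depth Z = f * t / d for pixels inside `mask`.

    d is taken as the mean magnitude of the first and second disparities.
    Pixels outside the mask, or with non-positive disparity, are set to NaN.
    """
    d = (np.abs(d1) + np.abs(d2)) / 2.0
    z = np.full(d.shape, np.nan)
    valid = mask & (d > 0)
    z[valid] = focal_px * baseline_m / d[valid]
    return z
```

For example, with f = 1000 px, t = 0.1 m and d = 50 px, the depth is 1000 × 0.1 / 50 = 2 m.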
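S140. Construct a mapping relationship between relative depth values and absolute depth values according to the absolute depth values of the pixels in the target area and the relative depth values of pixels in the binocular image.
The final relative depth value of each pixel in the target area can be determined from that pixel's relative depth values in the left-eye image and in the right-eye image. For example, the mean or a weighted value of a pixel's relative depth values in the two images can be used as its final relative depth value.
Next, the mapping relationship between relative and absolute depth values can be constructed from the final relative depth values and the absolute depth values of the pixels in the target area. These two quantities are positively correlated. Based on experience or experiments, several candidate mapping relationships that follow the pattern of relative-to-absolute depth mapping can be preset, for example positively correlated linear mappings or logarithmic mappings. The final relative depth values and absolute depth values of multiple pixels in the target area are then substituted into each candidate. The candidate with the best fit is used as the mapping relationship between relative and absolute depth values.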
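One way to carry out the candidate selection is sketched below. The disclosure does not fix these choices; the sketch assumes two candidate forms, linear and logarithmic in the relative depth. It fits each one by least squares with `np.polyfit` and measures fit by the coefficient of determination R². Candidates whose fitted slope is not positive are skipped, to respect the stated positive correlation. The function name is illustrative.

```python
import numpy as np


def select_mapping(rel, absd):
    """Fit each candidate mapping Z = a * g(r) + b and return the best one.

    rel:  final relative depths of target-area pixels (must be > 0 for the log form).
    absd: disparity-derived absolute depths of the same pixels.
    Returns (name, mapping_function).
    """
    rel = np.asarray(rel, dtype=float)
    absd = np.asarray(absd, dtype=float)
    candidates = {"linear": lambda r: r, "log": np.log}
    ss_tot = np.sum((absd - absd.mean()) ** 2)
    best = None
    for name, g in candidates.items():
        x = g(rel)
        a, b = np.polyfit(x, absd, 1)  # least-squares line in the transformed variable
        if a <= 0:
            continue  # mapping must be increasing (positive correlation)
        r2 = 1.0 - np.sum((absd - (a * x + b)) ** 2) / ss_tot
        if best is None or r2 > best[0]:
            best = (r2, name, a, b, g)
    if best is None:
        raise ValueError("no candidate yields a positively correlated fit")
    _, name, a, b, g = best
    return name, lambda r: a * g(np.asarray(r, dtype=float)) + b
```

Other candidate forms can be added to the `candidates` dictionary, as long as they are linear in their two coefficients.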
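In some optional implementations, constructing the mapping relationship between relative and absolute depth values may also proceed by function fitting. The absolute depth values of multiple pixels in the target area serve as the dependent variable, and the relative depth values of the corresponding pixels in the binocular image serve as the independent variable. The result of the fit is used as the mapping relationship from relative depth values to absolute depth values.
In these optional implementations, the independent variable can be the final relative depth values of the pixels in the target area. An existing fitting tool, such as MATLAB, can fit a function from the final relative depth values to the absolute depth values, and the fitted result can serve as the mapping relationship. This removes the need to preset candidate mapping relationships. It also makes the mapping match the actual relationship between relative and absolute depth values in the current binocular image as closely as possible, which improves depth estimation accuracy across different binocular images.
Extensive experiments show that relative depth values usually map to absolute depth values through a positively correlated linear relationship. To determine the mapping relationship more efficiently, a proportionality coefficient can instead be derived once the final relative depth values and absolute depth values of multiple pixels in the target area are known. The coefficient is chosen so that a final relative depth value multiplied by it equals the corresponding absolute depth value.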
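With many pixels, a single coefficient cannot make every product exactly equal to its absolute depth. One reasonable reading is a least-squares fit of Z ≈ s·r through the origin, whose closed form is s = Σ rZ / Σ r². The sketch below assumes that interpretation. A median of per-pixel ratios Z / r would be a more outlier-tolerant alternative.

```python
import numpy as np


def scale_coefficient(rel, absd):
    """Least-squares s minimizing sum((s * r - Z)^2), i.e. s = sum(r * Z) / sum(r * r)."""
    rel = np.asarray(rel, dtype=float)
    absd = np.asarray(absd, dtype=float)
    return float(np.dot(rel, absd) / np.dot(rel, rel))
```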
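S150. Determine the absolute depth values of pixels to be determined in the binocular image according to the mapping relationship and the relative depth values of those pixels.
A mapping relationship determined from the relative and absolute depth values of multiple pixels in the target area is more accurate. It can be used to map the relative depth values of all pixels in the binocular image to absolute depth values. To do so, the final relative depth value of each pixel in the binocular image is determined and substituted into the mapping relationship, which gives the absolute depth value of each pixel. Alternatively, because the absolute depth values of pixels in the target area are already known, final relative depth values can be determined only for pixels outside the target area and substituted into the mapping relationship. This can improve the efficiency of determining absolute depth values to some extent.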
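The efficiency shortcut above keeps the target-area depths and maps only the remaining pixels. It can be sketched as follows. The sketch assumes the final relative depths are already combined onto one pixel grid, and that `mapping` is any vectorized callable, such as the fitted function or `lambda r: s * r` for a scale coefficient. The names are illustrative.

```python
import numpy as np


def full_absolute_depth(rel_final, target_mask, target_abs, mapping):
    """Keep disparity-derived depth inside the target area and map the
    relative depth of every other pixel through the fitted mapping."""
    out = np.array(target_abs, dtype=float, copy=True)
    outside = ~target_mask
    out[outside] = mapping(rel_final[outside])
    return out
```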
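In some optional implementations, obtaining the relative depth values of multiple pixels in the binocular image may include processing the binocular image with a monocular depth estimation model to obtain those values. The supervision information of the monocular depth estimation model during training may include the absolute depth values of pixels in the target area of sample binocular images.
The monocular depth estimation model can be regarded as a deep learning model. Its training process may include the following steps. Sample binocular images are obtained. The bidirectional disparity of the same scene point between the sample binocular images is obtained, and a target area in which the bidirectional disparity is consistent is determined. The absolute depth values of pixels in that target area are then determined from their bidirectional disparity. Absolute depth values determined from the disparity information of the target area of the sample binocular images are more accurate.
Absolute depth values also reflect the near-far relationships between the objects represented by pixels. The absolute depth values of pixels in the target area of sample binocular images can therefore serve as supervision information for training the monocular depth estimation model. After those absolute depth values are determined, the training process may further include the following. The sample binocular images are processed by the monocular depth estimation model to obtain relative depth values of their pixels. The error between the relative depth values and the absolute depth values of pixels in the target area is then determined and back-propagated to train the model. One option is to take the dispersion between the distributions of the relative and absolute depth values of the target-area pixels as the error. Another option is to multiply the relative depth values of those pixels by a preset proportionality coefficient and take the difference between the scaled values and the absolute depth values as the error. Other error calculations can also be applied and are not enumerated here.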
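The second error option can be written as a masked loss: scale the predicted relative depth by a preset coefficient, then compare it with the absolute depth. This is a framework-agnostic NumPy sketch. In actual training, the same expression would be computed in a deep learning framework so that the error can be back-propagated. Using the mean absolute difference as the aggregate is an assumption, and the names are illustrative.

```python
import numpy as np


def scaled_depth_error(pred_rel, target_abs, target_mask, scale):
    """Mean |scale * predicted relative depth - absolute depth| over target-area pixels."""
    diff = scale * pred_rel[target_mask] - target_abs[target_mask]
    return float(np.mean(np.abs(diff)))
```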
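In these optional implementations, the absolute depth values of pixels in the target area of sample binocular images serve as supervision information when computing the training error. This can improve the estimation accuracy of the monocular depth estimation model. After training, the model can be configured to estimate the relative depth of pixels in binocular images.
In some optional implementations, obtaining the bidirectional disparity of the same scene point between the binocular images may include processing the binocular images with an optical flow estimation model to obtain that disparity. The supervision information of the optical flow estimation model during training includes the bidirectional disparity of the same scene points in areas of the sample binocular images other than the target area.
The optical flow estimation model can also be regarded as a deep learning model. Its training process may include the following steps. Sample binocular images are obtained, together with the relative depth values of multiple pixels in them. The sample binocular images are input into the optical flow estimation model, which outputs the bidirectional disparity of the same scene point between them. The target area in which the bidirectional disparity is consistent is determined, along with the other areas of the sample binocular images outside the target area. The disparity information in these other areas is less accurate, and optimizing it allows a better optical flow estimation model to be trained.
After the other areas are determined, the training process may further include the following. Absolute depth values are determined from the relative depth values of pixels in the other areas of the sample binocular images. The standard disparity of those pixels is determined from their absolute depth values. The error between the bidirectional disparity of the same scene points in the other areas and the standard disparity is then determined and back-propagated to train the optical flow estimation model. For example, two differences can be computed for the other areas: the first disparity minus the standard disparity, and the second disparity minus the standard disparity. The error can then be derived from these two differences, for instance as their mean or weighted value.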
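A sketch of this supervision signal follows. It assumes the absolute depth of the other areas has already been obtained from their relative depth through the mapping relationship. It computes the standard disparity with the same relation, d = f·t / Z. Disparity magnitudes are compared because the first and second disparities point in opposite directions. Equal weights `w1` and `w2` reproduce the "mean" option. All names are illustrative.

```python
import numpy as np


def flow_supervision_error(d1, d2, abs_depth, other_mask, focal_px, baseline, w1=0.5, w2=0.5):
    """Weighted error of the first and second disparities against the
    standard disparity d_std = f * t / Z, over the non-target area only."""
    d_std = focal_px * baseline / abs_depth[other_mask]
    e1 = np.mean(np.abs(np.abs(d1[other_mask]) - d_std))  # first disparity vs standard
    e2 = np.mean(np.abs(np.abs(d2[other_mask]) - d_std))  # second disparity vs standard
    return float(w1 * e1 + w2 * e2)
```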
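In these optional implementations, the error between the bidirectional disparity and the standard disparity of the same scene points outside the target area serves as supervision information. This yields a more accurate optical flow estimation model. After training, the optical flow estimation model can be configured to estimate the bidirectional disparity of the same scene point between binocular images.
The technical solutions of the embodiments of the present disclosure work as follows. Relative depth values of pixels in a binocular image are obtained. The bidirectional disparity of the same scene point between the binocular images is obtained, and a target area with consistent bidirectional disparity is determined. The absolute depth values of pixels in the target area are determined from their bidirectional disparity. A mapping relationship between relative and absolute depth values is constructed from those absolute depth values and the relative depth values of pixels in the binocular image. Finally, the absolute depth values of pixels to be determined in the binocular image are obtained from the mapping relationship and their relative depth values.
The disparity information of the target area with consistent bidirectional disparity can be considered more accurate, so absolute depth values determined from it are also more accurate. Establishing a mapping relationship between absolute and relative depth in the target area allows the complete relative depth values of the full image to be mapped to absolute depth values. This enables depth estimation for multiple parts of the full image.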
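This embodiment of the present disclosure can be combined with the optional solutions of the image depth estimation method provided in the above embodiments. The method provided by this embodiment can be applied to depth estimation of binocular videos. It can determine the absolute depth values of pixels in the current binocular video frame and the next binocular video frame. Because these two frames are temporally continuous, the absolute depth values of the current binocular video frame can also be propagated forward to the next binocular video frame through the temporal disparity between them. This corrects the absolute depth values of pixels in the next binocular video frame and improves the accuracy of their estimation.
FIG. 2 is a schematic flowchart of an image depth estimation method provided by an embodiment of the present disclosure. As shown in FIG. 2, in the image depth estimation method provided by this embodiment, the binocular images may include binocular video frames, and the method may include:
S210. Obtain relative depth values of multiple pixels in the current binocular video frame and the next binocular video frame.
In this embodiment, a binocular video frame may be any frame in a binocular video file. The file may be a video stream captured in real time or a video file stored locally in advance. The method provided by this embodiment can be used for frame-by-frame processing of the binocular video file.
Here, L_K denotes the left-eye frame of the current binocular video frame and R_K its right-eye frame. L_K+1 denotes the left-eye frame of the next binocular video frame and R_K+1 its right-eye frame.
The relative depth values of multiple pixels in L_K, R_K, L_K+1, and R_K+1 can be obtained by reading from the preset storage space, by real-time determination, or by other optional methods. For example, a trained monocular depth estimation model can process L_K, R_K, L_K+1, and R_K+1 to obtain the relative depth values of multiple pixels in each of them.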
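S220. Obtain the bidirectional disparity of the same scene point within the current binocular video frame and within the next binocular video frame, and determine the target areas in which the bidirectional disparity is consistent in the current and next binocular video frames.
Steps S220 and S210 have no strict order. In step S220, the bidirectional disparity of the same scene point between L_K and R_K, and between L_K+1 and R_K+1, can also be obtained by reading a predetermined bidirectional disparity from the preset storage space, by real-time determination, or by other optional methods. For example, a trained optical flow estimation model can process the L_K and R_K images and the L_K+1 and R_K+1 images. This gives the disparities of the same scene point from L_K to R_K, from R_K to L_K, from L_K+1 to R_K+1, and from R_K+1 to L_K+1.
The disparity from L_K to R_K can be called the first disparity of the current binocular video frame, and the disparity from R_K to L_K its second disparity. Likewise, the disparity from L_K+1 to R_K+1 is the first disparity of the next binocular video frame, and the disparity from R_K+1 to L_K+1 its second disparity. Pixels in the current binocular video frame whose first and second disparities differ by no more than the preset range can be used as target pixels, and the target area of the current binocular video frame is determined from them. Likewise, pixels in the next binocular video frame whose first and second disparities differ by no more than the preset range can be used as target pixels, and the target area of the next binocular video frame is determined from them.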
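S230. Determine the absolute depth values of pixels in the target regions of the current and next binocular video frames from the bidirectional disparity of those pixels.
In this embodiment, based on the principle of stereo imaging, the absolute depth values of pixels in the target region of the current binocular video frame can be determined from their bidirectional disparity; likewise, the absolute depth values of pixels in the target region of the next binocular video frame can be determined from their bidirectional disparity.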
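For a rectified stereo pair, the usual stereo-geometry relation is Z = f·B/d, where f is the focal length in pixels, B the baseline and d the disparity. The sketch below applies it to target-region pixels only. How the two directions are combined is not specified in the source; averaging them is an assumption, reasonable here because they agree within tolerance inside the target region. Parameter names are hypothetical, and `disp_rl_matched` is assumed to already hold the second disparity sampled at each pixel's match.

```python
from typing import List, Optional, Sequence


def target_region_depth(
    disp_lr: Sequence[float],          # first disparity per pixel
    disp_rl_matched: Sequence[float],  # second disparity sampled at each pixel's match (assumed)
    mask: Sequence[bool],              # target-region mask from the consistency check
    focal_px: float,                   # focal length in pixels (assumed known from calibration)
    baseline_m: float,                 # stereo baseline in metres (assumed known)
) -> List[Optional[float]]:
    """Absolute depth for target-region pixels; None elsewhere."""
    depths: List[Optional[float]] = []
    for d1, d2, in_target in zip(disp_lr, disp_rl_matched, mask):
        d = 0.5 * (d1 + d2)  # combine both directions (assumption)
        if in_target and d > 0:
            depths.append(focal_px * baseline_m / d)  # Z = f * B / d
        else:
            depths.append(None)  # left for the relative-to-absolute mapping step
    return depths
```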
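S240. From the absolute depth values of multiple pixels in the target regions of the current and next binocular video frames, and the relative depth values of multiple pixels in those frames, construct for each of the current and next binocular video frames the mapping relationship between relative depth values and absolute depth values.
In this embodiment, the mapping between relative and absolute depth values for the current binocular video frame can be constructed from the absolute depth values and the relative depth values of multiple pixels in the target region of the current binocular video frame. Likewise, the mapping for the next binocular video frame can be constructed from the absolute depth values and the relative depth values of multiple pixels in the target region of the next binocular video frame.
S250. Determine the absolute depth values of the pixels to be determined in the current and next binocular video frames from the mapping relationships of those frames and the relative depth values of the pixels to be determined.
In this embodiment, the relative depth values of the pixels to be determined in the current binocular video frame can be substituted into the mapping relationship of the current binocular video frame to obtain their absolute depth values. Likewise, the relative depth values of the pixels to be determined in the next binocular video frame can be substituted into the mapping relationship of the next binocular video frame to obtain their absolute depth values.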
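The fitting in S240 and the substitution in S250 can be sketched together. The source only says that a function is fitted with absolute depth as the dependent variable and relative depth as the independent variable; the sketch assumes the simplest choice, an ordinary least-squares line abs ≈ a·rel + b, written in pure Python. Function names are hypothetical. If the monocular model outputs relative inverse depth, fitting against 1/Z (or using a higher-order or robust fit) may be more suitable; that choice is left open here.

```python
from typing import Callable, Sequence, Tuple


def fit_linear_mapping(rel: Sequence[float], abs_: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit abs ≈ a * rel + b over target-region pixels (assumed linear model)."""
    n = len(rel)
    if n < 2:
        raise ValueError("need at least two target-region pixels")
    mean_r = sum(rel) / n
    mean_a = sum(abs_) / n
    var_r = sum((r - mean_r) ** 2 for r in rel)
    if var_r == 0:
        raise ValueError("relative depths are constant; slope is undefined")
    cov = sum((r - mean_r) * (a - mean_a) for r, a in zip(rel, abs_))
    a = cov / var_r           # slope
    b = mean_a - a * mean_r   # intercept
    return a, b


def make_mapping(a: float, b: float) -> Callable[[float], float]:
    """Mapping applied in S250 to the relative depth of any pixel to be determined."""
    return lambda r: a * r + b
```

For example, target-region pairs (1, 3), (2, 5) and (3, 7) give a = 2 and b = 1, so a pixel to be determined with relative depth 10 maps to an absolute depth of 21.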
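S260. Obtain the temporal disparity of the same scene point from the current binocular video frame to the next binocular video frame.
There is no strict ordering between step S260 and steps S210-S250. In step S260, the temporal disparity of the same scene point from L_K to L_K+1 and from R_K to R_K+1 may likewise be obtained by reading a predetermined temporal disparity from a preset storage space, by determining it in real time, or by other optional means. For example, the trained optical flow estimation model may also process images L_K and L_K+1, and images R_K and R_K+1, to obtain the temporal disparity of the same scene point from L_K to L_K+1 and from R_K to R_K+1.
S270. Correct the absolute depth values of pixels in the next binocular video frame according to the absolute depth values of pixels in the current binocular video frame and the temporal disparity.
Because the objects corresponding to pixels are temporally and spatially continuous across adjacent video frames, the absolute depth value of a pixel in the current binocular video frame can be used to correct the absolute depth value of the corresponding pixel in the next binocular video frame, improving the accuracy of the latter.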
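In some optional implementations, correcting the absolute depth values of pixels in the next binocular video frame according to the absolute depth values of pixels in the current binocular video frame and the temporal disparity may include: determining temporal estimates of the absolute depth of pixels in the next binocular video frame from the absolute depth values of pixels in the current binocular video frame and the temporal disparity; and performing weighted smoothing on the absolute depth values of pixels in the next binocular video frame according to the temporal estimates.
In these optional implementations, pixels of the current binocular video frame can be displaced according to the temporal disparity of the same scene point from L_K to L_K+1 and from R_K to R_K+1, to locate the corresponding pixels in the next binocular video frame. The absolute depth values of pixels in the current binocular video frame can then be propagated forward to the corresponding pixels in the next binocular video frame as estimates of their absolute depth; these are the temporal estimates of absolute depth for pixels in the next binocular video frame. The temporal estimates can be combined by weighted summation with the absolute depth values of the pixels to be determined in the next binocular video frame obtained in step S250, applying temporal smoothing to the S250 results and yielding more accurate absolute depth values for pixels in the next binocular video frame.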
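The propagation and weighted smoothing can be sketched on small 2D grids. The following assumptions do not come from the source: the temporal disparity is given as a per-pixel (dx, dy) displacement from L_K to L_K+1; pixels are forward-warped to the nearest integer position; collisions keep the nearer depth; the depth of a scene point changes little between adjacent frames; and the fusion uses a hypothetical fixed weight `alpha`. Pixels that receive no temporal estimate keep their S250 value. The same code would be applied to the right view with the R_K to R_K+1 temporal disparity.

```python
from typing import List, Optional, Sequence, Tuple


def propagate_depth(
    depth_k: Sequence[Sequence[float]],                     # absolute depth of frame K
    flow_k_to_k1: Sequence[Sequence[Tuple[float, float]]],  # (dx, dy) per pixel, K -> K+1
) -> List[List[Optional[float]]]:
    """Forward-warp frame-K depth into frame K+1 to obtain temporal estimates."""
    h, w = len(depth_k), len(depth_k[0])
    est: List[List[Optional[float]]] = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow_k_to_k1[y][x]
            tx, ty = round(x + dx), round(y + dy)
            if 0 <= tx < w and 0 <= ty < h:
                z = depth_k[y][x]
                cur = est[ty][tx]
                # On collisions keep the nearer surface (an assumed occlusion heuristic).
                if cur is None or z < cur:
                    est[ty][tx] = z
    return est


def smooth_with_temporal(
    depth_k1: Sequence[Sequence[float]],                # S250 result for frame K+1
    temporal_est: Sequence[Sequence[Optional[float]]],  # output of propagate_depth
    alpha: float = 0.3,                                 # hypothetical weight of the estimate
) -> List[List[float]]:
    """Weighted sum of temporal estimate and S250 depth where an estimate exists."""
    out: List[List[float]] = []
    for row_d, row_t in zip(depth_k1, temporal_est):
        out.append([
            d if t is None else alpha * t + (1.0 - alpha) * d
            for d, t in zip(row_d, row_t)
        ])
    return out
```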
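In addition, after the temporal estimates of absolute depth for the corresponding pixels in the next binocular video frame have been determined from the absolute depth values of pixels in the current binocular video frame and the temporal disparity, the absolute depth values of pixels in the next binocular video frame can be corrected according to the distribution of the temporal estimates, so that the distribution of absolute depth values in the next binocular video frame moves closer to the distribution of the temporal estimates. It will be understood that other ways of correcting the absolute depth values of pixels in the next binocular video frame are also applicable here and are not exhaustively listed.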
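One way to move the next frame's depth distribution toward that of the temporal estimates is moment matching; this is an assumption, since the source does not say which statistics are aligned. The sketch shifts and scales the S250 depths so that, over pixels that have a temporal estimate, their mean and population standard deviation match those of the estimates, then blends with a hypothetical strength `beta`. Inputs are flattened per-pixel lists, and the function name is hypothetical.

```python
import statistics
from typing import List, Optional, Sequence


def align_to_temporal_distribution(
    depth_k1: Sequence[float],                # flattened S250 depths of frame K+1
    temporal_est: Sequence[Optional[float]],  # flattened temporal estimates (None = no estimate)
    beta: float = 1.0,                        # hypothetical strength: 0 = unchanged, 1 = full match
) -> List[float]:
    """Shift and scale depths so their mean/std match the temporal estimates (assumed method)."""
    pairs = [(d, t) for d, t in zip(depth_k1, temporal_est) if t is not None]
    if len(pairs) < 2:
        return list(depth_k1)  # too little overlap to estimate a distribution
    d_vals = [d for d, _ in pairs]
    t_vals = [t for _, t in pairs]
    mu_d, mu_t = statistics.fmean(d_vals), statistics.fmean(t_vals)
    sd_d, sd_t = statistics.pstdev(d_vals), statistics.pstdev(t_vals)
    scale = sd_t / sd_d if sd_d > 0 else 1.0
    out: List[float] = []
    for d in depth_k1:
        matched = (d - mu_d) * scale + mu_t  # same mean and std as the temporal estimates
        out.append((1.0 - beta) * d + beta * matched)
    return out
```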
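In some practical applications, the binocular video frames may come from binocular video in a virtual reality (VR) scenario. The image depth estimation method provided by the embodiments of the present disclosure can determine the absolute depth values of pixels in the frames of a VR binocular video, which facilitates downstream tasks such as adding special effects to VR binocular video and helps improve the viewing experience.
The technical solution of this embodiment of the present disclosure can be applied to depth estimation for binocular video. With the image depth estimation method provided by the embodiments of the present disclosure, the absolute depth values of pixels in the current and next binocular video frames can be determined. In addition, because the current and next binocular video frames are temporally continuous, the absolute depth values of the current binocular video frame can be propagated forward to the next binocular video frame through the temporal disparity from the current frame to the next frame, correcting the absolute depth values of pixels in the next binocular video frame and thereby improving their accuracy. The image depth estimation method provided by this embodiment belongs to the same inventive concept as the image depth estimation methods provided by the above embodiments; for technical details not described exhaustively in this embodiment, refer to the above embodiments, and the same technical features have the same effects in this embodiment as in the above embodiments.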
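FIG. 3 is a schematic structural diagram of an image depth estimation apparatus provided by an embodiment of the present disclosure. The image depth estimation apparatus provided by this embodiment is suitable for full-image depth estimation of binocular images.
As shown in FIG. 3, the image depth estimation apparatus provided by the embodiments of the present disclosure may include:
a relative depth acquisition module 310, configured to obtain relative depth values of multiple pixels in a binocular image;
a target region determination module 320, configured to obtain the bidirectional disparity of the same scene point between the binocular images and determine a target region in which the bidirectional disparity is consistent;
a regional absolute depth determination module 330, configured to determine absolute depth values of pixels in the target region according to the bidirectional disparity of pixels in the target region;
a mapping relationship construction module 340, configured to construct a mapping relationship between relative depth values and absolute depth values according to the absolute depth values of pixels in the target region and the relative depth values of pixels in the binocular image;
an overall absolute depth determination module 350, configured to determine absolute depth values of the pixels to be determined in the binocular image according to the mapping relationship and the relative depth values of those pixels.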
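In some optional implementations, the relative depth acquisition module may be configured to:
process the binocular image with a monocular depth estimation model to obtain relative depth values of multiple pixels in the binocular image;
wherein the supervision information of the monocular depth estimation model during training includes the absolute depth values of pixels in the target region of sample binocular images.
In some optional implementations, the target region determination module may be configured to:
process the binocular image with an optical flow estimation model to obtain the bidirectional disparity of the same scene point between the binocular images;
wherein the supervision information of the optical flow estimation model during training includes the bidirectional disparity between the same scene points in the regions of sample binocular images outside the target region.
In some optional implementations, the target region determination module may be configured to:
determine target pixels according to the bidirectional disparity, where the bidirectional disparity includes a first disparity and a second disparity, the first disparity being the disparity of the same scene point from the left-view image to the right-view image, and the second disparity being the disparity of the same scene point from the right-view image to the left-view image;
determine the target region according to multiple target pixels.
In some optional implementations, the mapping relationship construction module may be configured to:
perform function fitting with the absolute depth values of multiple pixels in the target region as the dependent variable and the relative depth values of the corresponding pixels in the binocular image as the independent variable;
use the result of the function fitting as the mapping from relative depth values to absolute depth values.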
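In some optional implementations, the binocular image may include binocular video frames;
correspondingly, the image depth estimation apparatus may further include:
a temporal disparity determination module, configured to determine the temporal disparity of the same scene point from the current binocular video frame to the next binocular video frame;
the overall absolute depth determination module may further be configured to, after the absolute depth values of pixels in the current and next binocular video frames have been determined, correct the absolute depth values of pixels in the next binocular video frame according to the absolute depth values of pixels in the current binocular video frame and the temporal disparity.
In some optional implementations, the overall absolute depth determination module may be configured to:
determine temporal estimates of the absolute depth of pixels in the next binocular video frame according to the absolute depth values of pixels in the current binocular video frame and the temporal disparity;
perform weighted smoothing on the absolute depth values of pixels in the next binocular video frame according to the temporal estimates.
The image depth estimation apparatus provided by the embodiments of the present disclosure can execute the image depth estimation method provided by any embodiment of the present disclosure, and has the functional modules and effects corresponding to the method.
It should be noted that the units and modules included in the above apparatus are divided only according to functional logic and are not limited to this division, as long as the corresponding functions can be realized. In addition, the names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the scope of protection of the embodiments of the present disclosure.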
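Referring now to FIG. 4, FIG. 4 shows a schematic structural diagram of an electronic device 400 (for example, the terminal device or server in FIG. 4) suitable for implementing the embodiments of the present disclosure. Electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (PADs), portable media players (PMPs), and in-vehicle terminals (for example, in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 4 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 4, the electronic device 400 may include a processor 401 (for example, a central processing unit or a graphics processing unit), which may perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage apparatus 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the electronic device 400. The processor 401, the ROM 402 and the RAM 403 are connected to one another via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
Generally, the following apparatuses may be connected to the I/O interface 405: an input apparatus 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; an output apparatus 407 including, for example, a liquid crystal display (LCD), speaker, and vibrator; a storage apparatus 408 including, for example, magnetic tape and a hard disk; and a communication apparatus 409. The communication apparatus 409 may allow the electronic device 400 to communicate with other devices wirelessly or by wire to exchange data. Although FIG. 4 shows the electronic device 400 with various apparatuses, it should be understood that it is not required to implement or provide all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.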
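In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication apparatus 409, installed from the storage apparatus 408, or installed from the ROM 402. When the computer program is executed by the processor 401, the above functions defined in the image depth estimation method of the embodiments of the present disclosure are performed.
The electronic device provided by this embodiment of the present disclosure belongs to the same inventive concept as the image depth estimation methods provided by the above embodiments; for technical details not described exhaustively in this embodiment, refer to the above embodiments, and this embodiment has the same effects as the above embodiments.
An embodiment of the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the image depth estimation method provided by the above embodiments.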
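It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. Examples of computer-readable storage media may include, but are not limited to: an electrical connection having at least one wire, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or flash memory (FLASH), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and it can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to wire, optical cable, radio frequency (RF), or any suitable combination of the above.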
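In some implementations, the client and the server may communicate using any currently known or future-developed network protocol, such as the HyperText Transfer Protocol (HTTP), and may be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), internetworks (for example, the Internet), and peer-to-peer networks (for example, ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
The above computer-readable storage medium may be included in the above electronic device, or may exist separately without being assembled into the electronic device.
The above computer-readable medium carries at least one program which, when executed by the electronic device, causes the electronic device to:
obtain the relative depth value of each pixel in a binocular image; obtain the bidirectional disparity of the same scene point between the binocular images, and determine a target region in which the bidirectional disparity is consistent; determine the absolute depth value of each pixel in the target region according to its bidirectional disparity; construct a mapping relationship between relative depth values and absolute depth values according to the absolute depth value of each pixel in the target region and the relative depth values in the binocular image; and determine the absolute depth value of each pixel in the binocular image according to the mapping relationship and the relative depth value of each pixel in the binocular image.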
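Computer program code for performing the operations of the present disclosure may be written in at least one programming language or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains at least one executable instruction for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and flowcharts, and combinations of blocks in the block diagrams and flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented in software or in hardware. The names of units and modules do not constitute a limitation on the units and modules themselves.
The functions described herein above may be performed at least in part by at least one hardware logic component. For example, without limitation, exemplary types of hardware logic components that may be used include field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard parts (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.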
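In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the above. Examples of machine-readable storage media would include an electrical connection based on at least one wire, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.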
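According to one or more embodiments of the present disclosure, an image depth estimation method is provided, the method including:
obtaining relative depth values of multiple pixels in a binocular image;
obtaining the bidirectional disparity of the same scene point between the binocular images, and determining a target region in which the bidirectional disparity is consistent;
determining absolute depth values of pixels in the target region according to the bidirectional disparity of pixels in the target region;
constructing a mapping relationship between relative depth values and absolute depth values according to the absolute depth values of pixels in the target region and the relative depth values of pixels in the binocular image;
determining absolute depth values of the pixels to be determined in the binocular image according to the mapping relationship and the relative depth values of those pixels.
According to one or more embodiments of the present disclosure, an image depth estimation method is provided, further including:
In some optional implementations, obtaining the relative depth values of multiple pixels in the binocular image includes:
processing the binocular image with a monocular depth estimation model to obtain the relative depth values of multiple pixels in the binocular image;
wherein the supervision information of the monocular depth estimation model during training includes the absolute depth values of pixels in the target region of sample binocular images.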
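According to one or more embodiments of the present disclosure, an image depth estimation method is provided, further including:
In some optional implementations, obtaining the bidirectional disparity of the same scene point between the binocular images includes:
processing the binocular image with an optical flow estimation model to obtain the bidirectional disparity of the same scene point between the binocular images;
wherein the supervision information of the optical flow estimation model during training includes the bidirectional disparity between the same scene points in the regions of sample binocular images outside the target region.
According to one or more embodiments of the present disclosure, an image depth estimation method is provided, further including:
In some optional implementations, determining the target region in which the bidirectional disparity is consistent includes:
determining target pixels according to the bidirectional disparity, where the bidirectional disparity includes a first disparity and a second disparity, the first disparity being the disparity of the same scene point from the left-view image to the right-view image, and the second disparity being the disparity of the same scene point from the right-view image to the left-view image;
determining the target region according to multiple target pixels.
According to one or more embodiments of the present disclosure, an image depth estimation method is provided, further including:
In some optional implementations, constructing the mapping relationship between relative depth values and absolute depth values according to the absolute depth values of pixels in the target region and the relative depth values of pixels in the binocular image includes:
performing function fitting with the absolute depth values of multiple pixels in the target region as the dependent variable and the relative depth values of the corresponding pixels in the binocular image as the independent variable;
using the result of the function fitting as the mapping from relative depth values to absolute depth values.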
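According to one or more embodiments of the present disclosure, an image depth estimation method is provided, further including:
In some optional implementations, the binocular image includes binocular video frames;
correspondingly, after the absolute depth values of pixels in the current binocular video frame and the next binocular video frame have been determined, the method further includes:
determining the temporal disparity of the same scene point from the current binocular video frame to the next binocular video frame;
correcting the absolute depth value of each pixel in the next binocular video frame according to the absolute depth values of pixels in the current binocular video frame and the temporal disparity.
According to one or more embodiments of the present disclosure, an image depth estimation method is provided, further including:
In some optional implementations, correcting the absolute depth values of pixels in the next binocular video frame according to the absolute depth values of pixels in the current binocular video frame and the temporal disparity includes:
determining temporal estimates of the absolute depth of pixels in the next binocular video frame according to the absolute depth values of pixels in the current binocular video frame and the temporal disparity;
performing weighted smoothing on the absolute depth values of pixels in the next binocular video frame according to the temporal estimates.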
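According to one or more embodiments of the present disclosure, an image depth estimation apparatus is provided, the apparatus including:
a relative depth acquisition module, configured to obtain relative depth values of multiple pixels in a binocular image;
a target region determination module, configured to obtain the bidirectional disparity of the same scene point between the binocular images and determine a target region in which the bidirectional disparity is consistent;
a regional absolute depth determination module, configured to determine absolute depth values of pixels in the target region according to the bidirectional disparity of each pixel in the target region;
a mapping relationship construction module, configured to construct a mapping relationship between relative depth values and absolute depth values according to the absolute depth values of pixels in the target region and the relative depth values of pixels in the binocular image;
an overall absolute depth determination module, configured to determine absolute depth values of the pixels to be determined in the binocular image according to the mapping relationship and the relative depth values of those pixels.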
Claims (10)
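- An image depth estimation method, comprising: obtaining relative depth values of multiple pixels in a binocular image; obtaining the bidirectional disparity of the same scene point between the binocular images, and determining a target region in which the bidirectional disparity is consistent; determining absolute depth values of pixels in the target region according to the bidirectional disparity of pixels in the target region; constructing a mapping relationship between relative depth values and absolute depth values according to the absolute depth values of pixels in the target region and the relative depth values of pixels in the binocular image; and determining absolute depth values of pixels to be determined in the binocular image according to the mapping relationship and the relative depth values of the pixels to be determined in the binocular image.
- The method according to claim 1, wherein obtaining the relative depth values of multiple pixels in the binocular image comprises: processing the binocular image with a monocular depth estimation model to obtain the relative depth values of multiple pixels in the binocular image; wherein supervision information of the monocular depth estimation model during training comprises absolute depth values of pixels in a target region of a sample binocular image.
- The method according to claim 1, wherein obtaining the bidirectional disparity of the same scene point between the binocular images comprises: processing the binocular image with an optical flow estimation model to obtain the bidirectional disparity of the same scene point between the binocular images; wherein supervision information of the optical flow estimation model during training comprises the bidirectional disparity between the same scene points in regions of a sample binocular image other than the target region.
- The method according to claim 1, wherein determining the target region in which the bidirectional disparity is consistent comprises: determining target pixels according to the bidirectional disparity, wherein the bidirectional disparity comprises a first disparity and a second disparity, the first disparity being the disparity of the same scene point from a left-view image to a right-view image, and the second disparity being the disparity of the same scene point from the right-view image to the left-view image; and determining the target region according to multiple target pixels.
- The method according to claim 1, wherein constructing the mapping relationship between relative depth values and absolute depth values according to the absolute depth values of pixels in the target region and the relative depth values of pixels in the binocular image comprises: performing function fitting with the absolute depth values of multiple pixels in the target region as the dependent variable and the relative depth values of corresponding pixels in the binocular image as the independent variable; and using the result of the function fitting as the mapping relationship from relative depth values to absolute depth values.
- The method according to any one of claims 1-5, wherein the binocular image comprises binocular video frames; correspondingly, after absolute depth values of pixels in a current binocular video frame and a next binocular video frame are determined, the method further comprises: determining a temporal disparity of the same scene point from the current binocular video frame to the next binocular video frame; and correcting the absolute depth values of pixels in the next binocular video frame according to the absolute depth values of pixels in the current binocular video frame and the temporal disparity.
- The method according to claim 6, wherein correcting the absolute depth values of pixels in the next binocular video frame according to the absolute depth values of pixels in the current binocular video frame and the temporal disparity comprises: determining temporal estimates of the absolute depth of pixels in the next binocular video frame according to the absolute depth values of pixels in the current binocular video frame and the temporal disparity; and performing weighted smoothing on the absolute depth values of pixels in the next binocular video frame according to the temporal estimates.
- An image depth estimation apparatus, comprising: a relative depth acquisition module, configured to obtain relative depth values of multiple pixels in a binocular image; a target region determination module, configured to obtain the bidirectional disparity of the same scene point between the binocular images and determine a target region in which the bidirectional disparity is consistent; a regional absolute depth determination module, configured to determine absolute depth values of pixels in the target region according to the bidirectional disparity of pixels in the target region; a mapping relationship construction module, configured to construct a mapping relationship between relative depth values and absolute depth values according to the absolute depth values of pixels in the target region and the relative depth values of pixels in the binocular image; and an overall absolute depth determination module, configured to determine absolute depth values of pixels to be determined in the binocular image according to the mapping relationship and the relative depth values of the pixels to be determined in the binocular image.
- An electronic device, comprising: at least one processor; and a storage apparatus configured to store at least one program, wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the image depth estimation method according to any one of claims 1-7.
- A storage medium containing computer-executable instructions which, when executed by a computer processor, perform the image depth estimation method according to any one of claims 1-7.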
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211114979.6 | 2022-09-14 | ||
CN202211114979.6A CN115937290B (zh) | 2022-09-14 | 2022-09-14 | Image depth estimation method and apparatus, electronic device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024056030A1 (zh) | 2024-03-21 |
Family ID: 86652960
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/118825 WO2024056030A1 (zh) | Image depth estimation method and apparatus, electronic device, and storage medium | 2022-09-14 | 2023-09-14 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115937290B (zh) |
WO (1) | WO2024056030A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115937290B (zh) * | 2022-09-14 | 2024-03-22 | 北京字跳网络技术有限公司 | Image depth estimation method and apparatus, electronic device, and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140185920A1 (en) * | 2013-01-02 | 2014-07-03 | International Business Machines Corporation | Image selection and masking using imported depth information |
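CN104010178A (zh) * | 2014-06-06 | 2014-08-27 | 深圳市墨克瑞光电子研究院 | Binocular image disparity adjustment method and device, and binocular camera |
CN112985360A (zh) * | 2021-05-06 | 2021-06-18 | 中汽数据(天津)有限公司 | Lane-line-based binocular ranging correction method, apparatus, device, and storage medium |
CN115937290A (zh) * | 2022-09-14 | 2023-04-07 | 北京字跳网络技术有限公司 | Image depth estimation method and apparatus, electronic device, and storage medium |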
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
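JP5252642B2 (ja) * | 2009-04-13 | 2013-07-31 | 独立行政法人情報通信研究機構 | Depth estimation device, depth estimation method, and depth estimation program |
CN106204572B (zh) * | 2016-07-06 | 2020-12-04 | 合肥工业大学 | Road target depth estimation method based on scene depth mapping |
CN108564536B (zh) * | 2017-12-22 | 2020-11-24 | 洛阳中科众创空间科技有限公司 | Global optimization method for depth maps |
CN108335322B (zh) * | 2018-02-01 | 2021-02-12 | 深圳市商汤科技有限公司 | Depth estimation method and apparatus, electronic device, program, and medium |
CN114511647A (zh) * | 2020-11-16 | 2022-05-17 | 深圳市万普拉斯科技有限公司 | Method, apparatus, and system for generating a true depth map, and electronic device |
CN114820745B (zh) * | 2021-12-13 | 2024-09-13 | 南瑞集团有限公司 | Monocular visual depth estimation system and method, computer device, and computer-readable storage medium |
CN115049717B (zh) * | 2022-08-15 | 2023-01-06 | 荣耀终端有限公司 | Depth estimation method and apparatus |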
- 2022-09-14: CN application CN202211114979.6A, patent CN115937290B (zh), status: Active
- 2023-09-14: WO application PCT/CN2023/118825, patent WO2024056030A1 (zh), status: unknown
Also Published As
Publication number | Publication date |
---|---|
CN115937290B (zh) | 2024-03-22 |
CN115937290A (zh) | 2023-04-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23864760; Country of ref document: EP; Kind code of ref document: A1 |