CN114332188B - Parallax image generation method, device and medium for binocular vision device - Google Patents
- Publication number
- CN114332188B (granted publication of application CN202111408110.8A / CN202111408110A)
- Authority
- CN
- China
- Prior art keywords
- image
- matched
- feature
- pair
- image pair
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The application provides a parallax image generation method, device, and medium for a binocular vision device. An image pair to be processed is acquired, and image features are extracted from it to obtain a plurality of feature image pairs at different resolutions. The feature image pair at a first resolution is segmented into a plurality of first feature blocks, and the coordinates to be matched of that feature image pair are determined from the correspondence between the first feature blocks. Based on the coordinates to be matched, matching coordinates are determined in the feature image pair at a second resolution, which is greater than the first resolution, and that feature image pair is taken as the image pair to be matched. A region pair to be matched is then determined in the image pair to be matched according to the matching coordinates. Finally, the parallax coordinates of the image pair to be matched are determined from the feature correlation values between the feature sub-blocks in the region pair to be matched, and the final parallax image of the image pair to be processed is generated from the parallax coordinates.
Description
Technical Field
The present application relates to the field of computer vision, and in particular to a parallax image generation method, device, and medium for a binocular vision device.
Background
With the development of science and technology, application scenarios for stereoscopic images have become increasingly broad. Stereoscopic vision processing works mainly on images of the same object captured from different angles, producing images that carry depth information. Its core step is to establish the correspondence between pixel points in the two images taken from different angles, on which the generation of a stereoscopic image depends.
At present, stereo image pairs are mainly captured with binocular vision cameras, and each pixel point in the stereo image pair is then matched. However, existing matching methods generally rely on clearly distinguishable pixel points in the images. Because the shooting angles of binocular vision cameras and the actual shooting environments are complex and varied, clear, well-imaged stereo image pairs cannot always be obtained. With existing parallax image generation techniques, pixel-by-pixel matching fails for stereo image pairs containing pathological regions such as occluded areas, repeated textures, and reflective surfaces.
Disclosure of Invention
The embodiments of the present application provide a parallax image generation method, device, and medium for a binocular vision device, used to match the pixel points of stereo image pairs containing pathological regions such as occluded areas, repeated textures, and reflective surfaces.
In one aspect, the present application provides a parallax image generating method for a binocular vision apparatus, the method comprising:
Acquiring an image pair to be processed, and extracting image features from it to obtain a plurality of feature image pairs at different resolutions, where the image pair to be processed corresponds to images of the same target acquired by the binocular vision device from different angles; segmenting the feature image pair at a first resolution into a plurality of first feature blocks; determining the coordinates to be matched of the feature image pair at the first resolution according to the correspondence between the first feature blocks, where the correspondence is the degree of correlation between any two first feature blocks in the feature image pair; determining matching coordinates in the feature image pair at a second resolution based on the coordinates to be matched, and taking the feature image pair at the second resolution as the image pair to be matched, the second resolution being greater than the first resolution; determining a region pair to be matched in the image pair to be matched according to the matching coordinates; and determining the parallax coordinates of the image pair to be matched based on the feature correlation values between the feature sub-blocks in the region pair to be matched, so as to generate the final parallax image of the image pair to be processed based on the parallax coordinates, where the feature sub-blocks are obtained by segmenting the region pair to be matched.
In one implementation of the application, a ratio of the second resolution to the first resolution is determined, and the ratio is multiplied by the coordinates to be matched to obtain the corresponding matching coordinates in the image pair to be matched. The first coordinate value of a matching coordinate corresponds to the first image to be matched in the image pair, and the second coordinate value corresponds to the second image to be matched.
In one implementation of the application, the pixel point corresponding to the matching coordinate is taken as the center of a matching window, and the coverage region of the matching window in the image pair to be matched is determined. The size of the matching window is a preset value. The region pairs to be matched are then determined in the image pair to be matched from the coverage regions.
In one implementation of the present application, the central feature sub-block corresponding to the center pixel point of the first region to be matched in the region pair is determined, and the second feature sub-blocks corresponding to the pixel points of the second region to be matched are determined. A feature correlation value between the vector of the central feature sub-block and the vector of each second feature sub-block is then calculated, and the parallax coordinates are determined from these feature correlation values. The feature correlation value characterizes the degree of correlation between the vector of the central feature sub-block and the vector of each second feature sub-block.
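As a minimal illustration of this correlation step, the sketch below uses a plain dot product as the feature correlation value and returns the coordinate of the best-correlated sub-block; the function name and the choice of dot product are assumptions, since the patent does not fix a specific correlation measure:

```python
import numpy as np

def disparity_from_correlation(center_vec, candidate_vecs, candidate_coords):
    """Correlate the central feature sub-block of the first region with every
    feature sub-block of the second region and return the coordinate of the
    best match.  A dot product stands in for the feature correlation value."""
    scores = candidate_vecs @ center_vec          # one correlation value per candidate
    return candidate_coords[int(np.argmax(scores))]
```

For example, a center vector (1, 0) correlates most strongly with the candidate (1, 0), so that candidate's coordinate would be selected as the parallax coordinate.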
In one implementation of the application, a parallax image generation model is trained with sample image pairs from a preset sample library to obtain a trained parallax image generation model, which generates parallax images from image pairs to be processed. The parallax image generation model includes any one or more of the following modules: an image feature extraction module for extracting image features from the image pair to be processed; a segmentation module for segmenting the feature image pair at the first resolution into a plurality of first feature blocks; a Transformer module for determining the correspondence; a first matching module for determining the coordinates to be matched of the feature image pair at the first resolution; a second matching module for determining the matching coordinates in the feature image pair at the second resolution; a parallax calculation module for generating the original parallax image; and a parallax refinement module for refining the original parallax image into the final parallax image.
In one implementation of the application, the feature image pair is vector-processed to generate its corresponding one-dimensional vector. The feature image pair is then position-encoded according to this one-dimensional vector to obtain its corresponding position information, and according to the position information the feature image pair at the first resolution is segmented into a plurality of first feature blocks carrying the position information.
In one implementation of the application, the first feature blocks are the same size.
In one implementation of the application, a stereo image pair acquired by the binocular vision device is received, or a stereo image pair from a preset scene-flow dataset is acquired. Each stereo image in the pair is aligned and corrected according to epipolar lines in a preset direction, and the aligned, corrected stereo image pair is taken as the image pair to be processed.
In another aspect, the present application provides a parallax image generating apparatus for a binocular vision apparatus, the apparatus comprising:
At least one processor; and a memory communicatively coupled to the at least one processor. Wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to:
Acquiring an image pair to be processed, and extracting image features from it to obtain a plurality of feature image pairs at different resolutions, where the image pair to be processed corresponds to images of the same target acquired by the binocular vision device from different angles; segmenting the feature image pair at a first resolution into a plurality of first feature blocks; determining the coordinates to be matched of the feature image pair at the first resolution according to the correspondence between the first feature blocks, where the correspondence is the degree of correlation between any two first feature blocks in the feature image pair; determining matching coordinates in the feature image pair at a second resolution based on the coordinates to be matched, and taking the feature image pair at the second resolution as the image pair to be matched, the second resolution being greater than the first resolution; determining a region pair to be matched in the image pair to be matched according to the matching coordinates; and determining the parallax coordinates of the image pair to be matched based on the feature correlation values between the feature sub-blocks in the region pair to be matched, so as to generate the final parallax image of the image pair to be processed based on the parallax coordinates, where the feature sub-blocks are obtained by segmenting the region pair to be matched.
In yet another aspect, the present application also provides a non-volatile computer storage medium storing computer-executable instructions for parallax image generation for a binocular vision apparatus, the computer-executable instructions being configured to:
Acquiring an image pair to be processed, and extracting image features from it to obtain a plurality of feature image pairs at different resolutions, where the image pair to be processed corresponds to images of the same target acquired by the binocular vision device from different angles; segmenting the feature image pair at a first resolution into a plurality of first feature blocks; determining the coordinates to be matched of the feature image pair at the first resolution according to the correspondence between the first feature blocks, where the correspondence is the degree of correlation between any two first feature blocks in the feature image pair; determining matching coordinates in the feature image pair at a second resolution based on the coordinates to be matched, and taking the feature image pair at the second resolution as the image pair to be matched, the second resolution being greater than the first resolution; determining a region pair to be matched in the image pair to be matched according to the matching coordinates; and determining the parallax coordinates of the image pair to be matched based on the feature correlation values between the feature sub-blocks in the region pair to be matched, so as to generate the final parallax image of the image pair to be processed based on the parallax coordinates, where the feature sub-blocks are obtained by segmenting the region pair to be matched.
According to the above scheme, pixel-point matching relationships are first established on low-resolution feature images and then refined on high-resolution feature images. At the same time, by exploiting the Transformer module, image features can be matched over the global receptive field of the image, so pixel points in pathological regions can be matched accurately.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
Fig. 1 is a schematic flow chart of a parallax image generation method for a binocular vision device according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a parallax image generation method for a binocular vision device according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a parallax image generation method for a binocular vision device according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a parallax image generating apparatus for a binocular vision device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Binocular stereo vision is an important form of machine vision. Based on the parallax principle, it uses imaging devices to acquire two images of a measured object from different positions and obtains the object's three-dimensional geometric information by calculating the positional deviation between corresponding points in the images.
When the positions of corresponding points in the two images of the measured object deviate, high matching accuracy for those corresponding points is required; if the corresponding points cannot be matched accurately, the stereoscopic effect of the stereoscopic image formed by binocular stereo vision is very poor. Matching corresponding points within pathological regions of the two images, such as blank and occluded areas, is a key technical problem in obtaining parallax images through binocular stereo vision. A technical solution is therefore needed that can accurately match corresponding points in pathological regions such as repeated textures and occluded areas, so as to obtain a good parallax stereoscopic image.
Based on the above, the embodiment of the application provides a parallax image generation method, equipment and medium for binocular vision equipment, which are used for accurately matching corresponding points of pathological areas such as repeated textures, shielding areas and the like in images.
Various embodiments of the present application are described in detail below with reference to the attached drawing figures.
The embodiment of the application provides a parallax image generation method for a binocular vision device; as shown in Fig. 1, the method may comprise steps S101 to S106:
S101, the server acquires an image pair to be processed and performs image feature extraction on it to obtain a plurality of feature image pairs at different resolutions.
The image pair to be processed corresponds to images of the same target acquired by the binocular vision device from different angles.
In the embodiment of the application, the server can process the images of the same target shot by the binocular vision equipment at different angles to obtain the image pair to be processed.
Specifically, the server may receive a stereo image pair acquired by the binocular vision device, or obtain a stereo image pair from a preset scene-flow dataset.
The internal and external parameters of the binocular vision device can be calibrated in advance, and the position data of the device can be obtained from the calibrated parameters, for example, whether the device is placed parallel to the horizon or perpendicular to it.
And the server performs alignment correction on each stereoscopic image in the stereoscopic image pair according to the polar lines in the preset direction.
The epipolar direction of the images collected by the binocular vision device can be obtained from its position data: for example, if the device is placed horizontally, the epipolar lines are horizontal; if the device is tilted at 45 degrees, the epipolar lines are at 45 degrees. According to these epipolar lines, the server aligns and corrects the stereo images in the stereo image pair so that their epipolar lines are parallel, that is, the pixel rows of the images acquired by each camera of the binocular vision device are strictly aligned.
The server takes the aligned and corrected stereo image pair as the image pair to be processed.
In addition, in another embodiment of the present application, the server may also randomly crop the aligned and corrected stereo image pair to obtain a stereo image pair with a uniform resolution, for example an image pair to be processed with resolution (480, 640).
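The random-cropping step can be sketched as follows with NumPy; the function name and the identical-offset choice (which preserves the epipolar alignment of the rectified pair) are illustrative assumptions:

```python
import numpy as np

def random_crop_pair(left, right, out_h=480, out_w=640, rng=None):
    """Crop both rectified images at the same random offset so the epipolar
    alignment between them is preserved.  Illustrative sketch: the patent
    only states that the pair is randomly cropped to a uniform resolution
    such as (480, 640)."""
    rng = rng or np.random.default_rng()
    h, w = left.shape[:2]
    y = rng.integers(0, h - out_h + 1)
    x = rng.integers(0, w - out_w + 1)
    return (left[y:y + out_h, x:x + out_w],
            right[y:y + out_h, x:x + out_w])
```

Cropping both images at the same offset matters: different offsets would shift the epipolar correspondence between the two views.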
In the embodiment of the application, the binocular vision device can be two preset cameras or a finished binocular stereo camera; the present application does not specifically limit this.
In the embodiment of the application, the server needs to perform image feature extraction on the image pair to be processed. The feature extraction can be performed with a Feature Pyramid Network (FPN), obtaining a plurality of feature image pairs at different resolutions.
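The idea of extracting feature image pairs at several resolutions can be illustrated with a toy pyramid built by 2×2 average pooling; note this is only a sketch of the multi-resolution output shape — a real FPN, as named above, uses a CNN backbone with top-down lateral connections:

```python
import numpy as np

def feature_pyramid(img, levels=3):
    """Toy multi-resolution pyramid: each level halves the resolution by
    2x2 average pooling.  Only illustrates producing feature maps at
    several resolutions, not the actual FPN architecture."""
    maps = [img.astype(np.float32)]
    for _ in range(levels - 1):
        prev = maps[-1]
        h, w = prev.shape[0] // 2 * 2, prev.shape[1] // 2 * 2
        pooled = prev[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        maps.append(pooled)
    return maps  # maps[0] is the finest level, maps[-1] the coarsest
```

In the method below, the coarsest level plays the role of the first resolution and a finer level plays the role of the second resolution.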
It should be noted that the server is only an example of the execution subject of the parallax image generation method for the binocular vision device; the execution subject is not limited to the server, and the present application does not specifically limit this.
S102, the server segments the feature image pair with the first resolution into a plurality of first feature blocks.
In the embodiment of the application, the server can first flatten the feature image pair at the first resolution through a flatten layer in the machine learning network, that is, expand the two-dimensional feature image into a one-dimensional vector row by row of pixel points. For example, if the feature matrix of one image of the feature image pair is [[1, 2], [3, 4]], it becomes the one-dimensional vector (1, 2, 3, 4) after passing through the flatten layer. The first resolution is, for example, one eighth of the resolution of the original image.
Through the flatten layer, the server reduces the amount of computation and makes it convenient to position-encode the images of the feature image pair.
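The flattening example above can be reproduced directly:

```python
import numpy as np

feat = np.array([[1, 2],
                 [3, 4]])      # 2-D feature map from the example above
flat = feat.reshape(-1)        # row-by-row flatten, as a flatten layer does
print(flat.tolist())           # [1, 2, 3, 4]
```

NumPy's default row-major order matches the row-by-row expansion described in the text.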
The embodiment of the application obtains a plurality of first feature blocks carrying position codes by performing the following steps.
First, the server performs vector processing on the feature image pair to generate its corresponding one-dimensional vector.
This is the flatten-layer process described above, and is not repeated here.
Then, the server position-encodes the feature image pair according to its one-dimensional vector, obtaining the corresponding position information of the feature image pair.
Finally, the server segments the feature image pair at the first resolution into a plurality of first feature blocks carrying the position information.
In the embodiment of the present application, the first feature blocks have the same size; for example, each first feature block contains 50 pixel points. If the first feature blocks were cut to different sizes, the embodiment of the application would process them separately according to the number of blocks of each size.
Segmenting the image into a plurality of first feature blocks makes it convenient for the subsequent Transformer module to process the self-attention and cross-attention layers, improving the parallax image generation efficiency of the method.
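The segmentation into equal-size first feature blocks carrying position information can be sketched as follows; the block size and the (row, col) position tag are illustrative assumptions:

```python
import numpy as np

def split_into_blocks(feat, block=2):
    """Cut a feature map into equal-size blocks and tag each block with its
    (row, col) grid position, which serves as its position information."""
    h, w = feat.shape
    blocks = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            blocks.append(((by // block, bx // block),          # position code
                           feat[by:by + block, bx:bx + block]))  # block content
    return blocks
```

A 4×4 feature map with block size 2 yields four blocks, each tagged with its grid position.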
S103, the server determines corresponding coordinates to be matched of the feature image pair with the first resolution according to the corresponding relation among the first feature blocks.
The corresponding relation is the correlation degree of any two first feature blocks in the feature image pair.
The server may process each first feature block through a preset Transformer module, which may be a Transformer model comprising an encoder and a decoder. Through the Transformer module, the server obtains the self-attention layer and the cross-attention layer for each first feature block. The self-attention layer processes the same image of the feature image pair: it determines the feature data of a first feature block on an image by combining the block's position information on that image with its pixel values.
The cross-attention layer processes different images of the feature image pair to determine features across them. For example, if the feature image pair comprises a first image and a second image, the first feature blocks in the second image that are related to first feature block A are determined by combining the position information of block A in the first image. The server then calculates the degree of correlation between any two first feature blocks of the first and second images, obtaining the correspondence.
Then, the server determines the coordinates to be matched of the feature image pair at the first resolution according to the correspondence obtained by the Transformer module. For example, suppose the correspondences a-d and b-c exist between first feature blocks a and b of the first image A and first feature blocks c and d of the second image B in the feature image pair. The server may then determine the coordinates to be matched (a, d) and (b, c). The processing of the first feature blocks by the Transformer module is shown in Fig. 2, which comprises an image feature extraction module 201, a segmentation module 202, a Transformer module 203, a first matching module 204, a second matching module 205, a parallax calculation module 206, and a parallax refinement module 207.
According to the first feature block, the specific calculation process for obtaining the coordinates to be matched is as follows:
The server calculates the feature vector of each pixel point of the first and second images in the feature image pair; the feature vectors of the first image are F_A and those of the second image are F_B. According to the Transformer module, the server processes the self-attention and cross-attention layers of each first feature block to obtain transformed features carrying position and pixel-value relations: the transformed feature vectors F'_A of the first image and F'_B of the second image.
Then, the server calculates S(i, j) = ⟨F'_A(i), F'_B(j)⟩ / τ, where S(i, j) is the confidence matrix between the transformed feature vectors F'_A and F'_B; i is any point in the first image, j is any point in the second image; and τ is a constant used to reduce the amount of calculation. The server then calculates the matching probability of any two feature points according to the following formula:
p_c(i, j) = softmax(S(i, ·))_j · softmax(S(·, j))_i
where p_c(i, j) is the matching probability of the coordinate pair (i, j), i.e. the degree of feature correlation between point i and point j; S(i, ·) denotes the inner products between point i in the first image and all points in the second image, and S(·, j) denotes the inner products between point j in the second image and all points in the first image.
According to the obtained matching probabilities, the server may treat probabilities greater than a threshold as trusted matches, e.g. take points (x, y) with matching probability greater than 0.8 as trusted coordinates to be matched. The coordinates to be matched can also be obtained with the mutual nearest neighbor (MNN) algorithm.
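The confidence matrix and dual-softmax matching probability described by the formulas above can be sketched as follows (the value of τ is an assumption; the patent only calls it a constant):

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def matching_probability(FA, FB, tau=0.1):
    """Dual-softmax matching:
    S(i, j)   = <F'_A(i), F'_B(j)> / tau
    p_c(i, j) = softmax(S(i, .))_j * softmax(S(., j))_i"""
    S = FA @ FB.T / tau                      # confidence matrix
    return softmax(S, axis=1) * softmax(S, axis=0)
```

With orthogonal feature vectors, p_c concentrates on the diagonal, so each point in the first image is matched to its counterpart in the second; matches with p_c above the 0.8 threshold would be kept as trusted coordinates to be matched.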
After the coordinates to be matched have been determined for the feature image pair at the first resolution, the server can, based on them, establish matching between coordinates in images of resolution higher than the first. The application realizes high-resolution image matching by performing the following steps.
And S104, the server determines the matching coordinates in the characteristic image pair with the second resolution based on the coordinates to be matched, and takes the characteristic image pair with the corresponding second resolution as the image pair to be matched.
Wherein the second resolution is greater than the first resolution.
In the embodiment of the application, the server can map the coordinates to be matched into the feature image pair at the second resolution to obtain the matching coordinates.
In one embodiment of the present application, the server may obtain the matching coordinates in the second resolution feature image pair by performing the following method:
First, the server determines a ratio of the second resolution to the first resolution.
For example, if the image at the first resolution is one eighth of the resolution of the image captured by the binocular vision device and the second resolution is one half of it, then the ratio of the second resolution to the first resolution is 4.
Then, the server multiplies the ratio by the coordinates to be matched to determine the corresponding matching coordinates in the image pair to be matched.
The first coordinate value in the matching coordinates corresponds to a first image to be matched in the image pair to be matched, and the second coordinate value corresponds to a second image to be matched in the image pair to be matched.
For example, if the ratio is 4 and the coordinates to be matched are (0, 0), (0, 0.5), (0.5, 0.5), (0, 1), (1, 1), then each matching coordinate is the ratio multiplied by the coordinate value, i.e. the matching coordinates are (0, 0), (0, 2), (2, 2), (0, 4), (4, 4). In the embodiment of the application, the two values of a coordinate pair come from different images: the first value from the first image of the pair and the second value from the second image.
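The product operation on the ratio and the coordinates to be matched reduces to a one-liner (the function name is illustrative):

```python
def scale_coords(coords, ratio):
    """Multiply coarse-level coordinates to be matched by the resolution
    ratio to obtain matching coordinates at the finer level."""
    return [(a * ratio, b * ratio) for a, b in coords]
```

Applied to the example above with ratio 4, (0, 0.5) becomes (0, 2) and (1, 1) becomes (4, 4).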
S105, the server determines a region pair to be matched in the image pair to be matched according to the matching coordinates in the feature image pair with the second resolution.
In the embodiment of the application, the server can determine the region pair to be matched by executing the following method, which is specifically as follows:
In the embodiment of the application, the server takes the pixel point corresponding to the matching coordinate as the center of the matching window, and determines the coverage area of the matching window in the image pair to be matched.
The size of the matching window is a preset value.
In the embodiment of the present application, the size of the matching window may be adjusted according to actual use; the preset value may be, for example, 3×3, and may be given either as an area value or as a length and a width.
And the server respectively determines the region pairs to be matched in the image pairs to be matched according to the coverage regions.
The region pairs to be matched reduce the search region for feature matching, improve the efficiency of feature matching, and reduce errors during image matching. For example, after the coordinate (o, p) to be matched is obtained from the feature image pair with the first resolution, feature processing is performed in the region pair to be matched corresponding to the matching coordinate of (o, p) in the feature image pair with the second resolution, that is, in the region centered on that matching coordinate. Because the first resolution is smaller than the second resolution, the feature image pair with the second resolution contains more pixels than that with the first resolution, and the region pair to be matched allows those additional pixels to be feature-processed accurately.
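The matching-window coverage region described above might be extracted as in the following sketch; the window clipping at image borders and all names are illustrative assumptions:

```python
import numpy as np

def window_region(feature_map, center, size=3):
    """Return the size x size coverage region of a matching window
    centered at `center` (row, col), clipped at the image borders."""
    h, w = feature_map.shape[:2]
    r, c = center
    half = size // 2
    r0, r1 = max(0, r - half), min(h, r + half + 1)
    c0, c1 = max(0, c - half), min(w, c + half + 1)
    return feature_map[r0:r1, c0:c1]

fm = np.arange(64).reshape(8, 8)
print(window_region(fm, (4, 4)).shape)  # (3, 3)
```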
According to the scheme, sub-pixel precision matching is achieved, the final parallax image generation precision is improved, and meanwhile, the situation that accurate pixel corresponding points cannot be matched in a pathological area such as a shielding area, a repeated texture area, a reflecting surface and the like is avoided.
And S106, the server determines corresponding parallax coordinates of the image pair to be matched based on the feature correlation values among the feature sub-blocks in the region pair to be matched, so as to generate a final parallax image of the image pair to be processed based on the parallax coordinates.
The feature sub-blocks are obtained by dividing the region pairs to be matched.
In the embodiment of the application, parallax coordinates are obtained by executing the following method, specifically:
First, the server determines, for a first region to be matched in the region pair to be matched, the central feature sub-block corresponding to its central pixel point.
The server can obtain the feature sub-blocks corresponding to the regions to be matched by processing the first feature blocks with the Transformer module. The first region to be matched comes from the first image to be matched in the image pair to be matched.
Then, the server determines the second feature sub-block corresponding to each pixel point of a second region to be matched in the region pair to be matched.
The second region to be matched comes from the second image to be matched in the image pair to be matched.
Finally, the server calculates the feature correlation value between the vector of the central feature sub-block and the vector of each second feature sub-block to determine the parallax coordinates according to the feature correlation value.
Wherein the feature correlation value characterizes a degree of correlation of the vector of the central feature sub-block with the vector of each second feature sub-block.
In the embodiment of the present application, the manner in which the server calculates the feature correlation value between the vector of the central feature sub-block and the vector of each second feature sub-block may refer to step S103, and the feature correlation value may be calculated by the matching probability formula in step S103.
After the server obtains the feature correlation value between the vector of the central feature sub-block and the vector of each second feature sub-block, suppose the central coordinate corresponding to the central feature sub-block is (x, y) and the central coordinate of a second feature sub-block is (x', y'). By calculating the feature correlation values, the server can generate a heat map and determine, according to the heat map, the vector T of the second feature sub-block corresponding to the maximum matching probability, where T is defined as the difference (Δx, Δy) = (x' − x, y' − y) from the central coordinate. Thus, the server obtains the parallax coordinate (x + Δx, y + Δy).
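The correlation-and-heat-map selection above can be sketched as follows. This is an illustrative reading, not the patented implementation: a softmax is used here as a stand-in for the matching probability formula of step S103, and all names are assumptions:

```python
import numpy as np

def best_offset(center_vec, candidate_vecs, candidate_coords, center_coord):
    """Correlate the central sub-block vector with every second sub-block
    vector, softmax the scores into a heat map, and return the offset T
    of the best match from the central coordinate."""
    scores = candidate_vecs @ center_vec          # feature correlation values
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                          # matching-probability heat map
    i = int(np.argmax(probs))
    cx, cy = center_coord
    bx, by = candidate_coords[i]
    return bx - cx, by - cy                       # T = (dx, dy)

center = np.array([1.0, 0.0])
cands = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
coords = [(2, 5), (3, 5), (4, 5)]
print(best_offset(center, cands, coords, (2, 5)))  # (1, 0)
```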
After the server obtains the parallax coordinates, parallax calculation and parallax refinement can be performed on the image pair to be processed, so as to obtain the final parallax image. Parallax calculation obtains the parallax coordinates, and a depth value is then derived through the standard relation between parallax and depth, so that the final parallax image is generated.
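The parallax-to-depth relation mentioned above is, in the usual pinhole stereo model, depth = f · B / d (focal length times baseline over disparity); the numbers below are illustrative assumptions, not values from the application:

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Standard pinhole stereo relation: depth = f * B / d."""
    return focal_px * baseline_m / disparity_px

# e.g. f = 700 px, baseline = 0.12 m, disparity = 35 px
print(depth_from_disparity(35, 700, 0.12))  # 2.4 metres
```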
In the embodiment of the application, the parallax image generation method for the binocular vision equipment can obtain the final parallax image through a pre-trained parallax image generation model. It will be appreciated by those skilled in the art that the above steps may also be used for training the parallax image generation model, which is not particularly limited by the present application.
The embodiment of the application generates a parallax image generation model for generating a final parallax image by the following method, which comprises the following steps:
the server trains the parallax image generation model through a sample image pair in a preset sample library so as to obtain a trained parallax image generation model.
The parallax image generation model is used for generating parallax images according to the image pairs to be processed.
Wherein the parallax image generation model includes any one or more of the following modules:
the image feature extraction module is used for extracting image features of the image pair to be processed;
The segmentation module is used for segmenting the characteristic image pair with the first resolution into a plurality of first characteristic blocks;
a Transformer module for determining the correspondence;
The first matching module is used for determining corresponding coordinates to be matched of the characteristic image pair with the first resolution;
a second matching module for determining matching coordinates in the feature image pair of the second resolution;
A parallax calculation module for generating an original parallax image;
and a disparity refinement module for refining the original disparity image into a final disparity image.
In the embodiment of the present application, in order to alleviate the lack of context information caused by performing parallax regression on the original parallax image only along the epipolar line, convolution is used to adjust the estimated value. The original parallax image is first concatenated along the channel dimension with the left image collected by the binocular vision device (taking a left-right binocular vision device as an example), and two convolution blocks are used to gather occlusion information. A rectified linear unit (Rectified Linear Unit, ReLU) is then utilized to perform parallax refinement through a residual block 301; the channel dimension of the residual block 301 is expanded and restored to its original channel dimension before the ReLU function is activated. The cascade connection operation on the original parallax and the residual block 301 is performed repeatedly to obtain a better adjustment effect, and the final output of the residual block is added to the original parallax through a long skip connection, so as to obtain the final parallax image. A specific schematic diagram is shown in Fig. 3.
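The refinement pipeline above might be sketched in PyTorch roughly as follows. This is a hedged illustration under assumed channel counts and layer shapes, not the architecture of residual block 301 itself, and it simplifies the repeated cascade operation to a single pass:

```python
import torch
import torch.nn as nn

class DisparityRefinement(nn.Module):
    """Illustrative sketch: concatenate the raw disparity with the left
    image along the channel dimension, gather occlusion context with two
    convolution blocks, expand channels before the ReLU inside a residual
    path, and add the result back to the raw disparity via a long skip."""
    def __init__(self, ch=16):
        super().__init__()
        self.gather = nn.Sequential(            # two convolution blocks
            nn.Conv2d(4, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.residual = nn.Sequential(          # expand, activate, restore
            nn.Conv2d(ch, ch * 2, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch * 2, 1, 3, padding=1),
        )

    def forward(self, disparity, left_image):
        x = torch.cat([disparity, left_image], dim=1)  # channel concat
        x = self.gather(x)
        return disparity + self.residual(x)            # long skip connection

d = torch.rand(1, 1, 32, 32)    # raw disparity map
img = torch.rand(1, 3, 32, 32)  # left RGB image
print(DisparityRefinement()(d, img).shape)  # torch.Size([1, 1, 32, 32])
```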
After the parallax image generation model is trained, the embodiment of the application can realize the transfer learning of the parallax image generation model, so that the parallax image generation model can generate the final parallax image of the image pair to be detected, which is different from the sample image pair.
The application processes feature image pairs of different resolutions to obtain the coordinates to be matched of the feature image pair with the first resolution, and then determines the matching coordinates of the higher-resolution feature image pair from those coordinates. The matching coordinates indicate, for a pixel point of the first image in the feature image pair, the corresponding pixel point position in the second image. A parallax image is then generated according to the matching coordinates.
According to the application, a pixel point matching relation is first established on an image with low feature resolution, and a refined matching relation is then established on an image with high feature resolution. Meanwhile, by utilizing the Transformer module, the image features can be matched over the global receptive field of the image, so that the pixel points of pathological areas can be accurately matched.
Fig. 4 is a schematic structural diagram of a parallax image generating apparatus for a binocular vision apparatus according to an embodiment of the present application, as shown in fig. 4, the apparatus includes:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to:
And acquiring an image pair to be processed, and extracting image features of the image pair to be processed to obtain a plurality of feature image pairs with different resolutions. The image to be processed corresponds to a plurality of images which comprise images of the same target at different angles and are acquired by the binocular vision equipment. The feature image pair with the first resolution is segmented into a plurality of first feature blocks. And determining corresponding coordinates to be matched of the feature image pair with the first resolution according to the corresponding relation among the first feature blocks. The corresponding relation is the correlation degree of any two first feature blocks in the feature image pair. And determining the matching coordinates in the characteristic image pair with the second resolution based on the coordinates to be matched. And taking the characteristic image pair corresponding to the second resolution as an image pair to be matched. The second resolution is greater than the first resolution. And determining a region pair to be matched in the image pair to be matched according to the matching coordinates in the characteristic image pair with the second resolution. And determining corresponding parallax coordinates of the image pair to be matched based on the feature correlation values among the feature sub-blocks in the region pair to be matched, so as to generate a final parallax image of the image pair to be processed based on the parallax coordinates. The feature sub-blocks are obtained by cutting the region pairs to be matched.
An embodiment of the present application further provides a non-volatile computer storage medium for parallax image generation of a binocular vision apparatus, storing computer executable instructions, wherein the computer executable instructions are configured to:
And acquiring an image pair to be processed, and extracting image features of the image pair to be processed to obtain a plurality of feature image pairs with different resolutions. The image to be processed corresponds to a plurality of images which comprise images of the same target at different angles and are acquired by the binocular vision equipment. The feature image pair with the first resolution is segmented into a plurality of first feature blocks. And determining corresponding coordinates to be matched of the feature image pair with the first resolution according to the corresponding relation among the first feature blocks. The corresponding relation is the correlation degree of any two first feature blocks in the feature image pair. And determining the matching coordinates in the characteristic image pair with the second resolution based on the coordinates to be matched. And taking the characteristic image pair corresponding to the second resolution as an image pair to be matched. The second resolution is greater than the first resolution. And determining a region pair to be matched in the image pair to be matched according to the matching coordinates in the characteristic image pair with the second resolution. And determining corresponding parallax coordinates of the image pair to be matched based on the feature correlation values among the feature sub-blocks in the region pair to be matched, so as to generate a final parallax image of the image pair to be processed based on the parallax coordinates. The feature sub-blocks are obtained by cutting the region pairs to be matched.
The embodiments of the present application are described in a progressive manner, and the same and similar parts of the embodiments are all referred to each other, and each embodiment is mainly described in the differences from the other embodiments. In particular, for the apparatus, medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
The device, medium and method provided by the embodiment of the application are in one-to-one correspondence, so that the device and medium also have similar beneficial technical effects as the corresponding method, and the beneficial technical effects of the device and medium are not repeated here because the beneficial technical effects of the method are described in detail above.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.
Claims (9)
1. A parallax image generation method for a binocular vision apparatus, the method comprising:
acquiring an image pair to be processed, and extracting image features of the image pair to be processed to obtain a plurality of feature image pairs with different resolutions; the image to be processed corresponds to a plurality of images comprising images of the same target at different angles acquired by binocular vision equipment;
Segmenting the feature image pair with the first resolution into a plurality of first feature blocks;
According to the corresponding relation between the first feature blocks, determining corresponding coordinates to be matched of the feature image pair with the first resolution; the corresponding relation is the correlation degree of any two first characteristic blocks in the characteristic image pair;
determining matching coordinates in the characteristic image pair with the second resolution based on the coordinates to be matched; and
The characteristic image pair corresponding to the second resolution is used as an image pair to be matched; the second resolution is greater than the first resolution;
determining a region pair to be matched in the image pair to be matched according to the matching coordinates in the characteristic image pair with the second resolution;
Determining corresponding parallax coordinates of the image pair to be matched based on feature correlation values among feature sub-blocks in the region pair to be matched, so as to generate a final parallax image of the image pair to be processed based on the parallax coordinates; the characteristic sub-blocks are obtained by dividing the region pairs to be matched;
the determining, based on the feature correlation values among the feature sub-blocks in the to-be-matched region pair, the corresponding parallax coordinates of the to-be-matched image pair specifically includes:
determining a central characteristic sub-block corresponding to a central pixel point corresponding to a first region to be matched in the region pair to be matched;
Determining a second characteristic sub-block corresponding to each pixel point of a second region to be matched in the region pair to be matched;
calculating the feature correlation value between the vector of the central feature sub-block and the vector of each second feature sub-block to determine the parallax coordinate according to the feature correlation value; wherein the feature correlation value characterizes a degree of correlation of the vector of the central feature sub-block with the vector of each of the second feature sub-blocks.
2. The method according to claim 1, wherein determining matching coordinates in the feature image pair of a second resolution based on the coordinates to be matched, in particular comprises:
determining a ratio of the second resolution to the first resolution;
performing product operation on the ratio and the coordinates to be matched, and determining the corresponding matching coordinates of the coordinates to be matched in the image pair to be matched; the first coordinate value in the matching coordinates corresponds to a first image to be matched in the image to be matched pair, and the second coordinate value corresponds to a second image to be matched in the image to be matched pair.
3. The method according to claim 1, wherein determining the pair of regions to be matched in the pair of images to be matched from the matching coordinates in the pair of feature images of the second resolution, in particular comprises:
taking a pixel point corresponding to the matching coordinate as the center of a matching window, and determining the coverage area of the matching window in the image pair to be matched; the size of the matching window is a preset value;
and respectively determining the region pairs to be matched in the image pairs to be matched according to the coverage areas.
4. The method of claim 1, wherein prior to acquiring the image pair to be processed, the method further comprises:
Training a parallax image generation model through a sample image pair in a preset sample library to obtain the trained parallax image generation model; the parallax image generation model is used for generating the parallax image according to the image pair to be processed;
wherein the parallax image generation model includes any one or more of the following modules:
The image feature extraction module is used for extracting image features of the image pair to be processed;
The segmentation module is used for segmenting the characteristic image pair with the first resolution into a plurality of first characteristic blocks;
a Transformer module for determining the correspondence;
a first matching module for determining the corresponding coordinates to be matched of the characteristic image pair with the first resolution;
a second matching module for determining matching coordinates in the feature image pair for a second resolution;
A parallax calculation module for generating an original parallax image;
and a disparity refinement module for refining the original disparity image into the final disparity image.
5. The method according to claim 1, characterized in that the segmentation of the feature image pair of a first resolution into a number of first feature blocks, in particular comprises:
vector processing is carried out on the characteristic image pairs to generate corresponding one-dimensional vectors of the characteristic image pairs;
Performing position coding on the characteristic image pair according to the corresponding one-dimensional vector of the characteristic image pair to obtain corresponding position information of the characteristic image pair;
And according to the position information, segmenting the characteristic image pair with the first resolution into a plurality of first characteristic blocks with the position information.
6. The method of claim 1, wherein each of the first feature blocks is the same size.
7. The method of claim 1, wherein prior to acquiring the image pair to be processed, the method further comprises:
receiving a stereoscopic image pair acquired by binocular vision equipment; or
Acquiring a stereoscopic image pair in a preset scene flow data set;
according to the polar lines in the preset direction, aligning and correcting each stereoscopic image in the stereoscopic image pair;
and using the stereo image pair after alignment correction as the image pair to be processed.
8. A parallax image generating apparatus for a binocular vision apparatus, the apparatus comprising:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor, the instructions are executable by the at least one processor to enable the at least one processor to:
acquiring an image pair to be processed, and extracting image features of the image pair to be processed to obtain a plurality of feature image pairs with different resolutions; the image to be processed corresponds to a plurality of images comprising images of the same target at different angles acquired by binocular vision equipment;
Segmenting the feature image pair with the first resolution into a plurality of first feature blocks;
According to the corresponding relation between the first feature blocks, determining corresponding coordinates to be matched of the feature image pair with the first resolution; the corresponding relation is the correlation degree of any two first characteristic blocks in the characteristic image pair;
determining matching coordinates in the characteristic image pair with the second resolution based on the coordinates to be matched; and
The characteristic image pair corresponding to the second resolution is used as an image pair to be matched; the second resolution is greater than the first resolution;
determining a region pair to be matched in the image pair to be matched according to the matching coordinates in the characteristic image pair with the second resolution;
Determining corresponding parallax coordinates of the image pair to be matched based on feature correlation values among feature sub-blocks in the region pair to be matched, so as to generate a final parallax image of the image pair to be processed based on the parallax coordinates; the characteristic sub-blocks are obtained by dividing the region pairs to be matched;
Wherein, based on the feature correlation value between each feature sub-block in the to-be-matched region pair, determining the corresponding parallax coordinates of the to-be-matched image pair, the at least one processor is capable of:
determining a central characteristic sub-block corresponding to a central pixel point corresponding to a first region to be matched in the region pair to be matched;
Determining a second characteristic sub-block corresponding to each pixel point of a second region to be matched in the region pair to be matched;
calculating the feature correlation value between the vector of the central feature sub-block and the vector of each second feature sub-block to determine the parallax coordinate according to the feature correlation value; wherein the feature correlation value characterizes a degree of correlation of the vector of the central feature sub-block with the vector of each of the second feature sub-blocks.
9. A non-volatile computer storage medium storing computer executable instructions for parallax image generation for a binocular vision apparatus, the computer executable instructions being configured to:
acquiring an image pair to be processed, and extracting image features of the image pair to be processed to obtain a plurality of feature image pairs with different resolutions; the image to be processed corresponds to a plurality of images comprising images of the same target at different angles acquired by binocular vision equipment;
Segmenting the feature image pair with the first resolution into a plurality of first feature blocks;
According to the corresponding relation between the first feature blocks, determining corresponding coordinates to be matched of the feature image pair with the first resolution; the corresponding relation is the correlation degree of any two first characteristic blocks in the characteristic image pair;
determining matching coordinates in the characteristic image pair with the second resolution based on the coordinates to be matched; and
The characteristic image pair corresponding to the second resolution is used as an image pair to be matched; the second resolution is greater than the first resolution;
determining a region pair to be matched in the image pair to be matched according to the matching coordinates in the characteristic image pair with the second resolution;
Determining corresponding parallax coordinates of the image pair to be matched based on feature correlation values among feature sub-blocks in the region pair to be matched, so as to generate a final parallax image of the image pair to be processed based on the parallax coordinates; the characteristic sub-blocks are obtained by dividing the region pairs to be matched;
the determining, based on the feature correlation values among the feature sub-blocks in the to-be-matched region pair, the corresponding parallax coordinates of the to-be-matched image pair specifically includes:
determining a central characteristic sub-block corresponding to a central pixel point corresponding to a first region to be matched in the region pair to be matched;
Determining a second characteristic sub-block corresponding to each pixel point of a second region to be matched in the region pair to be matched;
calculating the feature correlation value between the vector of the central feature sub-block and the vector of each second feature sub-block to determine the parallax coordinate according to the feature correlation value; wherein the feature correlation value characterizes a degree of correlation of the vector of the central feature sub-block with the vector of each of the second feature sub-blocks.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111408110.8A CN114332188B (en) | 2021-11-19 | 2021-11-19 | Parallax image generation method, device and medium for binocular vision device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114332188A CN114332188A (en) | 2022-04-12 |
CN114332188B true CN114332188B (en) | 2024-09-17 |
Family
ID=81046143
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111408110.8A Active CN114332188B (en) | 2021-11-19 | 2021-11-19 | Parallax image generation method, device and medium for binocular vision device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114332188B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108520536A (en) * | 2018-03-27 | 2018-09-11 | 海信集团有限公司 | A kind of generation method of disparity map, device and terminal |
BR102019016252A2 (en) * | 2018-08-14 | 2020-02-18 | Canon Kabushiki Kaisha | IMAGE PROCESSING DEVICE AND IMAGE PROCESSING METHOD |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106525004A (en) * | 2016-11-09 | 2017-03-22 | 人加智能机器人技术(北京)有限公司 | Binocular stereo vision system and depth measuring method |
JP6844210B2 (en) * | 2016-11-16 | 2021-03-17 | 凸版印刷株式会社 | Visual saliency map generator, visual saliency map generation method and program |
CN111260597B (en) * | 2020-01-10 | 2021-12-03 | 大连理工大学 | Parallax image fusion method of multiband stereo camera |
Also Published As
Publication number | Publication date |
---|---|
CN114332188A (en) | 2022-04-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||