
CN114066946A - Image processing method and device - Google Patents

Image processing method and device

Info

Publication number
CN114066946A
Authority
CN
China
Prior art keywords
frame image
frame
depth
optical flow
prediction model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111250405.7A
Other languages
Chinese (zh)
Inventor
张文昌
张冠南
白路远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Priority to CN202111250405.7A priority Critical patent/CN114066946A/en
Publication of CN114066946A publication Critical patent/CN114066946A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/269 - Analysis of motion using gradient-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/50 - Depth or shape recovery
    • G06T7/55 - Depth or shape recovery from multiple images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20212 - Image combination
    • G06T2207/20221 - Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an image processing method and device. In this method, when frames are interpolated by the optical flow method, a depth calculation is performed in addition to the optical flow calculation. Since the depth indicates how far the subject is from the lens, a larger depth means a greater distance, and the occurrence probability of the corresponding pixel should therefore be lower. Using this principle, the pixels obtained from the optical flow can be strengthened or weakened according to the depth, so that the interpolated frame image is clearer, more reasonable and closer to the real frame image, the video plays more smoothly, and the viewing experience is better.

Description

Image processing method and device
Technical Field
The present application relates to the field of image processing, and in particular, to an image processing method and apparatus.
Background
The optical flow method calculates the motion information of objects between adjacent frames by using the temporal changes of the pixels in an image sequence and the correlation between adjacent frames to find the correspondence between a previous frame and the current frame.
With the optical flow method, frame interpolation can be performed on a frame sequence captured at fixed intervals (a frame at some moment between the previous moment and the current moment is predicted from the frame at the previous moment and the frame at the current moment), restoring and synthesizing the video into a smooth and complete frame stream.
In existing optical-flow-based frame prediction neural networks, the optical flow passing through a given point at time t is generally calculated with a linear method, the image at time t-1 and the image at time t+1 are warped with this optical flow, and image fusion is finally performed to obtain the image at time t. However, this approach places high demands on the accuracy of the optical flow prediction; if the accuracy is not high, the predicted frame is very blurred, and the content of the predicted frame may even be deformed.
Disclosure of Invention
The applicant creatively provides an image processing method and device.
According to a first aspect of embodiments of the present application, there is provided an image processing method, including: performing optical flow operation according to a first frame image at the t-1 moment and a second frame image at the t +1 moment to respectively obtain a first optical flow and a second optical flow, wherein t is a natural number, the first optical flow is from a frame image at the t-1 moment to a frame image at the t +1 moment, and the second optical flow is from the frame image at the t +1 moment to the frame image at the t-1 moment; performing depth operation according to the first frame image and the second frame image to respectively obtain a first depth and a second depth; determining a first predicted frame image from the first frame image, the first optical flow, and the first depth; determining a second predicted frame image from the second frame image, the second optical flow, and the second depth; and synthesizing the first predicted frame image and the second predicted frame image to obtain a third predicted frame image.
According to an embodiment of the present application, performing a depth operation according to a first frame image and a second frame image to obtain a first depth and a second depth, respectively, includes: and obtaining a first depth and a second depth according to the first frame image at the time t-1, the second frame image at the time t +1 and the depth network model.
According to an embodiment of the present application, determining a first predicted frame image according to a first frame image, a first optical flow and a first depth includes: obtaining a fourth prediction frame image according to the first frame image and the first optical flow; performing monotonicity opposite function operation on the first depth to obtain the occurrence probability of each pixel point; and determining the first predicted frame image according to the fourth predicted frame image and the occurrence probability of each pixel point.
According to an embodiment of the present application, the process from performing optical flow calculation on the first frame image at the time t-1 and the second frame image at the time t +1 to synthesizing the first predicted frame image and the second predicted frame image to obtain the third predicted frame image is realized by the first frame prediction model.
According to an embodiment of the present application, after obtaining the third predicted frame image, the method further includes: determining a fifth predicted frame image according to the third predicted frame image and a second frame prediction model, wherein the second frame prediction model is a generative adversarial neural network model, and the generative model and the discrimination model of the generative adversarial neural network model are based on the same image generation network model.
According to an embodiment of the present application, before obtaining the third predicted frame image, the method further includes: and pre-training the first frame prediction model to obtain the pre-trained first frame prediction model.
According to an embodiment of the present application, before determining the fifth predicted frame image according to the third predicted frame image and the second frame prediction model, the method further includes:
establishing a generative adversarial neural network model based on the pre-trained first frame prediction model to obtain a second frame prediction model; and performing adversarial training on the second frame prediction model to obtain the adversarially trained second frame prediction model.
According to an embodiment of the present application, the adversarial training of the second frame prediction model includes: taking the result of the first frame prediction model as the input of the generation model of the second frame prediction model, setting the training label to 1, and training the generation model of the second frame prediction model; and taking the result of the first frame prediction model as the input of the discrimination model of the second frame prediction model, setting the training label to 0, and training the discrimination model of the second frame prediction model.
According to an embodiment of the present application, after performing adversarial training on the second frame prediction model to obtain the adversarially trained second frame prediction model, the method further includes: performing joint training on the first frame prediction model and the adversarially trained second frame prediction model.
According to a second aspect of embodiments of the present application, there is provided an image processing apparatus comprising: the optical flow prediction module is used for carrying out optical flow operation according to a first frame image at the t-1 moment and a second frame image at the t+1 moment to respectively obtain a first optical flow and a second optical flow, wherein t is a natural number, the first optical flow is from the frame image at the t-1 moment to the frame image at the t+1 moment, and the second optical flow is from the frame image at the t+1 moment to the frame image at the t-1 moment; the depth prediction module is used for carrying out depth operation according to the first frame image and the second frame image to respectively obtain a first depth and a second depth; a first predicted frame image prediction module for determining a first predicted frame image from the first frame image, the first optical flow and the first depth; a second predicted frame image prediction module for determining a second predicted frame image from the second frame image, the second optical flow, and the second depth; and the frame synthesis module is used for synthesizing the first predicted frame image and the second predicted frame image to obtain a third predicted frame image.
According to a third aspect of embodiments herein, there is provided a computer-readable storage medium comprising a set of computer-executable instructions which, when executed, are operable to perform any of the image processing methods described above.
The embodiments of the application provide an image processing method and device in which, when frames are interpolated by the optical flow method, a depth calculation is performed in addition to the optical flow calculation. Since the depth indicates how far the subject is from the lens, a larger depth means a greater distance, and the occurrence probability of the corresponding pixel should therefore be lower. Using this principle, the pixels obtained from the optical flow can be strengthened or weakened according to the depth, so that the interpolated frame image is clearer, more reasonable and closer to the real frame image, the video plays more smoothly, and the viewing experience is better.
It is to be understood that not all of the above advantages need to be achieved in the present application, but that a specific technical solution may achieve a specific technical effect, and that other embodiments of the present application may also achieve advantages not mentioned above.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present application will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present application are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
FIG. 1 is a schematic flow chart of frame interpolation using a conventional optical flow method;
FIG. 2 is a schematic view illustrating an implementation flow of an image processing method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart illustrating an application of an image processing method according to another embodiment of the present application;
FIG. 4 is a schematic flowchart illustrating an application of an image processing method according to another embodiment of the present application;
FIG. 5 is a schematic flow chart illustrating the training of the model used in the embodiment shown in FIG. 4;
fig. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
Detailed Description
In order to make the objects, features and advantages of the present application more obvious and understandable, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
Typically, a continuously captured video is composed of a series of frame images, each corresponding to a time point within a continuous period of time. Ordering the frame images by time yields a continuous frame image time sequence; when the frame images are played back at their original time points, one after another, a continuous dynamic video is formed.
The frame image at time t is the frame image displayed at a certain moment, and the frame image at time t+1 is the frame image displayed at the next moment, where t is the sequence number of the frame image in the continuous frame image time sequence and is a natural number.
The optical flow is the moving speed and moving direction of each pixel in a continuous time series of frame images from the frame image at time t to the frame image at time t + 1.
For example, in the frame image at time t, the position of the pixel point a is (x1, y1); if the position of the pixel point a in the frame image at time t+1 is (x2, y2), it indicates that the pixel point a has moved from (x1, y1) to (x2, y2), and if (x2, y2) - (x1, y1) = (ux, vy), the optical flow from the frame image at time t to the frame image at time t+1 is: (ux, vy).
Once the optical flow from the frame image at time t to the frame image at time t+1 has been determined, the frame image at time t+1 can be converted (warped) with this optical flow to obtain the frame image at time t; or the frame image at time t can be converted with the optical flow to obtain the frame image at time t+1. Using this approach, namely the optical flow method, frames can be inserted into the frame image time sequence of the original video to obtain a denser frame image time sequence, so that the video plays more smoothly.
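To make the conversion step concrete, the following is a minimal sketch, not taken from the patent, of backward warping a frame with a dense optical flow field; the function name, the (N, C, H, W) tensor layout and the use of PyTorch's grid_sample are assumptions made purely for illustration.

```python
import torch
import torch.nn.functional as F

def warp_backward(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp `frame` (N, C, H, W) with a dense optical flow (N, 2, H, W).

    flow[:, 0] holds the horizontal displacement ux and flow[:, 1] the
    vertical displacement vy, in pixels: each output pixel (x, y) samples
    frame at (x + ux, y + vy), which is the conversion described above.
    """
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=frame.dtype, device=frame.device),
        torch.arange(w, dtype=frame.dtype, device=frame.device),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]            # (N, H, W)
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # grid_sample expects sampling coordinates normalized to [-1, 1]
    grid_x = 2.0 * grid_x / (w - 1) - 1.0
    grid_y = 2.0 * grid_y / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)     # (N, H, W, 2)
    return F.grid_sample(frame, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```

For instance, calling warp_backward on the frame at time t+1 with the optical flow from time t to time t+1 yields an estimate of the frame at time t.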
Fig. 1 shows a flow chart of frame interpolation using the existing optical flow method. First, an optical flow operation is performed on the frame image at time t-1 and the frame image at time t+1 to obtain the optical flow from the frame image at time t-1 to the frame image at time t+1 and the optical flow from the frame image at time t+1 to the frame image at time t-1. Then, linear estimation is carried out with the optical flow from the frame image at time t-1 to the frame image at time t+1 and the frame image at time t-1 to obtain a first t-time frame image, and linear estimation is carried out with the optical flow from the frame image at time t+1 to the frame image at time t-1 and the frame image at time t+1 to obtain a second t-time frame image. The first t-time frame image and the second t-time frame image are then synthesized to obtain a synthesized t-time frame image, which can be inserted between the frame image at time t-1 and the frame image at time t+1 to make the video smoother.
However, this optical-flow-based frame interpolation places high demands on the accuracy of the optical flow prediction, and when the accuracy is insufficient the image becomes blurred; the optical flow method also assumes constant brightness, so the image is distorted when the illumination conditions change.
To this end, the present application provides an image processing method as shown in fig. 2. Referring to fig. 2, the method includes: operation 210, performing optical flow operation according to a first frame image at a time t-1 and a second frame image at a time t +1 to obtain a first optical flow and a second optical flow respectively, where t is a natural number, where the first optical flow is an optical flow from a frame image at the time t-1 to a frame image at the time t +1, and the second optical flow is an optical flow from the frame image at the time t +1 to the frame image at the time t-1; operation 220, performing a depth operation according to the first frame image and the second frame image to obtain a first depth and a second depth, respectively; an operation 230 of determining a first predicted frame image from the first frame image, the first optical flow, and the first depth; an operation 240 of determining a second predicted frame image according to the second frame image, the second optical flow, and the second depth; in operation 250, the first predicted frame image and the second predicted frame image are synthesized to obtain a third predicted frame image.
Operation 210 is similar to the conventional optical flow method in which optical flow calculation is performed according to the frame image at the time t-1 and the frame image at the time t +1 to obtain an optical flow from the frame image at the time t-1 to the frame image at the time t +1 and an optical flow from the frame image at the time t +1 to the frame image at the time t-1, and therefore details are not repeated.
However, unlike the conventional optical flow method, the embodiment of the present application performs operation 220 in addition to operation 210, carrying out a depth operation based on the first frame image at time t-1 and the second frame image at time t+1 to obtain a first depth and a second depth. The first depth is the depth obtained by performing the depth operation on the first frame image; the second depth is the depth obtained by performing the depth operation on the second frame image.
When a video is captured, a three-dimensional scene is projected onto a two-dimensional plane to form a two-dimensional frame image. The depth is a numerical value indicating the distance between the subject and the lens. Depth calculation refers to the process of obtaining pixel depths from a two-dimensional picture by computation (for example, computing the depth with a formula, or predicting it with a depth estimation model), and roughly corresponds to the inverse of the capture process.
Generally, when the subject is far from the lens, its projection is small and its gray level is low; when the subject is close to the lens, its projection is large and its gray level is high. Therefore, the occurrence probability of a pixel can be judged from its depth value. For example, if the depth of a pixel is large, that is, the point was far from the lens when the image was captured, the occurrence probability of that pixel is low; conversely, if the depth of a pixel is small, that is, the point was close to the lens when the image was captured, the occurrence probability of that pixel is high.
In operation 230, a conversion is performed according to the first frame image and the first optical flow to obtain an estimated frame image at time t, that is, a first t-time frame image; the occurrence probability of each pixel is then obtained from the first depth (the result of the depth operation on the first frame image), and the pixels of the estimated t-time frame image are strengthened or weakened according to these occurrence probabilities, yielding a first predicted frame image that is clearer, more reasonable and closer to the real frame image.
Similarly, in operation 240, a conversion is performed according to the second frame image and the second optical flow to obtain another estimated frame image at time t, that is, a second t-time frame image; the occurrence probability of each pixel is then obtained from the second depth (the result of the depth operation on the second frame image), and the pixels of this estimated t-time frame image are strengthened or weakened according to these occurrence probabilities, yielding a second predicted frame image that is clearer, more reasonable and closer to the real frame image.
Then, according to operation 250, the first predicted frame image and the second predicted frame image are synthesized to obtain a third predicted frame image.
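Putting operations 210 to 250 together, a compact sketch could look as follows. It assumes the linear flow scaling of the baseline method, an inversely proportional depth-to-probability mapping, and a probability-normalised average for the synthesis step; none of these specific choices are fixed by the text, and the optical flows and depth maps are taken as already computed by the respective models.

```python
import torch
import torch.nn.functional as F

def _warp(frame, flow):
    """Backward warp (same helper as in the earlier sketch)."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=frame.dtype, device=frame.device),
        torch.arange(w, dtype=frame.dtype, device=frame.device),
        indexing="ij")
    gx = 2.0 * (xs + flow[:, 0]) / (w - 1) - 1.0
    gy = 2.0 * (ys + flow[:, 1]) / (h - 1) - 1.0
    return F.grid_sample(frame, torch.stack((gx, gy), dim=-1),
                         mode="bilinear", padding_mode="border",
                         align_corners=True)

def interpolate_middle_frame(frame_prev, frame_next, flow_fwd, flow_bwd,
                             depth_prev, depth_next, eps=1e-6):
    """Sketch of operations 210-250.

    frame_prev, frame_next : (N, C, H, W) frames at times t-1 and t+1
    flow_fwd  : (N, 2, H, W) first optical flow, from t-1 to t+1
    flow_bwd  : (N, 2, H, W) second optical flow, from t+1 to t-1
    depth_prev, depth_next : (N, 1, H, W) first and second depth maps
    """
    # Approximate the flows from time t to its neighbours by halving the
    # bidirectional flows (linear assumption, as in the baseline of fig. 1).
    fourth_pred = _warp(frame_prev, -0.5 * flow_fwd)   # "fourth predicted frame image"
    warped_next = _warp(frame_next, -0.5 * flow_bwd)

    # Depth -> occurrence probability with a monotonically decreasing map;
    # 1 / (1 + d) is only one possible inversely proportional choice.
    prob_prev = 1.0 / (1.0 + depth_prev)
    prob_next = 1.0 / (1.0 + depth_next)

    # Probability operation: element-wise weighting of the warped frames.
    first_pred = prob_prev * fourth_pred               # first predicted frame image
    second_pred = prob_next * warped_next              # second predicted frame image

    # Frame synthesis (operation 250): a probability-normalised average is
    # assumed here; the text only states that the two frames are synthesized.
    return (first_pred + second_pred) / (prob_prev + prob_next + eps)
```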
Fig. 3 shows a specific flowchart of another embodiment of the present application to which the above image processing method is applied. The operations shown on the left side of fig. 3 are similar to the conventional optical flow method and are not described again. In addition, in this embodiment a depth operation (see operation 220) is added, as shown in the right part of fig. 3: using the first depth and the second depth obtained by the depth operation, a probability operation is performed on the two t-time frame images obtained by the optical flow method (the first t-time frame image and the second t-time frame image) to obtain the predicted frame images processed by the probability operation (the first predicted frame image and the second predicted frame image), see operations 203 and 204; then, frame synthesis is performed in operation 205 to obtain a synthesized predicted frame image (the third predicted frame image).
In this embodiment, the first depth and the second depth are obtained according to the first frame image at the time t-1, the second frame image at the time t +1, and the depth network model. The depth can be estimated from a single picture through a depth network by using a machine learning method, so that the method is quick and effective.
In addition, when the depth is obtained through the depth network, the granularity of the depth network (coarse-grained or fine-grained) can be controlled through the learning rate defined in the training process.
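The patent does not fix an architecture for the depth network model; the following is a minimal, generic encoder-decoder sketch of a network that maps a single RGB frame to a per-pixel depth map, included purely for illustration.

```python
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    """Illustrative single-image depth network (not the patented model)."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
            nn.Softplus(),                      # depth values are non-negative
        )

    def forward(self, frame):                   # frame: (N, 3, H, W), H and W divisible by 4
        return self.decoder(self.encoder(frame))   # (N, 1, H, W) depth map

# The coarse- vs fine-grained behaviour mentioned above is influenced by the
# learning rate chosen for training, e.g.
# optimizer = torch.optim.Adam(TinyDepthNet().parameters(), lr=1e-4)
```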
A depth operation is thus added, and the depth is converted into a probability of pixel occurrence; then, according to this pixel probability, a probability operation is performed on the estimated t-time frame image (which may be called the fourth predicted frame image) to obtain a predicted frame image processed by the probability operation. The first predicted frame image and the second predicted frame image obtained in this way are clearer, more reasonable and closer to the real frame image than the first t-time frame image and the second t-time frame image obtained by the optical flow method alone.
The larger the depth value, the farther the subject is from the lens and the lower the occurrence probability of the pixel; the smaller the depth value, the closer the subject is to the lens and the higher the occurrence probability of the pixel. In this embodiment, a value in (0, 1) representing the occurrence probability of each pixel is obtained by applying a function with opposite monotonicity to the first depth.
A function with opposite monotonicity is a function whose value changes in the opposite direction to its argument, for example an inversely proportional function.
The probability operation is an element-wise multiplication of the occurrence probability of each pixel with the fourth predicted frame image.
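Written out, with d_i the depth of pixel i and the inversely proportional map 1/(1+d) taken as one possible choice of the function with opposite monotonicity:

\[
p_i = \frac{1}{1 + d_i} \in (0, 1), \qquad
I_t^{\mathrm{first}}(i) = p_i \cdot I_t^{\mathrm{fourth}}(i),
\]

where I_t^fourth is the fourth predicted frame image obtained by warping and I_t^first is the first predicted frame image after the probability operation; pixels that were far from the lens (large d_i) are weakened and pixels that were close (small d_i) are strengthened.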
Accordingly, the predicted frame image synthesized in fig. 3 is also closer to the real frame image than the t-time frame image synthesized in fig. 1 from the first t-time frame image and the second t-time frame image. The synthesized predicted frame image of fig. 3 is used as the frame image at time t and inserted between the frame image at time t-1 and the frame image at time t+1, making the video picture smoother and the viewing experience better.
Fig. 4 shows a specific flowchart of another embodiment to which the above image processing method is applied. In this embodiment, the test operations and their results during an automated test are captured as screenshots, and a screenshot is sent to the backend every n milliseconds (for example, every 1000 milliseconds). After receiving the screenshots, the backend performs frame interpolation with the above image processing method, restoring and synthesizing a smooth and complete video of the user's operations so that developers can locate the problem scenario.
Since the image processing method is based on the optical flow method and depth estimation, it is particularly suitable for test scenes that include the following operations: mouse movement, keyboard input, and option selection.
In the embodiment shown in fig. 4, all operations shown in fig. 3 are implemented by the first frame prediction model to complete the frame prediction, wherein the optical flow operation is performed by the optical flow prediction model, and the depth operation is performed by the depth prediction model.
As shown in fig. 4, the process of performing frame interpolation with the image processing method of the embodiment of the present application includes: inputting the frame image at time t-1 and the frame image at time t+1 into the first frame prediction model, which completes the frame prediction flow shown in fig. 3 and outputs a third predicted frame image (the synthesized predicted frame image); then inputting the third predicted frame image into a second frame prediction model, which is a generative adversarial neural network model and generates a fifth predicted frame image from the third predicted frame image; and finally inserting the fifth predicted frame image between the frame image at time t-1 and the frame image at time t+1 as the frame image at time t.
In the application scenario of this embodiment, the frame images need to be sent to the backend server over a communication network. Therefore, before frame synthesis, an additional step is added in which the predicted frame images processed by the probability operation are compressed and decompressed by a frame compression and decompression network, so that the amount of data transmitted over the network is reduced as much as possible and transmission bandwidth is saved.
In addition, due to factors such as an unstable display voltage of the screen, the brightness may change during the test. In that case a frame image may be deformed or even an unreasonable frame image may be generated, so that the picture jumps and shakes, and the real image at the moment an error occurs during the test may not be captured.
For this purpose, the frame interpolation method shown in fig. 4 further adds a second frame prediction model after the third predicted frame image is output by the first frame prediction model, and determines a fifth predicted frame image according to the third predicted frame image and the second frame prediction model.
The second frame prediction model is a generative adversarial neural network model, and the generative model and the discriminative model of this generative adversarial neural network are based on the same image generation network model. Therefore, the fifth predicted frame image generated by the second frame prediction model conforms to the motion rules and image logic of the pixels, so that the interpolated frame image is more reasonable and connects better with the preceding and following frames.
In order to make the embodiment shown in fig. 4 capable of performing frame image prediction with higher accuracy, the embodiment also performs three-stage model training as shown in fig. 5.
During the training of the optical flow prediction model, the depth prediction model and the generative adversarial neural network model, pre-trained weights are also used. As shown in fig. 5, the specific training process includes:
and the first training stage is used for pre-training an optical flow prediction model and a depth prediction model in the first frame prediction model.
The learning rate during pre-training can be set to a smaller learning rate for coarse-grained depth prediction.
Training data (a frame image at time t-1 and a frame image at time t+1, labelled with the real frame image at time t) are input into the first frame prediction model; the first frame prediction model performs its operations and outputs a predicted frame image, which is compared with the labelled real frame image to calculate the loss function and optimize the first frame prediction model.
The loss function may use the end-point error (EPE), i.e., the average of the Euclidean distances between the true values and the predicted values over all pixel points.
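As a sketch of the first training stage, the EPE loss and one pre-training step could look as follows; the model's call signature, the optimiser and the variable names are assumptions, not part of the patent.

```python
import torch

def epe_loss(pred_frame: torch.Tensor, real_frame: torch.Tensor) -> torch.Tensor:
    """Average end-point error: mean Euclidean distance, over all pixels,
    between the predicted and the labelled (real) values."""
    return torch.linalg.vector_norm(pred_frame - real_frame, dim=1).mean()

def pretrain_step(first_frame_model, optimizer, frame_prev, frame_next, frame_gt):
    """One optimisation step of the first frame prediction model.

    frame_prev, frame_next: frames at t-1 and t+1; frame_gt: real frame at t.
    """
    optimizer.zero_grad()
    pred = first_frame_model(frame_prev, frame_next)   # third predicted frame image
    loss = epe_loss(pred, frame_gt)
    loss.backward()
    optimizer.step()
    return loss.item()
```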
After the first-stage training is finished, the model parameters of the optical flow prediction model and the depth prediction model are locked.
The second training stage performs adversarial training on the generative adversarial neural network model in the second frame prediction model.
The training data and steps used for the adversarial training of the generative adversarial neural network model in the second frame prediction model include:
taking the real frame image at the time t, the frame image at the time t-1, the frame image at the time t +1 and other real frame images in the continuous frame image time sequence as the input of a generation model of a second frame prediction model, setting a training label to be 1, and training the generation model of the second frame prediction model;
taking the result of the first frame prediction model as the input of the generation model of the second frame prediction model, setting the training label as 1, and training the generation model of the second frame prediction model;
and taking the result of the first frame prediction model as the input of the discrimination model of the second frame prediction model, setting the training label to be 0, and training the discrimination model of the second frame prediction model.
The generation model and the discrimination model of the second frame prediction model are trained alternately with the above training data; after each training pass, the generative adversarial neural network loss function is calculated and the second frame prediction model is optimized, until convergence.
When training the second frame prediction model, binary cross-entropy (BCE) can be used as the loss function of the generative adversarial neural network.
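A condensed sketch of one alternating step of this stage is given below. The model interfaces, the optimisers, and the exact mapping of the training-data uses above onto a BCE objective (generated frames with label 1 for the generation-model pass, label 0 for the discrimination-model pass, and real frames with label 1 for the discrimination model) are assumptions made for illustration.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()   # binary cross-entropy on raw discriminator outputs

def adversarial_step(generator, discriminator, g_opt, d_opt,
                     coarse_pred, real_frame):
    """One alternating training step of the second stage.

    coarse_pred : third predicted frame image output by the first frame
                  prediction model (its parameters stay locked here).
    real_frame  : a real frame from the continuous frame image time sequence.
    """
    # Discrimination model: generated frames carry label 0, real frames label 1.
    d_opt.zero_grad()
    refined = generator(coarse_pred).detach()
    d_fake = discriminator(refined)
    d_real = discriminator(real_frame)
    d_loss = bce(d_fake, torch.zeros_like(d_fake)) + \
             bce(d_real, torch.ones_like(d_real))
    d_loss.backward()
    d_opt.step()

    # Generation model: trained with label 1 so the refined frame passes as real.
    g_opt.zero_grad()
    d_out = discriminator(generator(coarse_pred))
    g_loss = bce(d_out, torch.ones_like(d_out))
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```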
As described above, since the generation model and the discrimination model of the generative adversarial neural network are based on the same image generation network model, when the training reaches equilibrium, that is, when the discrimination model can no longer tell that a generated picture is fake, the generated predicted frame image must be an image frame that connects reasonably with the preceding and following frames.
The third training stage performs joint training on the first frame prediction model pre-trained in the first training stage and the second frame prediction model adversarially trained in the second training stage.
In this training stage, all parameters of all models can be unlocked; the frame image at time t-1 and the frame image at time t+1, labelled with the real frame image at time t, are used as training data, the predicted frame image output by the second frame prediction model is compared with the real frame image at time t to calculate the loss function, and the second frame prediction model is optimized.
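A minimal sketch of one joint step, assuming both models are standard nn.Module instances and that "unlocking" simply means re-enabling gradients on the parameters frozen after the first stage:

```python
import itertools
import torch

def joint_step(first_frame_model, second_frame_model, optimizer,
               frame_prev, frame_next, frame_gt, loss_fn):
    """One joint optimisation step over both frame prediction models."""
    # Unlock all parameters (the first-stage weights were frozen earlier).
    for p in itertools.chain(first_frame_model.parameters(),
                             second_frame_model.parameters()):
        p.requires_grad_(True)

    optimizer.zero_grad()
    coarse = first_frame_model(frame_prev, frame_next)   # third predicted frame image
    refined = second_frame_model(coarse)                 # fifth predicted frame image
    loss = loss_fn(refined, frame_gt)    # compare with the labelled real frame at t
    loss.backward()
    optimizer.step()
    return loss.item()
```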
It should be noted that the embodiments shown in fig. 4 to fig. 5 are only exemplary illustrations of the image processing method of the present application, and are not limited to the implementation and the application scenarios, and an implementer may flexibly adopt any applicable implementation to apply to any applicable application scenario according to specific implementation requirements and implementation conditions.
Further, the embodiment of the application also provides an image processing device. As shown in fig. 6, the apparatus 60 includes: the optical flow prediction module 601 is configured to perform optical flow calculation according to a first frame image at a time t-1 and a second frame image at a time t +1 to obtain a first optical flow and a second optical flow respectively, where t is a natural number, where the first optical flow is an optical flow from a frame image at the time t-1 to a frame image at the time t +1, and the second optical flow is an optical flow from a frame image at the time t +1 to a frame image at the time t-1; a depth prediction module 602, configured to perform depth operation according to the first frame image and the second frame image to obtain a first depth and a second depth, respectively; a first predicted frame image prediction module 603 configured to determine a first predicted frame image according to the first frame image, the first optical flow, and the first depth; a second predicted frame image prediction module 604 for determining a second predicted frame image based on the second frame image, the second optical flow, and the second depth; and a frame synthesis module 605, configured to synthesize the first predicted frame image and the second predicted frame image to obtain a third predicted frame image.
According to an embodiment of the present application, the depth prediction module 602 is specifically configured to obtain a first depth and a second depth according to a first frame image at a time t-1, a second frame image at a time t +1, and a depth network model.
According to an embodiment of the present application, the first predicted frame image prediction module 603 includes: the fourth prediction frame image prediction sub-module is used for obtaining a fourth prediction frame image according to the first frame image and the first optical flow; the monotonicity opposite function operation submodule is used for carrying out monotonicity opposite function operation on the first depth to obtain the occurrence probability of each pixel point; and the first prediction frame image determining submodule is used for determining the first prediction frame image according to the fourth prediction frame image and the occurrence probability of each pixel point.
According to an embodiment of the present application, the apparatus 60 includes: and the first frame prediction model operation module is used for realizing the process from the optical flow operation of the first frame image at the time of t-1 and the second frame image at the time of t +1 to the synthesis of the first prediction frame image and the second prediction frame image to obtain a third prediction frame image.
According to an embodiment of the present application, the apparatus 60 further includes: a second frame prediction model operation module, used for determining a fifth predicted frame image according to the third predicted frame image and the second frame prediction model, wherein the second frame prediction model is a generative adversarial neural network model, and the generative model and the discrimination model of the generative adversarial neural network model are based on the same image generation network model.
According to an embodiment of the present application, the apparatus 60 further includes: and the first frame prediction model training module is used for pre-training the first frame prediction model to obtain the pre-trained first frame prediction model.
According to an embodiment of the present application, the apparatus 60 further includes: a second frame prediction model training module, used for establishing a generative adversarial neural network model based on the pre-trained first frame prediction model to obtain the second frame prediction model, and for performing adversarial training on the second frame prediction model to obtain the adversarially trained second frame prediction model.
According to an embodiment of the present application, the second frame prediction model training module includes: the training submodule of the generating model is used for taking the result of the first frame prediction model as the input of the generating model of the second frame prediction model, setting the training label as 1 and training the generating model of the second frame prediction model; and the training submodule of the discrimination model is used for taking the result of the first frame prediction model as the input of the discrimination model of the second frame prediction model, setting the training label to be 0 and training the discrimination model of the second frame prediction model.
According to an embodiment of the present application, the apparatus 60 further includes: a joint training module, used for performing joint training on the first frame prediction model and the adversarially trained second frame prediction model.
According to a third aspect of embodiments herein, there is provided a computer-readable storage medium comprising a set of computer-executable instructions which, when executed, are operable to perform any of the image processing methods described above.
Here, it should be noted that: the above description of the image processing embodiment and the above description of the computer storage medium embodiment are similar to the description of the foregoing method embodiments, and have similar beneficial effects to the foregoing method embodiments, and therefore, the description is omitted here for brevity. For technical details that have not been disclosed in the present application for describing embodiments of an image processing apparatus and embodiments of a computer storage medium, please refer to the description of the foregoing method embodiments of the present application for understanding, and therefore will not be described again for brevity.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of a unit is only one logical function division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another device, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media capable of storing program codes, such as a removable storage medium, a Read Only Memory (ROM), a magnetic disk, and an optical disk.
Alternatively, the integrated units described above in the present application may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or portions thereof that contribute to the prior art may be embodied in the form of a software product stored in a storage medium, and including several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a removable storage medium, a ROM, a magnetic disk, an optical disk, or the like, which can store the program code.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of image processing, the method comprising:
performing optical flow operation according to a first frame image at the t-1 moment and a second frame image at the t +1 moment to respectively obtain a first optical flow and a second optical flow, wherein t is a natural number, the first optical flow is from a frame image at the t-1 moment to a frame image at the t +1 moment, and the second optical flow is from the frame image at the t +1 moment to the frame image at the t-1 moment;
performing depth operation according to the first frame image and the second frame image to respectively obtain a first depth and a second depth;
determining a first predicted frame image from the first frame image, the first optical flow, and the first depth;
determining a second predicted frame image from the second frame image, the second optical flow, and the second depth;
and synthesizing the first predicted frame image and the second predicted frame image to obtain a third predicted frame image.
2. The method of claim 1, wherein performing a depth operation according to the first frame image and the second frame image to obtain a first depth and a second depth respectively comprises:
and obtaining a first depth and a second depth according to the first frame image at the time t-1, the second frame image at the time t +1 and the depth network model.
3. The method of claim 1, the determining a first predicted frame image from the first frame image, the first optical flow, and the first depth, comprising:
obtaining a fourth predicted frame image according to the first frame image and the first optical flow;
performing monotonicity opposite function operation on the first depth to obtain the occurrence probability of each pixel point;
and determining a first predicted frame image according to the fourth predicted frame image and the occurrence probability of each pixel point.
4. The method according to claim 1, wherein the process from performing optical flow operation on the first frame image at the time t-1 and the second frame image at the time t +1 to synthesizing the first predicted frame image and the second predicted frame image to obtain the third predicted frame image is implemented by the first frame prediction model.
5. The method of claim 4, after said obtaining a third predicted frame image, further comprising:
and determining a fifth predicted frame image according to the third predicted frame image and a second frame prediction model, wherein the second frame prediction model is a generative adversarial neural network model, and the generative model and the discriminant model of the generative adversarial neural network model are based on the same image generation network model.
6. The method of claim 5, prior to said obtaining a third predicted frame image, further comprising:
and pre-training the first frame prediction model to obtain the pre-trained first frame prediction model.
7. The method of claim 6, prior to said determining a fifth predicted frame image from said third predicted frame image and the second frame prediction model, further comprising:
establishing a generative adversarial neural network model based on the pre-trained first frame prediction model to obtain a second frame prediction model;
and performing adversarial training on the second frame prediction model to obtain the adversarially trained second frame prediction model.
8. The method of claim 7, wherein the adversarial training of the second frame prediction model comprises:
taking the result of the first frame prediction model as the input of the generation model of the second frame prediction model, setting a training label as 1, and training the generation model of the second frame prediction model;
and taking the result of the first frame prediction model as the input of the discrimination model of the second frame prediction model, setting a training label to be 0, and training the discrimination model of the second frame prediction model.
9. The method of claim 7, wherein after the adversarial training of the second frame prediction model to obtain the adversarially trained second frame prediction model, the method further comprises:
and performing joint training on the first frame prediction model and the adversarially trained second frame prediction model.
10. An image processing apparatus, the apparatus comprising:
the optical flow prediction module is used for carrying out optical flow operation according to a first frame image at the t-1 moment and a second frame image at the t +1 moment to respectively obtain a first optical flow and a second optical flow, wherein t is a natural number, the first optical flow is from a frame image at the t-1 moment to a frame image at the t +1 moment, and the second optical flow is from the frame image at the t +1 moment to the frame image at the t-1 moment;
the depth prediction module is used for carrying out depth operation according to the first frame image and the second frame image to respectively obtain a first depth and a second depth;
a first predicted frame image prediction module for determining a first predicted frame image from the first frame image, the first optical flow, and the first depth;
a second predicted frame image prediction module for determining a second predicted frame image from the second frame image, the second optical flow, and the second depth;
and the frame synthesis module is used for synthesizing the first predicted frame image and the second predicted frame image to obtain a third predicted frame image.
CN202111250405.7A 2021-10-26 2021-10-26 Image processing method and device Pending CN114066946A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111250405.7A CN114066946A (en) 2021-10-26 2021-10-26 Image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111250405.7A CN114066946A (en) 2021-10-26 2021-10-26 Image processing method and device

Publications (1)

Publication Number Publication Date
CN114066946A true CN114066946A (en) 2022-02-18

Family

ID=80235713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111250405.7A Pending CN114066946A (en) 2021-10-26 2021-10-26 Image processing method and device

Country Status (1)

Country Link
CN (1) CN114066946A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116033183A (en) * 2022-12-21 2023-04-28 上海哔哩哔哩科技有限公司 Video frame inserting method and device

Similar Documents

Publication Publication Date Title
US8139152B2 (en) Image processing apparatus, image processing method and program
US7440619B2 (en) Image matching method and image interpolation method using the same
WO1995006297A1 (en) Example-based image analysis and synthesis using pixelwise correspondence
WO2023103576A1 (en) Video processing method and apparatus, and computer device and storage medium
CN113724155B (en) Self-lifting learning method, device and equipment for self-supervision monocular depth estimation
CN112055249B (en) Video frame interpolation method and device
CN114339030B (en) Network live video image stabilizing method based on self-adaptive separable convolution
KR100720722B1 (en) Intermediate vector interpolation method and 3D display apparatus
CN109903315A (en) Method, apparatus, equipment and readable storage medium storing program for executing for light stream prediction
CN111371983A (en) Video online stabilization method and system
CN116170650A (en) Video frame inserting method and device
CN114066946A (en) Image processing method and device
CN113658231A (en) Optical flow prediction method, optical flow prediction device, electronic device, and storage medium
US20070076978A1 (en) Moving image generating apparatus, moving image generating method and program therefor
JP2001285881A (en) Digital information converter and method, and image information converter and method
JPH11203483A (en) Image processor, its method and providing medium
KR102600721B1 (en) VR video quality evaluation method and device
KR100810391B1 (en) Frame rate up conversion using motion interpolation
US11533451B2 (en) System and method for frame rate up-conversion of video data
CN116527956B (en) Virtual object live broadcast method, device and system based on target event triggering
Maggia Video outpainting using conditional generative adverarial networks
CN113382247B (en) Video compression sensing system and method based on interval observation, equipment and storage medium
CN115761065A (en) Intermediate frame generation method, device, equipment and medium
박샘 Spatial and temporal Super-Resolution using DNN for edge devices
CN117474853A (en) Light field image quality assessment method and system based on CLIP model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination