CN116033183A - Video frame interpolation method and device
- Publication number: CN116033183A
- Application number: CN202211648783.5A
- Authority: CN (China)
- Prior art keywords: frame, video frame, video, optical flow, target
- Legal status: Pending (assumed; not a legal conclusion)
Classifications
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
Abstract
The application provides a video frame interpolation method and apparatus, wherein the method comprises the following steps: acquiring two consecutive frames, a first video frame and a second video frame, from a video to be interpolated; determining an inter-frame optical flow map corresponding to the first video frame and the second video frame, the inter-frame optical flow map indicating motion information of each pixel from the first video frame to the second video frame; determining a target interpolation time between the first video frame and the second video frame based on the inter-frame optical flow map; and determining a corresponding target composite frame according to the first video frame, the second video frame and the target interpolation time, and inserting the target composite frame between the first video frame and the second video frame. The frame is thus interpolated at an arbitrary position between the first and second video frames, exploiting the property that the closer the interpolated frame is to either input frame, the fewer artifacts it exhibits; this improves the accuracy of generating a composite frame from two consecutive video frames, and hence the interpolation quality and effect.
Description
Technical Field
The present disclosure relates to the field of video processing technologies, and in particular to a video frame interpolation method. The present application also relates to a video frame interpolation apparatus, a computing device, and a computer-readable storage medium.
Background
With the rapid development of computer and network technology, videos of all kinds have proliferated, and watching video has become an important part of people's work, relaxation and entertainment. To improve the frame rate and fluency of a video, a composite frame can be inserted between two consecutive video frames, shortening the inter-frame display time.
In the prior art, the picture of the previous or next frame can be copied as the composite frame and inserted between two consecutive video frames (a copied frame); or the preceding and following video frames can be blended with a double-exposure-like blurring process to obtain the composite frame (a blended frame); or interpolation can be performed with a deep learning model, which analyzes and models the two frames to generate an optical flow, obtains the linear inter-frame mapping, and finally synthesizes the composite frame.

However, the first method raises the frame rate by wholly copying the previous or next frame, so it brings no visual improvement and the video can still appear to stutter. The second method references the information of both frames, but the simple double-exposure blur causes severe artifacts, and alternating one sharp frame with one blurred frame additionally burdens video encoding and decoding. In the third, optical-flow-based method, when a genuinely large optical flow exists between the original two frames, severe artifacts can appear in the composite frame due to inaccurate optical flow estimation and similar causes. That is, with existing video frame interpolation methods, the accuracy of generating a composite frame from two consecutive video frames under large motion is low, severe artifacts may appear, and the interpolation quality and effect are poor.
Disclosure of Invention
In view of this, embodiments of the present application provide a video frame interpolation method. The application also relates to a video frame interpolation apparatus, a computing device and a computer-readable storage medium, to solve the technical problems in the prior art that the accuracy of generating a composite frame from two consecutive video frames is low, severe artifacts may appear, and the interpolation quality and effect are poor.
According to a first aspect of the embodiments of the present application, there is provided a video frame interpolation method, comprising:

acquiring two consecutive frames, a first video frame and a second video frame, from a video to be interpolated;

determining an inter-frame optical flow map corresponding to the first video frame and the second video frame, wherein the inter-frame optical flow map indicates motion information of each pixel from the first video frame to the second video frame;

determining a target interpolation time between the first video frame and the second video frame based on the inter-frame optical flow map; and

determining a corresponding target composite frame according to the first video frame, the second video frame and the target interpolation time, and inserting the target composite frame between the first video frame and the second video frame.
According to a second aspect of the embodiments of the present application, there is provided a video frame interpolation apparatus, comprising:

an acquisition module configured to acquire two consecutive frames, a first video frame and a second video frame, from a video to be interpolated;

a first determining module configured to determine an inter-frame optical flow map corresponding to the first video frame and the second video frame, wherein the inter-frame optical flow map indicates motion information of each pixel from the first video frame to the second video frame;

a second determining module configured to determine a target interpolation time between the first video frame and the second video frame based on the inter-frame optical flow map; and

an insertion module configured to determine a corresponding target composite frame according to the first video frame, the second video frame and the target interpolation time, and to insert the target composite frame between the first video frame and the second video frame.
According to a third aspect of the embodiments of the present application, there is provided a computing device, comprising:

a memory and a processor;

the memory being configured to store computer-executable instructions, and the processor being configured to execute the computer-executable instructions to implement the following method:

acquiring two consecutive frames, a first video frame and a second video frame, from a video to be interpolated;

determining an inter-frame optical flow map corresponding to the first video frame and the second video frame, wherein the inter-frame optical flow map indicates motion information of each pixel from the first video frame to the second video frame;

determining a target interpolation time between the first video frame and the second video frame based on the inter-frame optical flow map; and

determining a corresponding target composite frame according to the first video frame, the second video frame and the target interpolation time, and inserting the target composite frame between the first video frame and the second video frame.
According to a fourth aspect of the embodiments of the present application, there is provided a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of any of the video frame interpolation methods described herein.
With the video frame interpolation method provided by the present application, two consecutive frames, a first video frame and a second video frame, are acquired from the video to be interpolated; an inter-frame optical flow map corresponding to the two frames is determined, indicating the motion information of each pixel from the first video frame to the second; a target interpolation time between the two frames is determined based on the inter-frame optical flow map; and a corresponding target composite frame is determined according to the first video frame, the second video frame and the target interpolation time, and inserted between them.

In this way, the target interpolation time between the first and second video frames can first be determined based on their inter-frame optical flow map, and the target composite frame corresponding to that time can then be determined from the two frames, adaptively guiding the interpolation time forward or backward so that it lies closer to the first or the second video frame; this avoids a target composite frame that differs greatly from both neighboring frames. Interpolating at an arbitrary position between the first and second video frames, and exploiting the property that the closer the interpolated frame is to either input frame the fewer artifacts it exhibits, improves the accuracy of generating a composite frame from two consecutive video frames, greatly alleviates the artifacts that large motion between the two frames may cause, and thus improves the interpolation quality and effect.
Drawings
- FIG. 1 is a flowchart of a video frame interpolation method according to an embodiment of the present application;
- FIG. 2 is an inter-frame optical flow map provided by an embodiment of the present application;
- FIG. 3a is a schematic diagram of the processing procedure of a video frame interpolation method applied to a 2x interpolation scenario according to an embodiment of the present application;
- FIG. 3b is a schematic diagram of a first video frame according to an embodiment of the present application;
- FIG. 3c is a schematic diagram of a second video frame according to an embodiment of the present application;
- FIG. 3d is a schematic diagram of a composite frame according to an embodiment of the present application;
- FIG. 3e is a schematic diagram of another composite frame provided in an embodiment of the present application;
- FIG. 4 is a schematic structural diagram of a video frame interpolation apparatus according to an embodiment of the present application;
- FIG. 5 is a structural block diagram of a computing device according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, this application is susceptible of embodiment in many other ways than those described herein, and those skilled in the art can make similar generalizations without departing from its spirit; the application is therefore not limited to the specific embodiments disclosed below.
The terminology used in one or more embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of one or more embodiments of the application. As used in this application in one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of the present application to describe various information, the information should not be limited by these terms, which are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of the present application, a first may also be referred to as a second and, similarly, a second as a first. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon" or "in response to determining".
First, terms used in one or more embodiments of the present application are explained.

Video interpolation: mainly refers to inserting 1 frame between two consecutive video frames, i.e., 2x interpolation.

Optical flow: the instantaneous velocity of the pixel motion of a spatially moving object on the observation imaging plane.

Artifacts: visible distortions, anomalies or ghosting in the composite frame.
It should be noted that three methods of frame interpolation are currently common. The first copies the picture of the previous or next frame as the composite frame and inserts it between the two consecutive video frames (the copied frame); the second applies a double-exposure-like blurring to the preceding and following video frames to obtain the composite frame (the blended frame); the third interpolates with a deep learning model, analyzing and modeling the two frames to generate optical flow, obtaining the linear inter-frame mapping, and finally synthesizing the composite frame.

Among these, the first relies on wholly copying the previous or next frame to raise the frame rate; in practice it brings no visual improvement and the video can still appear to stutter. The second references the information of both frames, but its simple double-exposure blur causes severe artifacts, and alternating one sharp frame with one blurred frame additionally burdens video encoding and decoding. The third, based on a deep learning model, effectively models the mapping between the target intermediate frame and the two neighboring frames by fitting and similar means, so the restored composite frame is more reasonable than with the first two methods; a large body of practical results likewise shows that deep-learning interpolation far outperforms copied-frame and blended-frame interpolation.
Although deep-learning interpolation can produce sufficiently satisfactory results, it shares a common problem: when a genuinely large optical flow exists between the original two frames, severe artifacts can appear in the synthesized target composite frame due to inaccurate optical flow estimation and similar causes. For example, when the large flow comes from a swinging arm, the synthesized arm often appears "amputated"; when it comes from fast foreground/background motion, the synthesized foreground/background often appears blurred.

The common case is 2x interpolation, i.e., interpolating at the midpoint of the two frames, and the flow the model generates there is relatively inaccurate: the midpoint is farthest from both frame 1 (position 0) and frame 2 (position 1), and a longer temporal distance means a larger flow and a larger difference from the original images. When a large flow exists between the two input frames, the model's estimate is often inaccurate, degrading the composite frame and harming the overall viewing experience.

To address the generally poor performance of deep learning models on large-motion interpolation, an optical-flow-based interpolation model can interpolate at any temporal position (0-1) between the two frames, and the closer that time is to either input frame, the fewer the interpolation artifacts. An additional optical flow estimation model judges the magnitude of the inter-frame flow and determines the interpolation time, avoiding the severe artifacts that may occur when the model interpolates directly at the middle time and greatly improving the perceived quality of the final result.
The present application provides a video frame interpolation method, and further relates to a video frame interpolation apparatus, a computing device and a computer-readable storage medium, which are described in detail in the following embodiments.

It should be noted that when a user edits video on a video processing platform, interpolation can improve the fluency of the video or enable smooth slow-motion playback without stuttering; these functions can be implemented by a video frame interpolation method, so interpolation quality is vital to the user experience.
FIG. 1 shows a flowchart of a video frame interpolation method according to an embodiment of the present application, which specifically includes the following steps:

Step 102: acquire two consecutive frames, a first video frame and a second video frame, from the video to be interpolated.
It should be noted that the video to be interpolated is a video into which an intermediate composite frame is to be inserted between two temporally consecutive video frames. The first and second video frames are two consecutive frames of that video, the first preceding the second in time.

In practice, the video processing platform can sample the video to be interpolated at a set frequency to obtain its video frames, select two consecutive ones, and take the temporally earlier frame as the first video frame and the later one as the second, as in the sketch below.

For example, suppose the video to be interpolated is sampled into video frame 1, video frame 2, video frame 3, ..., video frame N-1 and video frame N, arranged in temporal order. Then video frames 1 and 2 are consecutive, with frame 1 the first video frame and frame 2 the second; video frames 2 and 3 are consecutive, with frame 2 the first and frame 3 the second; ...; video frames N-1 and N are consecutive, with frame N-1 the first and frame N the second.
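To make the pairing concrete, here is a minimal sketch assuming OpenCV is used for decoding; the file name and helper name are illustrative, not part of the patent.

```python
import cv2

def iter_frame_pairs(path):
    """Yield (first_video_frame, second_video_frame) for every pair of
    temporally consecutive frames in the video to be interpolated."""
    cap = cv2.VideoCapture(path)
    ok, prev = cap.read()
    while ok:
        ok, cur = cap.read()
        if not ok:
            break
        yield prev, cur  # prev precedes cur in time
        prev = cur
    cap.release()

for i0, i1 in iter_frame_pairs("input.mp4"):  # hypothetical input file
    ...  # estimate flow, pick a target time t, synthesize and insert a frame
```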
In an optional implementation of this embodiment, after the consecutive first and second video frames are acquired from the video to be interpolated, they may further be scaled; that is, after acquiring the first and second video frames from the video to be interpolated, the method further includes:

scaling the first video frame and the second video frame by a set factor to obtain a first updated video frame and a second updated video frame.
Specifically, the set factor is a preset value indicating the scale applied to the first and second video frames. To reduce their size and speed up analysis, it can be set below 1, e.g., 0.25 or 0.5.

Since a smaller scale means faster analysis but lower accuracy, the factor must balance analysis speed against analysis accuracy; relative to other factors, 0.25 balances analysis speed and optical flow estimation accuracy well, so preferably the set factor may be 0.25.
In practice, the scaling can be implemented by nearest-neighbor interpolation, cubic spline interpolation, linear interpolation, area interpolation or similar methods, scaling the first and second video frames by the set factor to obtain the first and second updated video frames.

For example, suppose the first video frame is I0 and the second is I1. I0 and I1 can be scaled to 0.25x their original size, i.e., downsampled 4x, to obtain updated video frames I0' and I1'; subsequent processing can then operate on I0' and I1' to determine the corresponding inter-frame optical flow map.
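Continuing the sketch above, the 4x downsampling could look as follows; cv2.resize with a linear kernel is one of the interpolation methods the text mentions, chosen here as an assumption.

```python
import cv2

def downscale(frame, factor=0.25):
    # Scale both axes by `factor`; INTER_LINEAR corresponds to the linear
    # interpolation option mentioned above.
    return cv2.resize(frame, None, fx=factor, fy=factor,
                      interpolation=cv2.INTER_LINEAR)

i0_small = downscale(i0)  # first updated video frame, I0'
i1_small = downscale(i1)  # second updated video frame, I1'
```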
In this embodiment, scaling the first and second video frames after acquisition reduces their size and, on the basis of preserving sufficient accuracy, improves the analysis speed in the subsequent optical flow analysis.
Step 104: determine an inter-frame optical flow map corresponding to the first video frame and the second video frame, wherein the inter-frame optical flow map indicates motion information of each pixel from the first video frame to the second video frame.

After the first and second video frames are acquired, their inter-frame optical flow map can be determined. Optical flow is a motion vector representing the direction and distance a pixel moves, so the inter-frame optical flow map indicates the motion information of each pixel from the first video frame to the second, i.e., how each pixel changes.

In practice, the first and second video frames may be input into a trained optical flow estimation model, which outputs the inter-frame optical flow map. The model may be any deep-learning optical flow estimator (e.g., RAFT (Recurrent All-Pairs Field Transforms for Optical Flow), a recent optical flow architecture, or FlowNet (Learning Optical Flow with Convolutional Networks), which predicts flow with a convolutional network, etc.), or a conventional optical flow algorithm (e.g., Lucas-Kanade, a two-frame differential method).

The inter-frame optical flow map may represent the flow magnitude from a first timestamp to a second timestamp (or from the second to the first; direction is not considered here), where the first timestamp is that of the first video frame and the second that of the second. FIG. 2 shows an inter-frame optical flow map according to an embodiment of the present application; as shown in FIG. 2, the pixel value of each point in the map may represent that point's optical flow intensity, e.g., the larger the value, the darker the color and the stronger the flow.

In addition, training data for optical flow estimation models is mostly synthetic: during 3D game production, for instance, the computer generates motion vectors, i.e., optical flows, for game characters and scenes, so two consecutive rendered frames plus the corresponding flow map form one training pair. The initial model takes the two frames as input, uses the flow map as the label, and is trained in a supervised manner to obtain the trained optical flow estimation model.
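As an illustration only, flow estimation with RAFT as packaged in torchvision (one of the models named above) might look like the sketch below; the preprocessing follows torchvision's preset transforms, and the divisible-by-8 constraint is a property of that implementation, not of the patent.

```python
import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

weights = Raft_Large_Weights.DEFAULT
flow_model = raft_large(weights=weights).eval()
preprocess = weights.transforms()

def estimate_inter_frame_flow(frame0, frame1):
    """frame0, frame1: (1, 3, H, W) uint8 tensors with H and W divisible
    by 8. Returns the inter-frame optical flow map, shape (1, 2, H, W)."""
    img0, img1 = preprocess(frame0, frame1)
    with torch.no_grad():
        flow_predictions = flow_model(img0, img1)  # iterative refinements
    return flow_predictions[-1]  # final per-pixel (dx, dy)
```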
In an optional implementation of this embodiment, if scaling is performed after the first and second video frames are acquired, determining the inter-frame optical flow map may be implemented as follows:

inputting the first updated video frame and the second updated video frame into the trained optical flow estimation model to obtain the inter-frame optical flow map it outputs.

That is, if the acquired first and second video frames have been scaled by the set factor into the first and second updated video frames, the updated frames may be input into the trained optical flow estimation model to obtain the inter-frame optical flow map. Reducing the frame sizes and analyzing the reduced frames with the model improves analysis speed on the basis of preserving sufficient accuracy.
Step 106: determine a target interpolation time between the first video frame and the second video frame based on the inter-frame optical flow map.

Because the inter-frame optical flow map indicates the motion of each pixel from the first video frame to the second, the motion between the two frames can be analyzed from it, and a target interpolation time suitable for inserting a video frame between them can be determined.

The method provided by this embodiment can be applied to 2x interpolation, i.e., inserting one video frame between two consecutive frames. In a specific implementation, the composite frame is not inserted directly at the midpoint of the two frames; instead, the motion of each pixel from the first frame to the second is analyzed and a suitable target interpolation time is determined, avoiding the artifacts caused by large motion.

Of course, in practice the method can also be applied to 4x, 8x and other interpolation scenarios; it is only necessary to determine, based on the inter-frame optical flow map, the corresponding number of target interpolation times between the first and second video frames.

In this embodiment, the determined target interpolation time may be any temporal position between the first and second video frames. That is, the magnitude of the inter-frame flow is judged by means of the additional optical flow estimation model to determine the interpolation time, so that the frame is interpolated at an arbitrary position between the two frames; this avoids the severe artifacts that may occur when interpolating directly at the middle time and greatly improves the perceived quality of the final result.
In an optional implementation of this embodiment, determining the target interpolation time between the first and second video frames based on the inter-frame optical flow map may proceed as follows:

determining an optical flow intensity index between the first video frame and the second video frame based on the inter-frame optical flow map; and

determining the target interpolation time corresponding to the optical flow intensity index according to a correspondence between optical flow intensity ranges and interpolation times.

The correspondence between optical flow intensity ranges and interpolation times may be preset in the video processing platform; once the optical flow intensity index between the two frames has been determined from the inter-frame optical flow map, the target interpolation time can be looked up directly from the correspondence.

In practice, the smaller the optical flow intensity index, the smaller the motion, and the middle time can be chosen as the target interpolation time; the larger the index, the larger the motion, and the closer the interpolation time should move toward the first or second video frame. Hence, in the correspondence, one optical flow intensity range may map to two interpolation times: one shifted forward, toward the first video frame, and one shifted backward, toward the second, the two shifts being symmetric. When determining the target interpolation time from the correspondence, either the forward- or backward-shifted time may be chosen, but the rule must stay consistent within one video: for a given video to be interpolated, whenever the index is large, every interpolation shifts forward, or every interpolation shifts backward.

For example, with the preset correspondence shown in Table 1 below, suppose the optical flow intensity index between the first and second video frames is determined to be 30. The index is large, so the interpolation time shifts forward, and from Table 1 the corresponding target interpolation time is t = 0.25 (the table is encoded in the sketch that follows it).
Table 1: Correspondence between optical flow intensity ranges and interpolation times

| Optical flow intensity range | Interpolation time |
| --- | --- |
| Less than or equal to 15 | 0.5 |
| More than 15 and less than or equal to 20 | 0.4 (or 0.6) |
| More than 20 and less than or equal to 25 | 0.3 (or 0.7) |
| Greater than 25 | 0.25 (or 0.75) |
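Table 1 translates directly into a lookup; the sketch below encodes its thresholds and the forward/backward symmetry described above (the direction flag is an illustrative parameter, not part of the patent).

```python
def target_insertion_time(ind, shift_backward=False):
    """Map an optical flow intensity index to a target interpolation time t
    per Table 1; for one video the shift direction must stay consistent."""
    if ind <= 15:
        t = 0.5
    elif ind <= 20:
        t = 0.4
    elif ind <= 25:
        t = 0.3
    else:
        t = 0.25
    return 1.0 - t if (shift_backward and t != 0.5) else t

assert target_insertion_time(30) == 0.25   # the worked example above
assert target_insertion_time(30, shift_backward=True) == 0.75
```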
In this embodiment, the pixel values of the inter-frame optical flow map can be analyzed to obtain each pixel's optical flow intensity and thereby the optical flow intensity index between the first and second video frames; the target interpolation time corresponding to that index is then read from the preset correspondence. The lookup is simple, which improves analysis efficiency, and because the inter-frame flow magnitude is judged with the additional optical flow estimation model to fix the concrete interpolation time, the severe artifacts that may occur when the model interpolates directly at the middle time are avoided and the perceived quality of the final result is greatly improved.
In an optional implementation of this embodiment, determining the optical flow intensity index between the first and second video frames based on the inter-frame optical flow map includes:

determining a horizontal-axis component and a vertical-axis component of a target pixel, the target pixel being any pixel in the inter-frame optical flow map;

determining an average horizontal-axis component from the horizontal-axis components of all pixels in the map, and an average vertical-axis component from their vertical-axis components; and

determining the optical flow intensity index between the first and second video frames based on the horizontal-axis components, the vertical-axis components, the average horizontal-axis component and the average vertical-axis component.
In practice, the optical flow intensity index between the first and second video frames can be computed by the following formula (1):

Ind = max( max99(Fx) / max(1.0, mean(abs(Fx))), max99(Fy) / max(1.0, mean(abs(Fy))) )    (1)

where Ind denotes the optical flow intensity index; max is the maximum function, mean the averaging function, and abs the absolute value. Fx is the horizontal-axis component of a pixel's optical flow F, and Fy its vertical-axis component. max99 takes the 99th-percentile value over the optical flow of all pixels as the maximum optical flow value, so as to eliminate the influence of abnormally large values on the final result.
As formula (1) shows, the ratio of the maximum to the average optical flow is computed on each component, rather than using the maximum flow directly as the final index; this tempers the effect of raw motion amplitude in the inter-frame optical flow map. Further, since mean(abs(Fx)) may be less than 1, formula (1) lower-bounds the average flow value at 1, preventing the index from being inflated when the average flow is small and suppressing the influence of abnormally small values; the larger of the two component ratios is then taken as the final optical flow intensity index.

It should be noted that optical flow is a vector, and the commonly obtained inter-frame optical flow map is two-channel image data of shape (h, w, 2) with the same spatial size as the original frames. If the same object in the two video frames, even a single pixel, moves, then the distance and direction of its movement between the two frames is the optical flow of that object or pixel: the horizontal movement is the flow's horizontal-axis component, the vertical movement its vertical-axis component, and the two together form the final optical flow vector.
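Formula (1) can be sketched in numpy as follows; `flow` is the (h, w, 2) inter-frame optical flow map just described. Taking the 99th percentile over magnitudes is an assumption, since the text does not spell out the sign handling.

```python
import numpy as np

def optical_flow_intensity(flow):
    """flow: (h, w, 2) array, channels = horizontal / vertical components."""
    def component_index(f):
        max99 = np.percentile(np.abs(f), 99)            # robust "max" flow
        mean_abs = max(1.0, float(np.mean(np.abs(f))))  # lower-bounded mean
        return max99 / mean_abs
    return max(component_index(flow[..., 0]), component_index(flow[..., 1]))
```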
In this embodiment, the optical flow intensity index between the first and second video frames can be determined from the inter-frame optical flow map, and from it a target interpolation time suitable for inserting a frame between the two. Judging the inter-frame flow magnitude with the additional optical flow estimation model to fix the concrete interpolation time avoids the severe artifacts that may occur when the model interpolates directly at the middle time, and greatly improves the perceived quality of the final result.
Step 108: determine a corresponding target composite frame according to the first video frame, the second video frame and the target interpolation time, and insert the target composite frame between the first video frame and the second video frame.

The determined target interpolation time is the time at which it is suitable to insert a composite frame between the first and second video frames, so the corresponding target composite frame, i.e., the composite frame for that time, can be determined from the first video frame, the second video frame and the target interpolation time, and then inserted between the two frames.

Notably, although the target composite frame is the blended result corresponding to the target interpolation time, when inserting it into the video it may be placed at any position between the first and second video frames; it only needs to lie between them.
In an optional implementation of this embodiment, determining the corresponding target composite frame according to the first video frame, the second video frame and the target interpolation time includes:

generating interpolation time information from the target interpolation time; and

inputting the first video frame, the second video frame and the interpolation time information into a trained interpolation model to obtain the target composite frame it outputs, the target composite frame being the composite frame corresponding to the target interpolation time indicated by the interpolation time information.

Specifically, the interpolation time information is information generated from the target interpolation time that the interpolation model can recognize and parse, e.g., a separate vector generated from the target interpolation time. The interpolation model may be any deep-learning model that supports interpolation at an arbitrary time (e.g., RIFE (Real-Time Intermediate Flow Estimation for Video Frame Interpolation) or IFRNet (Intermediate Feature Refine Network for Efficient Frame Interpolation), an interpolation network with a single encoder-decoder structure, etc.); such models are used because of their fast inference and their ability to interpolate at any temporal position between two video frames.
In practice, a training video can be obtained and a number of frames extracted from it; every three consecutive frames form one group of ternary training data. The first and third frames of each group are input into the initial model, the middle frame serves as the sample label, and the initial model is trained in a supervised manner to obtain the trained interpolation model.

The first video frame, the second video frame and the interpolation time information can then be input into the trained interpolation model, which outputs the target composite frame for the target interpolation time. The optical-flow-based interpolation model can interpolate at any time between the two frames, and the closer the time is to either input frame, the fewer the artifacts; judging the inter-frame flow magnitude with the additional optical flow estimation model to fix the interpolation time avoids the severe artifacts of direct midpoint interpolation and greatly improves the perceived quality of the final result.

In addition, if the first and second video frames were scaled after acquisition to obtain the updated frames, the updated frames are used only by the optical flow estimation model to determine the inter-frame optical flow map; so that the interpolation model has sufficient reference information, the originally acquired first and second video frames are always what is input into the interpolation model.
In an optional implementation of this embodiment, inputting the first video frame, the second video frame and the interpolation time information into the trained interpolation model to obtain the target composite frame may be implemented as follows:

inputting the first video frame, the second video frame and the interpolation time information into an optical flow analysis layer of the interpolation model, and determining, through the optical flow analysis layer, a first optical flow from a first timestamp to the target interpolation time and a second optical flow from the target interpolation time to a second timestamp, the first timestamp being that of the first video frame and the second that of the second video frame;

sampling, through a sampling layer of the interpolation model, a first sampling result from the first video frame based on the first optical flow, and a second sampling result from the second video frame based on the second optical flow; and

fusing, through a fusion layer of the interpolation model, the first and second sampling results based on a set fusion weight to obtain and output the target composite frame.

In practice, the two consecutive video frames and the target interpolation time are input into the interpolation model. Its optical flow analysis layer generates the first flow (first timestamp to target time) and the second flow (target time to second timestamp); the sampling layer then samples from the two input frames via a mapping (warp) operation to obtain the two sampling results; and the fusion layer fuses them according to the fusion weights generated by the model itself, producing the target composite frame for the target interpolation time.

From the obtained optical flows, the motion direction and magnitude of each pixel of the first video frame between the two frames is known, so the warp operation can map each pixel back to its corresponding coordinates according to the two flows, thereby sampling the two input frames. The set fusion weight is an intermediate output of the interpolation model, which computes the weight of each pixel of the two frames by itself. For example, if pixel a of the first video frame and pixel b of the second are both mapped to some position C of the intermediate composite frame, only one value can be placed there, so the relative weight of a and b at C must be balanced; the set fusion weight serves this purpose, as sketched below.
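A minimal sketch of the warp-and-fuse step follows, using torch's grid_sample for the backward mapping; it illustrates the mechanism only, not the patent's actual interpolation model, and the fusion weight is a placeholder for the mask the model predicts itself.

```python
import torch
import torch.nn.functional as F

def backward_warp(frame, flow):
    """frame: (N, 3, H, W); flow: (N, 2, H, W) in pixels. Samples frame at
    (x + dx, y + dy) for every output pixel (x, y)."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(frame)   # (2, H, W)
    coords = base.unsqueeze(0) + flow                       # (N, 2, H, W)
    # grid_sample expects (x, y) coordinates in [-1, 1], channel-last.
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                    # (N, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)

def fuse(sample0, sample1, weight):
    """weight: (N, 1, H, W) in [0, 1], balancing each pixel's contribution
    from the two warped frames (e.g., pixels a and b at position C)."""
    return weight * sample0 + (1.0 - weight) * sample1
```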
In this embodiment, the flow between the two frames is obtained by means of the additional optical flow estimation model and used as prior information to adaptively guide the interpolation model's interpolation time forward or backward. Moving forward, the target composite frame is closer to the first video frame, and the flow from the first timestamp to the target time is relatively small. The flow from the target time to the second timestamp is then quite large, but even if it is estimated inaccurately, the interpolation model's fusion mechanism assigns it a very small fusion weight during fusion, so it does not dominate the final result; the small weight limits how much that flow affects the output, ensuring the result does not resemble neither neighboring frame but instead stays close to the first video frame. Likewise, shifting the target time backward makes the target composite frame more like the second video frame. Exploiting interpolation at an arbitrary time, and the property that the closer the interpolated frame is to either input frame the fewer artifacts it exhibits, greatly alleviates the artifact problem of midpoint interpolation under large motion.
In an optional implementation of this embodiment, after the target composite frame is inserted between the first and second video frames, the method further includes:

determining whether an interpolation end condition is currently met;

if the end condition is met, taking the video with the inserted target composite frames as the resulting interpolated video;

if it is not met, returning to the operation of acquiring consecutive first and second video frames from the video to be interpolated.
Specifically, the interpolation end condition is a preset condition that must be satisfied for the interpolation of the video to be complete; for example, it may be that a target composite frame has been inserted between every two consecutive video frames of the video to be interpolated.

In practice, after the target composite frame is inserted between the first and second video frames, whether the end condition is met is checked. If it is, interpolation is finished and the video with the inserted composite frames is the final interpolated video; if not, the acquisition of consecutive first and second video frames continues, and composite frames keep being inserted at the target interpolation times between frame pairs until the end condition is met, as in the loop sketched below.
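Putting the steps together, the loop below sketches the whole procedure under the "all pairs processed" end condition, reusing the helpers sketched earlier; `estimate_flow` and `interp_model` are hypothetical stand-ins for the two trained models.

```python
def interpolate_video(frames, estimate_flow, interp_model):
    """frames: list of decoded frames in temporal order. Returns the 2x
    interpolated frame sequence."""
    out = []
    for i0, i1 in zip(frames, frames[1:]):
        flow = estimate_flow(downscale(i0), downscale(i1))  # on I0', I1'
        t = target_insertion_time(optical_flow_intensity(flow))
        composite = interp_model(i0, i1, t)  # target composite frame It
        out.extend([i0, composite])
    out.append(frames[-1])  # end condition met: every pair processed
    return out
```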
It should be noted that the video frame interpolation method provided by this embodiment can serve as a plug-and-play extension module for any interpolation model that supports interpolation at an arbitrary time.
With the video frame interpolation method described above, the target interpolation time between the first and second video frames is first determined based on their inter-frame optical flow map, and the target composite frame corresponding to that time is then determined from the two frames, adaptively guiding the interpolation time forward or backward so that it lies closer to the first or the second video frame and avoiding a target composite frame that differs greatly from both neighbors. Interpolating at an arbitrary position between the two frames, and exploiting the property that the closer the interpolated frame is to either input frame the fewer artifacts it exhibits, improves the accuracy of generating a composite frame from two consecutive video frames, greatly alleviates the artifacts large motion may cause, and improves the interpolation quality and effect.
The following describes the video frame insertion method with reference to fig. 3a, taking its application in a 2x interpolation scenario (inserting one frame between every two original frames) as an example. Fig. 3a is a schematic diagram of the processing procedure of a video frame insertion method applied to a 2x interpolation scenario according to an embodiment of the present application; fig. 3b is a schematic diagram of a first video frame; fig. 3c is a schematic diagram of a second video frame; fig. 3d is a schematic diagram of a composite frame; and fig. 3e is a schematic diagram of another composite frame. The procedure specifically includes the following steps:
Two temporally consecutive video frames I₀ and I₁ are known, as shown in fig. 3b and 3c, and a target composite frame Iₜ between the two video frames is now to be synthesized, where t is any floating-point number between 0 and 1. A specific implementation may be as follows:
The two consecutive video frames I₀ and I₁ are input into an optical flow estimation model F, which outputs the corresponding inter-frame optical flow diagram; an optical flow intensity index Ind is calculated from the obtained optical flow diagram, and the target frame insertion time t is determined according to Ind. Then the video frames I₀ and I₁, together with the target insertion time t, are input into the frame insertion model M, which outputs the final target composite frame Iₜ, and Iₜ is inserted at the corresponding position between I₀ and I₁. The sketch below traces this pipeline end to end.
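As a minimal Python sketch of the pipeline just described: flow_model and interp_model stand for the trained optical flow estimation model F and frame insertion model M, while optical_flow_intensity and time_from_intensity are helper functions sketched later in this document. All names are placeholders, not taken from the patent.

```python
import numpy as np

def double_rate_interpolation(I0: np.ndarray, I1: np.ndarray, flow_model, interp_model) -> np.ndarray:
    """End-to-end sketch of the 2x pipeline: estimate inter-frame flow,
    reduce it to a scalar intensity index, map the index to a target
    insertion time t, and let the interpolation model synthesize I_t."""
    flow = flow_model(I0, I1)            # inter-frame optical flow map, shape (H, W, 2)
    ind = optical_flow_intensity(flow)   # scalar motion-intensity index (sketched below)
    t = time_from_intensity(ind)         # large motion pulls t away from 0.5 (sketched below)
    return interp_model(I0, I1, t)       # target composite frame I_t
```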
It should be noted that when an optical flow-based frame insertion model handles complex or large motion, serious artifacts usually occur because of errors in optical flow estimation. In practice the most common setting is 2x interpolation, i.e. one frame is inserted between every two consecutive video frames at time t = 0.5, which is the time farthest from both endpoints (0 and 1). When a large optical flow exists between the two input frames, the model's ability to represent the large motion is poor, so the final result is as shown in fig. 3d, where the waving forearm disappears.
In this embodiment, once a specific target frame insertion time has been determined via the optical flow intensity index, it is in effect known in advance that the optical flow between the two video frames is too large and would likely cause artifacts in the interpolation result. The nominally most reasonable time t = 0.5 can therefore be abandoned in favour of a time such as t = 0.25 (the target insertion time calculated from the intensity index), which the frame insertion model fits more easily and which is less likely to produce artifacts. As shown in fig. 3e, the resulting figure is complete, with no missing limbs. Although the arm has only swung to its position at time 0.25, so that playback continuity may be slightly weaker than an ideal result at time 0.5, the completeness is far higher than an actual result at time 0.5. Trading a small loss of smoothness for completeness is worthwhile here, because the human eye notices a broken or incomplete image far more readily than a slight loss of smoothness.
Corresponding to the above method embodiment, the present application further provides an embodiment of a video frame inserting apparatus, and fig. 4 shows a schematic structural diagram of a video frame inserting apparatus according to an embodiment of the present application. As shown in fig. 4, the apparatus includes:
an acquisition module 402 configured to acquire a consecutive first video frame and second video frame from the video to be inserted;
a first determining module 404 configured to determine an inter-frame optical flow graph corresponding to the first video frame and the second video frame, where the inter-frame optical flow graph is used to indicate motion information of each pixel from the first video frame to the second video frame;
a second determining module 406 configured to determine a target frame insertion time between the first video frame and the second video frame based on the inter-frame optical flow map;
the inserting module 408 is configured to determine a corresponding target composite frame according to the first video frame, the second video frame and the target inserting frame time, and insert the target composite frame between the first video frame and the second video frame.
Optionally, the second determining module 406 is further configured to:
determining an optical flow intensity index between the first video frame and the second video frame based on the inter-frame optical flow map;
and determining the target frame inserting time corresponding to the optical flow intensity index according to the corresponding relation between the optical flow intensity range and the frame inserting time.
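One plausible form of such a correspondence is a simple lookup from intensity bands to insertion times. The thresholds and times below are invented for illustration, since the application only states that the correspondence exists:

```python
def time_from_intensity(ind: float) -> float:
    """Map the optical flow intensity index to a target insertion time:
    mild motion keeps the midpoint, strong motion pulls t toward an input
    frame (here toward the first frame). Band edges are assumptions."""
    bands = [(10.0, 0.5), (20.0, 0.35), (float("inf"), 0.25)]
    for upper, t in bands:
        if ind < upper:
            return t
```

A symmetric variant could instead shift t toward 1 so that the composite frame favours the second video frame.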
Optionally, the second determining module 406 is further configured to:
determining a horizontal axis component of a target pixel point in the horizontal axis direction and a vertical axis component of the target pixel point in the vertical axis direction, wherein the target pixel point is any pixel point in the inter-frame optical flow map;
determining an average horizontal axis component based on the horizontal axis components of the pixel points in the inter-frame optical flow map, and determining an average vertical axis component based on the vertical axis components of the pixel points in the inter-frame optical flow map;
and determining an optical flow intensity index between the first video frame and the second video frame based on the horizontal axis component, the vertical axis component, the average horizontal axis component, and the average vertical axis component (one plausible computation is sketched below).
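The claims do not fix the exact formula, so the sketch below takes one plausible reading: per-pixel horizontal and vertical flow components are measured against their frame-wide means and aggregated into one scalar. The aggregation itself is an assumption.

```python
import numpy as np

def optical_flow_intensity(flow: np.ndarray) -> float:
    """flow has shape (H, W, 2): channel 0 is the horizontal axis component,
    channel 1 the vertical axis component. The index aggregates each pixel's
    deviation from the average components into a single scalar."""
    u, v = flow[..., 0], flow[..., 1]        # per-pixel axis components
    du, dv = u - u.mean(), v - v.mean()      # deviation from the average components
    return float(np.sqrt(du ** 2 + dv ** 2).mean())
```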
Optionally, the insertion module 408 is further configured to:
generating frame inserting time information according to the target frame inserting time;
inputting the first video frame, the second video frame and the frame insertion time information into a trained frame insertion model to obtain the target composite frame output by the model, wherein the target composite frame corresponds to the target insertion time indicated by the time information.
Optionally, the insertion module 408 is further configured to:
inputting the first video frame, the second video frame and the frame inserting time information into an optical flow analysis layer of the frame inserting model, and determining a first optical flow from a first time stamp to a target frame inserting time and a second optical flow from the target frame inserting time to a second time stamp through the optical flow analysis layer, wherein the first time stamp is the time stamp of the first video frame, and the second time stamp is the time stamp of the second video frame;
sampling, through a sampling layer of the frame insertion model, from the first video frame based on the first optical flow to obtain a first sampling result, and from the second video frame based on the second optical flow to obtain a second sampling result;
and fusing the first sampling result and the second sampling result based on the set fusion weight through a fusion layer of the frame insertion model to obtain and output the target composite frame; a concrete sketch of these three layers follows.
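To make the three layers concrete, here is a minimal Python sketch with linear flow scaling and nearest-neighbour backward warping. The flow direction convention, the linear-motion approximation, and the fixed (1 - t, t) fusion weights are all illustrative stand-ins for what the trained model actually learns:

```python
import numpy as np

def backward_warp(img: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Nearest-neighbour backward warp (illustrative only): each output pixel
    is fetched from the source frame at the position the flow points to."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img[src_y, src_x]

def synthesize_target_frame(I0: np.ndarray, I1: np.ndarray, t: float, flow_01: np.ndarray) -> np.ndarray:
    """Flow analysis (linear scaling of the inter-frame flow), sampling
    (backward warping of each input frame), and fusion (time weighting)."""
    flow_t0 = -t * flow_01            # flow linking the target time to the first timestamp
    flow_t1 = (1.0 - t) * flow_01     # flow linking the target time to the second timestamp
    s0 = backward_warp(I0, flow_t0)   # first sampling result
    s1 = backward_warp(I1, flow_t1)   # second sampling result
    return (1.0 - t) * s0 + t * s1    # fused target composite frame
```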
Optionally, the apparatus further comprises a scaling module configured to:
scaling the first video frame and the second video frame to a set multiple to obtain a first updated video frame and a second updated video frame;
accordingly, the first determination module 404 is further configured to:
and inputting the first updated video frame and the second updated video frame into the trained optical flow estimation model to obtain the inter-frame optical flow diagram output by the optical flow estimation model; a sketch of this scaling step follows.
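A sketch of the scaling step, assuming OpenCV for resizing; the 0.5 factor and the rescaling of the returned flow vectors back to original-resolution pixel units are assumptions, since the application only specifies scaling by a set multiple:

```python
import cv2

def estimate_flow_downscaled(I0, I1, flow_model, scale: float = 0.5):
    """Shrink both frames before flow estimation (cheaper and often more
    stable for large motion), then resize the flow map back and rescale
    its vectors to original-resolution pixel units."""
    small0 = cv2.resize(I0, None, fx=scale, fy=scale)
    small1 = cv2.resize(I1, None, fx=scale, fy=scale)
    flow = flow_model(small0, small1)            # flow at reduced resolution
    h, w = I0.shape[:2]
    flow = cv2.resize(flow, (w, h))              # back to original spatial size
    return flow / scale                          # undo the scaling of the vectors
```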
Optionally, the apparatus further comprises a third determining module configured to:
determining whether a frame inserting ending condition is met currently;
if the frame insertion end condition is met, taking the video with the inserted target composite frames as the resulting interpolated video;
if the frame insertion end condition is not met, returning to the operation of the acquisition module 402.
According to the video frame insertion apparatus, the target insertion time between the first video frame and the second video frame is first determined based on the corresponding inter-frame optical flow diagram, and the target composite frame for that time is then synthesized from the two frames. The insertion time is adaptively guided forward or backward so that the composite frame is closer to the first or the second video frame, avoiding results that differ greatly from both neighbouring frames. Because the frame can be inserted at any position between the two frames, and interpolation close to either input frame produces fewer artifacts, the accuracy of composite frames generated from two consecutive video frames is improved, artifacts caused by large motion are greatly reduced, and overall interpolation quality and effect are improved.
The above is a schematic scheme of a video frame inserting apparatus of this embodiment. It should be noted that, the technical solution of the video frame inserting apparatus and the technical solution of the video frame inserting method belong to the same concept, and details of the technical solution of the video frame inserting apparatus, which are not described in detail, can be referred to the description of the technical solution of the video frame inserting method.
FIG. 5 illustrates a block diagram of a computing device provided in accordance with an embodiment of the present application. The components of the computing device 500 include, but are not limited to, a memory 510 and a processor 520. Processor 520 is coupled to memory 510 via bus 530 and database 550 is used to hold data.
Computing device 500 also includes access device 540, which enables computing device 500 to communicate via one or more networks 560. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. The access device 540 may include one or more of any type of network interface, wired or wireless, such as a Network Interface Controller (NIC), an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, or a Near Field Communication (NFC) interface.
In one embodiment of the present application, the above-described components of computing device 500, as well as other components not shown in FIG. 5, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 5 is for exemplary purposes only and is not intended to limit the scope of the present application. Those skilled in the art may add or replace other components as desired.
Computing device 500 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 500 may also be a mobile or stationary server.
The processor 520 is configured to execute the following computer-executable instructions to implement the following method:
acquiring a continuous first video frame and a continuous second video frame from a video to be inserted;
determining an interframe optical flow diagram corresponding to the first video frame and the second video frame, wherein the interframe optical flow diagram is used for indicating motion information of each pixel point from the first video frame to the second video frame;
determining a target frame insertion time between the first video frame and the second video frame based on the inter-frame optical flow diagram;
and determining a corresponding target composite frame according to the first video frame, the second video frame and the target frame insertion time, and inserting the target composite frame between the first video frame and the second video frame.
The foregoing is a schematic illustration of a computing device of this embodiment. It should be noted that, the technical solution of the computing device and the technical solution of the video frame inserting method belong to the same concept, and details of the technical solution of the computing device, which are not described in detail, can be referred to the description of the technical solution of the video frame inserting method.
An embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of any of the video frame insertion methods described above.
The above is an exemplary version of a computer-readable storage medium of the present embodiment. It should be noted that, the technical solution of the storage medium and the technical solution of the video frame inserting method belong to the same concept, and details of the technical solution of the storage medium which are not described in detail can be referred to the description of the technical solution of the video frame inserting method.
The foregoing describes specific embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code which may be in source code form, object code form, executable file or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
It should be noted that, for the sake of simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all necessary for the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The above-disclosed preferred embodiments of the present application are provided only as an aid to understanding the present application. This description is not intended to be exhaustive or to limit the application to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the teaching of this application. The embodiments were chosen and described in order to best explain the principles of the application and its practical use, thereby enabling others skilled in the art to understand and utilize it. This application is to be limited only by the claims and their full scope and equivalents.
Claims (10)
1. A video frame insertion method, comprising:
acquiring a continuous first video frame and a continuous second video frame from a video to be inserted;
determining an inter-frame optical flow diagram corresponding to the first video frame and the second video frame, wherein the inter-frame optical flow diagram is used for indicating motion information of each pixel point from the first video frame to the second video frame;
determining a target frame insertion time between the first video frame and the second video frame based on the inter-frame optical flow map;
and determining a corresponding target composite frame according to the first video frame, the second video frame and the target frame insertion time, and inserting the target composite frame between the first video frame and the second video frame.
2. The video interpolation method of claim 1, wherein the determining a target interpolation time between the first video frame and the second video frame based on the inter-frame optical flow map comprises:
determining an optical flow intensity index between the first video frame and the second video frame based on the inter-frame optical flow map;
and determining the target frame inserting time corresponding to the optical flow intensity index according to the corresponding relation between the optical flow intensity range and the frame inserting time.
3. The video interpolation method of claim 2, wherein the determining an optical flow intensity index between the first video frame and the second video frame based on the inter-frame optical flow map comprises:
determining a horizontal axis component of a target pixel point in the horizontal axis direction and a vertical axis component of the target pixel point in the vertical axis direction, wherein the target pixel point is any pixel point in the inter-frame optical flow map;
determining an average horizontal axis component based on the horizontal axis components of the pixel points in the inter-frame optical flow map, and determining an average vertical axis component based on the vertical axis components of the pixel points in the inter-frame optical flow map;
and determining an optical flow intensity index between the first video frame and the second video frame based on the horizontal axis component, the vertical axis component, the average horizontal axis component, and the average vertical axis component.
4. The method of video interpolation according to claim 1, wherein determining a corresponding target composite frame based on the first video frame, the second video frame, and the target interpolation time, comprises:
generating frame inserting time information according to the target frame inserting time;
inputting the first video frame, the second video frame and the frame inserting moment information into a frame inserting model after training is completed, and obtaining a target composite frame output by the frame inserting model, wherein the target composite frame is a composite frame corresponding to the target frame inserting moment indicated by the frame inserting moment information.
5. The method for video frame interpolation according to claim 4, wherein inputting the first video frame, the second video frame, and the frame interpolation time information into a frame interpolation model after training is completed, obtaining a target composite frame output by the frame interpolation model, comprises:
inputting the first video frame, the second video frame and the frame inserting moment information into an optical flow analysis layer of the frame inserting model, and determining a first optical flow from a first time stamp to the target frame inserting moment and a second optical flow from the target frame inserting moment to a second time stamp through the optical flow analysis layer, wherein the first time stamp is a time stamp of the first video frame, and the second time stamp is a time stamp of the second video frame;
sampling, through a sampling layer of the frame insertion model, from the first video frame based on the first optical flow to obtain a first sampling result, and from the second video frame based on the second optical flow to obtain a second sampling result;
and fusing the first sampling result and the second sampling result based on the set fusion weight through a fusion layer of the frame insertion model to obtain and output the target composite frame.
6. The method for inserting frames according to any one of claims 1 to 5, further comprising, after the first video frame and the second video frame are obtained from the video to be inserted, the steps of:
scaling the first video frame and the second video frame to a set multiple to obtain a first updated video frame and a second updated video frame;
accordingly, the determining the inter-frame optical flow map corresponding to the first video frame and the second video frame includes:
and inputting the first updated video frame and the second updated video frame into a trained optical flow estimation model to obtain an inter-frame optical flow diagram output by the optical flow estimation model.
7. The video interpolation method according to any one of claims 1 to 5, wherein after the target composite frame is inserted between the first video frame and the second video frame, further comprising:
determining whether a frame insertion end condition is currently met;
if the frame inserting ending condition is met, taking the video after the target composite frame is inserted as the obtained frame inserting video;
and if the frame inserting ending condition is not met, continuing to execute the operation step of determining the continuous first video frame and the continuous second video frame from the video to be inserted.
8. A video frame insertion apparatus, comprising:
the acquisition module is configured to acquire a first video frame and a second video frame which are continuous from the video to be inserted;
a first determining module configured to determine an inter-frame optical flow graph corresponding to the first video frame and the second video frame, wherein the inter-frame optical flow graph is used for indicating motion information of each pixel point from the first video frame to the second video frame;
a second determining module configured to determine a target frame insertion time between the first video frame and the second video frame based on the inter-frame optical flow map;
and the inserting module is configured to determine a corresponding target composite frame according to the first video frame, the second video frame and the target inserting frame time, and insert the target composite frame between the first video frame and the second video frame.
9. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions and the processor is configured to execute the computer-executable instructions to implement the method of:
acquiring a continuous first video frame and a continuous second video frame from a video to be inserted;
determining an inter-frame optical flow diagram corresponding to the first video frame and the second video frame, wherein the inter-frame optical flow diagram is used for indicating motion information of each pixel point from the first video frame to the second video frame;
determining a target frame insertion time between the first video frame and the second video frame based on the inter-frame optical flow map;
and determining a corresponding target composite frame according to the first video frame, the second video frame and the target frame insertion time, and inserting the target composite frame between the first video frame and the second video frame.
10. A computer readable storage medium, characterized in that it stores computer executable instructions which, when executed by a processor, implement the steps of the video interpolation method of any one of claims 1 to 7.
Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202211648783.5A (CN116033183A) | 2022-12-21 | 2022-12-21 | Video frame inserting method and device
PCT/CN2023/106139 (WO2024131035A1) | 2022-12-21 | 2023-07-06 | Video frame interpolation method and apparatus

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202211648783.5A (CN116033183A) | 2022-12-21 | 2022-12-21 | Video frame inserting method and device
Publications (1)

Publication Number | Publication Date
---|---
CN116033183A | 2023-04-28

Family

ID=86071627

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202211648783.5A (CN116033183A, pending) | Video frame inserting method and device | 2022-12-21 | 2022-12-21
Country Status (2)

Country | Link
---|---
CN (1) | CN116033183A
WO (1) | WO2024131035A1
Cited By (1)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
WO2024131035A1 | 2022-12-21 | 2024-06-27 | 上海哔哩哔哩科技有限公司 | Video frame interpolation method and apparatus
Citations (5)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN112488922A * | 2020-12-08 | 2021-03-12 | 亿景智联(北京)科技有限公司 | Super-resolution processing method based on optical flow interpolation
CN112929689A * | 2021-02-24 | 2021-06-08 | 北京百度网讯科技有限公司 | Video frame insertion method, device, equipment and storage medium
CN113837136A * | 2021-09-29 | 2021-12-24 | 深圳市慧鲤科技有限公司 | Video frame insertion method and device, electronic equipment and storage medium
WO2022033048A1 * | 2020-08-13 | 2022-02-17 | 北京迈格威科技有限公司 | Video frame interpolation method, model training method, and corresponding device
CN114066946A * | 2021-10-26 | 2022-02-18 | 联想(北京)有限公司 | Image processing method and device
Family Cites Families (11)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
US20090174812A1 | 2007-07-06 | 2009-07-09 | Texas Instruments Incorporated | Motion-compressed temporal interpolation
US9626770B2 | 2015-04-10 | 2017-04-18 | Apple Inc. | Generating synthetic video frames using optical flow
CN105828106B | 2016-04-15 | 2019-01-04 | 山东大学苏州研究院 | A kind of non-integral multiple frame per second method for improving based on motion information
US10489897B2 | 2017-05-01 | 2019-11-26 | Gopro, Inc. | Apparatus and methods for artifact detection and removal using frame interpolation techniques
JP7392227B2 | 2019-01-15 | 2023-12-06 | Portland State University | Feature pyramid warping for video frame interpolation
CN111641829B | 2020-05-16 | 2022-07-22 | Oppo广东移动通信有限公司 | Video processing method, device and system, storage medium and electronic equipment
CN114071223B | 2020-07-30 | 2024-10-29 | 武汉TCL集团工业研究院有限公司 | Optical flow-based video plug-in frame generation method, storage medium and terminal equipment
CN112954395B | 2021-02-03 | 2022-05-17 | 南开大学 | Video frame interpolation method and system capable of inserting any frame rate
CN116033183A | 2022-12-21 | 2023-04-28 | 上海哔哩哔哩科技有限公司 | Video frame inserting method and device
CN116170650A | 2022-12-21 | 2023-05-26 | 上海哔哩哔哩科技有限公司 | Video frame inserting method and device
CN116112707A | 2023-02-01 | 2023-05-12 | 上海哔哩哔哩科技有限公司 | Video processing method and device, electronic equipment and storage medium
- 2022-12-21: CN application CN202211648783.5A filed (CN116033183A, status: pending)
- 2023-07-06: WO application PCT/CN2023/106139 filed (WO2024131035A1)
Also Published As

Publication number | Publication date
---|---
WO2024131035A1 | 2024-06-27
Legal Events

Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |