
CN111212278A - Method and system for predicting displacement frame - Google Patents


Info

Publication number
CN111212278A
CN111212278A (application CN202010015402.4A; granted as CN111212278B)
Authority
CN
China
Prior art keywords
frame, channel, real, frames, predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010015402.4A
Other languages
Chinese (zh)
Other versions
CN111212278B (en)
Inventor
黄志奇
陈东义
杨雁杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority claimed from CN202010015402.4A
Publication of CN111212278A
Application granted
Publication of CN111212278B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00: Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30: Image reproducers
    • H04N13/398: Synchronisation thereof; Control thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract


The invention discloses a method and a system for predicting displacement frames, solving the frame displacement method's problem of halved effective information. The invention comprises a method for predicting displacement frames and a system implementing the improved frame displacement method. At any moment, one of the left and right channels displays a real frame and the other displays a predicted frame; because the data for generating a predicted frame comes from real frames, yet the prediction differs from the actual image, and the real and predicted frames are displayed at truly the same time, the display requirements of 3D video are genuinely met. The 3D effect produced in the channel data of the invention is controllable and adjustable.


Description

Method and system for predicting displacement frame
Technical Field
The invention relates to the field of information processing, in particular to a method and a system for predicting a displacement frame.
Background
The image display of any screen is driven by three signals: a row scanning signal, a column scanning signal, and a pixel data signal. The working principle is as follows: the row and column scanning signals move across the screen pixel by pixel, much like typing; each time they reach a new pixel, the pixel data signal takes a new value representing that pixel's color. As the scanning signals keep moving, the whole screen eventually receives its colors. When the scanning speed is high enough (a full scan takes only 1/25 second, i.e., 25 images can be displayed in 1 second), the displayed video appears dynamic. In addition, a single picture in a video is called a frame; for human eyes to perceive a video as smooth, at least 25 frames must be displayed per second.
The frame displacement method converts ordinary 2D video into 3D video. The key principle of 3D video is that a person's left and right eyes see slightly different images, so displaying 3D requires independently showing each eye approximately the same, slightly different images. The frame displacement method distributes the frames of an input segment of ordinary 2D video in turn to the displays (VR glasses) corresponding to the left and right eyes: supposing the 2D video contains 60 frames per second, the first frame goes to the left-eye display, the second to the right-eye display, the third to the left eye, and so on. When the input video is dynamic, frame-to-frame differences exist (the method fails for static video), so the images shown to the left and right eyes carry the required slight difference and the 2D video is converted into 3D video; however, the video on each single-side display is halved to 30 frames/second instead of the original 60 frames/second.
The frame shift method has the following defects:
the frame displacement method actually obtains its 3D effect by reducing the amount of information on each single-side display (one channel originally carried 60 frames/s and now carries only 30 frames/s);
because the odd and even frames of the same video are distributed to the left and right displays, the two displays actually show their images alternately; there is in fact no moment at which the left and right eyes receive images simultaneously. Only because the image update rate is fast enough do the eyes fail to respond, so the left and right images are perceived as displayed simultaneously.
Disclosure of Invention
The invention provides a method and a system for predicting a displacement frame, which solve the problem of halving effective information of a frame displacement method.
The invention is realized by the following technical scheme:
a method of predicting displaced frames, comprising the processing steps of, for a plurality of image frames in a dynamic video:
step 1: obtaining a plurality of image frames on a single channel in a dynamic video and allocating the image frames as real frames to a channel A and a channel B: sequentially numbering a plurality of real frames, sequentially pre-allocating real frames with odd serial numbers to a channel A, sequentially pre-allocating real frames with even serial numbers to a channel B, sequentially numbering the real frames as a real frame 1, a real frame 2, a real frame 3, … and a real frame N, and N is the number of the image frames;
step 2: inserting predicted frames at the chronologically blank positions in channel A, numbering the predicted frames on channel A as predicted frame A1, predicted frame A2, predicted frame A3, …; inserting predicted frames at the chronologically blank positions in channel B, numbering the predicted frames on channel B as predicted frame B1, predicted frame B2, predicted frame B3, …; the chronological image frame sequence of the complete channel A is: real frame 1, predicted frame A1, real frame 3, predicted frame A2, real frame 5, predicted frame A3, …, and the chronological image frame sequence of the complete channel B is: real frame 2, predicted frame B1, real frame 4, predicted frame B2, real frame 6, predicted frame B3, …;
step 3: according to the tile-segmentation-based inter-frame prediction method, predicting, from two chronologically consecutive real frames in each channel, the first predicted frame following the second of the two real frames, and thus obtaining in turn all predicted frames except predicted frame A1 and predicted frame B1;
step 4: obtaining all frame data in the respective channels and outputting them onward.
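As an illustration of steps 1 and 2 (not part of the patent itself), the following Python sketch builds the two channel sequences; the string labels and the `None` placeholder for channel B's blank first slot are hypothetical names chosen for this example.

```python
def build_channels(n):
    """Per steps 1-2: channel A holds odd real frames with predicted frames
    A1, A2, ... in its even time slots; channel B holds even real frames with
    predicted frames B1, B2, ... in its odd time slots (slot 1 stays blank)."""
    a, b = [], []
    for t in range(1, n + 1):
        if t % 2 == 1:                                   # odd time slot
            a.append(f"real{t}")
            b.append(f"B{(t - 1) // 2}" if t > 1 else None)
        else:                                            # even time slot
            b.append(f"real{t}")
            a.append(f"A{t // 2}")
    return a, b

a, b = build_channels(6)
# a == ['real1', 'A1', 'real3', 'A2', 'real5', 'A3']
# b == [None, 'real2', 'B1', 'real4', 'B2', 'real6']
```

Dropping the blank first slot of channel B reproduces the frame sequence real frame 2, predicted frame B1, real frame 4, … given in step 2.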
Further, the channel is a data channel of the 3D device.
Further, the channel a is a left eye channel, and the channel B is a right eye channel.
Further, the channel a is a right eye channel, and the channel B is a left eye channel.
Further, in step 3, a tile partition method in the inter prediction method based on tile partition is detailed as follows:
according to the row-column scanning display principle, a whole image of resolution x*y is divided into several tiles, and every image frame is divided the same way; a row scan count signal hcnt and a column scan count signal vcnt are set and nested in loops: hcnt increments cyclically from 1 to x and then clears back to 1, and every time hcnt counts to x, vcnt increases by 1, with vcnt ranging from 1 to y, yielding two scanning signals that continuously shift across the screen;
meanwhile, according to the row-column division into n blocks, the x row pixels and the y column pixels are each divided into n equal parts, yielding n² rectangular tiles of (x/n)*(y/n) pixels; the partition is implemented by artificially designating pixels;
the tile to which the current scan count signal belongs is determined, and the pixel data of the points within the same tile are processed uniformly, the processing differing between tiles; any tile contains m = (x/n)*(y/n) points, each with corresponding pixel information consisting of two numbers in the range 0-255; a numerical operation on the m pieces of pixel information yields the tile's feature vector, which has two dimensions: one represents luminance and the other represents color.
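A minimal sketch of the feature-vector computation just described, assuming the numerical operation is the average (the text also allows median or mode); `tile_pixels` and the `(liang, se)` pair layout are illustrative names, not from the patent.

```python
from statistics import mean

def tile_feature_vector(tile_pixels):
    """Reduce one tile's m pixels to a 2-D feature vector (luminance, color).

    tile_pixels: iterable of (liang, se) pairs, each value in 0-255.
    The averaging here stands in for the patent's generic numerical operation.
    """
    liang_values = [p[0] for p in tile_pixels]
    se_values = [p[1] for p in tile_pixels]
    return (mean(liang_values), mean(se_values))

# e.g. a hypothetical 2-pixel "tile":
fv = tile_feature_vector([(100, 50), (200, 150)])
# fv == (150, 100)
```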
Further, in step 3, the prediction method within the tile-segmentation-based inter-frame prediction method is detailed as follows:
preprocessing: the images are scanned in sequence to obtain the feature vectors of the n² tiles, and the data of all feature vectors are stored; in step 3, once the first of the two chronologically consecutive real frames in each channel has been completely scanned, scanning of the second real frame begins; it is likewise divided into n² tiles by the tile segmentation method and all are stored; then, for each pair of tiles at the same position in the two real frame images:
the prediction algorithm processing procedure: the feature vectors of the two groups of tiles are input to the prediction algorithm; the first real frame is the reference frame, whose tile feature vector is S1 = (S1_liang, S1_se), with luminance component S1_liang and color component S1_se; the second real frame is the current frame, whose tile feature vector is S2 = (S2_liang, S2_se), with luminance component S2_liang and color component S2_se; the change vector between the two successive video frames is then S = S2 - S1; applying the change vector S to the current-frame tile feature vector S2 yields the feature vector S3 of the corresponding tile of the frame to be predicted, i.e. the first predicted frame following the second of the two real frames in step 3; the feature vector S3 is used as the luminance and color data of the whole tile to generate a complete new tile, and after all n² tiles have been generated in turn, a whole new image is obtained;
recursion: by the same prediction algorithm processing procedure, new predicted frames are continuously generated and inserted into the channel in sequence, and so on.
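The prediction step can be sketched as follows (an illustrative Python rendering, not the patent's implementation): given per-tile feature vectors S1 from the reference frame and S2 from the current frame, the change vector S = S2 - S1 is applied to S2 to extrapolate S3 for the frame to be predicted.

```python
def predict_tile(s1, s2):
    """Extrapolate one tile: change vector S = S2 - S1, predicted S3 = S2 + S."""
    change = tuple(b - a for a, b in zip(s1, s2))
    return tuple(b + c for b, c in zip(s2, change))

def predict_channel(real_tiles):
    """Step-3 recursion: each predicted tile comes from two consecutive
    real-frame tiles at the same position in one channel."""
    return [predict_tile(real_tiles[k], real_tiles[k + 1])
            for k in range(len(real_tiles) - 1)]

# Reference tile (100, 40), current tile (110, 50): the trend continues.
s3 = predict_tile((100, 40), (110, 50))
# s3 == (120, 60)
```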
Further, the numerical operation is an average operation, a median operation, a mode operation, or the like; the choice of numerical operation is determined by the actual effect in the application scene.
Further, in the two channels, the predicted frame A1 and the predicted frame B1 are not displayed; likewise, nothing is displayed in channel B at the time slot corresponding to real frame 1 in channel A. This has no impact on the actual effect: in practical use, when a 2D video data stream is converted into the two channels of 3D data, only the initial three frames of data are lost, and since there are many frames within 1 s, the video observed by human eyes remains smooth as long as at least 25 frames are displayed per second.
Further, the specific difference between the real frames and predicted frames of the two channels depends on the 'prediction algorithm' used, and can be controlled through factors such as the choice of feature vectors and the tile size; the prediction algorithm's processing is therefore controllable, and the optimal scheme can be tuned to the actual situation of the application scene.
A system implementing the improved frame displacement method comprises a scan counting module, a positioning module, a predicted frame generation module, a channel switching module, and output channels. The scan counting module scans and numbers the dynamic video into a plurality of real frames and sends them to the positioning module; the positioning module processes and stores the image data and outputs it to the predicted frame generation module. The data processing between the positioning module and the predicted frame generation module is as follows: single-channel video source data is input, and the positioning module performs image segmentation and then extracts the feature vectors of the tiles; the predicted frame generation module processes the feature vectors extracted by the positioning module with the prediction algorithm, then generates and numbers a plurality of predicted frames; the channel switching module switches the plurality of predicted frames between the two channels by time slot and inserts them into the corresponding slots; the channel switching module allocates odd-numbered real frames to channel A and even-numbered real frames to channel B.
Further, a left-eye video output channel and a right-eye video output channel are also included, with two cases. Case 1: channel A is connected to the left-eye video output channel, and channel B to the right-eye video output channel. Case 2: channel A is connected to the right-eye video output channel, and channel B to the left-eye video output channel.
The invention has the following advantages and beneficial effects:
the invention fills the problem of halving effective information of a frame displacement method by inserting 'prediction frames'.
In the invention, at any moment one of the left and right channels displays a real frame and the other displays a predicted frame. Because the data for generating a predicted frame comes from real frames, yet the prediction differs from the actual image, and the real frame and predicted frame are displayed at truly the same time, the display requirements of 3D video are genuinely met.
The 3D effect generated in the channel data is controllable and adjustable.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a schematic flow chart of the method of the present invention.
FIG. 2 is a system diagram of the present invention.
Detailed Description
Before any embodiments of the invention are explained in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangements of components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any inventive changes, are within the scope of the present invention.
A method for predicting displaced frames, as shown in fig. 1, includes the following processing steps for a plurality of image frames in a dynamic video:
step 1: obtaining a plurality of image frames on a single channel in a dynamic video and allocating the image frames as real frames to a channel A and a channel B: sequentially numbering a plurality of real frames, sequentially pre-allocating real frames with odd serial numbers to a channel A, sequentially pre-allocating real frames with even serial numbers to a channel B, sequentially numbering the real frames as a real frame 1, a real frame 2, a real frame 3, … and a real frame N, and N is the number of the image frames;
step 2: inserting predicted frames at the chronologically blank positions in channel A, numbering the predicted frames on channel A as predicted frame A1, predicted frame A2, predicted frame A3, …; inserting predicted frames at the chronologically blank positions in channel B, numbering the predicted frames on channel B as predicted frame B1, predicted frame B2, predicted frame B3, …; the chronological image frame sequence of the complete channel A is: real frame 1, predicted frame A1, real frame 3, predicted frame A2, real frame 5, predicted frame A3, …, and the chronological image frame sequence of the complete channel B is: real frame 2, predicted frame B1, real frame 4, predicted frame B2, real frame 6, predicted frame B3, …;
step 3: according to the tile-segmentation-based inter-frame prediction method, predicting, from two chronologically consecutive real frames in each channel, the first predicted frame following the second of the two real frames, and thus obtaining in turn all predicted frames except predicted frame A1 and predicted frame B1;
step 4: obtaining all frame data in the respective channels and outputting them onward.
Further, the channel is a data channel of the 3D device.
Further, the channel a is a left eye channel, and the channel B is a right eye channel.
Further, the channel a is a right eye channel, and the channel B is a left eye channel.
Further, in step 3, a tile partition method in the inter prediction method based on tile partition is detailed as follows:
according to the row-column scanning display principle, a whole image of resolution x*y is divided into several tiles, and every image frame is divided the same way; a row scan count signal hcnt and a column scan count signal vcnt are set and nested in loops: hcnt increments cyclically from 1 to x and then clears back to 1, and every time hcnt counts to x, vcnt increases by 1, with vcnt ranging from 1 to y, yielding two scanning signals that continuously shift across the screen;
the implementation is visible to code segment 1.
(Code segment 1 appears only as an image in the original publication.)
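Since code segment 1 survives only as an image, here is a hedged reconstruction of the scan-counter behaviour it describes, in Python rather than the original HDL; the names `x`, `y` and the generator form are illustrative assumptions.

```python
def scan_signals(x, y):
    """Nested scan counters: hcnt cycles 1..x; each time it wraps,
    vcnt advances by 1 over the range 1..y."""
    for vcnt in range(1, y + 1):
        for hcnt in range(1, x + 1):
            yield hcnt, vcnt

positions = list(scan_signals(4, 3))
# 4*3 = 12 scan positions, from (1, 1) to (4, 3)
```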
meanwhile, according to the row-column division into n blocks, the x row pixels and the y column pixels are each divided into n equal parts, yielding n² rectangular tiles of (x/n)*(y/n) pixels; the partition is implemented by artificially designating pixels;
See, in particular, code segment 2.
wire block1_1=((vcnt>=`V_SYNC+`V_BACK+`V_DISP/n*0&&vcnt<`V_SYNC+`V_BACK+`V_DISP/n*1)&&(hcnt>=`H_SYNC+`H_BACK+`H_DISP/n*0&&hcnt<`H_SYNC+`H_BACK+`H_DISP/n*1))?1'b1:1'b0;
wire block1_2=((vcnt>=`V_SYNC+`V_BACK+`V_DISP/n*0&&vcnt<`V_SYNC+`V_BACK+`V_DISP/n*1)&&(hcnt>=`H_SYNC+`H_BACK+`H_DISP/n*1&&hcnt<`H_SYNC+`H_BACK+`H_DISP/n*2))?1'b1:1'b0;
wire block1_3=((vcnt>=`V_SYNC+`V_BACK+`V_DISP/n*0&&vcnt<`V_SYNC+`V_BACK+`V_DISP/n*1)&&(hcnt>=`H_SYNC+`H_BACK+`H_DISP/n*2&&hcnt<`H_SYNC+`H_BACK+`H_DISP/n*3))?1'b1:1'b0;
wire block1_4=((vcnt>=`V_SYNC+`V_BACK+`V_DISP/n*0&&vcnt<`V_SYNC+`V_BACK+`V_DISP/n*1)&&(hcnt>=`H_SYNC+`H_BACK+`H_DISP/n*3&&hcnt<`H_SYNC+`H_BACK+`H_DISP/n*4))?1'b1:1'b0;
…
(n² such declarations in total)
…
wire blockn_18=((vcnt>=`V_SYNC+`V_BACK+`V_DISP/n*19&&vcnt<`V_SYNC+`V_BACK+`V_DISP/n*n)&&(hcnt>=`H_SYNC+`H_BACK+`H_DISP/n*17&&hcnt<`H_SYNC+`H_BACK+`H_DISP/n*(n-2)))?1'b1:1'b0;
wire blockn_19=((vcnt>=`V_SYNC+`V_BACK+`V_DISP/n*19&&vcnt<`V_SYNC+`V_BACK+`V_DISP/n*n)&&(hcnt>=`H_SYNC+`H_BACK+`H_DISP/n*18&&hcnt<`H_SYNC+`H_BACK+`H_DISP/n*(n-1)))?1'b1:1'b0;
wire blockn_n=((vcnt>=`V_SYNC+`V_BACK+`V_DISP/n*19&&vcnt<`V_SYNC+`V_BACK+`V_DISP/n*n)&&(hcnt>=`H_SYNC+`H_BACK+`H_DISP/n*19&&hcnt<`H_SYNC+`H_BACK+`H_DISP/n*n))?1'b1:1'b0;
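The chain of `wire blockR_C` comparators above can be summarized arithmetically. The following Python sketch (ignoring the `V_SYNC/`V_BACK and `H_SYNC/`H_BACK porch offsets, and using illustrative names) maps a scan position to its 1-based tile row and column:

```python
def block_index(hcnt, vcnt, x, y, n):
    """Tile (row, col) containing scan position (hcnt, vcnt), 1-based,
    for an x*y image divided into n*n tiles of (x/n)*(y/n) pixels."""
    col = (hcnt - 1) // (x // n) + 1
    row = (vcnt - 1) // (y // n) + 1
    return row, col

# 640*480 image, 4*4 tiles of 160*120 pixels each:
# block_index(1, 1, 640, 480, 4)     -> (1, 1)
# block_index(161, 121, 640, 480, 4) -> (2, 2)
# block_index(640, 480, 640, 480, 4) -> (4, 4)
```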
the tile to which the current scan count signal belongs is determined, and the pixel data of the points within the same tile are processed uniformly, the processing differing between tiles; any tile contains m = (x/n)*(y/n) points, each with corresponding pixel information consisting of two numbers in the range 0-255; a numerical operation on the m pieces of pixel information yields the tile's feature vector, which has two dimensions: one represents luminance and the other represents color.
Further, in step 3, the prediction method within the tile-segmentation-based inter-frame prediction method is detailed as follows:
preprocessing: the images are scanned in sequence to obtain the feature vectors of the n² tiles, and the data of all feature vectors are stored; in step 3, once the first of the two chronologically consecutive real frames in each channel has been completely scanned, scanning of the second real frame begins; it is likewise divided into n² tiles by the tile segmentation method and all are stored; then, for each pair of tiles at the same position in the two real frame images:
the prediction algorithm processing procedure: the feature vectors of the two groups of tiles are input to the prediction algorithm; the first real frame is the reference frame, whose tile feature vector is S1 = (S1_liang, S1_se), with luminance component S1_liang and color component S1_se; the second real frame is the current frame, whose tile feature vector is S2 = (S2_liang, S2_se), with luminance component S2_liang and color component S2_se; the change vector between the two successive video frames is then S = S2 - S1; applying the change vector S to the current-frame tile feature vector S2 yields the feature vector S3 of the corresponding tile of the frame to be predicted, i.e. the first predicted frame following the second of the two real frames in step 3; the feature vector S3 is used as the luminance and color data of the whole tile to generate a complete new tile, and after all n² tiles have been generated in turn, a whole new image is obtained;
recursion: by the same prediction algorithm processing procedure, new predicted frames are continuously generated and inserted into the channel in sequence, and so on.
Further, the numerical operation is an average operation, a median operation, a mode operation, or the like; the choice of numerical operation is determined by the actual effect in the application scene.
Further, the specific difference between the real frames and predicted frames of the two channels depends on the 'prediction algorithm' used, and can be controlled through factors such as the choice of feature vectors and the tile size; the prediction algorithm's processing is therefore controllable, and the optimal scheme can be tuned to the actual situation of the application scene.
In one embodiment, the above method is applied to both left and right channels, and a left-right channel switching module, similar to a shunt switch, is added to distribute the input single-channel data alternately to the left and right channels, with channel A as channel L and channel B as channel R. The following result is finally achieved: at the first moment, the left eye displays real frame 1 and the right eye displays nothing; at the second moment, the left eye displays nothing and the right eye displays real frame 2; at the third moment, the left eye displays real frame 3 and the right eye displays nothing; at the fourth moment, the left eye displays predicted frame L2 and the right eye displays real frame 4; at the fifth moment, the left eye displays real frame 5 and the right eye displays predicted frame R2; at the sixth moment, the left eye displays predicted frame L3 and the right eye displays real frame 6; at the seventh moment, the left eye displays real frame 7 and the right eye displays predicted frame R3; and so on.
The channel switching module code is as follows:
(The channel switching module code appears only as an image in the original publication.)
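As the channel switching module code is only an image in the source, here is a minimal Python sketch of the shunt-switch behaviour described above (an illustrative stand-in, not the original HDL):

```python
def route_real_frame(frame_no):
    """Shunt-switch routing: odd real frames go to channel L, even to channel R."""
    return "L" if frame_no % 2 == 1 else "R"

routing = [(i, route_real_frame(i)) for i in range(1, 7)]
# [(1, 'L'), (2, 'R'), (3, 'L'), (4, 'R'), (5, 'L'), (6, 'R')]
```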
In another embodiment based on the foregoing embodiments, a system implementing the improved frame displacement method, as shown in fig. 2, comprises a scan counting module, a positioning module, a predicted frame generation module, a channel switching module, and output channels. The scan counting module scans and numbers the dynamic video into a plurality of real frames and sends them to the positioning module; the positioning module processes and stores the image data and outputs it to the predicted frame generation module. The data processing from the positioning module to the predicted frame generation module is as follows: single-channel video source data is input, and the positioning module performs image segmentation and then extracts the feature vectors of the tiles; the predicted frame generation module processes the feature vectors extracted by the positioning module with the prediction algorithm, then generates and numbers a plurality of predicted frames; the channel switching module switches the plurality of predicted frames between the two channels by time slot and inserts them into the corresponding slots; the channel switching module allocates odd-numbered real frames to channel A and even-numbered real frames to channel B.
Preferably, a left-eye video output channel and a right-eye video output channel are also included, with two cases. Case 1: channel A is connected to the left-eye video output channel, and channel B to the right-eye video output channel. Case 2: channel A is connected to the right-eye video output channel, and channel B to the left-eye video output channel.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (9)

1. A method for predicting displacement frames, characterized by comprising the following processing steps for a plurality of image frames in a dynamic video:
Step 1: obtaining a plurality of image frames on a single channel of the dynamic video and allocating the image frames as real frames to a channel A and a channel B: numbering the plurality of real frames sequentially, pre-allocating the odd-numbered real frames in order to channel A and the even-numbered real frames in order to channel B, the real frames being numbered in sequence as real frame 1, real frame 2, real frame 3, ..., real frame N, where N is the number of image frames;
Step 2: inserting predicted frames at the chronologically blank positions in channel A and numbering the predicted frames on channel A as predicted frame A1, predicted frame A2, predicted frame A3, ...; inserting predicted frames at the chronologically blank positions in channel B and numbering the predicted frames on channel B as predicted frame B1, predicted frame B2, predicted frame B3, ...; the chronological frame sequence of the complete channel A being: real frame 1, predicted frame A1, real frame 3, predicted frame A2, real frame 5, predicted frame A3, ...; and the chronological frame sequence of the complete channel B being: real frame 2, predicted frame B1, real frame 4, predicted frame B2, real frame 6, predicted frame B3, ...;
Step 3: according to the tile-segmentation-based inter-frame prediction method, predicting, from two chronologically consecutive real frames in each channel, the first predicted frame following the second of the two real frames, thereby obtaining in turn all predicted frames except predicted frame A1 and predicted frame B1;
Step 4: obtaining all frame data in the respective channels and outputting them.
2. The method for predicting displacement frames according to claim 1, wherein the channels are data channels of a 3D device.
3. The method for predicting displacement frames according to claim 2, wherein channel A is a left-eye channel and channel B is a right-eye channel.
4. The method for predicting displacement frames according to claim 2, wherein channel A is a right-eye channel and channel B is a left-eye channel.
5. The method for predicting displacement frames according to claim 1, wherein in step 3 the tile segmentation method within the tile-segmentation-based inter-frame prediction method is, in detail: according to the row-column scanning display principle, dividing a whole image of resolution x*y into several tiles, every image frame being divided likewise; setting a row scan count signal hcnt and a column scan count signal vcnt and nesting them in loops, hcnt incrementing cyclically from 1 to x and clearing back to 1 after x, vcnt increasing by 1 each time hcnt counts to x, with vcnt ranging from 1 to y, so that two scanning signals continuously shifting across the screen are obtained; meanwhile, according to the row-column division into n blocks, dividing the x row pixels and the y column pixels into n equal parts each to obtain n² rectangular tiles of (x/n)*(y/n) pixels, the division being implemented by artificially designating pixels; determining the tile to which the current scan count signal belongs, and processing the pixel data of the points within the same tile uniformly, the processing differing between tiles; any tile containing m = (x/n)*(y/n) points, each point having corresponding pixel information consisting of two numbers in the range 0-255; performing a numerical operation on the m pieces of pixel information to obtain the tile's feature vector, the feature vector having two dimensions, one representing luminance and the other representing color.
6. The method for predicting displacement frames according to claim 5, wherein in step 3 the prediction method within the tile-segmentation-based inter-frame prediction method is, in detail: preprocessing: scanning the images in sequence to obtain the feature vectors of the n² tiles and storing the data of all feature vectors; in step 3, once the first of the two chronologically consecutive real frames in each channel has been completely scanned, beginning to scan the second real frame, which is likewise divided into n² tiles by the tile segmentation method and entirely stored; for two tiles at the same position in the two real frame images: prediction algorithm processing: inputting the feature vectors of the two groups of tiles into the prediction algorithm; the first real frame being the reference frame, the feature vector of a tile in the reference frame being S1 = (S1_liang, S1_se), with luminance component S1_liang and color component S1_se; the second real frame being the current frame, the feature vector of the tile in the current frame being S2 = (S2_liang, S2_se), with luminance component S2_liang and color component S2_se; the change vector of the two successive video frames then being S = S2 - S1; applying the change vector S to the current-frame tile feature vector S2 to obtain the feature vector S3 of the tile of the first predicted frame following the second of the two real frames in step 3; the second of the two real frames
The first predicted frame after the real frames is the frame to be predicted, and the feature vector S3 of the block to be predicted is obtained, and the feature vector S3 is used as the brightness and color data of the entire block to generate a complete new block, and all n 2 A whole new image is obtained after a block; 递归环节:同样运用所述预测算法处理过程,依次类推持续不断的生成新的预测帧按顺序插入通道中。Recursive link: the process of the prediction algorithm is also used, and by analogy, new prediction frames are continuously generated and inserted into the channel in sequence. 7.根据权利要求5所述的一种预测位移帧的方法,其特征在于,所述数值运算为平均数运算、中位数运算、众数运算等。7 . The method for predicting a displacement frame according to claim 5 , wherein the numerical operation is an average operation, a median operation, a mode operation, and the like. 8 . 8.实现权利要求5所述的一种预测位移帧的方法的系统,其特征在于,所述系统包括扫描计数模块、定位模块、预测帧生成模块、通道切换模块和输出通道,所述扫描计数模块将所述动态视频扫描并编号为多个真实帧后发送至定位模块,所述定位模块对图像数据进行处理后储存,并向后输出至预测帧生成模块,所述定位模块到所述预测帧生成模块之间的数据处理为,单通道视频源数据输入并经过定位模块进行图像分割后提取图块的特征向量,所述预测帧生成模块基于预测算法处理所述定位模块提取的特征向量后生成并编号多个预测帧,所述通道切换模块对多个所述预测帧进行按时隙切换至双通道并在对应时隙插入预测帧,所述通道切换模块将奇数编号的真实帧分配给通道A,所述通道切换模块将偶数编号的真实帧分配给通道B。8. The system for realizing a method for predicting a displacement frame according to claim 5, wherein the system comprises a scan count module, a positioning module, a predicted frame generation module, a channel switching module and an output channel, and the scan count module The module scans and numbers the dynamic video into a plurality of real frames and sends it to the positioning module. The positioning module processes the image data and stores it, and outputs it to the prediction frame generation module. The data processing between the frame generation modules is that the single-channel video source data is input and the feature vector of the tile is extracted after image segmentation is performed by the positioning module, and the predicted frame generation module processes the feature vector extracted by the positioning module based on the prediction algorithm. 
Generating and numbering a plurality of predicted frames, the channel switching module switches the plurality of predicted frames to dual channels by time slot and inserts the predicted frames in the corresponding time slots, and the channel switching module assigns odd-numbered real frames to channels A, the channel switching module assigns even-numbered real frames to channel B. 9.根据权利要求8所述的一种预测位移帧的方法的系统,其特征在于,还包括左眼视频输出通道和右眼视频输出通道,还包括两种情形,情形1:所述通道A后接入所述左眼视频输出通道,所述通道B接入所述右眼视频输出通道;情形2:所述通道A后接入所述右眼视频输出通道,所述通道B接入所述左眼视频输出通道。9. The system of claim 8, further comprising a left-eye video output channel and a right-eye video output channel, further comprising two situations, situation 1: the channel A The left-eye video output channel is then connected to the left-eye video output channel, and the channel B is connected to the right-eye video output channel; Case 2: The channel A is connected to the right-eye video output channel, and the channel B is connected to the the left eye video output channel.
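For illustration only (not part of the claims), the frame assignment and interleaving of steps 1 and 2 of claim 1 can be sketched as follows; this is a minimal model that labels frames with strings, and the function name `interleave_channels` is hypothetical:

```python
def interleave_channels(num_frames):
    """Split sequentially numbered real frames across channels A and B
    (odd-numbered to A, even-numbered to B) and fill each channel's
    chronologically blank slots with predicted-frame labels, per
    steps 1-2 of claim 1."""
    channel_a, channel_b = [], []
    for i in range(1, num_frames + 1):
        if i % 2 == 1:
            channel, tag = channel_a, "A"   # odd-numbered real frames
        else:
            channel, tag = channel_b, "B"   # even-numbered real frames
        if channel:
            # A predicted frame occupies the blank slot before this real frame.
            channel.append(f"pred {tag}{len(channel) // 2 + 1}")
        channel.append(f"real {i}")
    return channel_a, channel_b
```

For six source frames this reproduces the sequences stated in the claim: channel A becomes real 1, pred A1, real 3, pred A2, real 5 and channel B becomes real 2, pred B1, real 4, pred B2, real 6.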
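The tile segmentation and feature-vector extraction of claim 5 can be sketched in a batched form; the claims describe a streaming hardware scan in which the tile index follows from the hcnt/vcnt counters, whereas this NumPy version (function name `tile_features` is hypothetical) computes the same per-tile reduction all at once, assuming the averaging variant of the numerical operation from claim 7 and a frame stored as a (y, x, 2) array whose last axis holds the per-pixel (brightness, color) pair in 0-255:

```python
import numpy as np

def tile_features(frame, n):
    """Divide an x*y frame into n*n rectangular tiles of (x/n)*(y/n)
    pixels and reduce each tile's m pixel pairs to a 2-D feature
    vector (brightness, color) by averaging, as in claim 5."""
    y, x, _ = frame.shape
    th, tw = y // n, x // n                 # tile height and width
    feats = np.empty((n, n, 2))
    for r in range(n):
        for c in range(n):
            tile = frame[r*th:(r+1)*th, c*tw:(c+1)*tw]
            # Average over the m = (x/n)*(y/n) points of the tile.
            feats[r, c] = tile.reshape(-1, 2).mean(axis=0)
    return feats
```

In the streaming form of the claim, the tile owning the current scan position would be identified as row (vcnt-1)//(y/n) and column (hcnt-1)//(x/n); the slicing above enumerates exactly those tiles.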
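The prediction step of claim 6 forms the change vector S = S2 - S1 and applies it to the current frame's tile features to obtain S3. The claim does not spell out how S is "applied"; the sketch below (function name `predict_tile_features` is hypothetical) assumes the natural linear extrapolation S3 = S2 + S, with results clamped to the 0-255 pixel range:

```python
import numpy as np

def predict_tile_features(s1, s2):
    """Given a tile's feature vector S1 in the reference frame and S2
    in the current frame, extrapolate the feature vector S3 of the
    same tile in the frame to be predicted, as in claim 6."""
    s = s2 - s1                      # change vector between the two real frames
    s3 = np.clip(s2 + s, 0, 255)     # extrapolate; keep values in 0-255
    return s3
```

Running this for all n² tile positions yields the full set of S3 vectors, each of which is expanded to a constant-valued tile to assemble the predicted image; the recursion of claim 6 then repeats the procedure on each subsequent pair of real frames.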
CN202010015402.4A 2020-01-07 2020-01-07 A method and system for predicting displacement frames Active CN111212278B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010015402.4A CN111212278B (en) 2020-01-07 2020-01-07 A method and system for predicting displacement frames


Publications (2)

Publication Number Publication Date
CN111212278A true CN111212278A (en) 2020-05-29
CN111212278B CN111212278B (en) 2021-08-03

Family

ID=70787144


Country Status (1)

Country Link
CN (1) CN111212278B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116664630A (en) * 2023-08-01 2023-08-29 Honor Device Co., Ltd. Image processing method and electronic equipment
CN116664630B (en) * 2023-08-01 2023-11-14 Honor Device Co., Ltd. Image processing method and electronic device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006078249A1 (en) * 2005-01-21 2006-07-27 In-Three, Inc. Method for minimizing visual artifacts converting two-dimensional motion pictures into three-dimensional motion pictures
CN102215416A (en) * 2010-04-09 2011-10-12 汤姆森特许公司 Method for processing stereoscopic images and corresponding device
CN102469340A (en) * 2010-11-17 2012-05-23 三星电子株式会社 Display apparatus and method of driving the same
CN102598683A (en) * 2010-09-17 2012-07-18 松下电器产业株式会社 Stereoscopic video creation device and stereoscopic video creation method
CN103548343A (en) * 2011-05-17 2014-01-29 三星电子株式会社 Device and method for converting 2D content into 3D content and computer-readable storage medium thereof


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SANG KUN: "Research on 2D-to-3D Conversion Algorithms for Video Images", China Master's Theses Full-text Database (Electronic Journal), Information Science & Technology *


Also Published As

Publication number Publication date
CN111212278B (en) 2021-08-03

Similar Documents

Publication Publication Date Title
KR100838351B1 (en) Method and apparatus for generating 3D images
JP4994622B2 (en) Method for generating stereoscopic video signal and scaling method suitable for the method
CN102124749B (en) Stereoscopic image display device
US20120148173A1 (en) Method and device for generating multi-viewpoint image
JP2010158051A (en) Image forming apparatus, image compressor, and image decompressor
US7668241B2 (en) Apparatus and method for generating 3D image signal using space-division method
US8412000B2 (en) System and method for reducing motion artifacts by displaying partial-resolution images
JP5424926B2 (en) Video processing apparatus and video processing method
CN111212278B (en) A method and system for predicting displacement frames
US20120050290A1 (en) Three-dimensional image display apparatus and display method
CN1758743B (en) Image processing apparatus using judder-map and method thereof
KR20060042259A (en) 3D image display device
EP4276809A1 (en) Display method of display panel and display control device thereof, and display device
JP2012165255A (en) Gradation conversion method and gradation conversion device for stereoscopic image
JPH09102968A (en) Stereoscopic image display device
TWI499279B (en) Image processing apparatus and method thereof
Kim et al. Object-based stereoscopic conversion of MPEG-4 encoded data
JPH0865713A (en) Stereoscopic image signal conversion device
JP2000244946A (en) Converter for stereoscopic video signal
US3988529A (en) Television transmission
JPH099293A (en) Stereoscopic image display method and stereoscopic image data creation method
JP2001339722A (en) Multi-channel image encoding device, decoding display device, encoding method and decoding display method
JP3157449B2 (en) Image display device
CN115442577A (en) Multi-view arrangement method for 3D light field display
JP2012165256A (en) Gradation conversion method and gradation conversion device for stereoscopic image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant