
CN108111833A - Method, device and system for stereo video coding and decoding - Google Patents

Method, device and system for stereo video coding and decoding

Info

Publication number
CN108111833A
CN108111833A CN201611043145.5A
Authority
CN
China
Prior art keywords
image
sequence
video
coding
image sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611043145.5A
Other languages
Chinese (zh)
Inventor
黄敦笔
张磊
杜武平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201611043145.5A
Publication of CN108111833A
Legal status: Pending (current)


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

This application discloses a method, including: acquiring N video image sequences carrying a stereoscopic video, where N is an integer greater than or equal to 2; determining a first image sequence and N-1 second image sequences based on the N video image sequences; and encoding the first image sequence and the N-1 second image sequences to generate a stereoscopic video bitstream. The encoding modes used for encoding the second image sequences include an inter-sequence prediction encoding mode, in which a pixel block in a second image sequence is predictively encoded using an image in the first image sequence as a reference frame.

Description

Method, device and system for stereo video coding and decoding
Technical Field
The present application relates to the field of stereoscopic video technologies, and in particular, to a method, an apparatus, a system, and a machine-readable medium for stereoscopic video encoding and decoding.
Background
With the continuous progress of industrial technology, 3D televisions and stereoscopic (3D) movies are becoming popular. Many consumers are no longer satisfied with the sensory and entertainment experience offered by traditional two-dimensional (2D) video content and instead seek the more realistic experience of stereoscopic video. Compared with traditional two-dimensional video, stereoscopic video conveys the depth and layering of a scene, so scene reproduction based on stereoscopic video technology offers a stronger sense of presence and realism and represents an important direction for virtual reality technology.
Stereoscopic video technology exploits the principle of human binocular parallax: at least two video image sequences are synchronously captured and recorded by cameras under the same scene conditions, encoded into a stereoscopic video bitstream, and then stored on a storage medium or transmitted to a receiver over a network. When the stereoscopic video is to be played, the bitstream read from the storage medium or received from the network is decoded to restore the stereoscopic video signal, which is sent to a stereoscopic video display for presentation; through binocular parallax, the viewer perceives the longitudinal depth of the scene and experiences a stereoscopic sensation.
In the prior art, when at least two video image sequences are encoded, coding standards such as ITU H.264/AVC or ISO MPEG-H HEVC are usually used to encode each video image sequence independently, which results in a large data volume for the stereoscopic video bitstream generated after encoding and thus poses a significant challenge to storage space or network bandwidth. Taking transmission of the stereoscopic video bitstream over a network as an example, a large amount of network bandwidth is required; when the available bandwidth cannot meet this requirement, network congestion and packet loss cause the stereoscopic video playback to stall or lag, degrading the viewing experience of the user.
Disclosure of Invention
The present application provides a method comprising:
acquiring N video image sequences bearing a stereoscopic video, wherein N is an integer greater than or equal to 2;
determining a first image sequence and N-1 second image sequences based on the N video image sequences;
encoding the first image sequence and the N-1 second image sequences to generate a stereoscopic video bitstream;
wherein the encoding mode adopted for encoding the second image sequence comprises: inter-sequence prediction coding mode; the inter-sequence prediction encoding mode is to perform prediction encoding on a pixel block in the second image sequence using an image in the first image sequence as a reference frame.
Drawings
FIG. 1 is a flow chart of an embodiment of a method provided herein;
fig. 2 is an example of images corresponding to a left eye and a right eye respectively in a video image sequence provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of generating a sub-sampled image sequence using an up-down coding mode according to an embodiment of the present application;
fig. 4 is a flowchart of a process for encoding an image to be encoded in a second image sequence according to an embodiment of the present application;
FIG. 5 is a schematic view of an embodiment of an apparatus provided herein;
FIG. 6 is a flow chart of an embodiment of another method provided herein;
FIG. 7 is a schematic view of an embodiment of another apparatus provided herein;
FIG. 8 is a schematic diagram of an example of a system provided herein;
FIG. 9 is a schematic diagram of an embodiment of a system provided herein.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The present application can, however, be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. However, it should be understood by those skilled in the art that the purpose of the present description is not to limit the technical solution of the present application to the specific embodiments disclosed in the present description, but to cover all modifications, equivalents, and alternative embodiments consistent with the technical solution of the present application.
References in the specification to "an embodiment," "this embodiment," or "exemplary embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
Embodiments of the present application may be implemented in software, hardware, firmware, or a combination thereof, or otherwise. Embodiments of the application may also be implemented as instructions stored on a transitory or non-transitory machine-readable medium (e.g., a computer-readable medium) that may be read and executed by one or more processors. A machine-readable medium includes any storage device, mechanism, or other physical structure that stores or transmits information in a form readable by a machine. For example, a machine-readable medium may include read-only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, and others.
In the drawings provided in this specification, some structural or methodical features are typically presented in a particular arrangement and/or order. It is to be understood that such specific arrangements and/or sequences are not required. In some embodiments, the features may be organized in a different arrangement and/or order than shown in the figures. Furthermore, the inclusion of a feature in a structure or method in a drawing does not imply that the feature is included in all embodiments, in some embodiments the feature may not be included, or the feature may be combined with other features.
For ease of understanding, the technical solution of the application is briefly described below.
N video image sequences carrying a stereoscopic video are typically video image sequences synchronously recorded through different angles for the same scene, N being an integer greater than or equal to 2. For N video image sequences, the image frames in one video image sequence usually have corresponding image frames in the other video image sequences, respectively, that is: the image frames are shot from different angles at the same time, have the same time information, and can be identified by the time stamp of the image frames in the specific implementation.
In the prior art, when a stereoscopic video is encoded, coding standards such as ITU H.264/AVC or ISO MPEG-H HEVC are usually used to encode each video image sequence independently, which results in a large data volume for the stereoscopic video bitstream obtained after encoding and places high demands on storage space and network transmission bandwidth.
In fact, the image frames belonging to different video image sequences and having the same or similar time information are shot and recorded at the same or similar time aiming at the same scene, so that the image frames have stronger correlation and the further data compression coding is possible. Based on the above consideration, the technical solution of the present application introduces, after determining a first image sequence and N-1 second image sequences based on a video image sequence, an inter-sequence prediction encoding mode different from a conventional intra prediction encoding mode and an inter prediction encoding mode in the process of encoding the second image sequences, that is: in the process of encoding the pixel block in the second image sequence, the image in the first image sequence can be used as a reference frame for predictive encoding, so that the data compression rate in the encoding process can be greatly improved, and the occupation of a storage space or a network bandwidth by a stereoscopic video bit stream generated by encoding can be reduced.
In this technical solution, the predictive coding is a coding technique that predicts the next signal by using one or more previous signals according to the characteristic that there is a certain correlation between discrete signals, and codes the difference between an actual value and a predicted value. The pixel block refers to an image block composed of one pixel or more than one spatially adjacent pixels in an image, for example: an 8 x 8 pixel block is an image block consisting of 8 adjacent rows and 8 adjacent columns of pixels in an image.
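For illustration, the following Python sketch shows the predictive-coding idea and the pixel-block notion described above; the function names and the NumPy representation are assumptions made for this sketch and do not appear in the application.

```python
import numpy as np

def predictive_encode(signal):
    """Encode a 1-D signal as (first value, residuals against the previous sample)."""
    signal = np.asarray(signal, dtype=np.int32)
    residuals = np.diff(signal)  # difference between each actual value and its predicted (previous) value
    return int(signal[0]), residuals

def predictive_decode(first, residuals):
    """Invert predictive_encode by accumulating the residuals onto the first value."""
    return np.concatenate(([first], first + np.cumsum(residuals)))

def extract_block(image, row, col, n=8):
    """Return the n x n pixel block whose top-left corner is at (row, col)."""
    return image[row:row + n, col:col + n]

if __name__ == "__main__":
    first, res = predictive_encode([10, 12, 11, 15, 15])
    assert np.array_equal(predictive_decode(first, res), [10, 12, 11, 15, 15])
    assert extract_block(np.zeros((32, 32), np.uint8), 8, 16).shape == (8, 8)
```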
In the technical scheme of the application, the number of video image sequences bearing the stereoscopic video is N, where N is an integer greater than or equal to 2. In the following examples, for convenience of explanation, the implementation process is described with the example of N = 2.
In the following, an embodiment of a method provided by the present application is described in detail. Please refer to fig. 1, which is a flowchart illustrating an embodiment of a method according to the present application. The method comprises the following steps:
step 101, two video image sequences bearing a stereoscopic video are obtained.
This step acquires two video image sequences bearing a stereoscopic video, which are typically a video image sequence corresponding to the left eye and a video image sequence corresponding to the right eye, respectively. Referring to fig. 2, two frames of images having the same time information belonging to two sequences of stereoscopic video images are shown, wherein (a) is an image corresponding to the left eye and (b) is an image corresponding to the right eye.
In specific implementation, two video image sequences respectively shot by the two shooting devices can be obtained, and the two video image sequences can also be obtained by reading a multimedia resource file storing a three-dimensional video and executing corresponding transcoding operation. The two video image sequences each contain a series of image frames, and the image frames recorded at the same time in the two video image sequences have the same time information, for example, the same time stamp.
In specific implementation, when two video image sequences are obtained, image parameter information can be obtained through information actively reported by a camera device or through analysis of video images, and the method comprises the following steps: the resolution of the image and the color format of the image. The resolution comprises an image width w and an image height h, and the unit is a pixel; color format generally refers to a representation describing a color space of an image, such as RGB or YUV.
Step 102, determining a first image sequence and a second image sequence based on the two video image sequences.
This step further determines two image sequences, which are input signals encoded in step 103, on the basis of the two video image sequences acquired in step 101, and the image sequences determined in this step are referred to as a first image sequence and a second image sequence, respectively, in order to distinguish them from the two video image sequences acquired in step 101.
As a simple and easy implementation, the video image sequence corresponding to the left eye can be directly taken as the first image sequence and the video image sequence corresponding to the right eye can be taken as the second image sequence. For the same reason, it is also possible to use the video image sequence corresponding to the right eye as the first image sequence and the video image sequence corresponding to the left eye as the second image sequence.
Preferably, in order to improve the encoding compression rate, the present embodiment provides a preferred implementation of down-sampling and then determining the first image sequence and the second image sequence. Specifically, each frame of image in the two video image sequences may be down-sampled in a preset down-sampling manner, and then the first image sequence and the second image sequence may be selected from the two down-sampled video image sequences.
The down-sampling in this embodiment is an image processing method that reduces the number of pixels by sampling the image. The preset down-sampling modes include: alternate-row (interlaced) down-sampling, alternate-column down-sampling, or interpolated down-sampling. For alternate-row down-sampling, each frame of image in the two video image sequences may be processed by retaining only the even rows or only the odd rows of the image; for alternate-column down-sampling, only the even columns or only the odd columns of the image may be retained; for interpolated down-sampling, a new pixel row may be generated by an interpolation-smoothing calculation over two adjacent pixel rows, or a new pixel column by the same calculation over two adjacent pixel columns, and the down-sampled image is composed of these new rows or columns. With such down-sampling, the data volume of the first image sequence and the second image sequence used as input can be greatly reduced before the encoding in step 103; for example, with the down-sampling modes listed above the data volume can be halved, so that the overall data compression rate is improved. After the down-sampling operation is completed, a first image sequence and a second image sequence are selected from the two down-sampled video image sequences.
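As an illustrative sketch of the down-sampling options listed above (the function name, the "mode" strings, and the NumPy representation are assumptions, not part of the application):

```python
import numpy as np

def downsample_rows(frame, mode="even", alpha=0.5):
    """Halve the number of rows of an H x W (or H x W x C) frame.

    mode:
      "even"        - keep only the even rows (alternate-row down-sampling)
      "odd"         - keep only the odd rows
      "interpolate" - blend adjacent row pairs: alpha * row[2n] + (1 - alpha) * row[2n + 1]
    Assumes an even number of rows for the "interpolate" mode.
    """
    if mode == "even":
        return frame[0::2]
    if mode == "odd":
        return frame[1::2]
    if mode == "interpolate":
        f = frame.astype(np.float32)
        blended = alpha * f[0::2] + (1.0 - alpha) * f[1::2]
        return np.round(blended).astype(frame.dtype)
    raise ValueError("unknown down-sampling mode: " + mode)

if __name__ == "__main__":
    frame = np.arange(6 * 4, dtype=np.uint8).reshape(6, 4)
    assert downsample_rows(frame, "even").shape == (3, 4)
    assert downsample_rows(frame, "interpolate").shape == (3, 4)
```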
Preferably, in order to improve the encoding compression rate, this embodiment further provides a preferred implementation based on a sub-sampled image sequence. Specifically, a sub-sampled image sequence may be generated from the two video image sequences according to a preset pre-processing coding mode; each frame of image in the sub-sampled image sequence is then split into two images with the same time information according to the splitting manner corresponding to the pre-processing coding mode, thereby yielding two image sequences; finally, a first image sequence is selected from the two image sequences, and the other image sequence is taken as the second image sequence.
The pre-processing coding modes include: an interlaced coding mode, an up-down coding mode, a left-right coding mode, a checkerboard coding mode, and the like.
In the interlaced coding mode, two frames of images having the same time information, one from each video image sequence, may be processed as follows: the pixels on the odd lines of one frame are interleaved with the pixels on the even lines of the other frame to synthesize a new frame of image.
In the up-down coding mode, two frames of images having the same time information, one from each video image sequence, may be processed as follows: pixels of the odd lines, the even lines, or blends of adjacent lines of one frame are combined into a first target image; pixels of the even lines, the odd lines, or blends of adjacent lines of the other frame are combined into a second target image; and the first target image and the second target image are vertically spliced into a new frame of image.
The left-right coding mode is similar to the up-down coding mode, except that image columns are processed when generating the first target image and the second target image, and the first target image and the second target image are finally horizontally spliced into a new frame of image.
In addition, other pre-processing coding modes may be employed, such as the checkerboard mode, in which the pixels taken from the two images with the same time information are distributed in the synthesized new image in the alternating pattern of the black and white squares of a chessboard.
In the present step, according to a preset preprocessing coding mode, corresponding splicing or synthesizing processing is performed on every two frames of images with the same time information in the video image sequence obtained in step 101, so as to obtain a new image sequence, and since the new image sequence is derived from the video image sequence obtained in step 101, but the number of pixels is reduced by half, the new image sequence is called a sub-sampling image sequence. Subsequently, splitting each frame of image in the sub-sampling image sequence into two images with the same time information according to a splitting mode corresponding to the preprocessing coding mode (for example, the up-down coding mode corresponds to the up-down splitting mode, and the left-right coding mode corresponds to the left-right splitting mode), so as to obtain two image sequences; finally, a first image sequence is selected from the two image sequences and the other image sequence is taken as a second image sequence.
The following describes an embodiment of generating the above-described sub-sampled image sequence, taking the up-down coding mode as an example. Referring to fig. 3, the resolution of each frame of image in the video image sequence is w × h; (a) is an image in the video image sequence corresponding to the left eye, (b) is the image in the video image sequence corresponding to the right eye having the same time information as (a), and (c) is the new image obtained by splicing (a) and (b) in the up-down coding mode. LR_n and RR_n denote the nth row of pixels in (a) and (b), respectively. The nth row of pixel values VR'_n in (c) may be calculated by the following Equation 1, Equation 2, or Equation 3, where VR_n denotes the nth row of the corresponding source image.
VR'_n = VR_(2n);    (Equation 1)
VR'_n = VR_(2n+1);    (Equation 2)
VR'_n = α × VR_(2n) + (1 - α) × VR_(2n+1).    (Equation 3)
In Equation 3, α is the interpolation weighting factor: α = 0.5 samples the even and odd rows equally, α > 0.5 weights the even rows more than the odd rows, and α < 0.5 weights the odd rows more than the even rows.
After the sub-sampling image sequence is obtained in the above manner, each frame of image in the sub-sampling image sequence may be split according to a splitting manner corresponding to the up-down coding manner, for example: for (c) in fig. 3, splitting can be performed according to the horizontal center line shown by the thick solid line, resulting in two images with the same time information, each image having a resolution of w × (h/2).
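A minimal sketch of the up-down synthesis and the corresponding split described above is given below; the Equation 3 style blending with α = 0.5 and the function names are illustrative assumptions.

```python
import numpy as np

def updown_synthesize(left, right, alpha=0.5):
    """Build one frame of the sub-sampled image sequence: the top half is taken from the
    left-eye image and the bottom half from the right-eye image, each half produced by
    Equation 3 style row blending (alpha = 0.5 weights even and odd source rows equally)."""
    def half(img):
        f = img.astype(np.float32)
        return np.round(alpha * f[0::2] + (1.0 - alpha) * f[1::2]).astype(img.dtype)
    return np.vstack([half(left), half(right)])  # still w x h overall, half the pixels per view

def updown_split(frame):
    """Split a w x h up-down frame back into two w x (h/2) images sharing the same timestamp."""
    h = frame.shape[0] // 2
    return frame[:h], frame[h:]

if __name__ == "__main__":
    left = np.zeros((8, 4), np.uint8)
    right = np.full((8, 4), 255, np.uint8)
    combined = updown_synthesize(left, right)
    top, bottom = updown_split(combined)
    assert combined.shape == (8, 4) and top.shape == bottom.shape == (4, 4)
```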
By adopting the above embodiment based on the sub-sampled image sequence, flexible synthesis and splitting can be provided. For example, after the sub-sampled image sequence is synthesized in checkerboard mode, it can be split following the alternating black-and-white pattern of the checkerboard, or split left-right or up-down, so that the image sequences fed into step 103 can be varied flexibly and different compression rates can be obtained in specific applications. Compared with the other pre-processing coding modes, each image sequence generated by the up-down coding mode or the left-right coding mode contains independent image texture information corresponding to the left eye or the right eye, i.e. richer image content information, and these two modes are therefore the preferred pre-processing coding modes.
In particular implementations, each image in the first and second image sequences may retain time information for a corresponding image in the video image sequence and have an index that uniquely identifies itself. After the processing in this step, the first image sequence and the second image sequence still bear the stereoscopic video signal, and the two sequences have data correlation, so in the subsequent step 103, an inter-sequence prediction coding mode can be introduced, so that the data compression rate is improved by using the correlation between the first image sequence and the second image sequence.
Step 103, encoding the first image sequence and the second image sequence to generate a stereoscopic video bitstream.
This step encodes the first image sequence and the second image sequence to generate a stereoscopic video bitstream. In the technical solution provided in this embodiment, since the first image sequence is used as a basic image sequence that can be referred to by the second image sequence in the encoding process, the encoding process can be usually started before the second image sequence.
The first image sequence may be encoded using a conventional industry coding standard, for example ITU H.264/AVC, ISO MPEG-H HEVC, or another coding standard.
Video coding standards typically impose the following coding order: the first image frame is of the intra-frame coding type, followed by several image frames of the inter-frame coding type, and then another image frame of the intra-frame coding type. This arrangement supports random access, since an image frame of the intra-frame coding type can be decoded into an image frame signal without depending on the content of other image frames, while the inter-frame coding type achieves higher compression efficiency than the intra-frame coding type. The image frames between two intra-frame coding type frames form a group of pictures. For an image frame of the intra-frame coding type, its pixel blocks are usually coded with the intra-frame prediction coding mode; for an image frame of the inter-frame coding type, its pixel blocks may be coded with either the intra-frame prediction coding mode or the inter-frame prediction coding mode.
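A small illustrative sketch of such a coding-type assignment (the group-of-pictures size of 8 and the "I"/"P" labels are assumed values for this example only):

```python
def assign_coding_types(num_frames, gop_size=8):
    """Assign per-frame coding types: each group of pictures starts with an intra-coded
    ("I") frame followed by inter-coded ("P") frames; gop_size = 8 is an illustrative value."""
    return ["I" if i % gop_size == 0 else "P" for i in range(num_frames)]

if __name__ == "__main__":
    print(assign_coding_types(10, gop_size=4))
    # ['I', 'P', 'P', 'P', 'I', 'P', 'P', 'P', 'I', 'P']
```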
In this step, the first image sequence is encoded in the above manner, and this part of the processing procedure is the same as the video encoding manner in the prior art, and is not described here again.
In the process of encoding the second image sequence, an inter-sequence predictive coding mode is also introduced on the basis of various predictive coding modes provided by the prior art. The inter-sequence prediction encoding mode is to perform prediction encoding on a pixel block in a second image sequence using an image in a first image sequence as a reference frame. The encoding process of the second image sequence is explained in detail below.
The encoding type of each frame of image in the second image sequence may be set in the same manner as that of the first image sequence, and in the process of encoding the second image sequence, each frame of image (i.e., an image to be encoded, which is described below) may be encoded according to the encoding type of each frame of image (intra-frame encoding type or inter-frame encoding type) in a corresponding order, and the encoding mode used in encoding may include an inter-sequence prediction encoding mode.
The process of encoding the image to be encoded in the second image sequence comprises the following steps 103-1 to 103-4, which are described in detail below with reference to fig. 4.
Step 103-1, dividing the image to be coded into a plurality of pixel blocks to be coded according to a pixel block size set in a preset manner.
The pixel block size is usually expressed as n × n, where n may be a predetermined fixed value, for example n = 64.
Preferably, this embodiment provides an implementation in which the pixel block size is set according to the image resolution parameter of the corresponding video image sequence. That is, the pixel block size n × n may be adaptively and dynamically adjusted based on the image resolution parameter obtained in step 101. For example: when the image resolution satisfies w × h ≤ 640 × 480, n may be set to 32; when the image resolution satisfies w × h ≥ 3840 × 2160, n may be set to 128; and when w × h lies between 640 × 480 and 3840 × 2160, n may be set to 64. With this preferred embodiment, video image sequences of different resolutions, including ultra-high resolutions such as 4K and 8K, can be processed efficiently.
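A minimal sketch of this adaptive block-size selection, using the thresholds quoted above (the function name is an assumption for the sketch):

```python
def select_block_size(width, height):
    """Choose the n x n pixel-block size from the image resolution, using the thresholds
    quoted above (640 x 480 and 3840 x 2160); the returned values mirror the example."""
    pixels = width * height
    if pixels <= 640 * 480:
        return 32
    if pixels >= 3840 * 2160:
        return 128
    return 64

if __name__ == "__main__":
    assert select_block_size(640, 480) == 32
    assert select_block_size(1920, 1080) == 64
    assert select_block_size(3840, 2160) == 128
```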
Step 103-2, judging whether there are pixel blocks to be coded remaining in the image to be coded; if so, executing step 103-3, otherwise ending the coding of the image to be coded.
The pixel blocks to be coded in the image to be coded are coded sequentially, usually from left to right and from top to bottom. Whether any pixel block has not yet been coded is judged in this order; if so, step 103-3 is executed, otherwise the coding of the image to be coded ends.
Step 103-3, selecting, for the pixel block to be coded, a coding mode that satisfies a preset condition from the corresponding coding mode set that includes the inter-sequence prediction coding mode, according to the coding type of the image to be coded. The coding mode satisfying the preset condition is a coding mode that satisfies a rate-distortion optimization model based on cost minimization.
The process of selecting the coding mode is described below for the two possible coding types of the image to be coded: the intra-frame coding type and the inter-frame coding type.
(I) The image to be coded belongs to the intra-frame coding type.
For intra-coding types, the corresponding set of coding modes includes: intra prediction coding mode, inter-sequence prediction coding mode.
In this embodiment, a rate-distortion optimization model based on cost minimization is designed for the intra-frame coding type. The model considers, on the one hand, the contribution of the inter-sequence prediction coding mode and, on the other hand, the contribution of the conventional intra-frame prediction coding mode (hereinafter simply the intra-frame prediction coding mode), and selects whichever of the two gives the better rate-distortion result, that is, the coding mode with the minimum coding cost.
For the intra-frame coding type, the inter-sequence prediction coding mode uses a first associated image in the first image sequence as the reference frame, where the first associated image is an image whose time information is the same as, or close to, that of the image to be coded; it may in particular be the image having the same time information as the image to be coded. Since the image to be coded and the first associated image are captured at the same or similar time from the same scene, their main regions are strongly correlated; this characteristic helps eliminate signal redundancy between the images and thereby improves the compression rate.
Accordingly, the cost-minimization-based rate-distortion model is shown in Equation 4 below: the rate-distortion optimization result RDO_Intra for the intra-frame coding type is the smaller of the intra-frame prediction coding cost Cost_anchor and the inter-sequence prediction coding cost Cost_inter-view.
RDO_Intra = Min(Cost_anchor, Cost_inter-view).    (Equation 4)
The process of selecting the coding mode for the block of pixels to be coded on the basis of the above model comprises the following 1) to 3):
1) Calculate a first cost for coding the pixel block to be coded using the intra-frame prediction coding mode.
The factors entering the first cost (i.e. Cost_anchor) include the residual data and the bit rate introduced by intra-frame prediction coding. The calculation process is the same as in the prior art and is not described here again.
2) Calculate a second cost for coding the pixel block to be coded using the inter-sequence prediction coding mode, with a first associated image in the first image sequence as the reference frame.
The second cost (i.e. Cost_inter-view) is calculated using Equation 5 below:
Cost_inter-view = SAD + λ_motion × Bits_mv.    (Equation 5)
The second cost thus depends on the sum of absolute differences (SAD), on λ_motion, and on Bits_mv. SAD measures the distortion between the pixel block to be coded and the matching pixel block found in the reference frame, namely in the first associated image; it is the sum of the absolute values of the differences between corresponding pixel values of the two blocks, as given by Equation 6, where Cur_ij is the value of pixel (i, j) in the pixel block to be coded and Ref_ij is the value of pixel (i, j) in the matching pixel block:
SAD = Σ_(i,j) |Cur_ij - Ref_ij|.    (Equation 6)
λ_motion is the corresponding Lagrangian parameter, given by Equations 7 and 8, where QP is the quantization parameter. Bits_mv is the number of bits occupied by the motion vector.
It should be noted that "searching for a matching pixel block in a reference frame" and similar statements in this embodiment generally refer to searching for a matching pixel block in the reconstructed image obtained after the reference frame has been encoded. This is not repeated below.
3) Take the coding mode corresponding to the smaller of the first cost and the second cost as the coding mode selected for the pixel block to be coded.
The minimum of the first cost and the second cost calculated in 1) and 2) is selected, and the coding mode corresponding to this minimum is chosen for the pixel block to be coded; that is, the finally selected coding mode is the one that yields less residual data and a lower bit rate.
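The following sketch illustrates this intra-frame-type mode selection using Equations 4, 5, and 6; the λ_motion value and the anchor cost are placeholder inputs, since Equations 7 and 8 and the intra-frame cost computation are not reproduced here, and all function names are assumptions.

```python
import numpy as np

def sad(cur_block, ref_block):
    """Sum of absolute differences between two equally sized pixel blocks (Equation 6)."""
    return int(np.abs(cur_block.astype(np.int32) - ref_block.astype(np.int32)).sum())

def inter_view_cost(cur_block, matched_block, bits_mv, lambda_motion):
    """Cost_inter-view = SAD + lambda_motion * Bits_mv (Equation 5). lambda_motion would come
    from the QP-dependent Equations 7 and 8, which are not reproduced here."""
    return sad(cur_block, matched_block) + lambda_motion * bits_mv

def choose_intra_type_mode(cost_anchor, cost_inter_view):
    """Equation 4: keep the cheaper of intra-frame prediction and inter-sequence prediction."""
    if cost_anchor <= cost_inter_view:
        return "intra", cost_anchor
    return "inter-sequence", cost_inter_view

if __name__ == "__main__":
    cur = np.full((8, 8), 100, np.uint8)
    matched = np.full((8, 8), 98, np.uint8)
    cost2 = inter_view_cost(cur, matched, bits_mv=12, lambda_motion=4.0)  # 128 + 48 = 176
    print(choose_intra_type_mode(cost_anchor=300.0, cost_inter_view=cost2))  # ('inter-sequence', 176.0)
```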
(II) The image to be coded belongs to the inter-frame coding type.
For the inter-coding type, the corresponding set of coding modes includes: intra-prediction encoding mode, inter-prediction encoding mode, and inter-sequence prediction encoding mode.
Similar to (I), the cost-minimization-based rate-distortion model designed in this embodiment for the inter-frame coding type is shown in Equation 9 below, where Cost_anchor is the smaller of the intra-frame prediction coding cost and the inter-frame prediction coding cost, and Cost_inter-view is the cost of the inter-sequence prediction coding introduced by the technical solution of this embodiment.
RDO_Inter = Min(Cost_anchor, Cost_inter-view).    (Equation 9)
The process of selecting the coding mode for the block of pixels to be coded on the basis of the above model comprises the following 1) to 4):
1) Calculate a third cost for coding the pixel block to be coded using the intra-frame prediction coding mode.
2) Calculate a fourth cost for coding the pixel block to be coded using the inter-frame prediction coding mode.
The processing procedures 1) and 2) above respectively calculate the third cost and the fourth cost generated by using the intra-frame and inter-frame predictive coding modes, and the processing procedures belong to the prior art and are not described herein again.
3) Calculate a fifth cost for coding the pixel block to be coded using the inter-sequence prediction coding mode, with a second associated image in the first image sequence as the reference frame.
The second associated image may be: an image having the same time information as the image to be coded, an image whose time information is not later than that of the image to be coded, or a pair consisting of an image whose time information is earlier than that of the image to be coded and an image whose time information is later than that of the image to be coded.
The process of calculating the fifth cost is similar to inter-frame prediction coding in the prior art, except that the reference frame is selected from the first sequence of images.
Specifically, a unidirectional prediction mode can be adopted. In this case, the second associated image may be the image having the same time information, and the fifth cost is calculated in the same way as Equation 5 in (I); the second associated image may also be an image whose time information is not later than that of the image to be coded. For example, if the time information of the image to be coded is t4, then when computing the fifth cost, images in the first image sequence whose time information is not later than t4 can be selected, for example the four images at t3, t2, t1 and t0; the coding cost of the pixel block to be coded is computed for each of them using Equation 5, the minimum value is taken as the fifth cost, and the image corresponding to this minimum becomes the unidirectional prediction reference frame.
Alternatively, a bidirectional prediction mode can be adopted. In this case there are at least two second associated images, which may include an image whose time information is earlier than that of the image to be coded and an image whose time information is later. When the fifth cost is calculated using Equation 5, SAD is replaced by a weighted combination and Bits_mv by the total length of the corresponding motion vectors. For example, if the time information of the image to be coded is t4, the images at times t0 and t8 in the first image sequence are selected as reference frames for bidirectional prediction, yielding two SAD values and two motion vectors; the weighted sum of the two SAD values is then used as SAD in Equation 5, and the total number of bits occupied by the two motion vectors is used as Bits_mv, giving the fifth cost of inter-sequence prediction coding.
4) Take the coding mode corresponding to the minimum of the third, fourth, and fifth costs as the coding mode selected for the pixel block to be coded.
The minimum of the third cost, the fourth cost, and the fifth cost calculated in 1), 2), and 3) is selected, and the coding mode corresponding to this minimum is chosen for the pixel block to be coded; that is, the finally selected coding mode is the one that yields less residual data and a lower bit rate.
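A sketch of the inter-frame-type selection, covering the unidirectional and bidirectional fifth-cost variants and the Equation 9 decision; the candidate lists, weights, and function names are assumptions for illustration only.

```python
import numpy as np

def block_cost(cur_block, ref_block, bits_mv, lambda_motion):
    """Equation 5 style cost for one candidate reference block."""
    sad = np.abs(cur_block.astype(np.int32) - ref_block.astype(np.int32)).sum()
    return float(sad) + lambda_motion * bits_mv

def fifth_cost_unidirectional(cur_block, candidates, lambda_motion):
    """Unidirectional case: evaluate each candidate (matched_block, bits_mv) drawn from the
    first image sequence and keep the minimum cost."""
    return min(block_cost(cur_block, ref, bits, lambda_motion) for ref, bits in candidates)

def fifth_cost_bidirectional(cur_block, ref_a, bits_a, ref_b, bits_b, lambda_motion, w=0.5):
    """Bidirectional case: weighted sum of the two SADs plus the total motion-vector bits."""
    sad_a = np.abs(cur_block.astype(np.int32) - ref_a.astype(np.int32)).sum()
    sad_b = np.abs(cur_block.astype(np.int32) - ref_b.astype(np.int32)).sum()
    return w * sad_a + (1.0 - w) * sad_b + lambda_motion * (bits_a + bits_b)

def choose_inter_type_mode(cost_intra, cost_inter, cost_inter_sequence):
    """Equation 9: the selected mode is the one with the smallest of the three costs."""
    costs = {"intra": cost_intra, "inter": cost_inter, "inter-sequence": cost_inter_sequence}
    mode = min(costs, key=costs.get)
    return mode, costs[mode]

if __name__ == "__main__":
    cur = np.full((8, 8), 120, np.int32)
    candidates = [(np.full((8, 8), 119, np.int32), 10), (np.full((8, 8), 140, np.int32), 6)]
    c5 = fifth_cost_unidirectional(cur, candidates, lambda_motion=4.0)  # min(104.0, 1304.0)
    print(choose_inter_type_mode(cost_intra=500.0, cost_inter=250.0, cost_inter_sequence=c5))
```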
In the embodiments given above, introducing the inter-sequence prediction coding mode for the intra-frame coding type allows motion prediction (i.e. searching for a matching pixel block) to be performed against the associated image in the first image sequence, which effectively reduces the amount of residual data to be coded and thus compresses the data. Introducing the inter-sequence prediction coding mode for the inter-frame coding type likewise reduces the amount of coded residual data; in particular, under frequent scene changes or large-scale motion, inter-sequence prediction coding can bring an additional gain over the prior art, and the data compression rate is improved significantly.
Step 103-4, coding the pixel block to be coded using the selected coding mode, and then returning to step 103-2.
The process of coding the pixel block to be coded mainly consists of coding the residual data to compress the data volume and writing the coding result into the stereoscopic video bitstream. In a specific implementation, the corresponding coding description information, for example the coding mode, the reference frame index, and the motion vector, is coded in a preset manner and then written into the stereoscopic video bitstream for use during decoding. After the coding of the pixel block to be coded is finished, the process returns to step 103-2 until all pixel blocks to be coded in the image to be coded have been coded.
In the above, the process of selecting and encoding the corresponding encoding mode for the pixel block in the image to be encoded after introducing the inter-sequence prediction encoding mode is described.
Preferably, in the process of calculating the second cost or the fifth cost for the pixel block to be encoded in the image to be encoded, when the matching pixel block of the pixel block to be encoded is searched in the reconstructed image of the first associated image or the second associated image, the search start coordinate in the reconstructed image is determined according to the first motion vector obtained by searching the matching pixel block in the reconstructed image and the coordinate of the pixel block to be encoded in the process of encoding the image to be encoded, and the matching pixel block is searched from the search start coordinate. For example: when an image to be coded of an intra-coding type is coded, a motion vector obtained by searching a matching pixel block in a reconstructed image of a first associated image for the first time is (Δ x, Δ y), and a coordinate of a current pixel block to be coded (for example, a coordinate value of the upper left corner of the pixel block) is (x, y), so that in order to calculate the cost of inter-sequence predictive coding for the pixel block to be coded, when the matching pixel block is searched in the reconstructed image of the first associated image, the search may be performed by using (x + Δ x, y + Δ y) as a starting search coordinate. In this way, the convergence speed of the motion search can be accelerated significantly, and the matching pixel block can be found in as little time as possible.
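A minimal sketch of the search-start computation described above (the coordinate convention and the function name are assumptions):

```python
def search_start(first_mv, block_xy):
    """Offset the current block's top-left coordinate (x, y) by the first motion vector
    (dx, dy) found for this picture, and begin the matching-block search there."""
    (dx, dy), (x, y) = first_mv, block_xy
    return x + dx, y + dy

if __name__ == "__main__":
    assert search_start((3, -2), (64, 128)) == (67, 126)
```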
In addition, the present embodiment also provides a preferred implementation of merging the encoded description information. Specifically, in the process of encoding the image to be encoded in the second image sequence, if two or more consecutive pixel blocks have the same reference frame and the same motion vector, the encoding description information of the consecutive pixel blocks may be merged into one group and written into the stereoscopic video bitstream. For example: if 8 consecutive pixel blocks all adopt the inter-sequence prediction coding mode, and the motion vectors are (Δ x, Δ y), and the reference frame indices are all 100, the description information of these pixel blocks can be merged into a form similar to "inter-sequence prediction coding merging mode, motion vector is (Δ x, Δ y), reference frame index is 100, and consecutive blocks are 8", and written into the stereoscopic video bitstream. In this way, the compression rate of stereoscopic video encoding can be further improved.
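The merging of coding description information for consecutive blocks could look like the following sketch; the dictionary field names are assumptions, not a defined bitstream syntax.

```python
from itertools import groupby

def merge_block_descriptions(blocks):
    """Merge runs of consecutive blocks that share the same mode, motion vector, and
    reference-frame index into single records carrying a run length."""
    merged = []
    for key, run in groupby(blocks, key=lambda b: (b["mode"], b["mv"], b["ref_idx"])):
        mode, mv, ref_idx = key
        merged.append({"mode": mode, "mv": mv, "ref_idx": ref_idx, "run": len(list(run))})
    return merged

if __name__ == "__main__":
    blocks = [{"mode": "inter-sequence", "mv": (3, -2), "ref_idx": 100}] * 8
    print(merge_block_descriptions(blocks))
    # [{'mode': 'inter-sequence', 'mv': (3, -2), 'ref_idx': 100, 'run': 8}]
```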
It should be noted that, in a specific implementation, during the process of encoding the first image sequence, a decoded image Buffer List (DPB queue) based on the first image sequence, that is, a reconstructed image queue after encoding of the first image sequence may be output through a decoding reconstruction process of an encoded image, so as to be used for inter-sequence prediction encoding of the second image sequence, and perform corresponding motion prediction and motion compensation calculation.
So far, the encoding processes of the first image sequence and the second image sequence have been described separately. In a specific implementation, depending on the requirements of the actual application scenario, the encoding of the second image sequence may be started after the first image sequence has been fully encoded, or after the first image sequence has been encoded to a certain extent, for example after a preset number of its images have been encoded.
Accordingly, when encoding the first image sequence and the second image sequence, stereoscopic video bitstreams of different forms may be generated. For example, the bitstream generated by encoding the first image sequence and the bitstream generated by encoding the second image sequence may be output sequentially to form the stereoscopic video bitstream; alternatively, the two bitstreams may be interleaved in a preset manner to generate the stereoscopic video bitstream. The party performing the decoding operation then decodes according to the form of the stereoscopic video bitstream, so that the two video image sequences carrying the stereoscopic video can be restored.
When the stereoscopic video bitstream is formed by sequentially outputting the bitstream of the first image sequence followed by that of the second image sequence, a video decoder that supports only conventional decoding can decode just the bitstream of the first image sequence; provided the first image sequence contains independent texture information of the video images corresponding to the left eye or the right eye, a two-dimensional video output can be obtained, so this form is compatible with the prior art.
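The two bitstream forms mentioned above could be assembled as in the following sketch (per-picture byte chunks and function names are assumptions for illustration):

```python
def sequential_bitstream(first_chunks, second_chunks):
    """Output the whole first-sequence bitstream followed by the whole second-sequence bitstream."""
    return b"".join(first_chunks) + b"".join(second_chunks)

def interleaved_bitstream(first_chunks, second_chunks):
    """Interleave per-picture chunks of the two bitstreams in a fixed, pre-agreed pattern."""
    out = bytearray()
    for a, b in zip(first_chunks, second_chunks):
        out += a + b
    return bytes(out)

if __name__ == "__main__":
    first, second = [b"F0", b"F1"], [b"S0", b"S1"]
    assert sequential_bitstream(first, second) == b"F0F1S0S1"
    assert interleaved_bitstream(first, second) == b"F0S0F1S1"
```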
In addition, in a specific implementation, to enable the party performing the decoding operation to decode the stereoscopic video bitstream correctly, the parties performing the encoding and decoding operations may negotiate the relevant parameters in advance, including the image resolution, the color format, and the pixel block size n, and possibly also parameters such as the down-sampling coefficients, the stereoscopic video coding mode related to the sub-sampled image sequence, and the weight α. Alternatively, the encoding party may encode this parameter information in a format agreed by both parties and write it into the stereoscopic video bitstream for the decoding party to use.
So far, through the above steps 101-103, the implementation of the method provided by the present embodiment is described. In an implementation, the encoded stereoscopic video bitstream may be written to a storage medium, for example: storing the file on a hard disk; or sent to a receiving party of the stereoscopic video bit stream through the network so as to be decoded and played by the receiving party.
It should be noted that the above-described embodiment performs the encoding operation on the two video image sequences obtained in step 101, and in other embodiments, N may be an integer greater than 2, that is: in step 101, more than 2 video image sequences can be obtained, and in this case, images in other N-1 second image sequences can still be encoded with reference to images in the selected first image sequence, which can also implement the technical solution of the present application and obtain corresponding beneficial effects.
In summary, the method provided in this embodiment exploits the correlation of the stereoscopic video signal: after the first image sequence and the second image sequence are determined from the video image sequences, the inter-sequence prediction coding mode is introduced when encoding the second image sequence. For the same bitstream quality, the compression rate can therefore be further improved and the data volume of the generated stereoscopic video bitstream reduced. This reduces the occupation of storage space or network bandwidth, avoids stuttering during stereoscopic video playback caused by network congestion and packet loss, and effectively improves the user experience of stereoscopic video playback products.
In the above embodiment, a method is provided, and correspondingly, an apparatus is also provided. Please refer to fig. 5, which is a schematic diagram of an embodiment of an apparatus of the present application. Since the apparatus embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
The device of the embodiment comprises: a video image sequence obtaining unit 501, configured to obtain N video image sequences carrying a stereoscopic video, where N is an integer greater than or equal to 2; an image sequence determination unit 502 for determining a first image sequence and N-1 second image sequences based on the N video image sequences; an image sequence encoding unit 503 for encoding the first image sequence and the N-1 second image sequences to generate a stereoscopic video bitstream; wherein the encoding mode adopted for encoding the second image sequence comprises: inter-sequence prediction coding mode; the inter-sequence prediction encoding mode is to perform prediction encoding on a pixel block in the second image sequence using an image in the first image sequence as a reference frame.
Optionally, N is 2, and the video image sequence obtaining unit is specifically configured to obtain two video image sequences respectively corresponding to the left eye and the right eye.
Optionally, the image sequence determining unit is specifically configured to select one video image sequence from the N video image sequences as the first image sequence, and respectively use other video image sequences as each sequence of the N-1 second image sequences.
Optionally, the image sequence determining unit includes:
the down-sampling sub-unit is used for respectively down-sampling each frame of image in each video image sequence according to a preset down-sampling mode;
and the sequence selection subunit is used for selecting one video image sequence from the N video image sequences subjected to down-sampling as the first image sequence, and respectively using other video image sequences subjected to down-sampling as each sequence in the N-1 second image sequences.
Optionally, the image sequence determining unit includes:
a sub-sampling sequence generation subunit, configured to generate a sub-sampling image sequence according to the N video image sequences according to a preset pre-processing coding mode;
a sub-sampling sequence splitting subunit, configured to split each frame of image in the sub-sampling image sequence into N images with the same time information according to a splitting manner corresponding to the preprocessing coding mode, so as to obtain N image sequences;
and the image sequence selection subunit is used for selecting a first image sequence from the N image sequences and taking other image sequences as the N-1 second image sequences.
Optionally, the image sequence encoding unit includes:
a first image sequence encoding subunit operable to encode the first image sequence;
a second image sequence encoding subunit, configured to encode the N-1 second image sequences;
the second image sequence encoding subunit includes:
the pixel block dividing subunit is used for dividing the image to be coded into a plurality of pixel blocks to be coded according to the pixel block size set according to a preset mode;
the cyclic control subunit is used for sequentially calling the following mode selection subunits and the pixel block coding subunits for coding each pixel block to be coded;
a mode selection subunit, configured to select, for a pixel block to be encoded, an encoding mode that satisfies a preset condition from a corresponding encoding mode set including an inter-sequence prediction encoding mode according to an encoding type of the image to be encoded;
and the pixel block coding subunit is used for coding the pixel block to be coded by adopting the selected coding mode.
Optionally, the mode selecting subunit is specifically configured to select, according to the coding type of the image to be coded, a coding mode that satisfies a cost-minimization-based rate-distortion optimization model for a pixel block to be coded from a corresponding coding mode set including inter-sequence prediction coding modes.
Optionally, the mode selection sub-unit comprises the following sub-units for intra coding type:
the first cost calculation subunit is used for calculating a first cost for coding the pixel block to be coded by adopting an intra-frame prediction coding mode;
the second cost calculation subunit is used for calculating a second cost of coding the pixel block to be coded by using the first associated image in the first image sequence as a reference frame and adopting an inter-sequence prediction coding mode;
and the first mode selection subunit is configured to use the coding mode corresponding to the minimum value of the first cost and the second cost as the coding mode selected for the pixel block to be coded.
Optionally, the mode selection subunit includes the following subunits for the inter-coding type:
the third price calculating subunit is used for calculating a third price for coding the pixel block to be coded by adopting an intra-frame prediction coding mode;
the fourth cost calculation subunit is used for calculating a fourth cost for coding the pixel block to be coded by adopting an inter-frame prediction coding mode;
a fifth cost calculating subunit, configured to calculate a fifth cost for encoding the pixel block to be encoded by using the second associated image in the first image sequence as a reference frame and using an inter-sequence prediction encoding mode;
and the second mode selection subunit is configured to use the coding mode corresponding to the minimum value of the third, fourth, and fifth costs as the coding mode selected for the pixel block to be coded.
Optionally, the pixel block partitioning unit is specifically configured to set the pixel block size according to an image resolution parameter corresponding to the video image sequence, and partition an image to be encoded into a plurality of pixel blocks according to the pixel block size.
Optionally, when searching for the matching pixel block of the pixel block to be encoded in the reconstructed image of the first associated image or the second associated image, the second cost calculation subunit or the fifth cost calculation subunit determines a search start coordinate in the reconstructed image according to a first motion vector obtained by searching for the matching pixel block in the reconstructed image in the encoding process for the image to be encoded and the coordinate of the pixel block to be encoded, and searches for the matching pixel block from the search start coordinate.
Optionally, in the process of encoding, if two or more consecutive pixel blocks have the same reference frame and the same motion vector, the second image encoding subunit merges the encoding description information of the consecutive pixel blocks into a group and writes the group into the stereoscopic video bitstream.
Optionally, the apparatus further comprises:
a storage unit operable to write the stereoscopic video bitstream generated by the image sequence encoding unit into a storage medium; or,
and the sending unit is used for sending the stereoscopic video bit stream generated by the image sequence coding unit to a receiving party.
Corresponding to one method provided by the application, the application also provides another method. Please refer to fig. 6, which is a flowchart illustrating another method embodiment provided in the present application, wherein the same parts as those in the above method embodiment are not repeated, and the following description focuses on differences. Another method provided by the present application includes:
step 601, obtaining a stereoscopic video bitstream to be decoded.
This step may obtain the stereoscopic video bitstream to be decoded in different ways, including reading the stereoscopic video bitstream from a storage medium, for example: reading from a file storing a stereoscopic video bitstream; a stereoscopic video bitstream transmitted by a transmitting side may also be received through a network.
Step 602, obtaining a first image sequence and a second image sequence from the stereoscopic video bitstream by decoding.
For the case that the stereoscopic video bit stream sequentially includes the encoded bit stream of the first image sequence and the encoded bit stream of the second image sequence, the decoding operation for the first image sequence may be performed first, and the decoding operation for the second image sequence is started after the decoding is completed or to a certain extent; if the stereoscopic video bitstream is a bitstream in an interleaved form, the encoded bitstreams for the first image sequence and the second image sequence may be separated in a predetermined manner, that is, the first image sequence bitstream and the second image sequence bitstream are separated from the stereoscopic video bitstream, and then the decoding operation is performed in the manner described above.
The first image sequence bitstream is a standard bitstream and can be decoded with the prior art, which is not described herein again. For the second image sequence bitstream, the parameters of the video image sequence may be determined first; these parameters may be negotiated in advance by the encoding and decoding parties, or obtained by decoding the parameter information carried in the stereoscopic video bitstream.
Then, using the parameter information, each image in the second image sequence is generated by the following decoding process: parse the portion of the second image sequence bitstream corresponding to the image, decode the coding mode and residual data of each pixel block, reconstruct each pixel block according to its coding mode and residual data, and synthesize the image from the reconstructed pixel blocks, for example in left-to-right, top-to-bottom order.
For a pixel block whose coding mode is the inter-sequence prediction coding mode, reconstruction is performed by using, as a reference frame, the corresponding already decoded image in the first image sequence identified by the reference frame index information obtained by decoding. Specifically, when reconstructing a pixel block belonging to the intra-frame coding type, the reference frame used includes: the image in the first image sequence having the same time information as the image to which the pixel block belongs. When reconstructing a pixel block belonging to the inter-frame coding type, the reference frames used include, in the first image sequence: the image with the same time information as the image to which the pixel block belongs; the images whose time information is not later than that of the image to which the pixel block belongs; or the images whose time information is earlier than, together with those whose time information is later than, that of the image to which the pixel block belongs.
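A minimal sketch of the per-block reconstruction described above, assuming single-channel 8-bit images and integer-pixel motion vectors, and omitting intra prediction for brevity; first_seq_ref stands for the already decoded image of the first image sequence with matching time information and prev_recon for a temporal reference inside the second sequence, both names being illustrative.

    import numpy as np

    def reconstruct_block(mode, residual, block_xy, mv, first_seq_ref, prev_recon):
        # Pick the reference image according to the decoded coding mode: the
        # inter-sequence prediction mode predicts from the first image sequence,
        # otherwise prediction stays inside the second image sequence.
        x, y = block_xy
        bh, bw = residual.shape
        ref = first_seq_ref if mode == "inter_sequence" else prev_recon
        px, py = x + mv[0], y + mv[1]
        prediction = ref[py:py + bh, px:px + bw].astype(np.int32)
        # Add the decoded residual and clip back to the 8-bit sample range.
        return np.clip(prediction + residual, 0, 255).astype(np.uint8)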
Through the above-described processing procedure, the first image sequence and the second image sequence are generated.
Step 603, obtaining two video image sequences bearing the stereoscopic video according to the obtained first image sequence and second image sequence.
Corresponding to the manner in which the first image sequence and the second image sequence were determined from the two video image sequences during encoding, this step obtains, in a corresponding manner, the two video image sequences carrying the stereoscopic video from the first image sequence and the second image sequence output in step 602. Several ways are listed below.
The first image sequence and the second image sequence output by step 602 may be taken directly as the two video image sequences, respectively. Alternatively, according to the down-sampling mode adopted during encoding, each frame image in the first image sequence and the second image sequence may be up-sampled correspondingly, and the up-sampled first and second image sequences are then taken as the two video image sequences. For the merging-and-splitting manner based on the sub-sampling image sequence adopted during encoding, this step may first perform, according to the splitting manner adopted during encoding, the corresponding merging operation on the images with the same time information in the first image sequence and the second image sequence to obtain the sub-sampling image sequence, and then divide the sub-sampling image sequence into two image sequences according to the stereoscopic video encoding mode adopted during encoding, thereby obtaining the two video image sequences bearing the stereoscopic video.
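As an illustration of the merging-and-splitting alternative above, the sketch below assumes a side-by-side sub-sampling mode and nearest-neighbour up-sampling; the function names and the choice of side-by-side packing are assumptions for the example.

    import numpy as np

    def merge_side_by_side(first_img: np.ndarray, second_img: np.ndarray) -> np.ndarray:
        # Rebuild one frame of the sub-sampling image sequence from the two decoded
        # images that share the same time information.
        return np.concatenate([first_img, second_img], axis=1)

    def split_for_display(sbs_frame: np.ndarray):
        # Divide the side-by-side frame back into the left-eye and right-eye images
        # and widen each half back to full resolution (nearest-neighbour up-sampling).
        h, w = sbs_frame.shape[:2]
        left, right = sbs_frame[:, : w // 2], sbs_frame[:, w // 2:]
        return np.repeat(left, 2, axis=1), np.repeat(right, 2, axis=1)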
The two video image sequences obtained by the above processing are typically the video image sequences corresponding to the left eye and the right eye, respectively. They can be sent to a stereoscopic video display for presentation, where binocular parallax allows the viewer to perceive the depth information of the scene and experience a stereoscopic effect.
Thus, another method embodiment provided by the present application has been described through steps 601-603 above. In this embodiment, one first image sequence and one second image sequence are obtained by decoding; when the stereoscopic video bitstream contains two or more second image sequences, the corresponding bitstreams may be decoded in the same manner to obtain each second image sequence and, finally, the corresponding number of video image sequences.
As can be seen from the above description, when a stereoscopic video is encoded with the method described in the earlier method embodiment of the present application, decoding it with the method provided in this embodiment restores the encoded stereoscopic video correctly, so that the data compression rate of the stereoscopic video is improved while normal playback is ensured.
In the above, embodiments of another method of the present application are provided, and in correspondence therewith, embodiments of another apparatus of the present application are provided below. Please refer to fig. 7, which is a schematic diagram of another embodiment of the apparatus of the present application. Since the apparatus embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
The device of the embodiment comprises: a stereoscopic video bit stream acquiring unit 701 configured to acquire a stereoscopic video bit stream to be decoded; a stereoscopic video bitstream decoding unit 702 configured to acquire a first image sequence and N-1 second image sequences from the stereoscopic video bitstream by decoding; wherein each image in the second sequence of images is obtained by the following decoding process: reconstructing each pixel block belonging to the image according to the corresponding coding mode and residual data carried by a second image sequence bit stream in the stereoscopic video bit stream, and synthesizing the image by using the reconstructed pixel blocks; the coding modes carried by the second image sequence bitstream comprise: inter-sequence prediction coding mode; a video image sequence generating unit 703, configured to obtain N video image sequences carrying a stereoscopic video according to the obtained first image sequence and the N-1 second image sequences.
Optionally, N is 2; the stereoscopic video bitstream decoding unit is specifically configured to obtain a first image sequence and a second image sequence from the stereoscopic video bitstream by decoding; the video image sequence generating unit is specifically configured to obtain two video image sequences bearing a stereoscopic video according to the obtained first image sequence and second image sequence.
Optionally, the video image sequence generating unit is specifically configured to use the first image sequence and the N-1 second image sequences as respective sequences of the N video image sequences.
Optionally, the video image sequence generating unit includes:
the up-sampling sub-unit is used for carrying out corresponding up-sampling processing on each frame of image in the first image sequence and the second image sequence according to a down-sampling mode adopted by coding;
and the image sequence generation subunit is used for respectively taking the first image sequence and the N-1 second image sequences after the up-sampling as each sequence in the N video image sequences.
Optionally, the video image sequence generating unit includes:
a sub-sampling sequence reduction subunit, configured to perform, according to a splitting manner adopted during encoding, corresponding merging operations on images with the same time information in the first image sequence and the second image sequence, so as to obtain a sub-sampling image sequence;
and the video image sequence dividing subunit is used for correspondingly dividing the sub-sampling image sequence into N image sequences according to a stereo video coding mode adopted during coding, so as to obtain the N video image sequences bearing the stereo video.
Optionally, when the stereoscopic video bitstream decoding unit reconstructs the pixel block belonging to the intra-frame coding type according to the inter-sequence prediction coding mode, the adopted reference frame includes: and the images in the first image sequence have the same time information as the images to which the pixel blocks belong.
Optionally, when the stereoscopic video bitstream decoding unit reconstructs the pixel block belonging to the inter-frame coding type according to the inter-sequence prediction coding mode, the adopted reference frame includes the following images in the first image sequence:
an image having the same time information as an image to which the pixel block belongs; or,
the time information is not later than the image of the image to which the pixel block belongs; or,
the time information is earlier than the image of the image to which the pixel block belongs, and the time information is later than the image of the image to which the pixel block belongs.
Optionally, the stereoscopic video bitstream obtaining unit is specifically configured to read a stereoscopic video bitstream from a storage medium, or receive a stereoscopic video bitstream sent by a sender.
Please refer to fig. 8, which is a schematic diagram of an example of a system provided in the present application. As shown in fig. 8, a system 800 includes an apparatus 801 provided by the apparatus embodiment described above (referred to as a stereoscopic video encoding apparatus in this embodiment) and an apparatus 802 provided by the other apparatus embodiment described above (referred to as a stereoscopic video decoding apparatus in this embodiment).
The stereoscopic video encoding apparatus 801 includes: a video image sequence obtaining unit 801-1, an image sequence determining unit 801-2 and an image sequence encoding unit 801-3, wherein the functions of each unit are described in the previous embodiment of the apparatus, and are not described herein again. The stereoscopic video decoding device 802 includes: the stereo video bitstream acquisition unit 802-1, the stereo video bitstream decoding unit 802-2, and the video image sequence generation unit 802-3, wherein the functions of each unit are described in the previous embodiments of the apparatus, which are not described herein again.
In a specific implementation, the stereoscopic video encoding apparatus 801 and the stereoscopic video decoding apparatus 802 may be disposed on different electronic devices, where the electronic devices include a personal computer, a mobile computing device, and the like, and the mobile computing device may include but is not limited to: a laptop, a tablet, a mobile phone, and/or other smart devices. The stereoscopic video encoding apparatus 801 may generate a stereoscopic video bitstream by encoding a video image sequence captured by an imaging device, or a video image sequence obtained by reading and transcoding a multimedia resource file containing a stereoscopic video, and send the stereoscopic video bitstream over a network; the encoding modes used in the encoding process include the inter-sequence prediction encoding mode. The stereoscopic video decoding apparatus 802 may receive the stereoscopic video bitstream from the network and perform the corresponding decoding operation to restore the video image sequences bearing the stereoscopic video for presentation on a corresponding stereoscopic video display. Because the stereoscopic video encoding apparatus 801 adopts the inter-sequence prediction encoding mode during encoding, the compression rate of the stereoscopic video data is effectively improved, the occupation of network bandwidth is reduced, and network congestion and packet loss become less likely, which helps the stereoscopic video decoding apparatus 802 obtain a smooth stereoscopic video through decoding.
In specific implementation, the stereoscopic video encoding apparatus 801 and the stereoscopic video decoding apparatus 802 may also be disposed on the same electronic device, where the electronic device includes a personal computer or a mobile computing device, and the mobile computing device may include but is not limited to: a laptop, a tablet, a mobile phone, and/or other smart devices, etc. The stereoscopic video encoding apparatus 801 may write the stereoscopic video bitstream generated using the inter-sequence prediction encoding mode into a storage medium, for example, in the form of a file stored on a hard disk; when the stereoscopic video needs to be played, the stereoscopic video decoding apparatus 802 may read a stereoscopic video bitstream from a storage medium, for example: and reading from the hard disk file storing the stereoscopic video bit stream, and restoring a video image sequence carrying the stereoscopic video through corresponding decoding operation for playing. Because the stereo video encoding device 801 adopts the inter-sequence prediction encoding mode in the encoding process, the compression rate of stereo video data can be effectively improved, the occupation of a storage medium is reduced, and the storage space is saved.
In addition, the present application further provides an embodiment of a system, please refer to fig. 9, which shows a schematic diagram of an embodiment of the system provided in the present application.
The system 900 may include: a processor 901, a system control unit 902 coupled to the processor, a system memory 903 coupled to the system control unit, a non-volatile memory (NVM) or storage device 904 coupled to the system control unit, and a network interface 905 coupled to the system control unit.
The processor 901 may include at least one processor, and each processor may be a single-core processor or a multi-core processor. The processor 901 may include any combination of general-purpose processors and special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.).
The system control unit 902 may include any corresponding interface controller that provides an interface for at least one of the processors 901 and/or any device or component in communication with the system control unit 902.
The system control unit 902 may include at least one memory controller that provides an interface to the system memory 903. The system memory 903 may be used to load and store data and/or instructions. The system memory 903 may include any volatile memory, such as Dynamic Random Access Memory (DRAM).
The non-volatile memory or storage 904 may include at least one tangible, non-transitory computer-readable medium for storing data and/or instructions. The non-volatile memory or storage 904 may include any form of non-volatile memory, such as flash memory, and/or any non-volatile storage device, such as at least one Hard Disk Drive (HDD), at least one optical disk drive, and/or at least one Digital Versatile Disk (DVD) drive.
The system memory 903 and the non-volatile memory or storage device 904 may store a temporary copy and a persistent copy of the instructions 907, respectively.
The instructions in the instructions 907, when executed by at least one of the processors 901, may cause the system 900 to perform the following operations: acquiring N video image sequences bearing a stereoscopic video, wherein N is an integer greater than or equal to 2; determining a first image sequence and N-1 second image sequences based on the N video image sequences; encoding the first image sequence and the N-1 second image sequences to generate a stereoscopic video bitstream; wherein the encoding mode adopted for encoding the second image sequence comprises: inter-sequence prediction coding mode.
Alternatively, the system 900 is caused to perform the following operations: acquiring a stereoscopic video bit stream to be decoded; obtaining a first image sequence and N-1 second image sequences from the stereoscopic video bit stream by decoding; obtaining N video image sequences bearing a stereoscopic video according to the obtained first image sequence and the N-1 second image sequences; wherein each image in the second sequence of images is obtained by the following decoding process: reconstructing each pixel block belonging to the image according to the corresponding coding mode and residual data carried by a second image sequence bit stream in the stereoscopic video bit stream, and synthesizing the image by using the reconstructed pixel blocks; the coding modes carried by the second image sequence bitstream comprise: inter-sequence prediction coding mode.
The network interface 905 may include a transceiver that provides a wireless interface for the system 900, through which the system 900 may communicate across a network and/or with other devices. The network interface 905 may include any hardware and/or firmware. The network interface 905 may include multiple antennas that provide a multiple-input, multiple-output wireless interface. In particular implementations, the network interface 905 may be a network adapter, a wireless network adapter, a telephone modem, and/or a wireless modem.
In a specific implementation, at least one of the processors 901 may be packaged together with the control logic of at least one controller in the system control unit 902, forming a System in Package (SiP). In a specific implementation, at least one of the processors 901 may be integrated on the same chip with the control logic of at least one controller in the system control unit 902, forming a System on Chip (SoC).
The system 900 may include an input/output (I/O) device 906. The input/output devices 906 may include a user interface for user interaction with the system 900 and/or a peripheral component interface for peripheral components to interact with the system 900.
In various embodiments, the user interface may include, but is not limited to: a display (e.g., a liquid crystal display, a touch screen display, etc.), a speaker, a microphone, at least one camera device (e.g., a camera, and/or a camcorder), a flash, and a keyboard.
In various embodiments, the peripheral component interface may include, but is not limited to: a non-volatile memory port, an audio jack, and a power interface.
In various embodiments, the system 900 may be deployed on an electronic device such as a personal computer, mobile computing device, and the like, which may include, but is not limited to: a laptop, a tablet, a mobile phone, and/or other smart devices, etc. In different embodiments, the system 900 may include more or fewer components, and/or different architectures.
This description may include various exemplary embodiments disclosed below.
In exemplary embodiment 1, a method may include: acquiring N video image sequences bearing a stereoscopic video, wherein N is an integer greater than or equal to 2; determining a first image sequence and N-1 second image sequences based on the N video image sequences; encoding the first image sequence and the N-1 second image sequences to generate a stereoscopic video bitstream; wherein the encoding mode adopted for encoding the second image sequence comprises: inter-sequence prediction coding mode; the inter-sequence prediction encoding mode is to perform prediction encoding on a pixel block in the second image sequence using an image in the first image sequence as a reference frame.
In exemplary embodiment 2, N described in exemplary embodiment 1 is 2, and the N video image sequences are video image sequences corresponding to the left eye and the right eye, respectively.
In exemplary embodiment 3, the determining a first sequence of images and N-1 second sequences of images based on the N sequences of video images as described in any of exemplary embodiments 1-2 comprises: and selecting one video image sequence from the N video image sequences as the first image sequence, and respectively using other video image sequences as each sequence in the N-1 second image sequences.
In exemplary embodiment 4, the determining a first sequence of images and N-1 second sequences of images based on the N sequences of video images as described in any of exemplary embodiments 1-3 further comprises: respectively performing down-sampling on each frame of image in each video image sequence according to a preset down-sampling mode; the selecting one video image sequence from the N video image sequences as the first image sequence, and using other video image sequences as each sequence in the N-1 second image sequences, respectively, includes: and selecting one video image sequence from the N video image sequences after the down sampling as the first image sequence, and respectively using other video image sequences after the down sampling as each sequence in the N-1 second image sequences.
In exemplary embodiment 5, the determining a first sequence of images and N-1 second sequences of images based on the N sequences of video images as described in any of exemplary embodiments 1-4 comprises: generating a sub-sampling image sequence according to the N video image sequences according to a preset preprocessing coding mode; splitting each frame of image in the sub-sampling image sequence into N images with the same time information according to a splitting mode corresponding to the preprocessing coding mode, thereby obtaining N image sequences; and selecting a first image sequence from the N image sequences, and taking other image sequences as the N-1 second image sequences.
In exemplary embodiment 6, any of exemplary embodiments 1-5 encodes the image to be encoded in the second sequence of images using the steps of: dividing an image to be coded into a plurality of pixel blocks to be coded according to the size of the pixel blocks set in a preset mode; and sequentially executing the following coding operations on each pixel block to be coded: according to the coding type of the image to be coded, selecting a coding mode meeting preset conditions for a pixel block to be coded from a corresponding coding mode set comprising an inter-sequence prediction coding mode; and coding the pixel block to be coded by adopting the selected coding mode.
In exemplary embodiment 7, the encoding mode satisfying the preset condition according to any one of exemplary embodiments 1 to 6 includes: and the coding mode of the rate distortion optimization model based on cost minimization is satisfied.
In exemplary embodiment 8, any one of exemplary embodiments 1-7 is directed to an intra-coding type, the respective set of coding modes comprising: intra-frame prediction coding mode, inter-sequence prediction coding mode; the method for selecting the coding mode meeting the preset conditions for the pixel block to be coded from the corresponding coding mode set comprising the inter-sequence prediction coding modes according to the coding type of the image to be coded comprises the following steps: calculating a first cost for coding the pixel block to be coded by adopting an intra-frame prediction coding mode; calculating a second cost for coding the pixel block to be coded by taking a first associated image in the first image sequence as a reference frame and adopting an inter-sequence prediction coding mode; and taking the coding mode corresponding to the minimum value of the first cost and the second cost as the coding mode selected for the pixel block to be coded.
In exemplary embodiment 9, the first associated image of any of exemplary embodiments 1-8, comprising: and the image has the same time information as the image to be coded.
In exemplary embodiment 10, any of exemplary embodiments 1-9 is directed to an inter-coding type, the respective preset set of coding modes comprising: an intra-prediction encoding mode, an inter-prediction encoding mode, and an inter-sequence prediction encoding mode; the method for selecting the coding mode meeting the preset conditions for the pixel block to be coded from the corresponding coding mode set comprising the inter-sequence prediction coding modes according to the coding type of the image to be coded comprises the following steps: calculating a third cost for coding the pixel block to be coded by adopting an intra-frame prediction coding mode; calculating a fourth cost for encoding the pixel block to be encoded by adopting an inter-frame prediction encoding mode; calculating a fifth cost for coding the pixel block to be coded by taking a second associated image in the first image sequence as a reference frame and adopting an inter-sequence prediction coding mode; and taking the coding mode corresponding to the minimum value in the third, fourth and fifth costs as the coding mode selected for the pixel block to be coded.
In exemplary embodiment 11, the second associated image of any of exemplary embodiments 1-10 comprises: an image having the same temporal information as the image to be encoded; or the time information is not later than the image of the image to be coded; or, the time information is earlier than the image of the image to be encoded and the time information is later than the image of the image to be encoded.
In exemplary embodiment 12, the setting of the pixel block size in the preset manner as described in any of exemplary embodiments 1-11 includes: and setting the pixel block size according to the image resolution parameter corresponding to the video image sequence.
In exemplary embodiment 13, in any of exemplary embodiments 1-12, when a matching pixel block of the pixel block to be encoded is searched for in a reconstructed image of the first associated image or the second associated image in the process of calculating the second cost or the fifth cost, a search start coordinate in the reconstructed image is determined based on a first motion vector obtained by searching for a matching pixel block in the reconstructed image during the encoding process for the image to be encoded and the coordinates of the pixel block to be encoded, and the matching pixel block is searched for starting from the search start coordinate.
In exemplary embodiment 14, any of exemplary embodiments 1 to 13, in encoding an image to be encoded in a second image sequence, if reference frames of two or more consecutive pixel blocks are the same and motion vectors are the same, merge encoding description information of the consecutive pixel blocks into a group to be written into the stereoscopic video bitstream.
In exemplary embodiment 15, any of exemplary embodiments 1-14, after encoding the first sequence of images and the N-1 second sequences of images to generate the stereoscopic video bitstream, comprises: writing the stereoscopic video bitstream to a storage medium; alternatively, the stereoscopic video bitstream is transmitted to a receiving side.
In an exemplary embodiment 16, an apparatus may comprise: the video image sequence acquisition unit is used for acquiring N video image sequences bearing a stereo video, wherein N is an integer greater than or equal to 2; an image sequence determination unit for determining a first image sequence and N-1 second image sequences based on the N video image sequences; an image sequence encoding unit for encoding the first image sequence and the N-1 second image sequences to generate a stereoscopic video bit stream; wherein the encoding mode adopted for encoding the second image sequence comprises: inter-sequence prediction coding mode; the inter-sequence prediction encoding mode is to perform prediction encoding on a pixel block in the second image sequence using an image in the first image sequence as a reference frame.
In exemplary embodiment 17, N of exemplary embodiment 16 is 2, and the video image sequence acquisition unit is specifically configured to acquire two video image sequences corresponding to a left eye and a right eye, respectively.
In exemplary embodiment 18, the image sequence determining unit of any of exemplary embodiments 16 to 17 is specifically configured to select one video image sequence from the N video image sequences as the first image sequence, and respectively take other video image sequences as each of the N-1 second image sequences.
In exemplary embodiment 19, the image sequence determining unit of any of exemplary embodiments 16-18 includes: the down-sampling sub-unit is used for respectively down-sampling each frame of image in each video image sequence according to a preset down-sampling mode; and the sequence selection subunit is used for selecting one video image sequence from the N video image sequences subjected to down-sampling as the first image sequence, and respectively using other video image sequences subjected to down-sampling as each sequence in the N-1 second image sequences.
In exemplary embodiment 20, the image sequence determining unit of any of exemplary embodiments 16-19 includes: a sub-sampling sequence generation subunit, configured to generate a sub-sampling image sequence according to the N video image sequences according to a preset pre-processing coding mode; a sub-sampling sequence splitting subunit, configured to split each frame of image in the sub-sampling image sequence into N images with the same time information according to a splitting manner corresponding to the preprocessing coding mode, so as to obtain N image sequences; and the image sequence selection subunit is used for selecting a first image sequence from the N image sequences and taking other image sequences as the N-1 second image sequences.
In exemplary embodiment 21, the image sequence encoding unit of any of exemplary embodiments 16-20 includes: a first image sequence encoding subunit operable to encode the first image sequence; a second image sequence encoding subunit, configured to encode the N-1 second image sequences; the second image sequence encoding subunit includes: the pixel block dividing subunit is used for dividing the image to be coded into a plurality of pixel blocks to be coded according to the pixel block size set according to a preset mode; the cyclic control subunit is used for sequentially calling the following mode selection subunits and the pixel block coding subunits for coding each pixel block to be coded; a mode selection subunit, configured to select, for a pixel block to be encoded, an encoding mode that satisfies a preset condition from a corresponding encoding mode set including an inter-sequence prediction encoding mode according to an encoding type of the image to be encoded; and the pixel block coding subunit is used for coding the pixel block to be coded by adopting the selected coding mode.
In exemplary embodiment 22, the mode selection subunit of any of exemplary embodiments 16 to 21 is specifically configured to select, for a block of pixels to be encoded, a coding mode that satisfies a cost minimization based rate-distortion optimization model from a corresponding set of coding modes including inter-sequence prediction coding modes according to a coding type of the image to be encoded.
In exemplary embodiment 23, the mode selection subunit of any of exemplary embodiments 16-22 comprises the following subunits for intra coding type: the first cost calculation subunit is used for calculating a first cost for coding the pixel block to be coded by adopting an intra-frame prediction coding mode; the second cost calculation subunit is used for calculating a second cost of coding the pixel block to be coded by using the first associated image in the first image sequence as a reference frame and adopting an inter-sequence prediction coding mode; and the first mode selection subunit is configured to use the coding mode corresponding to the minimum value of the first cost and the second cost as the coding mode selected for the pixel block to be coded.
In exemplary embodiment 24, the mode selection subunit of any of exemplary embodiments 16-23 includes the following subunits for inter coding types: the third cost calculation subunit is used for calculating a third cost for coding the pixel block to be coded by adopting an intra-frame prediction coding mode; the fourth cost calculation subunit is used for calculating a fourth cost for coding the pixel block to be coded by adopting an inter-frame prediction coding mode; a fifth cost calculating subunit, configured to calculate a fifth cost for encoding the pixel block to be encoded by using the second associated image in the first image sequence as a reference frame and using an inter-sequence prediction encoding mode; and the second mode selection subunit is configured to use the coding mode corresponding to the minimum value of the third, fourth, and fifth costs as the coding mode selected for the pixel block to be coded.
In exemplary embodiment 25, the pixel block partitioning sub-unit of any of exemplary embodiments 16 to 24 is specifically configured to set the pixel block size according to an image resolution parameter corresponding to the sequence of video images, and partition an image to be encoded into a plurality of pixel blocks according to the pixel block size.
In exemplary embodiment 26, the second cost calculation subunit or the fifth cost calculation subunit described in any of exemplary embodiments 16 to 25, when searching for a matching pixel block of the pixel block to be encoded in a reconstructed image of the first associated image or the second associated image, determines a search start coordinate in the reconstructed image based on a first motion vector obtained by searching for a matching pixel block in the reconstructed image in the encoding process for the image to be encoded and the coordinates of the pixel block to be encoded, and searches for the matching pixel block starting from the search start coordinate.
In exemplary embodiment 27, the second image sequence encoding subunit as described in any of exemplary embodiments 16 to 26, during coding, if two or more consecutive pixel blocks have the same reference frame and the same motion vector, merges the coding description information of the consecutive pixel blocks into a group to be written into the stereoscopic video bitstream.
In exemplary embodiment 28, any of exemplary embodiments 16-27 further comprises: a storage unit operable to write the stereoscopic video bitstream generated by the image sequence encoding unit into a storage medium; or, the sending unit is used for sending the stereoscopic video bit stream generated by the image sequence coding unit to a receiving party.
In an example embodiment 29, a method may comprise: acquiring a stereoscopic video bit stream to be decoded; obtaining a first image sequence and N-1 second image sequences from the stereoscopic video bit stream through decoding, wherein N is an integer greater than or equal to 2; obtaining N video image sequences bearing a stereoscopic video according to the obtained first image sequence and the N-1 second image sequences; wherein each image in the second sequence of images is obtained by the following decoding process: reconstructing each pixel block belonging to the image according to the corresponding coding mode and residual data carried by a second image sequence bit stream in the stereoscopic video bit stream, and synthesizing the image by using the reconstructed pixel blocks; the coding modes carried by the second image sequence bitstream comprise: inter-sequence prediction coding mode.
In exemplary embodiment 30, N in exemplary embodiment 29 is 2, and the N video image sequences are video image sequences corresponding to a left eye and a right eye, respectively.
In exemplary embodiment 31, the deriving N video image sequences carrying stereoscopic video from the acquired first image sequence and N-1 second image sequences as in any of exemplary embodiments 29-30 comprises: and respectively taking the first image sequence and the N-1 second image sequences as each sequence in the N video image sequences.
In exemplary embodiment 32, the deriving N video image sequences carrying stereoscopic video from the acquired first image sequence and N-1 second image sequences as in any of exemplary embodiments 29-31 further comprises: performing corresponding up-sampling processing on each frame of image in the first image sequence and the second image sequence according to a down-sampling mode adopted by encoding; the taking the first image sequence and the N-1 second image sequences as respective ones of the N video image sequences comprises: and respectively taking the first image sequence and the N-1 second image sequences after up-sampling as each sequence in the N video image sequences.
In exemplary embodiment 33, the deriving N video image sequences carrying stereoscopic video from the acquired first image sequence and N-1 second image sequences as in any of exemplary embodiments 29-32 comprises: according to a splitting mode adopted during coding, corresponding merging operation is executed aiming at images with the same time information in the first image sequence and the second image sequence, and a sub-sampling image sequence is obtained; and correspondingly dividing the sub-sampling image sequence into N image sequences according to a stereo video coding mode adopted during coding, namely obtaining the N video image sequences bearing the stereo video.
In exemplary embodiment 34, any of exemplary embodiments 29-33, when reconstructing a block of pixels belonging to an intra coding type according to an inter-sequence prediction coding mode, the reference frame used comprises: and the images in the first image sequence have the same time information as the images to which the pixel blocks belong.
In exemplary embodiment 35 any of exemplary embodiments 29-34, when reconstructing a block of pixels belonging to an inter-coding type according to an inter-sequence prediction coding mode, the reference frames used comprise the following pictures in the first sequence of pictures: an image having the same time information as an image to which the pixel block belongs; or the time information is not later than the image of the image to which the pixel block belongs; or, the time information is earlier than the image of the image to which the pixel block belongs, and the time information is later than the image of the image to which the pixel block belongs.
In exemplary embodiment 36, the obtaining a stereoscopic video bitstream to be decoded of any of exemplary embodiments 29-35, comprising: reading a stereoscopic video bitstream from a storage medium; alternatively, a stereoscopic video bitstream transmitted by a transmitting side is received.
In an exemplary embodiment 37, an apparatus may comprise: a stereoscopic video bit stream acquisition unit for acquiring a stereoscopic video bit stream to be decoded; a stereoscopic video bit stream decoding unit configured to acquire a first image sequence and N-1 second image sequences from the stereoscopic video bit stream by decoding, where N is an integer greater than or equal to 2; wherein each image in the second sequence of images is obtained by the following decoding process: reconstructing each pixel block belonging to the image according to the corresponding coding mode and residual data carried by a second image sequence bit stream in the stereoscopic video bit stream, and synthesizing the image by using the reconstructed pixel blocks; the coding modes carried by the second image sequence bitstream comprise: inter-sequence prediction coding mode; and the video image sequence generating unit is used for obtaining N video image sequences bearing the stereoscopic video according to the acquired first image sequence and the N-1 second image sequences.
In exemplary embodiment 38, N is 2 as described in exemplary embodiment 37; the stereoscopic video bitstream decoding unit is specifically configured to obtain a first image sequence and a second image sequence from the stereoscopic video bitstream by decoding; the video image sequence generating unit is specifically configured to obtain two video image sequences bearing a stereoscopic video according to the obtained first image sequence and second image sequence.
In exemplary embodiment 39, the video image sequence generating unit of any of exemplary embodiments 37-38 is specifically configured to treat the first image sequence and the N-1 second image sequences as respective ones of the N video image sequences.
In exemplary embodiment 40, the video image sequence generation unit of any of exemplary embodiments 37-39, comprising: the up-sampling sub-unit is used for carrying out corresponding up-sampling processing on each frame of image in the first image sequence and the second image sequence according to a down-sampling mode adopted by coding; and the image sequence generation subunit is used for respectively taking the first image sequence and the N-1 second image sequences after the up-sampling as each sequence in the N video image sequences.
In exemplary embodiment 41, the video image sequence generation unit of any of exemplary embodiments 37-40, comprising: a sub-sampling sequence reduction subunit, configured to perform, according to a splitting manner adopted during encoding, corresponding merging operations on images with the same time information in the first image sequence and the second image sequence, so as to obtain a sub-sampling image sequence; and the video image sequence dividing subunit is used for correspondingly dividing the sub-sampling image sequence into N image sequences according to a stereo video coding mode adopted during coding, so as to obtain the N video image sequences bearing the stereo video.
In exemplary embodiment 42, the stereoscopic video bitstream decoding unit as described in any of exemplary embodiments 37 to 41, when reconstructing the pixel block belonging to the intra coding type according to the inter-sequence prediction coding mode, uses the reference frame comprising: and the images in the first image sequence have the same time information as the images to which the pixel blocks belong.
In exemplary embodiment 43, the stereoscopic video bitstream decoding unit as described in any of exemplary embodiments 37 to 42, when reconstructing a pixel block belonging to an inter-coding type according to an inter-sequence prediction coding mode, the reference frames used comprise the following pictures in the first picture sequence: an image having the same time information as an image to which the pixel block belongs; or the time information is not later than the image of the image to which the pixel block belongs; or, the time information is earlier than the image of the image to which the pixel block belongs, and the time information is later than the image of the image to which the pixel block belongs.
In exemplary embodiment 44, the stereoscopic video bitstream acquisition unit as described in any of exemplary embodiments 37 to 43, in particular, is configured to read a stereoscopic video bitstream from a storage medium, or to receive a stereoscopic video bitstream transmitted by a transmitting side.
In exemplary embodiment 45, a machine-readable medium may store instructions that when read and executed by a processor perform the method of any of exemplary embodiments 1-15.
In exemplary embodiment 46, a machine-readable medium may store instructions that when read and executed by a processor perform the method of any of exemplary embodiments 29-36.
In an exemplary embodiment 47, a system may comprise: a processor, and a memory; the memory is configured to store instructions that, when read and executed by the processor, perform the method of any of exemplary embodiments 1-15.
In an exemplary embodiment 48, a system may comprise: a processor, and a memory; the memory is configured to store instructions that, when read and executed by the processor, perform the method of any of exemplary embodiments 29-36.
Although the present application has been described with reference to preferred embodiments, they are not intended to limit the present application. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of the present application should be determined by the claims that follow.

Claims (48)

1. A method, comprising:
acquiring N video image sequences bearing a stereoscopic video, wherein N is an integer greater than or equal to 2;
determining a first image sequence and N-1 second image sequences based on the N video image sequences;
encoding the first image sequence and the N-1 second image sequences to generate a stereoscopic video bitstream;
wherein the encoding mode adopted for encoding the second image sequence comprises: inter-sequence prediction coding mode; the inter-sequence prediction encoding mode is to perform prediction encoding on a pixel block in the second image sequence using an image in the first image sequence as a reference frame.
2. The method of claim 1, wherein N is 2; the N video image sequences are video image sequences corresponding to the left eye and the right eye, respectively.
3. The method of claim 1, wherein determining a first image sequence and N-1 second image sequences based on the N video image sequences comprises:
and selecting one video image sequence from the N video image sequences as the first image sequence, and respectively using other video image sequences as each sequence in the N-1 second image sequences.
4. The method of claim 3, wherein determining a first image sequence and N-1 second image sequences based on the N video image sequences further comprises:
respectively performing down-sampling on each frame of image in each video image sequence according to a preset down-sampling mode;
the selecting one video image sequence from the N video image sequences as the first image sequence, and using other video image sequences as each sequence in the N-1 second image sequences, respectively, includes: and selecting one video image sequence from the N video image sequences after the down sampling as the first image sequence, and respectively using other video image sequences after the down sampling as each sequence in the N-1 second image sequences.
5. The method of claim 1, wherein determining a first image sequence and N-1 second image sequences based on the N video image sequences comprises:
generating a sub-sampling image sequence according to the N video image sequences according to a preset preprocessing coding mode;
splitting each frame of image in the sub-sampling image sequence into N images with the same time information according to a splitting mode corresponding to the preprocessing coding mode, thereby obtaining N image sequences;
and selecting a first image sequence from the N image sequences, and taking other image sequences as the N-1 second image sequences.
6. Method according to claim 1, characterized in that the pictures to be coded in the second picture sequence are coded by the following steps:
dividing an image to be coded into a plurality of pixel blocks to be coded according to the size of the pixel blocks set in a preset mode;
and sequentially executing the following coding operations on each pixel block to be coded:
according to the coding type of the image to be coded, selecting a coding mode meeting preset conditions for a pixel block to be coded from a corresponding coding mode set comprising an inter-sequence prediction coding mode; and coding the pixel block to be coded by adopting the selected coding mode.
7. The method according to claim 6, wherein the coding mode satisfying the preset condition comprises: and the coding mode of the rate distortion optimization model based on cost minimization is satisfied.
8. The method of claim 7, wherein for an intra-coding type, the respective set of coding modes comprises: intra-frame prediction coding mode, inter-sequence prediction coding mode;
the method for selecting the coding mode meeting the preset conditions for the pixel block to be coded from the corresponding coding mode set comprising the inter-sequence prediction coding modes according to the coding type of the image to be coded comprises the following steps:
calculating a first cost for coding the pixel block to be coded by adopting an intra-frame prediction coding mode;
calculating a second cost for coding the pixel block to be coded by taking a first associated image in the first image sequence as a reference frame and adopting an inter-sequence prediction coding mode;
and taking the coding mode corresponding to the minimum value of the first cost and the second cost as the coding mode selected for the pixel block to be coded.
9. The method of claim 8, wherein the first associated image comprises: and the image has the same time information as the image to be coded.
10. The method of claim 7, wherein for the inter-coding type, the respective preset set of coding modes comprises: an intra-prediction encoding mode, an inter-prediction encoding mode, and an inter-sequence prediction encoding mode;
the method for selecting the coding mode meeting the preset conditions for the pixel block to be coded from the corresponding coding mode set comprising the inter-sequence prediction coding modes according to the coding type of the image to be coded comprises the following steps:
calculating a third cost for coding the pixel block to be coded by adopting an intra-frame prediction coding mode;
calculating a fourth cost for encoding the pixel block to be encoded by adopting an inter-frame prediction encoding mode;
calculating a fifth cost for coding the pixel block to be coded by taking a second associated image in the first image sequence as a reference frame and adopting an inter-sequence prediction coding mode;
and taking the coding mode corresponding to the minimum value in the third, fourth and fifth costs as the coding mode selected for the pixel block to be coded.
11. The method of claim 10, wherein the second associated image comprises:
an image having the same temporal information as the image to be encoded; or,
the time information is not later than the image of the image to be coded; or,
the time information is earlier than the image of the image to be encoded and the time information is later than the image of the image to be encoded.
12. The method of claim 6, wherein setting the pixel block size according to a preset manner comprises: and setting the pixel block size according to the image resolution parameter corresponding to the video image sequence.
13. The method according to claim 8 or 10, wherein, when searching for a matching pixel block of the pixel block to be encoded in a reconstructed image of a first associated image or a second associated image in the process of calculating a second cost or a fifth cost, a search start coordinate in the reconstructed image is determined based on a first motion vector obtained by searching for the matching pixel block in the reconstructed image in the process of encoding for the image to be encoded and coordinates of the pixel block to be encoded, and the matching pixel block is searched from the search start coordinate.
14. The method according to claim 6, wherein in the process of encoding the image to be encoded in the second image sequence, if the reference frames of two or more consecutive pixel blocks are the same and the motion vectors are the same, the encoding description information of the consecutive pixel blocks are merged into a group to be written into the stereoscopic video bitstream.
15. The method of claim 1, after encoding the first image sequence and the N-1 second image sequences to generate a stereoscopic video bitstream, comprising:
writing the stereoscopic video bitstream to a storage medium; or,
sending the stereoscopic video bitstream to a recipient.
16. An apparatus, comprising:
the video image sequence acquisition unit is used for acquiring N video image sequences bearing a stereo video, wherein N is an integer greater than or equal to 2;
an image sequence determination unit for determining a first image sequence and N-1 second image sequences based on the N video image sequences;
an image sequence encoding unit for encoding the first image sequence and the N-1 second image sequences to generate a stereoscopic video bit stream; wherein the encoding mode adopted for encoding the second image sequence comprises: inter-sequence prediction coding mode; the inter-sequence prediction encoding mode is to perform prediction encoding on a pixel block in the second image sequence using an image in the first image sequence as a reference frame.
17. The apparatus according to claim 16, wherein N is 2, and the video image sequence acquisition unit is specifically configured to acquire two video image sequences corresponding to the left eye and the right eye, respectively.
18. The apparatus according to claim 16, wherein the image sequence determining unit is specifically configured to select one of the N video image sequences as the first image sequence, and to respectively take other video image sequences as each of the N-1 second image sequences.
19. The apparatus of claim 18, wherein the image sequence determination unit comprises:
the down-sampling sub-unit is used for respectively down-sampling each frame of image in each video image sequence according to a preset down-sampling mode;
and the sequence selection subunit is used for selecting one video image sequence from the N video image sequences subjected to down-sampling as the first image sequence, and respectively using other video image sequences subjected to down-sampling as each sequence in the N-1 second image sequences.
20. The apparatus according to claim 16, wherein the image sequence determining unit comprises:
a sub-sampling sequence generation subunit, configured to generate a sub-sampling image sequence according to the N video image sequences according to a preset pre-processing coding mode;
a sub-sampling sequence splitting subunit, configured to split each frame of image in the sub-sampling image sequence into N images with the same time information according to a splitting manner corresponding to the preprocessing coding mode, so as to obtain N image sequences;
and the image sequence selection subunit is used for selecting a first image sequence from the N image sequences and taking other image sequences as the N-1 second image sequences.
21. The apparatus of claim 16, wherein the image sequence encoding unit comprises:
a first image sequence encoding subunit operable to encode the first image sequence;
a second image sequence encoding subunit, configured to encode the N-1 second image sequences;
the second image sequence encoding subunit includes:
the pixel block dividing subunit is used for dividing the image to be coded into a plurality of pixel blocks to be coded according to the pixel block size set according to a preset mode;
the cyclic control subunit is used for sequentially calling the following mode selection subunits and the pixel block coding subunits for coding each pixel block to be coded;
a mode selection subunit, configured to select, for a pixel block to be encoded, an encoding mode that satisfies a preset condition from a corresponding encoding mode set including an inter-sequence prediction encoding mode according to an encoding type of the image to be encoded;
and the pixel block coding subunit is used for coding the pixel block to be coded by adopting the selected coding mode.
22. The apparatus according to claim 21, wherein the mode selection subunit is configured to select, for a block of pixels to be encoded, a coding mode that satisfies a cost-minimization based rate-distortion optimization model from a respective set of coding modes comprising inter-sequence prediction coding modes, in particular according to a coding type of the image to be encoded.
23. The apparatus of claim 22, wherein the mode selection subunit comprises the following subunits for intra coding type:
the first cost calculation subunit is used for calculating a first cost for coding the pixel block to be coded by adopting an intra-frame prediction coding mode;
the second cost calculation subunit is used for calculating a second cost of coding the pixel block to be coded by using the first associated image in the first image sequence as a reference frame and adopting an inter-sequence prediction coding mode;
and the first mode selection subunit is configured to use the coding mode corresponding to the minimum value of the first cost and the second cost as the coding mode selected for the pixel block to be coded.
24. The apparatus of claim 22, wherein the mode selection subunit comprises the following subunits for inter coding types:
the third cost calculation subunit is used for calculating a third cost for coding the pixel block to be coded by adopting an intra-frame prediction coding mode;
the fourth cost calculation subunit is used for calculating a fourth cost for coding the pixel block to be coded by adopting an inter-frame prediction coding mode;
a fifth cost calculating subunit, configured to calculate a fifth cost for encoding the pixel block to be encoded by using the second associated image in the first image sequence as a reference frame and using an inter-sequence prediction encoding mode;
and the second mode selection subunit is configured to use the coding mode corresponding to the minimum value of the third, fourth, and fifth costs as the coding mode selected for the pixel block to be coded.
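A minimal sketch of the cost-minimizing mode selection of claims 23 and 24, assuming each candidate mode has already been assigned a rate-distortion cost; the numeric costs are illustrative placeholders:

```python
def select_mode_rd(costs):
    """Return the coding mode with the smallest rate-distortion cost, where
    each cost is understood as J = D + lambda * R."""
    return min(costs, key=costs.get)

# Intra coding type: intra prediction vs. inter-sequence prediction.
intra_choice = select_mode_rd({"intra": 1450.0, "inter_sequence": 1210.0})

# Inter coding type: intra vs. inter vs. inter-sequence prediction.
inter_choice = select_mode_rd({"intra": 1450.0, "inter": 990.0,
                               "inter_sequence": 1020.0})
```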
25. The apparatus of claim 21, wherein the pixel block partitioning sub-unit is specifically configured to set the pixel block size according to an image resolution parameter corresponding to the sequence of video images, and to partition an image to be encoded into a plurality of pixel blocks according to the pixel block size.
26. The apparatus according to claim 23 or 24, wherein, when searching a reconstructed image of the first associated image or the second associated image for a pixel block matching the pixel block to be encoded, the second cost calculation subunit or the fifth cost calculation subunit determines a search start coordinate in the reconstructed image based on the coordinates of the pixel block to be encoded and a first motion vector obtained from a matching-pixel-block search already performed in the reconstructed image during encoding of the image to be encoded, and searches for the matching pixel block starting from the search start coordinate.
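A minimal sketch of the matching-block search of claim 26, assuming a SAD criterion and a small full search around the start coordinate; the function names, search radius and SAD metric are assumptions rather than claim requirements:

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return int(np.abs(a.astype(np.int16) - b.astype(np.int16)).sum())

def search_matching_block(ref, block, block_pos, prior_mv, radius=4):
    """Search the reconstructed first-sequence image `ref` for the block that
    best matches `block`, starting from block_pos + prior_mv, where prior_mv
    is a motion vector already obtained while encoding the current image."""
    h, w = block.shape
    start_r = block_pos[0] + prior_mv[0]
    start_c = block_pos[1] + prior_mv[1]
    best_mv, best_cost = (0, 0), float("inf")
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = start_r + dr, start_c + dc
            if 0 <= r <= ref.shape[0] - h and 0 <= c <= ref.shape[1] - w:
                cost = sad(ref[r:r + h, c:c + w], block)
                if cost < best_cost:
                    best_mv = (r - block_pos[0], c - block_pos[1])
                    best_cost = cost
    return best_mv, best_cost
```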
27. The apparatus of claim 16, wherein, during the coding process, if two or more consecutive pixel blocks have the same reference frame and the same motion vector, the second image sequence encoding subunit merges the coding description information of those consecutive pixel blocks into one group and writes the group into the stereoscopic video bitstream.
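A minimal sketch of the grouping described in claim 27, assuming each pixel block's coding description is a small dictionary; the field names are illustrative:

```python
def merge_block_descriptions(block_descs):
    """Merge the coding description information of consecutive pixel blocks
    that share the same reference frame and motion vector into one group, so
    that a single description is written to the bitstream per group."""
    groups = []
    for desc in block_descs:
        key = (desc["ref_frame"], desc["motion_vector"])
        if groups and groups[-1][0] == key:
            groups[-1] = (key, groups[-1][1] + 1)
        else:
            groups.append((key, 1))
    return groups  # list of ((ref_frame, motion_vector), run_length)

groups = merge_block_descriptions([
    {"ref_frame": 3, "motion_vector": (1, -2)},
    {"ref_frame": 3, "motion_vector": (1, -2)},
    {"ref_frame": 3, "motion_vector": (0, 0)},
])
# -> [((3, (1, -2)), 2), ((3, (0, 0)), 1)]
```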
28. The apparatus of claim 16, further comprising:
a storage unit operable to write the stereoscopic video bitstream generated by the image sequence encoding unit into a storage medium; or,
and the sending unit is used for sending the stereoscopic video bit stream generated by the image sequence coding unit to a receiving party.
29. A method, comprising:
acquiring a stereoscopic video bit stream to be decoded;
obtaining a first image sequence and N-1 second image sequences from the stereoscopic video bit stream through decoding, wherein N is an integer greater than or equal to 2;
obtaining N video image sequences bearing a stereoscopic video according to the obtained first image sequence and the N-1 second image sequences;
wherein each image in the second sequence of images is obtained by the following decoding process: reconstructing each pixel block belonging to the image according to the corresponding coding mode and residual data carried by a second image sequence bit stream in the stereoscopic video bit stream, and synthesizing the image by using the reconstructed pixel blocks; the coding modes carried by the second image sequence bitstream comprise: inter-sequence prediction coding mode.
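A minimal sketch of the per-block reconstruction described in claim 29, assuming the inter-sequence case with a reconstructed first-sequence reference frame already available; the function signature and the zero-prediction fallback are illustrative simplifications:

```python
import numpy as np

def reconstruct_block(mode, residual, block_pos, ref_frame=None, motion_vector=(0, 0)):
    """Rebuild one pixel block of a second-sequence image from its coding mode
    and residual data; only the inter-sequence prediction case is sketched."""
    h, w = residual.shape
    if mode == "inter_sequence":
        r = block_pos[0] + motion_vector[0]
        c = block_pos[1] + motion_vector[1]
        prediction = ref_frame[r:r + h, c:c + w].astype(np.int16)
    else:
        # Intra / inter predictions would be formed analogously from
        # previously decoded pixels or temporal reference frames.
        prediction = np.zeros((h, w), dtype=np.int16)
    return np.clip(prediction + residual, 0, 255).astype(np.uint8)
```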
30. The method of claim 29, wherein N is 2; the N video image sequences are video image sequences corresponding to the left eye and the right eye, respectively.
31. The method according to claim 29, wherein said deriving N video image sequences carrying stereoscopic video from said acquired first image sequence and N-1 second image sequences comprises:
and respectively taking the first image sequence and the N-1 second image sequences as each sequence in the N video image sequences.
32. The method according to claim 29, wherein said deriving N video image sequences carrying stereoscopic video from said acquired first image sequence and N-1 second image sequences further comprises:
performing corresponding up-sampling processing on each frame in the first image sequence and the second image sequences according to the down-sampling mode adopted during encoding;
wherein the taking the first image sequence and the N-1 second image sequences as respective ones of the N video image sequences comprises: respectively taking the up-sampled first image sequence and the up-sampled N-1 second image sequences as the respective sequences of the N video image sequences.
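A minimal sketch of the up-sampling of claim 32, assuming the encoder used decimation by a factor of 2, so pixel repetition is used here as a matching, illustrative inverse:

```python
import numpy as np

def upsample_frame(frame, factor=2):
    """Restore the original resolution of a decoded frame by pixel repetition;
    `factor` must mirror the down-sampling mode used at the encoder."""
    return np.repeat(np.repeat(frame, factor, axis=0), factor, axis=1)

restored = upsample_frame(np.zeros((270, 480), dtype=np.uint8))  # -> 540 x 960
```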
33. The method according to claim 29, wherein said deriving N video image sequences carrying stereoscopic video from said acquired first image sequence and N-1 second image sequences comprises:
performing, according to the splitting manner adopted during encoding, a corresponding merging operation on images with the same time information in the first image sequence and the second image sequences, so as to obtain a sub-sampling image sequence;
and correspondingly dividing the sub-sampling image sequence into N image sequences according to the stereoscopic video coding mode adopted during encoding, thereby obtaining the N video image sequences bearing the stereoscopic video.
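A minimal sketch of the merge-and-divide step of claim 33, assuming a side-by-side splitting manner at the encoder; the packing and the function name are assumptions:

```python
import numpy as np

def merge_and_divide(first_image, second_images, n=2):
    """Merge images with the same time information back into one frame of the
    sub-sampling image sequence (side-by-side packing assumed), then divide
    that frame into the n view images of the stereoscopic video."""
    packed = np.concatenate([first_image] + list(second_images), axis=1)
    width = packed.shape[1] // n
    return [packed[:, i * width:(i + 1) * width] for i in range(n)]

left_half = np.zeros((540, 960), dtype=np.uint8)
right_half = np.zeros((540, 960), dtype=np.uint8)
views = merge_and_divide(left_half, [right_half], n=2)
```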
34. The method according to claim 29, wherein the reference frame used when reconstructing a block of pixels belonging to the intra coding type according to the inter-sequence prediction coding mode comprises: and the images in the first image sequence have the same time information as the images to which the pixel blocks belong.
35. The method according to claim 29, wherein, when a pixel block belonging to the inter coding type is reconstructed according to the inter-sequence prediction coding mode, the reference frames used comprise the following images of the first image sequence:
an image having the same time information as the image to which the pixel block belongs; or,
an image whose time information is not later than that of the image to which the pixel block belongs; or,
an image whose time information is earlier than that of the image to which the pixel block belongs, and an image whose time information is later than that of the image to which the pixel block belongs.
36. The method according to any of claims 29-35, wherein said obtaining a stereoscopic video bitstream to be decoded comprises:
reading a stereoscopic video bitstream from a storage medium; or,
a stereoscopic video bitstream transmitted by a transmitting side is received.
37. An apparatus, comprising:
a stereoscopic video bit stream acquisition unit for acquiring a stereoscopic video bit stream to be decoded;
a stereoscopic video bit stream decoding unit configured to acquire a first image sequence and N-1 second image sequences from the stereoscopic video bit stream by decoding, where N is an integer greater than or equal to 2; wherein each image in the second sequence of images is obtained by the following decoding process: reconstructing each pixel block belonging to the image according to the corresponding coding mode and residual data carried by a second image sequence bit stream in the stereoscopic video bit stream, and synthesizing the image by using the reconstructed pixel blocks; the coding modes carried by the second image sequence bitstream comprise: inter-sequence prediction coding mode;
and the video image sequence generating unit is used for obtaining N video image sequences bearing the stereoscopic video according to the acquired first image sequence and the N-1 second image sequences.
38. The apparatus of claim 37, wherein N is 2; the stereoscopic video bitstream decoding unit is specifically configured to obtain a first image sequence and a second image sequence from the stereoscopic video bitstream by decoding; the video image sequence generating unit is specifically configured to obtain two video image sequences bearing a stereoscopic video according to the obtained first image sequence and second image sequence.
39. The apparatus according to claim 37, wherein the video image sequence generating unit is specifically configured to use the first image sequence and the N-1 second image sequences as respective ones of the N video image sequences.
40. The apparatus of claim 37, wherein the video image sequence generating unit comprises:
an up-sampling subunit, configured to perform corresponding up-sampling processing on each frame in the first image sequence and the second image sequences according to the down-sampling mode adopted during encoding;
and an image sequence generation subunit, configured to respectively take the up-sampled first image sequence and the up-sampled N-1 second image sequences as the respective sequences of the N video image sequences.
41. The apparatus of claim 37, wherein the video image sequence generating unit comprises:
a sub-sampling sequence reduction subunit, configured to perform, according to a splitting manner adopted during encoding, corresponding merging operations on images with the same time information in the first image sequence and the second image sequence, so as to obtain a sub-sampling image sequence;
and the video image sequence dividing subunit is used for correspondingly dividing the sub-sampling image sequence into N image sequences according to a stereo video coding mode adopted during coding, so as to obtain the N video image sequences bearing the stereo video.
42. The apparatus of claim 37, wherein the reference frame used by the stereoscopic video bitstream decoding unit to reconstruct the pixel block belonging to the intra coding type according to the inter-sequence prediction coding mode comprises: and the images in the first image sequence have the same time information as the images to which the pixel blocks belong.
43. The apparatus of claim 37, wherein, when the stereoscopic video bitstream decoding unit reconstructs a pixel block belonging to the inter coding type according to the inter-sequence prediction coding mode, the reference frames used comprise the following images in the first image sequence:
an image having the same time information as the image to which the pixel block belongs; or,
an image whose time information is not later than that of the image to which the pixel block belongs; or,
an image whose time information is earlier than that of the image to which the pixel block belongs, and an image whose time information is later than that of the image to which the pixel block belongs.
44. The apparatus according to any of the claims 37-43, wherein the stereoscopic video bitstream retrieving unit is specifically adapted to read a stereoscopic video bitstream from a storage medium or to receive a stereoscopic video bitstream transmitted by a transmitting party.
45. A machine-readable medium storing instructions which, when read and executed by a processor, perform the method of any one of claims 1-15.
46. A machine-readable medium storing instructions which, when read and executed by a processor, perform the method of any one of claims 29-36.
47. A system, comprising:
a processor;
a memory for storing instructions that, when read and executed by the processor, perform the method of any of claims 1-15.
48. A system, comprising:
a processor;
a memory for storing instructions that, when read and executed by the processor, perform the method of any of claims 29-36.