CN104168482A

CN104168482A - Method and device for video coding and decoding

Info

Publication number: CN104168482A
Application number: CN201410302970.7A
Authority: CN
Inventors: 田永鸿; 陈伟; 王耀威; 黄铁军
Original assignee: Peking University; China Security and Fire Technology Co Ltd
Current assignee: Peking University; China Security and Fire Technology Co Ltd
Priority date: 2014-06-27
Filing date: 2014-06-27
Publication date: 2014-11-26
Anticipated expiration: 2034-06-27
Also published as: CN104168482B

Abstract

The invention discloses a video coding and decoding method and device. Because each original video image to be coded can be divided into layers at the pixel level during encoding, the disclosed technical scheme supports the hierarchical representation of objects more flexibly during video coding and improves the accuracy of the contour description of the video image. Furthermore, the pixel attribute identification value that reflects the video content level of each pixel in the original video image can be written into the output video bit stream. The pixel attribute identification information of each pixel can then be decoded together with the original video image, and objects of different levels can be extracted according to that information. This likewise supports the hierarchical representation of objects more flexibly during video decoding and improves the accuracy of the contour description of the video image during decoding.

Description

Video coding and decoding method and device

Technical Field

The present invention relates to the field of video encoding and decoding technologies, and in particular, to a video encoding and decoding method and apparatus.

Background

In recent years, with the wide application of video systems in various fields, video coding optimization techniques have developed rapidly. Among them, techniques that optimize video coding according to the subjective meaning of the video content are attracting increasing attention from academia and industry. Such techniques generally divide a picture, according to its subjective meaning, into a background region and a plurality of foreground regions (also called regions of interest, ROI), i.e. the regions of the picture in which viewers are subjectively more interested. By separating the foreground regions from the background region and combining this division with an encoding technique, they optimize the video quality of the corresponding regions at a given bit rate.

That is, in the related art, the video content to be coded is generally layered according to the ROI during video coding. Because an ROI is usually defined in large blocks, layering based on the ROI has a coarse block granularity, so the hierarchical representation of objects during video coding and decoding cannot be supported flexibly; moreover, because the ROI blocks are large, the contour description of the video image is, to some extent, not accurate enough. It is therefore desirable to provide a new video codec that solves these problems.

Disclosure of Invention

The embodiments of the invention provide a video coding and decoding method and device, which solve the problems that, because video content is currently layered with a coarse block granularity, the hierarchical representation of objects during video coding and decoding cannot be supported flexibly and the contour description of the video image is not accurate enough.

The embodiment of the invention provides a video coding method, which comprises the following steps:

determining, for any original video image in an original video image sequence to be coded, the pixel attribute of each pixel in the original video image, and generating a pixel attribute image corresponding to the original video image according to the determined pixel attribute of each pixel; wherein, for any pixel, the pixel attribute of the pixel is represented by a pixel attribute identification value that corresponds to the pixel attribute and reflects the video content level of the pixel;

and respectively encoding the original video image and the pixel attribute image to obtain a corresponding original video image encoded bit stream and a corresponding pixel attribute image encoded bit stream, and writing the obtained original video image encoded bit stream and the obtained pixel attribute image encoded bit stream into an output video bit stream.

Correspondingly, an embodiment of the present invention further provides a video decoding method, including:

receiving an input original video image coding bit stream and a pixel attribute image coding bit stream; the original video image coding bit stream and the pixel attribute image coding bit stream are respectively obtained after coding an original video image to be coded and a pixel attribute image corresponding to the original video image; the pixel attribute image is generated according to the determined pixel attribute of each pixel in the original video image; wherein, for any pixel, the pixel attribute of the pixel is represented by a pixel attribute identification value corresponding to the pixel attribute of the pixel and used for reflecting the video content level corresponding to the pixel;

decoding the original video image coded bit stream and/or the pixel attribute image coded bit stream to obtain a corresponding original video image and/or a pixel attribute image;

and generating a corresponding output video image according to the original video image and/or the pixel attribute image obtained by decoding.

Further, an embodiment of the present invention provides a video encoding apparatus, including:

the attribute determining module is configured to determine, for any original video image in an original video image sequence to be coded, the pixel attribute of each pixel in the original video image; wherein, for any pixel, the pixel attribute of the pixel is represented by a pixel attribute identification value that corresponds to the pixel attribute and reflects the video content level of the pixel;

the image generation module is used for generating a pixel attribute image corresponding to the original video image according to the pixel attribute of each pixel in the original video image determined by the attribute determination module;

the image coding module is used for coding the original video image and the pixel attribute image respectively to obtain a corresponding original video image coding bit stream and a corresponding pixel attribute image coding bit stream;

and the encoding output module is used for writing the original video image encoding bit stream and the pixel attribute image encoding bit stream which are obtained by encoding of the image encoding module into an output video bit stream.

Further, an embodiment of the present invention provides a video decoding apparatus, including:

the code stream receiving module is used for receiving an input original video image coding bit stream and a pixel attribute image coding bit stream; the original video image coding bit stream and the pixel attribute image coding bit stream are respectively obtained after coding an original video image to be coded and a pixel attribute image corresponding to the original video image; the pixel attribute image is generated according to the determined pixel attribute of each pixel in the original video image; wherein, for any pixel, the pixel attribute of the pixel is represented by a pixel attribute identification value corresponding to the pixel attribute of the pixel and used for reflecting the video content level corresponding to the pixel;

a code stream decoding module, configured to decode the original video image coded bit stream and/or the pixel attribute image coded bit stream received by the code stream receiving module to obtain a corresponding original video image and/or a corresponding pixel attribute image;

and the image generation module is used for generating a corresponding output video image according to the original video image and/or the pixel attribute image obtained by decoding of the code stream decoding module.

The invention has the following beneficial effects:

in the technical scheme of the embodiment of the invention, because each original video image to be coded can be layered at the pixel level during encoding, the hierarchical representation of objects during video coding can be supported more flexibly, and the accuracy of the contour description of the video image during video coding can be improved to a certain extent. In addition, the pixel attribute identification value reflecting the video content level of each pixel in the original video image can also be written into the output video bit stream during encoding. The pixel attribute identification information of each pixel can therefore be decoded together with the original video image, and objects of different levels can be extracted according to that information, which supports the hierarchical representation of objects more flexibly during video decoding and improves the accuracy of the contour description of the video image during decoding.

Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a video encoding method according to the first embodiment of the present invention;

Fig. 2 is a flowchart of a video decoding method according to the second embodiment of the present invention;

Fig. 3(a) is a first schematic structural diagram of a video encoding apparatus according to the third embodiment of the present invention;

Fig. 3(b) is a second schematic structural diagram of the video encoding apparatus according to the third embodiment of the present invention;

Fig. 4(a) is a first schematic structural diagram of a video decoding apparatus according to the fourth embodiment of the present invention;

Fig. 4(b) is a second schematic structural diagram of the video decoding apparatus according to the fourth embodiment of the present invention.

Detailed Description

In the technical scheme of the embodiment of the invention, each original video image to be coded can be layered at the pixel level during encoding, so the hierarchical representation of objects during video coding can be supported more flexibly and the contour description of the video image during video coding becomes more accurate. In addition, because the pixel attribute identification value reflecting the video content level of each pixel can also be written into the output video bit stream, the pixel attribute identification information of each pixel can be decoded together with the original video image and used to extract objects of different levels, which supports the hierarchical representation of objects more flexibly during video decoding and improves the accuracy of the contour description of the video image during decoding.

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The first embodiment is as follows:

as shown in fig. 1, which is a flowchart illustrating a video encoding method according to a first embodiment of the present invention, the video encoding method may include the following steps:

Step 101: determining, for any original video image in an original video image sequence to be coded, the pixel attribute of each pixel in the original video image, and generating a pixel attribute image corresponding to the original video image according to the determined pixel attribute of each pixel; wherein, for any pixel, the pixel attribute of the pixel is represented by a pixel attribute identification value that corresponds to the pixel attribute and reflects the video content level of the pixel.

Specifically, in the technical solution of the embodiment of the present invention, the pixel attribute may include any one or more of the following kinds of attribute information: scene attribute information indicating whether the pixel is a foreground pixel or a background pixel, scene depth, static/dynamic attribute information indicating whether the pixel is a static pixel or a dynamic pixel, occlusion relationship, texture information, and the like. Supporting several kinds of attribute information makes the determination of pixel attributes richer and more accurate, and in turn gives more choice and flexibility when the pixels are subsequently layered according to their attributes; the embodiment of the present invention does not limit this.

Further, for any pixel, the pixel attribute identification value for reflecting the video content level corresponding to the pixel may be a set luminance value or a set chrominance value. That is to say, in the technical solution according to the embodiment of the present invention, a specific luminance value or a specific chrominance value may be used to represent the video content level corresponding to each pixel. It should be noted that, other set identification values may also be used to represent the video content hierarchy corresponding to each pixel, which is not limited in this embodiment of the present invention.

Further, it should be noted that in the technical solution of the embodiment of the present invention, a pixel may have multiple determined pixel attributes; correspondingly, it may also have multiple pixel attribute identification values (which may be expressed as a pixel attribute identification set containing at least one value). Each identification value uniquely corresponds to one of the pixel's attributes and reflects the video content level of the pixel when the video content is layered according to the attribute information of that attribute. For example, if the determined pixel attributes include scene attribute information, scene depth, and texture information, the pixel attribute identification values may be expressed as {A, B, C}, where A corresponds to the scene attribute information, B to the scene depth, and C to the texture information, which is not described in detail in this embodiment of the present invention.
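As an illustration of the identification set {A, B, C}, the following minimal sketch stores one set per pixel and extracts the layering produced by a single attribute. The class name `PixelAttributeIds`, its field names, and the numeric codes (for example 1 = foreground) are hypothetical; the description does not prescribe a data layout or concrete values.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PixelAttributeIds:
    """Hypothetical per-pixel identification set {A, B, C}."""
    scene: int    # A: scene attribute, assumed 1 = foreground, 0 = background
    depth: int    # B: scene-depth level
    texture: int  # C: texture class


def component_plane(image: List[List[PixelAttributeIds]], name: str) -> List[List[int]]:
    """Return one identification value per pixel, i.e. the layering by one attribute."""
    return [[getattr(p, name) for p in row] for row in image]


img = [
    [PixelAttributeIds(1, 0, 2), PixelAttributeIds(0, 2, 1)],
    [PixelAttributeIds(1, 1, 2), PixelAttributeIds(0, 2, 0)],
]
print(component_plane(img, "scene"))  # [[1, 0], [1, 0]]
print(component_plane(img, "depth"))  # [[0, 2], [1, 2]]
```

Keeping the components separate means each attribute yields its own layering of the same image, which is the flexibility the paragraph above describes.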

Further, it should be noted that, in the technical solution of the embodiment of the present invention, layering video content refers to dividing video content into different levels, such as background or foreground, according to needs. The number of layers into which the video content is divided and the object indicated by each layer can be set according to the actual situation, which is not limited in the embodiment of the present invention.

Further, in the technical solution of the embodiment of the present invention, because the pixel attribute of any pixel is represented by a pixel attribute identification value that corresponds to that attribute and reflects the video content level of the pixel, the pixel attribute identification value of each pixel in the original video image may be determined after the pixel attributes have been determined and before the corresponding pixel attribute image is generated from them.

Specifically, for any original video image, the pixel attribute identification value of each pixel in the original video image may be determined by:

the first method is as follows: dividing the original video image into a plurality of image areas according to the pixel attributes of all pixels in the original video image; for example, taking the determined pixel attribute of each pixel as the scene depth as an example, each pixel with the scene depth not greater than the set first depth threshold value may be divided into the first image area; dividing each pixel of which the scene depth is greater than a set first depth threshold and not greater than a set second depth threshold into a second image area; dividing each pixel with the scene depth larger than a set second depth threshold value into a third image area; wherein the first depth threshold is less than the second depth threshold;

determining the video content level corresponding to each image area; for example, according to a preset rule, determining the first image area as the first video content level, the second image area as the second video content level, and the third image area as the third video content level;

taking the pixel attribute identification value corresponding to the determined video content level as the pixel attribute identification value of each pixel in the image area; for example, a pixel attribute identification value "0" corresponding to the first video content level may be taken as a pixel attribute identification value of each pixel within the first image region; taking a pixel attribute identification value "1" corresponding to the second video content level as a pixel attribute identification value of each pixel within the second image region; and, the pixel attribute identification value "2" corresponding to the third video content level may be used as the pixel attribute identification value of each pixel in the third image region, which is not described in detail in the embodiment of the present invention.
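The depth-based example of the first method can be sketched as follows, assuming the scene depth of every pixel is already available as a 2-D list. The threshold values and the identification values 0, 1 and 2 mirror the example in the text; the function name is hypothetical.

```python
from typing import List, Sequence


def depth_to_level_ids(depth: List[List[float]], t1: float, t2: float,
                       ids: Sequence[int] = (0, 1, 2)) -> List[List[int]]:
    """Map each pixel's scene depth to the identification value of its image area.

    depth <= t1       -> first image area  -> ids[0]
    t1 < depth <= t2  -> second image area -> ids[1]
    depth > t2        -> third image area  -> ids[2]
    """
    if not t1 < t2:
        raise ValueError("the first depth threshold must be less than the second")

    def classify(d: float) -> int:
        if d <= t1:
            return ids[0]
        if d <= t2:
            return ids[1]
        return ids[2]

    return [[classify(d) for d in row] for row in depth]


depth = [[0.5, 1.0, 2.5],
         [3.0, 7.9, 8.0]]
print(depth_to_level_ids(depth, t1=1.0, t2=5.0))  # [[0, 0, 1], [1, 2, 2]]
```

Note that a depth exactly equal to a threshold falls into the nearer area, matching the "not greater than" wording of the first method.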

The second method comprises the following steps: generating a scene image by taking a plurality of original video images in the original video image sequence to be coded as a training set; and determining the pixel attribute identification value of each pixel in the original video image by taking the scene image as a background image.

Specifically, generating a scene image by using a plurality of original video images in the original video image sequence to be encoded as a training set may include:

and generating a scene image from a plurality of original video images in the original video image sequence to be coded, used as a training set, by a sliding average method or a segmented weighted sliding average method. When the segmented weighted sliding average method is used, the historical values of each input pixel are divided in real time into a plurality of data segments while the sliding average is computed, and the model value at the current pixel position is then calculated from the mean and the length of each data segment, thereby training the scene image. A scene image generated in this way is cleaner than one generated by the ordinary sliding average method.
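The text does not specify how the history is split into segments or how segment means and lengths are combined, so the sketch below is only one possible reading. It assumes a new segment starts whenever a value departs from the running mean of the current segment by more than a threshold, and it weights each segment mean by the square of its length so that long, stable runs (typically background) dominate short foreground intrusions. Both rules and all names are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]


def split_segments(history: List[float], split_threshold: float) -> List[Tuple[float, int]]:
    """Split one pixel's value history into runs of similar values.

    Assumed rule: a new segment starts when a value departs from the running
    mean of the current segment by more than split_threshold.
    Returns (mean, length) for every segment.
    """
    if not history:
        raise ValueError("history must not be empty")
    segments: List[Tuple[float, int]] = []
    total, count = float(history[0]), 1
    for x in history[1:]:
        if abs(x - total / count) > split_threshold:
            segments.append((total / count, count))
            total, count = float(x), 1
        else:
            total += x
            count += 1
    segments.append((total / count, count))
    return segments


def segmented_weighted_mean(history: List[float], split_threshold: float = 10.0) -> float:
    """Model value from segment means; squared-length weights are an assumption."""
    segments = split_segments(history, split_threshold)
    weights = [length ** 2 for _, length in segments]
    return sum(m * w for (m, _), w in zip(segments, weights)) / sum(weights)


def scene_image(frames: List[Frame], split_threshold: float = 10.0) -> Frame:
    """Apply the per-pixel model to a training set of equally sized frames."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[segmented_weighted_mean([f[y][x] for f in frames], split_threshold)
             for x in range(w)] for y in range(h)]


history = [100, 101, 99, 100, 200, 201, 100, 100, 101, 100]
print(split_segments(history, 10.0))     # [(100.0, 4), (200.5, 2), (100.25, 4)]
print(segmented_weighted_mean(history))  # about 111.28, versus a plain mean of 120.2
```

In this example, a short bright object passing through the pixel pulls the model value less than it pulls a plain average, which is one way to read the claim that the segmented method yields a cleaner scene image.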

In addition, it should be noted that in the technical solution of the embodiment of the present invention, the scene image may be regenerated every several frames (i.e. every several original video images), with the newly generated scene image replacing the old one, so that the scene image is updated periodically or in real time and the pixel attribute identification values remain accurate.
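A minimal sketch of the periodic update, assuming a plain sliding average over the last `window` frames and a regeneration every `period` frames. Both parameters, and the class name, are hypothetical; the text only says that a new scene image is generated every several frames and replaces the old one.

```python
from collections import deque
from typing import Deque, List, Optional

Frame = List[List[float]]


class PeriodicSceneModel:
    """Keeps the last `window` frames and regenerates the scene image every `period` frames."""

    def __init__(self, period: int, window: int) -> None:
        self.period = period
        self.buffer: Deque[Frame] = deque(maxlen=window)  # sliding window of frames
        self.frames_seen = 0
        self.scene: Optional[Frame] = None  # None until the first regeneration

    def push(self, frame: Frame) -> Optional[Frame]:
        self.buffer.append(frame)
        self.frames_seen += 1
        if self.frames_seen % self.period == 0:
            n = len(self.buffer)
            h, w = len(frame), len(frame[0])
            # The new scene image (sliding average over the window) replaces the old one.
            self.scene = [[sum(f[y][x] for f in self.buffer) / n for x in range(w)]
                          for y in range(h)]
        return self.scene


model = PeriodicSceneModel(period=2, window=2)
for value in (10, 20, 30, 40):
    print(model.push([[value]]))  # None, [[15.0]], [[15.0]], [[35.0]]
```

Between regenerations the previous scene image is reused, which keeps the per-frame cost low while still tracking slow changes in the background.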

Further, in the technical solution of the embodiment of the present invention, determining the pixel attribute identification value of each pixel in the original video image by using the scene image as a background image may include:

subtracting the scene image from the original video image (for example, subtracting the luminance value of each pixel of the scene image from the luminance value of the co-located pixel of the original video image) to obtain an image carrying pixel attribute identification values (when the subtraction is based on luminance, the resulting identification values are generally luminance values as well); then removing noise pixels from this image with a set filling algorithm (such as a 4-connected or 8-connected flood-fill algorithm), and determining the pixel attribute identification value of each pixel in the original video image from the denoised image.
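The subtract-then-denoise step can be sketched as follows. This version makes several simplifying assumptions that go beyond the text: the absolute luminance difference is thresholded into a binary foreground/background decision, 4-connected flood fill groups the foreground pixels into regions, and any region smaller than `min_region` pixels is treated as noise. The parameter names and the identification values 1 and 0 are hypothetical.

```python
from collections import deque
from typing import List

Image = List[List[int]]


def foreground_ids(frame: Image, scene: Image, diff_threshold: int, min_region: int,
                   fg_id: int = 1, bg_id: int = 0) -> Image:
    h, w = len(frame), len(frame[0])
    # 1. Subtract the scene image from the frame and keep large luminance differences.
    mask = [[abs(frame[y][x] - scene[y][x]) > diff_threshold for x in range(w)]
            for y in range(h)]
    ids = [[bg_id] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # 2. 4-connected flood fill collects the region containing (y, x).
            region = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # 3. Small regions are treated as noise and stay background.
            if len(region) >= min_region:
                for ry, rx in region:
                    ids[ry][rx] = fg_id
    return ids


scene = [[50] * 5 for _ in range(4)]
frame = [[50, 50, 50, 50, 200],
         [50, 200, 200, 50, 50],
         [50, 200, 200, 50, 50],
         [50, 50, 50, 50, 50]]
print(foreground_ids(frame, scene, diff_threshold=30, min_region=3))
# [[0, 0, 0, 0, 0], [0, 1, 1, 0, 0], [0, 1, 1, 0, 0], [0, 0, 0, 0, 0]]
```

The isolated bright pixel in the top-right corner is removed as noise, while the 2x2 object survives. Swapping in the eight-neighbour offsets would give the 8-connected variant mentioned in the text.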

Further, after the pixel attribute identification value of each pixel in the original video image is obtained, a corresponding pixel attribute image can be generated according to the pixel attribute identification value of each pixel in the original video image.

Furthermore, it should be noted that if the interval range of the pixel attribute identification values of the pixels in the original video image is too small, coding quantization introduces considerable distortion; if it is too large, the bit rate of the coded bit stream increases greatly. Therefore, in the technical solution of the embodiment of the present invention, after the pixel attribute identification value of each pixel in the original video image is obtained, the identification values may be quantized to limit them to a reasonably set interval range, and the pixel attribute image corresponding to the original video image is then generated from the quantized identification values.

The set interval range may be determined according to the number of layering layers when layering video content, and the larger the number of layering layers is, the larger the set interval range may be, which is not limited in this embodiment of the present invention.
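A minimal sketch of the quantization step, assuming a uniform scalar quantizer that maps identification values in `[v_min, v_max]` to `num_layers` levels, followed by spreading the levels evenly over an 8-bit range so that neighbouring levels stay far apart and survive lossy coding. The uniform quantizer, the 8-bit spreading, and the function names are assumptions, not the method fixed by the text.

```python
from typing import List


def quantize_ids(values: List[List[float]], v_min: float, v_max: float,
                 num_layers: int) -> List[List[int]]:
    """Uniformly quantize identification values into levels 0 .. num_layers - 1."""
    if num_layers < 2 or v_max <= v_min:
        raise ValueError("need at least two layers and a non-empty value range")
    step = (v_max - v_min) / num_layers

    def q(v: float) -> int:
        v = min(max(v, v_min), v_max)                      # clamp to the set interval
        return min(int((v - v_min) / step), num_layers - 1)

    return [[q(v) for v in row] for row in values]


def spread_to_8bit(levels: List[List[int]], num_layers: int) -> List[List[int]]:
    """Place the levels evenly over 0..255 (assumed 8-bit attribute image)."""
    spacing = 255 // (num_layers - 1)
    return [[lv * spacing for lv in row] for row in levels]


values = [[0, 63, 64, 255],
          [128, 191, 192, 300]]
levels = quantize_ids(values, v_min=0, v_max=255, num_layers=4)
print(levels)                     # [[0, 0, 1, 3], [2, 2, 3, 3]]
print(spread_to_8bit(levels, 4))  # [[0, 0, 85, 255], [170, 170, 255, 255]]
```

With more layers, `num_layers` grows and so does the number of distinct values the attribute image must carry, consistent with the statement that the interval range may grow with the number of layers.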

Step 102: respectively encoding the original video image and the pixel attribute image to obtain a corresponding original video image encoded bit stream and a corresponding pixel attribute image encoded bit stream, and writing the obtained original video image encoded bit stream and pixel attribute image encoded bit stream into an output video bit stream.

Specifically, in the technical solution of the embodiment of the present invention, the original video image and the pixel attribute image may be encoded by using applicable encoding methods, respectively. The respective encoding has the advantage that the corresponding encoded bit streams can be selected for decoding according to the corresponding decoding requirements during decoding, so that the flexibility of decoding is improved. For example, if only the corresponding original video image needs to be obtained, only the encoded bit stream of the original video image may be decoded, and if more detailed information of the original video image needs to be obtained, such as pixel attribute information or pixel attribute identification information of each pixel in the original video image, the encoded bit stream of the corresponding pixel attribute image may be further decoded, which is not described in detail in the embodiments of the present invention.

Further, taking the encoding of the original video image as an example, in the technical solution of the embodiment of the present invention, the original video image may be encoded with an existing (general-purpose) video coding method, such as High Efficiency Video Coding (HEVC/H.265), to obtain the corresponding original video image encoded bit stream; alternatively, the original video image may be encoded with the optimized video coding method based on pixel attributes to obtain the corresponding original video image encoded bit stream.

The optimized video coding method based on pixel attributes classifies the macroblocks of a video image during encoding according to the pixel attribute identification information of each pixel in the image, and encodes macroblocks of different classes in different ways (for example, with different motion estimation, reference frame management, mode selection, transform, quantization, and entropy coding strategies). Taking the encoding of the original video image as an example, the macroblocks of the original video image (or the scene image) can be divided into unchanged macroblocks and changed macroblocks. An unchanged macroblock can be inter-predicted with reference to the previous original video image (or scene image), and a changed macroblock can be intra-predicted, which is not described in detail in the embodiment of the present invention.
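A minimal sketch of the macroblock classification described above, assuming a macroblock counts as "changed" if any of its pixels carries one of a given set of identification values (for example the foreground value), and that only the intra/inter decision is being made. The macroblock size, the `changed_ids` parameter, and the returned mode labels are hypothetical; a real encoder would feed this decision into its own mode-selection logic.

```python
from typing import Dict, List, Set, Tuple


def classify_macroblocks(attr_image: List[List[int]], changed_ids: Set[int],
                         mb: int = 16) -> Dict[Tuple[int, int], str]:
    """Return 'intra' for changed macroblocks and 'inter' for unchanged ones.

    Keys are (row, column) macroblock indices; edge blocks may be smaller than mb x mb.
    """
    h, w = len(attr_image), len(attr_image[0])
    modes: Dict[Tuple[int, int], str] = {}
    for by in range(0, h, mb):
        for bx in range(0, w, mb):
            changed = any(attr_image[y][x] in changed_ids
                          for y in range(by, min(by + mb, h))
                          for x in range(bx, min(bx + mb, w)))
            modes[(by // mb, bx // mb)] = "intra" if changed else "inter"
    return modes


attr = [[0, 0, 0, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
print(classify_macroblocks(attr, changed_ids={1}, mb=2))
# {(0, 0): 'inter', (0, 1): 'intra', (1, 0): 'inter', (1, 1): 'inter'}
```

Because the attribute image is per pixel, a single changed pixel is enough to switch its block to intra prediction, which is how pixel-level layering feeds block-level coding decisions in this sketch.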

Similarly, taking the encoding of the pixel attribute image as an example, the pixel attribute image may be encoded with an existing video coding method, such as HEVC/H.265, to obtain the corresponding pixel attribute image encoded bit stream; alternatively, it may be encoded with the optimized video coding method based on pixel attributes, which is not described in detail in the embodiments of the present invention. It should be noted that when the pixel attribute image is encoded with the optimized method, the use of intra prediction can be restricted to increase the proportion of SKIP-mode blocks; moreover, to reduce coding distortion, the in-loop deblocking filter may also be disabled, which is not described in detail in this embodiment of the present invention.

Further, to distinguish the coded bit streams, in the technical solution of the embodiment of the present invention, the video coding methods used for the original video image and the pixel attribute image may differ from each other; of course, the same video coding method may also be used for both, which is not limited in this embodiment of the present invention.

Further, in the technical solution of the embodiment of the present invention, when the original video image and the pixel attribute image are encoded into their respective coded bit streams, identification information of the video coding method used for each may also be written into the corresponding original video image encoded bit stream and pixel attribute image encoded bit stream; alternatively, identification information indicating whether a coded bit stream is the original video image encoded bit stream or the pixel attribute image encoded bit stream may be written into each stream, which is not described in detail in this embodiment of the present invention.
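To make the two kinds of identification information concrete, the sketch below prefixes each coded payload with a small header carrying a stream-type flag (original video image vs. pixel attribute image) and a coding-method ID. The 6-byte layout and the numeric IDs are hypothetical; neither the patent nor HEVC defines this header.

```python
import struct
from typing import Tuple

# Hypothetical identifiers; not defined by the description or by any standard.
STREAM_ORIGINAL, STREAM_ATTRIBUTE = 0, 1
CODEC_HEVC, CODEC_ATTR_OPTIMIZED = 1, 2

# Big-endian: stream type (1 byte), coding-method ID (1 byte), payload length (4 bytes).
HEADER = struct.Struct(">BBI")


def wrap(stream_type: int, codec_id: int, payload: bytes) -> bytes:
    """Prepend the identification header to a coded payload."""
    return HEADER.pack(stream_type, codec_id, len(payload)) + payload


def unwrap(data: bytes) -> Tuple[int, int, bytes]:
    """Read the header back and return (stream_type, codec_id, payload)."""
    stream_type, codec_id, length = HEADER.unpack_from(data, 0)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated packet")
    return stream_type, codec_id, payload


packet = wrap(STREAM_ATTRIBUTE, CODEC_ATTR_OPTIMIZED, b"\x00\x01")
print(unwrap(packet))  # (1, 2, b'\x00\x01')
```

Carrying both fields lets a decoder tell the streams apart even when they happen to use the same coding method.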

Further, in the technical solution of the embodiment of the present invention, after the original video image and the pixel attribute image are respectively encoded to obtain the corresponding original video image encoded bit stream and the pixel attribute image encoded bit stream, the obtained original video image encoded bit stream and the pixel attribute image encoded bit stream may be synchronously written into the output video bit stream to complete the video encoding process for the original video image.

Specifically, when the obtained original video image coded bit stream and the obtained pixel attribute image coded bit stream are synchronously written into the output video bit stream, a corresponding timestamp may be inserted into the coded bit stream to synchronize the original video image coded bit stream and the pixel attribute image coded bit stream, which is not described in detail in the embodiment of the present invention.
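A minimal sketch of timestamp-based synchronous writing, assuming each encoder emits `(timestamp, payload)` pairs already sorted by timestamp. The packets are interleaved into one output sequence ordered by timestamp (original image first on ties), and a receiver can regroup the two payloads that share a timestamp. The packet tuple layout and function names are hypothetical.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

ORIGINAL, ATTRIBUTE = 0, 1
Packet = Tuple[int, int, bytes]  # (timestamp, stream_type, payload)


def mux(original: Iterable[Tuple[int, bytes]],
        attribute: Iterable[Tuple[int, bytes]]) -> List[Packet]:
    """Interleave two timestamp-sorted streams into one output sequence."""
    tagged_orig = ((ts, ORIGINAL, p) for ts, p in original)
    tagged_attr = ((ts, ATTRIBUTE, p) for ts, p in attribute)
    return list(heapq.merge(tagged_orig, tagged_attr, key=lambda pkt: (pkt[0], pkt[1])))


def pair_by_timestamp(packets: Iterable[Packet]) -> Dict[int, Dict[int, bytes]]:
    """Regroup the original-image and attribute-image payloads that share a timestamp."""
    frames: Dict[int, Dict[int, bytes]] = {}
    for ts, stream_type, payload in packets:
        frames.setdefault(ts, {})[stream_type] = payload
    return frames


out = mux([(0, b"o0"), (40, b"o1")], [(0, b"a0"), (40, b"a1")])
print(out)  # [(0, 0, b'o0'), (0, 1, b'a0'), (40, 0, b'o1'), (40, 1, b'a1')]
print(pair_by_timestamp(out)[40])  # {0: b'o1', 1: b'a1'}
```

`heapq.merge` streams its inputs lazily, so the same approach works when the encoders produce packets incrementally rather than as complete lists.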

In the technical solution of the first embodiment of the present invention, because each original video image to be coded can be layered at the pixel level during encoding, the hierarchical representation of objects during video coding can be supported more flexibly than with the prior-art layering based on ROI blocks, and the contour description of the video image during video coding becomes more accurate. In addition, because the pixel attribute identification value reflecting the video content level of each pixel in the original video image can also be written into the output video bit stream during encoding, the pixel attribute identification information of each pixel can be decoded together with the original video image, and objects of different levels can then be extracted according to that information. This supports the hierarchical representation of objects more flexibly during video decoding and improves the accuracy of the contour description of the video image during decoding.

The second embodiment is as follows:

as shown in fig. 2, which is a flowchart illustrating a video decoding method according to a second embodiment of the present invention, the video decoding method may include the following steps:

step 201: an input original video image encoded bit stream and a pixel attribute image encoded bit stream are received.

The original video image coding bit stream and the pixel attribute image coding bit stream are respectively obtained after coding an original video image to be coded and a pixel attribute image corresponding to the original video image; the pixel attribute image is generated according to the determined pixel attribute of each pixel in the original video image; wherein, for any pixel, the pixel attribute of the pixel is represented by a pixel attribute identification value corresponding to the pixel attribute of the pixel and used for reflecting the video content level corresponding to the pixel.

Specifically, in the technical solution of this embodiment, the pixel attribute may include at least one of the following: scene attribute information indicating whether the pixel is a foreground pixel or a background pixel; scene depth; dynamic/static attribute information indicating whether the pixel is a static pixel or a dynamic pixel; occlusion relationship; and texture information.

Step 202: Decode the original video image coded bit stream and/or the pixel attribute image coded bit stream to obtain the corresponding original video image and/or pixel attribute image.

Specifically, the coded bit stream(s) to decode may be selected from the original video image coded bit stream and the pixel attribute image coded bit stream according to a set decoding requirement, and the selected bit stream(s) are then decoded to obtain the corresponding original video image and/or pixel attribute image. For example, if only the original video image is needed, only its coded bit stream need be decoded; if more detailed information is needed, such as the pixel attribute or pixel attribute identification information of each pixel in the original video image, the corresponding pixel attribute image coded bit stream is decoded as well.
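The selection of bit streams by decoding requirement can be illustrated with a minimal Python sketch. The names `StreamType`, `DecodingRequirement`, and `select_streams`, and the two boolean flags standing in for the "set decoding requirement", are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class StreamType(Enum):
    ORIGINAL_VIDEO = "original video image coded bit stream"
    PIXEL_ATTRIBUTE = "pixel attribute image coded bit stream"


@dataclass(frozen=True)
class DecodingRequirement:
    # Hypothetical flags describing what the application needs.
    need_picture: bool = True            # the decoded original video image
    need_pixel_attributes: bool = False  # per-pixel video content levels


def select_streams(req: DecodingRequirement) -> list[StreamType]:
    """Return the coded bit streams that must be decoded for this requirement."""
    selected = []
    if req.need_picture:
        selected.append(StreamType.ORIGINAL_VIDEO)
    if req.need_pixel_attributes:
        # The attribute stream is decoded only when content levels are needed.
        selected.append(StreamType.PIXEL_ATTRIBUTE)
    if not selected:
        raise ValueError("the decoding requirement selects no bit stream")
    return selected
```

With the default requirement only the original video image coded bit stream is selected, so a decoder that never needs content levels never touches the attribute stream.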

Furthermore, because the original video image coded bit stream and the pixel attribute image coded bit stream generally carry identification information of the video coding method used to produce them, the decoder can read that identification information from a selected bit stream, determine which video coding method was used, and decode the bit stream with the video decoding method that matches it. This applies equally to the original video image coded bit stream and to the pixel attribute image coded bit stream.

For example, take the original video image coded bit stream. If its identification information indicates that the original video image was coded directly with an existing video coding method, the bit stream is decoded with the corresponding existing video decoding method (such as an intra-frame prediction decoding method) to obtain the original video image. If the identification information indicates that the optimized pixel-attribute-based video coding method was used, each class of macroblock is decoded with its corresponding decoding method.
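Choosing the decoder from the signalled coding method amounts to a table lookup. The minimal sketch below assumes two hypothetical numeric method identifiers (the text does not define concrete values), and the two decoder functions are placeholders that only report which path was taken.

```python
from __future__ import annotations

from typing import Callable, Dict

# Hypothetical identifiers for the video coding method carried in a bit stream;
# the patent text does not define concrete values.
METHOD_EXISTING = 0             # coded directly with an existing video coding method
METHOD_ATTRIBUTE_OPTIMIZED = 1  # coded with the pixel-attribute-based method


def decode_existing(payload: bytes) -> str:
    # Stand-in for a conventional decoder (e.g. intra prediction decoding).
    return f"existing decoder: {len(payload)} bytes"


def decode_attribute_optimized(payload: bytes) -> str:
    # Stand-in for a decoder that applies a per-class macroblock decoding mode.
    return f"class-aware decoder: {len(payload)} bytes"


DECODERS: Dict[int, Callable[[bytes], str]] = {
    METHOD_EXISTING: decode_existing,
    METHOD_ATTRIBUTE_OPTIMIZED: decode_attribute_optimized,
}


def decode_stream(method_id: int, payload: bytes) -> str:
    """Pick the decoding method that matches the signalled coding method."""
    decoder = DECODERS.get(method_id)
    if decoder is None:
        raise ValueError(f"unknown video coding method id: {method_id}")
    return decoder(payload)
```

A table keyed by method identifier keeps the dispatch identical for both kinds of coded bit stream; an unknown identifier is rejected rather than decoded with a guessed method.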

Step 203: Generate the corresponding output video image from the decoded original video image and/or pixel attribute image.

Specifically, generating a corresponding output video image according to the original video image and/or the pixel attribute image obtained by decoding may include:

if it is determined, according to the set decoding requirement, that the pixel attribute image coded bit stream does not need to be decoded, generating the output video image from the decoded original video image; or,

if it is determined, according to the set decoding requirement, that the pixel attribute image coded bit stream needs to be decoded, generating the output video image from both the decoded original video image and the decoded pixel attribute image. In this case, the decoded original video image and pixel attribute image may be fused to obtain a video image, for a specific application scenario, containing the pixels that belong to a specific video content level.

It should be noted that, in the technical solution of this embodiment, the video content level of each pixel in a decoded video image (for example, whether it is a foreground or background pixel, or whether it lies in a foreground region of interest) can be obtained directly from the pixel's attribute identification information. Objects of different levels can therefore be extracted directly by their pixel attribute identification values during decoding, which removes some preprocessing steps from subsequent video analysis and improves decoding efficiency.
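Extracting the object(s) of one video content level by identification value can be sketched as a per-pixel mask. This minimal Python sketch assumes both images are equally sized 2-D integer arrays and that pixels outside the requested level are replaced by a fill value; `extract_level` and the 255/0 foreground/background convention are illustrative assumptions, not the patent's fusion rule.

```python
from typing import List

Image = List[List[int]]


def extract_level(picture: Image, attribute_image: Image, level_value: int,
                  fill: int = 0) -> Image:
    """Keep the pixels whose attribute identification value equals level_value.

    Every other pixel is replaced by `fill`, so the result contains only the
    object(s) of the requested video content level.
    """
    if len(picture) != len(attribute_image) or any(
        len(p) != len(a) for p, a in zip(picture, attribute_image)
    ):
        raise ValueError("picture and attribute image must have the same size")
    return [
        [pix if attr == level_value else fill for pix, attr in zip(p_row, a_row)]
        for p_row, a_row in zip(picture, attribute_image)
    ]


picture = [[10, 20], [30, 40]]
attributes = [[0, 255], [255, 0]]   # 255 = foreground, 0 = background (assumed)
print(extract_level(picture, attributes, 255))  # [[0, 20], [30, 0]]
```

Because the level is read straight from the decoded attribute image, no foreground segmentation has to be rerun on the decoded pictures.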

The second embodiment of the present invention thus provides a video decoding method that decodes the received original video image coded bit stream and/or pixel attribute image coded bit stream to obtain the corresponding original video image and/or pixel attribute image, and generates an output video image from them. Because the pixel attribute identification information of each pixel is decoded together with the original video image, objects of different levels can be extracted according to that information, so that hierarchical representation of objects is supported more flexibly and the contours of the video image are described more accurately during video decoding.

Example three:

A third embodiment of the present invention provides a video encoding apparatus for implementing the video encoding method of the first embodiment. As shown in Fig. 3(a), which is a schematic structural diagram of the video encoding apparatus according to the third embodiment, the apparatus may include an attribute determining module 31, an image generating module 32, an image encoding module 33, and an encoding output module 34, where:

the attribute determining module 31 may be configured to determine, for any original video image in the original video image sequence to be encoded, the pixel attribute of each pixel in that image, where the pixel attribute of any pixel is represented by a pixel attribute identification value that reflects the video content level to which the pixel belongs.

Specifically, in the technical solution of this embodiment, the pixel attribute may include at least one of the following: scene attribute information indicating whether the pixel is a foreground pixel or a background pixel; scene depth; dynamic/static attribute information indicating whether the pixel is a static pixel or a dynamic pixel; occlusion relationship; and texture information. Moreover, for any pixel, the pixel attribute identification value may be a set luminance value or a set chrominance value; that is, a specific luminance or chrominance value can be used to represent the video content level of each pixel.

The image generating module 32 is configured to generate a pixel attribute image corresponding to the original video image according to the pixel attribute of each pixel in the original video image determined by the attribute determining module 31;

the image encoding module 33 may be configured to encode the original video image and the pixel attribute image respectively to obtain a corresponding original video image encoded bit stream and a corresponding pixel attribute image encoded bit stream;

the encoding output module 34 may be configured to write the original video image encoded bitstream and the pixel attribute image encoded bitstream encoded by the image encoding module 33 into an output video bitstream.

Specifically, as shown in Fig. 3(b) (a further schematic structural diagram of the video encoding apparatus according to the third embodiment), the video encoding apparatus may further include a first attribute identifier determining module 35 or a second attribute identifier determining module 36;

the first attribute identifier determining module 35 may include a region dividing sub-module 351 and a first identifier determining sub-module 352: the region dividing sub-module 351 is configured to divide the original video image into a plurality of image regions according to pixel attributes of pixels in the original video image; the first identifier determining sub-module 352 may be configured to determine, for any image region divided by the region dividing sub-module 351, a video content hierarchy corresponding to the image region, and use a pixel attribute identification value corresponding to the determined video content hierarchy as a pixel attribute identification value of each pixel in the image region;

the second attribute identification determination module 36 may include a scene image training sub-module 361 and a second identification determination sub-module 362, wherein: the scene image training submodule 361 may be configured to generate a scene image by using a plurality of original video images in the original video image sequence to be encoded as a training set; the second identifier determining sub-module 362 may be configured to determine a pixel attribute identifier value of each pixel in the original video image, using the scene image generated by the scene image training sub-module 361 as a background image.
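For the region-based alternative (module 35), once the image has been divided into regions, each pixel simply inherits the identification value of its region's content level. The minimal sketch below assumes the region division is already available as a per-pixel label map; the level names and the luminance values 0/128/255 in `LEVEL_TO_ID_VALUE` are hypothetical.

```python
from typing import Dict, List

# Hypothetical mapping from video content level to a set luminance value.
LEVEL_TO_ID_VALUE: Dict[str, int] = {"background": 0, "foreground": 128, "roi": 255}


def assign_id_values(region_map: List[List[int]],
                     region_levels: Dict[int, str],
                     level_to_value: Dict[str, int] = LEVEL_TO_ID_VALUE) -> List[List[int]]:
    """Give every pixel the identification value of its region's content level.

    region_map holds a region label per pixel (the output of region division);
    region_levels says which video content level each region belongs to.
    """
    return [[level_to_value[region_levels[label]] for label in row] for row in region_map]


region_map = [[0, 0, 1], [0, 2, 1]]
region_levels = {0: "background", 1: "foreground", 2: "roi"}
print(assign_id_values(region_map, region_levels))  # [[0, 0, 128], [0, 255, 128]]
```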

Specifically, the second identifier determining sub-module 362 may determine the pixel attribute identification value of each pixel in the original video image, using the scene image as the background image, as follows:

computing the difference between the scene image and the original video image to obtain an image of pixel attribute identification values; then removing noise pixels from that image with a set filling algorithm (such as a 4-connected or 8-connected fill), and determining the pixel attribute identification value of each pixel in the original video image from the denoised image.
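The background-subtraction and denoising procedure above can be sketched as follows. This is a minimal Python sketch under stated assumptions: the difference threshold, the minimum component size below which a connected region counts as noise, and the 255/0 identification values are all hypothetical parameters, and the "filling algorithm" is interpreted as a breadth-first flood fill with 4- or 8-connectivity.

```python
from collections import deque
from typing import List

Mask = List[List[int]]


def difference_mask(scene: Mask, frame: Mask, diff_threshold: int) -> Mask:
    """1 where the frame differs from the scene (background) image by more than the threshold."""
    return [[1 if abs(s - f) > diff_threshold else 0 for s, f in zip(s_row, f_row)]
            for s_row, f_row in zip(scene, frame)]


def remove_small_components(mask: Mask, min_size: int, connectivity: int = 4) -> Mask:
    """Flood-fill every foreground component and erase those below min_size pixels."""
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")
    h = len(mask)
    w = len(mask[0]) if h else 0
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not out[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill collects one connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy, dx in offsets:
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < h and 0 <= nx < w and out[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) < min_size:  # too small: treat as noise
                for cy, cx in component:
                    out[cy][cx] = 0
    return out


def attribute_id_values(scene: Mask, frame: Mask, diff_threshold: int, min_size: int,
                        connectivity: int = 4, fg_value: int = 255, bg_value: int = 0) -> Mask:
    """Per-pixel identification values: fg_value for foreground, bg_value for background."""
    mask = remove_small_components(difference_mask(scene, frame, diff_threshold),
                                   min_size, connectivity)
    return [[fg_value if m else bg_value for m in row] for row in mask]
```

The choice of connectivity matters for thin or diagonal structures: two pixels touching only at a corner form one component under 8-connectivity but two isolated components under 4-connectivity, so 4-connectivity removes noise more aggressively.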

Further, as shown in fig. 3(b), the image generation module 32 may include a quantization processing sub-module 321 and an image generation sub-module 322, wherein:

the quantization processing sub-module 321 may be configured to quantize the pixel attribute identification value corresponding to the pixel attribute of each pixel in the original video image, so as to limit these values to a set interval range. The set interval range may be determined by the number of layers into which the video content is divided: the more layers, the larger the interval range may be. This embodiment does not limit this;

accordingly, the image generating sub-module 322 is configured to generate a pixel attribute image corresponding to the original video image according to the pixel attribute identification value of each pixel in the original video image after quantization processing by the quantization processing sub-module 321.

It should be noted that quantization keeps the pixel attribute identification value of each pixel within the set interval range. This avoids coding quantization distortion, which arises when the range is too small (adjacent levels then lie so close together that lossy coding can blur them), and it also avoids a large increase in the bit rate of the coded bit stream, which arises when the range is too large.
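A minimal Python sketch of one possible quantization is shown below, assuming 8-bit identification values, evenly spaced layer values with a nominal spacing of 32 (so the occupied interval grows with the number of layers until it hits 255), and uniform binning of a raw attribute such as scene depth. The spacing, the binning rule, and the decoder-side `nearest_level` snap are illustrative assumptions, not values given in the text.

```python
from typing import List


def level_values(num_layers: int, step: int = 32, max_value: int = 255) -> List[int]:
    """Evenly spaced identification values, one per video content layer.

    The interval [0, step * (num_layers - 1)] grows with the number of layers;
    the step shrinks only when the layers would not fit into [0, max_value].
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if num_layers == 1:
        return [0]
    step = min(step, max_value // (num_layers - 1))
    if step < 1:
        raise ValueError("too many layers for the available value range")
    return [k * step for k in range(num_layers)]


def quantize_attribute(raw: float, raw_min: float, raw_max: float,
                       num_layers: int) -> int:
    """Map a raw attribute (e.g. scene depth) to one of the layer values."""
    if raw_max <= raw_min:
        raise ValueError("raw_max must exceed raw_min")
    levels = level_values(num_layers)
    t = (raw - raw_min) / (raw_max - raw_min)
    index = min(num_layers - 1, max(0, int(t * num_layers)))
    return levels[index]


def nearest_level(decoded: int, levels: List[int]) -> int:
    """Snap a value distorted by lossy coding back to the closest layer value."""
    return min(levels, key=lambda v: abs(v - decoded))
```

For example, with four layers the values are 0, 32, 64, and 96; a decoded value of 40 is still unambiguously closest to 32, which is the margin against quantization distortion that a wider spacing buys.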

Further, as shown in fig. 3(b), in the technical solution of the embodiment of the present invention, the image encoding module 33 may include an original video image encoding sub-module 331 and a pixel attribute image encoding sub-module 332, where:

the original video image encoding sub-module 331 may be configured to encode the original video image with an existing video encoding method or with the optimized pixel-attribute-based video encoding method to obtain the corresponding original video image encoded bit stream, where the optimized pixel-attribute-based method classifies the macroblocks of the video image during coding and codes each class of macroblock with a different coding mode;

the pixel attribute image encoding sub-module 332 may be configured to encode the pixel attribute image using an existing video encoding method or an optimized video encoding method based on pixel attributes, so as to obtain a corresponding pixel attribute image encoded bit stream.
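The text states only that the optimized method classifies macroblocks and codes each class differently; it does not say how. The minimal Python sketch below assumes one plausible rule, classifying each block of the pixel attribute image as background, foreground, or boundary (mixed levels), and attaches a hypothetical quantization parameter per class. The 16×16 block size, the class names, and the `CLASS_QP` values are all assumptions.

```python
from typing import Dict, List

# Hypothetical per-class quantization parameters; the patent only states that
# different macroblock classes are coded with different coding modes.
CLASS_QP: Dict[str, int] = {"background": 36, "foreground": 28, "boundary": 26}


def classify_macroblocks(attr: List[List[int]], bg_value: int = 0,
                         mb: int = 16) -> List[List[str]]:
    """Classify each mb x mb block of the pixel attribute image.

    background: every pixel has the background identification value
    foreground: every pixel has one and the same non-background value
    boundary:   the block mixes several identification values (object contour)
    """
    h, w = len(attr), len(attr[0])
    classes = []
    for by in range(0, h, mb):
        row = []
        for bx in range(0, w, mb):
            values = {attr[y][x]
                      for y in range(by, min(by + mb, h))
                      for x in range(bx, min(bx + mb, w))}
            if values == {bg_value}:
                row.append("background")
            elif len(values) == 1:
                row.append("foreground")
            else:
                row.append("boundary")
        classes.append(row)
    return classes


def coding_parameters(classes: List[List[str]]) -> List[List[int]]:
    """Look up the (hypothetical) QP for every classified macroblock."""
    return [[CLASS_QP[c] for c in row] for row in classes]
```

Under these assumptions, spending more bits on boundary blocks is one way to preserve object contours while coarsening uniform background.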

It should be noted that, because the original video image and the pixel attribute image can each be encoded with an applicable coding method, the decoder can choose which coded bit stream(s) to decode according to its decoding requirement, which makes decoding more flexible. For example, if only the original video image is needed, only its coded bit stream need be decoded; if more detailed information is needed, such as the pixel attribute or pixel attribute identification information of each pixel in the original video image, the pixel attribute image coded bit stream is decoded as well.

It should also be noted that, to distinguish the two coded bit streams, the video coding method used by the original video image encoding sub-module 331 to encode the original video image may differ from the one used by the pixel attribute image encoding sub-module 332 to encode the pixel attribute image. Of course, the two sub-modules may also use the same video coding method; this embodiment does not limit this.

Further, as shown in fig. 3(b), in the technical solution of the embodiment of the present invention, the image encoding module 33 may further include a first identification information adding sub-module 333 or a second identification information adding sub-module 334, where:

the first identification information adding sub-module 333 may be configured to respectively encode identification information of a video encoding method used when encoding the original video image and the pixel attribute image into a corresponding original video image encoded bit stream and a corresponding pixel attribute image encoded bit stream;

the second identification information adding sub-module 334 is configured to encode identification information indicating whether the encoded bit stream is the original video image encoded bit stream or the pixel attribute image encoded bit stream into the corresponding original video image encoded bit stream and pixel attribute image encoded bit stream, respectively.
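Writing identification information into a coded bit stream can be sketched with a small fixed header. This minimal Python sketch assumes a hypothetical 2-byte layout (stream type, then coding method identifier) that carries both kinds of identification information, which the text presents as alternatives; the layout and constants are illustrative, not a format defined by the patent.

```python
import struct
from typing import Tuple

# Hypothetical 2-byte header: stream type, then video coding method identifier.
STREAM_ORIGINAL_VIDEO = 0
STREAM_PIXEL_ATTRIBUTE = 1
_HEADER = struct.Struct(">BB")


def add_identification(payload: bytes, stream_type: int, method_id: int) -> bytes:
    """Prefix a coded bit stream with its stream type and coding method id."""
    if stream_type not in (STREAM_ORIGINAL_VIDEO, STREAM_PIXEL_ATTRIBUTE):
        raise ValueError(f"unknown stream type: {stream_type}")
    return _HEADER.pack(stream_type, method_id) + payload


def read_identification(stream: bytes) -> Tuple[int, int, bytes]:
    """Split a stream into (stream type, coding method id, payload)."""
    if len(stream) < _HEADER.size:
        raise ValueError("stream is shorter than the identification header")
    stream_type, method_id = _HEADER.unpack_from(stream, 0)
    return stream_type, method_id, stream[_HEADER.size:]
```

`struct.pack` with an unsigned-byte field rejects method identifiers outside 0 to 255, so the header cannot silently truncate an identifier.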

Further, after the original video image and the pixel attribute image have been encoded into their respective coded bit streams, the encoding output module 34 may write the original video image encoded bit stream and the pixel attribute image encoded bit stream synchronously into the output video bit stream.

Specifically, to write the two coded bit streams synchronously, the encoding output module 34 may insert corresponding timestamps into them so that the original video image coded bit stream and the pixel attribute image coded bit stream stay synchronized.
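Timestamp-based synchronization can be sketched as interleaving two timestamp-ordered packet sequences and regrouping them on the decoder side. The `Packet` tuple, the "video"/"attribute" labels, and the use of a shared integer timestamp per picture are hypothetical choices for this minimal Python sketch; `heapq.merge` with a `key` is a standard-library call.

```python
import heapq
from typing import Dict, Iterable, List, NamedTuple


class Packet(NamedTuple):
    timestamp: int   # shared by a picture and its pixel attribute image
    stream: str      # "video" or "attribute" (hypothetical labels)
    payload: bytes


def mux(video: Iterable[Packet], attribute: Iterable[Packet]) -> List[Packet]:
    """Interleave two timestamp-ordered packet sequences into one output stream.

    heapq.merge is stable, so at equal timestamps the video packet is written
    before the attribute packet.
    """
    return list(heapq.merge(video, attribute, key=lambda p: p.timestamp))


def pair_by_timestamp(packets: Iterable[Packet]) -> Dict[int, Dict[str, bytes]]:
    """Regroup a muxed stream so each picture is matched with its attribute image."""
    pairs: Dict[int, Dict[str, bytes]] = {}
    for p in packets:
        pairs.setdefault(p.timestamp, {})[p.stream] = p.payload
    return pairs
```

Both inputs must already be sorted by timestamp; the merge then runs in a single pass without buffering either stream in full.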

In the technical solution of the third embodiment of the present invention, the video encoding apparatus can layer each original video image to be coded at the pixel level during coding. Compared with the prior-art approach of layering a video image in units of ROI blocks, this supports hierarchical representation of objects more flexibly and describes the contours of the video image more accurately during video coding. Furthermore, the apparatus also encodes into the output video bit stream the pixel attribute identification value reflecting the video content level of each pixel in the original video image, so that this information can be decoded together with the original video image and objects of different levels can be extracted according to it. This achieves the same flexibility of hierarchical representation and accuracy of contour description during video decoding.

Example four:

A fourth embodiment of the present invention provides a video decoding apparatus for implementing the video decoding method of the second embodiment. As shown in Fig. 4(a), which is a schematic structural diagram of the video decoding apparatus according to the fourth embodiment, the apparatus may include a code stream receiving module 41, a code stream decoding module 42, and an image generating module 43, where:

the code stream receiving module 41 may be configured to receive an input original video image coded bit stream and a pixel attribute image coded bit stream, which are obtained by coding, respectively, an original video image and the pixel attribute image corresponding to it; the pixel attribute image is generated from the pixel attribute determined for each pixel of the original video image, and the pixel attribute of any pixel is represented by a pixel attribute identification value that reflects the video content level to which the pixel belongs;

the code stream decoding module 42 may be configured to decode the original video image coded bit stream and/or the pixel attribute image coded bit stream received by the code stream receiving module 41 to obtain a corresponding original video image and/or a corresponding pixel attribute image;

the image generating module 43 may be configured to generate a corresponding output video image according to the original video image and/or the pixel attribute image obtained by decoding by the code stream decoding module 42.

Specifically, as shown in fig. 4(b) (fig. 4(b) is a schematic structural diagram of the video decoding apparatus in the fourth embodiment of the present invention), the code stream decoding module 42 may include a code stream selecting sub-module 421 and a code stream decoding sub-module 422, where:

the code stream selecting submodule 421 is configured to select a corresponding encoded bit stream from the original video image encoded bit stream and the pixel attribute image encoded bit stream according to a set decoding requirement;

the code stream decoding sub-module 422 may be configured to decode the encoded bit stream selected by the code stream selecting sub-module 421 to obtain a corresponding original video image and/or a pixel attribute image.

For example, if only the original video image is needed, only its coded bit stream need be selected and decoded; if more detailed information is needed, such as the pixel attribute or pixel attribute identification information of each pixel in the original video image, the pixel attribute image coded bit stream is also selected and decoded. This makes decoding more flexible.

Further, as shown in fig. 4(b), the code stream decoding submodule 422 may include an original video image encoded bit stream decoding submodule 4221 and/or a pixel attribute image encoded bit stream decoding submodule 4222:

the original video image coded bit stream decoding sub-module 4221 may be configured to, when decoding a selected original video image coded bit stream, determine the video coding method used for it from the identification information of the video coding method carried in that bit stream, and decode it with the video decoding method that matches the determined coding method;

the pixel attribute image coded bit stream decoding sub-module 4222 may be configured to, when decoding a selected pixel attribute image coded bit stream, determine the video coding method used for it from the identification information of the video coding method carried in that bit stream, and decode it with the video decoding method that matches the determined coding method.

For example, take the original video image coded bit stream. If its identification information indicates that the original video image was coded directly with an existing video coding method, the original video image coded bit stream decoding sub-module 4221 may decode it with the corresponding existing video decoding method (such as an intra prediction decoding method) to obtain the original video image. If the identification information indicates that the optimized pixel-attribute-based video coding method was used, the sub-module 4221 may decode each class of macroblock with its corresponding decoding method.

Further, as shown in fig. 4(b), the image generation module 43 may include a first image generation submodule 431 or a second image generation submodule 432, wherein:

the first image generation sub-module 431 may be configured to generate the output video image from the decoded original video image if it is determined, according to a set decoding requirement, that the pixel attribute image coded bit stream does not need to be decoded;

the second image generation sub-module 432 may be configured to generate the output video image from the decoded original video image and pixel attribute image if it is determined, according to a set decoding requirement, that the pixel attribute image coded bit stream needs to be decoded. Specifically, the second image generation sub-module 432 may fuse the decoded original video image and pixel attribute image to obtain a video image, for a specific application scenario, containing the pixels that belong to a specific video content level.

The fourth embodiment of the present invention provides a video decoding apparatus for implementing the video decoding method of the second embodiment. During decoding, the apparatus decodes the received original video image coded bit stream and/or pixel attribute image coded bit stream to obtain the corresponding original video image and/or pixel attribute image, and generates the output video image from them. Because the pixel attribute identification information of each pixel is decoded together with the original video image, objects of different levels can be extracted according to that information, so that hierarchical representation of objects is supported more flexibly and the contours of the video image are described more accurately during video decoding.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus (device), or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A video encoding method, comprising:

2. The video coding method of claim 1, wherein the pixel attributes include at least any one or more of the following attribute information:

scene attribute information used to indicate whether the pixel is a foreground pixel or a background pixel; scene depth; dynamic/static attribute information used to indicate whether the pixel is a static pixel or a dynamic pixel; occlusion relationship; or texture information.

3. The video coding method of claim 1 or 2, wherein the pixel attribute identification value is a set luminance value or a set chrominance value.

4. The video coding method of claim 3, wherein for any original video image, the pixel attribute identification value of each pixel in the original video image is determined by:

dividing the original video image into a plurality of image areas according to the pixel attributes of all pixels in the original video image; for any image area, determining a video content hierarchy corresponding to the image area, and taking a pixel attribute identification value corresponding to the determined video content hierarchy as a pixel attribute identification value of each pixel in the image area; or,

generating a scene image by taking a plurality of original video images in the original video image sequence to be coded as a training set; and determining the pixel attribute identification value of each pixel in the original video image by taking the scene image as a background image.

5. The video coding method of claim 4, wherein determining the pixel attribute identification value of each pixel in the original video image using the scene image as a background image comprises:

computing the difference between the scene image and the original video image to obtain an image of pixel attribute identification values; then removing noise pixels from that image using a predetermined filling algorithm, and determining the pixel attribute identification value of each pixel in the original video image from the denoised image.
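The scene-image route of claims 4 and 5 can be sketched as background subtraction. The claims name neither the training method nor the filling algorithm, so this sketch swaps in two common stand-ins: a per-pixel temporal median over the training frames as the scene image, and a 3x3 majority vote as the denoising and hole-filling step. The threshold of 30 and all function names are illustrative assumptions.

```python
from statistics import median


def train_scene_image(frames):
    """Scene image as the per-pixel temporal median of the training frames (assumed model)."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)] for y in range(h)]


def attribute_values(scene, frame, threshold=30, fg_value=255, bg_value=0):
    """Subtract the scene image from the frame and threshold the absolute difference."""
    return [
        [fg_value if abs(p - s) > threshold else bg_value for p, s in zip(frow, srow)]
        for frow, srow in zip(frame, scene)
    ]


def majority_denoise(mask, fg_value=255, bg_value=0):
    """3x3 majority vote: drops isolated noise pixels and fills small holes.

    Stand-in for the unspecified 'predetermined filling algorithm'.
    """
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for y in range(h):
        for x in range(w):
            votes = [mask[j][i] == fg_value
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = fg_value if sum(votes) * 2 > len(votes) else bg_value
    return out
```

The median tolerates objects that pass through only a minority of the training frames, which is why it is a common choice for a static scene model; a majority filter also rounds off sharp corners of objects, so a real system might prefer morphological hole filling.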

6. The video encoding method of claim 1 or 2, wherein generating a pixel attribute image corresponding to the original video image based on the determined pixel attribute of each pixel comprises:

quantizing the pixel attribute identification values corresponding to the pixel attributes of the pixels in the original video image so that they fall within a predetermined interval, and generating the pixel attribute image corresponding to the original video image from the quantized pixel attribute identification values.
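Claim 6 leaves the quantizer and the interval open. A minimal sketch, assuming a linear mapping with clipping and using the 8-bit studio-swing luma range 16..235 purely as an example of a "predetermined interval":

```python
def quantize_identification_values(values, in_max, out_min=16, out_max=235):
    """Linearly map identification values in [0, in_max] into [out_min, out_max].

    Values outside [0, in_max] are clipped first. The default 16..235 range is
    an illustrative choice, not one taken from the claims.
    """
    if in_max <= 0:
        raise ValueError("in_max must be positive")
    scale = (out_max - out_min) / in_max
    return [
        [round(out_min + min(max(v, 0), in_max) * scale) for v in row]
        for row in values
    ]
```

Keeping the values inside a range that an ordinary encoder represents without clipping lets the pixel attribute image be passed to an existing video encoder like any other luma plane.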

7. The video encoding method of claim 1 or 2, wherein encoding the original video image and the pixel attribute image to obtain a corresponding original video image encoded bitstream and a pixel attribute image encoded bitstream, respectively, comprises:

encoding the original video image using an existing video encoding method or a pixel-attribute-based optimized video encoding method to obtain the corresponding original video image encoded bitstream; and encoding the pixel attribute image using an existing video encoding method or a pixel-attribute-based optimized video encoding method to obtain the corresponding pixel attribute image encoded bitstream;

wherein the pixel-attribute-based optimized video encoding method classifies the macroblocks of a video image during encoding and encodes macroblocks of different classes using different coding modes.
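One way to picture the pixel-attribute-based optimized method of claim 7 is a sketch that classifies 16x16 macroblocks from the pixel attribute image and gives each class its own quantization parameter. The three classes, the QP offsets, and the use of QP as the "coding mode" are assumptions for illustration; only the 16x16 macroblock size and the 0..51 QP range follow H.264/AVC.

```python
MB = 16  # macroblock size in pixels, as in H.264/AVC

# Hypothetical per-class QP offsets: more bits for foreground, fewer for background.
QP_OFFSET = {"background": +4, "boundary": 0, "foreground": -4}


def classify_macroblocks(attr_image, fg_value=255, mb=MB):
    """Return a 2-D list of class labels, one per macroblock."""
    h, w = len(attr_image), len(attr_image[0])
    classes = []
    for by in range(0, h, mb):
        row = []
        for bx in range(0, w, mb):
            block = [attr_image[y][x]
                     for y in range(by, min(by + mb, h))
                     for x in range(bx, min(bx + mb, w))]
            fg = sum(1 for v in block if v == fg_value)
            if fg == 0:
                row.append("background")
            elif fg == len(block):
                row.append("foreground")
            else:
                row.append("boundary")  # block contains both foreground and background
        classes.append(row)
    return classes


def macroblock_qps(classes, base_qp=30):
    """Map each macroblock class to a QP, clipped to the H.264 range 0..51."""
    return [[min(51, max(0, base_qp + QP_OFFSET[c])) for c in row] for row in classes]
```

A real encoder could equally vary prediction modes or skip decisions per class; QP is used here only because it is the simplest knob to show.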

8. The video encoding method of claim 7, wherein the original video image and the pixel attribute image are encoded by different video encoding methods.

9. The video encoding method of claim 7, wherein the original video image and the pixel attribute image are encoded separately to obtain a corresponding original video image encoded bit stream and a pixel attribute image encoded bit stream, further comprising:

encoding identification information of the video encoding methods used to encode the original video image and the pixel attribute image into the corresponding original video image encoded bitstream and pixel attribute image encoded bitstream, respectively; or,

encoding identification information indicating whether an encoded bitstream is the original video image encoded bitstream or the pixel attribute image encoded bitstream into the corresponding original video image encoded bitstream and pixel attribute image encoded bitstream, respectively.

10. The video encoding method of claim 1 or 2, wherein writing the resulting original video image encoded bitstream and pixel attribute image encoded bitstream into an output video bitstream comprises:

synchronously writing the obtained original video image encoded bitstream and pixel attribute image encoded bitstream into the output video bitstream.
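Claims 9 and 10 can be illustrated together with a sketch that interleaves the two encoded streams frame by frame and prefixes each payload with a small header carrying both kinds of identification information. The header layout (stream type, codec ID, frame index, payload length, big-endian) is a hypothetical container format and not one defined by the patent.

```python
import struct

# Hypothetical packet header: stream type (1 byte), codec ID (1 byte),
# frame index (4 bytes), payload length (4 bytes), big-endian, no padding.
HEADER = struct.Struct(">BBII")
STREAM_ORIGINAL, STREAM_ATTRIBUTE = 0, 1


def mux(original_frames, attribute_frames, original_codec, attribute_codec):
    """Synchronously write both encoded streams into one output bitstream.

    original_frames / attribute_frames -- lists of encoded payloads (bytes), one per frame
    original_codec / attribute_codec   -- small integers identifying the encoding method
    """
    if len(original_frames) != len(attribute_frames):
        raise ValueError("both streams must cover the same frames")
    out = bytearray()
    for idx, (orig, attr) in enumerate(zip(original_frames, attribute_frames)):
        # Each frame's original payload is immediately followed by its attribute payload.
        out += HEADER.pack(STREAM_ORIGINAL, original_codec, idx, len(orig)) + orig
        out += HEADER.pack(STREAM_ATTRIBUTE, attribute_codec, idx, len(attr)) + attr
    return bytes(out)
```

Writing the two payloads of the same frame back to back is one simple reading of "synchronously"; carrying the frame index in every header also lets a decoder re-pair them if packets are reordered.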

11. A video decoding method, comprising:

12. The video decoding method of claim 11, wherein decoding the original video image encoded bitstream and/or the pixel attribute image encoded bitstream to obtain the corresponding original video image and/or pixel attribute image comprises:

selecting the corresponding encoded bitstream(s) from the original video image encoded bitstream and the pixel attribute image encoded bitstream according to a predetermined decoding requirement, and decoding the selected encoded bitstream(s) to obtain the corresponding original video image and/or pixel attribute image.

13. The video decoding method of claim 12, wherein decoding the selected encoded bitstream to obtain corresponding original video images and/or pixel attribute images comprises:

when decoding the selected original video image encoded bitstream, determining the video encoding method used for that bitstream from the video encoding method identification information it carries, and decoding it with a video decoding method that matches the determined video encoding method;

or, when decoding the selected pixel attribute image encoded bitstream, determining the video encoding method used for that bitstream from the video encoding method identification information it carries, and decoding it with a video decoding method that matches the determined video encoding method.
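On the decoder side, claims 12 and 13 can be sketched as parsing the output bitstream, keeping only the streams the decoding requirement asks for, and dispatching each payload to a decoder chosen by the codec ID it carries. The sketch reuses the same hypothetical header layout as an encoder-side multiplexer would; the `decoders` registry is likewise an assumption.

```python
import struct

# Same hypothetical layout: stream type, codec ID, frame index, payload length.
HEADER = struct.Struct(">BBII")
STREAM_ORIGINAL, STREAM_ATTRIBUTE = 0, 1


def parse_packets(data):
    """Yield (stream_type, codec_id, frame_index, payload) from the output bitstream."""
    pos = 0
    while pos < len(data):
        stream, codec, idx, length = HEADER.unpack_from(data, pos)
        pos += HEADER.size
        yield stream, codec, idx, data[pos:pos + length]
        pos += length


def decode_selected(data, decoders, wanted_streams):
    """Decode only the selected streams, picking each decoder by the carried codec ID.

    decoders       -- dict mapping codec ID -> decode function (hypothetical registry)
    wanted_streams -- set of stream types required by the decoding requirement
    Returns {frame_index: {stream_type: decoded_output}}.
    """
    result = {}
    for stream, codec, idx, payload in parse_packets(data):
        if stream not in wanted_streams:
            continue  # skip streams the decoding requirement does not ask for
        decoder = decoders.get(codec)
        if decoder is None:
            raise ValueError(f"no decoder registered for codec ID {codec}")
        result.setdefault(idx, {})[stream] = decoder(payload)
    return result
```

A client that only needs the picture passes `{STREAM_ORIGINAL}` and never invokes the attribute decoder; an analytics client passes both stream types.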

14. The video decoding method according to any of claims 11 to 13, wherein generating the corresponding output video image according to the original video image and/or the pixel attribute image obtained by decoding comprises:

if it is determined, according to the predetermined decoding requirement, that the pixel attribute image encoded bitstream is to be decoded, generating the corresponding output video image from the decoded original video image and pixel attribute image.
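The abstract describes using the decoded pixel attribute information to extract objects of different hierarchy levels. A minimal sketch of one possible output-image step, assuming the output keeps only the original pixels whose attribute value matches a chosen level; the function name, the fill value, and the tolerance parameter are illustrative assumptions.

```python
def extract_layer(original, attribute, layer_value, tolerance=0, fill=0):
    """Keep original pixels whose attribute value is within tolerance of layer_value.

    original    -- decoded original video image (2-D list of samples)
    attribute   -- decoded pixel attribute image of the same size
    layer_value -- identification value of the hierarchy level to extract
    """
    return [
        [p if abs(a - layer_value) <= tolerance else fill for p, a in zip(prow, arow)]
        for prow, arow in zip(original, attribute)
    ]
```

The tolerance exists because a lossily encoded pixel attribute image may not reproduce the identification values exactly; with lossless coding of the attribute image it can stay at 0.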

15. A video encoding apparatus, comprising:

16. The video encoding apparatus of claim 15, wherein the pixel attributes include at least one of the following:

17. The video encoding apparatus according to claim 15 or 16, wherein the pixel attribute identification value is a predetermined luminance value or a predetermined chrominance value.

18. The video encoding apparatus of claim 17, wherein the video encoding apparatus further comprises a first attribute identification determination module or a second attribute identification determination module;

the first attribute identification determination module comprises a region division submodule and a first identification determination submodule, wherein:

the region division submodule is used for dividing the original video image into a plurality of image regions according to the pixel attribute of each pixel in the original video image;

the first identification determination submodule is configured to determine, for each image region obtained by the region division submodule, the video content hierarchy level corresponding to the image region, and to use the pixel attribute identification value corresponding to that level as the pixel attribute identification value of every pixel in the image region;

the second attribute identification determination module comprises a scene image training submodule and a second identification determination submodule, wherein:

the scene image training submodule is configured to generate a scene image using a plurality of original video images in the original video image sequence to be encoded as a training set;

the second identification determination submodule is configured to determine the pixel attribute identification value of each pixel in the original video image using the scene image generated by the scene image training submodule as the background image.

19. The video encoding apparatus of claim 18,

the second identification determination submodule is specifically configured to compute the difference between the scene image and the original video image to obtain an image of pixel attribute identification values, to remove noise pixels from that image using a predetermined filling algorithm, and to determine the pixel attribute identification value of each pixel in the original video image from the denoised image.

20. The video encoding apparatus of claim 15 or 16, wherein the image generation module comprises a quantization processing sub-module and an image generation sub-module:

the quantization processing submodule is configured to quantize the pixel attribute identification values corresponding to the pixel attributes of the pixels in the original video image so that they fall within a predetermined interval;

the image generation submodule is configured to generate the pixel attribute image corresponding to the original video image from the pixel attribute identification values quantized by the quantization processing submodule.

21. The video encoding apparatus of claim 15 or 16, wherein the image encoding module comprises an original video image encoding sub-module and a pixel attribute image encoding sub-module:

the original video image encoding sub-module is configured to encode the original video image using an existing video encoding method or a pixel-attribute-based optimized video encoding method to obtain the corresponding original video image encoded bitstream;

the pixel attribute image encoding sub-module is configured to encode the pixel attribute image using an existing video encoding method or a pixel-attribute-based optimized video encoding method to obtain the corresponding pixel attribute image encoded bitstream.

22. The video encoding apparatus of claim 21, wherein the video encoding method used by the original video image encoding sub-module to encode the original video image differs from the video encoding method used by the pixel attribute image encoding sub-module to encode the pixel attribute image.

23. The video encoding apparatus of claim 21, wherein the image encoding module further comprises either a first identification information adding sub-module or a second identification information adding sub-module:

the first identification information adding submodule is configured to encode identification information of the video encoding methods used to encode the original video image and the pixel attribute image into the corresponding original video image encoded bitstream and pixel attribute image encoded bitstream, respectively;

the second identification information adding submodule is configured to encode identification information indicating whether an encoded bitstream is the original video image encoded bitstream or the pixel attribute image encoded bitstream into the corresponding original video image encoded bitstream and pixel attribute image encoded bitstream, respectively.

24. The video encoding apparatus of claim 15 or 16,

the encoding output module is specifically configured to write the obtained original video image encoded bit stream and the obtained pixel attribute image encoded bit stream into an output video bit stream synchronously.

25. A video decoding apparatus, comprising:

26. The video decoding apparatus of claim 25, wherein the bitstream decoding module comprises a bitstream selection sub-module and a bitstream decoding sub-module:

the bitstream selection sub-module is configured to select the corresponding encoded bitstream(s) from the original video image encoded bitstream and the pixel attribute image encoded bitstream according to a predetermined decoding requirement;

the bitstream decoding sub-module is configured to decode the encoded bitstream(s) selected by the bitstream selection sub-module to obtain the corresponding original video image and/or pixel attribute image.

27. The video decoding apparatus of claim 26, wherein the bitstream decoding sub-module comprises an original video image encoded bitstream decoding sub-module and/or a pixel attribute image encoded bitstream decoding sub-module:

the original video image encoded bitstream decoding sub-module is configured, when the selected original video image encoded bitstream is decoded, to determine the video encoding method used for that bitstream from the video encoding method identification information it carries, and to decode it with a video decoding method that matches the determined video encoding method;

the pixel attribute image encoded bitstream decoding sub-module is configured, when the selected pixel attribute image encoded bitstream is decoded, to determine the video encoding method used for that bitstream from the video encoding method identification information it carries, and to decode it with a video decoding method that matches the determined video encoding method.

28. The video decoding apparatus of any of claims 25 to 27, wherein the image generation module comprises either a first image generation sub-module or a second image generation sub-module:

the first image generation submodule is configured to generate the corresponding output video image from the decoded original video image if it is determined, according to the predetermined decoding requirement, that the pixel attribute image encoded bitstream does not need to be decoded;

the second image generation submodule is configured to generate the corresponding output video image from the decoded original video image and pixel attribute image if it is determined, according to the predetermined decoding requirement, that the pixel attribute image encoded bitstream needs to be decoded.