WO2012060172A1

WO2012060172A1 - Movie image encoding device, movie image decoding device, movie image transmitting system, method of controlling movie image encoding device, method of controlling movie image decoding device, movie image encoding device controlling program, movie image decoding device controlling program, and recording medium

Info

Publication number: WO2012060172A1
Application number: PCT/JP2011/072291
Authority: WO
Inventors: 純生佐藤
Original assignee: シャープ株式会社
Priority date: 2010-11-04
Filing date: 2011-09-28
Publication date: 2012-05-10

Abstract

A movie image encoding device comprises: a distance image segmentation processing unit (22) that segments each frame image of a movie image into a plurality of regions; a number assignment unit (24) that determines a representative value for each region segmented by the distance image segmentation processing unit (22); and a distance value encoding unit (25) that selects whether to encode a numerical sequence, wherein the representative values determined by the number assignment unit (24) are arranged in a predetermined order, by either adaptive encoding or static encoding, and that generates encoding data for the frame image using the selected encoding method.

Description

Moving picture encoding apparatus, moving picture decoding apparatus, moving picture transmission system, moving picture encoding apparatus control method, moving picture decoding apparatus control method, moving picture encoding apparatus control program, moving picture decoding apparatus control program, and recording medium

The present invention relates to a moving image encoding apparatus that encodes a distance video, a control method of the moving image encoding apparatus, a moving image encoding apparatus control program, a moving image decoding apparatus that decodes encoded data encoded by these, a moving image decoding apparatus control method, a moving image decoding apparatus control program, a moving image transmission system including these apparatuses, and a recording medium.

In recent years, displays that represent the three-dimensional shape of a subject have become widespread. There are roughly two methods of presenting a three-dimensional shape. One is the glasses method, in which the user perceives the displayed subject in three dimensions by wearing dedicated glasses when viewing the display. The other is the naked-eye method, in which the user perceives the displayed subject in three dimensions without wearing glasses.

In the case of the glasses method, the subject is perceived in three dimensions by viewing the image for the left eye with the left eye and the image for the right eye with the right eye, so two images corresponding to the left and right eyes are necessary.

On the other hand, in the case of the naked eye method, images of 8 to 9 viewpoints are required. Therefore, when transmitting images of 8 to 9 viewpoints, the amount of transmission data becomes enormous. Therefore, various techniques for transmitting images from a plurality of viewpoints have been proposed.

For example, instead of transmitting each of the images from a plurality of viewpoints, there is a method of transmitting two types of images: a normal two-dimensional image (texture image) and a distance image, which is an image representing the distance from the camera to the subject. By transmitting the texture image and the distance image, a multi-viewpoint image (described later) can be created at the transmission destination, so the amount of transmission data can be reduced compared with transmitting each multi-viewpoint image.

Here, the distance image is an image expressing, for each pixel, the distance from the camera to the subject for all the subjects in the image. The distance from the camera to the subject can be obtained by a distance measuring device installed in the vicinity of the camera, or by analyzing texture images taken by cameras from two or more viewpoints.

For the distance image, the Moving Picture Experts Group (MPEG), which is a working group of the International Organization for Standardization / International Electrotechnical Commission (ISO / IEC), has defined MPEG-C Part 3, a standard that expresses the depth value (the distance from the camera to the subject) in 256 levels, that is, as 8-bit luminance values.

According to this standard, the distance image is an image expressed in 8-bit gray scale. In the distance image, a higher brightness is assigned as the distance is shorter, so that the subject closer to the camera becomes whiter and the subject farther away becomes blacker.
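As a rough illustration of such an 8-bit distance image, the following sketch maps a distance to a luminance value so that nearer subjects are brighter. The linear mapping and the names `depth_to_luma`, `near`, and `far` are assumptions made for illustration; the quantization actually defined in MPEG-C Part 3 is not reproduced here.

```python
def depth_to_luma(distance, near, far):
    """Map a camera-to-subject distance to an 8-bit luminance value.

    Nearer subjects get higher (whiter) values, farther ones lower (blacker).
    The linear mapping is an illustrative assumption, not the standard's formula.
    """
    if not near < far:
        raise ValueError("near must be smaller than far")
    clamped = min(max(distance, near), far)  # keep the distance inside [near, far]
    return round(255 * (far - clamped) / (far - near))


if __name__ == "__main__":
    for d in (1.0, 2.0, 4.0, 5.0):
        print(d, depth_to_luma(d, near=1.0, far=5.0))
```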

Then, if there are a texture image and a distance image, the distance for each pixel of the subject reflected in the texture image can be known, so that the subject can be restored to a three-dimensional shape in 256 stages. In addition, since the shape can be geometrically projected onto the two-dimensional plane of the other viewpoint, the texture image can be converted into the texture image from the other viewpoint.

However, in a texture image from one viewpoint, there is a blind spot behind the subject that does not appear in the image. Therefore, if projection conversion is simply performed, blank pixels that cannot be filled by projection conversion (occlusion) will occur.

Therefore, the occurrence of occlusion is prevented by using images from a plurality of viewpoints. Specifically, when the texture image from the viewpoint A is projected and converted into the texture image from the virtual viewpoint B, the texture image from the viewpoint C, which is a viewpoint different from A, is similarly projected and converted into a texture image from the virtual viewpoint B. As a result, two images from the same virtual viewpoint B are created. Because the blind spot differs between the texture image from the viewpoint A and the texture image from the viewpoint C, the occlusion in the image derived from one viewpoint can be supplemented by the image derived from the other viewpoint.

In general, occlusion can be supplemented for the projection conversion to the virtual viewpoint on the line connecting the viewpoint A and the viewpoint C, and an image from the virtual viewpoint on the line can be created.
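The occlusion filling described above can be sketched on a single image row. This is a minimal sketch under simplifying assumptions that are not in the source: the projection is reduced to a horizontal shift by an integer per-pixel disparity (larger means closer to the camera), and the function names `warp_row` and `merge_views` are hypothetical.

```python
def warp_row(texture, disparity, direction):
    """Forward-warp one image row to a virtual viewpoint.

    Each pixel moves horizontally by direction * disparity. Positions that no
    pixel lands on stay None; these are the occlusion holes.
    """
    width = len(texture)
    out = [None] * width
    best = [-1] * width  # disparity of the pixel currently stored at each position
    for x, (value, d) in enumerate(zip(texture, disparity)):
        target = x + direction * d
        if 0 <= target < width and d > best[target]:
            out[target] = value  # a nearer pixel overwrites a farther one
            best[target] = d
    return out


def merge_views(from_a, from_c):
    """Fill the holes of one warped row with pixels from the other warped row."""
    return [a if a is not None else c for a, c in zip(from_a, from_c)]


if __name__ == "__main__":
    # Foreground "F" in front of background pixels b0..b5, seen from A and from C.
    row_a, disp_a = ["b0", "F", "F", "b3", "b4", "b5"], [0, 1, 1, 0, 0, 0]
    row_c, disp_c = ["b0", "b1", "b2", "F", "F", "b5"], [0, 0, 0, 1, 1, 0]
    b_from_a = warp_row(row_a, disp_a, +1)  # has a hole where A could not see
    b_from_c = warp_row(row_c, disp_c, -1)  # has a hole where C could not see
    print(b_from_a, b_from_c, merge_views(b_from_a, b_from_c), sep="\n")
```

In this toy row, each warped image has one hole at a different position, so merging the two leaves no blank pixel.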

By using this technique, it is possible to create texture images from 8 to 9 viewpoints from, for example, texture images of 2 or 3 viewpoints and the corresponding distance images. Therefore, by transmitting texture images of 2 or 3 viewpoints and the corresponding distance images, texture images from 8 to 9 virtual viewpoints can be created on the receiving side, and the amount of transmitted data can be reduced.

Further, Non-Patent Document 1 discloses a method of compressing videos (a video being a sequence of frames, each of which is an image) from a plurality of viewpoints by efficiently eliminating the redundancy between the videos of the plurality of viewpoints. By applying this to the two groups, that is, the plurality of texture videos and the plurality of distance videos, redundancy between texture images and between distance images can be eliminated, and the transmission data can be compressed.

Further, Patent Document 1 discloses a method of generating a distance video at a specific viewpoint by performing the above-described projection conversion on a distance video at a certain viewpoint and removing holes in the distance video thus created.

Japanese Patent Publication “JP 2009-105894 A (published May 14, 2009)”

First, the characteristics of the distance image will be described. The distance image represents the distance from the camera to the subject as discrete values step by step for each pixel, and has the following characteristics. The first feature is that the edge portion of the subject is in common with the texture image. That is, as long as the texture image includes information that can distinguish the subject and the background as an image, the boundary (edge) between the subject and the background is common to the texture image and the distance image. Therefore, the edge information of the subject is one of the large elements of the correlation information between the texture image and the distance image.

The second feature is that the distance depth value is relatively flat in the portion inside the edge of the subject.

For example, if the subject is a person, the texture image shows the pattern of the clothes worn by the person, but the distance image does not show the pattern of the clothes and shows only depth information. For this reason, the depth value on the same subject is flat or changes more slowly than the texture image.

Due to these two features, if the pixels are divided into ranges in which the depth value is constant, the depth value is constant within each range, so very efficient encoding can be performed without orthogonal transformation or the like. Furthermore, if the ranges to be divided are determined from the texture image based on some rule, it is not necessary to transmit information about the divided ranges, and the encoding efficiency can be further improved.

Here, the pixel group included in a range divided based on the depth value is called a segment. Since the encoding efficiency improves as the number of segments decreases, the encoding efficiency can be further improved by not restricting the shape of the segments and allowing flexible shapes.

Therefore, let us consider a case where a distance image is divided into segments of various shapes. In many cases, the distribution of depth values over the segments is similar between temporally adjacent frames and between distance images corresponding to different viewpoint images.

Therefore, further compression is possible if such characteristics are used to eliminate redundancy between frames that change in time or between distance images corresponding to different viewpoint images.

In Non-Patent Document 1, the compression of the texture image is promoted by reusing square segments between temporally adjacent frames or between different viewpoint images. More specifically, in Non-Patent Document 1, by using a motion compensation vector or a parallax compensation vector, an image is divided into square segments (blocks), and between temporally adjacent frames or between different viewpoint images. Data is compressed by reusing blocks.

However, when the method described in Non-Patent Document 1 is applied to a distance image divided into the flexible segment shapes described above, the encoding efficiency deteriorates severely. This is because the method described in Non-Patent Document 1 is suited to a scheme in which segments are square and each segment is orthogonally transformed. When the segment shapes are made flexible, the vector information to be transmitted becomes enormous.

In addition, since the method described in Patent Document 1 substitutes a single viewpoint for a distance image of a plurality of viewpoints, an error becomes large and the quality is greatly deteriorated.

As described above, if the encoding method is limited, preferable encoding cannot always be performed due to the advantages and disadvantages of each encoding method.

The present invention has been made in view of the above problems, and an object of the present invention is to realize a moving picture encoding apparatus and the like that can select an encoding method.

In order to solve the above-described problem, a moving image encoding device according to the present invention is a moving image encoding device that encodes a moving image, comprising: image dividing means for dividing each frame image of the moving image into a plurality of regions; representative value determining means for determining a representative value of each region divided by the image dividing means; encoding means for generating encoded data by performing, for each frame image, at least one of adaptive encoding, in which a number sequence in which the representative values determined by the representative value determining means are arranged in a predetermined order is encoded while adaptively updating an adaptive codebook in which number sequence patterns and codewords are associated, and static encoding, in which the representative values determined by the representative value determining means are arranged in a predetermined order and each representative value is encoded by assigning it a codeword whose number of bits depends on the appearance rate of that representative value in the frame image; and encoding method selection means for selecting either the adaptive encoding or the static encoding for each frame image, wherein the encoding means encodes the frame image using the encoding method selected by the encoding method selection means to generate encoded data.

Also, a control method of a moving image encoding device according to the present invention is a control method of a moving image encoding device that encodes a moving image, the method comprising, in the moving image encoding device: an image dividing step of dividing each frame image of the moving image into a plurality of regions; a representative value determining step of determining a representative value of each region divided in the image dividing step; an encoding method selection step of selecting, for each frame image, either adaptive encoding, in which a number sequence in which the representative values determined in the representative value determining step are arranged in a predetermined order is encoded while adaptively updating an adaptive codebook in which number sequence patterns and codewords are associated, or static encoding, in which the representative values determined in the representative value determining step are arranged in a predetermined order and each representative value is encoded by assigning it a codeword whose number of bits depends on the appearance rate of that representative value in the frame image; and an encoding step of performing at least one of the adaptive encoding and the static encoding, in which the frame image is encoded using the encoding method selected in the encoding method selection step to generate encoded data.

According to the above configuration or method, each frame image of a moving image is divided into a plurality of regions, and a representative value of each divided region is determined. Then, for each frame image, the representative values are arranged in a predetermined order, and it is selected whether they are encoded by adaptive encoding or by static encoding. Each frame image is encoded using the selected encoding method.

Here, the predetermined order is an order in which the position corresponding to the representative value can be specified in the frame image. For example, the order in which any pixel included in each region is first scanned when the frame image is raster scanned can be set as a predetermined order.
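The raster-scan ordering described here can be sketched as follows. The input format (a per-pixel region label map plus a depth map) and the function name `representative_sequence` are hypothetical. Taking the most frequent depth value of a region as its representative value is likewise an assumption for illustration, since this passage does not say how the representative value is determined.

```python
from collections import Counter


def representative_sequence(labels, depth):
    """Arrange one representative value per region in raster-scan order.

    Regions are ordered by the position at which one of their pixels is first
    met in a raster scan (left to right, top to bottom).
    Assumption: the representative value is the region's most frequent depth.
    """
    order = []      # region labels in order of first appearance
    histogram = {}  # region label -> Counter of depth values in that region
    for label_row, depth_row in zip(labels, depth):
        for label, value in zip(label_row, depth_row):
            if label not in histogram:
                histogram[label] = Counter()
                order.append(label)
            histogram[label][value] += 1
    return [histogram[label].most_common(1)[0][0] for label in order]


if __name__ == "__main__":
    labels = [["a", "a", "b"],
              ["c", "a", "b"]]
    depth = [[10, 10, 200],
             [50, 12, 200]]
    print(representative_sequence(labels, depth))
```

Because the order follows from the region map alone, a decoder that has the same region map can tell which region each value in the sequence belongs to.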

As a result, when encoding is performed, either adaptive encoding or static encoding can be selected for each frame image, so that each frame image can be encoded with the more preferable encoding method.

For example, if the encoding method with the smaller amount of information after encoding is selected, more compressed encoded data can be generated. Further, if an encoding method with few processing procedures is selected, more efficient encoding can be performed.
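The selection by encoded size can be sketched as below. This is an illustrative sketch rather than the claimed implementation: adaptive encoding is represented by LZW with fixed 9-bit codewords (the list of drawings mentions an LZW codebook and 9-digit binary codewords), static encoding is represented by Huffman code lengths (an assumption; the text only requires codeword lengths that depend on the appearance rate), and the cost of transmitting the static table is ignored. All function names are hypothetical.

```python
import heapq
from collections import Counter


def lzw_encode(values, code_bits=9):
    """LZW-encode a sequence of values in 0..255 into integer codewords."""
    book = {(v,): v for v in range(256)}  # initial codebook: single values
    limit = 1 << code_bits                # fixed-width codes cap the codebook size
    codes, current = [], ()
    for v in values:
        candidate = current + (v,)
        if candidate in book:
            current = candidate
        else:
            codes.append(book[current])
            if len(book) < limit:
                book[candidate] = len(book)  # adaptively register the new pattern
            current = (v,)
    if current:
        codes.append(book[current])
    return codes


def huffman_code_lengths(values):
    """Return {value: codeword length in bits}; frequent values get short codes."""
    counts = Counter(values)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    heap = [(n, i, [sym]) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tie = len(heap)  # unique tie-breaker so the symbol lists are never compared
    while len(heap) > 1:
        n1, _, syms1 = heapq.heappop(heap)
        n2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1  # one level deeper in the code tree
        heapq.heappush(heap, (n1 + n2, tie, syms1 + syms2))
        tie += 1
    return lengths


def select_encoding(values, code_bits=9):
    """Pick the method with the smaller payload (table overhead is ignored)."""
    adaptive_bits = len(lzw_encode(values, code_bits)) * code_bits
    lengths = huffman_code_lengths(values)
    static_bits = sum(lengths[v] for v in values)
    if adaptive_bits < static_bits:
        return "adaptive", adaptive_bits
    return "static", static_bits


if __name__ == "__main__":
    print(select_encoding([10, 20] * 50))                 # few distinct values
    print(select_encoding(list(range(0, 80, 10)) * 40))   # long repeating pattern
```

A real encoder would also have to count the bits needed to convey the static table and the flag that tells the decoder which method was used; those are left out of this sketch.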

In order to solve the above-described problem, a moving image decoding device according to the present invention is a moving image decoding device that decodes image encoded data, which is data obtained by dividing each frame image of a moving image into a plurality of regions and encoding the representative values of the regions by either adaptive encoding, in which a number sequence in which the representative values are arranged in a predetermined order is encoded while adaptively updating a codebook in which number sequence patterns and codewords are associated, or static encoding, in which the representative values are arranged in a predetermined order and each representative value is encoded by assigning it a codeword whose number of bits depends on the appearance rate of that representative value in the frame image, the moving image decoding device comprising: acquisition means for acquiring the image encoded data and encoding information, which is information indicating the encoding method of the image encoded data; decoding means for decoding, for each piece of image encoded data corresponding to a frame image, the image encoded data by a decoding method corresponding to the encoding method indicated by the encoding information acquired by the acquisition means, to generate decoded data; and image generating means for generating each frame image of the moving image from the decoded data generated by the decoding means and information indicating the regions.

In addition, a control method of a moving image decoding device according to the present invention is a control method of a moving image decoding device that decodes image encoded data, which is data obtained by dividing each frame image of a moving image into a plurality of regions and encoding the representative values of the regions by either adaptive encoding, in which a number sequence in which the representative values are arranged in a predetermined order is encoded while adaptively updating a codebook in which number sequence patterns and codewords are associated, or static encoding, in which the representative values are arranged in a predetermined order and each representative value is encoded by assigning it a codeword whose number of bits depends on the appearance rate of that representative value in the frame image, the method comprising, in the moving image decoding device: an acquisition step of acquiring the image encoded data and encoding information, which is information indicating the encoding method of the image encoded data; a decoding step of decoding, for each piece of image encoded data corresponding to a frame image, the image encoded data by a decoding method corresponding to the encoding method indicated by the encoding information acquired in the acquisition step, to generate decoded data; and an image generation step of generating each frame image of the moving image from the decoded data generated in the decoding step and information indicating the regions.

According to the configuration or method described above, image encoded data is decoded, the image encoded data being data obtained by dividing each frame image of a moving image into a plurality of regions and encoding the representative values of the regions by either adaptive encoding, in which a number sequence in which the representative values are arranged in a predetermined order is encoded while adaptively updating a codebook in which number sequence patterns and codewords are associated, or static encoding, in which the representative values are arranged in a predetermined order and each representative value is encoded by assigning it a codeword whose number of bits depends on the appearance rate of that representative value in the frame image. Then, an image is generated from the decoded data and the information indicating the regions.

As a result, for each image encoded data corresponding to the frame image, adaptive decoding is performed for the adaptively encoded image encoded data, and static decoding is performed for the statically encoded image encoded data. As described above, decoding can be performed appropriately.
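The image generation on the decoding side can be sketched as follows, for the stage after the representative values have been decoded by whichever method the encoding information indicates. The sketch assumes that the information indicating the regions is available as a per-pixel label map and that the representative values are ordered by first appearance of each region in a raster scan; the name `rebuild_frame` is hypothetical.

```python
def rebuild_frame(labels, representatives):
    """Paint every region of a frame with its decoded representative value.

    labels:          2-D list of region labels (the information indicating the regions)
    representatives: decoded values, ordered by each region's first appearance
                     in a raster scan of the label map
    """
    numbering = {}  # region label -> index into the representative sequence
    frame = []
    for row in labels:
        out_row = []
        for label in row:
            if label not in numbering:
                numbering[label] = len(numbering)  # next unused sequence position
            out_row.append(representatives[numbering[label]])
        frame.append(out_row)
    return frame


if __name__ == "__main__":
    labels = [["a", "a", "b"],
              ["c", "a", "b"]]
    print(rebuild_frame(labels, [10, 200, 50]))
```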

Note that the moving image encoding device and the moving image decoding device may be realized by a computer. In this case, a control program for the moving image encoding device and the moving image decoding device that realizes these devices on a computer by causing the computer to operate as the respective means, and a computer-readable recording medium on which the control program is recorded, also fall within the scope of the present invention.

As described above, the moving image encoding device according to the present invention includes: image dividing means for dividing each frame image of a moving image into a plurality of regions; representative value determining means for determining a representative value of each region divided by the image dividing means; encoding means for generating encoded data by performing, for each frame image, at least one of adaptive encoding, in which a number sequence in which the representative values determined by the representative value determining means are arranged in a predetermined order is encoded while adaptively updating an adaptive codebook in which number sequence patterns and codewords are associated, and static encoding, in which the representative values determined by the representative value determining means are arranged in a predetermined order and each representative value is encoded by assigning it a codeword whose number of bits depends on the appearance rate of that representative value in the frame image; and encoding method selection means for selecting either the adaptive encoding or the static encoding for each frame image, wherein the encoding means encodes the frame image using the encoding method selected by the encoding method selection means to generate encoded data.

Also, the control method of the moving image encoding device according to the present invention is a control method of a moving image encoding device that encodes a moving image, the method including, in the moving image encoding device: an image dividing step of dividing each frame image of the moving image into a plurality of regions; a representative value determining step of determining a representative value of each region divided in the image dividing step; an encoding method selection step of selecting, for each frame image, either adaptive encoding, in which a number sequence in which the representative values determined in the representative value determining step are arranged in a predetermined order is encoded while adaptively updating an adaptive codebook in which number sequence patterns and codewords are associated, or static encoding, in which the representative values determined in the representative value determining step are arranged in a predetermined order and each representative value is encoded by assigning it a codeword whose number of bits depends on the appearance rate of that representative value in the frame image; and an encoding step of performing at least one of the adaptive encoding and the static encoding, in which the frame image is encoded using the encoding method selected in the encoding method selection step to generate encoded data.

As a result, when encoding is performed, either adaptive encoding or static encoding can be selected for each frame image, so that each frame image can be encoded with the more preferable encoding method.

The moving image decoding device according to the present invention includes: acquisition means for acquiring the image encoded data and encoding information, which is information indicating the encoding method of the image encoded data; decoding means for decoding, for each piece of image encoded data corresponding to a frame image, the image encoded data by a decoding method corresponding to the encoding method indicated by the encoding information acquired by the acquisition means, to generate decoded data; and image generating means for generating each frame image of the moving image from the decoded data generated by the decoding means and information indicating the regions.

The control method of the moving image decoding device according to the present invention includes, in the moving image decoding device: an acquisition step of acquiring the image encoded data and encoding information, which is information indicating the encoding method of the image encoded data; a decoding step of decoding, for each piece of image encoded data corresponding to a frame image, the image encoded data by a decoding method corresponding to the encoding method indicated by the encoding information acquired in the acquisition step, to generate decoded data; and an image generation step of generating each frame image of the moving image from the decoded data generated in the decoding step and information indicating the regions.

As a result, for each image encoded data corresponding to the frame image, adaptive decoding is performed for the adaptively encoded image encoded data, and static decoding is performed for the statically encoded image encoded data. Thus, there is an effect that decoding can be performed appropriately.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1, showing an embodiment of the present invention, is a block diagram illustrating the main configuration of a moving image encoding device.
FIG. 2 is a diagram for explaining which pictures a given picture refers to in AVC encoding.
FIG. 3 is a diagram for explaining an example of dividing a texture image.
FIG. 4 is a diagram for explaining an example of dividing a texture image.
FIG. 5 is a diagram for explaining an example of dividing a texture image.
FIG. 6 is a diagram showing the distance image corresponding to the texture image.
FIG. 7 is a diagram for explaining the pixels included in a segment.
FIG. 8 is a diagram for explaining the pixels included in a segment.
FIG. 9 is a diagram for explaining the pixels included in a segment.
FIG. 10 is a diagram for explaining a method of assigning segment numbers.
FIG. 11 is a diagram for explaining a method of assigning segment numbers.
FIG. 12 is a diagram showing a segment table.
FIG. 13 is a diagram showing a number sequence in which the representative values are arranged in order of segment number.
FIG. 14 is a diagram showing the algorithm of the LZW method.
FIG. 15 is a diagram showing a codebook created by the LZW method.
FIG. 16 is a diagram showing a codeword string created by the LZW method.
FIG. 17 is a diagram showing a binary string in which the codeword string shown in FIG. 16 is expressed in 9-digit binary.
FIG. 18 is a diagram showing the number of appearances of each distance value from 0 to 255 in a certain distance image.
FIG. 19 is a diagram showing a static encoding table that associates each distance value, the appearance rate of each distance value, and a codeword.
FIG. 20 is a flowchart showing the flow of the process of determining the data to be output in the distance value encoding unit.
FIG. 21 is a diagram schematically showing the structure of a NAL unit.
FIG. 22 is a flowchart showing the operation of the moving image encoding device.
FIG. 23 is a block diagram illustrating the main configuration of a moving image decoding device according to an embodiment of the present invention.
FIG. 24 is a flowchart showing the operation of the moving image decoding device.
FIG. 25, showing another embodiment of the present invention, is a block diagram illustrating the main configuration of a moving image encoding device.
FIG. 26 is a diagram for explaining MVC encoding.
FIG. 27, showing the other embodiment, is a block diagram illustrating the main configuration of a moving image decoding device.
FIG. 28 is a flowchart showing an example of the operation of defining a plurality of segments.
FIG. 29 is a flowchart showing the subroutine of the segment merging process in the flowchart of FIG. 28.
FIG. 30 is a diagram for explaining that the moving image decoding device and the moving image encoding device can be used for transmission and reception of moving images; (a) is a block diagram showing the configuration of a transmitting apparatus equipped with the moving image encoding device, and (b) is a block diagram showing the configuration of a receiving apparatus equipped with the moving image decoding device.
FIG. 31 is a diagram for explaining that the moving image decoding device and the moving image encoding device can be used for recording and reproduction of moving images; (a) is a block diagram showing the configuration of a recording apparatus equipped with the moving image encoding device, and (b) is a block diagram showing the configuration of a playback apparatus equipped with the moving image decoding device.

[Embodiment 1]
One embodiment of the present invention will be described below with reference to the drawings. First, the moving image encoding apparatus 1 according to the present embodiment will be described. Roughly speaking, the moving image encoding apparatus 1 according to the present embodiment is a device that, for each frame constituting a three-dimensional moving image, generates encoded data by encoding the texture image and the distance image (an image in which each pixel value is expressed by a depth value) constituting that frame.

The moving image encoding apparatus 1 according to the present embodiment is a moving image encoding apparatus that encodes texture images using the encoding technique adopted in the H.264 / MPEG-4 (Moving Picture Experts Group) AVC (Advanced Video Coding) standard, while encoding distance images using an encoding technique peculiar to the present invention.

The above encoding technique peculiar to the present invention was developed by focusing on the correlation between a texture image and a distance image. When the texture image contains information indicating the edges of a subject, the edges of that subject in the distance image coincide with them, and all or substantially all of the pixels included in the region of that subject are likely to take the same distance value.

(Configuration of image encoding device)
First, the configuration of the video encoding apparatus according to the present embodiment will be described with reference to FIG. FIG. 1 is a block diagram illustrating a main configuration of the moving image encoding device 1.

As shown in FIG. 1, the moving image encoding apparatus 1 includes an image encoding unit (AVC encoding means) 11, an image decoding unit 12, a distance image encoding unit 20, and a packaging unit 28. The distance image encoding unit 20 includes an image division processing unit 21, a distance image division processing unit (image dividing means) 22, a distance value correcting unit 23, a numbering unit (representative value determining means) 24, and a distance value encoding unit (encoding method selection means, encoding means, static codebook creation means) 25.

The image encoding unit 11 encodes the texture image # 1 by the AVC (Advanced Video Coding) encoding defined in the H.264 / MPEG-4 AVC standard. It then outputs the encoded data (AVC encoded data) # 11 to the image decoding unit 12 and the packaging unit 28.

The image encoding unit 11 also outputs, to the distance value encoding unit 25, picture information # 11A indicating the type of the selected picture (predicted image; described later) and information for identifying the reference picture (when the selected picture is a P picture or a B picture). In this embodiment, the picture information # 11A includes information indicating the type of the selected picture. This is so that, when an IDR picture is encoded, reference to the codebook used when encoding the previous picture can be avoided. The IDR picture is a picture for refreshing the decoding operation on the decoding side.
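The role of the picture type in codebook handling can be sketched as follows. The sketch assumes that, for pictures other than IDR pictures, the adaptive codebook is carried over from the previous picture; the passage implies this but does not spell it out. The class name `AdaptiveCodebook` and its interface are hypothetical, and the codebook size is left unbounded for brevity.

```python
class AdaptiveCodebook:
    """LZW-style codebook that may persist from one picture to the next."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Initial codebook: one entry for each single value in 0..255.
        self.entries = {(v,): v for v in range(256)}

    def encode(self, values, picture_type):
        """Encode one picture's number sequence; an IDR picture starts afresh."""
        if picture_type == "IDR":
            self.reset()  # do not refer to the previous picture's codebook
        codes, current = [], ()
        for v in values:
            candidate = current + (v,)
            if candidate in self.entries:
                current = candidate
            else:
                codes.append(self.entries[current])
                self.entries[candidate] = len(self.entries)
                current = (v,)
        if current:
            codes.append(self.entries[current])
        return codes


if __name__ == "__main__":
    book = AdaptiveCodebook()
    sequence = [1, 2, 1, 2, 1, 2]
    print(book.encode(sequence, "IDR"))  # fresh codebook
    print(book.encode(sequence, "P"))    # reuses patterns learned above
    print(book.encode(sequence, "IDR"))  # reset: same output as the first call
```

Resetting at an IDR picture means a decoder can start from that picture without having seen earlier ones, at the cost of relearning the patterns.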

The image decoding unit 12 decodes the texture image # 1 ′ from the encoded data # 11 of the texture image # 1 acquired from the image encoding unit 11. Then, the texture image # 1 ′ is output to the image division processing unit 21.

Note that the decoded texture image # 1 ′ is the same as the image decoded by the moving image decoding device 2 on the receiving side when there is no bit error when transmitting from the moving image encoding device 1. This is different from the texture image # 1. This is because when the original texture image # 1 is AVC-encoded, a quantization error of an orthogonal transform coefficient applied in units called blocks that divide pixels into squares occurs.

Further, the texture images # 1 ′ are not necessarily output from the image decoding unit 12 to the image division processing unit 21 in frame order. The reason for this will be described with reference to FIG. 2. FIG. 2 is a diagram for explaining which picture a certain picture refers to in AVC coding. The pictures (201 to 209) in FIG. 2 constitute a video in this order, and a video (moving image) is obtained by switching the pictures over time. A picture means one image (frame image) at a certain discrete time.

AVC coding performs prediction between temporally preceding and following pictures in order to eliminate redundancy in the time direction. Here, prediction means dividing the screen into square areas (blocks) of a certain size and, for each area of the picture to be coded, finding a similar area in other pictures that are temporally nearby.

Pictures are classified into I pictures, P pictures, and B pictures according to how the pictures used for prediction are selected. An I picture is a picture that is not predicted using another picture. A P picture is a picture that uses only a temporally forward picture for prediction. A B picture is a picture that uses both forward and backward pictures for prediction. Prediction is performed for each block. For example, in the case of a B picture, up to two reference pictures can be specified.

Therefore, a B picture needs to refer to an I picture or P picture that exists later in time, and it can be decoded only after the picture to be referenced has arrived. For example, the B picture 201 in FIG. 2 can be decoded only after the I picture 202, which it references, arrives. Also, the B picture 203, whose reference pictures are the I picture 202 and the P picture 205, can be decoded only after the P picture 205 arrives.

Therefore, the image decoding unit 12 can perform the decoding process on the B picture that refers to the subsequent picture only after the I picture or the P picture to be referenced arrives. In some cases, an I picture that is a later picture is first decoded and output to the image division processing unit 21.
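The relationship between display order and decoding order described above can be illustrated with a minimal Python sketch. The function name decode_order and the list layout are hypothetical, and the reference of the P picture 205 to the I picture 202 is an assumption made for illustration; only the references of the B pictures 201 and 203 are taken from the description.

```python
def decode_order(pictures):
    """Return picture ids in an order in which they can be decoded.

    pictures: list of (picture_id, reference_ids) given in display order.
    At each step the earliest picture (in display order) whose reference
    pictures have all been decoded is decoded next.
    """
    decoded = []
    pending = list(pictures)
    while pending:
        for index, (pic_id, refs) in enumerate(pending):
            if all(ref in decoded for ref in refs):
                decoded.append(pic_id)
                del pending[index]
                break
        else:
            # No pending picture is decodable: a reference never arrives.
            raise ValueError("unresolvable reference")
    return decoded


pictures = [
    (201, [202]),       # B picture referring to the I picture 202
    (202, []),          # I picture, no reference
    (203, [202, 205]),  # B picture referring to 202 and 205
    (205, [202]),       # P picture; this reference is an assumption
]
print(decode_order(pictures))
```

In this sketch the I picture 202 is decoded before the B picture 201 even though it is displayed later, which is why the output to the image division processing unit 21 need not follow the frame order.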

The image division processing unit 21 divides the entire area of the texture image into a plurality of segments (areas). Then, the image division processing unit 21 outputs segment information # 21 including position information of each segment to the distance image division processing unit 22. The segment position information is information indicating the position of the segment in the texture image # 1.

As a method of dividing a texture image into a plurality of segments, for example, the following method can be cited.

First, the texture image is repeatedly smoothed while leaving edge information. Thereby, the noise which an image has can be removed. Then, adjacent similar colored segments are joined together. However, a segment whose width or height exceeds a predetermined number of pixels is likely to change a distance value within the segment, and is therefore divided so as not to exceed a predetermined number of pixels. By this method, the texture video can be divided into segment units.

This point will be described in more detail with reference to FIGS. 3 to 5. FIGS. 3 to 5 are diagrams for explaining an example of dividing a texture image.

For example, when the image at a certain discrete time is the image 301 shown in FIG. 3 and the image 301 is input to the image division processing unit 21, the image division processing unit 21 divides it into segments as shown in the image 401 of FIG. 4. In the image 301, the hair on the left and right of the girl's head is drawn in two colors, brown and light brown, and the image division processing unit 21 defines a closed region made up of pixels of similar colors, such as brown and light brown, as one segment (FIG. 4). The skin portion of the girl's face is also drawn in two colors, the skin color and the pink color of the cheek portion, but the image division processing unit 21 defines the skin color region and the pink region as separate segments (FIG. 4). This is because the skin color and the pink color are not similar (that is, the difference between the skin color pixel value and the pink pixel value exceeds a predetermined threshold value). In the image 401, each closed region drawn with the same pattern indicates one segment.

In addition, here, the process of dividing a segment into small segments when the width or height exceeds a predetermined number of pixels is omitted.

Then, when only the segment shape is extracted from the image 401, information as shown in the image 501 in FIG. 5 is obtained.

FIG. 6 shows a distance image 601 corresponding to the image 301 (texture image) in FIG. 3. As shown in FIG. 6, the distance image is an image having a different distance value for each segment.

When the distance image (frame image) # 2, which is each frame image of the distance video, and the segment information # 21 are input, the distance image division processing unit 22 extracts, for each segment in the texture image # 1 ′, a distance value set composed of the distance values of the pixels included in the corresponding segment (region) in the distance image # 2. Then, the distance image division processing unit 22 generates, from the segment information # 21, segment information # 22 in which the distance value set and the position information are associated with each segment. The generated segment information # 22 is output to the distance value correction unit 23.

Specifically, the distance image division processing unit 22 refers to the input segment information # 21, identifies the position of each segment in the texture image # 1 ′, and divides the distance image # 2 into a plurality of segments using the same division pattern as the segment division pattern in the texture image # 1 ′. Therefore, the segment division pattern in the texture image # 1 ′ and the segment division pattern in the distance image # 2 are the same.
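As a minimal sketch of this extraction, the following Python code collects the distance values of each segment. It assumes, purely for illustration, that the segment division pattern is available as a per-pixel label map of the same size as the distance image; the description itself only states that the segment information # 21 carries position information, and the function name is hypothetical.

```python
from collections import defaultdict


def extract_distance_value_sets(segment_labels, distance_image):
    """Group the distance values of a distance image by segment.

    segment_labels: 2D list giving, per pixel, the segment it belongs to
                    (assumed representation of the division pattern).
    distance_image: 2D list of distance values with the same dimensions.
    Returns a dict mapping each segment label to its list of distance values.
    """
    value_sets = defaultdict(list)
    for label_row, distance_row in zip(segment_labels, distance_image):
        for label, distance in zip(label_row, distance_row):
            value_sets[label].append(distance)
    return dict(value_sets)


labels = [[0, 0, 1],
          [0, 1, 1]]
distances = [[10, 10, 200],
             [10, 12, 200]]
print(extract_distance_value_sets(labels, distances))
```

Because the same label map is applied to both images, the division pattern of the distance image is by construction identical to that of the texture image.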

The distance value correction unit 23 calculates, for each segment of the distance image # 2, the mode value of the distance value set of that segment included in the segment information # 22 as the representative value # 23a. That is, when the segment i in the distance image # 2 includes N pixels, the distance value correcting unit 23 calculates the mode value from the N distance values. Instead of the mode value, the distance value correcting unit 23 may use the average value or the median value of the N distance values as the representative value # 23a. In addition, when the average value or the median value is a decimal value, the distance value correcting unit 23 may convert it to an integer value by rounding down, rounding up, or rounding off.

Then, the distance value correcting unit 23 replaces the distance value set of each segment included in the segment information # 22 with the representative value # 23a of the corresponding segment, and outputs it to the number assigning unit 24 as the segment information # 23.
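The calculation of the representative value can be sketched in Python as follows. The function name and the method parameter are hypothetical, and the use of Python's built-in round for the average and median is only one of the rounding choices the description permits.

```python
from collections import Counter
from statistics import median


def representative_value(distance_values, method="mode"):
    """Return one representative distance value for a segment.

    method "mode" follows the main embodiment; "mean" and "median" are the
    alternatives mentioned in the description. Non-integer results are
    rounded with round(), which rounds halves to the nearest even integer.
    """
    if method == "mode":
        # most_common(1) returns the value with the highest count.
        return Counter(distance_values).most_common(1)[0][0]
    if method == "mean":
        return round(sum(distance_values) / len(distance_values))
    if method == "median":
        return round(median(distance_values))
    raise ValueError(f"unknown method: {method}")


# A segment whose edge is slightly misaligned: one stray value of 12.
print(representative_value([200, 12, 200]))
```

Replacing every distance value in the segment with this single value yields the segment information # 23 described above.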

The reason for calculating the representative value # 23a is as follows. Ideally, all the pixels included in each segment in the distance image # 2 have the same distance value. However, when the edges of the texture image # 1 and the distance image # 2 are misaligned, for example because the distance image # 2 is inaccurate, pixel groups having different distance values may exist within the same segment of the distance image # 2. In such a case, for example, if the distance value of a pixel group with fewer pixels is replaced with the distance value of the pixel group with the most pixels, the distance values in the same segment all become the same, and the distance image # 2 can be made to conform to the segment shapes.

Further, since the accuracy of the edge portions is generally better in the texture image # 1 than in the distance image # 2, the above processing also has the effect of improving the accuracy of the edge portions of the distance image # 2.

When the segment information # 23 is input, the number assigning unit 24 associates identifiers having different values with each representative value # 23a included in the segment information # 23. Specifically, for each of the M sets of position information and representative value # 23a included in the segment information # 23, the number assigning unit 24 associates a segment number # 24 determined according to the representative value # 23a and the position information. Then, the number assigning unit 24 outputs the data in which the segment number # 24 and the representative value # 23a are associated to the distance value encoding unit 25.

Pixels connected in the vertical or horizontal direction are included in the same segment when their distance values are the same, but pixels connected only in a diagonal direction are not considered to be included in the same segment even if their distance values are the same. That is, a segment is formed by a group of pixels having the same distance value connected in the vertical or horizontal direction.

This will be described specifically with reference to FIGS. 7 to 9. FIGS. 7 to 9 are diagrams for explaining pixels included in a segment.

The pixel A and the pixel B shown in FIGS. 7 and 8 are connected to each other in the vertical or horizontal direction, so if their distance values are the same, they are included in the same segment. On the other hand, since the pixel A and the pixel B shown in FIG. 9 are connected in an oblique direction, they are not included in the same segment even if their distance values are the same. That is, the pixel A and the pixel B in FIG. 9 belong to different segments.

Next, a method for assigning segment number # 24 will be described with reference to FIGS. 10 to 12. FIGS. 10 to 12 are diagrams for explaining a method of assigning segment number # 24.

Since it suffices that segment numbers # 24 do not overlap within the same image (frame) of the same video, one conceivable method is to scan the pixels line by line from the upper left to the lower right of the image (raster scan, FIG. 10) and, whenever the segment including the target pixel has not yet been assigned a number, to assign numbers in order starting from 0.

For example, when segment number # 24 is assigned to the image 501 in FIG. 5 by raster scan, the segment number “0” is assigned to the segment R0 positioned at the head in the raster scan order, as shown in FIG. 11. Further, the segment number “1” is assigned to the segment R1 that is positioned second in the raster scan order. Similarly, segment numbers “2” and “3” are assigned to the third and fourth segments R2 and R3, respectively, in the raster scan order.

Thereby, data such as the segment table 1201 shown in FIG. 12 is obtained. Then, the obtained data is output to the distance value encoding unit 25.
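The numbering procedure can be sketched in Python as follows. This is an illustration under the reading that a segment is a group of vertically or horizontally connected pixels having the same representative value; the function name and the list-of-tuples form of the segment table are assumptions and are not taken from FIG. 12.

```python
from collections import deque


def assign_segment_numbers(values):
    """Number segments in raster-scan order starting from 0.

    values: 2D list of representative distance values per pixel.
    Pixels belong to the same segment only when they have the same value
    and are connected vertically or horizontally (not diagonally).
    Returns (labels, table): labels is a 2D list of segment numbers and
    table is a list of (segment_number, representative_value) pairs.
    """
    height, width = len(values), len(values[0])
    labels = [[None] * width for _ in range(height)]
    table = []
    for y in range(height):          # raster scan: top to bottom,
        for x in range(width):       # left to right within each line
            if labels[y][x] is not None:
                continue             # segment already numbered
            number = len(table)
            table.append((number, values[y][x]))
            labels[y][x] = number
            queue = deque([(y, x)])
            while queue:             # spread the number over the segment
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and labels[ny][nx] is None
                            and values[ny][nx] == values[y][x]):
                        labels[ny][nx] = number
                        queue.append((ny, nx))
    return labels, table


grid = [[5, 5, 9],
        [7, 5, 9],
        [7, 7, 5]]
labels, table = assign_segment_numbers(grid)
print(labels)
print(table)
```

In this example the lower-right pixel has the same value as the segment numbered 0 but touches it only diagonally, so it receives its own segment number.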

The distance value encoding unit 25 performs compression encoding processing on the data (segment table 1201) in which the segment number # 24 and the representative value # 23a are associated, and outputs the obtained encoded data (image encoded data) # 25, a reference flag (encoding information) # 25A for static compression encoding, and reference picture information (image specifying information) # 25B to the packaging unit 28.

This will be explained more specifically. The distance value is expressed in 256 levels, and the data in which the segment number # 24 and the representative value # 23a are associated with each other can be expressed as the numerical sequence 1301 shown in FIG. 13.

Then, this number sequence is encoded by a hybrid method of adaptive compression coding and static compression coding. The hybrid method is a method of performing compression encoding using a preferable encoding method among adaptive compression encoding and static compression encoding.

Adaptive compression coding is a coding method that, in the course of compression, creates a correspondence table (adaptive codebook) between codewords and pre-coding values (sequence patterns) and encodes while adaptively updating that codebook. It is suitable when the appearance rate of each value before encoding is not known. However, its compression rate is low compared to static compression coding.

On the other hand, static compression encoding refers to encoding in which the number of bits of a code word is made different based on the appearance rate when the appearance rate of each value before encoding is known.

Static compression encoding requires the appearance rate of each value before encoding, so a sequence whose appearance rates are known can be encoded with a high compression rate. For a sequence whose appearance rates are not known, however, the sequence must first be scanned once to the end and the frequency of each value counted in order to obtain the appearance rates; static compression encoding is then performed based on those rates. This extra scan of the sequence is a drawback because it takes processing time. Also, since the decoding side apparatus must perform decoding corresponding to the encoding method, processing takes time on the decoding side as well.

Therefore, in the present embodiment, adaptive compression coding (adaptive coding) and static compression coding (static coding) are switched and used (hybrid method). As a result, encoding with high compression efficiency and compression rate can be realized.

First, regarding adaptive compression coding, various methods have been proposed, such as adaptive entropy coding methods that adaptively update the codebook (event occurrence probability table) of Huffman coding or arithmetic coding, which are classified as entropy coding. Here, as an example, the Lempel-Ziv-Welch (LZW) coding method, which adaptively updates the codebook (dictionary) of Lempel-Ziv coding, a typical example of dictionary-based coding, will be described. The LZW method is an encoding method developed by Terry Welch as an implementation of the LZ78 encoding method announced by Abraham Lempel and Jacob Ziv in 1978. In this method, attention is paid to the patterns in which values are arranged: each newly appearing pattern is registered in the code book in turn and, at the same time, a code word is output. On the decoding side, new patterns are registered in the codebook in the same manner as on the encoding side and decoding is performed based on the received codewords, whereby the original sequence can be completely reproduced. Therefore, this method is a so-called lossless encoding method in which no information is lost by encoding. It is one of the encoding methods with excellent compression efficiency and is widely used in practice for image compression and the like.

FIG. 14 shows this LZW algorithm. Since the LZW method was developed for compressing character strings, the algorithm is expressed in terms of compressing a character string. However, since a character string can be expressed as a binary (bit) sequence of several digits, this algorithm can be applied as it is to the numerical sequence 1301 of distance values.

First, the code book is initialized and all single characters are registered in the code book (S51). For example, if only the three letters a, b, and c of the alphabet are used, these three letters are registered in the code book, and 0 is assigned to a, 1 is assigned to b, and 2 is assigned to c.

Next, the first character of the character string to be encoded is read and assigned to ω (ω is a variable) (S52). Further, the next one character is read and assigned to K (K is a variable) (S53). Then, it is further determined whether or not there is an input character string (S54). If there is no further input character string (NO in S54), the code word corresponding to the character string stored in ω is output and the process ends (S55). On the other hand, if there are more input character strings (YES in S54), it is determined whether or not the character string ωK exists in the code book (S56).

If the character string ωK exists in the code book (YES in S56), the character string ωK is substituted for ω (S57), and the process returns to step S53.

On the other hand, if the character string ωK does not exist in the code book (NO in S56), the code word corresponding to the character string stored in ω is output (S58), and the character string ωK is registered in the code book (S59). Further, K is substituted for ω (S60). Thereafter, the process returns to step S53.

The above is the LZW algorithm. As can be seen from this algorithm, in the LZW method, the more often the same pattern appears in the character string to be encoded, the more often that pattern portion can be replaced with a single codeword, so that significant compression becomes possible.
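The steps above can be sketched in Python as follows. This is a minimal illustration of the LZW encoding side only; the function name and the use of tuples as codebook keys are choices made for this sketch, and no limit on the codebook size is applied.

```python
def lzw_encode(sequence, alphabet_size=256):
    """Encode a sequence of integers (0 .. alphabet_size - 1) with LZW.

    Returns (codewords, codebook), where codebook maps each registered
    pattern (a tuple of values) to its code word.
    """
    # S51: initialize the codebook with every single value.
    codebook = {(value,): value for value in range(alphabet_size)}
    if not sequence:
        return [], codebook
    omega = (sequence[0],)                    # S52: first value into omega
    codewords = []
    for k in sequence[1:]:                    # S53/S54: read the next value K
        candidate = omega + (k,)
        if candidate in codebook:             # S56: is omega+K registered?
            omega = candidate                 # S57
        else:
            codewords.append(codebook[omega])  # S58: output code for omega
            codebook[candidate] = len(codebook)  # register omega+K
            omega = (k,)                      # S60
    codewords.append(codebook[omega])         # S55: no more input
    return codewords, codebook


# The three-letter example: a -> 0, b -> 1, c -> 2, input "abababa".
codes, book = lzw_encode([0, 1, 0, 1, 0, 1, 0], alphabet_size=3)
print(codes)
```

For the input above, seven values are reduced to four code words because the repeated patterns "ab" and "aba" are each replaced by a single code word once registered.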

Suppose that in this LZW algorithm, characters are replaced with distance values and the sequence 1301 shown in FIG. 13 is encoded.

First, the code book is initialized and all the values from 0 to 255 are registered in it. At this point, code words 0 to 255 are in use, so the next registration starts from code word 256. Then, when the number sequence 1301 is encoded by the algorithm described above, the code book 1501 shown in FIG. 15 is created, and the code word sequence 1601 shown in FIG. 16 is output.

Then, each codeword included in the output codeword string 1601 is converted into a 9-digit binary value as shown in FIG. 17 and output to the packaging unit 28 as a binary string 1701. Here, the code word “89” is converted into the binary value “001011001”, the code word “182” is converted into the binary value “010110110”, and so on. A 9-digit binary value is used here to represent each code word. However, since the code book grows as the encoding progresses, once the number of code words exceeds 512 they can no longer be expressed with a 9-digit binary value. In that case, the number of digits is increased by one and a 10-digit binary value is used. Even so, because the LZW method has the rule that the size of the codebook increases by 1 each time a code word is output, the decoding side can determine the number of digits at each point in time by counting the number of code words received.
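The conversion into binary values can be sketched in Python as follows. The exact moment at which the number of digits is increased is an assumption of this sketch (the width is taken from the codebook size at the time each code word is output); the function and parameter names are hypothetical.

```python
def codewords_to_bits(codewords, initial_size=256, min_width=9):
    """Convert LZW code words into one binary string.

    The codebook is assumed to start with initial_size entries and to grow
    by one entry per output code word, so the width can be recomputed on
    the decoding side simply by counting code words.
    """
    bits = []
    codebook_size = initial_size
    for codeword in codewords:
        # Code words in use are 0 .. codebook_size - 1.
        width = max(min_width, (codebook_size - 1).bit_length())
        bits.append(format(codeword, f"0{width}b"))
        codebook_size += 1
    return "".join(bits)


print(codewords_to_bits([89, 182]))  # 001011001010110110
```

With these assumptions every code word occupies 9 digits while the codebook holds at most 512 entries, and 10 digits once it holds more.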

Also, in the LZW system, the codebook size increases as the coding continues, so it is necessary to limit the codebook size at some point.

Various methods are widely used to address this problem. For example, there is a method in which the maximum codebook size is determined in advance and the codebook is reset to its initial state when it reaches that size, and there is the LZT method, in which patterns are replaced with new ones in order starting from the pattern that has gone unused the longest. Here, it is assumed that the LZT method is used.

If this LZT encoding is applied to the sequence 1301, a plurality of distance values appearing in the same pattern can be expressed by one code word, so that the number of code words becomes smaller than the number of distance values. As a result, the amount of data can be compressed.

Here, let us consider the characteristics of distance images. FIG. 18 is a diagram showing the number of appearances of each value from 0 to 255 in a certain distance image. As shown in FIG. 18, values appear at intervals of 6 to 7, and the number of appearances of the values in between is 0. Thus, in a distance image, not all values necessarily appear; the values that do appear may be spaced at intervals.

This is due to the distance image generation method. When the distance image is generated by a dedicated measurement device, the graph does not have the shape shown in FIG. 18, and every value appears to some extent. On the other hand, when a distance image is generated by calculating parallax from images of two viewpoints, the graph has a shape in which values appear at intervals of 6 to 7, as shown in FIG. 18. In this case, the distance image is generally generated by shifting the images of the two viewpoints in units of 1/4 pixel to 1 pixel and estimating the distance of each pixel by matching processing. Therefore, when the resolution of the image is low or the shift step of the matching process is coarse, the estimated distances are not continuous. As a result, as shown in FIG. 18, values appear at intervals of 6 to 7.

And the appearance of these values is almost the same for each image (each frame image) in one video.

In addition, as described above, distance images that are temporally adjacent are similar to each other.

Therefore, a codebook generated for static compression coding (static codebook) can often be reused for temporally adjacent pictures, which makes efficient compression coding possible.

Specifically, a method for switching between adaptive compression coding and static compression coding will be described.

First, when the image encoding unit 11 encodes the first texture image as an I picture, information that the I picture has been selected (picture information # 11A) and the segment table 1201 of the distance image corresponding to the texture image are input to the distance value encoding unit 25.

Then, the distance value encoding unit 25 creates a code book 1501 from the segment table 1201 in accordance with the above-described adaptive compression encoding algorithm, converts each code word of the code word string 1601 into a 9-digit binary value, and obtains a binary string 1701. It then sets “0” as the reference flag # 25A for static compression encoding and outputs the flag and the binary string 1701 to the packaging unit 28. This reference flag is set to “1” when static compression encoding is performed.

Then, when adaptive compression coding is performed, the number of occurrences of each value in this picture is counted, and a static coding table 1901 (see FIG. 19) is created. The appearance rate of the static encoding table 1901 is a value obtained by dividing the number of appearances of each value by the total number of segments. The code word is a code word when Huffman coding is performed based on the appearance rate. Huffman coding is a well-known technique, and a detailed description thereof is omitted.

As shown in the static coding table 1901 in FIG. 19, in Huffman coding, the higher the appearance rate of a value, the shorter the code word assigned to it, which keeps the bit rate low. In the example shown in the static encoding table 1901, the distance values “0” and “255” have a low appearance rate, so their code words are 10 bits (“1100011110” and “1100001001”). On the other hand, since the appearance rates of the distance values “126” and “130” are high, their code words are 5 bits (“10011” and “11010”). Further, since the appearance rates of the distance values “1”, “125”, “127”, “128”, “129”, and “131” are 0, no code word is assigned to them.
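The creation of a static codebook from the number of appearances can be sketched in Python as follows. This is a generic Huffman construction, not the specific procedure used to produce the table 1901; the function name and the tie-breaking order are assumptions, so the actual code words need not match those of FIG. 19.

```python
import heapq
from collections import Counter
from itertools import count


def build_static_codebook(values):
    """Build a Huffman codebook (value -> bit string) from a sequence.

    Values that never appear receive no code word. Frequent values get
    shorter code words than rare ones.
    """
    counts = Counter(values)
    if len(counts) == 1:
        # A single distinct value still needs a one-bit code word.
        return {next(iter(counts)): "0"}
    tie_breaker = count()  # keeps heap entries comparable on equal counts
    heap = [(n, next(tie_breaker), {v: ""}) for v, n in sorted(counts.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, _, left = heapq.heappop(heap)    # two least frequent subtrees
        n1, _, right = heapq.heappop(heap)
        merged = {v: "0" + code for v, code in left.items()}
        merged.update({v: "1" + code for v, code in right.items()})
        heapq.heappush(heap, (n0 + n1, next(tie_breaker), merged))
    return heap[0][2]


sample = [126] * 4 + [130] * 2 + [0, 255]
codebook = build_static_codebook(sample)
for value in sorted(codebook):
    rate = sample.count(value) / len(sample)  # appearance rate
    print(value, rate, codebook[value])
```

In this small example the frequent value 126 receives a shorter code word than the rare values 0 and 255, and a value such as 127 that never appears has no entry in the codebook.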

Then, a code book (static code book) 1902 in which the distance value and the code word are associated in the static coding table 1901 is stored until the next picture processing. Note that the saving of the code book 1902 is not limited to the processing of the next picture, and the code book 1902 may be saved until the code book 1902 is no longer needed.

Next, when the image encoding unit 11 encodes the second texture image as, for example, a P picture, information indicating that the P picture has been selected, information indicating which picture was referenced, and the segment table 1201 of the distance image corresponding to the texture image are input to the distance value encoding unit 25.

Here, it is assumed that this P picture refers to the previous I picture. In AVC encoding, in the case of a B picture, since up to two reference destinations are permitted, there may be two reference destinations. In this case, information on both reference destinations is input.

Then, Huffman encoding is performed on the binary string 1701 using the saved code book 1902, and the number of bits of encoded data after encoding is calculated. Since this code book 1902 was created based on the appearance rate of each value in the previous picture, the more similar the previous picture and the current picture are, the more efficient the encoding becomes. However, since the code book 1902 is created based on the previous picture, the current picture may contain a value that is not included in the code book 1902. In this case, static compression encoding is not used.

Alternatively, when counting the number of occurrences of each value, 1 may be added to the count of every value so that a code book in which code words are assigned to all the values is created, and Huffman coding may be performed using this code book. Since code words are assigned to all values in this code book, it is possible to prevent encoding from becoming impossible due to the presence of a value that has no code word in the code book.
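The counting step of this variation can be sketched in Python as follows. The function name is hypothetical, and the sketch shows only the counts to which Huffman coding would subsequently be applied.

```python
from collections import Counter


def smoothed_counts(values, levels=256):
    """Count appearances of each of the `levels` distance values, plus 1.

    Adding 1 to every count guarantees that no value has a count of 0, so
    a codebook built from these counts has a code word for every value.
    """
    counts = Counter(values)
    return [counts[value] + 1 for value in range(levels)]


counts = smoothed_counts([126, 126, 130])
print(counts[126], counts[130], counts[127], min(counts))
```

The cost of this choice is that values which never appear still occupy code words, which may slightly lengthen the code words of the values that do appear.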

Next, adaptive compression encoding is performed on the current picture, and the number of encoded data bits after encoding is calculated. Similarly to the first picture, the number of occurrences of each value in the picture is counted, and a static coding table 1901 is created in which each value is associated with the appearance rate of each value and a code word.

Then, the number of bits of encoded data obtained by adaptive compression encoding is compared with the number of bits of encoded data obtained by static compression encoding (Huffman encoding).

As a result, when the number of bits of encoded data after static compression encoding is larger, the reference flag is set to “0” as in the case of the first picture, and the encoded data obtained by adaptive compression encoding is output. On the other hand, when the number of bits of encoded data after static compression encoding is smaller, the reference flag is set to “1”, and the reference flag, picture information # 25B indicating the picture number of the reference destination (here, the immediately preceding picture), and the encoded data obtained by static compression encoding (Huffman encoding) are output.

The flow of this operation will be described with reference to FIG. 20. FIG. 20 is a flowchart showing the flow of processing for determining data to be output in the distance value encoding unit 25.

First, the distance value encoding unit 25 determines whether or not there is a code table (code book) referred to for static compression encoding of a picture (S81). In this determination, in the case of the first image or IDR picture, since similarity with the previous image cannot be expected, it is assumed that there is no code table to be referred to. In other cases, it is assumed that there is a code table to be referenced.

If there is no code table to be referenced (NO in S81), adaptive compression coding is performed, the number of occurrences of each value in the picture is counted, and a static encoding table 1901 in which each value is associated with its appearance rate and a code word is created (S82). Then, the code book 1902 of the static encoding table 1901 is stored. Further, the reference flag is set to “0” and output (S86).

On the other hand, when there is a code table to be referred to (YES in S81), static compression coding (Huffman coding) is executed using the code table, and the number of bits of the coded data after coding is calculated. When there are a plurality of code tables that can be referred to, static compression encoding is executed using each code table, and the number of bits of encoded data after each encoding is calculated.

Regarding the range of code tables to be referred to, for example, the code tables of the two pictures immediately preceding the picture to be encoded can be considered. If the range of code tables to be referenced is limited to the immediately preceding one, it is not necessary to transmit the reference picture number to the decoding side.

Furthermore, adaptive compression encoding is performed, and the number of bits of encoded data after encoding is calculated. Also, the number of occurrences of each value in the picture is counted, and a static encoding table 1901 is created in which each value, the appearance rate of each value, and a code word are associated with each other. Then, the number of bits after encoding is compared between when static compression encoding is performed and when adaptive compression encoding is performed (S83).

If the number of bits of the encoded data after static compression encoding is smaller (YES in S84), the encoded data obtained by static compression encoding is output together with the reference flag # 25A set to “1” and picture information # 25B indicating the reference picture (S85).

On the other hand, if the encoded data after the static compression encoding has a larger number of bits (NO in S84), the process proceeds to step S86, and the encoded data after the adaptive compression encoding is output. At the same time, the reference flag is set to “0” and output.

The above is the process of determining data to be output in the distance value encoding unit 25.
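The selection flow of FIG. 20 can be sketched in Python as follows. The function names, the dictionary-based return value, and the representation of encoded data as bit strings are assumptions of this sketch; the adaptively encoded bit string is assumed to be computed beforehand and passed in.

```python
def static_encode(values, codebook):
    """Encode values with a saved codebook; None if a value has no code."""
    try:
        return "".join(codebook[value] for value in values)
    except KeyError:
        return None


def select_encoding(values, adaptive_bits, saved_codebooks):
    """Choose between adaptive and static encoded data by bit count.

    saved_codebooks: dict mapping a reference picture number to its stored
    static codebook; empty for the first picture or an IDR picture (S81).
    """
    best = None
    for picture_number, codebook in saved_codebooks.items():
        bits = static_encode(values, codebook)
        if bits is not None and (best is None or len(bits) < len(best[1])):
            best = (picture_number, bits)
    # S83/S84: static encoding is chosen only when it is strictly smaller.
    if best is not None and len(best[1]) < len(adaptive_bits):
        return {"reference_flag": 1,            # S85
                "reference_picture": best[0],
                "data": best[1]}
    return {"reference_flag": 0, "data": adaptive_bits}   # S86


previous = {1: {126: "0", 130: "10"}}
print(select_encoding([126, 126, 130], "0" * 27, previous))
print(select_encoding([126, 126, 130], "0" * 27, {}))
```

When the two bit counts are equal, this sketch falls back to the adaptively encoded data; the description does not state how a tie is handled, so that choice is an assumption.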

The packaging unit 28 associates the input encoded data # 11 of the texture image # 1, the encoded data # 25 of the distance image # 2, the reference flag # 25A, and the picture information # 25B with one another and outputs them to the video decoding device 2 as encoded data # 28. The picture information # 25B is not output when the reference flag # 25A is “0”.

Specifically, the packaging unit 28 integrates the texture image encoded data # 11 and the distance image encoded data # 25 in accordance with the format of the NAL unit defined in the H.264/MPEG-4 AVC standard.

FIG. 21 is a diagram schematically showing the configuration of the NAL unit 1801. As shown in FIG. 21, the NAL unit 1801 is composed of three parts: a NAL header 1802, an RBSP 1803, and an RBSP trailing bit 1804.

An identifier indicating the encoding scheme performed by the distance value encoding unit 25 is placed in the nal_unit_type (identifier indicating the type of NAL unit) field of the NAL header 1802 of the NAL unit 1801 corresponding to each slice (main slice) of the main picture. The RBSP 1803 contains the encoded data # 11 and the encoded data # 25. The RBSP trailing bit 1804 is an adjustment bit for specifying the last bit position of the RBSP 1803.

Also, the reference flag # 25A and the picture information # 25B are stored in, and transmitted via, an extension of the header information called the PPS (Picture Parameter Set), which indicates the coding mode of the entire picture.

In the above embodiment, the moving picture encoding apparatus 1 encodes the texture image # 1 using AVC encoding defined in the H.264/MPEG-4 AVC standard, but the present invention is not limited to this. That is, the image encoding unit 11 of the moving image encoding apparatus 1 may encode the texture image # 1 using another encoding method such as MPEG-2 or MPEG-4.

(Operation of video encoding device)
Next, the operation of the moving picture encoding apparatus 1 will be described below with reference to FIG. 22. FIG. 22 is a flowchart showing the operation of the moving image encoding apparatus 1. Note that the operation described here is that of encoding the texture image and the distance image of the t-th frame from the head of a moving image including a large number of frames. That is, the moving image encoding apparatus 1 repeats the operation described below as many times as the number of frames of the moving image in order to encode the entire moving image. In the following description of the operation, unless otherwise specified, each of the data # 1 to # 28 is interpreted as data of the t-th frame.

First, the image encoding unit 11 and the distance image division processing unit 22 respectively receive the texture image # 1 and the distance image # 2 from the outside of the moving image encoding device 1 (S1). As described above, the texture image # 1 and the distance image # 2 received from the outside are correlated with each other in image content, as can be seen, for example, by comparing the texture image of FIG. 3 with the corresponding distance image.

Next, the image encoding unit 11 encodes the texture image # 1 by the AVC encoding method stipulated in the H.264 / MPEG-4 AVC standard, and outputs the obtained texture image encoded data # 11 to the packaging unit 28 and the image decoding unit 12 (S2). In step S2, the image encoding unit 11 also outputs to the distance value encoding unit 25 the selected picture type and, when the selected picture is a B picture or a P picture, the reference picture.

Then, the image decoding unit 12 decodes the texture image # 1 ′ from the encoded data # 11 and outputs it to the image division processing unit 21 (S3). Thereafter, the image division processing unit 21 defines a plurality of segments from the input texture image # 1 ′ (S4).

Next, the image division processing unit 21 generates segment information # 21 including position information of each segment, and outputs it to the distance image division processing unit 22 (S5). As the position information of the segment, for example, each coordinate value of the pixel group located at the boundary with the other segment of the segment can be cited. That is, when each segment is defined from the texture image of FIG. 3, the coordinate value of each coordinate located in the contour portion of the closed region in FIG. 5 becomes the position information of the segment.

Thereafter, the distance image division processing unit 22 divides the input distance image # 2 into a plurality of segments. Then, the distance image division processing unit 22 extracts a distance value of each pixel included in the segment as a distance value set for each segment of the distance image # 2. Furthermore, the distance image division processing unit 22 associates the distance value set extracted from the corresponding segment with the position information of each segment included in the segment information # 21. Then, the distance image division processing unit 22 outputs the segment information # 22 obtained thereby to the distance value correction unit 23 (S6, image division step).

Next, the distance value correction unit 23 calculates a representative value # 23a from the distance value set of the segment included in the segment information # 22 for each segment of the distance image # 2. Then, each of the distance value sets included in the segment information # 22 is replaced with the representative value # 23a of the corresponding segment, and is output to the number assigning unit 24 as the segment information # 23 (S7, representative value determining step).

Then, for each set of the position information and the representative value # 23a included in the segment information # 23, the number assigning unit 24 associates the representative value # 23a with the segment number # 24 corresponding to the position information, and outputs the M sets of the representative value # 23a and the segment number # 24 to the distance value encoding unit 25 (S8).
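Steps S6 to S8 can be sketched as follows. This is a minimal illustration rather than the method of the embodiment: it assumes that the representative value is the most frequent distance value in the segment and that segment numbers are assigned in raster-scan order of first appearance, neither of which is stated in this part of the description. The function name and the sample data are hypothetical.

```python
from collections import Counter


def representative_values(distance_image, segment_labels):
    """Return [(segment_number, representative_value), ...].

    distance_image and segment_labels are 2-D lists of equal shape;
    segment_labels holds an arbitrary label per pixel.
    """
    # S6: collect the distance value set of each segment (raster-scan order).
    value_sets = {}
    for dist_row, label_row in zip(distance_image, segment_labels):
        for dist, label in zip(dist_row, label_row):
            value_sets.setdefault(label, []).append(dist)

    # S7: one representative value per segment (assumption: the mode).
    reps = {label: Counter(values).most_common(1)[0][0]
            for label, values in value_sets.items()}

    # S8: number the segments in order of first appearance (assumption).
    return [(number, reps[label]) for number, label in enumerate(value_sets)]


distance = [[10, 10, 200, 200],
            [10, 12, 200, 198]]
labels = [["a", "a", "b", "b"],
          ["a", "a", "b", "b"]]
pairs = representative_values(distance, labels)
print(pairs)
```

The resulting list of pairs corresponds to the M sets that are passed to the distance value encoding unit 25.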

Thereafter, the distance value encoding unit 25 performs encoding processing on the input representative value # 23a and segment number # 24, and outputs the obtained encoded data # 25 to the packaging unit 28 (S9, encoding scheme selection step, encoding step).

Then, the packaging unit 28 integrates the encoded data # 11 output from the image encoding unit 11 in step S2 and the encoded data # 25 output from the distance value encoding unit 25 in step S9, and outputs the resulting encoded data # 28 to the video decoding device 2 (S10).

The above is the operation of the video encoding device 1.

(Configuration of video decoding device)
Next, the moving picture decoding apparatus 2 according to an embodiment of the present invention will be described below with reference to FIGS. The video decoding device 2 according to the present embodiment decodes the texture image # 1 ′ and the distance image # 2 ′ from the encoded data # 28 transmitted from the above-described video encoding device 1. Then, the decoded texture image # 1 ′ and distance image # 2 ′ are output as frame images to a device constituting the moving image.

First, the configuration of the video decoding device 2 according to the present embodiment will be described with reference to FIG. 23. FIG. 23 is a block diagram illustrating a main configuration of the video decoding device 2.

As shown in FIG. 23, the moving image decoding apparatus 2 includes an image decoding unit 12, an image division processing unit 21 ′, an unpackaging unit (acquisition unit) 31, a distance value decoding unit (decoding unit, static codebook generation unit) 32, and a distance value assigning unit (image generating means) 33.

The unpackaging unit 31 extracts the encoded data # 11 of the texture image # 1 and the encoded data # 25 of the distance image # 2 from the encoded data # 28. The encoded data # 11 of the texture image # 1 is output to the image decoding unit 12, and the encoded data # 25 of the distance image # 2 is output to the distance value decoding unit 32.

The image decoding unit 12 decodes the texture image # 1 ′ from the encoded data # 11. The image decoding unit 12 is the same as the image decoding unit 12 included in the moving image encoding device 1. That is, as long as no noise is mixed into the encoded data # 28 while it is transmitted from the moving image encoding apparatus 1 to the moving image decoding apparatus 2, the image decoding unit 12 decodes a texture image # 1 ′ having the same content as the texture image decoded by the image decoding unit 12 of the moving image encoding apparatus 1. Then, the decoded texture image # 1 ′ is output.

Also, the image decoding unit 12 outputs the decoded picture type of the texture image # 1 ′ and the reference picture information to the distance value decoding unit 32.

The image division processing unit 21 ′ divides the entire area of the texture image # 1 ′ into a plurality of segments (areas) using the same algorithm as the image division processing unit 21 of the moving image encoding device 1. Then, the image division processing unit 21 ′ outputs segment information # 21 ′ including the position information of each segment to the distance value giving unit 33.

The distance value decoding unit 32 decodes the representative value # 23a and the segment number # 24 (decoded data) from the distance image encoded data # 25, the reference flag # 25A, and the picture information # 25B. Thereby, the sequence 1301 of FIG. 13 encoded by the distance value encoding unit 25 of the moving image encoding apparatus 1 is decoded.

More specifically, when the reference flag # 25A is “0”, it is encoded data that has been subjected to adaptive compression encoding, so adaptive decoding is performed. When the reference flag # 25A is “1”, it is encoded data that has been subjected to static compression encoding, and therefore static decoding is performed.

Specifically, the first picture has a reference flag of “0” and has therefore been subjected to adaptive compression coding. Accordingly, adaptive decoding is performed, and the sequence 1301 is decoded. At this time, similarly to the case where the adaptive encoding is performed in the distance value encoding unit 25 of the moving image encoding apparatus 1, the number of occurrences of each value of the distance value is counted, and a static encoding table 1901 is created. The code book 1902 is stored until the next encoded data is processed. Note that the storage period is not limited to the processing of the next encoded data, and may last until the code book 1902 is no longer required.

Thereafter, static decoding is performed on the encoded data input together with the reference flag # 25A “1” by using the stored code book 1902. The code book 1902 to be referred to is the one corresponding to the picture number indicated by the picture information # 25B input together with the reference flag # 25A.

Thus, the moving picture decoding apparatus 2 can create the code book 1902 for static decoding without transmitting it from the moving picture encoding apparatus 1 to the moving picture decoding apparatus 2. Therefore, it is possible to transmit the encoded data or the like with a greatly reduced amount of information.
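The reason the code book need not be transmitted can be illustrated with a small sketch. It assumes, purely for illustration, that the code book is a Huffman code built from occurrence counts; the description only says that occurrences are counted and a code book is created. The adaptive coding of the first picture is not modeled: the sketch starts from the point where both sides hold the same decoded sequence. All function names are hypothetical.

```python
import heapq
from collections import Counter


def build_codebook(values):
    """Huffman code book {value: bit string} from occurrence counts."""
    counts = Counter(values)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, unique tie-breaker, partial code book).
    heap = [(n, i, {v: ""}) for i, (v, n) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {v: "0" + bits for v, bits in left.items()}
        merged.update({v: "1" + bits for v, bits in right.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def static_encode(values, codebook):
    # Limitation of this sketch: every value must already be in the code book.
    return "".join(codebook[v] for v in values)


def static_decode(bits, codebook):
    inverse = {code: value for value, code in codebook.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first match is correct
            decoded.append(inverse[current])
            current = ""
    return decoded


# Picture 0 (reference flag 0): both sides see the same sequence and
# build identical code books independently, keyed by picture number.
first_picture = [5, 5, 5, 7, 7, 9]
encoder_books = {0: build_codebook(first_picture)}
decoder_books = {0: build_codebook(first_picture)}

# Picture 1 (reference flag 1, picture information pointing at picture 0).
second_picture = [5, 7, 5, 9]
bits = static_encode(second_picture, encoder_books[0])
restored = static_decode(bits, decoder_books[0])
print(bits, restored)
```

Because the decoder derives its code book from data it has already decoded, only the reference flag and the picture number have to accompany the statically encoded data.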

Then, the segment table 1201 in FIG. 12 is decoded. This segment table 1201 is output to the distance value assigning unit 33.

Based on the input representative value # 23a and segment number # 24, the distance value assigning unit 33 assigns the representative value of each segment as the pixel value (distance value) of every pixel included in that segment, thereby restoring the distance image # 2 ′. Then, the restored distance image # 2 ′ is output.

From the above, texture image # 1 and distance image # 2 can be decoded.

For example, when the texture image is divided into segments by the following method, an input texture image of 1024 × 768 dots can be divided into about several thousand segments (for example, 3000 to 5000 segments). In the AVC encoding method, by contrast, the total number of blocks (4 × 4 = 16 pixels) is about 49000.

Specifically, from the input texture image # 1 ′, the image division processing unit 21 defines a plurality of segments for which the difference between the average value calculated from the pixel values of the pixel group included in a segment and the average value calculated from the pixel values of the pixel group included in a segment adjacent to that segment is equal to or less than a predetermined threshold value.

A specific algorithm for defining a plurality of segments in which the difference between the average values is equal to or greater than a predetermined threshold will be described below with reference to FIGS. 28 and 29.

FIG. 28 is a flowchart showing an operation in which the video encoding device 1 defines a plurality of segments based on the above algorithm. FIG. 29 is a flowchart showing a subroutine of segment combination processing in the flowchart of FIG. 28.

In the initialization step of FIG. 28, the image division processing unit 21 takes the texture image that has been subjected to the smoothing process, treats each of the pixels included in the texture image as one independent segment (provisional segment), and sets the pixel value itself of the corresponding pixel as the average value (average color) of all the pixel values in each provisional segment (S41).

Next, the process proceeds to the segment combination processing step (S42), and the provisional segments having similar colors are combined. This segment combining process will be described in detail below with reference to FIG. 29, and it is repeated until no further combination occurs.

The image division processing unit 21 performs the following processing (S51 to S55) for all provisional segments.

First, the image division processing unit 21 determines whether or not the height and width of the temporary segment of interest are both equal to or less than a threshold value (S51). If it is determined that both are equal to or lower than the threshold (YES in S51), the process proceeds to step S52. On the other hand, when it is determined that any one is larger than the threshold value (NO in S51), the process of step S51 is performed for the temporary segment to be focused next. The temporary segment that should be noted next may be, for example, the temporary segment that is positioned next to the temporary segment that is focused in the raster scan order.

The image division processing unit 21 selects a temporary segment having an average color closest to the average color of the temporary segment of interest among the temporary segments adjacent to the temporary segment of interest (S52). As an index for judging the closeness of colors, for example, the Euclidean distance between vectors when the three RGB values of pixel values are regarded as a three-dimensional vector can be used. As a pixel value of each segment, an average value of all pixel values included in each segment is used.

After the process of step S52, the image division processing unit 21 determines whether or not the color proximity between the temporary segment of interest and the temporary segment determined to have the closest color is equal to or less than a certain threshold (S53). If it is determined that the value is larger than the threshold value (NO in S53), the process of step S51 is performed for the temporary segment that should be noted next. On the other hand, if it is determined that the value is equal to or less than the threshold (YES in S53), the process proceeds to step S54.

After the process of step S53, the image division processing unit 21 combines the two provisional segments (the provisional segment of interest and the provisional segment determined to be closest in color to it) into one provisional segment (S54). The number of provisional segments is reduced by 1 by the process of step S54.

After the process of step S54, the average value of the pixel values of all the pixels included in the converted target segment is calculated (S55). If there is a segment that has not yet been subjected to the processing of steps S51 to S55, the processing of step S51 is performed for the temporary segment to be noticed next.

After completing the processes of steps S51 to S55 for all the provisional segments, the process proceeds to the process of step S43.

The image division processing unit 21 compares the number of provisional segments before the process of step S42 with the number of provisional segments after the process of step S42 (S43).

If the number of provisional segments has decreased (YES in S43), the process returns to step S42. On the other hand, when the number of temporary segments does not change (NO in S43), the image division processing unit 21 defines each current temporary segment as one segment.

By the above algorithm, as described above, when the input texture image is an image of 1024 × 768 dots, it can be divided into about several thousand (for example, 3000 to 5000) segments.
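The flow of steps S41 to S43 and S51 to S55 can be sketched as below. This is a simplified illustration under stated assumptions, not the implementation of the embodiment: the smoothing process is omitted, adjacency is taken to be 4-neighbour adjacency, provisional segments are visited in order of their initial raster-scan label, and the names segment_image, color_threshold, and size_threshold are hypothetical.

```python
import math


def segment_image(pixels, color_threshold, size_threshold):
    """pixels: 2-D list of (R, G, B). Returns a 2-D list of segment labels."""
    h, w = len(pixels), len(pixels[0])
    # S41: every pixel is one provisional segment; its colour is the average.
    label = [[y * w + x for x in range(w)] for y in range(h)]
    members = {y * w + x: [(y, x)] for y in range(h) for x in range(w)}
    mean = {y * w + x: tuple(map(float, pixels[y][x]))
            for y in range(h) for x in range(w)}

    def neighbours(seg):
        found = set()
        for y, x in members[seg]:
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and label[ny][nx] != seg:
                    found.add(label[ny][nx])
        return found

    while True:
        before = len(members)
        for seg in sorted(members):            # S42: one combining pass
            if seg not in members:             # absorbed earlier in this pass
                continue
            ys = [y for y, _ in members[seg]]
            xs = [x for _, x in members[seg]]
            # S51: only segments within the size limit may initiate a merge.
            if (max(ys) - min(ys) + 1 > size_threshold
                    or max(xs) - min(xs) + 1 > size_threshold):
                continue
            adjacent = neighbours(seg)
            if not adjacent:
                continue
            # S52: adjacent segment with the closest average colour.
            best = min(adjacent, key=lambda n: math.dist(mean[seg], mean[n]))
            # S53: merge only if the Euclidean RGB distance is small enough.
            if math.dist(mean[seg], mean[best]) > color_threshold:
                continue
            # S54: combine the two provisional segments into one.
            for y, x in members[best]:
                label[y][x] = seg
            members[seg] += members.pop(best)
            del mean[best]
            # S55: recompute the average colour of the combined segment.
            n = len(members[seg])
            mean[seg] = tuple(sum(pixels[y][x][c] for y, x in members[seg]) / n
                              for c in range(3))
        if len(members) == before:             # S43: no change, finished
            return label


pixels = [[(255, 0, 0), (250, 0, 0), (0, 0, 255), (0, 0, 250)],
          [(255, 0, 0), (252, 0, 0), (0, 0, 255), (0, 0, 251)]]
labels = segment_image(pixels, color_threshold=10, size_threshold=8)
print(labels)
```

On this toy input the reddish left half and the bluish right half each end up as a single segment, because the colour distance across the boundary is far above the threshold.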

As described above, the segment is used to divide the distance image. Therefore, if the size of the segment becomes too large, various distance values are included in one segment, resulting in a pixel having a large error from the representative value, and the encoding accuracy of the distance image is lowered. Therefore, in the present invention, the process of step S51 is not essential, but it is desirable to prevent the segment size from becoming too large by limiting the segment size as in step S51.

As described above, with the above-described algorithm, the image is divided into about several thousand segments (for example, 3000 to 5000 segments), whereas in the AVC encoding method, the total number of blocks (4 × 4 = 16 pixels) is about 49000. In AVC, orthogonal transform is then performed for each block, and the coefficients are quantized and transmitted.

Therefore, in this embodiment, the number of segments can be made significantly smaller than the number of processing units for orthogonal transformation. Further, since the distance value in each segment is constant, it is not necessary to perform orthogonal transform, and the distance value can be transmitted with 8-bit information. Furthermore, in this embodiment, it is possible to further improve the compression efficiency by performing an adaptive compression encoding method and reusing a code book. Therefore, in this embodiment, the compression efficiency can be greatly improved as compared with the case where the texture video (image) and the distance video (image) are each encoded by the AVC encoding method.
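The figures quoted above can be checked with a few lines of arithmetic. The sketch only counts processing units and the raw, pre-entropy-coding size of the representative values at 8 bits each; it says nothing about the actual AVC bit rate, which depends on the content. The variable names are illustrative.

```python
WIDTH, HEIGHT = 1024, 768

# Number of 4 x 4 blocks that AVC would transform for one picture.
avc_blocks = (WIDTH * HEIGHT) // (4 * 4)

# Raw size of the representative values: one 8-bit value per segment.
raw_bits = {segments: segments * 8 for segments in (3000, 5000)}

# How many times fewer units there are than 4 x 4 blocks.
ratio = {segments: avc_blocks / segments for segments in (3000, 5000)}

print(avc_blocks)
print(raw_bits)
print(ratio)
```

With 3000 to 5000 segments, the number of units is roughly 10 to 16 times smaller than the number of 4 × 4 blocks, and the representative values occupy 24000 to 40000 bits before the adaptive or static compression is applied.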

(Operation of video decoding device)
Next, the operation of the video decoding device 2 will be described below with reference to FIG. 24. FIG. 24 is a flowchart showing the operation of the video decoding device 2. The operation of the moving image decoding apparatus 2 described here is an operation of decoding a texture image and a distance image of the t-th frame from the top in a three-dimensional moving image including a large number of frames. That is, the moving image decoding apparatus 2 repeats the operation described below as many times as the number of frames of the moving image in order to decode the entire moving image. In the following description, unless otherwise specified, each data # 1 to # 28 is interpreted as data at the t-th frame.

First, the unpackaging unit 31 extracts the texture image encoded data # 11, the distance image encoded data # 25, the reference flag # 25A, and the picture information # 25B from the encoded data # 28 received from the moving image encoding apparatus 1. Then, the unpackaging unit 31 outputs the encoded data # 11 to the image decoding unit 12, and outputs the encoded data # 25, the reference flag # 25A, and the picture information # 25B to the distance value decoding unit 32 (S21, acquisition step).

The image decoding unit 12 decodes the texture image # 1 ′ from the input encoded data # 11, and outputs it to the image division processing unit 21 ′ and to a stereoscopic video display device (not shown) outside the moving image decoding device 2 (S22). Further, the image decoding unit 12 outputs the picture information # 11A, which indicates the type of the selected picture and the reference picture, to the distance value decoding unit 32.

Next, the image division processing unit 21 ′ defines a plurality of segments using the same algorithm as the image division processing unit 21 of the moving image encoding device 1.

Then, for each segment, the image division processing unit 21 ′ replaces the pixel value of each pixel included in each segment with a representative value in the raster scan order in the texture image # 1 ′, so that the segment identification image # 21 ′ is generated. The image division processing unit 21 ′ outputs the segment identification image # 21 ′ to the distance value providing unit 33 (S23).

On the other hand, the distance value decoding unit 32 decodes the binary string 1701 described above from the encoded data # 25 of the distance image, the reference flag # 25A, and the picture information # 25B. Further, the distance value decoding unit 32 decodes the segment number and the representative value # 23a from the binary string 1701. Then, the distance value decoding unit 32 outputs the obtained representative value # 23a and segment number # 24 to the distance value giving unit 33 (S24, decoding step).

The distance value assigning unit 33 decodes the distance image # 2 ′ by converting, based on the input representative value # 23a and segment number # 24, the pixel values of all the pixels included in each segment of the segment identification image # 21 ′ into the representative value # 23a of that segment. Then, the distance value assigning unit 33 outputs the distance image # 2 ′ to the above-described stereoscopic video display device (S25, image generation step).
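Step S25 amounts to a per-pixel table lookup, as the following minimal sketch shows. It assumes that each pixel of the segment identification image holds the number of the segment it belongs to and that the decoded data is a list of (segment number, representative value) pairs; the function name and the sample data are hypothetical.

```python
def restore_distance_image(segment_id_image, decoded_pairs):
    """Replace every segment number with that segment's representative value."""
    representative = dict(decoded_pairs)
    return [[representative[segment] for segment in row]
            for row in segment_id_image]


segment_id_image = [[0, 0, 1, 1],
                    [0, 0, 1, 1]]
decoded_pairs = [(0, 10), (1, 200)]
distance_image = restore_distance_image(segment_id_image, decoded_pairs)
print(distance_image)
```

Every pixel of a segment receives the same distance value, which is why the restored image approximates rather than reproduces the input distance image.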

The operation of the video decoding device 2 has been described above. The distance image # 2 ′ decoded by the distance value assigning unit 33 in step S25 generally approximates the distance image # 2 input to the video encoding device 1.

As described above, this is because the correlation between the texture image # 1 and the distance image # 2 means that, when the texture image # 1 ′ is divided into a plurality of segments each composed of a group of pixels having similar colors, all or almost all pixels included in a single segment in the distance image # 2 have the same distance value. That is, the distance image # 2 ′ is the same as the image obtained by changing the distance values of the very small number of pixels in each segment of the distance image # 2 that differ from the representative value to the representative value of that segment, and can therefore be said to approximate the distance image # 2.

Also, the moving image transmission system including the moving image encoding device 1 and the moving image decoding device 2 described above also exhibits the above-described effects.

[Embodiment 2]
The following will describe another embodiment of the present invention with reference to FIGS. For convenience of explanation, members having the same functions as those shown in the first embodiment are given the same reference numerals, and explanation thereof is omitted.

This embodiment is different from the first embodiment in that there are a plurality of viewpoints of texture images and distance images corresponding to the texture images. That is, the moving image encoding device 1A according to the present embodiment performs the texture image and distance image encoding processing using the same encoding method as the moving image encoding device 1 of the first embodiment, but differs from the moving image encoding apparatus 1 in that a plurality of sets of texture images and distance images are encoded per frame.

Here, the plurality of sets of texture images and distance images are images of subjects simultaneously captured by cameras and ranging devices installed at a plurality of locations so as to surround the subject. That is, the plurality of sets of texture images and distance images are images for generating a free viewpoint image. Each set of texture images and distance images includes camera parameters such as camera position, direction, and focal length as metadata, along with actual data of the texture images and distance images of the set.

(Configuration of video encoding device)
First, the configuration of the moving picture encoding apparatus 1A will be described with reference to FIG. 25. FIG. 25 is a block diagram showing a main configuration of the moving picture encoding apparatus 1A according to the present embodiment.

As shown in FIG. 25, the moving image encoding apparatus 1A includes an image encoding unit (MVC encoding unit) 11A, an image decoding unit (MVC decoding unit) 12A, a distance image encoding unit 20A, and a packaging unit 28 ′. The distance image encoding unit 20A includes an image division processing unit 21, a distance image division processing unit 22, a distance value correction unit 23, a number assigning unit 24, and a distance value encoding unit (adaptive encoding unit and output unit) 25A.

The image encoding unit 11A performs the same encoding as the image encoding unit 11 described above, but differs in that it compresses and encodes images from a plurality of viewpoints. Specifically, the image encoding unit 11A performs encoding using MVC (Multiview Video Coding). AVC used in the first embodiment is a standard for compressing and encoding video (image) from one viewpoint, whereas MVC is a standard for compressing and encoding multi-view video (image). Therefore, the encoded data # 11 output from the image encoding unit 11A is MVC encoded data.

MVC coding performs the prediction described in the first embodiment between viewpoints as well, in order to eliminate redundancy between viewpoints. This will be specifically described with reference to FIG. 26. FIG. 26 is a diagram for explaining MVC encoding.

As shown in FIG. 26, with respect to the encoding target image 2301, an image is predicted in block units from the time direction and the viewpoint direction (space direction). Here, the images 2303 and 2305 can be referred to as images in the time direction, and the images 2302 and 2304 can be referred to as images in the viewpoint direction.

Since a change in the position of the subject in the screen over time is equivalent to a change in the position of the subject in the screen depending on the viewpoint, a prediction method similar to image prediction in the time direction is applicable between viewpoints.

Therefore, the same method can be used to eliminate the redundancy between images in the time direction and the redundancy between images in the spatial direction.

Here, when the prediction in the temporal direction and the spatial direction as described above is performed, a reference destination image is generated in the spatial direction as in the temporal direction. Up to two reference images in the spatial direction can be referred to.

Therefore, as in the first embodiment, information on the picture type, the reference destination picture number, and the reference destination viewpoint number is output from the image encoding unit 11A to the distance value encoding unit 25A.

Similar to the image decoding unit 12, the image decoding unit 12A decodes the texture image # 1 ′ from the encoded data # 11 of the texture image # 1 obtained from the image encoding unit 11A. Then, the texture image # 1 ′ is output to the image division processing unit 21.

Similar to the distance value encoding unit 25, the distance value encoding unit 25A performs compression encoding processing on the data in which the segment number # 24 and the representative value # 23a are associated, and outputs the obtained encoded data # 25 to the packaging unit 28 ′.

The packaging unit 28 ′ integrates the encoded data # 11 (-1 to -N) of the texture images # 1-1 to # 1-N, the encoded data # 25 (-1 to -N) of the distance images, the reference flags # 25A (-1 to -N), and the picture information # 25B (-1 to -N) to generate encoded data # 28 ′. Then, the packaging unit 28 ′ transmits the generated encoded data # 28 ′ to the video decoding device 2A.

(Configuration of video decoding device)
Next, the configuration of the moving picture decoding apparatus 2A according to the present embodiment will be described with reference to FIG. 27. FIG. 27 is a block diagram showing a main configuration of the moving picture decoding apparatus 2A.

As shown in FIG. 27, the moving image decoding apparatus 2A includes an image decoding unit 12A, an image division processing unit 21 ′, an unpackaging unit 31 ′, a distance value decoding unit 32A, and a distance value giving unit 33.

Upon receiving the encoded data # 28 ′, the unpackaging unit 31 ′ extracts the encoded data # 11 (-1 to -N), the encoded data # 25 (-1 to -N), the reference flags # 25A (-1 to -N), and the picture information # 25B (-1 to -N), outputs the encoded data # 11 to the image decoding unit 12A, and outputs the encoded data # 25 to the distance value decoding unit 32A.

Other configurations are the same as those of the moving image decoding apparatus 2 except that there are a plurality of viewpoints of texture images and distance images. Also in the moving image decoding apparatus 2A, the reference image is determined according to the algorithm shown in FIG. 24, the distance value is decoded by reusing the code book, and the distance image is restored.

The present invention is not limited to the above-described embodiments, and various modifications can be made within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.

(Application examples)
The moving picture encoding apparatus 1 and the moving picture decoding apparatus 2 described above can be used by being mounted on various apparatuses that perform moving picture transmission, reception, recording, and reproduction.

First, it will be described with reference to FIG. 30 that the moving picture encoding apparatus 1 and the moving picture decoding apparatus 2 described above can be used for transmission and reception of moving pictures.

FIG. 30 (a) is a block diagram showing a configuration of a transmission apparatus A in which the moving picture encoding apparatus 1 is mounted. As shown in FIG. 30 (a), the transmission apparatus A includes an encoding unit A1 that obtains encoded data by encoding a moving image, a modulation unit A2 that obtains a modulated signal by modulating a carrier wave with the encoded data obtained by the encoding unit A1, and a transmission unit A3 that transmits the modulated signal obtained by the modulation unit A2. The moving image encoding device 1 described above is used as the encoding unit A1.

The transmission apparatus A may further include, as supply sources of the moving image input to the encoding unit A1, a camera A4 that captures a moving image, a recording medium A5 on which the moving image is recorded, and an input terminal A6 for inputting the moving image from the outside. FIG. 30A illustrates a configuration in which the transmission apparatus A includes all of these, but some of them may be omitted.

The recording medium A5 may hold a non-encoded moving image, or a moving image encoded by a recording encoding scheme different from the transmission encoding scheme. In the latter case, a decoding unit (not shown) for decoding the encoded data read from the recording medium A5 according to the recording encoding method may be interposed between the recording medium A5 and the encoding unit A1.

FIG. 30B is a block diagram illustrating a configuration of the receiving device B on which the moving image decoding device 2 is mounted. As illustrated in FIG. 30B, the receiving device B includes a receiving unit B1 that receives a modulated signal, a demodulating unit B2 that obtains encoded data by demodulating the modulated signal received by the receiving unit B1, and a decoding unit B3 that obtains a moving image by decoding the encoded data obtained by the demodulating unit B2. The moving picture decoding apparatus 2 described above is used as the decoding unit B3.

The receiving apparatus B may further include, as supply destinations of the moving image output from the decoding unit B3, a display B4 for displaying the moving image, a recording medium B5 for recording the moving image, and an output terminal B6 for outputting the moving image to the outside. FIG. 30B illustrates a configuration in which the receiving apparatus B includes all of these, but a part of the configuration may be omitted.

Note that the recording medium B5 may be for recording an unencoded moving image, or may be for recording a moving image encoded by a recording encoding method different from the transmission encoding method. In the latter case, an encoding unit (not shown) that encodes the moving image acquired from the decoding unit B3 in accordance with the recording encoding method may be interposed between the decoding unit B3 and the recording medium B5.

Note that the transmission medium for transmitting the modulation signal may be wireless or wired. Further, the transmission mode for transmitting the modulated signal may be broadcasting (here, a transmission mode in which the transmission destination is not specified in advance) or communication (here, a transmission mode in which the transmission destination is specified in advance). That is, the transmission of the modulation signal may be realized by any of wireless broadcasting, wired broadcasting, wireless communication, and wired communication.

For example, a terrestrial digital broadcast broadcasting station (such as broadcasting equipment) / receiving station (such as a television receiver) is an example of a transmitting device A / receiving device B that transmits and receives a modulated signal by wireless broadcasting. A broadcasting station (such as broadcasting equipment) / receiving station (such as a television receiver) for cable television broadcasting is an example of a transmitting device A / receiving device B that transmits and receives a modulated signal by cable broadcasting.

Also, a server (such as a workstation) / client (such as a television receiver, personal computer, or smartphone) for a VOD (Video On Demand) service or a video sharing service using the Internet is an example of a transmitting device A / receiving device B that transmits and receives a modulated signal by communication (usually, either wireless or wired is used as the transmission medium in a LAN, and wired is used as the transmission medium in a WAN). Here, the personal computer includes a desktop PC, a laptop PC, and a tablet PC. The smartphone also includes a multi-function mobile phone terminal.

In addition to the function of decoding the encoded data downloaded from the server and displaying it on the display, the video sharing service client has a function of encoding a moving image captured by the camera and uploading it to the server. That is, the client of the video sharing service functions as both the transmission device A and the reception device B.

Next, the fact that the above-described moving picture decoding apparatus 1 and moving picture encoding apparatus 2 can be used for recording and reproduction of moving pictures will be described with reference to FIG.

FIG. 31A is a block diagram showing the configuration of a recording device C equipped with the moving image encoding device 2 described above. As shown in FIG. 31A, the recording device C includes an encoding unit C1 that obtains encoded data by encoding a moving image, and a writing unit C2 that writes the encoded data obtained by the encoding unit C1 to the recording medium M. The moving image encoding device 2 described above is used as the encoding unit C1.

The recording medium M may be (1) of a type built into the recording device C, such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), (2) of a type connected to the recording device C, such as an SD memory card or USB (Universal Serial Bus) flash memory, or (3) of a type loaded into a drive device (not shown) built into the recording device C, such as a DVD (Digital Versatile Disc) or BD (Blu-ray Disc: registered trademark).

In addition, the recording device C may further include, as supply sources of the moving image input to the encoding unit C1, a camera C3 that captures a moving image, an input terminal C4 for inputting a moving image from the outside, and a receiving unit C5 that receives a moving image. FIG. 31A illustrates a configuration in which the recording device C includes all of these, but some of them may be omitted.

The receiving unit C5 may receive an unencoded moving image, or may receive encoded data encoded by a transmission encoding method different from the recording encoding method. In the latter case, a transmission decoding unit (not shown) that decodes encoded data encoded by the transmission encoding method may be interposed between the receiving unit C5 and the encoding unit C1.

Examples of such a recording device C include a DVD recorder, a BD recorder, and an HD (Hard Disk) recorder (in this case, the input terminal C4 or the receiving unit C5 is the main supply source of moving images). In addition, a camcorder (in this case, the camera C3 is the main supply source of moving images), a personal computer (in this case, the receiving unit C5 is the main supply source of moving images), and a smartphone (in this case, the camera C3 or the receiving unit C5 is the main supply source of moving images) are also examples of such a recording device C.

FIG. 31B is a block diagram showing the configuration of the playback device D on which the above-described moving image decoding device 1 is mounted. As shown in FIG. 31B, the playback device D includes a reading unit D1 that reads encoded data written on the recording medium M, and a decoding unit D2 that obtains a moving image by decoding the encoded data read by the reading unit D1. The moving image decoding device 1 described above is used as the decoding unit D2.

The recording medium M may be (1) of a type built into the playback device D, such as an HDD or SSD, (2) of a type connected to the playback device D, such as an SD memory card or USB flash memory, or (3) of a type loaded into a drive device (not shown) built into the playback device D, such as a DVD or BD.

Further, the playback device D may further include, as supply destinations of the moving image output by the decoding unit D2, a display D3 that displays the moving image, an output terminal D4 for outputting the moving image to the outside, and a transmitting unit D5 that transmits the moving image. FIG. 31B illustrates a configuration in which the playback device D includes all of these, but some of them may be omitted.

The transmitting unit D5 may transmit an unencoded moving image, or may transmit encoded data encoded by a transmission encoding method different from the recording encoding method. In the latter case, an encoding unit (not shown) that encodes a moving image by the transmission encoding method may be interposed between the decoding unit D2 and the transmitting unit D5.

Examples of such a playback device D include a DVD player, a BD player, and an HDD player (in this case, the output terminal D4 to which a television receiver or the like is connected is the main supply destination of moving images). In addition, a television receiver (in this case, the display D3 is the main supply destination of moving images), a desktop PC (in this case, the output terminal D4 or the transmitting unit D5 is the main supply destination of moving images), a laptop or tablet PC (in this case, the display D3 or the transmitting unit D5 is the main supply destination of moving images), and a smartphone (in this case, the display D3 or the transmitting unit D5 is the main supply destination of moving images) are also examples of such a playback device D.

As described above, the moving image encoding device 1 according to the present invention includes: the distance image division processing unit 22, which divides each frame image of a moving image into a plurality of regions; the number assigning unit 24, which determines a representative value of each region divided by the distance image division processing unit 22; and the distance value encoding unit 25, which selects either adaptive encoding or static encoding and generates encoded data of the frame image using the selected encoding method. In the adaptive encoding, a number sequence in which the representative values determined by the number assigning unit 24 are arranged in a predetermined order is encoded while adaptively updating an adaptive codebook in which sequence patterns and codewords are associated with each other. In the static encoding, the representative values determined by the number assigning unit 24 are arranged in a predetermined order, and each representative value is encoded by assigning it a codeword whose number of bits depends on the appearance rate of the representative value in the frame image.
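As a non-limiting illustration of how a number sequence of representative values might be obtained, the following Python sketch takes a distance image and a region map and returns one value per region, ordered by region index. The function name `representative_values`, the use of the most frequent distance value as the representative value, and the ordering by region index are all assumptions made for illustration; this passage does not specify how the representative value is determined.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def representative_values(depth: List[List[int]],
                          region_map: List[List[int]]) -> List[int]:
    """Return one representative value per region, ordered by region index.

    Assumption: the representative value is the most frequent distance
    value in the region; ties are broken in favour of the smaller value.
    """
    buckets: Dict[int, List[int]] = defaultdict(list)
    for depth_row, region_row in zip(depth, region_map):
        for d, r in zip(depth_row, region_row):
            buckets[r].append(d)
    result = []
    for r in sorted(buckets):
        counts = Counter(buckets[r])
        # Sort key: highest count first, then smallest value.
        value, _ = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        result.append(value)
    return result


depth = [[10, 10, 50],
         [90, 91, 50]]
region_map = [[0, 0, 1],
              [2, 2, 1]]
sequence = representative_values(depth, region_map)
```

The resulting list is the kind of number sequence that would then be passed to either the adaptive or the static encoder.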

As described above, the moving image encoding device according to the present invention is a moving image encoding device that encodes a moving image, and includes: image dividing means for dividing each frame image of the moving image into a plurality of regions; representative value determining means for determining a representative value of each region divided by the image dividing means; encoding means for generating encoded data by performing at least one of adaptive encoding and static encoding; and encoding method selection means for selecting either the adaptive encoding or the static encoding for each frame image. In the adaptive encoding, for each frame image, a number sequence in which the representative values determined by the representative value determining means are arranged in a predetermined order is encoded while adaptively updating an adaptive codebook in which sequence patterns and codewords are associated with each other. In the static encoding, for each frame image, the representative values determined by the representative value determining means are arranged in a predetermined order, and each representative value is encoded by assigning it a codeword whose number of bits depends on the appearance rate of the representative value in the frame image. The encoding means encodes the frame image using the encoding method selected by the encoding method selection means to generate encoded data.
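The adaptive encoding described above can be pictured with an LZW-style coder, in which the codebook maps sequence patterns to codewords and grows as the number sequence is scanned. The sketch below is only an illustration under that assumption: this passage does not name a particular adaptive scheme, and the function names `adaptive_encode` and `adaptive_decode` and the initial alphabet of 256 values are hypothetical.

```python
from typing import Dict, List, Tuple


def adaptive_encode(values: List[int], alphabet_size: int = 256) -> List[int]:
    """LZW-style encoding of a sequence of representative values."""
    # Initial codebook: every single value is its own one-element pattern.
    codebook: Dict[Tuple[int, ...], int] = {
        (v,): v for v in range(alphabet_size)
    }
    codes: List[int] = []
    pattern: Tuple[int, ...] = ()
    for v in values:
        candidate = pattern + (v,)
        if candidate in codebook:
            pattern = candidate            # keep extending a known pattern
        else:
            codes.append(codebook[pattern])
            codebook[candidate] = len(codebook)  # adaptive update
            pattern = (v,)
    if pattern:
        codes.append(codebook[pattern])
    return codes


def adaptive_decode(codes: List[int], alphabet_size: int = 256) -> List[int]:
    """Inverse of adaptive_encode; rebuilds the same codebook while decoding."""
    if not codes:
        return []
    codebook: Dict[int, Tuple[int, ...]] = {
        v: (v,) for v in range(alphabet_size)
    }
    prev = codebook[codes[0]]
    out = list(prev)
    for c in codes[1:]:
        # A code equal to the next free index refers to the pattern that is
        # being defined in this very step.
        entry = codebook[c] if c in codebook else prev + (prev[0],)
        out.extend(entry)
        codebook[len(codebook)] = prev + (entry[0],)
        prev = entry
    return out


sequence = [10, 10, 10, 20, 20, 10, 10, 10, 20]
codes = adaptive_encode(sequence)
```

Because the decoder rebuilds the codebook from the codes themselves, no codebook has to be transmitted for an adaptively encoded frame.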

The moving image encoding device according to the present invention may further include static codebook creating means that, when the adaptive encoding is performed, calculates the appearance rate of each representative value in the frame image subjected to the adaptive encoding, determines the number of bits of the codeword to be assigned to each representative value based on the calculated appearance rate, and creates a static codebook in which each representative value is associated with a codeword having the determined number of bits. When the encoding method selection means selects static encoding, the encoding means may statically encode the frame image to be encoded using the static codebook that the static codebook creating means created when a previous frame image was adaptively encoded.

According to the above configuration, a static codebook created previously can be used when performing static encoding. Therefore, it is not necessary to newly perform a process for creating a static codebook, and the efficiency of the static encoding process can be improved.
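One way to derive codeword lengths from appearance rates is Huffman coding followed by canonical codeword assignment. The sketch below assumes that approach purely for illustration; this passage only requires that the number of bits depend on the appearance rate, and the function names `code_lengths` and `static_codebook` are hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict, List


def code_lengths(values: List[int]) -> Dict[int, int]:
    """Huffman code length for each representative value, from its count."""
    counts = Counter(values)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap items: (count, tie_breaker, values in the subtree).
    heap = [(n, v, [v]) for v, n in counts.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    while len(heap) > 1:
        n1, t1, s1 = heapq.heappop(heap)
        n2, t2, s2 = heapq.heappop(heap)
        for v in s1 + s2:
            lengths[v] += 1      # every merge adds one bit to these values
        heapq.heappush(heap, (n1 + n2, min(t1, t2), s1 + s2))
    return lengths


def static_codebook(values: List[int]) -> Dict[int, str]:
    """Canonical prefix code: frequent values receive shorter codewords."""
    lengths = code_lengths(values)
    book: Dict[int, str] = {}
    code, prev_len = 0, 0
    for v, n in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= n - prev_len
        book[v] = format(code, f"0{n}b")
        code += 1
        prev_len = n
    return book


frame_values = [5] * 8 + [7] * 4 + [9] * 2 + [11] * 2
book = static_codebook(frame_values)
```

Since the codebook is a deterministic function of the values of the adaptively encoded frame, it can be created as a by-product of adaptive encoding and kept for later static encoding.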

In the moving image encoding device according to the present invention, the encoding method selection means may select, from adaptive encoding and static encoding, the encoding method that yields the smaller amount of information after encoding of the frame image to be encoded.

According to the above configuration, encoding is performed using an encoding method in which the information amount of encoded data after encoding is small. Therefore, encoding can be performed with an encoding method having a higher compression rate.
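The selection rule can be sketched as encoding the frame both ways and keeping the shorter result. In the following Python sketch, a simple run-length coder stands in for the adaptive coder only to keep the example short; it is not the adaptive codebook scheme of this description. The function names and the 8-bit value and run-length fields are assumptions.

```python
from typing import Dict, List, Tuple


def run_length_bits(values: List[int]) -> str:
    """Stand-in for the adaptive coder: 8-bit value plus 8-bit run length.

    Values are assumed to lie in the range 0 to 255.
    """
    chunks, i = [], 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i] and j - i < 255:
            j += 1
        chunks.append(format(values[i], "08b") + format(j - i, "08b"))
        i = j
    return "".join(chunks)


def static_bits(values: List[int], book: Dict[int, str]) -> str:
    """Static encoding: concatenate the codeword of each value."""
    return "".join(book[v] for v in values)


def select_encoding(values: List[int],
                    book: Dict[int, str]) -> Tuple[str, str]:
    """Return (method name, bitstring) for the shorter of the two results."""
    candidates = {
        "adaptive": run_length_bits(values),
        "static": static_bits(values, book),
    }
    method = min(candidates, key=lambda name: len(candidates[name]))
    return method, candidates[method]


book = {3: "0", 4: "1"}
flat_method, _ = select_encoding([3] * 100, book)
busy_method, busy_bits = select_encoding([3, 4] * 50, book)
```

The returned method name plays the role of the information indicating the encoding method, which a decoder would need in order to choose the matching decoding method.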

In the moving image encoding device according to the present invention, the static codebook creating means may create a static codebook every time a frame image is adaptively encoded, and, when the encoding method selection means selects static encoding, the encoding means may statically encode the frame image to be encoded using, among the plurality of static codebooks created by the static codebook creating means, the static codebook that yields the smallest amount of information after encoding of the frame image to be encoded.

According to the above configuration, static coding can be performed using a code book having the smallest information amount of coded data after coding among a plurality of static code books. Therefore, encoding with a higher compression rate can be performed.

In the moving image encoding device according to the present invention, the static codebook creating means may hold the created static codebooks and, when the number of held static codebooks exceeds a predetermined number, discard them in order from the oldest.

According to the above configuration, when the number of held static codebooks exceeds a predetermined number, the static codebooks are discarded in order from the oldest. As a result, the number of held static codebooks does not grow without limit, and the storage capacity is not squeezed.
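The retention rule amounts to a first-in, first-out store with a fixed capacity. A minimal Python sketch follows; the class name `CodebookStore`, the keying of codebooks by frame number, and the capacity of 2 used in the example are assumptions.

```python
from collections import OrderedDict
from typing import Dict, List, Optional


class CodebookStore:
    """Holds at most max_books static codebooks, discarding the oldest first."""

    def __init__(self, max_books: int) -> None:
        self.max_books = max_books
        self._books: "OrderedDict[int, Dict[int, str]]" = OrderedDict()

    def add(self, frame_no: int, book: Dict[int, str]) -> None:
        self._books[frame_no] = book
        while len(self._books) > self.max_books:
            self._books.popitem(last=False)   # drop the oldest entry

    def get(self, frame_no: int) -> Optional[Dict[int, str]]:
        return self._books.get(frame_no)

    def frames(self) -> List[int]:
        return list(self._books)


store = CodebookStore(max_books=2)
store.add(0, {1: "0", 2: "1"})
store.add(1, {1: "1", 2: "0"})
store.add(2, {1: "0", 2: "10", 3: "11"})
```

If the decoder applies the same capacity and the same discard order, both sides hold the same set of codebooks at every frame.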

In the moving image encoding device according to the present invention, the moving image may be a moving image of a plurality of viewpoints, and the encoding means may statically encode the frame image to be encoded using, among the plurality of static codebooks that the static codebook creating means created when adaptively encoding frame images of different viewpoints, the static codebook that yields the smallest amount of information after encoding of the frame image to be encoded.

According to the above configuration, when there is a static codebook corresponding to a plurality of viewpoint frame images, static encoding is performed using the static codebook that minimizes the amount of information after encoding. Therefore, encoding with a higher compression rate can be performed.

In the moving image encoding device according to the present invention, when performing static encoding, the encoding means may use a static codebook other than any static codebook in which no codeword is assigned to one of the representative values of the frame image to be encoded.

According to the above configuration, encoding can be performed using a static codebook that can be used for static encoding.
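Combined with the minimum-size selection among held codebooks described earlier, this rule might be sketched as follows: codebooks that lack a codeword for some value of the frame are skipped, and the remaining codebook with the fewest bits is chosen. The function names and the keying of codebooks by frame number are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple


def encoded_bits(values: List[int], book: Dict[int, str]) -> Optional[int]:
    """Total bit count, or None if some value has no codeword in the book."""
    total = 0
    for v in values:
        if v not in book:
            return None
        total += len(book[v])
    return total


def pick_codebook(values: List[int],
                  books: Dict[int, Dict[int, str]]
                  ) -> Optional[Tuple[int, int]]:
    """Return (frame number, bits) of the usable codebook with fewest bits."""
    best: Optional[Tuple[int, int]] = None
    for frame_no, book in books.items():
        bits = encoded_bits(values, book)
        if bits is not None and (best is None or bits < best[1]):
            best = (frame_no, bits)
    return best


books = {
    0: {1: "0", 2: "10", 3: "11"},
    1: {1: "11", 2: "0", 3: "10"},
    2: {1: "0", 2: "1"},          # has no codeword for the value 3
}
choice = pick_codebook([2, 2, 2, 1, 3], books)
```

A result of `None` would indicate that no held codebook is usable, in which case static encoding with a held codebook is not possible for that frame.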

In the moving image encoding device according to the present invention, the representative value may be a numerical value included in a predetermined range, and the static codebook creating means may create a static codebook in which codewords are also assigned to those numerical values in the predetermined range that differ from the representative values in the frame image for which the static codebook is created.

According to the above configuration, a static code book is created in which a code word is associated with a numerical value that is not a representative value. Therefore, it can be prevented that static coding cannot be performed because there is no corresponding code word.
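One simple way to cover the whole predetermined range is sketched below: existing codewords are prefixed with `0`, and every value that did not appear receives `1` followed by a fixed-width index, which keeps the code prefix-free at the cost of one extra bit per existing codeword. This particular construction, the function name `extend_codebook`, and the example range of 0 to 15 are assumptions; this passage does not state how the additional codewords are chosen.

```python
from typing import Dict


def extend_codebook(book: Dict[int, str], lo: int, hi: int) -> Dict[int, str]:
    """Return a codebook with a codeword for every value in [lo, hi]."""
    missing = [v for v in range(lo, hi + 1) if v not in book]
    if not missing:
        return dict(book)
    # Fixed width large enough to index every missing value.
    width = max(1, (len(missing) - 1).bit_length())
    full = {v: "0" + codeword for v, codeword in book.items()}
    for index, v in enumerate(missing):
        full[v] = "1" + format(index, f"0{width}b")
    return full


observed = {5: "0", 7: "10", 9: "11"}
full_book = extend_codebook(observed, lo=0, hi=15)
```

With such a codebook, a later frame containing a value that was absent from the source frame can still be statically encoded.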

The moving image decoding device according to the present invention is a moving image decoding device that decodes encoded image data, the encoded image data being data obtained by dividing each frame image of a moving image into a plurality of regions and encoding by either adaptive encoding or static encoding. In the adaptive encoding, a number sequence in which the representative values of the regions are arranged in a predetermined order is encoded while adaptively updating a codebook in which sequence patterns and codewords are associated with each other. In the static encoding, the representative values are arranged in a predetermined order, and each representative value is encoded by assigning it a codeword whose number of bits depends on the appearance rate of the representative value in the frame image. The moving image decoding device includes: acquisition means for acquiring the encoded image data and encoding information, which is information indicating the encoding method of the encoded image data; decoding means for decoding, for each piece of encoded image data corresponding to a frame image, the encoded image data by a decoding method corresponding to the encoding method indicated by the encoding information acquired by the acquisition means, to generate decoded data; and image generation means for generating each frame image of the moving image from the decoded data generated by the decoding means and information indicating the regions.
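The decoder-side flow can be sketched in two steps: dispatch on the encoding information to pick the decoding method, then fill each region with its decoded representative value. In the Python sketch below, the function names, the string-valued encoding information, the region map format, and the trivial stand-in decoder are all assumptions for illustration.

```python
from typing import Callable, Dict, List


def decode_frame(encoding_info: str,
                 payload: str,
                 decoders: Dict[str, Callable[[str], List[int]]]
                 ) -> List[int]:
    """Decode one frame with the decoder matching the encoding information."""
    if encoding_info not in decoders:
        raise ValueError(f"unknown encoding method: {encoding_info!r}")
    return decoders[encoding_info](payload)


def generate_image(region_map: List[List[int]],
                   values: List[int]) -> List[List[int]]:
    """Fill every pixel with the representative value of its region.

    values[r] is the decoded representative value of region index r.
    """
    return [[values[r] for r in row] for row in region_map]


# Stand-in decoder used only for this example: comma-separated integers.
decoders = {"static": lambda text: [int(x) for x in text.split(",")]}

values = decode_frame("static", "10,20,30", decoders)
image = generate_image([[0, 0, 1],
                        [2, 2, 1]], values)
```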

In the moving image decoding device according to the present invention, the decoding means may include static codebook creating means that calculates the appearance rate of each representative value from the decoded data generated when adaptive decoding, which is the decoding method corresponding to adaptive encoding, is performed, determines the number of bits of the codeword to be assigned to each representative value based on the calculated appearance rate, and creates a static codebook in which each representative value is associated with a codeword having the determined number of bits. The decoding means may perform static decoding, which corresponds to the static encoding, on the encoded image data to be decoded using the static codebook that the static codebook creating means created from the decoded data generated when previous encoded image data was adaptively decoded.

According to the above configuration, a static codebook created previously can be used when performing static decoding. Therefore, it is not necessary to newly perform a process for creating a static codebook, and the efficiency of the static decoding process can be improved.

In the moving image decoding device according to the present invention, when the encoded image data has been statically encoded, the acquisition means may acquire image specifying information indicating the frame image for which the codebook used in statically encoding the encoded image data was created, and the decoding means may statically decode the statically encoded image data using the static codebook that the static codebook creating means created when the encoded image data of the frame image indicated by the image specifying information was adaptively decoded.

According to the above configuration, the frame image for which the static codebook used in the static encoding was created can be identified, so static decoding can be performed using the static codebook created when the encoded image data of that frame image was adaptively decoded.

Therefore, the static codebook used for static decoding can be selected appropriately.
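As an illustration of this lookup, the following Python sketch uses the image specifying information as a key into the codebooks held by the decoder and then decodes a prefix-coded bitstring. The payload layout (a dictionary with the keys `codebook_frame` and `bits`) and the function names are assumptions, not a format defined in this description.

```python
from typing import Dict, List


def static_decode(bits: str, book: Dict[int, str]) -> List[int]:
    """Decode a bitstring produced with a prefix-free static codebook."""
    inverse = {codeword: value for value, codeword in book.items()}
    values: List[int] = []
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:       # a complete codeword has been read
            values.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return values


def decode_static_frame(payload: Dict[str, object],
                        books_by_frame: Dict[int, Dict[int, str]]
                        ) -> List[int]:
    """Pick the codebook named by the image specifying information."""
    book = books_by_frame[payload["codebook_frame"]]
    return static_decode(str(payload["bits"]), book)


# Codebook created when frame 3 was adaptively decoded (example values).
books_by_frame = {3: {5: "0", 7: "10", 9: "11"}}
payload = {"codebook_frame": 3, "bits": "0100011"}
decoded = decode_static_frame(payload, books_by_frame)
```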

In the moving image decoding device according to the present invention, the static codebook creating means may hold the created static codebooks and, when the number of held static codebooks exceeds a predetermined number, discard them in order from the oldest.

In the moving image decoding device according to the present invention, the representative value may be a numerical value included in a predetermined range, and the static codebook creating means may create a static codebook in which codewords are also assigned to those numerical values in the predetermined range that are not included in the decoded data from which the static codebook is created.

According to the above configuration, a static codebook is created in which a numerical value that is not a representative value is associated with the codeword. Therefore, it can be prevented that static coding cannot be performed because there is no corresponding code word.

The moving image transmission system including the moving image encoding device and the moving image decoding device can achieve the effects described above.

(Configuration by software)
Finally, each block of the moving image encoding device 1 (1A) and the moving image decoding device 2 (2A), in particular the image encoding unit 11 (11A), the image decoding unit 12, the distance image encoding unit 20 (20A) (the image division processing unit 21 (21′), the distance image division processing unit 22, the distance value correction unit 23, the number assigning unit 24, and the distance value encoding unit 25 (25A)), the distance value decoding unit 32, and the distance value providing unit 33, may be realized in hardware by a logic circuit formed on an integrated circuit (IC chip), or may be realized in software using a CPU (central processing unit).

In the latter case, the moving image encoding device 1 (1A) and the moving image decoding device 2 (2A) each include a CPU that executes the instructions of a control program realizing each function, a ROM (read only memory) that stores the program, a RAM (random access memory) into which the program is loaded, and a storage device (recording medium) such as a memory that stores the program and various data. The object of the present invention can also be achieved by supplying, to the moving image encoding device 1 (1A) and the moving image decoding device 2 (2A), a recording medium on which the program code (executable program, intermediate code program, or source program) of the control programs of the moving image encoding device 1 (1A) and the moving image decoding device 2 (2A), which are software realizing the functions described above, is recorded in a computer-readable manner, and by having the computer (or CPU or MPU (microprocessor unit)) read and execute the program code recorded on the recording medium.

Examples of the recording medium include tapes such as magnetic tapes and cassette tapes; disks including magnetic disks such as floppy (registered trademark) disks and hard disks, and optical disks such as CD-ROM (compact disc read-only memory), MO (magneto-optical), MD (Mini Disc), DVD (digital versatile disc), and CD-R (CD Recordable); cards such as IC cards (including memory cards) and optical cards; semiconductor memories such as mask ROM, EPROM (erasable programmable read-only memory), EEPROM (electrically erasable and programmable read-only memory), and flash ROM; and logic circuits such as PLD (programmable logic device) and FPGA (field programmable gate array).

Further, the moving image encoding device 1 (1A) and the moving image decoding device 2 (2A) may be configured to be connectable to a communication network, and the program code may be supplied via the communication network. The communication network is not particularly limited as long as it can transmit the program code. For example, the Internet, an intranet, an extranet, a LAN (local area network), ISDN (integrated services digital network), a VAN (value-added network), a CATV (community antenna television) communication network, a virtual private network, a telephone line network, a mobile communication network, a satellite communication network, and the like can be used. The transmission medium constituting the communication network may be any medium that can transmit the program code, and is not limited to a specific configuration or type. For example, wired media such as IEEE (Institute of Electrical and Electronics Engineers) 1394, USB, power line carrier, cable TV lines, telephone lines, and ADSL (asymmetric digital subscriber line) lines can be used, as can wireless media such as infrared (for example, IrDA (Infrared Data Association) or remote control), Bluetooth (registered trademark), IEEE 802.11 wireless, HDR (high data rate), NFC (Near Field Communication), DLNA (Digital Living Network Alliance), mobile phone networks, satellite links, and terrestrial digital networks. The present invention can also be realized in the form of a computer data signal embedded in a carrier wave in which the program code is embodied by electronic transmission.

The present invention can be suitably applied to a content generation device that generates 3D-compatible content, a content playback device that plays back 3D-compatible content, and the like.

DESCRIPTION OF SYMBOLS

1 Video encoding device (video encoding device)
2 Video decoding device (video decoding device)
11, 11A Image encoding unit
12 Image decoding unit
22 Distance image division processing unit (image dividing means)
24 Number assigning unit (representative value determining means)
25, 25A Distance value encoding unit (encoding method selection means, encoding means, static codebook creating means)
31 Unpacking unit (acquisition means)
32 Distance value decoding unit (decoding means, static codebook creating means)
33 Distance value assigning unit (image generation means)

Claims

A moving image encoding device for encoding a moving image,
Image dividing means for dividing each frame image of the moving image into a plurality of regions;
Representative value determining means for determining a representative value of each area divided by the image dividing means;
Encoding means for generating encoded data by performing at least one of adaptive encoding, in which, for each frame image, a number sequence in which the representative values determined by the representative value determining means are arranged in a predetermined order is encoded while adaptively updating an adaptive codebook in which sequence patterns and codewords are associated with each other, and static encoding, in which, for each frame image, the representative values determined by the representative value determining means are arranged in a predetermined order and each representative value is encoded by assigning it a codeword whose number of bits depends on the appearance rate of the representative value in the frame image; and
Encoding method selection means for selecting either the adaptive encoding or the static encoding for each frame image,
wherein the encoding means encodes the frame image using the encoding method selected by the encoding method selection means to generate encoded data.
The moving image encoding device according to claim 1, further comprising static codebook creating means that, when the adaptive encoding is performed, calculates the appearance rate of each representative value in the frame image subjected to the adaptive encoding, determines the number of bits of the codeword to be assigned to each representative value based on the calculated appearance rate, and creates a static codebook in which each representative value is associated with a codeword having the determined number of bits,
wherein, when the encoding method selection means selects static encoding, the encoding means statically encodes the frame image to be encoded using the static codebook created by the static codebook creating means when a previous frame image was adaptively encoded.
The moving image encoding device according to claim 2, wherein the encoding method selection means selects, from adaptive encoding and static encoding, the encoding method that yields the smaller amount of information after encoding of the frame image to be encoded.
The moving image encoding device according to claim 2, wherein the static codebook creating means creates a static codebook every time a frame image is adaptively encoded, and
when the encoding method selection means selects static encoding, the encoding means statically encodes the frame image to be encoded using, among the plurality of static codebooks created by the static codebook creating means, the static codebook that yields the smallest amount of information after encoding of the frame image to be encoded.
The moving image encoding device, wherein the static codebook creating means holds the created static codebooks and, when the number of held static codebooks exceeds a predetermined number, discards them in order from the oldest.
The moving image encoding device according to claim 4 or 5, wherein the moving image is a moving image of a plurality of viewpoints, and
the encoding means statically encodes the frame image to be encoded using, among the plurality of static codebooks created when the static codebook creating means adaptively encoded frame images of different viewpoints, the static codebook that yields the smallest amount of information after encoding of the frame image to be encoded.
The moving image encoding device according to any one of claims 2 to 6, wherein, when performing static encoding, the encoding means uses a static codebook other than any static codebook in which no codeword is assigned to one of the representative values of the frame image to be encoded.
The moving image encoding device according to any one of claims 2 to 7, wherein the representative value is a numerical value included in a predetermined range, and
the static codebook creating means creates a static codebook in which codewords are also assigned to those numerical values in the predetermined range that differ from the representative values in the frame image for which the static codebook is created.
A moving image decoding device that decodes encoded image data, the encoded image data being data obtained by dividing each frame image of a moving image into a plurality of regions and encoding by either adaptive encoding, in which a number sequence in which representative values of the regions are arranged in a predetermined order is encoded while adaptively updating a codebook in which sequence patterns and codewords are associated with each other, or static encoding, in which the representative values are arranged in a predetermined order and each representative value is encoded by assigning it a codeword whose number of bits depends on the appearance rate of the representative value in the frame image, the moving image decoding device comprising:
Acquisition means for acquiring the encoded image data and encoding information, which is information indicating the encoding method of the encoded image data;
Decoding means for decoding, for each piece of encoded image data corresponding to a frame image, the encoded image data by a decoding method corresponding to the encoding method indicated by the encoding information acquired by the acquisition means, to generate decoded data; and
Image generation means for generating each frame image of the moving image from the decoded data generated by the decoding means and information indicating the regions.
The moving image decoding device according to claim 9, wherein the decoding means includes static codebook creating means that calculates the appearance rate of each representative value from the decoded data generated when adaptive decoding, which is the decoding method corresponding to adaptive encoding, is performed, determines the number of bits of the codeword to be assigned to each representative value based on the calculated appearance rate, and creates a static codebook in which each representative value is associated with a codeword having the determined number of bits, and
the decoding means performs static decoding, which corresponds to the static encoding, on the encoded image data to be decoded using the static codebook that the static codebook creating means created from the decoded data generated when previous encoded image data was adaptively decoded.
The moving image decoding device according to claim 10, wherein, when the encoded image data has been statically encoded, the acquisition means acquires image specifying information indicating the frame image for which the codebook used in statically encoding the encoded image data was created, and
the decoding means statically decodes the statically encoded image data using the static codebook that the static codebook creating means created when the encoded image data of the frame image indicated by the image specifying information was adaptively decoded.
The moving image decoding device, wherein the static codebook creating means holds the created static codebooks and, when the number of held static codebooks exceeds a predetermined number, discards them in order from the oldest.
The moving image decoding device according to any one of claims 10 to 12, wherein the representative value is a numerical value included in a predetermined range, and
the static codebook creating means creates a static codebook in which codewords are also assigned to those numerical values in the predetermined range that are not included in the decoded data from which the static codebook is created.
A moving picture transmission system including the moving picture encoding apparatus according to any one of claims 1 to 8 and the moving picture decoding apparatus according to any one of claims 9 to 13.
A method for controlling a moving image encoding apparatus for encoding a moving image, comprising:
In the above video encoding device,
An image dividing step of dividing each frame image of the moving image into a plurality of regions;
A representative value determining step for determining a representative value of each region divided in the image dividing step;
An encoding method selection step of selecting, for each frame image, either adaptive encoding, in which a number sequence in which the representative values determined in the representative value determining step are arranged in a predetermined order is encoded while adaptively updating an adaptive codebook in which sequence patterns and codewords are associated with each other, or static encoding, in which the representative values determined in the representative value determining step are arranged in a predetermined order and each representative value is encoded by assigning it a codeword whose number of bits depends on the appearance rate of the representative value in the frame image; and
An encoding step of performing at least one of the adaptive encoding and the static encoding, and encoding the frame image using the encoding method selected in the encoding method selection step to generate encoded data.
A method of controlling a moving image decoding apparatus that decodes image encoded data, the image encoded data being data in which each frame image of a moving image has been divided into a plurality of regions and a number sequence, in which representative values of the regions are arranged in a predetermined order, has been encoded by either adaptive encoding, which encodes the number sequence while adaptively updating a codebook in which sequence patterns of the number sequence are associated with codewords, or static encoding, which assigns to each representative value a codeword whose number of bits depends on the appearance rate of that representative value in the frame image, the method comprising, in the moving image decoding apparatus:
An acquisition step of acquiring the image encoded data and encoding information, which is information indicating the encoding method of the image encoded data;
A decoding step of decoding the image encoded data, for each piece of image encoded data corresponding to a frame image, by a decoding method corresponding to the encoding method indicated by the encoding information acquired in the acquisition step, thereby generating decoded data; and
An image generation step of generating each frame image of the moving image from the decoded data generated in the decoding step and information indicating the regions.
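The decoding and image generation steps above can be sketched as a dispatch on the encoding information followed by painting each region with its representative value. The sketch assumes an LZW-style decoder for the adaptive case, a prefix-code table for the static case, the strings "adaptive" and "static" as encoding information, and a per-pixel region label map as the information indicating the regions. None of these representations is fixed by the claim, and all names are hypothetical.

```python
def lzw_decode(codes, alphabet_size=256):
    """Adaptive decoding sketch: rebuilds the pattern table while decoding."""
    table = {v: (v,) for v in range(alphabet_size)}
    out, prev = [], None
    for code in codes:
        if code in table:
            entry = table[code]
        elif prev is not None and code == len(table):
            entry = prev + (prev[0],)  # pattern defined by the code just being read
        else:
            raise ValueError(f"invalid code {code}")
        out.extend(entry)
        if prev is not None:
            table[len(table)] = prev + (entry[0],)
        prev = entry
    return out


def static_decode(bits, codebook):
    """Static decoding sketch: `codebook` maps value -> prefix-free bit string."""
    inverse = {code: value for value, code in codebook.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return out


def decode_frame(payload, encoding_info, codebook=None):
    """Pick the decoding method that matches the signalled encoding method."""
    if encoding_info == "adaptive":
        return lzw_decode(payload)
    if encoding_info == "static":
        return static_decode(payload, codebook)
    raise ValueError(f"unknown encoding information: {encoding_info!r}")


def generate_image(values, label_map):
    """Fill every pixel with the representative value of its region.

    `label_map[y][x]` is the index of the region containing pixel (x, y);
    `values[i]` is the decoded representative value of region i.
    """
    return [[values[label] for label in row] for row in label_map]
```

For example, a label map `[[0, 0, 1], [2, 2, 1]]` combined with the decoded values `[10, 20, 10]` yields the frame `[[10, 10, 20], [10, 10, 20]]`, whichever of the two encodings carried the values.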
A moving picture encoding apparatus control program for operating the moving picture encoding apparatus according to any one of claims 1 to 8, the control program causing a computer to function as each of the above means.
A moving picture decoding apparatus control program for operating the moving picture decoding apparatus according to any one of claims 9 to 13, the control program causing a computer to function as each of the above means.
A computer-readable recording medium on which the control program according to at least one of claims 17 and 18 is recorded.